Glossary

ACID

What is ACID? Four “rules” ensure that a database transaction is processed reliably. When a database adheres to these rules, it is said to be ACID-compliant. A for Atomicity: every database transaction can be broken down into smaller parts. Atomicity refers to the integrity of the whole transaction, not just one…
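
To make atomicity concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the account names, and the transfer helper are hypothetical and exist only for illustration; the point is that both updates of a transfer are committed together or not at all.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two hypothetical accounts as one transaction."""
    try:
        # `with conn` commits if the block succeeds and rolls back
        # every statement in it if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates are applied
failed = transfer(conn, "alice", "bob", 500)  # the debit is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed second transfer, the balances are the same as after the first one: the partial debit never becomes visible.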

API

An Application Programming Interface (API) is a computing interface that defines how different pieces of software interact with each other. It defines the rules of interaction: how to make calls, what data format is expected, and in what data format the response will be returned. It is a popular way for applications…

Ascertainment Bias

Ascertainment bias is the systematic difference in the identification of individuals in a study, or the data collected. It results in a distortion in measuring the true frequency of a phenomenon in the population. “When the chance of a person being sampled, or feature being observed, depends on some background… 

Bootstrapping

Bootstrapping is a popular method that resamples the observed data with replacement. It assigns measures of accuracy to sample estimates. Bootstrapping allows the estimation of the sampling distribution of nearly any statistic. “A way of generating confidence intervals and the distribution of test statistics through sampling the observed data rather than through assuming…
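
As an illustration, the following sketch computes a percentile bootstrap confidence interval with the Python standard library only. The function name bootstrap_ci, its parameters, and the sample values are assumptions made for this example; the percentile method shown is one of several ways to turn bootstrap estimates into an interval.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Draw n observations WITH replacement, recompute the statistic, repeat.
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Read the interval bounds off the empirical distribution of estimates.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up sample values, used only to exercise the function.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 3.0]
low, high = bootstrap_ci(sample)
print(f"95% bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

Passing a different function as stat, such as statistics.median, gives an interval for that statistic without any change to the resampling logic.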

Confusion Matrix

What is a confusion matrix? The confusion matrix (or “error matrix”) is a table that is used to describe the performance of a classification model by comparing its predictions to a data set whose true values are known. In a binary classification task, the confusion matrix is a…
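
For the binary case, the four cells of the table can be counted as in the sketch below. The function name, the TP/FP/FN/TN keys, and the label lists are assumptions chosen for illustration, not part of any particular library.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, positive=1):
    """Count the four cells of a binary confusion matrix (illustrative helper)."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted positive: correct (TP) or a false alarm (FP).
            counts["TP" if truth == positive else "FP"] += 1
        else:
            # Predicted negative: a miss (FN) or correct (TN).
            counts["FN" if truth == positive else "TN"] += 1
    return {cell: counts[cell] for cell in ("TP", "FP", "FN", "TN")}


# Made-up true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
accuracy = (cm["TP"] + cm["TN"]) / len(y_true)
print(cm, accuracy)
```

Metrics such as accuracy, precision, and recall can all be derived from these four counts.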

Dashboards

What are dashboards? Dashboarding is the practice of visualizing data in an easy-to-comprehend overview. In a dashboard, all information is presented in one or multiple (not too many) panes. A good dashboard is tailored to each individual user, in line with the project or KPIs they are working on. Modern… 

Data Integration

What is data integration? Data integration is the practice of collecting and bringing together data from multiple data sources. This can be for a variety of reasons such as reporting, analysis, artificial intelligence, a customer 360° view, etc. Bringing data together is useful because it helps you see the bigger… 

Data Lake

What is a data lake? A data lake is a central data repository where structured, semistructured, and unstructured data can be stored at any scale, usually as blobs or files. Unlike in a data warehouse, the schema does not have to be known at the time of storage. Because data lakes come with…

Data Lakehouse

What is a data lakehouse? For decades, two paradigms existed side by side: the data lake and the data warehouse. The first offers extremely cheap storage capacity for unstructured data formats; the second offers easy querying capabilities on structured data formats. A data lakehouse is a…

Data Leakage

What is data leakage? Within the field of machine learning, data leakage is a term used to describe how data from outside the training data set is used to create the model. This is a problem because, within machine learning, our goal is to develop a model that is good…
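
One common source of leakage is preprocessing that looks at data outside the training set. The sketch below, with hypothetical names and made-up values, standardizes a feature using statistics computed from the training split only, so nothing about the test split influences what the model sees during training. It covers this single case and is not a general remedy for every kind of leakage.

```python
import statistics


def standardize(train, test):
    """Scale both splits using statistics from the training split only."""
    mu = statistics.mean(train)      # computed on training data only
    sigma = statistics.stdev(train)  # computed on training data only

    def scale(values):
        return [(x - mu) / sigma for x in values]

    # The test split is transformed with the training statistics;
    # it never contributes to mu or sigma.
    return scale(train), scale(test)


# Made-up feature values for the two splits.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [10.0, 12.0]
train_z, test_z = standardize(train, test)
```

Computing mu and sigma over train and test combined would let information from the test split shape the training features, which is the kind of leakage described above.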