Glossary

ACID

What is ACID? Four “rules” ensure that a database transaction is processed reliably. When a database adheres to these rules, it is said to be ACID-compliant. A for Atomicity: all database transactions can be broken down into smaller parts. Atomicity refers to the integrity of the whole transaction, not just one…
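To make atomicity concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the balances are hypothetical and chosen only for illustration. The point is that when one step of a transaction fails, the steps that already ran are undone as well.

```python
import sqlite3

# In-memory database with a hypothetical accounts table.
# The CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("alice", 100), ("bob", 50)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back every statement in it if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit
        # that ran before it has been rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 30)       # valid transfer
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer, the balances are the same as after the first, successful one: the credit to the second account is not left behind.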

API

An Application Programming Interface (API) is a computing interface that defines how different software components interact with each other. It defines the rules of interaction: how to make calls, what data format is expected, and in what data format the response will be returned. It is a popular way for applications…

Ascertainment Bias

Ascertainment bias is the systematic difference in the identification of individuals in a study, or the data collected. It results in a distortion in measuring the true frequency of a phenomenon in the population. “When the chance of a person being sampled, or feature being observed, depends on some background… 

Bootstrapping

Bootstrapping is a popular method of resampling with replacement. It assigns measures of accuracy to sample estimates. Bootstrapping allows the estimation of the sampling distribution of nearly any statistic. “A way of generating confidence intervals and the distribution of test statistics through sampling the observed data rather than through assuming…
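As a rough illustration of resampling with replacement, the sketch below computes a percentile bootstrap confidence interval for the mean using only Python's standard library. The function name `bootstrap_ci`, its defaults, and the sample values are assumptions made for this example. The percentile interval is only one of several bootstrap interval methods.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` of `data`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    # Draw n_resamples samples of size n WITH replacement and
    # compute the statistic on each one.
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]  # made-up observations
low, high = bootstrap_ci(sample)
```

Passing a different function as `stat` (for example `statistics.median`) reuses the same resampling loop for another statistic.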

Change Data Capture

Change data capture (CDC) is a data replication method that tracks changes in a database and publishes them as messages to a real-time stream. Downstream systems can consume this row-level change feed for various (analytical and operational) purposes. There are various ways that an event stream is processed, but they… 

Confusion Matrix

What is a confusion matrix? The confusion matrix (or “error matrix”) is a table that is used to describe the performance of a classification model by comparing its predictions to a data set whose true values are known. In a binary classification task, the confusion matrix is a…
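For the binary case, the comparison of predictions against known labels can be sketched in a few lines of plain Python. The helper name `confusion_counts` and the example labels are hypothetical. The four counts are the cells of the 2×2 table.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count the four cells of a binary confusion matrix."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted positive: correct if the true label is positive.
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted negative: a miss if the true label is positive.
            counts["fn" if truth == positive else "tn"] += 1
    return counts


# Made-up labels: 1 is the positive class, 0 the negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_counts(y_true, y_pred)
accuracy = (cm["tp"] + cm["tn"]) / len(y_true)
```

Here the model gets three true positives, three true negatives, one false positive, and one false negative, for an accuracy of 0.75.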

Dashboards

What are dashboards? Dashboarding is the practice of visualizing data in an easy-to-comprehend overview. In a dashboard, all information is presented in one or a small number of panes. A good dashboard is tailored to each individual user, in line with the project or KPIs they are working on. Modern…

Data Integration

What is data integration? Data integration is the practice of collecting and bringing together data from multiple data sources. This can be for a variety of reasons such as reporting, analysis, artificial intelligence, a customer 360° view, etc. Bringing data together is useful because it helps you see the bigger… 

Data Lake

What is a data lake? A data lake is a central data repository where structured, semi-structured, and unstructured data can be stored at any scale, usually as blobs or files. Unlike in a data warehouse, the schema need not be defined at the time of storage. Because data lakes come with…

Data Lakehouse

What is a data lakehouse? For decades, two paradigms existed side by side: the data lake and the data warehouse. The first offers extremely cheap storage capacity for unstructured data formats; the second offers easy querying capabilities on structured data formats. A data lakehouse is a…