Glossary

Data Leakage

by roelpi
September 27, 2020

What is data leakage? In machine learning, data leakage describes the use of data from outside the training data set to create the model. This is a problem because our goal is to develop a model that is good…
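As a rough illustration of the definition above, the sketch below contrasts a leaky and a leak-free way of standardizing a feature. The toy values and the helper names `fit_scaler` and `transform` are assumptions made up for this example, not something taken from the article.

```python
import statistics

def fit_scaler(values):
    """Return (mean, standard deviation) computed from the given values only."""
    return statistics.mean(values), statistics.pstdev(values)

def transform(values, mean, stdev):
    """Standardize values with statistics that were fitted elsewhere."""
    return [(v - mean) / stdev for v in values]

# Hypothetical toy feature, split into training rows and held-out rows.
train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 12.0]

# Leaky: the statistics are computed on train + test, so information from
# outside the training set influences how the training data is prepared.
leaky_mean, leaky_std = fit_scaler(train + test)

# Leak-free: the statistics come from the training rows only and are then
# reused, unchanged, on the held-out rows.
clean_mean, clean_std = fit_scaler(train)
train_scaled = transform(train, clean_mean, clean_std)
test_scaled = transform(test, clean_mean, clean_std)
```

The same idea applies to any preprocessing step that learns something from data: fit it on the training rows, then apply it to everything else.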

by roelpi
April 21, 2022 (updated August 8, 2022)

Data lineage provides a map of your data’s journey: where it originates, every stop it makes. On top of that, it describes which filters and transformations are applied. Data lineage is important in various ways: It helps business users understand what their data means, and where it comes from. It…

by roelpi
November 15, 2020

What is a data mart? A data mart is best regarded as a subset of a data warehouse. Its use is often oriented towards one specific business unit. Data marts are useful for storing summarized data that is easily accessible to business users within that business unit. A data mart has several…

by roelpi
November 15, 2020

What is a data mesh? A data mesh is an architectural paradigm with the purpose of enabling analytic capabilities — such as analytics, machine learning, or data services — at scale by unlocking access to a multitude of data sets and tables from a variety of domains. A data mesh…

by roelpi
October 4, 2022

Data monitoring is closely related to data testing. They both intend to preserve or improve the quality of data. But monitoring starts from another philosophy. Instead of testing data against known scenarios, monitoring your data means collecting, storing, and analyzing various properties of data. When a data monitoring system detects an anomaly,…

by roelpi
November 23, 2020 (updated October 4, 2022)

What is Data Shift? Data shift (also called dataset shift, model drift, or data drift) is the change over time in your model’s input data, relative to the data it was trained on. It is one of the most common reasons for degrading model accuracy. That’s why…
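One simple way to make this concrete is to compare a feature's mean in live data against its mean in the training data, measured in training standard deviations. The sketch below does that; the sample values, the function name `mean_shift`, and the threshold of 2.0 are assumptions for illustration, and real systems often use more robust statistical tests.

```python
import statistics

def mean_shift(train_values, live_values):
    """Difference in means, expressed in training standard deviations."""
    train_mean = statistics.mean(train_values)
    train_std = statistics.stdev(train_values)
    return (statistics.mean(live_values) - train_mean) / train_std

# Hypothetical feature values seen at training time and in production.
train_ages = [30, 32, 35, 31, 33, 34]
live_ages = [45, 47, 50, 46, 48, 49]

shift = mean_shift(train_ages, live_ages)

# Assumed threshold: flag the feature when the mean moved by more than
# two training standard deviations.
THRESHOLD = 2.0
drifted = abs(shift) > THRESHOLD
```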

by roelpi
October 4, 2022

The truth is that you rarely completely control how or what data is collected. That’s why you should evaluate your data for its quality. There are many dimensions to data quality. The list will be longer or shorter, depending on who you ask. Data validity: To store dates or times,…

by roelpi
November 14, 2020

What is a data warehouse? A data warehouse is a central repository that contains all data of an organization. The data in a data warehouse often comes from a variety of data sources within marketing, sales, finance and operations. Most often, all data in the warehouse has already been cleaned…

by roelpi
February 19, 2022

What is DataOps? To understand what DataOps is, take a look at DevOps. The term DevOps was coined around 2010 and is a portmanteau that combines development and operations into a single term. It’s a set of practices and tools to integrate the process of developing and deploying software. The…

by roelpi
August 20, 2020

What is linear regression? Linear regression is a linear approach to modeling the relationship between a dependent variable and one or more explanatory (independent) variables. We can make a distinction between simple linear regression, which has one explanatory variable, and multiple linear regression, which has multiple explanatory variables. In…
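For the simple case with one explanatory variable, the fitted line can be computed directly with the ordinary least squares formulas. The sketch below is a minimal illustration; the function name and the toy data are made up for this example.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data that lies exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
```

Multiple linear regression follows the same principle but solves for one coefficient per explanatory variable, which is usually done with a linear algebra library.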
