Glossary

Data Leakage

What is data leakage? In machine learning, data leakage occurs when information from outside the training data set is used to create the model. This is a problem because, within machine learning, our goal is to develop a model that is good…

Data Lineage

Data lineage provides a map of your data’s journey: where it originates, every stop it makes. On top of that, it describes which filters and transformations are applied. Data lineage is important in various ways: It helps business users understand what their data means, and where it comes from. It… 

Data Mart

What is a data mart? A data mart is best regarded as a subset of a data warehouse. It is often oriented toward one specific business unit. Data marts are useful for storing summarized data that is easily accessible to business users within that business unit. A data mart has several…

Data Mesh

What is a data mesh? A data mesh is an architectural paradigm whose purpose is to enable analytical capabilities (such as analytics, machine learning, or data services) at scale by unlocking access to a multitude of data sets and tables from a variety of domains. A data mesh…

Data Monitoring

Data monitoring is closely related to data testing. They both intend to preserve or improve the quality of data. But monitoring starts from another philosophy. Instead of testing data against known scenarios, monitoring your data means collecting, storing, and analyzing various properties of data. When a data monitoring system detects an anomaly,… 

Data Shift

What is data shift? Data shift (also called dataset shift, model drift, or data drift) is the phenomenon in which the input data to your model changes over time, relative to the data it was trained on. It is one of the most common reasons for degrading model accuracy. That’s why…
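As a rough illustration of what "change relative to the training data" can mean for a single numeric feature, the sketch below measures how far the mean of newly observed values has moved from the training mean, in units of the training standard deviation. The function name `mean_shift_score`, the sample values, and the alert threshold of 2.0 are all hypothetical choices made for this example; real drift detection typically relies on more robust statistical tests.

```python
from statistics import mean, stdev


def mean_shift_score(train_values, live_values):
    """Distance between the live mean and the training mean,
    expressed in training standard deviations (hypothetical metric)."""
    train_sd = stdev(train_values)
    if train_sd == 0:
        raise ValueError("training values have zero variance")
    return abs(mean(live_values) - mean(train_values)) / train_sd


# Made-up values for one input feature.
train = [10, 12, 11, 13, 9]
live_similar = [11, 10, 12, 11, 11]
live_shifted = [15, 16, 14, 17, 15]

ALERT_THRESHOLD = 2.0  # assumed cut-off, chosen only for illustration

for name, live in [("similar", live_similar), ("shifted", live_shifted)]:
    score = mean_shift_score(train, live)
    status = "possible shift" if score > ALERT_THRESHOLD else "ok"
    print(f"{name}: score={score:.2f} -> {status}")
```

A check like this only looks at one summary statistic of one feature, so it can miss changes in spread or shape; it is meant to show the idea, not to serve as a complete detector.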

Data Testing

The truth is that you rarely completely control how or what data is collected. That’s why you should evaluate your data for its quality. There are many dimensions to data quality. The list will be longer or shorter, depending on who you ask. Data validity: To store dates or times,… 

Data Warehouse

What is a data warehouse? A data warehouse is a central repository that contains all data of an organization. The data in a data warehouse often comes from a variety of data sources within marketing, sales, finance and operations. Most often, all data in the warehouse has already been cleaned…

DataOps

What is DataOps? To understand what DataOps is, take a look at DevOps. The term DevOps was coined around 2010 and is a portmanteau that combines development and operations into a single term. It’s a set of practices and tools to integrate the process of developing and deploying software. The…

Linear Regression

What is linear regression? Linear regression is a linear approach to modeling the relationship between a dependent variable and one or more explanatory variables, also called independent variables. We can make a distinction between:

  • Simple linear regression: has one explanatory variable
  • Multiple linear regression: has multiple explanatory variables

In…
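To make the simple case concrete, here is a minimal sketch of fitting a simple linear regression (one explanatory variable) with the standard closed-form least-squares formulas. The function name `fit_simple_linear_regression` and the sample data are made up for this example and are not taken from the glossary entry.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data that lies exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_simple_linear_regression(xs, ys)
print(f"y = {intercept:.1f} + {slope:.1f} * x")
```

Multiple linear regression follows the same least-squares idea with several explanatory variables, and is usually solved with a linear algebra library rather than by hand.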