
# Glossary

## Data Mart

What is a data mart? A data mart is best regarded as a subset of a data warehouse, usually oriented towards one specific business unit. Data marts are useful for storing summarized data that is easily accessible to business users within that unit. A data mart has several…

## Data Mesh

What is a data mesh? A data mesh is an architectural paradigm intended to enable analytic capabilities (such as analytics, machine learning, or data services) at scale by unlocking access to a multitude of data sets and tables from a variety of domains. A data mesh…

## Data Shift

What is data shift? Data shift (also called dataset shift, model drift, or data drift) is the change over time in a model’s input data relative to the data it was trained on. It is one of the most common reasons for degrading model accuracy. That’s why…

## Data Warehouse

What is a data warehouse? A data warehouse is a central repository that contains all data of an organization. The data in a data warehouse often comes from a variety of data sources within marketing, sales, finance and operations. Most often, all data in the warehouse has already been cleaner…

## Linear Regression

What is linear regression? Linear regression is a linear approach to modeling the relationship between a dependent variable and one or more explanatory (independent) variables. We can distinguish between simple linear regression, which has one explanatory variable, and multiple linear regression, which has multiple explanatory variables. In…
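As an illustration of the simple (one-variable) case, here is a minimal sketch that fits a line by ordinary least squares. The estimation method is an assumption, since the excerpt above does not name one, and the function name and toy data are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (assumed method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factor cancels out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data lying exactly on the line y = 1 + 2x.
intercept, slope = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```

Multiple linear regression follows the same idea with several explanatory variables, and is usually solved with a linear algebra library rather than by hand.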

## Performance Metrics in Machine Learning

Performance metrics tell you something about the performance of a machine learning model. Each metric has a specific focus. Because of the nature of the confusion matrix, many metrics have a close sibling. Equally confusing is that many performance metrics have multiple synonyms, depending on the context. Given…
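To make the "close sibling" idea concrete, the sketch below counts the four cells of a binary confusion matrix and derives two sibling pairs from them. The function name and the toy labels are hypothetical; labels are assumed to be encoded as 1 (positive) and 0 (negative).

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 1 and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


# Hypothetical toy labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)

# Sibling pairs: each metric has a mirror image for the other class.
tpr = tp / (tp + fn)  # true positive rate (sensitivity, recall)
tnr = tn / (tn + fp)  # true negative rate (specificity)
ppv = tp / (tp + fp)  # positive predictive value (precision)
npv = tn / (tn + fn)  # negative predictive value
```

The comments also show the synonym problem: recall, sensitivity, and true positive rate all name the same quantity.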

## Performance Metrics: Accuracy

What is accuracy? Accuracy is a performance metric that tells you the fraction of predictions that were correct, without distinguishing between positive and negative predictions. Accuracy can be a very misleading metric when the data set is imbalanced (when the prevalence is either very high or very…
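A short sketch shows why accuracy can mislead on imbalanced data. The counts are hypothetical: 10 positives and 990 negatives, scored by a model that always predicts the negative class.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical low-prevalence data: the model never predicts "positive",
# so it misses all 10 positives but still looks excellent.
acc = accuracy(tp=0, tn=990, fp=0, fn=10)
print(acc)
```

The score is 0.99 even though the model does not detect a single positive case.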

## Performance Metrics: Balanced Accuracy

What is balanced accuracy? Balanced accuracy is a performance metric for evaluating a binary classifier. Why not use regular accuracy? Balanced accuracy is a better instrument for assessing models that are trained on data with a very imbalanced target variable, i.e. very high or very low prevalence. This will result in…
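The excerpt does not show the formula, so the sketch below assumes the common definition: the mean of the true positive rate and the true negative rate. The counts are hypothetical and describe a model that always predicts the negative class on data with 10 positives and 990 negatives.

```python
def balanced_accuracy(tp, tn, fp, fn):
    """Mean of the true positive rate and the true negative rate (assumed definition)."""
    tpr = tp / (tp + fn)  # share of actual positives that were found
    tnr = tn / (tn + fp)  # share of actual negatives that were found
    return (tpr + tnr) / 2


# Hypothetical counts: regular accuracy would be 0.99 here.
bal_acc = balanced_accuracy(tp=0, tn=990, fp=0, fn=10)
print(bal_acc)
```

Because each class contributes equally regardless of its size, the always-negative model scores only 0.5, no better than chance.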

## Performance Metrics: Classification Success Index

What is the Classification Success Index? The Classification Success Index (CSI) is a (fairly uncommon) measure for evaluating classifiers. The CSI focuses exclusively on the positive class. It is calculated as follows: The terms (1-PPV) and (1-TPR) correspond to the proportions of type I and type II errors. The measure…
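The formula itself is missing from the excerpt. The sketch below assumes the definition usually given for the CSI, one minus the sum of the two error proportions, which simplifies to PPV + TPR - 1. The function name and the counts are hypothetical.

```python
def classification_success_index(tp, fp, fn):
    """CSI = 1 - ((1 - PPV) + (1 - TPR)) = PPV + TPR - 1 (assumed definition)."""
    ppv = tp / (tp + fp)  # positive predictive value (precision)
    tpr = tp / (tp + fn)  # true positive rate (recall)
    return 1 - ((1 - ppv) + (1 - tpr))


# Hypothetical counts: PPV = 0.8 and TPR = 0.8.
csi = classification_success_index(tp=80, fp=20, fn=20)
print(csi)
```

True negatives never enter the calculation, which matches the statement that the CSI focuses exclusively on the positive class.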

## Performance Metrics: Diagnostic Odds Ratio

What is the Diagnostic Odds Ratio? The Diagnostic Odds Ratio (DOR) is a performance metric to assess the effectiveness of a diagnostic test or — within the context of machine learning — a binary classification model. It is defined as the ratio of the odds of the prediction being positive is…
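The definition above is cut off, so this sketch assumes the standard form of the DOR: the odds of a positive prediction among actual positives divided by the odds of a positive prediction among actual negatives. The function name and the counts are hypothetical.

```python
def diagnostic_odds_ratio(tp, tn, fp, fn):
    """DOR = (TP / FN) / (FP / TN), written here as (TP * TN) / (FP * FN).

    Undefined when fp or fn is zero; this sketch lets ZeroDivisionError propagate.
    """
    return (tp * tn) / (fp * fn)


# Hypothetical counts: 100 actual positives and 100 actual negatives.
dor = diagnostic_odds_ratio(tp=80, tn=90, fp=10, fn=20)
print(dor)
```

A DOR of 1 means the model does not discriminate between the classes; larger values indicate better discrimination.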