Data Science

Ascertainment Bias

Ascertainment bias is a systematic difference in how individuals are identified for a study, or in how their data are collected. It distorts the measured frequency of a phenomenon in the population. “When the chance of a person being sampled, or feature being observed, depends on some background…

Bootstrapping

Bootstrapping is a popular method of resampling with replacement. It assigns measures of accuracy to sample estimates and allows the sampling distribution of nearly any statistic to be estimated. “A way of generating confidence intervals and the distribution of test statistics through sampling the observed data rather than through assuming…
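As a rough illustration of resampling with replacement, the sketch below builds a percentile confidence interval for a sample mean. The function name `bootstrap_ci`, its parameters, and the data values are made up for this example; the percentile method is only one of several ways to form a bootstrap interval.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    # Draw n values WITH replacement, compute the statistic, repeat many times.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the estimates.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Illustrative data, not taken from any real study.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
low, high = bootstrap_ci(data)
print(f"mean = {statistics.mean(data):.2f}, 95% interval = ({low:.2f}, {high:.2f})")
```

Passing a different `stat` (for example `statistics.median`) reuses the same resampling loop for another statistic.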

Confusion Matrix

What is a confusion matrix? The confusion matrix (or “error matrix”) is a table that describes the performance of a classification model by comparing its predictions with a data set whose true values are known. In a binary classification task, the confusion matrix is a…
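For the binary case, a minimal sketch of tallying predictions against known labels might look like this. The helper `confusion_counts` and the label lists are invented for illustration, and the positive class is assumed to be encoded as `1`.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN and TN for a binary task (hypothetical helper)."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted positive: correct if the truth is positive too.
            counts["TP" if truth == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if the truth was positive.
            counts["FN" if truth == positive else "TN"] += 1
    return counts

# Illustrative labels only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
cm = confusion_counts(y_true, y_pred)
print(cm["TP"], cm["FP"], cm["FN"], cm["TN"])  # prints: 3 1 1 3
```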

Data Leakage

What is data leakage? In machine learning, data leakage describes the situation in which information from outside the training data set is used to create the model. This is a problem because, within machine learning, our goal is to develop a model that is good…

Data Shift

What is data shift? Data shift (also called dataset shift, model drift, or data drift) is the change over time in the input data your model receives, relative to the data it was trained on. It is one of the most common reasons for degrading model accuracy. That’s why…

Linear Regression

What is linear regression? Linear regression is a linear approach to modelling the relationship between a dependent variable and one or more explanatory (independent) variables. We can distinguish between simple linear regression, which has one explanatory variable, and multiple linear regression, which has multiple explanatory variables. In…

Performance Metrics: Accuracy

What is accuracy? Accuracy is a performance metric that tells you the fraction of predictions that were correct, without distinguishing between positive and negative predictions. Accuracy can be a very misleading metric when the data set is unbalanced (when the prevalence is either very high or very…
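To make the imbalance problem concrete, here is a small sketch with invented numbers: 5 positives among 100 cases, and a hypothetical model that always predicts the negative class. The `accuracy` function is written for this example and is not taken from any library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Invented, heavily imbalanced data: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A trivial model that never predicts the positive class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
# Share of actual positives that the model found.
found = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = found / sum(y_true)

print(f"accuracy = {acc:.2f}, recall = {recall:.2f}")  # accuracy = 0.95, recall = 0.00
```

The model scores 95% accuracy while detecting none of the positive cases, which is why accuracy alone says little on data like this.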