What is Balanced Accuracy?

Balanced Accuracy is a performance metric to evaluate a binary classifier.

Why not use regular accuracy? Balanced accuracy is a better instrument for assessing models that are trained on data with a very imbalanced target variable, i.e. one with very high or very low prevalence. Training on such data tends to produce a classifier that is biased towards the most frequent class. When this classifier is applied to a test set that is imbalanced in the same direction, it will yield an overly optimistic “conventional” accuracy. In extreme situations, the classifier will always predict the dominant class, achieving an accuracy equal to the share of the dominant class in the test set.
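To make the extreme case concrete, here is a minimal sketch with made-up numbers: a hypothetical test set of 990 negatives and 10 positives, and a degenerate classifier that always predicts the dominant class. The labels and counts are assumptions chosen purely for illustration.

```python
# Hypothetical test set: 990 negatives (label 0) and 10 positives (label 1).
y_true = [0] * 990 + [1] * 10

# A degenerate classifier that always predicts the dominant class (0).
y_pred = [0] * len(y_true)

# Conventional accuracy: share of predictions that match the true label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Share of the dominant class in the test set.
dominant_share = y_true.count(0) / len(y_true)

print(accuracy, dominant_share)
```

Both numbers coincide, even though the classifier never detects a single positive case.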

This is a well-known phenomenon, and it can happen in all sciences, in business, and in engineering. Think earthquake prediction, fraud detection, crime prediction, etc. It is also known as the accuracy paradox.

Most often, Balanced Accuracy is defined as the mean of the true positive rate (TPR) and the true negative rate (TNR). This formula shows why the balanced accuracy falls well below the conventional accuracy when either the TPR or the TNR is low because the classifier is biased towards the dominant class.

$\frac{1}{2}(\frac{TP}{TP + FN}+\frac{TN}{TN + FP}) = \frac{1}{2}(\frac{TP}{P}+\frac{TN}{N}) = \frac{TPR + TNR}{2}$
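The formula translates directly into a few lines of Python. The sketch below is only an illustration: the function name `balanced_accuracy` and the confusion-matrix counts are hypothetical, picked to mimic a classifier that always predicts the negative class on a test set with 10 positives and 990 negatives.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of the true positive rate and the true negative rate."""
    tpr = tp / (tp + fn)  # TP / P
    tnr = tn / (tn + fp)  # TN / N
    return (tpr + tnr) / 2


# Hypothetical counts: every case is predicted negative.
tp, fn, tn, fp = 0, 10, 990, 0

conventional = (tp + tn) / (tp + fn + tn + fp)
balanced = balanced_accuracy(tp, fn, tn, fp)

print(conventional)  # share of the dominant class
print(balanced)      # 0.5
```

With a TPR of 0 and a TNR of 1, the balanced accuracy drops to 0.5, the same score a random guess would get, while the conventional accuracy stays close to 1.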

However, there’s no need to treat both classes symmetrically. The equal weighting can be dropped by varying the cost associated with a low TPR or a low TNR.

$c\frac{TP}{P}+(1-c)\frac{TN}{N}$, where $c \in [0,1]$
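A small sketch of the weighted variant, assuming the weight `c` is passed in as a parameter. The function name `weighted_balanced_accuracy` and the counts are hypothetical, and the choice of `c` depends on how costly a low TPR is relative to a low TNR in the application at hand.

```python
def weighted_balanced_accuracy(tp, fn, tn, fp, c=0.5):
    """Weighted combination c * TPR + (1 - c) * TNR, with c in [0, 1]."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    tpr = tp / (tp + fn)  # TP / P
    tnr = tn / (tn + fp)  # TN / N
    return c * tpr + (1 - c) * tnr


# Hypothetical counts for an imbalanced test set.
tp, fn, tn, fp = 8, 2, 900, 90

symmetric = weighted_balanced_accuracy(tp, fn, tn, fp)         # c = 0.5
tpr_heavy = weighted_balanced_accuracy(tp, fn, tn, fp, c=0.8)  # missed positives cost more

print(symmetric, tpr_heavy)
```

Setting `c = 0.5` recovers the symmetric formula, while `c = 1` and `c = 0` reduce the score to the TPR and the TNR respectively.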
