Performance Metrics: Matthews Correlation Coefficient

What is the Matthews Correlation Coefficient?

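The Matthews Correlation Coefficient (MCC) goes by several names. Outside machine learning it is best known as the phi coefficient (φ), and it is also called the mean square contingency coefficient or the Yule phi coefficient.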

Unlike some other performance metrics (such as the F1-score), the MCC takes all four cells of the confusion matrix into account. This is why it is regarded as one of the best measures for evaluating class predictions in a binary setting, even when there is a severe class imbalance. No single measure fully describes model performance, but the MCC is often your best option.
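
To illustrate the class-imbalance point, here is a minimal sketch in plain Python. The data is hypothetical (95 positives, 5 negatives), and the helpers `confusion_counts`, `f1_score` and `mcc` are written for illustration only; they are not a library API. A "model" that always predicts the majority class earns a high F1-score, because F1 never looks at true negatives, while the MCC exposes it as uninformative.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def f1_score(tp, fp, fn):
    # F1 = 2TP / (2TP + FP + FN); true negatives never appear.
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is 0 when the denominator is 0.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical imbalanced data: 95 positives, 5 negatives.
y_true = [1] * 95 + [0] * 5
# A useless classifier that always predicts the majority class.
y_pred = [1] * 100

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"F1  = {f1_score(tp, fp, fn):.3f}")  # F1  = 0.974
print(f"MCC = {mcc(tp, tn, fp, fn):.3f}")   # MCC = 0.000
```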

In essence, the MCC is a correlation coefficient between the predicted values and the true values, which is why it always returns a value between -1 and +1. When the predictions are perfect, the MCC is +1. When they are no better than random guessing, it is 0. Finally, when the predictions and observations disagree completely, the MCC is -1. In most situations, you want this value to be as close to 1 as possible.
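
The "correlation" reading can be checked directly. The sketch below (hypothetical data, and a hand-written `pearson` helper rather than any library function) computes the ordinary Pearson correlation between 0/1-encoded true labels and four sets of predictions: perfect, completely inverted, uninformative and partially correct.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

y_true = [1, 1, 1, 0, 0, 0, 1, 0]

perfect = list(y_true)                      # every prediction correct
inverted = [1 - t for t in y_true]          # every prediction wrong
uninformative = [1, 1, 0, 0, 1, 1, 0, 0]    # TP = TN = FP = FN = 2
noisy = [1, 0, 1, 0, 1, 0, 1, 0]            # TP = 3, TN = 3, FP = 1, FN = 1

for name, y_pred in [("perfect", perfect), ("inverted", inverted),
                     ("uninformative", uninformative), ("noisy", noisy)]:
    print(name, round(pearson(y_true, y_pred), 3))
```

For the noisy predictions, the MCC formula gives (3·3 − 1·1) / √(4·4·4·4) = 8 / 16 = 0.5, which is exactly the Pearson correlation printed for that case.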

The formula for the Matthews Correlation Coefficient:

$\text{MCC} = \frac{ \mathit{TP} \times \mathit{TN} - \mathit{FP} \times \mathit{FN} } {\sqrt{ (\mathit{TP} + \mathit{FP}) ( \mathit{TP} + \mathit{FN} ) ( \mathit{TN} + \mathit{FP} ) ( \mathit{TN} + \mathit{FN} ) } }$

Note that if any of the four sums in the denominator is 0, the formula is undefined because it divides by zero. By convention, the MCC is then set to 0. This happens when:

One of the classes never occurs in the data (e.g. TP + FN = 0)
All predictions are the same class (e.g. TP + FP = 0)
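
A direct translation of the formula, including the zero-denominator convention, might look like the sketch below. The function name `matthews_corrcoef_from_counts` and the confusion-matrix counts are hypothetical, chosen only to exercise the normal case and the two degenerate cases listed above.

```python
import math

def matthews_corrcoef_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from the four confusion-matrix counts.

    Returns 0.0 when any of the four sums in the denominator is zero,
    because the formula is then undefined.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventional value for the undefined case
    return numerator / denominator

# Hypothetical confusion matrices
print(matthews_corrcoef_from_counts(tp=50, tn=40, fp=5, fn=5))   # normal case
print(matthews_corrcoef_from_counts(tp=0, tn=90, fp=0, fn=10))   # never predicts positive: TP + FP = 0
print(matthews_corrcoef_from_counts(tp=0, tn=100, fp=0, fn=0))   # positive class absent: TP + FN = 0
```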

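The same formula can also be written using ratios only. Here PPV is the precision (positive predictive value), TPR the recall (true positive rate), TNR the specificity (true negative rate) and NPV the negative predictive value; FDR, FNR, FPR and FOR are their complements (false discovery, false negative, false positive and false omission rates, e.g. FDR = 1 − PPV):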

$\text{MCC} = \sqrt{\mathit{PPV} \times \mathit{TPR} \times \mathit{TNR} \times \mathit{NPV}} -\sqrt{\mathit{FDR} \times \mathit{FNR} \times \mathit{FPR} \times \mathit{FOR}}$
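
As a quick numerical sanity check, the sketch below evaluates both forms on one hypothetical confusion matrix. The helper names are illustrative only, and `for_` carries a trailing underscore because `for` is a Python keyword.

```python
import math

def mcc_counts(tp, tn, fp, fn):
    # Count-based formula (assumes no zero sums in the denominator).
    return (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

def mcc_ratios(tp, tn, fp, fn):
    ppv = tp / (tp + fp)  # precision
    tpr = tp / (tp + fn)  # recall / sensitivity
    tnr = tn / (tn + fp)  # specificity
    npv = tn / (tn + fn)  # negative predictive value
    # Complementary error rates: FDR, FNR, FPR, FOR
    fdr, fnr, fpr, for_ = 1 - ppv, 1 - tpr, 1 - tnr, 1 - npv
    return math.sqrt(ppv * tpr * tnr * npv) - math.sqrt(fdr * fnr * fpr * for_)

counts = dict(tp=30, tn=50, fp=10, fn=10)  # hypothetical counts
print(mcc_counts(**counts), mcc_ratios(**counts))
```

Both values agree up to floating-point rounding. This is expected: the first square root simplifies to TP·TN divided by the square root of the product of the four sums, and the second to FP·FN divided by the same quantity.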
