What is the Matthews Correlation Coefficient?

The Matthews correlation coefficient is a single-value statistic derived from the confusion matrix—the 2×2 table of observed versus predicted binary outcomes. It measures how well a classifier discriminates between positive and negative cases.

Ranges and interpretation:

  • +1: Perfect classification; all predictions match reality
  • 0: Performance indistinguishable from random chance
  • −1: Complete disagreement; predictions are perfectly wrong

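MCC excels when class distributions are unbalanced (e.g., 95% healthy, 5% diseased). Traditional accuracy can mislead in such scenarios: a classifier labeling everyone as healthy achieves 95% accuracy despite being useless. MCC exposes this false competence, because a classifier that never predicts the positive class earns an MCC of 0 (by the usual convention for a zero denominator) rather than a score near 1.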
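The contrast between accuracy and MCC on imbalanced data can be sketched in a few lines of Python. The population of 1,000 people is hypothetical, chosen only to match the 95%/5% split above, and the function names are illustrative. Returning 0.0 when the denominator is zero is an assumed convention, not part of the formula itself.

```python
from math import sqrt

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: report 0.0 when the denominator is zero (a common convention).
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical population: 950 healthy, 50 diseased.
# The classifier labels everyone as healthy, so it never predicts a positive.
tp, tn, fp, fn = 0, 950, 0, 50

print(accuracy(tp, tn, fp, fn))  # 0.95
print(mcc(tp, tn, fp, fn))       # 0.0
```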

Matthews Correlation Coefficient Formula

The MCC calculation combines all four confusion matrix cells:

MCC = (TP × TN − FP × FN) / √[(TP + FP)(TP + FN)(TN + FP)(TN + FN)]

  • TP — True positives: positive cases correctly predicted
  • TN — True negatives: negative cases correctly predicted
  • FP — False positives: negative cases incorrectly predicted as positive
  • FN — False negatives: positive cases incorrectly predicted as negative
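As a minimal sketch, the formula can be applied to raw labels by first counting the four cells. The encoding (1 for positive, 0 for negative), the function names, and the sample labels are assumptions made for illustration.

```python
from math import sqrt

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels where 1 = positive, 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def mcc(tp, tn, fp, fn):
    numerator = tp * tn - fp * fn
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Raises ZeroDivisionError if any row or column total of the matrix is zero.
    return numerator / denominator

# Hypothetical labels: 3 actual positives, 5 actual negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)                   # (2, 4, 1, 1)
print(round(mcc(*counts), 3))   # 0.467
```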

Beyond MCC, five companion metrics provide complementary perspectives on classifier behaviour:

  • Sensitivity (recall): Of actual positives, what fraction did the model catch? TP / (TP + FN)
  • Specificity: Of actual negatives, what fraction was correctly rejected? TN / (TN + FP)
  • Precision: Of predicted positives, how many were correct? TP / (TP + FP)
  • Accuracy: Overall correctness across both classes. (TP + TN) / (TP + TN + FP + FN)
  • F1 score: Harmonic mean balancing precision and recall. 2 × TP / (2 × TP + FP + FN)

MCC integrates all four confusion matrix terms, making it more robust than any single metric alone.
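The five companion metrics translate directly into code. The sketch below uses hypothetical counts and an illustrative function name, and it assumes every denominator is non-zero.

```python
def companion_metrics(tp, tn, fp, fn):
    """Compute the five companion metrics; assumes no denominator is zero."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

# Hypothetical counts, for illustration only.
metrics = companion_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(name, round(value, 3))
# sensitivity 0.8
# specificity 0.9
# precision 0.889
# accuracy 0.85
# f1 0.842
```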

Common Pitfalls in Binary Classification Evaluation

Avoid these mistakes when assessing model performance with MCC and related statistics.

  1. Relying on accuracy for imbalanced data: If your positive class represents only 2% of observations, a naive classifier predicting everything as negative will score 98% accuracy. Always examine sensitivity and specificity separately, or use MCC, which draws on all four cells of the confusion matrix and so reflects both false positives and false negatives.
  2. Confusing sensitivity with specificity: Sensitivity is the fraction of disease-positive patients the test catches (true positive rate); specificity is the fraction of disease-free patients it correctly clears (true negative rate). High sensitivity with low specificity means the test catches most true cases but also flags many healthy people as sick, causing unnecessary treatment and harm.
  3. Forgetting the denominator can be zero: If a model produces no true positives and no false positives (it never predicts the positive class), precision has a zero denominator. So does MCC, whose denominator vanishes whenever any row or column of the confusion matrix sums to zero. Check these totals before computing metrics, and decide explicitly how to report undefined values; a common convention is to report MCC as 0 in that case.
  4. Misinterpreting negative MCC values: A negative MCC does not mean the model is merely worse than random. It indicates systematic disagreement, as if the predictions were inverted. This warrants investigation into labeling conventions, label encoding, or preprocessing errors rather than dismissal.
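One way to handle the zero-denominator pitfall is to return an explicit "undefined" marker instead of letting the division fail. The sketch below uses None for that purpose; the helper names and the counts are hypothetical, and other conventions (such as reporting 0) are equally defensible.

```python
from math import sqrt

def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator != 0 else None

def precision(tp, fp):
    return safe_ratio(tp, tp + fp)

def mcc_or_none(tp, tn, fp, fn):
    # The product is zero whenever any row or column total of the matrix is zero.
    product = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    return safe_ratio(tp * tn - fp * fn, sqrt(product))

# Hypothetical model that never predicts the positive class.
print(precision(tp=0, fp=0))                  # None
print(mcc_or_none(tp=0, tn=98, fp=0, fn=2))   # None
```

Returning None forces the caller to decide how to report the undefined case rather than silently treating it as a valid score.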

Practical Example: Quality Control in Manufacturing

A ceramic factory inspects 100 plates for defects. An automated system flags 30 plates as defective, but manual inspection reveals only 25 are actually defective. Of the 25 truly defective plates, the system caught 20.

Confusion matrix:

  • TP (correctly flagged as defective): 20
  • FP (incorrectly flagged as defective): 10
  • TN (correctly passed): 65
  • FN (missed defects): 5

MCC = (20 × 65 − 10 × 5) / √[(30)(25)(75)(70)] = 1250 / √3,937,500 ≈ 0.63. This moderate positive value indicates the system performs reasonably but has room for improvement in reducing false positives (wasted rework).
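The arithmetic for the plate example can be checked with a short script that derives the four cells from the figures in the description (100 plates, 30 flagged, 25 truly defective, 20 caught). The variable names are illustrative.

```python
from math import sqrt

total_plates = 100
flagged = 30           # plates the system marked as defective
truly_defective = 25   # defects confirmed by manual inspection
caught = 20            # truly defective plates that the system flagged

tp = caught
fp = flagged - tp                  # flagged but not defective
fn = truly_defective - tp          # defective but not flagged
tn = total_plates - tp - fp - fn   # everything else

numerator = tp * tn - fp * fn
denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
mcc = numerator / denominator

print(tp, fp, tn, fn)   # 20 10 65 5
print(numerator)        # 1250
print(round(mcc, 2))    # 0.63
```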

Frequently Asked Questions

Why is Matthews correlation coefficient better than accuracy?

MCC uses all four confusion matrix cells, whereas accuracy only counts the share of correct predictions and ignores how the errors split between false positives and false negatives. In imbalanced datasets, this keeps a trivial classifier from earning an inflated score. For instance, a medical screening test with 99% negatives can achieve 99% accuracy by predicting everyone as negative. For that classifier the MCC formula gives 0/0, which is conventionally reported as 0 and so exposes the lack of discriminating power.

What does an MCC of 0.5 indicate?

An MCC of 0.5 represents moderate agreement between predictions and reality. The classifier is performing substantially better than random chance but leaves considerable room for improvement. In practical applications, MCC values above 0.7 are often considered good, while values above 0.9 approach excellent discrimination. Context matters: in high-stakes domains like medical diagnosis, even 0.7 may be insufficient.

Can Matthews correlation coefficient be negative?

Yes. A negative MCC indicates systematic disagreement: the model's predictions are negatively correlated with the true labels. For example, MCC = −0.8 suggests the classifier behaves almost as if it had inverted the class labels. This usually points to a data preprocessing error, a reversed labeling convention, or a fundamentally broken model rather than merely poor performance, so negative values warrant investigation.
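A quick way to see the "inverted labels" interpretation is to flip every prediction and recompute: the MCC keeps its magnitude and changes sign. The counts below are hypothetical and the function name is illustrative.

```python
from math import sqrt

def mcc(tp, tn, fp, fn):
    return (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

# Hypothetical counts for a classifier that is wrong most of the time.
tp, tn, fp, fn = 5, 10, 40, 45
original = mcc(tp, tn, fp, fn)

# Flipping every prediction turns FN into TP, FP into TN, TN into FP, and TP into FN.
flipped = mcc(tp=fn, tn=fp, fp=tn, fn=tp)

print(original < 0)                      # True
print(abs(original + flipped) < 1e-12)   # True: same magnitude, opposite sign
```

If flipping the predictions of a supposedly trained model yields a strongly positive MCC, a swapped label encoding is a plausible first suspect.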

How do I calculate MCC if I only have accuracy and F1 score?

You cannot reconstruct MCC from accuracy and F1 score alone—you need the full confusion matrix (TP, TN, FP, FN). Both metrics obscure different aspects of performance. Always preserve the raw confusion matrix when evaluating classifiers, as it enables computing any downstream metric. If only summary statistics are available, request the original predictions and labels.

Is MCC suitable for multi-class classification?

Standard MCC is defined for binary (two-class) problems. A multi-class extension exists, often called the R_K statistic, which is computed directly from the K×K confusion matrix and reduces to the binary formula when K = 2. Alternatives for multi-class problems include one-vs-rest MCC per class, Cohen's kappa, or weighted F1 scores.
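For readers who want a single multi-class score, the sketch below implements the R_K generalization of MCC as it is commonly stated, working from per-class totals rather than a 2×2 table. The function name, the sample labels, and the choice to return 0.0 for a zero denominator are assumptions made for illustration.

```python
from math import sqrt

def multiclass_mcc(y_true, y_pred):
    """Multi-class MCC (R_K statistic) from two equal-length label sequences."""
    labels = sorted(set(y_true) | set(y_pred))
    s = len(y_true)                                        # total number of samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)   # correctly classified samples
    t_k = [sum(1 for t in y_true if t == k) for k in labels]   # true count per class
    p_k = [sum(1 for p in y_pred if p == k) for k in labels]   # predicted count per class
    numerator = c * s - sum(t * p for t, p in zip(t_k, p_k))
    denominator = sqrt((s * s - sum(p * p for p in p_k)) * (s * s - sum(t * t for t in t_k)))
    # Assumption: report 0.0 when the denominator is zero.
    return numerator / denominator if denominator else 0.0

# Hypothetical three-class labels.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
print(multiclass_mcc(y_true, y_pred))

# With two classes the result matches the binary formula (TP=2, TN=4, FP=1, FN=1 gives 7/15).
binary = multiclass_mcc([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
print(round(binary, 3))   # 0.467
```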

What sample size do I need for reliable MCC?

Larger samples produce more stable MCC estimates. With fewer than 50 total observations, MCC can fluctuate considerably due to random variation. Aim for at least 100–200 samples, and larger if classes are severely imbalanced. Always report confidence intervals or bootstrap estimates alongside MCC, especially with small datasets, to convey uncertainty.
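A percentile bootstrap is one simple way to attach an interval to an MCC estimate. The sketch below resamples label pairs with replacement, using labels reconstructed from the plate inspection example (25 defective, 75 sound). The function names, the 2,000 resamples, the fixed seed, and the convention of scoring degenerate resamples as 0.0 are all assumptions.

```python
import random
from math import sqrt

def mcc_from_labels(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: a resample with a zero denominator is scored as 0.0.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def bootstrap_mcc_interval(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for MCC; returns (lower, upper)."""
    rng = random.Random(seed)
    n = len(y_true)
    estimates = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]   # sample indices with replacement
        estimates.append(mcc_from_labels([y_true[i] for i in idx],
                                         [y_pred[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Labels matching the plate example: TP=20, FN=5, FP=10, TN=65.
y_true = [1] * 25 + [0] * 75
y_pred = [1] * 20 + [0] * 5 + [1] * 10 + [0] * 65

lower, upper = bootstrap_mcc_interval(y_true, y_pred)
print(round(mcc_from_labels(y_true, y_pred), 2))   # 0.63
print(lower, upper)
```

The width of the printed interval gives a rough sense of how much the point estimate could move with a different sample of 100 plates.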
