Understanding the 2×2 Contingency Table

Diagnostic tests produce four possible outcomes. A 2×2 table organizes them clearly:

  • True positive (TP): Disease present and test positive
  • False positive (FP): Disease absent but test positive
  • True negative (TN): Disease absent and test negative
  • False negative (FN): Disease present but test negative

These four counts form the foundation for all downstream calculations. Gathering accurate data from your study population or clinical cohort is essential—any misclassification will propagate through all metrics.
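As a minimal sketch, the four cells can be tallied from paired records of true disease status and test result. The names `Counts` and `confusion_counts`, the boolean encoding (True means disease present or test positive), and the eight sample records are assumptions made for illustration only.

```python
from typing import NamedTuple, Sequence


class Counts(NamedTuple):
    tp: int  # disease present, test positive
    fp: int  # disease absent, test positive
    tn: int  # disease absent, test negative
    fn: int  # disease present, test negative


def confusion_counts(diseased: Sequence[bool], positive: Sequence[bool]) -> Counts:
    """Tally the 2x2 table from paired (disease status, test result) records."""
    if len(diseased) != len(positive):
        raise ValueError("each subject needs both a disease status and a test result")
    tp = fp = tn = fn = 0
    for has_disease, tests_positive in zip(diseased, positive):
        if has_disease and tests_positive:
            tp += 1
        elif has_disease:
            fn += 1
        elif tests_positive:
            fp += 1
        else:
            tn += 1
    return Counts(tp=tp, fp=fp, tn=tn, fn=fn)


# Made-up records for eight subjects (illustrative only).
diseased = [True, True, True, False, False, False, False, False]
positive = [True, True, False, True, False, False, False, False]
print(confusion_counts(diseased, positive))
# Counts(tp=2, fp=1, tn=4, fn=1)
```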

Core Diagnostic Metrics Formulas

Sensitivity and specificity measure intrinsic test properties independent of disease prevalence. Predictive values depend on how common the condition is in your population.

Sensitivity = TP ÷ (TP + FN)

Specificity = TN ÷ (FP + TN)

Accuracy = (TP + TN) ÷ (TP + TN + FP + FN)

PPV = (Sensitivity × Prevalence) ÷ [(Sensitivity × Prevalence) + ((1 − Specificity) × (1 − Prevalence))]

NPV = (Specificity × (1 − Prevalence)) ÷ [((1 − Sensitivity) × Prevalence) + (Specificity × (1 − Prevalence))]

Positive LR = Sensitivity ÷ (1 − Specificity)

Negative LR = (1 − Sensitivity) ÷ Specificity

  • TP: True positive count, diseased individuals correctly identified by the test
  • FN: False negative count, diseased individuals missed by the test
  • TN: True negative count, healthy individuals correctly identified as disease-free
  • FP: False positive count, healthy individuals incorrectly marked positive
  • Prevalence: Proportion of the target population with the disease (as a decimal, 0–1)
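The formulas above translate directly into code. The sketch below is one possible arrangement, not a reference implementation: the function name `diagnostic_metrics`, the helper `_safe_div`, the choice to fall back to the sample prevalence when none is supplied, and the example counts are all assumptions.

```python
import math


def _safe_div(num, den):
    """Divide, returning inf (or nan for 0/0) instead of raising on a zero denominator."""
    if den != 0:
        return num / den
    return math.nan if num == 0 else math.inf


def diagnostic_metrics(tp, fp, tn, fn, prevalence=None):
    """Compute the metrics listed above from the four cell counts.

    If `prevalence` is None, the sample prevalence (TP + FN) / N is used,
    which is only appropriate when the sample mirrors the target population.
    """
    diseased, healthy = tp + fn, fp + tn
    if diseased == 0 or healthy == 0:
        raise ValueError("need at least one diseased and one healthy subject")
    total = diseased + healthy
    sens = tp / diseased
    spec = tn / healthy
    prev = diseased / total if prevalence is None else prevalence

    # Joint probabilities of each cell at the chosen prevalence.
    p_tp = sens * prev
    p_fn = (1 - sens) * prev
    p_tn = spec * (1 - prev)
    p_fp = (1 - spec) * (1 - prev)

    return {
        "sensitivity": sens,
        "specificity": spec,
        "accuracy": (tp + tn) / total,
        "ppv": _safe_div(p_tp, p_tp + p_fp),
        "npv": _safe_div(p_tn, p_tn + p_fn),
        "lr_pos": _safe_div(sens, 1 - spec),
        "lr_neg": _safe_div(1 - sens, spec),
    }


# Hypothetical study: 100 diseased and 900 healthy subjects.
metrics = diagnostic_metrics(tp=90, fp=45, tn=855, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Passing an explicit `prevalence` re-projects PPV and NPV onto a population whose disease burden differs from the study sample, while sensitivity, specificity, and the likelihood ratios stay the same.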

Sensitivity vs. Specificity: What They Mean Clinically

Sensitivity answers: Of all people with the disease, how many does the test catch? A sensitive test rarely misses cases—it has few false negatives. Sensitive tests are preferred for serious conditions where missing a diagnosis is costly (e.g., cancer screening).

Specificity answers: Of all people without the disease, how many does the test correctly exclude? A specific test rarely over-diagnoses—it has few false positives. Specific tests are preferred when false positives lead to unnecessary treatment or anxiety (e.g., confirmatory tests after initial screening).

No test is perfect. For tests that produce a continuous result, the trade-off between sensitivity and specificity is set by the positivity threshold. When higher values indicate disease, lowering the threshold increases sensitivity but decreases specificity, and raising it does the opposite.
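The threshold trade-off can be illustrated with a small sketch. It assumes a test where higher values indicate disease and a result counts as positive when it is at or above the threshold; the function name `sens_spec_at` and the biomarker values are invented for the example.

```python
def sens_spec_at(threshold, diseased_values, healthy_values):
    """Sensitivity and specificity when 'value >= threshold' counts as positive."""
    tp = sum(v >= threshold for v in diseased_values)  # diseased, flagged positive
    tn = sum(v < threshold for v in healthy_values)    # healthy, flagged negative
    return tp / len(diseased_values), tn / len(healthy_values)


# Made-up biomarker readings (illustrative only).
diseased_values = [4.1, 5.6, 6.3, 7.0, 8.2]
healthy_values = [2.0, 3.1, 3.9, 4.5, 5.8]

for threshold in (6.0, 5.0, 4.0):
    sens, spec = sens_spec_at(threshold, diseased_values, healthy_values)
    print(f"threshold {threshold}: sensitivity {sens:.1f}, specificity {spec:.1f}")
# threshold 6.0: sensitivity 0.6, specificity 1.0
# threshold 5.0: sensitivity 0.8, specificity 0.8
# threshold 4.0: sensitivity 1.0, specificity 0.6
```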

Predictive Values and Prevalence Dependency

Positive predictive value (PPV) tells you: If a patient tests positive, what is the probability they truly have the disease? This depends critically on disease prevalence. For a rare disease, a positive result may be unreliable even if the test is highly sensitive and specific, because false positives from the large healthy group can outnumber true positives from the small diseased group.

Negative predictive value (NPV) tells you: If a patient tests negative, what is the probability they are truly disease-free? NPV is generally high for rare diseases (since most people are healthy anyway) but may decline for common diseases.

This prevalence dependency explains why a test performing well in one population may perform poorly in another. Always consider your patient population's disease burden when interpreting results.
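To make the prevalence effect concrete, the sketch below applies the PPV and NPV formulas to one test at three prevalence levels. The function name `predictive_values` and the 95% sensitivity and specificity are assumed values chosen for illustration, and prevalence is assumed to lie strictly between 0 and 1.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV from sensitivity, specificity, and prevalence (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical test, three different populations.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.95, 0.95, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With these assumed inputs, PPV is roughly 16% at 1% prevalence and 95% at 50% prevalence, while NPV moves in the opposite direction.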

Key Pitfalls and Practical Considerations

Avoid these common mistakes when interpreting diagnostic test performance:

  1. Confusing sensitivity with PPV — Sensitivity is a test property; PPV is population-dependent. A test can have high sensitivity but low PPV in a low-prevalence setting. Always calculate or report prevalence alongside sensitivity to avoid misleading conclusions.
  2. Ignoring spectrum bias — Test performance varies by patient population. A test validated in hospitalized patients with advanced disease may perform very differently in primary care screening. Check whether published performance metrics match your target population.
  3. Overweighting accuracy in imbalanced datasets — If one outcome vastly outnumbers the other (e.g., 1% disease prevalence), accuracy can be misleadingly high even if the test is useless. Prioritize sensitivity and specificity instead.
  4. Forgetting that likelihood ratios shift pre-test probability — Likelihood ratios multiply your pre-test odds of disease to give post-test odds. A positive LR of 10 is strong; a negative LR of 0.1 is strong. Values near 1.0 have minimal diagnostic value.
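The likelihood-ratio pitfall is easier to avoid with the odds conversion written out. This is a minimal sketch: the function name `post_test_probability` and the 20% pre-test probability are assumptions for illustration.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, multiply by the LR, convert back to probability."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Hypothetical patient with a 20% pre-test probability of disease.
for lr in (10, 1.0, 0.1):
    print(f"LR {lr}: post-test probability {post_test_probability(0.20, lr):.1%}")
# LR 10: post-test probability 71.4%
# LR 1.0: post-test probability 20.0%
# LR 0.1: post-test probability 2.4%
```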

Frequently Asked Questions

What is the difference between sensitivity and specificity?

Sensitivity measures the proportion of actual diseased individuals correctly identified as positive—it reflects the test's ability to detect disease. Specificity measures the proportion of healthy individuals correctly identified as negative—it reflects the test's ability to exclude disease. A highly sensitive test has few false negatives (good for ruling out disease); a highly specific test has few false positives (good for confirming disease). Most clinical decisions require balancing both metrics.

How does disease prevalence affect PPV and NPV?

For a test with fixed sensitivity and specificity, PPV and NPV are driven by disease prevalence in the population being tested. In low-prevalence settings, PPV drops sharply even for sensitive and specific tests, because false positives become common relative to true positives. Conversely, NPV remains high in low-prevalence populations since most people are disease-free. For high-prevalence populations, PPV rises but NPV falls. This is why confirmatory testing or specialist evaluation is crucial in low-prevalence screening scenarios.

When should I use likelihood ratios instead of sensitivity and specificity?

Likelihood ratios are powerful for sequential testing and Bayesian reasoning. A positive likelihood ratio of 10 or higher is considered strong evidence for disease; a negative likelihood ratio of 0.1 or lower is strong evidence against disease. Unlike sensitivity and specificity, likelihood ratios directly answer the clinician's question: how much does a test result shift my confidence in the diagnosis? They're especially useful when integrating multiple tests or clinical findings.

Why doesn't accuracy tell the full story about test performance?

Accuracy is the proportion of correct results overall, but it's misleading in imbalanced datasets. If disease prevalence is 1%, a test that diagnoses everyone as healthy achieves 99% accuracy yet is completely useless. Sensitivity and specificity separately reveal performance on diseased and healthy populations, preventing this distortion. Always report all three metrics or focus on sensitivity and specificity when prevalence is very low or high.
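The 1% prevalence scenario can be checked with a few lines of arithmetic. The cohort size of 1,000 is an assumption chosen to make the counts round.

```python
# Hypothetical cohort: 1,000 people, 10 with the disease (1% prevalence).
# The "test" labels everyone as healthy, so it never returns a positive result.
tp, fn = 0, 10    # all 10 diseased people are missed
tn, fp = 990, 0   # all 990 healthy people are labelled negative

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)
specificity = tn / (fp + tn)

print(f"accuracy {accuracy:.0%}, sensitivity {sensitivity:.0%}, specificity {specificity:.0%}")
# accuracy 99%, sensitivity 0%, specificity 100%
```

The 0% sensitivity exposes what the 99% accuracy hides: the test detects no cases at all.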

Can I calculate PPV and NPV from sensitivity and specificity alone?

Not from those two alone: you also need disease prevalence. The formulas for PPV and NPV require sensitivity, specificity, and prevalence as inputs. If you have the raw counts (TP, FP, TN, FN) from a specific study, you can calculate PPV and NPV directly as TP ÷ (TP + FP) and TN ÷ (TN + FN), because the counts already reflect that study's prevalence. However, an explicit prevalence is needed to generalize results to other populations.
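To see why raw counts suffice within a single study, the derivation below substitutes the study's own prevalence into the PPV formula. The symbols p (sample prevalence) and N (total sample size) are introduced here for illustration.

```latex
\[
\begin{aligned}
p &= \frac{TP + FN}{N}, \qquad N = TP + FP + TN + FN \\[4pt]
\text{Sensitivity} \times p
  &= \frac{TP}{TP + FN} \cdot \frac{TP + FN}{N} = \frac{TP}{N} \\[4pt]
(1 - \text{Specificity}) \times (1 - p)
  &= \frac{FP}{FP + TN} \cdot \frac{FP + TN}{N} = \frac{FP}{N} \\[4pt]
\text{PPV}
  &= \frac{TP / N}{TP / N + FP / N} = \frac{TP}{TP + FP}
\end{aligned}
\]
```

The same substitution applied to the NPV formula gives TN ÷ (TN + FN). These shortcuts hold only when the study sample has the same prevalence as the population of interest, which is typically not true of case-control designs.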

What does a likelihood ratio of 1.0 mean?

A likelihood ratio of 1.0 means the test result provides no diagnostic information: it leaves your odds of disease unchanged. LRs greater than 1.0 increase the probability of disease, and LRs less than 1.0 decrease it. As a rule of thumb, LRs between 0.5 and 2.0 are considered weak evidence; 2–10 or 0.1–0.5 are moderate; and above 10 or below 0.1 are strong.
