Understanding the Bonferroni Correction

The Bonferroni correction is a straightforward method for controlling the family-wise error rate—the probability of making at least one false-positive discovery across all tests in an analysis. Without adjustment, running 10 independent tests at α = 0.05 gives roughly a 40% chance of at least one spurious significant result. Dividing your significance threshold by the number of tests restores control.

Two variants exist in practice:

  • Classical method: Divide α by the count of tests. Simple, widely understood, and conservative.
  • Šidák correction: Use a multiplicative adjustment assuming independence between tests. Slightly less stringent than the classical approach, often preferred when test dependence is modest.

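The choice depends on your research context. The two thresholds are numerically close, so it mostly comes down to whether you prioritise a guarantee that holds without assumptions about dependence (classical) or a marginal gain in statistical power when tests are independent (Šidák).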

Bonferroni Correction Formulas

Two formulas govern this adjustment, depending on your chosen method:

Classical: α_corrected = α ÷ n

Šidák: α_corrected = 1 − (1 − α)^(1/n)

  • α: Original significance level (e.g., 0.05 for 5%)
  • n: Total number of statistical tests performed (the family being corrected)
  • p: Observed p-value from an individual test, which is compared against α_corrected
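As a minimal sketch, the two thresholds can be computed in a few lines of Python. The function names are chosen here for illustration and are not part of any library. The Šidák function applies the formula to the significance level α, which is how the method is used to obtain a threshold in the worked example later in this article.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Classical Bonferroni: divide the significance level by the test count."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def sidak_threshold(alpha: float, n_tests: int) -> float:
    """Sidak: per-test level that gives a family-wise rate of alpha
    when the tests are independent."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


if __name__ == "__main__":
    for n in (1, 4, 10):
        print(n, bonferroni_threshold(0.05, n), round(sidak_threshold(0.05, n), 5))
```

With a single test both functions return α unchanged, which is a quick sanity check on any implementation.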

When to Apply Bonferroni Correction

Use Bonferroni correction in scenarios where you conduct multiple hypothesis tests on the same dataset:

  • Factorial designs: Comparing multiple groups or conditions simultaneously.
  • Genetic studies: Testing thousands of single-nucleotide polymorphisms (SNPs) for disease association.
  • Exploratory research: Screening many candidate variables before confirmatory analysis.
  • Post-hoc comparisons: Pairwise contrasts following a significant omnibus test.

The correction becomes essential as the number of tests increases. With 50 tests at uncorrected α = 0.05, the family-wise error rate climbs to approximately 92%. Correction restores the overall error rate to your chosen threshold.

Key Considerations and Pitfalls

Proper use of Bonferroni correction requires awareness of its strengths and limitations in different analytical contexts.

  1. Conservative trade-off: The classical Bonferroni method is intentionally stringent, reducing false positives at the cost of increased false negatives (Type II errors). For exploratory work with many tests or modest sample sizes, this conservatism may eliminate genuine signals. Šidák's variant is only marginally less strict and is derived under the assumption that the tests are independent.
  2. Dependence between tests: The Bonferroni bound holds whatever the dependence between tests, so correlation never invalidates it. Correlated tests, such as overlapping genetic regions or variables derived from the same measure, do make it more conservative than necessary: the effective number of independent tests is smaller than n, so dividing α by n over-corrects.
  3. Specifying the test count: Decide upfront which tests belong to the family, and count all of them, not only those that yield interesting results. Correcting only for the selected findings understates n and inflates false positives. Document your intended test set before analysis to maintain statistical validity.
  4. Alternatives for large test sets: With hundreds or thousands of tests, consider Holm-Bonferroni (a step-down procedure that controls the same family-wise error rate with more power) or false discovery rate methods (Benjamini-Hochberg), which tolerate a small proportion of false positives among the discoveries while preserving power. Choose based on your field's standards and research goals.
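For readers who want to see how the two alternatives named in the last point work, here is a minimal pure-Python sketch of the Holm-Bonferroni step-down procedure and the Benjamini-Hochberg step-up procedure. The function names and the use of "less than or equal to" at each cut-off are choices made for this illustration; statistical packages provide tested implementations that should be preferred in real analyses.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni: test the smallest p-value at alpha/n, the next at
    alpha/(n-1), and so on, stopping at the first failure.
    Returns one boolean per p-value, in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    reject = [False] * n
    for rank, idx in enumerate(order):  # rank = 0 for the smallest p-value
        if p_values[idx] <= alpha / (n - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is retained as well
    return reject


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg: find the largest rank k with p_(k) <= (k/n) * alpha
    and reject the k smallest p-values. Controls the false discovery rate,
    not the family-wise error rate."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / n * alpha:
            cutoff = rank
    reject = [False] * n
    for idx in order[:cutoff]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03]
    print(holm_reject(p))                # [True, False, False]
    print(benjamini_hochberg_reject(p))  # [True, True, True]
```

In this small example Holm rejects one hypothesis and Benjamini-Hochberg rejects all three, which reflects the different error rates the two procedures control.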

Practical Example

Suppose you conduct four independent t-tests comparing treatment groups, with α = 0.05. The Bonferroni-corrected threshold becomes α_c = 0.05 ÷ 4 = 0.0125. Only p-values below 0.0125 qualify as significant at the family-wise level.

If your four p-values are 0.003, 0.008, 0.023, and 0.041, the first two pass the corrected threshold, while the latter two, though individually "significant" at 0.05, do not. This stricter gate reduces the risk of false discoveries but requires larger effect sizes or sample sizes to retain the same statistical power.

Using the Šidák method instead: correction = 1 − (1 − 0.05)^(1/4) ≈ 0.0127, a marginally less demanding threshold that often yields similar conclusions.
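The example can be reproduced with a short script. This is a sketch using the four p-values quoted above; the helper name `significant_after_correction` is hypothetical and introduced only for this illustration.

```python
def significant_after_correction(p_values, alpha=0.05, method="bonferroni"):
    """Return one boolean per p-value: True if it falls below the
    corrected threshold for the chosen method."""
    n = len(p_values)
    if method == "bonferroni":
        threshold = alpha / n
    elif method == "sidak":
        threshold = 1.0 - (1.0 - alpha) ** (1.0 / n)
    else:
        raise ValueError(f"unknown method: {method}")
    return [p < threshold for p in p_values]


if __name__ == "__main__":
    p_values = [0.003, 0.008, 0.023, 0.041]
    print(significant_after_correction(p_values, method="bonferroni"))
    print(significant_after_correction(p_values, method="sidak"))
```

Both methods flag the first two p-values and retain the last two, matching the conclusion above.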

Frequently Asked Questions

How do I apply the Bonferroni correction to my analysis?

First, count all tests you plan to perform, including post-hoc comparisons. Divide your significance level (typically 0.05) by this number to obtain the corrected threshold. Compare each test's p-value against this lower threshold rather than the original α. Alternatively, multiply each p-value by the test count, cap the result at 1, and compare the adjusted p-value to the original α (e.g., 0.05); both routes give the same decisions. Document your test count beforehand to avoid selective correction, which biases results.
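The adjusted p-value route can be sketched as follows. Capping the product at 1 is an assumption added here so that the result remains a valid probability; it is a common convention in statistical software. The function name is illustrative.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capped at 1.0.
    The adjusted values are compared against the original alpha."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


if __name__ == "__main__":
    raw = [0.003, 0.008, 0.023, 0.041]
    adjusted = bonferroni_adjust(raw)
    for p, p_adj in zip(raw, adjusted):
        verdict = "significant" if p_adj < 0.05 else "not significant"
        print(f"p = {p:.3f}  adjusted = {p_adj:.3f}  {verdict}")
```

Comparing an adjusted p-value to α gives the same decision as comparing the raw p-value to α ÷ n, so the choice between the two is a matter of reporting preference.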

What is the corrected significance level for 10 tests?

Using the classical Bonferroni method: α_c = 0.05 ÷ 10 = 0.005. Using Šidák's approach: α_c = 1 − (1 − 0.05)^(1/10) ≈ 0.00512. Both yield thresholds around 0.005, though Šidák is marginally less stringent. Choose classical if you prioritise simplicity and false-positive control that does not depend on independence; choose Šidák if the tests are approximately independent and you want slightly more statistical power.

Why does multiple testing increase false-positive risk?

When you perform independent tests at α = 0.05, each has a 5% chance of a false positive under the null hypothesis. Across k tests, the family-wise error rate (the probability of at least one false positive) equals 1 − (0.95)^k. With ten tests, this reaches roughly 40%. Bonferroni correction lowers the per-test threshold, restoring family-wise control. Without adjustment, some results from exploratory analyses may appear significant due to chance alone rather than true effects.
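A small Monte Carlo simulation offers an empirical check of this formula. The sketch below relies on one standard fact: when the null hypothesis is true, a p-value from a continuous test statistic is uniformly distributed on [0, 1], so null p-values can be drawn with a uniform random generator. The function name, the number of simulated experiments, and the seed are arbitrary choices for illustration.

```python
import random


def simulate_fwer(n_tests, per_test_alpha, n_experiments=20000, seed=0):
    """Estimate the family-wise error rate by simulating experiments in
    which all null hypotheses are true and the tests are independent."""
    rng = random.Random(seed)
    experiments_with_false_positive = 0
    for _ in range(n_experiments):
        # Each null p-value is a uniform draw; a false positive occurs
        # when any of them falls below the per-test threshold.
        if any(rng.random() < per_test_alpha for _ in range(n_tests)):
            experiments_with_false_positive += 1
    return experiments_with_false_positive / n_experiments


if __name__ == "__main__":
    print("uncorrected:", simulate_fwer(10, 0.05))
    print("Bonferroni: ", simulate_fwer(10, 0.05 / 10))
```

The uncorrected estimate should land close to the theoretical value of roughly 0.40, and the corrected estimate close to 0.05, with small deviations due to sampling noise.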

Is Bonferroni correction always necessary?

Not universally. If you conduct a single, pre-registered hypothesis test, correction is unnecessary. For confirmatory studies with few planned comparisons, correction maintains statistical integrity. However, Bonferroni becomes increasingly conservative and power-draining with many tests. Genomics and high-dimensional fields often favour false discovery rate methods instead, accepting a small proportion of false positives to retain statistical sensitivity. Choose based on your field's conventions and research goals.

How does the Šidák correction differ from classical Bonferroni?

Classical Bonferroni divides α by n and relies only on Boole's inequality, so it controls the family-wise error rate under any dependence between tests. Šidák uses 1 − (1 − α)^(1/n), which is exact when the tests are independent and yields a slightly higher threshold, improving power marginally. The difference is small at every test count: for α = 0.05 the Šidák threshold is never more than about 2.6% larger than the Bonferroni one. Šidák is a reasonable choice when independence is plausible, though both are widely accepted. Classical remains standard in conservative fields like medical diagnostics.
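To see how close the two thresholds are across test counts, the following sketch computes their ratio. The function names are illustrative; `math.expm1` and `math.log1p` are used only to keep numerical precision when n is very large, and the result is algebraically the same as 1 − (1 − α)^(1/n).

```python
import math


def bonferroni_threshold(alpha, n_tests):
    return alpha / n_tests


def sidak_threshold(alpha, n_tests):
    # Equivalent to 1 - (1 - alpha) ** (1 / n_tests), written with
    # expm1/log1p so tiny results are not lost to rounding.
    return -math.expm1(math.log1p(-alpha) / n_tests)


def threshold_ratio(alpha, n_tests):
    """Sidak threshold divided by Bonferroni threshold."""
    return sidak_threshold(alpha, n_tests) / bonferroni_threshold(alpha, n_tests)


if __name__ == "__main__":
    for n in (1, 10, 100, 10_000):
        print(n, round(threshold_ratio(0.05, n), 5))
```

For α = 0.05 the ratio starts at 1 for a single test and approaches −ln(0.95) ÷ 0.05 ≈ 1.026 as n grows, so the relative gap between the methods stays below about 2.6% however many tests are run.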

What happens if I ignore multiple comparison correction?

Ignoring correction inflates false-positive rates dramatically. A researcher running 20 independent tests uncorrected at α = 0.05 faces roughly a 64% chance of reporting at least one spurious finding as significant when no real effects exist. Over many studies and researchers, this contributes to irreproducible results and is one of the factors behind the replication crisis affecting many fields. Journals and funding agencies increasingly demand correction or explicit justification for its omission, so reporting practices now strongly favour transparency and control of error rates.
