Understanding the Bonferroni Correction
The Bonferroni correction is a straightforward method for controlling the family-wise error rate—the probability of making at least one false-positive discovery across all tests in an analysis. Without adjustment, running 10 independent tests at α = 0.05 gives roughly a 40% chance of at least one spurious significant result. Dividing your significance threshold by the number of tests restores control.
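The 40% figure follows from the complement rule for independent tests. A minimal sketch is shown below; the function name `family_wise_error_rate` is an illustrative choice, not taken from any library, and it assumes every null hypothesis is true and the tests are independent.

```python
def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent tests,
    each run at level alpha, when every null hypothesis is true."""
    # P(no false positive in one test) = 1 - alpha; independence lets us multiply.
    return 1.0 - (1.0 - alpha) ** n_tests


for n in (1, 5, 10, 20):
    print(f"{n:>2} tests at alpha = 0.05: FWER = {family_wise_error_rate(0.05, n):.3f}")
```

For 10 tests this evaluates to about 0.401, matching the figure quoted above.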
Two variants exist in practice:
- Classical method: Divide α by the count of tests. Simple, widely understood, and conservative.
- Šidák correction: Use a multiplicative adjustment assuming independence between tests. Slightly less stringent than the classical approach, often preferred when test dependence is modest.
The choice depends on your research context and whether you prioritise the most robust control of false positives (classical) or a slight gain in statistical power (Šidák).
Bonferroni Correction Formulas
Two formulas govern this adjustment, depending on your chosen method:
Classical: α_corrected = α ÷ n
Šidák: α_corrected = 1 − (1 − α)^(1/n)
where:
- α: Original significance level (e.g., 0.05 for 5%)
- n: Total number of statistical tests performed (assumed independent for Šidák)
- p: Observed p-value from an individual test, judged significant only if it falls below α_corrected
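Both thresholds can be derived in a few lines. The sketch below writes the per-test threshold as \alpha' (notation introduced here for illustration) and assumes all null hypotheses are true, so that each test produces a false positive with probability \alpha'.

```latex
% Sidak: n independent tests, each run at per-test level \alpha'.
% Set the exact family-wise error rate equal to \alpha and solve for \alpha'.
\begin{align*}
1 - (1 - \alpha')^{n} &= \alpha \\
\alpha' &= 1 - (1 - \alpha)^{1/n}
\end{align*}

% Classical Bonferroni: the union bound needs no independence assumption.
\begin{align*}
\mathrm{FWER}
  = P\Big(\bigcup_{i=1}^{n} \{p_i \le \alpha'\}\Big)
  \le \sum_{i=1}^{n} P(p_i \le \alpha')
  = n\,\alpha'
\end{align*}
% Choosing \alpha' = \alpha / n makes the bound equal to \alpha.
```

Because the classical threshold comes from an inequality rather than an equality, it is slightly smaller than the Šidák threshold for any n greater than 1.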
When to Apply Bonferroni Correction
Use Bonferroni correction in scenarios where you conduct multiple hypothesis tests on the same dataset:
- Factorial designs: Comparing multiple groups or conditions simultaneously.
- Genetic studies: Testing thousands of single-nucleotide polymorphisms (SNPs) for disease association.
- Exploratory research: Screening many candidate variables before confirmatory analysis.
- Post-hoc comparisons: Pairwise contrasts following a significant omnibus test.
The correction becomes essential as the number of tests increases. With 50 tests at uncorrected α = 0.05, the family-wise error rate climbs to approximately 92%. Correction restores the overall error rate to your chosen threshold.
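The effect of the correction can be checked with a small simulation. This is a sketch under stated assumptions: all 50 null hypotheses are true, the tests are independent, and each p-value is therefore uniform on [0, 1]. The function name `simulate_fwer` and the trial count are illustrative choices.

```python
import random


def simulate_fwer(n_tests: int, threshold: float, n_trials: int, seed: int = 0) -> float:
    """Estimate the family-wise error rate when every null hypothesis is true.

    Under a true null, a p-value from a continuous test is uniform on [0, 1],
    so each simulated p-value is drawn with rng.random().
    """
    rng = random.Random(seed)
    trials_with_false_positive = 0
    for _ in range(n_trials):
        # A trial counts as an error if any of its p-values falls below the threshold.
        if any(rng.random() < threshold for _ in range(n_tests)):
            trials_with_false_positive += 1
    return trials_with_false_positive / n_trials


alpha, n_tests, n_trials = 0.05, 50, 5000
uncorrected = simulate_fwer(n_tests, alpha, n_trials)
corrected = simulate_fwer(n_tests, alpha / n_tests, n_trials)
print(f"uncorrected FWER ~ {uncorrected:.3f}, Bonferroni FWER ~ {corrected:.3f}")
```

The uncorrected estimate should land near 0.92 and the corrected one near 0.05, up to simulation noise.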
Key Considerations and Pitfalls
Proper use of Bonferroni correction requires awareness of its strengths and limitations in different analytical contexts.
- Conservative trade-off: The classical Bonferroni method is intentionally stringent, reducing false positives at the cost of more false negatives (Type II errors). For exploratory work involving many tests, this conservatism may eliminate genuine signals. Šidák's variant is marginally less strict, but it relies on the tests being independent or only weakly dependent.
- Independence assumption: Bonferroni does not strictly require independent tests, but it works best when tests are independent or nearly so. With correlated tests, such as overlapping genetic regions or variables derived from the same measure, the effective number of independent tests drops and the traditional correction becomes overly conservative.
- Specifying the test count: Count every test you planned to run, not only those that yield interesting results. Correcting only a hand-picked subset of findings after the fact inflates false positives. Document your intended test set before analysis to maintain statistical validity.
- Alternatives for large test sets: With hundreds or thousands of tests, consider Holm-Bonferroni (a stepwise procedure that controls the same error rate with less conservatism) or false discovery rate methods (Benjamini-Hochberg), which tolerate a small expected proportion of false positives among the significant results in exchange for greater power. Choose based on your field's standards and research goals.
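The two alternatives named in the last point can be sketched in plain Python as follows. The function names and the sample p-values are illustrative, and this is a minimal version of each procedure rather than a replacement for a tested statistics library.

```python
from typing import List, Sequence


def holm_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm-Bonferroni step-down: compare the k-th smallest p-value
    (k = 0, 1, ...) with alpha / (m - k) and stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] > alpha / (m - k):
            break  # this and all larger p-values are not rejected
        reject[i] = True
    return reject


def benjamini_hochberg_reject(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up: find the largest 1-based rank k with
    p_(k) <= k * q / m and reject the k smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            largest_k = rank
    reject = [False] * m
    for i in order[:largest_k]:
        reject[i] = True
    return reject


p_values = [0.04, 0.03, 0.02]  # illustrative p-values
print("Holm:              ", holm_reject(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg_reject(p_values))
```

With these illustrative inputs, Holm rejects nothing because the smallest p-value (0.02) exceeds 0.05 ÷ 3, while Benjamini-Hochberg rejects all three. The gap reflects the different error rates the two procedures control.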
Practical Example
Suppose you conduct four independent t-tests comparing treatment groups, with α = 0.05. The Bonferroni-corrected threshold becomes α_corrected = 0.05 ÷ 4 = 0.0125. Only p-values below 0.0125 qualify as significant at the family-wise level.
If your four p-values are 0.003, 0.008, 0.023, and 0.041, the first two pass the corrected threshold, while the latter two, though individually significant at α = 0.05, do not. This stricter gate guards against false discoveries, but it requires larger effects or larger samples to retain the same statistical power.
Using the Šidák method instead: α_corrected = 1 − (1 − 0.05)^(1/4) ≈ 0.0127, a marginally less demanding threshold. Here it leads to the same conclusions: 0.003 and 0.008 pass, while 0.023 and 0.041 do not.
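The worked example can be reproduced in a few lines. This is a minimal sketch; the helper name `significant_under` is hypothetical and simply compares each p-value against a given threshold.

```python
from typing import List, Sequence


def significant_under(p_values: Sequence[float], threshold: float) -> List[bool]:
    """Flag each p-value that falls strictly below the corrected threshold."""
    return [p < threshold for p in p_values]


alpha = 0.05
p_values = [0.003, 0.008, 0.023, 0.041]
n = len(p_values)

bonferroni_threshold = alpha / n                    # classical: alpha / n
sidak_threshold = 1.0 - (1.0 - alpha) ** (1.0 / n)  # Sidak: 1 - (1 - alpha)^(1/n)

print(f"Bonferroni threshold: {bonferroni_threshold:.4f}")
print(f"Sidak threshold:      {sidak_threshold:.4f}")
print("Bonferroni decisions:", significant_under(p_values, bonferroni_threshold))
print("Sidak decisions:     ", significant_under(p_values, sidak_threshold))
```

Both methods flag only the first two p-values, so the choice between them does not change the conclusions for this data.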