Understanding P-Values

A p-value is fundamentally a conditional probability: given that the null hypothesis holds, what is the chance of observing a test statistic as extreme as (or more extreme than) the one you calculated from your sample?

The p-value does not tell you the probability that the null hypothesis is true. Instead, it measures how compatible your data are with the null hypothesis: the smaller the p-value, the rarer a result at least as extreme as yours would be across repeated samples drawn under the null hypothesis.

The interpretation hinges on the significance level you choose (commonly α = 0.05):

  • p-value < α: Reject the null hypothesis. The data provide evidence against it.
  • p-value ≥ α: Fail to reject the null hypothesis. The data are consistent with it.

This framework applies uniformly across all test distributions, though the calculation method differs.
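
The decision rule can be sketched in a few lines of Python. This is only an illustration: the function name decide and its return strings are made up for this example, and the default α = 0.05 simply mirrors the convention mentioned above.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject when p < alpha, otherwise fail to reject."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(decide(0.03))  # p-value below alpha
print(decide(0.20))  # p-value above alpha
```

Note that a p-value exactly equal to α falls in the "fail to reject" branch here, matching the p-value ≥ α rule in the list.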

P-Value Calculation Formulas

The p-value depends on both your test statistic and the type of hypothesis test. Let cdf denote the cumulative distribution function of your chosen distribution.

Left-tailed test: p-value = cdf(x)

Right-tailed test: p-value = 1 − cdf(x)

Two-tailed test: p-value = 2 × min{cdf(x), 1 − cdf(x)}

  • x — Your test statistic (Z-score, t-score, χ², or F-value)
  • cdf(x) — Cumulative distribution function evaluated at x, specific to your distribution
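
The three formulas translate almost directly into code. The sketch below assumes Python's standard-library NormalDist as the cdf purely for illustration; the helper name p_value, the tail labels, and the Z-score of 1.5 are hypothetical. Any other cdf (t, χ², F) could be passed in the same way.

```python
from statistics import NormalDist
from typing import Callable


def p_value(x: float, cdf: Callable[[float], float], tail: str = "two") -> float:
    """Return the p-value for test statistic x, given a cdf and a tail type."""
    lower = cdf(x)        # P(X <= x) under the null hypothesis
    upper = 1.0 - lower   # P(X > x) under the null hypothesis
    if tail == "left":
        return lower
    if tail == "right":
        return upper
    if tail == "two":
        return 2.0 * min(lower, upper)
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Illustration with the standard normal cdf and a hypothetical Z-score of 1.5.
std_normal_cdf = NormalDist().cdf
for tail in ("left", "right", "two"):
    print(tail, round(p_value(1.5, std_normal_cdf, tail), 4))
```

Passing the cdf as an argument keeps the tail logic separate from the choice of distribution, which is the same split the formulas make.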

Selecting the Right Distribution

Choose your distribution based on what you know about your data and test:

  • Z-test (Normal Distribution): Use when testing a population mean with known population standard deviation, or for large samples (n > 30).
  • t-test (Student's t-Distribution): Use for small samples or when population standard deviation is unknown. Specify degrees of freedom (typically n − 1 for one-sample tests).
  • Chi-squared Test: Use when testing proportions or independence in categorical data, or goodness-of-fit tests. Requires degrees of freedom equal to the number of categories minus constraints.
  • F-test (Fisher–Snedecor Distribution): Use when comparing variances across groups or in regression analysis. Requires two degrees-of-freedom parameters: numerator and denominator.

Common Pitfalls When Interpreting P-Values

Misunderstanding p-values is endemic in statistical practice. Avoid these frequent mistakes.

  1. P-value ≠ Probability of Null Hypothesis — A p-value is not the probability your null hypothesis is true. It's the probability of seeing your data (or more extreme) if the null were true. A small p-value is evidence against the null, not proof that an alternative is true.
  2. One-Tailed vs Two-Tailed Tests — Picking the tail direction after seeing the data inflates your false positive rate. A two-tailed test splits α equally between both extremes; a one-tailed test concentrates it in one direction. Choose your tail structure before analyzing, not after seeing results.
  3. Multiple Testing Compounds Error — Running many statistical tests without correction inflates the overall error rate. If you perform 20 independent tests at α = 0.05 and every null hypothesis is true, you expect about one false positive by chance alone. Use corrections like Bonferroni when testing multiple hypotheses.
  4. P-Value < 0.05 Does Not Guarantee Replication — Statistical significance at p < 0.05 does not ensure your finding will replicate. With low statistical power or publication bias, significant results often fail to reproduce. Report effect sizes and confidence intervals alongside p-values.
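
The arithmetic behind the multiple-testing pitfall (item 3) can be checked with a short sketch. It assumes the tests are independent and that every null hypothesis is true; the function names are illustrative, not taken from any library.

```python
def family_wise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false positive) over m independent tests when every null is true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test threshold under the Bonferroni correction."""
    return alpha / m


alpha, m = 0.05, 20
print("expected false positives:", alpha * m)
print("family-wise error rate:", round(family_wise_error_rate(alpha, m), 3))
print("Bonferroni per-test alpha:", bonferroni_alpha(alpha, m))
```

Under these assumptions the chance of at least one false positive across 20 tests is well above one half, which is why a stricter per-test threshold is needed.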

Worked Example: Z-Test P-Value

Suppose a factory claims lightbulbs last 1,000 hours on average. You test 100 bulbs and find a mean lifetime of 980 hours with a known population standard deviation of 50 hours. Your null hypothesis is that μ = 1,000; your alternative is that μ ≠ 1,000 (two-tailed).

First, calculate the Z-score:

Z = (980 − 1000) ÷ (50 ÷ √100) = −20 ÷ 5 = −4

For a two-tailed test, the p-value is 2 × Φ(−4) ≈ 2 × 0.00003 ≈ 0.00006. This tiny p-value (well below 0.05) provides strong evidence to reject the factory's claim.
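
The same calculation can be reproduced in Python as a check. The variable names are chosen for this sketch, and the standard normal cdf comes from the standard library's NormalDist.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 1000.0        # mean lifetime claimed by the factory (hours)
sample_mean = 980.0  # observed mean lifetime (hours)
sigma = 50.0         # known population standard deviation (hours)
n = 100              # number of bulbs tested

standard_error = sigma / sqrt(n)
z = (sample_mean - mu_0) / standard_error

# Two-tailed p-value: twice the smaller tail of the standard normal distribution.
lower_tail = NormalDist().cdf(z)
p_two_tailed = 2.0 * min(lower_tail, 1.0 - lower_tail)

print(f"Z = {z:.1f}")
print(f"p = {p_two_tailed:.2e}")
```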

Frequently Asked Questions

What is the difference between a one-tailed and two-tailed p-value?

A one-tailed p-value tests whether your statistic is extreme in one specific direction (left or right). A two-tailed p-value tests whether it is extreme in either direction. Use one-tailed tests only when you have strong prior reason to predict the direction; otherwise, use two-tailed tests to avoid bias. For a symmetric distribution, the two-tailed p-value is twice the one-tailed p-value whenever the statistic falls in the predicted direction.

How do I choose between t-test and Z-test?

Use a Z-test when the population standard deviation is known, or when the sample is large (n > 30). Use a t-test for small samples or when the population standard deviation is unknown and you must estimate it from your sample. The t-distribution has heavier tails than the normal distribution, making it more conservative for small samples. As n increases, the t-distribution converges to the normal distribution.

Can a p-value be exactly zero?

In practice, no. A p-value of exactly zero would mean the observed outcome is impossible under the null hypothesis, which almost never occurs with continuous distributions. Software may print a bound such as p < 0.0001 when the value is below its display threshold, or 0 when the value is smaller than floating-point arithmetic can represent. Always interpret such output as 'very small,' not as exactly zero.
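
A small sketch shows how a reported zero can be a rounding artifact rather than a true zero. It computes the lower tail of the standard normal in two ways using only Python's math module; the function names and the Z-score of −10 are hypothetical choices for this illustration.

```python
from math import erf, erfc, sqrt


def lower_tail_naive(z: float) -> float:
    """Standard normal cdf via erf; the tail is lost to rounding for very negative z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def lower_tail_stable(z: float) -> float:
    """Same quantity via the complementary error function, which keeps tiny tails."""
    return 0.5 * erfc(-z / sqrt(2.0))


z = -10.0  # hypothetical, very extreme Z-score
print(lower_tail_naive(z))   # rounds to 0.0 in double precision
print(lower_tail_stable(z))  # tiny but strictly positive
```

Both functions describe the same probability; only the second keeps enough precision to show that it is not actually zero.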

What does it mean if my p-value equals 0.05?

A p-value of exactly 0.05 falls on the boundary of the conventional significance threshold. Whether you reject the null hypothesis depends on your choice: some researchers use ≤ 0.05 (reject), others use < 0.05 (fail to reject). The distinction is arbitrary and convention-dependent. In borderline cases, consider reporting the exact p-value and effect size rather than relying solely on the threshold.

How do degrees of freedom affect the p-value?

Degrees of freedom shape the distribution of your test statistic. For t-tests, higher degrees of freedom make the distribution closer to normal, yielding smaller p-values for the same test statistic. For chi-squared and F-tests, degrees of freedom determine the distribution's shape entirely. Always specify the correct degrees of freedom; using the wrong value will give incorrect p-values.
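
To see the effect for t-tests, the sketch below computes two-tailed p-values for one fixed test statistic at several degrees of freedom and compares them with the normal distribution. It is a rough illustration using only the standard library: the t density is integrated numerically with Simpson's rule instead of calling a statistics package, and the function names and the test statistic of 2.0 are assumptions made for this example.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist


def t_pdf(t: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1.0 + t * t / df) ** (-(df + 1) / 2)


def t_two_tailed_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-tailed p-value via Simpson's rule on the density between 0 and |t|."""
    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3.0  # P(0 < T < |t|)
    return 2.0 * (0.5 - central_half)


t_stat = 2.0  # hypothetical test statistic
for df in (5, 30, 1000):
    print("df =", df, "p =", round(t_two_tailed_p(t_stat, df), 4))
print("normal p =", round(2.0 * NormalDist().cdf(-t_stat), 4))
```

The p-value shrinks toward the normal-distribution value as the degrees of freedom grow, so the same statistic can fall on either side of α = 0.05 depending on sample size.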

Is a p-value of 0.001 twice as significant as 0.05?

No. P-values are not directly comparable as 'levels of significance' in that way. With α = 0.05, p = 0.001 falls well below the threshold and leads to rejecting the null hypothesis, while p = 0.05 sits exactly on the boundary. Of the two, p = 0.001 provides much stronger evidence against the null. Report the exact p-value rather than simply noting whether it crosses α.
