Understanding Hypothesis Testing

Hypothesis testing frames research questions as statistical decisions. You begin with a null hypothesis (H₀), which asserts no effect or no difference—the status quo. The alternative hypothesis (H₁) proposes the opposite: that a meaningful effect or difference exists.

The process relies on sample data to compute a test statistic, which is then compared against a critical threshold. This threshold depends on your chosen significance level (α)—typically 0.05—which represents the probability of rejecting H₀ when it is actually true (a Type I error). If your test statistic falls in the rejection region, you reject H₀ and conclude there is sufficient evidence for H₁. Otherwise, you fail to reject H₀.
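The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming a two-tailed z-test and the standard library's `statistics.NormalDist`; the helper name `decide` and the example statistics are hypothetical.

```python
from statistics import NormalDist

def decide(test_statistic: float, alpha: float = 0.05) -> str:
    """Two-tailed decision rule for a z statistic (illustrative helper)."""
    # Critical value: the point that leaves alpha/2 in each tail of N(0, 1).
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    if abs(test_statistic) > critical:
        return "reject H0"
    return "fail to reject H0"

print(decide(2.3))  # statistic beyond the critical value (about 1.96)
print(decide(1.2))  # statistic inside the non-rejection region
```

Lowering α pushes the critical value outward, so the same statistic may no longer fall in the rejection region.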

The choice of test depends on your data: use a z-test when the population standard deviation is known or the sample is large (n ≥ 30), a t-test for small samples with an unknown population standard deviation, and a chi-square test for associations between categorical variables.

One-Tailed vs. Two-Tailed Tests

Hypothesis tests differ in directionality. A two-tailed test checks whether a parameter differs in either direction from the hypothesized value, splitting your significance level equally between both tails of the distribution (α/2 in each). At the same α, this makes it more conservative than a one-tailed test: the test statistic must be more extreme before you can reject H₀.

A one-tailed test focuses on a single direction. A right-tailed test asks whether the parameter is greater than the hypothesized value, placing the entire rejection region on the right. A left-tailed test asks whether it is less than the hypothesized value, placing the rejection region on the left. One-tailed tests are more powerful in the predicted direction because all of α sits in one tail, so a less extreme test statistic is enough to reject H₀. The cost is that they cannot detect an effect in the opposite direction, so use them only when your research hypothesis genuinely makes a directional prediction.

Choosing the wrong tail can invalidate your conclusions, so decide before analyzing your data based on your research question, not your results.
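To see how the tail choice changes the result, here is a small sketch that computes the p-value of one z statistic under each test type. It assumes a standard normal test statistic; the function name `p_value` and the value z = 1.8 are hypothetical.

```python
from statistics import NormalDist

def p_value(z: float, tail: str = "two") -> float:
    """p-value of a z statistic; `tail` is 'two', 'right', or 'left'."""
    std_normal = NormalDist()
    if tail == "right":
        return 1 - std_normal.cdf(z)      # area to the right of z
    if tail == "left":
        return std_normal.cdf(z)          # area to the left of z
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))  # both tails beyond |z|
    raise ValueError("tail must be 'two', 'right', or 'left'")

z = 1.8  # hypothetical test statistic
for tail in ("two", "right", "left"):
    print(tail, round(p_value(z, tail), 4))
```

With z = 1.8 the right-tailed p-value is about 0.036 and the two-tailed p-value about 0.072, so at α = 0.05 the same data are significant only under the right-tailed test. That is why the direction has to be fixed before you look at the data.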

Test Statistic Formulas

The formula you use depends on your sample size and whether you know the population standard deviation.

Z-Test Statistic: Use when n ≥ 30 or the population standard deviation is known.

z = (x̄ − μ₀) ÷ (σ ÷ √n)

  • x̄ — Sample mean
  • μ₀ — Hypothesized population mean
  • σ — Population standard deviation (or sample standard deviation for large n)
  • n — Sample size
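As a worked example of the z formula, the sketch below plugs in hypothetical numbers (x̄ = 52, μ₀ = 50, σ = 6, n = 36) and converts the result to a two-tailed p-value with the standard library's `NormalDist`. The function name `z_statistic` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """z = (x̄ − μ₀) ÷ (σ ÷ √n)."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

# Hypothetical inputs: x̄ = 52, μ₀ = 50, σ = 6, n = 36
z = z_statistic(52, 50, 6, 36)  # (52 - 50) / (6 / 6) = 2.0
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
print(z, round(p_two_tailed, 4))
```

Here the standard error is 6 ÷ √36 = 1, so the sample mean sits exactly two standard errors above μ₀, and the two-tailed p-value is a little under 0.05.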

T-Test Statistic for Small Samples

When your sample size is under 30 and the population standard deviation is unknown, use the t-test. The t-distribution has heavier tails than the normal distribution, accounting for extra uncertainty in small samples.

t = (x̄ − μ₀) ÷ (s ÷ √n)

  • x̄ — Sample mean
  • μ₀ — Hypothesized population mean
  • s — Sample standard deviation
  • n — Sample size
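The t formula can be applied directly to raw observations. The sketch below uses a hypothetical sample of ten measurements and compares the statistic with the two-tailed critical value for 9 degrees of freedom at α = 0.05 (2.262, from a standard t-table), since Python's standard library has no built-in t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """t = (x̄ − μ₀) ÷ (s ÷ √n), with s the sample standard deviation."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))

# Hypothetical sample of n = 10 measurements, tested against μ₀ = 50
sample = [51, 53, 49, 55, 52, 54, 50, 56, 53, 52]
t = t_statistic(sample, 50)

# Two-tailed critical value for α = 0.05 and n − 1 = 9 degrees of freedom,
# looked up in a standard t-table.
T_CRITICAL = 2.262
print(round(t, 3), "reject H0" if abs(t) > T_CRITICAL else "fail to reject H0")
```

Note that `statistics.stdev` divides by n − 1, which is the sample standard deviation the formula calls for.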

Common Pitfalls in Hypothesis Testing

Avoid these mistakes when designing and interpreting your tests.

  1. Confusing p-value with probability of H₀ — A p-value is not the probability that H₀ is true. It is the probability of observing data at least as extreme as yours if H₀ were true. A small p-value means your data are unlikely under H₀, not that H₀ is unlikely to be true.
  2. Stopping your study early if results look good — Repeatedly checking results and stopping when you see significance inflates your Type I error rate. Decide your sample size and stopping rule before collecting data.
  3. Choosing your tail direction after seeing results — Selecting a one-tailed test because your sample mean is in that direction amounts to p-hacking. Define your hypothesis direction in advance based on theory, not data.
  4. Using the wrong test for your data type — T-tests assume roughly normal data; chi-square tests require categorical variables with adequate expected frequencies. Applying the wrong test produces invalid conclusions.
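The effect of early stopping (pitfall 2) is easy to demonstrate by simulation. The sketch below is an illustration under stated assumptions: data are drawn from a standard normal distribution so that H₀ (μ = 0) is true, σ = 1 is treated as known, and the look schedule, seed, and function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-tailed critical value, α = 0.05

def significant(data):
    """Two-tailed z-test of H0: μ = 0 with known σ = 1."""
    n = len(data)
    z = (sum(data) / n) / (1 / sqrt(n))
    return abs(z) > Z_CRIT

def false_positive_rate(looks, n_sims=2000, seed=0):
    """Share of simulated null studies that reject H0 at any of the looks."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        data = [rng.gauss(0, 1) for _ in range(max(looks))]  # H0 is true
        if any(significant(data[:n]) for n in looks):
            rejections += 1
    return rejections / n_sims

fixed = false_positive_rate(looks=[100])                    # one planned test
peeking = false_positive_rate(looks=[20, 40, 60, 80, 100])  # stop at first hit
print(fixed, peeking)
```

The single planned test should reject in roughly 5% of simulated studies, while checking five times and stopping at the first significant result rejects noticeably more often, even though no real effect exists.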

Frequently Asked Questions

What does a significance level of 0.05 actually mean?

A significance level (α) of 0.05 is your chosen threshold for deciding when sample evidence is strong enough to reject H₀. It means you are willing to tolerate a 5% risk of rejecting H₀ when it is actually true (a Type I error). This threshold is conventional in many fields, though some disciplines use stricter values such as 0.01 for high-stakes decisions. The choice should reflect the consequences of a false positive in your context.

How do I interpret a p-value of 0.03 when my α is 0.05?

A p-value of 0.03 means there is a 3% probability of observing data at least as extreme as yours if H₀ were true. Since 0.03 < 0.05, it falls below your significance threshold, so you reject H₀. This provides evidence supporting H₁. However, this does not guarantee H₁ is true; it simply means your data is inconsistent with H₀ at the 5% level.

What is the difference between Type I and Type II errors?

A Type I error occurs when you reject H₀ even though it is true: a false positive, with probability α. A Type II error occurs when you fail to reject H₀ even though it is false: a false negative, with probability β. For a fixed sample size, lowering α raises β, so reducing both at once requires a larger sample. Your significance level directly sets the Type I error rate; increasing the sample size reduces the Type II error rate.
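The trade-off can be made concrete with a short calculation. The sketch below assumes a right-tailed z-test with known σ and a hypothetical true effect of 2 units with σ = 6; the function name `type_ii_error` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(effect, sigma, n, alpha=0.05):
    """β for a right-tailed z-test when the true mean exceeds μ₀ by `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under H1 the z statistic is normal with mean effect·√n/σ and variance 1,
    # so β is the probability that it still lands below the critical value.
    return std_normal.cdf(z_crit - effect * sqrt(n) / sigma)

# Hypothetical scenario: true effect of 2 units, σ = 6
for n in (10, 30, 100):
    print(n, round(type_ii_error(2, 6, n), 3))
```

β shrinks as n grows, and calling the function with a smaller `alpha` at the same n returns a larger β.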

Should I use a one-tailed or two-tailed test?

Use a two-tailed test unless your research question explicitly predicts a direction before you see the data. Two-tailed tests are more conservative and are the default in most applications. One-tailed tests are appropriate only when there is a strong theoretical or practical reason to expect an effect in only one direction, and you commit to that direction in advance.

When should I choose a t-test over a z-test?

Use a t-test when your sample size is below 30 and the population standard deviation is unknown—common in real research. Use a z-test for large samples (n ≥ 30) or when the population standard deviation is known. The t-distribution accounts for uncertainty from using a sample standard deviation as an estimate; as sample size grows, the t-distribution converges to the normal distribution.
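The convergence can be checked numerically. The sketch below is an approximation rather than a library routine: it integrates the t-density with Simpson's rule to get the two-tailed probability beyond ±1.96 for several degrees of freedom, and compares it with the normal value. The function names and the step count are illustrative choices.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with `df` degrees of freedom."""
    # lgamma avoids overflow of the gamma function for large df.
    log_coef = lgamma((df + 1) / 2) - lgamma(df / 2)
    return exp(log_coef) / sqrt(df * pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * central_half

normal_p = 2 * (1 - NormalDist().cdf(1.96))
for df in (5, 30, 1000):
    print(df, round(t_two_tailed_p(1.96, df), 4))
print("normal", round(normal_p, 4))
```

With few degrees of freedom the tails beyond ±1.96 hold well over 5% of the probability, which is why small samples need a larger critical value; by df = 1000 the figure is nearly indistinguishable from the normal one.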

Can I conduct hypothesis testing with raw data or do I need summary statistics?

You can do either. If you have raw data, the calculator computes the sample mean and standard deviation automatically. If you only have summary statistics—sample mean, standard deviation, and sample size—you can input those directly. Both approaches yield the same results; raw data simply provides transparency about how statistics were computed.
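The equivalence is straightforward to verify. The sketch below is not the calculator's implementation, only an illustration with hypothetical measurements and function names: the raw-data path reduces the observations to summary statistics and then applies the same t formula.

```python
from math import sqrt
from statistics import mean, stdev

def t_from_summary(sample_mean, sample_sd, n, mu0):
    """t statistic from summary statistics alone."""
    return (sample_mean - mu0) / (sample_sd / sqrt(n))

def t_from_raw(sample, mu0):
    """t statistic from raw observations."""
    # Reduce the raw data to summary statistics, then reuse the same formula.
    return t_from_summary(mean(sample), stdev(sample), len(sample), mu0)

raw = [12.1, 11.8, 12.6, 12.3, 11.9, 12.4]  # hypothetical measurements
summary = (mean(raw), stdev(raw), len(raw))

print(t_from_raw(raw, 12.0) == t_from_summary(*summary, 12.0))
```

In practice, small differences can appear if the summary statistics you enter have been rounded.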
