Understanding the T-Test

The t-test is a parametric hypothesis test for evaluating whether a sample mean, or a difference between means, is consistent with a hypothesized population value. Unlike the z-test, which requires the population variance to be known, the t-test estimates variability from the sample standard deviation, which makes it practical for real-world research where population parameters are rarely available.

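The t-distribution, also called Student's t-distribution, has heavier tails than the normal distribution, particularly at low degrees of freedom. The heavier tails account for the extra uncertainty of estimating the standard deviation from the sample, so critical values are larger and the false-positive rate stays at the chosen level even with modest samples. As degrees of freedom increase, the t-distribution converges toward the normal distribution; with roughly 30 or more observations, t and z results are close (for a two-tailed test at α = 0.05, the critical value is 2.045 at df = 29 versus 1.960 for z).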
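To see this convergence numerically, the sketch below computes two-tailed critical values at α = 0.05 using only the Python standard library. It is a minimal sketch, assuming that Simpson's-rule integration of the t density plus bisection is accurate enough for illustration; the helper names t_pdf, t_cdf and t_critical are hypothetical, and in practice a library routine such as scipy.stats.t.ppf would normally be used instead.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: float, steps: int = 1000) -> float:
    """P(T <= x), integrating the density on [0, |x|] with Simpson's rule."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(alpha: float, df: float) -> float:
    """Two-tailed critical value: the c with P(T <= c) = 1 - alpha / 2."""
    lo, hi = 0.0, 100.0
    for _ in range(50):  # bisection works because t_cdf increases with c
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_crit = NormalDist().inv_cdf(0.975)  # two-tailed z critical value
for df in (5, 10, 29, 100):
    print(f"df = {df:>3}: t = {t_critical(0.05, df):.3f}   z = {z_crit:.3f}")
```

Standard t-tables list the same values (2.571, 2.228, 2.045 and 1.984 for these df), and the gap to z = 1.960 shrinks steadily as df grows.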

Key assumptions for valid t-test results:

  • Data are approximately normally distributed (or sample size is large enough for the central limit theorem to apply)
  • Observations are independent
  • For the standard (pooled) two-sample test: both populations have equal variances

Which T-Test Should You Use?

One-sample t-test: Tests whether a single sample mean differs from a hypothesized population mean. Example: measuring whether cans labelled 330 ml actually contain that volume on average.

Two-sample t-test: Compares means across two independent groups. Use this when you have data from separate samples—such as comparing weight loss between a treatment group and a control group. You can choose between a standard approach (assuming equal variances) or Welch's method (unequal variances).

Paired t-test: Compares measurements from the same subjects measured twice, such as before and after an intervention. The test works on the differences between paired observations, eliminating individual variation.

Your choice depends entirely on your study design: one group (one-sample), two independent groups (two-sample), or one group measured twice (paired).

T-Test Formulas

The t-score standardizes the difference between your observed sample statistic and the null hypothesis value by dividing it by the standard error. Below are the four main formulas:

One-sample t-score:

t = (x̄ − μ₀) / (s / √n)
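A minimal Python sketch of the one-sample formula, applied to the can-filling example. The six volumes are made-up illustration data and one_sample_t is a hypothetical helper name; statistics.stdev uses the n − 1 denominator, matching s in the formula.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample: list[float], mu0: float) -> tuple[float, int]:
    """Return (t, df) for H0: population mean == mu0."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)  # standard error s / sqrt(n)
    return (mean(sample) - mu0) / se, n - 1


# Hypothetical fill volumes (ml) for six cans labelled 330 ml
volumes = [329, 331, 328, 327, 330, 329]
t, df = one_sample_t(volumes, mu0=330)
print(f"t = {t:.3f}, df = {df}")
```

Here x̄ = 329, s = √2 and t ≈ −1.732 with 5 degrees of freedom. If SciPy is installed, scipy.stats.ttest_1samp(volumes, 330) should report the same t-score.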

Two-sample t-score (equal variances):

t = (x̄₁ − x̄₂ − Δ) / √[s²ₚ(1/n₁ + 1/n₂)]

where s²ₚ = [(n₁−1)s₁² + (n₂−1)s₂²] / (n₁ + n₂ − 2)
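The pooled formula translates directly into code. This is a sketch with invented weight-loss figures (kg) for a hypothetical treatment and control group; pooled_t is an illustrative name, not a library function.

```python
import math
from statistics import mean, variance


def pooled_t(x1: list[float], x2: list[float], delta: float = 0.0) -> tuple[float, int]:
    """Equal-variance two-sample t-score and its degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    # Pooled variance: each sample variance weighted by its n - 1
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x1) - mean(x2) - delta) / se, n1 + n2 - 2


# Hypothetical weight loss (kg) after the study period
treatment = [5, 7, 6, 8, 4]
control = [3, 4, 2, 5, 1]
t, df = pooled_t(treatment, control)
print(f"t = {t:.3f}, df = {df}")
```

With these numbers both groups have variance 2.5, so s²ₚ = 2.5, the standard error is exactly 1 and t = 3 with 8 degrees of freedom. SciPy's scipy.stats.ttest_ind uses this pooled form by default (equal_var=True).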

Welch's t-score (unequal variances):

t = (x̄₁ − x̄₂ − Δ) / √(s₁²/n₁ + s₂²/n₂)
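Welch's version differs only in the standard error: each sample keeps its own variance instead of being pooled. A sketch on hypothetical samples of different sizes and spreads (welch_t is an illustrative name):

```python
import math
from statistics import mean, variance


def welch_t(x1: list[float], x2: list[float], delta: float = 0.0) -> float:
    """Welch's t-score: standard error from the unpooled sample variances."""
    se = math.sqrt(variance(x1) / len(x1) + variance(x2) / len(x2))
    return (mean(x1) - mean(x2) - delta) / se


# Hypothetical samples with different sizes and spreads
group_a = [5, 7, 6, 8, 4]  # n = 5, variance 2.5
group_b = [2, 3, 4, 3]     # n = 4, variance 2/3
print(f"Welch t = {welch_t(group_a, group_b):.3f}")
```

The means differ by 3 and the standard error is √(2.5/5 + (2/3)/4) = √(2/3), giving t ≈ 3.674. In SciPy the same statistic comes from scipy.stats.ttest_ind(group_a, group_b, equal_var=False).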

Paired t-score:

t = (d̄ − Δ) / (s_d / √n)

  • x̄, x̄₁, x̄₂ — Sample mean(s)
  • μ₀, Δ — Hypothesized population mean or mean difference under the null hypothesis
  • s, s₁, s₂, s_d — Sample standard deviation(s); s_d is the standard deviation of the paired differences
  • n, n₁, n₂ — Sample size(s); for the paired test, n is the number of pairs
  • s²ₚ — Pooled variance for equal-variance assumption
  • d̄ — Mean of the paired differences
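The paired formula is the one-sample formula applied to the per-subject differences. A minimal sketch with hypothetical blood-pressure readings; paired_t is an illustrative name, and taking d = before − after is a sign convention chosen here so that a positive t means a reduction.

```python
import math
from statistics import mean, stdev


def paired_t(before: list[float], after: list[float], delta: float = 0.0) -> tuple[float, int]:
    """Paired t-score: a one-sample test on the per-subject differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for b, a in zip(before, after)]  # d = before - after
    n = len(diffs)  # number of pairs
    return (mean(diffs) - delta) / (stdev(diffs) / math.sqrt(n)), n - 1


# Hypothetical systolic blood pressure (mmHg) for five subjects
before = [140, 135, 150, 145, 138]
after = [134, 132, 143, 141, 135]
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

The differences are 6, 3, 7, 4 and 3 mmHg, so d̄ = 4.6 and the test has 4 degrees of freedom. scipy.stats.ttest_rel(before, after) should give the same statistic, since it also works on before − after.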

P-Values and Critical Regions

After computing your t-score, you obtain a p-value: the probability of observing a test statistic at least as extreme as your result, assuming the null hypothesis is true. A p-value below your chosen significance level (α, typically 0.05) leads to rejection of the null hypothesis.

Alternatively, the critical-region approach defines rejection boundaries based on α and degrees of freedom. Your t-score falls either inside the critical region (reject the null) or outside (fail to reject).

One-tailed vs. two-tailed: Use two-tailed when testing whether means differ in either direction. Use one-tailed (left or right) when your hypothesis specifies a direction—for example, testing whether a new process is faster than the old one, not just different.
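The p-value and the choice of tail can be sketched in standard-library Python. This is a sketch, assuming numerical integration of the t density is accurate enough for illustration; t_pdf, t_cdf and t_p_value are hypothetical helpers, and the labels "two-sided", "greater" and "less" simply mirror the alternative parameter of SciPy's t-test functions. Rejecting when p < α gives the same decision as checking whether the t-score falls in the critical region.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: float, steps: int = 1000) -> float:
    """P(T <= x) via Simpson's rule on [0, |x|] and symmetry about zero."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_p_value(t: float, df: float, alternative: str = "two-sided") -> float:
    """p-value of an observed t-score under H0."""
    if alternative == "two-sided":  # effect in either direction
        return 2 * (1 - t_cdf(abs(t), df))
    if alternative == "greater":  # right-tailed: H1 predicts a positive effect
        return 1 - t_cdf(t, df)
    if alternative == "less":  # left-tailed: H1 predicts a negative effect
        return t_cdf(t, df)
    raise ValueError(f"unknown alternative: {alternative!r}")


alpha = 0.05
t_score, df = 2.2, 15  # hypothetical test result
for alt in ("two-sided", "greater", "less"):
    p = t_p_value(t_score, df, alt)
    print(f"{alt:>9}: p = {p:.4f}, reject H0: {p < alpha}")
```

For a positive t-score, the right-tailed p-value is half the two-tailed one, while the left-tailed test, which looks for an effect in the opposite direction, cannot reject.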

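Degrees of freedom (df) affect the t-distribution shape: df = n − 1 for one-sample and paired tests (for paired data, n is the number of pairs), df = n₁ + n₂ − 2 for two-sample tests with equal variances, and Welch's method uses the Welch–Satterthwaite approximation, which usually gives a non-integer df between min(n₁, n₂) − 1 and n₁ + n₂ − 2.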
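For reference, the Welch–Satterthwaite approximation is df ≈ (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1)]. The sketch below computes all three df rules; the function names are hypothetical and the samples are invented.

```python
from statistics import variance


def df_one_sample(n: int) -> int:
    return n - 1


def df_pooled(n1: int, n2: int) -> int:
    return n1 + n2 - 2


def df_welch(x1: list[float], x2: list[float]) -> float:
    """Welch–Satterthwaite degrees of freedom (usually not an integer)."""
    a = variance(x1) / len(x1)  # s1^2 / n1
    b = variance(x2) / len(x2)  # s2^2 / n2
    return (a + b) ** 2 / (a ** 2 / (len(x1) - 1) + b ** 2 / (len(x2) - 1))


group_a = [5, 7, 6, 8, 4]
group_b = [2, 3, 4, 3]
print(df_one_sample(len(group_a)), df_pooled(len(group_a), len(group_b)))
print(f"Welch df = {df_welch(group_a, group_b):.2f}")
```

Here Welch's df (about 6.19) lies between min(n₁, n₂) − 1 = 3 and the pooled value of 7, which always holds for the Welch–Satterthwaite formula.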

Common Pitfalls and Practical Advice

Avoid these mistakes when interpreting t-test results:

  1. Confusing Statistical and Practical Significance — A tiny p-value does not guarantee a meaningful real-world difference. With large samples, even trivial effects become statistically significant. Always examine the actual difference in means and confidence intervals, not just p-values.
  2. Violating Normality Without Justification — The t-test assumes approximate normality, which matters most with small samples (n < 15). With large samples, minor deviations are tolerable due to the central limit theorem. Always check a histogram or Q-Q plot; if data are severely skewed or contain outliers, consider a non-parametric alternative such as the Mann–Whitney U test (two independent samples) or the Wilcoxon signed-rank test (paired or one-sample data).
  3. Forgetting to Check Variance Homogeneity — For two-sample tests, verify that group variances are similar before using the standard (equal-variance) formula. Levene's test or visual inspection of standard deviations can guide your choice. If variances differ substantially, use Welch's t-test instead.
  4. Performing Multiple Tests Without Correction — Running many t-tests on the same dataset inflates the false positive rate. If comparing three or more groups, use ANOVA. If conducting multiple pairwise comparisons, apply a correction (e.g., Bonferroni) or pre-specify your comparisons.
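To see why correction matters, the sketch below computes the chance of at least one false positive across m tests, assuming the tests are independent and every null hypothesis is true, and then applies the Bonferroni rule, which rejects only when p < α / m. The p-values are hypothetical and the function names are illustrative.

```python
def familywise_error_rate(m: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) over m independent tests, all nulls true."""
    return 1 - (1 - alpha) ** m


def bonferroni(p_values: list[float], alpha: float = 0.05) -> tuple[float, list[bool]]:
    """Bonferroni: compare every p-value with alpha / m."""
    threshold = alpha / len(p_values)
    return threshold, [p < threshold for p in p_values]


# Hypothetical p-values from three pairwise t-tests
p_vals = [0.030, 0.012, 0.200]
threshold, rejected = bonferroni(p_vals)
print(f"uncorrected family-wise error rate: {familywise_error_rate(3):.3f}")
print(f"Bonferroni threshold: {threshold:.4f}, rejected: {rejected}")
```

Three uncorrected tests at α = 0.05 carry roughly a 14% chance of at least one false positive; after correction only the p = 0.012 result stays below the threshold of about 0.0167.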

Frequently Asked Questions

When should I use a t-test instead of a z-test?

Use a t-test when the population standard deviation is unknown and you must estimate it from your sample. The z-test requires knowing the population standard deviation, which is rarely available in practice. For small samples (fewer than 30 observations), the t-test is essential because the t-distribution accounts for additional uncertainty. With 30+ observations and unknown variance, t and z results converge, but the t-test remains the safer choice.

What is the difference between a one-tailed and two-tailed t-test?

A two-tailed test evaluates whether means differ in either direction—greater or less than the null hypothesis value. Use this when you have no prior expectation about directionality. A one-tailed test specifies a direction: you hypothesize the mean is greater than or less than the null value. One-tailed tests have more statistical power for detecting effects in the hypothesized direction but cannot detect effects in the opposite direction, so use them only when directionality is theoretically justified.

How do degrees of freedom affect the t-distribution?

Degrees of freedom (df) parameterize the t-distribution shape. Lower df values produce heavier tails, requiring larger t-scores to reach significance, a built-in correction for the extra uncertainty in small samples. As df increases, the t-distribution approaches a standard normal distribution. For a one-sample test, df equals sample size minus one (n − 1). For a two-sample test with equal variances, df = n₁ + n₂ − 2. For a given t-score, a higher df always yields a smaller p-value, because the critical values shrink as df grows.

Can I use a t-test if my data are not normally distributed?

Small deviations from normality are usually acceptable, especially with larger samples. By the central limit theorem, the sampling distribution of the mean becomes approximately normal as the sample size grows, for any population with finite variance; n ≥ 30 is a common rule of thumb, although strongly skewed or heavy-tailed data may need more. However, with small samples, if your data show severe skewness or contain extreme outliers, the t-test may be unreliable. In such cases, consider a non-parametric test like the Wilcoxon signed-rank test (paired) or Mann–Whitney U test (two independent samples).

What is the paired t-test used for?

The paired t-test compares two measurements from the same subjects, removing between-subject variation from the comparison. Common applications include before-and-after studies (e.g., blood pressure before and after medication), matched-pair designs, or repeated measurements under different conditions. Because you work with differences rather than raw values, the test is usually more powerful than a two-sample test on the same data, provided the paired measurements are positively correlated. The null hypothesis typically assumes zero mean difference; rejection indicates a significant change.
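The power gain from pairing can be illustrated by analysing the same made-up before/after readings both ways: as a paired test on the differences, and (inappropriately for this design) as two independent groups using Welch's formula. The data are invented so that each subject's reading falls by 3 or 5 mmHg while subjects differ widely from one another.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical systolic readings (mmHg) for six subjects
before = [120, 132, 128, 140, 125, 135]
after = [117, 127, 125, 135, 122, 130]

# Paired: one-sample t on the per-subject differences
diffs = [b - a for b, a in zip(before, after)]
t_paired = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Ignoring the pairing: Welch t treating the two columns as independent groups
se_indep = math.sqrt(variance(before) / len(before) + variance(after) / len(after))
t_indep = (mean(before) - mean(after)) / se_indep

print(f"paired t = {t_paired:.2f}, independent t = {t_indep:.2f}")
```

Both analyses see the same 4 mmHg mean drop, but the large spread between subjects inflates the independent-groups standard error, while the paired differences vary by only about 1 mmHg, so the paired t-score is far larger.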

How do I choose between equal-variance and Welch's t-test for two samples?

If the sample standard deviations are roughly similar (as a rule of thumb, the larger is no more than about 1.5 times the smaller), the standard equal-variance t-test is appropriate. If variances differ substantially, visible as one group having much larger spread than the other, use Welch's t-test instead. Welch's method does not assume equal variances and uses a modified degrees-of-freedom calculation. When in doubt, Welch's is the safer choice; it performs nearly as well as the standard test when variances are equal but is more accurate when they diverge.
