Understanding the T-Test
The t-test is a parametric hypothesis test for evaluating whether a sample mean (or a difference between means) is consistent with a hypothesized population value. Unlike the z-test, which requires a known population variance, the t-test works with the sample standard deviation alone, which makes it practical for real-world research where population parameters are rarely available.
The t-distribution, also called Student's t-distribution, has heavier tails than the normal distribution, particularly with small samples. The heavier tails reflect the extra uncertainty from estimating the standard deviation from the sample, so critical values are larger and the test is more conservative, which guards against false positives when sample sizes are modest. As degrees of freedom increase, the t-distribution converges toward the normal distribution; as a rule of thumb, with about 30 or more observations the two tests give nearly identical results.
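The heavier tails can be checked numerically. The sketch below uses only the Python standard library to compare the t density with the standard normal density at one tail point; the helper name t_pdf, the point 3.0, and the chosen degrees of freedom are illustrative assumptions rather than part of the original text.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma keeps the normalising constant finite even for large df.
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))


normal_tail = NormalDist().pdf(3.0)  # standard normal density at 3.0
for df in (2, 5, 30, 1000):
    ratio = t_pdf(3.0, df) / normal_tail
    print(f"df={df:>4}: t density at 3.0 is {ratio:.2f}x the normal density")
```

The ratio should be well above 1 for small df and shrink toward 1 as df grows, which is the convergence described above.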
Key assumptions for valid t-test results:
- Data are approximately normally distributed (or sample size is large enough for the central limit theorem to apply)
- Observations are independent
- For two-sample tests with equal-variance assumption: both populations have similar variance
Which T-Test Should You Use?
One-sample t-test: Tests whether a single sample mean differs from a hypothesized population mean. Example: measuring whether cans labelled 330 ml actually contain that volume on average.
Two-sample t-test: Compares means across two independent groups. Use this when you have data from separate samples, such as comparing weight loss between a treatment group and a control group. You can choose between the pooled-variance approach (assuming equal variances) and Welch's method (allowing unequal variances).
Paired t-test: Compares measurements from the same subjects measured twice, such as before and after an intervention. The test works on the differences between paired observations, which removes the variation between subjects from the comparison.
Your choice depends entirely on your study design: one group (one-sample), two independent groups (two-sample), or one group measured twice (paired).
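To illustrate why the study design matters, the sketch below analyses the same numbers twice: once as paired differences and once, incorrectly, as two independent groups. The weights are made-up values for five hypothetical subjects, and the variable names are assumptions for illustration only.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical body weights (kg) for five subjects, before and after a programme.
before = [80.0, 95.0, 70.0, 110.0, 88.0]
after = [78.0, 92.0, 69.0, 106.0, 85.0]

# Paired analysis: work on the per-subject differences.
diffs = [b - a for b, a in zip(before, after)]
t_paired = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Same numbers treated as two independent groups (unequal-variance form).
se_indep = math.sqrt(variance(before) / len(before) + variance(after) / len(after))
t_indep = (mean(before) - mean(after)) / se_indep

print(f"paired t = {t_paired:.2f}, independent-groups t = {t_indep:.2f}")
```

Because subjects differ far more from each other than each subject changes over time, the independent-groups analysis buries a consistent within-subject change under between-subject spread.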
T-Test Formulas
The t-score standardizes the difference between your observed sample statistic and the null hypothesis value, scaled by the standard error. Below are the four main formulas:
One-sample t-score:
t = (x̄ − μ₀) / (s / √n)
Two-sample t-score (equal variances):
t = (x̄₁ − x̄₂ − Δ) / √[s²ₚ(1/n₁ + 1/n₂)]
where s²ₚ = [(n₁−1)s₁² + (n₂−1)s₂²] / (n₁ + n₂ − 2)
Welch's t-score (unequal variances):
t = (x̄₁ − x̄₂ − Δ) / √(s₁²/n₁ + s₂²/n₂)
Paired t-score:
t = (d̄ − Δ) / (s_d / √n)
where:
- x̄, x̄₁, x̄₂: sample mean(s)
- μ₀, Δ: hypothesized population mean, or hypothesized mean difference, under the null hypothesis
- s, s₁, s₂: sample standard deviation(s)
- s_d: standard deviation of the paired differences
- n, n₁, n₂: sample size(s); for the paired test, n is the number of pairs
- s²ₚ: pooled variance under the equal-variance assumption
- d̄: mean of the paired differences
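The formulas above translate almost line for line into code. The following is a minimal sketch using only the Python standard library; the function names and all sample data are hypothetical, chosen for illustration.

```python
import math
from statistics import mean, stdev, variance


def one_sample_t(x, mu0):
    """t = (mean - mu0) / (s / sqrt(n))."""
    return (mean(x) - mu0) / (stdev(x) / math.sqrt(len(x)))


def pooled_t(x1, x2, delta=0.0):
    """Two-sample t-score using the pooled variance (equal variances)."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return (mean(x1) - mean(x2) - delta) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def welch_t(x1, x2, delta=0.0):
    """Two-sample t-score without the equal-variance assumption."""
    se = math.sqrt(variance(x1) / len(x1) + variance(x2) / len(x2))
    return (mean(x1) - mean(x2) - delta) / se


def paired_t(x1, x2, delta=0.0):
    """t-score computed on the paired differences x1[i] - x2[i]."""
    d = [a - b for a, b in zip(x1, x2)]
    return (mean(d) - delta) / (stdev(d) / math.sqrt(len(d)))


# Hypothetical fill volumes (ml) for cans labelled 330 ml.
cans = [328.0, 331.0, 329.0, 330.0, 327.0]
# Hypothetical weight loss (kg) in two independent groups.
treatment = [5.0, 6.0, 7.0, 8.0, 9.0]
control = [4.0, 5.0, 6.0]
# Hypothetical scores for four subjects measured twice.
first = [12.0, 15.0, 11.0, 14.0]
second = [10.0, 14.0, 10.0, 11.0]

print("one-sample:", one_sample_t(cans, 330.0))
print("pooled:    ", pooled_t(treatment, control))
print("Welch:     ", welch_t(treatment, control))
print("paired:    ", paired_t(first, second))
```

Note that statistics.stdev and statistics.variance use the n − 1 denominator, which is what the formulas require. When n₁ = n₂, the pooled and Welch t-scores coincide; they differ here because the groups have different sizes and variances.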
P-Values and Critical Regions
After computing your t-score, you obtain a p-value: the probability of observing a test statistic as extreme or more extreme than your result, assuming the null hypothesis is true. A p-value below your chosen significance level (α, typically 0.05) leads to rejection of the null hypothesis.
Alternatively, the critical-region approach defines rejection boundaries based on α and degrees of freedom. Your t-score falls either inside the critical region (reject the null) or outside (fail to reject).
One-tailed vs. two-tailed: Use two-tailed when testing whether means differ in either direction. Use one-tailed (left or right) when your hypothesis specifies a direction—for example, testing whether a new process is faster than the old one, not just different.
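As a rough illustration of how a t-score and its degrees of freedom become a p-value, the sketch below integrates the t density numerically with Simpson's rule. This is a stand-in chosen to keep the example dependency-free, not how statistical libraries compute it; the function names and the tail labels ("two-sided", "left", "right") are assumptions for illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|] plus symmetry about zero.

    steps must be even for Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3  # area between 0 and |t|
    return 0.5 + half if t >= 0 else 0.5 - half


def p_value(t, df, tail="two-sided"):
    """p-value for a t-score under the chosen alternative hypothesis."""
    if tail == "two-sided":
        return 2 * (1 - t_cdf(abs(t), df))
    if tail == "right":
        return 1 - t_cdf(t, df)
    if tail == "left":
        return t_cdf(t, df)
    raise ValueError("tail must be 'two-sided', 'left', or 'right'")


alpha = 0.05
p = p_value(2.5, df=10)
print(f"p = {p:.4f};", "reject H0" if p < alpha else "fail to reject H0")
```

For a positive t-score, the right-tailed p-value is half the two-tailed one, which is why the direction of the hypothesis should be fixed before looking at the data.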
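Degrees of freedom (df) determine the shape of the t-distribution: df = n − 1 for one-sample and paired tests (where n is the number of observations or pairs), df = n₁ + n₂ − 2 for the two-sample test with equal variances, and Welch's method uses the Welch–Satterthwaite approximation, which depends on both sample variances and generally yields a non-integer df.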
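The more complex formula used with Welch's method is commonly given as the Welch–Satterthwaite approximation. A minimal sketch follows, assuming that form; the function name welch_df and the sample data are hypothetical.

```python
from statistics import variance


def welch_df(x1, x2):
    """Welch-Satterthwaite approximate degrees of freedom.

    df = (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1)),
    where v_i = s_i^2 / n_i.
    """
    n1, n2 = len(x1), len(x2)
    v1 = variance(x1) / n1
    v2 = variance(x2) / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Hypothetical data: two independent groups of different size and spread.
treatment = [5.0, 6.0, 7.0, 8.0, 9.0]
control = [4.0, 5.0, 6.0]

df_welch = welch_df(treatment, control)
df_pooled = len(treatment) + len(control) - 2
print(f"Welch df = {df_welch:.3f}, pooled df = {df_pooled}")
```

The Welch df is usually not an integer and never exceeds the pooled value n₁ + n₂ − 2.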
Common Pitfalls and Practical Advice
Avoid these mistakes when interpreting t-test results:
- Confusing Statistical and Practical Significance: A tiny p-value does not guarantee a meaningful real-world difference. With large samples, even trivial effects can become statistically significant. Always examine the actual difference in means and its confidence interval, not just the p-value.
- Violating Normality Without Justification: The t-test assumes approximate normality, which matters most with small samples (n < 15). With large samples, minor deviations are tolerable due to the central limit theorem. Always check a histogram or Q-Q plot; if data are severely skewed or contain outliers, consider a non-parametric alternative such as the Mann–Whitney U test for two independent groups or the Wilcoxon signed-rank test for paired or one-sample data.
- Forgetting to Check Variance Homogeneity: For two-sample tests, verify that group variances are similar before using the pooled (equal-variance) formula. Levene's test or a comparison of the sample standard deviations can guide your choice. If variances differ substantially, use Welch's t-test instead.
- Performing Multiple Tests Without Correction: Running many t-tests on the same dataset inflates the false positive rate. If comparing three or more groups, use ANOVA. If conducting multiple pairwise comparisons, apply a correction (e.g., Bonferroni) or pre-specify your comparisons.
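The inflation from multiple testing is easy to quantify. The sketch below assumes the tests are independent when computing the chance of at least one false positive, and applies the Bonferroni correction by comparing each p-value with α divided by the number of tests; the function names and p-values are hypothetical.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0 for each test only if p < alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


for m in (1, 3, 10):
    print(f"{m:>2} tests at alpha=0.05: FWER = {familywise_error_rate(0.05, m):.3f}")

# Hypothetical p-values from three pairwise comparisons.
p_values = [0.001, 0.020, 0.040]
print(bonferroni_reject(p_values))
```

All three hypothetical p-values are below 0.05, yet only the first survives the corrected threshold of 0.05 / 3.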