Understanding the Chi-Square Test

The chi-square test quantifies the discrepancy between what you observe in your data and what theory predicts. It applies to categorical variables—grades, color preferences, survey responses, or defect types—where you have frequency counts rather than continuous measurements.

The test works by comparing each observed count to its expected count under the null hypothesis. Large differences signal that your data may not conform to the proposed distribution. The chi-square value itself is dimensionless and always non-negative, since it involves squared deviations.

Common applications include:

  • Testing whether a die is fair (equal probability for each face)
  • Checking whether product defects are distributed as expected across production batches
  • Assessing whether survey responses match demographic expectations
  • Checking offspring counts against expected genetic inheritance ratios in controlled breeding

The Chi-Square Formula

For each category, calculate the squared difference between observed and expected counts, then divide by the expected count. The total chi-square statistic sums these individual contributions.

χ² = Σ [(O − E)² ÷ E]

Each category contributes:

(O − E)² ÷ E

  • O — Observed frequency (actual count in your data)
  • E — Expected frequency (count predicted by the hypothesis)
  • Σ — Sum across all categories in your distribution
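As a rough illustration, the formula translates directly into a few lines of Python. This is a minimal sketch: the function name `chi_square_statistic` and the die counts below are hypothetical, chosen only to show the arithmetic.

```python
from typing import Sequence


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Return the sum of (O - E)^2 / E across all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected counts must be positive")
        # One category's contribution: squared deviation scaled by its expected count
        total += (o - e) ** 2 / e
    return total


# Hypothetical example: 60 rolls of a die; a fair die predicts 10 per face
rolls = [8, 12, 9, 11, 10, 10]
print(chi_square_statistic(rolls, [10] * 6))  # approximately 1.0
```

Raising an error for a zero expected count mirrors the formula itself: a category the hypothesis says cannot occur makes (O − E)² ÷ E undefined.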

Interpreting Results and Degrees of Freedom

After summing the chi-square components across all categories, compare the total to a chi-square distribution table at the appropriate degrees of freedom (df). For a goodness-of-fit test with fully specified expected proportions, df equals the number of categories minus one.

For example, if you have four grade levels, df = 3. If you have six color categories, df = 5.

The chi-square table gives a critical value for your chosen significance level (typically α = 0.05). If your calculated chi-square exceeds the critical value, reject the null hypothesis: your data differ significantly from the expected distribution. If it does not, you fail to reject the null hypothesis; the data are consistent with the expected distribution, although this does not prove the hypothesis is true.
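The decision rule amounts to a single comparison. The sketch below hard-codes the standard upper 5% critical values for df = 1 to 10; the dictionary and function names are illustrative, and a real analysis would normally take the value from a statistics library or a full table.

```python
# Upper-tail critical values of the chi-square distribution at alpha = 0.05
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
               6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


def reject_null(chi2: float, df: int) -> bool:
    """True if chi2 exceeds the alpha = 0.05 critical value for this df."""
    if df not in CRITICAL_05:
        raise ValueError("this small table only covers df = 1 to 10")
    return chi2 > CRITICAL_05[df]


print(reject_null(5.944, 3))  # False, since 5.944 < 7.815
print(reject_null(5.944, 1))  # True, since 5.944 > 3.841
```

The same statistic can lead to opposite conclusions depending on df, which is why looking up the critical value in the correct row matters.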

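Note that chi-square is sensitive to sample size. For fixed observed proportions, the statistic grows in direct proportion to the total count, so very large samples can produce significant results for trivially small deviations, while small samples may fail to detect real differences.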
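To see the sample-size effect concretely, the sketch below keeps the observed proportions fixed and multiplies every count by 10. It reuses the counts and hypothesized shares from the grading example in this article; the helper name `scaled_statistic` is hypothetical. Because each term (O − E)² ÷ E scales linearly with the counts, the statistic grows tenfold even though the relative deviations are identical.

```python
PROPORTIONS = [0.15, 0.30, 0.40, 0.15]   # hypothesized shares for grades 2, 3, 4, 5
OBSERVED_60 = [7, 26, 22, 5]             # observed counts for the same grades


def scaled_statistic(factor: int) -> float:
    """Chi-square for the grading data with every count multiplied by factor."""
    observed = [factor * o for o in OBSERVED_60]
    n = sum(observed)
    expected = [p * n for p in PROPORTIONS]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


for factor in (1, 10):
    print(60 * factor, round(scaled_statistic(factor), 3))
# 60 5.944
# 600 59.444
```

With df = 3, the first value is below the 5% critical value of 7.815 and the second is far above it, although the pattern of deviations is the same.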

Common Pitfalls in Chi-Square Testing

Avoid these mistakes when performing or interpreting chi-square tests:

  1. Low expected frequencies: If any expected count falls below 5, the chi-square approximation becomes unreliable. Combine adjacent categories where that makes sense, or use an exact test for small samples (an exact multinomial test for goodness of fit, or Fisher's exact test for 2 × 2 contingency tables). The rule exists because the chi-square distribution is only a large-sample approximation to the statistic's true sampling distribution.
  2. Using percentages instead of counts: Compare actual counts (observed) against theoretical or hypothesized counts (expected), never percentages. If the hypothesis specifies a 40% share, the expected frequency is 40% of your total sample size.
  3. Forgetting to sum across all categories: The final chi-square statistic is the total of all individual category calculations, not just the largest one. Missing categories or calculating only a subset will give incorrect results.
  4. Misapplying degrees of freedom: For a goodness-of-fit test, df = (number of categories − 1), but you must subtract one more degree of freedom for each parameter estimated from the same data (for example, the mean of a fitted Poisson distribution). Ignoring this adjustment typically inflates the p-value and makes the test too conservative.
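Pitfalls 1 and 4 are easy to check mechanically. This is a minimal sketch with hypothetical helper names: it flags categories whose expected count is under 5 and computes df as categories minus one minus the number of parameters estimated from the data.

```python
def low_expected_categories(expected, threshold=5.0):
    """Indices of categories whose expected count falls below threshold."""
    return [i for i, e in enumerate(expected) if e < threshold]


def goodness_of_fit_df(num_categories, estimated_params=0):
    """df = categories - 1 - parameters estimated from the same data."""
    df = num_categories - 1 - estimated_params
    if df < 1:
        raise ValueError("too few categories for the number of estimated parameters")
    return df


print(low_expected_categories([9, 18, 24, 9]))  # []: every expected count is at least 5
print(low_expected_categories([2.5, 12, 4.9]))  # [0, 2]
print(goodness_of_fit_df(4))     # 3: proportions fixed in advance
print(goodness_of_fit_df(6, 1))  # 4: e.g. a Poisson mean estimated from the sample
print(goodness_of_fit_df(8, 2))  # 5: e.g. a normal mean and SD estimated from the sample
```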

Practical Example: Grading Distribution

Suppose you expected a class of 60 students to earn grades as follows: 15% grade 5, 40% grade 4, 30% grade 3, and 15% grade 2. Your actual results were: 7 students grade 2, 26 grade 3, 22 grade 4, and 5 grade 5.

First, calculate expected counts:

  • Grade 2: 0.15 × 60 = 9 students
  • Grade 3: 0.30 × 60 = 18 students
  • Grade 4: 0.40 × 60 = 24 students
  • Grade 5: 0.15 × 60 = 9 students
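Converting hypothesized shares into expected counts takes one multiplication per category. The sketch below uses a hypothetical function name and also checks that the shares sum to 1. This catches typos in the hypothesis before they distort the statistic.

```python
def expected_counts(proportions, n):
    """Expected frequency for each category: share * total sample size."""
    total = sum(proportions)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"proportions sum to {total}, not 1")
    return [p * n for p in proportions]


grades = [2, 3, 4, 5]
shares = [0.15, 0.30, 0.40, 0.15]
for grade, e in zip(grades, expected_counts(shares, 60)):
    print(f"Grade {grade}: {e:g} students")
```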

Then compute chi-square for each grade:

  • Grade 2: (7 − 9)² ÷ 9 = 0.444
  • Grade 3: (26 − 18)² ÷ 18 = 3.556
  • Grade 4: (22 − 24)² ÷ 24 = 0.167
  • Grade 5: (5 − 9)² ÷ 9 = 1.778

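Total χ² = 0.444 + 3.556 + 0.167 + 1.778 ≈ 5.944 (exactly 107/18; adding the rounded terms gives 5.945). With df = 4 − 1 = 3, the critical value at α = 0.05 is 7.815. Because 5.944 < 7.815 (p ≈ 0.11), you fail to reject the null hypothesis: at the 5% level, the observed grades do not differ significantly from the intended distribution.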
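The whole worked example can be reproduced with exact fractions, which avoids the rounding drift between 5.944 and 5.945. This is a sketch. The critical value 7.815 (df = 3, α = 0.05) is a standard table value entered by hand.

```python
from fractions import Fraction

# Observed counts and hypothesized shares from the example, keyed by grade
observed = {2: 7, 3: 26, 4: 22, 5: 5}
shares = {2: Fraction(15, 100), 3: Fraction(30, 100),
          4: Fraction(40, 100), 5: Fraction(15, 100)}
n = sum(observed.values())  # 60 students

contributions = {}
for grade, o in observed.items():
    e = shares[grade] * n                    # expected count, kept exact
    contributions[grade] = (o - e) ** 2 / e  # (O - E)^2 / E for this grade

chi2 = sum(contributions.values())
df = len(observed) - 1
critical = 7.815  # alpha = 0.05, df = 3

print(chi2, round(float(chi2), 3))  # 107/18 5.944
print("reject H0" if chi2 > critical else "fail to reject H0")  # fail to reject H0
```

Grade 3 contributes 32/9 ≈ 3.556 of the total. This makes it the category that drives most of the discrepancy.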

Frequently Asked Questions

What is the minimum expected frequency for a valid chi-square test?

The standard rule is that no expected frequency should be less than 5. Below that, the chi-square distribution becomes a poor approximation to the statistic's actual sampling distribution, so p-values become unreliable. For very small samples, combine categories with similar characteristics or use an exact test. A widely cited relaxation, usually attributed to Cochran, allows up to 20% of expected counts to fall below 5 as long as none is below 1. Even so, use it with caution.
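The sketch below checks both the strict rule and one commonly cited relaxed version: every expected count at least 1 and no more than 20% of them below 5. The function names are hypothetical, and the thresholds are conventions rather than hard limits.

```python
def meets_strict_rule(expected):
    """Every expected count is at least 5."""
    return all(e >= 5 for e in expected)


def meets_relaxed_rule(expected):
    """No expected count below 1, and at most 20% of counts below 5."""
    if any(e < 1 for e in expected):
        return False
    small = sum(1 for e in expected if e < 5)
    return small <= 0.2 * len(expected)


print(meets_strict_rule([9, 18, 24, 9]))        # True
print(meets_strict_rule([4, 10, 10, 10, 10]))   # False
print(meets_relaxed_rule([4, 10, 10, 10, 10]))  # True: 1 of 5 cells (20%) below 5
print(meets_relaxed_rule([0.8, 10, 10]))        # False: one cell below 1
```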

Can chi-square values be negative?

No. Chi-square values are always zero or positive because each term has a squared deviation in the numerator and a positive expected count in the denominator. A chi-square of 0 means perfect agreement between observed and expected frequencies, and the value increases as observed counts move further from expectations. Because deviations are squared, the statistic reflects only the magnitude of each departure, not its direction.
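A tiny sketch makes the direction-blindness visible. The two-category counts are hypothetical. Shifting two observations from the first category to the second, or the reverse, gives the same statistic.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E; every term is >= 0 when expected counts are positive."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = [10, 10]
print(chi_square_statistic([12, 8], expected))   # 0.8
print(chi_square_statistic([8, 12], expected))   # 0.8: same size, opposite direction
print(chi_square_statistic([10, 10], expected))  # 0.0: perfect agreement
```

To find out which way a significant result points, inspect the signed differences O − E category by category.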

How do degrees of freedom affect the chi-square critical value?

As degrees of freedom increase, the chi-square distribution shifts to the right (its mean equals df), so the critical value for a fixed significance level such as 0.05 grows. A test with more categories has higher df and therefore requires a larger chi-square to reject the null hypothesis. For df = 1, the critical value at α = 0.05 is 3.841; for df = 10, it is 18.307. The adjustment is needed because every additional category adds another non-negative term to the sum, so the statistic tends to be larger under the null hypothesis even when nothing unusual is happening.
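The growth of the critical value with df can be reproduced numerically. The sketch below evaluates the chi-square CDF through the series for the regularized lower incomplete gamma function and finds the 95th percentile by bisection. The function names are hypothetical. In practice, a library routine such as SciPy's `scipy.stats.chi2.ppf` would do the same job.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """P(X <= x) for a chi-square variable, via the incomplete gamma series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a          # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_critical(df: int, alpha: float = 0.05) -> float:
    """Value c with P(X > c) = alpha, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1 - alpha:   # expand until the root is bracketed
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for df in (1, 3, 5, 10):
    print(df, round(chi2_critical(df), 3))
```

The series form is adequate for the moderate df and statistic values found in a typical table. For very large arguments, a continued-fraction evaluation is the usual choice.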

What sample size do I need for a chi-square goodness-of-fit test?

There's no single minimum, but your total sample size must be large enough for every expected frequency to meet the ≥5 rule. The smallest expected count equals the sample size times the smallest hypothesized proportion, so you need at least 5 ÷ (smallest proportion) observations. In the grading example, 5 ÷ 0.15 ≈ 33.3, so at least 34 students are needed. Samples of 50 or more are often sufficient for a handful of roughly balanced categories, but many categories or very small expected proportions call for a larger sample. For pilot studies or exploratory analyses, smaller samples may be acceptable if you acknowledge the limitations.
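The bound n ≥ 5 ÷ (smallest proportion) is simple to compute. This is a minimal sketch with a hypothetical function name. It only addresses the expected-count rule, not statistical power.

```python
import math


def minimum_sample_size(proportions, min_expected=5.0):
    """Smallest n for which every expected count p * n reaches min_expected."""
    # Subtracting a tiny tolerance keeps float noise (e.g. 20.000000000000004)
    # from pushing ceil() one step too high.
    return math.ceil(min_expected / min(proportions) - 1e-9)


print(minimum_sample_size([0.15, 0.30, 0.40, 0.15]))  # 34, since 5 / 0.15 is about 33.3
print(minimum_sample_size([0.25] * 4))                # 20
```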

Is the chi-square goodness-of-fit test the same as the chi-square test of independence?

No, they are related but distinct. A goodness-of-fit test evaluates whether one categorical variable matches a hypothesized distribution. A chi-square independence test (or contingency table test) compares two categorical variables to see if they are associated. Both use the same formula and chi-square distribution, but the degrees of freedom and research questions differ. Independence tests use df = (rows − 1) × (columns − 1).
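In the independence test, each cell's expected count under the null hypothesis is (row total × column total) ÷ grand total. The same (O − E)² ÷ E terms are then summed over every cell. The sketch below uses a hypothetical function name and an illustrative table, and it applies no continuity correction. Some software, such as SciPy's `chi2_contingency`, applies Yates's correction to 2 × 2 tables by default.

```python
def independence_test(table):
    """Return (chi2, df) for an r x c table of counts, without continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2 x 3 table: rows are two groups, columns are three response options
table = [[20, 15, 25],
         [30, 25, 15]]
chi2, df = independence_test(table)
print(round(chi2, 3), df)
```

A table whose rows are exact multiples of each other, such as [[10, 20], [20, 40]], gives χ² = 0, because the two variables are perfectly independent in the sample.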

Can I use chi-square with continuous data?

Not directly. Chi-square requires categorical or grouped data—frequency counts in discrete categories. If you have continuous measurements (heights, times, weights), you must first bin them into intervals, creating categories. However, this binning introduces arbitrary choices that can affect results. For continuous data, consider Kolmogorov-Smirnov or Anderson-Darling tests instead, which don't require binning.
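The following is a minimal sketch of the binning step, under loudly hypothetical inputs. It uses twenty made-up height measurements, bin edges at 165 and 175 cm, and a normal distribution with mean 170 and standard deviation 10 specified in advance. If the mean and standard deviation were estimated from the same data, two more degrees of freedom would be lost. Moving the edges would change the counts, which is exactly the arbitrariness described above.

```python
from bisect import bisect_right
from statistics import NormalDist


def bin_counts(values, edges):
    """Counts in (-inf, e0), [e0, e1), ..., [e_last, inf)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts


def normal_bin_probabilities(edges, mu, sigma):
    """Probability mass the hypothesized normal distribution puts in each bin."""
    dist = NormalDist(mu, sigma)
    cdf = [0.0] + [dist.cdf(e) for e in edges] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


# Hypothetical measurements (cm) and bin edges, for illustration only
heights = [158, 161, 162, 163, 164, 166, 167, 168, 169, 170,
           171, 172, 173, 174, 176, 177, 179, 181, 183, 185]
edges = [165, 175]

observed = bin_counts(heights, edges)
expected = [p * len(heights) for p in normal_bin_probabilities(edges, 170, 10)]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(observed, [round(e, 2) for e in expected], round(chi2, 3))
```

The edges here were chosen so that every expected count stays above 5. With only 20 observations, finer bins would violate the minimum-frequency rule.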
