What Are Descriptive Statistics?

Descriptive statistics are numerical summaries that characterize a dataset's essential features. Rather than listing every observation, they compress information into a few key measures that reveal what the data represents.

Descriptive statistics typically address three dimensions:

  • Central tendency — the dataset's typical or central value (mean, median, mode)
  • Dispersion — how widely values spread around the centre (variance, standard deviation, range)
  • Shape — the asymmetry or concentration of values (skewness, kurtosis)

Unlike inferential statistics, which test hypotheses about populations using sample data, descriptive statistics simply present what the data shows. They form the foundation for all subsequent statistical analysis.

Core Descriptive Statistics Formulas

The most frequently computed measures rely on straightforward arithmetic. Here are the essential formulas:

Mean (μ or x̄):
μ = (x₁ + x₂ + ... + xₙ) ÷ n

Variance (σ² or s²):
σ² = Σ(xᵢ − μ)² ÷ n   (population)
s² = Σ(xᵢ − x̄)² ÷ (n − 1)   (sample)

Standard Deviation (σ or s):
σ = √(σ²)   (population)
s = √(s²)   (sample)

Range:
Range = max(x) − min(x)

  • n — Number of observations in the dataset
  • xᵢ — Individual data point
  • μ — Population mean
  • x̄ — Sample mean
  • σ² — Population variance
  • s² — Sample variance
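
As a minimal sketch, the formulas above map onto Python's standard statistics module as follows. The data values are made up for illustration; pvariance and pstdev divide by n (population), while variance and stdev divide by n − 1 (sample).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not from any real dataset

mean = statistics.mean(data)                 # (x1 + x2 + ... + xn) / n
pop_variance = statistics.pvariance(data)    # sum of squared deviations / n
sample_variance = statistics.variance(data)  # sum of squared deviations / (n - 1)
pop_sd = statistics.pstdev(data)             # square root of population variance
sample_sd = statistics.stdev(data)           # square root of sample variance
value_range = max(data) - min(data)          # max(x) - min(x)

print("mean:", mean)
print("population variance:", pop_variance)
print("sample variance:", round(sample_variance, 3))
print("population SD:", pop_sd)
print("sample SD:", round(sample_sd, 3))
print("range:", value_range)
```

For the same data, the sample figures always come out slightly larger than the population figures because the divisor is smaller.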

Understanding the Five-Number Summary

The five-number summary provides a snapshot of data distribution using five percentiles:

  • Minimum — the smallest value
  • First quartile (Q₁) — the 25th percentile, below which 25% of data falls
  • Median (Q₂) — the 50th percentile, the middle value when sorted
  • Third quartile (Q₃) — the 75th percentile, below which 75% of data falls
  • Maximum — the largest value

The median and quartiles resist distortion from outliers and clearly show where data clusters, although the minimum and maximum are by definition the most extreme observations. The interquartile range (IQR = Q₃ − Q₁) indicates the spread of the middle 50% of observations.
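
The five values can be computed in a few lines. This is a sketch under stated assumptions: five_number_summary is a hypothetical helper name, the data is invented, and the quartiles use the "inclusive" interpolation method of statistics.quantiles. Other tools use different quartile conventions and may report slightly different Q₁ and Q₃ for the same data.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least two values."""
    ordered = sorted(values)
    # n=4 splits the data into four equal-probability groups, giving three cut points.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # illustrative values
minimum, q1, median, q3, maximum = five_number_summary(data)
iqr = q3 - q1  # spread of the middle 50% of observations

print("min:", minimum, "Q1:", q1, "median:", median, "Q3:", q3, "max:", maximum)
print("IQR:", iqr)
```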

Descriptive Statistics in Practice

Descriptive statistics appear constantly in real-world applications:

  • Education: Grade point averages summarise academic performance across multiple courses
  • Manufacturing: Standard deviation of component measurements identifies production consistency
  • Climate science: Mean annual rainfall and temperature variance reveal regional weather patterns
  • Finance: Portfolio volatility (standard deviation of returns) quantifies investment risk
  • Healthcare: Average patient recovery time and outcome variance guide treatment protocols

These measures enable stakeholders to communicate findings without requiring detailed examination of raw data.

Common Pitfalls When Interpreting Descriptive Statistics

Avoid these frequent mistakes when summarising and comparing datasets:

  1. Assuming mean equals median — The mean is pulled toward outliers while the median stays central. In skewed distributions (like household income), these differ substantially. Always examine both—if they diverge significantly, your data likely contains extreme values that deserve investigation.
  2. Ignoring sample versus population distinction — Sample statistics use n − 1 in variance calculations (Bessel's correction) because deviations are measured from the sample mean, which sits closer to the sample's own values than the true population mean does. Applying the population formula to a sample therefore underestimates true dispersion on average.
  3. Overlooking the units in standard deviation — Standard deviation shares the original variable's units and scale. A standard deviation of £2,000 in annual salary data is not directly comparable to £20 in daily expenses, because the two variables sit on very different scales. Always contextualise dispersion within the measurement scale.
  4. Trusting summaries without checking distribution shape — Two datasets can share identical mean and standard deviation yet have completely different shapes. One might be symmetric while the other is bimodal. Plot your data visually before relying solely on numerical summaries.
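
The last pitfall is easy to demonstrate. In the sketch below, two invented datasets share the same mean and the same population standard deviation, yet a crude text histogram shows that one has a single peak and the other has two. The dataset names and values are assumptions chosen for illustration.

```python
import statistics
from collections import Counter

unimodal = [2, 4, 4, 4, 5, 5, 7, 9]  # one peak near the centre
bimodal = [3, 3, 3, 3, 7, 7, 7, 7]   # two separate clusters

for name, values in (("unimodal", unimodal), ("bimodal", bimodal)):
    print(name, "mean:", statistics.mean(values), "SD:", statistics.pstdev(values))
    counts = Counter(values)
    # One row per integer value; the bar length is the number of occurrences.
    for value in range(min(values), max(values) + 1):
        print(f"  {value:>2} | {'#' * counts[value]}")
```

Both datasets report a mean of 5 and a standard deviation of 2, so the numerical summary alone cannot tell them apart.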

Frequently Asked Questions

How do sample and population statistics differ?

A population includes every individual or observation in your group of interest, while a sample is a subset you've actually measured. Sample statistics carry additional uncertainty because they estimate population parameters. The key difference appears in variance calculations: sample variance divides by (n − 1) instead of n. This adjustment, known as Bessel's correction, compensates for the fact that deviations measured from the sample mean understate, on average, the variation in the population. Use sample calculations unless you possess complete population data, which is rare in practice.
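
The effect of the (n − 1) divisor can be checked exactly on a tiny example. The sketch below assumes a hypothetical population of six values and enumerates every possible sample of size 3 drawn with replacement, then averages the two variance formulas across all of them.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [1, 2, 3, 4, 5, 6]        # hypothetical complete population
true_variance = pvariance(population)  # 35/12, about 2.917

# Every ordered sample of size 3 drawn with replacement: 6**3 = 216 samples.
samples = list(product(population, repeat=3))

# Average of each formula across all possible samples.
avg_divide_by_n = mean(pvariance(s) for s in samples)
avg_divide_by_n_minus_1 = mean(variance(s) for s in samples)

print("true population variance:", round(true_variance, 3))
print("average when dividing by n:", round(avg_divide_by_n, 3))
print("average when dividing by n - 1:", round(avg_divide_by_n_minus_1, 3))
```

Dividing by n averages out to two thirds of the true variance here, while dividing by (n − 1) recovers it. Sampling with replacement is an assumption that keeps the arithmetic exact; in practice, sampling without replacement from a large population behaves almost identically.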

When should I use mean versus median?

Use the mean when data are roughly symmetric and you need a centre point sensitive to every value. Use the median when outliers exist or data are skewed, since it represents the actual middle position. For income datasets, the median is typically more informative than the mean because a few high earners inflate the mean while barely moving the median. Reporting both provides fuller insight into central tendency.
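
A small illustration of this, using invented income figures: adding a single very high earner shifts the mean dramatically while the median barely moves.

```python
import statistics

incomes = [30_000, 32_000, 35_000, 38_000, 40_000]  # hypothetical incomes
with_outlier = incomes + [1_000_000]                 # add one extreme earner

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("without outlier: mean", mean_before, "median", median_before)
print("with outlier:    mean", round(mean_after, 2), "median", median_after)
```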

What does standard deviation actually measure?

Standard deviation quantifies how far individual observations typically deviate from the mean (strictly, it is the root-mean-square deviation rather than a simple average). It's the square root of variance and shares the original measurement units, making it intuitive to interpret. A standard deviation of 5 centimetres in height data means typical variation is around 5 centimetres from the mean. In a normal distribution, approximately 68% of values fall within one standard deviation of the mean, 95% within two, and 99.7% within three.
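
The 68%, 95% and 99.7% figures can be checked against the normal distribution's cumulative distribution function. This sketch uses NormalDist from Python's standard statistics module (available from Python 3.8) and applies only to data that are actually normally distributed.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of values lying within k standard deviations of the mean.
shares = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in shares.items():
    print(f"within {k} SD: {share:.4f}")
```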

Why does the calculator distinguish between population and sample?

The distinction affects variance and standard deviation calculations. When deviations are measured from the sample mean rather than the true population mean, dividing by n underestimates population variance on average. Dividing by (n − 1) rather than n corrects this bias. If you've sampled data from a larger group, select 'sample' to get an unbiased variance estimate. Only choose 'population' if you possess measurements for every single member of your group of interest.

Can descriptive statistics reveal cause and effect?

No. Descriptive statistics summarise patterns in data but cannot establish causation. They show what happened, not why. If you observe that students with higher study hours have higher exam scores, descriptive statistics document this association. However, they cannot prove study hours caused the improvement: confounding factors like prior knowledge or motivation might explain both. Use inferential statistics and designed experiments to test causal claims.

How many data points do I need for reliable descriptive statistics?

Larger samples generally produce more stable estimates, but even small samples yield valid descriptive statistics. A sample of 10 observations produces a legitimate mean and variance. However, sample statistics become less reliable as estimates of population values as the sample gets smaller. With 5 observations, you're describing those 5 values accurately, but any inference to a larger population becomes questionable. A common rule of thumb is to aim for at least 30 observations if you plan to make population estimates.
