What Are Descriptive Statistics?
Descriptive statistics are numerical summaries that characterize a dataset's essential features. Rather than listing every observation, they compress the information into a few key measures that show where the values sit and how they are distributed.
Descriptive statistics typically address three dimensions:
- Central tendency — the dataset's typical or central value (mean, median, mode)
- Dispersion — how widely values spread around the centre (variance, standard deviation, range)
- Shape — the asymmetry of the distribution and the heaviness of its tails (skewness, kurtosis)
Unlike inferential statistics, which use sample data to draw conclusions about a wider population, descriptive statistics only summarise the data at hand. They are normally the first step in any statistical analysis.
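Skewness and kurtosis have several competing definitions. As a minimal sketch, assuming the plain moment-based versions with no small-sample correction (some statistical packages apply one, so their numbers can differ slightly), both can be computed from central moments: skewness is m₃ ÷ m₂^1.5 and excess kurtosis is m₄ ÷ m₂² − 3, where mₖ is the average of (xᵢ − μ)ᵏ. The function names and the two datasets are illustrative, not taken from any library.

```python
def central_moment(xs, k):
    """k-th central moment: average of (x - mean)**k over all observations."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** k for x in xs) / len(xs)


def skewness(xs):
    """Moment coefficient of skewness, m3 / m2**1.5 (no small-sample correction)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment-based kurtosis m4 / m2**2, minus 3 so a normal distribution scores 0."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]  # one long tail to the right

print(skewness(symmetric))         # 0.0: the two sides mirror each other
print(skewness(right_skewed))      # positive: the tail points right
print(excess_kurtosis(symmetric))  # about -1.3: lighter tails than a normal distribution
```

A positive skewness means the long tail lies to the right of the centre; a negative excess kurtosis means the tails are lighter than those of a normal distribution.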
Core Descriptive Statistics Formulas
The most frequently computed measures rely on straightforward arithmetic. Here are the essential formulas:
Mean (μ or x̄):
μ = (x₁ + x₂ + ... + xₙ) ÷ n
Variance (σ² or s²):
σ² = Σ(xᵢ − μ)² ÷ n (population)
s² = Σ(xᵢ − x̄)² ÷ (n − 1) (sample)
Standard Deviation (σ or s):
σ = √(σ²) (population)
s = √(s²) (sample)
Range:
Range = max(x) − min(x)
Where:
- n: number of observations in the dataset
- xᵢ: individual data point
- μ: population mean
- x̄: sample mean
- σ²: population variance
- s²: sample variance
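The formulas above translate almost line for line into code. The following is a minimal Python sketch; the helper names `mean`, `variance`, `std_dev` and `data_range` and the six-value dataset are illustrative assumptions. It cross-checks the results against the standard library's `statistics.pvariance` and `statistics.variance`, which implement the population and sample formulas respectively.

```python
import math
import statistics


def mean(xs):
    """Arithmetic mean: sum of observations divided by n."""
    return sum(xs) / len(xs)


def variance(xs, sample=False):
    """Population variance (divide by n) or sample variance (divide by n - 1)."""
    m = mean(xs)
    squared_deviations = sum((x - m) ** 2 for x in xs)
    denominator = len(xs) - 1 if sample else len(xs)
    return squared_deviations / denominator


def std_dev(xs, sample=False):
    """Standard deviation: square root of the corresponding variance."""
    return math.sqrt(variance(xs, sample))


def data_range(xs):
    """Range: largest observation minus smallest observation."""
    return max(xs) - min(xs)


data = [4, 8, 6, 5, 3, 7]
print("mean:", mean(data))                              # 5.5
print("population variance:", variance(data))           # 17.5 / 6, about 2.917
print("sample variance:", variance(data, sample=True))  # 17.5 / 5 = 3.5
print("sample std dev:", std_dev(data, sample=True))    # square root of 3.5
print("range:", data_range(data))                       # 8 - 3 = 5

# Cross-check against Python's standard-library implementations.
assert math.isclose(variance(data), statistics.pvariance(data))
assert math.isclose(variance(data, sample=True), statistics.variance(data))
```

Passing `sample=True` changes only the denominator from n to n − 1; everything else is identical, which is why the two versions are easy to mix up.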
Understanding the Five-Number Summary
The five-number summary provides a snapshot of data distribution using five percentiles:
- Minimum — the smallest value
- First quartile (Q₁) — the 25th percentile, below which roughly 25% of the data falls
- Median (Q₂) — the 50th percentile, the middle value when sorted
- Third quartile (Q₃) — the 75th percentile, below which roughly 75% of the data falls
- Maximum — the largest value
The median and quartiles resist distortion from outliers (the minimum and maximum, by definition, do not), and together the five values show where the data cluster. The interquartile range (Q₃ − Q₁) measures the spread of the middle 50% of observations.
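A five-number summary can be computed with Python's standard library. This sketch assumes `statistics.quantiles` with `method="inclusive"` as the quartile rule; textbooks and software packages use several slightly different rules, so Q₁ and Q₃ may not match other tools exactly. The dataset and the helper name `five_number_summary` are hypothetical.

```python
import statistics


def five_number_summary(xs):
    """Return (min, Q1, median, Q3, max).

    Quartiles come from statistics.quantiles(method="inclusive"), which
    linearly interpolates between sorted values. Other quartile conventions
    exist and can give slightly different Q1 and Q3 for the same data.
    """
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return min(xs), q1, q2, q3, max(xs)


data = [2, 4, 4, 5, 6, 7, 8, 9, 40]  # 40 is a deliberate outlier
lo, q1, med, q3, hi = five_number_summary(data)
iqr = q3 - q1
print("five-number summary:", (lo, q1, med, q3, hi))  # (2, 4.0, 6.0, 8.0, 40)
print("IQR:", iqr)                                    # 4.0
print("mean:", statistics.mean(data))                 # about 9.44, pulled up by the outlier

# Swap the outlier for an ordinary value: only the maximum changes.
print(five_number_summary([2, 4, 4, 5, 6, 7, 8, 9, 10]))  # (2, 4.0, 6.0, 8.0, 10)
```

Replacing the outlier 40 with 10 changes the maximum but leaves Q₁, the median and Q₃ untouched, which is what makes the quartile-based part of the summary resistant.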
Descriptive Statistics in Practice
Descriptive statistics appear constantly in real-world applications:
- Education: Grade point averages summarise academic performance across multiple courses
- Manufacturing: Standard deviation of component measurements measures production consistency
- Climate science: Mean annual rainfall and temperature variance describe regional climate patterns
- Finance: Portfolio volatility (standard deviation of returns) quantifies investment risk
- Healthcare: Average patient recovery time and outcome variance guide treatment protocols
These measures enable stakeholders to communicate findings without requiring detailed examination of raw data.
Common Pitfalls When Interpreting Descriptive Statistics
Avoid these frequent mistakes when summarising and comparing datasets:
- Assuming mean equals median — The mean is pulled toward extreme values, while the median is not. In skewed distributions such as household income, the two can differ substantially. Always examine both; if they diverge markedly, the data are likely skewed or contain extreme values that deserve investigation.
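- Ignoring the sample versus population distinction — Sample variance divides by n − 1 rather than n (Bessel's correction). Deviations measured from the sample mean are, on average, smaller than deviations from the true population mean, so dividing by n makes the estimate systematically too small; dividing by n − 1 removes that bias. Applying the population formula to a sample therefore underestimates the true dispersion.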
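- Overlooking the units and scale of standard deviation — Standard deviation is expressed in the original variable's units, so its size depends on the measurement scale. A standard deviation of £2,000 in salary data is not directly comparable to one of £20 in daily expenses, even though both are in pounds. Always interpret dispersion relative to the scale of the data, for example by dividing it by the mean (the coefficient of variation).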
- Trusting summaries without checking distribution shape — Two datasets can share an identical mean and standard deviation yet have completely different shapes: one might be symmetric while the other is bimodal. Plot your data (for example, as a histogram) before relying solely on numerical summaries.
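The mean-versus-median and distribution-shape pitfalls can both be checked numerically in a few lines. The sketch below uses Python's `statistics` module on made-up data: a right-skewed sample standing in for household incomes, and two hypothetical datasets constructed to share the same mean and population standard deviation while having different shapes (one bimodal, one with a single central peak).

```python
import statistics

# Hypothetical right-skewed "household income" sample, in thousands of pounds.
incomes = [22, 25, 27, 30, 31, 35, 38, 250]
print("mean:", statistics.mean(incomes))      # 57.25, dragged up by the 250
print("median:", statistics.median(incomes))  # 30.5, stays with the bulk of the data

# Two made-up datasets with identical mean and population standard deviation.
bimodal = [2, 2, 2, 2, 8, 8, 8, 8]   # two clusters, nothing near the mean
peaked = [-1, 5, 5, 5, 5, 5, 5, 11]  # one central peak plus two extreme values

for name, xs in [("bimodal", bimodal), ("peaked", peaked)]:
    print(
        name,
        "mean =", statistics.mean(xs),        # 5 for both
        "pstdev =", statistics.pstdev(xs),    # 3.0 for both
        "modes =", statistics.multimode(xs),  # [2, 8] versus [5]
    )
```

`statistics.multimode` serves here only as a crude shape check; a histogram of each dataset would show the difference far more clearly.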
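Bessel's correction can be verified exactly rather than by simulation. This sketch assumes a tiny hypothetical population {1, 2, 3} and enumerates every possible sample of size 2 drawn with replacement; averaging each estimator over all equally likely samples gives its expected value. `Fraction` keeps the arithmetic exact, and the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product


def mean(xs):
    """Arithmetic mean of a sequence of numbers."""
    return sum(xs) / len(xs)


def sum_sq_dev(xs):
    """Sum of squared deviations from the mean of xs itself."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


# A tiny hypothetical population; Fractions keep every result exact.
population = [Fraction(1), Fraction(2), Fraction(3)]
sigma2 = sum_sq_dev(population) / len(population)  # population variance = 2/3

n = 2
# All 9 equally likely samples of size n, drawn with replacement.
samples = list(product(population, repeat=n))
avg_divide_by_n = mean([sum_sq_dev(s) / n for s in samples])
avg_divide_by_n_minus_1 = mean([sum_sq_dev(s) / (n - 1) for s in samples])

print("population variance:", sigma2)                                    # 2/3
print("average of divide-by-n estimates:", avg_divide_by_n)              # 1/3, too small
print("average of divide-by-(n-1) estimates:", avg_divide_by_n_minus_1)  # 2/3, matches
```

Dividing by n gives, on average, exactly (n − 1)/n times the population variance, here 1/3 instead of 2/3; dividing by n − 1 recovers 2/3 exactly.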