What Is a Five-Number Summary?
A five-number summary is a foundational descriptive statistic that partitions a dataset into quarters, showing how values spread across the entire range. Instead of collapsing everything into a single mean, which a few extreme values can distort, these five values (minimum, Q1, median, Q3, and maximum) give an honest picture of the distribution.
Consider a hiring manager who tells candidates their company's average salary is £40,000 annually. That sounds reasonable until you discover one employee earns £15,000 while the CEO takes £200,000. The mean obscures what's really happening. A five-number summary would reveal this immediately: minimum £15,000, Q1 £28,000, median £38,000, Q3 £52,000, maximum £200,000. Suddenly the compensation structure becomes transparent.
This approach works for any quantitative dataset: exam scores, response times, product measurements, or survey ratings. It is often one of the first calculations analysts perform when exploring unfamiliar data.
Calculating the Five-Number Summary
To find each component, arrange your data in ascending order, then identify the extremes and quartiles:
Minimum = smallest value in dataset
Maximum = largest value in dataset
Median (Q2) = middle value (or average of two middle values if n is even)
Q1 = median of lower half of data
Q3 = median of upper half of data
n: total number of data points
Q1: first quartile; the 25th percentile, separating the lowest quarter of values
Q2 (Median): second quartile; the 50th percentile, dividing the dataset in half
Q3: third quartile; the 75th percentile, separating the highest quarter of values
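The same rules can be written compactly in terms of positions in the sorted list. In this sketch, x_(i) denotes the i-th smallest value (a notation introduced here for convenience), the halves exclude the median when n is odd, and the seven-value dataset in the worked example is hypothetical.

```latex
\[
Q_2 =
\begin{cases}
x_{\left(\frac{n+1}{2}\right)} & n \text{ odd} \\[4pt]
\dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & n \text{ even}
\end{cases}
\qquad
Q_1 = \operatorname{median}\bigl(x_{(1)}, \dots, x_{(\lfloor n/2 \rfloor)}\bigr),
\quad
Q_3 = \operatorname{median}\bigl(x_{(\lceil n/2 \rceil + 1)}, \dots, x_{(n)}\bigr)
\]

\[
\begin{aligned}
&x = (2, 4, 4, 5, 7, 9, 10), \quad n = 7 \\
&Q_2 = x_{(4)} = 5, \quad Q_1 = \operatorname{median}(2, 4, 4) = 4, \quad Q_3 = \operatorname{median}(7, 9, 10) = 9
\end{aligned}
\]
```

For this dataset the summary is minimum 2, Q1 4, median 5, Q3 9, maximum 10.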
Understanding the Box-and-Whisker Plot
The five-number summary maps directly onto a box-and-whisker plot, a visual that makes the shape of a distribution easy to recognise. The plot consists of:
- Whiskers (lines): in the simplest form, extend from the minimum to Q1 and from Q3 to the maximum, each covering roughly a quarter of the data
- Box: spans from Q1 to Q3, containing the middle 50% (interquartile range, or IQR)
- Line inside the box: marks the median, often visually distinct
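If the median sits near the centre of the box and the two whiskers are of similar length, the data is roughly symmetric; an off-centre median or one whisker much longer than the other signals skew. Many plotting tools draw a modified box plot in which the whiskers stop at the most extreme values within 1.5×IQR of the box, and any points beyond them are drawn individually as potential outliers worth investigating separately. Placing several box plots side by side lets you compare how distributions differ across groups or conditions without getting lost in raw numbers.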
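To see how the five numbers become a picture, here is a minimal, standard-library-only sketch that draws a one-line text box plot using the simple variant, where the whiskers run all the way to the minimum and maximum. The function name `render_box_plot`, the characters used (`|` for the extremes, `[` and `]` for Q1 and Q3, `M` for the median) and the default width are illustrative assumptions, not a standard.

```python
def render_box_plot(minimum, q1, median, q3, maximum, width=41):
    """Return a one-line text box plot whose whiskers reach the minimum and maximum."""
    if not (minimum <= q1 <= median <= q3 <= maximum):
        raise ValueError("expected minimum <= Q1 <= median <= Q3 <= maximum")
    if width < 2:
        raise ValueError("width must be at least 2")
    span = (maximum - minimum) or 1  # avoid dividing by zero when all values are equal

    def column(value):
        # Scale a data value linearly onto character positions 0 .. width - 1.
        return round((value - minimum) / span * (width - 1))

    lo, box_lo, mid, box_hi, hi = (column(v) for v in (minimum, q1, median, q3, maximum))
    chars = [" "] * width
    for i in range(lo, box_lo):        # left whisker: minimum up to Q1
        chars[i] = "-"
    for i in range(box_hi + 1, hi):    # right whisker: Q3 up to the maximum
        chars[i] = "-"
    chars[lo] = chars[hi] = "|"        # end caps at the minimum and maximum
    chars[box_lo] = "["                # box edges at Q1 and Q3
    chars[box_hi] = "]"
    chars[mid] = "M"                   # median drawn last so it stays visible
    return "".join(chars)


# The salary summary from the hiring example: £15,000 / £28,000 / £38,000 / £52,000 / £200,000.
print(render_box_plot(15_000, 28_000, 38_000, 52_000, 200_000))
```

For the salary figures the box sits squeezed against the left end while the right whisker stretches across most of the line, which is exactly the asymmetry the £200,000 maximum creates.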
Step-by-Step Calculation Process
Working through a five-number summary by hand reinforces what calculators and statistical software do automatically:
- Sort your data in ascending order: this is non-negotiable for accurate results.
- Identify the minimum and maximum: the first and last values after sorting.
- Find the median: if you have an odd number of values, pick the middle one; if even, average the two middle values.
- Split the dataset: divide your data at the median into lower and upper halves (exclude the median itself if n is odd).
- Calculate Q1 and Q3: find the median of each half. Roughly 25% of the data falls below Q1, and roughly 75% falls below Q3.
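The steps above translate almost line for line into code. This is a minimal sketch of the median-of-halves method (median excluded from both halves when n is odd), using only Python's standard `statistics.median`; the function name `five_number_summary` is a hypothetical choice for illustration.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    data = sorted(values)              # sort ascending before anything else
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values to form lower and upper halves")
    half = n // 2
    lower = data[:half]                # values before the middle position
    upper = data[half + n % 2:]        # skip the middle value itself when n is odd
    return (
        data[0],                       # minimum: first value after sorting
        median(lower),                 # Q1: median of the lower half
        median(data),                  # median of the whole dataset
        median(upper),                 # Q3: median of the upper half
        data[-1],                      # maximum: last value after sorting
    )


print(five_number_summary([7, 2, 10, 4, 9, 4, 5]))  # (2, 4, 5, 9, 10)
```

Slicing with `half + n % 2` handles both cases in one expression: for an even n the upper half starts right after the lower half, and for an odd n it skips one position, leaving the median out of both halves.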
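Software packages and calculators often use other quartile definitions that interpolate between neighbouring values, so their Q1 and Q3 can differ slightly from the hand method, most noticeably in small datasets. The principle stays the same: the five numbers show where the data is concentrated and how far it spreads.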
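The difference between definitions is easy to demonstrate. This sketch compares the hand method with the two interpolation rules offered by Python's standard `statistics.quantiles` (available from Python 3.8); the eight-value dataset is hypothetical and chosen only because it makes all three answers differ.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]  # already sorted; n is even, so the halves are 4 + 4

# Hand method: median of the lower half, of the whole set, and of the upper half.
lower, upper = data[:4], data[4:]
hand = [statistics.median(lower), statistics.median(data), statistics.median(upper)]

# Standard-library quartiles with its two interpolation rules.
exclusive = statistics.quantiles(data, n=4, method="exclusive")
inclusive = statistics.quantiles(data, n=4, method="inclusive")

print("hand:     ", hand)       # [2.5, 4.5, 6.5]
print("exclusive:", exclusive)  # [2.25, 4.5, 6.75]
print("inclusive:", inclusive)  # [2.75, 4.5, 6.25]
```

All three methods agree on the median but place Q1 and Q3 slightly differently. None of them is wrong; what matters is using one definition consistently and noting which one when you report results.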
Common Pitfalls and Practical Advice
Avoid these mistakes when interpreting or calculating five-number summaries.
- Including the median in quartile calculations: with the method described here, exclude the median value itself from both halves when n is odd. Including it follows a different convention (Tukey's hinges) and shifts Q1 and Q3, so mixing the two produces inconsistent results. Treat the halves as separate datasets and apply one method throughout.
- Confusing the summary with outlier detection: the five-number summary reports the range and quartiles, but it does not label outliers. To flag extreme values, compute the interquartile range (IQR = Q3 − Q1); anything below Q1 − 1.5×IQR or above Q3 + 1.5×IQR warrants closer inspection.
- Assuming symmetry means normality: a symmetric five-number summary (median centred between Q1 and Q3, and Q1 and Q3 equally far from the minimum and maximum) suggests balanced spread but does not guarantee a bell curve; a uniform distribution, for instance, is also symmetric. Tail weight (kurtosis) and other properties can still vary, so visualise the data and check distributional assumptions if your analysis depends on normality.
- Overlooking the importance of sorting: skipping the sort step produces wrong results. The first and last values are no longer the minimum and maximum, and the median and quartiles are read from the wrong positions. Sort first even for small datasets; there are no shortcuts.
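The 1.5×IQR rule from the outlier pitfall can be sketched in a few lines. The helper names `iqr_fences` and `flag_outliers` are hypothetical; Q1 and Q3 are passed in rather than computed, so the check works with whichever quartile convention you used. Because the hiring example only states its five summary values, the sketch checks just those five numbers, which is enough to test whether the extremes fall outside the fences.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences Q1 - k*IQR and Q3 + k*IQR."""
    if q3 < q1:
        raise ValueError("Q3 must not be smaller than Q1")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, q1, q3, k=1.5):
    """Return the values that fall strictly outside the IQR fences."""
    low, high = iqr_fences(q1, q3, k)
    return [v for v in values if v < low or v > high]


# Salary summary from the hiring example: min, Q1, median, Q3, max.
summary = [15_000, 28_000, 38_000, 52_000, 200_000]
print(iqr_fences(28_000, 52_000))              # (-8000.0, 88000.0)
print(flag_outliers(summary, 28_000, 52_000))  # [200000]
```

With an IQR of £24,000 the upper fence is £88,000, so the £200,000 maximum is flagged for closer inspection, while the £15,000 minimum sits comfortably inside the lower fence.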