Understanding the Median
The median is the middle value in an ordered dataset. It divides a distribution into two equal halves: one containing values below the median, one containing values above. For this reason, statisticians often call it the 50th percentile or second quartile.
The median differs fundamentally from the mean (average) because it ignores the magnitude of extreme values. If you have a salary dataset of £30,000, £35,000, £40,000, £45,000, and £500,000, the mean is £130,000—distorted by the outlier. The median is £40,000, reflecting what a typical earner actually makes. This robustness makes the median essential in fields like real estate, healthcare, and economics.
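The salary comparison above can be checked with a few lines of Python. This is a minimal sketch using the standard library's statistics module; the variable names are illustrative only.

```python
from statistics import mean, median

# Salaries from the example above, in pounds
salaries = [30_000, 35_000, 40_000, 45_000, 500_000]

salary_mean = mean(salaries)      # pulled upward by the single 500,000 value
salary_median = median(salaries)  # middle value of the sorted list

print(f"Mean:   £{salary_mean:,.0f}")
print(f"Median: £{salary_median:,.0f}")
```

Replacing £500,000 with an even larger figure would raise the mean further but leave the median at £40,000.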
The median also differs from the mode (the most frequently occurring value). For a perfectly symmetric, single-peaked distribution such as the normal distribution, all three measures (mean, median, and mode) coincide. In skewed datasets they diverge, so the choice of measure affects how the data should be interpreted.
Median Calculation Formula
Finding the median involves two steps: sort your data and locate the middle value(s). The process depends on whether your dataset has an odd or even number of entries.
For odd n: Median = value at position (n + 1) ÷ 2
For even n: Median = [value at position (n ÷ 2) + value at position ((n ÷ 2) + 1)] ÷ 2
where n is the total number of observations in the dataset, and positions are counted from 1 in the sorted data.
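The two position formulas can be sketched as a small Python function. This is one possible implementation, not a reference one; the function name median_from_positions is hypothetical, and the only subtlety is converting the 1-based positions in the formulas to Python's 0-based list indices.

```python
def median_from_positions(values):
    """Return the median using the 1-based position formulas above."""
    if not values:
        raise ValueError("median is undefined for an empty dataset")

    ordered = sorted(values)  # step 1: sort
    n = len(ordered)

    if n % 2 == 1:
        # Odd n: single middle value at position (n + 1) / 2
        position = (n + 1) // 2
        return ordered[position - 1]  # convert 1-based position to 0-based index

    # Even n: average the values at positions n / 2 and (n / 2) + 1
    lower_position = n // 2
    upper_position = lower_position + 1
    return (ordered[lower_position - 1] + ordered[upper_position - 1]) / 2


odd_result = median_from_positions([7, 1, 5])      # sorted: 1, 5, 7
even_result = median_from_positions([7, 1, 5, 3])  # sorted: 1, 3, 5, 7
print(odd_result, even_result)
```

For real work, the standard library's statistics.median does the same job; the hand-written version is shown only to make the position arithmetic explicit.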
Step-by-Step Calculation Example
Consider the dataset: 58, 47, 55, 6, 5, 14, 60, 3, 39, 6, 28, 15, 87, 31, 19
Step 1: Sort in ascending order
3, 5, 6, 6, 14, 15, 19, 28, 31, 39, 47, 55, 58, 60, 87
Step 2: Count the values
There are 15 values (odd number), so the median is the middle value.
Step 3: Find the position
Position = (15 + 1) ÷ 2 = 8
Step 4: Identify the median
The 8th value is 28, so the median is 28.
For an even-length dataset such as 5, 13, 18, 23, 53, 65, 71, 71, 74, 74, 75, 82, 87, 92, 97, 98 (16 values), the two middle values are at positions 8 and 9: 71 and 74. The median is (71 + 74) ÷ 2 = 72.5.
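Both worked examples can be reproduced step by step in Python. The sketch below follows the same sort, count, locate sequence and then cross-checks the results against the standard library's statistics.median; the variable names are illustrative.

```python
from statistics import median

odd_data = [58, 47, 55, 6, 5, 14, 60, 3, 39, 6, 28, 15, 87, 31, 19]
even_data = [5, 13, 18, 23, 53, 65, 71, 71, 74, 74, 75, 82, 87, 92, 97, 98]

# Odd-length example: one middle value at position (n + 1) / 2
odd_sorted = sorted(odd_data)
odd_position = (len(odd_sorted) + 1) // 2   # 1-based position
odd_median = odd_sorted[odd_position - 1]   # 0-based index

# Even-length example: average of positions n / 2 and (n / 2) + 1
even_sorted = sorted(even_data)
lower_position = len(even_sorted) // 2      # 1-based position of lower middle
even_median = (even_sorted[lower_position - 1] + even_sorted[lower_position]) / 2

print(odd_position, odd_median)
print(lower_position, lower_position + 1, even_median)

# Cross-check against the standard library
assert odd_median == median(odd_data)
assert even_median == median(even_data)
```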
Practical Insights for Using Median Effectively
Here are key considerations to avoid common pitfalls when working with the median.
- Don't forget to sort first: The most frequent error is picking the middle value without sorting. The middle entry of an unsorted dataset is usually not the median. Always arrange values in ascending or descending order before identifying the middle point.
- Watch for tied middle values in even-length data: When your dataset has an even count and both middle values are identical (e.g., {1, 1, 1, 18}), the median is simply that shared value (here, 1). When the two middle values differ, the median is their average, which may not appear in the dataset at all, so avoid rounding it to the nearest observed value.
- Use median for skewed distributions: Real-world data often contains outliers. Census income data, housing prices, and medical test results frequently exhibit skewness. In these cases, the median is less affected by extreme values than the mean and gives a more representative 'typical' value.
- Distinguish median from mean for reporting: When presenting findings to non-technical audiences, clearly state whether you're using the median or the mean. The median is often more intuitive ('half of people earn above this amount'), whereas the mean can mislead if outliers exist.
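The first two pitfalls are easy to demonstrate. The sketch below reuses the 15-value dataset from the worked example to show what happens when the sort is skipped, then checks the tied-middle case with statistics.median; the variable names are illustrative.

```python
from statistics import median

data = [58, 47, 55, 6, 5, 14, 60, 3, 39, 6, 28, 15, 87, 31, 19]
middle_index = len(data) // 2  # 0-based index of the 8th of 15 values

# Pitfall: taking the middle entry without sorting
unsorted_middle = data[middle_index]

# Correct: sort first, then take the middle entry
correct_median = sorted(data)[middle_index]

print(f"Middle of unsorted data: {unsorted_middle}")
print(f"Median after sorting:    {correct_median}")

# Tied middle values in even-length data: the average equals the shared value
tied = [1, 1, 1, 18]
tied_median = median(tied)
print(f"Median of {tied}: {tied_median}")
```

Here the unsorted middle entry is 3, far from the true median of 28, which shows how misleading the shortcut can be.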
Median versus Mean and Mode
Understanding when to use each measure of central tendency is crucial for statistical accuracy. The mean works best for symmetric, normally distributed data without extreme outliers. It uses all data points, so it captures the full picture when the distribution is well-behaved.
The median excels with skewed distributions and datasets containing outliers. It is insensitive to how far an extreme value lies from the centre, making it stable and interpretable. The mode identifies the most frequently occurring value and is most useful for categorical data or distributions with obvious peaks.
In symmetric distributions like test scores across a large population, the mean, median, and mode often converge to the same value. In asymmetric distributions—such as personal wealth, where a few billionaires create a right skew—the median provides a more accurate picture of where the 'typical' person sits compared to a mean inflated by extremes.
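A small comparison illustrates how the three measures behave. Both datasets below are made up for illustration (they are not real test scores or wealth figures), and the sketch assumes the standard library's statistics module.

```python
from statistics import mean, median, mode

# Hypothetical symmetric data: values mirror each other around 3
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Hypothetical right-skewed data (e.g. wealth in thousands): one extreme value
right_skewed = [20, 30, 30, 40, 50, 60, 5000]

summary = {}
for name, values in [("symmetric", symmetric), ("right_skewed", right_skewed)]:
    summary[name] = {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
    }
    print(name, summary[name])
```

In the symmetric sample all three measures equal 3. In the skewed sample the single extreme value pulls the mean above 700 while the median stays at 40 and the mode at 30.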