Understanding the Median

The median is the middle value in an ordered dataset. It divides a distribution into two equal halves: one containing values below the median, one containing values above. For this reason, statisticians often call it the 50th percentile or second quartile.

The median differs fundamentally from the mean (average) because it ignores the magnitude of extreme values. If you have a salary dataset of £30,000, £35,000, £40,000, £45,000, and £500,000, the mean is £130,000—distorted by the outlier. The median is £40,000, reflecting what a typical earner actually makes. This robustness makes the median essential in fields like real estate, healthcare, and economics.
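The salary comparison above can be reproduced with a minimal sketch using Python's standard `statistics` module. The variable names are illustrative only; the figures are the ones quoted in the paragraph.

```python
from statistics import mean, median

# Salary figures from the example above, in pounds
salaries = [30_000, 35_000, 40_000, 45_000, 500_000]

salary_mean = mean(salaries)      # pulled upward by the single £500,000 value
salary_median = median(salaries)  # middle value of the sorted list

print("mean:", salary_mean)
print("median:", salary_median)
```

Replacing £500,000 with any larger figure would raise the mean further but leave the median at £40,000.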

The median also differs from the mode (the most frequently occurring value). For a perfectly symmetric, single-peaked distribution like the normal distribution, all three measures (mean, median, and mode) align. In skewed datasets they separate, so the measure you choose to report changes how the data is interpreted.

Median Calculation Formula

Finding the median involves two steps: sort your data and locate the middle value(s). The process depends on whether your dataset has an odd or even number of entries.

For odd n: Median = value at position (n + 1) ÷ 2

For even n: Median = (value at position n ÷ 2 + value at position (n ÷ 2 + 1)) ÷ 2

  • n — Total number of observations in the dataset
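The two position rules can be sketched in Python as follows. `median_by_position` is a hypothetical helper written for this illustration, not a library function; it assumes numeric input and converts the 1-based positions in the formulas to Python's 0-based list indices.

```python
def median_by_position(values):
    """Median via the odd/even position rules (illustrative helper)."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")

    ordered = sorted(values)  # the position rules only hold for sorted data
    n = len(ordered)

    if n % 2 == 1:
        # Odd n: value at 1-based position (n + 1) / 2
        position = (n + 1) // 2
        return ordered[position - 1]

    # Even n: average the values at 1-based positions n/2 and n/2 + 1
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2
```

For example, `median_by_position([4, 1, 3, 2])` averages the two middle values 2 and 3.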

Step-by-Step Calculation Example

Consider the dataset: 58, 47, 55, 6, 5, 14, 60, 3, 39, 6, 28, 15, 87, 31, 19

Step 1: Sort in ascending order

3, 5, 6, 6, 14, 15, 19, 28, 31, 39, 47, 55, 58, 60, 87

Step 2: Count the values

There are 15 values (odd number), so the median is the middle value.

Step 3: Find the position

Position = (15 + 1) ÷ 2 = 8

Step 4: Identify the median

The 8th value is 28, so the median is 28.

For an even-length dataset such as 5, 13, 18, 23, 53, 65, 71, 71, 74, 74, 75, 82, 87, 92, 97, 98 (16 values), the two middle values are at positions 8 and 9: 71 and 74. The median is (71 + 74) ÷ 2 = 72.5.
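As a check on both worked examples, the sketch below follows the same steps by hand and compares the results with `statistics.median` from the Python standard library. The variable names are illustrative.

```python
from statistics import median

odd_data = [58, 47, 55, 6, 5, 14, 60, 3, 39, 6, 28, 15, 87, 31, 19]
even_data = [5, 13, 18, 23, 53, 65, 71, 71, 74, 74, 75, 82, 87, 92, 97, 98]

# Odd case: 15 values, so the median sits at 1-based position (15 + 1) / 2 = 8
odd_sorted = sorted(odd_data)
odd_position = (len(odd_sorted) + 1) // 2
odd_median = odd_sorted[odd_position - 1]

# Even case: 16 values, so average the values at 1-based positions 8 and 9
even_sorted = sorted(even_data)
half = len(even_sorted) // 2
even_median = (even_sorted[half - 1] + even_sorted[half]) / 2

# The manual results agree with the standard library
assert odd_median == median(odd_data)
assert even_median == median(even_data)

print(odd_position, odd_median)  # position 8, median 28
print(even_median)               # 72.5
```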

Practical Insights for Using Median Effectively

Here are key considerations to avoid common pitfalls when working with the median.

  1. Don't forget to sort first — The most frequent error is calculating the median without sorting. An unsorted dataset will give you the wrong answer. Always arrange values in ascending or descending order before identifying the middle point.
  2. Watch for tied middle values in even-length data — When your dataset has an even count and both middle values are identical (e.g., {1, 1, 1, 18}, where both are 1), the median is simply that shared value. When the two middle values differ, average them exactly and round only the final result, if at all.
  3. Use median for non-normal distributions — Real-world data often contains outliers. Census income data, housing prices, and medical test results frequently exhibit skewness. In these cases, the median is usually more representative than the mean, because a few extreme values cannot pull it away from the bulk of the data.
  4. Distinguish median from mean for reporting — When presenting findings to non-technical audiences, clearly state whether you're using the median or mean. Median is often more intuitive ('half of people earn above this amount') whereas mean can mislead if outliers exist.
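The first pitfall in the list above is easy to demonstrate. This short sketch reuses the 15-value dataset from the step-by-step example and compares the middle element before and after sorting; the variable names are illustrative.

```python
data = [58, 47, 55, 6, 5, 14, 60, 3, 39, 6, 28, 15, 87, 31, 19]

middle_index = len(data) // 2  # 0-based index 7, i.e. the 8th value

# Wrong: the middle element of the unsorted list is just whatever sits there
unsorted_middle = data[middle_index]

# Right: sort first, then take the middle element
sorted_middle = sorted(data)[middle_index]

print(unsorted_middle)  # 3, not the median
print(sorted_middle)    # 28, the median
```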

Median versus Mean and Mode

Understanding when to use each measure of central tendency is crucial for statistical accuracy. The mean works best for symmetric, normally distributed data without extreme outliers. It uses all data points, so it captures the full picture when the distribution is well-behaved.

The median excels with skewed distributions and datasets containing outliers. It is insensitive to how far an extreme value lies from the centre, making it stable and interpretable. The mode identifies the most frequently occurring value and is most useful for categorical data or distributions with obvious peaks.

In symmetric distributions like test scores across a large population, the mean, median, and mode often converge to the same value. In asymmetric distributions—such as personal wealth, where a few billionaires create a right skew—the median provides a more accurate picture of where the 'typical' person sits compared to a mean inflated by extremes.
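To make the contrast concrete, here is a small sketch using the standard `statistics` module. Both datasets are invented for illustration and do not come from any real survey: one is symmetric with a single peak, the other has one extreme value that creates a right skew.

```python
from statistics import mean, median, mode

# Hypothetical symmetric, single-peaked data
symmetric = [1, 2, 3, 3, 3, 4, 5]

# Hypothetical right-skewed data: one extreme value at the top
skewed = [20, 25, 25, 30, 35, 40, 1000]

sym_stats = (mean(symmetric), median(symmetric), mode(symmetric))
skew_stats = (mean(skewed), median(skewed), mode(skewed))

print("symmetric mean/median/mode:", sym_stats)
print("skewed mean/median/mode:", skew_stats)
```

In the symmetric list all three measures equal 3. In the skewed list the mode is 25 and the median is 30, while the mean rises well above every value except the extreme one.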

Frequently Asked Questions

What is the median and why is it different from the mean?

The median is the middle value in a sorted dataset: half of the observations lie at or below it and half at or above it. Unlike the mean (average), the median does not depend on how large the extreme values are. For example, in a salary dataset of £30,000, £35,000, £40,000, £45,000, and £500,000, the mean is £130,000, but the median is £40,000. The median is often more representative of a 'typical' value when outliers distort the average, which makes it better suited to analysing skewed real-world datasets in economics, healthcare, and social science.

How do I find the median of a dataset with an even number of values?

When you have an even count of values, identify the two middle values by sorting first, then finding positions n÷2 and (n÷2)+1. Add these two middle values and divide by 2 to get your median. For instance, in the dataset {5, 13, 18, 23, 53, 65, 71, 71, 74, 74, 75, 82, 87, 92, 97, 98}, the 8th and 9th values are 71 and 74. The median is (71 + 74) ÷ 2 = 72.5. Always ensure your data is sorted before performing this calculation to avoid errors.

When should I use the median instead of the mean?

Use the median when your dataset is skewed (asymmetric) or contains clear outliers that don't represent the broader pattern. Income and wealth distributions, property prices, and medical measurements often exhibit such characteristics. The median is also preferable when communicating with non-technical audiences, as it's intuitive: 'half of people earn above this amount.' Reserve the mean for symmetric, normally distributed data without significant outliers, where it leverages all data points for a complete picture.

What does the median symbol mean in statistics?

The median lacks a universal symbol, but common notations include x̃ (x-tilde), μ₁/₂ (mu subscript one-half), and M (uppercase). These symbols appear in academic papers and textbooks to denote the median value. When reading statistical analysis, check the author's notation guide to confirm which symbol represents the median, especially if multiple measures of central tendency are being discussed.

How is the median affected by duplicate values?

Duplicate values do not prevent median calculation; they are treated as separate entries. In the dataset {0, 1, 1, 18}, which has four values, the two middle values are both 1. The median is (1 + 1) ÷ 2 = 1. Duplicates simply fill positions in the sorted list. If many duplicates cluster at the centre, the median remains that repeated value, which accurately reflects the distribution's concentration.
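A brief sketch of the duplicate case, using `statistics.median` from the Python standard library. The first dataset is the one from the answer above; the second, with duplicates clustered at the centre, is made up for illustration.

```python
from statistics import median

# Each duplicate occupies its own position in the sorted list
with_duplicates = [0, 1, 1, 18]
dup_median = median(with_duplicates)  # average of the two middle 1s

# Hypothetical data with many duplicates at the centre
clustered = [2, 7, 7, 7, 7, 7, 30]
clustered_median = median(clustered)  # the repeated central value

print(dup_median, clustered_median)
```

Note that for an even count `statistics.median` returns the average as a float, so the first result is 1.0 rather than 1.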

When do the median, mean, and mode all equal the same value?

The median, mean, and mode coincide in perfectly symmetric, unimodal distributions—most commonly the normal (bell curve) distribution. This alignment occurs because symmetry ensures the average matches the middle value, and a single peak concentrates the most frequent observations at the centre. In real-world data with skewness or multiple modes, these three measures diverge, highlighting the importance of selecting the appropriate statistic for your analysis.
