Central Tendency: Mean, Median, and Mode
The three primary measures of central tendency describe where most values in a dataset cluster. Each serves a distinct purpose depending on your data's characteristics.
- Mean is the arithmetic average: the sum of all values divided by how many values exist. It's sensitive to extreme values and works best for symmetric, normally distributed data.
- Median is the middle value when data is sorted in order. Half the values fall below it and half above. The median resists the pull of outliers, making it ideal for skewed distributions.
- Mode is the most frequently occurring value. A dataset can have one mode (unimodal), two modes (bimodal), three or more modes (multimodal), or no mode at all if every value appears equally often.
In practice, examining all three together provides a fuller picture. A salary dataset might have a mean pulled upward by high earners, a median closer to typical wages, and a mode reflecting the most common salary bracket.
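As a rough illustration of the salary example, the sketch below computes all three measures with Python's standard `statistics` module. The salary figures are made up for this example and are not taken from any real dataset.

```python
from statistics import mean, median, mode

# Hypothetical annual salaries in thousands; illustrative values only.
salaries = [32, 35, 35, 38, 40, 42, 45, 250]

salary_mean = mean(salaries)      # pulled upward by the single 250 value
salary_median = median(salaries)  # average of the two middle values, 38 and 40
salary_mode = mode(salaries)      # 35 is the only value that appears twice

print(f"mean={salary_mean}, median={salary_median}, mode={salary_mode}")
```

In this made-up sample the mean sits well above both the median and the mode, which is the pattern the paragraph above describes for data with a few high earners.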
Calculating Mean, Median, and Range
The mean uses a straightforward summation formula. Once you have your dataset sorted, finding the median requires identifying the position of the middle value(s). Range and midrange are computed from the extreme values in your set.
Mean = (x₁ + x₂ + x₃ + ... + xₙ) ÷ n
Median position = (n + 1) ÷ 2
Range = Max − Min
Midrange = (Max + Min) ÷ 2
- x₁, x₂, ..., xₙ: Individual values in your dataset
- n: Total count of values
- Max: Largest value in the dataset
- Min: Smallest value in the dataset
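The mean, range, and midrange formulas translate almost directly into code. The following is a minimal sketch; the function name `summarize` and the sample values are assumptions made for illustration.

```python
def summarize(values):
    """Return the mean, range, and midrange of a non-empty list of numbers."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    return {
        "mean": sum(values) / n,    # (x1 + x2 + ... + xn) / n
        "range": hi - lo,           # Max - Min
        "midrange": (hi + lo) / 2,  # (Max + Min) / 2
    }

# Illustrative sample values.
stats = summarize([4, 8, 15, 16, 23, 42])
print(stats)
```

Note that range and midrange depend only on the two extreme values, while the mean uses every value in the set.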
Finding the Median in Odd and Even Datasets
The median calculation differs slightly depending on whether you have an odd or even number of observations.
Odd-sized datasets: Sort the numbers from lowest to highest, then select the single middle value. For {3, 5, 7}, the median is 5.
Even-sized datasets: Sort the numbers, then calculate the average of the two centermost values. For {2, 4, 6, 8}, the median is (4 + 6) ÷ 2 = 5.
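This approach works because it places the same number of observations on each side of the median. For large datasets, the formula position = (n + 1) ÷ 2 locates the value(s) to extract from the sorted list. When n is odd, the result is a whole number and the median is the value at that position. When n is even, the result ends in .5 and the median is the average of the values at the two neighbouring positions, n ÷ 2 and n ÷ 2 + 1.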
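The odd and even cases can be sketched in a few lines. The helper name `median_of` is hypothetical. Because Python lists are indexed from 0, the 1-based position (n + 1) ÷ 2 corresponds to index `n // 2` when n is odd.

```python
def median_of(values):
    """Median of a non-empty list: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("values must not be empty")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value at 1-based position (n + 1) / 2.
        return ordered[mid]
    # Even count: average the two centermost values.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_median = median_of([7, 3, 5])       # sorted: 3, 5, 7
even_median = median_of([8, 2, 6, 4])   # sorted: 2, 4, 6, 8
print(odd_median, even_median)
```

In practice, the standard library's `statistics.median` covers the same two cases, so a hand-written version like this is mainly useful for seeing the logic.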
Understanding Mode and Multimodal Distributions
The mode is straightforward in principle—it's the value appearing most often—yet distributions can present different scenarios. Count the frequency of each value to identify patterns.
- Unimodal: One value dominates. In {2, 3, 3, 5, 8}, the mode is 3 (appears twice).
- Bimodal: Exactly two values share the highest frequency. In {1, 1, 2, 2, 7}, both 1 and 2 are modes.
- Multimodal: Three or more values tie for highest frequency. In {1, 1, 2, 2, 3, 3, 9}, the values 1, 2, and 3 are all modes.
- No mode: All values appear with equal frequency. In {4, 5, 6, 7}, no mode exists.
Mode is particularly useful for categorical data and identifying the most popular category, making it valuable in market research and quality control.
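Counting frequencies is simple to automate. The sketch below uses `collections.Counter` and follows the convention used here that a dataset whose distinct values all appear equally often has no mode. The function name `find_modes` is an assumption for this example. Other tools follow different conventions; `statistics.multimode`, for instance, returns every value in that situation.

```python
from collections import Counter

def find_modes(values):
    """Return the sorted list of modes, or an empty list if there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    modes = sorted(v for v, c in counts.items() if c == top)
    # Convention assumed here: several distinct values, all equally
    # frequent, means the dataset has no mode.
    if len(counts) > 1 and len(modes) == len(counts):
        return []
    return modes

unimodal = find_modes([2, 3, 3, 5, 8])
bimodal = find_modes([1, 1, 2, 2, 7])
no_mode = find_modes([4, 5, 6, 7])
print(unimodal, bimodal, no_mode)
```

The same function works for categorical data, for example a list of colour names, as long as the values can be sorted.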
Common Pitfalls When Using Central Tendency Measures
These practical considerations will help you choose and interpret the right statistic for your analysis.
- Mean and outliers: The mean can be dramatically skewed by a single extreme value. In {10, 12, 13, 100}, the mean jumps to 33.75, yet the other three observations lie between 10 and 13. Always check for outliers, using a plot or a comparison with the median, to confirm that the mean represents your typical value.
- Median vs. mean for real-world data: Real-world datasets like income, house prices, or medical costs often contain a long tail of high values. The median better represents the 'typical' person or transaction in such cases. Publishing the mean salary for a company with one CEO and 99 workers can mislead readers about typical wages.
- Mode limitations in continuous data: Mode works well for categorical or discrete data (favourite colour, number of goals scored) but becomes unhelpful for continuous measurements like height or weight, where values rarely repeat. Summarize continuous data using mean and median instead.
- Range as a spread measure: Range only considers the two extreme values and ignores everything in between. A dataset spanning 0 to 100 has range 100, whether values cluster at the ends or spread evenly. Pair range with standard deviation or interquartile range for a fuller picture of variability.
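Two of these pitfalls are easy to see numerically. The sketch below reuses the {10, 12, 13, 100} example and then compares two made-up datasets that share a range of 100 but differ in spread. The names `clustered` and `spread_out` and their values are assumptions for illustration, and `statistics.pstdev` (population standard deviation) is just one possible choice of spread measure.

```python
from statistics import mean, median, pstdev

# Outlier example from the list above.
data = [10, 12, 13, 100]
data_mean = mean(data)      # 33.75, dragged up by the value 100
data_median = median(data)  # 12.5, close to the bulk of the data

# Two illustrative datasets with the same range but different shapes.
clustered = [0, 0, 0, 100, 100, 100]   # values pile up at both ends
spread_out = [0, 20, 40, 60, 80, 100]  # values evenly spaced

range_clustered = max(clustered) - min(clustered)
range_spread_out = max(spread_out) - min(spread_out)

print(data_mean, data_median)
print(range_clustered, range_spread_out)
print(pstdev(clustered), pstdev(spread_out))
```

Both ranges come out equal, while the standard deviation is larger for the dataset whose values sit at the extremes, which is why pairing range with a second measure of variability is worthwhile.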