What Is Covariance?
Covariance is a statistical measure that describes how two variables move in relation to each other. When one variable tends to be high while the other is also high (or both low), they exhibit positive covariance. Conversely, negative covariance occurs when high values in one variable align with low values in the other.
Unlike correlation, which is bounded between −1 and +1, covariance has no fixed range and depends entirely on the units of measurement. A covariance of zero indicates no linear relationship, although the variables may still be related nonlinearly. The magnitude of covariance alone cannot tell you the strength of association; to judge strength, you must scale it by the standard deviations of both variables, which is what correlation does.
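The unit dependence can be made concrete with a short derivation. It uses the sample formula (dividing by n − 1, with sample means x̄ and ȳ) and assumes a change of units of the form x → ax + c and y → by + d, where a, b, c, d are constants introduced here for illustration (a, b ≠ 0):

```latex
\begin{aligned}
\operatorname{Cov}(a x + c,\; b y + d)
  &= \frac{1}{n-1}\sum_{i=1}^{n}\big[(a x_i + c) - (a\bar{x} + c)\big]\big[(b y_i + d) - (b\bar{y} + d)\big] \\
  &= \frac{ab}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})
   = ab\,\operatorname{Cov}(x, y), \\
r(a x + c,\; b y + d)
  &= \frac{ab\,\operatorname{Cov}(x, y)}{|a|\,\sigma_x \cdot |b|\,\sigma_y}
   = \operatorname{sign}(ab)\, r(x, y).
\end{aligned}
```

So converting x from metres to centimetres (a = 100) multiplies the covariance by 100, while the correlation is unchanged.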
Covariance Formula
For a sample of n paired observations from two variables, the sample covariance estimates the population covariance. It divides by n − 1 (Bessel's correction) rather than n, because dividing by n would systematically underestimate the magnitude of the population covariance. When you have the entire population, the population formula divides by n directly.
Cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1)
Cov_pop(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / n
where:
- xᵢ, yᵢ: individual observations from each variable
- x̄, ȳ: mean (average) of each variable
- n: number of paired observations
- Σ: sum across all observation pairs
Interpreting Covariance Results
Positive covariance suggests that when one variable exceeds its mean, the other tends to as well. For example, advertising spend and product sales often show positive covariance. Negative covariance indicates inverse movement—as one variable climbs above average, the other typically falls below it. Bond prices and interest rates exhibit this pattern.
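A toy example, using made-up numbers chosen for easy arithmetic, shows where the sign comes from. Each term in the sum is a product of deviations. It is positive when both observations sit on the same side of their means and negative when they sit on opposite sides. Reversing the order of y flips the sign:

```latex
\begin{aligned}
&x = (1, 2, 3),\quad y = (1, 3, 2),\quad \bar{x} = \bar{y} = 2 \\
&\operatorname{Cov}(x, y) = \frac{(-1)(-1) + (0)(1) + (1)(0)}{3 - 1} = \frac{1}{2} = 0.5 \\[4pt]
&y' = (3, 1, 2),\quad \bar{y}' = 2 \\
&\operatorname{Cov}(x, y') = \frac{(-1)(1) + (0)(-1) + (1)(0)}{3 - 1} = -\frac{1}{2} = -0.5
\end{aligned}
```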
Because covariance is scale-dependent, comparing raw covariance values across datasets or variables measured on different scales is meaningless. A covariance of 50 might indicate weak association in one context and strong dependence in another. To standardise comparisons, convert covariance to correlation by dividing by the product of the two standard deviations: r = Cov(x,y) / (σ_x × σ_y). Compute the standard deviations with the same denominator (n − 1 or n) as the covariance, so the factors cancel and r stays between −1 and +1.
Common Pitfalls When Using Covariance
Avoid these frequent mistakes when calculating and interpreting covariance.
- Confusing covariance magnitude with strength — Covariance values lack inherent bounds. A covariance of 100 could represent weak, moderate, or strong association depending on variable scales. Always convert to correlation (−1 to +1) for meaningful strength assessment.
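- Forgetting the sample vs. population distinction — Use Bessel's correction (divide by n − 1) when your data represents a sample from a larger population. Use the population formula (divide by n) only when you have complete population data. Mixing them biases the result: dividing a sample's sum of deviation products by n shrinks the estimate toward zero by a factor of (n − 1)/n relative to the unbiased version, while dividing a full population's sum by n − 1 inflates it by n/(n − 1).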
- Assuming causation from covariance — Covariance detects linear association only. It reveals nothing about causal mechanisms. Two variables may move together due to a third hidden factor, random chance, or reverse causation.
- Ignoring outlier effects — Covariance calculations are sensitive to extreme values. A single outlier pair can dramatically shift the covariance. Always inspect scatter plots and consider robust alternatives like trimmed covariance if outliers are present.
Sample vs. Population Covariance in Practice
When you observe historical stock prices, survey responses, or measurement readings, you have a sample: a subset of all possible observations. The sample covariance uses n − 1 in the denominator, which makes it an unbiased estimate of the population covariance. The correction matters most for small samples; because the two formulas differ only by the factor (n − 1)/n, they converge as n grows.
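The unbiasedness claim can be checked by simulation. This is a sketch under an assumed model, not a formal proof. It draws x from a standard normal and sets y = x plus independent standard normal noise, so the true Cov(x, y) is 1. It then averages both estimators over many small samples (n = 5). The function names, seed, and trial count are arbitrary choices made for illustration.

```python
import random


def sum_dev_products(x, y):
    """Sum of (x_i - x_bar) * (y_i - y_bar) over all pairs."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))


def average_estimates(n, trials, seed=0):
    """Average the n - 1 and n estimators over many simulated samples.

    Assumed model: x ~ N(0, 1), y = x + N(0, 1), so the true Cov(x, y) = 1.
    """
    rng = random.Random(seed)
    total_sample = 0.0
    total_pop = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [xi + rng.gauss(0.0, 1.0) for xi in x]
        s = sum_dev_products(x, y)
        total_sample += s / (n - 1)
        total_pop += s / n
    return total_sample / trials, total_pop / trials


avg_sample, avg_pop = average_estimates(n=5, trials=50_000)
print(f"n - 1 estimator: {avg_sample:.3f}")  # close to the true value 1.0
print(f"n estimator:     {avg_pop:.3f}")     # close to (n - 1) / n = 0.8
```

With n = 5, the population formula settles near 0.8 rather than 1.0. That gap is the (n − 1)/n bias Bessel's correction removes; rerunning with a larger n shows the gap shrinking.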
Population covariance applies only in the rare cases where you genuinely possess every data point (a finite, complete dataset). A manufacturing quality check might use population covariance if every unit produced is inspected. In research and finance, sample covariance dominates because you can never observe all possible future market movements or survey every individual in a population. Sample covariance is also the standard basis for hypothesis tests and confidence intervals.