What Is Covariance?

Covariance is a statistical measure that describes how two variables move in relation to each other. When one variable tends to be high while the other is also high (or both low), they exhibit positive covariance. Conversely, negative covariance occurs when high values in one variable align with low values in the other.

Unlike correlation, which is bounded between −1 and +1, covariance has no fixed range: its value depends on the units of measurement and the spread of each variable. A covariance of zero indicates no linear relationship, although the variables may still be related in a non-linear way. The magnitude of covariance alone cannot tell you the strength of the association; to judge strength, divide it by the product of the two variables' standard deviations, which yields the correlation coefficient.

Covariance Formula

For a sample of n paired observations, the sample covariance estimates the unknown population covariance. It divides by n − 1 (Bessel's correction) because dividing by n would, on average, bias the estimate toward zero. When you have data for the entire population, the population formula divides by n directly.

Cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

Cov_pop(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / n

  • xᵢ, yᵢ — Individual observations from each variable
  • x̄, ȳ — Mean (average) of each variable
  • n — Number of paired observations
  • Σ — Sum across all observation pairs
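Both formulas can be sketched in a few lines of Python. This is a minimal illustration using only built-in arithmetic; the function name `covariance`, its `sample` flag, and the five data pairs are hypothetical choices made for this example.

```python
from typing import Sequence


def covariance(x: Sequence[float], y: Sequence[float], sample: bool = True) -> float:
    """Covariance of paired observations (hypothetical helper).

    sample=True divides by n - 1 (Bessel's correction);
    sample=False divides by n (population formula).
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must contain the same number of paired observations")
    if n < (2 if sample else 1):
        raise ValueError("not enough observations for this formula")
    x_bar = sum(x) / n  # mean of x
    y_bar = sum(y) / n  # mean of y
    # Sum of the products of each pair's deviations from the means
    s = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s / (n - 1) if sample else s / n


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(covariance(x, y))                # → 1.5
print(covariance(x, y, sample=False))  # → 1.2
```

The sum of cross-products here is 6, so the two results differ only in the denominator (4 versus 5).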

Interpreting Covariance Results

Positive covariance suggests that when one variable exceeds its mean, the other tends to as well. For example, advertising spend and product sales often show positive covariance. Negative covariance indicates inverse movement—as one variable climbs above average, the other typically falls below it. Bond prices and interest rates exhibit this pattern.

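Because covariance is scale-dependent, comparing raw covariance values across datasets measured in different units or on different scales is meaningless. A covariance of 50 in one context might indicate weak association, while 50 in another could signal strong dependence. To standardise comparisons, convert covariance to correlation by dividing by the product of the standard deviations: r = Cov(x,y) / (σ_x × σ_y). Pair sample covariance with sample standard deviations (or population with population) so the denominators cancel consistently.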

Common Pitfalls When Using Covariance

Avoid these frequent mistakes when calculating and interpreting covariance.

  1. Confusing covariance magnitude with strength — Covariance values lack inherent bounds. A covariance of 100 could represent weak, moderate, or strong association depending on variable scales. Always convert to correlation (−1 to +1) for meaningful strength assessment.
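  2. Forgetting the sample vs. population distinction — Use Bessel's correction (divide by n − 1) when your data represents a sample from a larger population. Use the population formula (divide by n) only when you have complete population data. Mixing these up biases the result: dividing sample data by n shrinks the estimate toward zero, while dividing complete population data by n − 1 inflates its magnitude by a factor of n / (n − 1).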
  3. Assuming causation from covariance — Covariance detects linear association only. It reveals nothing about causal mechanisms. Two variables may move together due to a third hidden factor, random chance, or reverse causation.
  4. Ignoring outlier effects — Covariance calculations are sensitive to extreme values. A single outlier pair can dramatically shift the covariance. Always inspect scatter plots and consider robust alternatives like trimmed covariance if outliers are present.

Sample vs. Population Covariance in Practice

When you observe historical stock prices, survey responses, or measurement readings, you have a sample: a subset of all possible observations. The sample covariance uses n − 1 in the denominator, which provides an unbiased estimate of the true population covariance. The correction matters most for small samples; as n grows, the n − 1 and n versions converge.

Population covariance applies only in rare cases where you genuinely possess all data points (a finite, complete dataset). Manufacturing quality might use population covariance if every unit produced is inspected. In research and finance, sample covariance dominates because you can never observe all possible future market movements or survey every individual in a population. Using sample covariance is the standard practice for hypothesis testing and confidence interval construction.

Frequently Asked Questions

How does covariance differ from correlation?

Covariance and correlation both measure directional association, but correlation standardises the relationship to a −1 to +1 scale, making comparisons meaningful across different datasets. Covariance lacks this bound, so its magnitude depends on the units and variability of your variables. If you scale your data (e.g., convert inches to centimetres), covariance changes while correlation remains identical. For this reason, correlation is preferred when you need to compare strength of relationships across studies or datasets.

Why divide by n − 1 instead of n when calculating sample covariance?

Dividing by n − 1 applies Bessel's correction, which accounts for the fact that the sample means are used in place of the unknown population means. Because the sample means are computed from the same data, the sum of cross-products Σ(xᵢ − x̄)(yᵢ − ȳ) has an expected value of (n − 1) × Cov rather than n × Cov. Dividing by n − 1 compensates for this, yielding an unbiased estimator. If you divided by n, your estimates would be systematically biased toward zero.
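The bias can be verified exactly rather than by simulation. This sketch assumes independent draws with replacement from a small, made-up population of pairs; it enumerates every possible ordered sample of size n and averages the estimates using exact fractions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite population of (x, y) pairs
population = [(1, 2), (2, 1), (4, 5), (5, 8)]


def cov(pairs, ddof):
    """Covariance with denominator len(pairs) - ddof, computed exactly."""
    n = len(pairs)
    x_bar = Fraction(sum(x for x, _ in pairs), n)
    y_bar = Fraction(sum(y for _, y in pairs), n)
    s = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    return s / (n - ddof)


true_cov = cov(population, ddof=0)  # population covariance (divide by N)

n = 3
samples = list(product(population, repeat=n))  # all ordered i.i.d. samples of size n
avg_unbiased = sum(cov(s, ddof=1) for s in samples) / len(samples)
avg_biased = sum(cov(s, ddof=0) for s in samples) / len(samples)

print(avg_unbiased == true_cov)              # True
print(avg_biased == true_cov * (n - 1) / n)  # True
```

Averaged over all possible samples, the n − 1 version hits the population covariance exactly, while the n version falls short by the factor (n − 1)/n.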

Can covariance be negative, and what does that mean?

Yes. Covariance has no fixed bounds and can take any negative or positive value. Negative covariance indicates that as one variable increases, the other tends to decrease. For example, temperature and heating costs show negative covariance: higher outdoor temperatures tend to coincide with lower heating expenses. A negative value simply signals inverse movement. The magnitude (e.g., −50 vs. −5000) is hard to interpret without knowing the variable scales, so convert to correlation to assess the strength of that inverse relationship.

What if I have different numbers of observations in my two samples?

Covariance requires paired observations: each data point in the first variable must correspond to exactly one in the second. If your samples have different lengths, you cannot calculate covariance directly. You must either remove observations that lack a partner, matching pairs by a shared identifier such as subject or date rather than simply truncating the longer list, or fill the gaps with imputation techniques. Misaligned pairs produce a covariance of unrelated values, so verify data alignment before computation.

Is covariance useful for deciding whether to include a variable in a regression model?

Covariance alone is insufficient for variable selection. High covariance between an independent variable and the outcome suggests potential usefulness, but it does not control for confounding. A third variable might drive both, creating spurious covariance. In regression, examine partial correlations, p-values, and domain knowledge. Additionally, high covariance between predictor variables (multicollinearity) can destabilise regression coefficients even if those predictors matter individually.

How do I use covariance to build a diversified investment portfolio?

Covariance between asset returns reveals whether they move together or oppose each other. Negative covariance between two assets means one asset's gains tend to offset the other's losses, reducing overall portfolio volatility. Investors compute a covariance matrix across all candidate assets and use it in mean-variance optimisation to find the portfolio mix that maximises expected return for a given risk level. Most portfolio software automates this, but understanding covariance helps you recognise why some asset combinations reduce risk better than others.
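The effect of covariance on risk shows up in the portfolio variance formula wᵀΣw, where w holds the asset weights and Σ is the covariance matrix. This sketch uses a hypothetical 50/50 split between two assets with identical made-up variances; it only evaluates risk for fixed weights and does not perform mean-variance optimisation.

```python
def portfolio_variance(weights, cov_matrix):
    """Compute w^T Σ w for a list of weights and a covariance matrix (list of lists)."""
    k = len(weights)
    return sum(
        weights[i] * weights[j] * cov_matrix[i][j]
        for i in range(k)
        for j in range(k)
    )


w = [0.5, 0.5]             # hypothetical equal weights
var_a, var_b = 0.04, 0.04  # hypothetical return variances (20% volatility each)

for cov_ab in (0.02, 0.0, -0.02):
    sigma = [[var_a, cov_ab], [cov_ab, var_b]]
    print(cov_ab, round(portfolio_variance(w, sigma), 4))
```

With these numbers the portfolio variance drops from about 0.03 to 0.02 to 0.01 as the covariance moves from positive to zero to negative, even though each asset's own risk never changes.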
