Understanding Pearson's Correlation Coefficient
Pearson's correlation coefficient, denoted r, measures the strength and direction of the linear relationship between two continuous variables. In a perfectly linear pairing, increasing one variable by a fixed amount changes the other by a consistent amount, whether the increase is from 1 to 2 or from 100 to 101. Classical examples include the link between study hours and exam scores, or ambient temperature and ice cream sales.
- Positive correlation: Both variables climb or fall together.
- Negative correlation: One rises while the other descends.
- No correlation: The variables show no linear tendency to move together.
The coefficient ranges from −1 to +1. Magnitudes closer to the extremes signal stronger linear relationships, while values near zero indicate weak or absent linear patterns. If r = 1 or −1, every observation sits precisely on the fitted regression line; at r = 0, no linear trend exists.
Pearson Correlation Formula
Pearson's r is formally the covariance between two variables divided by the product of their standard deviations. This captures both how variables co-vary and their respective spreads:
r = Σ(xᵢ − x̄)(yᵢ − ȳ) / ( √[Σ(xᵢ − x̄)²] × √[Σ(yᵢ − ȳ)²] )
- xᵢ, yᵢ: Individual paired data points
- x̄, ȳ: Means of the x and y values, respectively
- Σ: Sum across all n observations
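As a minimal sketch, the formula above can be translated term by term into Python. The function name pearson_r and the sample data (hours studied and exam scores for five students) are illustrative assumptions, not part of the original text.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed directly from the deviation-from-mean formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Deviations of each observation from its variable's mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    # Numerator: sum of products of paired deviations.
    numerator = sum(a * b for a, b in zip(dx, dy))
    # Denominator: product of the square roots of the summed squared deviations.
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))

    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Hypothetical data, made up for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
print(round(pearson_r(hours, scores), 3))
```

The explicit check for a zero denominator reflects that r is undefined when one variable never varies, since its standard deviation is then zero.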
Interpreting Your Result
The sign and magnitude of r work together to reveal the relationship's character:
- r between 0.8 and 1.0: Very strong positive linear relationship.
- r between 0.6 and 0.8: Strong positive linear relationship.
- r between 0.4 and 0.6: Moderate positive linear relationship.
- r between 0.2 and 0.4: Weak positive linear relationship.
- r between 0.0 and 0.2: Very weak or negligible linear relationship.
- Negative values: Apply the same thresholds to |r|; the negative sign indicates that the variables move in opposite directions.
These benchmarks follow Evans' convention (1996), though field-specific standards may vary. Always consider your domain context; a correlation of 0.5 might be exceptional in psychology yet routine in engineering.
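The bands above can be sketched as a small lookup function. The name describe_strength is hypothetical, and because the listed ranges share their endpoints, this sketch assumes each band includes its lower bound (so 0.6 counts as "strong").

```python
def describe_strength(r):
    """Map r to a verbal label using the bands listed above.

    Assumption: each band includes its lower bound, since the listed
    ranges overlap at their endpoints.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")

    magnitude = abs(r)
    if magnitude >= 0.8:
        strength = "very strong"
    elif magnitude >= 0.6:
        strength = "strong"
    elif magnitude >= 0.4:
        strength = "moderate"
    elif magnitude >= 0.2:
        strength = "weak"
    else:
        # Below 0.2 the direction is rarely meaningful, so it is not reported.
        return "very weak or negligible"

    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

print(describe_strength(0.7))
print(describe_strength(-0.85))
```

A label produced this way is only a starting point; as noted, what counts as strong depends on the field.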
Pearson Correlation and Linear Regression
Pearson's r connects directly to the coefficient of determination, denoted R², in simple linear regression. Squaring r yields R², the fraction of variance in one variable explained by the other. For example, if r = 0.7, then R² = 0.49, meaning 49% of the target variable's variation is accounted for by the predictor.
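One way to check this identity numerically is to fit a least-squares line, compute R² from the residuals, and compare it with r squared. The sketch below uses made-up data, and the helper names pearson_r and regression_r_squared are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def regression_r_squared(xs, ys):
    """R^2 of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variation left after the fit versus total variation in y.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data, made up for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = pearson_r(hours, scores)
# The two printed values should agree up to floating-point rounding.
print(r ** 2, regression_r_squared(hours, scores))
```

The equality holds for simple linear regression with a single predictor and an intercept; it is not a general property of models with several predictors.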
The regression slope also incorporates Pearson's coefficient: the slope a equals r multiplied by the ratio of the standard deviations (s_y / s_x). The slope therefore always has the same sign as r, and its steepness scales with both the strength of the correlation and the spread of y relative to x.
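The slope relationship can be verified in the same spirit. This sketch computes the slope once by ordinary least squares and once as r × (s_y / s_x), using sample standard deviations from Python's statistics module; the data and function names are illustrative assumptions.

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def slope_from_r(xs, ys):
    """Slope rebuilt from r and the ratio of sample standard deviations."""
    return pearson_r(xs, ys) * statistics.stdev(ys) / statistics.stdev(xs)

# Hypothetical data, made up for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

# Both routes should give the same slope up to floating-point rounding.
print(least_squares_slope(hours, scores), slope_from_r(hours, scores))
```

Either sample or population standard deviations work here, provided the same kind is used for both variables, because the normalizing factor cancels in the ratio.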
Common Pitfalls and Key Caveats
Misinterpreting correlation is among the most frequent statistical errors; here are critical safeguards.
- Correlation Does Not Imply Causation: A strong correlation between sunglasses sales and drowning rates does not mean eyewear causes drowning. Typically, a hidden third variable, here hot weather, drives both. Always investigate plausible causal mechanisms rather than assuming directionality from correlation alone.
- Outliers Distort Results Significantly: A single extreme data point can shift r substantially, especially in small samples. Plot your data before trusting the coefficient. If you suspect outliers, consider reporting both the standard Pearson correlation and a more robust alternative such as Spearman's rank correlation.
- Non-Linear Relationships Hide Below the Surface: Two variables may have a strong curved or parabolic relationship yet show r near zero. Pearson's coefficient only captures linear patterns. If your scatter plot reveals curvature or clusters, explore polynomial regression or non-parametric methods.
- Minimum Sample Size Matters for Reliability: With fewer than about 30 paired observations (a common rule of thumb), the estimate of r becomes noticeably less reliable. Small samples can yield misleading correlations by chance. Larger datasets provide more stable estimates and greater statistical power for hypothesis testing.
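Two of these pitfalls are easy to reproduce with toy numbers. The sketch below uses entirely made-up data: a perfect parabola whose r is zero, and a small, weakly related sample whose r jumps once a single extreme point is appended. The helper pearson_r is an illustrative assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Pitfall: a perfect but non-linear relationship (y = x^2 on symmetric x).
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x * x for x in curve_x]
r_curve = pearson_r(curve_x, curve_y)

# Pitfall: one outlier in a small sample (hypothetical values).
base_x = [1, 2, 3, 4, 5]
base_y = [3, 1, 4, 2, 3]
r_without = pearson_r(base_x, base_y)
r_with = pearson_r(base_x + [20], base_y + [20])

print("parabola:", r_curve)
print("without outlier:", round(r_without, 3))
print("with outlier:", round(r_with, 3))
```

In this toy case y is fully determined by x in the parabola, yet r reports no linear association, and the single added point turns a weak correlation into one that looks very strong. Plotting the data would expose both situations immediately.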