Part II — Goodness of Fit, OLS Properties & Multiple Regression
Lisbon Accounting and Business School — Polytechnic University of Lisbon
Lecture 9 — Goodness of Fit, OLS Properties & Multiple Regression
Note
Reference: Newbold Ch. 11.4–11.5 (SLR fit), Ch. 12.1–12.3 (MLR).
For each observation \(i\), we can write:
\[Y_i - \bar{Y} = \underbrace{(\hat{Y}_i - \bar{Y})}_{\text{explained by regression}} + \underbrace{(Y_i - \hat{Y}_i)}_{\text{residual}}\]
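Squaring and summing over all observations (the cross-product term vanishes because, with an intercept in the model, OLS residuals sum to zero and are uncorrelated with the fitted values):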
\[\underbrace{\sum(Y_i - \bar{Y})^2}_{\text{SST}} = \underbrace{\sum(\hat{Y}_i - \bar{Y})^2}_{\text{SSE}} + \underbrace{\sum(Y_i - \hat{Y}_i)^2}_{\text{SSR}}\]
| Term | Name | Meaning |
|---|---|---|
| SST | Total Sum of Squares | Total variation in \(Y\) |
| SSE | Explained Sum of Squares | Variation explained by \(X\) |
| SSR | Residual Sum of Squares | Variation not explained by \(X\) |
The coefficient of determination measures the proportion of total variation in \(Y\) explained by the regression:
\[\boxed{R^2 = \frac{\text{SSE}}{\text{SST}} = 1 - \frac{\text{SSR}}{\text{SST}}}\]
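Key properties: \(0 \leq R^2 \leq 1\) when the model includes an intercept; \(R^2 = 1\) means every observation lies exactly on the fitted line; \(R^2 = 0\) means the regression explains none of the variation in \(Y\).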
Interpretation:
“The model explains \(R^2 \times 100\%\) of the total variation in \(Y\).”
An \(R^2\) of 0.82 means 82% of the variation in \(Y\) is explained by \(X\) via the estimated model.
Important
Use \(R^2\) as one indicator of fit — always combine it with the scatter diagram, residual plots, and hypothesis tests.
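The decomposition and the two expressions for \(R^2\) can be checked numerically. The following is a minimal sketch in R on simulated data; the variables `x` and `y` and all parameter values are hypothetical and are not the lecture's running example.

```r
# Hypothetical data: x plays the role of revenue, y of operating profit.
set.seed(1)
x <- runif(40, min = 100, max = 500)
y <- 20 + 0.15 * x + rnorm(40, sd = 15)

fit   <- lm(y ~ x)
y_hat <- fitted(fit)

sst <- sum((y - mean(y))^2)      # total variation in y
sse <- sum((y_hat - mean(y))^2)  # variation explained by the regression
ssr <- sum((y - y_hat)^2)        # residual (unexplained) variation

# SST = SSE + SSR holds because the model includes an intercept.
all.equal(sst, sse + ssr)

# Both forms of R^2 agree with the value reported by summary().
r2_explained <- sse / sst
r2_residual  <- 1 - ssr / sst
all.equal(r2_explained, summary(fit)$r.squared)
all.equal(r2_residual,  summary(fit)$r.squared)
```

Each `all.equal()` call compares two numbers up to floating-point tolerance, which is the appropriate check for identities that hold exactly in theory.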
In SLR, the sample Pearson correlation coefficient is:
\[r = \frac{\displaystyle\sum(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\displaystyle\sum(X_i-\bar{X})^2 \cdot \sum(Y_i-\bar{Y})^2}} = \frac{S_{XY}}{\sqrt{S_{XX} \cdot S_{YY}}}\]
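Properties: \(-1 \leq r \leq 1\); the sign of \(r\) matches the sign of the estimated slope \(\hat{\beta}_1\); \(|r| = 1\) only when all points lie exactly on a straight line; \(r\) is unit-free and symmetric in \(X\) and \(Y\).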
Link to \(R^2\):
In SLR (one regressor): \[R^2 = r^2\]
The coefficient of determination equals the square of the correlation coefficient.
Note
Correlation \(\neq\) causation. A high \(|r|\) tells us the variables move together linearly — nothing more.
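The formula for \(r\) and the identity \(R^2 = r^2\) can be verified with a short sketch in R. The data below are simulated for illustration only (an assumption, not the lecture's data set).

```r
# Hypothetical data for one regressor.
set.seed(2)
x <- rnorm(30, mean = 50, sd = 10)
y <- 5 + 0.8 * x + rnorm(30, sd = 8)

# Sums of squares and cross-products, as in the formula for r.
s_xy <- sum((x - mean(x)) * (y - mean(y)))
s_xx <- sum((x - mean(x))^2)
s_yy <- sum((y - mean(y))^2)

r_manual <- s_xy / sqrt(s_xx * s_yy)
all.equal(r_manual, cor(x, y))   # matches R's built-in Pearson correlation

# In SLR the coefficient of determination is the squared correlation.
r2 <- summary(lm(y ~ x))$r.squared
all.equal(r2, r_manual^2)
```

This equality is specific to a single regressor; with several regressors \(R^2\) is no longer the square of any one pairwise correlation.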
Continuing with our running example:
[1] 0.5416654
[1] 0.7359792
[1] 0.5416654
Interpretation: the model explains 54.2% of the total variation in operating profit. The correlation between revenue and profit is \(r = 0.736\) — a strong positive linear association.
Important
Gauss–Markov Theorem:
Under assumptions A1–A5, the OLS estimators \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are BLUE: the Best (minimum-variance) Linear Unbiased Estimators.
Unbiasedness means: if we drew many independent samples and re-estimated the model on each, the average of the \(\hat{\beta}_j\) values would equal the true \(\beta_j\).
Minimum variance means: no other linear unbiased estimator can be more precise than OLS.
We can show that, under A1–A5:
\[E[\hat{\beta}_1] = \beta_1, \qquad E[\hat{\beta}_0] = \beta_0\]
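The intuition: unbiasedness rests on the errors having zero mean conditional on \(X\). Because the error term \(\varepsilon_i\) is then unrelated to \(X\) on average, it does not “leak” into the estimates. (OLS residuals are uncorrelated with \(X\) by construction through the first-order conditions, but that alone does not guarantee unbiasedness.)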
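The key step for the slope can be sketched as follows, assuming the SLR model \(Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i\) and that the zero-conditional-mean condition \(E[\varepsilon_i \mid X] = 0\) is among A1–A5 (the assumptions are not listed in this section, so that labelling is an assumption). Substituting the model into the OLS formula and using \(\sum(X_i - \bar{X}) = 0\):

\[\hat{\beta}_1 = \frac{S_{XY}}{S_{XX}} = \frac{\sum(X_i - \bar{X})Y_i}{S_{XX}} = \beta_1 + \frac{\sum(X_i - \bar{X})\varepsilon_i}{S_{XX}}\]

Taking expectations conditional on the \(X\) values:

\[E[\hat{\beta}_1 \mid X] = \beta_1 + \frac{\sum(X_i - \bar{X})\,E[\varepsilon_i \mid X]}{S_{XX}} = \beta_1\]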
Variance of the slope estimator:
\[\text{Var}(\hat{\beta}_1) = \frac{\sigma^2}{S_{XX}} = \frac{\sigma^2}{\displaystyle\sum(X_i-\bar{X})^2}\]
Note
The larger \(S_{XX}\) (more spread in \(X\)), the smaller the variance of \(\hat{\beta}_1\) — more variation in \(X\) helps us pin down the slope more precisely.
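Both results, unbiasedness and the variance formula, can be illustrated by simulation. The sketch below is a hypothetical Monte Carlo experiment in R: the true parameters (`beta0`, `beta1`, `sigma`) and the design are assumed values chosen for illustration.

```r
# Assumed "true" model for the simulation (hypothetical values).
set.seed(3)
n     <- 50
beta0 <- 1
beta1 <- 0.5
sigma <- 2
x     <- runif(n, 0, 10)            # regressor values held fixed across samples
s_xx  <- sum((x - mean(x))^2)

# Draw 2000 samples of y and re-estimate the slope each time.
slopes <- replicate(2000, {
  y <- beta0 + beta1 * x + rnorm(n, sd = sigma)
  coef(lm(y ~ x))[["x"]]
})

mean(slopes)       # should be close to beta1 (unbiasedness)
var(slopes)        # should be close to the theoretical variance below
sigma^2 / s_xx     # Var(beta1_hat) = sigma^2 / S_XX
```

Rerunning with a wider range for `x` (for example `runif(n, 0, 20)`) increases `s_xx` and shrinks the spread of the estimated slopes, which is the point of the note above.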
Suppose the true model is:
\[Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i\]
But we estimate only:
\[Y_i = \alpha_0 + \alpha_1 X_{1i} + u_i \qquad \text{(misspecified model)}\]
What is \(E[\hat{\alpha}_1]\)? One can show:
\[E[\hat{\alpha}_1] = \beta_1 + \beta_2 \cdot \frac{S_{X_1 X_2}}{S_{X_1 X_1}}\]
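A sketch of the derivation, assuming the errors of the true model satisfy \(E[\varepsilon_i \mid X_1, X_2] = 0\) and taking expectations conditional on the regressors. The short-regression slope is \(\hat{\alpha}_1 = S_{X_1 Y}/S_{X_1 X_1}\). Substituting the true model for \(Y_i\) gives:

\[S_{X_1 Y} = \sum(X_{1i} - \bar{X}_1)Y_i = \beta_1 S_{X_1 X_1} + \beta_2 S_{X_1 X_2} + \sum(X_{1i} - \bar{X}_1)\varepsilon_i\]

Dividing by \(S_{X_1 X_1}\):

\[\hat{\alpha}_1 = \beta_1 + \beta_2 \cdot \frac{S_{X_1 X_2}}{S_{X_1 X_1}} + \frac{\sum(X_{1i} - \bar{X}_1)\varepsilon_i}{S_{X_1 X_1}}\]

The last term has conditional expectation zero, which leaves the expression above.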
Important
Omitted Variable Bias (OVB):
\[\text{Bias} = E[\hat{\alpha}_1] - \beta_1 = \beta_2 \cdot \frac{S_{X_1 X_2}}{S_{X_1 X_1}}\]
The short-regression estimator \(\hat{\alpha}_1\) is biased and inconsistent for \(\beta_1\) whenever \(\beta_2 \neq 0\) and \(X_1\) and \(X_2\) are correlated.
The bias is non-zero only when both conditions hold:
Condition 1: \(\beta_2 \neq 0\)
The omitted variable \(X_2\) actually affects \(Y\).
(If it doesn’t matter, leaving it out is harmless.)
Condition 2: \(S_{X_1 X_2} \neq 0\)
The omitted variable \(X_2\) is correlated with the included \(X_1\).
(If they are uncorrelated, OLS for \(\beta_1\) is still unbiased.)
Direction of bias: \(\text{sign}(\text{Bias}) = \text{sign}(\beta_2) \times \text{sign}(r_{X_1 X_2})\)
| \(\beta_2\) | \(\text{Corr}(X_1, X_2)\) | Bias direction |
|---|---|---|
| \(+\) | \(+\) | Upward (overestimate \(\beta_1\)) |
| \(+\) | \(-\) | Downward |
| \(-\) | \(+\) | Downward |
| \(-\) | \(-\) | Upward |
Hypothetical scenario: We want to estimate the effect of advertising spend \(X_1\) on sales \(Y\).
The true model includes firm size \(X_2\) (larger firms spend more and sell more).
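Short regression (firm size omitted), coefficient on adspend:
5.099163
Long regression (firm size included), coefficient on adspend:
2.313567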
The short regression overestimates the true effect of advertising (\(\beta_1 = 2\)) because it absorbs the positive effect of firm size. This is why we need MLR.
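A simulation along these lines can be sketched in R. Only the name `adspend` and the true effect \(\beta_1 = 2\) come from the text above; the variable `size`, the value \(\beta_2 = 3\), the sample size, and the seed are assumptions, so the numbers will not reproduce the output shown.

```r
# Hypothetical data-generating process (assumed values, except beta1 = 2).
set.seed(4)
n       <- 200
size    <- rnorm(n, mean = 10, sd = 2)           # firm size (X2)
adspend <- 1 + 0.5 * size + rnorm(n)             # larger firms advertise more (X1)
sales   <- 5 + 2 * adspend + 3 * size + rnorm(n, sd = 2)

short <- lm(sales ~ adspend)          # omits firm size
long  <- lm(sales ~ adspend + size)   # includes firm size

coef(short)[["adspend"]]   # biased upward: absorbs part of the size effect
coef(long)[["adspend"]]    # close to the true effect of 2

# In-sample version of the OVB formula:
# short slope = long slope + (long coefficient on size) * S_X1X2 / S_X1X1
delta         <- cov(adspend, size) / var(adspend)
implied_short <- coef(long)[["adspend"]] + coef(long)[["size"]] * delta
all.equal(coef(short)[["adspend"]], implied_short)
```

The final check is an exact algebraic identity in any sample, which is why the bias formula can be read directly off the two fitted regressions.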
The Multiple Linear Regression (MLR) model with \(k\) regressors:
\[\boxed{Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i}\]
| Symbol | Meaning |
|---|---|
| \(\beta_0\) | Intercept — mean of \(Y\) when all \(X_j = 0\) |
| \(\beta_j\) | Partial slope — change in mean \(Y\) per unit increase in \(X_j\), holding all other regressors constant (ceteris paribus) |
| \(\varepsilon_i\) | Error term — same assumptions as SLR |
Note
The key word in MLR is ceteris paribus (“all else equal”). \(\hat{\beta}_j\) measures the effect of \(X_j\) after controlling for all other included variables.
OLS still minimises \(\text{SSR} = \sum e_i^2\), but now in \(k+1\) dimensions.
The normal equations become a system of \(k+1\) linear equations — solved efficiently in matrix form:
\[\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\]
Note
You do not need to solve this by hand. In R, lm(Y ~ X1 + X2 + ... + Xk) handles everything. The interpretation principles are the same as SLR — only the ceteris paribus qualifier changes.
The Gauss–Markov theorem extends to MLR: under A1–A5 (appropriately generalised), OLS is still BLUE.
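To see that `lm()` and the matrix formula give the same answer, the following sketch solves the normal equations directly in R. The data are simulated and the names `x1`, `x2`, and `y` are hypothetical.

```r
# Hypothetical data with two regressors.
set.seed(5)
n  <- 30
x1 <- rnorm(n, mean = 200, sd = 40)
x2 <- rnorm(n, mean = 50,  sd = 10)
y  <- 20 + 0.13 * x1 + 0.6 * x2 + rnorm(n, sd = 12)

# Design matrix: a column of ones for the intercept, then the regressors.
X <- cbind(Intercept = 1, x1 = x1, x2 = x2)

# Solve (X'X) b = X'y; solve(A, b) avoids forming the inverse explicitly.
beta_hat <- solve(crossprod(X), crossprod(X, y))

fit <- lm(y ~ x1 + x2)
all.equal(as.numeric(beta_hat), as.numeric(coef(fit)), tolerance = 1e-6)
```

In practice `lm()` uses a QR decomposition rather than the normal equations because it is numerically more stable; the two approaches agree here up to rounding.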
Hypothetical example: Explain operating profit (\(Y\)) using both sales revenue (\(X_1\)) and number of employees (\(X_2\)).
(Intercept) revenue2 employees
21.4612459 0.1290236 0.6126250
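Interpretation: holding the number of employees constant, a one-unit increase in revenue is associated with an estimated 0.129-unit increase in mean operating profit; holding revenue constant, one additional employee is associated with an estimated 0.613-unit increase. The intercept (21.46) is the predicted profit when both regressors are zero.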
Call:
lm(formula = profit2 ~ revenue2 + employees)
Residuals:
Min 1Q Median 3Q Max
-21.8947 -9.0708 -0.1219 6.0029 25.3186
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 21.46125 9.51476 2.256 0.0324 *
revenue2 0.12902 0.02048 6.299 9.66e-07 ***
employees 0.61262 0.11789 5.196 1.80e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.43 on 27 degrees of freedom
Multiple R-squared: 0.6543, Adjusted R-squared: 0.6287
F-statistic: 25.55 on 2 and 27 DF, p-value: 5.923e-07
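In MLR, adding any variable, even a completely irrelevant one, can never decrease \(R^2\). OLS could always set the new coefficient to zero and reproduce the old fit, so SSR cannot rise while SST stays the same:
\[R^2 = 1 - \frac{\text{SSR}}{\text{SST}} \quad \Rightarrow \quad R^2 \text{ never decreases as we add regressors}\]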
This makes \(R^2\) unreliable for comparing models with different numbers of regressors.
Note
Quick demonstration:
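A minimal sketch in R, using simulated data (all names and values here are hypothetical): a regressor that is pure noise is added to a simple regression and the two \(R^2\) values are compared.

```r
# Hypothetical data: y depends on x only; noise is unrelated to y by construction.
set.seed(6)
n     <- 30
x     <- rnorm(n)
y     <- 1 + 2 * x + rnorm(n)
noise <- rnorm(n)

r2_without <- summary(lm(y ~ x))$r.squared
r2_with    <- summary(lm(y ~ x + noise))$r.squared

c(R2_without = r2_without, R2_with = r2_with)
r2_with >= r2_without   # holds for any added regressor, relevant or not
```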
The adjusted \(R^2\) penalises for the number of regressors \(k\):
\[\boxed{\bar{R}^2 = 1 - \frac{\text{SSR}/(n-k-1)}{\text{SST}/(n-1)} = 1 - (1-R^2)\frac{n-1}{n-k-1}}\]
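Properties: \(\bar{R}^2 \leq R^2\) whenever \(k \geq 1\); unlike \(R^2\), it can fall when a regressor is added; it can even be negative when the fit is very poor.
Use \(\bar{R}^2\) when: comparing models for the same dependent variable that differ in the number of regressors.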
Adj_R2_without Adj_R2_with
0.6286853 0.6145915
Adding the irrelevant noise variable increases plain \(R^2\) but decreases \(\bar{R}^2\) — correctly signalling that the additional variable does not improve the model.
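The adjusted \(R^2\) formula is simple enough to compute by hand. The sketch below defines a small helper, `adj_r2()` (a hypothetical name, not a built-in function), checks it against `summary()` on simulated data, and applies it to the rounded values reported in the regression output above (\(R^2 = 0.6543\), 27 residual degrees of freedom with \(k = 2\), hence \(n = 30\)).

```r
# Adjusted R^2 from plain R^2, sample size n, and number of regressors k.
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical data: one relevant regressor and one pure-noise regressor.
set.seed(7)
n     <- 30
x     <- rnorm(n)
y     <- 1 + 2 * x + rnorm(n)
noise <- rnorm(n)

s_without <- summary(lm(y ~ x))
s_with    <- summary(lm(y ~ x + noise))

# The helper reproduces the values reported by summary().
all.equal(adj_r2(s_without$r.squared, n, k = 1), s_without$adj.r.squared)
all.equal(adj_r2(s_with$r.squared,    n, k = 2), s_with$adj.r.squared)

# Applied to the rounded figures from the two-regressor output above.
round(adj_r2(0.6543, n = 30, k = 2), 4)
```

The last line uses the rounded \(R^2\), so it matches the reported adjusted value only to about four decimal places.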
Lecture 10 — Part III: Statistical Inference & Model Validation
Note
Reference: Newbold Ch. 11.6–11.8 (inference), Ch. 12.4–12.6 (MLR inference and diagnostics).
Statistics II — Linear Regression: Goodness of Fit & MLR