Part I — Introduction & OLS Estimation
Lisbon Accounting and Business School — Polytechnic University of Lisbon
Lecture 8 — Introduction & OLS Estimation
Note
Reference: Newbold, Carlson & Thorne — Statistics for Business and Economics, Chapters 11–12.
In many situations, we observe that variables tend to move together:
Accounting & Finance
Economics & Management
Regression analysis lets us:
Deterministic: \[C = 2\pi r\]
Every value of \(r\) gives exactly one \(C\).
No randomness. No need for statistics.
Statistical: \[\text{Profit}_i = \beta_0 + \beta_1\,\text{Revenue}_i + \varepsilon_i\]
Same revenue \(\rightarrow\) different profit across firms.
The error \(\varepsilon_i\) captures everything else.
Important
Regression models the conditional mean of \(Y\) given \(X\):
\[E[Y \mid X = x] = \beta_0 + \beta_1\,x\]
We do not claim \(X\) determines \(Y\) exactly — only on average.
| Role | Name | Also called | Symbol |
|---|---|---|---|
| What we explain | Dependent variable | Response, regressand | \(Y\) |
| What explains it | Independent variable | Regressor, predictor | \(X\) |
Running example (hypothetical data throughout):
A firm’s operating profit (\(Y\), m.u.) as a function of sales revenue (\(X\), m.u.).
Note
All numerical examples use hypothetical data constructed for illustration purposes only.
Before fitting any model, always plot the data. The scatter diagram displays the \(n\) observed pairs \((x_i, y_i)\).
From the scatter diagram we assess:
Important
Always plot before you fit.
A high \(R^2\) on a non-linear or structurally misspecified relationship can be meaningless. Classic illustration: Anscombe’s Quartet.
All four datasets give \(\hat{\beta}_0 \approx 3.0\), \(\hat{\beta}_1 \approx 0.50\) and \(R^2 \approx 0.67\), but a straight line is an appropriate description only for Dataset 1.
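The point can be checked directly, since Anscombe's Quartet ships with base R as the data frame `anscombe`. The sketch below fits the same simple regression to each of the four datasets and then plots them; the helper name `fit_one` is an illustrative choice, not something from the lecture.

```r
# Anscombe's Quartet is available in base R (package datasets) as `anscombe`,
# with columns x1..x4 and y1..y4.
data(anscombe)

# Fit y_k ~ x_k for dataset k and return intercept, slope and R-squared.
# `fit_one` is a hypothetical helper name used only for this sketch.
fit_one <- function(k) {
  x <- anscombe[[paste0("x", k)]]
  y <- anscombe[[paste0("y", k)]]
  fit <- lm(y ~ x)
  c(intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared)
}

quartet <- t(sapply(1:4, fit_one))
rownames(quartet) <- paste("Dataset", 1:4)
print(round(quartet, 2))

# The summary numbers agree, so only the scatter diagrams reveal the differences.
op <- par(mfrow = c(2, 2))
for (k in 1:4) {
  x <- anscombe[[paste0("x", k)]]
  y <- anscombe[[paste0("y", k)]]
  plot(x, y, main = paste("Dataset", k))
  abline(lm(y ~ x))
}
par(op)
```

The table is nearly identical across rows, while the four panels look nothing alike, which is why the plot has to come before the fit.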
The Simple Linear Regression (SLR) model:
\[\boxed{Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n}\]
| Symbol | Name | Meaning |
|---|---|---|
| \(Y_i\) | Dependent variable | Observed outcome for unit \(i\) |
| \(X_i\) | Independent variable | Known regressor for unit \(i\) |
| \(\beta_0\) | Intercept | Mean of \(Y\) when \(X = 0\) |
| \(\beta_1\) | Slope | Change in mean \(Y\) per unit increase in \(X\) |
| \(\varepsilon_i\) | Error term | All other factors affecting \(Y_i\) |
Note
\(\beta_0\) and \(\beta_1\) are unknown population parameters. Our goal is to estimate them from data as \(\hat{\beta}_0\) and \(\hat{\beta}_1\).
\[Y_i = \underbrace{\beta_0 + \beta_1 X_i}_{\text{systematic part}} + \underbrace{\varepsilon_i}_{\text{random part}}\]
The error captures:
Important
The error is not a mistake in the model — it is an unavoidable feature of any statistical relationship. What matters is that we make appropriate assumptions about its behaviour.
For OLS to have good statistical properties, we require:
| # | Name | Statement |
|---|---|---|
| A1 | Linearity | \(Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i\) |
| A2 | Zero mean | \(E[\varepsilon_i] = 0 \;\forall\, i\) |
| A3 | Homoscedasticity | \(\text{Var}(\varepsilon_i) = \sigma^2 \;\forall\, i\) |
| A4 | No autocorrelation | \(\text{Cov}(\varepsilon_i, \varepsilon_j) = 0 \;\forall\, i \neq j\) |
| A5 | Exogeneity | \(X_i\) is fixed or independent of \(\varepsilon_i\) |
| A6 | Normality | \(\varepsilon_i \sim N(0,\, \sigma^2)\) |
Note
A1–A5 are the Gauss–Markov conditions. Under them, OLS is BLUE (Best Linear Unbiased Estimator). A6 is additionally required for exact \(t\)- and \(F\)-tests in small samples.
Under A1–A6, for each \(i\):
\[Y_i \mid X_i \;\sim\; N\!\left(\beta_0 + \beta_1 X_i,\; \sigma^2\right)\]
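One way to see what this distributional statement means is to simulate from the model. This is a minimal sketch; the parameter values (\(\beta_0 = 20\), \(\beta_1 = 0.15\), \(\sigma = 15\)) and the revenue levels are assumptions chosen for illustration and are not taken from the lecture data.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only

beta0 <- 20                      # assumed intercept (illustrative)
beta1 <- 0.15                    # assumed slope (illustrative)
sigma <- 15                      # assumed error standard deviation (illustrative)

# Five revenue levels, each observed for 2000 simulated firms.
levels_x <- c(100, 200, 300, 400, 500)
revenue  <- rep(levels_x, each = 2000)

# A2, A3, A4, A6: independent normal errors with mean 0 and constant variance.
eps    <- rnorm(length(revenue), mean = 0, sd = sigma)
profit <- beta0 + beta1 * revenue + eps   # A1: linear model plus error

# Within each revenue level profit differs across firms, but its average
# should be close to beta0 + beta1 * x and its spread close to sigma.
sim_mean <- tapply(profit, revenue, mean)
sim_sd   <- tapply(profit, revenue, sd)
print(cbind(model_mean = beta0 + beta1 * levels_x, sim_mean, sim_sd))
```

Firms with the same revenue end up with different profits, yet the group averages track the line and the group standard deviations stay roughly constant, as A3 requires.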
Given data \((x_1,y_1),\ldots,(x_n,y_n)\), the fitted line is:
\[\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i\]
The residual for observation \(i\):
\[e_i = Y_i - \hat{Y}_i\]
Important
Ordinary Least Squares (OLS): choose \(\hat{\beta}_0\) and \(\hat{\beta}_1\) to minimise the Sum of Squared Residuals:
\[\min_{\hat{\beta}_0,\,\hat{\beta}_1} \;\text{SSR} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2\]
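Why square the residuals? Squaring treats positive and negative residuals alike, penalises large errors more heavily than small ones, and leads to a closed-form solution that is unique whenever the \(X_i\) are not all equal.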
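The minimisation can be illustrated numerically. The sketch below uses five made-up observations (not the lecture's 20-observation sample) and compares the SSR of the OLS line with the SSR of two arbitrary alternative lines; the function name `ssr` is an illustrative choice.

```r
# Toy data made up for this sketch (not the lecture's dataset).
revenue <- c(100, 200, 300, 400, 500)
profit  <- c(20, 40, 50, 40, 50)

# Sum of squared residuals for a candidate line with intercept b0 and slope b1.
ssr <- function(b0, b1) sum((profit - b0 - b1 * revenue)^2)

ols     <- unname(coef(lm(profit ~ revenue)))  # OLS estimates for the toy data
ssr_ols <- ssr(ols[1], ols[2])

# Two arbitrary alternative lines: a steeper one and a lower one.
ssr_steeper <- ssr(22, 0.07)
ssr_lower   <- ssr(20, 0.06)

print(c(ols = ssr_ols, steeper = ssr_steeper, lower = ssr_lower))
```

For these toy numbers the OLS line has SSR 240, against 295 and 260 for the two alternatives; no other choice of intercept and slope does better.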
Taking partial derivatives of SSR and setting them to zero yields:
\[\boxed{\hat{\beta}_1 = \frac{\displaystyle\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\displaystyle\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{S_{XY}}{S_{XX}}}\]
\[\boxed{\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\,\bar{X}}\]
where \(S_{XY} = \sum(X_i-\bar{X})(Y_i-\bar{Y})\) and \(S_{XX} = \sum(X_i-\bar{X})^2\).
Note
Since \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}\), the fitted line always passes through \((\bar{X},\,\bar{Y})\).
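The formulas can be applied by hand and checked against `lm()`. A minimal sketch with five made-up observations (not the lecture's dataset); the object names are illustrative.

```r
# Toy data made up for this sketch (not the lecture's dataset).
revenue <- c(100, 200, 300, 400, 500)
profit  <- c(20, 40, 50, 40, 50)

x_bar <- mean(revenue)                              # 300
y_bar <- mean(profit)                               # 40

S_xy <- sum((revenue - x_bar) * (profit - y_bar))   # 6000
S_xx <- sum((revenue - x_bar)^2)                    # 100000

b1 <- S_xy / S_xx                                   # slope: 0.06
b0 <- y_bar - b1 * x_bar                            # intercept: 22

# lm() should return the same two numbers.
fit <- lm(profit ~ revenue)
print(c(b0 = b0, b1 = b1))
print(coef(fit))

# The fitted line passes through the point of means: this equals y_bar.
fitted_at_mean <- b0 + b1 * x_bar
print(fitted_at_mean)
```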
Minimise \(\text{SSR} = \sum(Y_i - \beta_0 - \beta_1 X_i)^2\):
\[\frac{\partial\,\text{SSR}}{\partial \beta_0} = -2\sum(Y_i - \beta_0 - \beta_1 X_i) = 0\]
\[\frac{\partial\,\text{SSR}}{\partial \beta_1} = -2\sum X_i(Y_i - \beta_0 - \beta_1 X_i) = 0\]
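Rearranging, and writing the values that solve both conditions as \(\hat{\beta}_0\) and \(\hat{\beta}_1\), gives the Normal Equations: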
\[\sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum X_i\]
\[\sum X_i Y_i = \hat{\beta}_0 \sum X_i + \hat{\beta}_1 \sum X_i^2\]
Solving simultaneously gives the formulas on the previous slide.
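The Normal Equations are a system of two linear equations in two unknowns, so they can also be solved numerically. The sketch below sets the system up as a matrix equation and uses base R's `solve()`; the five observations are made up for illustration and are not the lecture's data.

```r
# Toy data made up for this sketch (not the lecture's dataset).
revenue <- c(100, 200, 300, 400, 500)
profit  <- c(20, 40, 50, 40, 50)
n <- length(revenue)

# Left-hand side coefficients of the two Normal Equations:
#   n        * b0 + sum(X)   * b1 = sum(Y)
#   sum(X)   * b0 + sum(X^2) * b1 = sum(X * Y)
A <- matrix(c(n,            sum(revenue),
              sum(revenue), sum(revenue^2)),
            nrow = 2, byrow = TRUE)
rhs <- c(sum(profit), sum(revenue * profit))

beta_hat <- solve(A, rhs)     # c(intercept, slope)
print(beta_hat)

# Same result as the closed-form OLS estimates computed by lm().
print(coef(lm(profit ~ revenue)))
```

For the toy data the solution is an intercept of 22 and a slope of 0.06, matching what \(S_{XY}/S_{XX}\) and \(\bar{Y} - \hat{\beta}_1\bar{X}\) give.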
The estimated regression equation:
\[\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i\]
Slope \(\hat{\beta}_1\):
For every one-unit increase in \(X\), the estimated mean of \(Y\) changes by \(\hat{\beta}_1\) units.
This is a marginal effect — the effect of \(X\) on the average \(Y\).
Intercept \(\hat{\beta}_0\):
The estimated mean of \(Y\) when \(X = 0\).
Often not directly meaningful — \(X = 0\) may be outside the range of the data.
Important
\(\hat{\beta}_1\) and \(\hat{\beta}_0\) describe averages in the sample. They are estimates of the unknown population parameters \(\beta_1\) and \(\beta_0\).
Hypothetical data: 20 observations on sales revenue (\(X\)) and operating profit (\(Y\)), both in m.u.
(Intercept) revenue
21.8268848 0.1435997
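Interpretation:
Slope: each additional m.u. of sales revenue is associated with an estimated increase of about 0.144 m.u. in mean operating profit.
Intercept: the estimated mean operating profit at zero revenue is about 21.83 m.u.; this is only meaningful if \(X = 0\) lies within the range of the data.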
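The reported coefficients can be turned into a fitted value by plugging a revenue figure into the estimated equation. In this sketch the revenue of 300 m.u. is a hypothetical input, assumed to lie within the observed range of the data; only the two coefficients come from the output above.

```r
# Coefficients as reported in the output above.
b0 <- 21.8268848   # (Intercept)
b1 <- 0.1435997    # revenue

# Hypothetical revenue level, assumed to be inside the observed data range.
revenue_new <- 300

# Estimated mean operating profit at that revenue: b0 + b1 * x.
profit_hat <- b0 + b1 * revenue_new
print(profit_hat)           # about 64.91 m.u.

# Estimated change in mean profit for a 10 m.u. increase in revenue.
delta_10 <- b1 * 10
print(delta_10)             # about 1.44 m.u.
```

With a fitted model object, `predict(model, newdata = data.frame(revenue = 300))` would return the same fitted value.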
Call:
lm(formula = profit ~ revenue)
Residuals:
Min 1Q Median 3Q Max
-29.169 -10.033 -3.226 11.177 27.235
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 21.82688 9.99146 2.185 0.042392 *
revenue 0.14360 0.03113 4.612 0.000216 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.32 on 18 degrees of freedom
Multiple R-squared: 0.5417, Adjusted R-squared: 0.5162
F-statistic: 21.27 on 1 and 18 DF, p-value: 0.0002164
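Several entries in this output are linked by simple formulas, which makes the table easier to read. The sketch below recomputes them from the printed (rounded) numbers, so small discrepancies in the last digit are expected; the object names are illustrative.

```r
# Values copied from the printed summary above (already rounded).
est_slope <- 0.14360
se_slope  <- 0.03113
df_resid  <- 18        # n - 2 with n = 20
n         <- 20
r2        <- 0.5417
s         <- 15.32     # residual standard error

# t value = estimate / standard error (about 4.61).
t_slope <- est_slope / se_slope

# Two-sided p-value from the t distribution with n - 2 degrees of freedom.
p_slope <- 2 * pt(-abs(t_slope), df = df_resid)

# In simple regression the F-statistic equals the squared slope t value.
f_stat <- t_slope^2

# Adjusted R-squared for one regressor.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - 2)

# The residual standard error implies the sum of squared residuals.
ssr_approx <- s^2 * df_resid

print(c(t = t_slope, p = p_slope, F = f_stat,
        adj_r2 = adj_r2, ssr = ssr_approx))
```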
These algebraic identities hold for every OLS fit that includes an intercept:
\[\sum_{i=1}^{n} e_i = 0\]
\[\sum_{i=1}^{n} X_i\, e_i = 0\]
\[\sum_{i=1}^{n} \hat{Y}_i\, e_i = 0\]
They are not assumptions — they are guaranteed by the OLS first-order conditions. The residuals automatically sum to zero and are uncorrelated with \(X\) and with \(\hat{Y}\).
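These identities can be verified on any fitted model. A minimal sketch with five made-up observations (not the lecture's dataset); the sums are zero up to floating-point rounding. The last lines show that the first identity relies on the intercept being in the model.

```r
# Toy data made up for this sketch (not the lecture's dataset).
revenue <- c(100, 200, 300, 400, 500)
profit  <- c(20, 40, 50, 40, 50)

fit   <- lm(profit ~ revenue)
e     <- resid(fit)      # residuals e_i
y_hat <- fitted(fit)     # fitted values

# All three sums are zero up to rounding error.
sum_e    <- sum(e)
sum_xe   <- sum(revenue * e)
sum_yhat <- sum(y_hat * e)
print(c(sum_e = sum_e, sum_xe = sum_xe, sum_yhat_e = sum_yhat))

# Without an intercept (regression through the origin) the residuals
# no longer have to sum to zero.
fit0   <- lm(profit ~ 0 + revenue)
sum_e0 <- sum(resid(fit0))
print(sum_e0)
```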
The population error variance \(\sigma^2\) is unknown. We estimate it with:
\[\hat{\sigma}^2 = s^2 = \frac{\text{SSR}}{n - 2} = \frac{\displaystyle\sum_{i=1}^{n} e_i^2}{n - 2}\]
We divide by \(n - 2\) because estimating the two parameters \(\beta_0\) and \(\beta_1\) uses up two degrees of freedom.
The standard error of the regression (residual standard error in R):
\[s = \sqrt{\frac{\displaystyle\sum e_i^2}{n-2}}\]
In R: summary(model)$sigma
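The formula can be checked against R's own value. A minimal sketch with five made-up observations (not the lecture's dataset), so \(n - 2 = 3\) here.

```r
# Toy data made up for this sketch (not the lecture's dataset).
revenue <- c(100, 200, 300, 400, 500)
profit  <- c(20, 40, 50, 40, 50)

fit <- lm(profit ~ revenue)
e   <- resid(fit)
n   <- length(profit)

s2 <- sum(e^2) / (n - 2)   # estimated error variance: 240 / 3 = 80
s  <- sqrt(s2)             # standard error of the regression, about 8.94

# R reports the same quantity as the residual standard error.
print(c(by_hand = s, from_summary = summary(fit)$sigma))
```

Dividing by \(n\) instead of \(n - 2\) would give 48 rather than 80 for these toy numbers, which understates the error variance.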
lm(Y ~ X) fits the model; summary() shows coefficients, standard errors, and \(p\)-values.
Lecture 9, Part II: Goodness of Fit, OLS Properties & Introduction to MLR
Note
Reference: Newbold Ch. 11.4–11.5, Ch. 12.1–12.3.
Statistics II — Linear Regression: Introduction & OLS