The Linear Regression Model

Part I — Introduction & OLS Estimation

Paulo Fagandini

Lisbon Accounting and Business School — Polytechnic University of Lisbon

Lecture Overview

Lecture 8 — Introduction & OLS Estimation

  • Why regression? Motivation and examples
  • Statistical vs. deterministic relationships
  • Scatter diagrams and linear correlation
  • The Simple Linear Regression (SLR) model
  • The Classical Assumptions (Gauss–Markov)
  • OLS: deriving \(\hat{\beta}_0\) and \(\hat{\beta}_1\)
  • Interpreting the estimated coefficients
  • Running a regression in R with lm()

Note

Reference: Newbold, Carlson & Thorne — Statistics for Business and Economics, Chapters 11–12.

Why Regression?

Motivation

In many situations, we observe that variables tend to move together:

Accounting & Finance

  • Sales revenue → Operating profit
  • Advertising spend → Market share
  • Firm leverage → Cost of debt
  • Total assets → Audit fees

Economics & Management

  • Disposable income → Consumption
  • Interest rates → Investment
  • Education → Earnings
  • Hours studied → Exam grade

Regression analysis lets us:

  1. Describe the average relationship between variables
  2. Quantify the size of that relationship
  3. Predict values of one variable from another

Statistical vs. Deterministic Relationships

Deterministic: \[C = 2\pi r\]

Every value of \(r\) gives exactly one \(C\).

No randomness. No need for statistics.

Statistical: \[\text{Profit}_i = \beta_0 + \beta_1\,\text{Revenue}_i + \varepsilon_i\]

Same revenue \(\rightarrow\) different profit across firms.

The error \(\varepsilon_i\) captures everything else.

Important

Regression models the conditional mean of \(Y\) given \(X\):

\[E[Y \mid X = x] = \beta_0 + \beta_1\,x\]

We do not claim \(X\) determines \(Y\) exactly — only on average.

Types of Variables

| Role | Name | Also called | Symbol |
|---|---|---|---|
| What we explain | Dependent variable | Response, regressand | \(Y\) |
| What explains it | Independent variable | Regressor, predictor | \(X\) |

  • \(Y\) is always quantitative.
  • \(X\) can be quantitative or qualitative (dummy variables — introduced in Part II).

Running example (hypothetical data throughout):

A firm’s operating profit (\(Y\), m.u.) as a function of sales revenue (\(X\), m.u.).

Note

All numerical examples use hypothetical data constructed for illustration purposes only.

Exploring the Relationship

The Scatter Diagram

Before fitting any model, always plot the data. The scatter diagram displays the \(n\) observed pairs \((x_i, y_i)\).

Reading the Scatter Diagram

From the scatter diagram we assess:

  • Direction: positive (both increase together) or negative (one increases, the other decreases)?
  • Form: linear, curved, or no discernible pattern?
  • Strength: how tightly do points cluster around the trend?
  • Outliers: any unusual observations that break the pattern?
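
As a minimal sketch of this step in R, the snippet below draws the scatter diagram and computes the sample correlation coefficient. It reuses the simulated revenue and profit series from this lecture's R demo (hypothetical data); the axis labels and plot title are illustrative choices, not part of the original material.

```r
# Hypothetical data, generated exactly as in the lecture's R demo
set.seed(5)
n       <- 20
revenue <- round(runif(n, 100, 500), 0)
profit  <- round(10 + 0.18 * revenue + rnorm(n, 0, 15), 1)

# Scatter diagram: one point per observed pair (x_i, y_i)
plot(revenue, profit,
     xlab = "Sales revenue (m.u.)",
     ylab = "Operating profit (m.u.)",
     main = "Operating profit vs. sales revenue",
     pch  = 19)

# Sample linear correlation: the sign gives the direction,
# the magnitude gives the strength of the linear association
r_xy <- cor(revenue, profit)
r_xy
```

The correlation summarises direction and strength only; form and outliers still have to be judged from the plot itself.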

Important

Always plot before you fit.

A high \(R^2\) on a non-linear or structurally misspecified relationship can be meaningless. Classic illustration: Anscombe’s Quartet.

Anscombe’s Quartet

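\(\hat{\beta}_0 \approx 3.0\), \(\hat{\beta}_1 \approx 0.50\), \(R^2 \approx 0.67\) for all four datasets, yet a straight line is an appropriate description only for Dataset 1. The other three show curvature, a single outlier, and a single high-leverage point, respectively.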
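
The quartet ships with base R as the data frame `anscombe` (columns `x1`–`x4`, `y1`–`y4`), so the claim can be checked directly. The sketch below fits the four regressions and plots them; the object names `fits` and `stats` are illustrative.

```r
# 'anscombe' is part of base R (datasets package): columns x1..x4 and y1..y4
fits <- lapply(1:4, function(k) {
  lm(reformulate(paste0("x", k), response = paste0("y", k)), data = anscombe)
})

# Collect intercept, slope and R^2 for each of the four datasets
stats <- t(sapply(fits, function(f) {
  c(intercept = unname(coef(f)[1]),
    slope     = unname(coef(f)[2]),
    r2        = summary(f)$r.squared)
}))
rownames(stats) <- paste("Dataset", 1:4)
round(stats, 2)

# Plot all four: the fitted lines nearly coincide, the point clouds do not
op <- par(mfrow = c(2, 2))
for (k in 1:4) {
  plot(anscombe[[paste0("x", k)]], anscombe[[paste0("y", k)]],
       xlab = "x", ylab = "y", main = paste("Dataset", k), pch = 19)
  abline(fits[[k]])
}
par(op)
```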

The SLR Model

Model Specification

The Simple Linear Regression (SLR) model:

\[\boxed{Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n}\]

| Symbol | Name | Meaning |
|---|---|---|
| \(Y_i\) | Dependent variable | Observed outcome for unit \(i\) |
| \(X_i\) | Independent variable | Known regressor for unit \(i\) |
| \(\beta_0\) | Intercept | Mean of \(Y\) when \(X = 0\) |
| \(\beta_1\) | Slope | Change in mean \(Y\) per unit increase in \(X\) |
| \(\varepsilon_i\) | Error term | All other factors affecting \(Y_i\) |

Note

\(\beta_0\) and \(\beta_1\) are unknown population parameters. Our goal is to estimate them from data as \(\hat{\beta}_0\) and \(\hat{\beta}_1\).

The Error Term \(\varepsilon_i\)

\[Y_i = \underbrace{\beta_0 + \beta_1 X_i}_{\text{systematic part}} + \underbrace{\varepsilon_i}_{\text{random part}}\]

The error captures:

  • Omitted variables — factors that influence \(Y\) but are not in the model
  • Measurement error — imprecision in recording \(Y_i\) or \(X_i\)
  • Inherent randomness — unpredictable variation in behaviour or outcomes

Important

The error is not a mistake in the model — it is an unavoidable feature of any statistical relationship. What matters is that we make appropriate assumptions about its behaviour.

The Classical Assumptions

Gauss–Markov Assumptions

For OLS to have good statistical properties, we require:

| # | Name | Statement |
|---|---|---|
| A1 | Linearity | \(Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i\) |
| A2 | Zero mean | \(E[\varepsilon_i] = 0 \;\forall\, i\) |
| A3 | Homoscedasticity | \(\text{Var}(\varepsilon_i) = \sigma^2 \;\forall\, i\) |
| A4 | No autocorrelation | \(\text{Cov}(\varepsilon_i, \varepsilon_j) = 0 \;\forall\, i \neq j\) |
| A5 | Exogeneity | \(X_i\) is fixed or independent of \(\varepsilon_i\) |
| A6 | Normality | \(\varepsilon_i \sim N(0,\, \sigma^2)\) |

Note

A1–A5 are the Gauss–Markov conditions. Under them, OLS is BLUE (Best Linear Unbiased Estimator). A6 is additionally required for exact \(t\)- and \(F\)-tests in small samples.

What the Assumptions Imply

Under A1–A6, for each \(i\):

\[Y_i \mid X_i \;\sim\; N\!\left(\beta_0 + \beta_1 X_i,\; \sigma^2\right)\]
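
A small simulation can make this concrete. The sketch below assumes illustrative parameter values (\(\beta_0 = 10\), \(\beta_1 = 0.18\), \(\sigma = 15\), the same hypothetical values used to generate the data in the R demo) and draws many values of \(Y\) at two fixed levels of \(X\). The helper `draw_y` is a hypothetical name introduced here.

```r
# Hypothetical parameter values (assumed for illustration only)
beta0 <- 10
beta1 <- 0.18
sigma <- 15

set.seed(1)

# Draw 'reps' values of Y at a fixed x: systematic part plus a normal error
draw_y <- function(x, reps = 10000) {
  beta0 + beta1 * x + rnorm(reps, mean = 0, sd = sigma)
}

y_200 <- draw_y(200)   # theoretical conditional mean: 10 + 0.18 * 200 = 46
y_400 <- draw_y(400)   # theoretical conditional mean: 10 + 0.18 * 400 = 82

# The mean shifts with x; the spread does not (homoscedasticity)
rbind(
  x_200 = c(mean = mean(y_200), sd = sd(y_200)),
  x_400 = c(mean = mean(y_400), sd = sd(y_400))
)
```

The simulated means should sit close to 46 and 82, and both standard deviations close to 15, while individual draws still vary widely around those means.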

OLS Estimation

The Idea: Minimise Squared Residuals

Given data \((x_1,y_1),\ldots,(x_n,y_n)\), the fitted line is:

\[\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i\]

The residual for observation \(i\):

\[e_i = Y_i - \hat{Y}_i\]

Important

Ordinary Least Squares (OLS): choose \(\hat{\beta}_0\) and \(\hat{\beta}_1\) to minimise the Sum of Squared Residuals:

\[\min_{\hat{\beta}_0,\,\hat{\beta}_1} \;\text{SSR} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2\]

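Why square the residuals? Squaring stops positive and negative residuals from cancelling, treats over- and under-prediction symmetrically, penalises large errors more heavily than small ones, and yields a unique closed-form solution (provided the \(X_i\) are not all equal).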
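
To see the criterion at work, the sketch below writes SSR as an R function of a candidate line and compares the OLS line with nearby alternatives. It uses the hypothetical revenue and profit data from the R demo; the function `ssr` and the perturbation grid are illustrative choices.

```r
# Hypothetical data, generated exactly as in the lecture's R demo
set.seed(5)
n       <- 20
revenue <- round(runif(n, 100, 500), 0)
profit  <- round(10 + 0.18 * revenue + rnorm(n, 0, 15), 1)

# SSR as a function of a candidate line b = c(intercept, slope)
ssr <- function(b) sum((profit - b[1] - b[2] * revenue)^2)

b_ols   <- unname(coef(lm(profit ~ revenue)))   # OLS estimates
ssr_ols <- ssr(b_ols)

# SSR of the line used to generate the data: cannot be below the OLS value
ssr_true_line <- ssr(c(10, 0.18))

# Perturb the OLS line on a small grid; no candidate achieves a lower SSR
grid     <- expand.grid(d0 = seq(-5, 5, by = 1),
                        d1 = seq(-0.05, 0.05, by = 0.01))
ssr_grid <- mapply(function(d0, d1) ssr(b_ols + c(d0, d1)), grid$d0, grid$d1)

c(ols = ssr_ols, generating_line = ssr_true_line, best_on_grid = min(ssr_grid))
```

Even the line that generated the data has an SSR at least as large as the OLS line, because OLS minimises SSR within the sample at hand.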

OLS: Visualising the Criterion
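
One way to draw such a picture, sketched with the hypothetical revenue and profit data from the R demo: each dashed vertical segment is a residual \(e_i\), and OLS picks the line that makes the sum of their squared lengths as small as possible. Plot labels and line styles are illustrative.

```r
# Hypothetical data, generated exactly as in the lecture's R demo
set.seed(5)
n       <- 20
revenue <- round(runif(n, 100, 500), 0)
profit  <- round(10 + 0.18 * revenue + rnorm(n, 0, 15), 1)

model <- lm(profit ~ revenue)

# Scatter diagram with the OLS line
plot(revenue, profit,
     xlab = "Sales revenue (m.u.)",
     ylab = "Operating profit (m.u.)",
     pch  = 19)
abline(model, lwd = 2)

# Vertical segments from each observation to the fitted line: the residuals e_i
segments(x0 = revenue, y0 = profit,
         x1 = revenue, y1 = fitted(model),
         lty = 2)
```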

OLS Formulas

Taking partial derivatives of SSR and setting them to zero yields:

\[\boxed{\hat{\beta}_1 = \frac{\displaystyle\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\displaystyle\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{S_{XY}}{S_{XX}}}\]

\[\boxed{\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\,\bar{X}}\]

where \(S_{XY} = \sum(X_i-\bar{X})(Y_i-\bar{Y})\) and \(S_{XX} = \sum(X_i-\bar{X})^2\).

Note

Since \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}\), the fitted line always passes through \((\bar{X},\,\bar{Y})\).
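
The formulas can be applied by hand and checked against `lm()`. This is a minimal sketch on the hypothetical data from the R demo; the names `S_xy`, `S_xx`, `b0` and `b1` are introduced here for illustration.

```r
# Hypothetical data, generated exactly as in the lecture's R demo
set.seed(5)
n       <- 20
revenue <- round(runif(n, 100, 500), 0)
profit  <- round(10 + 0.18 * revenue + rnorm(n, 0, 15), 1)

x_bar <- mean(revenue)
y_bar <- mean(profit)

# Sums of cross-products and squares in deviations from the means
S_xy <- sum((revenue - x_bar) * (profit - y_bar))
S_xx <- sum((revenue - x_bar)^2)

b1 <- S_xy / S_xx          # slope
b0 <- y_bar - b1 * x_bar   # intercept

# Compare with the built-in OLS fit
fit <- lm(profit ~ revenue)
rbind(by_hand = c(b0, b1), lm = unname(coef(fit)))

# The fitted line evaluated at x_bar returns y_bar
c(fitted_at_x_bar = b0 + b1 * x_bar, y_bar = y_bar)
```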

Deriving the OLS Formulas

Minimise \(\text{SSR} = \sum(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2\) with respect to \(\hat{\beta}_0\) and \(\hat{\beta}_1\):

\[\frac{\partial\,\text{SSR}}{\partial \hat{\beta}_0} = -2\sum(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0\]

\[\frac{\partial\,\text{SSR}}{\partial \hat{\beta}_1} = -2\sum X_i(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0\]

These are the Normal Equations:

\[\sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum X_i\]

\[\sum X_i Y_i = \hat{\beta}_0 \sum X_i + \hat{\beta}_1 \sum X_i^2\]

Solving simultaneously gives the formulas on the previous slide.
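
The intermediate steps, written out as a sketch. Dividing the first normal equation by \(n\) gives the intercept:

\[\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X} \quad\Longrightarrow\quad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\]

Substituting this into the second normal equation and using \(\sum X_i = n\bar{X}\):

\[\sum X_i Y_i = (\bar{Y} - \hat{\beta}_1 \bar{X})\, n\bar{X} + \hat{\beta}_1 \sum X_i^2 \quad\Longrightarrow\quad \hat{\beta}_1\left(\sum X_i^2 - n\bar{X}^2\right) = \sum X_i Y_i - n\bar{X}\bar{Y}\]

Finally, expanding the deviations from the means shows that \(\sum X_i^2 - n\bar{X}^2 = \sum(X_i - \bar{X})^2 = S_{XX}\) and \(\sum X_i Y_i - n\bar{X}\bar{Y} = \sum(X_i - \bar{X})(Y_i - \bar{Y}) = S_{XY}\), so:

\[\hat{\beta}_1 = \frac{S_{XY}}{S_{XX}}\]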

Interpreting the Estimates

Interpreting \(\hat{\beta}_1\) and \(\hat{\beta}_0\)

The estimated regression equation:

\[\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i\]

Slope \(\hat{\beta}_1\):

For every one-unit increase in \(X\), the estimated mean of \(Y\) changes by \(\hat{\beta}_1\) units.

This is a marginal effect — the effect of \(X\) on the average \(Y\).

Intercept \(\hat{\beta}_0\):

The estimated mean of \(Y\) when \(X = 0\).

Often not directly meaningful — \(X = 0\) may be outside the range of the data.

Important

\(\hat{\beta}_1\) and \(\hat{\beta}_0\) describe averages in the sample. They are estimates of the unknown population parameters \(\beta_1\) and \(\beta_0\).

R Demo: Fitting and Interpreting

Hypothetical data: 20 observations on sales revenue (\(X\)) and operating profit (\(Y\)), both in m.u.

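# Simulate hypothetical data: true intercept 10, true slope 0.18, error sd 15
set.seed(5)
n       <- 20
revenue <- round(runif(n, 100, 500), 0)
profit  <- round(10 + 0.18 * revenue + rnorm(n, 0, 15), 1)

# Fit the SLR model by OLS and print the estimated coefficients
model <- lm(profit ~ revenue)
coef(model)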
(Intercept)     revenue 
 21.8268848   0.1435997 

Interpretation:

  • \(\hat{\beta}_1 \approx 0.144\): each additional m.u. of revenue is associated with an average increase of about 0.144 m.u. in operating profit.
  • \(\hat{\beta}_0 \approx 21.83\): estimated mean operating profit (m.u.) when revenue = 0. The simulated revenues lie between 100 and 500, so this is an extrapolation; interpret cautiously.
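
The slope interpretation can be checked with `predict()`: the fitted mean profit at two revenue levels one unit apart differs by exactly the estimated slope. This sketch rebuilds the demo data so that it runs on its own; the revenue levels 300 and 301 are arbitrary illustrative values inside the sample range.

```r
# Hypothetical data and model, exactly as in the lecture's R demo
set.seed(5)
n       <- 20
revenue <- round(runif(n, 100, 500), 0)
profit  <- round(10 + 0.18 * revenue + rnorm(n, 0, 15), 1)
model   <- lm(profit ~ revenue)

# Fitted mean profit at two revenue levels that differ by one unit
new_data <- data.frame(revenue = c(300, 301))
pred     <- predict(model, newdata = new_data)
pred

# The difference between the two predictions is the estimated slope
slope_from_pred <- unname(diff(pred))
c(from_predictions = slope_from_pred,
  from_coef        = unname(coef(model)["revenue"]))
```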

R Demo: The Full Summary

summary(model)

Call:
lm(formula = profit ~ revenue)

Residuals:
    Min      1Q  Median      3Q     Max 
-29.169 -10.033  -3.226  11.177  27.235 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 21.82688    9.99146   2.185 0.042392 *  
revenue      0.14360    0.03113   4.612 0.000216 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.32 on 18 degrees of freedom
Multiple R-squared:  0.5417,    Adjusted R-squared:  0.5162 
F-statistic: 21.27 on 1 and 18 DF,  p-value: 0.0002164
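
The pieces of this printout can also be pulled out by name, which helps when reading it for the first time. A minimal sketch, rebuilding the demo data so it runs on its own; the object names `s`, `ct` and `t_by_hand` are illustrative.

```r
# Hypothetical data and model, exactly as in the lecture's R demo
set.seed(5)
n       <- 20
revenue <- round(runif(n, 100, 500), 0)
profit  <- round(10 + 0.18 * revenue + rnorm(n, 0, 15), 1)
model   <- lm(profit ~ revenue)

s  <- summary(model)
ct <- coef(s)   # coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
ct

# Each t value is simply the estimate divided by its standard error
t_by_hand <- ct[, "Estimate"] / ct[, "Std. Error"]
t_by_hand

# Other quantities from the bottom of the printout
s$r.squared            # Multiple R-squared
s$sigma                # Residual standard error
df.residual(model)     # Degrees of freedom: n - 2
```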

R Demo: The Fitted Line
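
A sketch of how the fitted line might be drawn over the scatter diagram with base R graphics. It rebuilds the demo data so it runs on its own; the labels, title and the cross marking \((\bar{X}, \bar{Y})\) are illustrative additions.

```r
# Hypothetical data and model, exactly as in the lecture's R demo
set.seed(5)
n       <- 20
revenue <- round(runif(n, 100, 500), 0)
profit  <- round(10 + 0.18 * revenue + rnorm(n, 0, 15), 1)
model   <- lm(profit ~ revenue)

# Scatter diagram of the observed pairs
plot(revenue, profit,
     xlab = "Sales revenue (m.u.)",
     ylab = "Operating profit (m.u.)",
     main = "Fitted OLS line",
     pch  = 19)

# abline() accepts an lm object and draws its fitted line
abline(model, lwd = 2)

# Mark the point of means, which the OLS line always passes through
points(mean(revenue), mean(profit), pch = 4, cex = 2, lwd = 2)
```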

Properties of OLS Residuals

These algebraic identities hold for every OLS fit that includes an intercept:

\[\sum_{i=1}^{n} e_i = 0\]

\[\sum_{i=1}^{n} X_i\, e_i = 0\]

\[\sum_{i=1}^{n} \hat{Y}_i\, e_i = 0\]

They are not assumptions: the first two are the OLS first-order conditions themselves, and the third follows because \(\hat{Y}_i\) is a linear function of \(X_i\). In the sample, the residuals therefore sum to zero and are uncorrelated with \(X\) and with \(\hat{Y}\).
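
A quick numerical check of the three identities, sketched on the hypothetical demo data. The sums are zero up to floating-point rounding, so they print as very small numbers rather than exact zeros.

```r
# Hypothetical data and model, exactly as in the lecture's R demo
set.seed(5)
n       <- 20
revenue <- round(runif(n, 100, 500), 0)
profit  <- round(10 + 0.18 * revenue + rnorm(n, 0, 15), 1)
model   <- lm(profit ~ revenue)

e     <- resid(model)    # residuals e_i
y_hat <- fitted(model)   # fitted values

identities <- c(
  sum_e       = sum(e),
  sum_x_e     = sum(revenue * e),
  sum_y_hat_e = sum(y_hat * e)
)
identities
```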

Estimating the Error Variance

The population error variance \(\sigma^2\) is unknown. We estimate it with:

\[\hat{\sigma}^2 = s^2 = \frac{\text{SSR}}{n - 2} = \frac{\displaystyle\sum_{i=1}^{n} e_i^2}{n - 2}\]

We divide by \(n - 2\) because estimating the two parameters \(\beta_0\) and \(\beta_1\) uses up two degrees of freedom.

The standard error of the regression (residual standard error in R):

\[s = \sqrt{\frac{\displaystyle\sum e_i^2}{n-2}}\]

In R: summary(model)$sigma
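
The same number can be reproduced from the residuals, as a sketch on the hypothetical demo data; `ssr`, `s2` and `s_by_hand` are names introduced here.

```r
# Hypothetical data and model, exactly as in the lecture's R demo
set.seed(5)
n       <- 20
revenue <- round(runif(n, 100, 500), 0)
profit  <- round(10 + 0.18 * revenue + rnorm(n, 0, 15), 1)
model   <- lm(profit ~ revenue)

ssr       <- sum(resid(model)^2)   # sum of squared residuals
s2        <- ssr / (n - 2)         # estimate of the error variance
s_by_hand <- sqrt(s2)              # standard error of the regression

c(by_hand = s_by_hand, from_summary = summary(model)$sigma)
```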

Lecture Summary

Key Takeaways — Lecture 8

  • Regression models the conditional mean of \(Y\) given \(X\) — a statement about averages, not individual values.
  • The SLR model is \(Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i\); the error \(\varepsilon_i\) is unavoidable.
  • The Gauss–Markov assumptions (A1–A5) ensure OLS is BLUE; adding normality (A6) enables exact inference.
  • OLS chooses \(\hat{\beta}_0\) and \(\hat{\beta}_1\) to minimise \(\sum e_i^2\): \[\hat{\beta}_1 = \frac{S_{XY}}{S_{XX}}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}\]
  • \(\hat{\beta}_1\) is the marginal effect: estimated change in mean \(Y\) per unit increase in \(X\).
  • In R: lm(Y ~ X) fits the model; summary() shows coefficients, standard errors, and \(p\)-values.

Looking Ahead

Lecture 9 — Part II: Goodness of Fit, OLS Properties & Introduction to MLR

  • How well does the line fit? — \(R^2\) and the variance decomposition
  • Why trust OLS? — The Gauss–Markov theorem
  • What if we leave out a relevant variable? — Omitted Variable Bias
  • Adding more regressors — Multiple Linear Regression

Note

Reference: Newbold Ch. 11.4–11.5, Ch. 12.1–12.3.