The Linear Regression Model

Part I — Introduction & OLS Estimation

Paulo Fagandini

Lisbon Accounting and Business School — Polytechnic University of Lisbon

Lecture Overview

Lecture 8 — Introduction & OLS Estimation

Why regression? Motivation and examples
Statistical vs. deterministic relationships
Scatter diagrams and linear correlation
The Simple Linear Regression (SLR) model
The Classical Assumptions (Gauss–Markov)
OLS: deriving $\hat{\beta}_0$ and $\hat{\beta}_1$
Interpreting the estimated coefficients
Running a regression in R with lm()

Note

Reference: Newbold, Carlson & Thorne — Statistics for Business and Economics, Chapters 11–12.

Why Regression?

Motivation

In many situations, we observe that variables tend to move together:

Accounting & Finance

Sales revenue → Operating profit
Advertising spend → Market share
Firm leverage → Cost of debt
Total assets → Audit fees

Economics & Management

Disposable income → Consumption
Interest rates → Investment
Education → Earnings
Hours studied → Exam grade

Regression analysis lets us:

Describe the average relationship between variables
Quantify the size of that relationship
Predict values of one variable from another

Statistical vs. Deterministic Relationships

Deterministic: \[C = 2\pi r\]

Every value of $r$ gives exactly one $C$.

No randomness. No need for statistics.

Statistical: \[\text{Profit}_i = \beta_0 + \beta_1\,\text{Revenue}_i + \varepsilon_i\]

Firms with the same revenue can still report different profits.

The error $\varepsilon_i$ captures everything else.

Important

Regression models the conditional mean of $Y$ given $X$:

\[E[Y \mid X = x] = \beta_0 + \beta_1\,x\]

We do not claim $X$ determines $Y$ exactly — only on average.

Types of Variables

Role	Name	Also called	Symbol
What we explain	Dependent variable	Response, regressand	$Y$
What explains it	Independent variable	Regressor, predictor	$X$

$Y$ is always quantitative.
$X$ can be quantitative or qualitative (dummy variables — introduced in Part II).

Running example (hypothetical data throughout):

A firm’s operating profit ($Y$, m.u.) as a function of sales revenue ($X$, m.u.).

Note

All numerical examples use hypothetical data constructed for illustration purposes only.

Exploring the Relationship

The Scatter Diagram

Before fitting any model, always plot the data. The scatter diagram displays the $n$ observed pairs $(x_i, y_i)$.

Reading the Scatter Diagram

From the scatter diagram we assess:

Direction: positive (both increase together) or negative (one increases, the other decreases)?
Form: linear, curved, or no discernible pattern?
Strength: how tightly do points cluster around the trend?
Outliers: any unusual observations that break the pattern?
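The checklist above can be tried out with a short sketch. It assumes the same hypothetical data-generating recipe as the R demo later in this lecture; the axis labels and plot title are illustrative choices, not part of the original slides.

```r
# Hypothetical data (same recipe as the R demo later in the lecture)
set.seed(5)
n       <- 20
revenue <- round(runif(n, 100, 500), 0)
profit  <- round(10 + 0.18 * revenue + rnorm(n, 0, 15), 1)

# Scatter diagram: one point per observation
plot(revenue, profit,
     xlab = "Sales revenue (m.u.)",
     ylab = "Operating profit (m.u.)",
     main = "Profit vs. revenue (hypothetical data)",
     pch  = 19)

# Sample linear correlation coefficient:
# the sign indicates direction, the magnitude indicates strength
r <- cor(revenue, profit)
r
```

The correlation coefficient summarises direction and strength only; form and outliers still have to be judged from the plot itself.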

Important

Always plot before you fit.

A high $R^2$ on a non-linear or structurally misspecified relationship can be meaningless. Classic illustration: Anscombe’s Quartet.

Anscombe’s Quartet

$\hat{\beta}_0 \approx 3.0$, $\hat{\beta}_1 \approx 0.50$, $R^2 \approx 0.67$ for all four datasets, yet only Dataset 1 is adequately described by a straight line.
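The quartet is available in base R as the data frame anscombe (columns x1 to x4 and y1 to y4), so the claim can be checked directly. The sketch below is one possible way to do it; the object names fits and quartet are illustrative.

```r
# anscombe ships with base R (datasets package): columns x1..x4 and y1..y4
fits <- lapply(1:4, function(k) {
  lm(anscombe[[paste0("y", k)]] ~ anscombe[[paste0("x", k)]])
})

# Collect intercept, slope and R-squared for each of the four datasets
quartet <- data.frame(
  dataset   = 1:4,
  intercept = sapply(fits, function(m) coef(m)[[1]]),
  slope     = sapply(fits, function(m) coef(m)[[2]]),
  r_squared = sapply(fits, function(m) summary(m)$r.squared)
)
round(quartet, 2)

# Same fitted line in every panel, very different scatter diagrams
op <- par(mfrow = c(2, 2))
for (k in 1:4) {
  plot(anscombe[[paste0("x", k)]], anscombe[[paste0("y", k)]],
       xlab = "x", ylab = "y", main = paste("Dataset", k), pch = 19)
  abline(fits[[k]])
}
par(op)
```

The table of estimates is nearly identical across datasets, which is why the numerical summary alone cannot replace the plot.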

The SLR Model

Model Specification

The Simple Linear Regression (SLR) model:

\[\boxed{Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n}\]

Symbol	Name	Meaning
$Y_i$	Dependent variable	Observed outcome for unit $i$
$X_i$	Independent variable	Known regressor for unit $i$
$\beta_0$	Intercept	Mean of $Y$ when $X = 0$
$\beta_1$	Slope	Change in mean $Y$ per unit increase in $X$
$\varepsilon_i$	Error term	All other factors affecting $Y_i$

Note

$\beta_0$ and $\beta_1$ are unknown population parameters. Our goal is to estimate them from data as $\hat{\beta}_0$ and $\hat{\beta}_1$.

The Error Term $\varepsilon_i$

\[Y_i = \underbrace{\beta_0 + \beta_1 X_i}_{\text{systematic part}} + \underbrace{\varepsilon_i}_{\text{random part}}\]

The error captures:

Omitted variables — factors that influence $Y$ but are not in the model
Measurement error — imprecision in recording $Y_i$ or $X_i$
Inherent randomness — unpredictable variation in behaviour or outcomes

Important

The error is not a mistake in the model — it is an unavoidable feature of any statistical relationship. What matters is that we make appropriate assumptions about its behaviour.

The Classical Assumptions

Gauss–Markov Assumptions

For OLS to have good statistical properties, we require:

#	Name	Statement
A1	Linearity	$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
A2	Zero mean	$E[\varepsilon_i] = 0 \;\forall\, i$
A3	Homoscedasticity	$\text{Var}(\varepsilon_i) = \sigma^2 \;\forall\, i$
A4	No autocorrelation	$\text{Cov}(\varepsilon_i, \varepsilon_j) = 0 \;\forall\, i \neq j$
A5	Exogeneity	$X_i$ is fixed or independent of $\varepsilon_i$
A6	Normality	$\varepsilon_i \sim N(0,\, \sigma^2)$

Note

A1–A5 are the Gauss–Markov conditions. Under them, OLS is BLUE (Best Linear Unbiased Estimator). A6 is additionally required for exact $t$- and $F$-tests in small samples.

What the Assumptions Imply

Under A1–A6, for each $i$:

\[Y_i \mid X_i \;\sim\; N\!\left(\beta_0 + \beta_1 X_i,\; \sigma^2\right)\]
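A small simulation can make this statement concrete. The sketch below assumes "true" values $\beta_0 = 10$, $\beta_1 = 0.18$ and $\sigma = 15$ (borrowed from the hypothetical data recipe used in the R demo) and a single fixed revenue level of 300; all of these are illustrative assumptions.

```r
set.seed(1)

# Assumed population values, for illustration only
beta0 <- 10
beta1 <- 0.18
sigma <- 15

x <- 300   # one fixed value of X

# Many draws of Y at the same X: only the error term varies
y <- beta0 + beta1 * x + rnorm(100000, mean = 0, sd = sigma)

# Compare the simulated moments with the ones implied by the model
c(theoretical_mean = beta0 + beta1 * x, simulated_mean = mean(y))
c(theoretical_sd   = sigma,             simulated_sd   = sd(y))

# The histogram should look roughly bell-shaped around beta0 + beta1 * x
hist(y, breaks = 50, main = "Simulated Y given X = 300", xlab = "Y")
```

Repeating the exercise at a different value of x shifts the centre of the distribution along the regression line but leaves its spread unchanged, which is what A3 requires.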

OLS Estimation

The Idea: Minimise Squared Residuals

Given data $(x_1,y_1),\ldots,(x_n,y_n)$, the fitted line is:

\[\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i\]

The residual for observation $i$:

\[e_i = Y_i - \hat{Y}_i\]

Important

Ordinary Least Squares (OLS): choose $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimise the Sum of Squared Residuals:

\[\min_{\hat{\beta}_0,\,\hat{\beta}_1} \;\text{SSR} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2\]

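Why square the residuals? Squaring stops positive and negative residuals from cancelling, penalises large deviations more heavily than small ones, and yields a closed-form solution that is unique whenever the $X_i$ are not all equal.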

OLS: Visualising the Criterion
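One way to see the criterion at work is to evaluate SSR for several candidate lines. The sketch below reuses the hypothetical data recipe from the R demo; the helper ssr() and the particular perturbations are illustrative choices.

```r
# Hypothetical data (same recipe as the R demo)
set.seed(5)
n       <- 20
revenue <- round(runif(n, 100, 500), 0)
profit  <- round(10 + 0.18 * revenue + rnorm(n, 0, 15), 1)

# SSR for a candidate line with intercept b0 and slope b1
ssr <- function(b0, b1) sum((profit - b0 - b1 * revenue)^2)

b_hat <- unname(coef(lm(profit ~ revenue)))   # OLS intercept and slope

ssr(b_hat[1], b_hat[2])          # SSR at the OLS line
ssr(b_hat[1] + 5, b_hat[2])      # same slope, intercept shifted up
ssr(b_hat[1], b_hat[2] + 0.02)   # same intercept, steeper slope

# SSR along a grid of slopes, each line forced through the point of means
slopes <- seq(b_hat[2] - 0.1, b_hat[2] + 0.1, length.out = 201)
ssr_by_slope <- sapply(slopes, function(b1) {
  ssr(mean(profit) - b1 * mean(revenue), b1)
})
plot(slopes, ssr_by_slope, type = "l",
     xlab = "Candidate slope", ylab = "SSR")
abline(v = b_hat[2], lty = 2)    # OLS slope sits at the bottom of the parabola
```

Every perturbed line has a larger SSR than the OLS line, and the SSR curve is a parabola in the slope, which is why the first-order conditions identify a minimum.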

OLS Formulas

Taking partial derivatives of SSR and setting them to zero yields:

\[\boxed{\hat{\beta}_1 = \frac{\displaystyle\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\displaystyle\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{S_{XY}}{S_{XX}}}\]

\[\boxed{\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\,\bar{X}}\]

where $S_{XY} = \sum(X_i-\bar{X})(Y_i-\bar{Y})$ and $S_{XX} = \sum(X_i-\bar{X})^2$.

Note

Since $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}$, the fitted line always passes through $(\bar{X},\,\bar{Y})$.
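The formulas can be applied by hand and compared with lm(). This is a minimal sketch on the hypothetical data recipe from the R demo; the variable names (S_xy, S_xx, b0, b1) are illustrative.

```r
# Hypothetical data (same recipe as the R demo)
set.seed(5)
n       <- 20
revenue <- round(runif(n, 100, 500), 0)
profit  <- round(10 + 0.18 * revenue + rnorm(n, 0, 15), 1)

x_bar <- mean(revenue)
y_bar <- mean(profit)

# Sums of cross-products and squares around the means
S_xy <- sum((revenue - x_bar) * (profit - y_bar))
S_xx <- sum((revenue - x_bar)^2)

# OLS formulas
b1 <- S_xy / S_xx
b0 <- y_bar - b1 * x_bar

c(b0 = b0, b1 = b1)            # by hand
coef(lm(profit ~ revenue))     # from lm(): should agree

# The fitted line passes through the point of means
b0 + b1 * x_bar                # should equal y_bar
y_bar
```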

Deriving the OLS Formulas

Minimise $\text{SSR} = \sum(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2$ with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$:

\[\frac{\partial\,\text{SSR}}{\partial \hat{\beta}_0} = -2\sum(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0\]

\[\frac{\partial\,\text{SSR}}{\partial \hat{\beta}_1} = -2\sum X_i(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0\]

Dividing by $-2$ and rearranging gives the Normal Equations:

\[\sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum X_i\]

\[\sum X_i Y_i = \hat{\beta}_0 \sum X_i + \hat{\beta}_1 \sum X_i^2\]

Solving these two equations simultaneously gives the OLS formulas stated above.
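For completeness, the solution step can be sketched as follows, assuming the $X_i$ are not all equal (so that $S_{XX} > 0$). Dividing the first normal equation by $n$ gives the intercept:

\[\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X} \quad\Longrightarrow\quad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\]

Substituting this into the second normal equation, and using $\sum X_i = n\bar{X}$:

\[\sum X_i Y_i = (\bar{Y} - \hat{\beta}_1 \bar{X})\, n\bar{X} + \hat{\beta}_1 \sum X_i^2 \quad\Longrightarrow\quad \sum X_i Y_i - n\bar{X}\bar{Y} = \hat{\beta}_1 \left(\sum X_i^2 - n\bar{X}^2\right)\]

Since $\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - n\bar{X}\bar{Y}$ and $\sum (X_i - \bar{X})^2 = \sum X_i^2 - n\bar{X}^2$, this is:

\[\hat{\beta}_1 = \frac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2} = \frac{S_{XY}}{S_{XX}}\]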

Interpreting the Estimates

Interpreting $\hat{\beta}_1$ and $\hat{\beta}_0$

The estimated regression equation:

\[\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i\]

Slope $\hat{\beta}_1$:

For every one-unit increase in $X$, the estimated mean of $Y$ changes by $\hat{\beta}_1$ units.

This is a marginal effect — the effect of $X$ on the average $Y$.

Intercept $\hat{\beta}_0$:

The estimated mean of $Y$ when $X = 0$.

Often not directly meaningful — $X = 0$ may be outside the range of the data.

Important

$\hat{\beta}_1$ and $\hat{\beta}_0$ describe averages in the sample. They are estimates of the unknown population parameters $\beta_1$ and $\beta_0$.

R Demo: Fitting and Interpreting

Hypothetical data: 20 observations on sales revenue ($X$) and operating profit ($Y$), both in m.u.

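# Simulate hypothetical data (seed fixed so the results are reproducible)
set.seed(5)
n       <- 20                                                # number of observations
revenue <- round(runif(n, 100, 500), 0)                      # sales revenue, uniform on [100, 500]
profit  <- round(10 + 0.18 * revenue + rnorm(n, 0, 15), 1)   # true line plus N(0, 15^2) noise

# Fit the SLR model by OLS and extract the estimated coefficients
model <- lm(profit ~ revenue)
coef(model)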

(Intercept)     revenue 
 21.8268848   0.1435997

Interpretation:

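$\hat{\beta}_1 = 0.144$: each additional m.u. of revenue is associated with an average increase of 0.144 m.u. in operating profit.
$\hat{\beta}_0 = 21.83$: estimated mean profit when revenue = 0. The simulated revenues lie between 100 and 500, so this is an extrapolation and should be interpreted cautiously.
The data were simulated with $\beta_0 = 10$ and $\beta_1 = 0.18$; the estimates differ from these true values because of sampling variation.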
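The slope interpretation can be checked with predict(). The sketch below refits the demo model and compares fitted profit at two revenue levels one unit apart; the values 300 and 301 and the name new_firms are illustrative.

```r
# Hypothetical data and model (same recipe as the R demo)
set.seed(5)
n       <- 20
revenue <- round(runif(n, 100, 500), 0)
profit  <- round(10 + 0.18 * revenue + rnorm(n, 0, 15), 1)
model   <- lm(profit ~ revenue)

# Two hypothetical firms whose revenue differs by exactly one m.u.
new_firms <- data.frame(revenue = c(300, 301))

# Estimated mean profit at each revenue level
fitted_profit <- predict(model, newdata = new_firms)
fitted_profit

# The difference between the two fitted values is the estimated slope
diff(fitted_profit)
coef(model)["revenue"]
```

Both revenue levels lie inside the sampled range, so these fitted values are interpolations rather than extrapolations.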

R Demo: The Full Summary

summary(model)


Call:
lm(formula = profit ~ revenue)

Residuals:
    Min      1Q  Median      3Q     Max 
-29.169 -10.033  -3.226  11.177  27.235 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 21.82688    9.99146   2.185 0.042392 *  
revenue      0.14360    0.03113   4.612 0.000216 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.32 on 18 degrees of freedom
Multiple R-squared:  0.5417,    Adjusted R-squared:  0.5162 
F-statistic: 21.27 on 1 and 18 DF,  p-value: 0.0002164
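Individual pieces of this output can be extracted from the summary object. The sketch below refits the demo model and shows two relationships that can be read off the printed table: each t value is the estimate divided by its standard error, and in simple regression the F-statistic equals the squared t value of the slope. The name tab is illustrative.

```r
# Hypothetical data and model (same recipe as the R demo)
set.seed(5)
n       <- 20
revenue <- round(runif(n, 100, 500), 0)
profit  <- round(10 + 0.18 * revenue + rnorm(n, 0, 15), 1)
model   <- lm(profit ~ revenue)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
tab <- coef(summary(model))
tab

# t value = estimate / standard error
tab[, "Estimate"] / tab[, "Std. Error"]

# With a single regressor, F equals the squared t statistic of the slope
summary(model)$fstatistic[["value"]]
tab["revenue", "t value"]^2

# R-squared as a plain number
summary(model)$r.squared
```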

R Demo: The Fitted Line
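A plot of the fitted line over the data could be produced along these lines. This is a sketch using base R graphics; the labels, line styles, and the dotted residual segments are illustrative choices and may differ from the original figure.

```r
# Hypothetical data and model (same recipe as the R demo)
set.seed(5)
n       <- 20
revenue <- round(runif(n, 100, 500), 0)
profit  <- round(10 + 0.18 * revenue + rnorm(n, 0, 15), 1)
model   <- lm(profit ~ revenue)

# Scatter diagram of the observed pairs
plot(revenue, profit,
     xlab = "Sales revenue (m.u.)",
     ylab = "Operating profit (m.u.)",
     pch  = 19)

# OLS fitted line: abline() reads intercept and slope from the lm object
abline(model, lwd = 2)

# Vertical dotted segments from each fitted value to the observed value:
# their lengths are the absolute residuals |e_i|
segments(x0 = revenue, y0 = fitted(model),
         x1 = revenue, y1 = profit, lty = 3)
```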

Properties of OLS Residuals

These algebraic identities hold for every OLS fit that includes an intercept:

\[\sum_{i=1}^{n} e_i = 0\]

\[\sum_{i=1}^{n} X_i\, e_i = 0\]

\[\sum_{i=1}^{n} \hat{Y}_i\, e_i = 0\]

They are not assumptions: they follow directly from the OLS first-order conditions (the normal equations). The residuals sum to zero and have zero sample correlation with $X$ and with $\hat{Y}$.
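These identities can be verified numerically on the demo model. In the sketch below the sums are zero only up to floating-point rounding, so values of the order of $10^{-12}$ should be read as zero.

```r
# Hypothetical data and model (same recipe as the R demo)
set.seed(5)
n       <- 20
revenue <- round(runif(n, 100, 500), 0)
profit  <- round(10 + 0.18 * revenue + rnorm(n, 0, 15), 1)
model   <- lm(profit ~ revenue)

e     <- resid(model)    # residuals e_i
y_hat <- fitted(model)   # fitted values

# All three sums should be zero up to rounding error
c(sum_e      = sum(e),
  sum_x_e    = sum(revenue * e),
  sum_yhat_e = sum(y_hat * e))
```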

Estimating the Error Variance

The population error variance $\sigma^2$ is unknown. We estimate it with:

\[\hat{\sigma}^2 = s^2 = \frac{\text{SSR}}{n - 2} = \frac{\displaystyle\sum_{i=1}^{n} e_i^2}{n - 2}\]

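We divide by $n - 2$ rather than $n$ because estimating the two parameters $\beta_0$ and $\beta_1$ uses up two degrees of freedom; with this divisor, $s^2$ is an unbiased estimator of $\sigma^2$ under A1–A5.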

The standard error of the regression (residual standard error in R):

\[s = \sqrt{\frac{\displaystyle\sum e_i^2}{n-2}}\]

In R: summary(model)$sigma
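The formula can be checked against R's own value. The sketch below computes $s$ from the residuals of the demo model and compares it with summary(model)$sigma; the names s2 and s are illustrative.

```r
# Hypothetical data and model (same recipe as the R demo)
set.seed(5)
n       <- 20
revenue <- round(runif(n, 100, 500), 0)
profit  <- round(10 + 0.18 * revenue + rnorm(n, 0, 15), 1)
model   <- lm(profit ~ revenue)

e  <- resid(model)
s2 <- sum(e^2) / (n - 2)   # estimated error variance
s  <- sqrt(s2)             # standard error of the regression

c(by_hand = s, from_summary = summary(model)$sigma)

# Residual degrees of freedom stored in the model object: n - 2
df.residual(model)
```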

Lecture Summary

Key Takeaways — Lecture 8

Regression models the conditional mean of $Y$ given $X$ — a statement about averages, not individual values.
The SLR model is $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$; the error $\varepsilon_i$ is unavoidable.
The Gauss–Markov assumptions (A1–A5) ensure OLS is BLUE; adding normality (A6) enables exact inference.
OLS chooses $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimise $\sum e_i^2$: \[\hat{\beta}_1 = \frac{S_{XY}}{S_{XX}}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}\]
$\hat{\beta}_1$ is the marginal effect: estimated change in mean $Y$ per unit increase in $X$.
In R: lm(Y ~ X) fits the model; summary() shows coefficients, standard errors, and $p$-values.

Looking Ahead

Lecture 9 — Part II: Goodness of Fit, OLS Properties & Introduction to MLR

How well does the line fit? — $R^2$ and the variance decomposition
Why trust OLS? — The Gauss–Markov theorem
What if we leave out a relevant variable? — Omitted Variable Bias
Adding more regressors — Multiple Linear Regression

Note

Reference: Newbold Ch. 11.4–11.5, Ch. 12.1–12.3.
