Inference & Model Validation: Application in R

Hands-on practice with pre-loaded R data

Paulo Fagandini

Lisbon Accounting and Business School

Try it online

No installation needed

Scan the QR code or open the link below to try Jupyter in your browser.

https://jupyter.org

Statistical Inference & Model Validation

Learning goals

Read and interpret the full summary() output: \(t\)-statistics, \(p\)-values, and significance codes.
Construct and interpret confidence intervals for individual coefficients.
Conduct and interpret the \(F\)-test for overall model significance.
Obtain prediction intervals and understand how they differ from confidence intervals.
Produce and interpret the four diagnostic plots.
Detect heteroscedasticity and multicollinearity using formal tools.
Follow a complete model validation workflow from start to finish.

Picking up from Lecture 09

Our model so far

In Lecture 09 we settled on the three-variable model for daily ozone concentration in New York:

\[\text{Ozone}_i = \beta_0 + \beta_1\,\text{Temp}_i + \beta_2\,\text{Wind}_i + \beta_3\,\text{Solar.R}_i + \varepsilon_i\]

Let’s rebuild it quickly before we start:

data(airquality)
aq   <- airquality[complete.cases(airquality), ]
mlr2 <- lm(Ozone ~ Temp + Wind + Solar.R, data = aq)

\(n =\) 111 complete observations
\(k = 3\) regressors → degrees of freedom for residuals: \(n - k - 1 =\) 107

Today we ask: are the estimates trustworthy, and do the classical assumptions hold?

Exercise 1

Reading the full summary

Run summary(mlr2) and locate the following in the output:

The estimated coefficients and their standard errors.
The \(t\)-statistic and \(p\)-value for each coefficient.
The residual standard error \(s\).
\(R^2\) and adjusted \(\bar{R}^2\).
The \(F\)-statistic and its \(p\)-value.

Hint

summary(mlr2)

Each row of the Coefficients table gives you: Estimate, Std. Error, t value, Pr(>|t|).

The last line of the output contains the \(F\)-statistic.

Exercise 2

Interpreting the \(t\)-tests

Using the output from Exercise 1:

State \(H_0\) and \(H_1\) for the coefficient on Wind.
What is the value of the \(t\)-statistic for Wind? How was it computed?
At \(\alpha = 5\%\), do you reject \(H_0\)? How do you know?
Repeat for Solar.R.

Hint

summary(mlr2)$coefficients

# t-statistic = Estimate / Std. Error
# Reject H0 if |t| > t critical value, OR if p-value < alpha

# Critical value for alpha = 0.05 (two-sided)
qt(0.975, df = nrow(aq) - 4)

Exercise 3

Confidence intervals

Compute 95% confidence intervals for all coefficients.
Interpret the interval for Temp in plain language.
Does the interval for any coefficient include zero? What does that imply?
Recompute at the 99% level — how do the intervals change, and why?

Hint

confint(mlr2, level = 0.95)
confint(mlr2, level = 0.99)

A coefficient whose 95% CI excludes zero is significant at \(\alpha = 5\%\) — this is exactly equivalent to rejecting \(H_0: \beta_j = 0\) in the \(t\)-test.

Exercise 4

The \(F\)-test

State the null hypothesis of the overall \(F\)-test.
Find the \(F\)-statistic and its \(p\)-value in summary(mlr2).
At \(\alpha = 5\%\), what do you conclude?
Verify the \(F\)-statistic manually using the formula:

\[F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}\]

Hint

s   <- summary(mlr2)
R2  <- s$r.squared
k   <- 3
n   <- nrow(aq)

F_manual <- (R2 / k) / ((1 - R2) / (n - k - 1))
F_manual

# Compare to R's value
s$fstatistic

Exercise 5

One-sided hypothesis test

Theory tells us that higher wind speed should reduce ozone (wind disperses pollutants).

State this as a one-sided hypothesis test for \(\beta_{\text{Wind}}\).
The two-sided \(p\)-value is available from summary(). How do you convert it to a one-sided \(p\)-value?
Do you reject \(H_0\) at \(\alpha = 5\%\)?

Hint

# Extract the two-sided p-value for Wind
p_two_sided <- summary(mlr2)$coefficients["Wind", "Pr(>|t|)"]
p_two_sided

# One-sided p-value (H1: beta_Wind < 0)
# Only valid if the sign of the estimate is consistent with H1
coef(mlr2)["Wind"]       # check the sign first
p_one_sided <- p_two_sided / 2
p_one_sided

Exercise 6

Prediction and prediction intervals

Predict ozone concentration for three new days:

Day	Temp (°F)	Wind (mph)	Solar.R (lang)
A	75	8	180
B	90	4	250
C	65	15	100

Obtain point predictions.
Obtain 95% prediction intervals.
Obtain 95% confidence intervals for the mean response.
Which interval is wider? Why?

Hint

new_days <- data.frame(
  Temp    = c(75, 90, 65),
  Wind    = c(8, 4, 15),
  Solar.R = c(180, 250, 100)
)

# Point predictions and prediction intervals (for individual outcomes)
predict(mlr2, newdata = new_days, interval = "prediction", level = 0.95)

# Confidence intervals for the mean response
predict(mlr2, newdata = new_days, interval = "confidence", level = 0.95)

The prediction interval is wider because it must also account for the individual error \(\varepsilon_i\) around the mean.

Exercise 7

Residual diagnostics — the four plots

Produce the four standard diagnostic plots using plot(mlr2).
For each plot, state what assumption it checks and what you observe.
Are there any observations flagged as influential?

Hint

par(mfrow = c(2, 2))
plot(mlr2)
par(mfrow = c(1, 1))

Plot	Assumption checked
Residuals vs. Fitted	Linearity, homoscedasticity
Normal Q–Q	Normality of residuals
Scale-Location	Homoscedasticity (more clearly)
Residuals vs. Leverage	Influential observations

Exercise 8

Testing normality formally

The Q–Q plot gives a visual impression, but we can also test formally.

Run the Shapiro-Wilk test on the residuals.
State \(H_0\) and interpret the \(p\)-value.
Is the result consistent with what you saw in the Q–Q plot?

Hint

shapiro.test(resid(mlr2))

\(H_0\): residuals are normally distributed.

If \(p > 0.05\), we do not reject normality at the 5% level.

Exercise 9

Detecting multicollinearity

Compute the correlation matrix of the three regressors.
Calculate the VIF for each regressor.
Is there a multicollinearity concern? What would be the consequence if VIF were very high?

Hint

# Correlation matrix
cor(aq[, c("Temp", "Wind", "Solar.R")])

# VIF (requires the car package)
# install.packages("car")
library(car)
vif(mlr2)

Rule of thumb: VIF \(< 5\) is acceptable; VIF \(> 10\) is a serious concern.

Exercise 10

Putting it all together

Now carry out the full validation workflow for mlr2:

Coefficients — signs, magnitudes, and significance. ✓ (Exercises 1–5)
Residuals vs. fitted — any non-linearity or heteroscedasticity?
Q–Q plot + Shapiro-Wilk — are residuals approximately normal?
VIF — any multicollinearity?
Based on all of the above: do you trust the inference from this model? Are there any concerns?

Hint

# Complete workflow in one go
summary(mlr2)
confint(mlr2)

par(mfrow = c(2, 2)); plot(mlr2); par(mfrow = c(1, 1))

shapiro.test(resid(mlr2))

library(car)
vif(mlr2)

Write a brief paragraph summarising your conclusions — this is exactly what you would include in a report.

Cheat sheet

Inference commands

# Full regression summary (t-tests, F-test, R²)
summary(mlr2)

# Extract specific parts
summary(mlr2)$coefficients      # coefficient table
summary(mlr2)$r.squared         # R²
summary(mlr2)$adj.r.squared     # adjusted R²
summary(mlr2)$fstatistic        # F-statistic (value, df1, df2)
summary(mlr2)$sigma             # residual standard error s

# Confidence intervals
confint(mlr2, level = 0.95)

# Critical t-value (two-sided, alpha = 0.05)
qt(0.975, df = nrow(aq) - 4)

# Manual t-statistic
coef(mlr2)["Wind"] / summary(mlr2)$coefficients["Wind", "Std. Error"]

Prediction commands

new_days <- data.frame(Temp = c(75, 90), Wind = c(8, 4), Solar.R = c(180, 250))

# Point prediction only
predict(mlr2, newdata = new_days)

# Prediction interval (for individual outcome)
predict(mlr2, newdata = new_days, interval = "prediction", level = 0.95)

# Confidence interval (for mean response)
predict(mlr2, newdata = new_days, interval = "confidence", level = 0.95)

Diagnostic commands (plots)

# Four standard plots
par(mfrow = c(2, 2))
plot(mlr2)
par(mfrow = c(1, 1))

# Residuals vs fitted (manual, with ggplot)
ggplot(data.frame(fitted = fitted(mlr2), resid = resid(mlr2)),
       aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  theme_minimal()

Diagnostic commands (tests)

# Normality test
shapiro.test(resid(mlr2))

# Multicollinearity
library(car)
vif(mlr2)

# Correlation among regressors
cor(aq[, c("Temp", "Wind", "Solar.R")])

Homework (not graded)

In the diagnostic plots you may have noticed that the residuals from mlr2 show some non-constant variance — ozone measurements are non-negative and right-skewed.

Fit a model using \(\log(\text{Ozone})\) as the response instead of Ozone:

mlr_log <- lm(log(Ozone) ~ Temp + Wind + Solar.R, data = aq)

Re-run the full diagnostic workflow on mlr_log.
Do the residuals look more normal and homoscedastic?
How does the interpretation of the coefficients change when the response is on a log scale?

We will discuss log transformations and their interpretation in the next lecture.