Hands-on practice with pre-loaded R data
Lisbon Accounting and Business School
summary() output: \(t\)-statistics, \(p\)-values, and significance codes.
In Lecture 09 we settled on the three-variable model for daily ozone concentration in New York:
\[\text{Ozone}_i = \beta_0 + \beta_1\,\text{Temp}_i + \beta_2\,\text{Wind}_i + \beta_3\,\text{Solar.R}_i + \varepsilon_i\]
Let’s rebuild it quickly before we start:
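A minimal sketch of the rebuild, assuming `aq` is R's built-in `airquality` data with incomplete rows dropped and that `mlr2` is fitted with `lm()`. Both names are used later in this session, but their exact construction here is an assumption:

```r
# Assumption: aq is the built-in airquality data, complete cases only
aq <- na.omit(airquality)

# Three-variable model matching the equation above
mlr2 <- lm(Ozone ~ Temp + Wind + Solar.R, data = aq)

# Quick check that the fit worked: four coefficients (intercept + 3 slopes)
coef(mlr2)
```

Dropping incomplete rows up front keeps `nrow(aq)` equal to the number of observations actually used in the fit, which matters when degrees of freedom are computed from `nrow(aq)`.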
Today we ask: are the estimates trustworthy, and do the classical assumptions hold?
Run summary(mlr2) and locate the following in the output:
Using the output from Exercise 1:
Wind. Wind? How was it computed? Solar.R. Temp in plain language. summary(mlr2).
\[F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}\]
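The \(F\) formula can be checked by hand against the value reported by `summary()`. The sketch below assumes `aq` and `mlr2` are built as complete-case `airquality` data and an `lm()` fit; the helper names `r2`, `n`, `k`, and `f_manual` are illustrative only:

```r
# Assumed setup: complete-case airquality data and the three-variable model
aq   <- na.omit(airquality)
mlr2 <- lm(Ozone ~ Temp + Wind + Solar.R, data = aq)

s  <- summary(mlr2)
r2 <- s$r.squared
n  <- nobs(mlr2)              # observations used in the fit
k  <- length(coef(mlr2)) - 1  # number of predictors (excludes the intercept)

# F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
f_manual <- (r2 / k) / ((1 - r2) / (n - k - 1))
f_manual
s$fstatistic["value"]         # should agree with f_manual

# p-value of the overall F-test (upper tail of the F distribution)
pf(f_manual, df1 = k, df2 = n - k - 1, lower.tail = FALSE)
```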
Theory tells us that higher wind speed should reduce ozone (wind disperses pollutants).
summary(). How do you convert it to a one-sided \(p\)-value?
# Extract the two-sided p-value for Wind
p_two_sided <- summary(mlr2)$coefficients["Wind", "Pr(>|t|)"]
p_two_sided
# One-sided p-value (H1: beta_Wind < 0)
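# Halving is only valid if the sign of the estimate is consistent with H1;
# if the estimate has the opposite sign, the one-sided p-value is 1 - p/2
b_wind <- coef(mlr2)["Wind"]
b_wind # check the sign first
p_one_sided <- if (b_wind < 0) p_two_sided / 2 else 1 - p_two_sided / 2
p_one_sided
Predict ozone concentration for three new days: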
| Day | Temp (°F) | Wind (mph) | Solar.R (langleys) |
|---|---|---|---|
| A | 75 | 8 | 180 |
| B | 90 | 4 | 250 |
| C | 65 | 15 | 100 |
new_days <- data.frame(
Temp = c(75, 90, 65),
Wind = c(8, 4, 15),
Solar.R = c(180, 250, 100)
)
# Point predictions and prediction intervals (for individual outcomes)
predict(mlr2, newdata = new_days, interval = "prediction", level = 0.95)
# Confidence intervals for the mean response
predict(mlr2, newdata = new_days, interval = "confidence", level = 0.95)
The prediction interval is wider because it must also account for the individual error \(\varepsilon_i\) around the mean.
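To see where the extra width comes from, the two intervals can be compared side by side. This is a sketch under the assumption that `aq` is the complete-case `airquality` data; `width_pi`, `width_ci`, `se_fit`, and `t_crit` are illustrative names. The prediction interval uses the standard error \(\sqrt{\text{se}_{\text{fit}}^2 + s^2}\), where \(s\) is the residual standard error:

```r
# Assumed setup: complete-case airquality data and the three-variable model
aq   <- na.omit(airquality)
mlr2 <- lm(Ozone ~ Temp + Wind + Solar.R, data = aq)

new_days <- data.frame(
  Temp    = c(75, 90, 65),
  Wind    = c(8, 4, 15),
  Solar.R = c(180, 250, 100)
)

pi95 <- predict(mlr2, newdata = new_days, interval = "prediction", level = 0.95)
ci95 <- predict(mlr2, newdata = new_days, interval = "confidence", level = 0.95)

# Interval widths: the prediction interval is wider for every day
width_pi <- pi95[, "upr"] - pi95[, "lwr"]
width_ci <- ci95[, "upr"] - ci95[, "lwr"]
cbind(fit = pi95[, "fit"], width_ci, width_pi)

# Manual reconstruction of the prediction-interval width
se_fit <- predict(mlr2, newdata = new_days, se.fit = TRUE)$se.fit
s      <- summary(mlr2)$sigma              # residual standard error
t_crit <- qt(0.975, df = df.residual(mlr2))
2 * t_crit * sqrt(se_fit^2 + s^2)          # should match width_pi
```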
plot(mlr2).
The Q–Q plot gives a visual impression, but we can also test formally.
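One common formal check is the Shapiro–Wilk test applied to the residuals. The text does not name a specific test, so treat the choice of `shapiro.test()` as an assumption; the setup lines also assume complete-case `airquality` data:

```r
# Assumed setup: complete-case airquality data and the three-variable model
aq   <- na.omit(airquality)
mlr2 <- lm(Ozone ~ Temp + Wind + Solar.R, data = aq)

res <- residuals(mlr2)

# Shapiro-Wilk test; H0: the residuals come from a normal distribution
sw <- shapiro.test(res)
sw

# A small p-value is evidence against normality of the errors
sw$p.value
```

With a sample of this size the test can flag departures that matter little in practice, so it is best read together with the Q–Q plot rather than instead of it.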
Now carry out the full validation workflow for mlr2:
# Full regression summary (t-tests, F-test, R²)
summary(mlr2)
# Extract specific parts
summary(mlr2)$coefficients # coefficient table
summary(mlr2)$r.squared # R²
summary(mlr2)$adj.r.squared # adjusted R²
summary(mlr2)$fstatistic # F-statistic (value, df1, df2)
summary(mlr2)$sigma # residual standard error s
# Confidence intervals
confint(mlr2, level = 0.95)
# Critical t-value (two-sided, alpha = 0.05)
qt(0.975, df = df.residual(mlr2)) # n - k - 1; equals nrow(aq) - 4 when aq has no missing rows
# Manual t-statistic
coef(mlr2)["Wind"] / summary(mlr2)$coefficients["Wind", "Std. Error"]
new_days <- data.frame(Temp = c(75, 90), Wind = c(8, 4), Solar.R = c(180, 250))
# Point prediction only
predict(mlr2, newdata = new_days)
# Prediction interval (for individual outcome)
predict(mlr2, newdata = new_days, interval = "prediction", level = 0.95)
# Confidence interval (for mean response)
predict(mlr2, newdata = new_days, interval = "confidence", level = 0.95)
In the diagnostic plots you may have noticed that the residuals from mlr2 show some non-constant variance: ozone measurements are non-negative and right-skewed.
Ozone: mlr_log.
We will discuss log transformations and their interpretation in the next lecture.
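A minimal sketch of the log-response model, assuming `mlr_log` keeps the same three predictors as `mlr2` and only replaces `Ozone` with `log(Ozone)`; the exact specification intended for the exercise is an assumption, as is the complete-case construction of `aq`:

```r
# Assumed setup: complete-case airquality data and the original model
aq   <- na.omit(airquality)
mlr2 <- lm(Ozone ~ Temp + Wind + Solar.R, data = aq)

# Same predictors, log-transformed response (Ozone is strictly positive here)
mlr_log <- lm(log(Ozone) ~ Temp + Wind + Solar.R, data = aq)
summary(mlr_log)

# Residuals vs fitted, side by side: original model, then log model
par(mfrow = c(1, 2))
plot(mlr2, which = 1)
plot(mlr_log, which = 1)
par(mfrow = c(1, 1))
```

Note that \(R^2\) and the residual standard error of the two models are on different scales (ozone versus log-ozone), so they should not be compared directly.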
Statistics II — Inference & Model Validation