Interval Estimation — Part 1

Paulo Fagandini

Lisbon Accounting and Business School — Polytechnic University of Lisbon

Topics covered

  1. Confidence Intervals — General Concepts

  2. Confidence Interval for the Population Mean, \(\mu\), of Normal Populations

Reference: Newbold, P., Carlson, W., & Thorne, B. — Statistics for Business and Economics, Global Ed.

1) Confidence Intervals — General Concepts

Definition

Given a random sample drawn from the population, interval estimation produces a range of values (the interval estimate for a parameter \(\theta\)) that contains the true, unknown value of \(\theta\) with a specified degree of confidence.

Methodology for constructing a confidence interval:

  • Find a “good” point estimator;

  • Establish a confidence level (most common: 90%, 95%, and 99%);

  • Know the sample size;

  • Know the sampling distribution of the estimator.

To build the interval from the chosen estimator, the Pivotal Variable Method should be followed.

According to this method, the pivot statistic (or pivotal variable):

  • Must contain the parameter to be estimated in its expression;

  • Must have a sampling distribution (exact or approximate) that does not depend on the parameter or on any other unknown quantity.

2) CI for the Population Mean, \(\mu\)

Confidence Interval for \(\mu\) — Overview

Case 1 — Normal population, \(\sigma\) known

Case 2 — Normal population, \(\sigma\) unknown

Case 1: Normal Population and \(\sigma\) Known

Consider a random sample \(X_1, X_2, \ldots, X_n\), \(n \in \mathbb{N}\), drawn from a population \(X\) with distribution \(N(\mu, \sigma)\), where \(\sigma\) is known.

\[\underbrace{\mu}_{\text{parameter}} \;\longrightarrow\; \underbrace{\bar{X}}_{\substack{\text{point} \\ \text{estimator}}} \;\longrightarrow\; \underbrace{Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0,1)}_{\text{pivot statistic}}\]

Case 1 — Setting Up the Probability Statement

\[P\!\left(-z_{\alpha/2} < Z < z_{\alpha/2}\right) = 1 - \alpha\]

Confidence attributed to the interval: \((1 - \alpha) \times 100\%\)

Case 1 — Deriving the Confidence Interval

\[P\!\left(-z_{\alpha/2} < Z < z_{\alpha/2}\right) = 1-\alpha\]

\[\Updownarrow\]

\[P\!\left(-z_{\alpha/2} < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1-\alpha\]

\[\Updownarrow \quad \cdots\]

\[P\!\left(\underbrace{\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}}_{T_1} < \mu < \underbrace{\bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}}_{T_2}\right) = 1-\alpha\]
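
The step marked by the dots is routine algebra. One way to fill it in, using only the symbols already defined, is to isolate \(\mu\) in the middle of the double inequality. First multiply through by \(\sigma/\sqrt{n} > 0\):

\[-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{X}-\mu < z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\]

Then subtract \(\bar{X}\) from all three members:

\[-\bar{X}-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X}+z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\]

Finally multiply by \(-1\), which reverses both inequalities:

\[\bar{X}-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X}+z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\]

Each step is an equivalence, so the probability of the event stays equal to \(1-\alpha\) throughout.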

Case 1 — The Confidence Interval

The \((1-\alpha)\times 100\%\) confidence interval for \(\mu\) is:

\[\boxed{IC_{(1-\alpha)\times 100\%}(\mu) = \left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)}\]

where \(t_1 = \bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\) and \(t_2 = \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\) are the observed lower and upper bounds.
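
As a minimal sketch of how this formula could be evaluated numerically, the Python snippet below computes the two bounds. The function name `z_interval` and the input numbers are illustrative assumptions, not part of the slides; the critical value comes from the standard library's `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """Two-sided CI for the mean of a Normal population with known sigma."""
    alpha = 1.0 - conf
    # z_{alpha/2}: the point leaving area alpha/2 in the upper tail of N(0, 1)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Illustrative numbers only (hypothetical, not taken from the slides)
lower, upper = z_interval(xbar=50.0, sigma=10.0, n=25, conf=0.95)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # prints 95% CI: (46.08, 53.92)
```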

Case 1 — Random Nature of the Bounds

Different samples yield different values of \(\bar{x}\) and, consequently, different values of the bounds \(t_1\) and \(t_2\).

Therefore, those bounds are realizations of random variables \(T_1\) and \(T_2\), respectively.

The confidence interval is random — it varies from sample to sample. What we compute from our data is one particular realization of that random interval.

Case 1 — Interpretation

Correct Interpretation

If an infinite number of random samples of the same size were drawn, and a \((1-\alpha)\times 100\%\) confidence interval for \(\mu\) were computed from each sample, then \((1-\alpha)\times 100\%\) of those intervals would contain the true value of \(\mu\).

A particular computed interval either contains \(\mu\) or it does not — we just do not know which. The \((1-\alpha)\times 100\%\) refers to the long-run coverage of the procedure.

Case 1 — Visualization of Coverage
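
The long-run coverage idea can be checked with a small simulation. The sketch below is only an illustration under assumed values (the true mean, \(\sigma\), sample size, number of samples, and seed are all hypothetical): it draws many samples from a Normal population with known \(\sigma\), builds a 95% interval from each, and counts how often the interval contains the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, conf, n_samples, seed=0):
    """Share of simulated CIs (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    margin = z * sigma / sqrt(n)  # same half-width for every sample
    hits = 0
    for _ in range(n_samples):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / n_samples


# Hypothetical population: mu = 100, sigma = 15; samples of size 25
rate = coverage(mu=100.0, sigma=15.0, n=25, conf=0.95, n_samples=5000)
print(f"Share of intervals containing mu: {rate:.3f}")
```

The printed share should be close to 0.95, but not exactly equal to it, because only a finite number of samples is simulated.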

Case 1 — Properties of the CI

\[IC_{(1-\alpha)\times100\%}(\mu)=\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)\]

The interval is symmetric: the midpoint equals the point estimate \(\bar{x}\), and \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\) is the standard error of the estimator \(\bar{X}\).

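With confidence \((1-\alpha)\times 100\%\), the estimation error \(|\bar{X} - \mu|\) does not exceed \(z_{\alpha/2}\,\sigma/\sqrt{n}\), the half-width of the interval: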

\[|\bar{X} - \mu| < z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad \longleftarrow \text{estimation error}\]

Case 1 — Width of the Interval

\[IC_{(1-\alpha)\times100\%}(\mu)=\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)\]

  • Confidence level \(\uparrow\) \(\Rightarrow\) width increases \(\Rightarrow\) inference becomes less precise (and vice versa).

  • Variance \(\uparrow\) \(\Rightarrow\) width increases, because the standard error of the estimator increases.

  • Sample size \(\uparrow\) \(\Rightarrow\) width decreases \(\Rightarrow\) inference becomes more precise.
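
The three effects listed above can be seen numerically. The sketch below uses hypothetical baseline values (\(\sigma = 10\), \(n = 25\), 95% confidence) and changes one ingredient at a time; the helper name `ci_width` is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(sigma, n, conf):
    """Width of the two-sided CI for mu when sigma is known."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    return 2.0 * z * sigma / sqrt(n)


base = ci_width(sigma=10.0, n=25, conf=0.95)
higher_conf = ci_width(sigma=10.0, n=25, conf=0.99)   # confidence level up
higher_sigma = ci_width(sigma=20.0, n=25, conf=0.95)  # standard deviation doubled
larger_n = ci_width(sigma=10.0, n=100, conf=0.95)     # sample size quadrupled

print(f"baseline width:     {base:.3f}")
print(f"99% instead of 95%: {higher_conf:.3f}")
print(f"sigma doubled:      {higher_sigma:.3f}")
print(f"n quadrupled:       {larger_n:.3f}")
```

Because the width is proportional to \(\sigma/\sqrt{n}\), doubling \(\sigma\) doubles the width, while halving the width requires four times as many observations.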

Case 1 — A Practical Note

It is not guaranteed that constructing a CI always produces useful information. It is necessary to strike a balance between:

  • the sample size,
  • the confidence level, and
  • the precision of the interval.

Case 1 — Application Exercise

Application Exercise — Setup

Consider a population with a Normal distribution and known standard deviation \(\sigma = 20\).

A random sample of size \(n = 20\) was drawn, yielding a sample mean \(\bar{x} = 320\).

Find the 90% confidence interval for the population mean.

Application Exercise — Conditions (Case 1)

We are in the conditions of Case 1 (Normal population, \(\sigma\) known):

\[X \sim N(\mu,\; \sigma = 20), \qquad n = 20, \qquad \bar{x} = 320\]

The pivot statistic is:

\[Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)\]

Application Exercise — Finding \(z_{\alpha/2}\)

The confidence interval to use is:

\[IC_{(1-\alpha)\times100\%}(\mu) = \left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)\]

We want 90% confidence \(\Rightarrow\) \(1 - \alpha = 0.90\) \(\Rightarrow\) \(\alpha = 0.10\).

We have \(\bar{x}\), \(\sigma\), and \(n\). We still need \(z_{\alpha/2}\).

For the most common confidence levels (90%, 95%, …), the values of \(z_{\alpha/2}\) are tabulated (Table 5).

Looking at the table, for \(\alpha = 0.10\):

\[z_{\alpha/2} = z_{0.05} = 1.645\]
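
When a table is not at hand, the same critical value can be obtained from the inverse of the standard Normal distribution function. The sketch below uses Python's standard library; the helper name `z_crit` is an assumption made for illustration.

```python
from statistics import NormalDist


def z_crit(conf):
    """z_{alpha/2} for a two-sided interval with confidence level conf."""
    alpha = 1.0 - conf
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for conf in (0.90, 0.95, 0.99):
    print(f"confidence {conf:.0%}: z = {z_crit(conf):.3f}")
```

For 90% confidence this gives 1.645, which agrees with the tabulated value used above.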

Application Exercise — Result

Substituting all values into the CI formula:

\[IC_{90\%}(\mu) = \left(320 - 1.645 \times \frac{20}{\sqrt{20}},\; 320 + 1.645 \times \frac{20}{\sqrt{20}}\right)\]

90% Confidence Interval for \(\mu\)

\[IC_{90\%}(\mu) = (312.64\;;\; 327.36)\]
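
The arithmetic behind this result can be reproduced step by step. The sketch below plugs in the exercise data and the tabulated critical value 1.645; the variable names are chosen for illustration only.

```python
from math import sqrt

xbar, sigma, n = 320.0, 20.0, 20  # data from the exercise
z = 1.645                         # tabulated z_{0.05}

se = sigma / sqrt(n)              # standard error of the sample mean
margin = z * se                   # half-width of the interval
lower, upper = xbar - margin, xbar + margin

print(f"standard error = {se:.4f}")
print(f"margin         = {margin:.4f}")
print(f"90% CI         = ({lower:.2f} ; {upper:.2f})")
```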

Application Exercise — Interpretation

The particular interval \((312.64\;;\; 327.36)\) is a 90% confidence interval for the true value of \(\mu\).

This means that if we constructed 90% confidence intervals from many different random samples of the same size, about 90% of them would contain the population mean \(\mu\). It does not mean that \(\mu\) lies in this particular interval with probability 0.90.

Case 2: Normal Population and \(\sigma\) Unknown

Consider a random sample \(X_1, X_2, \ldots, X_n\), \(n \in \mathbb{N}\), drawn from a population \(X\) with distribution \(N(\mu, \sigma)\), where \(\sigma\) is unknown.

\[\underbrace{\mu}_{\text{parameter}} \;\longrightarrow\; \underbrace{\bar{X}}_{\substack{\text{point} \\ \text{estimator}}} \;\longrightarrow\; \underbrace{T = \dfrac{\bar{X} - \mu}{S' / \sqrt{n}} \sim t_{(n-1)}}_{\text{pivot statistic}}\]

where \(S' = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}\) is the corrected sample standard deviation.

Case 2 — Setting Up the Probability Statement

\[P\!\left(-t_{\alpha/2} < T < t_{\alpha/2}\right) = 1-\alpha \qquad \text{(confidence: }(1-\alpha)\times 100\text{\%)}\]

Case 2 — The Confidence Interval

The \((1-\alpha)\times 100\%\) confidence interval for \(\mu\) is:

\[\boxed{IC_{(1-\alpha)\times 100\%}(\mu) = \left(\bar{x} - t_{\alpha/2}\frac{s'}{\sqrt{n}},\; \bar{x} + t_{\alpha/2}\frac{s'}{\sqrt{n}}\right)}\]

where \(t_{\alpha/2}\) is the critical value from the \(t_{(n-1)}\) distribution.

All properties derived for the CI in Case 1 (see earlier slides) apply equally here: the interval is symmetric, and its width increases with the confidence level and variance, and decreases with the sample size.
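
A possible numerical version of the Case 2 interval is sketched below. Python's standard library has no Student's \(t\) quantile function, so this sketch approximates it by integrating the \(t\) density with Simpson's rule and inverting by bisection; in practice a library routine such as `scipy.stats.t.ppf` would normally be used instead. The function names and the example numbers are illustrative assumptions.

```python
from math import exp, lgamma, pi, sqrt


def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t with df degrees of freedom (Simpson's rule)."""
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)

    def pdf(u):
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Value t with P(T <= t) = p, for 0.5 < p < 1 (bisection)."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(xbar, s_corr, n, conf=0.95):
    """Two-sided CI for mu, Normal population, sigma unknown."""
    t = t_quantile(1 - (1 - conf) / 2, n - 1)
    margin = t * s_corr / sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical sample summary: xbar = 50, s' = 10, n = 25
lower, upper = t_interval(xbar=50.0, s_corr=10.0, n=25, conf=0.95)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

With the same inputs, this interval is wider than the one based on \(z_{\alpha/2}\), reflecting the extra uncertainty from estimating \(\sigma\).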

Case 2 — Application Exercise 1

Exercise 1 — Setup

Based on a random sample of \(n = 16\) observations from a Normal population, the following confidence interval for the expected value was constructed:

\[(7.398\;;\; 12.602)\]

Given that \(s' = 3.872\), determine the confidence level that can be attributed to this interval.

(a) 98% (b) 99% (c) Between 98% and 99%

Exercise 1 — Conditions (Case 2)

Conditions: Case 2 — Normal population, \(\sigma\) unknown.

\[X \sim N(\mu, \sigma\!=\!?\,), \qquad n = 16, \qquad s' = 3.872\]

Pivot statistic: \(\;T = \dfrac{\bar{X} - \mu}{S'/\sqrt{n}} \sim t_{(n-1)} \equiv t_{(15)}\)

Exercise 1 — Finding \(t_{\alpha/2}\)

The interval given is:

\[IC_{(1-\alpha)\times100\%}(\mu) = (7.398\;;\; 12.602)\]

Its width is:

\[h = 12.602 - 7.398 = 5.204\]

The width of the CI can also be expressed as:

\[h = 2 \times t_{\alpha/2}\frac{s'}{\sqrt{n}}\]

Exercise 1 — Solving for \(t_{\alpha/2}\)

Setting the two expressions for \(h\) equal:

\[2 \times t_{\alpha/2}\frac{s'}{\sqrt{n}} = 5.204 \;\Rightarrow\; t_{\alpha/2} = \frac{5.204 \times \sqrt{n}}{2 \times s'}\]

\[t_{\alpha/2} = \frac{5.204 \times \sqrt{16}}{2 \times 3.872} = \frac{5.204 \times 4}{7.744} = 2.688\]

From the Student's \(t\) table (Table 7, row for 15 d.f.):

\[\underset{{\scriptstyle\text{area}=0.01}}{2.602} \;<\; 2.688 \;<\; \underset{{\scriptstyle\text{area}=0.005}}{2.947}\]

Exercise 1 — Conclusion

\[0.005 < \frac{\alpha}{2} < 0.01 \;\Leftrightarrow\; 0.01 < \alpha < 0.02 \;\Leftrightarrow\; 0.98 < 1-\alpha < 0.99\]

The confidence level of the given interval lies between 98% and 99%.

The correct answer is (c).
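
The table only brackets the confidence level. As a rough numerical check, the sketch below recovers \(t_{\alpha/2}\) from the interval width and then evaluates the \(t_{(15)}\) distribution function by Simpson's rule to estimate \(1-\alpha\) directly. The helper `t_cdf` is an illustrative pure-Python approximation, not a routine from the slides.

```python
from math import exp, lgamma, pi, sqrt


def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t with df degrees of freedom (Simpson's rule)."""
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)

    def pdf(u):
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


lower, upper = 7.398, 12.602  # interval given in the exercise
n, s_corr = 16, 3.872

width = upper - lower
t_value = width * sqrt(n) / (2 * s_corr)  # solve h = 2 * t * s' / sqrt(n) for t
level = 2 * t_cdf(t_value, n - 1) - 1     # 1 - alpha = P(-t < T < t)

print(f"t_(alpha/2)      = {t_value:.3f}")
print(f"confidence level = {level:.4f}")
```

The computed level falls strictly between 0.98 and 0.99, in line with answer (c).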

Case 2 (continued): \(\sigma\) Unknown and \(n > 30\)

Case 2 — Large Samples (\(n > 30\))

Consider a random sample \(X_1, X_2, \ldots, X_n\), \(n \in \mathbb{N}\) and \(n > 30\), from a population \(X \sim N(\mu, \sigma)\) with \(\sigma\) unknown.

\[\underbrace{\mu}_{\text{parameter}} \;\longrightarrow\; \underbrace{\bar{X}}_{\substack{\text{point} \\ \text{estimator}}} \;\longrightarrow\; \underbrace{Z = \dfrac{\bar{X} - \mu}{S' / \sqrt{n}} \;\dot{\sim}\; N(0,1)}_{\text{pivot statistic (approx.)}}\]

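For large \(n\), the \(t_{(n-1)}\) distribution of the pivot statistic is very close to \(N(0,1)\), so when \(n > 30\) the pivot is treated as approximately standard normal and critical values are taken from the \(N(0,1)\) table.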

Case 2 — Large-Sample CI (\(n > 30\))

The approximate \((1-\alpha)\times 100\%\) confidence interval for \(\mu\) is:

\[IC_{(1-\alpha)\times 100\%}(\mu) \approx \left(\bar{x} - z_{\alpha/2}\frac{s'}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{s'}{\sqrt{n}}\right)\]

Note: When \(n > 30\) and \(\sigma\) is unknown, the corrected sample standard deviation \(s'\) is used in place of \(\sigma\), and the standard normal critical value \(z_{\alpha/2}\) replaces the \(t_{\alpha/2}\) critical value. The resulting interval is approximate.
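
To get a feel for how good the approximation is, the simulation sketch below builds the large-sample interval (with \(z_{\alpha/2}\) and \(s'\)) from many Normal samples of size \(n = 40\) and records how often it contains the true mean. The population values, number of samples, and seed are hypothetical choices made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def large_sample_coverage(mu, sigma, n, conf, n_samples, seed=0):
    """Share of z-based intervals (sigma replaced by s') that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        margin = z * stdev(sample) / sqrt(n)  # stdev divides by n - 1, i.e. s'
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / n_samples


rate = large_sample_coverage(mu=0.0, sigma=1.0, n=40, conf=0.95, n_samples=4000)
print(f"Observed coverage with n = 40: {rate:.3f}")
```

The observed coverage should sit near the nominal 95%, typically slightly below it, because \(z_{\alpha/2}\) is a little smaller than the exact \(t_{\alpha/2}\) critical value.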

Case 2 — Application Exercise 2

Exercise 2 — Setup

Using the same sample from Exercise 1 (\(n = 16\), \(s' = 3.872\), interval \((7.398\;;\;12.602)\)) and taking the confidence level to be 98%:

What sample size should be collected (assuming \(s'\) does not change) so that the width of the CI is reduced by half?

Exercise 2 — Initial Analysis

Current width: \(h = 5.204\)

New target width: \(h' = h/2 = 5.204/2 = 2.602\)

Increasing precision while keeping all other conditions constant requires a larger sample. We assume \(n > 30\), so the pivot becomes approximately \(N(0,1)\).

Exercise 2 — Solving for \(n\)

The approximate CI for \(\mu\) becomes:

\[IC_{98\%}(\mu) \approx \left(\bar{x} - z_{\alpha/2}\frac{s'}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{s'}{\sqrt{n}}\right)\]

The new width satisfies:

\[h' = 2 \times z_{\alpha/2}\frac{s'}{\sqrt{n}} \;\Rightarrow\; 2.602 = 2 \times \underset{\substack{\uparrow \\ \text{Table 5},\; \alpha = 0.02}}{2.326} \times \frac{3.872}{\sqrt{n}}\]

\[\sqrt{n} = \frac{2 \times 2.326 \times 3.872}{2.602} \approx 6.92\]

\[n \geq (6.92)^2 \;\Rightarrow\; \boxed{n = 48}\]
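
The same calculation can be sketched in a few lines of Python. The variable names are illustrative, and the critical value is computed with `statistics.NormalDist` instead of being read from Table 5, so it carries a few more decimals than 2.326.

```python
from math import ceil
from statistics import NormalDist

s_corr = 3.872            # s' from Exercise 1, assumed unchanged
target_width = 5.204 / 2  # half of the current width
conf = 0.98

z = NormalDist().inv_cdf(1 - (1 - conf) / 2)      # z_{0.01}
n_exact = (2 * z * s_corr / target_width) ** 2    # solve h' = 2 z s' / sqrt(n) for n
n_required = ceil(n_exact)                        # round up to a whole observation

print(f"z = {z:.3f}, n (unrounded) = {n_exact:.2f}, n required = {n_required}")
```

Rounding up is necessary because rounding down would leave the interval slightly wider than the target. If the critical value stayed the same, halving the width would require four times the sample (64 observations); fewer are needed here because the normal critical value is smaller than the \(t_{(15)}\) value 2.688 found in Exercise 1.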

Exercise 2 — Conclusion

Required Sample Size

To reduce the width of the 98% CI by half (from 5.204 to 2.602), while keeping \(s' = 3.872\) unchanged, a sample of \(n = 48\) observations must be collected.

Note that the assumption \(n > 30\) is satisfied (\(48 > 30\)), validating the use of the normal approximation.

Summary — Case 1 vs. Case 2

\[\begin{array}{l|c|c|c}
 & \text{Case 1} & \text{Case 2 } (n \leq 30) & \text{Case 2 } (n > 30) \\ \hline
\text{Population} & N(\mu, \sigma) & N(\mu, \sigma) & N(\mu, \sigma) \\
\sigma & \text{Known} & \text{Unknown} & \text{Unknown} \\
\text{Pivot} & Z \sim N(0,1) & T \sim t_{(n-1)} & Z \;\dot{\sim}\; N(0,1) \\
\text{Critical value} & z_{\alpha/2} & t_{\alpha/2} & z_{\alpha/2} \\
\text{SE} & \sigma/\sqrt{n} & s'/\sqrt{n} & s'/\sqrt{n} \\
\text{Interval} & \text{Exact} & \text{Exact} & \text{Approximate}
\end{array}\]