Parametric Hypothesis Tests — Part 1

Basic Concepts, Test Procedure, Errors, and p-value

Paulo Fagandini

Lisbon Accounting and Business School – Polytechnic University of Lisbon

2025-04-14

Introduction to Hypothesis Testing

Recall: Statistical Inference

So far in this course, we have covered:

  • Sampling distributions (Topic 1): how sample statistics behave
  • Point estimation (Topic 2.1): finding a single “best guess” for a parameter
  • Interval estimation (Topic 2.2): constructing a range of plausible values

Now we ask a different question:

Given a claim about a population parameter, does the sample data support or contradict that claim?

Why Hypothesis Testing?

In practice, decisions must be made under uncertainty:

  • A pharmaceutical company claims a new drug lowers cholesterol by more than 20 mg/dL. Does the clinical trial data support this?
  • An engineer suspects that a machine is overfilling packages beyond the nominal 100 g. Is there statistical evidence?
  • A bank manager believes the average processing time for loan applications has decreased. Can we confirm this?

Hypothesis testing provides a formal, rigorous framework for answering these questions using sample data.

From Estimation to Testing

The Key Shift

In estimation, we ask: “What is the value of the parameter?”

In hypothesis testing, we ask: “Is the parameter equal to (or greater than, or less than) a specific value?”

We already have the building blocks:

  • Sampling distributions
  • Pivot (fulcral) variables: \(Z\), \(T\), \(Q\) (\(\chi^2\))
  • Critical values from statistical tables

3.1 — Basic Concepts

The Structure of a Hypothesis Test

Every hypothesis test involves the same fundamental elements:

  1. Null hypothesis (\(H_0\)): the claim we assume to be true until evidence says otherwise
  2. Alternative hypothesis (\(H_1\)): the claim we are trying to find evidence for
  3. Test statistic: a function of the sample data (computed under \(H_0\))
  4. Significance level (\(\alpha\)): the probability of rejecting \(H_0\) when it is actually true
  5. Critical (rejection) region: the set of values of the test statistic that lead to rejection of \(H_0\)

The Null Hypothesis (\(H_0\))

Null Hypothesis

The null hypothesis, denoted \(H_0\), represents the status quo — the current belief, the manufacturer’s specification, or the default assumption.

It always contains an equality sign (\(=\), \(\leq\), or \(\geq\)).

Examples:

  • \(H_0\): \(\mu = 100\) (the mean weight is 100 g)
  • \(H_0\): \(\mu \leq 500\) (the mean daily revenue does not exceed 500 €)
  • \(H_0\): \(\sigma^2 \geq 10\) (the variance is at least 10)

Important: We never “accept” \(H_0\). We either reject it or fail to reject it.

The Alternative Hypothesis (\(H_1\))

Alternative Hypothesis

The alternative hypothesis, denoted \(H_1\) (or \(H_a\)), represents what we are trying to find evidence for. It is the “research hypothesis.”

It contains a strict inequality (\(\neq\), \(>\), or \(<\)).

The form of \(H_1\) determines the type of test:

\(H_0\) | \(H_1\) | Type of test
\(\mu = \mu_0\) | \(\mu \neq \mu_0\) | Two-tailed (bilateral)
\(\mu \leq \mu_0\) | \(\mu > \mu_0\) | Right-tailed (unilateral right)
\(\mu \geq \mu_0\) | \(\mu < \mu_0\) | Left-tailed (unilateral left)

How to Formulate \(H_0\) and \(H_1\)

A common source of confusion: which claim goes in \(H_0\) and which in \(H_1\)?

Rule of thumb:

  • \(H_0\) is the claim that we assume true unless the data convinces us otherwise
  • \(H_1\) is the claim we want to prove (what the researcher suspects)

Think of it as a trial: \(H_0\) is “innocent” (the default), and \(H_1\) is “guilty.” The burden of proof lies on \(H_1\).

Example: Formulating Hypotheses

Scenario: A food company states that each package contains, on average, 500 g of product. A consumer protection agency suspects the company is underfilling.

What are \(H_0\) and \(H_1\)?

  • The status quo (company’s claim): \(\mu = 500\) (or \(\mu \geq 500\))
  • The suspicion (what we want evidence for): \(\mu < 500\)

\[H_0: \mu \geq 500 \quad \text{vs.} \quad H_1: \mu < 500\]

This is a left-tailed test.

Example: Formulating Hypotheses (cont.)

Scenario: An engineer suspects that a machine is overfilling packages beyond the nominal 100 g.

\[H_0: \mu \leq 100 \quad \text{vs.} \quad H_1: \mu > 100\]

This is a right-tailed test.


Scenario: A quality control inspector wants to check whether the mean weight differs from the specification of 100 g (could be above or below).

\[H_0: \mu = 100 \quad \text{vs.} \quad H_1: \mu \neq 100\]

This is a two-tailed (bilateral) test.

The Test Statistic

Test Statistic

The test statistic is a random variable, computed from the sample, whose distribution is known under \(H_0\).

It measures how far the sample result is from what \(H_0\) predicts.

You already know these from interval estimation:

Parameter | Conditions | Test Statistic | Distribution under \(H_0\)
\(\mu\) | \(\sigma\) known | \(Z_0 = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}\) | \(N(0,1)\)
\(\mu\) | \(\sigma\) unknown | \(T_0 = \frac{\bar{X}-\mu_0}{S'/\sqrt{n}}\) | \(t_{(n-1)}\)
\(\sigma^2\) | Normal pop. | \(Q_0 = \frac{(n-1)S'^2}{\sigma_0^2}\) | \(\chi^2_{(n-1)}\)
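
As a rough illustration, the three statistics in the table can be computed from raw data with Python's standard library. This is only a sketch: the function names and the sample of weights are made up for the example, and `statistics.stdev` / `statistics.variance` are used because they divide by \(n-1\), matching the corrected \(S'\).

```python
import math
import statistics


def z_statistic(sample, mu0, sigma):
    """Z0 = (xbar - mu0) / (sigma / sqrt(n)), for a known population sigma."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))


def t_statistic(sample, mu0):
    """T0 = (xbar - mu0) / (s' / sqrt(n)), with s' the corrected sample std."""
    n = len(sample)
    s_corrected = statistics.stdev(sample)  # divides by n - 1
    return (statistics.mean(sample) - mu0) / (s_corrected / math.sqrt(n))


def q_statistic(sample, sigma0_sq):
    """Q0 = (n - 1) * s'^2 / sigma0^2, for a test on the variance."""
    n = len(sample)
    return (n - 1) * statistics.variance(sample) / sigma0_sq


# Hypothetical package weights in grams (made-up data, mean 100, s' = 2).
weights = [101.0, 99.0, 102.0, 98.0, 100.0, 103.0, 97.0, 100.0]

z_obs = z_statistic(weights, mu0=99, sigma=2)
t_obs = t_statistic(weights, mu0=99)
q_obs = q_statistic(weights, sigma0_sq=4)
print(round(z_obs, 3), round(t_obs, 3), round(q_obs, 3))  # 1.414 1.414 7.0
```

Here \(z_{obs}\) and \(t_{obs}\) coincide only because the made-up sample happens to have \(s' = 2 = \sigma\); in general they differ, and they are compared against different distributions.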

The Significance Level (\(\alpha\))

Significance Level

The significance level \(\alpha\) is the maximum probability of rejecting \(H_0\) when \(H_0\) is actually true.

\[\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true})\]

Common values: \(\alpha = 0.01\), \(\alpha = 0.05\), \(\alpha = 0.10\).

  • Smaller \(\alpha\) \(\Rightarrow\) harder to reject \(H_0\) \(\Rightarrow\) stronger evidence required
  • Larger \(\alpha\) \(\Rightarrow\) easier to reject \(H_0\) \(\Rightarrow\) weaker evidence suffices

\(\alpha\) is chosen before collecting data and performing the test.

The Critical (Rejection) Region

Critical Region

The critical region (or rejection region) is the set of values of the test statistic for which we reject \(H_0\).

Its boundary is determined by the significance level \(\alpha\) and the type of test.

Critical Regions: Visual Summary

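The shaded areas represent the rejection region: both tails (area \(\alpha/2\) each) for a two-tailed test, and a single tail of area \(\alpha\) for a one-tailed test. If the observed test statistic falls in the shaded area, we reject \(H_0\).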
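
For a test based on \(Z\), the boundaries of these regions can be obtained from the standard Normal quantile function instead of a table. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name and the `"left"`, `"right"`, `"two"` labels are choices made for this example only.

```python
from statistics import NormalDist


def z_critical_region(alpha, tail):
    """Return the rejection region of a Z test as a list of (low, high) intervals.

    tail is "left", "right", or "two" (labels chosen for this sketch).
    """
    std_normal = NormalDist()  # N(0, 1)
    inf = float("inf")
    if tail == "left":
        return [(-inf, std_normal.inv_cdf(alpha))]
    if tail == "right":
        return [(std_normal.inv_cdf(1 - alpha), inf)]
    if tail == "two":
        z_half = std_normal.inv_cdf(1 - alpha / 2)  # alpha / 2 in each tail
        return [(-inf, -z_half), (z_half, inf)]
    raise ValueError("tail must be 'left', 'right', or 'two'")


for tail in ("left", "right", "two"):
    print(tail, z_critical_region(0.05, tail))
```

At \(\alpha = 0.05\) the boundaries are approximately \(-1.645\), \(1.645\), and \(\pm 1.96\), the usual table values.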

The General Test Procedure

Step-by-step procedure for a hypothesis test

  1. State the hypotheses: write \(H_0\) and \(H_1\)
  2. Choose the significance level \(\alpha\)
  3. Identify the test statistic and its distribution under \(H_0\)
  4. Determine the critical region (using \(\alpha\) and statistical tables)
  5. Compute the observed value of the test statistic from the sample
  6. Make the decision: reject \(H_0\) if the observed value falls in the critical region; otherwise, do not reject \(H_0\)
  7. State the conclusion in the context of the problem
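
The mechanical steps (3 to 6) can be sketched in code for the simplest case, a test on the mean with \(\sigma\) known. This is an illustrative sketch, not a general implementation: the function name, the tail labels, and the numbers in the usage lines are all hypothetical. Steps 1, 2, and 7 remain the analyst's job.

```python
import math
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, alpha, tail):
    """Critical-region Z test for the mean (sigma known).

    Returns (z_obs, reject), where reject is True when z_obs falls in the
    rejection region. tail is "left", "right", or "two".
    """
    std_normal = NormalDist()                       # step 3: Z0 ~ N(0, 1) under H0
    z_obs = (xbar - mu0) / (sigma / math.sqrt(n))   # step 5: observed value
    if tail == "left":                              # step 4: critical region
        reject = z_obs <= std_normal.inv_cdf(alpha)
    elif tail == "right":
        reject = z_obs >= std_normal.inv_cdf(1 - alpha)
    elif tail == "two":
        reject = abs(z_obs) >= std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z_obs, reject                            # step 6: decision


# Hypothetical data: xbar = 101.5 from n = 25, with mu0 = 100 and sigma = 4.
z_right, reject_right = z_test(101.5, 100, 4, 25, alpha=0.05, tail="right")
z_two, reject_two = z_test(101.5, 100, 4, 25, alpha=0.05, tail="two")
print(round(z_right, 3), reject_right)  # 1.875 True
print(round(z_two, 3), reject_two)      # 1.875 False
```

The same observed value leads to rejection in the right-tailed test but not in the two-tailed one, which is why the hypotheses must be stated before looking at the data.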

Type I and Type II Errors

Two Types of Mistakes

When we make a decision based on sample data, we can make two kinds of errors:

Decision | \(H_0\) is true | \(H_0\) is false
Do not reject \(H_0\) | Correct decision | Type II Error (\(\beta\))
Reject \(H_0\) | Type I Error (\(\alpha\)) | Correct decision

Type I Error

Type I Error

A Type I error occurs when we reject \(H_0\) even though \(H_0\) is true.

\[\alpha = P(\text{Type I Error}) = P(\text{reject } H_0 \mid H_0 \text{ is true})\]

This is exactly the significance level \(\alpha\).

Analogy: Convicting an innocent person in a trial.
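
A small simulation can make this definition concrete: if we repeatedly draw samples from a population where \(H_0\) is true and run the test each time, the fraction of rejections should be close to \(\alpha\). The sketch below assumes a left-tailed \(Z\) test with \(\sigma\) known; the function name, the parameter values, and the seed are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_type_i_rate(mu0, sigma, n, alpha, reps, seed=0):
    """Estimate P(reject H0 | H0 true) for a left-tailed Z test by simulation."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(alpha)  # reject when z_obs <= z_crit
    rejections = 0
    for _ in range(reps):
        # H0 is true by construction: the data really come from N(mu0, sigma).
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z_obs = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
        if z_obs <= z_crit:
            rejections += 1
    return rejections / reps


rate = simulate_type_i_rate(mu0=100, sigma=5, n=25, alpha=0.05, reps=5000)
print(rate)  # should be close to 0.05, up to simulation noise
```

Every rejection counted here is a Type I error, since the null hypothesis holds in every simulated sample.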

Type II Error

Type II Error

A Type II error occurs when we fail to reject \(H_0\) even though \(H_0\) is false (i.e., \(H_1\) is true).

\[\beta = P(\text{Type II Error}) = P(\text{do not reject } H_0 \mid H_1 \text{ is true})\]

Analogy: Acquitting a guilty person in a trial.

\(\beta\) depends on:

  • The true value of the parameter (how far from \(H_0\))
  • The sample size \(n\)
  • The significance level \(\alpha\)

The Power of a Test

Power

The power of a test is the probability of correctly rejecting \(H_0\) when \(H_1\) is true:

\[\text{Power} = 1 - \beta = P(\text{reject } H_0 \mid H_1 \text{ is true})\]

A good test has high power (close to 1).

In practice:

  • Increasing \(n\) increases power (and reduces \(\beta\))
  • Increasing \(\alpha\) increases power but also increases the risk of Type I error
  • For a fixed sample size \(n\), there is a trade-off between \(\alpha\) and \(\beta\)

The Trade-off Between \(\alpha\) and \(\beta\)

For a fixed sample size \(n\):

  • If we decrease \(\alpha\) (more conservative) \(\Rightarrow\) \(\beta\) increases (harder to detect a real effect)
  • If we increase \(\alpha\) (more liberal) \(\Rightarrow\) \(\beta\) decreases (easier to detect, but more false alarms)

The only way to reduce both \(\alpha\) and \(\beta\) simultaneously is to increase the sample size \(n\).
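
These relationships can be checked numerically in a simple setting. The sketch below assumes a left-tailed \(Z\) test of \(H_0: \mu \geq \mu_0\) with \(\sigma\) known; under a true mean \(\mu_1\), the statistic \(Z_0\) is Normal with mean \((\mu_1 - \mu_0)/(\sigma/\sqrt{n})\) and unit variance, so the power is \(\Phi\big(-z_\alpha - (\mu_1 - \mu_0)/(\sigma/\sqrt{n})\big)\). The function name and all numeric values are illustrative.

```python
import math
from statistics import NormalDist


def power_left_tailed_z(mu0, mu_true, sigma, n, alpha):
    """Power of the left-tailed Z test of H0: mu >= mu0 when the mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(alpha)                 # reject when Z0 <= z_crit
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))   # mean of Z0 under mu_true
    return std_normal.cdf(z_crit - shift)


# Illustrative values: mu0 = 100, true mean 98, sigma = 5.
power_a05 = power_left_tailed_z(100, 98, 5, n=25, alpha=0.05)
power_a01 = power_left_tailed_z(100, 98, 5, n=25, alpha=0.01)
power_n100 = power_left_tailed_z(100, 98, 5, n=100, alpha=0.05)

print(round(power_a05, 3), round(1 - power_a05, 3))    # power and beta
print(round(power_a01, 3), round(1 - power_a01, 3))    # smaller alpha: larger beta
print(round(power_n100, 3), round(1 - power_n100, 3))  # larger n: smaller beta
```

Lowering \(\alpha\) from 0.05 to 0.01 at \(n = 25\) reduces the power, while raising \(n\) to 100 at \(\alpha = 0.05\) increases it, in line with the statements above.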

Analogy: Fire Alarm

Think of a hypothesis test as a fire alarm:

  • Type I Error (\(\alpha\)): The alarm goes off, but there is no fire (false alarm)
  • Type II Error (\(\beta\)): There is a fire, but the alarm does not go off (missed detection)
  • If we make the alarm very sensitive: fewer missed fires (\(\beta\) small), but more false alarms (\(\alpha\) large)
  • If we make the alarm less sensitive: fewer false alarms (\(\alpha\) small), but more missed fires (\(\beta\) large)

Summary: Errors at a Glance

Solved Example — Putting It All Together

Example 1: Test for the Mean (\(\sigma\) known)

A bottling company fills bottles with a nominal volume of \(\mu_0 = 500\) mL. The filling process is known to have a standard deviation of \(\sigma = 8\) mL. A quality inspector takes a random sample of \(n = 36\) bottles and obtains a sample mean of \(\bar{x} = 497\) mL.

At the 5% significance level, is there evidence that the machine is underfilling?

Example 1: Solution — Step 1

Step 1: State the hypotheses.

The inspector suspects underfilling, i.e., the mean is less than 500.

\[H_0: \mu \geq 500 \quad \text{vs.} \quad H_1: \mu < 500\]

This is a left-tailed test.

Example 1: Solution — Step 2

Step 2: Significance level.

\[\alpha = 0.05\]

Example 1: Solution — Step 3

Step 3: Test statistic.

The population standard deviation \(\sigma = 8\) is known, so:

\[Z_0 = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \underset{\text{under } H_0}{\sim} N(0, 1)\]

Example 1: Solution — Step 4

Step 4: Critical region.

For a left-tailed test at \(\alpha = 0.05\):

\[\text{Rejection region: } \left]-\infty\, ;\, -z_{\alpha}\right] = \left]-\infty\, ;\, -1.645\right]\]

We reject \(H_0\) if \(z_{obs} \leq -1.645\).

Example 1: Solution — Step 5

Step 5: Compute the observed value.

\[z_{obs} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{497 - 500}{8 / \sqrt{36}} = \frac{-3}{8/6} = \frac{-3}{1.333} \approx -2.25\]

Example 1: Solution — Step 6

Step 6: Decision.

\[z_{obs} = -2.25 \leq -1.645\]

The observed value falls in the rejection region.

\(\Rightarrow\) We reject \(H_0\) at the 5% significance level.

Example 1: Solution — Step 7

Step 7: Conclusion.

At the 5% significance level, there is sufficient statistical evidence to conclude that the machine is underfilling the bottles (i.e., the mean volume is less than 500 mL).

Example 2: Formulating Hypotheses — Practice

For each scenario, write \(H_0\) and \(H_1\), and state the type of test:

  1. A bank claims the average time to process a loan is 5 days. A consultant believes it takes longer.
  2. A manufacturer states the defect rate is at most 3%. An auditor suspects it is higher.
  3. A researcher wants to test if a training program changes employee productivity from the current average of 40 units/day.

Example 2: Solutions

  1. \(H_0: \mu \leq 5 \quad \text{vs.} \quad H_1: \mu > 5\) → Right-tailed
  2. \(H_0: p \leq 0.03 \quad \text{vs.} \quad H_1: p > 0.03\) → Right-tailed
  3. \(H_0: \mu = 40 \quad \text{vs.} \quad H_1: \mu \neq 40\) → Two-tailed

Note: In scenario 3, the researcher does not have a directional suspicion (productivity could go either way), hence the two-tailed test.

Quick Recap Before the Break

What we covered so far

  • Null hypothesis (\(H_0\)): status quo, contains \(=\), \(\leq\), or \(\geq\)
  • Alternative hypothesis (\(H_1\)): research claim, contains \(\neq\), \(>\), or \(<\)
  • Test statistic: pivot variable computed under \(H_0\)
  • Significance level (\(\alpha\)): \(P(\text{reject } H_0 \mid H_0 \text{ true})\)
  • Critical region: values that lead to rejection of \(H_0\)
  • Type I Error (\(\alpha\)): rejecting a true \(H_0\)
  • Type II Error (\(\beta\)): not rejecting a false \(H_0\)
  • Power = \(1 - \beta\)

3.2 — The \(p\)-value

Motivation

Before the break, we used the critical region approach:

  1. Fix \(\alpha\)
  2. Find the critical value(s)
  3. Compare \(z_{obs}\) (or \(t_{obs}\), \(q_{obs}\)) with the critical value

But this only tells us “reject” or “do not reject” for a specific \(\alpha\).

What if someone asks: “Would you still reject \(H_0\) at \(\alpha = 0.01\)?”

The \(p\)-value answers this question for all possible significance levels at once.

Definition of the \(p\)-value

\(p\)-value

The \(p\)-value is the probability, computed under \(H_0\), of observing a test statistic value as extreme as, or more extreme than, the value actually observed.

It is the smallest significance level at which we would reject \(H_0\).

Informally: the \(p\)-value measures how surprised we should be by the sample data, if \(H_0\) were true.

  • Small \(p\)-value \(\Rightarrow\) data is very unlikely under \(H_0\) \(\Rightarrow\) strong evidence against \(H_0\)
  • Large \(p\)-value \(\Rightarrow\) data is consistent with \(H_0\) \(\Rightarrow\) no evidence against \(H_0\)

\(p\)-value Formulas

The formula depends on the type of test:

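Type of test | \(p\)-value
Left-tailed (\(H_1: \theta < \theta_0\)) | \(p = P(T \leq t_{obs} \mid H_0)\)
Right-tailed (\(H_1: \theta > \theta_0\)) | \(p = P(T \geq t_{obs} \mid H_0)\)
Two-tailed (\(H_1: \theta \neq \theta_0\)) | \(p = 2 \times P(T \geq |t_{obs}| \mid H_0)\)

Here \(T\) denotes the test statistic (could be \(Z\), \(T\), \(Q\), etc.) and \(t_{obs}\) is its observed value. The two-tailed formula assumes a distribution symmetric about zero (\(Z\) or \(T\)); for an asymmetric statistic such as \(Q\) (\(\chi^2\)), a common convention is \(p = 2 \times \min\{P(T \leq t_{obs} \mid H_0),\, P(T \geq t_{obs} \mid H_0)\}\).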
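
For a \(Z\) statistic, the three formulas translate directly into calls to the standard Normal CDF. The sketch below covers only the \(Z\) case; the function name and tail labels are choices made for this example. The usage lines reuse \(z_{obs} = -2.25\) from Example 1 to show how the same observed value gives different \(p\)-values under different alternatives.

```python
from statistics import NormalDist


def z_p_value(z_obs, tail):
    """p-value of a Z test; tail is "left", "right", or "two"."""
    std_normal = NormalDist()
    if tail == "left":
        return std_normal.cdf(z_obs)                 # P(Z <= z_obs)
    if tail == "right":
        return 1 - std_normal.cdf(z_obs)             # P(Z >= z_obs)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z_obs)))  # 2 * P(Z >= |z_obs|)
    raise ValueError("tail must be 'left', 'right', or 'two'")


p_left = z_p_value(-2.25, "left")
p_right = z_p_value(-2.25, "right")
p_two = z_p_value(-2.25, "two")
print(round(p_left, 4), round(p_right, 4), round(p_two, 4))  # 0.0122 0.9878 0.0244
```

A strongly negative \(z_{obs}\) is evidence for \(\mu < \mu_0\), weaker evidence for \(\mu \neq \mu_0\), and no evidence at all for \(\mu > \mu_0\).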

Decision Rule Using the \(p\)-value

\(p\)-value decision rule

Given a significance level \(\alpha\):

  • If \(p\text{-value} \leq \alpha\) \(\Rightarrow\) reject \(H_0\)
  • If \(p\text{-value} > \alpha\) \(\Rightarrow\) do not reject \(H_0\)

This is equivalent to the critical region approach, but more informative:

  • The critical region approach gives a binary answer for one specific \(\alpha\)
  • The \(p\)-value tells you exactly how much evidence the data provides against \(H_0\)

\(p\)-value: Graphical Interpretation

The shaded blue areas represent the \(p\)-value: the probability, under \(H_0\), of obtaining a result at least as extreme as \(t_{obs}\).

Example 3: \(p\)-value Calculation (cont. from Example 1)

Recall: \(H_0: \mu \geq 500\) vs. \(H_1: \mu < 500\), \(z_{obs} = -2.25\), left-tailed test.

\[p\text{-value} = P(Z \leq z_{obs} \mid H_0) = P(Z \leq -2.25)\]

\[= \Phi(-2.25) = 1 - \Phi(2.25) = 1 - 0.9878 = 0.0122\]

Interpretation: If \(H_0\) were true (\(\mu = 500\)), the probability of observing a sample mean as low as (or lower than) 497 is only 1.22%.

Decision (at \(\alpha = 0.05\)): Since \(p = 0.0122 < 0.05 = \alpha\), we reject \(H_0\).

Decision (at \(\alpha = 0.01\)): Since \(p = 0.0122 > 0.01 = \alpha\), we do not reject \(H_0\).

Example 4: \(p\)-value with the \(t\)-distribution

A random sample of \(n = 20\) observations is drawn from a Normal population. The sample mean is \(\bar{x} = 101\) and the corrected sample standard deviation is \(s' = 3\).

Test \(H_0: \mu \leq 100\) vs. \(H_1: \mu > 100\).

Example 4: Solution

Test statistic (\(\sigma\) unknown, Normal population):

\[T_0 = \frac{\bar{X} - \mu_0}{S'/\sqrt{n}} \underset{\text{under } H_0}{\sim} t_{(n-1)} = t_{(19)}\]

Observed value:

\[t_{obs} = \frac{101 - 100}{3/\sqrt{20}} = \frac{1}{3/4.472} = \frac{1}{0.6708} \approx 1.49\]

Example 4: Solution (cont.)

\(p\)-value (right-tailed test):

\[p = P(T > 1.49 \mid H_0)\]

Using the Student's \(t\) table (Table 7) with 19 degrees of freedom:

We look for the row \(\nu = 19\). We find that \(t_{0.10,(19)} = 1.328\) and \(t_{0.05,(19)} = 1.729\).

Since \(1.328 < 1.49 < 1.729\), we have:

\[0.05 < p < 0.10\]

Decision (at \(\alpha = 0.05\)): Since \(p > 0.05\), we do not reject \(H_0\).

Conclusion: At the 5% level, there is not sufficient evidence to conclude that \(\mu > 100\).
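
Statistical software reports a single number rather than a bracket. As a rough sketch of how such a value can be obtained without a statistics library, the code below integrates the Student's \(t\) density numerically with Simpson's rule. The function names and the number of integration steps are choices made for this example, and the sketch only handles \(t_{obs} \geq 0\).

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_right_tail(t_obs, df, steps=2000):
    """P(T >= t_obs) for t_obs >= 0, via Simpson's rule on [0, t_obs]."""
    if t_obs < 0:
        raise ValueError("this sketch only handles t_obs >= 0")
    h = t_obs / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t_obs, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t  # by symmetry, P(T >= 0) = 0.5


t_obs = (101 - 100) / (3 / math.sqrt(20))  # Example 4
p = t_right_tail(t_obs, df=19)
print(round(t_obs, 2), 0.05 < p < 0.10)  # 1.49 True
```

The computed \(p\)-value falls inside the bracket \(0.05 < p < 0.10\) read from the table, so the decision at \(\alpha = 0.05\) is unchanged.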

Reading the \(p\)-value from Tables

When using the Student's \(t\) or \(\chi^2\) tables, we typically cannot compute the exact \(p\)-value. Instead, we bracket it:

Strategy: Find the two table entries that “sandwich” \(t_{obs}\) (or \(q_{obs}\)).

Example: \(t_{obs} = 2.861\), \(\nu = 19\) (right-tailed).

From Table 7: \(t_{0.005,(19)} = 2.861\).

So \(p = 0.005\) (exactly, in this case).

Tip: If \(t_{obs}\) matches a table entry exactly, the \(p\)-value is the corresponding tail probability. Otherwise, state the interval.
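
The bracketing strategy can be written as a short lookup. In the sketch below, the row for \(\nu = 19\) holds the three critical values quoted in these slides plus two further standard table values (2.093 and 2.539); the dictionary layout and the function name are assumptions made for this example.

```python
# Upper-tail critical values t_{a,(19)}: tail area a -> critical value.
T_ROW_19 = {0.10: 1.328, 0.05: 1.729, 0.025: 2.093, 0.01: 2.539, 0.005: 2.861}


def bracket_right_tail_p(t_obs, table_row):
    """Return (lower, upper) bounds on P(T >= t_obs) from one table row.

    None means the table cannot supply that bound. If t_obs equals a
    table entry, both bounds are that entry's tail area.
    """
    lower, upper = None, None
    # Walk the row from the largest tail area (smallest critical value) down.
    for area, crit in sorted(table_row.items(), reverse=True):
        if t_obs == crit:
            return area, area
        if t_obs > crit:
            upper = area  # t_obs lies beyond this entry, so p < area
        else:
            lower = area  # t_obs lies before this entry, so p > area
            break
    return lower, upper


print(bracket_right_tail_p(1.49, T_ROW_19))   # (0.05, 0.1)
print(bracket_right_tail_p(2.861, T_ROW_19))  # (0.005, 0.005)
print(bracket_right_tail_p(3.5, T_ROW_19))    # (None, 0.005), i.e. p < 0.005
```

For a left-tailed test with negative \(t_{obs}\), the same lookup applies to \(|t_{obs}|\) by symmetry of the \(t\) distribution.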

Example 5: Two-tailed \(p\)-value

A manufacturer specifies that the mean diameter of a component is \(\mu_0 = 25\) mm. The population variance is known: \(\sigma^2 = 4\). A random sample of \(n = 49\) components has \(\bar{x} = 25.6\) mm.

Test \(H_0: \mu = 25\) vs. \(H_1: \mu \neq 25\). Compute the \(p\)-value.

Example 5: Solution

Test statistic (\(\sigma\) known):

\[Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)\]

Observed value:

\[z_{obs} = \frac{25.6 - 25}{2/\sqrt{49}} = \frac{0.6}{2/7} = \frac{0.6}{0.2857} \approx 2.10\]

\(p\)-value (two-tailed):

\[p = 2 \times P(Z \geq |z_{obs}|) = 2 \times P(Z \geq 2.10)\]

\[= 2 \times [1 - \Phi(2.10)] = 2 \times (1 - 0.9821) = 2 \times 0.0179 = 0.0358\]

Decision: \(p = 0.0358 < 0.05\), so we reject \(H_0\) at \(\alpha = 0.05\). There is evidence that \(\mu \neq 25\) mm.

Equivalence: Critical Region vs. \(p\)-value

Both approaches always give the same conclusion for a given \(\alpha\):

Approach | Reject \(H_0\) if…
Critical region | \(t_{obs}\) falls in the rejection region
\(p\)-value | \(p\text{-value} \leq \alpha\)

In practice, many practitioners prefer the \(p\)-value because:

  • It provides more information (the exact strength of evidence)
  • It allows the reader to draw their own conclusion for their preferred \(\alpha\)
  • It is the standard output of statistical software

Common Misinterpretations of the \(p\)-value

The \(p\)-value is NOT:

  • The probability that \(H_0\) is true
  • The probability that \(H_1\) is false
  • The probability of making an error

The \(p\)-value IS:

The probability of observing data as extreme as (or more extreme than) what was observed, assuming \(H_0\) is true.

Practical Guidelines for Interpreting \(p\)-values

While the decision rule (\(p \leq \alpha \Rightarrow\) reject) is clear-cut, some authors provide informal guidelines:

\(p\)-value | Evidence against \(H_0\)
\(p > 0.10\) | No evidence
\(0.05 < p \leq 0.10\) | Weak (marginal) evidence
\(0.01 < p \leq 0.05\) | Moderate evidence
\(0.001 < p \leq 0.01\) | Strong evidence
\(p \leq 0.001\) | Very strong evidence

Source: adapted from Newbold, Carlson & Thorne.

These are informal guidelines, not strict rules. The choice of \(\alpha\) remains a decision of the researcher.
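
The table can be encoded as a simple lookup, which is sometimes handy when annotating software output. The function name is made up for this sketch, and the labels are the informal ones from the table, not a formal decision rule.

```python
def evidence_label(p):
    """Map a p-value to the informal evidence label from the guideline table."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie in [0, 1]")
    if p > 0.10:
        return "No evidence"
    if p > 0.05:
        return "Weak (marginal) evidence"
    if p > 0.01:
        return "Moderate evidence"
    if p > 0.001:
        return "Strong evidence"
    return "Very strong evidence"


print(evidence_label(0.0122))  # Example 3: Moderate evidence
print(evidence_label(0.0358))  # Example 5: Moderate evidence
```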

Relationship Between Confidence Intervals and Hypothesis Tests

The Connection

There is a direct relationship between a two-tailed hypothesis test and a confidence interval:

CI–Test Equivalence

At significance level \(\alpha\), we reject \(H_0: \mu = \mu_0\) (two-tailed) if and only if \(\mu_0\) does not belong to the \((1-\alpha)\times 100\%\) confidence interval for \(\mu\).

This makes intuitive sense: if the hypothesized value \(\mu_0\) is outside the plausible range given by the CI, then the data contradicts \(H_0\).

Example 6: CI and Test Equivalence

Recall Example 5: \(\mu_0 = 25\), \(\sigma = 2\), \(n = 49\), \(\bar{x} = 25.6\).

The 95% confidence interval for \(\mu\) is:

\[\bar{x} \pm z_{0.025} \cdot \frac{\sigma}{\sqrt{n}} = 25.6 \pm 1.96 \times \frac{2}{7} = 25.6 \pm 0.56\]

\[\text{CI} = [25.04\, ;\, 26.16]\]

Since \(\mu_0 = 25 \notin [25.04\, ;\, 26.16]\), we reject \(H_0: \mu = 25\) at \(\alpha = 0.05\).

This is consistent with our earlier finding (\(p = 0.0358 < 0.05\)).
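
The equivalence can be verified numerically with the data of Example 5. The sketch below computes the confidence interval and the two-tailed \(p\)-value side by side; the function name is hypothetical. Because it uses exact Normal quantiles rather than four-decimal table values, the \(p\)-value comes out as about 0.0357 instead of 0.0358.

```python
import math
from statistics import NormalDist


def ci_and_two_tailed_p(xbar, mu0, sigma, n, alpha):
    """Return the (1 - alpha) CI for mu and the two-tailed Z-test p-value."""
    std_normal = NormalDist()
    se = sigma / math.sqrt(n)                   # standard error of the mean
    z_half = std_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    ci = (xbar - z_half * se, xbar + z_half * se)
    z_obs = (xbar - mu0) / se
    p = 2 * (1 - std_normal.cdf(abs(z_obs)))
    return ci, p


for alpha in (0.05, 0.01):
    ci, p = ci_and_two_tailed_p(xbar=25.6, mu0=25, sigma=2, n=49, alpha=alpha)
    outside_ci = not (ci[0] <= 25 <= ci[1])
    rejects = p <= alpha
    print(alpha, [round(b, 2) for b in ci], round(p, 4), outside_ci, rejects)
```

At \(\alpha = 0.05\) the value 25 lies outside the interval and the test rejects; at \(\alpha = 0.01\) the wider interval contains 25 and the test does not reject. In both cases the two criteria agree.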

Solved Exercises

Exercise 1

A manufacturer claims that light bulbs last, on average, at least 1000 hours. A random sample of 25 bulbs from a Normal population gives \(\bar{x} = 980\) hours and \(s' = 40\) hours.

At \(\alpha = 0.05\), is there evidence against the manufacturer’s claim?

Exercise 1: Solution

Step 1: \(H_0: \mu \geq 1000\) vs. \(H_1: \mu < 1000\) (left-tailed)

Step 2: \(\alpha = 0.05\)

Step 3: \(\sigma\) unknown, Normal population, so:

\[T_0 = \frac{\bar{X} - \mu_0}{S'/\sqrt{n}} \underset{\text{under } H_0}{\sim} t_{(24)}\]

Exercise 1: Solution (cont.)

Step 4: Left-tailed, \(\alpha = 0.05\), \(\nu = 24\)

\[\text{Rejection region: } \left]-\infty\,;\, -t_{0.05,(24)}\right] = \left]-\infty\,;\, -1.711\right]\]

(From Table 7: \(t_{0.05,(24)} = 1.711\))

Step 5:

\[t_{obs} = \frac{980 - 1000}{40/\sqrt{25}} = \frac{-20}{8} = -2.50\]

Step 6: \(t_{obs} = -2.50 < -1.711\) → falls in the rejection region.

Decision: Reject \(H_0\).

Exercise 1: \(p\)-value

\[p = P(T \leq -2.50) = P(T \geq 2.50)\]

From Table 7, \(\nu = 24\): \(t_{0.01,(24)} = 2.492\) and \(t_{0.005,(24)} = 2.797\).

Since \(2.492 < 2.50 < 2.797\), we have \(0.005 < p < 0.01\).

Conclusion: Since \(p < 0.01 < 0.05\), we reject \(H_0\) at the 5% significance level: there is strong evidence that the mean lifetime is less than 1000 hours.

Exercise 2

A food processing company fills cereal boxes labeled as 500 g. The variance of the filling process is known to be \(\sigma^2 = 25 \text{ g}^2\). A random sample of \(n = 64\) boxes yields \(\bar{x} = 498\) g.

  a) Test whether the mean weight differs from 500 g at \(\alpha = 0.05\).

  b) Compute the \(p\)-value.

Exercise 2: Solution

a) \(H_0: \mu = 500\) vs. \(H_1: \mu \neq 500\) (two-tailed)

\(\sigma = 5\) is known, \(n = 64\):

\[Z_0 = \frac{\bar{X} - 500}{\sigma/\sqrt{n}} \sim N(0,1) \qquad z_{obs} = \frac{498 - 500}{5/\sqrt{64}} = \frac{-2}{5/8} = \frac{-2}{0.625} = -3.20\]

Rejection region (two-tailed, \(\alpha = 0.05\)):

\[\left]-\infty;\, -z_{0.025}\right] \cup \left[z_{0.025};\, +\infty\right[ = \left]-\infty;\, -1.96\right] \cup \left[1.96;\, +\infty\right[\]

Since \(z_{obs} = -3.20 < -1.96\), we reject \(H_0\).

Exercise 2: Solution (cont.)

b) \(p\)-value (two-tailed):

\[p = 2 \times P(Z \geq |{-3.20}|) = 2 \times P(Z \geq 3.20)\]

\[= 2 \times [1 - \Phi(3.20)] = 2 \times (1 - 0.9993) = 2 \times 0.0007 = 0.0014\]

Strong evidence against \(H_0\): the \(p\)-value is well below every conventional \(\alpha\) (0.01, 0.05, 0.10).

Exercise 3

From a random sample of \(n = 20\) observations drawn from a Normal population, one obtains \(\bar{x} = 101\) and \(s' = 3\).

Consider the test: \(H_0: \mu \leq 100\) vs. \(H_1: \mu > 100\).

  a) For \(\alpha = 0.05\), determine the rejection region.

  b) If, for another sample of the same size, \(t_{obs} = 2.861\), what is the \(p\)-value?

Exercise 3: Solution

a) Right-tailed test, \(T_0 \sim t_{(19)}\):

\[\text{Rejection region: } \left[t_{0.05,(19)}\,;\, +\infty\right[ = \left[1.729\,;\, +\infty\right[\]

(Table 7: \(t_{0.05,(19)} = 1.729\))

b) For \(t_{obs} = 2.861\), right-tailed:

\[p = P(T > 2.861)\]

From Table 7, \(\nu = 19\): \(t_{0.005,(19)} = 2.861\).

So \(p = 0.005\).

We reject \(H_0\) at all usual significance levels, since \(p = 0.005 < \alpha\) for every conventional \(\alpha\) (0.01, 0.05, 0.10).

Summary of Today’s Lecture

Key takeaways — Lecture 6

  1. A hypothesis test is a formal procedure to evaluate a claim about a population parameter
  2. \(H_0\) (status quo) vs. \(H_1\) (research hypothesis); \(H_0\) always contains equality
  3. The test statistic is a pivot variable computed under \(H_0\)
  4. Type I Error (\(\alpha\)) = rejecting true \(H_0\); Type II Error (\(\beta\)) = not rejecting false \(H_0\)
  5. The \(p\)-value is the smallest \(\alpha\) at which we would reject \(H_0\)
  6. Decision: reject \(H_0\) if \(p \leq \alpha\), or equivalently, if \(t_{obs}\) falls in the rejection region
  7. A two-tailed test at level \(\alpha\) and a \((1-\alpha)\) CI give equivalent conclusions

Next Lecture (April 21)

We will apply the testing framework to specific cases:

  • Section 3.3: Tests for means and variances of Normal populations
  • Section 3.4: Tests for the difference of means of two Normal populations
  • Section 3.5: Large-sample tests (asymptotic normality)

Make sure to review the relevant sections in Newbold (Chapter 9).

Disclaimer

These slides are a free adaptation of the course material for Estatística II by Prof. Teresa Ferreira and Prof. Sandra Custódio from the Lisbon Accounting and Business School — Polytechnic University of Lisbon.


Primary reference: Newbold, P., Carlson, W. & Thorne, B. — Statistics for Business and Economics, Global Edition.