Chapter 8: Classical Inference and Hypothesis Testing

Section 8.1: Hypothesis Tests – 1 Population

Example: SAT scores are normally distributed with unknown mean \( \mu \), known standard deviation \( \sigma \).

Population: \( X_p \sim \mathcal{N}(515, 116^2) \)

From a random sample: \( \bar{X} = 555 \), \( n = 25 \)

Hypotheses: \( H_0: \mu = 515 \) vs. \( H_A: \mu > 515 \)

Z-statistic:

\( Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{555 - 515}{116 / \sqrt{25}} = 1.724 \)

P-value: \( P(Z \ge 1.724) \approx 0.042 \)

Interpretation: Under \( H_0 \), there is only a 4.2% chance of observing a sample mean this high. This is evidence that Sodor students have higher SAT scores than the hypothesized mean of 515.
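The calculation above can be reproduced in base R, using the numbers from the example:

```r
# One-sample z-test for H0: mu = 515 vs HA: mu > 515 (sigma known)
xbar  <- 555    # sample mean
mu0   <- 515    # hypothesized mean
sigma <- 116    # known population standard deviation
n     <- 25

z <- (xbar - mu0) / (sigma / sqrt(n))
p_value <- 1 - pnorm(z)   # upper-tail p-value
round(c(z = z, p = p_value), 3)
```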


Summary of 1 Population Results

| Sample | Test Statistic | Distribution |
| --- | --- | --- |
| \( X_n \sim \mathcal{N}(\mu, \sigma^2) \), \( \sigma \) known | \( Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \) | \( Z \sim \mathcal{N}(0,1) \) |
| \( X_n \sim \mathcal{N}(\mu, \sigma^2) \), \( \sigma \) unknown | \( T = \frac{\bar{X} - \mu}{S / \sqrt{n}} \) | \( T \sim t(n - 1) \) |
| \( Y \sim \text{Bin}(n, p) \), small \( n \) | \( Y \) | exact \( \text{Bin}(n, p) \) |
| \( Y \sim \text{Bin}(n, p) \), large \( n \) | \( Z = \frac{Y \pm 0.5 - np}{\sqrt{np(1 - p)}} \) | \( Z \sim \mathcal{N}(0, 1) \) (continuity correction) |

Section 8.2: Bootstrap t-test for Mean

Given \( X_i \sim F(\mu, \sigma^2) \) (distribution unknown), define: \[ T = \frac{\bar{X} - \mu}{S / \sqrt{n}} \]

Example: \( X \sim \text{Exp}(\lambda) \Rightarrow \mu_p = \frac{1}{\lambda} \), \( \sigma_p^2 = \frac{1}{\lambda^2} \)

Test \( H_0: \mu = \mu_p \) vs \( H_A: \mu \ne \mu_p \)

Observed t-statistic: \[ t = \frac{\bar{X} - \mu_p}{S / \sqrt{n}} \]

Perform bootstrap resampling \( N \) times: for \( b = 1, \dots, N \), draw \( X_1^*, \dots, X_n^* \) with replacement from the observed sample and compute \[ T_b^* = \frac{\bar{X}^* - \bar{X}}{S^* / \sqrt{n}}, \] centering at \( \bar{X} \) so that the resampled statistics reflect \( H_0 \).

Empirical p-value: \[ \text{p-value} = 2 \times \min\left( P(T^* \ge t), P(T^* \le t) \right) \]

R implementations of the bootstrap t-test:
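A minimal sketch in base R; the data, rate parameter, and \( N \) below are illustrative assumptions, not values from the text:

```r
# Bootstrap t-test for H0: mu = mu0 (two-sided), distribution unknown
set.seed(1)
x   <- rexp(30, rate = 2)   # illustrative sample; true mean 1/lambda = 0.5
mu0 <- 0.5                  # hypothesized mean
n   <- length(x)

t_obs <- (mean(x) - mu0) / (sd(x) / sqrt(n))

N <- 10000
t_star <- replicate(N, {
  xs <- sample(x, n, replace = TRUE)
  # center at the observed mean so the resamples mimic H0
  (mean(xs) - mean(x)) / (sd(xs) / sqrt(n))
})

p_value <- 2 * min(mean(t_star >= t_obs), mean(t_star <= t_obs))
```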


Section 8.3: Hypothesis Tests – 2 Population Means

Example: comparing two means with independent samples:

We want to test if \( \mu_1 > \mu_2 \). Let the sample statistics be \( \bar{X}_1, S_1^2, n_1 \) for the first sample and \( \bar{X}_2, S_2^2, n_2 \) for the second.

Hypotheses: \( H_0: \mu_1 = \mu_2 \) vs. \( H_A: \mu_1 > \mu_2 \)

Test statistic:

\[ T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \sim t_{\nu} \] with degrees of freedom \( \nu \) approximated by the Welch–Satterthwaite formula \[ \nu = \frac{\left( \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} \right)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}}. \]

Observed t-value:

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

P-value:

\[ \text{p-value} = P(T \ge t) = 1 - \text{pt}(t, \text{df} = \nu) \]
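The Welch test can be sketched in R with simulated data (the means, standard deviations, and sample sizes below are made up); the manual computation should agree with R's built-in `t.test`, which uses the Welch approximation by default:

```r
# Welch (unequal-variance) two-sample t-test, HA: mu1 > mu2
set.seed(2)
x1 <- rnorm(40, mean = 5.5, sd = 2)
x2 <- rnorm(35, mean = 5.0, sd = 3)

n1 <- length(x1); n2 <- length(x2)
v1 <- var(x1) / n1; v2 <- var(x2) / n2
t_obs <- (mean(x1) - mean(x2)) / sqrt(v1 + v2)
nu <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))  # Welch df
p_manual <- 1 - pt(t_obs, df = nu)

# t.test performs the same Welch test when var.equal = FALSE (the default)
p_builtin <- t.test(x1, x2, alternative = "greater")$p.value
```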

Matched Pairs: If the samples are paired, form the differences \( D_i = X_{1i} - X_{2i} \) and apply the one-sample t-test to them: under \( H_0: \mu_D = 0 \), \( T = \frac{\bar{D}}{S_D / \sqrt{n}} \sim t(n - 1) \).
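A matched-pairs test reduces to a one-sample t-test on the differences; a sketch with simulated paired data (all numbers below are assumptions):

```r
# Matched-pairs t-test as a one-sample t-test on the differences
set.seed(3)
before <- rnorm(20, mean = 100, sd = 10)
after  <- before + rnorm(20, mean = 2, sd = 4)  # paired with 'before'

d <- after - before
t_obs <- mean(d) / (sd(d) / sqrt(length(d)))
p_manual <- 2 * (1 - pt(abs(t_obs), df = length(d) - 1))

# identical to R's built-in paired test
p_paired <- t.test(after, before, paired = TRUE)$p.value
```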


Section 8.4: Hypothesis Tests – 2 Population Proportions

Example: treatment vs control proportions

Hypotheses: \( H_0: p_1 = p_2 \) vs. \( H_A: p_1 > p_2 \)

From the binomial model: \( Y_i \sim \text{Bin}(n_i, p_i) \) and \( \hat{p}_i = Y_i / n_i \) for \( i = 1, 2 \).

Standardized statistic:

\[ Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}} \approx \mathcal{N}(0, 1) \quad (\text{approximate: the } \hat{p}_i \text{ are discrete}) \]

Under \( H_0: p_1 = p_2 \), pool the samples to estimate the common proportion:

\[ \hat{p} = \frac{\hat{p}_1 n_1 + \hat{p}_2 n_2}{n_1 + n_2} \quad \Rightarrow \quad Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \]
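The pooled test in R; since the example's data are not shown, the counts below are made-up placeholders:

```r
# Pooled two-proportion z-test, HA: p1 > p2 (counts are illustrative)
y1 <- 60; n1 <- 100   # treatment: successes / trials
y2 <- 40; n2 <- 100   # control
p1 <- y1 / n1; p2 <- y2 / n2

p_pool <- (y1 + y2) / (n1 + n2)
z <- (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
p_value <- 1 - pnorm(z)   # upper-tail p-value
```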

In this example:

This very small p-value gives strong evidence that the treatment improves the proportion.

Section 8.5: Type I and Type II Errors

Type I Error: Rejecting \( H_0 \) when it is true. Also called a "false positive."

Type II Error: Not rejecting \( H_0 \) when \( H_A \) is true. Also called a "false negative."

| Truth \ Test Result | Do not reject \( H_0 \) | Reject \( H_0 \) |
| --- | --- | --- |
| \( H_0 \) true | Correct decision | Type I Error |
| \( H_0 \) false | Type II Error | Correct decision |

Analogy (Criminal Justice):

| Truth \ Verdict | Innocent | Guilty |
| --- | --- | --- |
| Truly innocent (\( H_0 \) true) | Correct decision | Type I Error |
| Truly guilty (\( H_0 \) false) | Type II Error | Correct decision |

Goal: Keep the Type I Error rate small; it is fixed at the chosen significance level \( \alpha \).

Note: Decreasing Type I Error rate tends to increase Type II Error rate.

A 5% significance level means there is a 5% chance of committing a Type I Error when \( H_0 \) is true.


Section 8.6: Example of Type I Error in Binomial Test

Example: A lotion company claims that 3% of users have an allergic reaction. You collect a random sample of \( n = 100 \) users and plan to file a lawsuit if \( Y \ge 5 \) allergic cases are found.

Hypotheses: \( H_0: p = 0.03 \) vs. \( H_A: p > 0.03 \)

Model: \( Y = X_1 + \cdots + X_n \sim \text{Binomial}(n, p) \)

Test decision: Reject \( H_0 \) if \( Y \ge 5 \)

Compute Type I Error:

\[ P(\text{Type I Error}) = P(Y \ge 5 \mid p = 0.03) = 1 - P(Y \le 4) = 1 - \text{pbinom}(4, 100, 0.03) \approx 0.182 \]

Interpretation: 18.2% chance of wrongly rejecting the company's claim if it is true.
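The computation in R, using only the values from the example:

```r
# P(Type I Error) for the rule "reject H0 if Y >= 5" when p = 0.03
alpha <- 1 - pbinom(4, size = 100, prob = 0.03)
round(alpha, 3)   # approximately 0.182
```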

The acceptable level of Type I Error depends on the consequences of the error (e.g., human life, cost, time).

Note: While 5% is commonly used as the significance level, there is no mathematical reason for this specific value.


Section 8.7: Type I Error Threshold for Normal Data

Example: SAT scores follow \( X_p \sim \mathcal{N}(515, 116^2) \), sample size \( n = 100 \).

Goal: Determine threshold \( C \) such that \( P(\bar{X} < C \mid \mu = 515) = 0.10 \)

Hypotheses: \( H_0: \mu = 515 \), \( H_A: \mu < 515 \)

Using standardization:

\[ P\left(\bar{X} < C \mid \mu = 515\right) = P\left(Z < \frac{C - 515}{116/\sqrt{100}}\right) = 0.10 \Rightarrow z = -1.282 \]

Solving for \( C \):

\[ C = 515 + (-1.282) \cdot \frac{116}{10} \approx 500 \]

Decision rule: Reject \( H_0 \) if \( \bar{X} < 500 \)

Critical region: Given \( H_0 \) and \( H_A \), the significance level \( \alpha \) determines a critical region \( R \): the set of test-statistic values for which we reject \( H_0 \).

For this example: \( R = (-\infty, 500) \)
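The threshold can be computed directly with `qnorm`:

```r
# Threshold C with P(Xbar < C | mu = 515) = 0.10, sigma = 116, n = 100
C <- 515 + qnorm(0.10) * 116 / sqrt(100)
round(C, 1)   # 500.1, i.e. about 500
```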


Section 8.8: Type II Error and Power

Definition: The Type II Error rate is \( \beta = P(\text{do not reject } H_0 \mid H_A \text{ true}) \); the power of the test is \( 1 - \beta = P(\text{reject } H_0 \mid H_A \text{ true}) \).

Goal: High power (typically above 80%) is desirable.

Example (baby weight): population \( \sim \mathcal{N}(30, 6^2) \), sample size \( n = 30 \)

Hypotheses: \( H_0: \mu = 30 \) vs. \( H_A: \mu < 30 \)

\( \alpha = 0.05 \Rightarrow z = -1.645 \Rightarrow C = 30 + (-1.645) \cdot \frac{6}{\sqrt{30}} = 28.2 \)

Reject \( H_0 \) if \( \bar{X} < 28.2 \)

Suppose true mean \( \mu = 27 \), compute power:

\[ P(\bar{X} < 28.2 \mid \mu = 27) = P\left(Z < \frac{28.2 - 27}{6/\sqrt{30}}\right) = P(Z < 1.095) \approx 0.863 \]

Conclusion: 86.3% chance of correctly rejecting \( H_0 \) if \( \mu = 27 \)
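The power computation in R, using the example's numbers:

```r
# Power of the test H0: mu = 30 vs HA: mu < 30 at alpha = 0.05, n = 30,
# when the true mean is 27
sigma <- 6; n <- 30
C <- 30 + qnorm(0.05) * sigma / sqrt(n)        # rejection threshold, about 28.2
power <- pnorm((C - 27) / (sigma / sqrt(n)))   # P(Xbar < C | mu = 27)
round(power, 3)   # approximately 0.863
```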