Example: SAT scores are normally distributed with unknown mean \( \mu \), known standard deviation \( \sigma \).
Population: \( X_p \sim \mathcal{N}(515, 116^2) \)
From a random sample: \( \bar{X} = 555 \), \( n = 25 \)
Hypotheses: \( H_0: \mu = 515 \) vs \( H_A: \mu > 515 \)
Z-statistic:
\( Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{555 - 515}{116 / \sqrt{25}} = 1.724 \)
P-value: \( P(Z \ge 1.724) \approx 0.042 \)
Interpretation: Under \( H_0 \), there is only a 4.2% chance of observing a sample mean at least this high. Hence, the data suggest that Sodor students have a higher mean SAT score.
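The calculation above can be reproduced numerically; a quick Python sketch using scipy (the equivalent R call would be `1 - pnorm(z)`):

```python
from math import sqrt
from scipy.stats import norm

xbar, mu0, sigma, n = 555, 515, 116, 25
z = (xbar - mu0) / (sigma / sqrt(n))   # standardized sample mean
p_value = 1 - norm.cdf(z)              # one-sided upper-tail p-value
print(round(z, 3), round(p_value, 3))  # z ≈ 1.724, p ≈ 0.042
```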
Sample | Test Statistic | Distribution |
---|---|---|
\( X_n \sim \mathcal{N}(\mu, \sigma^2) \), \( \sigma \) known | \( Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \) | \( Z \sim \mathcal{N}(0,1) \) |
\( X_n \sim \mathcal{N}(\mu, \sigma^2) \), \( \sigma \) unknown | \( T = \frac{\bar{X} - \mu}{S / \sqrt{n}} \) | \( T \sim t(n - 1) \) |
\( Y \sim \text{Bin}(n, p) \), small \( n \) | \( Y \) | Exact \( \text{Bin}(n, p) \) |
\( Y \sim \text{Bin}(n, p) \), large \( n \) | \( Z = \frac{Y \pm 0.5 - np}{\sqrt{np(1 - p)}} \) | \( Z \sim \mathcal{N}(0, 1) \) with continuity correction |
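To illustrate the last row of the table, a quick check (with hypothetical values \( n = 100 \), \( p = 0.5 \), observed \( Y = 60 \)) that the continuity-corrected normal approximation tracks the exact binomial tail:

```python
from math import sqrt
from scipy.stats import binom, norm

n, p, y = 100, 0.5, 60                           # hypothetical example values
exact = 1 - binom.cdf(y - 1, n, p)               # P(Y >= 60), exact binomial
z = (y - 0.5 - n * p) / sqrt(n * p * (1 - p))    # continuity correction: y - 0.5
approx = 1 - norm.cdf(z)                         # normal approximation
print(round(exact, 4), round(approx, 4))
```

The two tail probabilities agree to about three decimal places, which is why the normal approximation is acceptable for large \( n \).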
Given \( X_i \sim F(\mu, \sigma^2) \) (distribution unknown), define: \[ T = \frac{\bar{X} - \mu}{S / \sqrt{n}} \]
Example: \( X \sim \text{Exp}(\lambda) \Rightarrow \mu_p = \frac{1}{\lambda} \), \( \sigma_p^2 = \frac{1}{\lambda^2} \)
Test \( H_0: \mu = \mu_p \) vs \( H_A: \mu \ne \mu_p \)
Observed t-statistic: \[ t = \frac{\bar{X} - \mu_p}{S / \sqrt{n}} \]
Perform bootstrap resampling \( N \) times: draw a resample \( X_1^*, \dots, X_n^* \) with replacement from the data and compute \[ T^* = \frac{\bar{X}^* - \bar{x}}{S^* / \sqrt{n}}, \] centering at the observed mean \( \bar{x} \) so that the resampling distribution mimics \( H_0 \).
Empirical p-value: \[ \text{p-value} = 2 \times \min\left( P(T^* \ge t), P(T^* \le t) \right) \]
R implementations of the bootstrap t-test:
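The same procedure as a language-neutral sketch in Python with numpy (the data below are simulated from \( \text{Exp}(\lambda = 1/2) \), so the true mean is 2; data, seed, and \( N \) are all hypothetical choices):

```python
import numpy as np

def bootstrap_t_test(x, mu0, N=2000, seed=0):
    """Two-sided bootstrap t-test of H0: mu = mu0, with no normality assumption."""
    rng = np.random.default_rng(seed)
    n = len(x)
    xbar, s = x.mean(), x.std(ddof=1)
    t_obs = (xbar - mu0) / (s / np.sqrt(n))
    t_star = np.empty(N)
    for i in range(N):
        xb = rng.choice(x, size=n, replace=True)
        # center at the observed mean xbar so resamples mimic H0
        t_star[i] = (xb.mean() - xbar) / (xb.std(ddof=1) / np.sqrt(n))
    return 2 * min((t_star >= t_obs).mean(), (t_star <= t_obs).mean())

# hypothetical data: Exp(lambda = 1/2), so mu_p = 1/lambda = 2
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=40)
p_value = bootstrap_t_test(x, mu0=2.0)
```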
Example: comparing two means with independent samples:
We want to test if \( \mu_1 > \mu_2 \). Let sample statistics be:
Hypotheses: \( H_0: \mu_1 = \mu_2 \) vs \( H_A: \mu_1 > \mu_2 \)
Test statistic:
\[ T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \sim t_{\nu} \] with degrees of freedom \( \nu \) approximated by the Welch–Satterthwaite formula: \[ \nu \approx \frac{\left( \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} \right)^2}{\frac{(S_1^2 / n_1)^2}{n_1 - 1} + \frac{(S_2^2 / n_2)^2}{n_2 - 1}} \]
Observed t-value:
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
P-value:
\[ \text{p-value} = P(T \ge t) = 1 - \text{pt}(t, \text{df} = \nu) \]
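A worked numerical check of the Welch procedure (the two samples below are hypothetical; the one-sided p-value mirrors the `1 - pt(t, df)` call above):

```python
import numpy as np
from scipy import stats

# hypothetical samples
x1 = np.array([5.1, 4.9, 5.6, 5.2, 5.8, 5.0])
x2 = np.array([4.4, 4.7, 4.1, 4.5, 4.3, 4.6])

n1, n2 = len(x1), len(x2)
v1, v2 = x1.var(ddof=1), x2.var(ddof=1)   # sample variances S1^2, S2^2
se = np.sqrt(v1 / n1 + v2 / n2)
t = (x1.mean() - x2.mean()) / se
# Welch–Satterthwaite degrees of freedom
nu = (v1 / n1 + v2 / n2) ** 2 / (
    (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
)
p_value = 1 - stats.t.cdf(t, df=nu)       # one-sided p-value
```

The same statistic is available via `stats.ttest_ind(x1, x2, equal_var=False)`, whose two-sided p-value is twice the one-sided value when \( t > 0 \).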
Matched Pairs: If samples are paired, form the differences \( D_i = X_{1i} - X_{2i} \) and apply the one-sample t-test to them: \( T = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} \sim t(n - 1) \).
Example: treatment vs control proportions
Hypotheses: \( H_0: p_1 = p_2 \) vs \( H_A: p_1 > p_2 \) (the treatment increases the proportion)
From the binomial model: \( Y_i \sim \text{Bin}(n_i, p_i) \), with \( \hat{p}_i = Y_i / n_i \) for \( i = 1, 2 \).
Standardized statistic:
\[ Y = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}} \] which is discrete but approximately \( \mathcal{N}(0, 1) \) for large \( n_1, n_2 \).
Under \( H_0 \):
\[ \hat{p} = \frac{\hat{p}_1 n_1 + \hat{p}_2 n_2}{n_1 + n_2} \quad \Rightarrow \quad Y = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \]
In this example:
This very small p-value gives strong evidence that the treatment improves the proportion.
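The pooled two-proportion test can be sketched on hypothetical counts (120/200 treatment successes vs 90/200 control; these numbers are invented for illustration):

```python
from math import sqrt
from scipy.stats import norm

y1, n1 = 120, 200   # hypothetical treatment counts
y2, n2 = 90, 200    # hypothetical control counts
p1_hat, p2_hat = y1 / n1, y2 / n2
p_hat = (y1 + y2) / (n1 + n2)   # pooled estimate under H0: p1 = p2
z = (p1_hat - p2_hat) / sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
p_value = 1 - norm.cdf(z)       # one-sided p-value
```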
Type I Error: Rejecting \( H_0 \) when it is true. Also called a "false positive."
Type II Error: Not rejecting \( H_0 \) when \( H_A \) is true. Also called a "false negative."
Truth \ Test Result | Do not reject \( H_0 \) | Reject \( H_0 \) |
---|---|---|
\( H_0 \) true | ✓ | Type I Error |
\( H_0 \) false | Type II Error | ✓ |
Analogy (Criminal Justice):
Truth \ Verdict | Innocent | Guilty |
---|---|---|
Truly innocent | ✓ | Type I Error |
Truly guilty | Type II Error | ✓ |
Goal: Minimize Type I Error.
Note: Decreasing Type I Error rate tends to increase Type II Error rate.
A 5% significance level means that, when \( H_0 \) is true, there is a 5% chance of committing a Type I Error.
Example: A lotion company claims 3% allergic reaction. You collect a random sample with \( n = 100 \) and plan to file a lawsuit if \( Y \ge 5 \) allergic cases are found.
Hypotheses: \( H_0: p = 0.03 \) vs \( H_A: p > 0.03 \)
Model: \( Y = X_1 + \cdots + X_n \sim \text{Binomial}(n, p) \)
Test decision: Reject \( H_0 \) if \( Y \ge 5 \)
Compute Type I Error:
\[ P(\text{Type I Error}) = P(Y \ge 5 \mid p = 0.03) = 1 - P(Y \le 4) = 1 - \text{pbinom}(4, 100, 0.03) \approx 0.182 \]
Interpretation: 18.2% chance of wrongly rejecting the company's claim if it is true.
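The `pbinom(4, 100, 0.03)` call above corresponds to the following check in Python with scipy:

```python
from scipy.stats import binom

# P(Type I Error) = P(Y >= 5 | p = 0.03) for Y ~ Bin(100, 0.03)
alpha = 1 - binom.cdf(4, 100, 0.03)
print(round(alpha, 3))   # ≈ 0.182
```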
The acceptable level of Type I Error depends on the consequences of the error (e.g., human life, cost, time).
Note: While 5% is commonly used as the significance level, there is no mathematical reason for this specific value.
Example: SAT scores follow \( X_p \sim \mathcal{N}(515, 116^2) \), sample size \( n = 100 \).
Goal: Determine threshold \( C \) such that \( P(\bar{X} < C \mid \mu = 515) = 0.10 \)
Hypotheses: \( H_0: \mu = 515 \), \( H_A: \mu < 515 \)
Using standardization:
\[ P\left(\bar{X} < C \mid \mu = 515\right) = P\left(Z < \frac{C - 515}{116/\sqrt{100}}\right) = 0.10 \Rightarrow z = -1.282 \]
Solving for \( C \):
\[ C = 515 + (-1.282) \cdot \frac{116}{10} \approx 500 \]
Decision rule: Reject \( H_0 \) if \( \bar{X} < 500 \)
Critical region: Given \( H_0, H_A \), the significance level \( \alpha \) defines critical region \( R \), where test statistic leads to rejecting \( H_0 \).
For this example: \( R = (-\infty, 500) \)
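The threshold can be computed directly from the normal quantile function (Python; the R equivalent is `qnorm(0.10, 515, 11.6)`):

```python
from math import sqrt
from scipy.stats import norm

mu0, sigma, n, alpha = 515, 116, 100, 0.10
# C is the 10th percentile of Xbar's sampling distribution under H0
C = norm.ppf(alpha, loc=mu0, scale=sigma / sqrt(n))
print(round(C, 1))   # ≈ 500.1
```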
Definition: Power \( = P(\text{reject } H_0 \mid H_A \text{ true}) = 1 - P(\text{Type II Error}) \).
Goal: High power (typically above 80%) is desirable.
Example: Baby weight: population \( \sim \mathcal{N}(30, 6^2) \), sample size \( n = 30 \)
Hypotheses: \( H_0: \mu = 30 \) vs \( H_A: \mu < 30 \)
\( \alpha = 0.05 \Rightarrow z = -1.645 \Rightarrow C = 30 + (-1.645) \cdot \frac{6}{\sqrt{30}} = 28.2 \)
Reject \( H_0 \) if \( \bar{X} < 28.2 \)
Suppose true mean \( \mu = 27 \), compute power:
\[ P(\bar{X} < 28.2 \mid \mu = 27) = P\left(Z < \frac{28.2 - 27}{6/\sqrt{30}}\right) = P(Z < 1.0947) \approx 0.863 \]
Conclusion: 86.3% chance of correctly rejecting \( H_0 \) if \( \mu = 27 \)
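Both the cutoff and the power can be verified numerically (Python with scipy):

```python
from math import sqrt
from scipy.stats import norm

mu0, mu_true, sigma, n, alpha = 30, 27, 6, 30, 0.05
se = sigma / sqrt(n)
C = norm.ppf(alpha, loc=mu0, scale=se)       # rejection threshold under H0
power = norm.cdf(C, loc=mu_true, scale=se)   # P(Xbar < C | mu = 27)
print(round(C, 1), round(power, 3))          # ≈ 28.2, ≈ 0.863
```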