Example: SAT scores are normally distributed with unknown mean \( \mu \), known standard deviation \( \sigma \).
Population: \( X_p \sim \mathcal{N}(515, 116^2) \)
From a random sample: \( \bar{X} = 555 \), \( n = 25 \)
Hypotheses: \( H_0: \mu = 515 \) vs \( H_A: \mu > 515 \)
Z-statistic:
\( Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{555 - 515}{116 / \sqrt{25}} = 1.724 \)
P-value: \( P(Z \ge 1.724) \approx 0.042 \)
Interpretation: Under \( H_0 \), there is only a 4.2% chance of observing a sample mean at least this high. Hence, the data suggest that Sodor students have a higher mean SAT score.
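The calculation above can be reproduced numerically; a quick Python sketch using scipy (the equivalent R call would be `1 - pnorm(z)`):

```python
from math import sqrt
from scipy.stats import norm

xbar, mu0, sigma, n = 555, 515, 116, 25
z = (xbar - mu0) / (sigma / sqrt(n))   # standardized sample mean
p_value = 1 - norm.cdf(z)              # one-sided upper-tail p-value
print(round(z, 3), round(p_value, 3))  # z ≈ 1.724, p ≈ 0.042
```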
Sample | Test Statistic | Distribution |
---|---|---|
\( X_n \sim \mathcal{N}(\mu, \sigma^2) \), \( \sigma \) known | \( Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \) | \( Z \sim \mathcal{N}(0,1) \) |
\( X_n \sim \mathcal{N}(\mu, \sigma^2) \), \( \sigma \) unknown | \( T = \frac{\bar{X} - \mu}{S / \sqrt{n}} \) | \( T \sim t(n - 1) \) |
\( Y \sim \text{Bin}(n, p) \), small \( n \) | \( Y \) | Exact \( \text{Bin}(n, p) \) |
\( Y \sim \text{Bin}(n, p) \), large \( n \) | \( Z = \frac{Y \pm 0.5 - np}{\sqrt{np(1 - p)}} \) | \( Z \sim \mathcal{N}(0, 1) \) with continuity correction |
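To illustrate the last row of the table, a quick check (with hypothetical values \( n = 100 \), \( p = 0.5 \), observed \( Y = 60 \)) that the continuity-corrected normal approximation tracks the exact binomial tail:

```python
from math import sqrt
from scipy.stats import binom, norm

n, p, y = 100, 0.5, 60                           # hypothetical example values
exact = 1 - binom.cdf(y - 1, n, p)               # P(Y >= 60), exact binomial
z = (y - 0.5 - n * p) / sqrt(n * p * (1 - p))    # continuity correction: y - 0.5
approx = 1 - norm.cdf(z)                         # normal approximation
print(round(exact, 4), round(approx, 4))
```

The two tail probabilities agree to about three decimal places, which is why the normal approximation is acceptable for large \( n \).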
Given \( X_i \sim F(\mu, \sigma^2) \) (distribution unknown), define: \[ T = \frac{\bar{X} - \mu}{S / \sqrt{n}} \]
Example: \( X \sim \text{Exp}(\lambda) \Rightarrow \mu_p = \frac{1}{\lambda} \), \( \sigma_p^2 = \frac{1}{\lambda^2} \)
Test \( H_0: \mu = \mu_p \) vs \( H_A: \mu \ne \mu_p \)
Observed t-statistic: \[ t = \frac{\bar{X} - \mu_p}{S / \sqrt{n}} \]
Perform bootstrap resampling \( N \) times: draw a resample \( X_1^*, \dots, X_n^* \) with replacement from the data and compute \[ T^* = \frac{\bar{X}^* - \bar{x}}{S^* / \sqrt{n}}, \] centering at the observed mean \( \bar{x} \) so that the resampling distribution mimics \( H_0 \).
Empirical p-value: \[ \text{p-value} = 2 \times \min\left( P(T^* \ge t), P(T^* \le t) \right) \]
R implementations of the bootstrap t-test:
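The same procedure as a language-neutral sketch in Python with numpy (the data below are simulated from \( \text{Exp}(\lambda = 1/2) \), so the true mean is 2; data, seed, and \( N \) are all hypothetical choices):

```python
import numpy as np

def bootstrap_t_test(x, mu0, N=2000, seed=0):
    """Two-sided bootstrap t-test of H0: mu = mu0, with no normality assumption."""
    rng = np.random.default_rng(seed)
    n = len(x)
    xbar, s = x.mean(), x.std(ddof=1)
    t_obs = (xbar - mu0) / (s / np.sqrt(n))
    t_star = np.empty(N)
    for i in range(N):
        xb = rng.choice(x, size=n, replace=True)
        # center at the observed mean xbar so resamples mimic H0
        t_star[i] = (xb.mean() - xbar) / (xb.std(ddof=1) / np.sqrt(n))
    return 2 * min((t_star >= t_obs).mean(), (t_star <= t_obs).mean())

# hypothetical data: Exp(lambda = 1/2), so mu_p = 1/lambda = 2
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=40)
p_value = bootstrap_t_test(x, mu0=2.0)
```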
Example: comparing two means with independent samples:
We want to test if \( \mu_1 > \mu_2 \). Let sample statistics be:
Hypotheses: \( H_0: \mu_1 = \mu_2 \) vs \( H_A: \mu_1 > \mu_2 \)
Test statistic:
\[ T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \sim t_{\nu} \] with degrees of freedom \( \nu \) approximated by the Welch–Satterthwaite formula: \[ \nu \approx \frac{\left( \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} \right)^2}{\frac{(S_1^2 / n_1)^2}{n_1 - 1} + \frac{(S_2^2 / n_2)^2}{n_2 - 1}} \]
Observed t-value:
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
P-value:
\[ \text{p-value} = P(T \ge t) = 1 - \text{pt}(t, \text{df} = \nu) \]
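A worked numerical check of the Welch procedure (the two samples below are hypothetical; the one-sided p-value mirrors the `1 - pt(t, df)` call above):

```python
import numpy as np
from scipy import stats

# hypothetical samples
x1 = np.array([5.1, 4.9, 5.6, 5.2, 5.8, 5.0])
x2 = np.array([4.4, 4.7, 4.1, 4.5, 4.3, 4.6])

n1, n2 = len(x1), len(x2)
v1, v2 = x1.var(ddof=1), x2.var(ddof=1)   # sample variances S1^2, S2^2
se = np.sqrt(v1 / n1 + v2 / n2)
t = (x1.mean() - x2.mean()) / se
# Welch–Satterthwaite degrees of freedom
nu = (v1 / n1 + v2 / n2) ** 2 / (
    (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
)
p_value = 1 - stats.t.cdf(t, df=nu)       # one-sided p-value
```

The same statistic is available via `stats.ttest_ind(x1, x2, equal_var=False)`, whose two-sided p-value is twice the one-sided value when \( t > 0 \).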
Matched Pairs: If samples are paired, form the differences \( D_i = X_{1i} - X_{2i} \) and apply the one-sample t-test to them: \( T = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} \sim t(n - 1) \).
Example: treatment vs control proportions
Hypotheses: \( H_0: p_1 = p_2 \) vs \( H_A: p_1 > p_2 \) (the treatment increases the proportion)
From the binomial model: \( Y_i \sim \text{Bin}(n_i, p_i) \), with \( \hat{p}_i = Y_i / n_i \) for \( i = 1, 2 \).
Standardized statistic:
\[ Y = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}} \] which is discrete but approximately \( \mathcal{N}(0, 1) \) for large \( n_1, n_2 \).
Under \( H_0 \):
\[ \hat{p} = \frac{\hat{p}_1 n_1 + \hat{p}_2 n_2}{n_1 + n_2} \quad \Rightarrow \quad Y = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \]
In this example:
This very small p-value gives strong evidence that the treatment improves the proportion.
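The pooled two-proportion test can be sketched on hypothetical counts (120/200 treatment successes vs 90/200 control; these numbers are invented for illustration):

```python
from math import sqrt
from scipy.stats import norm

y1, n1 = 120, 200   # hypothetical treatment counts
y2, n2 = 90, 200    # hypothetical control counts
p1_hat, p2_hat = y1 / n1, y2 / n2
p_hat = (y1 + y2) / (n1 + n2)   # pooled estimate under H0: p1 = p2
z = (p1_hat - p2_hat) / sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
p_value = 1 - norm.cdf(z)       # one-sided p-value
```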
Type I Error: Rejecting \( H_0 \) when it is true. Also called a "false positive."
Type II Error: Not rejecting \( H_0 \) when \( H_A \) is true. Also called a "false negative."
Truth \ Test Result | Do not reject \( H_0 \) | Reject \( H_0 \) |
---|---|---|
\( H_0 \) true | ✓ | Type I Error |
\( H_0 \) false | Type II Error | ✓ |
Analogy (Criminal Justice):
Truth \ Verdict | Innocent | Guilty |
---|---|---|
Truly innocent | ✓ | Type I Error |
Truly guilty | Type II Error | ✓ |
Goal: Minimize Type I Error.
Note: Decreasing Type I Error rate tends to increase Type II Error rate.
A 5% significance level means that, when \( H_0 \) is true, there is a 5% chance of committing a Type I Error.
Example: A lotion company claims 3% allergic reaction. You collect a random sample with \( n = 100 \) and plan to file a lawsuit if \( Y \ge 5 \) allergic cases are found.
Hypotheses: \( H_0: p = 0.03 \) vs \( H_A: p > 0.03 \)
Model: \( Y = X_1 + \cdots + X_n \sim \text{Binomial}(n, p) \)
Test decision: Reject \( H_0 \) if \( Y \ge 5 \)
Compute Type I Error:
\[ P(\text{Type I Error}) = P(Y \ge 5 \mid p = 0.03) = 1 - P(Y \le 4) = 1 - \text{pbinom}(4, 100, 0.03) \approx 0.182 \]
Interpretation: 18.2% chance of wrongly rejecting the company's claim if it is true.
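The `pbinom(4, 100, 0.03)` call above corresponds to the following check in Python with scipy:

```python
from scipy.stats import binom

# P(Type I Error) = P(Y >= 5 | p = 0.03) for Y ~ Bin(100, 0.03)
alpha = 1 - binom.cdf(4, 100, 0.03)
print(round(alpha, 3))   # ≈ 0.182
```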
The acceptable level of Type I Error depends on the consequences of the error (e.g., human life, cost, time).
Note: While 5% is commonly used as the significance level, there is no mathematical reason for this specific value.
Example: SAT scores follow \( X_p \sim \mathcal{N}(515, 116^2) \), sample size \( n = 100 \).
Goal: Determine threshold \( C \) such that \( P(\bar{X} < C \mid \mu = 515) = 0.10 \)
Hypotheses: \( H_0: \mu = 515 \), \( H_A: \mu < 515 \)
Using standardization:
\[ P\left(\bar{X} < C \mid \mu = 515\right) = P\left(Z < \frac{C - 515}{116/\sqrt{100}}\right) = 0.10 \Rightarrow z = -1.282 \]
Solving for \( C \):
\[ C = 515 + (-1.282) \cdot \frac{116}{10} \approx 500 \]
Decision rule: Reject \( H_0 \) if \( \bar{X} < 500 \)
Critical region: Given \( H_0, H_A \), the significance level \( \alpha \) defines critical region \( R \), where test statistic leads to rejecting \( H_0 \).
For this example: \( R = (-\infty, 500) \)
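The threshold can be computed directly from the normal quantile function (Python; the R equivalent is `qnorm(0.10, 515, 11.6)`):

```python
from math import sqrt
from scipy.stats import norm

mu0, sigma, n, alpha = 515, 116, 100, 0.10
# C is the 10th percentile of Xbar's sampling distribution under H0
C = norm.ppf(alpha, loc=mu0, scale=sigma / sqrt(n))
print(round(C, 1))   # ≈ 500.1
```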
Definition: Power \( = P(\text{reject } H_0 \mid H_A \text{ true}) = 1 - P(\text{Type II Error}) \).
Goal: High power (typically above 80%) is desirable.
Example: Baby weight: population \( \sim \mathcal{N}(30, 6^2) \), sample size \( n = 30 \)
Hypotheses: \( H_0: \mu = 30 \) vs \( H_A: \mu < 30 \)
\( \alpha = 0.05 \Rightarrow z = -1.645 \Rightarrow C = 30 + (-1.645) \cdot \frac{6}{\sqrt{30}} = 28.2 \)
Reject \( H_0 \) if \( \bar{X} < 28.2 \)
Suppose true mean \( \mu = 27 \), compute power:
\[ P(\bar{X} < 28.2 \mid \mu = 27) = P\left(Z < \frac{28.2 - 27}{6/\sqrt{30}}\right) = P(Z < 1.0947) \approx 0.863 \]
Conclusion: 86.3% chance of correctly rejecting \( H_0 \) if \( \mu = 27 \)
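Both the cutoff and the power can be verified numerically (Python with scipy):

```python
from math import sqrt
from scipy.stats import norm

mu0, mu_true, sigma, n, alpha = 30, 27, 6, 30, 0.05
se = sigma / sqrt(n)
C = norm.ppf(alpha, loc=mu0, scale=se)       # rejection threshold under H0
power = norm.cdf(C, loc=mu_true, scale=se)   # P(Xbar < C | mu = 27)
print(round(C, 1), round(power, 3))          # ≈ 28.2, ≈ 0.863
```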