Chapter 4: Sampling Distributions

Motivating Example: Coin Flipping

Flip a coin \(n = 10\) times. Each flip is a Bernoulli trial:

\[ X \sim \text{Bern}(p), \quad X = \begin{cases} 1 & \text{(head with prob } p) \\ 0 & \text{(tail with prob } 1 - p) \end{cases} \]

Let \(X_1, X_2, \ldots, X_{10}\) be the sample; then:

\[ T = X_1 + X_2 + \cdots + X_{10} \sim \text{Binom}(10, p) \]

Standard error: \( SE[T] = \sqrt{\text{Var}[T]} = \sqrt{np(1 - p)} \); for a fair coin (\(p = 0.5\), \(n = 10\)), \( SE[T] = \sqrt{2.5} \approx 1.58 \).
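As a quick numerical check, a short Python sketch (the fair-coin value \(p = 0.5\) is an illustrative assumption):

```python
import math

def binom_se(n, p):
    """Standard error of the number of heads T ~ Binom(n, p)."""
    return math.sqrt(n * p * (1 - p))

print(binom_se(10, 0.5))  # sqrt(2.5) ≈ 1.58
```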


Simulation for Approximate Sampling Distribution

  1. Randomly draw 1 sample of size \(n\)
  2. Compute number of heads \(T^{(1)}\)
  3. Repeat \(N\) times to get \(\hat{T} = \{T^{(1)}, \ldots, T^{(N)}\}\)
  4. Approximate PMF: \( \hat{p}(x) = \frac{\text{count}(\hat{T} = x)}{N} \)
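The four steps above can be sketched in Python (an illustrative analogue of the course's R code; \(p = 0.5\) and \(N = 10{,}000\) are assumptions chosen for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, N = 10, 0.5, 10_000   # sample size, head probability (assumed fair), replications

# Steps 1-3: draw N samples of n Bernoulli flips; T = number of heads in each sample
T_hat = rng.binomial(n=1, p=p, size=(N, n)).sum(axis=1)

# Step 4: approximate PMF p_hat(x) = count(T_hat == x) / N
values, counts = np.unique(T_hat, return_counts=True)
p_hat = dict(zip(values.tolist(), (counts / N).tolist()))
print(p_hat)
```

The resulting \(\hat{p}(x)\) should be close to the exact \(\text{Binom}(10, 0.5)\) PMF, peaking at \(x = 5\).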

See R code ch4_sampling-dist-coin-flip.html


Example: Exponential Distribution

Let \( X \sim \text{Exp}(\lambda) \) with \( \lambda = 1/15 \). We know:

\[ f(x) = \begin{cases} \lambda e^{-\lambda x} & x \geq 0 \\ 0 & x < 0 \end{cases}, \quad \mu = \mathbb{E}[X] = 1/\lambda, \quad \sigma^2 = \text{Var}[X] = 1/\lambda^2 \]

Let \( X = (X_1, X_2, \ldots, X_n) \) be a sample of size \(n = 100\). Define the test statistic:

\[ T(X) = \frac{1}{n} \sum_{i=1}^n X_i = \bar{X} \]

Assume the distribution of \(\bar{X}\) is unknown (in fact \(\sum_{i=1}^n X_i \sim \text{Gamma}(n, \lambda)\), so \(\bar{X} \sim \text{Gamma}(n, n\lambda)\)). We can still compute:

→ For the sample mean, the mean and variance of the sampling distribution are known: \( \mathbb{E}[\bar{X}] = \mu = 1/\lambda \) and \( \text{Var}[\bar{X}] = \sigma^2/n = 1/(n\lambda^2) \).

What about the pdf of \(\bar{X}\)? Find the approximate sampling distribution \(\hat{T}\) via simulation and compare it with the known mean and variance.

See R code ch4_sampling-dist-exponential.Rmd
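A Python analogue of that simulation (a sketch, not the course's R code; \(N = 10{,}000\) replications is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n, N = 1 / 15, 100, 10_000   # rate lambda, sample size, replications (assumed)

# Draw N samples of size n from Exp(lambda); each T is the sample mean X-bar
T_hat = rng.exponential(scale=1 / lam, size=(N, n)).mean(axis=1)

# Compare with theory: E[X-bar] = 1/lambda = 15,
# SD[X-bar] = 1/(lambda * sqrt(n)) = 15/10 = 1.5
print(T_hat.mean(), T_hat.std())
```

A histogram of `T_hat` approximates the pdf of \(\bar{X}\) and is visibly close to a normal curve, anticipating the CLT below.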


Central Limit Theorem (CLT)

For large \(n\), the sampling distribution of the sample mean is well approximated by a normal distribution:

Let \( X_1, \dots, X_n \) be i.i.d. random variables with \( \mathbb{E}[X] = \mu \), \( \text{Var}[X] = \sigma^2 \). Let \( \bar{X}_n \) be the sample mean. Then:

\[ \lim_{n \to \infty} P\left( \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \leq z \right) = \Phi(z), \]

where \(\Phi\) is the standard normal CDF, i.e. \(\Phi(z) = P(Z \leq z)\) for \(Z \sim \mathcal{N}(0, 1)\).

So: \[ \bar{X}_n \approx \mathcal{N}(\mu, \sigma^2 / n) \quad \text{for large } n \]

→ The distribution of the mean of a large sample can be approximated by a normal distribution.
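The CLT can be checked numerically by standardizing the simulated sample means from the exponential example (a sketch; the simulation settings are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n, N = 1 / 15, 100, 10_000
mu, sigma = 1 / lam, 1 / lam          # for Exp(lambda): mu = sigma = 1/lambda

# Standardize the sample means: Z = (X-bar - mu) / (sigma / sqrt(n))
xbar = rng.exponential(scale=1 / lam, size=(N, n)).mean(axis=1)
z = (xbar - mu) / (sigma / np.sqrt(n))

# CLT check: the empirical P(Z <= 1) should be close to Phi(1) ≈ 0.8413
print((z <= 1).mean())
```

Even though each \(X_i\) is heavily skewed, the standardized means behave nearly like \(\mathcal{N}(0, 1)\) at \(n = 100\).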