The bootstrap is a way to estimate the sampling distribution of a statistic using only the observed data, without assuming any specific probability distribution.
Given data \( \{1, 3\} \), we have \( n = 2 \), \( \bar{x} = 2 \). Since we don't know the population distribution, we can't derive the sampling distribution of \( \bar{X} \) analytically.
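Consider all \( n^n = 4 \) resamples of size \( n \) drawn from the data with replacement: \( (1, 1) \), \( (1, 3) \), \( (3, 1) \), \( (3, 3) \), with means \( 1, 2, 2, 3 \).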
The bootstrap distribution \( \bar{X}^{(t)}_{\text{boot}} \) has:
\( \hat{\mu}^{(t)}_{\text{boot}} = 2 \)
\( \hat{SE}^{(t)}_{\text{boot}} = \sqrt{\frac{1}{4}[(1{-}2)^2 + (2{-}2)^2 + (2{-}2)^2 + (3{-}2)^2]} = \sqrt{\frac{2}{4}} = \frac{1}{\sqrt{2}} \approx 0.707 \)
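The enumeration for \( \{1, 3\} \) can be checked with a short Python sketch. The variable names (`resamples`, `means`, `mu_boot`, `se_boot`) are illustrative choices, not notation from these notes. The divisor is the number of resamples, \( n^n = 4 \), as in the formula above, because every possible resample is enumerated rather than sampled.

```python
from itertools import product
from math import sqrt

data = [1, 3]
n = len(data)

# All n**n ordered resamples of size n drawn with replacement.
resamples = list(product(data, repeat=n))
means = [sum(r) / n for r in resamples]

# Mean and standard error of the exhaustive (theoretical) bootstrap
# distribution; the divisor is the full count n**n, not n**n - 1.
mu_boot = sum(means) / len(means)
se_boot = sqrt(sum((m - mu_boot) ** 2 for m in means) / len(means))

print(resamples)
print(means)
print(mu_boot, se_boot)
```

The sum of squared deviations is 2, so the standard error evaluates to \( \sqrt{2/4} \approx 0.707 \).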
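The bootstrap distribution approximates the true sampling distribution, and is itself approximated by resampling from the data: draw \( N \) resamples of size \( n \) with replacement and compute the mean \( \bar{X}^{(i)} \) of each, \( i = 1, \dots, N \).
For large \( N \), the distribution of these resample means approximates the theoretical bootstrap distribution: \( \bar{X}_{\text{boot}} \approx \bar{X}^{(t)}_{\text{boot}} \)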
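As a rough illustration of the large-\( N \) statement, the sketch below resamples the toy data \( \{1, 3\} \) at random and compares the result with the exhaustive values. The seed and the choice \( N = 100{,}000 \) are arbitrary assumptions made for the example.

```python
import random
from math import sqrt

random.seed(0)  # arbitrary seed, only for reproducibility

data = [1, 3]
n = len(data)
N = 100_000  # number of random resamples (assumed value)

# Draw N resamples of size n with replacement; keep each resample mean.
boot_means = [sum(random.choices(data, k=n)) / n for _ in range(N)]

mu_boot = sum(boot_means) / N
se_boot = sqrt(sum((m - mu_boot) ** 2 for m in boot_means) / (N - 1))

# Compare with the exhaustive values 2 and sqrt(1/2).
print(mu_boot, se_boot, sqrt(0.5))
```

With this many resamples, both estimates should land close to the exhaustive values of 2 and \( \sqrt{1/2} \).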
Note: The theoretical bootstrap, which uses all \( n^n \) resamples, is computationally infeasible even for moderate \( n \). For example, if \( n = 30 \), then \( n^n \approx 2 \times 10^{44} \) resamples, and enumerating them exhaustively would take far longer than the age of the universe.
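The size of that count is easy to check. The processing rate below (one billion resamples per second) is an assumed figure chosen only to give a sense of scale. For comparison, the age of the universe is roughly \( 1.4 \times 10^{10} \) years.

```python
n = 30
total = n ** n  # ordered resamples of size n drawn with replacement

rate = 10 ** 9  # assumed: resamples processed per second
seconds_per_year = 365.25 * 24 * 3600
years = total / rate / seconds_per_year

print(f"{total:.2e}")  # about 2.06e+44 resamples
print(f"{years:.1e}")  # years needed at the assumed rate
```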
Example: Let \( X \sim \mathcal{N}(0, 1) \), and take a sample of size \( n = 50 \).
From the \( N \) bootstrap resample means \( \bar{X}^{(1)}, \dots, \bar{X}^{(N)} \), compute:
\( \hat{\mu}_{\text{boot}} = \frac{1}{N} \sum_{i=1}^{N} \bar{X}^{(i)} \)
\( \hat{SE}_{\text{boot}} = \sqrt{\frac{1}{N - 1} \sum_{i=1}^{N} (\bar{X}^{(i)} - \hat{\mu}_{\text{boot}})^2} \)
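A minimal sketch of these two formulas for the normal example follows. The seed and \( N = 5000 \) are assumptions, and `sample`, `boot_means`, and `x_bar` are illustrative names. It uses only the Python standard library.

```python
import random
from math import sqrt

random.seed(1)  # arbitrary seed, only for reproducibility

n = 50     # sample size from the example
N = 5_000  # number of bootstrap resamples (assumed value)

# One observed sample from N(0, 1).
sample = [random.gauss(0.0, 1.0) for _ in range(n)]
x_bar = sum(sample) / n

# Bootstrap: resample the observed data with replacement N times.
boot_means = []
for _ in range(N):
    resample = random.choices(sample, k=n)
    boot_means.append(sum(resample) / n)

# Bootstrap mean and standard error, with divisor N - 1 as in the formula.
mu_boot = sum(boot_means) / N
se_boot = sqrt(sum((m - mu_boot) ** 2 for m in boot_means) / (N - 1))

print(x_bar, mu_boot, se_boot)
```

The bootstrap mean should sit close to the sample mean `x_bar`, and the standard error should be near \( 1/\sqrt{50} \approx 0.14 \), up to sampling variability.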
For a 95% confidence interval for \( \mu \), use the empirical cumulative distribution function (ECDF) of \( \bar{X}_{\text{boot}} \) and read off its 2.5th and 97.5th percentiles, \( \bar{X}_{0.025} \) and \( \bar{X}_{0.975} \):
\( \bar{X}_{0.025} < \mu < \bar{X}_{0.975} \)
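The percentile interval can be sketched as below. The helper `ecdf_quantile` is a hypothetical name, and it uses one common convention for inverting the ECDF (the smallest value at which the ECDF reaches \( p \)); other percentile conventions give slightly different endpoints. The seed and \( N = 10{,}000 \) are assumptions.

```python
import math
import random

random.seed(2)  # arbitrary seed, only for reproducibility

n, N = 50, 10_000  # N is an assumed number of resamples
sample = [random.gauss(0.0, 1.0) for _ in range(n)]
x_bar = sum(sample) / n

# Sorted bootstrap means: the ECDF jumps by 1/N at each of these values.
boot_means = sorted(sum(random.choices(sample, k=n)) / n for _ in range(N))


def ecdf_quantile(sorted_values, p):
    """Smallest value at which the ECDF reaches p (up to float rounding)."""
    k = max(0, math.ceil(p * len(sorted_values)) - 1)
    return sorted_values[k]


lower = ecdf_quantile(boot_means, 0.025)
upper = ecdf_quantile(boot_means, 0.975)

print(lower, upper)  # approximate 95% percentile interval for mu
```

Whether the true mean \( \mu = 0 \) falls inside a given interval depends on the sample drawn; over repeated samples it should do so roughly 95% of the time.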
See also demo: ch5_bootstrap_normal.html