Chapter 5: The Bootstrap

5.1 Introduction to Bootstrap

The bootstrap is a way to estimate the sampling distribution of a statistic using only the observed data, without assuming any specific probability distribution.

Given data \( \{1, 3\} \), we have \( n = 2 \), \( \bar{x} = 2 \). Since we don't know the population distribution, we can't derive the sampling distribution of \( \bar{X} \) analytically.

Theoretical Bootstrap

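Consider all \( n^n = 4 \) ordered resamples of size \( n = 2 \) drawn from the data with replacement, together with their means:

\( (1, 1) \to 1 \), \( (1, 3) \to 2 \), \( (3, 1) \to 2 \), \( (3, 3) \to 3 \)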

The bootstrap distribution \( \bar{X}^{(t)}_{\text{boot}} \) has:

\( \hat{\mu}^{(t)}_{\text{boot}} = 2 \)

\( \hat{SE}^{(t)}_{\text{boot}} = \sqrt{\frac{1}{4}[(1{-}2)^2 + (2{-}2)^2 + (2{-}2)^2 + (3{-}2)^2]} = \sqrt{\frac{1}{2}} = \frac{1}{\sqrt{2}} \approx 0.707 \)
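The enumeration for the two-point data set \( \{1, 3\} \) is small enough to check by machine. The following is a minimal sketch, not part of the original notes; the variable names are illustrative. It lists every ordered resample, computes each mean, and evaluates the sum under the square root, which is \( 2/4 = 1/2 \), so the standard error comes out near 0.707.

```python
from itertools import product
from math import sqrt

data = [1, 3]
n = len(data)

# All n**n ordered resamples of size n drawn with replacement.
resamples = list(product(data, repeat=n))
means = [sum(r) / n for r in resamples]

# Mean and standard error of the exact bootstrap distribution.
# The divisor is the number of resamples (n**n): every resample is
# enumerated, so nothing is being estimated from a subsample.
mu_boot = sum(means) / len(means)
se_boot = sqrt(sum((m - mu_boot) ** 2 for m in means) / len(means))

print(resamples)  # [(1, 1), (1, 3), (3, 1), (3, 3)]
print(means)      # [1.0, 2.0, 2.0, 3.0]
print(mu_boot)    # 2.0
print(se_boot)    # about 0.7071, i.e. sqrt(1/2)
```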


5.2 Bootstrap Distribution

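The bootstrap distribution approximates the true sampling distribution by resampling from the data: draw \( N \) resamples of size \( n \) with replacement, compute the mean of each, and take the resulting \( N \) means as the bootstrap distribution \( \bar{X}_{\text{boot}} \).

For large \( N \), this resampled distribution closely matches the theoretical one: \( \bar{X}^{(t)}_{\text{boot}} \approx \bar{X}_{\text{boot}} \)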
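A small simulation can illustrate this approximation. The sketch below reuses the data \( \{1, 3\} \) from Section 5.1; the helper name `bootstrap_means`, the seed, and the choice of 20,000 resamples are assumptions made for illustration. It draws random resamples with replacement and compares their mean and spread with the exact values obtained by enumeration (mean 2, standard error \( \sqrt{1/2} \)).

```python
import random
from math import sqrt
from statistics import mean, pstdev

def bootstrap_means(data, num_resamples, rng):
    """Return the means of num_resamples resamples drawn with replacement."""
    n = len(data)
    return [sum(rng.choices(data, k=n)) / n for _ in range(num_resamples)]

rng = random.Random(0)  # fixed seed so the run is repeatable
data = [1, 3]
boot = bootstrap_means(data, num_resamples=20_000, rng=rng)

print(mean(boot))    # close to the exact bootstrap mean, 2
print(pstdev(boot))  # close to the exact bootstrap SE, sqrt(1/2)
```

Increasing `num_resamples` should shrink the gap between the simulated and exact values, at the cost of more computation.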

Note: The theoretical bootstrap, which enumerates all \( n^n \) resamples, is computationally infeasible even for moderate \( n \). For example, if \( n = 30 \), then \( n^n \approx 2 \times 10^{44} \) resamples. Even at \( 10^9 \) resamples per second, enumerating them would take roughly \( 2 \times 10^{35} \) seconds, far longer than the age of the universe (about \( 4 \times 10^{17} \) seconds).
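The size of this count is easy to verify with exact integer arithmetic. In the sketch below, the processing rate of \( 10^9 \) resamples per second is a hypothetical figure chosen only to give a sense of scale.

```python
n = 30
num_resamples = n ** n  # exact integer; Python ints have arbitrary precision

rate = 1e9  # assumed resamples processed per second (hypothetical)
seconds_per_year = 365.25 * 24 * 3600
years = num_resamples / rate / seconds_per_year

print(f"{num_resamples:.3e}")  # 2.059e+44
print(f"{years:.1e} years")    # many orders of magnitude beyond ~1.4e10 years
```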


5.3 Bootstrap Confidence Interval

Example: Let \( X \sim \mathcal{N}(0, 1) \), and take a sample of size \( n = 50 \).

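Draw \( N \) bootstrap resamples of size \( n \) from this sample, let \( \bar{X}^{(i)} \) denote the mean of the \( i \)-th resample, and compute: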

\( \hat{\mu}_{\text{boot}} = \frac{1}{N} \sum_{i=1}^{N} \bar{X}^{(i)} \)

\( \hat{SE}_{\text{boot}} = \sqrt{\frac{1}{N - 1} \sum_{i=1}^{N} (\bar{X}^{(i)} - \hat{\mu}_{\text{boot}})^2} \)
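The two formulas can be sketched in code as follows. This is an illustrative sketch rather than the linked demo: the seed and the choice \( N = 2000 \) are assumptions. Note the \( N - 1 \) divisor, which matches the formula for \( \hat{SE}_{\text{boot}} \).

```python
import random
from math import sqrt

rng = random.Random(42)  # fixed seed so the run is repeatable
n, N = 50, 2000          # sample size and number of bootstrap resamples

# One observed sample from the standard normal distribution.
sample = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Mean of each of the N resamples drawn with replacement.
boot_means = []
for _ in range(N):
    resample = rng.choices(sample, k=n)
    boot_means.append(sum(resample) / n)

mu_boot = sum(boot_means) / N
se_boot = sqrt(sum((m - mu_boot) ** 2 for m in boot_means) / (N - 1))

print(mu_boot)  # near the mean of the observed sample
print(se_boot)  # near 1/sqrt(50), about 0.14, for standard normal data
```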

Constructing the 95% Confidence Interval

Use the empirical cumulative distribution function (ECDF) of the bootstrap means \( \bar{X}_{\text{boot}} \) to determine the 2.5th and 97.5th percentiles, \( \bar{X}_{0.025} \) and \( \bar{X}_{0.975} \). These bound the 95% interval:

\( \bar{X}_{0.025} < \mu < \bar{X}_{0.975} \)
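A percentile interval of this kind might be computed as in the sketch below. It assumes the same setup as the example (a standard normal sample with \( n = 50 \)) and an illustrative \( N = 2000 \); it uses `statistics.quantiles` from the Python standard library, whose default interpolation rule is one of several reasonable ways to read percentiles off the ECDF.

```python
import random
from statistics import quantiles

rng = random.Random(1)  # fixed seed so the run is repeatable
n, N = 50, 2000

sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
boot_means = [sum(rng.choices(sample, k=n)) / n for _ in range(N)]

# n=40 splits the data into 40 equal-probability groups, so the 39 cut
# points sit at 2.5%, 5%, ..., 97.5%. The first and last are the bounds.
cuts = quantiles(boot_means, n=40)
lower, upper = cuts[0], cuts[-1]

print(lower, upper)  # 95% percentile bootstrap interval for the mean
```

With a different seed the endpoints will move, and roughly 95% of such intervals are expected to cover the true mean \( \mu = 0 \).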

See also demo: ch5_bootstrap_normal.html