ch7_t-distribution.Rmd

Example showing that, for a sample from a normal distribution with unknown variance, the sample mean normalized by the sample standard deviation follows a t-distribution with n-1 degrees of freedom.

Consider the sample mean \(\bar{X}\) from a sample of size n=15 from \(X\sim N(\mu,\sigma^2)\).

Simulate the sampling distribution with \(N\) samples. For each sample, compute the estimates \(\hat{\mu}=\bar{x}\) and \(\hat{\sigma}=s\), then show that the normalized sample mean, \[W = \frac{\bar{X} - \mu}{s / \sqrt{n}},\] follows a t-distribution rather than a normal distribution.

n <- 15
mu <- 25
sigma <- 7
N <- 10^5  # number of samples
w <- numeric(N) # create a vector to hold normalized mean(x) for each sample
# loop to create sample, calculate normalized mean, store it in vector w
for (i in seq_len(N)) {
  x <- rnorm(n,mu,sigma)         # sample
  xbar <- mean(x)                # mean
  s <- sd(x)                     # std dev
  w[i] <- (xbar-mu)/(s/sqrt(n))  # normalized mean saved in w[i]
} 
hist(w,col="lightblue")
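
The same simulation can also be written without an explicit loop. The sketch below is one possible vectorized version, not a replacement for the loop above; the names `x_mat` and `w_vec`, the seed, and the smaller number of samples are assumptions made only for this illustration.

```r
# Vectorized sketch of the simulation (illustrative names, assumed seed)
set.seed(1)
n <- 15; mu <- 25; sigma <- 7
N <- 1e4                                   # fewer samples than above, to keep it quick
x_mat <- matrix(rnorm(N * n, mu, sigma), nrow = N)  # one sample of size n per row
xbar <- rowMeans(x_mat)                    # sample mean of each row
s <- apply(x_mat, 1, sd)                   # sample std dev of each row
w_vec <- (xbar - mu) / (s / sqrt(n))       # normalized means, one per sample
summary(w_vec)
```

Storing all samples in one matrix trades memory for speed, which is fine at this size but may not be for much larger \(N\).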

Check if the distribution of \(W\) is normal using a qqplot:

qqnorm(w,col="blue")  # qqplot of w vs std normal
abline(0,1,col="red")    # line with intercept=0 and slope=1

The qqplot shows that \(W\) is not normal: in both tails the sample quantiles are more extreme than the theoretical normal quantiles, which means \(W\) has heavier tails than a normal distribution.
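
The heavier tails can also be checked numerically. The sketch below re-simulates \(W\) (with an assumed seed and a smaller number of samples, and the illustrative names `w_sim`, `p_emp`, `p_norm`) and compares the fraction of values beyond the normal 97.5% cutoff with the 5% that a standard normal would give.

```r
# Fraction of simulated W values beyond the normal cutoff (illustrative sketch)
set.seed(2)
n <- 15; mu <- 25; sigma <- 7
N <- 2e4
w_sim <- replicate(N, {
  x <- rnorm(n, mu, sigma)                 # one sample
  (mean(x) - mu) / (sd(x) / sqrt(n))       # its normalized mean
})
z <- qnorm(0.975)                          # two-sided 5% cutoff for a std normal
p_emp <- mean(abs(w_sim) > z)              # observed fraction beyond the cutoff
p_norm <- 2 * pnorm(-z)                    # fraction expected if W were std normal
c(empirical = p_emp, normal = p_norm)
```

If \(W\) were standard normal the two numbers would agree up to sampling noise; a clearly larger empirical fraction points to heavier tails.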

Claim: \(W \sim t(n-1)\), a t-distribution with n-1 degrees of freedom.

The t-distribution (also known as “Student’s t-distribution”) is written \(t(k)\), where \(k\) is a parameter called the degrees of freedom (also “dof” or “df”).

In R, the t-distribution is just “t” so the pdf is given by “dt(x,df=k)” (see below).
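
R follows the same naming pattern for the other t-distribution functions in base R: `pt` (cdf), `qt` (quantiles) and `rt` (random draws). A short sketch, with `k = 14` chosen to match n-1 in this example and the seed an arbitrary assumption:

```r
# The d/p/q/r family for the t-distribution
k <- 14                          # degrees of freedom (n - 1 for n = 15)
d0 <- dt(0, df = k)              # pdf at x = 0
q975 <- qt(0.975, df = k)        # 97.5% quantile, about 2.145
p_back <- pt(q975, df = k)       # cdf at that quantile, recovers 0.975
set.seed(3)
draws <- rt(5, df = k)           # five random values from t(k)
c(pdf_at_0 = d0, q975 = q975, cdf_at_q975 = p_back)
```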

Plot the pdfs of the std normal and of the t-distribution for different dof:

xfine <- seq(-4,4,length.out=101)       # x array for plotting
plot(xfine,dnorm(xfine),type='l',xlab='x',ylab='pdf')   # std normal
lines(xfine,dt(xfine,df=8),lty='dashed',col='purple')    # t(df=8)
lines(xfine,dt(xfine,df=2),lty='dotted',col='red')       # t(df=2)

Shown above are plots of the pdfs of std normal (solid) and the t-distribution (dashed is df=8, dotted is df=2).
- The t-distribution has a similar shape to the normal distribution but a lower peak and heavier tails.
- As \(df \to \infty\) the t-distribution converges to the standard normal distribution.
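
The convergence in the second point can be illustrated numerically. The sketch below measures the largest gap between the two pdfs on a grid for increasing degrees of freedom; the grid, the list of df values, and the names `xgrid`, `dfs`, `max_gap` are assumptions chosen for illustration.

```r
# Largest pdf gap between t(k) and std normal on a grid, for growing k
xgrid <- seq(-4, 4, length.out = 101)
dfs <- c(2, 8, 30, 100, 1000)
max_gap <- sapply(dfs, function(k) max(abs(dt(xgrid, df = k) - dnorm(xgrid))))
names(max_gap) <- paste0("df=", dfs)
round(max_gap, 4)
```

The gap shrinks steadily as df grows, which is why the normal approximation is often considered adequate for large samples.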

Make a qqplot of W vs t(n-1):

pfine <- (seq_len(N)-0.5)/N # vector of size N, evenly spaced in (0,1)
q_theory <- qt(pfine,n-1)  # t(n-1) quantiles
qqplot(w,q_theory,col="green") 
abline(0,1,col="red") # line with intercept=0 and slope=1

Here, the agreement in the qqplot is excellent: the points follow the red line closely, including in the tails, which is consistent with \(W \sim t(n-1)\).
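
A few quantiles side by side give a numerical counterpart to the plot. The sketch below re-simulates \(W\) (assumed seed, smaller \(N\), illustrative names `w_sim`, `probs`, `cmp`) and tabulates its sample quantiles next to those of t(n-1) and of the standard normal.

```r
# Sample quantiles of W next to t(n-1) and std normal quantiles (illustrative)
set.seed(4)
n <- 15; mu <- 25; sigma <- 7
N <- 5e4
w_sim <- replicate(N, {
  x <- rnorm(n, mu, sigma)                 # one sample
  (mean(x) - mu) / (sd(x) / sqrt(n))       # its normalized mean
})
probs <- c(0.01, 0.05, 0.5, 0.95, 0.99)
cmp <- rbind(simulated = quantile(w_sim, probs),
             t         = qt(probs, df = n - 1),
             normal    = qnorm(probs))
round(cmp, 3)
```

The simulated row should sit close to the t row in the tails, where the normal row is visibly less extreme.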

More details on the t-distribution are in Appendix B.11, which also includes the proof that \(W \sim t(n-1)\).
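
As a numerical companion, the standard construction of a t variable can be sketched directly (this is the usual textbook argument and may or may not be the route taken in the appendix): if \(Z \sim N(0,1)\) and \(V \sim \chi^2(n-1)\) are independent, then \(Z/\sqrt{V/(n-1)} \sim t(n-1)\). For \(W\), \(Z\) corresponds to \((\bar{X}-\mu)/(\sigma/\sqrt{n})\) and \(V\) to \((n-1)s^2/\sigma^2\). The seed, sample count, and names `z`, `v`, `t_built` below are assumptions for illustration.

```r
# Build t(n-1) draws from a std normal and an independent chi-square (sketch)
set.seed(5)
n <- 15
N <- 5e4
z <- rnorm(N)                          # Z ~ N(0,1)
v <- rchisq(N, df = n - 1)             # V ~ chi-square(n-1), independent of Z
t_built <- z / sqrt(v / (n - 1))       # ratio that defines t(n-1)
probs <- c(0.05, 0.5, 0.95)
round(rbind(built = quantile(t_built, probs),
            t     = qt(probs, df = n - 1)), 3)
```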