Example showing that the sampling distribution of the normalized mean of a sample from a normal distribution with unknown variance is a t-distribution with n-1 degrees of freedom.
Consider the sample mean \(\bar{X}\) of a sample of size \(n=15\) from \(X\sim N(\mu,\sigma^2)\).
Simulate the sampling distribution with \(N\) samples. For each sample compute the estimates \(\hat{\mu}=\bar{x}\) and \(\hat{\sigma}=s\), then show that the normalized sample mean, \[W = \frac{\bar{X} - \mu}{s / \sqrt{n}},\] does not have a normal distribution but a t-distribution.
n <- 15
mu <- 25
sigma <- 7
N <- 10**5 # number of samples
w <- numeric(N) # create a vector to hold normalized mean(x) for each sample
# loop to create sample, calculate normalized mean, store it in w array
for (i in seq_len(N)) {
  x <- rnorm(n,mu,sigma) # sample
  xbar <- mean(x) # mean
  s <- sd(x) # std dev
  w[i] <- (xbar-mu)/(s/sqrt(n)) # normalized mean saved in w[i]
}
hist(w,col="lightblue")
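The loop above can also be written without explicit indexing. The following is a minimal sketch rather than the notes' own code: the helper name `studentized_mean` and the seed value are assumptions introduced here for illustration.

```r
set.seed(1)  # arbitrary seed (an assumption), only so the run is reproducible
n <- 15; mu <- 25; sigma <- 7
N <- 10^5    # number of samples

# hypothetical helper: draw one sample and return its normalized mean
studentized_mean <- function(n, mu, sigma) {
  x <- rnorm(n, mu, sigma)
  (mean(x) - mu) / (sd(x) / sqrt(n))
}

# replicate() calls the helper N times and collects the results in a vector
w <- replicate(N, studentized_mean(n, mu, sigma))
summary(w)
```

This form is mainly shorter than the preallocated loop; it should not be expected to run noticeably faster.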
Check if the distribution of \(W\) is normal using a qqplot:
qqnorm(w,col="blue") # qqplot of w vs std normal
abline(0,1,col="red") # line with intercept=0 and slope=1
The qqplot shows that \(W\) is not normal: in both tails the sample quantiles are more extreme than the theoretical quantiles, which means \(W\) has heavier tails than a normal distribution.
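The heavier tails can also be seen numerically by comparing tail probabilities. The sketch below is an illustration under assumptions: the cutoff of 3 and the seed are arbitrary choices made here, and the simulation is repeated so the block runs on its own. It compares the simulated fraction of \(|W|\) values beyond the cutoff with the corresponding probability under the standard normal and under \(t(n-1)\).

```r
set.seed(1)  # arbitrary seed (an assumption), for reproducibility only
n <- 15; mu <- 25; sigma <- 7
N <- 10^5

# simulate N normalized means, as in the loop above
w <- replicate(N, {
  x <- rnorm(n, mu, sigma)
  (mean(x) - mu) / (sd(x) / sqrt(n))
})

cutoff <- 3                               # illustrative threshold (an assumption)
tail_sim    <- mean(abs(w) > cutoff)      # simulated P(|W| > cutoff)
tail_normal <- 2 * pnorm(-cutoff)         # same probability under N(0,1)
tail_t      <- 2 * pt(-cutoff, df = n - 1) # same probability under t(n-1)
c(simulated = tail_sim, normal = tail_normal, t = tail_t)
```

If \(W\) were standard normal, the simulated fraction would be close to the normal value; a clearly larger fraction is what "heavier tails" means in practice.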
Claim: \(W \sim t(n-1)\), a t-distribution with n-1 degrees of freedom.
The t-distribution (also known as “Student’s t-distribution”) is written \(t(k)\), where the parameter \(k\) is called the degrees of freedom (also “dof” or “df”).
In R, the t-distribution is abbreviated “t”, so its pdf is given by “dt(x,df=k)” (see below).
Plot the pdfs of the std normal and of the t-distribution for different dof:
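R follows the same naming pattern for the t-distribution as for other distributions: the prefixes `d`, `p`, `q` and `r` give the density, cumulative distribution function, quantile function and random draws. A short sketch, where the value `k <- 14` is chosen here only to match \(n-1\) for \(n=15\):

```r
k <- 14                          # degrees of freedom (n - 1 with n = 15)

d0   <- dt(0, df = k)            # pdf evaluated at x = 0
p2   <- pt(2, df = k)            # cdf: P(T <= 2)
q975 <- qt(0.975, df = k)        # 97.5% quantile, about 2.145 for df = 14
draws <- rt(5, df = k)           # five random draws from t(k)

c(density_at_0 = d0, cdf_at_2 = p2, q975 = q975)
```

`qt()` and `pt()` are inverses of each other, so `pt(q975, df = k)` returns 0.975.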
xfine<-seq(-4,4,length=101) # x array for plotting
plot(xfine,dnorm(xfine),type='l',xlab='x',ylab='pdf') # std normal
lines(xfine,dt(xfine,df=8),lty='dashed',col='purple') # t(df=8)
lines(xfine,dt(xfine,df=2),lty='dotted',col='red') # t(df=2)
Shown above are the pdfs of the std normal (solid) and of the t-distribution (dashed is df=8, dotted is df=2).
- The t-distribution has a similar shape to the normal distribution, but with a lower peak and heavier tails.
- As df -> \(\infty\), the t-distribution converges to the standard normal distribution.
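The convergence to the standard normal can be checked numerically as well as visually. The sketch below is illustrative: the list of df values is an arbitrary choice made here. For each df it reports the largest gap between the two pdfs on the plotting grid and the 97.5% quantile, next to the normal value.

```r
dfs   <- c(2, 8, 30, 100, 1000)          # illustrative df values (an assumption)
xfine <- seq(-4, 4, length.out = 101)    # same grid as used for the plot

# largest absolute difference between t(k) and N(0,1) pdfs on the grid
max_gap <- sapply(dfs, function(k) max(abs(dt(xfine, df = k) - dnorm(xfine))))

# 97.5% quantiles of t(k); qt() is vectorized over df
q975 <- qt(0.975, df = dfs)

data.frame(df = dfs, max_pdf_gap = max_gap,
           q975_t = q975, q975_normal = qnorm(0.975))
```

Both columns shrink toward the normal values as df grows: the pdf gap goes to zero and the t quantile decreases toward `qnorm(0.975)`.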
Make a qqplot of \(W\) vs \(t(n-1)\):
pfine<-(seq_len(N)-0.5)/N # array of size N, evenly spaced in (0,1)
q_theory <- qt(pfine,n-1) # t(n-1) quantiles
qqplot(w,q_theory,col="green")
abline(0,1,col="red") # line with intercept=0 and slope=1
Here the agreement in the qqplot is excellent, which supports the claim \(W \sim t(n-1)\).
More details on the t-distribution are in Appendix B.11, which also includes the proof that \(W \sim t(n-1)\).
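The visual comparison can be complemented by numerical checks. The sketch below is one possible approach, not part of the original example: it uses a Kolmogorov-Smirnov statistic (the largest distance between the empirical cdf of \(W\) and a reference cdf) and compares the simulated variance with the variance of \(t(n-1)\), which is \((n-1)/(n-3)\). The seed is an arbitrary assumption and the simulation is repeated so the block runs on its own.

```r
set.seed(1)  # arbitrary seed (an assumption), for reproducibility only
n <- 15; mu <- 25; sigma <- 7
N <- 10^5

# simulate N normalized means, as in the loop above
w <- replicate(N, {
  x <- rnorm(n, mu, sigma)
  (mean(x) - mu) / (sd(x) / sqrt(n))
})

ks_t    <- ks.test(w, "pt", df = n - 1)  # distance from the t(n-1) cdf
ks_norm <- ks.test(w, "pnorm")           # distance from the N(0,1) cdf

var_sim <- var(w)                        # simulated variance of W
var_t   <- (n - 1) / (n - 3)             # variance of t(n-1), needs n - 1 > 2

c(D_t = unname(ks_t$statistic), D_normal = unname(ks_norm$statistic),
  var_sim = var_sim, var_t = var_t)
```

A much smaller distance to \(t(n-1)\) than to the standard normal, and a variance near \((n-1)/(n-3)\) rather than 1, are consistent with the claim. Like the qqplot, these checks support the result but do not prove it.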
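As a brief sketch of why the result holds (the appendix proof may be organized differently), divide the numerator and denominator of \(W\) by \(\sigma/\sqrt{n}\): \[W = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{s^2/\sigma^2}} = \frac{Z}{\sqrt{V/(n-1)}},\] where \(Z = (\bar{X} - \mu)/(\sigma/\sqrt{n}) \sim N(0,1)\) and \(V = (n-1)s^2/\sigma^2 \sim \chi^2(n-1)\). For a sample from a normal distribution, \(\bar{X}\) and \(s^2\) are independent, so \(Z\) and \(V\) are independent, and a ratio of this form is by definition \(t(n-1)\). The unknown \(\sigma\) cancels, which is why \(W\) can be computed from the sample alone.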