Statistics is driven by real applications:
Example: Flight delays (Sec. 1.1)
Statistical inference: Making conclusions about a population based on a sample.
Notation:
Population is infinite: a random sample consists of independent and identically distributed (i.i.d.) observations.
Example: Let \( X \in \{0, 1\} \); such a binary outcome follows a Bernoulli distribution:
\( X \sim \text{Bern}(p) \), where:
\[ X = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases} \]
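As a quick numerical illustration (not from the notes), a minimal NumPy sketch simulating Bernoulli draws; \( p = 0.3 \) and the seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3                                    # arbitrary success probability
x = (rng.random(100_000) < p).astype(int)  # Bern(p): 1 w.p. p, 0 w.p. 1 - p
print(x.mean())                            # sample proportion, close to p = 0.3
```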
Population is finite: the data may not come from a known random process.
Example: Population = \( \{1, 2, 3, 3, 7\} \) (size \( N = 5 \))
If \( N \gg n \), sampling with and without replacement yield similar results, as the sketch below suggests.
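A minimal simulation sketch of this point, assuming an arbitrary synthetic population of size \( N = 10{,}000 \) and samples of size \( n = 10 \); the sample-mean variability is nearly identical under both sampling schemes:

```python
import numpy as np

rng = np.random.default_rng(0)
pop = rng.normal(0, 1, 10_000)   # a large finite population, N = 10,000
n = 10                           # sample size with n << N

with_r    = [rng.choice(pop, n, replace=True).mean()  for _ in range(2_000)]
without_r = [rng.choice(pop, n, replace=False).mean() for _ in range(2_000)]
print(np.std(with_r), np.std(without_r))  # standard errors nearly identical
```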
Examples:
Survey: Ask people what they think or how they live
Sample survey: Use a sample from the population due to practicality
Example: General Social Survey (GSS)
Observational Study: Observe only, no intervention
Example: Beer and hot wings consumption (Sec. 1.9)
Experimental Study: Change conditions or give treatment
Example: Tree seedling growth under different fertilizer/competition (Sec. 1.10)
Caution: results from a non-random sample may not generalize to the population
Example: Tai Chi arthritis study (Sec. 1.11)
The sample space \( S \) is the set of all possible outcomes of a random experiment.
Example: Rolling a 6-sided die → \( S = \{1, 2, 3, 4, 5, 6\} \)
A discrete random variable maps outcomes to a countable set: \( X: S \rightarrow \{x_1, x_2, \ldots\} \)
A continuous random variable takes values in the real numbers: \( X: S \rightarrow \mathbb{R} \)
Probability is the relative frequency of an event if the experiment were repeated many times.
Example: For a fair die, \( P(1) = \frac{1}{6} \)
We can write \( P(A) = P(X \in A) \), where \( A \) is a set of outcomes.
\( P(A|B) = \frac{P(A \cap B)}{P(B)} \) for \( P(B) > 0 \): the probability of \( A \) given that \( B \) has occurred.
If \( \{B_1, B_2, \ldots, B_n\} \) is a partition of \( S \), then:
\( P(A) = \sum_{i=1}^n P(B_i) P(A|B_i) \)
Example (rolling a total of 3 with two dice): partition on the first die; only a first roll of 1 or 2 can be completed to a total of 3, each with conditional probability \( \frac{1}{6} \):
\( P(A) = \frac{1}{6} \cdot \frac{1}{6} + \frac{1}{6} \cdot \frac{1}{6} = \frac{2}{36} \)
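A sketch that just mirrors the total-probability sum over the first-die partition, using exact fractions:

```python
from fractions import Fraction

# Partition on the first die: B_i = {first roll is i}, P(B_i) = 1/6.
# Given B_i, a total of 3 needs the second die to show 3 - i.
p_A = sum(Fraction(1, 6) * (Fraction(1, 6) if 1 <= 3 - i <= 6 else Fraction(0))
          for i in range(1, 7))
print(p_A)   # 1/18, i.e. 2/36
```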
A discrete random variable has outcomes in a finite or countably infinite set.
The probability mass function (PMF): \( p(x) = P(X = x) \), and \( \sum p(x) = 1 \)
Example (sum of 2 dice): Distribution peaks at 7 with \( p(7) = \frac{6}{36} \)
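A short enumeration confirming the shape of this PMF (exact fractions over the 36 equally likely outcomes):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Exact PMF of the sum of two fair dice.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}
print(pmf[7])             # 1/6, the peak at 7
print(sum(pmf.values()))  # 1, as a PMF must
```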
Binomial distribution: if \( X \sim \text{Bin}(n, p) \) counts the successes in \( n \) independent Bernoulli(\( p \)) trials, then
\( p(k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, \ldots, n \)
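A small sketch of this PMF using Python's math.comb; \( n = 10 \) and \( p = 0.3 \) are arbitrary illustrative values:

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binom_pmf(3, 10, 0.3))                          # ~0.2668
print(sum(binom_pmf(k, 10, 0.3) for k in range(11)))  # 1.0 (up to rounding)
```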
\( \mathbb{E}[g(X)] = \sum_j g(x_j) p(x_j) \)
\( \mu = \mathbb{E}[X] \), \( \sigma^2 = \mathbb{E}[(X - \mu)^2] \)
PDF: \( f(x) \geq 0 \), \( \int_{-\infty}^{\infty} f(x) dx = 1 \)
\( P(a < X < b) = \int_a^b f(x) dx \)
\( F(x) = P(X \leq x) = \int_{-\infty}^{x} f(t) dt \)
Exponential distribution, \( X \sim \text{Exp}(\lambda) \) with rate \( \lambda > 0 \):
\( f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \geq 0 \\ 0, & x < 0 \end{cases} \)
\( F(x) = 1 - e^{-\lambda x} \) for \( x \geq 0 \)
Mean: \( \mu = \frac{1}{\lambda} \), Variance: \( \sigma^2 = \frac{1}{\lambda^2} \)
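A simulation check of these facts, assuming an arbitrary rate \( \lambda = 2 \); note NumPy parameterizes the exponential by the scale \( 1/\lambda \):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                                         # arbitrary rate
x = rng.exponential(scale=1 / lam, size=500_000)  # NumPy uses scale = 1/lambda
print(x.mean(), 1 / lam)       # sample mean vs mu = 1/lambda
print(x.var(), 1 / lam**2)     # sample variance vs sigma^2 = 1/lambda^2
print((x <= 1).mean(), 1 - np.exp(-lam * 1))  # empirical vs F(1)
```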
\( \text{Var}(X) = \mathbb{E}[X^2] - \mu^2 \)
\( \text{Var}(a + bX) = b^2 \cdot \text{Var}(X) \)
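Both identities are easy to confirm numerically; the distribution, shift \( a \), and scale \( b \) below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(3, 2, 500_000)   # any distribution illustrates the identities
a, b = 5.0, -1.5                # arbitrary shift and scale

print(np.mean(x**2) - np.mean(x)**2, np.var(x))  # E[X^2] - mu^2 == Var(X)
print(np.var(a + b * x), b**2 * np.var(x))       # Var(a + bX) == b^2 Var(X)
```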
Let \( X_1, X_2, \ldots, X_n \) be independent and identically distributed (i.i.d.) random variables with mean \( \mu \) and variance \( \sigma^2 \).
The sample mean is defined as \( \bar{X} = \frac{1}{n} \sum_{j=1}^{n} X_j \); it satisfies \( \mathbb{E}[\bar{X}] = \mu \) and \( \text{Var}(\bar{X}) = \frac{\sigma^2}{n} \).
Example (Bernoulli random variables): if \( X_i \sim \text{Bern}(p) \), then \( \mu = p \) and \( \sigma^2 = p(1 - p) \), so \( \mathbb{E}[\bar{X}] = p \) and \( \text{Var}(\bar{X}) = \frac{p(1 - p)}{n} \).
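A simulation sketch of this example, with arbitrary \( p = 0.3 \) and \( n = 25 \), comparing replicate sample means to the theoretical mean and variance:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 0.3, 25                                      # arbitrary choices
xbars = (rng.random((10_000, n)) < p).mean(axis=1)  # one sample mean per row
print(xbars.mean(), p)                # E[xbar] = p
print(xbars.var(), p * (1 - p) / n)   # Var(xbar) = p(1-p)/n
```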
Normal Distribution: \( X \sim \mathcal{N}(\mu, \sigma^2) \)
PDF: \( f(x) = \frac{1}{\sqrt{2\pi} \sigma} e^{- \frac{(x - \mu)^2}{2\sigma^2}} \)
Standard Normal Distribution: \( Z \sim \mathcal{N}(0, 1) \)
PDF: \( f(z) = \frac{1}{\sqrt{2\pi}} e^{- z^2 / 2} \)
Standardization: \( Z = \frac{X - \mu}{\sigma}, \quad X = \mu + \sigma Z \)
Cumulative Distribution Function (CDF): \( \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-t^2 / 2} dt \)
Approximate probabilities (the 68-95-99.7 rule): \( P(|Z| \leq 1) \approx 0.68 \), \( P(|Z| \leq 2) \approx 0.95 \), \( P(|Z| \leq 3) \approx 0.997 \)
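These values can be read from a normal table or computed with SciPy; the \( X \sim \mathcal{N}(100, 15^2) \) standardization example below is a hypothetical illustration, not from the notes:

```python
from scipy.stats import norm

# Phi and the 68-95-99.7 rule:
for k in (1, 2, 3):
    print(norm.cdf(k) - norm.cdf(-k))   # ~0.683, ~0.954, ~0.997

# Standardization with hypothetical X ~ N(100, 15^2): P(X <= 110) two ways.
print(norm.cdf(110, loc=100, scale=15))  # directly
print(norm.cdf((110 - 100) / 15))        # via Z = (X - mu) / sigma
```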
Sums of Normal Random Variables: If \( X \sim \mathcal{N}(\mu_1, \sigma_1^2), Y \sim \mathcal{N}(\mu_2, \sigma_2^2) \), and they are independent:
\( X + Y \sim \mathcal{N}(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2) \)
Example: Weight of boys \( X \sim \mathcal{N}(100, 5^2) \), weight of girls \( Y \sim \mathcal{N}(90, 6^2) \), independent
Want: \( P(X - Y > 6) \). Since \( X - Y \sim \mathcal{N}(10, 5^2 + 6^2) = \mathcal{N}(10, 61) \),
\( P(X - Y > 6) = P\left(Z > \frac{6 - 10}{\sqrt{61}}\right) = P(Z > -0.51) \approx 0.70 \)
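The same answer via SciPy, as a one-line numerical check:

```python
from math import sqrt
from scipy.stats import norm

# X - Y ~ N(100 - 90, 5^2 + 6^2) = N(10, 61) by independence.
print(1 - norm.cdf(6, loc=10, scale=sqrt(61)))  # P(X - Y > 6) ~ 0.70
```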
Let \( X_1, \ldots, X_n \) be independent with \( X_i \sim \mathcal{N}(\mu_i, \sigma_i^2) \), and define \( X = \sum a_i X_i \). Then \( X \sim \mathcal{N}\left( \sum a_i \mu_i, \sum a_i^2 \sigma_i^2 \right) \).
Corollary: If \( X_i \sim \mathcal{N}(\mu_0, \sigma_0^2) \), then:
\( \bar{X} = \frac{1}{n} \sum X_i \sim \mathcal{N}(\mu_0, \frac{\sigma_0^2}{n}) \)
Example: Coffee volume \( \sim \mathcal{N}(8, 0.47) \), sample size \( n = 10 \): by the corollary, \( \bar{X} \sim \mathcal{N}(8, 0.47/10) \).
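A sketch of this example in SciPy, reading 0.47 as the variance (consistent with the \( \mathcal{N}(\mu, \sigma^2) \) notation used above); the 7.8 oz query is hypothetical, since the original question is not stated:

```python
from math import sqrt
from scipy.stats import norm

mu, var, n = 8, 0.47, 10   # 0.47 read as the variance, per N(mu, sigma^2)
se = sqrt(var / n)         # standard deviation of xbar
# Hypothetical query: P(average of 10 cups is below 7.8 oz)
print(norm.cdf(7.8, loc=mu, scale=se))
```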
Moment Generating Function: \( M(t) = \mathbb{E}[e^{tX}] \)
\( \frac{d^n}{dt^n} M(t) \bigg|_{t = 0} = \mathbb{E}[X^n] \)
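A symbolic sketch with SymPy, using the Bernoulli MGF \( M(t) = 1 - p + p e^t \) as a simple test case (the choice of distribution is an illustrative assumption):

```python
import sympy as sp

t, p = sp.symbols("t p", positive=True)
M = (1 - p) + p * sp.exp(t)           # MGF of X ~ Bern(p): E[e^{tX}]
print(sp.diff(M, t, 1).subs(t, 0))    # E[X]   = p
print(sp.diff(M, t, 2).subs(t, 0))    # E[X^2] = p
# Hence Var(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1 - p).
```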