Data: A sample \( \{x_1, x_2, \ldots, x_n\} \), e.g., \( \{8, 3, 14, 1, 5, 7, 21, 4, 10, 3\} \).
Ordered Data: \( \{1, 3, 3, 4, 5, 7, 8, 10, 14, 21\} \)
See supporting material: ch02_basicstats.html
Sample Standard Deviation:
\( s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{\frac{1}{9}(332.4)} \approx 6.077 \), where \( \bar{x} = 7.6 \) and \( n = 10 \)
Population Standard Deviation:
\( s_{\text{pop}} = \sqrt{\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2} \)
\( s_{\text{pop}} = \sqrt{\frac{n-1}{n}} \, s < s \) (whenever \( s > 0 \)), and \( \lim_{n \to \infty} \frac{s_{\text{pop}}}{s} = 1 \)
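The two formulas can be checked numerically on the example data. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not taken from the course material.

```python
import math
import statistics

data = [8, 3, 14, 1, 5, 7, 21, 4, 10, 3]
n = len(data)

mean = statistics.fmean(data)              # sample mean x-bar
ss = sum((x - mean) ** 2 for x in data)    # sum of squared deviations

s = math.sqrt(ss / (n - 1))   # sample standard deviation (divide by n - 1)
s_pop = math.sqrt(ss / n)     # population standard deviation (divide by n)

print(f"mean = {mean:.1f}, sum of squared deviations = {ss:.1f}")
print(f"s = {s:.3f}, s_pop = {s_pop:.3f}, ratio s_pop/s = {s_pop / s:.4f}")
```

The ratio printed at the end equals \( \sqrt{(n-1)/n} \), which is why it approaches 1 as \( n \) grows.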
Interquartile Range (IQR): \( Q_3 - Q_1 = 9.5 - 3.25 = 6.25 \)
Upper Fence: \( Q_3 + 1.5 \cdot \text{IQR} = 9.5 + 9.375 = 18.875 \)
Lower Fence: \( Q_1 - 1.5 \cdot \text{IQR} = 3.25 - 9.375 = -6.125 \)
Outliers are data points outside the fences; in this sample only 21 lies above the upper fence.
The box shows \( Q_1 \), median, and \( Q_3 \); whiskers extend to the most extreme data points within the fences (here 1 and 14).
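The box-plot quantities above can be reproduced with a short sketch. It assumes quartiles are computed by linear interpolation between order statistics (`method="inclusive"` in Python's `statistics.quantiles`), which is the convention that matches the IQR and fences quoted here; other quartile conventions give slightly different numbers.

```python
import statistics

data = [8, 3, 14, 1, 5, 7, 21, 4, 10, 3]

# Assumption: quartiles by linear interpolation between order statistics.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr

# Points inside the fences determine the whiskers; the rest are outliers.
inside = [x for x in data if lower_fence <= x <= upper_fence]
outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)
whisker_low, whisker_high = min(inside), max(inside)

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}")
print(f"fences = ({lower_fence}, {upper_fence})")
print(f"whiskers = ({whisker_low}, {whisker_high}), outliers = {outliers}")
```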
Histogram: data are grouped into bins, and each bar shows the number of observations falling in its bin.
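As a small illustration of binning, the sketch below counts the example data in bins of width 5. The bin width and the half-open bins \( [a, a+5) \) are assumptions made for this example only; the notes do not specify a binning.

```python
from collections import Counter

data = [8, 3, 14, 1, 5, 7, 21, 4, 10, 3]
bin_width = 5  # assumption: chosen only for illustration

# Map each value to the left edge of its bin [edge, edge + bin_width).
counts = Counter((x // bin_width) * bin_width for x in data)

# Text histogram: one '#' per observation in the bin.
for edge in range(0, max(data) + 1, bin_width):
    print(f"[{edge:2d}, {edge + bin_width:2d}): {'#' * counts[edge]}")
```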
Definition (empirical CDF): \( \hat{F}(x) = \frac{1}{n} \cdot \#\{ i : x_i \leq x \} \), the fraction of observations that are \( \leq x \)
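For sorted data \( \{0, 3, 3, 5, 7\} \): \( \hat{F}(x) = 0 \) for \( x < 0 \), \( 0.2 \) for \( 0 \leq x < 3 \), \( 0.6 \) for \( 3 \leq x < 5 \), \( 0.8 \) for \( 5 \leq x < 7 \), and \( 1 \) for \( x \geq 7 \).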
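The empirical CDF of this five-point example can be evaluated with a minimal sketch. The helper name `ecdf` is hypothetical; it uses `bisect_right` on the sorted sample to count the observations that are at most \( x \).

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical CDF of `sample` as a function of x."""
    ordered = sorted(sample)
    n = len(ordered)

    def F_hat(x):
        # bisect_right gives the number of observations <= x
        return bisect_right(ordered, x) / n

    return F_hat

F_hat = ecdf([0, 3, 3, 5, 7])
for x in (-1, 0, 3, 4, 5, 7):
    print(f"F_hat({x}) = {F_hat(x)}")
```

The repeated value 3 produces a jump of size \( 2/5 \) at \( x = 3 \), twice the jump at the other data points.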
See exploration demo: ch02_explore_data.html
Quantile Definition:
Let \( X \) be a continuous random variable with CDF \( F \). The \( p \)-th quantile \( q_p \) satisfies:
\( P(X \leq q_p) = p \), i.e., \( F(q_p) = p \)
Example (Standard Normal):
If \( X \sim N(0, 1) \), then \( F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \, dz \)
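This integral has no elementary closed form, but it can be written with the error function as \( F(x) = \frac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right) \). The sketch below evaluates it that way and compares against `statistics.NormalDist` from the Python standard library; the function name `std_normal_cdf` is illustrative.

```python
import math
from statistics import NormalDist

def std_normal_cdf(x):
    """F(x) for N(0, 1), using F(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

for x in (-1.96, 0.0, 1.0, 1.96):
    print(f"x = {x:+.2f}  erf form = {std_normal_cdf(x):.6f}  "
          f"NormalDist = {NormalDist().cdf(x):.6f}")
```

Inverting \( F \) numerically, for example with `NormalDist().inv_cdf(p)`, gives the quantile \( q_p \).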
Given \( f(x) = e^{-x} \) for \( x \ge 0 \), find \( q_{0.75} \):
\( \int_0^{q_{0.75}} e^{-x} \, dx = 0.75 \Rightarrow 1 - e^{-q_{0.75}} = 0.75 \Rightarrow q_{0.75} = \ln(4) \approx 1.386 \)
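The same calculation works for any \( p \in [0, 1) \): solving \( 1 - e^{-q} = p \) gives \( q_p = -\ln(1 - p) \). A minimal sketch of this closed form, with illustrative function names:

```python
import math

def exp_cdf(x):
    """CDF of the density f(x) = exp(-x) on x >= 0."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def exp_quantile(p):
    """Solve 1 - exp(-q) = p for q, i.e. q = -ln(1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return -math.log(1.0 - p)

q = exp_quantile(0.75)
print(f"q_0.75 = {q:.3f}, F(q_0.75) = {exp_cdf(q):.2f}")
```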
Let \( Z \sim N(0,1) \), \( X \sim N(\mu, \sigma^2) \)
\( X = \mu + \sigma Z \Rightarrow q_p^{(X)} = \mu + \sigma q_p^{(Z)} \)
Thus, plotting \( q_p^{(X)} \) vs. \( q_p^{(Z)} \) gives a line with slope \( \sigma \) and intercept \( \mu \).
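The linear relation between the two sets of quantiles can be verified numerically. In this sketch the values \( \mu = 10 \) and \( \sigma = 2 \) are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

mu, sigma = 10.0, 2.0   # hypothetical parameters, for illustration only
Z = NormalDist()        # standard normal N(0, 1)
X = NormalDist(mu, sigma)

probs = [0.05, 0.25, 0.5, 0.75, 0.95]
qz = [Z.inv_cdf(p) for p in probs]   # quantiles of Z
qx = [X.inv_cdf(p) for p in probs]   # quantiles of X

for p, z, x in zip(probs, qz, qx):
    print(f"p = {p:.2f}  q_Z = {z:+.4f}  q_X = {x:.4f}  "
          f"mu + sigma * q_Z = {mu + sigma * z:.4f}")
```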
Purpose: Compare sample data to a theoretical distribution
Steps:
If the points lie close to the line \( y = x \), the data are consistent with the reference distribution.
For data from \( N(\mu, \sigma^2) \), the data quantiles \( q^{(d)} \) relate linearly to the theoretical \( N(0,1) \) quantiles \( q^{(t)} \):
\( q^{(d)} = \mu + \sigma q^{(t)} \)
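Conclusion: You do not need to know \( \mu \) and \( \sigma \) to check normality. Plotted against \( N(0,1) \) quantiles, normal data fall close to some straight line, whose slope and intercept estimate \( \sigma \) and \( \mu \).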
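One way to compute the points of a normal Q-Q plot is sketched below. It assumes plotting positions \( p_i = (i - 0.5)/n \), which is one common convention among several, and the helper names `qq_points` and `fit_line` are hypothetical. The least-squares line is only a rough visual aid for reading off the slope and intercept, not a formal test of normality.

```python
from statistics import NormalDist, fmean

def qq_points(sample):
    """Pair each sorted observation with a standard normal quantile."""
    ordered = sorted(sample)
    n = len(ordered)
    # Assumption: plotting positions p_i = (i - 0.5) / n; other choices exist.
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    theoretical = [NormalDist().inv_cdf(p) for p in probs]
    return theoretical, ordered

def fit_line(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

data = [8, 3, 14, 1, 5, 7, 21, 4, 10, 3]
t, d = qq_points(data)            # theoretical vs. data quantiles
slope, intercept = fit_line(t, d)

for tq, dq in zip(t, d):
    print(f"{tq:+.3f}  {dq}")
print(f"slope (rough sigma) = {slope:.3f}, intercept (rough mu) = {intercept:.3f}")
```

Plotting `d` against `t` with any plotting library gives the Q-Q plot; strong curvature or isolated points far from the line suggest departures from normality.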
See derivations and formulae: ch02_prob_dist.html