A comprehensive guide to statistical tests and sample size calculations
Statistical hypothesis testing is fundamental to data analysis and research. Understanding which test to use and how to calculate the required sample size ensures your analysis has sufficient statistical power to detect meaningful effects.
💡 Key Concept: Sample size directly determines your ability to detect true effects (statistical power) at a fixed false-positive rate (Type I error, α).
Comparing means when population variance is unknown
\[ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \]
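where \(\bar{x}\) = sample mean, \(\mu\) = hypothesized population mean, \(s\) = sample standard deviation, \(n\) = sample size; under the null hypothesis, \(t\) follows a t-distribution with \(n - 1\) degrees of freedom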
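As an illustration of the statistic above, here is a minimal sketch using only the Python standard library. The helper name `one_sample_t` and the data are hypothetical, chosen only to show the arithmetic.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Illustrative data only
t_stat, df = one_sample_t([2, 4, 6, 8], mu0=3)
print(round(t_stat, 3), df)  # 1.549 3
```

Turning the statistic into a p-value requires the t-distribution with `df` degrees of freedom, which the standard library does not provide; a statistics package would be needed for that step.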
\[ n = \frac{2(Z_{\alpha/2} + Z_{\beta})^2 \sigma^2}{\delta^2} \]
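• \(n\) = required sample size per group when comparing two independent groups of equal size
• \(Z_{\alpha/2}\) = critical value for the two-sided significance level (e.g., 1.96 for α=0.05)
• \(Z_{\beta}\) = standard normal quantile for the desired power (e.g., 0.84 for 80% power, β=0.20)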
• \(\sigma\) = pooled standard deviation
• \(\delta\) = minimum detectable difference (effect size)
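The sample size formula above can be sketched as follows. This is a rough illustration, assuming two independent groups of equal size and the normal approximation the formula itself uses; the function name `n_per_group_two_sample` and the inputs are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group_two_sample(sigma, delta, alpha=0.05, power=0.80):
    """Per-group n from n = 2 (Z_{alpha/2} + Z_beta)^2 sigma^2 / delta^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)  # round up to a whole subject

# Illustrative inputs: sigma = 10, smallest difference of interest = 5
print(n_per_group_two_sample(sigma=10, delta=5))  # 63
```

Because the formula uses normal quantiles rather than t quantiles, an exact t-based calculation usually asks for one or two more subjects per group.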
Comparing means of matched or repeated measurements
\[ t = \frac{\bar{d}}{s_d / \sqrt{n}} \]
where \(\bar{d}\) = mean of the paired differences, \(s_d\) = standard deviation of the differences, \(n\) = number of pairs
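A short sketch of the paired statistic, again with the standard library only. The function name `paired_t` and the before/after values are made up for illustration.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, degrees of freedom) for the mean of the paired differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # standard deviation of the differences
    return d_bar / (s_d / math.sqrt(n)), n - 1

# Illustrative measurements on the same four subjects
t_stat, df = paired_t(before=[10, 12, 14, 16], after=[12, 15, 15, 20])
print(round(t_stat, 3), df)  # 3.873 3
```

The calculation is the one-sample t statistic applied to the differences, which is why only the variability of the differences enters the formula.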
\[ n = \frac{(Z_{\alpha/2} + Z_{\beta})^2 \sigma_d^2}{\delta^2} \]
• \(\sigma_d\) = standard deviation of paired differences
• \(\delta\) = minimum detectable mean difference
• Typically requires fewer subjects than an independent-samples design, because pairing removes between-subject variability when the paired measurements are positively correlated
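The paired formula can be sketched the same way. The helper `n_pairs` is a hypothetical name, and the inputs are illustrative; the calculation assumes the normal approximation used in the formula.

```python
import math
from statistics import NormalDist

def n_pairs(sigma_d, delta, alpha=0.05, power=0.80):
    """Number of pairs from n = (Z_{alpha/2} + Z_beta)^2 sigma_d^2 / delta^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil((z_alpha + z_beta) ** 2 * sigma_d ** 2 / delta ** 2)

# Illustrative inputs: SD of differences = 10, smallest mean difference = 5
print(n_pairs(sigma_d=10, delta=5))  # 32
```

Note that \(\sigma_d\) here is the standard deviation of the differences, not of the raw measurements, so the result is only comparable to an independent-samples calculation once the correlation between paired measurements has been accounted for.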
Comparing means when population variance is known
\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
where \(\sigma\) = known population standard deviation
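Since the Z statistic follows a standard normal distribution under the null hypothesis, both the statistic and a two-sided p-value can be sketched with the standard library. The name `z_test` and the numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist

def z_test(x_bar, mu0, sigma, n):
    """Return (z, two-sided p-value) for H0: population mean == mu0, sigma known."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Illustrative inputs: sample mean 52 from n = 100, known sigma = 10
z, p = z_test(x_bar=52, mu0=50, sigma=10, n=100)
print(round(z, 2), round(p, 4))  # 2.0 0.0455
```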
\[ n = \frac{(Z_{\alpha/2} + Z_{\beta})^2 \sigma^2}{\delta^2} \]
• Uses known population variance \(\sigma^2\)
• Generally requires a slightly smaller sample than the t-test, because \(\sigma\) does not have to be estimated from the data
Probability of detecting a true effect
Statistical power is the probability of correctly rejecting the null hypothesis when it is false (i.e., detecting a true effect). Power = 1 - β, where β is the Type II error rate.
Type I error (α): a false positive, i.e. rejecting a true null hypothesis. The significance level α is typically set at 0.05 (5%).
Type II error (β): a false negative, i.e. failing to reject a false null hypothesis. Power = 1 - β.
⚡ Conventional Power: Studies typically aim for 80% power (β = 0.20)
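To make the relationship between sample size and power concrete, the sketch below inverts the two-sample sample size formula to get approximate power for a given per-group n. It assumes equal group sizes and a normal approximation, and it ignores the negligible chance of rejecting in the wrong direction; `approx_power_two_sample` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def approx_power_two_sample(n_per_group, sigma, delta, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    se_diff = sigma * math.sqrt(2 / n_per_group)  # SE of the difference in means
    # Power = P(Z > z_alpha - delta / se_diff) under the alternative
    return std_normal.cdf(delta / se_diff - z_alpha)

# Illustrative inputs: sigma = 10, true difference = 5
for n in (20, 40, 63, 100):
    print(n, round(approx_power_two_sample(n, sigma=10, delta=5), 2))
```

With these inputs, power crosses the conventional 80% threshold at about 63 subjects per group and keeps rising as n grows.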
Range of plausible values for population parameter
A 95% confidence interval means that if we repeated the study many times, 95% of calculated intervals would contain the true population parameter.
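The repeated-sampling interpretation can be checked with a small simulation. This sketch assumes normally distributed data with a known standard deviation so that the interval is exact; the function name `coverage`, the parameter values, and the seed are all illustrative choices.

```python
import math
import random
from statistics import NormalDist, mean

def coverage(true_mu=50.0, sigma=10.0, n=30, reps=2000, confidence=0.95, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)  # known-sigma interval, for simplicity
    hits = 0
    for _ in range(reps):
        x_bar = mean([rng.gauss(true_mu, sigma) for _ in range(n)])
        if x_bar - half_width <= true_mu <= x_bar + half_width:
            hits += 1
    return hits / reps

print(coverage())  # close to 0.95
```

Each simulated study produces a different interval; it is the procedure, not any single interval, that captures the true mean about 95% of the time.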
\[ CI = \bar{x} \pm t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}} \]
95% CI: \(\bar{x} \pm 1.96 \cdot SE\) (for large samples)
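A minimal sketch of the large-sample interval, using the normal quantile in place of the t quantile. The helper `mean_ci` and the summary statistics are hypothetical; for small samples the t quantile with n - 1 degrees of freedom should replace `z`.

```python
import math
from statistics import NormalDist

def mean_ci(x_bar, s, n, confidence=0.95):
    """Large-sample confidence interval for a mean: x_bar +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Illustrative summary statistics: mean 100, SD 20, n = 100
low, high = mean_ci(x_bar=100, s=20, n=100)
print(round(low, 2), round(high, 2))  # 96.08 103.92
```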
\[ n = \left(\frac{Z_{\alpha/2} \cdot \sigma}{E}\right)^2 \]
• \(E\) = desired margin of error (half-width of CI)
• Example: For a 95% CI with margin of error ±5 and σ=20: n = (1.96×20/5)² = 7.84² ≈ 61.5, rounded up to 62
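The worked example above can be reproduced with a few lines of Python. The function name `n_for_margin` is a hypothetical choice; the calculation assumes σ is known or well estimated in advance.

```python
import math
from statistics import NormalDist

def n_for_margin(sigma, margin, confidence=0.95):
    """Sample size from n = (Z_{alpha/2} * sigma / E)^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)

print(n_for_margin(sigma=20, margin=5))  # 62
```

Halving the margin of error roughly quadruples the required sample size, since \(E\) enters the formula squared.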
📊 Statistical analysis is a tool for discovery, not a substitute for thinking.
Always consider the context and practical significance of your findings.