Lesson 11 — Statistics in the Real World
With great power comes… messy data.
The real-world considerations that make or break a statistical analysis.
Data takes work
The statistical test is often the easy part. Getting clean, representative data is where most time goes.
📋 Design: Plan the experiment carefully
📊 Collect: Gather data systematically
✏ Clean: Remove outliers, fix errors
🔎 Analyze: Finally, run the test!
Bending the rules
Real data rarely meets all assumptions perfectly. That’s okay — if you’re transparent about it.
Acceptable: The distribution is slightly non-normal but close enough. Use Pearson’s correlation and note the limitation.
Needs attention: The data is very non-normal or bimodal. Use a non-parametric test instead.
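One rough way to tell "close enough" from "very non-normal" is to look at sample skewness. The sketch below is only an illustration: the function names and the cutoff of 1.0 are assumptions, not part of the lesson, and skewness alone will not flag bimodal data, so a histogram is still worth a look.

```python
from statistics import fmean


def sample_skewness(values):
    """Moment-based skewness: 0 for symmetric data, large |value| for lopsided data."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


def suggest_correlation(values, max_abs_skew=1.0):
    """Hypothetical helper: the 1.0 cutoff is an assumed rule of thumb."""
    if abs(sample_skewness(values)) <= max_abs_skew:
        return "pearson"   # close enough; report the limitation
    return "spearman"      # strongly skewed; prefer a rank-based method


symmetric = [1, 2, 3, 4, 5]
skewed = [1, 1, 1, 2, 2, 3, 50]
print(suggest_correlation(symmetric), suggest_correlation(skewed))
```

Whichever way the check falls, say in the write-up which check was used and why.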
Sample size & power
📈 Small sample: May miss real effects (low power)
📉 Large sample: More reliable results (high power)
💬 Power calculation: Plan before collecting data to know how many observations you need
A low-powered study might find no effect even when a real one exists!
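A power calculation can be sketched with the standard normal approximation for comparing two group means. The sketch assumes a standardized effect size (Cohen's d), a two-sided test, and equal group sizes. The function names are illustrative, and an exact t-test calculation would give slightly larger sample sizes.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate observations needed per group to detect `effect_size`."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = Z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def approx_power(effect_size, n, alpha=0.05):
    """Approximate chance of detecting `effect_size` with n observations per group."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(effect_size * sqrt(n / 2) - z_alpha)


print(n_per_group(0.5))                  # medium effect
print(n_per_group(0.2))                  # small effect needs far more data
print(round(approx_power(0.5, 20), 2))   # a small study is underpowered
```

Halving the effect size roughly quadruples the required sample, which is why this step belongs before data collection.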
Non-parametric alternatives
When your data violates normality, use these distribution-free alternatives:
- vs t-test: Mann-Whitney U, compares two groups without assuming normality
- vs ANOVA: Kruskal-Wallis, compares 3+ groups without assuming normality
- vs Pearson: Spearman, correlation for non-normally distributed continuous data
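These alternatives work on ranks rather than raw values, which is why the shape of the distribution matters less. As a minimal sketch of the idea, Spearman's correlation can be computed as Pearson's correlation on ranked data. The helper names below are illustrative, and in practice a statistics library would normally be used.

```python
from statistics import fmean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 1000]  # always increasing, but with one extreme value
print(round(pearson(x, y), 2), round(spearman(x, y), 2))
```

The extreme value drags Pearson's coefficient down, while Spearman's still reports a perfect monotonic relationship.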
P-value vs effect size
A small p-value means a result this extreme would be unlikely if there were no real effect. It does not mean the effect is large or meaningful.
p < 0.001: Statistically significant, unlikely to be due to chance alone
d = 0.02: Effect size, tiny practical impact (e.g. a drug reduces blood sugar by 1 mg/dL)
Always report effect size alongside p-values for real-world decisions.
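To see how a tiny effect can still come out "significant", the sketch below computes an approximate two-sided p-value for a fixed effect size at different sample sizes. It assumes d is Cohen's d for two equal-sized groups and uses a normal approximation. The function name and the sample sizes are illustrative, not taken from the lesson.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(effect_size, n_per_group):
    """Approximate two-sided p-value when the observed standardized
    difference between two equal-sized groups equals `effect_size`."""
    z = effect_size * sqrt(n_per_group / 2)  # test statistic grows with sample size
    return 2 * (1 - NormalDist().cdf(abs(z)))


d = 0.02  # the same tiny effect in both cases
print(two_sided_p(d, 100))       # small study: nowhere near significant
print(two_sided_p(d, 100_000))   # huge study: p drops below 0.001
```

The effect size is identical in both calls. Only the sample size changes, so the p-value alone says nothing about practical impact.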
The final word
Statistics is a tool, not a truth machine
“All models are wrong, but some are useful.”
— George Box, British statistician
One result doesn’t establish a pattern. Consider context, replication, and limitations — then unlock the hidden patterns behind everything!