Module VII·Article III·~2 min read
Statistical Traps: What Data Conceal
Statistics, Probability, and Bayesian Thinking
Lies, Damned Lies, and Statistics
"There are three kinds of lies: lies, damned lies, and statistics." Mark Twain popularized the line and attributed it to Benjamin Disraeli. It does not mean that statistics always lie; it means they can mislead when read without critical thinking.
Simpson's Paradox: a trend present in several groups of data disappears or reverses when the groups are combined. Example: Treatment A has better results for women AND better results for men, yet in the combined population it looks worse than Treatment B. How? The groups are unevenly split between the treatments. If Treatment A is given mostly to men (who recover less well regardless of treatment) and Treatment B mostly to women, the men "drag down" A's overall rate even though A is better within each group.
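A small numeric sketch shows how the reversal can arise. The counts below are hypothetical, chosen only to make the effect visible: Treatment A is given mostly to men, who recover less often under either treatment, while Treatment B is given mostly to women.

```python
# Hypothetical counts (not real data): (recovered, treated) per treatment and group.
counts = {
    "A": {"women": (90, 100), "men": (180, 300)},
    "B": {"women": (255, 300), "men": (50, 100)},
}


def rate(recovered, treated):
    """Recovery rate within one group."""
    return recovered / treated


def overall_rate(groups):
    """Recovery rate after pooling all groups of one treatment."""
    recovered = sum(r for r, _ in groups.values())
    treated = sum(n for _, n in groups.values())
    return recovered / treated


# Within each group, A beats B.
for group in ("women", "men"):
    a = rate(*counts["A"][group])
    b = rate(*counts["B"][group])
    print(f"{group}: A={a:.2f}  B={b:.2f}")

# Pooled, the order flips because A's patients are mostly from the harder group.
overall_a = overall_rate(counts["A"])
overall_b = overall_rate(counts["B"])
print(f"overall: A={overall_a:.3f}  B={overall_b:.3f}")
```

The pooled rate is a weighted average of the group rates, and the weights differ between treatments. That difference in weights, not the treatments themselves, produces the reversal.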
Survivorship bias: we analyze only the "survivors" (successful companies, returned aircraft, completed projects) and draw conclusions while ignoring those who did not survive. The classic case is the statistician Abraham Wald during WWII: do not reinforce the areas where returned aircraft show hits; reinforce the areas where they show none, because aircraft hit in those areas did not return.
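The aircraft argument can be illustrated with a toy simulation. Everything here is an assumption made for the sketch: each aircraft takes a single hit, hits land uniformly across four sections, and the loss probabilities per section are invented.

```python
import random
from collections import Counter

random.seed(2)  # fixed seed so the run is repeatable

AIRCRAFT = 20_000
SECTIONS = ["engine", "fuselage", "wings", "tail"]
# Hypothetical probability that one hit in this section downs the aircraft.
LOSS_PROBABILITY = {"engine": 0.6, "fuselage": 0.1, "wings": 0.1, "tail": 0.15}

hits_all = Counter()       # what actually happened
hits_returned = Counter()  # what the analysts on the ground can see

for _ in range(AIRCRAFT):
    section = random.choice(SECTIONS)  # hits are spread evenly by assumption
    hits_all[section] += 1
    if random.random() >= LOSS_PROBABILITY[section]:
        hits_returned[section] += 1    # the aircraft survived and can be inspected

total_returned = sum(hits_returned.values())
share_all = {s: hits_all[s] / AIRCRAFT for s in SECTIONS}
share_returned = {s: hits_returned[s] / total_returned for s in SECTIONS}

for s in SECTIONS:
    print(f"{s:9s} all hits: {share_all[s]:.2f}   returned only: {share_returned[s]:.2f}")
```

In this model the engine receives as many hits as any other section, yet it shows the fewest hits among returned aircraft. The gap in the visible data marks the most dangerous area, not the safest one.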
Correlation ≠ Causation and Other Traps
"After this, therefore because of this" (post hoc ergo propter hoc): Y happened after X, so X must have caused Y. The rooster crows before sunrise; does the rooster cause the sunrise? A related trap is the hidden variable: deaths from drowning and ice cream sales rise and fall together, not because one causes the other but because both follow the season.
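The ice cream example can be sketched as a simulation in which a hidden variable (temperature) drives both series. All coefficients and noise levels are invented for illustration, and `pearson` is a small helper written for this sketch.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical model: temperature drives both quantities; they never affect each other.
temperature = [random.uniform(0, 35) for _ in range(5000)]
ice_cream = [10 * t + random.gauss(0, 50) for t in temperature]
drownings = [0.2 * t + random.gauss(0, 1) for t in temperature]

raw = pearson(ice_cream, drownings)

# Hold the hidden variable roughly constant: keep only days between 20 and 22 degrees.
band = [i for i, t in enumerate(temperature) if 20 <= t < 22]
within = pearson([ice_cream[i] for i in band], [drownings[i] for i in band])

print(f"correlation across all days:       {raw:.2f}")
print(f"correlation at similar temperature: {within:.2f}")
```

Across all days the two series are strongly correlated. Once temperature is held nearly fixed, the correlation mostly vanishes, which is what one would expect if neither variable causes the other.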
"Regression to the mean": after an extreme value, the next value is, as a rule, closer to the mean, regardless of what happened between them. A student has an unusually bad day, receives a reprimand, and makes fewer mistakes next time. The improvement need not mean the reprimand helped: an unusually bad result is partly bad luck, and luck does not repeat on demand.
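A minimal simulation makes the effect concrete. It assumes each test score is a stable skill level plus independent random noise; the numbers (mean 70, spreads 5 and 10) are invented. Nothing happens to the students between the two tests.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

N = 10_000
# Hypothetical model: score = stable skill + luck on the day.
skill = [random.gauss(70, 5) for _ in range(N)]
test1 = [s + random.gauss(0, 10) for s in skill]
test2 = [s + random.gauss(0, 10) for s in skill]

# Select the students with the worst 10% of scores on the first test.
cutoff = sorted(test1)[N // 10]
worst = [i for i in range(N) if test1[i] < cutoff]

worst_first = mean(test1[i] for i in worst)
worst_second = mean(test2[i] for i in worst)
overall = mean(test1)

print(f"overall mean on test 1:        {overall:.1f}")
print(f"worst group, test 1:           {worst_first:.1f}")
print(f"same students, test 2:         {worst_second:.1f}")
```

The worst group improves on the second test with no intervention at all, while still scoring below the overall mean. Any reprimand, training, or bonus applied between the tests would appear to "work" in the same way.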
P-hacking: researchers test many hypotheses and publish only the "significant" ones (p < 0.05). With 20 independent tests and no real effect at all, one false positive is expected on average, and the chance of getting at least one is about 64% (1 − 0.95^20). This practice is one of the contributors to the "reproducibility crisis" in science.
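The arithmetic behind the 20-test example can be checked directly. The sketch assumes the tests are independent and that, with no real effect, each p-value is uniformly distributed between 0 and 1 (the usual idealization for a continuous test statistic).

```python
import random

ALPHA = 0.05  # significance threshold
TESTS = 20    # hypotheses tested per study, all with zero true effect

# Analytic results under independence.
expected_false_positives = TESTS * ALPHA
p_at_least_one = 1 - (1 - ALPHA) ** TESTS

# Simulation: draw a uniform p-value for every test in many imaginary studies.
random.seed(3)  # fixed seed so the run is repeatable
STUDIES = 10_000
studies_with_hit = sum(
    any(random.random() < ALPHA for _ in range(TESTS))
    for _ in range(STUDIES)
)
simulated = studies_with_hit / STUDIES

print(f"expected false positives per study: {expected_false_positives:.2f}")
print(f"P(at least one), analytic:          {p_at_least_one:.3f}")
print(f"P(at least one), simulated:         {simulated:.3f}")
```

So a study that runs 20 tests on pure noise and reports only the "significant" one will have something to report roughly two times out of three.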
Question for reflection: Find an example in your work where you or your organization drew causal conclusions from correlation. How did this affect the decisions?