Why Correlation Isn't Causation

The Difference Between Patterns and Proof

November 6, 2021 · 3 min read

I keep running into the same mistake when looking at data: two things happen together, so one must be causing the other. It's a natural assumption, and it's often wrong.

Correlation means two variables move together. When one goes up, the other tends to go up (or down). More people wearing sunglasses, more cold drinks sold. The numbers track each other. But sunglasses don't make people thirsty. Something else is going on: most likely, sunny weather is driving both.

Causation is different. It means one thing actually produces the other. Raise the price of a product, watch sales drop. That's a cause-and-effect relationship: change the first thing and the second changes because of it.

The tricky part is that our brains are wired to find patterns. When two things seem connected, we instinctively look for a reason. Sometimes we find the right one. Often we don't.

When I see a correlation, I've learned to pause and consider the alternatives. Maybe I have the direction backward and Y is actually causing X. Maybe there's a third factor I haven't noticed that influences both. Maybe the relationship only holds under certain conditions. Or maybe there's a chain of effects and I'm seeing the endpoints without the middle steps.
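The "third factor" case is the easiest one to see in a simulation. Here is a minimal sketch using the sunglasses example, assuming a made-up model in which temperature drives both sunglasses and cold drinks and neither affects the other. The coefficients, noise levels, and function names are all my own inventions for illustration, not real data.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """Subtract the least-squares linear effect of xs from ys."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return [y - (mean_y + slope * (x - mean_x)) for x, y in zip(xs, ys)]


def simulate(days=2000, seed=0):
    """Hypothetical world: temperature causes both series, nothing else does."""
    rng = random.Random(seed)
    temperature = [rng.uniform(10, 35) for _ in range(days)]
    sunglasses = [2.0 * t + rng.gauss(0, 5) for t in temperature]
    cold_drinks = [3.0 * t + rng.gauss(0, 8) for t in temperature]
    return temperature, sunglasses, cold_drinks


temperature, sunglasses, cold_drinks = simulate()

# Strong correlation, even though neither variable influences the other.
raw = pearson(sunglasses, cold_drinks)

# Remove the temperature effect from both series and correlate what is left.
adjusted = pearson(
    residuals(sunglasses, temperature),
    residuals(cold_drinks, temperature),
)

print(f"raw correlation: {raw:.2f}")
print(f"after controlling for temperature: {adjusted:.2f}")
```

In this toy setup the raw correlation comes out strong and the adjusted one lands near zero. The catch is that this only works because I knew which confounder to control for. With real data you usually don't, which is why adjusting for known variables is weaker evidence than an experiment.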

This matters most when making decisions. I've seen teams get excited about a feature that correlated with higher conversion rates. The obvious conclusion was to promote that feature everywhere. But without testing, there was no way to know if the feature was actually driving conversions or if something else entirely was responsible.

So how do you move from correlation to causation? You test.

The classic approach is hypothesis testing. You start with a null hypothesis that says there's no effect. Then you collect data and see if you can reject that hypothesis. If users who are randomly shown a new design convert at a statistically significantly higher rate than those who aren't, you have evidence. Not proof, but evidence.
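For conversion rates, one common way to run this check is a two-proportion z-test. The sketch below is one possible version, not the only valid test. It assumes large samples and independent users, and the counts in the usage lines are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a / n_a: conversions and visitors in the control group.
    conv_b / n_b: conversions and visitors in the variant group.
    Returns (z, p_value). The null hypothesis is "no difference".
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under the null hypothesis both groups share one conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: 200 of 5,000 control users converted, 260 of 5,000 variant users.
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value says a difference this large would be unlikely if the null hypothesis were true. It does not say how big the effect is or whether it is worth acting on, so I look at the size of the difference alongside it.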

A/B testing is the practical version of this. Split your traffic at random between the original version and a variant. Control for outside factors as much as possible. Measure the difference. If the variant wins by a meaningful margin, you've got something you can act on.
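One common way to split traffic is to hash each user ID into a bucket, so the assignment is effectively random across users but stable for any one user. This is a minimal sketch under that assumption. The function name, the experiment label, and the user ID format are hypothetical.

```python
import hashlib
from collections import Counter


def assign_variant(user_id, experiment="homepage-test", variants=("control", "variant")):
    """Deterministically assign a user to one of the variants.

    Hashing the experiment name together with the user ID means the same
    user always sees the same version, and separate experiments get
    unrelated splits.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]


# Usage: check that the split is roughly even across made-up user IDs.
counts = Counter(assign_variant(f"user-{i}") for i in range(10_000))
print(counts)
```

The point of assigning this way is that nothing about the user (device, time of day, how they arrived) decides which version they see. That is what lets you attribute a difference in conversions to the variant rather than to who happened to end up in each group.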

Even then, you have to be careful. External factors like promotions, seasonality, or changes in your user base can skew results. Running the test longer or repeating it helps. So does being honest about what the data actually shows versus what you want it to show.

The takeaway for me is simple: correlation is useful for generating hypotheses, but it's a terrible basis for conclusions. When the stakes matter, test before you act.