Correlation vs. Causation

Smoking causes lung cancer. That's hardly a controversial statement anymore. But how do you know that people who get addicted to smoking don't have a genetic difference that predisposes them to lung cancer? You'd have to do an experiment to control for that, or come up with a clear medical explanation of the carcinogenic effects of smoking. Of course that can be done, and no one is going to take the genetic argument seriously.

But it does raise a serious question about how to use correlation measurements to draw inferences about things affecting each other. In the example of smoking and cancer, you can define a variable $S$ which equals the number of cigarettes an individual smokes a day, and a variable $L$ which records whether they contracted lung cancer or not (say, 1 or 0). If you measure the correlation coefficient defined in the last section (eqn. 1.63) between these two quantities, $R(S,L)$, I'm sure you'll find it's quite positive. But then you've only shown that these two quantities are correlated, not that one causes the other.
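To see how the genetic argument could produce a positive $R(S,L)$ with no causation at all, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data are simulated, the gene and all the numbers are made up, and `pearson` is the standard Pearson formula, which I'm assuming matches the coefficient of eqn. 1.63. It describes a hypothetical world, not real smokers.

```python
import random

def pearson(xs, ys):
    """Standard Pearson correlation (assumed to match R in the text)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(n=20000, seed=0):
    """Hypothetical world: a gene drives both smoking and cancer."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        gene = rng.random() < 0.3  # made-up: 30% of people carry the gene
        # S: cigarettes per day; carriers smoke more on average (made-up numbers).
        s = max(0.0, rng.gauss(15.0 if gene else 3.0, 4.0))
        # L: cancer (1) or not (0); the risk depends ONLY on the gene, never on s.
        l = 1 if rng.random() < (0.30 if gene else 0.02) else 0
        rows.append((gene, s, l))
    return rows

rows = simulate()

# Correlation over everybody, ignoring the gene.
r_all = pearson([s for _, s, _ in rows], [l for _, _, l in rows])

# "Controlling for" the gene: look at carriers and non-carriers separately.
r_carriers = pearson([s for g, s, _ in rows if g], [l for g, _, l in rows if g])
r_others = pearson([s for g, s, _ in rows if not g], [l for g, _, l in rows if not g])

print("R(S,L), everyone:    ", round(r_all, 3))
print("R(S,L), carriers:    ", round(r_carriers, 3))
print("R(S,L), non-carriers:", round(r_others, 3))
```

In this made-up world the overall coefficient comes out clearly positive, while within each group it is close to zero. That is what controlling for the gene means here, and it is why the correlation alone can't settle the question.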

Let's take some other ludicrous examples to explain the problem of correlation vs. causation. Define $T$ as the temperature of a day in Manhattan, and $I$ as the number of ice cream vendors out on that day. The correlation coefficient between these two is almost certainly quite positive. (How many vendors are out there in January?) Does this prove that ice cream vendors cause it to be hot? Obviously causation goes the other way. Common sense tells you that. Unless of course you believe in conspiracy theories.
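The reason the coefficient can't tell you which way the causation goes is that it's symmetric: $R(T,I) = R(I,T)$. A small sketch in Python makes the point. The temperatures and vendor counts are simulated with made-up numbers, and `pearson` is the standard Pearson formula, which I'm assuming is the coefficient meant in the text.

```python
import random

def pearson(xs, ys):
    """Standard Pearson correlation (assumed to match R in the text)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)

# T: a year of made-up daily temperatures (degrees Fahrenheit).
temps = [rng.uniform(20.0, 95.0) for _ in range(365)]

# I: vendor counts. In this simulation the causal arrow runs one way only:
# temperature -> vendors, plus some noise. Vendors never affect temperature.
vendors = [max(0, round(0.5 * (t - 30.0) + rng.gauss(0.0, 5.0))) for t in temps]

r_ti = pearson(temps, vendors)
r_it = pearson(vendors, temps)

print("R(T,I) =", round(r_ti, 3))
print("R(I,T) =", round(r_it, 3))
```

Both numbers are the same, even though the data were built with temperature driving the vendors. Swapping the arguments changes nothing, so the direction has to come from somewhere else, like common sense or an experiment.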

How about anti-baldness lotion? Redefine $L$, this time as the amount of baldness lotion applied to a scalp, and define $B$ as the degree of baldness (1 for completely bald, 0 for a full head of hair). You'd expect the correlation coefficient between these two also to be highly positive. But does that imply that this anti-baldness medication causes baldness?

Same with diet food. By diet food I don't mean lettuce, but those premade meals you find in the frozen section with "diet" or "low fat" written all over them. I bet you'll find that people who eat diet food tend to be fatter than those who don't. Define $F$ as the number of pounds of diet food consumed in a week, and $W$ as the weight of the person. I'm not sure about this, but it makes sense that these would be positively correlated, that is, $R(F,W)$ is quite positive. Most people I know who eat diet food (aside from diet cokes) are not skinny.

Now how about ant poison? How many people without ant problems in their house have a lot of ant poison on the floor? How about those with a big ant problem? Does this imply that ant poison is causing the ant problem? Not unless you believe the CEOs of these poison companies are just big ants in disguise, doing their best to keep their relations well fed.

The upshot of all this is that causation and correlation are very different. Diets, ant poisons, and anti-baldness balms are all doing what they're supposed to do, not the opposite. Causation causes correlation, but not necessarily the converse. There is a lot more to proving causation than this simple correlation formula, and that's why you've got to be very careful reading news stories, or even medical journals, that purport to show that A causes B.



josh 2010-10-20