Hey, Did I Do That? The Difference between Correlation and Causation
It is relatively easy to identify relationships between data points. However, the presence of a relationship does not imply that one data point or event caused the other. The idea that correlation implies causation is a common fallacy that occurs not only in analytics, but also in business decisions, academic research, and media stories. Think for a moment: have you ever read an article or seen a news story where the conclusion did not sit right? There was probably an assumption that one thing caused another simply because the two were related.
Consider this example. In a particular city, research showed a strong, positive relationship between ice cream sales and crime. When sales went up, so did crime, and vice versa. Does this mean that buying ice cream causes crime? Or is there some other factor involved, such as weather or time? People purchase more ice cream in warmer months. People are also outside more in warmer months, leaving their homes unattended for longer periods of time. If you incorrectly assume that ice cream causes crime, would you stop the sale of ice cream in order to combat crime? Because other variables are involved and the context is incomplete, it is difficult to justify a causal relationship between ice cream and crime.
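To make the ice cream example concrete, here is a minimal simulation sketch. Every number in it is invented for illustration: temperature is generated at random, and both ice cream sales and crime reports are built from temperature plus noise, so ice cream never influences crime. The helper names (pearson_r, partial_r) are hypothetical, and the correlation coefficient they compute is simply a number between -1 and 1 that summarizes how closely two series move together.

```python
import math
import random

def pearson_r(xs, ys):
    """Correlation coefficient between two equal-length lists (ranges from -1 to 1)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(42)  # fixed seed so the simulation is repeatable

# Simulated days (assumed values): temperature drives BOTH series,
# and ice cream sales never feed into crime reports.
temperature = [rng.uniform(0, 35) for _ in range(1000)]
ice_cream_sales = [20 + 3.0 * t + rng.gauss(0, 10) for t in temperature]
crime_reports = [5 + 0.4 * t + rng.gauss(0, 3) for t in temperature]

raw = pearson_r(ice_cream_sales, crime_reports)
controlled = partial_r(
    raw,
    pearson_r(ice_cream_sales, temperature),
    pearson_r(crime_reports, temperature),
)
print(f"ice cream vs. crime: {raw:.2f}")
print(f"same, holding temperature fixed: {controlled:.2f}")
```

In this sketch the raw correlation should come out strongly positive, while the correlation that accounts for temperature should sit near zero. That is the pattern you would expect when a third factor drives both variables.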
What Are Correlation and Causation?
To understand the difference between correlation and causation, let’s start with the definitions. Correlation is a statistical measure of the relationship between two variables. The most common form of correlation is Pearson’s product-moment correlation, which produces a coefficient “r” between -1 and 1 that describes the strength and direction of a linear relationship between two variables. For example, r = .93 between email sends and site visits would indicate a strong, positive relationship.
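As a rough illustration of how the coefficient is calculated, the sketch below computes Pearson’s r from scratch using only the Python standard library. The function name pearson_r and the weekly email and visit figures are hypothetical, made up for this example rather than taken from real data.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to each variance).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical weekly figures, invented for illustration only.
email_sends = [1000, 1500, 2000, 2500, 3000, 3500]
site_visits = [220, 310, 390, 520, 580, 700]

r = pearson_r(email_sends, site_visits)
print(f"r = {r:.2f}")
```

A value near 1 means the two series rise together almost in a straight line, a value near -1 means one falls as the other rises, and a value near 0 means there is little linear relationship. None of these values says anything about which variable, if either, causes the other.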
A cause is an act that occurs in such a way that something happens as a result. For example, my act of opening an email and clicking on a link caused me to go to a website. An assumption may be made that because I received the email, I went to the website, but how do you really know that the email caused me to go to the site without asking me? Is it not possible for me to go to the website without clicking on an email link? Sure it is.
Acting on Correlation
Correlation may not indicate causation, but it can lead to action. Consider the marketing analyst who finds a strong correlation between site searches and revenue. To increase revenue, do we drive visitors to the search page on our site? Maybe, but what if visitors go to the site search page because they are frustrated and can’t find the product they want to purchase? Do we want to increase frustration?
So what good is correlation if you are telling me I can’t assume causation? Correlation gives you a strong starting point for further testing and optimization. Since we know there is a relationship between site search and revenue, we can now look at why. Through additional analysis, A/B tests, and customer surveys, we can better understand the relationship, context, and impact that our decisions will have on other factors.
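One way such an A/B test might be evaluated is with a two-proportion z-test, sketched below. This is an assumed approach, not a prescribed one: the function name, the visitor counts, and the conversion counts are all hypothetical, and the test relies on a normal approximation that is only reasonable for fairly large samples.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of groups A and B; returns (z, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates cannot be compared")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: A keeps the current page, B promotes site search.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Because visitors are assigned to each version at random, a difference that holds up in a test like this is much stronger evidence of cause and effect than a correlation observed in historical data.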
Before diving into analysis, ask yourself “Why?” What is the business question and what value can be derived from my analysis? An important follow-up task is to consider the ability to take action and realize the value. As we get closer to understanding our target audience and what causes desired outcomes, remember the work is not finished until we act.