It is relatively easy to identify relationships between data points. However, the presence of a relationship does not imply that one data point or event caused the other. The idea that correlation implies causation is a common fallacy that occurs not only in analytics but also in business decisions, academic research, and media stories. Think for a moment: have you ever read an article or seen a news story whose conclusion did not sit right? There was probably an assumption that one fact caused another simply because the two were related.

Consider this example. In a particular city, research showed a strong, positive relationship between ice cream sales and crime: when sales went up, so did crime, and vice versa. Does this mean that buying ice cream causes crime? Or is there some other factor involved, such as weather or time of year? People purchase more ice cream in warmer months. People are also outside more in warmer months, leaving their homes unattended for longer periods of time. If you incorrectly assumed that ice cream causes crime, would you stop the sale of ice cream in order to combat crime? Because of these other variables and the incomplete context, it is difficult to justify a causal relationship between ice cream and crime.

What Are Correlation and Causation?

To understand the difference between correlation and causation, let’s start with the definitions. Correlation indicates a relationship between two variables and is usually measured with statistics. The most common form of correlation is Pearson’s product-moment correlation, which produces a coefficient, r, that describes the strength and direction of a linear relationship between two variables. The coefficient ranges from -1 to +1: values near +1 indicate a strong positive relationship, values near -1 a strong negative relationship, and values near 0 little or no linear relationship. For example, a coefficient of r = 0.93 between email sends and site visits would indicate a strong, positive relationship.
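Pearson’s r is the covariance of the two variables divided by the product of their standard deviations. The minimal Python sketch below computes it directly from that definition; the `pearson_r` name and the weekly email and visit figures are hypothetical, invented only to illustrate the calculation.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient for two
    equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products and squares of deviations; the 1/n factors of
    # covariance and variance cancel in the ratio, so they are omitted.
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly figures, made up for illustration only.
email_sends = [1000, 1500, 1200, 2000, 1800, 2500]
site_visits = [310, 420, 360, 540, 470, 650]
print(f"r = {pearson_r(email_sends, site_visits):.2f}")
```

In practice, Python’s built-in `statistics.correlation` (3.10+) returns the same coefficient. Either way, a high r only says the two series move together; it says nothing about which, if either, drives the other.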

A cause is an act or event that makes something else happen as a result. For example, my act of opening an email and clicking on its link caused me to go to a website. An assumption may be made that I went to the website because I received the email, but how do you really know that the email caused me to go to the site without asking me? Isn’t it possible for me to go to the website without clicking on an email link? Sure it is.

Acting on Correlation

Correlation may not indicate causation, but it can lead to action. Consider the marketing analyst who finds a strong correlation between site searches and revenue. To increase revenue, should we drive visitors to the search page on our site? Maybe, but what if visitors go to the site search page because they are frustrated and can’t find the product they want to purchase? Do we want to increase frustration?

So what good is correlation if you are telling me I can’t assume causation? Correlation gives you a strong starting point for further testing and optimization. Since we know there is a relationship between site search and revenue, we can now investigate why. Through additional analysis, A/B tests, and customer surveys, we can better understand the relationship, its context, and the impact our decisions will have on other factors.
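An A/B test gets closer to causation than correlation alone because visitors are randomly assigned to each version, so factors such as frustration should be spread evenly across the groups. One common way to compare conversion rates between two groups is a two-proportion z-test; the minimal Python sketch below assumes that approach, and the function name, the "prominent search box" variant, and the visitor and conversion counts are all hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between a
    control group (A) and a variant (B). Returns (z, p_value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate: the best estimate of the shared rate if there is no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical test: visitors randomly shown the current page (A)
# or a page with a more prominent search box (B).
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in conversion is unlikely to be chance alone; it does not by itself tell you whether the lift is large enough to be worth acting on.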

So What?

Before diving into analysis, ask yourself “Why?” What is the business question, and what value can be derived from the analysis? An important follow-up task is to consider whether you can act on the results and realize that value. As we get closer to understanding our target audience and what causes desired outcomes, remember that the work is not finished until we act.