You can’t go five minutes in the current business world without hearing the terms big data, predictive, or statistical tool thrown about. If you believed all of the hype, you would have no problem making perfect decisions and acting quickly, and everyone would be improving their performance by millions of dollars every hour. Of course, everyone in the field also acknowledges just how far everyone else is from that reality, but they fail to mention the same errors in logic in their own promises and their own analysis. All data is leveraged using mathematical tools, many of which are not understood at the level necessary to maximize their value. Data can be a powerful and important aid to improving a business, and a real deciding factor between success and failure. It can also be a crutch used to make poor decisions or to validate one opinion over another. The fundamental truth is that nothing about “big data” is really all that new, and that in almost all cases the promises people are making have no basis in reality. It is vital that people understand the core principles of statistics that will enable them to tell which of those two roles data is playing, and to help maximize the value that data can bring to their organization.

So how do you arm yourself to maximize outcomes and to combat poor data discipline? The key is understanding a few core concepts of statistics, so that you can spot when and how promises are being made that cannot possibly be true. You do not need to understand the equations, or even have masterly depth on most of these topics, but it is vital that you understand the truth behind certain types of statistical claims. I want to break down the top few that you will hear, how they are misused to make promises, and how you can really achieve that level of success.

Correlation does not Equal Causation –

Problem – I don’t think anyone can get through college without having heard this phrase, and most can quote it immediately, but very few really focus on what it means. The key thing to take from it is that no matter how great your correlative analysis is, it cannot tell you the cause of an outcome or the value of items without direct, active interaction with the data. No matter how well you can prove a linear correlation, or even find a micro-conversion that you believe signals success, by itself it can never answer even the most basic of real-world business questions. Correlations can be guiding lights towards a new revelation, but they can also be empty noise leading you away from vital information. It is impossible to tell which if you leave the analysis at basic correlation, yet in almost all cases this is where people are more than happy to leave it. The key is to make sure that you do not jump to conclusions, and that you incorporate other pieces of information instead of blindly following the data.

Say I can prove a perfect correlation between email sign-ups and conversion rate, so that they both go up together. I can never know from correlation alone whether getting more people to sign up for emails CAUSED more conversions, or whether the people we got to convert are simply more interested in signing up for email. In a test this is vital, because not only is it easy to see those two points, but you are also limited to a single data point, which makes even correlation impossible to diagnose. It is incredibly common for people to claim they know the direction and that they need to generate more email sign-ups in order to produce more revenue. It is impossible to reach that conclusion from purely correlative information, and it can be massively damaging to a business to point resources in a direction that can just as easily produce negative results as positive ones.

The fundamental key is to make sure that you are incorporating consistent ACTIVE interaction with data, where you induce change across a wide variety of items and measure the causal value of each. Combined with, or leading, your correlative information, this lets you discover amazing new lessons that you would never have learned otherwise. Without it, the data that many claim is leading them to conclusions is often incomplete or fundamentally wrong, and can in no way produce the insights that people are claiming. The core goal is always to minimize the cost of this active interaction with data while maximizing the number and range of alternatives that you are comparing. Failure to do this will inevitably lead to lost revenue, and often to false directions for entire product road maps, as people leverage data to confirm their opinions rather than using it rationally to produce amazing results.
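To make the email sign-up example concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: the hidden `interest` trait, the probabilities, and above all the premise that signing up has no causal effect on converting at all. It compares what passive observation shows with what a randomized change shows.

```python
import random

def conversion_gap(rows):
    """Conversion rate of sign-ups minus conversion rate of non-sign-ups."""
    signed = [converted for signed_up, converted in rows if signed_up]
    unsigned = [converted for signed_up, converted in rows if not signed_up]
    return sum(signed) / len(signed) - sum(unsigned) / len(unsigned)

def simulate(n=50_000, randomized=False, seed=1):
    """Hypothetical visitors; by construction sign-up never causes conversion."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        interest = rng.random()  # hidden trait we never observe directly
        if randomized:
            # Active interaction: a coin flip decides who gets signed up.
            signed_up = rng.random() < 0.5
        else:
            # Passive observation: interested people sign up more often.
            signed_up = rng.random() < 0.1 + 0.6 * interest
        # Conversion depends on interest only, never on signed_up.
        converted = rng.random() < 0.02 + 0.2 * interest
        rows.append((signed_up, converted))
    return rows

observed_gap = conversion_gap(simulate(randomized=False))
experiment_gap = conversion_gap(simulate(randomized=True))
print(f"observational gap: {observed_gap:+.3f}")
print(f"randomized gap:    {experiment_gap:+.3f}")
```

In the observational data the sign-up group converts noticeably better, even though sign-ups cause nothing here. Once assignment is randomized the gap shrinks to roughly zero, which is the only one of the two numbers that says anything about what would happen if you pushed more people to sign up.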

Examples – Multiple success metrics, Attribution, Tracking Clicks, Personas, Clustering

Solution – Causal changes can arm you with the added information needed to answer these questions more directly, but in reality that is not always going to be an option. If nothing else, always remember that for any data to tell you what led to something else, you have to prove three things:

1) That what you saw was not just a random outcome

2) That the two items are correlated with each other, and not just with some other change

3) That the causality runs in the direction you claim, since without direction you cannot prove any conclusion
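The first of those three checks is the easiest to sketch in code. Below is a small permutation test, one common way (my choice, not the only one) to ask whether a gap between two groups could plausibly be a random outcome. The visitor and conversion counts are invented for illustration, and the sketch says nothing about checks 2 and 3.

```python
import random

def conversion_rate(outcomes):
    return sum(outcomes) / len(outcomes)

def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """Share of random relabelings with a gap at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(conversion_rate(group_a) - conversion_rate(group_b))
    pooled = list(group_a) + list(group_b)
    split = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the group labels carry no information
        gap = abs(conversion_rate(pooled[:split]) - conversion_rate(pooled[split:]))
        if gap >= observed:
            at_least_as_extreme += 1
    # The +1 terms keep the estimate away from an impossible p-value of zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)

def outcomes(visitors, conversions):
    """Build a 0/1 outcome list from hypothetical aggregate counts."""
    return [1] * conversions + [0] * (visitors - conversions)

control = outcomes(500, 50)       # hypothetical: 10.0% conversion
small_change = outcomes(500, 52)  # hypothetical: 10.4% conversion
large_change = outcomes(500, 85)  # hypothetical: 17.0% conversion

p_small = permutation_p_value(control, small_change)
p_large = permutation_p_value(control, large_change)
print(f"10.0% vs 10.4%: p = {p_small:.3f}")
print(f"10.0% vs 17.0%: p = {p_large:.3f}")
```

A large p-value means shuffled labels reproduce the gap easily, so the gap on its own proves nothing. A small one only clears the first hurdle; it still does not tell you what the groups are really correlated with or which way the cause runs.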

Just the very act of stopping people from racing ahead or abusing this data to prove their own agenda will dramatically improve the efficiency of your data usage, as well as the value derived from your entire data organization.

Rate vs. Value –

Problem – There is nothing more common than finding patterns and anomalies in your analytics. This is probably the single core skill of all analysis, yet it can also be the most misused or abused action taken with data. The pattern can be segments that have different purchase behavior, channels that behave differently, or even “problems” with certain pages or processes. Finding a pattern or anomaly is at best the halfway point of actionable insight, not the final stop to be followed blindly. Rate is the pattern of behavior, usually expressed as a ratio of actions. Finding rates of action is the single most common and core activity in the world of analytics, but the issue usually comes when we confuse the pattern we observe with the action needed to “correct” it. As with Correlation vs. Causation above, a pattern by itself is just noise. It takes active interaction, and comparison with other, less easily identifiable options, in order to validate the value of those types of analysis.

Google users spending 4.34 minutes per visit, or email users averaging a visit depth of 3.4 pages, are examples of rates of action. What they are not is a measure of the value of those actions. Value is the change in outcome created by a certain action, not the rate at which people happened to do things in the past. Most people understand “past performance does not ensure future outcomes”, but they fail to apply the same logic when it comes to looking for patterns in their own data. Value is expressed as a lift or difference: adding a button increased conversion by 14%, or removing our hero image generated 18% more revenue per visitor.

The main issues come from confusing the ability to measure different actions with knowing how to change someone’s behavior. The simplest example of this is the simple null hypothesis: what would happen if that item wasn’t there? Say 34% of people click on your hero image, which is by far the highest share on your homepage. What would happen if that image wasn’t there? You wouldn’t just lose 34% of people; they would instead interact with other parts of the page. Would you make more or less revenue? Would it be better or worse?
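A small worked sketch of the difference between the two numbers, using the hero image case. The visitor, click, and revenue figures are invented for illustration (chosen so the arithmetic is easy to follow), and `Arm` and `relative_lift` are hypothetical names, not part of any analytics tool.

```python
from dataclasses import dataclass

@dataclass
class Arm:
    """Aggregate results for one version of the page (hypothetical data)."""
    visitors: int
    hero_clicks: int
    revenue: float

    @property
    def click_rate(self) -> float:
        # A rate: how often people happened to do something.
        return self.hero_clicks / self.visitors

    @property
    def revenue_per_visitor(self) -> float:
        return self.revenue / self.visitors

def relative_lift(control: float, variant: float) -> float:
    """Value: the change in outcome relative to the control."""
    return (variant - control) / control

with_hero = Arm(visitors=10_000, hero_clicks=3_400, revenue=25_000.0)
without_hero = Arm(visitors=10_000, hero_clicks=0, revenue=29_500.0)

rate = with_hero.click_rate
lift = relative_lift(with_hero.revenue_per_visitor,
                     without_hero.revenue_per_visitor)
print(f"hero click rate: {rate:.0%}")
print(f"revenue per visitor lift from removing the hero: {lift:+.0%}")
```

The click rate is available from passive reporting alone. The lift only exists because the page without the image was actually served to a comparable group, and nothing about the click rate would have let you predict its sign.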

It also comes down to two different business questions. At face value, the only question you could possibly answer with just pattern analysis is “What is an action we can take?” In the ideal business case you would instead answer “Based on my current finite resources, what is the action I can take to generate the most X?”, where X is your single success metric. Rates have no measure of the ability to change behavior or of the cost to do so, and as such they cannot answer many of the business questions that they are erroneously applied to.

Examples – Personalization, Funnel Analysis, Attribution, Page Analysis, Pathing, Channel Analysis

Solution – The real key is to make sure that, built into any plans for optimization, you are incorporating active data acquisition, and that you are always measuring null assumptions and the value of items. This information, combined with knowledge of influence and cost to change, can be vital, but without it your analysis is likely empty noise. There are entire fields of study in math dedicated to this, with the most common being bandit-based problem solving. Once you have actively acquired knowledge, you will start to build information that can inform and reduce the cost of data acquisition, but never replace it.
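For readers who have not met bandit problems, here is a minimal sketch of one of the simplest strategies, epsilon-greedy. It is my pick for illustration, not a claim about which bandit method to use, and the three "true" conversion rates are invented: in practice you never know them, which is the whole point.

```python
import random

def epsilon_greedy(true_rates, rounds=20_000, epsilon=0.1, seed=0):
    """Spend most traffic on the best-looking option, but keep exploring."""
    rng = random.Random(seed)
    k = len(true_rates)
    pulls = [0] * k  # how many visitors each option has received
    wins = [0] * k   # how many of those visitors converted
    for _ in range(rounds):
        if rng.random() < epsilon or 0 in pulls:
            # Explore: pay for new data by showing a random option.
            arm = rng.randrange(k)
        else:
            # Exploit: show the option with the best observed rate so far.
            arm = max(range(k), key=lambda i: wins[i] / pulls[i])
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
    return pulls, wins

# Hypothetical conversion rates for three page variants.
true_rates = [0.03, 0.05, 0.10]
pulls, wins = epsilon_greedy(true_rates)
for i, (p, w) in enumerate(zip(pulls, wins)):
    print(f"variant {i}: shown {p} times, observed rate {w / p:.3f}")
```

The `epsilon` share of traffic is the ongoing cost of data acquisition. What you have already learned decides where the rest of the traffic goes, so the cost of learning drops, but the exploration never stops entirely.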

These are but two of the many areas where people consistently make mistakes when leveraging data and concepts from statistics, and reach false conclusions as a result. Data should be your greatest asset, not your greatest liability, but until you help your organization make data-driven decisions and not data-validated decisions, there are always going to be massive opportunities for improvement. Make it a focus to improve your organization’s understanding of and interaction with each of these concepts, and you will start using far fewer resources and producing far better outcomes. Failure to do so ensures the opposite outcomes over time.

Understanding data and data discipline has to become your biggest area of focus, and educating others your primary directive, if you truly want to see your organization take the next step. Don’t let just reporting data or making claims of analysis be enough for you, and you will quickly find that it is not enough for others either.