One of the greatest mistakes people make is placing complete faith in numbers, or in their own ability to use them to get a desired result. While human cognition is already riddled with biases and logical fallacies, sometimes factors in the real world conspire to make it even more difficult to act in a meaningful and positive way. One of the more interesting phenomena in the world of data is the statistical bias known as "Simpson's Paradox". Simpson's Paradox is a great reminder that a single look at data can lead to a very wrong conclusion. Even worse, it can allow claims of success for actions that are actually negative in the context of the real world.

Simpson's paradox is a fairly straightforward bias: a correlation is present in two different groups individually, but when the groups are combined, the exact opposite effect appears.

Here is a real-world example:

We ran an analysis showing that a variation on the site produces a distinct winner for both our organic and our paid traffic:

But when we combine the two, the exact inverse pattern plays out. Version A won by a large margin for both organic and paid traffic, but combined it dramatically underperforms B:
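A minimal sketch in Python shows how this reversal can happen. The conversion counts below are made up for illustration (they are not the actual numbers from the analysis above), but they reproduce the pattern: A wins each segment on rate, while B wins the combined total because the two versions see very different mixes of traffic.

```python
# Illustrative (made-up) conversion counts demonstrating Simpson's paradox:
# Version A wins in each traffic segment but loses once segments are combined.
data = {
    "organic": {"A": (81, 87),   "B": (234, 270)},  # (conversions, visitors)
    "paid":    {"A": (192, 263), "B": (55, 80)},
}

def rate(conversions, visitors):
    return conversions / visitors

# Per-segment comparison: A wins both.
for segment, versions in data.items():
    a = rate(*versions["A"])
    b = rate(*versions["B"])
    print(f"{segment}: A={a:.1%}  B={b:.1%}  winner={'A' if a > b else 'B'}")

# Combined totals: the result reverses because A's volume is
# concentrated in the lower-converting paid segment.
totals = {
    v: [sum(x) for x in zip(*(data[s][v] for s in data))]
    for v in ("A", "B")
}
a_all = rate(*totals["A"])
b_all = rate(*totals["B"])
print(f"combined: A={a_all:.1%}  B={b_all:.1%}  winner={'A' if a_all > b_all else 'B'}")
```

The mechanism is the traffic mix: each version's overall rate is a weighted average of its segment rates, so a version that loses every segment can still win overall if more of its traffic lands in the higher-converting segment.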

This seems so counterintuitive, but it plays out in many real-world situations. You may also find the inverse pattern: one where you see no difference in distinct groups, but a meaningful difference when they are combined.

In both cases, logic would lead us to presume that A was better than B, but it is not until we add the larger context that we understand the true value.

While this is a trick of numbers, it presents itself far more often than you might expect, especially as groups dive into segmentation and personalization. The more people leap directly into personalization with vigor, the more they leave themselves open to biases like Simpson's Paradox. They get so excited when they are able to create a targeted message, and so desperate to show its value and prove their "mettle", that they don't take the time to evaluate things on a holistic scale. Even worse, they don't compare it with other segments or account for the cost of maintaining the system. They are so excited by their ability to present "relevant" content to a group they think needs it that they fail to measure whether it adds value or is the best option. Worse still, they then go around telling the world about their great finding, all while causing massive harm to the site as a whole.

One of the key rules to understand is that the deeper you dive to find something "useful", either from analytics or from causal feedback after the fact, the more likely this plays out. With creative enough "discovery", you can use numbers to reach any conclusion. If you keep diving and keep parsing, you exponentially increase the chances of arriving at a false or misleading conclusion. Deciding how you are going to use data after the fact will always lead to biased results. It is easy to prove a point when you forget the context of the information or lose the discipline of using it to find the best answer.

So how do you combat this? The fundamental way is to make sure that you are taking everything to the highest common denominator. Here is a really easy process if you are not sure how to proceed:

1) Decide what and how you are going to use your data BEFORE you act.

2) Test out the content – serve it randomly to all groups. Even if you designed the content specifically for one group, test it on everyone. If you are right, the data will tell you.

3) Measure the impact of every type of interaction against the same common denominator. Convert everything to the same fiscal scale, and use that to evaluate alternatives against each other. Converting to the same scale ensures that you know the actual value of the change, not just the impact on specific segments.

4) Further modify your scale to account for the maintenance cost of serving to that group. If it takes a whole new data system, two APIs, cookie interaction, and IT support to target a group, then you need a massively higher return than from a group you can target in a few seconds.
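Steps 3 and 4 can be sketched as a simple net-value comparison. The option names, revenue figures, and maintenance costs below are all hypothetical, invented only to show the shape of the calculation: once everything is on the same fiscal scale, net of upkeep, the flashy targeted option does not automatically win.

```python
# Hypothetical sketch: put every alternative on one fiscal scale,
# net of the cost to build and maintain it. All numbers are made up.
options = [
    # (name, incremental revenue per month, monthly maintenance cost)
    ("personalized content for segment X", 12_000, 9_500),  # new data system, APIs, IT support
    ("simple sitewide change",              7_000,   200),  # shipped in a few seconds
]

def net_value(revenue, cost):
    """Value of an option on a common scale: revenue minus upkeep."""
    return revenue - cost

for name, revenue, cost in options:
    print(f"{name}: net ${net_value(revenue, cost):,}/month")
```

With these invented figures, the targeted experience grosses more but nets far less than the simple change once its maintenance burden is counted, which is exactly the kind of comparison step 4 forces you to make.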

What you will discover as you go down this path is that you are often wrong, sometimes dramatically so, about the value of targeting a preconceived group. You will find not only that many of the groups you think are valuable are not, but also that many groups you would never normally consider turn out to be more valuable (especially in terms of efficiency). If you do this with discipline over time, you will also learn completely new ways to optimize your site: the types of changes, the groups that are actually exploitable, the cost of infrastructure, and the best ways to move forward with real, unbiased data.

As always, it is the moments where you prove yourself wrong that produce dramatic results. Trying to prove yourself right does nothing but give you the right to make yourself look good.

I always differentiate a dynamic user experience from a "targeted experience". In the first case, you are following a process, feeding a system, not dictating the outcome, and then measuring the possible outcomes and choosing the most efficient option. In the second, you are deciding that something is good based on conjecture, biases, and internal politics, serving to that group, and then justifying the action. Simpson's paradox is just one of many ways you can go wrong, so I challenge you to evaluate what you are doing. Is it valuable, or are you just claiming it is? Are you looking at the whole picture, or only the parts that support what you are doing? Are you really improving things, or just talking about how great you are at improving things?