Adobe Digital Marketing Blog

Confidence and Vanity – How Statistical Measures Can Lead You Astray

Personalization · By Andrew Anderson on December 26, 2011

In dealing with the best ways to change a site to maximize ROI, one of the most common refrains I hear is “Is the change statistically confident?” or “What is the confidence interval?”, which often leads to a long discussion about what those measures really mean. One of the funniest things in our industry is the overreliance on statistical measures to prove that someone is “right”. Whether it is a z-score, a t-test, a chi-squared test, or some other measure, people love to throw them out and use them as the end-all, be-all confirmation that they, and they alone, are correct. Relying on any one tool, discipline, or action to “prove” value does nothing to improve performance or to help you make better decisions. These statistical measures can be extremely valuable when used in the right context and without blind reliance on them to answer any and all questions.

Confidence-based calculations, as they are usually applied to real-world situations, are one of the least effective ways to measure the true change and importance of data (or “who is correct”). They work well in a controlled setting with effectively unlimited data, but in the real world they are just one of many imperfect standards for measuring the impact of data and changes. Real-world data, especially over any short period of time, rarely follows a normal distribution. You are also trying to account for distinct groups with differing propensities to act, rather than one larger representative population. It is also important to note that even in the best-case scenario, these measures only work if you have a representative data set, which means that a few hours or even a couple of days of data will never be representative (unless your Tuesday morning visitors are identical to your Saturday afternoon visitors). What you are left with is a choice among many imperfect measures that are useful, but not meaningful enough to be the only tool you use to make decisions.
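
To make the representativeness point concrete, here is a minimal sketch with invented counts for two hypothetical visitor segments (weekday and weekend) that have different baseline propensities to convert, where recipe B helps one segment and hurts the other. The segment names, counts, and the `lift` helper are illustrative assumptions, not real test data.

```python
# Hypothetical (visitors, conversions) per recipe for two visitor segments
# with different baseline propensities to convert. Invented numbers.
weekday = {"A": (10_000, 300), "B": (10_000, 330)}  # B converts better here
weekend = {"A": (8_000, 400), "B": (8_000, 360)}    # ...but worse here


def lift(segments):
    """Relative lift of B over A after pooling the given segments."""
    rates = {}
    for recipe in ("A", "B"):
        visitors = sum(seg[recipe][0] for seg in segments)
        conversions = sum(seg[recipe][1] for seg in segments)
        rates[recipe] = conversions / visitors
    return (rates["B"] - rates["A"]) / rates["A"]


print(f"weekday-only lift: {lift([weekday]):+.1%}")
print(f"full-week lift:    {lift([weekday, weekend]):+.1%}")
```

A test that only sees weekday traffic reports a 10% lift; pooling the full week turns it into a small loss (about -1.4%). Nothing about the weekday sample is computed wrongly; it simply does not represent the weekend visitors who are part of the real population.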

Even worse, people also try to use this value as a predictor of outcome, saying things like “I am 95% confident that I will get a 12% lift.” These measures only describe the likelihood of the pattern of the outcome: you can say “I am 95% confident that B will be better than A,” but they are not measures of the scale of the outcome, only of its pattern.
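
To illustrate the difference between pattern and scale, here is a minimal sketch with hypothetical counts (10,000 visitors per recipe, 500 versus 560 conversions, an observed 12% lift). It uses a two-proportion z-test with an unpooled standard error, one common way a “confidence that B beats A” figure is produced; your testing tool may compute it differently. The `compare` function is a hypothetical helper, and dividing the interval by A’s observed rate is a simplification that ignores the uncertainty in A itself.

```python
from statistics import NormalDist


def compare(vis_a, conv_a, vis_b, conv_b, level=0.95):
    """Return (observed lift, one-sided confidence that B beats A,
    approximate interval for the lift).

    The interval is a Wald interval on the rate difference divided by A's
    observed rate; this ignores the uncertainty in A's rate (a simplification).
    """
    p_a, p_b = conv_a / vis_a, conv_b / vis_b
    se = (p_a * (1 - p_a) / vis_a + p_b * (1 - p_b) / vis_b) ** 0.5
    diff = p_b - p_a
    confidence = NormalDist().cdf(diff / se)     # 1 - one-sided p-value
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for 95%
    low, high = (diff - z * se) / p_a, (diff + z * se) / p_a
    return diff / p_a, confidence, (low, high)


# Hypothetical test: 500 vs 560 conversions on 10,000 visitors each.
lift, confidence, (low, high) = compare(10_000, 500, 10_000, 560)
print(f"observed lift {lift:.1%}, confidence B > A {confidence:.1%}")
print(f"95% interval for the lift: {low:.1%} to {high:.1%}")
```

With these numbers the calculation reports roughly 97% confidence that B beats A, yet the same data is consistent with a lift anywhere from about zero to about 24%. The confidence figure says B is probably better; it says very little about how much better.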

It is as if someone found a fancy new tool and suddenly has to apply it everywhere: they realize that what they were doing before was wrong, but now this one thing will make them perfect. Like any tool at your disposal, it has a lot of value when used correctly and with the right amount of discipline. If you are not disciplined in how you evaluate data, you will never really understand it or be able to use it to make good decisions.

So if you cannot rely on confidence alone, how do you best determine whether to act on data? Here are three simple steps for measuring the impact of changes when evaluating causal data sets:

1) Look at performance over time – Look at the graph, look for consistency in the data, and look for the absence of inflection points (comparative analysis). Make sure you have at least one week of consistent data (which is not the same as just one week of data). Nothing can replace looking at the data and understanding its patterns and meaning, or the value of simply eyeballing your data to make sure a single day is not producing a spike and that your data is consistent. This human-level check gives you the context that corrects for many of the imperfections that looking only at the end numbers leaves you open to.

2) Make sure you have enough data – The amount needed varies by site: for some sites, 1,000 conversions per recipe is not enough, while for others 100 per recipe is. Understand your site and your data flow. I cannot stress enough that data without context is not valuable. You can get 99% confidence on 3 conversions versus 1, but that does not make the data valuable or actionable.

3) Make sure you have meaningful differentiation – Know what the natural variance is for your site (in a visitor-based metric system, it is pretty regularly around 2% after a week). There are many easy ways to figure out what it is in the context of what you are doing. You can be 99% confident at a 0.5% lift, and I will tell you that you have nothing (neutral). You can have a 3% lift and 80% confidence, and if that lift holds over a consistent week and your natural variance is below 3%, I will tell you that you have a decent win.
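
The three steps above are judgment calls, but they can be roughed out as automated checks to run alongside (not instead of) looking at the graph. This is a minimal sketch assuming daily visitor and conversion counts per recipe are available. The `Day` record, the function names, and every threshold (7 days, no more than half of B’s extra conversions from one day, 500 conversions per recipe, 2% natural variance) are hypothetical placeholders to tune for your own site, not values from any particular testing tool. The natural-variance figure would have to come from your own data, for example from the spread of lifts between two identical experiences. The naive confidence is an unpooled Wald z-test, one common way such figures are computed.

```python
from statistics import NormalDist
from typing import NamedTuple


class Day(NamedTuple):
    """One day of hypothetical results for control (A) and variant (B)."""
    vis_a: int
    conv_a: int
    vis_b: int
    conv_b: int


def pooled(days):
    """Sum each field across days into one overall Day."""
    return Day(*(sum(col) for col in zip(*days)))


def lift(vis_a, conv_a, vis_b, conv_b):
    """Relative lift of B's conversion rate over A's."""
    rate_a = conv_a / vis_a
    return (conv_b / vis_b - rate_a) / rate_a


def naive_confidence(vis_a, conv_a, vis_b, conv_b):
    """One-sided 'confidence that B beats A' from an unpooled Wald z-test.
    Assumes the standard error is non-zero."""
    p_a, p_b = conv_a / vis_a, conv_b / vis_b
    se = (p_a * (1 - p_a) / vis_a + p_b * (1 - p_b) / vis_b) ** 0.5
    return NormalDist().cdf((p_b - p_a) / se)


def consistent_over_time(days, min_days=7, max_day_share=0.5):
    """Step 1: enough days, every daily lift has the overall lift's sign,
    and no single day supplies most of B's extra conversions."""
    if len(days) < min_days:
        return False
    overall_up = lift(*pooled(days)) > 0
    if any((lift(*d) > 0) != overall_up for d in days):
        return False
    # B's extra conversions each day, scaled to A's traffic.
    extra = [d.conv_b * d.vis_a / d.vis_b - d.conv_a for d in days]
    return max(abs(e) for e in extra) <= max_day_share * abs(sum(extra))


def evaluate(days, min_conversions, natural_variance):
    """Apply steps 1-3; both thresholds are site-specific assumptions."""
    totals = pooled(days)
    return {
        "consistent": consistent_over_time(days),
        "enough_data": min(totals.conv_a, totals.conv_b) >= min_conversions,
        "meaningful": abs(lift(*totals)) > natural_variance,  # step 3
    }


week = [  # hypothetical week with a steady ~3% lift
    Day(2000, 100, 2000, 104), Day(2000, 98, 2000, 101),
    Day(2000, 102, 2000, 105), Day(2000, 95, 2000, 97),
    Day(2000, 110, 2000, 114), Day(2000, 120, 2000, 123),
    Day(2000, 105, 2000, 108),
]
tiny = [Day(3, 1, 3, 3)]  # 3 conversions vs 1

for name, days in (("steady week", week), ("tiny sample", tiny)):
    totals = pooled(days)
    print(name, f"lift {lift(*totals):+.1%}",
          f"naive confidence {naive_confidence(*totals):.1%}",
          evaluate(days, min_conversions=500, natural_variance=0.02))
```

The tiny sample reports roughly 99% naive confidence yet fails both the consistency and the sample-size checks, while the steady week passes all three checks even though its naive confidence is well below 95%.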

I have gotten into many debates with statisticians about whether confidence provides any value at all in the context of online testing, and my usual answer is that if you understand what it means, it can be a great barometer and another fail-safe that you are making a sound decision. The failure comes from using it as the only tool in your arsenal. I am not saying that there is not a lot of value in p-value-based calculations, or in most statistical models. I will stress, however, that they are not a panacea, nor are they an excuse for not doing the active work to understand and act on your data. You have to be willing to let the data dictate what is right, and that means you must be willing to understand the disciplines of using the data itself.

© 2012 Adobe Systems Incorporated. All Rights Reserved.