It doesn’t take long working in a data field before you come across data being used in ways other than what it was intended for. George Canning once correctly quipped, “I can prove anything by statistics except the truth.” One of the hardest struggles for anyone trying to make sense of the various data sources is understanding the data you are dealing with: what is it really telling you, what is it not telling you, and how should you act? We have all this rich, interesting information, but what is the right tool for the job? What is the right way to think about or leverage that data? One of the ways that testing programs lose value over time is when they stop evaluating their data with a critical eye and a focus on what it is really telling them. They so want to find meaning in things that they convince themselves and others of answers that the data could never provide. Understand your data, understand the amazing power it can provide, and understand the things it cannot tell you.

Every tool has its own use, and we get the most value when we use tools in the correct manner. Just having a tool does not mean it is the right fit for every job. When you come from an analytics background, you naturally look to solve problems with your preferred analytics solutions. When you come from a testing background, you naturally look to testing as the answer to all problems. The same is true for any background: when we are not sure, we are wired to turn back to what we are comfortable with. The reality is that you get more value when you leverage each tool correctly, and the fastest way to do that is to understand what the data from each tool does and does not tell you.

Analytics is the world of correlative patterns, with a single data stream that you can parse and look backwards at. You can find interesting anomalies, compare rates of action, and build models based on large data sets. It is a passive form of data acquisition that allows you to see where you have been. When used correctly, it can tell you what is not working and help you find things that you should explore. What it cannot do is tell you the value of any action directly, nor can it tell you the right way to change things.
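
To make this concrete, here is a minimal sketch of that passive, backward-looking mode of analysis. The event log, page names, and the anomaly threshold are all hypothetical; the point is that you can compare rates of action and flag things to explore, but nothing in the code can say why a rate differs or what changing a page would be worth.

```python
# Passive analytics sketch: parse a single (hypothetical) event stream,
# compare rates of action, and flag anomalies worth exploring.
from collections import defaultdict

# Hypothetical clickstream records: (page, converted)
events = [
    ("home", False), ("home", True), ("home", False), ("home", False),
    ("pricing", True), ("pricing", True), ("pricing", False),
    ("blog", False), ("blog", False), ("blog", False), ("blog", False),
]

visits = defaultdict(int)
conversions = defaultdict(int)
for page, converted in events:
    visits[page] += 1
    conversions[page] += converted

rates = {page: conversions[page] / visits[page] for page in visits}
overall = sum(conversions.values()) / sum(visits.values())

# Pages whose conversion rate diverges sharply from the overall rate.
# These are candidates to explore, not conclusions about cause or value.
anomalies = {p: r for p, r in rates.items() if abs(r - overall) > 0.25}
print(rates, anomalies)
```

The threshold of 0.25 is an arbitrary illustration; in practice you would use a statistically grounded cutoff rather than a fixed constant.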

Testing is the world of comparative analysis, with only a single data point available to identify patterns. It is not just a random tool for throwing one option against another to settle an internal argument, but a valuable resource for the active acquisition of knowledge. You can change part of a user experience and see its impact on an end goal. What you cannot do is answer “why?” with a single data point, nor can you attribute correlated events to your change. You can add discipline and rigor to both tools to gain more insight, but at its core, all testing really tells you is the value of a specific change. Its output is only as good as the quality of the input, just as your optimization program is only as good as the discipline used in designing and prioritizing opportunities.
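
That comparative mode can be sketched with a standard two-proportion z-test on hypothetical A/B results. The traffic and conversion numbers below are invented for illustration; the test can tell you whether the change moved the end goal, but it says nothing about why.

```python
# Active testing sketch: compare a control against a variant with a
# two-proportion z-test. The result is the value of one specific change.
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return the z score and two-sided p-value for a difference in rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical test: control converts 200/5000, variant 260/5000.
z, p = two_proportion_z(200, 5000, 260, 5000)
print(round(z, 2), round(p, 4))
```

Even a significant result here only quantifies this one change against this one goal; attributing any other movement in your metrics to it is exactly the mistake the paragraph above warns against.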

Yet without fail, people look at one tool and claim it can do the other, or that the data tells them more than it really does. Whether it is confusing rate with value, or believing that a single data point can tell you the relationship between two separate metrics, the mistake is in thinking that the information itself tells you the direction of a relationship, or the cost of interacting with it. This is vital information for optimization, yet so often groups pretend they have it and make suboptimal decisions.

We also fail to keep perspective on what the data actually represents. We get such tunnel vision on the impact to a specific segment or group that we lose view of the impact to the whole. To make this even worse, you will find groups targeting or isolating traffic in their tests, such as only new users, and then extrapolating the impact to the site as a whole. It does not matter how well we can target a specific group unless that change creates a positive outcome for the site. The first rule of statistics is that your data must be representative. Another of my favorite quotes is, “Before looking at what the statistics are telling you, you must first look at what they are not telling you.”
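
A quick sketch with hypothetical numbers shows why a lift measured on one targeted segment cannot be read as a site-wide result: the site-wide impact is the segment lift diluted by all the traffic the test never touched.

```python
# Representativeness sketch: a +10% lift among new users does NOT mean
# a +10% lift for the site. All numbers here are hypothetical.
new_users = 2000        # visitors the targeted test actually reached
returning_users = 8000  # visitors never exposed to the change

segment_lift = 0.10     # +10% conversions among new users
baseline_rate = 0.05    # site-wide conversion rate

# Conversions gained come only from the targeted segment.
extra_conversions = new_users * baseline_rate * segment_lift
total_baseline = (new_users + returning_users) * baseline_rate

site_lift = extra_conversions / total_baseline
print(f"segment lift: {segment_lift:.0%}, site-wide lift: {site_lift:.0%}")
```

With these numbers the headline “+10%” shrinks to a 2% site-wide lift, and even that assumes the untouched population would have responded the same way, which the targeted test cannot demonstrate.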

Tools do not understand the quality of their inputs; it is up to the user to know whether the results are biased. Always remember the truth about any piece of information: “Data does not speak for itself – it needs context, and it needs skeptical evaluation.” Failing to provide that context invalidates the data’s ability to support the best decision. Data in the online world has specific challenges that simply sampling random people in the physical world does not have to account for. Our industry is littered with reports of results and best practices that ignore these fundamental truths about tools. It is so much easier to think you have a result and manipulate data to meet your expectations than it is to have discipline and act in as unbiased a way as possible. When you get this tunnel vision, whether in what you analyze or in the population you leverage, you are violating these rules and leaving the results highly questionable. Not understanding the context of your data is just as bad as, or worse than, not understanding the nature of your data.

The best way to think about analytics is the way a doctor uses data. You come in for a visit, the doctor talks to you, and you give him a pattern of events (my shoulder hurts, I feel sick, etc.). He then uses that information to reduce what he won’t do (if your shoulder hurts, he is not going to x-ray your knee or give you cough medicine). He then starts looking for ways to test that pattern. Really good doctors use those same tests to leave open the possibility that something else is the root cause (maybe a shoulder exam shows that you have back problems). Poor doctors just give you a pain pill and never look deeper into the issue. Knowing what data cannot tell you greatly increases the efficiency of the actions you can take, just as knowing how to actively acquire the information needed for the right answers, and how to act on that data, improves your ability to find the root cause of your problems.

A deep understanding of your data gives you the ability to act. You may not always know why something happens, but you can act decisively if you have clear rules of action and an understanding of how data interacts with the larger world. It is easy to want more data, or to want to create a story that makes something easier for others to understand. It is not that these desires are wrong, only that the data presented in no way validates that story, nor could it provide the answers you are telling others it does. At its worst, this distracts from the real issue; at its best, it is just additional cost and overhead to action.

The education of others and of yourself on the value and uses of data is vital for the long-term growth of any program. It is far too easy to get caught up in the various daily pressures and never really take time to evaluate data, so it is vital that, as the owner of that data, you make it a top priority. If you do not understand the real nature of your data, then you are subject to biases that remove its ability to be valuable. There are thousands of misguided uses of data, all of which are easy to miss unless you are more interested in the usage of data than in the presentation and gathering of data. Do not think that just knowing how to implement a tool, or how to read a report, tells you anything about the real information present in it. Take the time to evaluate what the information really represents, and to understand the pros and cons of any manipulation you do with that data. Just reading a blog or hearing someone speak at a conference does not give you enough information to understand the real nature of the tools at your disposal. Dive deep into the world of data and its disciplines, choose the right tools for the job, and then make sure that others are as comfortable with that information as you are. It can be difficult to get to those levels of conversation or to convince others that they might be looking at data incorrectly, but those moments when you succeed can be the greatest moments for your program.