My first trip through the common heuristics of conversion rate optimization looked at two of the more common testing ideas and how they usually lead to false or limiting conclusions. In this second part, I want to look at general testing-theory best practices and how they can be major limiting factors in the success of your program.

It is important to remember that you are always going to get an outcome, so this is not about whether you can make money. How you and the people in your organization think about testing is the largest factor in how much value optimization produces. This is an evaluation of the efficiency of the method: how much it produces for the same or fewer resources. In concept you can spend an infinite amount of resources to achieve any end goal, but the reality is that we always face a finite amount of time and population, which means we must always be looking for ways to improve inefficient systems. If we continue to be limited by these common heuristics, then the industry as a whole will continue to produce minimal results compared to what it can and should be producing.

Always have a Hypothesis –

There is no term more misunderstood than hypothesis. In all likelihood that is because most people are familiar only with their 6th grade (at least in my school) science instruction or with formal classroom science in college. In those fields we operate as if we have unlimited time and resources, and we are trying to validate whether a drug will cause cancer, not whether a banner will get more clicks if it is blue or red. In controlled classroom studies of cancer, the stakes are higher and the models are much simpler. There is a lot more to the scientific method than just having a hypothesis, especially when it is approached from a resource-efficiency perspective, which such a simplistic view of idea validation does not consider.

We must apply scientific rigor, but we must also make sure that all actions make sense in real-world situations, which means that efficiency and minimizing regret are more important than validating an individual’s opinion. The problem is not that the scientific method relies on a hypothesis; it is that we mistake a hypothesis for a correct hypothesis. We seek validation for our opinions, not discovery of the best way to proceed. Science is also about proving one idea against all alternative hypotheses, yet we ignore that part of the discipline because it is not the part that lets someone see whether they were right. In the grand scheme of things we are drastically overvaluing test ideas, and that distracts from the parts of the process that provide value.

Let’s start with the basics. You should never, and I mean never, run a test if you do not have a single success metric for your entire site. In most cases this is to make more money, but whatever it is, this goal exists outside of the concept of the test. You must also have rigid measurement and action rules that are reproducible, which means that you must understand real-world constraints like the limitations of confidence and variance.
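As a minimal sketch of what a reproducible action rule could look like, the Python below compares revenue per visitor between a control and a variant using a normal-approximation confidence interval for the difference in means. The function names, the 1,000-visitor minimum, and the 1.96 z-value (roughly 95% confidence) are hypothetical choices for illustration, not a prescribed standard; the point is only that the rule is written down before the test and applied the same way every time.

```python
import math
from statistics import mean, stdev

# Hypothetical thresholds: agree on these before the test starts.
MIN_VISITORS = 1000
Z_95 = 1.96  # normal-approximation z-value for roughly 95% confidence


def rpv_difference_interval(control, variant, z=Z_95):
    """Confidence interval for mean(variant) - mean(control).

    Each list holds revenue per visitor (0.0 for visitors who did not buy).
    Uses a normal approximation with unpooled (per-arm) variances.
    """
    diff = mean(variant) - mean(control)
    se = math.sqrt(stdev(control) ** 2 / len(control)
                   + stdev(variant) ** 2 / len(variant))
    return diff - z * se, diff + z * se


def decide(control, variant, min_visitors=MIN_VISITORS, z=Z_95):
    """Apply the same pre-agreed action rule to every test."""
    if min(len(control), len(variant)) < min_visitors:
        return "keep running"
    low, high = rpv_difference_interval(control, variant, z)
    if low > 0:
        return "ship variant"
    if high < 0:
        return "keep control"
    return "no detectable difference"


# Illustrative, made-up data: 1,000 visitors per arm, $50 orders.
control = [0.0] * 900 + [50.0] * 100   # revenue per visitor = 5.0
variant = [0.0] * 800 + [50.0] * 200   # revenue per visitor = 10.0
print(decide(control, variant))        # prints: ship variant
```

The unpooled standard error is used because the variance of revenue per visitor may differ between arms; any equivalent rule that is fixed in advance serves the same purpose.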

You can then have an opinion about what you think will happen when you make a change. The problem is when we confuse that opinion with the measured goals of the test. Even worse, we limit what we compare, resulting in a massively inefficient use of time and effort. You may believe that improving your navigation will get people to spend more time on your site, but that is completely irrelevant to the end goal of making more money. Your belief that more engagement will result in more revenue is not enough to make it so. If you are right AND that also produces more revenue, you will know it from revenue. If you are wrong, you will only know that from revenue. We must construct our actions to produce answers both to our opinion and to what is best for our organization. Hypotheses and ideas are just a very small part of a much more complex and important picture, and overfocusing on them allows people to avoid the responsibility, and miss the benefit, of focusing on all those other parts, which are the ones that really make a difference over time for any and all testing programs.

The worst part of this is that it allows people to fall for congruence bias and to fail to ask the right questions. We become so used to the conversation around a single idea that discovery and challenging assumptions become more word than action. Questions can be incredibly important to the success of a program, but only if they are tackled in the right order and used to focus attention, not as the final validation of attention already spent. If your hypothesis is that a certain navigation change will result in more engagement, then the correct use of your resources is to test either which of a number of different versions of the navigation produces the most revenue or, if you can, which section of your site produces the most engagement when changed. In both cases you have adapted your “hypothesis” into a more efficient and functional use of your time. The hypothesis exists, but it is not the constraint of the test. If you are right, you will see it. If you are wrong, you will make more money.
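A minimal sketch of that reframing, using made-up numbers for three hypothetical navigation variants: engagement (the hypothesis metric) is still recorded so you can see whether you were right, but the ranking uses only revenue per visitor, the single success metric. The class, field names, and figures are illustrative assumptions, not real results.

```python
from dataclasses import dataclass


@dataclass
class VariantResult:
    name: str
    visitors: int
    revenue: float
    engaged_visitors: int  # hypothesis metric: recorded, never used to decide

    @property
    def revenue_per_visitor(self) -> float:
        return self.revenue / self.visitors

    @property
    def engagement_rate(self) -> float:
        return self.engaged_visitors / self.visitors


def rank_by_success_metric(results):
    """Order variants by the site-wide success metric only."""
    return sorted(results, key=lambda r: r.revenue_per_visitor, reverse=True)


# Hypothetical results for three navigation variants.
results = [
    VariantResult("current nav", 1000, 5000.0, 300),
    VariantResult("mega menu", 1000, 4800.0, 450),
    VariantResult("simplified nav", 1000, 5600.0, 280),
]

most_engaging = max(results, key=lambda r: r.engagement_rate)
winner = rank_by_success_metric(results)[0]
print(f"Most engaging: {most_engaging.name}")          # mega menu
print(f"Highest revenue per visitor: {winner.name}")   # simplified nav
```

In this made-up case the engagement hypothesis is "right" about the mega menu, yet the test still points to a different variant, which is exactly the situation the paragraph describes.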

This means that having a hypothesis is important, but only if it is not the test charter. Have an idea of what you are trying to accomplish, but making sure that you measure the value of different actions against each other is more important. Sometimes the most effective hypothesis is “I believe that we do not know the value of different sections on our pages.” Don’t confuse your opinion on what will win with a successful test. Challenge assumptions and design efforts to maximize what you can do with what you have, and you will never be without opinions. The best answers always come when you are proven wrong, but if you get too caught up in validating your hypothesis, you will always miss the largest lessons you could be learning.

We need to optimize X because it is losing Y

This is the classic problem of confusing rate and value, or more correctly, correlative and causal inference. We confuse what we want to happen with what is really happening. Just because people were doing X and now they are doing Y, it doesn’t mean that this is directly causing any change, positive or negative, to our end goals. Beyond the three rules of proving causation, the real issue here is that we get tied to our beliefs about a pattern of events even when the data cannot possibly validate that conclusion. Understanding and acting on what you know, as opposed to what you want to happen, is the difference between being data driven and simply being data justified.

Think about it this way: I have 23% of clicks on one section of my page and 0% on another. If I were to improve one of those, which one is going to produce the biggest return? The answer is that you do not know. A rate of interaction cannot possibly tell you the value of changing that item. Some of the most important parts of any user experience are things that can’t even be clicked.

This plays out beyond clicks too. We have a product funnel, we see more people leaving on page 3, and therefore we decide we need to test on page 3. The reality is that more or fewer people may or may not be tied to more or less revenue. Even if it is tied, it may be a qualification issue higher up, a user interaction issue, or simply too many people in a prior step. This is called a linear assumption fallacy, where we assume that if 2 of 5 people convert, then 4 of 10 people will convert. Linear models are rare in nature but are easy to understand, so we fall back on comfort over realistic understanding.
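To make the linear assumption concrete, the Python sketch below contrasts the straight-line projection (2 of 5 converting means 4 of 10 will) with one hypothetical saturating curve that passes through the same observed point. The saturating form and its parameters are assumptions chosen only to show how two models can agree on what was observed and disagree everywhere else; neither is a model of any real funnel.

```python
def linear_projection(visitors, base_visitors=5, base_conversions=2):
    """Linear assumption: conversions scale in direct proportion to visitors."""
    return visitors * base_conversions / base_visitors


def saturating_projection(visitors, max_conversions=4.0, half_point=5.0):
    """Hypothetical diminishing-returns curve (not derived from any real data).

    With these parameters it also passes through 2 conversions at 5 visitors.
    """
    return max_conversions * visitors / (visitors + half_point)


for n in (5, 10, 20):
    print(n, linear_projection(n), round(saturating_projection(n), 2))
```

Both functions agree at 5 visitors; at 10 visitors the linear model predicts 4 conversions while the saturating one predicts about 2.67, and only measurement can tell you which shape, if either, your funnel actually has.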

The act of figuring out what to test can be difficult, but it is never improved by pretending we have validation of our own ideas when we have nothing to justify them. We need to be open to discovering where we should go rather than focusing on some set path. In almost all cases you will find that you are wrong, often dramatically so, about where problems really are and how to fix them. This is why it is so important not to focus solely on more or less correlative actions. We can and should be able to test fast enough, and with few enough resources, that we are never limited to this realm unless we are stuck there mentally.

Like so much else, what you spend your time and effort on is incredibly important. There are a thousand things you can improve, and there are always new ideas. Justifying them falsely, or focusing on them instead of on the discipline of testing, is nothing but a drag on your entire testing program. Test ideation is about 1% of the value derived from a test program, yet it is where people like to spend 90%+ of their time. A 5% gain that took 2 months is worth a lot less than a 10% gain that took 2 weeks. The most important issues we face are not about generating test ideas or validating our beliefs about how to improve our site; they are about discovering where to apply resources so that we are doing the 10% option and not the 5% option. If we focus too much on test ideas and not on the discipline of applying them correctly, we are never going to achieve what should be achieved. If you get lost focusing only on where you want to go, you will always be limited in the possible outcomes you can generate.
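A rough way to see why the 2-week, 10% option wins: the sketch below divides each lift by the weeks it took, then compounds a repeated cadence over a fixed horizon. Treating 2 months as 8 weeks, and assuming every cycle delivers the same lift, are simplifications for illustration only; real test outcomes vary from cycle to cycle.

```python
def lift_per_week(lift_pct: float, weeks: float) -> float:
    """Lift percentage divided by the weeks it took to get it."""
    return lift_pct / weeks


def compounded_lift(lift_pct: float, weeks_per_test: int, horizon_weeks: int) -> float:
    """Total lift over a horizon if every completed test repeated the same lift."""
    completed_tests = horizon_weeks // weeks_per_test
    return (1 + lift_pct / 100) ** completed_tests - 1


# 5% in roughly 2 months (approximated as 8 weeks) versus 10% in 2 weeks.
print(lift_per_week(5.0, 8))                 # 0.625
print(lift_per_week(10.0, 2))                # 5.0
print(f"{compounded_lift(5.0, 8, 8):.1%}")   # 5.0%
print(f"{compounded_lift(10.0, 2, 8):.1%}")  # 46.4%
```

Even under these idealized assumptions, the gap comes from cadence as much as from the size of any single win.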