One of the more common refrains I hear when speaking with different organizations, or reading industry blogs, is: how do you deal with a failed test? People speak of this as if it is a common and accepted practice, something you need to help people get past before you can move forward. The irony is that when most groups say this, they are measuring the value of a test by whether they got a "winner," a recipe that beat their control. People almost always come into testing with the wrong idea of what a successful test is. Change what success means, and you will be able to change your entire testing program.

Success and failure of any test are determined before you launch the test, not by the measurement of one recipe versus another. A successful test may have no recipe beat the control, and an unsuccessful test may have a clear single winner. Success is not lift, because lift without context is almost meaningless.

Success is putting an idea through the right system, one that enables you to find the right answers and achieve performance you would not have otherwise. If all you do is test one idea versus another that you were already considering, you are not generating lift; you are only stopping negative actions. Likewise, a recipe that beats control by 5% sounds great, until you add the context that three other recipes, had you tested them, would have produced a 10%, 15%, and 25% change. Do you reward the 5% gain, or the 20% opportunity loss?
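To make that opportunity-cost point concrete, here is a minimal sketch. The lift figures are the hypothetical ones from the example above, not real data:

```python
# Hypothetical lifts: the recipe you shipped gained 5%, while three
# untested alternatives would have gained 10%, 15%, and 25%.
shipped_lift = 0.05
alternative_lifts = [0.10, 0.15, 0.25]

best_possible = max([shipped_lift] + alternative_lifts)
opportunity_loss = best_possible - shipped_lift

print(f"Reported win:     {shipped_lift:.0%}")
print(f"Best alternative: {best_possible:.0%}")
print(f"Opportunity loss: {opportunity_loss:.0%}")  # the 20% left on the table
```

Measured this way, the "winning" 5% recipe is also a 20-point loss against what the full system could have found.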

In the long run, a great idea poorly executed will never beat a mediocre idea executed correctly.

You can measure how successful a test will be by asking some very simple questions before you consider running the test:

1) Are you prepared to act on the test? – Do you know what metric you are using? Do you have the ability to push results live? Is everyone in agreement, before you start, that whatever wins is what you will go with? Do you know the rules of action: when you can call a winner, and when it is too soon? If you answered no to any of these questions, then any test you run will be almost meaningless.
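One way to make "rules of action" concrete is to pre-commit, before launch, to a minimum sample size and a significance threshold, and only call a winner when both are met. A minimal sketch using a two-proportion z-test; the sample-size floor and alpha here are illustrative assumptions, not rules from this article:

```python
import math

def can_call_winner(conv_a, n_a, conv_b, n_b,
                    min_visitors=5000, alpha=0.05):
    """Return True only if both recipes met a pre-agreed sample-size
    floor AND the difference clears a two-sided z-test at `alpha`."""
    if n_a < min_visitors or n_b < min_visitors:
        return False  # too soon: the pre-committed floor is not reached
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False
    z = (p_b - p_a) / se
    # two-sided p-value from the normal approximation
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_value < alpha

print(can_call_winner(400, 8000, 480, 8000))  # floor met, significant
print(can_call_winner(50, 1000, 70, 1000))    # below the floor: too soon
```

The point is not this particular test statistic; it is that the function is written, and agreed to, before anyone sees a result.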

2) Are you challenging an assumption? – This means you need to be able to see not only whether you are correct, but whether you are wrong. It also means you need more than one alternative in a test. Alternatives need to be different from each other and allow for an outcome outside of common opinion to take hold. Consider any test with a single alternative a failure, as there is no way to get a result with context.

3) Are you focusing on should over can? – This is when we get caught up in whether we can run a test, whether we can target a specific group, or in making sure we can track 40 metrics. It is incredibly easy to get lost in the execution of a campaign, but most of the things we think are required aren't, and if we cannot tie an action back to the goal of a test, there is no reason to do it. Weigh these items against your infrastructure and their value. Prioritize campaigns by how efficient they are to run, and never include more than you need to take the actions you need to take. Any conversation focused purely on the mechanics is inefficient, a red herring taking you away from what matters.

So how do you make sure you are getting success from a test? At minimum, you need to build a framework for what defines a successful test, and then make sure every action you take fits that framework. Getting people to agree to these rules can seem extremely difficult at first, but having the conversation outside of a specific test, and making it a requirement that they follow the rules, will help ensure your program is moving down the right path to success.

Here is a really simple sample guideline to make sure all the tests you run will be valuable. Each organization should build its own, but they will most likely be very similar:

  • At least 4 recipes
  • One success metric that is site-wide, the same as in other tests, and directly tied to revenue
  • No more than 4 other metrics, all of which must be site-wide and used in multiple tests
  • Everyone in agreement on how to act on results
  • Everyone prepared to do a follow-up test based on the outcome
  • At least 7 segments and no more than 20, with each segment at least 5–7% of your population, and each with a comparable segment
  • If interested in targeting, the test must be open to the larger population and use segments to either confirm your beliefs or prove yourself wrong (e.g., if I want to target Facebook users, I should serve the same experiences to all users; if I am right, the content I have for Facebook users will be the highest performer for my Facebook segment)
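A guideline like this only works if it is applied mechanically rather than renegotiated per test. One way is a small pre-launch check that every test plan must pass. A sketch encoding a few of the sample rules above (the function shape and inputs are hypothetical, not from any specific tool):

```python
def validate_test_plan(recipes, metrics, segment_shares):
    """Check a test plan against the sample guideline.
    `metrics` lists the single success metric first, then secondaries;
    `segment_shares` maps segment name -> fraction of population.
    Returns a list of violations (an empty list means the plan passes)."""
    problems = []
    if len(recipes) < 4:
        problems.append("need at least 4 recipes")
    if len(metrics) < 1:
        problems.append("need one site-wide success metric")
    if len(metrics) - 1 > 4:
        problems.append("no more than 4 secondary metrics")
    if not 7 <= len(segment_shares) <= 20:
        problems.append("need between 7 and 20 segments")
    small = [s for s, share in segment_shares.items() if share < 0.05]
    if small:
        problems.append(f"segments under 5% of population: {small}")
    return problems

plan_segments = {f"segment_{i}": 0.08 for i in range(8)}
print(validate_test_plan(["control", "A", "B", "C"],
                         ["revenue_per_visitor"], plan_segments))  # []
```

A plan that returns any violations simply does not launch; the argument happens once, when the guideline is written, not on every test.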

One of the most important things an optimization program can do is make sure that all tests follow a similar framework. Success in the long run follows from how you approach the problem, not from the outcome of any specific action. You will notice that at no point here is the focus on the creation of test ideas, which is where most people spend far too much time. Any test idea is only as good as the system by which you evaluate it. Tests should never be about my idea versus yours, but about the discovery and exploitation of comparative information: figuring out which option is best, not whether my idea is better than yours.

What variant won, whose idea it was, and generating test ideas are some of the biggest red herrings in testing programs. You have to move the conversation away from the inputs and instead focus people on building a valuable system that filters all of that noise. Do not let yourself get caught in the trap of being reactive; instead, proactively reach out and help groups understand how vital it is to follow this type of framework.

Change the conversation, change how you measure success, and others will follow. Keep having the same conversation, or let others dictate how you act, and you will never be able to prove success.