Testing seems like such a natural extension of most existing operations that very few groups take the time to evaluate it as a separate discipline. They are then shocked when they hit the inevitable points of resistance that all programs run into. They are not prepared for the political, technical, or organizational barriers that riddle their journey. Even worse, those barriers interfere with the need for simple, speedy execution, which hinders their ability to reach the level of monetary impact that the program is truly capable of. What differentiates programs in the long haul is the ability to push past those barriers and change the way they think about testing, evolving beyond just focusing on simple tests. They start to focus on the power of the system itself to shape their forward trajectory. It takes people who are real thought leaders, not just for themselves but also for their organizations, to reach the next level of their program.

The theme of the first group of disciplines for programs was all about organizational consistency and getting everyone to work efficiently and quickly towards one goal. Once you have achieved that ability to move forward and act as needed, the main ingredient for success comes from your ability to think about testing differently, to challenge yourself and others to focus on learning and on not making assumptions. Way too many groups end up leveraging their testing program only to look at the impact of what people already want to do, instead of using it as a tool to learn and go in new directions. Often the least efficient part of any system is the people running it, and as such we have to come up with ways to break down that barrier and allow the test data to really dictate where we go, what we talk about, and how we allocate resources. How you think about testing, how you challenge yourself not to fall into the many biases that dictate human nature, and how you bring people to challenge their own actions dictate the scale of the impact of each test and of the program as a whole.

So many groups run into the problem of not realizing that they are suboptimizing. They run a test, they get a winner, and so they assume that everything is golden. They report the winner and they move on. We get so caught up in the immediate return on our actions that we never take the time to understand what it means in context. The problem is that we are not treating each action with the respect it deserves, and that we are busy looking at today and not tomorrow. It is never about getting a result; it is about the ability to differentiate results from each other, to make sure that we are going down the most efficient path for our business. If you get a 3% lift, that is more than you had previously, but what if the 3% is just one recipe in a test with a 5%, a 7%, and a 10% winner? Would you then think pushing the 3% winner was the best course of action? The only way to escape that trap is by changing how you think about testing. With that change, and the practices that you put in place around it, you gain the ability to measure the efficiency of your actions, to view the causal relationships of alternatives, and to measure the scale of impact and the cost to achieve it.
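To make the 3% versus 10% question concrete, here is a minimal sketch in Python. The recipe names are hypothetical and the lifts are simply the numbers from the example above; it assumes every recipe ran in the same test against the same control, and it ignores confidence and cost entirely.

```python
# Hypothetical recipes using the lifts from the example above
# (relative lift over control). The names A-D are made up.
recipe_lifts = {"A": 0.03, "B": 0.05, "C": 0.07, "D": 0.10}


def foregone_lift(lifts, chosen):
    """Return the best recipe and the lift left on the table by pushing `chosen`."""
    best = max(lifts, key=lifts.get)
    return best, lifts[best] - lifts[chosen]


best, gap = foregone_lift(recipe_lifts, "A")
print(f"Best recipe: {best}. Pushing A leaves {gap:.0%} of lift on the table.")
```

The 3% "winner" only looks like a win when it is the only thing you measured. Once the alternatives are in the same test, the number that matters is the gap to the best option.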

In order to make sure you are get­ting the most from the actions you have enabled, here are 5 dis­ci­plines which will help you achieve the results you want from the program.

Best instead of Bet­ter Testing –

I have already explored this concept here, but it is important to understand the distinction. “Better” testing is the act of trying to figure out if one idea is better than another. It limits the playing field and is used to make an immediate decision about who is “right”. Best testing is trying to figure out what the value of each feasible option is, and what the best places are to put resources or to move the site. Testing is a system, one that only produces value based on the quality of the input. If you limit your input to only popular opinion or a few ideas, then you are dramatically hindering the output of that system. It is about using testing not just to push preconceived notions but to democratize ideas, so that the system is more important than the idea. If you are able to let testing tell you where to go, you will not only get better results from this test, but you will better inform future tests and stop yourself from spending resources in an inefficient manner.

Best testing is the first step to allowing your program to produce exponential growth in a way that a single test would never be able to do. You are able to use the program to build, not just run tests. The entire goal is to increase efficiency and to facilitate learning, and the easiest and greatest step towards that goal is forcing your team to think in terms of what is best, not just which idea is better.

Focus on Learning –

Every test you run is a chance to learn about your site and to get outside of your comfort zone. So many groups fail because they only test what they think will win, or because they focus on trying to get consensus about recipes. If you take the time and energy you waste talking about test ideas and focus it on creating all the options, you will spend less time and energy and will get better results. Stop arguing and move those resources towards creating. If you are really trying to open up your testing to feasible alternatives, you will constantly find winners that fly in the face of popular opinion. If you are focused on that outcome, you suddenly find all sorts of new lessons waiting for you, with the added benefit of getting magnitudes of value on top of what you learn.

There are so many assumptions, misconceptions, and faulty “best practices” that dictate the online world. Even worse, we assume that something that works elsewhere will work for our site and our users. We gain nothing if we are just proving ourselves right. When we challenge those ideas instead, we start to learn about what makes the site unique and what works best for what we do. We start learning about the best places to efficiently change the site, and even who the most exploitable user segments are. Even better, those lessons seep into your other conversations, informing product plans, senior management, and all other groups about what you know, not just what you think or pretend you know.

The most obvious example of this is multivariate testing. So many groups, especially agencies, push MVT as a tool to find a single answer by throwing up a number of variants for multiple items on a page. It becomes a big mixing machine for reaching a new version of the page faster. If you change that and use MVT as a learning tool, to focus on what section of the page or what factor of a section is most influential, then you are able to leverage your resources in a way that maximizes ROI and learning, accomplishing both tasks far better than just throwing things up to see what sticks. Any time you are running a massive MVT, full factorial or not, you are suboptimizing, both from a resource and a time perspective, and also because you have failed to leverage the opportunity to learn.
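Using MVT to learn which factor is most influential, rather than to crown one combination, can be sketched roughly as follows. Everything here is an assumption for illustration: the factor names, the made-up conversion rates, and the choice to measure "influence" as the spread between the best and worst level averages. It also assumes equal traffic per combination and says nothing about statistical confidence.

```python
from collections import defaultdict

# Hypothetical conversion rates from a small 3x2 full-factorial test:
# three hero images crossed with two button styles. All numbers are made up.
results = {
    ("hero_1", "button_1"): 0.040,
    ("hero_1", "button_2"): 0.052,
    ("hero_2", "button_1"): 0.041,
    ("hero_2", "button_2"): 0.053,
    ("hero_3", "button_1"): 0.039,
    ("hero_3", "button_2"): 0.051,
}
factors = ("hero", "button")


def factor_influence(results, factors):
    """For each factor, the spread between its best and worst level averages."""
    influence = {}
    for position, name in enumerate(factors):
        by_level = defaultdict(list)
        for combo, rate in results.items():
            by_level[combo[position]].append(rate)
        means = [sum(rates) / len(rates) for rates in by_level.values()]
        influence[name] = max(means) - min(means)
    return influence


influence = factor_influence(results, factors)
most_influential = max(influence, key=influence.get)
print(most_influential, influence)
```

In this made-up data the button swings the result far more than the hero image does, so the next test belongs on the button, not on hunting for the single best combination.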

Liv­ing Knowl­edge Base –

As programs grow, and as you use testing as a vehicle to learn, the most important thing you will accumulate is not lift, but functional knowledge about your site and users. Storing and sharing this knowledge, based on causal data, informs future decisions in a way that analytics or just “Best Practices” will never be able to do. You have to make this storing and sharing a function of the team, and make it accessible and meaningful to all groups, even those that were not part of the test that gained the knowledge in the first place. For most groups, this accumulation and sharing of knowledge has a scale of impact far greater than just the individual test results.

Building a repository of that knowledge, and having it be an active, breathing thing that interacts with people and exists outside of individual tests, is vital to achieving the results that programs want to achieve. What it is not is just a list of tests run and their results. What it is meant to be is something that shares lessons, successes, and failures, but is focused on the learning you have done from your program, not the minutiae of the actions that got you there. Every version of this is different, as is each and every organization, yet they share the same characteristics: they focus on what has been learned across tests, they are easily accessible, they are used to start conversations and as a barometer for where to go, and they show what does and, more importantly, what does not work. A knowledge base also allows you to weigh the various actions against each other, as lift alone does not tell you the scale of impact. If you continue to try the same things that fail, you will never be able to leverage the exponential efficiency that learning should allow you.
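As a rough illustration of the difference between a list of tests and a knowledge base, here is one possible shape in Python. The class names, fields, and test ids are all hypothetical, and every organization's version will look different. The point is only that the unit being stored is the lesson, with tests attached as evidence.

```python
from dataclasses import dataclass, field


@dataclass
class Lesson:
    """One thing the program has learned, kept separate from any single test."""
    statement: str                                        # the lesson in plain language
    worked: bool                                          # True if it helped, False if it did not
    areas: set = field(default_factory=set)               # pages or elements it applies to
    supporting_tests: list = field(default_factory=list)  # ids of the tests behind it


class KnowledgeBase:
    def __init__(self):
        self.lessons = []

    def add(self, lesson):
        self.lessons.append(lesson)

    def for_area(self, area):
        """Everything learned about one page or element."""
        return [lesson for lesson in self.lessons if area in lesson.areas]

    def dead_ends(self, area):
        """Things that did not work here, so they are not retried blindly."""
        return [lesson for lesson in self.for_area(area) if not lesson.worked]


kb = KnowledgeBase()
kb.add(Lesson("Button color is more influential than copy or size",
              worked=True, areas={"product_page", "button"},
              supporting_tests=["test-07", "test-09"]))
kb.add(Lesson("Changing the brand information does not move the success metric",
              worked=False, areas={"product_page"},
              supporting_tests=["test-03"]))

for lesson in kb.dead_ends("product_page"):
    print(lesson.statement)
```

Anyone planning a new product page test can ask what has already failed there before spending resources on it again.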

Iter­a­tive Testing –

Iterative testing may seem like a no-brainer, but so many groups fail to understand or leverage it as a discipline, talking about it but failing to act on it consistently. It should be an organizational rule that no test is ever “over”, but only at a new stage, ready for the next test. If you are using tests to learn, then you will know how to prioritize a page, which then needs to have its different sections explored: not only what the feasible options are, but also what the most influential parts of those are, and then what the best way to tackle that winning factor may be. The goal here is to make sure that each test you run maximizes efficiency while mitigating opportunity cost, and the only way to do that is to constantly build off prior knowledge to ensure that you are maximizing your placement of resources. In many cases this can be the most difficult barrier for groups to actually overcome. While there is a lot of positive talk around the subject, there are so many pushes and pulls on your time and resources that it is easy to lose track or to just stop at the end of any given test. You have to force yourself and your group to maintain that momentum, and more importantly to leverage all of that learning, to really drive where you are and where you are going.

Here is an example:

You take a product page and you challenge yourself to learn what the most influential sections are, so you test out what belongs and what doesn’t by removing each possible section on the page. You learn that 2 sections don’t matter: the top navigation and the brand information. You then test out removing them together and find that you have improved the page towards your site-wide single success metric. At this point you have a new page after pushing the winner, but you need to dive deeper. You then use a small 3x2 MVT to learn that the button is the most influential element on the new page. You follow that up by looking at the factors of that button to figure out what about it drives that influence. You learn that color is far more influential than copy or size, so you then test out 5–6 different colors and learn that purple, the color that everyone thought never had a chance of winning, actually does better than all the other colors.

What you have is an entire path that you never could have preconceived. You have forced a way to make sure that popular opinion did not drive what you test. The resulting page is not one that anyone could have predicted, but it is by far the best performing. You have learned what the importance of each element is and the best way to change it, and you have used very little in the way of resources. You are free to continue, whether you optimize the second most important part of the page or the second factor of the button. The process continues, either in that same part of the user experience or wherever you can see other opportunities to run the same process based on what you have learned.

Iterative testing is something that is constantly thrown around and agreed on, so why is it so rarely done consistently? Why do so many groups think that talking about iterative action is enough? Groups get so caught up in the single winner that they miss that it is just one part of a very fluid user experience. You have to force yourself to follow this path, which is why it is a discipline, and when things become difficult you have to push through and do this, always, in order to reap the rewards you are seeking. Even better, if you are using segmentation at each point, you will end up with a page that can be dynamic based on the winning alternatives for exploitable segments. Each “test” is really just the next evolution of the same process. You have to stop yourself and your organization from viewing the test as an end in itself. Doing this consistently, not once but always, really differentiates the value you will receive from testing.
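The segmentation point can be sketched in a few lines. The segment names, recipes, and conversion rates below are made up for illustration, and the sketch ignores segment size and statistical confidence, both of which matter before anything is actually pushed.

```python
# Hypothetical conversion rates per segment and recipe (made-up numbers).
rates = {
    "new_visitors":       {"control": 0.020, "purple": 0.026, "green": 0.023},
    "returning_visitors": {"control": 0.050, "purple": 0.049, "green": 0.056},
}


def winners_by_segment(rates):
    """Pick the best performing recipe separately for each segment."""
    return {segment: max(recipes, key=recipes.get)
            for segment, recipes in rates.items()}


winners = winners_by_segment(rates)
for segment, recipe in winners.items():
    print(f"{segment}: serve {recipe}")
```

A single overall winner would hide the fact that, in this made-up data, the two segments respond best to different recipes. That difference is what makes a segment exploitable and lets the page become dynamic.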

Decon­struc­tion –

The ability to have someone present you a test idea, and then break it apart to find the assumptions that led to it so that you can learn as you grow, is a vital skill for optimization. One of the most common mistakes that testing groups make is to take each idea at face value. Every idea comes from only one point of view and is riddled with biases. You have to force yourself and others to get past those points to really discover the value hidden behind what sounds like a conventionally accepted concept. Treat the most important part of your program as the system by which you discover new things or challenge biases, and you will always be able to get greater results. We need to take any idea and challenge every core part of it, so that we leave nothing to chance and so that we can really evaluate what works, not just what sounds like it works.

So rarely are you presented with the chance to optimize a page from start to finish as described above, but that does not mean you are not still responsible for applying the same discipline to any starting point. People will always come to you with test ideas, whether through internal debates, feedback, or just as they are going across the site. There is most likely value in each idea, but the real skill is to break down the idea, to make sure you are not presuming the path, and to learn. Is that item even needed? Is what you have been doing positive or negative? Is that the best place to put resources? What else can be done with that space? Being able to take an idea and challenge its components is how you will arrive at the most important lessons you will learn.

The idea of targeting content in the carousel on your homepage based on what people have already purchased sounds like a great idea, but let’s evaluate all of the assumptions behind it:

Is “purchaser” even an exploitable segment (can you change their behavior by changing the user experience)?
Is it the most exploitable way to look at the same user?
Is the home­page the right place?
Is the carousel the most influ­en­tial part of that page?
What type of change to the content is most influential: the wording, the presentation, or the location?
Does con­tent have the largest impact?
How does that entire path com­pare to other alternatives?

That is just the very first pass. It is actually far more likely that just changing the layout of your homepage based on browser is going to be both much higher in yield and more efficient on a much larger scale.

It takes prac­tice and dis­ci­pline to force your­self to chal­lenge all of these ideas. It can often lead to dis­com­fort at first, but push­ing past that and assist­ing oth­ers in see­ing their own biases and their own assump­tions helps every­one grow. Test­ing is not about who had the best idea, it is about becom­ing hum­ble and real­iz­ing every­one is “wrong” and about cre­at­ing a sys­tem to push past that and learn and chal­lenge com­mon con­ven­tion for the bet­ter­ment of everyone.

Con­clu­sion –

I am often asked how I define a “successful” program. When it comes to measuring a program, I measure it on how often the group has learned something new and unexpected. Have they done something that makes everyone go “That can’t be right” or “That goes against everything I have ever heard”? The only way to have those moments, and to get the magnitudes of value both short and long term that those conclusions present you, is to break apart ideas and to challenge yourself and others on the questions that you are trying to tackle. It isn’t about being sadistic or altruistic, but instead about looking at the discipline as a means to an end that helps everyone achieve their goals.

In the end, it doesn’t matter if you can run 100 tests a month if you are running suboptimal tests. The point is never what you got, but what you got in relation to what you could have gotten with the same or fewer resources. So many groups get lost because they focus on the actions, and not on the disciplines that define them and the larger picture of the program that they exist in. You took an action, one that you might have taken anyway, but it is over and you are left with nothing but the next idea.

You have to change how you think and how you act to get the results you want and to make test­ing a fun­da­men­tal part of who you are as an orga­ni­za­tion. I talk in terms of dis­ci­plines, because that is what they are, they are core beliefs that only take hold when we chal­lenge our­selves to live by them every day, not just when it is con­ve­nient or easy. Chal­leng­ing your­self to these dis­ci­plines and ask­ing the tough ques­tions of oth­ers will allow you to move down a path away from the obvi­ous to the world where you learn, grow, and have results that really move people.
