One mistake I see a lot of companies making is running a test to find a winner instead of running a test to learn something. The two-recipe test is a symptom of this. Often the test pits a complete redesign against the incumbent default. In this scenario, the redesign may edge out the default, or it may not. Either way, you may have won a battle by reaching a slightly higher threshold in performance or by stopping an engagement-crippling redesign from becoming a permanent fixture on your site. You have found what can be called a local maximum. The problem is that while you may have won a battle, that battle is not helping you win the war.

What is the war? The war here is optimizing your site to move past better in search of best. Let’s say you are fortunate in that your redesign test had a challenger that performed better than the default. In this case you have found a better experience, but it tells you nothing about how to get to the best experience. Is this new baseline the best the site can do? Maybe it is, but more than likely it is not. At this point you have perfect visibility backward into how the tested change impacted performance, but no visibility forward into what types of changes will drive further increases.

How do you get to the best experience, then? That involves identifying what matters first and then iteratively improving those elements. To put this in context, consider the redesign. A redesign may involve changing five different elements. If you have a large marketing team or dedicated agency, make that 15 different elements. Either way, it will be impossible to accurately allocate any positive or negative lift among those various changes, leaving you with a real attribution problem. Instead, start with some diagnostic tests that will tell you what about the current page is important and worthy of follow-up tests. These tests allow you to keep your focus uphill, looking past local maximums.

Here are a couple of ways to do this:

The Exclusion Test
A simple exclusion test is a perfect example. This involves creating several experiences where you exclude one element at a time from the page to see its overall impact on page consumption (or conversion rate, depending on what metric makes sense for your business). If you remove something and page consumption goes down, that element is helpful and should be at the top of the list of elements on which you will focus a series of A/B tests. Conversely, if you remove an element and page consumption goes up, that element is probably adding clutter and you may want to consider removing it. For most publishers, these tests are very easy to set up because of how the typical main edit wells and right rails are built. These elements typically employ styling such that if you hide one element, everything below it collapses neatly upward. This means a simple CSS change that can be written in two minutes will produce a recipe. If your site employs proper CSS, you can have an entire test set up in an hour.
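To make the readout concrete, here is a minimal sketch of analyzing an exclusion test. Every number and element name below is hypothetical, and the two-proportion z-test is just one reasonable way to compare each exclusion recipe against the control; your testing tool likely reports something equivalent.

```python
# Sketch of reading an exclusion test: each recipe hides one page
# element; we compare its page-consumption rate against the control.
# All counts and element names are hypothetical illustrations.
from math import erf, sqrt

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test; returns (lift, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_b - p_a, p_value

control = (4120, 50000)  # (consumptions, visitors) with everything shown

recipes = {              # same metric with one element hidden per recipe
    "related_links": (3650, 50000),  # rate drops -> element is helping
    "social_widget": (4300, 50000),  # rate rises -> likely clutter
}

for element, (conv, n) in recipes.items():
    lift, p = two_proportion_z(*control, conv, n)
    verdict = "keep and iterate" if lift < 0 else "consider removing"
    print(f"{element}: lift {lift:+.4f}, p={p:.4f} -> {verdict}")
```

A negative lift when an element is hidden means the element was earning its place, which is exactly the signal that it deserves follow-up A/B tests.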

The Multivariate Test
The MVT seems to be a testing buzzword that honestly gets many testers into trouble because they misuse it. When implemented correctly, though, it can be a powerful diagnostic tool. If you are using an MVT to find the magic recipe from an array of 46 permutations, you are probably misusing it. Instead, the MVT can be very helpful in identifying which elements on the page contribute to success, or have high “element contribution.” Either way, the focus here should be on the importance of the elements, not on the specific treatments of those elements (red vs. green, etc.). Running an MVT in this way allows you to test three or more elements at once, with the goal of finding out which of the elements matter enough to warrant follow-up A/B tests.
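One way to picture “element contribution” is the main effect of each element in a small full-factorial MVT. The sketch below assumes three hypothetical elements, each with two treatments, and made-up conversion rates per cell; the main effect is simply the mean rate with the element at its test level minus the mean with it at control.

```python
# Sketch of extracting "element contribution" from a 2x2x2 MVT.
# Cells are keyed by (headline, hero_image, cta_button) treatment
# levels; the conversion rates are hypothetical illustrations.
rates = {
    (0, 0, 0): 0.040, (0, 0, 1): 0.048, (0, 1, 0): 0.041, (0, 1, 1): 0.049,
    (1, 0, 0): 0.052, (1, 0, 1): 0.060, (1, 1, 0): 0.053, (1, 1, 1): 0.061,
}

elements = ["headline", "hero_image", "cta_button"]

def main_effect(idx):
    """Mean rate with element idx 'on' minus mean with it 'off'."""
    on = [r for cell, r in rates.items() if cell[idx] == 1]
    off = [r for cell, r in rates.items() if cell[idx] == 0]
    return sum(on) / len(on) - sum(off) / len(off)

effects = {name: main_effect(i) for i, name in enumerate(elements)}
for name, eff in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {eff:+.4f}")
```

In this made-up data the headline dominates and the hero image barely moves the needle, so the headline (and perhaps the CTA) would be the elements worth a series of focused A/B tests; the hero image would not.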

These two types of tests are helpful diagnostic methods for finding the levers on your site that, when pulled, will drive toward the best site instead of simply settling for better. Running a couple of extra diagnostic tests to create a map to the top of the mountain may sound like more effort than simply stopping at the nearest peak. It is more work, but that is what distinguishes best from better on your site. If it were easy, everyone would be doing it, and you would be paid far less.