All right folks, gather around…get nice and close…I want to tell you a cautionary tale about A/B testing. That’s right, the old standby of web analytics. A/B testing, or “split-run testing” as it is sometimes called, is one of the most pervasive and widely used methodologies behind web site improvement. And rightfully so: the concept is simple.

Say, for example, you want to test a new version of your Home Page. Well, direct some of your traffic to your current page (the A page) and some to your new page (the B page)…look at the differences in performance…and voila, you have your winner! Did the new page B outperform page A? If so, great: let’s direct all of our traffic to that page and watch our sales go through the roof. Or perhaps page B didn’t fare so well, so scrap it and try again. Easy.

I would estimate this methodology has driven hundreds of millions, if not billions, in incremental revenue gains for Web sites. So why stop now? It’s powerful, simple, and accurate…or is it? While I’d agree that A/B testing is powerful and simple, there is a dark side to A/B testing that, if unchecked, can significantly impact your revenue gains, or even (shudder) produce losses. What is this dark side? Read on and I’ll tell you a story about A/B testing gone wrong.

THE FIRST MISTAKE

Once upon a time there was a large retailer that wanted to redesign their Home Page. As KPIs, they used Home Page Conversion Rate and Revenue per Home Page Visit to measure the success of their Home Page.
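For concreteness, here’s a minimal sketch of how those two KPIs might be computed from raw visit data. The record format and field names are hypothetical, purely for illustration:

```python
# Minimal sketch of the two KPIs, assuming one record per Home Page
# visit with the revenue that visit produced ($0 if no order).
# Field names are hypothetical, for illustration only.
visits = [
    {"ordered": True,  "revenue": 250.0},
    {"ordered": False, "revenue": 0.0},
    {"ordered": False, "revenue": 0.0},
]

conversion_rate = sum(v["ordered"] for v in visits) / len(visits)
revenue_per_visit = sum(v["revenue"] for v in visits) / len(visits)

print(f"Home Page Conversion Rate: {conversion_rate:.1%}")      # 33.3% here
print(f"Revenue per Home Page Visit: ${revenue_per_visit:.2f}")  # $83.33 here
```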

Before the test, the page had a Home Page Conversion Rate of 3.0% and Revenue per Home Page Visit of $20. Most of the Home Page consisted of big images of each product category, and not much else. The Marketing group wanted more room for special offers, and the current design gave them none. Furthermore, as they continued adding categories, the Home Page was becoming long and unwieldy. The design team felt it was difficult for customers to navigate. So they created a new Home Page with much more room for promotions, and eliminated some of the less popular categories. They were pretty pleased with the design and knew the next step was to A/B test it.

They decided upon a 90/10 split test, with only 10% of traffic going to the new Home Page, so they could minimize risk if it should perform poorly. (Note: while many folks think of A/B testing as a 50%/50% split of traffic, an industry best practice is to expose only a fraction of your traffic to the new design, like 10%, so as to minimize risk.)
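As an aside, here’s one way such a split might be implemented. This is a minimal sketch, assuming each visitor carries a stable ID (in a cookie, say); hashing that ID keeps a visitor on the same page for the life of the test, rather than flipping a fresh coin on every visit:

```python
import hashlib

def assign_page(visitor_id: str, b_fraction: float = 0.10) -> str:
    """Deterministically assign a visitor to page A or page B."""
    digest = hashlib.md5(visitor_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # a stable number from 0 to 99
    return "B" if bucket < b_fraction * 100 else "A"

# The same visitor always lands on the same page.
print(assign_page("visitor-12345"))
print(assign_page("visitor-12345"))
```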

Within the first 24 hours, the results were encouraging. The new page was showing a 4.0% Home Page Conversion Rate and Revenue per Home Page Visit of $25. Within the first week, the performance held up.

They began talking about what they would change next. The Marketing team felt the A/B test clearly demonstrated the effectiveness of their special offers. The Design team felt the A/B test demonstrated the effectiveness of fewer categories. They argued back and forth, but the data couldn’t settle the argument: it was equally consistent with both points. In other words, the test was inconclusive.

So what happened? Why did the test fail? Well, it failed because they changed more than one element on the page. When you conduct an A/B test, the best practice is to change only one element on the page. Changing more than one element in an A/B test makes it impossible to determine which change drove better performance and which did not. And while you might be tempted to just embrace all the changes since they led to an overall positive result, that is a short-sighted way to make a decision of such importance to your revenue stream. Why? Because it will not advance your understanding of your customers or their behavior, and will not keep you from repeating mistakes in the future.

Yes, it takes discipline to conduct your A/B tests this way. But it pays off in solid, actionable business intelligence that helps you improve your overall success.

Sidenote: If you want to test multiple elements at the same time, you need to conduct multi-variate testing (MVT) rather than A/B testing. Most multi-element or multi-variate tests introduce both positive and negative performers. It’s the nature of the beast. This is one reason why multi-variate testing is so powerful: because it provides a sense of which elements perform better and which perform worse, it helps you continuously improve your performance. But it does take a lot longer to set up than A/B tests, and usually requires specialized help to do it right. If you’re interested in MVT, Omniture SiteCatalyst integrates with several major MVT vendors; please feel free to send us an email via the Contact link in the footer if you’d like more information.
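To give a feel for why MVT setup is heavier, consider a full-factorial design, which tests every combination of element variants. The elements and variants below are hypothetical, just to show how quickly the number of page recipes grows:

```python
from itertools import product

# Hypothetical page elements and their variants, for illustration only.
elements = {
    "promo_area": ["none", "sidebar", "banner"],
    "categories": ["all", "top_sellers_only"],
    "hero_image": ["product_shot", "lifestyle_shot"],
}

recipes = list(product(*elements.values()))
print(f"{len(recipes)} page combinations to test")  # 3 x 2 x 2 = 12
for recipe in recipes:
    print(dict(zip(elements, recipe)))
```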

Bottom line: when you are doing A/B testing, discipline your team to focus on one element, one test…remember that: one element, one test…and you can avoid costly mistakes like the one I just highlighted.

THE SECOND MISTAKE

Now, most industry veterans will readily acknowledge the “one element, one test” mantra…that’s not earth-shattering or new. But what far fewer people know, and what many more stumble over, is this second mistake. To illustrate, let’s continue with our hapless retailer.

The retailer designs a new Home Page, changing only one element on the page. Good so far. But after the first 24 hours, the results are discouraging. Conversion has dropped to 2.0% and Revenue per Home Page Visit has dropped to $10. They decide to wait it out, but after 7 days, the results are still discouraging. They yank the page and conclude the new design was flawed.

While this all sounds pretty logical, it’s not. When running an A/B test, it’s critical to dig below your top-level KPIs, such as conversion and revenue per visit, and examine the visitor mix. New visitors typically respond much more favorably to a new page design than repeat visitors. Loyal visitors tend to respond worst of all. This actually shouldn’t be surprising to anyone. Imagine that your local grocery store suddenly moved the deli from the left front side of the store to the back right. If it were your first visit, you wouldn’t know the difference and would orient yourself appropriately. But if you had been a frequent customer, say shopping there twice a week, you would be pretty confused. In fact, after walking to the left front side of the store and finding no deli, you might be so frustrated you leave. I know that sounds extreme, but in an online marketplace, it’s much easier to shop elsewhere than in the material world.

This is not to say you shouldn’t redesign the store, or your site. If you have good reasons to move the deli (like you are adding a new high-margin DVD-rental kiosk where the deli once was), then by all means move it. Just realize that it may take an extended period of time for your most loyal visitors to warm up to it.

Taking a step back, there is arguably an inverse relationship between visitor value and positive response to an alternative page design. This is particularly the case when you consider visitor value as not just a function of lifetime revenue, but a function of Recency and Frequency as well. In other words, taking the Direct Marketing concept of a standard RFM model, your upper quartile (those with the highest scores in Recency, Frequency, and Monetary value) will quite possibly have the worst reaction to your new design.

So what does all this mean to you? Well, first off, dive deeply into those A/B test results, and use multi-dimensional segmentation to help you understand exactly who is reacting well to your changes, and who hates them. Segment by new and repeat visitors and analyze the performance of each group. Better yet, segment performance by RFM quartiles. It’s really the only way to know if you’re hitting the mark with your targeted customers. If you’re new to RFM, some key starting metrics would be “Days Since Last Purchase or Visit,” visits within a period of time, and Order Value/Revenues.
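Here’s a minimal sketch of that kind of RFM segmentation. The scoring is deliberately simplified (a composite sort rather than scoring each dimension into true quartiles), and all records and field names are hypothetical:

```python
# Hypothetical per-customer records: Recency (days since last purchase),
# Frequency (visits in the period), Monetary (revenue to date), plus
# whether the customer converted when shown the new page B.
customers = [
    {"days_since_last": 2,   "visits": 25, "revenue": 1200.0, "converted_on_b": False},
    {"days_since_last": 5,   "visits": 18, "revenue": 950.0,  "converted_on_b": False},
    {"days_since_last": 90,  "visits": 2,  "revenue": 60.0,   "converted_on_b": True},
    {"days_since_last": 120, "visits": 1,  "revenue": 35.0,   "converted_on_b": True},
]

def rfm_key(c):
    # Lower recency is better, so negate it; higher frequency and
    # monetary values are better as-is.
    return (-c["days_since_last"], c["visits"], c["revenue"])

ranked = sorted(customers, key=rfm_key, reverse=True)
quartile = max(1, len(ranked) // 4)
top, bottom = ranked[:quartile], ranked[-quartile:]

for label, seg in [("Top RFM quartile", top), ("Bottom RFM quartile", bottom)]:
    rate = sum(c["converted_on_b"] for c in seg) / len(seg)
    print(f"{label}: {rate:.0%} conversion on page B")
```

With toy data like this, the output illustrates the inverse relationship described above: the most valuable customers react worst to the new page.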

Frankly, this is where a product like Omniture Discover can be your best friend, because multi-dimensional segmentation is so critical to your analysis. You just can’t drill into the data using traditional web analytics tools; they simply aren’t built to support the multiple levels you need. You need a way to see: all visitors -> visitors to the Home Page -> visitors to the Home Page who saw page B -> visitors to the Home Page who saw page B and ordered -> visitors to the Home Page who saw page B, ordered, and were new -> visitors to the Home Page who saw page B, didn’t order, and were new, and so on.
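As a rough illustration, here’s what that drill-down might look like expressed as successive filters over visitor-level records. The data structure and field names are assumptions for the sketch, not how Discover actually stores things:

```python
# One hypothetical record per visitor; field names are for illustration.
visitors = [
    {"saw_home_page": True,  "page": "B",  "ordered": True,  "is_new": True},
    {"saw_home_page": True,  "page": "A",  "ordered": False, "is_new": False},
    {"saw_home_page": True,  "page": "B",  "ordered": False, "is_new": True},
    {"saw_home_page": False, "page": None, "ordered": False, "is_new": True},
]

home  = [v for v in visitors if v["saw_home_page"]]
saw_b = [v for v in home if v["page"] == "B"]

segments = {
    "All visitors": visitors,
    "Visitors to Home Page": home,
    "...who saw page B": saw_b,
    "...who saw page B and ordered": [v for v in saw_b if v["ordered"]],
    "...who saw B, ordered, and were new": [v for v in saw_b if v["ordered"] and v["is_new"]],
    "...who saw B, didn't order, and were new": [v for v in saw_b if not v["ordered"] and v["is_new"]],
}

for label, seg in segments.items():
    print(f"{label}: {len(seg)} visitors")
```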

Armed with that kind of deep-level insight, you’re equipped to make the right decisions, some of which may seem completely counter-intuitive at first. (The surprise is half the fun!)

THE MORAL OF THE STORY

Hopefully this little parable helped expand your understanding of A/B testing. My goal is to empower you to maximize your revenue potential and avoid the bad decisions that can result from using only high-level data. If you discipline yourself to change and test only one element at a time, and take the time to analyze your results segment by segment, you will avoid the two most common, and most costly, errors people make in A/B testing.

If you’d like assistance setting up or analyzing your A/B tests, feel free to contact the Omniture Best Practices Group; we’re more than happy to help!

Want to chat about this topic? Discuss your own experiences with A/B and multi-variate testing with your peers in our Website Conversion and Optimization forum.

1 comment

shawn:

If you're dealing with international customers I would think that segmenting performance by geography would be just as (if not more) important than by RFM quartiles, particularly if you're targeting specific markets.