There’s been quite a storm brewing around the best methodology to use for multivariate testing: fractional-factorial vs. full-factorial. (For a quick primer, definitions of both are included at the bottom of this post.) I have to say that some of the arguments I’ve heard border on ideological in both their passion and rigor. Are there scenarios where one methodology makes more sense than the other? Absolutely. Is it possible that one methodology is right for every scenario? No. I have my own thoughts on when each approach is applicable, but I’d first like to see if we can agree on a few statements:

1) The Internet changes every day, whether it’s based on oil prices, local and global news, or your competition switching tactics and prices.

2) What worked on your site six months ago may not be the same thing that works today, and will most likely not be what works six months from now.

3) The most successful Internet marketers are light on their feet – agile, flexible, and able to adapt quickly.

4) There is no magic formula to the Internet. We cannot say that if A & B, then C will happen each and every time.

In a perfect theoretical world where none of the statements above were true, I would run full-factorial each and every time. That way, I get to understand which exact combination is best out of all the possible combinations of elements, and even calculate all the different levels of interaction between elements. However, we unfortunately don’t have the gift of infinite time when running tests and analyzing results.

I recently read a case study touting full-factorial and the 576 different combinations tested. It had great graphs and charts of data, but, in my opinion, two huge things were missing:

1) How long did this test take to run? If I go by Google’s handy calculator, I would estimate it took roughly five months.

I don’t know many companies that have the luxury of running a test longer than one month, let alone five months!

2) How did different customer segments perform? Were segments even set up and tracked? With 576 combinations to test, even setting up two coarse segments such as new visitor and return visitor would double the amount of time the test had to run. In this case, we’re now looking at closer to a year! Yet how can any company with multiple acquisition points and behavioral customer segments run a test without slicing its population to understand where the differences lie? Consider the customers who search on “guitar center” vs. those who click a PPC ad after searching for “les paul guitar” – is it possible they might react differently in a test? I would say it’s quite likely.
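To make the arithmetic behind both points concrete, here is a rough back-of-the-envelope sketch. It is not the formula behind Google’s calculator, and the traffic figures are invented purely for illustration (they do not come from the case study). It assumes every combination needs a fixed quota of visitors and that segments split traffic evenly; with uneven segments, the smallest one would stretch the test even further.

```python
import math

def estimated_days(combinations, visitors_per_combination, daily_visitors, segments=1):
    """Rough test duration in days.

    Assumes each combination needs the same visitor quota inside every
    segment, and that traffic is split evenly across segments.
    """
    total_needed = combinations * visitors_per_combination * segments
    return math.ceil(total_needed / daily_visitors)

# Hypothetical inputs, chosen only to illustrate the scaling.
VISITORS_PER_COMBINATION = 1000
DAILY_VISITORS = 4000

no_segments = estimated_days(576, VISITORS_PER_COMBINATION, DAILY_VISITORS)
two_segments = estimated_days(576, VISITORS_PER_COMBINATION, DAILY_VISITORS, segments=2)

print(no_segments)   # 144
print(two_segments)  # 288
```

Whatever numbers you plug in, the shape is the same: duration grows linearly with the number of combinations and again with the number of segments you want to read separately.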

Does all this mean there are no cases where full-factorial might be more effective? Not at all. I have recommended running a full-factorial to clients in the past when the elements they were testing were highly graphical and seemed interdependent. Take, for example, a row of different photo categories (Abstracts, People, Close-ups, B&W, etc.) to choose from, where each category’s representative photo would be considered an element to test. That seems like the appropriate place to run a full-factorial because you may not want two pictures that look very similar to appear side by side. However, there are trade-offs to dedicating the time and traffic to full-factorial. You most likely have to severely limit the number of elements you will be testing at once. You may also have to forgo customer segmentation unless you are one of the few companies with the benefit of millions of visitors a day.

I think that one of our own customers actually summed it up best for me last week. John Pace, a true champion of testing and the head of optimization at Real Networks, likened fractional-factorial testing to a barometer. He’s a sailing man, so forgive me if the analogy doesn’t sync up for you. A barometer measures atmospheric pressure, but its value is not so much in the precise measurement as the notification that there is a directional change in pressure.

In much the same way, testing is supposed to give you directional feedback on what is performing and resonating best with your visitors. Testing is not a document or proof that you can use to be 100% sure of how your visitors will behave moving forward. Because of that, I question how valuable it is to spend five months running a single test for learnings that may no longer be applicable by the time the test has completed and the data has been pumped through analysis. Instead, why not take the winnings and learnings of your week-long fractional-factorial multivariate test and then run another test that builds off that new and improved baseline? If you can approach your testing program that way, I’m confident that you’ll find more upside in both lift and learnings in the same five-month period.

At the end of the day, no matter which methodology you end up using, the race for conversions and revenue is not going to be won by the might of your statistics. Your strengths should be creativity, innovation, and the ability to listen and react to your customer. Employing those in testing will get you to the finish line; it’s just a matter of whether you get there sooner or later than the rest of the pack.

Definitions

Multivariate Test: A multivariate (MVT) test enables you to test multiple elements simultaneously. For example, you might test the banner, headline, copy, and call-to-action on a landing page. The benefits of running a multivariate test are that you can test more elements at once than with an A/B test, and you also get information about which elements were most significant and which alternatives produced the most lift.

Full-Factorial Design: Full-factorial design tests all of the different combinations of elements and their alternatives. For example, if you had 7 elements on a page with 2 alternatives each, a full-factorial design would test all 128 (2^7) combinations.
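As a minimal sketch of what “all of the different combinations” means, the snippet below enumerates every recipe for 7 elements with 2 alternatives each. The element names and the "A"/"B" labels are placeholders I made up for illustration; only the counts come from the definition above.

```python
from itertools import product

# Hypothetical element names; any 7 page elements would do.
elements = ["banner", "headline", "copy", "cta", "image", "price", "layout"]
alternatives = ["A", "B"]

def full_factorial(elements, alternatives):
    """Return every combination as a dict mapping element -> alternative."""
    return [
        dict(zip(elements, combo))
        for combo in product(alternatives, repeat=len(elements))
    ]

recipes = full_factorial(elements, alternatives)
print(len(recipes))  # 128
```

Add an eighth element and the list doubles to 256, which is why the number of elements has to stay small in a full-factorial test.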

Fractional-Factorial Design: Fractional-factorial design tests a subset of all the different combinations of elements and their alternatives. In the same test example above, a fractional-factorial design using the Taguchi method would test 8 combinations.
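For the curious, here is one way such an 8-run design can be built for 7 two-level elements. This is a sketch of a standard construction (three base columns plus their XOR combinations), which I believe matches the Taguchi L8 array up to the ordering and labeling of columns; it is not necessarily how any particular testing tool generates its recipes. Levels are coded 0 and 1, and the function names are my own.

```python
from itertools import combinations, product

def l8_style_design():
    """Build 8 runs for 7 two-level factors (levels coded 0/1).

    Three base columns (a, b, c) take all 2**3 settings; the remaining
    four columns are XOR combinations of the base columns.
    """
    runs = []
    for a, b, c in product((0, 1), repeat=3):
        runs.append((a, b, a ^ b, c, a ^ c, b ^ c, a ^ b ^ c))
    return runs

def is_orthogonal(runs):
    """True if every pair of columns shows each of the 4 level pairs equally often."""
    n_cols = len(runs[0])
    for i, j in combinations(range(n_cols), 2):
        counts = {}
        for run in runs:
            key = (run[i], run[j])
            counts[key] = counts.get(key, 0) + 1
        if len(counts) != 4 or len(set(counts.values())) != 1:
            return False
    return True

design = l8_style_design()
print(len(design))            # 8
print(is_orthogonal(design))  # True
```

The balance check is what lets 8 runs stand in for 128: each alternative of each element is seen equally often alongside each alternative of every other element. The price is that the effects of individual elements can no longer be separated from interactions between them, which is the trade-off discussed in the post.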