One of the fundamental struggles people have when entering the world of testing is understanding segmentation. Segmentation is a word that just about everyone uses, but one that has subtle but important differences in meaning depending on which part of the data stream you are talking about. People understand that there are different user groups, but they fail to see that you can look at and act on that information in a massive number of ways. Because of this, people consistently fail to change the way they think about segmentation, instead reverting to whatever way is most comfortable to them.

In the world of testing, people struggle to balance their preconceived notions of what matters with the evaluation of the efficiency and value of each segment. Too many people are so used to concocting a story about their users that they blindly believe their construct without measuring the validity of their story. One of the fundamental truths about changing behavior is that the same user can be looked at in a hundred different ways, and the core challenge of optimization is to figure out the most efficient way to change their experience. Some of the worst mistakes in testing come from people simply serving content without knowing the value of that action. I want to present some simple but effective rules to make sure that you are getting the maximum value from your “personalization” or testing programs.

To start, I want to make sure that we are clear on a definition for segments. In the world of testing, we care about finding exploitable segments: segments that change behavior when their experience changes, not ones that just have inherently different behaviors. While it is nice and good that you want to serve content to return users, if the same content “wins” for return and new visitors, then you have not actually increased performance. If, however, a different tested experience “wins” for new visitors than for return visitors, then serving each group its own winner creates a positive lift for the overall population. Our quest is never to find groups that have a different behavior, but ones whose behavior improves more when they are given a different experience.
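To make that distinction concrete, here is a minimal sketch with invented numbers. The function name `targeting_lift`, the experiences "A" and "B", and every rate below are assumptions for illustration, not data from a real test. It compares the best single experience served to everyone against serving each segment its own winner.

```python
def targeting_lift(rates, shares):
    """Compare one experience for everyone vs. each segment's own winner.

    rates:  {segment: {experience: conversion_rate}}  (hypothetical inputs)
    shares: {segment: fraction of the population}, summing to 1
    Returns (best_single_rate, targeted_rate, relative_lift).
    """
    experiences = list(next(iter(rates.values())))
    # Population-wide rate if every visitor gets the same experience.
    blended = {
        exp: sum(shares[seg] * rates[seg][exp] for seg in rates)
        for exp in experiences
    }
    best_single = max(blended.values())
    # Population-wide rate if each segment gets whichever experience won for it.
    targeted = sum(shares[seg] * max(rates[seg].values()) for seg in rates)
    return best_single, targeted, (targeted - best_single) / best_single


shares = {"new": 0.5, "return": 0.5}

# Return visitors convert far better, but "B" wins for both groups.
same_winner = {
    "new": {"A": 0.02, "B": 0.03},
    "return": {"A": 0.05, "B": 0.06},
}

# Here the winner differs by group, so targeting has something to exploit.
different_winner = {
    "new": {"A": 0.02, "B": 0.03},
    "return": {"A": 0.06, "B": 0.05},
}

print(targeting_lift(same_winner, shares))
print(targeting_lift(different_winner, shares))
```

In the first case the two groups behave very differently, yet targeting adds nothing because the same experience wins for both. In the second case, serving each group its own winner is worth roughly a 12.5% relative lift over the best single experience. The sketch ignores statistical confidence entirely, which a real program would also need to account for.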

The next thing to understand is that in the world of testing, a segment needs to be the definition of the user prior to them entering a campaign. It can have any definition, but that information has to be used prior to them entering. The reason for this is that the fundamental purpose of testing is to evaluate the change in behavior once you make the change in their experience. After they have entered, you are only measuring rates of action; you are no longer measuring something that allows you to make a meaningful change to a user experience. That information may be interesting, but the value comes from our ability to interact with people, and that is done on your initial change. There are uses for tracking things past initial interaction, but that no longer fits the use or definition of a segment.

Where people make mistakes is when they try to think about test data the same way that they do analytics data, and then try to use segments to describe what users do after the change in experience has been made. One of the core differences is that in analytics, you are only looking at a rate of action, so any two groups that have a rate of action difference seem meaningful. In testing, we are looking for our ability to influence behavior, and in order to do that, we change experiences, so we must look at a segment at the point where we differentiate the user experience.

So if a segment is only defined before it interacts with a test, and what most programs track cannot actually provide value, what then should you use for your segments? The first thing that everyone loves to talk about is behavioral segments, and those should definitely be considered, but they must first meet a few criteria to be valuable.

Here are two rules of thumb to make sure you are looking at the right segments:

1) Must be at least 7–10% of your population – The segment must be big enough that the cost to serve a different experience is covered. For larger sites, a smaller group is appropriate; for smaller sites, you might need a larger split to be meaningful.

2) Must be comparable – Just because we can target Google users does not mean we should. Unless Google users and non-Google users need different experiences, targeting that group does not actually improve performance.
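As a rough sketch, the two rules can be written as a single check on one candidate segment. The function name `is_worth_targeting`, its parameters, and the sample numbers are assumptions made up for this example; the 7% default comes from the low end of the range above and should be adjusted to the size of your site.

```python
def is_worth_targeting(seg_visitors, total_visitors, seg_rates, rest_rates,
                       min_share=0.07):
    """Apply both rules of thumb to one candidate segment.

    seg_rates / rest_rates: {experience: conversion_rate} for the segment
    and for everyone outside it (hypothetical test results).
    """
    # Rule 1: the segment must be a large enough share of the population.
    big_enough = seg_visitors / total_visitors >= min_share
    # Rule 2: the segment must need a different experience than the rest.
    seg_winner = max(seg_rates, key=seg_rates.get)
    rest_winner = max(rest_rates, key=rest_rates.get)
    return big_enough and seg_winner != rest_winner


# 12% of traffic, and a different winner than everyone else: worth targeting.
print(is_worth_targeting(1200, 10000,
                         {"A": 0.02, "B": 0.03},
                         {"A": 0.05, "B": 0.04}))
# Same winners, but only 1% of traffic: fails rule 1.
print(is_worth_targeting(100, 10000,
                         {"A": 0.02, "B": 0.03},
                         {"A": 0.05, "B": 0.04}))
```

This treats the observed winner as the true winner, which is a simplification; it says nothing about whether the difference between experiences is statistically meaningful.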

With those two rules in mind, we then need to make sure that we are looking at groups of users that we have enough information about. You may be sure that a person who has come to your site 3 times and looked at 2 product pages and newsletters deserves a different experience, but if that group represents only 1% of your population, no amount of increased behavior is going to make the upkeep cost worthwhile. Keep in mind that the smaller the segment and the more different segments you serve, the higher the upkeep cost in additional creative, program maintenance, and in creating tests in the future. It is easy to get caught up in today’s outcome, but the goal is to create a successful program, not just run a single test. You always have to keep in mind the efficiency of the change, not just the hoped-for increase. It is easy to get excited when we lose the context of the change, but when you look at things from a holistic view, you will find the value of many discoveries is minuscule.

Another key rule with segments is that you must look at everything, not just what you want to win. Behavioral segments are easy to understand, but that does not guarantee they are valuable or, more importantly, the most valuable. For just about every site, it is the things that you don’t think matter and the things you don’t want to do that are the most valuable. The same user is a new visitor, but you also know so many other things about them. You are trying to find the most efficient thing to exploit, not just be able to target on one type of view of that same user. You may want to target them based on being a new visitor, but if it is more valuable for you to change the experience based on time of day, then the new and returning view is not the one you should be targeting. In order to make sure that happens, you have to make sure that you are including segments from all possible ways to look at a user. Remember that you are not trying to find just an exploitable segment, but the most exploitable segment. The goal for any program is never to find an outcome but the best outcome for that use of resources. With that in mind, a good rule of thumb to go with the two rules above is to make sure that all testing efforts have at least 7 segments and no more than 20, just to keep things manageable and to keep you from going down a rabbit hole. Over time, if you find some segment types to be more valuable than others, then you can replace old ones with deeper dives into the new ones.
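One way to picture "most exploitable" is to score each way of slicing the same population by how much targeting along that slice would add. The sketch below does that with made-up numbers; `rank_views`, the view names, and all rates are hypothetical, and the scoring (rate with per-segment winners minus rate with the best single experience) is one simple assumption about how to measure value, with upkeep cost left out.

```python
def rank_views(views):
    """Rank ways of slicing the population by the gain from targeting.

    views: {view_name: {segment: (share, {experience: rate})}}
    Returns [(view_name, gain)] sorted from most to least valuable.
    """
    gains = {}
    for view, segments in views.items():
        experiences = list(next(iter(segments.values()))[1])
        # Best rate achievable with one experience for everyone.
        best_single = max(
            sum(share * rates[exp] for share, rates in segments.values())
            for exp in experiences
        )
        # Rate achievable when each segment gets its own winner.
        targeted = sum(
            share * max(rates.values()) for share, rates in segments.values()
        )
        gains[view] = targeted - best_single
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


views = {
    "new vs returning": {
        "new": (0.5, {"A": 0.038, "B": 0.042}),
        "returning": (0.5, {"A": 0.042, "B": 0.038}),
    },
    "time of day": {
        "work hours": (0.5, {"A": 0.03, "B": 0.05}),
        "non-work hours": (0.5, {"A": 0.05, "B": 0.03}),
    },
}

for view, gain in rank_views(views):
    print(view, round(gain, 4))
```

With these invented numbers, time of day comes out ahead of new versus returning, even though new versus returning is the view most people would reach for first.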

A standard starting segment list might look like this:

New Users
Returning Users
From Google
From Bing
Internet Explorer
Work Hours
Non-Work Hours
Non Mobile

Add 2–3 easily identifiable and large behavioral segments that make sense for your business. With that list in hand, you can discover if browser is more valuable than Google or behavior more valuable than time of day. It is much better to look at higher level segments and then later drill down as you learn, as opposed to choosing the most minute segment, as it is hard to measure efficiencies of scale if you are looking too narrowly.
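A starting list like this can be expressed as a set of rules that are evaluated once, at the moment a visitor enters a campaign, using only what is known about them beforehand. The sketch below is one possible shape for that. The `Visitor` fields, the referrer and browser strings, and the 9-to-5 definition of work hours are all assumptions for illustration, not part of any particular testing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visitor:
    """What is known about a visitor before they enter a campaign."""
    is_new: bool
    referrer: str     # e.g. "google", "bing", "email", "direct" (assumed labels)
    browser: str      # e.g. "ie", "firefox", "chrome" (assumed labels)
    local_hour: int   # 0-23, visitor's local time
    is_mobile: bool


def in_work_hours(visitor):
    # Assumption: work hours are 09:00-16:59 local time, weekdays ignored.
    return 9 <= visitor.local_hour < 17


SEGMENTS = {
    "New Users": lambda v: v.is_new,
    "Returning Users": lambda v: not v.is_new,
    "From Google": lambda v: v.referrer == "google",
    "From Bing": lambda v: v.referrer == "bing",
    "Internet Explorer": lambda v: v.browser == "ie",
    "Work Hours": in_work_hours,
    "Non-Work Hours": lambda v: not in_work_hours(v),
    "Non Mobile": lambda v: not v.is_mobile,
}


def segments_at_entry(visitor):
    """Return every segment the visitor belongs to at campaign entry."""
    return [name for name, rule in SEGMENTS.items() if rule(visitor)]


visitor = Visitor(is_new=True, referrer="google", browser="firefox",
                  local_hour=22, is_mobile=False)
print(segments_at_entry(visitor))
# ['New Users', 'From Google', 'Non-Work Hours', 'Non Mobile']
```

Note that one visitor lands in several segments at once, which is the point: the same user can be viewed many ways, and the test data tells you which view is worth acting on.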

One of the last great challenges then becomes what to do when a segment you didn’t think mattered is the most important. This is actually the case for just about all organizations, but the real key is to remember that it changes nothing. Even if you can’t think in terms of browser mattering more than whether people come from email, ultimately you are just setting rules and then changing the user experience based on them. Why a Firefox user responds to something is far less important than what you do with that information. Inversely, just because you have always believed something to be true, and you can measure massively different rates of action, and experts are telling you about the power of targeting to group X, does not mean that it is necessarily of any value or the most valuable. Being successful and creating the largest value is not about the ability to target but about the ability to target experiences in a way that generates the maximum amount of improved performance for the lowest cost. Worrying about why something matters more can often lead you down a rabbit hole of lost value and misguided efforts. Be open to the data and make sure you are prepared to act on it before it goes live.

The goal for all programs is to discover and create dynamic user experiences that produce a positive outcome. You can find magnitudes more value when you have properly found and exploited opportunities to make relevant user experiences. Where we fail is in thinking we are smarter than the system and in thinking that just because something shows a different rate of action, it must be exploitable. We have to get past what we want to prove, and instead be humble and open to seeing what is truly valuable. Be open to new ideas and be willing to do things you are uncomfortable with or would never have imagined. If you do, then you will change not only the path your performance is on, but also the path your organization is on.