Change the Conversation: Segmentation and Testing
One of the fundamental struggles people have when entering the world of testing is understanding segmentation. Segmentation is a word that just about everyone uses, but its meaning shifts in subtle but important ways depending on which part of the data stream you are talking about. People understand that there are different user groups, but they fail to see that you can look at and act on that information in a massive number of ways. Because of this, people consistently fail to change the way they think about segmentation, instead reverting to whatever way is most comfortable to them.
In the world of testing, people struggle to balance their preconceived notions of what matters with the evaluation of the efficiency and value of each segment. Too many people are so used to concocting a story about their users that they blindly believe their construct without measuring the validity of their story. One of the fundamental truths about changing behavior is that the same user can be looked at in a hundred different ways, and the core challenge of optimization is to figure out the most efficient way to change their experience. Some of the worst mistakes in testing come from people simply serving content without knowing the value of that action. I want to present some simple but effective rules to make sure that you are getting the maximum value from your “personalization” or testing programs.
To start, I want to make sure that we are clear on a definition for segments. In the world of testing, we care about finding exploitable segments: segments that change their behavior when you change their experience, not ones that just have inherently different behaviors. While it is nice and good that you want to serve content to return users, if the same content “wins” for return and new visitors, then you have not actually increased performance. If, however, a different tested experience “wins” for new visitors than for return visitors, then serving each group its own winner creates a positive lift for the overall population. Our quest is never to find groups that have a different behavior, but ones whose behavior improves more when they are given a different experience.
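The distinction between a segment that merely behaves differently and one that is exploitable can be put into numbers. The following is a minimal sketch, not a definitive method: the function name `targeting_gain`, the data layout, and all of the visitor counts are made up for illustration, and it compares raw conversion rates without any check for statistical confidence.

```python
def targeting_gain(results):
    """Estimate the value of serving each segment its own winning experience.

    results[segment][experience] = (visitors, conversions)
    Hypothetical layout; every segment must have seen every experience.
    Returns (overall_winner, winner_per_segment, gain_in_conversion_rate).
    """
    experiences = sorted({e for seg in results.values() for e in seg})

    # Pooled conversion rate of each experience across all segments.
    pooled = {}
    for e in experiences:
        visitors = sum(seg[e][0] for seg in results.values())
        conversions = sum(seg[e][1] for seg in results.values())
        pooled[e] = conversions / visitors
    overall_winner = max(pooled, key=pooled.get)

    # Winning experience inside each segment.
    winners = {
        name: max(seg, key=lambda e: seg[e][1] / seg[e][0])
        for name, seg in results.items()
    }

    # Weight each segment by its share of all visitors in the test.
    total = sum(v for seg in results.values() for v, _ in seg.values())
    untargeted = 0.0
    targeted = 0.0
    for name, seg in results.items():
        share = sum(v for v, _ in seg.values()) / total
        rate = {e: c / v for e, (v, c) in seg.items()}
        untargeted += share * rate[overall_winner]  # everyone gets one winner
        targeted += share * rate[winners[name]]     # each segment gets its own
    return overall_winner, winners, targeted - untargeted


# Made-up data: both segments convert at different rates, and they also
# prefer different experiences, so the split is exploitable.
exploitable = {
    "new":       {"A": (1000, 50), "B": (1000, 70)},
    "returning": {"A": (1000, 90), "B": (1000, 80)},
}

# Made-up data: the segments behave differently, but B wins for both,
# so targeting by this split adds nothing.
not_exploitable = {
    "new":       {"A": (1000, 50), "B": (1000, 70)},
    "returning": {"A": (1000, 80), "B": (1000, 100)},
}

print(targeting_gain(exploitable))
print(targeting_gain(not_exploitable))
```

In the second data set, return visitors still convert far better than new visitors, yet the gain from targeting is zero, which is exactly the case of a different behavior that is not an exploitable one.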
The next thing to understand is that in the world of testing, a segment needs to be defined by what you know about the user before they enter a campaign. It can have any definition, but that information has to be available and used before they enter. The reason for this is that the fundamental purpose of testing is to evaluate the change in behavior once you make the change in their experience. After they have entered, you are only measuring rates of action; you are no longer measuring something that allows you to make a meaningful change to a user experience. That information may be interesting, but the value comes from our ability to interact with people, and that happens at the initial change. There are uses for tracking things past the initial interaction, but that no longer fits the use or definition of a segment.
People make mistakes when they think about test data the same way that they think about analytics data, and then try to use segments to describe what users do after the change in experience. One of the core differences is that in analytics, you are only looking at a rate of action, so any two groups that have a rate of action difference seem meaningful. In testing, we are looking for our ability to influence behavior, and in order to do that, we change experiences, so we must look at a segment at the point we differentiate the user experience.
So if a segment is only defined before it interacts with a test, and what most programs track cannot actually provide value, what then should you use for your segments? The first thing that everyone loves to talk about is behavior segments, and those should definitely be considered, but they must first meet a few criteria to be valuable.
Here are two rules of thumb to make sure you are looking at the right segments:
1) Must be at least 7–10% of your population – Must be big enough that the cost to serve a different experience is covered. For larger sites, a smaller group is appropriate; for smaller sites, you might need a larger split to be meaningful.
2) Must be comparable – Just because we can target Google users does not mean we should. Unless non-Google users need a different experience, targeting that group does not actually improve performance.
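The size rule is the easier of the two to check mechanically. Here is a minimal sketch, assuming you already have a visitor count for each candidate segment; the function name `eligible_segments`, the segment names, and the counts are all hypothetical, and the 7% default simply takes the low end of the range in rule 1. The comparability rule is not covered here, because it can only be answered by test results.

```python
def eligible_segments(segment_visitors, total_visitors, min_share=0.07):
    """Keep only segments that make up at least min_share of the population.

    segment_visitors: dict of segment name -> visitor count (hypothetical).
    total_visitors: size of the whole population. It is passed separately
    because segments from different views of the user overlap, so their
    counts cannot simply be added together.
    Returns a dict of segment name -> share for the segments that pass.
    """
    shares = {
        name: count / total_visitors
        for name, count in segment_visitors.items()
    }
    return {name: share for name, share in shares.items() if share >= min_share}


# Made-up counts for a site with 100,000 visitors in the period.
candidates = {
    "new_visitors": 60000,
    "firefox": 12000,
    "three_visits_two_product_pages_newsletter": 1000,  # only 1%
}

kept = eligible_segments(candidates, total_visitors=100000)
print(sorted(kept))
```

A larger site could lower `min_share` and a smaller site could raise it, in line with the note under rule 1.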
With those two rules in mind, we then need to make sure that we are looking at groups of users that we have enough information about. You may be sure that a person who comes to your site 3 times and has looked at 2 product pages and newsletters is important, but if that group represents only 1% of your population, no amount of increased behavior is going to make the upkeep cost worthwhile. Keep in mind that the smaller the segment and the more different segments you serve, the higher the upkeep cost in additional creative, program maintenance, and in creating tests in the future. It is easy to get caught up in today’s outcome, but the goal is to create a successful program, not just run a single test. You always have to keep in mind the efficiency of the change, not just the hoped-for increase. It is easy to get excited when we lose the context of the change, but when you look at things from a holistic view, you will find the value of many discoveries is minuscule.
Another key rule with segments is that you must look at everything, not just what you want to win. Behavioral segments are easy to understand, but that does not guarantee they are valuable or, more importantly, the most valuable. For just about every site, it is the things that you don’t think matter and the things you don’t want to do that are the most valuable. The same user is a new visitor, but you also know so many other things about them. You are trying to find the most efficient thing to exploit, not just to be able to target on one view of that same user. You may want to target them based on being a new visitor, but if it is more valuable for you to change the experience based on time of day, then the new and returning view is not the one you should be targeting. In order to make sure that happens, you have to include segments from all possible ways to look at a user. Remember that you are not trying to find just an exploitable segment, but the most exploitable segment. The goal for any program is never to find an outcome but the best outcome for that use of resources. With that in mind, a good rule of thumb to go with the two rules above is to make sure that all testing efforts have at least 7 segments and no more than 20, just to keep things manageable and to keep you from going down a rabbit hole. Over time, if you find some segment types to be more valuable than others, then you can replace old ones with deeper dives into the new ones.
A standard starting segment list might look like this:
Add 2–3 easily identifiable and large behavior segments that make sense for your business. With that list in hand, you can discover if browser is more valuable than Google or behavior more valuable than time of day. It is much better to look at higher level segments and then later drill down as you learn, as opposed to choosing the most minute segment, as it is hard to measure efficiencies of scale if you are looking too narrowly.
One of the last great challenges then becomes what to do when a segment you didn’t think mattered is the most important. This is actually the case for just about all organizations, but the real key is to remember that it changes nothing. Even if you can’t think in terms of browser mattering more than people who come from email, ultimately you are setting rules and then changing the user experience based on them. Why a Firefox user responds to something is far less important than what you do with that information. Inversely, just because you have always believed something to be true, and you can measure massively different rates of action, and experts are telling you about the power of targeting to group X, does not mean that it is necessarily of any value or the most valuable. Being successful and creating the largest value is not about the ability to target but about the ability to target experiences in a way that generates the maximum amount of improved performance for the lowest cost. Worrying about why something matters more can often lead you down a rabbit hole of lost value and misguided efforts. Be open to the data and make sure you are prepared to act on it before it goes live.
The goal for all programs is to discover and create dynamic user experiences that produce a positive outcome. You can find magnitudes more value when you have properly found and exploited opportunities to make relevant user experiences. Where we fail is in thinking we are smarter than the system and in thinking that just because something shows a different rate of action, it must be exploitable. We have to get past what we want to prove, and instead be humble and open to seeing what is truly valuable. Be open to new ideas and be willing to do things you are uncomfortable with or would never have imagined. If you do, then you will change not only the path your performance is on, but also the path your organization is on.