After your optimization program has been running for a while, there will naturally come times when you get to share the success that running your program correctly has generated. Often groups are so happy to get to brag that they don’t realize they are causing the program long-term problems.

There are many keys to sharing results correctly, but the most important thing to remember is why you are sharing them. Yes, you are talking to others about how great you are, but ultimately results are not about the past, but about the future. Should others invest in the program? Should they expand where and what you optimize? How do others think about and interact with optimization? And most importantly, why should they listen to you and your team when it comes to how and why you should be running certain tests? The critical moments come not from the sharing of results, but from the framing of why and how you got those results, and what that can do for other parts of the organization. Focus on the wrong parts, and you are asking for far more trouble than most can imagine.

Some of the worst moments for programs come after big results presentations, where the team has gotten buy-in from others and a massive expansion of the program, but has failed to share the right message and to help others understand that testing is often not what they think it is. These moments inevitably lead to large resource drains, negative impressions, and massive time sinks for the original testing group, leading them to frustration and less total revenue generation than before the influx of resources. Looking back 6 or 12 months later, you can easily see the moment that things went south.

With all of those concerns in mind, here are the keys to successfully sharing results within your organization.

DO – Focus on what you got by proving assumptions wrong

It is so fun to talk about getting an 8% lift, or getting a 20% lift on multiple tests, but often the lift is secondary to the discipline that led you to get it. If you are testing to find the optimal use of resources, then it is inevitable that you will find many times when popular assumptions have been proven wrong. As you share your results, it is important that this is the primary part of the message. It is not about “we got a 15% lift”, it is about “everyone wanted to do X, which would have generated a 3% lift, but we found that of these 5 feasible alternatives, doing Y was dramatically better and generated a 15% lift, or 12 percentage points more than we would have gotten if all we did was test what people thought was going to win.”

DON’T – Report test results as a single revenue number

It is so fun and easy to report results as, “the test generated 6.2 million in additional revenue.” The problem is that there is absolutely no way to know specifics with accuracy, and you will lose trust if the P&L later does not show that exact figure of gain. I understand how impressive it is to point to a single number and how much credit it can get you, but ultimately it is far more damaging than the temporary good it might generate.

Even if we ignore all the real-world problems with confidence, it is important to understand that confidence and most similar measures only measure the likelihood of a pattern, not the actual outcome. If I have a 10% lift and 96% confidence, I am not 96% confident that I will get a 10% lift, only 96% confident that the measured experience will beat control. Confidence intervals can also be tricky because of the many assumptions behind the Gaussian bell curves that they are based on.

Instead, focus on reporting tests as a range, based on a preset set of multipliers. What that range is is somewhat arbitrary, as long as it is large enough to convey the massive range of possibility. If I have not done deep analysis of past results, I will often report test results in a 50% to 200% range, so that the 6.2 million becomes an expected outcome of 3.1 to 12.4 million dollars. There are also ways to look back at results over time and derive an expected range. Express everything in a large but relevant range and you will avoid the massive credibility problems that falsely precise reporting creates.
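
The range arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `revenue_range` and its parameter names are made up for this example, and the 0.5 and 2.0 multipliers are simply the 50% to 200% default mentioned above, not a statistically derived interval.

```python
def revenue_range(point_estimate, low_factor=0.5, high_factor=2.0):
    """Turn a single projected revenue figure into a (low, high) reporting range.

    The factors are an assumed, preset convention (50% to 200% of the
    point estimate); swap in factors derived from your own past results
    if you have them.
    """
    return point_estimate * low_factor, point_estimate * high_factor


# The 6.2 million example from the text.
low, high = revenue_range(6_200_000)
print(f"Expected outcome: ${low / 1e6:.1f}M to ${high / 1e6:.1f}M")
```

Reporting the pair rather than the point estimate keeps the conversation on scale instead of on whether one exact figure later shows up in the P&L.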

DO – Report all tests based on revenue impact

While you can’t report an absolute number, that does not mean that you should not be reporting the fiscal impact of a test. Translating all tests to a revenue figure gives you the ability to express your efforts as they impact the bottom line, while also giving you the ability to rationally compare results across tests. Being able to look at tests as having bottom-line impact, or not, allows others to see the scale of change and the efficiencies that testing can bring to the rest of the organization. Focusing on other things like clicks, funnels, opinions and the like will distract from the core message and devalue current and future efforts.

Even if you are not a retail site, you can translate leads or page views to revenue using an average value per lead or a CPM. Revenue also serves the purpose of making you evaluate your single success metric to ensure that it is tied to the purpose of your site and that you are not getting caught up on side goals that do not impact the bottom line.
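
As a rough sketch of that translation, the snippet below converts leads and page views into revenue figures. The function names and all of the numbers are hypothetical, chosen only to show the arithmetic; the one fixed convention is that CPM is a rate per thousand page views.

```python
def lead_revenue(leads, avg_value_per_lead):
    """Estimated revenue from leads, using an assumed average value per lead."""
    return leads * avg_value_per_lead


def pageview_revenue(page_views, cpm):
    """Estimated ad revenue from page views; CPM is the rate per 1,000 views."""
    return page_views / 1000 * cpm


# Hypothetical inputs, for illustration only.
print(f"Leads:      ${lead_revenue(400, 75.0):,.0f}")
print(f"Page views: ${pageview_revenue(2_000_000, 12.5):,.0f}")
```

Once every test is expressed this way, a lead-generation test and an ad-supported content test can be compared on the same scale.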

DON’T – Forget that you are measuring gross revenue, not net revenue

Except in rare circumstances, most groups end up measuring gross revenue when it comes to the impact to the business. This makes numbers seem much larger than they really are, and it often leads groups to overestimate their impact on the business as a whole. If you cannot express impact based on pure revenue generation, at least make it clear what numbers and assumptions you are using and what you expect the entire program to deliver to the bottom line. Nothing kills credibility faster than numbers that no rational executive can believe.

DO – Report on the scale of impact of various tests

So much is missed if we do not look at patterns across tests. One of the critical things for groups to understand is that lift by itself does not tell you revenue. A smaller population measured with a very large lift is often worth far less than a larger population with a much smaller lift. If you are translating all tests to revenue, then you can easily figure out where you have been able to generate the most revenue, not necessarily the most lift. You have a direct way of saying that this type of test produces 4x what another type does. This active data acquisition is what allows you to plan ahead and increase resource efficiency in the future, and it becomes vital for the long-term growth of a program. Often the lessons learned here really shape how people look at the impact of various channels. This type of analysis also helps people start to understand the differences between revenue allocation and revenue generation.
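
To make the lift-versus-revenue point concrete, here is a small sketch that ranks tests by estimated incremental revenue instead of by lift. The segment names, baseline revenue figures, and lifts are all invented for illustration, and `incremental_revenue` is a hypothetical helper, not part of any testing tool.

```python
def incremental_revenue(baseline_revenue, lift):
    """Estimated added revenue: the segment's baseline revenue times the lift."""
    return baseline_revenue * lift


# Hypothetical tests: (name, baseline revenue of the tested population, lift).
tests = [
    ("small segment, big lift", 200_000, 0.30),
    ("large segment, small lift", 5_000_000, 0.03),
]

# Rank by revenue generated, not by lift.
ranked = sorted(tests, key=lambda t: incremental_revenue(t[1], t[2]), reverse=True)
for name, baseline, lift in ranked:
    gain = incremental_revenue(baseline, lift)
    print(f"{name}: {lift:.0%} lift, about ${gain:,.0f}")
```

With these made-up numbers, the 3% lift on the large population is worth about 2.5 times the 30% lift on the small one, which is the kind of comparison a lift-only report hides.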

DON’T – Forget that the most valuable result from tests is not the lift

It is vital that over time you start to get a deep causal understanding of how much you can influence various parts of the user experience, as well as various user groups, and what it costs to do so. While it is fun to talk about the revenue impact, knowing that in 4 out of 5 places changing content did not have much of an impact compared to spatial changes completely changes how other parts of the business operate. These lessons about where you have been able to make an impact, how, and what it took to do it can help shape entire product roadmaps and help drive exponential revenue generation in the future.

There is no better time to express that testing is not just a list of actions, but an active acquisition of knowledge, than when you talk about results. Failing to look at these patterns across tests, and failing to use them as a way to filter your other data, can lead to massively inefficient uses of resources. Your program is worth far more than the individual actions you take, so why would you allow others to overly focus on tests when it is the act of optimization that drives the largest value opportunities? Make this the focus: what you learned, how, why, and what the impact was. Do that and you will be able to make others see what testing can really do for them.

There is no better way to see where a program really stands than to see how it communicates results. You can tell how efficient the team has been, how they work with other groups, and most of all how much the personal ego of the people on both ends of the presentation gets in the way of real, meaningful results. If you think about and focus on the right parts of expressing results, you will be able to move forward and really change your organization. Nothing drives others to want to invest in and expand a program’s impact like showing that it improves every other part of the business. Focus on just the lift and just the numbers, and you are setting yourself and others up for failure.