One of the great struggles marketers have when they enter new realms, especially those of analytics and testing, is trying to apply the disciplines of math to what they are doing. They are amazed by the promise of models and of applying a much more stringent discipline than the normal qualitative discussions they are used to. The problem is that most marketers are not PhDs in statistics, nor have they really worked with the math as applied to their real-world issues. We have all this data and this promise of power before us, but most lack the discipline to interact with the data and really derive value from it. In this series, I want to explain some of the math concepts that impact daily analysis, especially those that a majority of people do not realize they are struggling with, and show you how and where to use them, as well as their pragmatic limitations.

In the first of these, I want to introduce the N-Armed bandit problem, as it is really at the heart of all testing programs and is a fundamental evaluation of the proper use of resources.

The N-Armed bandit problem, also called the multi-armed bandit problem (a “one-armed bandit” being a nickname for a slot machine), is the fundamental problem of balancing the acquisition of new knowledge against the exploitation of what you already know for gain. The concept goes like this:

You walk into a casino with N slot machines. Each machine has a different payout. If the goal is to walk away with the most money, then you need to go through a process of figuring out which slot machine has the highest payout, yet keep as much money back as possible in order to exploit that machine. How do you balance the need to test out the payouts from the different machines while reserving as much money as possible to put into the machine with the greatest payout?
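To make the casino concrete, here is a minimal sketch of the simplest possible answer: spend a fixed slice of your money sampling every machine equally, then put everything left into the one that looked best. The function names, the win-or-nothing payouts, and the budget figures are all assumptions made for illustration; real payouts (and real test results) are noisier than this.

```python
import random

def pull(payout_rate, rng):
    """One pull of a hypothetical machine: pays 1 with probability payout_rate, else 0."""
    return 1 if rng.random() < payout_rate else 0

def explore_then_commit(payout_rates, budget, pulls_per_machine, rng):
    """Sample every machine equally, then spend the rest on the best-looking one."""
    n = len(payout_rates)
    if pulls_per_machine * n > budget:
        raise ValueError("exploration would use more than the whole budget")

    wins = [0] * n
    total = 0

    # Exploration: the price paid to learn the relative payouts.
    for i in range(n):
        for _ in range(pulls_per_machine):
            reward = pull(payout_rates[i], rng)
            wins[i] += reward
            total += reward

    # Exploitation: commit the remaining budget to the observed winner.
    best = max(range(n), key=lambda i: wins[i])
    remaining = budget - pulls_per_machine * n
    for _ in range(remaining):
        total += pull(payout_rates[best], rng)

    return best, total

if __name__ == "__main__":
    # Hypothetical payout rates; in real life these are exactly what you do not know.
    rates = [0.05, 0.10, 0.20]
    best, total = explore_then_commit(rates, budget=1000, pulls_per_machine=50,
                                      rng=random.Random(42))
    print("committed to machine", best, "and won", total)
```

The tension in the question above lives in `pulls_per_machine`: set it too low and you may commit to the wrong machine, set it too high and there is little money left to exploit the right one.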

Which one do you choose?

Exploring the casino

As we dive into the real-world application of this concept, it is important that you walk away with a clear understanding of why it matters to you. An evaluation of the N-Armed bandit problem and how we interact with it in the real world leads to two main goals:

1) Discovery of the relative value of actions

2) The most efficient use of resources for this discovery and for exploitation

The N-Armed bandit problem is at the core of machine learning and of testing programs, and does not have a one-size-fits-all answer. There is no perfect way to learn and to exploit, but there are a number of well-known strategies. In the real world, where the system is constantly shifting and the values are constantly moving, it gets even more difficult, but that does not make it any less valuable. All organizations face the fundamental struggle of how best to apply resources, especially between doing what they are already doing and exploring new avenues or functional alternatives. Do you put resources where you feel safe, where you think you know the values? Or do you use them to explore and find out the value of other alternatives? The tactics used to solve the N-Armed bandit problem come down to how greedy you try to be, and they give you ways to think about applying those resources. Where most groups falter is when they fail to balance those two goals and become lost in their own fear, egos, or biases, either diving too deep into “trusted” outlets or going too far down the path of discovery. The challenge is trying to keep to the rules of value and of bounded loss.
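One of those well-known strategies, usually called epsilon-greedy, makes “how greedy you try to be” a literal dial. The sketch below is a minimal illustration under stated assumptions, not a recommendation for any particular testing tool: payouts are win-or-nothing, the payout rates never change, and the names `epsilon_greedy`, `payout_rates`, and `epsilon` are mine.

```python
import random

def epsilon_greedy(payout_rates, pulls, epsilon, rng):
    """With probability epsilon explore a random machine; otherwise play the best estimate."""
    n = len(payout_rates)
    counts = [0] * n        # how often each machine has been played
    estimates = [0.0] * n   # running average payout observed per machine
    total = 0

    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                            # explore
        else:
            arm = max(range(n), key=lambda i: estimates[i])   # exploit (greedy)

        # Hypothetical machine: pays 1 with probability payout_rates[arm].
        reward = 1 if rng.random() < payout_rates[arm] else 0

        counts[arm] += 1
        # Incremental update of the average for the machine just played.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward

    return estimates, counts, total

if __name__ == "__main__":
    rates = [0.05, 0.10, 0.20]  # assumed values, unknown to the player
    for eps in (0.0, 0.1, 0.5):
        _, counts, total = epsilon_greedy(rates, pulls=1000, epsilon=eps,
                                          rng=random.Random(7))
        print("epsilon", eps, "plays per machine", counts, "winnings", total)
```

Setting `epsilon` to 0 is the “safe” organization: with payout rates of `[0.0, 1.0]` it plays the first machine forever and never learns that the second one always pays. Setting it to 1 is the opposite failure, exploring endlessly and never cashing in on what it has learned.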

The reason this problem comes into play for all testing programs is that the entire need for testing is the discovery of the values of each variant, or each concept, relative to one another. If you are not allowing this question to enter your testing, then you are only ever throwing resources towards what you assume is the value of a change. Knowing just one outcome can never help you be efficient. If you throw all your money into one slot machine, how do you know what you could have gotten from the others? While it is easy to convince yourself that you did the right thing because you did get a payout, the evaluation of the different payouts is the heart of improving your performance. You have to focus on applying resources, and for all groups there is a finite amount of resources, to achieve the highest possible return.

In an ideal world, you would already know all possible values, be able to call the value of each action intuitively, and then apply all your resources towards the one action that gives you the greatest return (a greedy action). Unfortunately, that is not the world we live in, and the trouble starts when we allow ourselves that delusion. The reality is that we do not know the value of each outcome, and as such we need to maximize our ability to discover it.
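For readers who like symbols, the idea of a greedy action can be written down in the notation commonly used for bandit problems. The symbols are an assumption on my part, since nothing else in this post uses them: q(a) is the true expected payout of action a, and Q_t(a) is your estimate of it after t trials.

```latex
q(a) = \mathbb{E}[R \mid A = a], \qquad
a^{*} = \arg\max_{a} q(a), \qquad
a_t^{\text{greedy}} = \arg\max_{a} Q_t(a)
```

The ideal world is the special case where Q_t(a) = q(a) for every action, so the greedy action is also the best action. Testing exists because Q_t is only ever an approximation of q, and for any action you have never tried it is no more than a guess.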

If the goal is to discover what the value of each action is, and then exploit it, then fundamentally the challenge is how best to apply the least amount of resources, in this case time and work, to the discovery of the greatest number of relative values. The challenge becomes one purely of efficiency. We have to create a meaningful testing system and efficiencies in our organization, whether political, infrastructural, or technical, in order to minimize the amount of resources we spend and to maximize the number of variations that we can evaluate. Every time we get sidetracked, or we do not run a test that has this goal of exploring at its heart, or we pretend we have a better understanding of the value of things via the abuse of data, we are being inefficient and are failing in this quest for the highest possible value. The goal is to create a system that allows you to facilitate this need, to measure each value against the others, to discover and to exploit, in the shortest time and with the least amount of resources.

An example of a suboptimal design for testing based on this is any single-recipe “challenger” test. Ultimately, any “better” test is going to limit your ability to see the relative values. You want to test out your banner on your front door, but how do you know that it is more important than your other promos? Or your navigation, or your call to action? Just because you have found an anomaly or pattern in data, what does that mean relative to the other alternatives? If you only test or evaluate one thing by itself, or don’t test out feasible options against each other, then you will never know the relative value of those actions. You are just putting all your money into one slot machine, not knowing if it has a higher payout than the others near it.

This means that any action taken by a system that limits the ability to measure values against each other, or that does not allow you to measure values in context, or that does not acknowledge the cost of that evaluation, is inefficient and is limiting the value of the data. Anything that does not directly give you the fastest way to figure out the payouts of the different slot machines is losing you value. It also means that any action that requires additional resources for that discovery is suboptimal.

If we have accepted that we have to be efficient in our testing program, we still have to deal with the greatest limiter of impact: the people in the system. Every time we are limited only to “best practices” or by a HiPPO (the highest paid person’s opinion), we have lowered the possible value we can receive. Some of the great work by students of probability, especially Nassim Nicholas Taleb, has argued that for systems, over time, the more human-level interaction there is, or the less organic the system is allowed to be, the lower the value and the higher the pain we create.

Comparing organic versus inorganic systems:

[Figure: Taleb, Value of a System]

We can see that for any inorganic system, one that has all of those rules forced onto it, over time there is a lot less unpredictability than people think, and that there is almost a guarantee of loss of value for each rule and for each assumption that is entered into that system. One of the fastest ways to improve your ability to discover the various payouts is to have an understanding of just how many slot machines are before you. Every time that you think you are smarter than the system, or you get caught up in “best practices” or popular opinion, you have forced a non-organic limit into the system. You have artificially said that there are fewer machines available to you. This means that for the discovery part of the system, and for the good of our program and the value it gains, we must limit human subjectivity and rules in order to ensure the highest amount of value.

An example of these constraints is any hypothesis-based test. If you are limiting your possible outcomes to only what you “think” will win, you will never be able to test out everything that is feasible. Just because you hear a “best practice” or someone has a golden idea, you still have to make sure that you are testing it relative to other possibilities, and you cannot let it impact your evaluation of the data. It is OK to have an idea of what you think will win going in, but you cannot limit yourself to that in your testing. That is the same as walking up to the slot machine with the flashiest lights, just because the guy next to you said to, and only putting your money in that machine.

Everyone always says the house wins, and in Vegas that is how it works. In the real world, the deck may look stacked against you, but that does not mean that you are going to lose. Once you understand the rules of the game and can think in terms of efficiency and exploitation, you have the advantage. If you can agree that at the end of the day your goal is to walk out of that casino with the largest stack of bills possible, then you have to focus on learning and exploiting. The odds really aren’t stacked against you here, but the only way to really win this game is to be willing to play it the right way. Do you choose the first and flashiest machine? Or do you want to make money?