This concept seems a little silly on the face of things: Not all numbers are equal. Of course not all numbers are equal; if they were, I’d be paying the same amount at the gas pump every day, no matter what day of the week it was or what the sign said. In fact, there would not even be a need for a sign at all. Numbers have a value; it’s what we made them for, and how we define their value and what we equate them to is what makes things interesting.

Part of the process of predictive analytics is forecasting what will come to be based on what has passed. So let’s say we have five years’ worth of data of every sort in relation to Internet visits and conversions to sales. By making simple correlations (remember the poisoning knitters?), we are able to see pieces of data right away. For instance, there could be a direct, positive correlation between users who look at ratings and reviews and conversions, meaning that a certain percentage of users who read a review for a product are likely to buy it. That seems pretty straightforward. So why would anyone ever need to graph anything? It just so happens that the data does not always behave the way its numbers look on paper, and that is exactly what Anscombe’s Quartet demonstrates.

Back in college I ate a lot of Ramen Noodles; most college students do. One of the many “life skills” one learns in the college years is what I call “food math.” Most students can tell you without hesitation how many packets of Ramen Noodles they can afford at nearly any given time. Still, every so often, to avoid scurvy, it is a good idea to eat some fruit. This is simply the economics of opportunity cost; if I wanted fruit, I’d have to give up some Ramen meals. But other factors play a role as well, such as time and perhaps cash rebates associated with eating more noodles than any other living human being.

The table below shows all the factors in fruit and noodles that make up our datasets. You can see that each dataset consists of 11 (x, y) points. You can also see that the means, variances, correlations, and regression lines all come out just as one would expect mathematically; in Anscombe’s Quartet they are, in fact, nearly identical from one dataset to the next. What happens when we graph them, though?

[Image: aq1 (table of the datasets and their summary statistics)]
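For readers who want to check the “identical on paper” claim themselves, here is a minimal sketch in Python. It assumes the values behind the table are the classic ones Anscombe published in 1973; the table in this post survives only as an image, so that is an assumption rather than something confirmed here. The names `QUARTET`, `summarize`, and `SUMMARIES` are made up for this illustration.

```python
from math import sqrt

# ASSUMPTION: these are the classic Anscombe (1973) values, taken to be
# the same numbers as in the table above.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return mean, sample variance, Pearson r, and least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),
        "var_y": syy / (n - 1),
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, {key: round(value, 2) for key, value in stats.items()})
```

With those values, all four datasets land on essentially the same summary: a mean x of 9, a mean y of about 7.5, a correlation of roughly 0.82, and a fitted line close to y = 3 + 0.5x. Nothing in that printout hints that the four sets look completely different once plotted.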

Graphing the data shows how different points relate to one another, helping us to visually identify anomalous data, or “outliers.” On paper, these numbers all appear to correlate, some negatively and some positively, across the spectrum, but once graphed, it becomes apparent that there is something else going on here. Finding these outliers means that we can identify issues before they become more impactful and “solve the mystery.”

What’s better is that we can identify opportunities as well. In this case, the graph shows that there is something going on with cash rebates in relation to Ramen Noodles. It could be something as simple as this: when I ordered 13 cases of Ramen, I received a rebate. If this is the case, that gives me a starting place in figuring out if ordering 13 cases is worth giving up some fruit, if there’s a time limit on the rebate, if I have to order 13 cases all at once to take advantage of the opportunity, or any number of other factors. In essence, that is the point of Anscombe’s Quartet: giving those working in marketing and analytics a starting point when looking for anomalous data.

[Image: aq2]
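The eye does this job best, but the same “which point sticks out?” question can be roughed out in code. The sketch below fits a straight line to one dataset and flags any point whose distance from that line is more than twice the typical miss. It assumes the classic third Anscombe dataset stands in for the Ramen numbers, and the function names and the cutoff of 2 are arbitrary choices for illustration, not anything a particular analytics tool is known to use.

```python
from math import sqrt

# ASSUMPTION: the classic Anscombe dataset III, standing in for the
# Ramen-versus-rebate numbers discussed above.
XS = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
YS = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_outliers(xs, ys, cutoff=2.0):
    """Return the (x, y) points lying more than `cutoff` residual
    standard errors away from the fitted line (hypothetical helper)."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual standard error: typical vertical miss around the line.
    spread = sqrt(sum(r * r for r in residuals) / (len(xs) - 2))
    return [(x, y) for x, y, r in zip(xs, ys, residuals)
            if abs(r) > cutoff * spread]


FLAGGED = flag_outliers(XS, YS)

if __name__ == "__main__":
    print(FLAGGED)
```

On this dataset, the only point flagged is the one at x = 13, which is the same place the rebate question comes from. A check like this is a starting point, not a verdict: it would miss the curved pattern in the second dataset entirely, which is why the graph still matters.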

By correlating literally thousands of user-defined metrics and then graphing them, big data becomes manageable… even useful. Tools such as Adobe Analytics Premium produce a visual representation based on Anscombe’s Quartet as well as a numeric value. Both graphs and tables are highly customizable, so marketers can literally modify how they see data. This lets marketers observe and interpret datasets in a way that is more complete and versatile. The end result is that the marketer can predict trends more accurately and take advantage of current trends more quickly. For college students it means more Ramen all around… or perhaps less if they can figure out how to get those rebates.

In my next post I will relay a real-life case study to show how combining the visual and mathematical can yield research results. All of this is yet to come in “Analytics Case Study (Or All the Names Have Been Changed to Protect My Income).”
