Anscombe’s Quartet (Or Not All Numbers Are Equal)
This concept seems a little silly on the face of things: Not all numbers are equal. Of course not all numbers are equal; if they were I’d be paying the same amount at the gas pump every day, no matter what day of the week it was or what the sign said. In fact, there would not even be a need for a sign at all. Numbers have a value; it’s what we made them for, and how we define their value and what we equate them to is what makes things interesting.
Part of the process of predictive analytics is forecasting what will come to be based on what has passed. So let’s say we have five years’ worth of data of every sort relating to Internet visits and conversions to sales. By making simple correlations (remember the poisoning knitters?), we can spot relationships in the data right away. For instance, there could be a direct, positive correlation between users who look at ratings and reviews and conversions, meaning that users who read a review for a product are more likely to buy it. That seems pretty straightforward. So why would anyone ever need to graph anything? It just so happens that numbers that look the same on paper do not always behave the same way, and that is exactly what Anscombe’s Quartet illustrates.
Back in college I ate a lot of Ramen Noodles; most college students do. One of the many “life skills” one learns in the college years is what I call “food math.” Most students can tell you without hesitation how many packets of Ramen Noodles they can afford at nearly any given time. Every so often, though, to avoid scurvy, it is a good idea to eat some fruit. This is simply the economics of opportunity cost; if I wanted fruit, I’d have to give up some Ramen meals. But other factors play a role as well, such as time and perhaps cash rebates associated with eating more noodles than any other living human being.
The table below shows the fruit and noodle figures that make up our datasets. Each dataset consists of 11 (x, y) points. The means, variances, correlations, and regression lines come out nearly identical from one dataset to the next, so on paper the datasets look interchangeable. What happens when we graph them, though, the way Anscombe’s Quartet does?
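To make the "identical on paper" point concrete, here is a minimal sketch that computes the summary statistics for all four datasets. It assumes the table holds the values Anscombe originally published, which this post does not confirm; the names `QUARTET` and `summarize` are made up for illustration.

```python
from statistics import mean, variance

# Anscombe's published values. Assumption: the table in this post uses the same numbers.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sample variances, Pearson r, and the least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(xs),
        "var_y": variance(ys),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Under that assumption, every dataset prints nearly the same line: a mean of 9 for x and about 7.5 for y, a correlation of about 0.82, and a fitted line close to y = 3 + 0.5x. Nothing in those numbers hints that the four datasets look completely different once plotted.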
Anscombe’s Quartet shows how graphing the points reveals the way they relate to one another, helping us to visually identify anomalous data, or “outliers.” On paper these numbers all appear to correlate in the same way, but once graphed, it becomes apparent that there is something else going on here. Finding these outliers means that we can identify issues before they become more impactful and “solve the mystery.”
What’s better is that we can identify opportunities as well. In this case the graph shows that there is something going on with cash rebates in relation to Ramen Noodles. It could be something as simple as this: when I ordered 13 cases of Ramen, I received a rebate. If this is the case, that gives me a starting place in figuring out if ordering 13 cases is worth giving up some fruit, if there’s a time limit on the rebate, if I have to order 13 cases all at once to take advantage of the opportunity, or any number of other factors. In essence, that is the point of Anscombe’s Quartet: it gives those working in marketing and analytics a starting point when looking for anomalous data.
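A graph makes the 13-case point jump out at a glance, but the same check can be roughed out in code. The sketch below fits a straight line to one dataset and flags any point that sits unusually far from it. It assumes the data are the third set from Anscombe's published quartet; the names `cases`, `value`, and `flag_outliers`, and the cutoff of two standard deviations, are illustrative choices rather than anything a particular tool prescribes.

```python
from statistics import mean, pstdev

# Dataset III of Anscombe's quartet. Assumption: x is cases of Ramen ordered,
# y is the measured value; this post does not publish its own numbers.
cases = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
value = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def flag_outliers(xs, ys, threshold=2.0):
    """Return (x, y, residual) for points far from the fitted line.

    A point is flagged when its residual exceeds `threshold` times the
    standard deviation of all residuals (a rough rule of thumb).
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    return [
        (x, y, res)
        for x, y, res in zip(xs, ys, residuals)
        if abs(res) > threshold * spread
    ]


for x, y, res in flag_outliers(cases, value):
    print(f"{x} cases: value {y} is {res:.2f} above the fitted line")
```

With these assumed numbers the only point flagged is the one at 13 cases. The check says nothing about why that point is unusual; that part still takes a person asking about rebates.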
By correlating literally thousands of user-defined metrics and then graphing them, big data becomes manageable… even useful. Tools such as Adobe Analytics Premium produce a visual representation based on Anscombe’s Quartet as well as a numeric value. Both graphs and tables are highly customizable, so marketers can literally change how they see the data. This lets marketers observe and interpret datasets in a way that is more complete and versatile. The end result is that the marketer can predict trends more accurately and take advantage of current trends more quickly. For college students it means more Ramen all around… or perhaps less if they can figure out how to get those rebates.
In my next post I will relay a real-life case study to show how combining the visual and the mathematical can yield research results. All of this is yet to come in “Analytics Case Study (Or All the Names Have Been Changed to Protect My Income).”