One of the most difficult concepts to explain is also the subject of one of the most frequent questions that my colleagues and I receive from SiteCatalyst users. There are several variations of this question, each with roughly the same answer:

  • Why doesn’t the sum total of visits from each line item in the Pages report add up to the visit total at the bottom of the report?
  • Why doesn’t the sum total of orders from each line item in the Products report add up to the order total at the bottom of the report?
  • Why doesn’t the sum total of [any success metric] from each line item in my merchandising eVar report add up to the total at the bottom of the report?

When users ask this question (in any of its forms), I typically explain that the report in question involves a one-to-many relationship between the metric being viewed and the line items in the report. But this explanation can be difficult to grasp. I have been trying to come up with an analogy to help explain this phenomenon, and I think I’ve got one. I’m hoping it will clarify this behavior for you, or will help you better explain it to the users at your organization.

There are few things that I enjoy more than relaxing on my couch on an autumn weekend (in between home improvement tasks requested/mandated by my wife, of course) and watching football from all across America. Being the sports geek that I am, I often have my laptop by my side so I can check scores and stats from games that I’m not watching. I mention this because, as the title of this blog post suggests, there’s an apparent statistical anomaly in football that parallels this behavior in SiteCatalyst.

When a quarterback throws for a touchdown, someone has to catch the pass, usually a wide receiver. When this happens, the quarterback’s numbers reflect that he threw for a touchdown. At the same time, the wide receiver’s numbers show one touchdown:

PASSING      COMP/ATT   YDS   AVG    TD   INT
T. Brady     6/9        83    9.2    1    0

RECEIVING    REC        YDS   AVG    TD   LG
R. Moss      2          27    13.5   1    14

If you didn’t know better, you would see a touchdown tallied on both the quarterback’s stat sheet and the wide receiver’s, and conclude that the team must have scored two separate touchdowns worth 14 points. Of course, this isn’t actually the case. There is simply a one-to-many relationship between a touchdown and the players involved in it. There is no way for these statistics to show both players involved in the touchdown without showing a touchdown associated with each of them.

Hopefully I haven’t confused you. (If you’re a hockey fan, there’s a similar analogy there, where two players can be credited with an assist on a single goal. And if you’re not a sports fan at all, hopefully the rest of this post will still make sense!) Consider the order discrepancy described above in the list of questions. Here’s what a typical order might look like:

// A single order (one purchaseID) containing three products
s.events="purchase"
s.products=";Macbook 13.3-inch;1;1199.99,;Adobe Photoshop CS4;1;799.99,;Kingston DataTraveler 16GB Flash Drive;1;39.99"
s.purchaseID="220236197"
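
If it helps to see that products string pulled apart, here is a minimal sketch in JavaScript. It assumes the basic category;product;quantity;price layout with entries separated by commas, and that product names contain no commas or semicolons. parseProducts is a hypothetical helper written for this illustration, not a SiteCatalyst function.

```javascript
// Hypothetical helper for illustration only; not part of SiteCatalyst.
// Assumes each entry is "category;product;quantity;price" and that
// entries are separated by commas.
function parseProducts(productsString) {
  return productsString.split(",").map((entry) => {
    const [category, name, quantity, price] = entry.split(";");
    return {
      category: category, // empty string when no category is passed
      name: name,
      quantity: Number(quantity),
      price: Number(price),
    };
  });
}

const exampleProducts =
  ";Macbook 13.3-inch;1;1199.99,;Adobe Photoshop CS4;1;799.99," +
  ";Kingston DataTraveler 16GB Flash Drive;1;39.99";

const parsedItems = parseProducts(exampleProducts);
console.log(parsedItems.length); // 3 products, all belonging to one order
console.log(parsedItems.map((item) => item.name));
```

The point to notice is that one purchase event fans out into three product entries, which is exactly the one-to-many relationship described earlier.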

Based on this order, in the Products report you’d see something like this:

[Image: Products report showing Orders]

The line items add up to three orders, but there was really only one order (you saw it above), so how should SiteCatalyst handle this?

  • Show one-third of an order (roughly 0.33) for each product, so that the line items add up to one? Well, that wouldn’t be quite right, because then you would see a bunch of strange numbers that wouldn’t give you a real sense of how popular an item is; its popularity would be determined in part by how many products belonged to the order (e.g., an order with five products would assign 0.2 orders to each product, but an order with 10 products would only assign 0.1 orders to each product).
  • Show the summed total of the orders from each line item at the bottom of the report? That might lead users to think that your site had a lot more orders than it really did.

Instead, SiteCatalyst shows the site-wide total regardless of the sum of the line items.
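
To make the mismatch concrete, here is a small sketch of the counting logic in JavaScript. It illustrates the idea under my own assumptions and is not how SiteCatalyst is implemented: sampleOrders and ordersByProduct are hypothetical names, and the data is just the single order shown earlier.

```javascript
// Hypothetical data for illustration: one order and the products in it.
const sampleOrders = [
  {
    purchaseID: "220236197",
    products: [
      "Macbook 13.3-inch",
      "Adobe Photoshop CS4",
      "Kingston DataTraveler 16GB Flash Drive",
    ],
  },
];

// Credit each product with one order for every order it appears in.
function ordersByProduct(orders) {
  const counts = new Map();
  for (const order of orders) {
    // In this sketch, a product listed twice in one order still counts once.
    for (const product of new Set(order.products)) {
      counts.set(product, (counts.get(product) || 0) + 1);
    }
  }
  return counts;
}

const lineItems = ordersByProduct(sampleOrders);

// Adding up the line items counts the same order once per product...
const lineItemSum = [...lineItems.values()].reduce((a, b) => a + b, 0);

// ...while the report total counts each distinct order exactly once.
const reportTotal = new Set(sampleOrders.map((o) => o.purchaseID)).size;

console.log(lineItemSum); // 3
console.log(reportTotal); // 1
```

Each line item is correct on its own (every product really was part of one order); it is only the act of summing them that double-counts.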

De-duplicating your favorite metrics

Saying that you need to add up the metric totals for various line items for any reason is really just a way of saying that you need classifications around your report values so that they’re grouped appropriately. For example, why would you add up the orders for all product names containing the word “shoes” other than to get a sense of the total orders for shoe-related products (taking into account that some orders may involve multiple such products)? This can be accomplished using SAINT classifications and Omniture Discover.

After all, it might be hard to make sense of exactly how many points a football team scored just by looking at the individual players’ statistics, but the handy “scoring summary” that you’ll see in newspapers and on web sites will correctly pair quarterbacks with receivers to give you a better sense of how much scoring really went on:

FIRST QUARTER                                                               NE   NYJ
NE   5:07   Randy Moss 14 Yd Pass From Tom Brady (Stephen Gostkowski Kick)   7    0

Now that clarifies what happened! Randy Moss, a receiver, caught a 14-yard pass from the quarterback, Tom Brady. And we can see that the New England Patriots scored once, not twice. We’ve de-duplicated the number of touchdowns that the team has scored.

To de-duplicate your data and show the exact total number of [metric] that occurred across multiple line items in a report, create a classification category to meet your needs and apply the necessary classifications to group line items in the parent report however you need them.

When these classifications sync into Discover, you’ll be able to go to the associated report, add the Orders metric, and see de-duplicated orders within that category as a line item. To continue with the example above (products containing the word “shoes”): suppose an order contained five product IDs that involve the word “shoes,” and you add a category classification and label each of these product IDs as belonging to the “Footwear” category. When you run the report in Discover that corresponds to the classification you’ve set up (e.g., Product Category), you will correctly see one order for the “Footwear” category, even though this order involved five products.
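
Here is a rough sketch of what that roll-up amounts to, again in JavaScript. This is my own illustration of the idea, not Discover's actual implementation: the five product names, the purchaseID, and the ordersByCategory function are all hypothetical.

```javascript
// Hypothetical classification table: product ID -> category label.
const categoryLookup = new Map([
  ["Trail Running Shoes", "Footwear"],
  ["Kids Tennis Shoes", "Footwear"],
  ["Leather Dress Shoes", "Footwear"],
  ["Canvas Boat Shoes", "Footwear"],
  ["Wrestling Shoes", "Footwear"],
]);

// Hypothetical order containing all five shoe products.
const shoeOrders = [
  { purchaseID: "A-1001", products: [...categoryLookup.keys()] },
];

// Count distinct orders per category rather than per product.
function ordersByCategory(orders, lookup) {
  const orderIdsByCategory = new Map();
  for (const order of orders) {
    for (const product of order.products) {
      const category = lookup.get(product) || "Unclassified";
      if (!orderIdsByCategory.has(category)) {
        orderIdsByCategory.set(category, new Set());
      }
      // A Set keeps each purchaseID once, however many products match.
      orderIdsByCategory.get(category).add(order.purchaseID);
    }
  }
  const counts = new Map();
  for (const [category, ids] of orderIdsByCategory) {
    counts.set(category, ids.size);
  }
  return counts;
}

const categoryCounts = ordersByCategory(shoeOrders, categoryLookup);
console.log(categoryCounts.get("Footwear")); // 1
```

Grouping by category before counting is what removes the duplication: five products collapse into one category, and the order is counted once for it.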

And there you have it. Hopefully this clarifies (at least a little bit) what can be a confusing situation for many users. And unfortunately for me, I’ll never again be able to look at football statistics without thinking about de-duplication.

As always, please feel free to follow me at OmnitureCare on Twitter and/or FriendFeed. I’m also available by e-mail at omniturecare@omniture.com and would love to hear from you via any of these channels!

3 comments
Dan

Great article/post. Would like to have seen more detail on the Classification setup and Discover usage...maybe next time around.

Cory

Saw the Omniture Social Media presentation in PHX last week. This is a great example of how their Twitter team monitors the web for places to help their users. See how a user provided free promotion for Omniture here, via a response to @OmnitureCare: http://twitter.com/sunsblogger/status/2409212131 This is exactly how Jeff Jordan explained that Omniture is leading edge stuff. :)

Nate Orshan

Ben, nice analogy (and my wife, a die-hard Patriots fan, would approve the choice of the example). However, re de-duping, maybe you need to change the name of this post to, e.g., "SiteCatalyst, DISCOVER de-duplication and American football". After all, the punchline to your post is, "Wanna de-dupe those pesky orders in SiteCatalyst? Just spend more...and add Discover!"