One of the most difficult concepts to explain happens to be one of the most frequently asked questions that my colleagues and I receive from SiteCatalyst users. There are several versions of this question, each with roughly the same answer:

  • Why doesn’t the sum total of visits from each line item in the Pages report add up to the visit total at the bottom of the report?
  • Why doesn’t the sum total of orders from each line item in the Products report add up to the order total at the bottom of the report?
  • Why doesn’t the sum total of [any success metric] from each line item in my merchandising eVar report add up to the total at the bottom of the report?

When users ask this question (in any of its forms), I typically explain that the report in question involves a one-to-many relationship between the metric being viewed and the line items in the report. But this explanation can be difficult to grasp. I have been trying to come up with an analogy to help explain these phenomena, and I think I’ve got one. I’m hoping it will clarify this behavior for you, or will help you better explain it to the users at your organization.

There are few things that I enjoy more than relaxing on my couch on an autumn weekend (in between home improvement tasks requested/mandated by my wife, of course) and watching football from all across America. Being the sports geek that I am, I often have my laptop by my side so I can check scores and stats from games that I’m not watching. I mention this because, as the title of this blog post suggests, there’s an apparent statistical anomaly in football that parallels this behavior in SiteCatalyst.

When a quarterback throws for a touchdown, someone has to catch the pass, usually a wide receiver. When this happens, the quarterback’s numbers reflect that he threw for a touchdown. At the same time, the wide receiver’s numbers show that he caught one touchdown:

T. Brady (passing): 6/9, 83 yds, 9.2 avg, 1 TD, 0 INT
R. Moss (receiving): 2 rec, 27 yds, 13.5 avg, 1 TD, 14 long

If you didn’t know better, you would see a touchdown tallied on both the quarterback’s stat sheet and the wide receiver’s record and conclude that the team must have scored two separate touchdowns, worth 14 points. Of course, this isn’t actually the case. There is simply a one-to-many relationship between touchdowns and the players involved. There is no way for these statistics to show both players involved in the touchdown without showing a touchdown associated with each of them.

Hopefully I haven’t confused you. (If you’re a hockey fan, there’s a similar analogy, where two players can each be credited with an assist on a single goal. And if you’re not a sports fan at all, hopefully the rest of this post will still make sense!) Consider the order discrepancy described above in the list of questions. Here’s what the tracking code for a typical order might look like:

s.events="purchase"
s.products=";Macbook 13.3-inch;1;1199.99,;Adobe Photoshop CS4;1;799.99,;Kingston DataTraveler 16GB Flash Drive;1;39.99"
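To make the one-to-many relationship concrete, here is a minimal JavaScript sketch that splits the `s.products` string above into its product entries. It assumes the semicolon-delimited Category;Product;Quantity;Price layout shown in the example (the leading semicolon means the category is left empty) and ignores any further fields. `parseProducts` is a hypothetical helper for illustration, not part of the SiteCatalyst code.

```javascript
// Hypothetical helper: split an s.products-style string into product entries.
// Assumes "Category;Product;Quantity;Price" per entry, entries separated by
// commas, as in the example above. Any fields after Price are ignored.
function parseProducts(productsString) {
  return productsString
    .split(",")
    .filter((entry) => entry.trim() !== "")
    .map((entry) => {
      const [category, product, quantity, price] = entry.split(";");
      return {
        category: category || null, // empty category becomes null
        product,
        quantity: Number(quantity),
        price: Number(price),
      };
    });
}

const products =
  ";Macbook 13.3-inch;1;1199.99,;Adobe Photoshop CS4;1;799.99,;Kingston DataTraveler 16GB Flash Drive;1;39.99";

const items = parseProducts(products);
for (const item of items) {
  console.log(`${item.product}: credited with this order`);
}
// One purchase event, but three product entries that each get the order
console.log(`orders: 1, product line items: ${items.length}`);
```

A single purchase event yields three product entries, and each of them is credited with that one order in the Products report.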

Based on this order, in the Products report you’d see something like this:

[Figure: Products report showing Orders]

The line items add up to three orders, but there was really only one order (you saw it above), so how should SiteCatalyst handle this?

Show about 0.33 orders (one-third of an order) for each product, so that the line items add up to one? That wouldn’t be quite right either, because then you would see a bunch of strange numbers that wouldn’t give you a real sense of how popular an item is; its popularity would be determined in part by how many products belonged to each order (e.g., an order with five products would assign 0.2 orders to each product, but an order with 10 products would assign only 0.1 orders to each product).
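The arithmetic behind this rejected alternative can be sketched in a few lines of JavaScript. `fractionalOrderCredit` is a hypothetical function that splits one order evenly across its products; it is not how SiteCatalyst reports Orders, and the product names are placeholders.

```javascript
// Hypothetical alternative (NOT how SiteCatalyst reports Orders): split one
// order evenly across the products in it, so the line items sum to 1 order.
function fractionalOrderCredit(productNames) {
  const share = 1 / productNames.length;
  const credit = {};
  for (const name of productNames) {
    credit[name] = (credit[name] || 0) + share;
  }
  return credit;
}

// The same flash drive, bought once in a 5-product order and once in a
// 10-product order (placeholder product names)
const fiveItemOrder = ["Flash Drive", "A", "B", "C", "D"];
const tenItemOrder = ["Flash Drive", "A", "B", "C", "D", "E", "F", "G", "H", "I"];

console.log(fractionalOrderCredit(fiveItemOrder)["Flash Drive"]); // 0.2
console.log(fractionalOrderCredit(tenItemOrder)["Flash Drive"]); // 0.1
```

The flash drive is bought once in each case, yet it gets half as much credit in the larger order, which is exactly why fractional credit would distort a product’s apparent popularity.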

Show the sum of the line items’ orders as the total at the bottom of the report? That might lead users to think that your site had many more orders than it really did.

Instead, SiteCatalyst shows the site-wide total (here, one order) regardless of the sum of the line items.
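Here is a rough JavaScript sketch of the counting behavior just described: every product line item is credited with each order it appears in, while the report total counts each order once. The order data and the `productsReport` function are made up for illustration, and the sketch chooses to credit a product at most once per order; it is not SiteCatalyst’s actual implementation.

```javascript
// Illustrative sketch only: made-up orders, hypothetical report function.
const orders = [
  {
    id: "order-1",
    products: ["Macbook 13.3-inch", "Adobe Photoshop CS4", "Kingston DataTraveler 16GB Flash Drive"],
  },
  { id: "order-2", products: ["Adobe Photoshop CS4"] },
];

function productsReport(orderList) {
  const lineItems = new Map(); // product -> number of orders containing it
  const orderIds = new Set(); // distinct orders, used for the report total
  for (const order of orderList) {
    orderIds.add(order.id);
    // In this sketch a product is credited at most once per order
    for (const product of new Set(order.products)) {
      lineItems.set(product, (lineItems.get(product) || 0) + 1);
    }
  }
  const sumOfLineItems = [...lineItems.values()].reduce((a, b) => a + b, 0);
  return { lineItems, sumOfLineItems, total: orderIds.size };
}

const report = productsReport(orders);
for (const [product, count] of report.lineItems) {
  console.log(`${product}: ${count}`);
}
console.log(`Sum of line items: ${report.sumOfLineItems}`); // 4
console.log(`Report total: ${report.total}`); // 2
```

With these two sample orders, the line items add up to 4 while the report total is 2: the same kind of gap the questions at the top of this post ask about.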

De-duplicating your favorite metrics

Saying that you need to add up the metric totals for various line items, for whatever reason, is really just a way of saying that you need classifications around your report values so that they’re grouped appropriately. For example, why would you add up the orders for all product names containing the word “shoes” other than to get a sense of the total orders for shoe-related products (taking into account that some orders may involve multiple such products)? You can accomplish this using SAINT classifications and Omniture Discover.

After all, it might be hard to make sense of exactly how many points a football team scored just by looking at the individual players’ statistics, but the handy “scoring summary” that you’ll see in newspapers and on web sites will correctly pair quarterbacks with receivers to give you a better sense of how much scoring really went on:

NE (5:07): Randy Moss 14 Yd Pass From Tom Brady (Stephen Gostkowski Kick). Score: 7-0

Now that clarifies what happened! Randy Moss, a receiver, caught a 14-yard pass from the quarterback, Tom Brady. And we can see that the New England Patriots scored once, not twice. We’ve de-duplicated the number of touchdowns that the team scored.

To de-duplicate your data and show the exact total of [metric] that occurred across multiple line items in a report, create a classification category that meets your needs and apply classifications that group the line items in the parent report however you need them.

When these classifications sync into Discover, you’ll be able to go to the associated report, add the Orders metric, and see de-duplicated orders for each category as a line item. To continue with the example above (products containing the word “shoes”): if an order contained five product IDs that involve the word “shoes,” and you add a category classification that labels each of these product IDs as belonging to the “Footwear” category, then you will correctly see one order for the “Footwear” category in Discover when you run the report that corresponds to that classification (e.g., Product Category), even though the order involved five products.
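The de-duplication that a classification provides can be approximated like this: map each product to a category, then count distinct order IDs per category rather than summing the product line items. This is a hypothetical JavaScript sketch of the counting logic only; the product IDs, the `ordersByCategory` function, and the “Unclassified” fallback are assumptions, not a description of how SAINT or Discover work internally.

```javascript
// Hypothetical classification: product ID -> category (made-up IDs).
const productCategory = {
  "running-shoes-01": "Footwear",
  "trail-shoes-02": "Footwear",
  "dress-shoes-03": "Footwear",
  "hiking-shoes-04": "Footwear",
  "boat-shoes-05": "Footwear",
  "wool-socks-06": "Accessories",
};

const orders = [
  {
    id: "order-1",
    products: ["running-shoes-01", "trail-shoes-02", "dress-shoes-03", "hiking-shoes-04", "boat-shoes-05"],
  },
  { id: "order-2", products: ["wool-socks-06", "trail-shoes-02"] },
];

// Count each order at most once per category, however many of that
// category's products it contains.
function ordersByCategory(orderList, classification) {
  const ordersPerCategory = new Map(); // category -> Set of order IDs
  for (const order of orderList) {
    for (const product of order.products) {
      const category = classification[product] || "Unclassified";
      if (!ordersPerCategory.has(category)) {
        ordersPerCategory.set(category, new Set());
      }
      ordersPerCategory.get(category).add(order.id);
    }
  }
  // Convert each set of order IDs into a de-duplicated order count
  return new Map([...ordersPerCategory].map(([category, ids]) => [category, ids.size]));
}

const byCategory = ordersByCategory(orders, productCategory);
console.log(byCategory.get("Footwear")); // 2
console.log(byCategory.get("Accessories")); // 1
```

Summing the Footwear product line items for these two orders would give 6 (five from the first order, one from the second), but the category line item shows 2 orders, one per order that contained footwear.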

And there you have it. Hopefully this clarifies (at least a little bit) what can be a confusing situation for many users. And unfortunately for me, I’ll never again be able to look at football statistics without thinking about de-duplication.

Great article/post. Would like to have seen more detail on the Classification setup and Discover usage...maybe next time around.


Saw the Omniture Social Media presentation in PHX last week. This is a great example of how their Twitter team monitors the web for places to help their users. See how a user provided free promotion for Omniture here, via a response to @OmnitureCare: This is exactly how Jeff Jordan explained that Omniture is leading-edge stuff. :)

Nate Orshan
Ben, nice analogy (and my wife, a die-hard Patriots fan, would approve the choice of the example). However, re de-duping, maybe you need to change the name of this post to, e.g., "SiteCatalyst, DISCOVER de-duplication and American football". After all, the punchline to your post is, "Wanna de-dupe those pesky orders in SiteCatalyst? Just spend more...and add Discover!"