Last week I got an email from an industry veteran, now working on analytics projects at a well-known agency. In it, he said, “I have a few clients who keep asking me to make sure that they’re not being overwhelmed by bots, given all the recent press about bot traffic being up to ⅓ to even ½ of web traffic.” Indeed, lately there have been a number of articles from respected publications covering the bot phenomenon and its impact on web data, with one article reporting that up to 61.5% of all web traffic comes from bots. If more than half of your traffic comes from bots, which are neither human nor capable of becoming loyal customers, how can you trust your data at all? How can you confidently optimize your content or your marketing in this environment?

Most of the answer comes from the way the conclusions in articles like these are generated: based on Content Delivery Network (CDN) data. The purpose of a CDN is to receive requests for content from various entities (web browsers, mobile apps, bots, etc.) and quickly provide the requested content. As such, CDNs do not inherently require that the requesting entity meet technical requirements in order to receive whatever content has been requested. A CDN receives a request, serves it, and logs it. The analyses that we are seeing published are the result of these logs; the CDN looks at the raw user-agent strings, and potentially other patterns across all requests, and determines which requests came from bots and which did not. It is certainly possible that more than 50% of all traffic at the CDN level comes from bots.
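
To make that concrete, here is a minimal sketch of how a CDN-style study might classify requests as bot or human from raw access-log user-agent strings. The patterns and sample log lines are purely illustrative assumptions, not the methodology of any particular CDN or study.

```python
import re

# Illustrative bot patterns only; real studies rely on much larger,
# curated lists (such as the IAB list discussed later in this post).
BOT_PATTERNS = re.compile(r"bot|crawl|spider|slurp|curl", re.IGNORECASE)

def is_bot(user_agent: str) -> bool:
    """Classify a request as bot traffic based on its user-agent string."""
    return bool(BOT_PATTERNS.search(user_agent))

def bot_share(user_agents: list[str]) -> float:
    """Return the fraction of requests classified as bots."""
    if not user_agents:
        return 0.0
    return sum(is_bot(ua) for ua in user_agents) / len(user_agents)

# Hypothetical sample: at the CDN level, every request is counted,
# whether or not the client can execute JavaScript.
sample = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "curl/7.68.0",
]
print(f"Bot share of sample: {bot_share(sample):.0%}")
```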

This is different from digital analytics tools, including Adobe Analytics, which typically (although not always) require JavaScript in order to record traffic and user behavior. The vast majority of Adobe Analytics implementations over the past several years have excluded the ability to record data for users (or bots) who cannot execute JavaScript. It was only a couple of years ago that Googlebot started executing JavaScript. Before that time, bot traffic from Google and Bing never showed up in Adobe Analytics reports. Now, those two are the top bots that we see, but there are still far more bots that have no need to execute JavaScript, and therefore are invisible to Adobe Analytics, but still visible to a CDN. Thus (and this is really the key point), CDNs will report far more bot traffic than you will ever see in Adobe Analytics.
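
The gap between the two measurement layers can be illustrated with a small simulation. The traffic mix below is invented purely for illustration: every request shows up in a CDN log, but only clients that execute JavaScript fire an analytics beacon.

```python
# Made-up traffic mix: each request records whether the client is a bot
# and whether it executes JavaScript.
requests = (
    [{"bot": False, "runs_js": True}] * 900    # human browsers
    + [{"bot": True, "runs_js": True}] * 50    # JS-capable bots (e.g. Googlebot today)
    + [{"bot": True, "runs_js": False}] * 950  # simple crawlers that never run JS
)

# A CDN logs every request it serves.
cdn_bot_share = sum(r["bot"] for r in requests) / len(requests)

# A JavaScript-based analytics tool only sees clients that run JS.
js_hits = [r for r in requests if r["runs_js"]]
analytics_bot_share = sum(r["bot"] for r in js_hits) / len(js_hits)

print(f"CDN view:       {cdn_bot_share:.0%} of requests are bots")             # ~53%
print(f"Analytics view: {analytics_bot_share:.0%} of measured hits are bots")  # ~5%
```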

My colleague, Bret Gundersen, recently did a small, cross-vertical study of bot traffic as a percentage of total page views over a month of data in report suites of various sizes. Here is a representative sample:

Report Suite    Bot Page Views    Total Page Views    Bot Percentage
A 2,759,711 1,630,572,644 0.17%
B 294,558 41,643,593 0.71%
C 3,877,924 252,966,728 1.53%
D 505,802 23,709,635 2.13%

As Bret pointed out, “Bots are looking for content. Sites with fewer pages will see less bot traffic. The report suite with the highest bot traffic [in the table above] is a travel site that has thousands of hotels, flights, cruises, etc. When a CDN looks at bot traffic, they’re looking internet-wide, so they will see bots hit pages that people never hit, such as archived content. I’ve talked with publishers who have millions of archived pages, many of which get no visitor traffic each day.” In other words, because bots indiscriminately crawl the entire Internet, the traffic they generate looks massive: they hit a far larger range of pages and content than humans ever do. Perhaps a better way to put it is that, even if most bots were tracked in Adobe Analytics, bot traffic to the pages and content that actually matter to you for analysis and optimization would be far lower than the numbers reported in these recent CDN-based studies.
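
For reference, the percentage column in the table above is simply bot page views divided by total page views, which you can reproduce in a couple of lines:

```python
# Figures taken from the table above (bot page views, total page views).
report_suites = {
    "A": (2_759_711, 1_630_572_644),
    "B": (294_558, 41_643_593),
    "C": (3_877_924, 252_966_728),
    "D": (505_802, 23_709_635),
}

for suite, (bot_views, total_views) in report_suites.items():
    print(f"Report suite {suite}: {bot_views / total_views:.2%} bot page views")
# A: 0.17%, B: 0.71%, C: 1.53%, D: 2.13%
```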

Removing known bot traffic from Adobe Analytics

Even though users of digital analytics tools needn’t be concerned that half of their traffic comes from bots, many of you will still want to exclude even those bots that do execute JavaScript, to add an extra level of data cleanliness and analysis confidence. This is easy to do. You can turn on bot filtering in the Admin Console by going to Edit Settings > General > Bot Rules, as shown below.

Enable Bot Rules in Adobe Analytics

Within the Bot Rules screen, you have a few options:

Bot Rules options in Adobe Analytics

The default bot filtering in Adobe Analytics is based on the IAB bot list, which is updated monthly and compiled from many sources, including CDNs and major internet properties. It includes thousands of known bots, including all of your favorites: Google, Bing, Gomez, etc. This list covers the overwhelming majority of use cases and needs around bot filtering. If you want to use the IAB bot list, you can simply check the box shown above, click “Save,” and you are done.

In addition to (or in place of) the IAB bot list, you can also input or upload a list of bots to be filtered out of your data set. This list can be based on user-agent string (typically the easiest way to identify a bot at first glance) and/or IP address. For example, if you have an internally created bot that crawls your site to monitor uptime, you can enter the details for that bot in the setup tool and it will be filtered out of your data from that point forward.
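
Conceptually, a custom rule like that boils down to a user-agent substring match and/or an IP-range match. The sketch below illustrates that logic with hypothetical values for an imagined internal uptime-monitoring bot; it is not how Adobe Analytics implements Bot Rules internally, just an illustration of what such a rule expresses.

```python
from ipaddress import ip_address, ip_network

# Hypothetical rule values for an imagined internal uptime-monitoring bot.
CUSTOM_UA_SUBSTRINGS = ["acme-uptime-monitor"]
CUSTOM_IP_RANGES = [ip_network("10.20.30.0/24")]

def matches_custom_rule(user_agent: str, client_ip: str) -> bool:
    """Return True if a hit should be filtered as a custom-defined bot."""
    if any(s in user_agent.lower() for s in CUSTOM_UA_SUBSTRINGS):
        return True
    return any(ip_address(client_ip) in net for net in CUSTOM_IP_RANGES)

print(matches_custom_rule("acme-uptime-monitor/1.2", "192.0.2.10"))          # True: UA match
print(matches_custom_rule("Mozilla/5.0 (Windows NT 10.0)", "10.20.30.7"))    # True: IP match
print(matches_custom_rule("Mozilla/5.0 (Windows NT 10.0)", "198.51.100.5"))  # False
```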

In keeping with the findings referenced above, we have not seen massive drops in traffic when customers have turned on bot filtering. This is not because the filtering is not working (see the following section of this post), but because bot traffic did not represent a large percentage of measurable data to begin with.

What happens when I filter bots out of my data?

Adobe Analytics still collects and processes data from bots that execute JavaScript, but rather than having them pollute your reports, they are tucked away in their own section of the tool. By default, this is located in Site Metrics > Bots.

  • Bots shows the various bots that were identified, such as “gomezagent” in the screen shot below, along with traffic data for each.
  • Bot Pages shows the different pages that bots visited on your site.

Here is what the Bots report might look like:

Bots Report in Adobe Analytics

Data in these reports is available from the time that you enable bot filtering; historical data from before that time will not appear in these reports and will remain mixed in with the rest of your data. Detected bots are automatically excluded from segments, calculated metrics, etc.; they are fully separated from your data set into the two reports listed above. Projects in Ad Hoc Analysis and dashboards in Report Builder will not include this bot data. If you publish an audience to the Adobe Marketing Cloud for analytics-powered targeting, it will also not include bots.

Don’t fear a robot uprising

I hope it is clear that, while robots are certainly part of the Internet ecosystem, they are nothing to be afraid of when it comes to your digital and customer analysis. The actual data in your Adobe Analytics report suites should not be seeing more than a tiny percentage of its traffic from bots, and that traffic can easily be filtered out of reports and segments using Bot Rules in the Admin Console. Armed with an understanding of the cause of the massive bot numbers you may be seeing in articles around the web, and of how digital analytics measurement works, you are ready to push your analysis and marketing forward without fear of a robot uprising wreaking havoc on your data.
