It’s been a very long time since I last blogged on some of the new features in a release of SiteCatalyst, but today I’m excited to pick up where I left off and share some of my favorite elements of the brand-spanking-new SiteCatalyst 15.3. We believe and hope that these features improve the clarity of your data, enabling more effective and accurate reporting and analysis and ultimately improving decision making in a number of ways.

If you are already using SiteCatalyst 15, you do not need to do anything to upgrade; now that it has been released, SiteCatalyst 15.3 will automatically be available to you. And even if you are not using SiteCatalyst 15, we’ve got some great updates in here for you as well. So here we go...

Bot Filtering and Reporting

If you have a web site, and it is live on the Internet, you almost certainly have spiders and bots stop by from time to time, doing things like indexing your site in search engines or checking the performance of your site. Even though they rarely represent a significant amount of traffic, you might not want them to be recorded alongside your visitor data in SiteCatalyst and Discover. In SiteCatalyst 15.3, we have made it easy for you to exclude commonly known spiders and bots (as defined by the IAB) as well as any custom bots/spiders that you might be using for any reason.

Set up your bots and spiders for exclusion in SiteCatalyst 15.3

You can enable bot filtering by going to the Admin Console, selecting the desired report suite(s), and then choosing Edit Settings > General > Bot Rules. Filtering is not enabled by default for new report suites, and we are not enabling it automatically for existing report suites at the time of the release, so you still have total control over whether/how you use bot filtering in SiteCatalyst. However, beginning today (April 26) you can choose to enable this feature for any of your report suites in SiteCatalyst 15.3.

The interface allows admin users to define their own rules based on user-agent string, IP address, or IP range. To exclude the IAB list of known spiders and bots, simply check the box and click “Save.” Traffic from these agents will be collected in SiteCatalyst, but will be sectioned off from your other data so that it does not impact data from your “real” users. It will not be included in your traffic or conversion data.
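To make the three rule types concrete, here is a minimal Python sketch of how a hit could be checked against custom bot rules. It is an illustration only, not how SiteCatalyst implements filtering: the `BotRules` class, the case-insensitive substring match on the user agent, the CIDR notation for IP ranges, and the sample values are all assumptions made for the example.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class BotRules:
    """Hypothetical rule set; not Adobe's data model."""
    user_agent_substrings: list = field(default_factory=list)
    ip_addresses: list = field(default_factory=list)
    ip_ranges: list = field(default_factory=list)  # assumed CIDR notation

    def is_bot(self, user_agent: str, ip: str) -> bool:
        # Rule type 1: user-agent string (assumed case-insensitive substring match).
        ua = user_agent.lower()
        if any(s.lower() in ua for s in self.user_agent_substrings):
            return True
        # Rule type 2: exact IP address.
        addr = ipaddress.ip_address(ip)
        if any(addr == ipaddress.ip_address(a) for a in self.ip_addresses):
            return True
        # Rule type 3: IP range.
        return any(addr in ipaddress.ip_network(r) for r in self.ip_ranges)


# Sample values are placeholders (documentation-only IP blocks).
rules = BotRules(
    user_agent_substrings=["examplebot"],
    ip_addresses=["203.0.113.7"],
    ip_ranges=["198.51.100.0/24"],
)
```

A hit matching any one rule would be routed to the bot reports instead of your regular traffic and conversion data.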

View bot traffic in SiteCatalyst

You can run the Site Metrics > Bots report to see which spiders and bots have visited your site during a given time period. The Site Metrics > Bot Pages report displays the pages that these agents have visited, giving you additional insight into the “indexability” of your site.

Bot filtering is being done “pre-VISTA,” meaning that VISTA rules will not run on data collected from bots that you are excluding using this new feature. In most cases that’s okay, because you want bots to be excluded from the rest of your data set entirely anyway. Some of you have a VISTA rule in place to exclude bots, similar to what this new capability allows. This VISTA rule will continue to operate normally. However, in some cases this VISTA rule may not be necessary anymore, and we recommend working with your Adobe Account Manager to understand the next steps for you and your bot handling VISTA rules. Note that if you use both a VISTA rule and this new SiteCatalyst feature to exclude bots, your VISTA rule will likely handle less traffic from bots than it has in the past, and so you may see a drop in traffic to your “bot exclusions” report suite(s).

One final note: bots excluded using this new feature will not be included in any Data Feeds that you are receiving, which means that the Data Feed traffic levels will remain consistent with the standard SiteCatalyst reports.

Change to Unique Value Management Algorithm

In the past, SiteCatalyst has allowed you to report on the first 500,000 unique values per month in each report. The remaining unique values passed in during the month are lumped together in reports as a single line item called “Uniques Exceeded.” The net result is that popular (i.e. high-traffic) values that are passed in late in the month are not visible as individual line items in reports. They are buried in “Uniques Exceeded.”
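The effect is easy to see in a small simulation. The Python sketch below mimics the old first-come, first-served behavior described above, with a tiny limit standing in for 500,000. The function name and sample data are made up for illustration, and the real processing is certainly more involved than this.

```python
from collections import Counter


def report_with_unique_limit(values, limit):
    """Track the first `limit` unique values; lump the rest together."""
    tracked = set()
    counts = Counter()
    for value in values:
        if value in tracked or len(tracked) < limit:
            tracked.add(value)
            counts[value] += 1
        else:
            # The limit was reached before this value first appeared.
            counts["Uniques Exceeded"] += 1
    return dict(counts)


# "c" is the most popular value, but it shows up after the limit is hit.
hits = ["a", "b", "a", "c", "c", "c", "c"]
report = report_with_unique_limit(hits, limit=2)
print(report)  # {'a': 2, 'b': 1, 'Uniques Exceeded': 4}
```

Even though “c” accounts for more than half of the hits, it never gets its own line item because it arrived late.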

Beginning on April 26th we are implementing a more sophisticated means for managing high cardinality reports. There are a number of improvements in this new algorithm, but most importantly the algorithm allows the most popular values to show up in reports as individual line items regardless of whether they occur at the beginning or the end of the month! This change impacts all versions of SiteCatalyst.

My colleague, Matt Freestone, has published an outstanding post that deals with this topic in far more depth, so check it out in the coming days.

Case-Insensitive Props

Historically, all custom traffic variables (props) have been treated as case-sensitive. For example, if “value”, “VALUE” and “Value” are passed into a prop, they are considered to be three distinct line items, and their metrics are aggregated separately in the associated Custom Traffic reports. To segment on these values you need to create three separate segmentation criteria. Custom Conversion variables, or eVars, on the other hand, have always been treated as case-insensitive; their case is ignored. (If “value”, “VALUE” and “Value” are passed into an eVar, they are treated as one line item in reports, and their metrics are aggregated together in the associated Custom Conversion reports. Only one segment criterion is required.)

To resolve this inconsistency between props and eVars, beginning April 26 all traffic variables (props, page, channel, server, custom links, exit links and download links) for new report suites will be treated as case-insensitive. This change applies to all versions of SiteCatalyst. Using the example above, SiteCatalyst reports will show either “value”, “VALUE” or “Value” (usually the first one that was passed in during the month), and the metrics for all three will be aggregated together, just like eVars. In the future we may extend this functionality to traffic variables in existing report suites, but for now, it only applies to new report suites. This ensures that your future data will match your historical data in existing suites. (NOTE: Data Warehouse will always use the lowercase version of the traffic variable. In Data Feeds, the “post” column will contain the lowercase version.)
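As a rough illustration of the new behavior, the Python sketch below groups values by their lowercase form and labels each line item with the first casing it saw. The function name is made up, and always displaying the first-seen casing is a simplification of “usually the first one” above.

```python
def aggregate_case_insensitive(values):
    """Count values ignoring case, labeling each group by its first-seen casing."""
    display = {}  # lowercase key -> casing shown in the report
    counts = {}   # lowercase key -> number of occurrences
    for value in values:
        key = value.lower()
        display.setdefault(key, value)
        counts[key] = counts.get(key, 0) + 1
    return {display[key]: n for key, n in counts.items()}


report = aggregate_case_insensitive(["Value", "VALUE", "value", "other"])
print(report)  # {'Value': 3, 'other': 1}
```

Under the old case-sensitive treatment, the same input would have produced three separate line items with one instance each.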

You may also want to read this post by Matt Freestone where he discusses the benefits of case insensitivity as related to managing high cardinality reports. (In case you’re really curious about case sensitivity for props, you can read more about this request in the Idea Exchange.)

Missing Reports Restored

When SiteCatalyst 15 was first released in April 2011, a number of reports that you’ve seen in SiteCatalyst 14 and earlier versions, such as Pages Not Found, Customer Loyalty, and Pathfinder, were not available. We are pleased to announce that the majority of these reports are back! Here is the list of reports that are being re-introduced with SiteCatalyst 15.3:

  • Pathfinder
  • Full Paths
  • Path Length
  • Original Entry Page
  • Days Before First Purchase
  • All Search Page Ranking
  • Pages Not Found

You might be thinking, “I had these reports when I used SiteCatalyst 14, and I haven’t removed SiteCatalyst code from my site/app since upgrading, so will I have all of my historical data available to me in these reports now that they are in SiteCatalyst 15?” The answer in most cases is yes. You should see data in these reports going back to the date of your SiteCatalyst 15 upgrade and beyond.

Data Collection Improvements

New versions of AppMeasurement for JavaScript and AppMeasurement for Flash are also available, as well as a brand-new AppMeasurement library for Xbox, but Bret Gundersen has addressed these in a separate post as we resume our habit of blogging each time we do an AppMeasurement release.

The major improvement in our JavaScript code (version H.24.4) is that it now accounts for Google Chrome Pre-render, which can load your web pages before the user actually hits your site. Most analysts and marketers don’t want SiteCatalyst to count a Page View or instantiate a visit when this occurs, because at this point the user has not actually decided to click through to your site. Code version H.24.4 compensates for this behavior by waiting until the user is actually on your site to fire off a beacon and begin recording data. We’re recommending that most companies consider upgrading to this code version to take advantage of this improvement. Because this change occurs in the base code (the s_code.js file), your on-page implementation typically will not require an update for this release.

Beginning on April 26, code version H.24.4 will be available for download in the Admin Console in both SiteCatalyst 14 and SiteCatalyst 15.3.

Data Feed

This one is near and dear to my heart, as someone who has spent a lot of time working with Data Feeds both here at Adobe and as a SiteCatalyst customer. Beginning on April 26, a new lookup file called “column_headers.tsv” will be included in the files delivered as part of all raw clickstream data feeds. This new lookup file contains a single row comprising the list of column names for the data found in hit_data.tsv. For me, and perhaps for someone at your company, this means no more building my 300-column ‘create’ statements by hand. Generally, this should make ETL processes easier and enable you to better understand what you are seeing in Data Feeds.
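For example, a few lines of Python can turn the header row into a ‘create’ statement. This is only a sketch under some assumptions: the function name is made up, every column is typed as TEXT for simplicity, and the sample column names are purely illustrative. In practice you would read the contents of column_headers.tsv from your own feed delivery and choose appropriate column types.

```python
import csv
import io


def build_create_statement(headers_tsv_text, table_name="hit_data"):
    """Build a CREATE TABLE statement from the single tab-separated header row."""
    reader = csv.reader(io.StringIO(headers_tsv_text), delimiter="\t")
    columns = next(reader)  # the lookup file contains one row of column names
    # Assumption: type everything as TEXT; refine types per column as needed.
    column_defs = ",\n".join(f'  "{name}" TEXT' for name in columns)
    return f'CREATE TABLE "{table_name}" (\n{column_defs}\n);'


# Illustrative header row; a real feed will have far more columns.
sample_headers = "hit_time_gmt\tpage_url\tvisid_high\n"
statement = build_create_statement(sample_headers)
print(statement)
```

Because the statement is generated from the file that ships with the feed, it stays in step with the columns you are actually receiving.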


Each of the topics discussed above is covered in SiteCatalyst 15.3 documentation, release notes, and Knowledge Base articles, and you will find more detail in those locations. We’re really pleased with some of these features and we believe they will serve you well. As always, please leave comments with any questions, concerns, or requests. Additionally, you can always find me on Twitter at @benjamingaines. We hope you enjoy these new features, and happy analyzing!