This is Part 3 in a multi-part series describing the new classifications rule builder in Adobe Analytics. If you missed any of the previous installments they can be found here:

Advanced pattern matching in classification rules

In my last post I mentioned that the classifications rule builder offers four types of matching conditions for classification rules:

  • Starts With
  • Contains
  • Ends With
  • Regular Expression

The first three types are easy to use but can be limiting. For example, suppose I am using the rule builder to classify tracking codes which are of this form:

em:MaybellineSale:June

Let’s assume the first section of the string will be used to set a Channel classification, the second section will be used to set a Campaign Name classification, and the third section will be used to set a Campaign Date classification. It’s pretty easy to see how you could use Starts With to handle the first section of the string:

If Starts With ‘em:’ then set Channel to ‘Email’

You probably have relatively few channels, so creating individual “Starts With” rules to handle each channel is no problem. But handling the second and third sections of the tracking code is tricky:

  1. You will likely have many campaign names and dates and you probably don’t want to have to create a rule for every possible name and date combination.
  2. Using Contains may lead to results you don’t expect. In the example above, if you use a rule that says “If Contains ‘May’ then set Campaign Date to ‘May’”, you’ll end up misclassifying the tracking code, because ‘May’ also appears inside ‘MaybellineSale’.
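To see the second problem concretely, here is a minimal Python sketch. The `contains_rule` helper is a hypothetical stand-in for a Contains rule, not part of the rule builder itself:

```python
tracking_code = "em:MaybellineSale:June"

def contains_rule(code, needle, value):
    """Hypothetical Contains rule: return value if needle occurs anywhere in code."""
    return value if needle in code else None

# 'May' is found inside 'MaybellineSale', so the rule fires
# even though the date section of this code is 'June'.
campaign_date = contains_rule(tracking_code, "May", "May")
print(campaign_date)
```

The rule has no notion of sections, so any campaign name that happens to contain a month name would trigger it.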

This is where regular expressions come to the rescue.

What are regular expressions?

Wikipedia defines regular expressions this way:

A regular expression is a specific pattern that provides concise and flexible means to “match” (specify and recognize) strings of text, such as particular characters, words, or patterns of characters.

Some of you are probably familiar with the concept of “wildcards” that are used in string searches. In Windows, for instance, you can use a question mark (?) to represent any single character and an asterisk (*) to represent any string of characters:

201?results.*
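As a rough illustration of how that wildcard behaves, here is a sketch using Python's standard `fnmatch` module. It implements Unix shell-style wildcards, which treat `?` and `*` the same way for this pattern; the file names are made up for the example:

```python
from fnmatch import fnmatchcase

pattern = "201?results.*"

# Hypothetical file names, chosen only to exercise the pattern.
names = [
    "2012results.csv",   # '?' matches '2', '*' matches 'csv'
    "2013results.xlsx",  # '?' matches '3', '*' matches 'xlsx'
    "2020results.csv",   # does not start with '201'
    "201results.csv",    # no character left over for '?'
]

matches = [name for name in names if fnmatchcase(name, pattern)]
print(matches)
```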

Think of regular expressions as wildcard searches on steroids. Regular expressions are so powerful that it can take a while to learn how to fully leverage their capabilities, but learning the basics is pretty easy. For example, here are a few of the commonly used search parameters in regular expressions:

\s Any whitespace character
. Any single character
\S Any non-whitespace character
\d Any digit
\D Any non-digit
\w Any word character (letter, number, underscore)
\W Any non-word character
\b Any word boundary
(...) Capture everything enclosed as a parameter
(a|b) a or b
a? Zero or one of a
a* Zero or more of a
a+ One or more of a
a{3} Exactly 3 of a
a{3,} 3 or more of a
a{3,6} Between 3 and 6 of a
[abc] A single character of: a, b or c
[^abc] Any single character except: a, b, or c
[a-z] Any single character in the range a-z
[a-zA-Z] Any single character in the range a-z or A-Z
^ Start of line
$ End of line
\A Start of string
\z End of string
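A few of these building blocks can be tried out with Python's `re` module. This is only a sketch: Python's regex dialect is an assumption here and may differ in small ways from the engine behind the rule builder (for example, older Python versions spell end-of-string as `\Z` rather than `\z`), and the sample strings are invented:

```python
import re

# \d+ : runs of one or more digits
digits = re.findall(r"\d+", "sale2013 week 26")

# \w+ : runs of word characters (the colon is not a word character)
words = re.findall(r"\w+", "em:Maybelline_Sale:June")

# a{3,6} : between 3 and 6 of 'a'
four_as_ok = re.fullmatch(r"a{3,6}", "aaaa") is not None
two_as_ok = re.fullmatch(r"a{3,6}", "aa") is not None

# [^abc] : any single character except a, b, or c
not_abc = re.findall(r"[^abc]", "abcde")

# (a|b) : alternation inside a capture group
channel = re.match(r"(em|ps):", "ps:SpringSale").group(1)

# \b : word boundary, so 'May' must stand alone as a word
whole_word = re.search(r"\bMay\b", "MaybellineSale")
```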

Regular expressions in classifications rules

In the classifications rule builder you can use regular expressions to match a wide variety of text, characters, words and patterns and use them to set classification columns. Continuing the tracking code example I started with earlier in this post, let’s suppose I set up 3 rules that look like this:

If Starts With em: set Channel = Email
If matches regex ^([^:]+):([^:]+):([^:]+)$ set Campaign Name = $2
If matches regex ^([^:]+):([^:]+):([^:]+)$ set Campaign Date = $3

Holy cow, what does all that mean? Let’s take a closer look:

First, notice I used the same regular expression in both the second and third rules but set a different classification in each case. This regular expression matches any string (tracking code in this case) that starts with one or more non-colon characters, then a colon, then more non-colon characters, then a colon, then more non-colon characters. Note that the regular expression could easily be modified for use with any delimiter.
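To illustrate the point about delimiters, here is a small Python sketch that builds the same three-section pattern for any single-character delimiter. The function name `three_part_pattern` is hypothetical, and Python's `re` module is an assumption standing in for the rule builder's regex engine:

```python
import re

def three_part_pattern(delimiter):
    """Build ^(x+)D(x+)D(x+)$ where D is the delimiter and x is any non-delimiter character."""
    if len(delimiter) != 1:
        # The [^...] character class only makes sense for a single character.
        raise ValueError("delimiter must be a single character")
    d = re.escape(delimiter)   # escape characters such as '|' that are special in a regex
    part = f"([^{d}]+)"        # one section: one or more non-delimiter characters
    return re.compile(f"^{part}{d}{part}{d}{part}$")

colon_groups = three_part_pattern(":").match("em:MaybellineSale:June").groups()
pipe_groups = three_part_pattern("|").match("em|MaybellineSale|June").groups()
print(colon_groups)
print(pipe_groups)
```

Because each section excludes the delimiter, a code with a fourth section such as `em:a:b:c` does not match at all, rather than matching with a colon inside one of the sections.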

Second, I used parentheses to define portions of the string as parameters that I can use in my classification rules. Since I have three sets of parentheses, I have defined three parameters: $1, $2, and $3. Based on my rules, if my tracking code is:

em:MaybellineSale:June

then the second and third rules will set:

Campaign Name = MaybellineSale
Campaign Date = June

What’s more, the regular expression I’ve chosen will work regardless of what substrings occur in the second and third sections of the tracking code. Sweet! Two rules to rule them all.
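The three rules can be mimicked outside the rule builder to check what they produce. The following is a minimal Python sketch, not the rule builder's own implementation; the `classify` function and the dictionary it returns are assumptions made for illustration, with `m.group(2)` and `m.group(3)` playing the role of $2 and $3:

```python
import re

# Same expression as in the second and third rules.
PATTERN = re.compile(r"^([^:]+):([^:]+):([^:]+)$")

def classify(tracking_code):
    """Apply the three example rules and return the classification columns that were set."""
    result = {}
    # Rule 1: Starts With 'em:' sets Channel.
    if tracking_code.startswith("em:"):
        result["Channel"] = "Email"
    # Rules 2 and 3: the regex sets Campaign Name and Campaign Date.
    m = PATTERN.match(tracking_code)
    if m:
        result["Campaign Name"] = m.group(2)
        result["Campaign Date"] = m.group(3)
    return result

print(classify("em:MaybellineSale:June"))
print(classify("em:SpringClearance:April"))
```

A code that does not have exactly three sections, such as `em:MaybellineSale`, would still get a Channel from the first rule but no Campaign Name or Campaign Date.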

Amazing. Where can I learn more?

The online documentation for the classifications rule builder contains a short primer on regular expressions and several examples of use cases that came up during our beta testing. For a full treatise on regular expressions I recommend this site: http://www.regular-expressions.info/

In conclusion

I hope you’ve found this series of blog posts helpful. Feel free to leave comments and questions below. I’m also interested in hearing from you if you come up with a really great regular expression for the rule builder that you’d like to share with the rest of the world.

2 comments
Arcadi

Is there a way to take information from two different eVars to create a classification?

Take for example a shared page name that is sometimes used to display Product Type A, sometimes Product Type B, and other times Product Type C. So we'd have the Page Name stored in eVar5. In a separate eVar, eVar9, we have the Product Type. Would we be able to look at both eVar5 and eVar9 to determine what data to put in a single classification? Or do we have to have both pieces of information concatenated into one eVar?

EricLanser

Can we append a capture group (e.g. $2 or $3 in your examples) to some text?
E.g., following your example above, for a classification like 'Channel and Name', where Channel is set as a string and Name via a capture group:

If matches regex ^(em):([^:]+):([^:]+)$ set 'Channel and Name' = "Email-"&$2

That way we'd have a report that showed our general name ('email', rather than 'em') along with the Name.

In some of my applications, we have multiple independent ways to set the Channel (based on CID, EID, or backup methods such as referring domain).  I'd like a record of both what channel was set, and how it was set in a single report for identifying cases where tracking may be missing but a backup method was used to categorize the data.

Or would you recommend relying on Data Warehouse, Discover, or other tools that allow multiple dimension breakdowns?