This is Part 3 in a multi-part series describ­ing the new clas­si­fi­ca­tions rule builder in Adobe Ana­lyt­ics. If you missed any of the pre­vi­ous install­ments they can be found here:

Advanced pattern matching in classification rules

In my last post I men­tioned that the clas­si­fi­ca­tions rule builder offers four types of match­ing con­di­tions for clas­si­fi­ca­tion rules:

  • Starts With
  • Con­tains
  • Ends With
  • Reg­u­lar Expression

The first three types are easy to use but can be lim­it­ing. For exam­ple, sup­pose I am using the rule builder to clas­sify track­ing codes which are of this form:

em:MaybellineSale:June

Let’s assume the first sec­tion of the string will be used to set a Chan­nel clas­si­fi­ca­tion, the sec­ond sec­tion will be used to set a Cam­paign Name clas­si­fi­ca­tion, and the third sec­tion will be used to set a Cam­paign Date clas­si­fi­ca­tion. It’s pretty easy to see how you could use Starts With to han­dle the first sec­tion of the string:

If Starts With ‘em:’ then set Chan­nel to ‘Email’

You probably have relatively few channels, so creating individual “Starts With” rules to handle each channel is no problem. But handling the second and third sections of the tracking code is tricky:

  1. You will likely have many cam­paign names and dates and you prob­a­bly don’t want to have to cre­ate a rule for every pos­si­ble name and date combination.
  2. Using Contains may lead to results you don’t expect. In the example above, if you use a rule that says “If Contains ‘May’ then set Campaign Date to ‘May’”, you’ll end up misclassifying the tracking code, because ‘May’ also appears inside ‘MaybellineSale’ even though the date section is ‘June’.
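
To see the second problem concretely, here is a minimal Python sketch of a Contains-style rule. The `contains_rule` helper is hypothetical and only imitates the rule described above; it is not how the rule builder is implemented.

```python
# Hypothetical stand-in for a "Contains" rule: if the needle appears
# anywhere in the tracking code, return the classification value.
def contains_rule(code, needle, value):
    return value if needle in code else None

tracking_code = "em:MaybellineSale:June"

# "May" is found inside "MaybellineSale", so the rule fires
# even though the date section of the code is "June".
campaign_date = contains_rule(tracking_code, "May", "May")
print(campaign_date)  # prints: May
```

The rule fires because a substring test has no idea which section of the code it is looking at.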

This is where reg­u­lar expres­sions come to the rescue.

What are reg­u­lar expressions?

Wikipedia defines reg­u­lar expres­sions this way:

A regular expression is a specific pattern that provides concise and flexible means to “match” (specify and recognize) strings of text, such as particular characters, words, or patterns of characters.

Some of you are prob­a­bly famil­iar with the con­cept of “wild­cards” that are used in string searches. In Win­dows, for instance, you can use a ques­tion mark (?) to rep­re­sent any sin­gle char­ac­ter and an aster­isk (*) to rep­re­sent any string of characters:

201?results.*
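
As a rough illustration, Python's standard `fnmatch` module applies the same two wildcards. It follows Unix shell conventions rather than Windows ones, but `?` and `*` behave the same way for this pattern. The file names are made up for the example.

```python
from fnmatch import fnmatchcase

pattern = "201?results.*"

# Made-up file names to test against the wildcard pattern.
names = [
    "2013results.csv",
    "2014results.xlsx",
    "2020results.csv",   # fails: the third character is not "1"
    "2013summary.csv",   # fails: "results" is missing
]

matches = [name for name in names if fnmatchcase(name, pattern)]
print(matches)  # prints: ['2013results.csv', '2014results.xlsx']
```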

Think of reg­u­lar expres­sions as wild­card searches on steroids. Reg­u­lar expres­sions are so pow­er­ful that it can take a while to learn how to fully lever­age their capa­bil­i­ties, but learn­ing the basics is pretty easy. For exam­ple, here are a few of the com­monly used search para­me­ters in reg­u­lar expressions:

\s Any white­space character
. Any sin­gle character
\S Any non-whitespace character
\d Any digit
\D Any non-digit
\w Any word char­ac­ter (let­ter, num­ber, underscore)
\W Any non-word character
\b Any word boundary
(...) Cap­ture every­thing enclosed as a parameter
(a|b) a or b
a? Zero or one of a
a* Zero or more of a
a+ One or more of a
a{3} Exactly 3 of a
a{3,} 3 or more of a
a{3,6} Between 3 and 6 of a
[abc] A sin­gle char­ac­ter of: a, b or c
[^abc] Any sin­gle char­ac­ter except: a, b, or c
[a-z] Any sin­gle char­ac­ter in the range a-z
[a-zA-Z] Any sin­gle char­ac­ter in the range a-z or A-Z
^ Start of line
$ End of line
\A Start of string
\z End of string
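
A few of these tokens can be tried out with Python's `re` module, as in the sketch below. Regular expression engines differ in the details, so the rule builder may not behave identically in every case; the sample only uses tokens that work the same way in most engines, and all of the sample strings are invented.

```python
import re

# (pattern, text, whether the whole text should match the pattern)
samples = [
    (r"\d{4}", "2013", True),          # exactly four digits
    (r"\d{4}", "201", False),          # only three digits
    (r"[a-z]+", "email", True),        # one or more lowercase letters
    (r"[a-z]+", "Email", False),       # "E" is outside the range a-z
    (r"(em|ps)", "ps", True),          # either "em" or "ps"
    (r"colou?r", "color", True),       # zero or one "u"
    (r"a{3,6}", "aa", False),          # fewer than three "a" characters
    (r"\w+\s\w+", "June sale", True),  # word, whitespace, word
    (r"[^:]+", "em:June", False),      # contains a colon
]

# fullmatch requires the pattern to cover the entire text.
results = [bool(re.fullmatch(pattern, text)) for pattern, text, _ in samples]

for (pattern, text, expected), result in zip(samples, results):
    print(f"{pattern!r:14} {text!r:12} matched={result} expected={expected}")
```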

Reg­u­lar expres­sions in clas­si­fi­ca­tions rules

In the classifications rule builder you can use regular expressions to match a wide variety of text, characters, words and patterns and use them to set classification columns. Continuing the tracking code example I started with earlier in this post, let’s suppose I set up three rules that look like this:

If Starts With em: set Chan­nel = Email
If matches regex ^([^:]+):([^:]+):([^:]+)$ set Cam­paign Name = $2
If matches regex ^([^:]+):([^:]+):([^:]+)$ set Cam­paign Date = $3

Holy cow what does all that mean? Let’s take a closer look:

First, notice I used the same reg­u­lar expres­sion in both the sec­ond and third rules but set a dif­fer­ent clas­si­fi­ca­tion in each case. This reg­u­lar expres­sion matches any string (track­ing code in this case) that starts with one or more non-colon char­ac­ters, then a colon, then more non-colon char­ac­ters, then a colon, then more non-colon char­ac­ters. Note that the reg­u­lar expres­sion could eas­ily be mod­i­fied for use with any delimiter.
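
Here is a small Python sketch of which strings this expression accepts. It assumes Python's `re` engine treats the pattern the same way the rule builder does. The `three_part_pattern` helper is hypothetical and shows one way the delimiter could be swapped, assuming a single-character delimiter.

```python
import re

PATTERN = re.compile(r"^([^:]+):([^:]+):([^:]+)$")

codes = [
    "em:MaybellineSale:June",   # three non-empty sections
    "em:MaybellineSale",        # only two sections
    "em::June",                 # empty middle section
    "em:Maybelline:Sale:June",  # four sections
]

# Only the first code has exactly three non-empty, colon-separated sections.
matched = {code: bool(PATTERN.match(code)) for code in codes}
for code, ok in matched.items():
    print(f"{code!r}: {ok}")

# Hypothetical helper: build the same three-section pattern for any
# single-character delimiter.
def three_part_pattern(delimiter):
    d = re.escape(delimiter)
    section = f"([^{d}]+)"
    return re.compile(f"^{section}{d}{section}{d}{section}$")

pipe_match = three_part_pattern("|").match("em|MaybellineSale|June")
print(pipe_match.groups())  # prints: ('em', 'MaybellineSale', 'June')
```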

Sec­ond, I used paren­the­ses to define por­tions of the string as para­me­ters that I can use in my clas­si­fi­ca­tion rules.  Since I have three sets of paren­the­ses I have defined three para­me­ters: $1, $2, and $3. Based on my rules if my track­ing code is:

em:MaybellineSale:June

then the sec­ond and third rules will set:

Campaign Name = MaybellineSale
Campaign Date = June

What’s more, the reg­u­lar expres­sion I’ve cho­sen will work regard­less of what sub­strings occur in the sec­ond and third sec­tions of the track­ing code. Sweet! Two rules to rule them all.
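
The three rules can be imitated in a few lines of Python. This is only a sketch of the logic, not the rule builder itself: the `classify` function is hypothetical, Python's `group(2)` and `group(3)` stand in for `$2` and `$3`, and the second tracking code is invented to show a different campaign.

```python
import re

TRACKING_CODE = re.compile(r"^([^:]+):([^:]+):([^:]+)$")

def classify(code):
    """Apply the three example rules and return the classifications set."""
    result = {}
    # Rule 1: Starts With "em:" sets the Channel.
    if code.startswith("em:"):
        result["Channel"] = "Email"
    # Rules 2 and 3: the same regex, each using a different capture group.
    match = TRACKING_CODE.match(code)
    if match:
        result["Campaign Name"] = match.group(2)
        result["Campaign Date"] = match.group(3)
    return result

print(classify("em:MaybellineSale:June"))
# prints: {'Channel': 'Email', 'Campaign Name': 'MaybellineSale', 'Campaign Date': 'June'}

print(classify("ps:BackToSchool:August"))
# prints: {'Campaign Name': 'BackToSchool', 'Campaign Date': 'August'}
```

The second code gets no Channel because no Starts With rule exists for `ps:` in this sketch, but the name and date are still picked up by the same two regex rules.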

Amaz­ing. Where can I learn more?

The online documentation for the classifications rule builder contains a short primer on regular expressions and several examples of use cases that came up during our beta testing. For a full treatise on regular expressions I recommend this site: http://www.regular-expressions.info/

In con­clu­sion

I hope you’ve found this series of blog posts help­ful. Feel free to leave com­ments and ques­tions below. I’m also inter­ested in hear­ing from you if you come up with a really great reg­u­lar expres­sion for the rule builder that you’d like to share with the rest of the world.

2 comments
Arcadi

Is there a way to take information from two different eVars to create a classification?

Take for example a shared page name that is sometimes used to display Product Type A, sometimes Product Type B and other times Product Type C. So we'd have the Page Name stored in eVar5. In a separate eVar9 we have the Product Type. Would we be able to look at both eVar5 and eVar9 to determine what data to put in a single classification? Or do we have to have both pieces of information concatenated into one eVar?

EricLanser

Can we append a capture group (e.g. $2 or $3 in your examples) to some text?
E.g. following your example above, for a classification like 'Channel and Name', where Channel is set as a string and Name via a capture group:

If matches regex ^(em):([^:]+):([^:]+)$ set 'Channel and Name' = "Email-"&$2

That way we'd have a report that showed our general name ('email', rather than 'em') along with the Name.

In some of my applications, we have multiple independent ways to set the Channel (based on CID, EID, or backup methods such as referring domain).  I'd like a record of both what channel was set, and how it was set in a single report for identifying cases where tracking may be missing but a backup method was used to categorize the data.

Or, would you recommend relying on datawarehouse, discover, or other tools that allow multiple dimension breakdowns?