Track PDF Forms with Omniture

No doubt you noticed that Adobe acquired Omniture — a company that provides online business optimization software — starting with web analytics.  One of the integration possibilities is to help companies track the activity inside their PDF documents — including forms.  What pages did they view? Did they print? save? add annotations? sign?  In the case of a form: what fields did they fill in? What buttons did they click?  How far into the form did they get before they abandoned their session?  Today we’ll work through a sample of adding tracking code to a PDF form.

Tracking code

When you want to track activity on a web side, the Omniture tools offer assistance for instrumenting your html pages.  You give it your tokens and it returns the appropriate script to embed in your source.  Similarly, we can generate code to add to ActionScript, Java and other environments.  In this blog entry I’ll show you how to do the same for your PDF.

A word about privacy

Tracking activity is a sensitive business.  End-users have the right to know that their actions are being tracked.  They also have the right to opt out of tracking.  Adobe Reader has a security policy that protects users.  In practise, what this means is that when a PDF is hosted in the browser, the document may post data as long as it adheres to the cross domain restrictions.  When a PDF is open stand-alone, it can perform http operations only if there is a level of trust.  A couple of ways to establish trust are to use a certified PDF, or the user can explicitly allow http access via the "phone-home" dialog.

Today’s sample, limits the tracking experience to PDF forms that are open in the browser.   Tracking a PDF in standalone Reader isn’t really recommended, because the phone home dialog is too ugly:

warning1

Data Insertion API

The API used by HTML JavaScript to do tracking is based on doing an HTTP Get operation from an image resource.  However, there are other APIs.

Omniture exposes a Data Insertion API where you can http post simple XML fragments to the server.  Once you’re logged in with a developer account, you can find this API described at:

https://developer.omniture.com/documentation/datainsert/understanding

The XML grammar used is fairly simple.  The sample form constructs XML ‘pulse’ transactions that look like:

<request>
   <prop1>Acrobat9.3:WIN</prop1>
   <language>en_CA</language>
   <visitorID>27585603</visitorID>
   <pageURL>51cb51b4-535d-49a4-b6bd-1a975cc94f69</pageURL>
   <pageName>firstname:changed</pageName>
   <channel>PDF Form</channel>
   <reportSuiteID>FormTracker</reportSuiteID>
</request>

Of course, you can format this data any way you like — as long as the reportSuiteID and the URL that you post to are correct.

A few notes about the various fields we populated:

prop1

The API allows us to include up to 50 user defined properties: prop<1> to prop<50>.  In the sample, I’ve included some information about the version of Reader/Acrobat and the platform.  I originally wanted to put this information under <userAgent>, but that value is applicable only to browsers.

visitorID

When tracking from a web page, the way to identify a visitor is with an IP address or with a script-generated id stored in a cookie.  However inside the Acrobat object model, there is no equivalent property to uniquely identify a visitor.  Ideally we’d be able to establish a constant visitorID between the users session in the browser and their session in Adobe Reader.  There’s some more discussion about establishing a unique visitorID below in "the deep end".

pageURL

We need something to identify the PDF.  Using the PDF name is not reliable, since PDFs are easily renamed.  The sample below uses the xfa.uuid property.  This value remains constant even if the form is renamed.  For non-xfa PDFs we could use the doc.docID[0] property.

pageName

The form uses pageName to encode the action that has taken place.  I adopted a scheme where the string is a combination of "field name : event : additional information"

channel

A way to categorize groups of transactions for better reporting.

The Sample Form

Unfortunately I couldn’t include a fully functioning sample form.  I have an Omniture sandbox set up for my own testing, but would rather not expose it to the world. The visitor namespace used in the example is fictitious.  Instead, I’ve changed the code that would normally post data and instead it will populate a field with the xml that would otherwise have been posted to the server.  To see the sample work — follow the link above and open it in the browser.  Or download it and open it in Designer ES2 preview mode.

Detecting a browser

As stated earlier, the sample form will track user activity only when hosted in the browser.  To detect when we are in the browser we look at the document path from the acroform object model: event.target.path.  If the prefix includes a protocol scheme (e.g. http:) then we know we are hosted in the browser.  (as an aside, Designer uses the browser plugin mechanism for hosting Acrobat/Reader when in preview mode.  When testing the sample form in Designer preview, it will behave as if it were loaded in the browser.  This explains why when you close your designer preview, the form itself doesn’t close — until the next preview.  We get the browser behaviour where the document is kept open for a while in the event that the user navigates back to the page hosting that PDF.)

Designer Macro

When you look at the sample form you’ll see that I’ve injected lots of script to gather and emit pulse data:

  • A hidden Tracker subform that contains a script object, and several other events
  • enter and exit events on every field in order to track when field values change

Manually adding script for tracking would get very tedious.  To make it easier, I wrote a designer macro that will instrument my form for tracking.  The macro dialog looks like:

image

Once you select the options you want, the macro injects the required script.  If you want to remove the tracking code from your form, de-select all the tracking options and press "Ok".

Here is a zip file with the macro JavaScript, SWF, and MXML.

HTTP Post

Posting from an XFA form is pretty straightforward, given that FormCalc includes a built in post() function.  However posting from a non-XFA form is not so easy.  I tried a number of options:

doc.submitForm() — While this uses HTTP post, it also displays the server response.  In this case the Omniture server returns: <status>SUCCESS</status>.

Net.HTTP.request() — cannot be called from within a document. This function is available only in folder-level JavaScript.

Net.SOAP.request() — The documentation makes it look like it could be dumbed down to do a raw post, but in practise this is not the case.

The method I eventually cobbled together was to embed an XFA-based PDF as an attachment to the document I wanted to track.  When the document wanted to initiate tracking, it opened the attachment in the background and called into the tracking functions defined there..

The Deep End

There are several interesting things about the markup injected into the form:

HTTP Post

The call to post data is made using the FormCalc post() function.  In order to call post(), I added a "full" event to the tracker subform. We use xfa.event properties to hold the parameters to post() and invoke it with a call to
Tracker.execEvent("full");  This technique is described at: Calling FormCalc From JavaScript.

Multiple events

You might think that adding an enter and exit event to every field object would be a problem if the form happened to have its own enter and exit events.  However, the XFA spec allows fields to have multiple events with the same activity.  i.e. there’s no problem having two enter events. They’ll both fire.  However, Designer will show you only one enter event.

Protos

To keep the markup as terse as possible, I made use of protos when injecting script.  The tracking subform contained the source code for the enter and exit events:

<proto> 
  <event activity="enter" name="Track_enter" id="Track_enter"> 
    <script contentType="application/x-javascript"> 
    
Tracker.Track.FieldEnterExit();
    </script> 
  </event> 
  <event activity="exit" name="Track_exit" id="Track_exit"> 
    <script contentType="application/x-javascript">
      Tracker.Track.FieldEnterExit();
    </script> 
  </event> 
</proto>

Then when adding these events to field objects, the syntax is very terse:

<field>
   <event use="#Track_enter"/>
   <event use="#Track_exit"/>
</field>

Propagating Events

Instead of adding enter and exit events to every field, I could have used a single propagating enter/exit event for all fields.  But since propagating events are available only since 9.1, I chose to add individual events so that the form would work in older releases of Acrobat/Reader.

Tracking validation errors is a different matter.  In this case there is no easy workaround for older versions of Reader — unless you’ve implemented some kind of validation framework.  In order to track validation failures the form uses the validation state change event.  Any time it fires, the form posts to the Omniture tracking server.  Note that the state change event also uses syntax not exposed by designer:

<event activity="validationState" ref="$form"
       name="event__validationState" listen="refAndDescendents">
   <script contentType="application/x-javascript">
   …
   </script>
</event>

Notice the attribute "ref="$form".  Designer doesn’t expose the ref attribute.  It would default to "$" — the current node.  In our example we’re able to house this logic inside the Tracker subform, but have it monitor validation activity in the rest of the form by pointing it at the root form model.

Ideally the Designer macro would be able to query the target version and then would control whether logic to track validation failures is feasible.

Unique Visitor ID

There is one way to create a persistent id using the Acrobat object model — by way of the global object.  I won’t bore you with all the details about how the global object works, but I will show you how I used it to create a persistent id:

/**
* Effective reporting needs a persistent visitor id — across
* all PDF documents.
* @return a persistent visitor id
*/
function getVisitorID() {
    var sVisitorID = "";
    // We use the global object to store/retrieve a visitor id.
    for (var sVariable in global) {
        // The global object security policy doesn’t
        // allow us to examine the contents
        // of all global variables, but it does allow us
        // to enumerate them.
        // We’re looking for a variable named:
        // _OmnitureTracking_*
        // The trailing digits will be our visitor id.
        if (sVariable.indexOf("_OmnitureTracking_") === 0) {
            sVisitorID = sVariable;
            break;
        }
    }
    if (sVisitorID !== "") {
        // Strip off the prefix
        sVisitorID = sVisitorID.replace(/^_\S*_/, "");

    } else {
        // Create a new visitor id
        sVisitorID = Math.ceil(Math.random() * 100000000);
        var sVisitorVar = "_OmnitureTracking_" + sVisitorID;
        // Add this visitorID as a global, and make it persist
        // so that it will be available next time in as well.
        global[sVisitorVar] = "x";
        global.setPersistent(sVisitorVar, true);
    }
    return sVisitorID;
}

In the scenario where the PDF is being tracked in the context of a web site, we might consider embedding the users web site visitorID into the form data.  Then for the PDF tracking we’d concatenate the two values.