Archive for March, 2010

An Accessibility Checker for PDF Forms

One of the great things about PDF forms is that they have a really good accessibility story. You can create PDFs and PDF forms that allow the visually impaired user to interact with the document with the help of a screen reader. However, the quality of the experience is entirely in the hands of the PDF author. If your PDF consists of only a scanned page, the accessibility story is very poor. If your PDF form is richly annotated with audible cues, then the accessibility experience should be very good.

BTW, if you need help, here is a good resource for Accessibility Guidelines.

The problem is that your average form author does not test their result with a screen reader. They may understand the guidelines to make a form accessible, but they do not get feedback as to whether the guidelines have been followed correctly.

How well did your form author do with enabling accessibility? I have written an accessibility checker that will give at least part of that answer. I have written a designer macro (see Designer Macros) that will examine a form and validate the common properties that contribute to accessibility. (I need to repeat the usual caveat – Designer macros are a prototype/kick-the-tires/unsupported feature of Designer ES2).

When you run the macro, this is the dialog the designer user sees:

A graphic of a dialog showing accessibility checks that can be run on a form

Some explanation around the various checks:

Fields without captions
A caption is far more valuable in providing context than independent text that may have been placed near the field. With this option the macro will check that all fields have captions except for those that appear in a table cell.

Field with no assist text

There are several sources of data to provide context information for a field: field name, caption, tool tip, and custom (speak) text.  "assist text" refers to either a tool tip or custom text. If the field does not have one of these, then the macro reports a warning.
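A minimal sketch of this check, assuming the field exposes its tool tip and custom text as assist.toolTip.value and assist.speak.value (mirroring the XFA <assist> element). The function name hasAssistText is hypothetical, and this is not the macro's actual code.

```javascript
// Hypothetical helper: returns true when a field carries a tool tip
// or custom (speak) text. The property names are an assumption that
// follows the XFA <assist> element.
function hasAssistText(field) {
    var assist = field.assist || {};
    var toolTip = assist.toolTip && assist.toolTip.value;
    var speak = assist.speak && assist.speak.value;
    // Empty strings and missing values both count as "no assist text".
    return Boolean(toolTip) || Boolean(speak);
}
```

A checker could call this for each field and add a warning to the report whenever it returns false.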

Images with no alternate text

Similar to above, we insist that an image has either a tool tip or custom text. If an image is explicitly marked with "Screen reader precedence: none" then we won’t report a warning.

Tables with no header row

Screen readers have special treatment for tables. Navigating through a table with a screen reader makes sense only if there is a header row providing some guidance.  In the absence of a header row we’ll issue this warning.

Read order and tab order diverge

Because of the hierarchical nature of form definitions and the way screen readers traverse them, there are limits on the flexibility of read order.  Specifically, it is possible to define a tab order where the order jumps out of a subform and back in again.  Read order does not allow exiting a subform before all content has been read.  The macro makes sure that the tab order for each subform has only one exit.  If there are two exits, then read order will not be able to follow tab order.
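The single-exit rule can be sketched roughly as follows. This is an illustration rather than the macro's code: it assumes the tab order has already been flattened into an array, where each entry lists the subforms that contain the field (the name and subforms properties are hypothetical).

```javascript
// Hypothetical sketch: count how often the tab order leaves each subform.
// Reaching the end of the tab order counts as leaving every subform that
// contains the last field.
function findSubformsWithMultipleExits(tabOrder) {
    var exits = {};
    for (var i = 0; i < tabOrder.length; i++) {
        var current = tabOrder[i].subforms;
        var next = (i + 1 < tabOrder.length) ? tabOrder[i + 1].subforms : [];
        for (var j = 0; j < current.length; j++) {
            // The next field is outside this subform: that is one exit.
            if (next.indexOf(current[j]) === -1) {
                exits[current[j]] = (exits[current[j]] || 0) + 1;
            }
        }
    }
    var offenders = [];
    for (var name in exits) {
        if (exits.hasOwnProperty(name) && exits[name] > 1) {
            offenders.push(name);
        }
    }
    return offenders;
}
```

A subform that shows up in the result is one where read order cannot follow tab order.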

Report

Once the script has done its work analyzing the form, it generates a report using the included report template.  The report looks like:

A two column report with one column of SOM expressions and a second column with error descriptions

Ideally the macro mechanism would have access to the warnings palette, but for the time being, a report will have to be enough.

Here is a Zip file with the files needed for the macro, along with a sample form where I managed to break all the accessibility rules.

I am interested in getting feedback from users if there are other useful checks to make the coverage of the macro more extensive.

Parameterize your SOAP address

The problem for today is how to change the SOAP address used by a PDF form. The common scenario is that users want to have one SOAP address when developing and testing their form, but a different address when deploying it.  In fact, depending on your server configuration, you may want to deploy the same form on different servers with different SOAP addresses.

Rather than change the form for each server we deploy to, we prefer to parameterize the SOAP address.   There are two main challenges for solving this problem:

  1. In a form definition, the WSDL/SOAP definition is read-only
  2. For forms that deal with sensitive data, any SOAP addresses need to be covered by certification in order to prevent any ‘man-in-the-middle’ attacks

Weather Report

For today’s sample, I used a weather report SOAP service provided by www.webservicex.net.  This service: http://www.webservicex.net/WCF/ServiceDetails.aspx?SID=50 allows you to get a six day forecast for US cities based on a zip code.  Here is the corresponding WSDL.

For more background on using a WSDL definition from a PDF form you can check out this previous entry.

Cloning the Connection

There are two ways to execute a SOAP operation:

  1. With declarative XFA markup (the <execute> element)
  2. With a script call on the connection object (connection.execute())

By default, Designer will generate the <execute> markup.  If you want to modify the SOAP address, you need to use the script version. To get past the read-only restriction on connection objects, we first clone the connection.  With a read/write cloned connection we can change the SOAP address.  The code looks like this:

var vConnection = xfa.connectionSet.WeatherForecast.clone(true);
vConnection.soapAddress.value = "<revised soap address>";
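Putting the two lines together with the execute() call might look like the sketch below. The wrapper name executeWithAddress is hypothetical, and passing false to execute() (to skip a remerge) is an assumption about the desired behaviour, not something the sample form is known to do.

```javascript
// Hypothetical wrapper: clone a read-only connection, point the clone
// at a different SOAP address, then execute the clone.
function executeWithAddress(connection, soapAddress) {
    // The original connection is read-only, so work on a deep clone.
    var vConnection = connection.clone(true);
    vConnection.soapAddress.value = soapAddress;
    // Assumption: false asks for the data to be imported without a remerge.
    vConnection.execute(false);
    return vConnection;
}
```

In the form this would be called with something like executeWithAddress(xfa.connectionSet.WeatherForecast, "<revised soap address>").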

Protecting the Address

In our scenario where we design the form once and want it to work with multiple SOAP addresses, we need to figure out how to pass the address to the form.  The obvious solution is to embed the SOAP address in the form data.  This strategy works fine if your form is accessing a weather service.  But if your form is interacting with a financial institution we need something more secure.  We need to deliver the form and the SOAP address in a manner that cannot be tampered with.

For sensitive forms we recommend certification.  A certified form cannot be modified without the end user becoming aware that the form definition has changed. 

Parameterized Submit URL

We’ve previously solved a similar problem by allowing submit URLs to be parameterized.  The strategy in this case was to embed an array of submit URLs in the config data.  Then in the form data we inject an index value to indicate which submit URL to use.  The config grammar looks like:

<config xmlns="http://www.xfa.org/schema/xci/2.8/">
   <present>
      <submitUrl>http://service1.submit.net/</submitUrl>
      <submitUrl>http://service2.submit.net/</submitUrl>
      <submitUrl>http://service3.submit.net/</submitUrl>
      …
   </present>
</config>


Then in our form data we add a data element that selects the appropriate URL:

<xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
  <variables xmlns="http://ns.adobe.com/server-context-data/">
    <submitUrlIndex>2</submitUrlIndex>
  </variables>
</xfa:datasets>

With submit URLs, this all works automatically.  When you leave the submit URL unspecified in your form definition, Reader will automatically look up the submitUrl from config.

Reduce, Re-use, Recycle

We were tempted to add new grammar and new code to Reader to do the same for SOAP addresses as we did for submit URLs — but we thought better of it.  We can make this work for SOAP addresses by re-using the submitUrl grammar and adding a bit of JavaScript to do the lookup.

Assuming we now populate the config <submitUrl> grammar with SOAP addresses, we’ll use a new data variable for indexing:

<xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
  <variables xmlns="http://ns.adobe.com/server-context-data/">
    <soapUrlIndex connection="WeatherForecast">2</soapUrlIndex>
  </variables>
</xfa:datasets>

Of course, this won’t work automagically as it does with submitUrl.  We need to write the script to retrieve the SOAP address.  That’s about 25 lines of script.  Have a look at the click event of the button in the sample form to see how this works.
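The lookup step itself can be sketched as a small function. This is not the script from the sample form: it assumes the <submitUrl> values and the <soapUrlIndex> entries have already been read into plain arrays, and it assumes the index is zero-based, which the grammar above does not state. All names here are hypothetical.

```javascript
// Hypothetical sketch: pick the SOAP address for a named connection.
// submitUrls   - array of strings taken from the config <submitUrl> elements
// indexEntries - array of { connection: "...", value: "..." } objects taken
//                from the <soapUrlIndex> data elements
function lookupSoapAddress(submitUrls, indexEntries, connectionName) {
    for (var i = 0; i < indexEntries.length; i++) {
        var entry = indexEntries[i];
        if (entry.connection !== connectionName) {
            continue;
        }
        // Assumption: the index is zero-based.
        var index = parseInt(entry.value, 10);
        if (!isNaN(index) && index >= 0 && index < submitUrls.length) {
            return submitUrls[index];
        }
    }
    // No usable index for this connection: let the caller keep the default.
    return null;
}
```

When the function returns null, the form would simply keep the address from the original connection definition.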

The Deep End

Did you notice all those nice weather images that show up in the forecast?  You might have assumed that the image data was returned to us by the SOAP call.  Well… not exactly.  The SOAP response gave us the image URLs.   And as you know, a PDF cannot load an external image.  The solution is to embed all the images in the PDF. Yep, all 339 images used by the SOAP service are embedded in the PDF.  Fortunately they’re pretty small images.  In a previous post I talked about the benefits of linked images: Linked vs Embedded Template Images

There is a nice trick to getting all those images embedded.  When you are generating your form on the server, your form data may contain URL references to images.  The syntax looks like:

<img xfa:contentType="image/jpg"
     href="http://forecast.weather.gov/images/wtf/blizzard.jpg"/>

When we create the PDF on the server (or in Designer preview) all image hrefs found in the data are embedded in the PDF and are indexed by their URL.  When the SOAP service returns an image URL to us, we simply assign that value to an image field and the field automatically gets connected to the image embedded in the PDF.

vNewDay.weatherImage.value.image.href = vDay.WeatherImage.rawValue;

Have a look at the form:ready script on the zip code field for the script that works with the SOAP response.

Here is the data file for pre-loading all those images.  Be warned that using this data file will make your PDF generation *very* slow.

Track PDF Forms with Omniture

No doubt you noticed that Adobe acquired Omniture — a company that provides online business optimization software — starting with web analytics.  One of the integration possibilities is to help companies track the activity inside their PDF documents — including forms.  What pages did they view? Did they print? save? add annotations? sign?  In the case of a form: what fields did they fill in? What buttons did they click?  How far into the form did they get before they abandoned their session?  Today we’ll work through a sample of adding tracking code to a PDF form.

Tracking code

When you want to track activity on a web site, the Omniture tools offer assistance for instrumenting your HTML pages.  You give it your tokens and it returns the appropriate script to embed in your source.  Similarly, we can generate code to add to ActionScript, Java and other environments.  In this blog entry I’ll show you how to do the same for your PDF.

A word about privacy

Tracking activity is a sensitive business.  End-users have the right to know that their actions are being tracked.  They also have the right to opt out of tracking.  Adobe Reader has a security policy that protects users.  In practice, what this means is that when a PDF is hosted in the browser, the document may post data as long as it adheres to the cross domain restrictions.  When a PDF is open stand-alone, it can perform http operations only if there is a level of trust.  A couple of ways to establish trust are to certify the PDF, or to have the user explicitly allow http access via the "phone-home" dialog.

Today’s sample limits the tracking experience to PDF forms that are open in the browser.   Tracking a PDF in standalone Reader isn’t really recommended, because the phone home dialog is too ugly:

warning1

Data Insertion API

The API used by HTML JavaScript to do tracking is based on doing an HTTP Get operation from an image resource.  However, there are other APIs.

Omniture exposes a Data Insertion API where you can http post simple XML fragments to the server.  Once you’re logged in with a developer account, you can find this API described at:

https://developer.omniture.com/documentation/datainsert/understanding

The XML grammar used is fairly simple.  The sample form constructs XML ‘pulse’ transactions that look like:

<request>
   <prop1>Acrobat9.3:WIN</prop1>
   <language>en_CA</language>
   <visitorID>27585603</visitorID>
   <pageURL>51cb51b4-535d-49a4-b6bd-1a975cc94f69</pageURL>
   <pageName>firstname:changed</pageName>
   <channel>PDF Form</channel>
   <reportSuiteID>FormTracker</reportSuiteID>
</request>

Of course, you can format this data any way you like — as long as the reportSuiteID and the URL that you post to are correct.
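A rough sketch of how such a fragment could be assembled in script. The helper names escapeXml and buildPulse are hypothetical, and this is not the code from the sample form.

```javascript
// Hypothetical helper: escape the characters that would break the XML.
function escapeXml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

// Hypothetical helper: turn a set of name/value pairs into a <request>
// fragment, one child element per property.
function buildPulse(values) {
    var xml = "<request>";
    for (var name in values) {
        if (values.hasOwnProperty(name)) {
            xml += "<" + name + ">" + escapeXml(values[name]) + "</" + name + ">";
        }
    }
    return xml + "</request>";
}
```

For example, buildPulse({ pageName: "firstname:changed", reportSuiteID: "FormTracker" }) yields a request with those two child elements.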

A few notes about the various fields we populated:

prop1

The API allows us to include up to 50 user defined properties: <prop1> to <prop50>.  In the sample, I’ve included some information about the version of Reader/Acrobat and the platform.  I originally wanted to put this information under <userAgent>, but that value is applicable only to browsers.

visitorID

When tracking from a web page, the way to identify a visitor is with an IP address or with a script-generated id stored in a cookie.  However, inside the Acrobat object model, there is no equivalent property to uniquely identify a visitor.  Ideally we’d be able to establish a constant visitorID between the user’s session in the browser and their session in Adobe Reader.  There’s some more discussion about establishing a unique visitorID below in "the deep end".

pageURL

We need something to identify the PDF.  Using the PDF name is not reliable, since PDFs are easily renamed.  The sample below uses the xfa.uuid property.  This value remains constant even if the form is renamed.  For non-xfa PDFs we could use the doc.docID[0] property.

pageName

The form uses pageName to encode the action that has taken place.  I adopted a scheme where the string is a combination of "field name : event : additional information"

channel

A way to categorize groups of transactions for better reporting.

The Sample Form

Unfortunately I couldn’t include a fully functioning sample form.  I have an Omniture sandbox set up for my own testing, but would rather not expose it to the world.  The visitor namespace used in the example is fictitious.  Instead of posting data, the sample populates a field with the XML that would otherwise have been posted to the server.  To see the sample work, follow the link above and open it in the browser, or download it and open it in Designer ES2 preview mode.

Detecting a browser

As stated earlier, the sample form will track user activity only when hosted in the browser.  To detect when we are in the browser we look at the document path from the acroform object model: event.target.path.  If the prefix includes a protocol scheme (e.g. http:) then we know we are hosted in the browser.  (as an aside, Designer uses the browser plugin mechanism for hosting Acrobat/Reader when in preview mode.  When testing the sample form in Designer preview, it will behave as if it were loaded in the browser.  This explains why when you close your designer preview, the form itself doesn’t close — until the next preview.  We get the browser behaviour where the document is kept open for a while in the event that the user navigates back to the page hosting that PDF.)
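The path test could be sketched like this. The function name isHostedInBrowser is hypothetical, and limiting the check to http and https is an assumption made for the illustration.

```javascript
// Hypothetical helper: a document path that starts with a web protocol
// scheme means the PDF is hosted in the browser. A stand-alone document
// has a file-system style path instead.
function isHostedInBrowser(docPath) {
    return /^https?:/i.test(String(docPath));
}
```

Inside the form the argument would be event.target.path.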

Designer Macro

When you look at the sample form you’ll see that I’ve injected lots of script to gather and emit pulse data:

  • A hidden Tracker subform that contains a script object, and several other events
  • enter and exit events on every field in order to track when field values change

Manually adding script for tracking would get very tedious.  To make it easier, I wrote a designer macro that will instrument my form for tracking.  The macro dialog looks like:

image

Once you select the options you want, the macro injects the required script.  If you want to remove the tracking code from your form, de-select all the tracking options and press "Ok".

Here is a zip file with the macro JavaScript, SWF, and MXML.

HTTP Post

Posting from an XFA form is pretty straightforward, given that FormCalc includes a built-in post() function.  However, posting from a non-XFA form is not so easy.  I tried a number of options:

doc.submitForm() — While this uses HTTP post, it also displays the server response.  In this case the Omniture server returns: <status>SUCCESS</status>.

Net.HTTP.request() — cannot be called from within a document. This function is available only in folder-level JavaScript.

Net.SOAP.request() — The documentation makes it look like it could be dumbed down to do a raw post, but in practice this is not the case.

The method I eventually cobbled together was to embed an XFA-based PDF as an attachment to the document I wanted to track.  When the document wanted to initiate tracking, it opened the attachment in the background and called into the tracking functions defined there.

The Deep End

There are several interesting things about the markup injected into the form:

HTTP Post

The call to post data is made using the FormCalc post() function.  In order to call post(), I added a "full" event to the tracker subform. We use xfa.event properties to hold the parameters to post() and invoke it with a call to Tracker.execEvent("full").  This technique is described at: Calling FormCalc From JavaScript.

Multiple events

You might think that adding an enter and exit event to every field object would be a problem if the form happened to have its own enter and exit events.  However, the XFA spec allows fields to have multiple events with the same activity.  i.e. there’s no problem having two enter events. They’ll both fire.  However, Designer will show you only one enter event.

Protos

To keep the markup as terse as possible, I made use of protos when injecting script.  The tracking subform contained the source code for the enter and exit events:

<proto>
  <event activity="enter" name="Track_enter" id="Track_enter">
    <script contentType="application/x-javascript">
      Tracker.Track.FieldEnterExit();
    </script>
  </event>
  <event activity="exit" name="Track_exit" id="Track_exit">
    <script contentType="application/x-javascript">
      Tracker.Track.FieldEnterExit();
    </script>
  </event>
</proto>

Then when adding these events to field objects, the syntax is very terse:

<field>
   <event use="#Track_enter"/>
   <event use="#Track_exit"/>
</field>

Propagating Events

Instead of adding enter and exit events to every field, I could have used a single propagating enter/exit event for all fields.  But since propagating events are available only since 9.1, I chose to add individual events so that the form would work in older releases of Acrobat/Reader.

Tracking validation errors is a different matter.  In this case there is no easy workaround for older versions of Reader — unless you’ve implemented some kind of validation framework.  In order to track validation failures the form uses the validation state change event.  Any time it fires, the form posts to the Omniture tracking server.  Note that the state change event also uses syntax not exposed by designer:

<event activity="validationState" ref="$form"
       name="event__validationState" listen="refAndDescendents">
   <script contentType="application/x-javascript">
   …
   </script>
</event>

Notice the attribute ref="$form".  Designer doesn’t expose the ref attribute, so it would otherwise default to "$" (the current node).  In our example we’re able to house this logic inside the Tracker subform, but have it monitor validation activity in the rest of the form by pointing it at the root form model.

Ideally the Designer macro would be able to query the target version and then would control whether logic to track validation failures is feasible.

Unique Visitor ID

There is one way to create a persistent id using the Acrobat object model — by way of the global object.  I won’t bore you with all the details about how the global object works, but I will show you how I used it to create a persistent id:

/**
* Effective reporting needs a persistent visitor id — across
* all PDF documents.
* @return a persistent visitor id
*/
function getVisitorID() {
    var sVisitorID = "";
    // We use the global object to store/retrieve a visitor id.
    for (var sVariable in global) {
        // The global object security policy doesn’t
        // allow us to examine the contents
        // of all global variables, but it does allow us
        // to enumerate them.
        // We’re looking for a variable named:
        // _OmnitureTracking_*
        // The trailing digits will be our visitor id.
        if (sVariable.indexOf("_OmnitureTracking_") === 0) {
            sVisitorID = sVariable;
            break;
        }
    }
    if (sVisitorID !== "") {
        // Strip off the prefix
        sVisitorID = sVisitorID.replace(/^_\S*_/, "");

    } else {
        // Create a new visitor id
        // Convert to a string so both branches return the same type
        sVisitorID = String(Math.ceil(Math.random() * 100000000));
        var sVisitorVar = "_OmnitureTracking_" + sVisitorID;
        // Add this visitorID as a global, and make it persist
        // so that it will be available next time in as well.
        global[sVisitorVar] = "x";
        global.setPersistent(sVisitorVar, true);
    }
    return sVisitorID;
}

In the scenario where the PDF is being tracked in the context of a web site, we might consider embedding the user’s web site visitorID into the form data.  Then for the PDF tracking we’d concatenate the two values.

 

Using the Acrobat JavaScript Debugger with XFA

Many of you are already aware that the JavaScript debugger inside Acrobat partially works with XFA forms.  You can turn on the debugger in the Acrobat preferences:

image

When debugging is turned on, the Acrobat JavaScript console will allow you to navigate to objects in the XFA hierarchy and set break points in a subset of events:

 image

BUT (and you knew there was a "but" coming) there are some serious limits:

  • Using the debugger with XFA forms is not officially supported
  • We cannot debug script objects
  • Storing break points doesn’t seem to work.  This makes it hard to debug an initialization script (unless you force your form to do an xfa.form.remerge())

On the other hand, even with these limits, many of you will find the debugger useful.

There’s another "but" and this is really my main reason for posting on this topic. 
You need to turn off the debugger when you’re not using it.

There are two reasons:

Exception handling.  Note the first dialog above has the option to break when an exception is thrown.  If you’re not expecting it, this option can circumvent normal JavaScript processing.  For example, it is good JavaScript practice to use try/catch to quietly handle error conditions.  With try/catch we can detect an error condition and allow the application to continue uninterrupted.  But when the debugger has been told to break on exceptions, the quiet handling doesn’t happen any more.  The form stops and you get a message to the console.

Performance.  Do you remember the game of life form? I used it to illustrate some performance characteristics in this blog entry.  The form has a couple of buttons with around 30,000 lines of JavaScript in their click event.  Under normal circumstances, these scripts finished in between 4 and 10 seconds.  With the JavaScript debugger enabled, these took … 10 minutes.  That’s right, roughly a 50-fold slowdown in script performance with the debugger enabled.
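To make the exception handling point concrete, here is a small illustration of the kind of quiet try/catch that the break-on-exception option interrupts. The helper name readRawValue is hypothetical.

```javascript
// Hypothetical helper: read a field value, falling back quietly when
// the node does not exist.
function readRawValue(node, fallback) {
    try {
        // Throws a TypeError when node is null or undefined.
        return node.rawValue;
    } catch (e) {
        // Normally the user never sees this. With "break on exceptions"
        // turned on, the debugger stops at the throw instead.
        return fallback;
    }
}
```

With the debugger off, readRawValue(null, "") simply returns the fallback and the form carries on.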

The Inevitable Question

When will XFA have proper JavaScript debugging support?  This is a hard question to answer.  But it gets asked a lot.  Believe me when I say that  we’ve taken a run at this problem many times in the past.  But the fact remains that there are some substantial technical barriers that are holding us back. 

Updated Form Debugger

Last year I posted a sample form that was useful for debugging the internals of dynamic forms: Debug merge and layout.  In the meantime, I’ve discovered a couple of bugs.  Time for a maintenance release.  The main changes are:

  • Does a better job finding the coordinates of objects — anything inside a subformSet, fields inside exclusion groups.
  • Reports the breakBefore and breakAfter properties correctly
  • General code cleanup and a bit of optimization

As I’ve mentioned before, this tool is useful not only as a way to debug your merge/layout, but also as a good way to visualize what’s going on inside the XFA processor.

Here’s the updated debugger form.