Archive for July, 2010

xfa.context

Yesterday I discovered that we have an undocumented script object property that you might find useful.

The root xfa object has a property called “context”.  This property is a reference to the object that is hosting a calculation, validation, or event script.  That is, it returns the object normally referenced by “this” in those scripts.

I recently came across a scenario where knowing the source context is very useful.  In my case, I wanted a function in a script object to know the calling context.  It worked well to use xfa.context.

I have attached a sample form where I have a set of expense report items.  I wanted to write a sum() function in JavaScript (rather than using FormCalc).  Of course, I wanted the sum function to be re-usable so I put it in a script object on the root subform.  You call the function with a SOM expression that returns the fields to be summed:

utils.sum("expense[*].amount");

The challenge here is that the SOM expression is relative to the field calculation that uses it.  That means this code will not work:

function sum(sExpression) {
  // "this" is at the form root
  // and won’t find the result
  this.resolveNodes(sExpression);
  ...
}

One workaround is to provide a fully explicit SOM expression:

utils.sum("ExpenseReport.expenses.expense[*].amount");

But I don’t like this approach.  The calculation is now not encapsulated.  If the form author modifies the hierarchy in any way, the script will fail.  It also means that it’s very difficult to place this logic in a fragment where you don’t know the absolute context.

A second workaround is to resolve the nodes before calling the method:

utils.sum(this.resolveNodes("expense[*].amount"));

I don’t really like this either – it reduces the readability of the code. 
Instead, I used this:

function sum(sExpression) {
  // resolve sExpression in the context
  // of the calling script
  xfa.context.resolveNodes(sExpression);
  ...
}

Note that xfa.context is a read/write property, but at this point I have not found a useful reason to assign a value to xfa.context.
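Putting it together, here’s a sketch of the sum() function built around xfa.context.  The tiny xfa mock below is not the real XFA object model — it only imitates resolveNodes() (returning a list with .length and .item()) so the logic can be exercised outside of Reader.  In a real form you would drop the mock and keep just the function.

```javascript
// Mock stand-in for the XFA runtime -- NOT the real object model.
var xfa = {
  context: {
    fields: [{ rawValue: 10 }, { rawValue: 25 }, { rawValue: 15 }],
    resolveNodes: function (sExpression) {
      // The real runtime resolves the SOM expression relative to the
      // calling field; the mock just returns all of its fields.
      var arr = this.fields;
      return { length: arr.length, item: function (i) { return arr[i]; } };
    }
  }
};

// The pattern from the post: resolve the SOM expression in the
// context of the calling script, then total up the field values.
function sum(sExpression) {
  var nodes = xfa.context.resolveNodes(sExpression);
  var total = 0;
  for (var i = 0; i < nodes.length; i++) {
    total += nodes.item(i).rawValue;
  }
  return total;
}
```

With the mock data above, sum("expense[*].amount") totals the three amounts.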

Shared Data in Packages Part 2

Last week I started to outline a design pattern for sharing data in a package.  Again, the general problem we’re trying to solve is to allow multiple forms in a PDF package to exchange data — so that values of fields common to multiple forms are propagated.  Part 1 of the problem is establishing a communication mechanism between documents.  Once the documents have been disclosed to each other, any document in the package can modify any other document.  Today’s Part 2 entry describes how to do the sharing.

Data mapping strategies

Implementing data sharing means solving a mapping problem: How do we know which fields in one PDF correlate to which fields in other PDFs? There are several techniques we could choose, including:

  1. Use the same name for common fields
  2. Generate a manifest that explicitly correlates fields
  3. Base all forms in the package on a common data schema

For my solution, I’ve chosen the 3rd option: common schema.  The underlying assumptions are:

  • Field values get propagated by assigning values in the data DOM — a data value from one form will have the same data node address in each of the other forms
  • The data DOM in the package document will hold the aggregate data of all forms in the package — this is the ‘master copy’ of the data.  Note that the package document does not have to have fields correlating to all the data elements.
  • Attached forms will have their data synchronized from the master copy of the data when they are launched in Reader

Data Sharing Algorithm

With these assumptions in place, the actual algorithm is fairly simple:

  1. When any attachment opens, it registers with the package document
    (the topic of Part 1)
  2. At registration, all data values of the attachment are synchronized from the master copy of the data in the package
  3. While the attachments are open, detect when field values change.  The technique used to detect field value changes is to use propagating enter/exit events. We save the field value at field enter, and compare it to the result at field exit. 
  4. When a field value changes, send a synchronize message to the package document.
  5. When the package document gets a synchronizing message, it updates the master copy of the data and then propagates the change to all other open attached PDFs.
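The steps above can be sketched in plain JavaScript.  The PackageDoc and Attachment objects below are hypothetical stand-ins for the Acrobat doc objects — the real implementation lives in the fragment subforms and communicates across documents — but the registration/synchronization flow is the same.

```javascript
// Hypothetical model of the package document holding the master copy.
function PackageDoc() {
  this.masterData = {};   // aggregate "master copy" of all field values
  this.attachments = [];
}
// Step 1/2: an attachment registers, then is synchronized from the master copy.
PackageDoc.prototype.register = function (attachment) {
  this.attachments.push(attachment);
  for (var name in this.masterData) {
    attachment.fields[name] = this.masterData[name];
  }
};
// Step 5: update the master copy, then propagate to all other attachments.
PackageDoc.prototype.synchronize = function (sender, name, value) {
  this.masterData[name] = value;
  for (var i = 0; i < this.attachments.length; i++) {
    if (this.attachments[i] !== sender) {
      this.attachments[i].fields[name] = value;
    }
  }
};

// Hypothetical model of an attached form.
function Attachment(pkg) {
  this.fields = {};
  this.pkg = pkg;
  pkg.register(this);       // step 1
}
// Steps 3/4: on field exit, compare against the value saved at field enter
// and send a synchronize message if it changed.
Attachment.prototype.exitField = function (name, enterValue) {
  if (this.fields[name] !== enterValue) {
    this.pkg.synchronize(this, name, this.fields[name]);
  }
};
```

A change in one attachment flows through the package to every other open attachment, and a form that registers later picks up the accumulated master copy.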

One benefit of this approach is that the actual logic used to synchronize the data resides in the package document. This means you can customize your data sharing algorithm by modifying only the script in the package.

Design Experience

The really good news is that you can put this all together in a very simple design experience.  With today’s sample, there are two fragment subforms: packageSync and embeddedSync. You can probably figure out where they each go.  The fragment subforms have all the logic needed to register and synchronize the documents.  They contain propagating enter/exit events so that all fields in the form are automatically synchronized.  So the Design experience is as simple as:

  1. Drag the packageSync fragment onto the package subform
  2. Drag the embeddedSync fragment onto each attachment form
  3. Attach the embedded forms to the package form

Global Submit

Since the package document holds the aggregate of all the data, a global submit operation can be achieved by simply submitting the data from the package document.

Limits

There are some limits to the synchronization that you should be aware of:

  • No handling for complex field values: rich text, images, multi-select choice lists
  • Does not synchronize subform occurrences
  • Does not synchronize calculated values
  • Because we are relying on propagating events, the solution works only in Reader 9.1 or later

Each of these problems is solvable. I was just too lazy.

The Sample

The sample form and all fragments are in this zip file. Try opening PackageSample.pdf and one or both attachments.  Fill in fields and observe that they get propagated.  Note that any field that has focus will not get updated until you exit the field.

Shared Data in Packages Part 1

In a previous post I warned about using doc.disclosed = true; there’s just too much risk from some rogue form that you might happen to open.  And yet, there are some very compelling applications we could develop if we could selectively disclose a document to another trusted PDF.  The application I have in mind is sharing access to the various documents that are open in a package.

I want to be able to propagate shared data between a package document and its attached forms.  When multiple forms capture the same data (e.g. name, address) you’d like to be able to capture it once and have the values automatically shared with each form in the package.

The shared data problem has two parts:

  1. Establishing a trusted connection between a package document and its
    embedded documents
  2. Propagating field values between documents

The first problem was very difficult.  It’s the topic of today’s post.
There will be a part 2 where I provide a solution for the data sharing
mechanics.

Today? We’re pretty much in the deep end.

Opening an embedded document

A package could establish communications with its embedded documents by opening each of them with doc.openDataObject().  But this solution doesn’t scale.  Your package could have dozens of embedded forms, and opening all of them will eventually run into performance problems.

What we want is to get a handle to the embedded documents that the user has launched.  But there is nothing in the Acrobat object model that will tell you which of your child documents are open.

Step 1.  Modify the app object

We need a technique where the host/package document can expose an API that selectively discloses itself — just to its children.  The way to expose an API that all open documents can share is to modify the app object.  For example, if I write this code in one document:

app.myfunc = function() {return "hello world";};

then any other open document can call app.myfunc();

Step 2.  Add a disclosedPackages object

The code outline below shows how to add a disclosure API to the app object:

// At docReady, the host document/package adds a package
// disclosure function/object to the Acrobat app object
var fDiscloseObject = new function() {
    // private list of disclosed package functions
    var disclosedPackagesList = [];
    this.disclose = function(packageDoc) { ... }
    this.undisclose = function(packageDoc) { ... }
    this.findPackage = function(embeddedDoc) { ... }
}
if (typeof(app.disclosedPackages) === "undefined") {
  app.disclosedPackages = fDiscloseObject;
}

After executing this code, any PDF can call:

app.disclosedPackages.disclose(packageDoc); 
app.disclosedPackages.undisclose(packageDoc); 
app.disclosedPackages.findPackage(embeddedDoc);
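Here’s one way the elided bodies could be fleshed out.  This is a sketch, not the actual sample code: a plain object stands in for the Acrobat app object, and isMyChild() is a hypothetical hook for the child-validation logic described in Step 3.

```javascript
// Plain-object stand-in for the Acrobat app object.
var app = {};

var fDiscloseObject = new function () {
  // private list of disclosed package documents
  var disclosedPackagesList = [];
  this.disclose = function (packageDoc) {
    disclosedPackagesList.push(packageDoc);
  };
  this.undisclose = function (packageDoc) {
    var i = disclosedPackagesList.indexOf(packageDoc);
    if (i >= 0) disclosedPackagesList.splice(i, 1);
  };
  this.findPackage = function (embeddedDoc) {
    // Return the first disclosed package that recognizes this document
    // as one of its own attachments. isMyChild() is a hypothetical
    // placeholder for the validation performed in Step 3.
    for (var i = 0; i < disclosedPackagesList.length; i++) {
      if (disclosedPackagesList[i].isMyChild(embeddedDoc)) {
        return disclosedPackagesList[i];
      }
    }
    return null;
  };
};
if (typeof app.disclosedPackages === "undefined") {
  app.disclosedPackages = fDiscloseObject;
}
```

Note that the list of disclosed packages is held in a closure variable, so other documents can call the three methods but cannot read or rewrite the list directly.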

The specific usage:

On docReady, a package will disclose itself to its attachments by calling:

app.disclosedPackages.disclose(event.target);

On docReady, an attached document will try to locate its package document by calling:

app.disclosedPackages.findPackage(event.target);

Once it has a handle to its package document, it can call synchronize methods found in the package.

At docClose the package will call: app.disclosedPackages.undisclose(packageDoc);

Step 3. Validate an embedded child

The app.disclosedPackages object maintains a private list of disclosed package documents.  We need to selectively disclose the package to any of the children that call app.disclosedPackages.findPackage(embeddedDoc);

The problem is: how do we determine whether the candidate document is actually a child?

There are a couple of tests we apply:

1) Check whether the candidate path is consistent with an attachment.  The doc.path property of an
embedded PDF is constructed to look like:

|<packagedocpath>|U:<byte-order-mark><childfilename>

Loop through the parent’s embedded objects (doc.dataObjects) and look for any that have the same path as the candidate object.

Once we’re satisfied that the paths match, we perform test 2.

2) The package opens the embedded object that has a path matching the candidate. The package now has two doc objects and needs to confirm that they are the same.  The technique we use is to modify one and see if the modification shows up in the other.
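Test 1 might be sketched like this.  The doc objects here are mocks, and the exact path format — including whether the byte-order-mark is U+FEFF — is an assumption on my part; check the paths your own package actually reports before relying on this.

```javascript
// Hypothetical sketch of test 1: does the candidate document's path
// match the path an attachment of this package would report?
function pathMatchesAttachment(packageDoc, candidateDoc) {
  for (var i = 0; i < packageDoc.dataObjects.length; i++) {
    // Reconstruct the expected embedded-PDF path:
    // |<packagedocpath>|U:<byte-order-mark><childfilename>
    // (assuming the byte-order-mark is U+FEFF)
    var expected = "|" + packageDoc.path + "|U:\uFEFF" +
                   packageDoc.dataObjects[i].path;
    if (candidateDoc.path === expected) {
      return true;
    }
  }
  return false;
}
```

A candidate whose path doesn’t line up with any entry in doc.dataObjects is rejected before we bother with test 2.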

Step 4. Spoof-proof the code

Unfortunately we live in a world where we have to take extra precautions to protect ourselves from hackers.  After all, that’s why we don’t use doc.disclosed in the first place.  In step 2 we added a JavaScript API accessible to every PDF open on our system.  How do we ensure that this API cannot be replaced by malicious JavaScript? After all, if we modified the app object, another document could overwrite our code with their own script.

Check your source

var fDisclose = new function() {
    // private list of disclosed package functions 
    var disclosedPackagesList = [];
    this.disclose = function(packageDoc) { ... }
    this.undisclose = function(packageDoc) { ... }
    this.findPackage = function(embeddedDoc) { ... }
}
if (typeof(app.disclosedPackages) === "undefined") {
    app.disclosedPackages = fDisclose;
}
// Make sure the disclosedPackages function has not been
// replaced/spoofed

if (app.disclosedPackages.toSource() === fDisclose.toSource()) {
     app.disclosedPackages.disclose(event.target); 
}

Note that we disclose ourselves only after making sure the source is the same as the original.

The clever JavaScript coder will point out that the toSource() method can be overridden. Not so.  The Acrobat JavaScript implementation does not allow overriding the toSource() and toString() methods of objects.
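To illustrate the idea outside Acrobat: ordinary JavaScript engines don’t implement toSource(), so this mock compares the installed object’s functions against the originals via toString() instead — same principle, different method.  This is a sketch of the check, not the Acrobat implementation.

```javascript
// Hypothetical analog of the spoof check: verify that every function on
// the installed object still has the same source text as the original.
function sameSource(installed, original) {
  var keys = Object.keys(original);
  for (var i = 0; i < keys.length; i++) {
    var k = keys[i];
    if (typeof installed[k] !== "function" ||
        installed[k].toString() !== original[k].toString()) {
      return false;  // something replaced our code
    }
  }
  return true;
}
```

Only if the check passes would we go on to call disclose().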

Naming Conflicts

Something to be careful about is managing different versions of this code.
If there are two variations of this code in different PDFs and both use the same name (app.disclosedPackages), one of them will fail.  So if you write your own version of this, use a unique name.  Better yet, use a unique name that incorporates a version number so you can manage the code over time.

What? No Sample?

Next post.  I promise.

PaperForms Barcode Performance

We have had some feedback around the performance of paper forms barcodes on large forms.  It seems that when customers use multiple barcodes to process large amounts of data, their forms slow down.

The form slows down because the barcode recalculates every time a field changes.  The calculation does several things:

  1. Generating minimal, unique names for each data item. When names are included in the output, we want them to be as terse as possible while still uniquely identifying the data.  To do this, each name needs to be compared to all other names.
  2. Gathering the data values to be included in the output
  3. Formatting the result as delimited text

It’s the first item that takes the bulk of the time.  To find the minimal name, the script compares each data node name against all others and iteratively expands the names until they are unique.  The algorithm appears to have O(n²) performance — which means that it degrades quickly as the number of data elements grows large.
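The expansion idea can be sketched as follows.  This is my reconstruction of the approach, not the actual barcode script: start with each field’s short name, and on each pass prefix every still-duplicated name with one more level of parent name.  The duplicate detection over all names on every pass is what makes the cost quadratic.

```javascript
// Hypothetical sketch of minimal-unique-name generation.
// nodes: [{ name: "quantity", parents: ["item[0]", ...] }, ...]
// where parents[0] is the immediate parent subform, parents[1] its parent, etc.
function minimalUniqueNames(nodes) {
  var labels = nodes.map(function (n) { return n.name; });
  var depth = 0;
  while (true) {
    // Count occurrences of each label (the compare-against-all step).
    var counts = {};
    labels.forEach(function (l) { counts[l] = (counts[l] || 0) + 1; });
    var expanded = false;
    for (var i = 0; i < labels.length; i++) {
      if (counts[labels[i]] > 1 && depth < nodes[i].parents.length) {
        // Still duplicated: qualify with one more parent level.
        labels[i] = nodes[i].parents[depth] + "." + labels[i];
        expanded = true;
      }
    }
    if (!expanded) break;   // all names unique (or no parents left)
    depth++;
  }
  return labels;
}
```

Notice that when the field names are unique to begin with, the first pass finds no duplicates and the function returns immediately — which is exactly why tip 2 (unique field names) helps.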

There are three techniques you can use to improve the performance:

1. Do the work in the prePrint event

Move the barcode calculation to the prePrint event. In the barcode properties, uncheck "Automatic Scripting" and move the script from the calculate event to the prePrint event. Now, instead of recomputing the barcode every time a field changes, we compute the barcode only once — just before it gets printed.

2. Use unique field names

When the script encounters duplicate field names, it does lots of extra work to resolve them.  So don’t ask the script to do so much work.  Use unique field names. For example, instead of:

item[0].quantity
item[0].price
item[1].quantity
item[1].price
item[2].quantity
item[2].price

try:

item[0].quantity0
item[0].price0
item[1].quantity1
item[1].price1
item[2].quantity2
item[2].price2

Not only will the script complete more quickly, but the names written to the barcode value will be shorter.  When they are not unique, they get prefixed with their subform name.  When they’re unique, they are left unqualified.

3. Do not include names — and modify the script

Since the bulk of the work that the script does is to come up with minimal unique names, let’s not write out the names.  Uncheck the "Include Field Names" option.  Unfortunately, the script goes through the effort of producing unique names even when names are not included in the output, so you also need to modify the script to prevent it from calculating names.  Uncheck "Automatic Scripting" and add the lines shown without line numbers in the listing below.

19 function encode(node)
20 {
21   var barcodeLabel = this.caption.value.text.value;
22   if (includeLabel == true && barcodeLabel.length > 0)
23   {
24     fieldNames.push(labelID);
25     fieldValues.push(barcodeLabel);
26   }
27 
28   if(collection != null)
29   {
30     // Create an array of all child nodes in the form
31     var entireFormNodes = new Array();
       if (includeFieldNames) {
32       collectChildNodes(xfa.datasets.data, entireFormNodes);
       }
33 
34     // Create an array of all nodes in the collection
35     var collectionNodes = new Array();
36     var nodes = collection.evaluate();
37 
38     for(var i = 0; i < nodes.length; ++i)
39     {
40       collectChildNodes(nodes.item(i), collectionNodes);
41     }
42 
43      // If the form has two or more fields sharing the …
44     // parents of these fields, as well as the subscript …
45     // their parents, will be used to differentiate …
46     // to take as little space in the barcode as possible, …
47     // data in the object names only when necessary …
       if (includeFieldNames) {
48       resolveDuplicates(collectionNodes, entireFormNodes,…);
       }

If you implement this option, odds are you won’t bother with the first two methods.  The performance of the script is now O(n) and should work fine in the calculate event with non-unique names. And … is it just me?  I get a kick out of watching the 2D barcode re-draw itself every time I modify a field. 

 

Accessibility Bug Fix

Recently one of our customers discovered an accessibility bug that caused read order and tab order to diverge. When we investigated it, we discovered that the bug existed in the Reader runtime, but that we could work around the bug by changing the syntax that Designer generates. Fixing the problem in Designer is much preferable, since that doesn’t require upgrading Reader. Of course, waiting for a fix in Designer is not easy either. Fortunately, the macro capability offers us an alternative way to disseminate a fix.  First the usual caveat: macros are an experimental, unsupported product feature.

The Bug

The bug scenario is when a custom tab order exits a subform: the read order follows the next-in-geographic order, while the tab order follows the explicit custom order.  Described differently, imagine a form with fields named "One", "Two" and "Three" and a subform named "S".  The geographic order is: S.One, Three, Two, but the custom tab order is S.One, Two, Three. In the Reader runtime, the tab order correctly follows the custom order, but the read order gets confused on exiting the subform.

If you’re only interested in correct tab order you will not notice the problem.  However, if you are using a screen reader and following read order, you will follow an incorrect order.  (If you examine the resulting form with tools like inspect32 or AccExplorer32 you’ll be able to see the result without actually running a screen reader.)

I have included a sample form that illustrates the problem, and a version of the form with the fix applied.

The Fix

Down below, I’ll discuss the technical details of the fix.  But first, the macro (rename it to FixReadOrder.js).  When you install and run the macro, it will scan the entire form for all the places where a custom tab order exits a subform, and modify each of them.

You will need to re-run the macro every time you make edits to the form that modify tab order.

The macro includes some logic so that it will stop functioning in the next version of Designer – given that we expect the bug to be fixed.

The Deep End

Custom tab order is controlled by the <traversal> and <traverse> elements.  The (stripped down) syntax of the original form looks like this:

 1 <subform>
 2  <subform name="S">
 3    <field name="One">
 4      <traversal>
 5        <traverse ref="Two[0]"/>
 6      </traversal>
 7    </field>
 8    <traversal>
 9      <traverse operation="first" ref="One[0]"/>
10    </traversal>
11  </subform>
12  <field name="Three"/>
13  <field name="Two">
14    <traversal>
15      <traverse ref="Three[0]"/>
16    </traversal>
17  </field>
18  <traversal>
19    <traverse operation="first" ref="Subform1[0]"/>
20  </traversal>
21 </subform>

There are two variations of the traverse element used here:

  1. operation="first": specifies the first child of a container (subform)
  2. operation attribute is unspecified (the default) — which means operation="next". Specifies the next element.

The runtime problem is that the traverse from field "One" at line 5 does not correctly navigate read order to field "Two".

Here is the fixed version:

 1 <subform>
 2   <subform name="S">
 3    <field name="One">
 4       <traversal>
 5         <traverse ref="Two[0]"/>
 6       </traversal>
 7     </field>
 8     <traversal>
 9       <traverse operation="first" ref="One[0]"/>
10       <traverse ref="Two[0]"/>
11     </traversal>
12   </subform>
13   <field name="Three"/>
14   <field name="Two">
15     <traversal>
16       <traverse ref="Three[0]"/>
17     </traversal>
18   </field>
19   <traversal>
20     <traverse operation="first" ref="Subform1[0]"/>
21   </traversal>
22 </subform>

The macro added a traverse element at line 10. Now we’ve explicitly told the subform which child to navigate to first and also where to navigate when its children are finished.  In fact, we end up with two traverse elements pointing to field "Two".  Not surprisingly, the runtime correctly moves to "Two" in both tab order and read order.