Archive for February, 2010

Translating Forms using XLIFF

Many of you have been translating your forms using XLIFF and XSLT to export/import various strings found in the form.  If you haven’t, there’s a nice description of the process here: http://www.adobe.com/devnet/livecycle/pdfs/using_xliff.pdf

Translation Workflow

  1. Configure Designer to generate resource id values for strings in the template
  2. Use XSLT to extract resources from the template (xdp) into XLIFF format
  3. Use an XLIFF-based translator to generate translated strings
  4. Review/approve the translations
  5. Use XSLT to update the template (xdp) with the translated strings

Step 1 has been made easier in the ES2 Designer. You can do it from the options menu.

For reference, the XLIFF 1.1 schema is described here:
http://www.oasis-open.org/committees/xliff/documents/cs-xliff-core-1.1-20031031.htm

And the XSD definition can be found here:
http://www.oasis-open.org/committees/xliff/documents/xliff-core-1.1.xsd

The using_xliff.pdf document has two XSLT scripts attached.  I had some trouble with the script that extracts the strings, so I made a fix. Here are my changes to XDPtoS2X.XSLT.

A Translation Form

Today’s translator.pdf sample form is bound to the XLIFF 1.1 schema.  This means it can be populated with the XLIFF data extracted in step 2 (here is the XLIFF data we’re using in this sample).  Steps 3 and 4 above can now be executed using this form. This has a couple of immediate benefits over other translation tools:

  1. Your XLIFF editing tool (Adobe Reader) is free.  You can use Reader as long as you set up the form so that the XLIFF data gets injected in another process and add a submit button to send back the updated data. If you need to manage the data from the client then you need to either use Acrobat or a rights-enabled translator form.
  2. Any rich text found in the XLIFF data can be reliably edited in Reader — since Reader understands the XHTML subset supported by XFA.

Some things you will notice when you look at XLIFF data in the form:

  • The header area allows you to specify the various translation phases: design, translate, review.  This will allow you to adapt the form to be part of a translation workflow where the roles of translation and approval may be performed by different individuals.  When in the review phase, the form enables an approval checkbox for each translation.
  • Workflow participants can attach notes to each phase and to each individual translation. 
  • The data to be translated is either plain text or rich text (XHTML).  Each translation includes an indication of the text content type.  When editing rich text, use Ctrl+E to bring up the rich text editing toolbar.
  • There is a copy button that copies the source text to the target text — when that’s convenient for the translator.

Preview

Notice that the translator form has a preview button at the top.  Preview will allow the translator to interact with the document that is being translated — and provide a much richer translation experience.  The preview button click event will look for other open PDFs on your machine, and if it finds one that matches the translation strings currently being edited, then preview will be enabled.  Note that by default open PDFs are not visible to each other.  We make the preview PDF visible by adding this script:

form1::docReady – (JavaScript, client)
event.target.disclosed = true;

When a document is disclosed, it is fully accessible to any other document. 
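
As a rough sketch of what the preview button's search could look like: in Acrobat JavaScript, app.activeDocs lists the open documents that the calling script is allowed to see, which for an ordinary form script means the disclosed ones. The helper below takes that array plus a caller-supplied test. The names findPreviewDoc and isMatch are hypothetical, and the rule the translator form really uses to decide that a document "matches" is not shown here.

```javascript
// Hypothetical helper: return the first open document, other than the
// calling one, that the supplied test accepts; null when none qualifies.
function findPreviewDoc(activeDocs, self, isMatch) {
    for (var i = 0; i < activeDocs.length; i++) {
        var doc = activeDocs[i];
        if (doc !== self && isMatch(doc)) {
            return doc;
        }
    }
    return null;
}

// Inside Acrobat/Reader the call might look like this (an assumption,
// matching on the file name only for illustration):
// var preview = findPreviewDoc(app.activeDocs, event.target, function (d) {
//     return d.documentFileName === "po.pdf";
// });
```

Passing the array and the test in as arguments keeps the search logic independent of the viewer, so it can be exercised outside Acrobat.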

Try opening the translator form and the preview form: po.pdf side by side.  Click the preview button.  Now all kinds of good things happen:

  • When you enter into a subform that does a translation, the corresponding field in the preview form gets highlighted (where possible) and the preview document scrolls to the field in question.
  • We add context information to the translation — some indication of where the text comes from.
  • Each translation inherits the ambient font and paragraph properties of the text being translated. 
  • When a translation is specified, the preview document gets updated with the translated string. (Note that this merely changes the runtime representation of the form — it has not updated the actual form definition: the template).
  • When a translation changes a caption or text object, the translator form allows you to change the size, caption reserve and position of the translated object.

Note that if you interact with the preview form and change the layout (add/remove subforms) you will need to click the preview button again in order to refresh the cached data in the translator form.

Exercises left to the user

From what I’ve provided here, you will have an XLIFF editor, but I haven’t given you enough to construct the whole workflow.  Here are some of the issues you need to tackle:

  • Dynamic forms are variable, and that means that not all elements in the template appear in the preview form.  For example, a form could be designed so that either subform A or subform B is visible.  That means you may want to preview the form twice with data that creates both permutations.
  • The current XSLT scripts handle only strings.  The XSLT script that updates the XDP doesn’t apply updates to coordinates.
  • This form really ought to be incorporated into a LiveCycle workflow with separate steps for the translator and reviewer.  Use the LiveCycle XSLT component to apply the translation.
  • The current mechanism of opening the forms side-by-side is a bit awkward.  It would be nice if they could be hosted in a container application.  Maybe a web page. Maybe an AIR application.

More about Disclosed

You shouldn’t turn on disclosed for production forms.  They should be disclosed only while being translated.  A disclosed form is vulnerable to untrusted PDFs: its behaviour could be changed by a rogue PDF that you happen to have open.

There is an alternative to using the disclosed property.  You could install folder-level JavaScript on the system of your translator.  Folder-level JavaScript executes in privileged mode and can find all open documents whether disclosed or not.

Check Sample Version

The translator form uses a script object to return coordinates of form objects.  I’ve shared it as a fragment here.  As with previous samples, I’ve included logic so that you can check whether there are updates to the form or its embedded fragments.  This check performs an HTTP GET operation on an inventory file hosted on my blog site.  In recent Reader releases, this HTTP GET operation is subject to tighter security.  The first time this function executes, a yellow warning bar will appear in Reader/Acrobat and the HTTP GET request will fail.  In order to check the version, you will need to choose to trust this PDF and then click the button again.

The Deep End

Storing coordinates:

As mentioned, the translator form allows you to modify the coordinates and dimensions of an object.  The way this is stored in the schema is with the coord attribute:

<group restype="description">
  <trans-unit id="034C9948-DAAC-4544-B7C7-F71791D37105"
              resname="034C9948-DAAC-4544-B7C7-F71791D37105"
              approved="no">
     <source>Address</source>
     <xlf:target xmlns:xlf="urn:oasis:names:tc:xliff:document:1.1"
                coord="0mm;59mm;92mm;6.3mm;15mm">Adresse</xlf:target>
  </trans-unit>
</group>

According to the XLIFF schema, the coord attribute is used to store x;y;w;h.  I took the liberty of adding a fifth number: caption reserve.  My apologies for not (yet) providing an XSLT script to apply the changes to coordinates.  Maybe one of the readers of this blog will offer the solution.
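
To make the five-part layout concrete, here is a minimal sketch of reading and writing that attribute value in JavaScript. The function names and the property names on the returned object are hypothetical. Only the semicolon-separated order x;y;w;h plus the optional caption reserve comes from the sample above.

```javascript
// Split a coord value such as "0mm;59mm;92mm;6.3mm;15mm" into named parts.
// The fifth entry (caption reserve) is this form's extension, so it is optional.
function parseCoord(coord) {
    var parts = coord.split(";");
    if (parts.length < 4 || parts.length > 5) {
        throw new Error("Expected x;y;w;h or x;y;w;h;captionReserve: " + coord);
    }
    return {
        x: parts[0],
        y: parts[1],
        w: parts[2],
        h: parts[3],
        captionReserve: parts.length === 5 ? parts[4] : null
    };
}

// Rebuild the attribute value from the named parts.
function formatCoord(c) {
    var parts = [c.x, c.y, c.w, c.h];
    if (c.captionReserve !== null) {
        parts.push(c.captionReserve);
    }
    return parts.join(";");
}
```

The values are kept as strings with their units, since the sample stores measurements such as "6.3mm" rather than bare numbers.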

PaperForms (2D) Barcodes with Repeating Subforms

One of our support engineers brought an issue with 2D barcodes to my attention this week.  I was able to help her with a solution, so I thought I’d share it here as well.

These barcodes encode form field data the user has typed in.  When the form is printed (or perhaps faxed) the barcode can be scanned and the form data retrieved.  This workflow was originally envisioned on static forms with fixed amounts of data — after all, a 2D barcode holds a limited amount of data.  As a result, the design process does not allow the inclusion of repeating subforms in the barcode data.  However, there certainly are cases where we could safely allow some repeating data in a barcode without overflowing the storage capacity.

I’ll show you how to fix the problem first, and then if I still have your attention I’ll give you the details on how it works.

The Fix

When you add your PaperForm barcode to your form, you also define a collection of fields and subforms to include in the barcode value.  If you add any repeating subforms to this list, then at runtime only the first subform instance will be included in the barcode.  In order to have the barcode include more instances, we need to modify the collection.  In the sample form, I’ve done this with an initialization script on the barcode field.  When you re-use this script, you need to change the last line so that it points to your collection:

makeManifestRepeat(BC_Collection);

You need to replace "BC_Collection" with the name of the collection used by your barcode.   There is one other detail needed to make this work for Reader 8.  When you add a new instance of a subform, you need to explicitly fire the barcode calculation.  The sample does this in the button click event:

PaperFormsBarcode1.execCalculate();
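
A sketch of what such a click handler might do, assuming the repeating subform is reached through its instance manager. The helper name addInstanceAndRefresh and the subform name in the usage comment are assumptions. Only PaperFormsBarcode1.execCalculate() is taken from the sample.

```javascript
// Add one instance of a repeating subform, then force the barcode to
// recalculate so the new instance's data is picked up.
function addInstanceAndRefresh(instanceManager, barcodeField) {
    var added = instanceManager.addInstance(1);
    barcodeField.execCalculate();
    return added;
}

// In a button click event this might be called as (object names assumed):
// addInstanceAndRefresh(Subform1.instanceManager, PaperFormsBarcode1);
```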

How it Works

The UI terminology for the fields included in a barcode is "Collection".  But the grammar calls it a manifest.  The manifest in the sample looks like this:

<manifest name="BC_Collection" id="2ae4d4a5-5e5d-4dba-b50e-e069b91533ce">
  <ref>xfa[0].form[0].bcTest[0].p1[0].Subform1[0].TextField1[0].dataNode</ref>
  <ref>xfa[0].form[0].bcTest[0].p1[0].Subform1[0].TextField2[0].dataNode</ref>
  <ref>xfa[0].form[0].bcTest[0].p1[0].Subform1[0].TextField3[0].dataNode</ref>
</manifest>

The manifest is a list of SOM expressions to the values to be included in the barcode.  Note that our repeating subform explicitly references the first instance: "Subform1[0]".  To make this manifest include all instances, we need to change it to "Subform1[*]":

<manifest name="BC_Collection" id="2ae4d4a5-5e5d-4dba-b50e-e069b91533ce">
  <ref>xfa[0].form[0].bcTest[0].p1[0].Subform1[*].TextField1[0].dataNode</ref>
  <ref>xfa[0].form[0].bcTest[0].p1[0].Subform1[*].TextField2[0].dataNode</ref>
  <ref>xfa[0].form[0].bcTest[0].p1[0].Subform1[*].TextField3[0].dataNode</ref>
</manifest>

We could have fixed this by editing the XML source in Designer, but that would be awkward.  Instead, we update the manifest when the form is opened in Reader — that’s the function of the initialization script.
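
The sample's makeManifestRepeat function is not listed here, so the following is only a sketch of the string rewrite it needs to perform on each reference. The helper name makeRefRepeat is hypothetical, and it assumes the caller already knows which subform repeats. How the real script discovers that, and how it writes the result back into the manifest, is not shown.

```javascript
// Rewrite one manifest reference so that the named subform matches all
// instances: "Subform1[0]" becomes "Subform1[*]". Other segments are untouched.
function makeRefRepeat(ref, subformName) {
    var segments = ref.split(".");
    for (var i = 0; i < segments.length; i++) {
        if (segments[i] === subformName + "[0]") {
            segments[i] = subformName + "[*]";
        }
    }
    return segments.join(".");
}
```

Comparing whole segments, rather than doing a plain text substitution, avoids touching a field whose name merely ends in the same characters as the subform name.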

Big and Complex Forms

How Big and Complex can you make your form?

I get asked this question often. Customers or partners develop very complex or large dynamic forms with many pages and large amounts of script. At what point do we cross the line and reach a level of complexity where Reader/PDF is no longer the right tool for the job?

There is no easy answer. The answer will be different for different users. But it is helpful to look at some of the stress points you’ll encounter with large forms.

Note that these notes apply to forms opened in Acrobat/Reader. The stress points for forms rendered on the server are much different.

  1. Number of pages to render
  2. File Size
  3. Script size, complexity and development methodology
  4. Script performance

Number of pages to render

One of the great properties of regular PDF files is that the file open time is constant no matter how large the PDF. The time to open a two thousand page PDF is pretty much the same as for a one page PDF. This is because Reader doesn’t load the whole PDF into memory and doesn’t read the bytes for page <n> until the user navigates to page <n>.

Dynamic XFA/PDF forms offer a different value proposition. The pages are shaped at form open time by the form data. Of course, there are great advantages to dynamic forms. But there are also associated processing costs. At form open time the entire form definition is loaded into memory. The entire set of data is loaded and merged with the form template. Reader performs enough of the layout to determine how many pages will be rendered. Then when you navigate to page <n>, Reader renders that page from the in-memory structures.

How many pages can Reader handle for a dynamic document? This depends on the complexity of the template. I’ve seen five page forms that take forever to open. I’ve seen a hundred page form open in a second. The limit is more related to the density/complexity of template and data rather than the actual number of pages.

Some form authors attempt to reduce file open time by hiding inactive pages. This strategy was effective in reducing form open time in Acrobat/Reader 7. But since Reader 8.1, when the form open algorithm was improved, the ‘page hiding’ strategy no longer makes a significant difference.

File Size

Dynamic XFA/PDF forms tend to be smaller than static documents. This is because of the template property of forms. For example, a hundred-page static PDF will have a hundred pages of PDF mark-up, whereas the dynamic equivalent could be one page of XFA mark-up that gets replicated a hundred times when merged with data. The latter will be a much smaller file. Nonetheless, dynamic documents can grow to the point where they begin to stress your system. Reading and parsing the documents happens very quickly, even for very large templates. However, the size of the template becomes more of a factor when there are security components in play. Operations such as Certification, Reader extensions and Signatures will perform comparison operations on ‘before-and-after’ versions of the form. The costs of these comparisons are proportional to the size of the template.

So while there is no absolute threshold on file size, you will find the threshold is lower for certified/extended/signed forms.

Script size, complexity and development methodology

I have seen XFA/PDF files with tens of thousands of lines of JavaScript. Given that there is no debugger, you have to be pretty persistent to create this amount of script. If your big script library is well written, it may perform well enough, but the stress comes with the maintenance of the script:

  • When you change the script, do you have the ability to rigorously test your changes? When you modify fields or subforms, will your script still work? Do you have test collateral that gives you code coverage for all the edge cases in your script? Do you have some form of automated testing? QTP anyone?
  • Is your script maintainable? Or is the code ‘write-only’? Unless you have been disciplined in the creation of your library, you will have longer term maintenance issues when a new developer comes along to update an existing form.
  • When you encounter problems with your script, are you able to isolate the problem when you ask for help? Your friends in our support organization are much better at solving problems with small, simple forms than with large, complex ones. If your script is modular and isolated into components then you’ll be able to ask for help much more easily than if your script is a tangled mess.
  • When you change script, do you preserve previous versions of your form? You need the ability to roll back changes.

Again, there are no absolutes here, but if you want/need to write lots of script, you need to have the associated discipline in your development environment to make it maintainable.

Script performance

Large amounts of script do not necessarily imply poor performance. But poorly written script of any amount can kill form performance. A script that traverses the entire form hierarchy will have performance that is proportional to the number of objects in the form. As the form grows, the script slows down. There are many ‘best practices’ for writing efficient script. It is very important to pay close attention to the contents of frequently executed loops.
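
One illustration of keeping full traversals out of frequently executed code is to walk the hierarchy once and keep an index by name for later lookups. The sketch below assumes each node exposes a name and a nodes list with length and item(i), as XFA tree nodes do. The function buildNameIndex is a hypothetical name, and whether this pays off for a given form is something to measure rather than assume.

```javascript
// Walk the tree once and record every named node, so later lookups by name
// do not have to traverse the hierarchy again.
function buildNameIndex(root) {
    var index = {};
    var stack = [root];
    while (stack.length > 0) {
        var node = stack.pop();
        if (node.name) {
            if (!Object.prototype.hasOwnProperty.call(index, node.name)) {
                index[node.name] = [];
            }
            index[node.name].push(node);
        }
        var children = node.nodes;
        if (children) {
            for (var i = 0; i < children.length; i++) {
                stack.push(children.item(i));
            }
        }
    }
    return index;
}
```

The single walk still costs time proportional to the number of objects, but it is paid once instead of on every lookup. The index has to be rebuilt whenever subform instances are added or removed.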

Conclusion

Before you make a big investment in a form, make sure you consider the alternatives. You might be better off with a Flash form or an AIR application.  If you choose Reader/PDF, the maximum size and complexity of your form depends primarily on your own tolerances.  You need to decide whether the runtime experience is responsive enough.  You need to decide if you are getting the return on investment for your cost to develop and maintain the form.