Posts in Category "Data Binding"

Debug merge and layout: Updated

Almost two years ago I published a blog post with a tool that I developed to help debug data merge and layout problems: http://blogs.adobe.com/formfeed/2009/05/debug_merge_and_layout.html

I continue to use this sample pdf on a regular basis.  It has a reserved location among my desktop icons for quick access.  But occasionally I’d run into PDF forms that cause it to hang.  So finally I took the time to find and fix the problem.  While I was at it, I took care of a couple other minor issues.  The updated form is available for download here.  I hope you find it useful.

Multiple Top Level Subforms

I’m willing to bet that all the XFA form definitions you’ve looked at have exactly one top-level subform.  Not surprising, because that’s all Designer allows you to author.  But according to the XFA specification, there can be any number of top-level subforms below the <template> root.

Today I’ll briefly describe the processing rules for top-level subforms, provide a simple example where you might find them useful, and give you a couple of macros to make these easier to work with in Designer.

The Processing Rules

The rules are simple:

  • When there are multiple top-level subforms, only one is rendered.
  • We choose which subform to render based on which one best matches the data, i.e. the template behaves like a choice subformSet: <subformSet relation="choice">.
  • If no data is supplied, we render the first subform below the <template> element.
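To make the selection rule concrete, here is a rough sketch in plain JavaScript. It is not the XFA merge engine: it assumes "best match" can be reduced to the top-level subform name equalling the name of the root data element, and the `chooseTopLevelSubform` function and the object shapes are hypothetical.

```javascript
// Hypothetical, simplified model: "best match" is reduced to an exact
// match between a top-level subform name and the root data element name.
function chooseTopLevelSubform(topLevelSubforms, dataRootName) {
  if (topLevelSubforms.length === 0) {
    throw new Error("template has no top-level subforms");
  }
  // No data supplied: render the first subform below <template>.
  if (dataRootName === undefined || dataRootName === null) {
    return topLevelSubforms[0];
  }
  // Choice behaviour: pick the subform whose name matches the data.
  const match = topLevelSubforms.find((sf) => sf.name === dataRootName);
  // Assumption: with no match at all, fall back to the first subform too.
  return match !== undefined ? match : topLevelSubforms[0];
}

const template = [{ name: "interactive" }, { name: "print" }];
console.log(chooseTopLevelSubform(template, "print").name); // print
console.log(chooseTopLevelSubform(template).name);          // interactive
```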

A Sample

Suppose I want different variations of my form for printing and for interactive.  I construct my template so that it has top level subforms for both print and interactive.  Since the binding is all "by name", the form will render according to the name of the top level node in the data.  Try previewing the sample with these data files: interactive.xml, print.xml — or import these data files in Acrobat, and you’ll see the form change according to the data provided.  However, controlling print behavior with different data files is more applicable on the server than on the client. 

In order to force the form to use the print subform when printing from Acrobat/Reader, I added a pre-print event to the interactive root subform.  The script renames the top level node in the data, and forces a remerge.  This will cause the print operation to use the print subform.

interactive::prePrint – (JavaScript, client)
xfa.record.name = "print";
xfa.form.remerge();

Then to restore to the interactive view, I added a postPrint event to the print subform:

print::postPrint – (JavaScript, client)
xfa.record.name = "interactive";
xfa.form.remerge();
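If you want to reason about the prePrint/postPrint round trip outside of Acrobat, the sketch below replays the two scripts against a mock. The `makeMockXfa` helper and its simplified remerge (render the subform named like the data root, otherwise the first one) are assumptions for illustration, not the real xfa object.

```javascript
// Hypothetical stand-in for the XFA objects used by the scripts above.
// Only xfa.record.name and xfa.form.remerge() are modelled; remerge
// re-selects the top-level subform by the name of the data root.
function makeMockXfa(topLevelNames, rootName) {
  const xfa = {
    record: { name: rootName },
    rendered: null,
    form: {
      remerge() {
        xfa.rendered = topLevelNames.includes(xfa.record.name)
          ? xfa.record.name
          : topLevelNames[0];
      },
    },
  };
  xfa.form.remerge();
  return xfa;
}

const xfa = makeMockXfa(["interactive", "print"], "interactive");

// interactive::prePrint
xfa.record.name = "print";
xfa.form.remerge();
const duringPrint = xfa.rendered; // "print"

// print::postPrint
xfa.record.name = "interactive";
xfa.form.remerge();
const afterPrint = xfa.rendered; // "interactive"
```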

Design Experience

I’ve already hinted that Designer doesn’t really support multiple top-level subforms. But with a couple of handy macros, we can get by.

The first problem is that there’s no easy way to create a new top-level subform.  Here’s a macro to provide that functionality.  When you run the macro, you’ll notice a problem: the changes it makes don’t show up in the hierarchy.  This is a bug that will be fixed in the next release.  Meanwhile, closing and re-opening the form works around the issue.  Note that the macro that adds the new top-level subform also inserts a default master page (<pageSet>). Designer and the XFA runtime are generally very unhappy if they don’t have a set of master pages to work with.

The next problem is figuring out how to edit the new top-level subform in Designer.  Here’s a macro to solve that problem. Since Designer always renders the first subform, this macro re-orders the top-level subforms so that a different one is rendered. Specifically, the macro moves the first subform to the end.  Just make sure that when you save the form, your preferred default subform is first in order.
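The macro itself isn't reproduced here, but the re-ordering it performs amounts to a rotation. A minimal sketch, modelling the template as an array of top-level subform names (`rotateTopLevelSubforms` is a hypothetical name):

```javascript
// Hypothetical model of the re-ordering macro: move the first top-level
// subform to the end so that Designer renders a different one.
function rotateTopLevelSubforms(subforms) {
  if (subforms.length < 2) return subforms.slice();
  return subforms.slice(1).concat(subforms[0]);
}

let order = ["interactive", "print"];
order = rotateTopLevelSubforms(order); // ["print", "interactive"]: Designer now renders "print"
order = rotateTopLevelSubforms(order); // ["interactive", "print"]: default first again
```

Rotating once per top-level subform brings you back to the original order, which is a convenient way to make sure the preferred default is first again before saving.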

Now, for the specific example of print and interactive variations, I could have implemented the solution in a number of different ways, e.g. with a choice subformSet or with optional subforms at the second level in the hierarchy.  But the point was to bring to light some functionality that you might not otherwise have been aware of.

Form Stitching Design Pattern

If the term "Stitching" is new to you, it refers to server applications that take the content of multiple XDP files and combine them into a whole.  While this sounds like fragment resolution, there is a twist in that the content to be combined isn’t known until the document is requested.  The most common scenario is where a company has a series of individual forms that can be delivered stand-alone or may be included as sub-forms in a larger package.  Package in this context is not a portfolio or PDF package, but rather, a sequence of templates combined serially into one big form.

Historically there have been at least two ways to implement stitching:

  • Stitching implemented by professional services
  • Solution Accelerator (ODA)

Then in LiveCycle ES2 we delivered fragment stitching in LiveCycle Assembler.

Going forward, the Assembler version is the one customers should be targeting, as it is fully supported and will be the solution that
receives enhancements.

Building a system where we can stitch templates together requires some careful planning.  Random forms cannot be arbitrarily stitched together. Any attempt to
do so will inevitably run into conflicts regarding scripting, master
pages, schemas, submit logic, validations etc. In order for individual
forms to be combined into a stitched package, the individual forms must
follow a design pattern that allows them to participate in the final
grouping.

Terminology

In order to explain the stitching process and the necessary design patterns, it helps to start by using some terminology consistently.

  • individual form – a template that can be used as a standalone form
    or can be combined with other individual forms into a stitched package
  • stitched package – a collection of individual forms combined into a
    single, large XDP/PDF. Not to be confused with portfolios or PDF
    attachments
  • host template – the XDP that forms the root of the stitched
    package. Individual forms are inserted into the host template to form
    the stitched package
  • root subform – the subform that appears under the template element in an XDP definition.
  • stitched subform – the subform that will be extracted from individual forms to be included in a stitched package

Stitching Requirements

Some of the common requirements that we see in customer implementations of stitching:

  1. Individual forms must retain their own master page associations
  2. Shared script objects must not be duplicated when individual forms are combined
  3. Individual forms must be able to retain their own form-specific logic
  4. Submit logic for individual forms must collapse into a global
    submit capability in the resulting stitched package. i.e. submit
    buttons from individual forms must be hidden when part of the larger
    package.
  5. We must be able to include multiple copies of any individual form into a stitched result
  6. The number of individual forms included in the final package needs to be arbitrary – no pre-defined limits

The Form Design Pattern

Host Template

The host template will form the root of the stitched package. It
will define all properties that are global to the form, e.g. default
locale, target Reader version, Server PDF render format, form state
setting, metadata etc.

Individual Form

Each individual form must have settings compatible with the host
template, especially: same target Reader version, same
originalXFAVersion.

Common Schema

All individual forms to be combined into a stitched package must be
based on the same XML schema (or must all be designed without a schema). The host template will include the
connectionSet for this shared schema. If an organization has multiple
schemas, they must either combine them into a single uber-schema or they
must maintain one host template for each schema that is in use.

Shared WSDL definitions

Since we do not stitch together connectionSets, the host
template must also include the aggregate set of WSDL definitions used by
individual forms.  It is necessary that individual forms use consistent names when
referencing XML schemas and WSDL definitions. This is especially true
for WSDL definitions where the names are used in script and in template
syntax.

Individual Form Hierarchy

Individual forms must be designed so that all
content to be included in the package resides under a single subform
included under the root subform. This second-level subform is the
stitched subform: the content that will be extracted from the individual
form and included in the stitched package.
For reference, here is a sample hierarchy of an individual form:
[Figure: Designer hierarchy view with a root of "purchaseOrderGroup" whose children are script objects and the PO_Portrait subform. The PO_Portrait subform has children that include master pages.]
In this example, the subform named “PO_Portrait” is the stitched subform that will be included in the final stitched package.

Shared logic

Any logic that is common to multiple individual forms should be
included in the individual forms as a child of the root subform (as a
sibling to the stitched subform). The host template must include a copy
of this shared script. In the example above, the “countryScript”
script object would not be included with the content of the individual
form, but since it is present in the host document, the logic inside
PO_Portrait will continue to work in the package context.

Form-specific logic

If there is logic that is unique to an individual form, it must be
included as a descendant of the stitched subform. This way, it will be
extracted with the stitched subform into the stitched package.

Stitched subforms must define master page definitions as descendants

As long as the master pages used by a stitched subform are
descendants of that subform, they will be included in the final stitched
package. See nested master pages for details.

Note as well that for multiple master page collections to work in the
stitched package, all individual forms should use a consistent printing
option. One of: Print on Front Side Only/Print on Both Sides/Page
Occurrence.

The stitched subform must explicitly target its nested master page.
In the example, the pagination option for PO_Portrait is set to place: On
Page “PO_PortraitPage”.

The stitched subform must be a fragment

[Figure: the Designer dialog used to set the properties of a subform so that it is an inline fragment.]
From the context menu in Designer, choose Fragments/Create Fragment…, then
choose “Create New Fragment in Current Document”. The name you choose
will be the name that is referenced by Assembler. It can be either a
name that is unique to the individual form, or the same name
for all individual forms. Re-using the same name will make the
Assembler DDX syntax simpler.

The host document needs an insertion point

Create a nameless subform directly below the root subform. Use this subform to define a named insertion point:
[Figure: the Designer dialog used to define an insertion point on a subform.]
Note that the insertion point subform must not specify any properties
(name, width, height, etc.), or else these will override the properties of
the fragment that will be inserted. In the sample, the XML for the
insertion point subform is very simple:

<subform>
   <extras>
      <text name="insertionPoint">POSchemaContent</text>
   </extras>
</subform>

Note that because of the way Assembler processes insertion points, a
single insertion point can be used to insert an arbitrary number of
individual forms.

Control the presence of submit buttons

In the form properties of the host template, create a variable to
indicate the document is a stitched package. The example uses
“IsPackageDoc” and gives it the value “1”. In the individual forms, put
logic on the submit button initialize event that looks like:

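// typeof avoids a ReferenceError when IsPackageDoc is not defined,
// as is the case in a standalone individual form
if (typeof IsPackageDoc !== "undefined" && IsPackageDoc.value === "1") {
	this.presence = "hidden";
}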

This logic will ensure that the button is visible when the form is an individual form, but hidden when part of a package.
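Here is a sketch of the same decision as a testable function. It assumes the form variables can be modelled as a plain object (a hypothetical stand-in for the variables defined in the form properties), where the host template defines `IsPackageDoc` and a standalone form does not.

```javascript
// Hypothetical model: formVariables stands in for the variables defined in
// the form properties. The host template defines IsPackageDoc = "1";
// a standalone individual form does not define it at all.
function submitButtonPresence(formVariables) {
  const isPackageDoc = formVariables.IsPackageDoc;
  // Hide the button only when the variable exists and is set to "1".
  if (isPackageDoc !== undefined && isPackageDoc.value === "1") {
    return "hidden";
  }
  return "visible";
}

const inPackage = submitButtonPresence({ IsPackageDoc: { value: "1" } }); // "hidden"
const standalone = submitButtonPresence({});                             // "visible"
```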

Make sure your schema can repeat

If you want to include more than one copy of any individual form, you
need to make sure that the schema element the stitched subform binds to
is allowed to repeat. This might require wrapping the root element of
your schema in an aggregating subform.

A starter template

For customers who have the luxury of designing these solutions from
scratch, it is wise to create a starter skeleton individual form
template that includes the shared logic, second level subform, nested
master pages, corporate schema, WSDL definition(s) and submit button.
This reduces the need for form authors to remember all the necessary
requirements that are part of the design pattern.

The Assembler Step

In Assembler, we use the XDPContent command to insert individual forms
into the host template. For example, our sample uses this DDX:

<DDX xmlns="http://ns.adobe.com/DDX/1.0/">
 <PDF result="poPackage">
  <XDP>
   <XDP source="hostXDP" retainInsertionPoints="All" baseDocument="true"/>
   <XDPContent insertionPoint="POSchemaContent" source="sourceXDP1" fragment="PO_Portrait"/>
   <XDPContent insertionPoint="POSchemaContent" source="sourceXDP2" fragment="PO_Landscape"/>
   <XDPContent insertionPoint="POSchemaContent" source="sourceXDP3" fragment="PO_Comments"/>
  </XDP>
 </PDF>
 <?ddx-source-hint name="hostXDP"?>
 <?ddx-source-hint name="sourceXDP1"?>
 <?ddx-source-hint name="sourceXDP2"?>
 <?ddx-source-hint name="sourceXDP3"?>
</DDX>

A sample

Here is an example that follows the design pattern described in this note. The included files are:

  • Purchase Order.xdp – an individual form
  • ExtraComments.xdp – An individual form
  • Purchase Order Landscape – An individual form
  • Purchase Order.tif – A referenced image
  • Purchase Order Dim.tif – referenced image
  • Purchase Order Group.xsd – the uber schema
  • Purchase Order Host.xdp – the host template
  • stitch.ddx.txt – the Assembler DDX definition used to stitch
  • result.pdf – what it looks like when assembled.

The Deep End

If you’re interested in understanding the details on how insertion points work, here’s some copy/paste from the specification:

The Algorithm

Requirements:

  • The stitching capability will allow us to “push” fragments into
    specific locations (insertion points) within a host document based on
    logic that exists outside the document.
  • In cases where the number of inserted fragments is not known in
    advance, we need to be able to push an arbitrary number of fragments
    into a single stitch point.
  • The host template may have placeholder content that needs to be removed

In preparation for stitching, these grammar rules will be defined:

  1. Add “insertion points” in the host template – places where external content may be inserted
  2. Add “placeholder content” in the host template so that it is visible in Designer, but excluded from the final stitched result
  3. The LiveCycle Assembler component grammar (DDX) will have a command (XDPContent) for inserting a fragment at an insertion point

An insertion point will be a mnemonic defined as a <text> element named “insertionPoint” inside <extras>. E.g.:

<extras>
   <text name="insertionPoint">TermsAndConditions</text>
</extras>

Placeholder content will be marked with a mnemonic named: “insertionPointPlaceholder”

When Assembler executes a stitch, it will:

  1. Find all insertion points that match the target attribute of the XDPContent command
  2. For each insertion point, clone the XFA element containing the insertion point
  3. Replace the insertion point syntax with the syntax of a fragment
    reference. If the element already has a fragment reference, over-write
    the existing reference.
  4. Remove any content marked as being a placeholder

When Assembler has completed it will execute a post-process:

  1. Remove any elements that still have a remnant insertionPoint extras element.
  2. Resolve all fragments

Example

Host template:

<subform name="TAC">
  <extras>
     <text name="insertionPoint">TermsAndConditions</text>
  </extras>
  <setProperty target="font.typeface" ref="$record.Style.Font"/>

  <draw>
    <value><text>This is temporary placeholder content</text></value>
    <extras>
      <text name="insertionPointPlaceholder"/>
    </extras>
  </draw>
</subform>
                  

The assembler DDX definition would include an XDPContent command to inject a fragment reference at this subform:

<PDF result="final.pdf">
   <XDP source="docin.xdp">
      <XDPContent insertionPoint="TermsAndConditions" >
         <XDP source="tac.xdp" fragment="Alabama"/>
      </XDPContent>
   </XDP>
</PDF>

The result after processing the XDPContent command:

<subform name="TAC" usehref="TAC.xdp#som($template.Alabama)">
  <setProperty target="font.typeface" ref="$record.Style.Font"/>
</subform>

<subform name="TAC">
  <extras>
     <text name="insertionPoint">TermsAndConditions</text>
  </extras>
  <setProperty target="font.typeface" ref="$record.Style.Font"/>
  <draw>
    <value><text>This is temporary placeholder content</text></value>
    <extras>
      <text name="insertionPointPlaceholder"/>
    </extras>
  </draw>
</subform>
      

After all XDPContent commands have executed, we will:

  1. remove any unresolved elements with insertion points
  2. expand fragment references
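The steps above can be sketched in a few dozen lines. This is a simplified model, not Assembler's implementation: elements are plain objects, the `insertionPoint`, `placeholder` and `usehref` properties stand in for the corresponding XFA markup, fragment resolution is left out, and the "Alaska" fragment is made up to show a second command reusing the same insertion point.

```javascript
// Hypothetical, simplified model of the insertion-point algorithm.
// An element is { name, insertionPoint?, placeholder?, usehref?, children? }.
function cloneElement(el) {
  return JSON.parse(JSON.stringify(el));
}

// Collect every element whose insertion point matches, with its parent.
function findInsertionPoints(parent, target, found = []) {
  for (const child of parent.children || []) {
    if (child.insertionPoint === target) found.push({ parent, element: child });
    findInsertionPoints(child, target, found);
  }
  return found;
}

// One XDPContent command.
function xdpContent(root, target, fragmentRef) {
  for (const { parent, element } of findInsertionPoints(root, target)) {
    // Clone the element and keep the untouched copy (still an insertion
    // point) right after it, so later commands can insert more fragments.
    const spare = cloneElement(element);
    parent.children.splice(parent.children.indexOf(element) + 1, 0, spare);
    // Replace the insertion point with a fragment reference (overwriting
    // any existing usehref) and drop placeholder content.
    delete element.insertionPoint;
    element.usehref = fragmentRef;
    element.children = (element.children || []).filter((c) => !c.placeholder);
  }
}

// Post-process: remove elements that still carry an insertion point.
// Expanding the fragment references is not modelled here.
function removeRemnants(el) {
  el.children = (el.children || []).filter((c) => c.insertionPoint === undefined);
  el.children.forEach(removeRemnants);
}

const host = {
  name: "root",
  children: [{
    name: "TAC",
    insertionPoint: "TermsAndConditions",
    children: [{ name: "setProperty" }, { name: "draw", placeholder: true }],
  }],
};
xdpContent(host, "TermsAndConditions", "tac.xdp#som($template.Alabama)");
xdpContent(host, "TermsAndConditions", "tac.xdp#som($template.Alaska)");
removeRemnants(host);
// host.children now holds two TAC subforms: Alabama first, then Alaska.
```

Because each command leaves an untouched clone of the insertion point behind, any number of XDPContent commands can target the same insertion point, and the post-process removes the last unused copy.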

 

Editable Floating Fields V3

I had a user report a bug in the floating field sample, the sample described in these blog entries: Version 1, Version 2.

I’ve updated the code again.  The specific problems fixed were:

  1. The editing field can now be unbound (binding="none"). This is important in cases where you are using an xml schema and there isn’t data that you can bind to the editor field.
  2. There was a problem with preserving trailing spaces in edited values.  The easiest fix was to strip trailing spaces from the edited values.
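For the second fix, stripping trailing spaces is a one-line regular expression. A minimal sketch, with a hypothetical function name, of what could run before an edited value is written back:

```javascript
// Hypothetical helper: strip trailing spaces from an edited value before it
// is copied back to the field; non-string values pass through unchanged.
function stripTrailingSpaces(value) {
  return typeof value === "string" ? value.replace(/ +$/, "") : value;
}

stripTrailingSpaces("123 Main Street   "); // "123 Main Street"
stripTrailingSpaces("  indented");         // "  indented" (leading spaces kept)
```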

Here is the updated form, and here is the updated fragment.

Shared Data in Packages Part 2

Last week I started to outline a design pattern for sharing data in a package.  Again, the general problem we’re trying to solve is to allow multiple forms in a PDF package to exchange data, so that values of fields common to multiple forms are propagated.  Part 1 of the problem was establishing a communication mechanism between documents.  Once the documents have been disclosed to each other, any document in the package can modify any other document.  Today’s Part 2 entry describes how to do the sharing.

Data mapping strategies

Implementing data sharing means solving a mapping problem: How do we know which fields in one PDF correlate to which fields in other PDFs? There are several techniques we could choose, including:

  1. Use the same name for common fields
  2. Generate a manifest that explicitly correlates fields
  3. Base all forms in the package on a common data schema

For my solution, I’ve chosen the 3rd option: common schema.  The underlying assumptions are:

  • Field values get propagated by assigning values in the data DOM: a data value from one form will have the same data node address in each of the other forms
  • The data DOM in the package document will hold the aggregate data of all forms in the package; this is the ‘master copy’ of the data.  Note that the package document does not have to have fields corresponding to all the data elements.
  • Attached forms will have their data synchronized from the master copy of the data when they are launched in Reader

Data Sharing Algorithm

With these assumptions in place, the actual algorithm is fairly simple:

  1. When any attachment opens, it registers with the package document
    (the topic of Part 1)
  2. At registration, all data values of the attachment are synchronized from the master copy of the data in the package
  3. While the attachments are open, detect when field values change.  The technique used to detect field value changes is to use propagating enter/exit events. We save the field value at field enter, and compare it to the result at field exit. 
  4. When a field value changes, send a synchronize message to the package document.
  5. When the package document gets a synchronizing message, it updates the master copy of the data and then propagates the change to all other open attached PDFs.
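Here's a rough, in-memory sketch of the algorithm, to show how the pieces fit together. It is not the code in the sample fragments: the `PackageDoc` and `Attachment` classes are hypothetical, data is flattened into address/value maps (relying on the common schema assumption that an address means the same thing in every form), and `exit()` takes the value the user typed.

```javascript
// Hypothetical, in-memory model of the package/attachment protocol.
// Data is flattened to { address: value } maps (common schema assumption).
class PackageDoc {
  constructor(masterData) {
    this.master = { ...masterData }; // aggregate "master copy" of the data
    this.attachments = new Set();
  }
  // Steps 1-2: register, then pull values from the master copy.
  register(attachment) {
    this.attachments.add(attachment);
    for (const address of Object.keys(attachment.data)) {
      if (address in this.master) attachment.data[address] = this.master[address];
    }
  }
  // Step 5: update the master copy, then push to the other open attachments.
  synchronize(sender, address, value) {
    this.master[address] = value;
    for (const other of this.attachments) {
      if (other !== sender && address in other.data) other.data[address] = value;
    }
  }
}

class Attachment {
  constructor(data) {
    this.data = { ...data };
    this.pkg = null;
    this.valueAtEnter = undefined;
  }
  open(pkg) {
    this.pkg = pkg;
    pkg.register(this);
  }
  // Step 3: remember the value when the field is entered...
  enter(address) {
    this.valueAtEnter = this.data[address];
  }
  // ...compare on exit, and (step 4) report a change to the package.
  exit(address, newValue) {
    this.data[address] = newValue;
    if (newValue !== this.valueAtEnter && this.pkg) {
      this.pkg.synchronize(this, address, newValue);
    }
  }
}

const pkg = new PackageDoc({ "Applicant.Name": "Ann" });
const formA = new Attachment({ "Applicant.Name": "", "Applicant.City": "" });
const formB = new Attachment({ "Applicant.Name": "" });
formA.open(pkg);
formB.open(pkg); // both attachments now show "Ann"
formA.enter("Applicant.City");
formA.exit("Applicant.City", "Ottawa"); // stored in the master copy
formA.enter("Applicant.Name");
formA.exit("Applicant.Name", "Anne"); // propagated to formB
```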

One benefit of this approach is that the actual logic used to synchronize the data resides in the package document. This means you can customize your data sharing algorithm by modifying only the script in the package.

Design Experience

The really good news is that you can put this all together in a very simple design experience.  With today’s sample, there are two fragment subforms: packageSync and embeddedSync. You can probably figure out where they each go.  The fragment subforms have all the logic needed to register and synchronize the documents.  They contain propagating enter/exit events so that all fields in the form are automatically synchronized.  So the Design experience is as simple as:

  1. Drag the packageSync fragment onto a package subform
  2. Drag the embeddedSync fragment onto each attachment form
  3. Attach the embedded forms to the package form

Global Submit

Since the package document holds the aggregate of all the data, a global submit operation can be achieved by simply submitting the data from the package document.

Limits

There are some limits to the synchronization that you should be aware of:

  • No handling for complex field values: rich text, images, multi-select choice lists
  • Does not synchronize subform occurrences
  • Does not synchronize calculated values
  • Because we are relying on propagating events, the solution works only in Reader 9.1 or later

Each of these problems is solvable. I was just too lazy.

The Sample

The sample form and all fragments are in this zip file. Try opening PackageSample.pdf and one or both attachments.  Fill in fields and observe that they get propagated.  Note that any field that has focus will not get updated until you exit the field.

Shared Data in Packages Part 1

In a previous post I warned about using doc.disclosed = true; There’s just too much risk from some rogue form that you might happen to open.  But yet, there are some very compelling applications we can develop if/when we could selectively disclose a document to another trusted PDF.  The application I have in mind is where we share access to the various documents that are open in a package.

I want to be able to propagate shared data between a package document and its attached forms.  When multiple forms capture the same data (e.g. name, address) you’d like to be able to capture it once and have the values automatically shared with each form in the package.

The shared data problem has two parts:

  1. Establishing a trusted connection between a package document and its
    embedded documents
  2. Propagating field values between documents

The first problem was very difficult.  It’s the topic of today’s post.
There will be a part 2 where I provide a solution for the data sharing
mechanics.

Today? We’re pretty much in the deep end.

Opening an embedded document

A package could establish communications with its embedded documents by opening each of them using: doc.openDataObject().  But this solution doesn’t scale.  Your package could have dozens of embedded forms.  We can’t assume it works to open all of them.  Eventually we’ll run into performance problems.

What we want is to get a handle to the embedded documents that the user has launched.  And there is nothing in the Acrobat object model that will tell you which of your child documents are open.

Step 1.  Modify the app object

We need a technique where the host/package document can expose an API that selectively discloses itself, just to its children.  The way to expose an API that all open documents can share is to modify the app object, e.g. if I write this code in one document:

app.myfunc = function() {return "hello world";};

then any other open document can call app.myfunc();

Step 2.  Add a disclosedPackages object

The code outline below shows how to add a disclosure API to the app object:

// At docReady, the host document/package adds a package 
// disclosure function/object to the acrobat app object 
var fDiscloseObject = new function() { 
    // private list of disclosed package functions 
    var disclosedPackagesList= [];
    this.disclose = function(packageDoc) { ... } 
    this.undisclose = function(packageDoc) { ... } 
    this.findPackage = function(embeddedDoc) { ... } 
}
if (typeof(app.disclosedPackages) === "undefined") { 
  app.disclosedPackages= fDiscloseObject; 
} 

After executing this code, any PDF can call:

app.disclosedPackages.disclose(packageDoc); 
app.disclosedPackages.undisclose(packageDoc); 
app.disclosedPackages.findPackage(embeddedDoc);

The specific usage:

On docReady, a package will disclose itself to its attachments by calling: app.disclosedPackages.disclose(event.target);

On docReady an attached document will try to locate its package document by calling:

app.disclosedPackages.findPackage(event.target);

Once it has a handle to its package document, it can call synchronize methods found in the package.

At docClose the package will call: app.disclosedPackages.undisclose(packageDoc);
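Putting the outline and the usage together, here is a runnable sketch. The `app` object is a plain-object stand-in for Acrobat's app object, documents are modelled as objects with a `path`, and the child test inside `findPackage` is a deliberately simplified path-prefix check (Step 3 describes the real validation). Comparing document objects by identity is also an assumption of the sketch.

```javascript
// Hypothetical stand-in for Acrobat's app object: in this sketch it is a
// plain object shared by every "document" script.
const app = {};

const fDiscloseObject = new function () {
  // Private list of disclosed package documents (closure, not a property).
  const disclosedPackagesList = [];
  this.disclose = function (packageDoc) {
    if (!disclosedPackagesList.includes(packageDoc)) {
      disclosedPackagesList.push(packageDoc);
    }
  };
  this.undisclose = function (packageDoc) {
    const i = disclosedPackagesList.indexOf(packageDoc);
    if (i !== -1) disclosedPackagesList.splice(i, 1);
  };
  // Simplified child test (assumption): an embedded document's path starts
  // with "|<package path>|". Step 3 describes a stricter validation.
  this.findPackage = function (embeddedDoc) {
    const match = disclosedPackagesList.find(
      (pkg) => embeddedDoc.path.startsWith("|" + pkg.path + "|"));
    return match === undefined ? null : match;
  };
};

if (typeof app.disclosedPackages === "undefined") {
  app.disclosedPackages = fDiscloseObject;
}

const packageDoc = { path: "/c/forms/Package.pdf" };
const childDoc = { path: "|/c/forms/Package.pdf|U:\uFEFFApplication.pdf" };

app.disclosedPackages.disclose(packageDoc);                     // package docReady
const found = app.disclosedPackages.findPackage(childDoc);      // attachment docReady
app.disclosedPackages.undisclose(packageDoc);                   // package docClose
const afterClose = app.disclosedPackages.findPackage(childDoc); // null
```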

Step 3. Validate an embedded child

The app.disclosedPackages object maintains a private
list of disclosed package documents.  We need to selectively disclose the package to any
of the children that calls app.disclosedPackages.findPackage(embeddedDoc);

The problem is: How do we determine whether the candidate document is
actually a child?

There are a couple of tests we apply:

1) Check whether the candidate path is consistent with an attachment.  The doc.path property of an
embedded PDF is constructed to look like:

|<packagedocpath>|U:<byte-order-mark><childfilename>

Loop through the parent’s embedded objects (doc.dataObjects) and look for any that have the same path as the candidate object.
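A sketch of test 1, assuming the byte-order mark is the single character U+FEFF and that each entry in `dataObjects` exposes the attachment's file name as `name` (both are assumptions; check them against the objects Acrobat actually returns). The helper names are hypothetical.

```javascript
// Assumption: the byte-order mark in an embedded document's path is U+FEFF.
const BOM = "\uFEFF";

// Build the path an embedded child would report: |<package>|U:<BOM><child>
function expectedEmbeddedPath(packagePath, childFileName) {
  return "|" + packagePath + "|U:" + BOM + childFileName;
}

// Test 1: does the candidate's path match any attachment of the package?
function pathMatchesAttachment(packageDoc, candidateDoc) {
  return packageDoc.dataObjects.some(
    (obj) => expectedEmbeddedPath(packageDoc.path, obj.name) === candidateDoc.path);
}

const pkgDoc = {
  path: "/c/forms/Package.pdf",
  dataObjects: [{ name: "Application.pdf" }, { name: "Survey.pdf" }],
};
const matchesChild = pathMatchesAttachment(
  pkgDoc, { path: "|/c/forms/Package.pdf|U:\uFEFFSurvey.pdf" });        // true
const matchesStranger = pathMatchesAttachment(
  pkgDoc, { path: "/c/downloads/Survey.pdf" });                          // false
```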

Once we’re satisfied that the paths match we perform test 2

2) The package opens the embedded object that has a path matching the candidate. The package now has two doc objects and needs to confirm that they are the same.  The technique we use is to modify one and see if the modification shows up in the other.
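The idea behind test 2 can be sketched with hypothetical document handles that share state when they refer to the same document. The `makeHandle` helper and the `marker` property are purely illustrative; the post doesn't say which property the package modifies.

```javascript
// Hypothetical handles: two handles refer to the same document only if they
// wrap the same underlying state.
function makeHandle(docState) {
  return {
    get(key) { return docState[key]; },
    set(key, value) { docState[key] = value; },
  };
}

// Test 2: modify one handle and check whether the change shows up in the other.
function sameDocument(handleA, handleB) {
  const token = "probe-" + Math.random().toString(36).slice(2);
  const previous = handleA.get("marker");
  handleA.set("marker", token);
  const same = handleB.get("marker") === token;
  handleA.set("marker", previous); // restore the original value
  return same;
}

const stateOfChild = {};
const openedByPackage = makeHandle(stateOfChild);  // from openDataObject()
const fromCandidate = makeHandle(stateOfChild);    // passed to findPackage()
const fromImpostor = makeHandle({});
const same = sameDocument(openedByPackage, fromCandidate);     // true
const different = sameDocument(openedByPackage, fromImpostor); // false
```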

Step 4. Spoof-proof the code

Unfortunately we live in a world where we have to take extra precaution to protect ourselves from hackers.  After all, that’s why we don’t use doc.disclosed in the first place.  In step 2 we added a JavaScript API accessible to every PDF open on our system.  How do we ensure that this API cannot be replaced by malicious JavaScript? After all, if we modified the app object, another document could overwrite our code with their own script.

Check your source

var fDisclose= new function() {
    // private list of disclosed package functions 
    var disclosedPackagesList = [];
    this.disclose = function(packageDoc) { ... }
    this.undisclose = function(packageDoc) { ... }
    this.findPackage = function(embeddedDoc) { ... }
 } 
if (typeof(app.disclosedPackages) === "undefined") {
    app.disclosedPackages = fDisclose; 
}
// Make sure the disclosedPackages object has not been 
// replaced/spoofed 

if (app.disclosedPackages.toSource() === fDisclose.toSource()) {
     app.disclosedPackages.disclose(event.target); 
}

Note that we disclose ourselves only after making sure the source is the same as the original.

The clever JavaScript coder will point out that the toSource() method can be overridden. Not so.  The Acrobat JavaScript implementation does not allow overriding the toSource() and toString() methods of objects.
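Node has no `toSource()`, so the following sketch adapts the check by comparing the source text of each method, calling `Function.prototype.toString` directly so that a `toString` override on a method cannot fake it. Unlike the Acrobat behaviour described above, Node does let scripts replace `Function.prototype.toString` itself, so treat this as an illustration of the idea rather than a security boundary. The `sourceOf` and `makeDiscloser` helpers are hypothetical.

```javascript
// Build a comparable "source" string from an object's own properties.
function sourceOf(obj) {
  return Object.keys(obj)
    .sort()
    .map((key) => {
      const v = obj[key];
      const text = typeof v === "function" ? Function.prototype.toString.call(v) : String(v);
      return key + ":" + text;
    })
    .join("\n");
}

// The same code, evaluated separately by two package documents.
function makeDiscloser() {
  return new function () {
    const list = [];
    this.disclose = function (doc) { list.push(doc); };
  };
}

const app = {};
const fromFirstPackage = makeDiscloser();
const fromSecondPackage = makeDiscloser();
if (typeof app.disclosedPackages === "undefined") app.disclosedPackages = fromFirstPackage;

// Different object, identical source: the second package trusts it.
const trusted = sourceOf(app.disclosedPackages) === sourceOf(fromSecondPackage); // true
// A rogue look-alike has different source text and is rejected.
const rogue = { disclose(doc) { /* keep doc for later misuse */ } };
const spoofed = sourceOf(rogue) === sourceOf(fromSecondPackage); // false
```

Comparing source rather than object identity is what lets a second package document accept the copy installed by the first one, since both were created from the same code.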

Naming Conflicts

Something to be careful about is managing different versions of this code.
If there are two variations of this code in different PDFs and both use the same name (app.disclosedPackages), one of them will fail.  So if you write your own version of this, use a unique name.  Better yet, use a unique name that incorporates a version number so you can manage the code over time.
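One way to apply that advice, sketched with a hypothetical owner prefix and version suffix: each version installs itself under its own key and never touches another's.

```javascript
// Hypothetical naming scheme: owner prefix plus version number.
const DISCLOSE_KEY = "acmeDisclosedPackages_v2"; // illustrative name only

// Install an implementation only if nothing is registered under that key yet.
function installOnce(appObject, key, implementation) {
  if (typeof appObject[key] === "undefined") {
    appObject[key] = implementation;
  }
  return appObject[key];
}

const app = {};
installOnce(app, "acmeDisclosedPackages_v1", { version: 1 }); // older form
const v2 = installOnce(app, DISCLOSE_KEY, { version: 2 });    // newer form
// Both versions now coexist; v2.version === 2
```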

What? No Sample?

Next post.  I promise.

Duplicating subform structures

This week I was asked to look at a problem where the customer wanted to have two sets of repeating subforms reference the same data.  With normal data binding techniques this doesn’t work. Data may be bound to only one subform. The solution in this case was to bind one set of subforms to the data, and synchronize a second set of subforms to the data using JavaScript.

In the attached sample, the first set of subforms is used for data collection, the second set is used to present the data as prose. In other words, we have two views of the same data: one for editing, one for final presentation. Of course, in many scenarios this would be done as two separate forms.  But we can make it work as a single form.

In the sample, the subform named "Me" is a hierarchy of subforms that binds to data that looks like:

<Me>
    <Fname />
    <LName />
    <BirthDate/>
    <Gender/>
    <Spouse>
        <Fname/>
        <LName/>
        <BirthDate/>
    </Spouse>
    <Dependent> <!-- repeats -->
        <Fname/>
        <BirthDate/>
        <Gender/>
        <Hobby> <!-- repeats -->
            <HobbyName/>
        </Hobby>
        <Pet> <!-- repeats -->
            <PetType/>
            <PetName/>
        </Pet>
    </Dependent>
</Me>

The sample includes a set of repeating subforms that use "normal" (name-based) binding to connect to the data.  The second set of subforms and fields ("Summary") is declared with no data binding.  The Summary subform has a calculation script that synchronizes its descendant objects to the data. 

The calculation script iterates through the data under "Me", looking for fields and subforms that have the same name as the data. 

  • Where there’s a name match to a field, we assign the value. 
  • When there’s a name match on a subform, we recurse one level deeper into the hierarchy. 
  • When there’s a grouping element in the data (a dataGroup) and no corresponding subform, we look for an instance manager so that we can create a subform instance to match to the data. 
  • When there are optional subforms left over that don’t match data, we remove them. 
  • We keep track of which subforms and fields have been "bound" to data so that the same objects don’t bind twice.
  • Because the logic is in a calculation, it will re-fire whenever the referenced data changes (due to dependency tracking)
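The rules above can be sketched as a recursive function over plain objects. This is a simplified model rather than the sample's calculation script: data values and data groups, fields and subforms are plain objects, and a per-name `templates` map is a hypothetical stand-in for the instance manager used to create new subform instances.

```javascript
// Data:  { name, value } for a data value, { name, children } for a data group.
// Form:  { kind: "field", name, value }
//        { kind: "subform", name, children, optional, templates }
// `templates` (hypothetical) maps a name to a blank subform that can be
// cloned to create a new instance, standing in for an instance manager.
function cloneNode(node) {
  return JSON.parse(JSON.stringify(node));
}

function synchronize(dataGroup, subform) {
  subform.children = subform.children || [];
  const bound = new Set(); // form objects already matched to data in this pass
  const findUnbound = (name, kind) =>
    subform.children.find((c) => c.name === name && c.kind === kind && !bound.has(c));

  for (const dataNode of dataGroup.children) {
    if (dataNode.children === undefined) {
      // Data value: assign it to an unbound field with the same name.
      const field = findUnbound(dataNode.name, "field");
      if (field) {
        field.value = dataNode.value;
        bound.add(field);
      }
      continue;
    }
    // Data group: use an unbound subform with the same name, or create one.
    let target = findUnbound(dataNode.name, "subform");
    if (!target) {
      const blank = subform.templates && subform.templates[dataNode.name];
      if (!blank) continue; // nothing in the form can show this data group
      target = cloneNode(blank);
      subform.children.push(target);
    }
    bound.add(target);
    synchronize(dataNode, target); // recurse one level deeper
  }
  // Remove optional subforms left over from earlier passes that no longer match data.
  subform.children = subform.children.filter(
    (c) => !(c.kind === "subform" && c.optional && !bound.has(c)));
}

const data = {
  name: "Me",
  children: [
    { name: "Fname", value: "Pat" },
    { name: "Dependent", children: [{ name: "Fname", value: "Sam" }] },
    { name: "Dependent", children: [{ name: "Fname", value: "Lee" }] },
  ],
};
const summary = {
  kind: "subform",
  name: "Summary",
  templates: {
    Dependent: { kind: "subform", name: "Dependent", optional: true,
                 children: [{ kind: "field", name: "Fname", value: "" }] },
  },
  children: [{ kind: "field", name: "Fname", value: "" }],
};
// Values the summary shows: the top-level field, then each dependent's name.
const shown = () => summary.children.map((c) =>
  c.kind === "field" ? c.value : c.children[0].value);

synchronize(data, summary);
const firstPass = shown();  // ["Pat", "Sam", "Lee"]
data.children.pop();        // one dependent removed from the data
synchronize(data, summary); // the leftover optional instance is removed
const secondPass = shown(); // ["Pat", "Sam"]
```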

There are other ways we could have synchronized data to a set of subforms, but this technique has some advantages:

  1. The logic is generic and can be applied to any form.
  2. We can add and remove fields/subforms without worrying about any bookkeeping to match the objects to data.  The logic is centralized and will work as long as the form author uses names that match data.

The only caveat is that this solution won’t scale all that well to very large forms. The re-binding script runs every time a data value changes, and is looping through all the data nodes and form objects.  As the data set grows large, this logic will begin to slow down.

Parameterize your SOAP address

The problem for today is how to change the SOAP address used by a PDF form. The common scenario is that users want to have one SOAP address when developing and testing their form, but a different address when deploying it.  In fact, depending on your server configuration, you may want to deploy the same form on different servers with different SOAP addresses.

Rather than change the form for each server we deploy to, we prefer to parameterize the SOAP address.   There are two main challenges for solving this problem:

  1. In a form definition, the WSDL/SOAP definition is read-only
  2. For forms that deal with sensitive data, any SOAP addresses need to be covered by certification in order to prevent any ‘man-in-the-middle’ attacks

Weather Report

For today’s sample, I used a weather report SOAP service provided by www.webservicex.net.  This service: http://www.webservicex.net/WCF/ServiceDetails.aspx?SID=50 allows you to get a six-day forecast for US cities based on a zip code.  Here is the corresponding WSDL.

For more background on using a WSDL definition from a PDF form you can check out this previous entry.

Cloning the Connection

There are two ways to execute a SOAP operation:

  1. With declarative XFA markup (the <execute> element)
  2. With a script call on the connection object (connection.execute())

By default, Designer will generate the <execute> markup.  If you want to modify the SOAP address, you need to use the script version. To get past the read-only restriction on connection objects, we first clone the connection.  With a read/write cloned connection we can change the SOAP address.  The code looks like this:

var vConnection = xfa.connectionSet.WeatherForecast.clone(true);
vConnection.soapAddress.value = "<revised soap address>";

Protecting the Address

In our scenario, where we design the form once and want it to work with multiple SOAP addresses, we need a way to pass the address to the form.  The obvious solution is to embed the SOAP address in the form data.  This strategy works fine if your form is accessing a weather service.  But if your form is interacting with a financial institution, we need something more secure: we need to deliver the form and the SOAP address in a way that cannot be tampered with.

For sensitive forms we recommend certification.  A certified form cannot be modified without the end user becoming aware that the form definition has changed. 

Parameterized Submit URL

We’ve previously solved a similar problem by allowing submit URLs to be parameterized.  The strategy in that case was to embed an array of submit URLs in the config data, and then inject an index value into the form data to indicate which submit URL to use.  The config grammar looks like:

<config xmlns="http://www.xfa.org/schema/xci/2.8/">
   <present>
      <submitUrl>http://service1.submit.net/</submitUrl>
      <submitUrl>http://service2.submit.net/</submitUrl>
      <submitUrl>http://service3.submit.net/</submitUrl>
      …
   </present>
</config>


Then in our form data we add a data element that selects the appropriate URL:

<xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
  <variables xmlns="http://ns.adobe.com/server-context-data/">
    <submitUrlIndex>2</submitUrlIndex>
  </variables>
</xfa:datasets>

With submit URLs, this all works automatically.  When you leave the submit URL unspecified in your form definition, Reader will automatically look up the submitUrl from config.

Reduce, Re-use, Recycle

We were tempted to add new grammar and new code to Reader to do the same for SOAP addresses as we did for submit URLs — but we thought better of it.  We can make this work for SOAP addresses by re-using the submitUrl grammar and adding a bit of JavaScript to do the lookup.

Assuming we now populate the config <submitUrl> grammar with SOAP addresses, we’ll use a new data variable for indexing:

<xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
  <variables xmlns="http://ns.adobe.com/server-context-data/">
    <soapUrlIndex connection="WeatherForecast">2</soapUrlIndex>
  </variables>
</xfa:datasets>

Of course, this won’t work automagically as it does with submitUrl.  We need to write the script to retrieve the SOAP address.  That’s about 25 lines of script.  Have a look at the click event of the button in the sample form to see how this works.
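The sample's click script is not reproduced here, but the lookup it has to perform can be sketched with plain objects. Everything except the names WeatherForecast and soapUrlIndex is an assumption for illustration: `cloneWithSoapAddress` and `makeConnection` are hypothetical, the connection set is a plain object whose entries have a `clone()` method and a `soapAddress.value` property (mirroring the two lines of script earlier in this post), the config addresses arrive as an array of strings, and the index is treated as zero-based. Check the sample form for the convention it actually uses.

```javascript
// Pick the configured address selected by <soapUrlIndex connection="...">N</soapUrlIndex>
// and apply it to a clone of the named connection. Plain objects stand in for the XFA DOM.
function cloneWithSoapAddress(connectionSet, soapUrls, indexVar) {
  const original = connectionSet[indexVar.connection];
  if (!original) {
    throw new Error("Unknown connection: " + indexVar.connection);
  }
  // Assumption: the index is zero-based.
  const index = parseInt(indexVar.index, 10);
  if (!(index >= 0 && index < soapUrls.length)) {
    throw new Error("soapUrlIndex out of range: " + indexVar.index);
  }
  // The connection in the form definition is read-only, so edit a clone.
  const connection = original.clone(true);
  connection.soapAddress.value = soapUrls[index];
  return connection;
}

// Hypothetical mock connection: clone() copies the SOAP address
// (the deep-clone flag is accepted but ignored by this mock).
function makeConnection(address) {
  return {
    soapAddress: { value: address },
    clone() {
      return makeConnection(this.soapAddress.value);
    },
  };
}

const connectionSet = { WeatherForecast: makeConnection("design-time address") };
const soapUrls = [
  "http://service1.submit.net/",
  "http://service2.submit.net/",
  "http://service3.submit.net/",
];
const vConnection = cloneWithSoapAddress(
  connectionSet, soapUrls, { connection: "WeatherForecast", index: "2" });
```

Under the zero-based assumption, `vConnection.soapAddress.value` is the third address, while the connection in the form definition keeps its design-time address. In a real form, the clone is the object you would then call execute() on. Rejecting an out-of-range index, rather than silently falling back to the design-time address, makes a misconfigured server easy to spot.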

The Deep End

Did you notice all those nice weather images that show up in the forecast?  You might have assumed that the image data was returned to us by the SOAP call.  Well… not exactly.  The SOAP response gave us the image URLs.   And as you know, a PDF cannot load an external image.  The solution is to embed all the images in the PDF. Yep, all 339 images used by the SOAP service are embedded in the PDF.  Fortunately they’re pretty small images.  In a previous post I talked about the benefits of linked images: Linked vs Embedded Template Images

There is a nice trick to getting all those images embedded.  When you are generating your form on the server, your form data may contain URL references to images.  The syntax looks like:

<img xfa:contentType="image/jpg"
     href="http://forecast.weather.gov/images/wtf/blizzard.jpg"/>

When we create the PDF on the server (or in Designer preview) all image hrefs found in the data are embedded in the PDF and are indexed by their URL.  When the SOAP service returns an image URL to us, we simply assign that value to an image field and the field automatically gets connected to the image embedded in the PDF.

vNewDay.weatherImage.value.image.href = vDay.WeatherImage.rawValue;

Have a look at the form:ready script on the zip code field for the script that works with the SOAP response.
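The loop around that assignment can be sketched with mock objects standing in for the XFA DOM. The shape of each day node (exposing WeatherImage.rawValue) follows the line of script above. `fillForecast` and `makeDayManager` are hypothetical, and the mock's addInstance() is only modeled on the instance manager method of the same name. The second image URL is likewise hypothetical.

```javascript
// For each forecast day in the (mocked) SOAP response, add a row and point
// its image field at the URL the service returned. In the PDF, the href
// resolves to the image that was embedded under that same URL.
function fillForecast(days, dayManager) {
  for (const vDay of days) {
    const vNewDay = dayManager.addInstance(true);
    vNewDay.weatherImage.value.image.href = vDay.WeatherImage.rawValue;
  }
}

// Hypothetical mock of a repeating subform's instance manager.
function makeDayManager() {
  const rows = [];
  return {
    rows,
    addInstance() {
      const row = { weatherImage: { value: { image: { href: "" } } } };
      rows.push(row);
      return row;
    },
  };
}

const days = [
  { WeatherImage: { rawValue: "http://forecast.weather.gov/images/wtf/blizzard.jpg" } },
  { WeatherImage: { rawValue: "http://example.com/images/sunny.jpg" } }, // hypothetical URL
];
const manager = makeDayManager();
fillForecast(days, manager);
```

Note that the script only ever copies a string. Nothing is downloaded at run time, which is why every image the service might reference has to be embedded ahead of time.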

Here is the data file for pre-loading all those images.  Be warned that using this data file will make your PDF generation *very* slow.
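If you want to assemble a preload file like this yourself, a short script can write one <img> element per image URL using the syntax shown above. This is a sketch: the function names and the root element name imageCache are hypothetical, and the list of URLs is something you would have to supply. Because embedded images are indexed by URL, duplicate URLs are written only once.

```javascript
// Escape the characters that may not appear verbatim in an XML attribute value.
function escapeAttr(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/"/g, "&quot;");
}

// Build a data file with one <img> element per distinct image URL.
// The root element name is hypothetical; the <img> syntax follows the post.
function buildImagePreloadData(urls, contentType = "image/jpg") {
  const lines = ['<imageCache xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">'];
  for (const url of new Set(urls)) {
    lines.push(
      '  <img xfa:contentType="' + escapeAttr(contentType) +
      '" href="' + escapeAttr(url) + '"/>'
    );
  }
  lines.push("</imageCache>");
  return lines.join("\n");
}

const blizzard = "http://forecast.weather.gov/images/wtf/blizzard.jpg";
const preload = buildImagePreloadData([blizzard, blizzard]);
```

Here `preload` contains a single <img> element for the blizzard image, wrapped in the root element.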

Updated Form Debugger

Last year I posted a sample form that was useful for debugging the internals of dynamic forms: Debug merge and layout.  In the meantime, I’ve discovered a couple of bugs.  Time for a maintenance release.  The main changes are:

  • Does a better job finding the coordinates of objects — anything inside a subformSet, fields inside exclusion groups.
  • Reports the breakBefore and breakAfter properties correctly
  • General code cleanup and a bit of optimization

As I’ve mentioned before, this tool is useful not only for debugging your merge and layout, but also for visualizing what’s going on inside the XFA processor.

Here’s the updated debugger form.

Null Data Handling

We run into scenarios where we want to control how null values are represented in our XML instance data.  There are three different ways that null data can be represented in XML data:

  1. Exclude the element.  If the value has no data, do not write it out to the XML file
  2. Use XML Schema’s nil attribute:
    <spouseName xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:nil="true"/>
  3. Write out an empty element.  e.g. <spouseName/>
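Here is a sketch of the three representations for a single element in plain JavaScript. It only builds strings. The function name `writeElement` is hypothetical, the sketch treats null, undefined and the empty string as null, and the strategy labels ("exclude", "xsi", "empty") are chosen to match the dd:nullType values described later in this post.

```javascript
const XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

// Escape the characters that may not appear verbatim in element content.
function escapeText(text) {
  return String(text).replace(/&/g, "&amp;").replace(/</g, "&lt;");
}

// Serialize one element; when its value is null, apply the chosen strategy.
// Returns "" when the element is excluded.
function writeElement(name, value, nullType) {
  if (value !== null && value !== undefined && value !== "") {
    return "<" + name + ">" + escapeText(value) + "</" + name + ">";
  }
  switch (nullType) {
    case "exclude":
      return "";
    case "xsi":
      return "<" + name + ' xmlns:xsi="' + XSI_NS + '" xsi:nil="true"/>';
    default: // "empty"
      return "<" + name + "/>";
  }
}

writeElement("spouseName", null, "exclude"); // ""
writeElement("spouseName", null, "empty");   // "<spouseName/>"
writeElement("spouseName", "Pat", "xsi");    // "<spouseName>Pat</spouseName>"
```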

Null Handling in XML Schema

When determining which strategy to use, we look to the form’s XML Schema for hints:

  1. If a leaf element (no child elements, no attributes) is marked as optional (minOccurs="0") then it will be excluded when null.
  2. If an element in the schema is marked as nillable="true", then the data will be marked with the xsi:nil attribute
  3. In all other cases we write out null values as empty elements
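The three rules can be written as a small decision function. The input object is a hypothetical summary of an element declaration (`isLeaf` meaning no child elements and no attributes), and the checks run in the order of the numbered list. The list does not say what happens when an element is both optional and nillable, so treat the outcome for that case as an assumption.

```javascript
// decl: { isLeaf: boolean, minOccurs: number, nillable: boolean } (hypothetical shape)
function nullStrategyFromSchema(decl) {
  if (decl.isLeaf && decl.minOccurs === 0) return "exclude"; // rule 1
  if (decl.nillable) return "xsi"; // rule 2
  return "empty"; // rule 3
}

// Applied to declarations from the purchase order schema below:
nullStrategyFromSchema({ isLeaf: true, minOccurs: 0, nillable: false });  // comment1: "exclude"
nullStrategyFromSchema({ isLeaf: true, minOccurs: 1, nillable: true });   // comment2: "xsi"
nullStrategyFromSchema({ isLeaf: false, minOccurs: 0, nillable: false }); // emptyItem: "empty"
```

The last line shows why an optional group is not dropped by rule 1: it is not a leaf, so its null children are written out as empty elements.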

For the rest of this blog entry I’ll describe exactly how the XFA processor deals with null data, and offer some tips on how you can further customize the behaviour.  In particular, I’d like to show how to control null handling at the data group level.  

Let’s look at the problem using a specific example.  Here’s a sample purchase order schema:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="purchaseOrder" type="poType"/>
<xsd:complexType name="poType">
<xsd:sequence>

  <xsd:element name="emptyItem" minOccurs="0">
   <xsd:complexType>
    <xsd:sequence>
     <xsd:element name="partNum" type="xsd:string"/>
     <xsd:element name="description" type="xsd:string"/>
     <xsd:element name="quantity" type="xsd:positiveInteger"/>
     <xsd:element name="unitPrice" type="xsd:float"/>
    </xsd:sequence>
   </xsd:complexType>
  </xsd:element>

  <xsd:element name="excludeItem" minOccurs="0">
   <xsd:complexType>
    <xsd:sequence>
     <xsd:element name="partNum" type="xsd:string"/>
     <xsd:element name="description" type="xsd:string"/>
     <xsd:element name="quantity" type="xsd:positiveInteger"/>
     <xsd:element name="unitPrice" type="xsd:float"/>
    </xsd:sequence>
   </xsd:complexType>
  </xsd:element>

  <xsd:element name="xsiItem" minOccurs="0">
   <xsd:complexType>
    <xsd:sequence>
     <xsd:element name="partNum" type="xsd:string"/>
     <xsd:element name="description" type="xsd:string"/>
     <xsd:element name="quantity" type="xsd:positiveInteger"/>
     <xsd:element name="unitPrice" type="xsd:float"/>
    </xsd:sequence>
   </xsd:complexType>
  </xsd:element>

  <xsd:element name="comment1" type="xsd:string" minOccurs="0"/>
  <xsd:element name="comment2" type="xsd:string" nillable="true"/>

</xsd:sequence>
</xsd:complexType>
</xsd:schema>

 

Note the two comment elements at the bottom.  When comment1 is saved to data, it will be excluded when null.  When comment2 is saved to data it will be annotated with the xsi:nil attribute.

But what some form authors want is for empty groups (purchase order items in our example) to be excluded when their contents are null.  I can show you how — but we are now getting into the deep end.

Data Description

When we save a PDF/XDP file that is based on a schema or sample XML or WSDL connection, we generate a data description.  This data description is really a distilled version of the schema.  It takes the form of a sample XML annotated with special namespaced attributes.  The data description for the sample above looks like:

<xfa:datasets
      xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
<dd:dataDescription
      xmlns:dd="http://ns.adobe.com/data-description/"
      dd:name="purchaseOrder">
  <purchaseOrder>

   <emptyItem dd:minOccur="0">
    <partNum/>
    <description/>
    <quantity/>
    <unitPrice/>
   </emptyItem>

   <excludeItem dd:minOccur="0">
    <partNum/>
    <description/>
    <quantity/>
    <unitPrice/>
   </excludeItem>

   <xsiItem dd:minOccur="0">
    <partNum/>
    <description/>
    <quantity/>
    <unitPrice/>
   </xsiItem>

   <comment1 dd:minOccur="0" dd:nullType="exclude"/>
   <comment2 dd:nullType="xsi"/>

  </purchaseOrder>
</dd:dataDescription>
</xfa:datasets>

 

Note how the comment elements are annotated.  We use dd:nullType to specify the null handling behaviour:

  1. dd:nullType="exclude": don’t write out elements whose value is null.  Note that this option may be used only if the element is marked in the schema as optional (minOccurs="0"). 
  2. dd:nullType="xsi": use the XML Schema nil attribute.  As described in the W3C definition:

    "XML Schema: Structures introduces a mechanism for signaling that an element should be accepted as valid when it has no content despite a content type which does not require or even necessarily allow empty content. An element may be valid without content if it has the attribute xsi:nil with the value true. An element so labeled must be empty, but can carry attributes if permitted by the corresponding complex type."

  3. dd:nullType="empty" (the default): save null values as empty elements.

The dd:nullType attribute can also be placed on grouping elements.  When it is placed on a grouping element, then the setting gets applied to all children of the group.  While there is nothing in XML schema that will do this for us, we can do it by hand-editing the item elements in XML source view:

<emptyItem dd:minOccur="0" dd:nullType="empty">

<excludeItem dd:minOccur="0" dd:nullType="exclude">

<xsiItem dd:minOccur="0" dd:nullType="xsi">
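One way to picture the group-level setting is as inheritance: to find the null handling for an element, look at the element itself and then at each enclosing group until a dd:nullType turns up. The sketch below models that with plain objects. The object shape and the function name are hypothetical, and the precedence rule (an element's own setting beats its group's) is an assumption, not something stated above.

```javascript
// Hypothetical element model: { attrs: { "dd:nullType"?: string, ... }, parent?: element }
// Assumption: an element's own dd:nullType takes precedence over its group's.
function effectiveNullType(element) {
  // Walk from the element up through its grouping ancestors.
  for (let node = element; node; node = node.parent) {
    const type = node.attrs["dd:nullType"];
    if (type) {
      return type;
    }
  }
  return "empty"; // the default when nothing on the path sets dd:nullType
}

const purchaseOrder = { attrs: {} };
const xsiItem = { attrs: { "dd:minOccur": "0", "dd:nullType": "xsi" }, parent: purchaseOrder };
const partNum = { attrs: {}, parent: xsiItem };
const comment1 = { attrs: { "dd:minOccur": "0", "dd:nullType": "exclude" }, parent: purchaseOrder };

effectiveNullType(partNum);       // "xsi", inherited from xsiItem
effectiveNullType(comment1);      // "exclude", set on the element itself
effectiveNullType(purchaseOrder); // "empty", the default
```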

And here is the obligatory sample form.  And the sample schema.  The sample form is bound to the sample schema, and has a field that shows the current state of the XML instance data.  Try typing values into the 3 different partnum fields and the comment fields, and watch how the data reacts according to the instructions in the data description.

The Really Deep End

The disadvantage to hand-editing your data description is that when your schema changes and the data description gets refreshed, your edits will be lost.

There is a way to write a script that updates your data description automatically.  The general technique is described in this blog post.  The script below updates all your data descriptions: for every element marked dd:minOccur="0", it adds dd:nullType="exclude".  Note that the changes are applied when saving as PDF or when generating a PDF using LiveCycle.

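purchaseOrder::initialize - (JavaScript, server)
// Only run when the XFA processor is rendering the PDF (host "XFAPresentationAgent").
if (xfa.host.name === "XFAPresentationAgent") {
    var vDataDescriptionList = xfa.datasets.dataDescription.all;
    for (var i = 0; i < vDataDescriptionList.length; i++) {
        var vDD = vDataDescriptionList.item(i);
        // Serialize the data description, edit it as text, then reload it.
        var sDD = vDD.saveXML();
        // Add dd:nullType="exclude" after every dd:minOccur="0" that is not
        // already followed by a dd:nullType attribute in the same tag.
        sDD = sDD.replace(
             /dd:minOccur="0"(?![^>]*dd:nullType=)/g,
             "dd:minOccur=\"0\" dd:nullType=\"exclude\"");
        xfa.datasets.nodes.remove(vDD);
        xfa.datasets.loadXML(sDD, false, false);
    }
}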
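Because a text replacement like this is easy to get wrong (an opening tag can end right after dd:minOccur="0", with no trailing space), it is worth exercising the pattern in plain JavaScript before putting it into the initialize script. The function name is hypothetical. This pattern also matches tags where dd:minOccur="0" is followed directly by > or />, and it skips tags that already carry any dd:nullType. It assumes dd:minOccur comes before dd:nullType within a tag, as in the data description shown earlier.

```javascript
// Add dd:nullType="exclude" to every tag that has dd:minOccur="0"
// and no dd:nullType attribute later in the same tag.
function addExcludeNullType(ddXml) {
  return ddXml.replace(
    /dd:minOccur="0"(?![^>]*dd:nullType=)/g,
    'dd:minOccur="0" dd:nullType="exclude"'
  );
}

const sample = [
  '<emptyItem dd:minOccur="0">',
  '<comment1 dd:minOccur="0" dd:nullType="exclude"/>',
  '<comment2 dd:nullType="xsi"/>',
].join("\n");
const updated = addExcludeNullType(sample);
```

Only the emptyItem tag changes: comment1 already has a dd:nullType, and comment2 is not optional. Running the function a second time leaves the text unchanged, so re-rendering the form does not stack duplicate attributes.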