Debug merge and layout

It has been quiet for a while. That is because I took on a more ambitious task over the last couple of weeks.  The result is a new XFA/PDF debugger tool.

In the past I’ve posted samples (tools) that help users debug their merge (form DOM) and view their form layout. The new tool consolidates and extends those capabilities and implements the debugger in Flash, as a SWF embedded in a PDF.

This tool will be useful to anyone having problems designing dynamic forms:

  • Subforms aren’t being created from data where you expected they would
  • Layout has a mysterious blank page
  • Garbled or overlapping layout
  • Layout did not appear in the order you expected
  • Leaders and trailers aren’t being created
  • Leaders and trailers are created too often
  • Content seems to be missing

If you’re encountering any of these symptoms then this tool could help.

Usage

You need Acrobat version 9 (not Reader) to use the tool.

To start a debug session, populate your PDF form with data and save it.  Then open the PDF from the XFADebugger.pdf tool.  You will see a snapshot of the form represented in three vertical boxes:

  • Form DOM tree
  • Data DOM tree
  • Layout — page display (with content areas rendered as grey boxes)

From there on it is hopefully self-explanatory.  You can expand/collapse the trees.  Nodes in the trees are colour coded:

  • grey — node is not bound
  • black — node is bound once
  • blue — data node is bound to more than one form node
  • green — represents a subform leader/trailer that has been added by the layout process
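
The colour rules above boil down to a small lookup.  Here is a rough sketch of that logic; the node shape (addedByLayout, bindCount) and the function name are made up for illustration and are not taken from the tool’s source.

```javascript
// Hypothetical node shape: { addedByLayout: boolean, bindCount: number }.
// These property names are assumptions for this sketch only.
function nodeColour(node) {
  if (node.addedByLayout) return "green"; // leader/trailer added by layout
  if (node.bindCount === 0) return "grey"; // node is not bound
  if (node.bindCount === 1) return "black"; // node is bound once
  return "blue"; // data node bound to more than one form node
}
```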

Selecting a node in the form tree will:

  • If bound, highlight the corresponding node in the data tree
  • If not hidden, highlight the area on the page display where that node is rendered
  • Display any interesting attributes/properties that impact merge and layout

Selecting a node in the data tree will (if bound):

  • Select all the form node(s) that are bound to this data
  • Display the attributes and highlight the corresponding form node(s)

Warnings

The tool reports on suspicious conditions on the form that could impact merge or layout.  Clicking on the "Find Warnings" button will cycle through any warnings found in the form.  For each warning, the corresponding nodes are highlighted and the warning text appears in red.

These are the warning conditions detected:

Object extends outside its parent container
Just what it says.  When the offending form node is highlighted on the page layout in dark blue, its parent node is highlighted in pale blue.
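
A check like this can be pictured as a bounding-box comparison with a little slack.  The sketch below is only an illustration: the box shape ({ x, y, w, h }), the function name and the default tolerance value are all assumptions, not the tool’s actual code.

```javascript
// Hypothetical box shape: { x, y, w, h }, all in the same units and
// the same page coordinate space. The default tolerance is an arbitrary
// value chosen for this sketch.
function extendsOutside(child, parent, tolerance = 0.5) {
  return (
    child.x < parent.x - tolerance ||
    child.y < parent.y - tolerance ||
    child.x + child.w > parent.x + parent.w + tolerance ||
    child.y + child.h > parent.y + parent.h + tolerance
  );
}
```

A small tolerance keeps rounding differences in measurements from being flagged as warnings.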

leader/trailer subforms must not be flowed
Leader and trailer subforms must always have positioned content.  As you might imagine, having a variable-sized leader/trailer would make it pretty difficult for the layout algorithm to reserve space for the leader/trailer subform.

Subform is splittable but cannot split because the parent is not splittable
Designer also warns about this condition.

Object is growable, but not-splittable.  With enough data, it could grow too large for its content area
If an object can grow vertically without any upper limit, it could eventually grow too big to be rendered inside a content area.

Object is growable but its parent is not.  With enough data, it could grow too large for its parent
If you place a growable object inside a subform with a fixed layout size, you could end up with an object that’s too big for its container.

‘Keep with previous’ conflicts with ‘break after’ on previous element
Conflicting break/keep directives. (By the way, the keep will trump the break, but this condition is a leading cause of mysterious blank pages in your output.)

‘Break before’ conflicts with ‘keep with next’ on previous element
Same as previous except the other way around.
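
Both of these keep/break conflicts involve a pair of adjacent siblings, so one pass over the sibling list can find them.  This is a minimal sketch under my own assumptions: the sibling shape (name plus four boolean flags) and the function name are hypothetical and simplify the real break/keep settings.

```javascript
// Hypothetical sibling shape (all flags are booleans):
// { name, keepWithPrevious, keepWithNext, breakBefore, breakAfter }
function findKeepBreakConflicts(siblings) {
  const warnings = [];
  for (let i = 1; i < siblings.length; i++) {
    const prev = siblings[i - 1];
    const cur = siblings[i];
    if (cur.keepWithPrevious && prev.breakAfter) {
      warnings.push(
        `${cur.name}: 'keep with previous' conflicts with 'break after' on ${prev.name}`
      );
    }
    if (cur.breakBefore && prev.keepWithNext) {
      warnings.push(
        `${cur.name}: 'break before' conflicts with 'keep with next' on ${prev.name}`
      );
    }
  }
  return warnings;
}
```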

Repeating subforms should not specify keep with next or keep with previous
Having a keep on a repeating subform will result in the entire group of subforms being un-splittable.

Subforms with repeating children should be splittable
If a subform has a repeating child, it is likely to require a split.

Multiple repeating form nodes are bound to the same data. This might cause a different merge result when the form is re-opened
This takes more explanation.  Consider this template definition:

<subform name="S0"><occur min="1" max="10"/><bind ref="S[*]"/></subform>
<subform name="S1"><occur min="1" max="1"/><bind match="once"/></subform>

When this form is first opened without any data, we will create two instances of <S> in the data, one for S0 and one for S1. Then when we save/close/reopen, subform S0 will bind to both instances of <S> and subform S1 will create a new instance of <S>. That is, after save/close/reopen there is one more subform than there was before.  This is a form design issue that crops up occasionally and can be very confusing for novice form authors.
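
To see the count creep up, here is a toy model of the scenario just described.  It is not the real merge algorithm; it only encodes the behaviour described above, under the assumption that S0 binds every available <S> up to its maximum of 10 and that S1 binds a leftover <S> if there is one and otherwise creates a new one.  The function name openForm is made up for this sketch.

```javascript
// Toy model of the scenario above, NOT the real XFA merge algorithm.
// dataCount is the number of <S> data nodes present when the form opens.
function openForm(dataCount) {
  const s0Bound = Math.min(dataCount, 10); // S0 binds all <S>, up to max 10
  const s0Instances = Math.max(s0Bound, 1); // S0 has a min occurrence of 1
  const s0Created = s0Instances - s0Bound; // data created when none existed
  const leftover = dataCount - s0Bound; // <S> nodes S0 could not take
  const s1Created = leftover > 0 ? 0 : 1; // S1 binds a leftover or creates one
  return {
    subforms: s0Instances + 1, // S0 instances plus the single S1
    dataCount: dataCount + s0Created + s1Created,
  };
}

const first = openForm(0); // first open, no data
const second = openForm(first.dataCount); // after save/close/reopen
console.log(first.subforms, second.subforms); // prints: 2 3
```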

Rows with more than one multi-line field might have difficulty splitting

If you have a splittable table row with more than one multi-line field, you might find that it does not split.  The algorithm for splitting rows requires finding a common font baseline between rows on the sibling cells.  In the currently shipping product, the check for the baseline is very exact.  If there is any difference between the fields that can cause the lines to be offset slightly, then the split algorithm will not find a split point.  Some of the attributes that affect the position of the baselines include: top margin, paragraph space before/space after, line spacing, vertical justification, typeface, font size, vertical scale… and probably a couple more I haven’t thought of.
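
A small sketch shows why an exact baseline check is so sensitive.  Everything here is hypothetical: each cell is reduced to a list of baseline offsets measured from the top of the row, and the tolerance parameter exists only to contrast an exact comparison with a looser one.  It is not a setting in the product.

```javascript
// cells: array of arrays of baseline y-offsets (hypothetical units),
// one inner array per cell in the row.
// Returns the baselines of the first cell that every other cell shares.
function commonSplitPoints(cells, tolerance = 0) {
  const [first, ...rest] = cells;
  return first.filter((y) =>
    rest.every((cell) => cell.some((other) => Math.abs(other - y) <= tolerance))
  );
}

// Two cells whose text is offset by half a unit (say, a different top margin).
const row = [
  [12, 24, 36],
  [12.5, 24.5, 36.5],
];
console.log(commonSplitPoints(row).length); // exact check: prints 0
console.log(commonSplitPoints(row, 1).length); // looser check: prints 3
```

With the exact comparison, a half-unit offset is enough to leave the row with no split point at all.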

 

As is the nature of warnings, not all warnings are problems that need to be fixed.  Your form might report warnings that are innocuous. 

Here is a sample of a very badly designed form that manages to have (at least) one instance of each warning.

How the Tool Works

The sample has two parts.  The first is a base PDF with a document-level JavaScript function, PDFLoader(), that will:

  • Select and open an XFA-based PDF
  • Extract an XML snapshot representing the state of the form after it has opened

The second part is a page-sized SWF embedded in the base PDF, which holds the implementation of the debugger.  The SWF has a button that invokes the document-level JavaScript via ExternalInterface.call("PDFLoader").

Once the SWF has the XML snapshot of the form, it renders it and doesn’t communicate with the base PDF anymore.

Other Uses

Educational

Loading up a form and seeing the form/data/layout graphically displayed can help to get insight on how the merge and layout processes work.

Quality Assurance

There are two ways that this tool can be used or adapted to maintain quality in your forms. 

1) Loading and viewing your dynamic form in the debugger lets you verify that merge and layout are happening as designed.  Just because your form looks OK on screen doesn’t necessarily mean that your data merged correctly or that your layout is behaving as planned.  You might be surprised by what you see.  You should make it a habit to check for warnings.

2) Adapt the XML snapshot to produce ‘gold data’ for your form.  When you are satisfied that your form is working correctly, produce a snapshot of the form that you can save as a baseline.  Then if your form gets modified — perhaps some cosmetic changes — you can compare the new snapshot to the baseline and confirm that any changes are as expected.
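
As a rough idea of what a baseline comparison could look like, the sketch below flattens two snapshots into path/value pairs and reports whatever differs.  The snapshot shape used here ({ name, props, children }) is an assumption made for the example; the real snapshot is XML and would first need to be parsed into some structure like this.

```javascript
// Hypothetical snapshot node: { name, props: {...}, children: [...] }.
// Flatten a tree into a Map of "path" -> value so two trees compare easily.
function flatten(node, path = "", out = new Map()) {
  const here = `${path}/${node.name}`;
  out.set(here, "present"); // records that the node itself exists
  for (const [key, value] of Object.entries(node.props || {})) {
    out.set(`${here}@${key}`, String(value));
  }
  (node.children || []).forEach((child, i) => flatten(child, `${here}[${i}]`, out));
  return out;
}

// Report every path whose value was added, removed or changed.
function diffSnapshots(baseline, current) {
  const a = flatten(baseline);
  const b = flatten(current);
  const changes = [];
  for (const key of new Set([...a.keys(), ...b.keys()])) {
    if (a.get(key) !== b.get(key)) {
      changes.push({ key, before: a.get(key), after: b.get(key) });
    }
  }
  return changes;
}
```

An empty result means the modified form still merges and lays out exactly like the baseline; anything else is a change to review.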

Futures

There are undoubtedly more form design problems that could be flagged by this tool. If you have suggestions for other conditions to detect, please let me know.

The form DOM could include more objects — instance managers and draw elements.  For now I’ve left them out because they clutter the form tree too much.

Updates

June 1, 2009

  • Fixed bug where field splittable status was reported incorrectly
  • Increased the tolerance when checking for objects outside their extent
  • Added a new warning: "Rows with more than one multi-line field might have difficulty splitting"