Tool for Summarizing Form Content

I am often asked to take a look at forms that arrive from a variety of sources: customers, quality assurance, sales engineers, etc. One of the first things I usually do is look at some summary information about the template. Having a Unix background, I often save the form as an XDP and poke around with grep, e.g. grep "<field" form.xdp | wc -l, which gives a rough count of the fields in the form. But certain types of information are difficult to coax out with Unix shell commands, so I set out to build something more user-friendly.
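One caveat with that pipeline: wc -l counts matching lines, not matches, so two <field tags on the same line are counted once, and the pattern also matches any element name that merely begins with "field". Here is a minimal JavaScript sketch (run under Node, not inside Acrobat) that counts occurrences in the saved XDP text instead. The function name countFieldElements and the script file name are hypothetical, and a regular expression is only an approximation of real XML parsing.

```javascript
// Count <field> start tags in saved XDP text (sketch; a regex is not an XML parser).
// Unlike `grep "<field" | wc -l`, every occurrence counts, even several per line.
function countFieldElements(xdpText) {
  // "<field" must be followed by whitespace, ">" or "/" so that "<fieldX" is
  // not counted; "</field>" never matches because "<" is followed by "/".
  const matches = xdpText.match(/<field(?=[\s>\/])/g);
  return matches ? matches.length : 0;
}

// Usage (hypothetical file name): node count-fields.js form.xdp
if (typeof require !== "undefined" && typeof module !== "undefined" &&
    require.main === module && process.argv[2]) {
  const fs = require("fs");
  console.log(countFieldElements(fs.readFileSync(process.argv[2], "utf8")));
}
```

For example, countFieldElements('<field name="a"/><field name="b"></field>') returns 2, whereas grep would report a single matching line.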

Since the template definition is completely accessible to JavaScript, I decided to design a form that summarizes the contents of another form. The result is today's sample. The form uses the Acrobat APIs to load and launch the selected file. Once it is launched, the script iterates over the contents of its template and generates a report from what it finds. The report consists of:

  • Metadata about the form: File Name, Creator, Template Version, Compatible Version, Strict Scoping setting, Static/Dynamic setting
  • Enumerate the referenced fonts (including references found inside rich text fragments)
  • List all linked and embedded images (for embedded images, indicate their base64-encoded size)
  • Count instances of plain text vs. rich text
  • Enumerate scripts, indicating which language (FormCalc or JavaScript) along with what context (event) and how many lines long
  • Binding properties -- summarize what kinds of data binding are in use
  • Picture formats -- enumerate all the picture formats found in the form
  • All other properties. For example, to see how many captions are on the form, check how many times the <caption> element appears.
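The reporter form itself walks the template object model inside Acrobat. As a rough illustration of a few of the categories above, here is a hypothetical JavaScript sketch (run under Node, outside Acrobat) that works on the saved XDP text instead. It counts elements by name, counts plain text (<text>) versus rich text (<exData contentType="text/html">), and lists scripts that sit directly inside <event>, <calculate> or <validate> elements along with their language, context and line count. The names summarizeXdp and getAttr are made up for this sketch; it assumes that a script with no contentType attribute is FormCalc, and because it uses regular expressions rather than an XML parser it will miss some cases (script objects declared elsewhere in the template, for instance).

```javascript
// Regex-based sketch over saved XDP text; not an XML parser and not the
// reporter form's own script. All names here are hypothetical.

// Return the value of attribute `name` in a raw attribute string, or null.
function getAttr(attrs, name) {
  const m = attrs.match(new RegExp("\\b" + name + '="([^"]*)"'));
  return m ? m[1] : null;
}

function summarizeXdp(xdp) {
  const summary = { elementCounts: {}, plainText: 0, richText: 0, scripts: [] };

  // "All other properties": count start tags by element name (this also counts
  // XHTML tags inside rich text). End tags, <?...?> and <!--...--> are skipped
  // because the captured name must start with a letter.
  for (const m of xdp.matchAll(/<([A-Za-z][\w:.-]*)(?=[\s>\/])/g)) {
    summary.elementCounts[m[1]] = (summary.elementCounts[m[1]] || 0) + 1;
  }

  // Plain text: every <text> element (captions, values and list items alike).
  summary.plainText = summary.elementCounts["text"] || 0;
  // Rich text: <exData contentType="text/html">.
  for (const m of xdp.matchAll(/<exData\b([^>]*)>/g)) {
    if (getAttr(m[1], "contentType") === "text/html") summary.richText++;
  }

  // Scripts directly inside <event>, <calculate> or <validate>.
  // (?<!\/)> rejects self-closing <script .../> tags, which have no body.
  const scriptRe =
    /<(event|calculate|validate)\b([^>]*)>\s*<script\b([^>]*?)(?<!\/)>([\s\S]*?)<\/script>/g;
  for (const [, wrapper, wrapperAttrs, scriptAttrs, body] of xdp.matchAll(scriptRe)) {
    // Assumption: a script without contentType="application/x-javascript" is FormCalc.
    const language =
      getAttr(scriptAttrs, "contentType") === "application/x-javascript"
        ? "JavaScript"
        : "FormCalc";
    // For events, the context is the activity attribute (click, initialize, ...).
    const context = wrapper === "event" ? getAttr(wrapperAttrs, "activity") : wrapper;
    const text = body.trim();
    summary.scripts.push({ language, context, lines: text ? text.split(/\r?\n/).length : 0 });
  }
  return summary;
}
```

For example, summarizeXdp(require("fs").readFileSync("form.xdp", "utf8")) returns an object whose elementCounts map plays the role of the "all other properties" section and whose scripts array has one entry per event, calculate or validate script found.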

To use the form, you need to be running Acrobat (not Reader).  Simply press the button, select a form and wait for the report.  Note that for large forms, this can take a few seconds (sometimes more than a few :-).  I've attached another sample that has enough of each kind of content to generate an interesting report.  Save this form to disk and select it from the reporter form.  You'll see each category of the report populated with data.

I have also attached a report generated from a customer form with which I have enjoyed some quality time.

The script in this form is fairly complex, but if you're a good JavaScript programmer you could probably extend it to capture other information that you find interesting.

4 Comments

Dear John,

this is a very useful little 'tool'. In addition, it shows ways of accessing different kinds of information inside a template.

Thanks for your effort in developing it.

Maruan

This only works with XFA forms, right?

Rick:
Yes, only for XFA forms. You could do something very similar for AcroForms: it's fairly easy to get at all the field properties via the AcroForm object model. But I think your report would be limited to field and document properties; I don't know of a way to summarize PDF page content from the scripting API.

John
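For AcroForms, a comparable (much smaller) summary can be assembled from the Acrobat JavaScript Doc and Field objects. The sketch below is hypothetical (the function name is made up) and only counts fields by their type property, using the documented numFields, getNthFieldName and getField members; inside Acrobat it might be called as summarizeAcroFields(this) from a script where this refers to the document. As noted in the reply above, it cannot see page content.

```javascript
// Count AcroForm fields by type via the Acrobat JavaScript Doc/Field API.
// `doc` is an Acrobat Doc object (or any object exposing the same three members).
function summarizeAcroFields(doc) {
  var counts = {};
  for (var i = 0; i < doc.numFields; i++) {
    var field = doc.getField(doc.getNthFieldName(i));
    if (!field) continue; // skip any name that does not resolve to a field
    // Field.type is a string such as "text", "checkbox", "radiobutton" or "combobox".
    counts[field.type] = (counts[field.type] || 0) + 1;
  }
  return counts;
}
```

The loop deliberately uses only var and plain for statements so it stays close to the style of Acrobat scripts; outside Acrobat it can be exercised with a mock object that provides numFields, getNthFieldName and getField.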

I was contacted by Bruce (a regular commenter on this blog) with a couple of suggestions for the reporter form. I have updated the sample with these changes:
- Changed the script at line 312 of the LoadAndReport click event to explicitly reference the rawValue property
- Accessed the value property of the script using ["#value"] syntax so that it doesn't conflict with script objects that declare a variable called "value"
- Changed the report fields to be non-interactive so that the report is rendered as page content and can be copied and pasted

John

About this Entry

This page contains a single entry by John Brinkman published on March 25, 2009 12:07 PM.
