Tool for Summarizing Form Content

I am often asked to look at forms that arrive from a variety of sources: customers, quality assurance, sales engineers, etc. One of the first things I do is look at some summary information about the template. Having a Unix background, I often save the form as an XDP and poke around with grep, e.g. grep "<field" form.xdp | wc -l to get a rough count of the fields in the form (it counts matching lines, not tags). But certain types of information are difficult to coax out with Unix shell commands, so I set out to build something more user-friendly.
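For comparison, here is a minimal JavaScript sketch of the same kind of count done on the XDP text itself. The function name countOpeningTags and the sample markup are my own illustrations, not part of the attached form, and the approach is a plain text match rather than real XML parsing, so it would also count tags that appear inside comments or script content.

```javascript
// Count opening tags such as <field ...> or <field/> in XDP text.
// Closing tags (</field>) are not counted because they start with "</".
// Assumes tagName is a simple element name with no regex metacharacters.
function countOpeningTags(xdpText, tagName) {
    // Require whitespace, "/" or ">" after the name so that a longer
    // element name sharing the same prefix is not matched by accident.
    var pattern = new RegExp("<" + tagName + "(?=[\\s/>])", "g");
    var matches = xdpText.match(pattern);
    return matches ? matches.length : 0;
}

// Hypothetical fragment: two fields share one line, a third has a caption.
var sampleXdp =
    '<subform name="form1">\n' +
    '  <field name="a"/><field name="b"/>\n' +
    '  <field name="c"><caption/></field>\n' +
    '</subform>';

var fieldCount = countOpeningTags(sampleXdp, "field");
```

On this fragment the sketch finds three fields, whereas the grep pipeline would report two, since two of the fields sit on the same line.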

Since the template definition is completely accessible to JavaScript, I decided to design a form that would summarize the contents of another form. The result is today’s sample. The form uses the Acrobat APIs to load and launch a file. Once the file is open, the script iterates over the contents of the template and generates a report from what it finds. The report consists of:

  • Metadata about the form: File Name, Creator, Template Version, Compatible Version, Strict Scoping setting, Static/Dynamic setting
  • Referenced fonts (including references found inside rich text fragments)
  • Linked and embedded images (for embedded images, the base64-encoded size)
  • Counts of plain text vs. rich text
  • Scripts, with the language (FormCalc or JavaScript), the context (event), and the number of lines
  • Binding properties: a summary of the kinds of data binding in use
  • Picture formats: all the picture formats found in the form
  • All other properties. For example, to see how many captions are on the form, note how many times the <caption> element appears.
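The "all other properties" tally comes down to a recursive walk over the template tree. The following is a minimal sketch of that idea, not the script from the sample form. It assumes the XFA scripting object model shape, where each node exposes className and a nodes list with length and item(i). The names tallyElements and mockNode are hypothetical, and mockNode exists only so the sketch can run outside Acrobat.

```javascript
// Recursively tally how often each element type appears at or below `node`.
// Assumption: node.className is the element name, and node.nodes (when
// present) is a list exposing .length and .item(i).
function tallyElements(node, counts) {
    counts = counts || {};
    counts[node.className] = (counts[node.className] || 0) + 1;
    var children = node.nodes;
    if (children) {
        for (var i = 0; i < children.length; i++) {
            tallyElements(children.item(i), counts);
        }
    }
    return counts;
}

// Hypothetical stand-in for a template node, for running outside Acrobat.
function mockNode(className, children) {
    children = children || [];
    return {
        className: className,
        nodes: {
            length: children.length,
            item: function (i) { return children[i]; }
        }
    };
}

// A tiny made-up template: one subform, two fields, one draw element.
var sampleTemplate = mockNode("subform", [
    mockNode("field", [mockNode("caption"), mockNode("ui")]),
    mockNode("field", [mockNode("ui")]),
    mockNode("draw")
]);

var elementCounts = tallyElements(sampleTemplate);
```

Here elementCounts.caption is 1 and elementCounts.field is 2. Inside Acrobat, the same function could presumably be pointed at the loaded form's template root instead of the mock.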

To use the form, you need to be running Acrobat (not Reader). Simply press the button, select a form, and wait for the report. Note that for large forms, this can take a few seconds (sometimes more than a few :-). I’ve attached another sample that has enough of each kind of content to generate an interesting report. Save that sample to disk and select it from the reporter form. You’ll see each category of the report populated with data.

I have also attached a report generated from a customer form with which I have enjoyed some quality time.

The script in this form is pretty complex.  But if you’re a good JavaScript programmer you could probably extend the script to capture other information that you find interesting.
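As one hypothetical illustration of such an extension: the report lists the base64-encoded size of embedded images, and a small helper could estimate the decoded size in bytes alongside it. The function name estimateDecodedBytes is my own, and the sketch assumes standard base64 with "=" padding and optional line breaks.

```javascript
// Estimate how many bytes a base64 string decodes to, without decoding it.
// Every 4 base64 characters carry 3 bytes; each trailing "=" removes one.
function estimateDecodedBytes(base64Text) {
    // Embedded content is often wrapped across lines, so drop whitespace.
    var compact = base64Text.replace(/\s+/g, "");
    if (compact.length === 0) {
        return 0;
    }
    var padding = 0;
    if (compact.charAt(compact.length - 1) === "=") {
        padding++;
    }
    if (compact.charAt(compact.length - 2) === "=") {
        padding++;
    }
    return Math.floor(compact.length * 3 / 4) - padding;
}

// "aGVsbG8=" is the base64 encoding of the five-byte string "hello".
var helloBytes = estimateDecodedBytes("aGVs\nbG8=");
```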