Tool for Summarizing Form Content

I am often asked to take a look at forms that arrive from a variety of sources: customers, quality assurance, sales engineers, etc. Often one of the first things I do is look at some summary information about the template. Having a Unix background, I often save the form as an XDP and poke around with some grep commands, e.g. grep "<field" form.xdp | wc -l to find out how many fields are in the form. But certain types of information are a little difficult to coax out with Unix shell commands, so I set out to do something more user friendly.
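One caveat with the grep approach is that wc -l counts matching lines, so two fields written on the same line would count once. As a rough illustration only, here is a minimal JavaScript sketch that counts occurrences instead. The helper name countOpeningTags and the inline sample text are made up for this example, and the helper assumes the element name contains no regular-expression special characters.

```javascript
// Count opening tags of a given element name in XDP text.
// Hypothetical helper for illustration; not part of the sample form.
function countOpeningTags(xdpText, elementName) {
  // Require whitespace, "/" or ">" after the name so "field" does not match a longer name.
  const pattern = new RegExp("<" + elementName + "(?=[\\s/>])", "g");
  return (xdpText.match(pattern) || []).length;
}

// In Node.js the text could be read with fs.readFileSync("form.xdp", "utf8").
const sampleXdp = '<subform><field name="a"/><field name="b"></field>\n<field name="c"/></subform>';
console.log(countOpeningTags(sampleXdp, "field")); // 3
```

This is still plain text matching, so it knows nothing about comments or CDATA sections in the file.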

Since the template definition is completely accessible to JavaScript, I decided to design a form that would summarize the contents of another form. The result is today’s sample. The form uses the Acrobat APIs to load and launch a file. Once the file is launched, we simply iterate over the contents of the template and generate a report from what we find. The report consists of:

  • Metadata about the form: File Name, Creator, Template Version, Compatible Version, Strict Scoping setting, Static/Dynamic setting
  • Enumerate the referenced fonts (including references found inside rich text fragments)
  • List all linked and embedded images (for embedded images, indicate their base64-encoded size)
  • Count instances of plain text vs. rich text
  • Enumerate scripts, indicating which language (FormCalc or JavaScript) along with what context (event) and how many lines long
  • Binding properties: summarize what kinds of data binding are in use
  • Picture formats: enumerate all the picture formats found in the form
  • All other properties.  For example, if you want to see how many captions are on the form, note how many times the <caption> element appears.
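To give a feel for the "iterate over the template" step, here is a minimal sketch of a recursive walk that tallies nodes by class name and scripts by language. It is not the script from the sample form. It only relies on the className, nodes.length and nodes.item() members of XFA nodes, and it assumes that a script without an explicit JavaScript contentType is FormCalc. The mockNode helper and the tiny mock tree are invented here so the sketch can run outside Acrobat.

```javascript
// Recursively tally nodes by class name, and scripts by content type.
// Hypothetical sketch; the sample form's actual script is far more detailed.
function tallyNodes(node, report) {
  report.elements[node.className] = (report.elements[node.className] || 0) + 1;
  if (node.className === "script") {
    // Assumption: a script with no explicit JavaScript contentType is FormCalc.
    var language = node.contentType === "application/x-javascript" ? "JavaScript" : "FormCalc";
    report.scripts[language] = (report.scripts[language] || 0) + 1;
  }
  for (var i = 0; i < node.nodes.length; i++) {
    tallyNodes(node.nodes.item(i), report);
  }
  return report;
}

// Stand-in for a template node so the sketch runs outside Acrobat.
function mockNode(className, children, contentType) {
  var kids = children || [];
  return {
    className: className,
    contentType: contentType,
    nodes: { length: kids.length, item: function (i) { return kids[i]; } }
  };
}

var mockTemplate = mockNode("template", [
  mockNode("subform", [
    mockNode("field", [
      mockNode("caption"),
      mockNode("event", [mockNode("script", [], "application/x-javascript")])
    ]),
    mockNode("field", [mockNode("calculate", [mockNode("script")])])
  ])
]);

var tally = tallyNodes(mockTemplate, { elements: {}, scripts: {} });
console.log(tally.elements, tally.scripts);
```

Inside Acrobat, the same function would presumably be started from the template node of the loaded form rather than from a mock tree.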

To use the form, you need to be running Acrobat (not Reader).  Simply press the button, select a form and wait for the report.  Note that for large forms, this can take a few seconds (sometimes more than a few :-).  I’ve attached another sample that has enough of each kind of content to generate an interesting report.  Save this form to disk and select it from the reporter form.  You’ll see each category of the report populated with data.

I have also attached a report generated from a customer form with which I have enjoyed some quality time.

The script in this form is pretty complex.  But if you’re a good JavaScript programmer you could probably extend the script to capture other information that you find interesting.

6 Responses to Tool for Summarizing Form Content

  1. Dear John, this is a very useful little ‘tool’. In addition to that, it also shows ways of accessing different kinds of information inside a template. Thx for your effort developing that. Maruan

  2. Rick B says:

    This only works with XFA forms, right?

  3. Rick: Yes, only for XFA forms. You could do something very similar for acroforms. It’s fairly easy to get at all the field properties via the acroform object model. But I think your report would be limited to field and document properties. I don’t know of a way to summarize PDF page content from the scripting API. John

  4. I was contacted by Bruce (a regular commenter on this blog) with a couple of suggestions for the reporter form. I have updated the sample with these changes:
     - Changed script at line 312 of the LoadAndReport click event: explicitly reference the rawValue property
     - Access the value property of script using [“#value”] syntax so that we don’t conflict with script objects that declare a variable called “value”
     - Changed the report fields to be non-interactive so that the report is generated to page content and can be copied/pasted
     John

  5. Rick G says:

    Sounds like an interesting solution to a problem I’m having, but I get two error messages when I run it against my file.

    Invalid append operation: cannot have a child element of font. The error occurred on line 7061.

    and the same error, but line 8920.

    • John Brinkman says:

      Rick:

      The most likely scenario is that your PDF (XDP) somehow has bad grammar: a font element where it’s not supposed to be. In Designer, go to XML source view and have a look at the lines mentioned in the error report.

      Good luck.
      John