Big and Complex Forms

How Big and Complex can you make your form?

I get asked this question often. Customers or partners develop very complex or large dynamic forms with many pages and large amounts of script. At what point do we cross the line and reach a level of complexity where Reader/PDF is no longer the right tool for the job?

There is no easy answer. The answer will be different for different users. But it is helpful to look at some of the stress points you’ll encounter with large forms.

These notes apply to forms opened in Acrobat/Reader. The stress points for forms rendered on the server are quite different.

  1. Number of pages to render
  2. File Size
  3. Script size, complexity and development methodology
  4. Script performance

Number of pages to render

One of the great properties of regular PDF files is that the file open time is constant no matter how large the PDF. The time to open a two thousand page PDF is pretty much the same as for a one page PDF. This is because Reader doesn’t load the whole PDF into memory and doesn’t read the bytes for page <n> until the user navigates to page <n>.

Dynamic XFA/PDF forms offer a different value proposition. The pages are shaped at form open time by the form data. Of course, there are great advantages to dynamic forms. But there are also associated processing costs. At form open time the entire form definition is loaded into memory. The entire set of data is loaded and merged with the form template. Reader performs enough of the layout to determine how many pages will be rendered. Then when you navigate to page <n>, Reader renders that page from the in-memory structures.

How many pages can Reader handle for a dynamic document? This depends on the complexity of the template. I’ve seen five page forms that take forever to open. I’ve seen a hundred page form open in a second. The limit has more to do with the density and complexity of the template and data than with the actual number of pages.

Some form authors attempt to reduce file open time by hiding inactive pages. This strategy was effective in reducing form open time in Acrobat/Reader 7. But since Reader 8.1, when the form open algorithm was improved, the ‘page hiding’ strategy no longer makes a significant difference.

File Size

Dynamic XFA/PDF forms tend to be smaller than static documents. This is because of the template property of forms. For example, a hundred page static PDF will have a hundred pages of PDF mark-up, whereas in the dynamic case this could be one page of XFA mark-up that gets replicated a hundred times when merged with data. The latter will be a much smaller file. Nonetheless, dynamic documents can grow to the point where they begin to stress your system. Reading and parsing the documents happens very quickly, even for very large templates. However, the size of the template becomes more of a factor when there are security components in play. Operations such as Certification, Reader extensions and Signatures will perform comparison operations on ‘before-and-after’ versions of the form. The costs of these comparisons are proportional to the size of the template.

So while there is no absolute threshold on file size, you will find the threshold is lower for certified/extended/signed forms.

Script size, complexity and development methodology

I have seen XFA/PDF files with tens of thousands of lines of JavaScript. Given that there is no debugger, you have to be pretty persistent to create this amount of script. If your big script library is well written, it may perform well enough, but the stress comes with the maintenance of the script:

  • When you change the script, do you have the ability to rigorously test your changes? When you modify fields or subforms, will your script still work? Do you have test collateral that gives you code coverage for all the edge cases in your script? Do you have some form of automated testing? QTP anyone?
  • Is your script maintainable? Or is the code ‘write-only’? Unless you have been disciplined in the creation of your library, you will have longer term maintenance issues when a new developer comes along to update an existing form.
  • When you encounter problems with your script, are you able to isolate the problem when you ask for help? Your friends in our support organization are much better at solving problems with small, simple forms than with large, complex ones. If your script is modular and isolated into components then you’ll be able to ask for help much more easily than if your script is a tangled mess.
  • When you change script, do you preserve previous versions of your form? You need the ability to roll-back changes.

Again, there are no absolutes here, but if you want/need to write lots of script, you need to have the associated discipline in your development environment to make it maintainable.
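
One way to get some of that discipline is to keep calculation logic in plain functions that never touch form objects, so they can be exercised outside Acrobat/Reader. The sketch below is only an illustration: computeLineTotal, its parameters and the test cases are hypothetical names invented for this example, not part of any Adobe API, and the form-side event script that would call the function is deliberately left out.

```javascript
// Hypothetical helper: pure calculation logic with no references to form objects.
// Inputs are raw values as they might come from fields (numbers or strings).
function computeLineTotal(quantity, unitPrice, discountPercent) {
  var q = Number(quantity) || 0;       // empty or invalid input counts as 0
  var p = Number(unitPrice) || 0;
  // Clamp the discount to the range 0..100
  var d = Math.min(Math.max(Number(discountPercent) || 0, 0), 100);
  // Round to whole cents before converting back to currency units
  var cents = Math.round(q * p * 100 * (100 - d) / 100);
  return cents / 100;
}

// Minimal table-driven check that can run in any JavaScript engine.
function runTests() {
  var cases = [
    { args: [2, 10, 0], expected: 20 },
    { args: [3, 19.99, 10], expected: 53.97 },
    { args: ["", 5, 0], expected: 0 },      // blank quantity
    { args: [1, 100, 150], expected: 0 }    // discount clamped to 100%
  ];
  var failures = [];
  for (var i = 0; i < cases.length; i++) {
    var c = cases[i];
    var actual = computeLineTotal(c.args[0], c.args[1], c.args[2]);
    if (actual !== c.expected) {
      failures.push("args " + c.args.join(", ") + ": expected " +
                    c.expected + ", got " + actual);
    }
  }
  return failures;
}

var failures = runTests(); // an empty array means every case passed
```

Because the function takes values rather than field references, renaming a field or moving a subform only affects the thin script that reads the fields and calls it, and a failing case can be handed to someone else without the whole form.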

Script performance

Large amounts of script do not necessarily imply poor performance. But poorly written script of any amount can kill form performance. A script that traverses the entire form hierarchy will have a running time proportional to the number of objects in the form. As the form grows, the script slows down. There are many ‘best practices’ for writing efficient script. It is very important to pay close attention to the contents of frequently executed loops.
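
To make the cost of repeated traversal concrete, here is a minimal sketch. It uses plain JavaScript objects as a hypothetical stand-in for a form hierarchy (it is not the XFA object model), and buildTree, findByName and buildIndex are names invented for the illustration. It counts how many nodes are visited when a loop walks the whole tree on every iteration, compared with walking it once and reusing the result.

```javascript
// Hypothetical stand-in for a form hierarchy: each node has a name and children.
function buildTree(depth, fanout, prefix) {
  var node = { name: prefix, children: [] };
  if (depth > 0) {
    for (var i = 0; i < fanout; i++) {
      node.children.push(buildTree(depth - 1, fanout, prefix + "_" + i));
    }
  }
  return node;
}

var visits = 0; // counts every node touched, as a rough cost measure

// Slow pattern: a depth-first search from the root on every lookup.
function findByName(node, name) {
  visits++;
  if (node.name === name) return node;
  for (var i = 0; i < node.children.length; i++) {
    var hit = findByName(node.children[i], name);
    if (hit) return hit;
  }
  return null;
}

// Faster pattern for repeated lookups: walk the tree once and remember each node.
function buildIndex(node, index) {
  visits++;
  index[node.name] = node;
  for (var i = 0; i < node.children.length; i++) {
    buildIndex(node.children[i], index);
  }
  return index;
}

var root = buildTree(3, 4, "f"); // 1 + 4 + 16 + 64 = 85 nodes
var target = "f_3_3_3";          // the last node in traversal order (worst case)

visits = 0;
for (var i = 0; i < 100; i++) findByName(root, target);
var visitsRepeatedWalk = visits; // 100 lookups x 85 nodes = 8500 visits

visits = 0;
var index = buildIndex(root, {});
var found = null;
for (var j = 0; j < 100; j++) found = index[target];
var visitsIndexed = visits;      // one walk = 85 visits
```

The trade-off is that a cached index goes stale when the structure changes, which matters in a dynamic form where instances are added and removed, so the index has to be rebuilt at those points.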

Conclusion

Before you make a big investment in a form, make sure you consider the alternatives. You might be better off with a Flash form or an AIR application. If you choose Reader/PDF, the maximum size and complexity of your form depends primarily on your own tolerances. You need to decide whether the runtime experience is responsive enough. You need to decide whether you are getting a return on investment that justifies the cost to develop and maintain the form.