Testing the Appearance of PDFs
Enterprises often want to test automatically generated PDFs to verify that these PDFs are “correct” when viewed in Adobe Reader by end users.
While end users care somewhat about PDF size and other attributes, they care most that the PDF is visually accurate: it contains everything they expect it to contain, and nothing is hidden or truncated when it should be visible and whole.
Such bugs can come from many sources — fonts, databases, PDF generation tools, etc. It is possible that a particular version of Adobe Reader/Acrobat might be at fault. While Adobe tests Reader extensively to prevent these kinds of bugs, it is very hard or impossible to find all of them in every situation.
Currently much of this testing is done by manual human inspection, which is tedious, time-consuming, and labor intensive. There are reasonable ways to automate much of it and greatly reduce manual testing, although some level of human verification will always be required.
Visual, or appearance, test automation for PDFs breaks into two areas:
1) Visualization/rasterization of the PDF into a form that a computerized human-equivalent eye can evaluate.
2) Comparison using a computerized eye between test results and baselines.
Please note that when comparing PDFs in this situation we do not care much about PDF internals, since PDFs with very different internal structures can produce the same visual result. Rather, we want to focus on how a PDF appears to the end user.
Rasterization is the process of converting a file such as a PDF into a bitmapped image, like the one displayed on a screen to an end user. TIFF is a commonly used format and probably the best choice for rasterization and comparison purposes.
PDF rasterization should be done by the same software the user will use to view PDFs. Since Reader itself cannot save rasterized files, you could use test automation software to run Reader and capture screenshots, but this is awkward. Adobe Acrobat and LiveCycle can save rasterized versions of PDFs in TIFF format, and they use the same rasterization engine that is built into Reader, so the most likely options for rasterization are to use either of these tools.
In Acrobat you can create TIFFs by opening a PDF, choosing “Save As…” from the File menu, choosing “TIFF” in the “Save as type” pop-up of the Save As dialog, selecting your location, and pressing Save. This creates one TIFF file for each page in the PDF. You can use Acrobat’s batch function to do this on many files, or you can use test automation software such as QuickTest Pro to run Acrobat against a series of files.
Alternatively you can use LiveCycle, which in many ways is a server-based Acrobat for building factories that create and manage PDFs. LiveCycle’s PDF Generation service can run through large sets of PDFs to create corresponding TIFFs. Refer to the LiveCycle documentation or training for instructions on how to do this.
Acrobat is most similar to Adobe Reader and matching versions typically ship at the same time as Reader, so Acrobat may be the best rasterization choice. LiveCycle may ship at a different time from Reader and so may have a slightly different rasterization engine version, but LiveCycle was also designed for handling large numbers of PDF files so using it may be better in many situations.
There are many comparison options available, but for comparing images one of the best is ImageMagick. The ImageMagick website has considerable documentation on how to use the tool, and there are also published books on it, so I won’t go into the details here.
As a high-level summary, you should use your favorite scripting or automation tool to point ImageMagick at pairs of files (one test page to one baseline page) and have your test automation system record which ones have significant differences. You can tune ImageMagick to flag or ignore small pixel shifts.
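As a minimal sketch of such a comparison step, assuming ImageMagick’s `compare` tool is on the PATH (the file names and the 2% fuzz value are placeholders you would tune for your own documents):

```python
import subprocess

def build_compare_cmd(test_tiff, base_tiff, diff_tiff, fuzz_percent=2):
    # The "AE" metric counts differing pixels; -fuzz tolerates small
    # per-pixel color differences such as anti-aliasing shifts.
    return [
        "compare",
        "-metric", "AE",
        "-fuzz", f"{fuzz_percent}%",
        test_tiff, base_tiff, diff_tiff,
    ]

def pages_match(test_tiff, base_tiff, diff_tiff, fuzz_percent=2, max_pixels=0):
    """Return True if the pages differ by at most max_pixels pixels.

    Requires ImageMagick's compare on the PATH. It writes the pixel
    count to stderr and exits 0 (same), 1 (different), or 2 (error),
    and also writes a visual difference image to diff_tiff.
    """
    result = subprocess.run(
        build_compare_cmd(test_tiff, base_tiff, diff_tiff, fuzz_percent),
        capture_output=True, text=True,
    )
    if result.returncode == 2:
        raise RuntimeError(f"compare failed: {result.stderr}")
    return float(result.stderr.split()[0]) <= max_pixels
```

The difference image that `compare` writes is useful later, when a human reviews the flagged pages.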
Then a human can review what the automated testing flagged as significant and decide which differences are real bugs and which are acceptable. Depending on your automation skills, your test automation system might present test, baseline, and difference pages back to the tester for review, for reporting bugs, and for refining the test system itself.
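For that review step, one option is to generate a simple HTML page showing each flagged triple side by side. A minimal sketch follows; the function name, image width, and column layout are my own choices, not anything prescribed by ImageMagick or Acrobat:

```python
import html

def review_page(rows, out_path):
    """Write an HTML review page for flagged differences.

    rows: iterable of (test_tiff, baseline_tiff, diff_tiff) paths.
    Shows each triple side by side so a tester can judge whether a
    flagged difference is a real bug.
    """
    cells = "".join(
        "<tr>"
        + "".join(
            f'<td><img src="{html.escape(str(p))}" width="300"></td>'
            for p in row
        )
        + "</tr>"
        for row in rows
    )
    doc = (
        "<html><body><table>"
        "<tr><th>Test</th><th>Baseline</th><th>Difference</th></tr>"
        f"{cells}</table></body></html>"
    )
    with open(out_path, "w") as f:
        f.write(doc)
    return doc
```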
You will need to experiment quite a bit when setting up test automation for PDFs because situations vary.
Overall the steps are:
1) Establish your set of baselines – your gold standards.
2) Rasterize your baselines.
3) Create test PDFs from the new system under test.
4) Rasterize your test PDFs.
5) Compare tests to baselines.
6) Either report bugs, refine your test system, or decide that particular test files are more accurate than the original baselines, and re-establish your baselines with these new files.
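The steps above can be sketched as a small driver. This is a sketch under stated assumptions: the directory layout (one TIFF per page, identical file names on both sides) and the injectable `compare_fn` are my own choices, and in practice `compare_fn` would wrap an ImageMagick comparison:

```python
from pathlib import Path

def pair_pages(baseline_dir, test_dir, pattern="*.tif"):
    """Pair rasterized baseline and test pages by file name.

    Returns (pairs, only_in_baseline, only_in_test); a page present on
    only one side is itself a finding worth reporting.
    """
    base = {p.name: p for p in Path(baseline_dir).glob(pattern)}
    test = {p.name: p for p in Path(test_dir).glob(pattern)}
    pairs = [(base[n], test[n]) for n in sorted(base.keys() & test.keys())]
    return pairs, sorted(base.keys() - test.keys()), sorted(test.keys() - base.keys())

def run_suite(baseline_dir, test_dir, compare_fn):
    """Run compare_fn(baseline_page, test_page) -> bool over every pair
    and collect the names of differing pages for human review."""
    pairs, missing, extra = pair_pages(baseline_dir, test_dir)
    failures = [b.name for b, t in pairs if not compare_fn(b, t)]
    return {"failures": failures, "missing": missing, "extra": extra}
```

The report from `run_suite` is what the tester reviews in step 6 to decide between filing bugs, tuning the comparison, or promoting new baselines.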