by lrosenth


November 24, 2010

Attaching files to PDF documents

One of the great features of PDF is its ability to carry attached files, just as e-mail messages can. Any kind of file, and any number of files, can be sucked into a PDF file. These are held inside the PDF as “stream” objects, one of the eight basic object types from which all PDF content is built (booleans, numbers, strings, names, arrays, dictionaries, streams and the null object). A stream starts with a dictionary object and then carries an arbitrarily long sequence of arbitrary 8-bit bytes, so stream objects fit the generic description of a disk file quite well.
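As a rough illustration of that "dictionary plus bytes" shape, here is a minimal Python sketch that serializes a payload as an uncompressed stream object. The function name and the object number are made up for the example; the /Type /EmbeddedFile and /Length entries are the ones the PDF standard uses for embedded file streams. A real attachment also needs a file specification and cross-reference entries, which are left out here.

```python
def make_embedded_file_object(obj_num: int, data: bytes) -> bytes:
    """Serialize `data` as an uncompressed PDF stream object (illustrative helper)."""
    # The stream dictionary comes first; /Length is the byte count of the payload.
    header = b"%d 0 obj\n<< /Type /EmbeddedFile /Length %d >>\nstream\n" % (
        obj_num,
        len(data),
    )
    # The payload is arbitrary 8-bit bytes between the stream/endstream keywords.
    return header + data + b"\nendstream\nendobj\n"


# Every possible byte value, to show that the payload is carried unmodified.
payload = bytes(range(256))
obj = make_embedded_file_object(7, payload)
print(obj[:60])
```

Because the length is declared up front, a reader can skip over the payload without interpreting it, which is what lets a stream hold any file format at all.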

Various compression methods can be applied to streams, so although attachments tend to make PDF files large, the growth is mitigated considerably by using one of PDF’s compression filters. Flate compression, the technology also used in ZIP and PNG, is the one most commonly used for PDF attachments: it is lossless (you get exactly the same bytes you started with from a compression/decompression cycle), it gives good compression on arbitrary byte sequences, and both compression and decompression are fast. The examples I have range from a 1/3 reduction in size to a 1/200 reduction.
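The lossless round trip is easy to try for yourself. PDF's Flate filter uses the same zlib/deflate format that Python's standard zlib module implements, so the sketch below compresses some sample bytes, decompresses them again and checks that nothing changed. The sample data and the function name are invented for the illustration, and the ratio you see depends entirely on how repetitive the input is.

```python
import zlib
from typing import Tuple


def flate_roundtrip(data: bytes) -> Tuple[bytes, float]:
    """Compress with zlib, verify the round trip, return (compressed, ratio)."""
    if not data:
        raise ValueError("need at least one byte to compute a ratio")
    compressed = zlib.compress(data, 9)  # 9 = best compression
    # Lossless: decompressing must give back exactly the original bytes.
    if zlib.decompress(compressed) != data:
        raise ValueError("round trip changed the data")
    return compressed, len(compressed) / len(data)


# Made-up, highly repetitive sample (a CSV-like attachment).
sample = b"date,region,amount\n" + b"2010-11-24,north,100.00\n" * 2000
compressed, ratio = flate_roundtrip(sample)
print(f"{len(sample)} bytes -> {len(compressed)} bytes (ratio {ratio:.4f})")
```

Files that are already compressed (JPEG images, ZIP archives) will show little or no gain, which is why the reductions vary so widely from one attachment to another.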

So what might PDF attachments be used for? One use is for “hybrid files”. The office suites built on ODF (such as OpenOffice.org) can attach the original source ODF document to a PDF file when they make it. Their software recognizes these special PDF files, which they call hybrid files, and when asked to open one, it actually opens the attachment. In this way a hybrid file is both a final-form PDF file and an editable office document. Adobe’s Acrobat products can also automatically attach Microsoft Office documents to the PDF files produced from them. In that case the user has to extract the Office document from the PDF before editing it with the Office product.

Another very cool way to use attachments is as if the PDF were a file folder. In fact, the ISO standard defines a way to include an index for those file folders so that, when the file is opened, a panel alongside the base PDF display can show a list, much like an e-mail list, of all the attached files, providing their name, date of creation, author, or whatever else the creator of the PDF wanted to include in the index (see 1.a and 2.a below). These are called PDF collections.

Choices for using attachments

There are basically two approaches, each with two variants, to managing attached files in PDF.

1. Attachments to a base document.

1.a. The names and sizes of all attached documents can be pulled out of the PDF file and displayed in a special viewer panel. From this panel attachments can be extracted, additional files can be attached, or existing ones deleted.

Here is a screen shot of a sample of what you will find in the base attachments panel of Adobe Reader and Adobe Acrobat. The contents and display of this panel are fixed by the application and include just the basic information normally available about a file.
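To give a feel for where a viewer finds that list, here is a small Python sketch. In the PDF standard the attachments hang off the document catalog in a name tree under /Names /EmbeddedFiles, each entry pointing to a file specification whose /EF /F entry is the embedded stream, and the size, when recorded, sits in that stream's optional /Params dictionary. The sketch uses plain Python dicts as a stand-in for already-parsed PDF objects; the catalog contents, file names and function names are all invented for the example.

```python
from typing import Iterator, List, Optional, Tuple


def walk_name_tree(node: dict) -> Iterator[Tuple[str, dict]]:
    """Yield (name, value) pairs from a PDF name tree node."""
    # Leaf nodes hold a flat array: [name1, value1, name2, value2, ...]
    names = node.get("/Names", [])
    for i in range(0, len(names), 2):
        yield names[i], names[i + 1]
    # Intermediate nodes point at child nodes through /Kids.
    for kid in node.get("/Kids", []):
        yield from walk_name_tree(kid)


def list_attachments(catalog: dict) -> List[Tuple[str, Optional[int]]]:
    """Return (name, size) for each embedded file; size is None if not recorded."""
    tree = catalog.get("/Names", {}).get("/EmbeddedFiles", {})
    result = []
    for name, filespec in walk_name_tree(tree):
        stream_dict = filespec["/EF"]["/F"]
        result.append((name, stream_dict.get("/Params", {}).get("/Size")))
    return result


# Hypothetical hand-built catalog standing in for a parsed PDF.
catalog = {
    "/Names": {
        "/EmbeddedFiles": {
            "/Kids": [
                {"/Names": ["data.csv", {
                    "/Type": "/Filespec", "/F": "data.csv",
                    "/EF": {"/F": {"/Type": "/EmbeddedFile",
                                   "/Params": {"/Size": 48023}}}}]},
                {"/Names": ["notes.txt", {
                    "/Type": "/Filespec", "/F": "notes.txt",
                    "/EF": {"/F": {"/Type": "/EmbeddedFile"}}}]},
            ]
        }
    }
}

for name, size in list_attachments(catalog):
    print(name, size)
```

A real PDF library does the parsing and indirect-reference resolution for you; the point here is only the path from catalog to name tree to file specification to stream.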

1.b. In addition to being an attachment as in (1.a), an attached file can be represented by an annotation on a page. Various annotation icons can be used to represent the attachment, such as a thumb tack, a chart icon, or a paper clip. When the icon is double-clicked, the attachment can be extracted or deleted. Of course, these are normal attachments that show up in the list provided in 1.a above. I like this choice because, for example, you can attach the raw data that was used to produce a pie chart and have a clickable annotation right alongside the chart to extract the data. You can do that for all 20 pie charts in the document. That way, if someone wants to play with the numbers in a spreadsheet or make their own chart of a different type, they can. This is a screenshot of what that can look like with an annotation popup window showing:


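In the file itself, such a page icon is a file attachment annotation: a dictionary with /Subtype /FileAttachment, a /Rect placing it on the page, an /FS entry pointing to the file specification, and a /Name choosing the icon (the standard names are Graph, PushPin, Paperclip and Tag). The Python sketch below just writes out that dictionary as PDF syntax. The function names, the rectangle and the "12 0 R" reference are made up for the example, and a complete annotation would normally carry further entries.

```python
STANDARD_ICONS = {"Graph", "PushPin", "Paperclip", "Tag"}


def escape_pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def file_attachment_annotation(rect, filespec_ref: str,
                               icon: str = "PushPin",
                               contents: str = "") -> str:
    """Render a file attachment annotation dictionary (illustrative helper)."""
    if icon not in STANDARD_ICONS:
        raise ValueError(f"not a standard icon name: {icon}")
    x1, y1, x2, y2 = rect
    return (
        "<< /Type /Annot /Subtype /FileAttachment\n"
        f"   /Rect [{x1} {y1} {x2} {y2}]\n"       # where the icon sits on the page
        f"   /FS {filespec_ref}\n"                # indirect reference to the filespec
        f"   /Name /{icon}\n"                     # which icon to draw
        f"   /Contents ({escape_pdf_string(contents)})\n"  # text for the popup
        ">>"
    )


# Hypothetical values: a chart icon next to a pie chart, pointing at object 12.
annot = file_attachment_annotation((72, 600, 92, 620), "12 0 R",
                                   icon="Graph", contents="Raw data (Q3)")
print(annot)
```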
2. Attachments with additional indexing and navigation information

2.a.  As noted earlier, the ISO PDF standard lets the author of the PDF file include a table of terms and values so that a specialized index of all the attachments can be displayed. PDF files set up this way are called PDF collections. The mortgage industry has found this to be a very useful way to bring together, into one PDF file, all the various documents that represent a complete mortgage deal. The documents can be indexed, dated and named to help in getting to the right sub-document efficiently. The resulting panel is very similar to the base panel of (1.a), but the document author can use any index terms and values. In this example a “Birthday” field has been added to the basic indexing data.
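A sketch may help show how the index hangs together. In the standard, the collection carries a schema of field dictionaries (each with a /Subtype such as /S for text, /D for date, /N for number or /F for the file name, a display name in /N and an ordering in /O), and each attached file's specification carries a collection item dictionary in /CI holding its values. The Python below models both with plain dicts and builds the rows a viewer might show. The file names, the birthday values and the function name are invented for the example.

```python
from typing import List


def index_rows(schema: dict, filespecs: List[dict]) -> List[list]:
    """Build a header row plus one row per attachment from schema and /CI data."""
    # Columns are the schema's fields, ordered by each field's /O entry.
    fields = sorted(
        ((key, field) for key, field in schema.items() if key != "/Type"),
        key=lambda kv: kv[1].get("/O", 0),
    )
    rows = [[field["/N"] for _, field in fields]]
    for spec in filespecs:
        item = spec.get("/CI", {})
        row = []
        for key, field in fields:
            if field["/Subtype"] == "/F":
                # File-name fields come from the file specification itself.
                row.append(spec.get("/UF", spec.get("/F")))
            else:
                # Author-defined fields come from the collection item.
                row.append(item.get(key))
        rows.append(row)
    return rows


# Hypothetical schema with one author-defined field, "Birthday".
schema = {
    "/Type": "/CollectionSchema",
    "/FileName": {"/Type": "/CollectionField", "/Subtype": "/F",
                  "/N": "Name", "/O": 1},
    "/Birthday": {"/Type": "/CollectionField", "/Subtype": "/D",
                  "/N": "Birthday", "/O": 2},
}
filespecs = [
    {"/Type": "/Filespec", "/F": "alice.pdf",
     "/CI": {"/Type": "/CollectionItem", "/Birthday": "D:19700101"}},
    {"/Type": "/Filespec", "/F": "bob.pdf",
     "/CI": {"/Type": "/CollectionItem", "/Birthday": "D:19820315"}},
]

for row in index_rows(schema, filespecs):
    print(row)
```

Keeping the schema separate from the per-file values is what lets the author define any columns they like without the viewer needing to know about them in advance.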

2.b.  Building on Collections (2.a), an author can also include a graphically rich, interactive “navigator” that lets the creative juices flow in showing various aspects of the attached Collection files. These are called Portfolios. This is a new feature that uses a Flash component, called a Navigator, embedded within the PDF file. The Navigator has access to the indexing information described in 2.a and can also get thumbnails of the first pages, etc. Adobe Acrobat comes with a set of pre-made Navigator files that present various effects, such as a carousel of sub-files or a linear fly-by of them. Adobe currently supports this in Acrobat 9.0 and greater via an Adobe extension to ISO 32000-1.

PDF attachments can be a very effective way to have one file that contains a wealth of related content.

Jim King (