Base64 Encode a PDF attachment

This blog entry has been re-published with updated information.

Some time ago I experimented with PDF attachments — trying to add them to my XML data.  I wasn’t happy with the outcome at that time, and I was going to leave it.  But then I saw a customer scenario that called for this capability, and then one of my regular commenters (Bruce) brought up the topic as well.  So I’ve tried again and had a little more success this time. 

The end goal is to copy PDF attachments into XML data.  If you can do so, it opens up a couple of interesting possibilities:

  • Take any attachments a user has added to the PDF and include them in a form submission or in a web service request
  • Take image attachments and use them to populate image fields

Acrobat methods for attachments

The Acrobat document object has a dataObjects property that returns an array of all the attachments in the current document.  There is also a set of methods for dealing with attachments: openDataObject, getDataObject, createDataObject, importDataObject, removeDataObject, getDataObjectContents, and setDataObjectContents.
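To make the dataObjects property concrete, here is a minimal sketch that summarizes the attachments of a document. The function name listAttachments is hypothetical, and I am assuming the script runs in a form event where event.target is the Acrobat document object. The guard against a missing array is purely defensive.

```javascript
// Sketch: summarize the attachments of an Acrobat document object.
// "doc" is assumed to be the Acrobat document object (event.target in a form script).
function listAttachments(doc) {
    // Defensive assumption: treat a missing or null array as "no attachments".
    var attachments = doc.dataObjects || [];
    var summary = [];
    for (var i = 0; i < attachments.length; i++) {
        summary.push({
            name: attachments[i].name,         // name used by getDataObjectContents()
            mimeType: attachments[i].MIMEType, // e.g. "image/jpeg"
            size: attachments[i].size          // size in bytes
        });
    }
    return summary;
}
```

Inside a form script you would call it as listAttachments(event.target) and loop over the result.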

The interesting method in our case is getDataObjectContents().  It returns a stream object with the contents of an attachment.  If your attachment happens to be textual, then you can use util.stringFromStream() to convert to a string value:

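// event.target is the Acrobat document object when the script runs in a form event
var inputStream = event.target.getDataObjectContents("MyNotes.txt");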
Notes.rawValue = util.stringFromStream(inputStream);

The default encoding for binary attachments is a hex encoding (each byte written as a two-digit hex value).  However, when your attachment is in a binary format, the standard way to include it in an XML file is to encode it as base64.  To convert to a base64 encoding, use the Acrobat Net.streamEncode() method:

// Get a stream for the image attachment
var inputStream = event.target.getDataObjectContents("Smile.jpg");

// Get a new stream with the image encoded as base64
var vEncodedStream = Net.streamEncode(inputStream, "base64");

// Get a string from the stream
var sBase64 = util.stringFromStream(vEncodedStream);

// assign the base64 encoded value to an image field
sampleImage.rawValue = sBase64;

We know there are issues with Net.streamEncode() failing where content has null bytes.  However, when used in the context of encoding an attachment, it seems to work fine.

When I first looked at this problem I assumed that the Net.streamEncode() method wasn’t working, so I wrote a base64 encoder in JavaScript.  It works fine, but it is slow!  On a 140K image, it takes 10 seconds to encode.  I’ve included this code in the sample just for interest’s sake.
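For readers curious what a script-based encoder involves, here is a minimal sketch of the standard base64 algorithm. It is not the code from the sample form. It assumes the input has already been turned into an array of byte values (0 to 255), and the names B64_ALPHABET and base64FromBytes are hypothetical.

```javascript
// Sketch: standard base64 encoding of an array of byte values (0-255).
var B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function base64FromBytes(bytes) {
    var out = [];
    // Every group of 3 input bytes (24 bits) becomes 4 output characters (6 bits each).
    for (var i = 0; i < bytes.length; i += 3) {
        var hasB1 = i + 1 < bytes.length;
        var hasB2 = i + 2 < bytes.length;
        var b0 = bytes[i];
        var b1 = hasB1 ? bytes[i + 1] : 0;
        var b2 = hasB2 ? bytes[i + 2] : 0;
        out.push(B64_ALPHABET.charAt(b0 >> 2));
        out.push(B64_ALPHABET.charAt(((b0 & 0x03) << 4) | (b1 >> 4)));
        // A short final group is padded with "=".
        out.push(hasB1 ? B64_ALPHABET.charAt(((b1 & 0x0f) << 2) | (b2 >> 6)) : "=");
        out.push(hasB2 ? B64_ALPHABET.charAt(b2 & 0x3f) : "=");
    }
    return out.join("");
}
```

A loop like this runs four table lookups for every three bytes in the interpreter, which gives a feel for why a script-based encoder is slow on a large image compared with the built-in method.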

Conversion to Base64

The attached sample form has an initialization script that displays a subform for each attachment.  There are two buttons that will take the corresponding attachment, convert it to base64 and assign it to the image field value.  One button uses the (slow) JavaScript encoding algorithm; the other button uses Net.streamEncode() and works pretty quickly.
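The same Net.streamEncode() approach can be applied to every attachment in one pass. The following is only a sketch under assumptions, not the script from the sample form: encodeAllAttachments is a hypothetical name, and doc is assumed to be the Acrobat document object (event.target in a form script).

```javascript
// Sketch: base64-encode every attachment, keyed by attachment name.
// Relies on the Acrobat globals Net and util, so it is meant to run inside Acrobat.
function encodeAllAttachments(doc) {
    // Defensive assumption: treat a missing or null array as "no attachments".
    var attachments = doc.dataObjects || [];
    var encoded = {};
    for (var i = 0; i < attachments.length; i++) {
        var name = attachments[i].name;
        var rawStream = doc.getDataObjectContents(name);          // attachment contents
        var base64Stream = Net.streamEncode(rawStream, "base64"); // re-encode as base64
        encoded[name] = util.stringFromStream(base64Stream);      // stream to string
    }
    return encoded;
}
```

The returned object maps each attachment name to its base64 string, ready to assign to an image field or to a node in the form data.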

The easiest and most reliable way to encode attachments in your XML data is to keep them in a hex encoding.  But of course, for this to work the consumer of your XML needs to be able to handle hex encoding as well.
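On the consuming side, handling hex is not much work. Here is a minimal sketch of a decoder, assuming the value is two hex digits per byte with nothing but optional whitespace in between. The name bytesFromHex is hypothetical.

```javascript
// Sketch: decode a hex-encoded value (two hex digits per byte) into byte values.
function bytesFromHex(hex) {
    // Assumption: whitespace (for example XML line wrapping) may be ignored.
    var clean = hex.replace(/\s+/g, "");
    if (clean.length % 2 !== 0 || /[^0-9a-fA-F]/.test(clean)) {
        throw new Error("Not a valid hex-encoded value");
    }
    var bytes = [];
    for (var i = 0; i < clean.length; i += 2) {
        bytes.push(parseInt(clean.substring(i, i + 2), 16));
    }
    return bytes;
}
```

The resulting byte array can then be written to a file or handed to whatever the consumer uses for binary data.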