Base64 Encode a PDF attachment

This blog entry has been re-published with updated information.

Some time ago I experimented with PDF attachments, trying to add them to my XML data. I wasn’t happy with the outcome at that time, and I was going to leave it. But then I saw a customer scenario that called for this capability, and one of my regular commenters (Bruce) brought up the topic as well. So I’ve tried again and had a little more success this time.

The end goal is to copy PDF attachments into XML data. If you can do so, it opens up a couple of interesting possibilities:

  • Take any attachments a user has added to the PDF and include them in a form submission or in a web service request
  • Take image attachments and use them to populate image fields

Acrobat methods for attachments

The Acrobat document object has a dataObjects property that returns an array of all the attachments in the current document. There is also a set of methods for dealing with attachments: openDataObject, getDataObject, createDataObject, importDataObject, removeDataObject, getDataObjectContents, and setDataObjectContents.

The interesting method in our case is getDataObjectContents(). It returns a stream object with the contents of an attachment. If your attachment happens to be textual, you can use util.stringFromStream() to convert the stream to a string value:

var inputStream = event.target.getDataObjectContents("MyNotes.txt");
Notes.rawValue = util.stringFromStream(inputStream);
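To see what that conversion amounts to for a plain-text attachment, the sketch below turns a hex string (two hex digits per byte, the hex encoding discussed below) into characters. It runs outside Acrobat, assumes the text is plain ASCII, and hexToAscii is a hypothetical helper rather than an Acrobat API; in a form you would simply call util.stringFromStream().

```javascript
// Hypothetical helper: turn a hex string (two hex digits per byte)
// into text, assuming every byte is a plain ASCII character.
function hexToAscii(hex) {
  if (hex.length % 2 !== 0) {
    throw new Error("hex string must have an even number of digits");
  }
  var chars = [];
  for (var i = 0; i < hex.length; i += 2) {
    // Each pair of hex digits is one byte, i.e. one ASCII character.
    chars.push(String.fromCharCode(parseInt(hex.substr(i, 2), 16)));
  }
  return chars.join("");
}

// "Hi!" is the bytes 0x48 0x69 0x21.
console.log(hexToAscii("486921")); // Hi!
```

The same decoding applied to a binary file produces unprintable characters, which is why this only makes sense for textual attachments.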

The default encoding for binary attachments is hex encoding (each byte written as a two-digit hex value). However, when your attachment is in a binary format, the standard way to include it in an XML file is to encode it as base64. To convert to a base64 encoding, use the Acrobat Net.streamEncode() method:

// Get a stream for the image attachment
var inputStream = event.target.getDataObjectContents("Smile.jpg");

// Get a new stream with the image encoded as base64
var vEncodedStream = Net.streamEncode(inputStream, "base64");

// Get a string from the stream
var sBase64 = util.stringFromStream(vEncodedStream);

// assign the base64 encoded value to an image field
sampleImage.rawValue = sBase64;
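The size cost of each encoding is easy to work out: hex uses two characters per byte, while base64 uses four characters for every three bytes (padding the last group). A small sketch, runnable in Node.js rather than Acrobat (Buffer is a Node API, and encodedLengths is a hypothetical helper), shows both encodings of the same three bytes and the encoded lengths for a 140K (143,360-byte) attachment:

```javascript
// Runs in Node.js, not in Acrobat: Buffer is used only to show
// the two encodings of the same bytes side by side.
var bytes = Buffer.from("Man", "ascii");   // bytes 0x4d 0x61 0x6e
console.log(bytes.toString("hex"));        // 4d616e
console.log(bytes.toString("base64"));     // TWFu

// Hypothetical helper: encoded length in characters for a byte count.
function encodedLengths(byteCount) {
  return {
    hex: byteCount * 2,                    // two digits per byte
    base64: Math.ceil(byteCount / 3) * 4   // four chars per 3-byte group
  };
}

var sizes = encodedLengths(140 * 1024);
console.log(sizes.hex);      // 286720
console.log(sizes.base64);   // 191148
```

For this attachment the base64 text is about two-thirds the size of the hex text, which starts to matter once payloads get large.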

We know there are issues with Net.streamEncode() failing when the content contains null bytes. However, when used to encode an attachment, it seems to work fine.
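If you would rather check an attachment for null bytes before handing it to Net.streamEncode(), one minimal approach is to scan its hex-encoded contents one byte (two digits) at a time. This sketch assumes you already hold the contents as a hex string, for example from reading the stream returned by getDataObjectContents(); containsNullByte is a hypothetical helper, and it says nothing about exactly when Net.streamEncode() fails.

```javascript
// Hypothetical helper: does a hex-encoded byte string contain a 0x00 byte?
// Step two digits at a time so only whole bytes are compared; a plain
// indexOf("00") would also match across a byte boundary, e.g. "10 0a".
function containsNullByte(hex) {
  for (var i = 0; i + 1 < hex.length; i += 2) {
    if (hex.charAt(i) === "0" && hex.charAt(i + 1) === "0") {
      return true;
    }
  }
  return false;
}

console.log(containsNullByte("100a"));   // false
console.log(containsNullByte("4d00ff")); // true
```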

When I first looked at this problem, I assumed that the Net.streamEncode() method wasn’t working, so I wrote a base64 encoder in JavaScript. It works, but it is slow! On a 140K image, it takes 10 seconds to encode. I’ve included this code in the sample just for interest’s sake.
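The sample’s JavaScript encoder isn’t reproduced in this post. For a rough idea of what such an encoder involves, here is a sketch (not the code from the sample form) that consumes a hex-encoded stream three bytes at a time and emits four base64 characters per group, padding the last group with "=". So that it runs outside Acrobat, it uses a hypothetical makeHexStream() stand-in whose read(nBytes) returns the next nBytes as hex; in a form you would pass the stream from getDataObjectContents() instead, assuming its read() returns hex in the same way.

```javascript
var BASE64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical stand-in for an Acrobat stream: read(nBytes) returns the
// next nBytes bytes as a hex string ("" once the data is used up).
function makeHexStream(hex) {
  var pos = 0;
  return {
    read: function (nBytes) {
      var chunk = hex.substr(pos, nBytes * 2);
      pos += chunk.length;
      return chunk;
    }
  };
}

// Encode a hex stream as base64, three bytes (six hex digits) at a time.
function hexStreamToBase64(stream) {
  var out = [];
  var s = stream.read(3);
  while (s.length > 0) {
    var byteCount = s.length / 2;   // 1, 2 or 3 bytes in this group
    // Left-align the group into 24 bits, filling missing bytes with zeros.
    var n = parseInt(s, 16) << (8 * (3 - byteCount));
    out.push(
      BASE64_CHARS.charAt((n >>> 18) & 63) +
      BASE64_CHARS.charAt((n >>> 12) & 63) +
      (byteCount > 1 ? BASE64_CHARS.charAt((n >>> 6) & 63) : "=") +
      (byteCount > 2 ? BASE64_CHARS.charAt(n & 63) : "=")
    );
    s = stream.read(3);
  }
  return out.join("");
}

console.log(hexStreamToBase64(makeHexStream("4d616e"))); // TWFu
console.log(hexStreamToBase64(makeHexStream("4d61")));   // TWE=
console.log(hexStreamToBase64(makeHexStream("4d")));     // TQ==
```

Collecting the groups in an array and joining once at the end avoids building the result by repeated string concatenation.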

Conversion to Base64

The attached sample form has an initialization script that displays a subform for each attachment. Two buttons take the corresponding attachment, convert it to base64, and assign it to the image field value. One button uses the (slow) JavaScript encoding algorithm; the other uses Net.streamEncode() and works pretty quickly.

The easiest and most reliable way to encode attachments in your XML data is to keep them hex-encoded. But of course, for this to work, the consumer of your XML needs to be able to handle hex encoding as well.
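The hex route puts the decoding work on the consumer. A minimal sketch of that side, assuming the consumer is a Node.js process that has already extracted the hex string from the XML (the XML parsing is not shown); decodeHexAttachment is a hypothetical name. The explicit validation is a design choice: Node’s hex decoding truncates at invalid input instead of throwing, so a damaged payload would otherwise decode to a shorter file without any error.

```javascript
// Runs in Node.js on the receiving side, not in Acrobat.
// Hypothetical helper: check and decode a hex-encoded attachment.
function decodeHexAttachment(hex) {
  // Buffer.from(hex, "hex") silently stops at the first invalid pair,
  // so reject odd lengths and non-hex characters explicitly.
  if (hex.length % 2 !== 0 || /[^0-9a-fA-F]/.test(hex)) {
    throw new Error("not a valid hex-encoded byte string");
  }
  return Buffer.from(hex, "hex");
}

var decoded = decodeHexAttachment("4D616E");
console.log(decoded.length);             // 3
console.log(decoded.toString("ascii"));  // Man
```

If your XML wraps the hex text across lines, strip the whitespace before decoding; the resulting bytes can then be written out with fs.writeFileSync().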

11 Responses to Base64 Encode a PDF attachment

  1. Keith Gross says:

    We have been handling attachments for a while now and would like to pass on one little gotcha to watch out for. We do our submissions via web service calls and found during testing that making a call with a large payload was problematic. In our case, placing the attachment within the form’s XML resulted in submission calls that frequently exceeded 1MB, and at that size failures were common (1 in 10). I can’t say whether this would also be a problem for form submission, but for web service calls it definitely was.

    To work around this, each attachment is sent with a separate call, and if an attachment is larger than about 64K we break it into chunks and send them using multiple calls. The chunks are then reassembled at the server and decoded. We’ve used this to handle tens of thousands of transmissions successfully, some of them with a total attachment payload greater than 25MB.

    With this procedure we don’t have to re-encode to base64; we just send the hex-encoded strings. Decoding of the hex strings can be handled offline at a later time if need be. We never have to have more than 64K of a file in memory in Reader or on the server at a time. Our file size is limited only by user patience and disk space.

  2. Keith:

    Thanks very much for sharing your experience. I’m not aware of an advertised size limit on submissions/web service calls, but there is always a practical limit. Breaking attachments up into manageable pieces seems like very good practice.

    John

  3. Jason Hendry says:

    Hey John, I just wanted to give kudos: you write a great blog with some real content. Thanks for all the hard work that’s gone into the samples, discussion and insight.

  4. Jason:

    Thanks for the comment. Writing this blog is the best part of my day job.

    John

  5. Bruce says:

    I was interested in Keith’s comments, as we also use web services for submission, including attachments. We have implemented a 5MB limit, but our bottleneck was once the data reached the server. We use the Acrobat SOAP object for web service connections, if that makes a difference.

    We did try using JavaScript to base64 encode the data but found it too slow for the size of attachments we have. I have included the code anyway, as it is about twice as fast, in case it suits someone else.

    function encodeHexStream(stream) {
      var base64chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
      var base64array = [];
      var s = stream.read(3); // next 3 bytes as up to 6 hex digits
      while (s.length > 0) {
        var n = parseInt(s, 16);
        if (s.length == 6) {
          base64array.push(base64chars.charAt(n >>> 18) +
            base64chars.charAt((n >>> 12) & 63) +
            base64chars.charAt((n >>> 6) & 63) +
            base64chars.charAt(n & 63));
        } else if (s.length == 4) {
          n = n << 8; // 2 bytes left: pad to 24 bits
          base64array.push(base64chars.charAt(n >>> 18) +
            base64chars.charAt((n >>> 12) & 63) +
            base64chars.charAt((n >>> 6) & 63) + "=");
        } else {
          n = n << 16; // 1 byte left: pad to 24 bits
          base64array.push(base64chars.charAt(n >>> 18) +
            base64chars.charAt((n >>> 12) & 63) + "==");
        }
        s = stream.read(3);
      }
      return base64array.join("");
    }

  6. Thanks for this, Bruce. Twice as fast as the version I wrote… That’s great. I still have much to learn. But as you say, even with this improvement, using JavaScript to encode will work only with small attachments. Thanks for sharing.

    John

  7. Taha says:

    Hi Keith,

    I have a requirement to attach a Word file to the PDF and display the contents of the file in a text field on the form. After converting the contents using util.stringFromStream, when I try to read the contents of the attached Word doc, I get square boxes! It works for a “.txt” file, though. Could you please suggest something?

    Thanks,
    Taha

  8. Taha:

    A Word file is a binary format. As far as I know, there is no easy way to parse it in JavaScript. Your best bet is to convert it to a parseable format before embedding it.

    John

  9. E. Loralon says:

    Print form with attachment

    Hi John,

    I am building a form using LiveCycle Designer ES 8.2 with an MS Word document that needs to be printed and included in the form as an integral part.

    1) What I need to achieve is to be able, using JavaScript, to print the attachment and add the printed attachment as part of the form.
    2) Once I attach a document to the form, I would like to display the content of the attached document in a field inside the form, for visual representation only, kind of like a preview of the attached document.

    Thank you in advance for your help.

  10. Bruce says:

    Hi John, I have just found that the FormCalc GET function also fails when the content has null bytes. I had hoped to update some images in my form by using FormCalc GET and assigning the result to the image field’s rawValue. I guess I’m just wondering whether there is any plan to fix this problem; considering this post is over two years old, I’m hoping it might be soon.
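Keith Gross’s comment above describes sending large attachments as hex in chunks of about 64K and reassembling them on the server. Here is a rough sketch of that idea, not his actual code: it runs outside Acrobat, uses a hypothetical makeHexStream() stand-in for the stream returned by getDataObjectContents() (its read(nBytes) returns hex, two digits per byte), and a placeholder send() in place of a real web service call. Tagging each chunk with an index is an assumption, added so the server can restore the order.

```javascript
var CHUNK_BYTES = 64 * 1024;   // about 64K per call

// Hypothetical stand-in for an Acrobat stream (read returns hex).
function makeHexStream(hex) {
  var pos = 0;
  return {
    read: function (nBytes) {
      var chunk = hex.substr(pos, nBytes * 2);
      pos += chunk.length;
      return chunk;
    }
  };
}

// Client side: read the attachment one chunk at a time and hand each
// hex piece to send(), so only one chunk is held in memory at once.
// send() is a placeholder for whatever web service call you use.
function sendInChunks(stream, send) {
  var index = 0;
  var hex = stream.read(CHUNK_BYTES);
  while (hex.length > 0) {
    send({ index: index++, data: hex });
    hex = stream.read(CHUNK_BYTES);
  }
  return index;   // number of chunks sent
}

// Server side: order the received chunks by index and join them back
// into one hex string, ready to be decoded.
function reassemble(parts) {
  return parts
    .slice()
    .sort(function (a, b) { return a.index - b.index; })
    .map(function (p) { return p.data; })
    .join("");
}

// Usage: 150,000 bytes of sample data go out as three calls.
var original = new Array(50001).join("4d616e");
var received = [];
var count = sendInChunks(makeHexStream(original), function (part) {
  received.push(part);
});
console.log(count);                               // 3
console.log(reassemble(received) === original);   // true
```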