Display an XML Spreadsheet

Today’s blog is on a topic that I’ve intended to tackle for a long time. I finally got the opportunity to work on this after a customer contact.  The end goal is to add spreadsheet data to a PDF form.  Copy and paste aren’t enough — we need to preserve the formatting of the cell data.

The solution involves converting the spreadsheet to an xml format and then populating the form with the cell data from a nice friendly xml grammar.

To get consumable XML from within Excel, choose save-as “XML Spreadsheet 2003″.
You can read about this format at:

http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats

and

http://msdn.microsoft.com/en-us/library/aa140066%28office.10%29.aspx

The spreadsheet grammar has all the cell and row properties needed for rendering: formatting, styling, widths, heights etc.  The only problem is that the format does not lend itself to populating a form via data binding. There are too many special cases to construct sensible binding expressions.  Instead, I’ve taken the approach of using JavaScript to parse the spreadsheet format and to turn it into corresponding formatted XFA fields.

Before I go further, I should mention that my implementation took a few shortcuts.  I didn’t implement the full XML Spreadsheet definition.  Some parts of it I implemented fairly poorly.  My intention wasn’t to provide a finished product, but rather to give you a sample you could build on.

Here is the sample form: spreadsheet.pdf.  I trolled around looking for interesting spreadsheets to test with and found the financial statement for Colgate.

To use the form, follow these steps (from Acrobat — not Reader):

  1. Take an excel spreadsheet, save-as “XML Spreadsheet 2003″.
  2. Click “Load XML Spreadsheet” to embed it as an attachment
  3. Select the region to display
  4. Click “Display”
  5. Repeat steps 4,5 with other regions.  Or go back to step 2 with a different spreadsheet.

Note that Acrobat won’t allow you to do step 2 while the spreadsheet is open in Excel. And if you’ve picked a big spreadsheet it will be slow.  Go ahead and try it with a couple of the spreadsheets you have on your hard drive.

The structure of the form is simple:

  • A Table subform that gets repeated for every worksheet. It has a conditional break so that every worksheet after the first one occurs on a new page.
  • A repeating Row Subform that corresponds to a spreadsheet row
  • A repeating Cell Subform with a Data field that repeats for each column

The only really tricky part of the layout is that the cells are explicitly positioned — not flowed left-to-right.  The reason for that is so that I could immitate the Z order of Excel.  When cells overflow they overwrite cells to their right. To make this work I needed the cells to be added in order from right-to-left so that the left-most cells are highest in Z-order.  I needed them to be positioned so that when a cell overflows it doesn’t push its neighbors over.

The script is another matter. There’s over 700 lines of script that are fairly complex.

It starts by prompting the user to add the spreadsheet as an attachment (with the doc.importDataObject() method).  Once the spreadsheet is stored as an attachment it extracts the contents into an E4X XML object.  Then most of the work is simply parsing the style and size information from the spreadsheet and applying those to the cells.

Note that the row and cell columns are unbound.  That means if you looked in the form data for the spreadsheet information, you wouldn’t find it.  But not to worry, the data is close by — available in the attached spreadsheet.