Acrobat 8 and PDF format features

Today we announced Acrobat 8. Lots of other material goes over general features – here’s the official announcement page, and here’s Lori De Furio’s excellent blog. For my entry here I’m just going to talk about Acrobat 8 and the PDF format.

As with every major release of Acrobat, we’ve updated the PDF version with this one. The latest version of PDF is 1.7. Compared to past revisions, 1.7 is a pretty minor update to the spec – most users won’t notice that anything has changed. The revisions are mostly in the areas of 3D, advanced commenting features, and security. Because the update is pretty minor we decided to make a radical break with the past in how Acrobat handles version numbers – even though the latest version is 1.7, Acrobat 8 will default to saving out files as 1.6. It will write out the 1.7 version number only if the user asks for it, or if 1.7-specific features are used.

The benefit of this change is that users with Acrobat or Reader version 7 can open files saved from Acrobat 8 without getting the warning that the file comes from a newer version and “might” (who knows?) have features that don’t work properly. We hope you like this!
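For readers who want to check which version number a file actually carries: a PDF begins with a header line of the form %PDF-1.6. Here is a minimal Python sketch that reads that header and decides whether a viewer limited to PDF 1.6 would consider the file "newer". The function names and the 1024-byte search window are my own choices for illustration, not anything Acrobat does internally. The sketch also ignores the optional Version entry in the document catalog, which can override the header.

```python
import re

# Matches the header comment at the start of a PDF file, e.g. b"%PDF-1.6".
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def header_version(data: bytes):
    """Return (major, minor) from the %PDF-x.y header, or None if absent."""
    # Hypothetical choice: only look near the start of the file.
    match = HEADER_RE.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_newer_than_viewer(data: bytes, viewer_max=(1, 6)) -> bool:
    """True if the header version exceeds what the viewer knows about.

    viewer_max defaults to (1, 6), the PDF version associated with Acrobat 7.
    """
    version = header_version(data)
    return version is not None and version > viewer_max


if __name__ == "__main__":
    sample = b"%PDF-1.7\n% rest of file omitted\n"
    print(header_version(sample), is_newer_than_viewer(sample))
```

To try it on a real file, read the first kilobyte in binary mode and pass the bytes to header_version.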

Another change is that it has become easier to use Acrobat to save a PDF to an older version number. In Acrobat 7 it was possible to reduce the version number by using Reduce File Size or PDF Optimizer. Access to this has been made a little more intuitive in Acrobat 8 – you can now get to this functionality through the Save As menu. Under “File, Save As” the default option is still “Adobe PDF Files (.pdf)” but right under that is “Adobe PDF Files, Optimized (.pdf)”. Making this selection enables the Settings… button in the Save As dialog. Clicking on that brings up the Optimizer dialog. If you just want to change the compatibility level of the file, uncheck all the options in the left panel (Images, Fonts, etc.) and set the Make Compatible option to the Acrobat version you want. Note that Acrobat 4 corresponds to PDF 1.3, Acrobat 5 to PDF 1.4, etc.
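The Acrobat-to-PDF correspondence is easy to write down as a lookup table. A small Python sketch follows: the entries for Acrobat 4 and 5 come straight from the note above, and the rest continue the same one-step-per-release pattern up to Acrobat 8 and PDF 1.7. The names ACROBAT_TO_PDF and pdf_version_for are made up for this example.

```python
# Acrobat major version -> PDF version associated with that release.
# Entries for 4 and 5 are stated in the text; 6 through 8 continue the
# same pattern (Acrobat 7 with PDF 1.6, Acrobat 8 with PDF 1.7).
ACROBAT_TO_PDF = {
    4: "1.3",
    5: "1.4",
    6: "1.5",
    7: "1.6",
    8: "1.7",
}


def pdf_version_for(acrobat_major: int) -> str:
    """Return the PDF version string for an Acrobat compatibility level."""
    try:
        return ACROBAT_TO_PDF[acrobat_major]
    except KeyError:
        raise ValueError(f"no mapping for Acrobat {acrobat_major}") from None


if __name__ == "__main__":
    for acrobat in sorted(ACROBAT_TO_PDF):
        print(f"Acrobat {acrobat} -> PDF {pdf_version_for(acrobat)}")
```

Keep in mind that Acrobat 8 is the exception on output: it is associated with 1.7 but saves as 1.6 by default.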

Looking further down the format choices for “Save As” you will find options to save as PDF/A (PDF for Archive) and PDF/X (PDF for Print Exchange). These options allow you to take the PDF you currently have open and save it in one of those ISO standard PDF subsets. Please be aware that these options only work if your PDF is pretty close to compliant already – you must have fonts embedded for both standards, for example. And unfortunately the product doesn’t give much feedback in this path if it is unable to complete the conversion – you have to go to the Preflight tool to diagnose why the file couldn’t be converted.

All for now – I’ve got to get back to finishing the product. I hope to blog about some of the other features I’ve been involved with soon.

PDF/A metadata – namespace URIs and prefixes

PDF/A developers and users should be aware of a couple of important corrections that are going to be made to the metadata specifications for PDF/A.

The PDF/A specification requires that certain metadata be written into conforming PDF/A files. This metadata is in XML, and more specifically uses the Adobe XMP specification. Unfortunately the published version of PDF/A had some errors and ambiguities in the specification of that metadata. This may lead to some tools producing PDF/A that is not strictly compliant with PDF/A (either as originally published or as corrected) or to some tools declaring files non-compliant when they are compliant.

PDF/A and the “openness” of PDF

John Carroll of Microsoft has written an interesting blog item on PDF vs. Office XML that deserves careful reading. In it he criticizes the government of Massachusetts for approving the use of PDF 1.5 but not Office XML as a document format. I posted a brief reply that addresses one issue I’d like to discuss at greater length.

A lot of discussions of “openness” get lost in issues that border on metaphysical. Here I’m going to try to bring the discussion back to earth by going back to the question, what problem are “openness” requirements trying to solve?

PDF was originally invented to solve one problem, the problem of sharing documents electronically. If my application for a mortgage needs to be reviewed by twelve people before approval, how can we make sure that all twelve people see the same document, and thus make their decisions based on the same information? The old way to do it was by using paper with ink on it.

That this solution works seems obvious at first glance, but how do you know it works? Ink fades over time, paper gets discolored. Things can look different under different lighting conditions. How do you know I didn’t write an extra zero in some financial figure in a rapidly fading ink so that one reviewer saw it and the next did not?

The questions get more complicated when photocopiers are involved. In practice most people say that if you review a paper document and I reviewed a photocopy of it, we reviewed the same document. But how do you know that? What if the document has some colors that the copier can’t pick up? What if the copier is a digital one with software that randomly scrambles words?

By now you probably think I’m being silly, but this is an important point: standardization is all about risk management. Any document workflow has risks of error and uncertainty being introduced – even paper workflows. For most workflows the use of ordinary paper, pens, printers, and photocopiers such as you find in the average office are good enough that no one thinks about the kinds of risks I’ve discussed. But professional archivists do worry about the risks of preserving paper documents. Last year I had the privilege of touring the British National Archives and seeing documents that date back to the 11th century. The archivists there, and at other governmental organizations like NARA, worry a lot about the stability of different kinds of paper and ink under different conditions of humidity, temperature, and so on.

The point here is that the kind of risks you need to manage vary with your workflow requirements, but no workflow has zero risk.

Which brings me (finally) to digital documents. Since the 1980s, office workers have been sharing digital documents. What are the risks that exist with digital documents, how serious are they to various workflows, and what steps can be taken to reduce them?

The first types of digital documents people tried to share were native word processor files. They quickly discovered a couple of major risks. One was that a colleague wouldn’t be able to open a document at all because of incompatible software. Another, perhaps more dangerous, was that the colleague would be able to open the doc, but subtle changes would occur that would break the workflow – for example, text would be reflowed, with the result that a comment about “the first line on page 23” would have an unpredictable reference. And a final one was that one reader of a document would edit it – inadvertently or maliciously – before passing it on to the next person.

Adobe invented PDF and Acrobat primarily to reduce these risks to acceptable levels for office and government work. And it worked! In the 12+ years since Acrobat was introduced, PDF has become a de facto standard for documents because experience has shown that the risks of PDF documents becoming unreadable, or reading unpredictably, are negligible in practice. And I repeat “negligible in practice” – as a technical expert in PDF I am well aware that the risks are not zero. But the big point is, we wouldn’t even be talking about PDF as a government standard if we didn’t have many years of experience showing that it works. And this, I think, is why Office XML didn’t get out of the starting gate with certain governments: at this time it lacks the starting point of a wide enough installed base and widespread user experience to merit consideration.

So why PDF/A? PDF/A exists because for certain applications, practical experience that the format works is not sufficient (though it is necessary). Prime examples are legal applications and archival applications. If a document is going to be used as evidence in a court proceeding, or if there is a statutory requirement that it be preserved for posterity, then a stricter level of scrutiny is required. These applications require that independent technical experts examine the risks associated with the technology and make sure that appropriate steps have been taken to minimize those risks.

One of the risks that particularly concerns archivists is intellectual property risk. This is the risk that the technology required to reproduce the document will not be available because relevant information cannot be found or cannot be used for legal reasons. It is this risk that is addressed by concerns for “openness”, and claims about openness need to be addressed in this context.

How do you assess the risk of a technology becoming unavailable or unreliable because of information becoming unavailable or of intellectual property lawsuits? You can’t eliminate that risk completely, but you can make informed prudent judgements. A book that is in widespread publication and present in many libraries is much less likely to vanish from the world than an HTML page on the website of a corporation. A technology that has been implemented by numerous independent vendors over a period of many years is much more likely to be free of intellectual property encumbrances than one just released by a single vendor.

The approval of PDF/A by ISO, and its adoption as a standard by government agencies, reflects a collective judgement by a large community of experts that the use of this technology is prudent and responsible risk management. In particular it reflects a judgement that PDF/A is at least as reliable as paper.

We live in the real world. Disk drives can fail, buildings full of paper can burn down, companies can go out of business, and widely used technologies can suddenly become the target of patent lawsuits.

The due diligence of the expert evaluating technology does not consist in a rigorous proof that the technology strictly conforms to some ideal, but in the intelligent weighing of a lot of relevant considerations. PDF is widely implemented, has proved itself trustworthy for over a decade, and has been found to be licensed on “reasonable and non-discriminatory terms” by standards bodies that specialize in making those evaluations. That is why agencies with archival requirements are adopting it.

PDF/A – background

Next week I’ll be in Washington for a meeting to start work on the second version of PDF/A. So I will be devoting most of my next few blog entries to PDF/A. Today, some background and history.

When PDF was first introduced, it was designed to a large extent as a replacement for paper documents. Many of the first big adopters of PDF were organizations that deal with huge quantities of paper documents, such as the IRS. One that got started a little later was the U.S. Courts. The courts used to accumulate astonishing amounts of paper documents for every case – sometimes literally truckloads. Managing all that paper was a logistical nightmare, and PDF looked like just the ticket to solve that problem.

There are a couple of catches, though. One is that over the years PDF has grown far beyond its “replacement for paper” roots and now includes lots of interactive and dynamic features that aren’t appropriate for the type of documents kept by courts. Another is that most government agencies have legal requirements to preserve many of their documents for very long periods of time – sometimes decades or even “until the end of the Republic”. To preserve paper documents governments have had to develop strict standards about paper and ink quality, storage conditions and so forth, all to be sure that a paper document will still be legible decades or centuries from now.

The question was coming up more and more: how can we be confident that an electronic document such as a PDF will still be readable in, say, a hundred years? Consider how much computing technology, operating systems, displays, application software and file formats have changed just over the last ten years, and then try projecting that out to a hundred!

To address these questions, a committee was formed in 2002 under the auspices of two standards-development organizations, AIIM and NPES. Three years later, an ISO standard was ratified.

Tomorrow: the technical content of the standard.

Microsoft Office to support PDF

I was planning to launch my blog today with a discussion of PDF/A. But my plans were derailed by the announcement that Microsoft Office 12 will support export of PDF.

(Just to clear up some confusion that has shown up on some blogs: Microsoft did not license any technology from Adobe for this.)

This is wonderful news for the PDF platform. It confirms, in case there was any lingering doubt, that PDF is here to stay as the de facto standard for “final form” digital equivalents of paper records. It also raises a lot of questions.

Is this the beginning of the end for Microsoft’s nascent attempt to replace PDF with a proprietary equivalent? I don’t understand why Office users, at least, will be interested in this new format when it arrives.

How complete will Microsoft’s support of PDF be? Will Office just produce static paper-equivalent PDFs? Or will it add advanced features like hyperlinks, comments, bookmarks, Tagged PDF, interactive forms, embedded multimedia files? Will they create files that conform to the PDF/A and PDF/X standards? Brian Jones’s blog offers no real clues yet.

A lot of people are going to be very interested in the first beta releases of Office 12 when they arrive.

UPDATE: Since I first posted this, I found a much more detailed posting on Microsoft’s website that answers some of my questions.