PDF documents are all about information communication. This communication aspect is often underappreciated by others who invent document formats; I am thinking here of the CDF, ODF and OOXML folks.
Let’s look at a more familiar example of communication: the telephone. Within reason, we absolutely expect to be able to pick up any phone anywhere in the world and successfully talk to any other person in the world who also has a phone. That is amazing! But how is it accomplished? Answer: by using common standards to which all phones and all components of the phone system adhere. This blog is less a plea on my part for standards than a plea to understand the very special properties those standards must have if a communication system is to be built upon them. There are document standards and there are communication document standards, and they can be quite different.
Here is the kind of thing you want to avoid at all costs: three kinds of phones (red, blue and black), where a phone of a given color can only connect to a phone of the same color. Wouldn’t that be a mess! There is actually a tiny element of this among cell phone vendors already when it comes to charging for calls: some vendors offer free calling to any phone from the same vendor (of the same color). But there would be great alarm if those were the only calls that could be made. And imagine how much more confusing it would be if red phones could call blue but not black, blue could only call blue, and black could call black and red but not blue. Something that complex would reduce the usage of phones by an order of magnitude.
These same ideas apply to communicating information via documents. In this case the sender and receiver are software tools for authoring and displaying documents, and in many cases the same software plays both roles. If I put a document onto a website, I can easily anticipate that 100,000 people or more might read it. (I am waiting for those kinds of numbers for this blog. Tell your friends!) I want all of those people to be able to read it, no matter what their computer or favorite software. So we need standards that strictly define the form those document files can take, and the software has to be written to obey those standards strictly.
Let’s look at a different situation, that of application-specific save files, or what we sometimes call “native file formats.” The primary function of these files is to save the work I am doing so that I can shut down the application and come back later to pick up where I left off. I am thinking of the .psd files of Adobe Photoshop, the .doc files of Microsoft Word, or any other file format closely aligned with a favorite application. Saving is not their only use, but it is a key one.
Typically these file formats get revised when a new version of the product is introduced. The new application will usually read older versions of the files, and the user can then save them in the new format. But if your files go back too far, your modern software may not be able to read them. Adobe’s products are quite good at reading all older files, release after release. Other companies are not so good about this. I suspect it has something to do with pressuring all users of that software to keep upgrading to newer versions.
Certainly, old software will usually fail to read files saved in a new format. They are new formats precisely because they contain information unknown to the old programs.
There is another aspect of these native file formats that is often overlooked. Since they are intended to serve as a snapshot of the work being done with that product, the files generally contain much more than the final appearance of the document. For example, Adobe Illustrator files can contain sets of patterns and color swatches that the user has defined for that particular file. These are saved to and restored from the native .ai file. So native files may be larger than is really needed for communication purposes.
Native file formats are also poor choices for archiving unless the files are “turned over” every few years, updating them to the newly revised formats. I wrote about archiving and file formats in an earlier post.
So are OOXML, ODF and CDF native file formats or communication formats? Hard compromises have to be made for a format to be both and, so far, I cannot think of a single example where this has been done truly successfully. PDF is not a native file format and has been designed and managed as a communications format (only).
One more aspect of communications formats is the toughest one to avoid: special versions, or “profiles,” of the format. Formats are usually designed to be more ambitious than limited-function devices can easily handle, so there is great pressure to define a subset of the original format that such devices can support. This fractures the standard, and we get into a situation analogous to red and blue phones, where red can call anyone but blue can only call blue. (Subset files can be read by anyone, but full-format files cannot be read by limited devices.) Not good!
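The asymmetry that profiles create can be shown with a minimal sketch. It assumes a deliberately simplified model in which a reader is just the set of features it supports and a file is the set of features it uses; the feature names and the `FULL` and `LIMITED` profiles are hypothetical placeholders, not features of PDF or of any real profile.

```python
# Hypothetical model: a reader is the set of features it supports,
# and a file is the set of features it uses. Names are illustrative only.
FULL = frozenset({"text", "images", "3d_models", "scripts"})
LIMITED = frozenset({"text", "images"})  # a hypothetical subset profile


def can_read(reader_features: frozenset, file_features: frozenset) -> bool:
    """A reader can handle a file only if it supports every feature the file uses."""
    return file_features <= reader_features  # subset test


subset_file = frozenset({"text", "images"})
full_file = frozenset({"text", "images", "3d_models"})

print(can_read(FULL, subset_file))     # True:  full reader, subset file
print(can_read(LIMITED, subset_file))  # True:  limited reader, subset file
print(can_read(FULL, full_file))       # True:  full reader, full file
print(can_read(LIMITED, full_file))    # False: limited reader, full file
```

Only one of the four combinations fails, which is exactly the red-and-blue-phone situation: the subset file "calls" everyone, while the full-format file cannot reach the limited reader.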
You might have noticed that this is the first time I have mentioned CDF (Compound Document Formats) in any of these blogs. That is because of a recent announcement by the Open Document Foundation, Inc., previously a strong supporter of ODF, that it will now be helping to promote CDF. I will follow this blog with one about CDF.
Anyway, the bottom line for the PDF architect: PDF is about the only format that has been specifically designed to be a communications format and it shows.
Contact me at: jking@adobe.com

There are reasons other than use on limited devices for having profiles. For instance, PDF/A was designed to ensure the long-term viability of documents by keeping them free of external dependencies and encryption. The concerns you mention are important, though.