There’s been a lot of excitement recently around the State of Massachusetts mandating open formats for office applications, including PDF and the new OASIS ODF, the file format used by OpenOffice 2.0. A widespread meme portrays “the advent of XML and the adoption of OpenDocument as a world-changing sequence of events”. But I believe that deployment of XML formats is fundamentally not revolutionary: rather than being a true enabler of new solutions, XML simply lowers the cost of developing such solutions. What’s most critical is the “Infoset” (to borrow an XML term) of a particular file format, not its encoding. And if a particular solution delivers compelling value, it is going to be built, whether the encoding is binary or XML. Increasingly widespread support of XML encodings for currently binary-only Infosets will absolutely lower the bar of feasibility, but in most cases it won’t truly alter the fundamental situation.
For example, RedMonk’s Stephen O’Grady wrote last week:
What if before you emailed someone external to the system, the messaging server could deconstruct the document into various pieces, remove any comments or markup, then reconstruct the document without touching the content? That’s the kind of document manipulation made possible by the transition from binary to XML formats.
This sounds great. But I believe messaging solutions like the one Stephen envisions already exist, and handling binary document files in them is simply a SMOP (small matter of programming). There are plenty of third-party libraries that process MS-Word, RTF, PDF, and other binary formats. SourceForge alone has dozens of projects that do various things with PDF.
Thus, I believe the lack of widespread document processing in email gateways as Stephen envisions has a lot more to do with considerations such as limited perceived value (I for one don’t like the idea of some central system munging documents I email) and the proprietary nature of some key formats than with whether formats have an XML or binary encoding. For that matter, the ZIP-compressed packages of XML files Stephen mentions are in fact binary data: until they are unpacked, they are neither human-readable nor easily manipulable by pure XML tools like XSLT filters.
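To make the "matter of programming" concrete, here is a minimal Python sketch of the kind of gateway filter Stephen describes, applied to an OpenDocument package. It is only an illustration under stated assumptions: the function names `strip_annotations` and `scrub_package` are hypothetical, it assumes reviewer comments are stored as `office:annotation` elements in `content.xml`, and it ignores tracked changes, `office:annotation-end` markers, and anything held in other package members. Note that the ZIP container has to be opened and rebuilt before any XML tool can touch the content.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# ODF namespace URIs; registering prefixes keeps the output readable.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("text", TEXT_NS)

ANNOTATION = f"{{{OFFICE_NS}}}annotation"


def strip_annotations(xml_bytes: bytes) -> bytes:
    """Remove office:annotation elements, keeping the text that follows them."""
    root = ET.fromstring(xml_bytes)
    for parent in list(root.iter()):
        previous = None  # last sibling that was kept
        for child in list(parent):
            if child.tag == ANNOTATION:
                # ElementTree stores the text after an element in its "tail";
                # move it so that removing the comment does not eat content.
                if child.tail:
                    if previous is None:
                        parent.text = (parent.text or "") + child.tail
                    else:
                        previous.tail = (previous.tail or "") + child.tail
                parent.remove(child)
            else:
                previous = child
    return ET.tostring(root, encoding="utf-8")


def scrub_package(odf_bytes: bytes) -> bytes:
    """Unpack the ZIP container, clean content.xml, and repack everything."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "content.xml":
                data = strip_annotations(data)
            # Keep each member's original compression (mimetype stays stored).
            dst.writestr(info.filename, data, compress_type=info.compress_type)
    return out.getvalue()
```

A gateway would call `scrub_package` on the raw attachment bytes and forward the result. The XML encoding makes the middle step short, but the unpacking, the repacking, and the decision about what counts as a "comment" are all still programming.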
Don’t get me wrong, I agree that the XML adoption wave is here and unstoppable. But so was the ASCII adoption wave before it. While XML will definitely lower the bar, a “matter of programming” will still be the order of the day. We may increasingly expect key formats to be available in an XML serialization, but at the end of the day, other considerations will in many cases be more fundamental to decisions to adopt new solutions: What capabilities are enabled by the Infoset of the format? Has that Infoset stood the test of time? Is the format publicly documented and available without IP or other license restrictions? Are there multiple vendors in the ecosystem around the format? And, the one I am most focused on: what concrete user and enterprise value is delivered by the solution?