XML: (just) grease for the wheel of IT

There’s been a lot of excitement recently around the State of Massachusetts mandating open formats for office applications, including PDF and the new OASIS ODF, the file format used by OpenOffice 2.0. A widespread meme portrays “the advent of XML and the adoption of OpenDocument as a world-changing sequence of events”. But I believe that deployment of XML formats is fundamentally not revolutionary: rather than being a true enabler of new solutions, XML simply lowers the cost of developing such solutions. What’s most critical is the “Infoset” (to borrow an XML term) of a particular file format, not its encoding. And if a particular solution delivers compelling value, it is going to be built, whether its encoding is binary or XML. Increasingly widespread support for XML encodings of currently binary-only Infosets will absolutely lower the bar of feasibility, but in most cases it won’t truly alter the fundamental situation.
For example, RedMonk’s Stephen O’Grady wrote last week:

What if before you emailed someone external to the system, the messaging server could deconstruct the document into various pieces, remove any comments or markup, then reconstruct the document without touching the content? That’s the kind of document manipulation made possible by the transition from binary to XML formats.

This sounds great. But I believe messaging solutions like the one Stephen envisions already exist, and the fact that document files are binary is simply an SMOP (a “small matter of programming”). There are plenty of third-party libraries that process MS-Word, RTF, PDF, and other binary formats; SourceForge alone has dozens of projects that do various things with PDF.
Thus, I believe the lack of widespread document processing in email gateways as Stephen envisions has a lot more to do with considerations such as limited perceived value (I for one don’t like the idea of some central system munging documents I email) and the proprietary nature of some key formats than with whether formats have an XML or binary encoding. For that matter, the ZIP-compressed packages of XML files Stephen mentions are in fact binary data, neither human-readable nor easily manipulable by pure XML tools like XSLT filters.
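As a rough illustration of both Stephen’s scenario and the ZIP point, here is a minimal Python sketch of the gateway step he describes. It assumes an ODF text document whose comments are stored as office:annotation elements in the package’s content.xml; the function names (strip_annotations, _remove_keep_tail) are hypothetical and not taken from any existing product. Note that a ZIP library has to open the package before any XML processing can begin.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# ODF 1.x office namespace; comments in text documents are office:annotation elements.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
ANNOTATION = f"{{{OFFICE_NS}}}annotation"


def _remove_keep_tail(parent: ET.Element, child: ET.Element) -> None:
    """Remove child from parent without losing the text that follows it (its tail)."""
    tail = child.tail or ""
    siblings = list(parent)
    idx = siblings.index(child)
    if idx > 0:
        prev = siblings[idx - 1]
        prev.tail = (prev.tail or "") + tail
    else:
        parent.text = (parent.text or "") + tail
    parent.remove(child)


def strip_annotations(odf_bytes: bytes) -> bytes:
    """Hypothetical gateway step: copy an ODF package, dropping comments from content.xml."""
    out_buf = io.BytesIO()
    # Step 1: the package is a ZIP archive, so XML tools cannot read it until it is unpacked.
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as src, zipfile.ZipFile(out_buf, "w") as dst:
        for info in src.infolist():  # preserves entry order, so "mimetype" stays first
            data = src.read(info.filename)
            if info.filename == "content.xml":
                # Step 2: only now can ordinary XML processing take over.
                root = ET.fromstring(data)
                # Collect (parent, child) pairs first; ElementTree has no parent pointers.
                pairs = [(p, c) for p in root.iter() for c in p if c.tag == ANNOTATION]
                for parent, child in pairs:
                    _remove_keep_tail(parent, child)
                data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
            # Step 3: re-zip, reusing each entry's original metadata (including compression).
            dst.writestr(info, data)
    return out_buf.getvalue()
```

Even this toy version needs a non-XML ZIP library before any XML tool can run. It also ignores meta.xml, styles.xml, and tracked changes, and ElementTree rewrites namespace prefixes (for example to ns0), so a real gateway would need considerably more care: a small matter of programming, in other words.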
Don’t get me wrong: I agree that the XML adoption wave is here and unstoppable. But so was the ASCII adoption wave before it. While XML will definitely lower the bar, a “matter of programming” will still be the order of the day. We may increasingly expect key formats to be available in an XML serialization, but at the end of the day, other considerations will in many cases be more fundamental to decisions to adopt new solutions: what capabilities are enabled by the Infoset of the format? has that Infoset stood the test of time? is the format publicly documented and available without IP or other license restrictions? are there multiple vendors in the ecosystem around the format? And, the one I am most focused on: what concrete user and enterprise value is delivered by the solution?

2 Responses to XML: (just) grease for the wheel of IT

  1. interesting thoughts, Bill. but while i’m sure that some of the types of systems i’m envisioning exist, i think their non-mainstream impact is at least as attributable to technical restrictions as it is to perceptions of limited value.
    first, value, as i see it, is tied almost directly to effort. thus if the available solutions are too difficult to implement, their value is inevitably affected.
    second, i think largely as a result of the first condition, the creativity that’s been applied to considering what, precisely, can be accomplished via open collaboration formats is limited. for far too long we have accepted the poor editing experience of emailing office documents around. i do think that whether it’s messaging, wikis, or what have you, we have a chance to fundamentally rethink precisely what can be accomplished in an office productivity format.
    either way, it should be a fun space to watch for a while.

  2. Sam Hiser says:

    Bill-
    I appreciate the view you express here, especially the idea that the type of information encoding, whether open or closed, is less important than what’s encoded.
    As a technologist, your lens is on the technology; you’re focused intensely on XML itself, its strengths and weaknesses. This I admire; I have had conversations with deep research technologists (from Sun, in fact) who are rather unimpressed with XML, per se. Qualified appropriately, I found their views unimpeachable.
    My enthusiasm around OpenDocument and its expression in XML for office documents is driven not so much by open versus proprietary data encoding as by the openness of the ecosystem around the specification. This is the same openness that creates intense competition among applications for OpenDocument.
    If there is a difference between your vantage and mine, it’s that you are less willing to consider social forces in your model of the future. For whatever it may be worth, my optimism about OpenDocument is justified by an unscientific imagination, a luxury I am permitted as a businessman. And confidence in the prospect of business value being driven by OpenDocument supports my optimism, too.