
June 25, 2013

What is PDF/UA all about, anyway?

The PDF/UA (“Universal Accessibility”) specification, or ISO 14289, published by the International Organization for Standardization (ISO) in August 2012, was a big step forward for authors of the tools we use to create and consume PDF content. But what the spec itself does is a little harder to explain, and there’s been a lot of confusion. I even confused myself recently about what PDF/UA does and doesn’t specify. So I thought it might help to summarize the spec in some detail for those who are coming to grips with its place in the world of PDF accessibility.

If the PDF/UA specification could be summed up in one sentence, it might go something like this:

“PDF/UA makes certain that the PDF format isn’t the source of accessibility problems.”

The end result of a document built using the PDF/UA spec is a more reliable, more accessible document that avoids the tricks and traps that PDF can present. PDF authors don’t need to know anything specific about what goes on behind the scenes; the tools themselves are responsible for adding and preserving the accessibility of the PDF. That’s the value of PDF/UA.

That does not mean that a PDF/UA-compliant document will always be perfectly accessible—issues like poorly-built Word documents or other source material will, of course, carry their accessibility flaws no matter what format they’re converted into. No one should claim that PDF/UA conformance means that a given document will pass the Web Content Accessibility Guidelines (WCAG) 2.0, and organizations shouldn’t treat PDF/UA as a WCAG stand-in. But conformance does indicate that the authoring process for a given piece of content retains its accessibility level when it’s output as a PDF.

The PDF/UA specification defines conformance for three different aspects of PDF: content, readers, and assistive technology. The authoring tools are intentionally omitted: only the content they produce matters here, and that can only be measured at the individual document level.

We’ve seen a number of organizations that are specifying PDF/UA-compliant documents, and that’s a worthwhile approach, as far as it goes. The revamped accessibility checker in Adobe Acrobat Pro XI bases its tests on PDF/UA and WCAG criteria, but doesn’t yet fully test PDF/UA compliance.

The PDF/UA document itself is 23 pages, and thanks to ISO’s publishing model, it’s about $90 US to purchase a copy. We’re sensitive to the cost issue here, but in the interest of full disclosure, Adobe doesn’t own this work (it was the product of participants from a number of companies, and its copyright belongs to ISO), so we’re not able to simply republish the spec. To those who are planning to implement PDF/UA in their readers and assistive technologies, $90 is not a barrier (and if you promise to implement PDF/UA in an open-source tool, I’ll buy you a copy myself, though I’m sure I’d get the better end of that deal).

The material is what we in America call “inside baseball”: it’s very technical, requires a solid understanding of PDF internals, and is heavily cross-referenced with ISO 32000-1:2008, the PDF 1.7 specification. Non-technical authors looking to create accessible PDFs will probably find it heavy going, and may be better off looking at the PDF Techniques for WCAG 2.0. However, I’ve taken the liberty of summarizing this spec so that everyone has a chance to understand what conformance entails.

The spec starts by listing what PDF/UA does not do: help with converting paper or electronic documents to PDF/UA; give implementation advice for rendering PDF documents; or tell you how to store PDFs or what OS to use. The rest of the front matter lists normative references to WCAG 2.0, PDF 1.7, PDF/A-2, and PDF/X-1a; defines a handful of terms; and sets a namespace for the PDF/UA metadata that goes into every conforming file.

The document then defines conformance at three levels: individual files, PDF readers, and assistive technology.

Conforming files

A conforming file contains features that are valid according to the PDF 1.7 spec, except for features PDF/UA specifically forbids. It has to be marked as a PDF/UA document as described in Section 5, and meet all the requirements in Section 7 below.
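
To make the “marked as a PDF/UA document” requirement a little more concrete, here is a minimal sketch that looks for the PDF/UA identifier in a file’s XMP metadata. It assumes the XMP packet has already been extracted from the PDF as a string. The namespace URI and the `part` property name are not quoted anywhere in this post; they reflect my understanding of the identification schema, so check them against the spec before relying on them. The function name `pdfua_part` and the sample packet are made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: namespace URI of the PDF/UA identification schema.
PDFUAID_NS = "http://www.aiim.org/pdfua/ns/id/"


def pdfua_part(xmp_packet):
    """Return the declared PDF/UA part number as a string, or None."""
    root = ET.fromstring(xmp_packet)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # XMP allows a simple property to be written as an attribute...
        value = desc.get(f"{{{PDFUAID_NS}}}part")
        if value is not None:
            return value.strip()
        # ...or as a child element.
        child = desc.find(f"{{{PDFUAID_NS}}}part")
        if child is not None and child.text:
            return child.text.strip()
    return None


# Hypothetical, hand-written XMP packet used only for this example.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfuaid="http://www.aiim.org/pdfua/ns/id/">
      <pdfuaid:part>1</pdfuaid:part>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(pdfua_part(SAMPLE))
```

Finding the identifier only tells you what the file claims about itself. Whether the file actually meets the Section 7 requirements is a separate question.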

Conforming reader

A conforming reader must also be a conforming reader according to PDF 1.7. It will support all the tags, attributes and key values specified for accessibility, and honor the hidden state of optional content. It will make the logical reading order available. It will allow AT to inspect artifacts. Its own interface must be accessible, and must not interfere with any AT feature.

There are some repair techniques for headings and tables, rules for handling optional content, attached and embedded files, digital signatures, actions, metadata, navigation, annotations, forms and media.

Conforming assistive technology

A conforming AT supports all of the features of the content and the reader, and should allow navigation by page labels, document structure, or the outline. It should also let the user override default navigation zoom. (A combined reader and AT, perhaps something like TextHelp’s PDF Aloud, could be both a conforming reader and a conforming AT.)

File format requirements

This is the meat of the spec, and there’s some good advice in here for those of you who are already well-versed in tagging PDFs.

All PDF/UA documents must be tagged PDF. Tags must be semantically appropriate (that is, you can’t just mark everything <p> and be done), and in logical reading order. Artifacts (sometimes referred to as “Background” in Acrobat) must not be tagged. If a PDF does anything non-standard with its tags, those tags have to be remapped to standard PDF tags. Standard tags can’t be overridden.
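
The remapping rule is easier to see with a small model. The sketch below treats a document’s role map as a plain Python dict from custom tag names to the tags they map to, follows each chain until it reaches a standard tag, and flags two problems: a standard tag that has been remapped, and a custom tag that never arrives at a standard one. The dict representation, the function names, and the example tags are all hypothetical, and `STANDARD_TAGS` is my recollection of the PDF 1.7 standard structure types rather than a list copied from the spec.

```python
# Assumption: the PDF 1.7 standard structure types, listed from memory.
STANDARD_TAGS = {
    "Document", "Part", "Art", "Sect", "Div", "BlockQuote", "Caption",
    "TOC", "TOCI", "Index", "NonStruct", "Private", "P", "H",
    "H1", "H2", "H3", "H4", "H5", "H6", "L", "LI", "Lbl", "LBody",
    "Table", "TR", "TH", "TD", "THead", "TBody", "TFoot", "Span",
    "Quote", "Note", "Reference", "BibEntry", "Code", "Link", "Annot",
    "Ruby", "RB", "RT", "RP", "Warichu", "WT", "WP",
    "Figure", "Formula", "Form",
}


def resolve_tag(tag, role_map):
    """Follow role_map to a standard tag; None if the chain dead-ends or loops."""
    seen = set()
    while tag not in STANDARD_TAGS:
        if tag in seen or tag not in role_map:
            return None
        seen.add(tag)
        tag = role_map[tag]
    return tag


def role_map_problems(role_map):
    """List role map entries that break the remapping rules."""
    problems = []
    for tag in role_map:
        if tag in STANDARD_TAGS:
            problems.append(f"standard tag {tag} is remapped")
        elif resolve_tag(tag, role_map) is None:
            problems.append(f"custom tag {tag} never reaches a standard tag")
    return problems


# Hypothetical role map for illustration.
example = {
    "Heading1": "H1",      # fine: maps straight to a standard tag
    "Sidebar": "Aside",    # fine: Sidebar -> Aside -> Div
    "Aside": "Div",
    "P": "Span",           # not allowed: overrides a standard tag
    "Mystery": "Unknown",  # not allowed: dead end
}
for problem in role_map_problems(example):
    print(problem)
```

A check like this says nothing about whether the mapping is semantically appropriate. Mapping a custom heading tag to P would pass, and would still be wrong.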

Content can’t flicker, blink or flash, and information can’t be conveyed solely by color, contrast, formatting, layout or sound. Image-only PDFs may be created, but their content must also be tagged.

The document must have a title, and it must be displayed in the title bar.

Text must be Unicode. The document’s language, and any changes in language, must be declared.
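
The tagging, title, and language requirements mostly come down to a few entries in the PDF’s document catalog, so they are easy to test mechanically. As a rough sketch, the function below checks a catalog that has already been parsed into a plain Python dict. The key names (`MarkInfo`/`Marked`, `ViewerPreferences`/`DisplayDocTitle`, `Lang`) are the PDF 1.7 names as I understand them. The dict model, the separate `title` argument, and the function name are assumptions made for this example, and no real PDF library is involved.

```python
def catalog_problems(catalog, title):
    """Return a list of document-level problems found in a parsed catalog."""
    problems = []
    # Tagged PDF: the catalog should say the document is marked.
    if not catalog.get("MarkInfo", {}).get("Marked", False):
        problems.append("not flagged as tagged PDF (MarkInfo/Marked)")
    # A title must exist...
    if not (title or "").strip():
        problems.append("no document title")
    # ...and the reader must be told to show it instead of the file name.
    if not catalog.get("ViewerPreferences", {}).get("DisplayDocTitle", False):
        problems.append("title not shown in title bar (DisplayDocTitle)")
    # Default language for the whole document.
    if not catalog.get("Lang"):
        problems.append("no default language (Lang)")
    return problems


# Hypothetical catalogs for illustration.
good = {
    "MarkInfo": {"Marked": True},
    "ViewerPreferences": {"DisplayDocTitle": True},
    "Lang": "en-US",
}
print(catalog_problems(good, "Annual Report"))
print(catalog_problems({"Lang": "en-US"}, ""))
```

This only covers the document-wide default language. Changes of language inside the content are declared on individual tags or marked content, which a catalog-level check cannot see.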

Graphics must be marked up with the Figure tag, and must have alt text, unless they’re presentational, in which case they’re Artifacts. Groups of images that represent one thought are to be tagged as a single Figure. Captions that go with figures must be tagged as such.

Headings must be nested sequentially (e.g., H1-H2-H3 is acceptable, but H1-H3 is not). Headings can go as deep as necessary (e.g., H1041 is valid, if you’ve used the first 1040 levels). Generic “H” headings are acceptable, but can’t be used interchangeably with numbered headings.
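
The heading rules can be sketched in a few lines. The function below takes the tag names of a document in reading order and reports skipped levels and any mixing of generic H with numbered headings. It also treats a first heading deeper than H1 as a skipped level, which is my reading of “nested sequentially” rather than something stated above. The function name and the list-of-strings input are assumptions for illustration.

```python
import re


def heading_problems(tags):
    """Check heading tags, given in reading order, against the nesting rules."""
    problems = []
    numbered = [t for t in tags if re.fullmatch(r"H[1-9]\d*", t)]
    if numbered and "H" in tags:
        problems.append("generic H mixed with numbered headings")
    previous = 0  # 0 means "no heading seen yet"
    for tag in numbered:
        level = int(tag[1:])
        # Going deeper is only allowed one level at a time;
        # going back up any number of levels is fine.
        if level > previous + 1:
            problems.append(
                f"{tag} skips a level (previous heading level: {previous})"
            )
        previous = level
    return problems


print(heading_problems(["H1", "P", "H2", "H3", "H2", "H3"]))
print(heading_problems(["H1", "P", "H3"]))
print(heading_problems(["H", "P", "H1"]))
```

Non-heading tags such as P are simply ignored, so the list can be a straight dump of every tag in the structure tree.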

Tables should have headers (“TH” tags) with a Scope attribute.

Lists must be marked up appropriately.

Math equations must be in a Formula tag, with alt text.
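
Both Figure and Formula tags need alt text, which makes for a simple tree walk. The sketch below models the structure tree as nested Python dicts with `tag`, `alt`, and `kids` keys (a made-up representation, not any library’s API) and returns the path to every Figure or Formula whose alt text is missing or blank.

```python
NEEDS_ALT = {"Figure", "Formula"}


def missing_alt(node, path=""):
    """Return the paths of Figure/Formula nodes that lack alt text."""
    here = f"{path}/{node['tag']}"
    found = []
    if node["tag"] in NEEDS_ALT and not node.get("alt", "").strip():
        found.append(here)
    for kid in node.get("kids", []):
        found.extend(missing_alt(kid, here))
    return found


# Hypothetical structure tree for illustration.
tree = {
    "tag": "Document",
    "kids": [
        {"tag": "P"},
        {"tag": "Figure", "alt": "Bar chart of downloads per month"},
        {
            "tag": "Sect",
            "kids": [
                {"tag": "Figure"},  # no alt text at all
                {"tag": "Formula", "alt": "E equals m c squared"},
            ],
        },
    ],
}
print(missing_alt(tree))
```

Purely presentational graphics should not show up in a tree like this at all, since they belong in Artifacts rather than in Figure tags.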

Page headers and footers must be marked up as Pagination artifacts, so they’re not read out repeatedly.

Footnotes and endnotes must be marked up with the Note tag.

All optional content configuration dictionaries (a PDF feature which allows content to be hidden conditionally) must be named.

Any embedded files must also be accessible.

Article threads (which allow multicolumn layouts across pages) must retain proper reading order.

Digital signature form fields must be laid out accessibly.

Non-interactive forms have to be tagged with PDF “PrintField” attributes so they will appear as read-only form fields to AT.

Static XFA-based forms are allowed. Dynamic XFA forms are not.

Secured documents must allow AT access.

Documents should have outlines that reflect the reading order and navigation hierarchy.

Visible annotations must be represented in the right place in the reading order.

Tab order must be defined.

Links must be tagged, and contain an alternate description.

Metadata tags must be properly set for embedded media.

Actions (i.e., scripting) are allowed, but any changes in content or focus must be announced to AT, and scripts cannot set time limits on individual keystrokes.

There are requirements for the implementation of fonts that are well out of scope for an overview, but important for reliable rendering of fonts across operating systems and reader implementations.


So, that’s it. We’re hoping a little transparency regarding PDF/UA helps everyone understand what it does and doesn’t do. In time, we anticipate that PDF/UA will make accessible authoring a more automatic and uniform process across authoring tools, which in turn will make the accessibility of PDFs in the wild a lot better in general.
