by Matt May


Created

June 25, 2013

Creative Commons Attribution 3.0 License

Content in this blog post is licensed under a Creative Commons Attribution 3.0 License. Example code provided is licensed under Adobe’s Creative Commons Plus License.

The PDF/UA (“Universal Accessibility”) specification, or ISO 14289, published by the International Organization for Standardization (ISO) in August of last year, was a big step forward for authors of the tools we use to create and consume PDF content. But what the spec itself does is a little harder to explain, and there’s been a lot of confusion. I even confused myself recently about what PDF/UA does and doesn’t specify. So I thought it might help to summarize the spec in some detail for those who are coming to grips with its place in the world of PDF accessibility.

If the PDF/UA specification could be summed up in one sentence, it might go something like this:

“PDF/UA makes certain that the PDF format isn’t the source of accessibility problems.”

The end result of a document built using the PDF/UA spec is a more reliable, more accessible document that avoids the tricks and traps that PDF can present. PDF authors don’t need to know anything specific about what goes on behind the scenes; the tools themselves are responsible for adding and preserving the accessibility of the PDF. That’s the value of PDF/UA.

That does not mean that a PDF/UA-compliant document will always be perfectly accessible. Poorly built Word documents and other flawed source material will, of course, carry their accessibility flaws into whatever format they’re converted to. No one should claim that PDF/UA conformance means that a given document will pass the Web Content Accessibility Guidelines (WCAG) 2.0, and organizations shouldn’t treat PDF/UA as a WCAG stand-in. But conformance does indicate that the authoring process for a given piece of content retains its accessibility level when it’s output as a PDF.

The PDF/UA specification defines conformance for three different aspects of PDF: content, readers, and assistive technology. The authoring tools are intentionally omitted: only the content they produce matters here, and that can only be measured at the individual document level.

We’ve seen a number of organizations that are specifying PDF/UA-compliant documents, and that’s a worthwhile approach, as far as it goes. The revamped accessibility checker in Adobe Acrobat Pro XI bases its tests on PDF/UA and WCAG criteria, but doesn’t yet fully test PDF/UA compliance.

The PDF/UA document itself is 23 pages, and thanks to ISO’s publishing model, it’s about $90 US to purchase a copy. We’re sensitive to the cost issue here, but in the interest of full disclosure, Adobe doesn’t own this work (it was the product of participants from a number of companies, and its copyright belongs to ISO), so we’re not able to simply republish the spec. To those who are planning to implement PDF/UA in their readers and assistive technologies, $90 is not a barrier (and if you promise to implement PDF/UA in an open-source tool, I’ll buy you a copy myself, though I’m sure I’d get the better end of that deal).

The material is what we in America call “inside baseball”—it’s very technical, requires a solid understanding of PDF internals, and is heavily cross-referenced with ISO 32000-1:2008, the PDF 1.7 specification. If your goal is to author accessible PDFs, this is not the spec for you; you’re better off looking at the PDF Techniques for WCAG 2.0. However, I’ve taken the liberty of summarizing this spec so that everyone has a chance to understand what conformance entails.

The spec starts by listing what PDF/UA does not do: help with converting paper or electronic documents to PDF/UA; give implementation advice for rendering PDF documents; or tell you how to store PDFs or what OS to use. The rest of the front matter lists normative references to WCAG 2.0, PDF 1.7, PDF/A-2, and PDF/X-1a; defines a handful of terms; and sets a namespace for PDF/UA metadata, which goes into every conforming file.

The document then defines conformance at three levels: individual files, PDF readers, and assistive technology.

Conforming files

A conforming file contains features that are valid according to the PDF 1.7 spec, except for features PDF/UA specifically forbids. It has to be marked as a PDF/UA document as described in Section 5, and meet all the requirements in Section 7 below.
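
To make the “marked as a PDF/UA document” requirement a little more concrete, here is a minimal Python sketch that looks for the PDF/UA identifier in an XMP metadata packet. It assumes the identifier is a `part` property in the `http://www.aiim.org/pdfua/ns/id/` namespace (usually prefixed `pdfuaid`); that detail is not spelled out in this post, so check it against your copy of the spec. The function name and the sample packet are made up for illustration, and pulling the XMP packet out of a real PDF is left to whatever PDF library you use.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the PDF/UA identification schema.
PDFUAID_NS = "http://www.aiim.org/pdfua/ns/id/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def pdfua_part(xmp_packet):
    """Return the declared PDF/UA part as a string, or None if absent."""
    root = ET.fromstring(xmp_packet)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # XMP allows a simple property to be written as an attribute...
        value = desc.get(f"{{{PDFUAID_NS}}}part")
        if value is not None:
            return value.strip()
        # ...or as a child element.
        child = desc.find(f"{{{PDFUAID_NS}}}part")
        if child is not None and child.text:
            return child.text.strip()
    return None


# Hypothetical, hand-written packet used only to exercise the function.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfuaid="http://www.aiim.org/pdfua/ns/id/">
      <pdfuaid:part>1</pdfuaid:part>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(pdfua_part(SAMPLE_XMP))
```

Finding the identifier only shows that a file claims to be PDF/UA; it says nothing about whether the file meets the rest of the requirements.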

Conforming reader

A conforming reader must also be a conforming reader according to PDF 1.7. It will support all the tags, attributes and key values specified for accessibility, and respect when optional content is hidden. It will make the logical reading order available. It will allow AT to inspect artifacts, and its interface must itself be accessible, and not interfere with any AT feature.

The spec also describes repair techniques for headings and tables, along with rules for handling optional content, attached and embedded files, digital signatures, actions, metadata, navigation, annotations, forms and media.

Conforming assistive technology

A conforming AT supports all of the features of the content and the reader, and should allow navigation by page labels, document structure, or the outline. It should also let the user override default navigation zoom. (A combined reader and AT, perhaps something like TextHelp’s PDF Aloud, could be both a conforming reader and a conforming AT.)

File format requirements

This is the meat of the spec, and there’s some good advice in here for those of you who are already well-versed in tagging PDFs.

All PDF/UA documents must be tagged PDF. Tags must be semantically appropriate (that is, you can’t just mark everything <p> and be done), and in logical reading order. Artifacts (sometimes referred to as “Background” in Acrobat) must not be tagged. If a PDF does anything non-standard with its tags, those tags have to be remapped to standard PDF tags. Standard tags can’t be overridden.

Content can’t flicker, blink or flash, and it can’t be conveyed solely by color, contrast, formatting, layout or sound. Image-only PDFs may be created, but their content must also be tagged.

The document must have a title, and it must be displayed in the title bar.

Text must be Unicode. The document’s language, and any changes in language, must be declared.
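
The tagging, title and language requirements above mostly come down to a few entries in the PDF's document catalog. The sketch below models the catalog as a plain nested Python dict and reports which of those entries are missing. It assumes the PDF 1.7 key names `MarkInfo`/`Marked`, `Lang` and `ViewerPreferences`/`DisplayDocTitle`; the dict is a stand-in for whatever your PDF library returns, and the function name is hypothetical. It does not check that the title itself is present in the metadata, or that language changes inside the content are declared.

```python
def catalog_problems(catalog):
    """Check a document catalog, modelled as a nested dict, for three basics."""
    problems = []
    # Tagged PDF is signalled by MarkInfo/Marked being true.
    if catalog.get("MarkInfo", {}).get("Marked") is not True:
        problems.append("not marked as tagged PDF")
    # A default language for the whole document lives in Lang.
    if not catalog.get("Lang"):
        problems.append("no default document language")
    # DisplayDocTitle asks the reader to show the title, not the file name.
    if catalog.get("ViewerPreferences", {}).get("DisplayDocTitle") is not True:
        problems.append("title is not set to display in the title bar")
    return problems


# Hypothetical catalog for illustration only.
example = {
    "MarkInfo": {"Marked": True},
    "Lang": "en-US",
    "ViewerPreferences": {},
}
print(catalog_problems(example))
# → ['title is not set to display in the title bar']
```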

Graphics must be marked up with the Figure tag, and must have alt text, unless they’re presentational, in which case they’re Artifacts. Groups of images that represent one thought are to be tagged as a single Figure. Captions that go with figures must be tagged as such.

Headings must be nested sequentially (e.g., H1-H2-H3 is acceptable, but H1-H3 is not). Headings can go as deep as necessary (e.g., H1041 is valid, if you’ve used the first 1040 levels). Generic “H” headings are acceptable, but can’t be used interchangeably with numbered headings.
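
The heading rules are easy to express as a check over the heading tags in reading order. This is a minimal sketch, not a validator: it takes a plain list of tag names (collecting them from a real structure tree is out of scope), the function name is made up, and it assumes that the first numbered heading has to be H1, which is my reading of “nested sequentially” rather than a quote from the spec.

```python
import re

# Matches numbered heading tags such as H1, H2, ... H1041.
_NUMBERED = re.compile(r"H([1-9][0-9]*)")


def heading_problems(tags):
    """Return a list of problems for heading tags given in reading order."""
    problems = []

    # Generic "H" and numbered headings can't be mixed in one document.
    uses_generic = "H" in tags
    uses_numbered = any(_NUMBERED.fullmatch(t) for t in tags)
    if uses_generic and uses_numbered:
        problems.append("generic H mixed with numbered headings")

    previous = 0  # no numbered heading seen yet
    for tag in tags:
        match = _NUMBERED.fullmatch(tag)
        if match is None:
            continue  # generic "H" has no level to compare
        level = int(match.group(1))
        # Going deeper is only allowed one level at a time;
        # going back up any number of levels is fine.
        if level > previous + 1:
            if previous == 0:
                problems.append(f"{tag} appears before any H1")
            else:
                problems.append(f"{tag} skips a level after H{previous}")
        previous = level
    return problems


print(heading_problems(["H1", "H2", "H3"]))  # → []
print(heading_problems(["H1", "H3"]))        # → ['H3 skips a level after H1']
```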

Tables should have headers (“TH” tags) with a Scope attribute.

Lists must be marked up appropriately.

Math equations must be in a Formula tag, with alt text.

Page headers and footers must be marked up as Pagination artifacts, so they’re not read out repeatedly.

Footnotes and endnotes must be marked up with the Note tag.

All optional content configuration dictionaries (a PDF feature which allows content to be hidden conditionally) must be named.

Any embedded files must also be accessible.

Article threads (which allow multicolumn layouts across pages) must retain proper reading order.

Digital signature form fields must be laid out accessibly.

Non-interactive forms have to be tagged with PDF “PrintField” attributes so they will appear as read-only form fields to AT.

Static XFA-based forms are allowed. Dynamic XFA forms are not.

Secured documents must allow AT access.

Documents should have outlines that reflect the reading order and nav hierarchy.

Visible annotations must be represented in the right place in the reading order.

Tab order must be defined.

Links must be tagged, and contain an alternate description.

Metadata tags must be properly set for embedded media.

Actions (i.e., scripting) are allowed, but any changes in content or focus they cause must be announced to AT, and they cannot set time limits on individual keystrokes.

There are requirements for the implementation of fonts that are well out of scope for an overview, but important for reliable rendering of fonts across operating systems and reader implementations.

Conclusion

So, that’s it. We’re hoping a little transparency regarding PDF/UA helps everyone understand what it does and doesn’t do. In time, we anticipate that PDF/UA will make accessible authoring a more automatic and uniform process across authoring tools, which in turn will make the accessibility of PDFs in the wild a lot better in general.

COMMENTS

  • By Mike Moore - 3:46 PM on June 25, 2013  

    Thanks for this Matt. I do have one question.

    “Static XFA-based forms are allowed. Dynamic XFA forms are not”

    Why?

    We have completed a number of Dynamic XFA forms that are accessible via JAWS and ZT with the current release of Adobe Reader on Windows. I am aware that these and static XFA forms are not accessible on iOS devices. Are there other restrictions that we should be aware of?

  • By mattmay - 6:17 PM on June 26, 2013  

    That was the decision of the working group. As it is, there is some accessibility functionality in Reader for XFA, but apparently they didn’t see it as being comprehensive enough. Fortunately, LiveCycle Designer can produce both, so there’s at least one way to output PDF/UA-compliant XFA-based docs.

  • By John Brandt - 3:00 PM on July 3, 2013  

    How much of this, if any, does the Accessibility Checker in Adobe Acrobat Pro XI assess? When can we expect Adobe to come out with a tool to assess and fix files so we can make them PDF/UA compatible?

  • By mattmay - 3:04 PM on July 9, 2013  

    John: We can’t do product announcements for the Acrobat team, no matter how much we might like to. But I hope to have more info to go with the talk I’m giving at the PDF Association Technical Conference next month: http://www.pdfa.org/event/technical-conference-north-america-2013/

  • By Priti Rohra - 2:03 AM on July 22, 2013  

    Many Thanks for this insightful post Matt!

    Are there any sample PDF/UA files available to help understand the standard tags appropriately?

  • By mattmay - 7:45 PM on July 23, 2013  

    Priti: Not sure we have much to add there. ISO 32000 defines the tags, and what we’ve got on PDF tagging, including our Acrobat accessibility guides at http://www.adobe.com/accessibility/products/acrobat/training.html , is our how-to on tagging. PDF/UA itself doesn’t define anything about tagging semantics; it says to follow the semantics that PDF itself defines.

    If we provided a test file, that could actually cloud the issue, because what we’d be testing is the content itself, not the tags. Since the source content could be literally any kind of communication, we can’t say, for all content x, tag it this way. We can only point to the tags that exist to help authors establish those semantics that we offer.

  • By Duff Johnson - 11:14 AM on August 9, 2013  

    Readers of this blog may be interested to know that the PDF Association’s Matterhorn Protocol addresses many questions relating to validating conformance with PDF/UA. More information:

    http://www.pdfa.org/matterhorn-protocol