Author Archive: psorotok

EpubCheck

People who write web browsers know how insanely complex a good browser has to be. The problem is that a lot of design decisions for the web were made in a very lax, ad hoc manner. There are standards, but a lot of content does not obey them. When users encounter content that does not work in their browser, they tend to blame the browser. This leads to more and more “black magic” in browser engines, and to web content that relies more and more on very subtle (and non-standard) browser features. And any subtlety that some web content uses has to stay in the browser engines effectively forever. This hurts content portability and usability.

To avoid the same sort of problems with eBook content, it is important to make sure that the content actually conforms to the standard. This matters especially on mobile devices, where adding special-casing to “fix” broken content might simply be too expensive. The best way to make sure that content conforms is to have a tool that can validate epub files.

The need for such a tool has long been recognized by IDPF members. As we were developing the epub standard, we also worked on a tool for validating epub files. Now this tool (named EpubCheck) is available as an open source project. It is not complete (there are still many checks that we could add), but it is already fairly mature and extremely handy. If you author epub files, you should consider running this tool on your content regularly. Standard content is much less likely to have problems in today’s and future eBook readers, and any problems with fully compliant eBooks are much more likely to get serious attention from developers. If you are a developer, I would like to invite you both to use the EpubCheck code in your development (it is licensed under BSD terms) and to contribute back to the project.
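To give a flavor of the kind of check a validator performs, here is a minimal Python sketch that looks at just one container rule: the archive should start with an uncompressed entry named mimetype whose content is application/epub+zip. This is not EpubCheck code and covers only a tiny fraction of what EpubCheck verifies; the function name container_problems is made up for this illustration.

```python
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def container_problems(epub_file):
    """Return a list of problems found with the 'mimetype' entry.

    epub_file may be a path or a binary file-like object.
    Hypothetical helper for illustration; not part of EpubCheck.
    """
    problems = []
    with zipfile.ZipFile(epub_file) as archive:
        # infolist() follows the central directory order, which normally
        # matches the physical order of entries in the archive.
        entries = archive.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("first archive entry is not 'mimetype'")
            return problems
        first = entries[0]
        if first.compress_type != zipfile.ZIP_STORED:
            problems.append("'mimetype' entry is compressed")
        if archive.read(first) != EPUB_MIMETYPE:
            problems.append("'mimetype' content is not application/epub+zip")
    return problems
```

An empty list means this one rule is satisfied; it says nothing about the rest of the file, which is where a real validator earns its keep.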

Page template (XPGT) file

Bob Russell asked:

What about the .xpgt file? It is possible to deduce most of it, but more complete detail would be good. (like: reducing the top margin on the first page, and interaction of column min-widths and font-size…)

And how will its operation fit with future implementation of the OPS extra CSS properties: display:oeb-page-head; display:oeb-page-foot; and oeb-column-number:[integer]; ?

Better late than never – here are some details.

Publishing DocBook content for Digital Editions

This week I was looking at the DocBook mark-up and experimented with conversion of DocBook content into ePub. There seems to be a very good fit there. I was able to create an XSLT stylesheet to transform a couple of O’Reilly-published books into ePub. Here is a sample: Greg Kroah-Hartman’s “Linux Kernel in a Nutshell” book as ePub (IE sniffs it as zip, but it is really an epub file). Unfortunately, I can only post it with freely distributable clip art, no embedded fonts and low-quality gif images for the illustrations in the book, so it does not look quite as good as the original (but with the right resources it is possible to make it look just as good).

The XSLT stylesheet, a simple bash script that drives it and the free art that I have used can be downloaded here. If you want to run it yourself, this is what you need:

  • Find a UNIX-like environment that provides the bash, xsltproc and zip commands. I have used Cygwin on Windows XP.
  • Unzip the downloaded file into some folder.
  • Copy the source DocBook XML (available at the book’s web site) into that folder; the main book file should be named book.xml.
  • Copy the images into epub/OEBPS/images
  • Run ./epub.sh
  • If everything goes right, the ePub file will be written into book.epub
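The script itself is not reproduced in this post, so the following is only a sketch of what the final packaging step typically has to do, written in Python rather than bash: zip the epub folder so that the mimetype entry comes first and is stored uncompressed. The function name package_epub is hypothetical, and the sketch assumes the epub folder already contains a mimetype file next to the generated content.

```python
import os
import zipfile


def package_epub(source_dir, epub_path):
    """Zip source_dir into epub_path with 'mimetype' stored first.

    Hypothetical helper; assumes source_dir contains a 'mimetype' file.
    """
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The mimetype entry goes first and must not be compressed.
        archive.write(os.path.join(source_dir, "mimetype"), "mimetype",
                      compress_type=zipfile.ZIP_STORED)
        for folder, subdirs, files in os.walk(source_dir):
            subdirs.sort()  # visit folders in a predictable order
            for name in sorted(files):
                full_path = os.path.join(folder, name)
                # Archive names are relative to source_dir and use "/".
                arcname = os.path.relpath(full_path, source_dir)
                arcname = arcname.replace(os.sep, "/")
                if arcname != "mimetype":
                    archive.write(full_path, arcname,
                                  compress_type=zipfile.ZIP_DEFLATED)
```

With the folder layout used above, the call would be package_epub("epub", "book.epub").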

For this particular book you should also add the author’s name and the book identifier to the book source (otherwise they will be missing from the metadata and the title page). Insert the following into book.xml after the title tag:

<bookinfo>
<isbn role="13">9780596510480</isbn>
<author><firstname>Greg</firstname><surname>Kroah-Hartman</surname></author>
</bookinfo>
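If you would rather not edit the file by hand, the same insertion can be sketched in Python with the standard ElementTree module. The helper name add_bookinfo is made up, and the sketch assumes a DocBook 4 style document without XML namespaces; a real book.xml that declares a DOCTYPE and uses custom entities may need a more capable parser.

```python
import xml.etree.ElementTree as ET


def add_bookinfo(book, isbn13, firstname, surname):
    """Insert a <bookinfo> element right after the book's <title>.

    Hypothetical helper; 'book' is the root <book> element.
    """
    bookinfo = ET.Element("bookinfo")
    isbn = ET.SubElement(bookinfo, "isbn", {"role": "13"})
    isbn.text = isbn13
    author = ET.SubElement(bookinfo, "author")
    ET.SubElement(author, "firstname").text = firstname
    ET.SubElement(author, "surname").text = surname

    # Find the first direct <title> child; fall back to the very top.
    titles = [i for i, child in enumerate(book) if child.tag == "title"]
    position = titles[0] + 1 if titles else 0
    book.insert(position, bookinfo)
    return bookinfo


# Tiny stand-in document; the real book.xml is much larger.
book = ET.fromstring(
    "<book><title>Linux Kernel in a Nutshell</title><chapter/></book>")
add_bookinfo(book, "9780596510480", "Greg", "Kroah-Hartman")
```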

I have only tried the XSLT stylesheet with one other DocBook book, which is certainly not enough testing; thus, I don’t think it will work with arbitrary DocBook content. If someone wants to take it from there, it would be fantastic because I won’t have time to polish it. The right thing to do is, of course, to integrate this capability into the existing XSLT framework for DocBook, which is used to publish DocBook content into XSL-FO and PDF.

And thank you, Greg, for writing this book and for making the DocBook sources available!

The point of Digital Editions

Bob Russell writes:

I’d like to get a better feel for how you see Digital Editions from within the company. Not the polished formal or comprehensive and official description, but in your minds what’s the main point of it, what’s the intent, and where would you like to see it progress in the future? What sort of (realistic) hopes do you have for it’s adoption, and how can that be encouraged along?

This is a bit of a loaded question to ask a tech guy like me, of course. Success depends on a much larger set of things than technology alone, and I’d rather concentrate on the technology.

To me the point of Digital Editions is quite clear. There are these things called books and they are not going away, even if they have to morph to fit into the digital world. Reading books today on paper is still much nicer than on an electronic device, but it does not have to be that way. We need these things to happen:

  • we have to agree on an eBook format which is open, easy to author, adapts to the form factor of the reading environment, but is still rich enough to look good;
  • we need software which renders eBooks, so that they are pleasant to read and easy to manage; it also needs to provide new functionality that paper cannot do well (e.g., links, search and annotations – or embedded interactive content);
  • we need handheld devices which are small, easy to read from and have long battery life;
  • we need good authoring tools so that it is easy to create eBooks;
  • we need publishers to treat eBooks as first-class citizens.

I think that we are getting there. PDF would have been a perfect eBook format if it could be reflowed to a small screen without quality loss, but at least it is easy to publish existing content in it. So we had to try again. ePub, I think, as it is now, is quite good (and I think it will evolve – we need to add MathML and perhaps some extra layout and typographical features). We are, as you know, working on the software ;-). We see devices coming (e.g. Sony Reader). The hardest part is to convince publishers that eBooks are the future, but I think we can do it, maybe slowly and case by case, but things are moving there as well.

Bidirectional text and MathML

Ahmed Hindawi pointed out that Digital Editions does not support bidirectional text for ePub, so it is not possible to display documents that use right-to-left writing. It is certainly a missing feature: the IDPF standard for ePub mandates that support. As we internationalize the application, we certainly plan to add it, as well as the other features that are necessary to display Arabic (ligatures and glyph shaping). I should just point out that the fact that Flash 10 will support this does not automatically mean that Digital Editions will get it for free, as the ePub rendering engine in Digital Editions uses a different pipeline for text layout.

Another point that Ahmed made is that there is no support for MathML. In the case of MathML the situation is a bit different from bidirectional text: MathML is not something that the IDPF standard mandates for ePub. It is only possible to include MathML “islands” in XHTML if a fallback image is included as well, and an ePub viewer is free to display that fallback image instead of rendering the MathML. So when including a formula in an ePub document, one has to supply either a bitmap (PNG or GIF) or a vector graphic (SVG) that represents it. Bitmaps for formulas are used on the web quite a bit (e.g. in some Wikipedia articles), but their major drawbacks are that they look bad when printed (or when the font size changes), they don’t reflow when the width becomes small, and they are inaccessible to blind readers. Vector graphics are basically how formulas are represented in PDF. They do not fully solve the accessibility and reflow problems, but they can certainly be made to look as good as MathML. I think that for the time being including a vector graphics image for a formula is the best bet, maybe along with a MathML island.
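As a rough sketch of what such an island with a fallback can look like, the Python snippet below builds the markup using the ops:switch construct from the OPS 2.0 specification: the MathML goes into an ops:case that names the MathML namespace, and the fallback image goes into ops:default. The helper name formula_island and the image path are made up for this example, and whether the switch is allowed at a given spot depends on the surrounding XHTML, so please treat it as an illustration rather than a recipe.

```python
import xml.etree.ElementTree as ET

OPS_NS = "http://www.idpf.org/2007/ops"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def formula_island(mathml_element, fallback_src, alt_text):
    """Wrap a MathML element and a fallback image in an ops:switch.

    Hypothetical helper for illustration only.
    """
    switch = ET.Element(f"{{{OPS_NS}}}switch")
    # Viewers that understand MathML may render this branch.
    case = ET.SubElement(switch, f"{{{OPS_NS}}}case",
                         {"required-namespace": MATHML_NS})
    case.append(mathml_element)
    # Everyone else falls back to the image.
    default = ET.SubElement(switch, f"{{{OPS_NS}}}default")
    ET.SubElement(default, f"{{{XHTML_NS}}}img",
                  {"src": fallback_src, "alt": alt_text})
    return switch


# Example: the formula "x" with an assumed SVG fallback path.
math = ET.fromstring(f'<math xmlns="{MATHML_NS}"><mi>x</mi></math>')
island = formula_island(math, "images/formula1.svg", "x")
markup = ET.tostring(island, encoding="unicode")
```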

Producing ePub Documents from InDesign

Today is the last day for Piotr Kula, who has worked with us as a summer intern. While he himself has to go back to study at Berkeley, we still have a lot of documentation that he wrote while working here. I am going to edit and publish some of these documents on this blog, and here is the first installment, on creating ePub documents with InDesign.

Welcome

Hello – my name is Peter Sorotokin and I, along with other engineers on the Adobe Digital Editions team, am going to use this blog to talk about technical details, tips and tricks relevant to electronic books. There are a lot of things to cover, so please use this entry to ask questions – this way I will know what people are interested in.