Publishing DocBook content for Digital Editions
This week I was looking at the DocBook markup and experimented with converting DocBook content into ePub. There seems to be a very good fit there. I was able to create an XSLT stylesheet to transform a couple of O'Reilly-published books into ePub. Here is a sample: Greg Kroah-Hartman's "Linux Kernel in a Nutshell" book as ePub (IE sniffs it as zip, but it is really an epub file). Unfortunately, I can only post it with freely distributable clip art, no embedded fonts and low-quality GIF images for the illustrations in the book, so it does not look quite as good as the original (but with the right resources it is possible to make it look as good).
The XSLT stylesheet, a simple bash script that drives it and the free art that I have used can be downloaded here. If you want to run it yourself, this is what you need:
- Find a UNIX-like environment that provides the bash, xsltproc and zip commands. I used Cygwin on Windows XP.
- Unzip the downloaded file into some folder.
- Copy the source DocBook XML (available at the book's web site) into that folder; the main book file should be named book.xml.
- Copy the images into epub/OEBPS/images
- Run ./epub.sh
- If everything goes right, the ePub file will be written to book.epub
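The packaging step at the end of this list can also be sketched in Python, for anyone without the zip command. This is not the epub.sh script from the download, only a minimal illustration under assumptions: the function name build_epub is hypothetical, and the epub folder is assumed to hold the unpacked container contents (META-INF, OEBPS and so on). The one non-obvious detail is that the mimetype entry is written first and stored without compression.

```python
import os
import zipfile

MIMETYPE = "application/epub+zip"


def build_epub(src_dir: str, out_path: str) -> None:
    """Zip src_dir into out_path (hypothetical helper, not the posted script).

    out_path should live outside src_dir so the archive does not include itself.
    """
    with zipfile.ZipFile(out_path, "w") as epub:
        # The mimetype entry goes first and is stored, not deflated.
        epub.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                full = os.path.join(root, name)
                rel = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if rel == "mimetype":
                    continue  # already written above
                epub.write(full, rel, compress_type=zipfile.ZIP_DEFLATED)
```

Calling build_epub("epub", "book.epub") from the folder that contains epub/ would then produce book.epub next to it.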
For this particular book you should also add the author's name and the book identifier to the book source (otherwise they will be missing from the metadata and the title page). Insert the following into book.xml after the title tag:
<bookinfo>
  <isbn role="13">9780596510480</isbn>
  <author><firstname>Greg</firstname><surname>Kroah-Hartman</surname></author>
</bookinfo>
I have only tried the XSLT stylesheet with one other DocBook document, which is certainly not enough testing; thus, I don't expect it to work with arbitrary DocBook content. If someone wants to take it from there, it would be fantastic, because I won't have time to polish it. The right thing to do is, of course, to integrate this capability into the existing XSLT framework for DocBook, which is used to publish DocBook content into XSL-FO and PDF.
And thank you, Greg, for writing this book and for making the DocBook sources available!
Comments
I have a question about font embedding. I understand that if I want to embed fonts, they have to be OpenType (TrueType and Type 1 are not officially supported, right?). However, embedding the fonts amounts to providing a copy of the font files within the zip archive file. Anyone would be able to unzip the file and get a copy of the OpenType files. What font foundries would allow such plain and simple embedding of fonts? Does Adobe allow such embedding in non-protected epub files?
I read somewhere that InDesign CS3 embeds the fonts in a way that makes them usable only with the current epub file. Is this true? If so, how does it do that? Is there a way or a tool to allow partial font embedding (removing all the glyphs that are not used in the epub file from the font files)?
These questions would not be relevant if the font foundries allowed the publishers of epub files to include the font files as is, but I doubt they would be willing to allow that.
Posted by: Ahmed Hindawi | August 26, 2007 03:51 AM
That looks good.
But I suppose some non-conformancy points ought to be noted:
* content.opf / manifest is missing titlepage.css
* toc.ncx / head / dtb:uid content is missing
* toc.ncx / head / dtb:depth content is negative (should be '5')
* toc.ncx / all playOrder attributes are zero (should be a sequence starting at 1)
* style.css and titlepage.css are invalid (there is no 'adobe-page-master' property)
* coverpage.xhtml / style / import document is not CSS
(It has been too long since I used XSLT... 1, 2, and 4 seem straightforward changes, but 3 (counting the depth of the tree) seems awkward...)
Posted by: Harrison Ainsworth | August 26, 2007 06:06 AM
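For the playOrder and dtb:depth items in the list above, a post-processing pass outside XSLT may be easier than counting tree depth in the stylesheet. The sketch below is an illustration in Python, not part of the posted stylesheet. The helper name fix_ncx is hypothetical, it numbers navPoint elements in plain document order, and it only updates a dtb:depth meta element that already exists (dtb:uid is left alone, since the identifier has to come from the book).

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"
# Keep the NCX namespace as the default namespace when serializing.
ET.register_namespace("", NCX_NS)


def _q(tag: str) -> str:
    """Return the namespace-qualified name of an NCX element."""
    return f"{{{NCX_NS}}}{tag}"


def _nav_depth(elem) -> int:
    """Deepest nesting of navPoint elements below elem (0 if none)."""
    return max((1 + _nav_depth(c) for c in elem.findall(_q("navPoint"))), default=0)


def fix_ncx(ncx_xml: str) -> str:
    """Renumber playOrder and recompute dtb:depth (hypothetical helper)."""
    root = ET.fromstring(ncx_xml)

    # playOrder: 1, 2, 3, ... in document order.
    for order, point in enumerate(root.iter(_q("navPoint")), start=1):
        point.set("playOrder", str(order))

    # dtb:depth: nesting depth of the navMap tree.
    nav_map = root.find(_q("navMap"))
    depth = _nav_depth(nav_map) if nav_map is not None else 0
    for meta in root.iter(_q("meta")):
        if meta.get("name") == "dtb:depth":
            meta.set("content", str(depth))

    return ET.tostring(root, encoding="unicode")
```

Running the generated toc.ncx through such a pass before zipping would leave the stylesheet itself untouched.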
OK, I was really excited to see an "epub" file created from something other than Adobe® InDesign® that will work in Adobe® DigitalEditions®, and, given the O's are already down by 4 despite Bedard starting, took time to look, but something's odd about that file:
Um, where did this come from in the sample DocBook file?
<item id="template" href="template.xpgt" media-type="application/vnd.adobe-page-template+xml"/>
And how did this file
<ade:template xmlns="http://www.w3.org/1999/xhtml" xmlns:ade="http://ns.adobe.com/2006/ade"
xmlns:fo="http://www.w3.org/1999/XSL/Format" >
Get into the epub document?
There are already a couple of scripts written that'll generate .epub from valid XML, but as I've asked repeatedly: is .xpgt (not required in the OPF standard) required for use in Adobe® DigitalEditions®? I've yet to see a file that lacks the .xpgt actually working in DE.
Posted by: David Moynihan | August 26, 2007 12:06 PM
Ahmed,
Your observations are very good. I will add font embedding issues to my list of topics for future posts.
Harrison,
These are valid points, although I am not sure whether the OPS/OPF specs prohibit unknown properties or non-CSS stylesheets - I don't think this is the case.
David,
template.xpgt is a stylesheet and it came (along with style.css) from the conversion tool. DocBook does not contain any styling information.
An xpgt file is certainly not required for an epub to work in Digital Editions. This file is already on my list of things to cover here.
Posted by: Peter Sorotokin | August 27, 2007 12:41 AM
OPS allows non-CSS stylesheets linked from the head of the HTML, but it also says any CSS used must be conformant. So if it is refactored so the xpgt is linked by the HTML, rather than imported by the CSS, I think that is then cool. And other reader apps just ignore (correctly) any unrecognized stylesheet links, which is good.
(I wrote an epub construction guide recently, so I had to check the details of the specs.)
Posted by: Harrison Ainsworth | August 27, 2007 11:46 AM
Harrison,
Where exactly in the spec are the conformance requirements for CSS that you mention? My interpretation of the spec is that unknown CSS properties and constructs should just be ignored by reading systems that do not understand them, but they do not make the document invalid. CSS contains an elaborate mechanism for ignoring unrecognized content, built exactly for this purpose. Maybe I missed something, though.
Posted by: Peter Sorotokin | August 27, 2007 01:16 PM
What about the requirement that the mimetype file be stored as is?
Posted by: MishaS | August 28, 2007 10:18 AM
Misha,
I am not sure what your question is. The script that I have seems to do that part correctly:
$ dd if=LinuxKernel.epub bs=1 skip=38 count=20 2>/dev/null
application/epub+zip
Posted by: Peter Sorotokin | August 28, 2007 10:36 AM
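The offset 38 in the dd command above is the 30-byte fixed ZIP local file header plus the 8-byte name "mimetype", which assumes that the entry comes first and has no extra field. A slightly more thorough check can be sketched in Python. The function name mimetype_is_stored is hypothetical, and the sketch reads only the first local header rather than validating the whole archive.

```python
import struct

MIMETYPE = b"application/epub+zip"


def mimetype_is_stored(epub_path: str) -> bool:
    """Check that the first ZIP entry is an uncompressed `mimetype` file."""
    with open(epub_path, "rb") as f:
        header = f.read(30)  # fixed part of the local file header
        if len(header) < 30 or header[:4] != b"PK\x03\x04":
            return False
        (method,) = struct.unpack("<H", header[8:10])  # 0 means stored
        name_len, extra_len = struct.unpack("<HH", header[26:30])
        name = f.read(name_len)
        f.seek(extra_len, 1)  # skip the extra field, if any
        payload = f.read(len(MIMETYPE))
    return (
        name == b"mimetype"
        and method == 0
        and extra_len == 0
        and payload == MIMETYPE
    )
```

If the function returns True, the bytes at offset 38 are exactly the ones that the dd command prints.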
Sorry for not being specific: your script does not seem to do anything to ensure that the file is not compressed. That's it. I'm not sure how zip works: it might compress the data depending on the surrounding circumstances :)
I'll check it out and report back if anything is wrong.
Posted by: MishaS | August 28, 2007 12:29 PM
Ah, I understand now. Here is the secret: the zip command (per its man page) will always do what's best in terms of storing vs. compressing. Since it determines that storing is better in this case, it will always store the file. Some other zip utilities have this property as well (e.g. the Windows built-in "compressed folder" handler), while others do not.
Posted by: Peter Sorotokin | August 28, 2007 01:22 PM
Wellllll... I infer: extensions to the XHTML subset are explicitly regulated, but there is no mention of CSS extensions.
CSS itself seems strict: http://www.w3.org/TR/CSS21/conform.html section 'Valid style sheet' says unknown properties (etc.) are invalid.
And with the old default ethic of being conservative in what you provide (and tolerant in what you require), the summation/simplification seemed to be: write valid CSS2.1 OPS subset.
Posted by: Harrison Ainsworth | August 29, 2007 07:38 AM
Harrison,
The CSS2.1 validity criteria were developed specifically for pure CSS styling using the box model, so they are not a very useful yardstick for OPS applications. For instance, using any OPS-specific value for the display property would make a stylesheet just as invalid as using an Adobe-specific property. Even using any SVG-specific property would make it invalid in the CSS2.1 sense (even though SVG was developed by the W3C).
I see your point, though. It is certainly possible to move all Adobe-specific styling into the xpgt file.
Posted by: Peter Sorotokin | August 29, 2007 08:00 AM
Since I don't see any way to post a question, other than as a comment to a blog entry, I'm posting here.
The following style will create a dropcap that is two lines high, pushing both lines to the right, as it should. This displays ok in Firefox 2 and IE 6, with negligible difference. In Digital Editions, this only displays like a large letter on the first line. No dropcap effect. I arrived at these values by experimenting until I got a reasonable result in both Firefox and IE.
span.dropcap
{
  float: left;
  font-size: 3em;
  line-height: 0.7em;
  margin-top: 0em;
  margin-bottom: 0;
}
Some other CSS properties that don't seem to work in Digital Editions are the following:
text-align: justify;
margin-left: auto;
margin-right: auto;
As I mentioned on the TeleRead blog, a list of exactly which XHTML and CSS statements/properties are supported in Digital Editions would be very helpful.
Posted by: Joseph Gray | October 26, 2007 10:54 PM
I'll look at the drop cap code. It's most likely a bug in Digital Editions. All of these properties are supported and should work.
Support for justification is optional per the W3C CSS spec, and support for a margin value of auto is optional per the IDPF OPS spec. We are planning to implement them at some point, but you can never rely on a reading system to support those.
Posted by: Peter Sorotokin | October 27, 2007 11:31 AM
Great thing, but the XSLT ignore tag. Does anyone have an idea how to fix it?
Posted by: Tomas Ulej | December 22, 2007 11:40 AM