Unicode has become the preferred way to represent text in digital form, and for good reason, the single greatest being its broad coverage of our planet’s scripts and languages. All of the major OSes have embraced Unicode. Put simply, if you develop a product that makes use of text data and it doesn’t support Unicode, you’re doing something wrong.
Unicode comes in a variety of representations called encoding forms. The three most basic Unicode encoding forms are UTF-8, UTF-16, and UTF-32. The latter two are also available in explicit little- or big-endian flavors: UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE. These are covered in Chapter 4 of CJKV Information Processing, Second Edition. However, there are times when a bomb-proof way of representing Unicode characters is needed, or when an otherwise ASCII-only web document requires the occasional Unicode character. For these purposes, and in the context of web documents, Numeric Character References (NCRs) have great advantages. One is their human-readability: an NCR such as &#x51FA; explicitly conveys a Unicode code point (here, U+51FA). Another is that the notation uses only ASCII characters, which is what makes it bomb-proof.
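Converting text to NCRs is mechanical. The minimal Python sketch that follows is an illustration, not a production escaper: it leaves ASCII characters alone and replaces everything else with a hexadecimal NCR. Python’s standard html.unescape reverses the conversion, and the built-in "xmlcharrefreplace" error handler produces the decimal form instead.

```python
import html

def to_hex_ncr(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal NCR such as &#x51FA;."""
    return "".join(ch if ord(ch) < 0x80 else f"&#x{ord(ch):X};" for ch in text)

s = "Exit: 出口"
ncr = to_hex_ncr(s)
print(ncr)                                     # Exit: &#x51FA;&#x53E3;
print(html.unescape(ncr) == s)                 # True
print(s.encode("ascii", "xmlcharrefreplace"))  # b'Exit: &#20986;&#21475;'
```

ASCII characters that are significant in HTML, such as & and <, pass through this function unchanged, so they still need their usual escaping.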
When hinting name- or CID-keyed fonts, appropriate hinting parameters are required. Among these parameters are alignment zones, whose purpose is to snap shapes to pixel boundaries. Alignment zones are specified in the required /BlueValues array and in the optional /OtherBlues array.
The required /BlueValues array is specified in the /Private dictionary of name-keyed fonts, and in the /Private dictionary of each FDArray element of CID-keyed fonts. The purpose of this array is to specify alignment zones that are at the baseline or above, such as for the baseline, x-height, and cap-height. The optional /OtherBlues array is used to specify alignment zones that are below the baseline, such as for the descender. This article will demonstrate how the AFDKO stemHist tool can be used to determine appropriate alignment zone values.
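A /BlueValues or /OtherBlues array is a flat list of integers read as bottom/top pairs, one pair per alignment zone. The minimal Python sketch that follows sanity-checks candidate arrays before they go into a /Private dictionary. The limits it enforces (at most 14 /BlueValues and 10 /OtherBlues integers, in ascending order) are taken from the Type 1 font format specification and are worth double-checking there; the sketch also assumes, as is conventional, that the first /BlueValues pair is the baseline zone. The sample values are hypothetical, not stemHist output.

```python
from typing import List, Sequence, Tuple

Zone = Tuple[int, int]

MAX_BLUE_VALUES = 14  # up to 7 zones
MAX_OTHER_BLUES = 10  # up to 5 zones

def to_zones(values: Sequence[int], limit: int, name: str) -> List[Zone]:
    """Validate a flat blue array and split it into (bottom, top) pairs."""
    if len(values) % 2:
        raise ValueError(f"{name}: odd number of values")
    if len(values) > limit:
        raise ValueError(f"{name}: more than {limit} values")
    if any(a > b for a, b in zip(values, values[1:])):
        raise ValueError(f"{name}: values are not in ascending order")
    return list(zip(values[0::2], values[1::2]))

def check_blues(blue_values: Sequence[int],
                other_blues: Sequence[int] = ()) -> Tuple[List[Zone], List[Zone]]:
    blues = to_zones(blue_values, MAX_BLUE_VALUES, "/BlueValues")
    if not blues:
        raise ValueError("/BlueValues is required and must not be empty")
    others = to_zones(other_blues, MAX_OTHER_BLUES, "/OtherBlues")
    baseline_bottom = blues[0][0]  # assumption: first pair is the baseline zone
    for bottom, top in others:
        # /OtherBlues zones are meant to sit below the baseline zone
        if top >= baseline_bottom:
            raise ValueError(f"/OtherBlues zone [{bottom} {top}] is not below the baseline zone")
    return blues, others

# Hypothetical values: baseline, x-height, and cap-height zones, plus a descender zone.
print(check_blues([-12, 0, 478, 490, 700, 712], [-250, -238]))
```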
I am extraordinarily pleased that the upcoming ATypI (Association Typographique Internationale) conference will be held in Hong Kong: ATypI Hong Kong 2012. The dates are October 10th through 14th, 2012, and the theme is between black and white (墨 in Chinese). For font developers who relish the thought of discussing font-related issues and ideas with others in the same industry, the annual ATypI conference represents a unique opportunity. And, given this year’s venue, a larger-than-usual number of CJK font developers are likely to attend, and the number of CJK-related presentations and workshops should be greater than usual.
In any case, I am planning to attend and present at this conference, and very much look forward to meeting other CJK font developers there.
The photo below, which was recently taken by my long-term Adobe colleague Dirk Meyer in Beijing, serves as a not-so-gentle reminder that intersecting outlines can result in very obvious printing errors:
The photo depicts the two ideographs 出口, which together mean exit. The glyphs were obviously designed using components whose outlines necessarily intersect, and such intersections can, under some circumstances (including the one that led to the printing of this signage), result in a negative or reverse fill.
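One common way such a reverse fill can arise, offered here as an assumption rather than a diagnosis of this particular sign, is a rasterizer that applies the even-odd fill rule instead of the nonzero winding rule. The minimal Python sketch that follows models two overlapping square "components" (hypothetical coordinates, both drawn counterclockwise) and tests a point inside their overlap under each rule: the nonzero rule fills it, whereas the even-odd rule leaves a hole.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Contour = List[Point]

def ray_crossings(contour: Contour, p: Point) -> Tuple[int, int]:
    """Return (signed winding, unsigned crossing count) for a ray cast from p toward +x."""
    x, y = p
    winding = crossings = 0
    for i, (x0, y0) in enumerate(contour):
        x1, y1 = contour[(i + 1) % len(contour)]
        if (y0 <= y) != (y1 <= y):  # edge straddles the ray's horizontal line
            xi = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if xi > x:
                crossings += 1
                winding += 1 if y1 > y0 else -1  # upward edges count +1
    return winding, crossings

def is_filled(contours: List[Contour], p: Point, rule: str) -> bool:
    winding = crossings = 0
    for contour in contours:
        w, n = ray_crossings(contour, p)
        winding += w
        crossings += n
    if rule == "nonzero":
        return winding != 0
    if rule == "evenodd":
        return crossings % 2 == 1
    raise ValueError(f"unknown fill rule: {rule}")

# Two overlapping components, both counterclockwise (hypothetical coordinates).
STROKES = [
    [(0, 0), (10, 0), (10, 10), (0, 10)],
    [(5, 5), (15, 5), (15, 15), (5, 15)],
]

for rule in ("nonzero", "evenodd"):
    print(rule, is_filled(STROKES, (7, 7), rule))  # nonzero True, evenodd False
```

Removing the overlaps, so that each contour is a single non-intersecting shape, makes the rendered result independent of which fill rule is applied.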
Thanks to an excellent suggestion from Taichi Kawabata (川幡太一), the 2012-03-02 version of the IVD (Ideographic Variation Database) includes three IVD Code Charts, which were released today. The two earlier versions of the IVD—2007-12-14 and 2010-11-14—included only one IVD Code Chart, named IVD_Charts.pdf.
I spent the better part of last weekend revising Adobe Tech Note #5099, which was originally published in 1998 (14 years ago). If memory serves, I wrote the bulk of the original version of this document while on vacation in Indonesia. It seemed like the thing to do at the time.
[I'd like to preface this article by stating that it was written and contributed by our esteemed colleague, Taro Yamamoto (山本太郎), who manages our Japanese typeface design efforts in our Tokyo office. — KL]
We were very pleased to hear that Morisawa has announced the Morisawa Type Design Competition 2012, to be held this year. This triennial competition was held from 1984 to 2002, so this announcement means that it has been revived. Entries are accepted in two type design categories: Kanji and Latin.
When using AFDKO to develop CID-keyed OpenType/CFF fonts, the most important CMap resources are the UTF-32 ones, for the following reasons:
- Unicode has become the de facto character encoding for today’s OSes and applications.
- When the font includes mappings outside the BMP (Basic Multilingual Plane), a Format 12 (UTF-32) ‘cmap’ subtable is included. When a font includes only BMP mappings, the AFDKO makeotf tool is smart enough not to create a Format 12 ‘cmap’ subtable, and instead creates only a Format 4 (BMP-only UTF-16) one.
- UTF-32 is arguably the most human-readable of the Unicode Encoding Forms, because its big-endian hexadecimal representation is simply the Unicode Scalar Value without the “U+” prefix and zero-padded to eight digits.
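To see the difference in readability, the following minimal Python sketch prints the big-endian code units of one character in each encoding form using Python’s standard codecs. U+2000B, a CJK Unified Ideograph outside the BMP, serves as the example, because only its UTF-32 form reads directly as the scalar value.

```python
def encoding_forms(ch: str) -> dict:
    """Hex code units of one character in UTF-8, UTF-16BE, and UTF-32BE."""
    return {
        "scalar": f"U+{ord(ch):04X}",
        "UTF-8": ch.encode("utf-8").hex().upper(),
        "UTF-16BE": ch.encode("utf-16-be").hex().upper(),
        "UTF-32BE": ch.encode("utf-32-be").hex().upper(),
    }

for form, units in encoding_forms("\U0002000B").items():
    print(f"{form:9} {units}")
```

For U+2000B this yields F0A0808B (UTF-8), D840DC0B (UTF-16BE, a surrogate pair), and 0002000B (UTF-32BE).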
The AFDKO makeotf tool is used to build a fully functional font, and a UTF-32 CMap resource is specified as the argument of its “-ch” command-line option.
Unicode Version 6.1 includes a total of 1,002 CJK Compatibility Ideographs. The February 22, 2012 CJK Type Blog article includes a table that details the Unicode version in which they were added.
Of the 1,002 CJK Compatibility Ideographs that are in Unicode, 89 have Japanese sources: JIS X 0213:2004, Jinmei-yō Kanji (人名用漢字), IBM, and ARIB STD-B24. Some of them have multiple Japanese sources, and while most of these are intended to use the same glyph regardless of the source, a very small number of them (three, to be precise) do not.
As the IVD Registrar, I am very pleased to announce that a new version of the IVD (Ideographic Variation Database) was released on March 2nd, 2012. It incorporates the results of PRI 183 and PRI 187.
I am pleased to announce that Adobe once again has the privilege and honor of being a Gold Sponsor of the Internationalization & Unicode Conference, the 36th iteration of which will take place in October of this year.
For those who have had the opportunity to attend this conference in the past, I am preaching to the choir when I state that much of the benefit of attending comes not from listening to the scheduled sessions, though they have incredible value, but rather from the opportunity to have face-to-face discussions with others in the industry.
If you plan to attend IUC36, I hope to see you there!
On this 29th day of February in the year 2012, which is a leap year, I decided that it would be a good idea to whip up a tool (written in Perl, of course) that enumerates the FDArray elements in a CID-keyed font, by name and index, and lists the CIDs that are assigned to each one. It also reports the ROS (Registry, Ordering, and Supplement), which is useful. This tool, called fdarray-check.pl, makes use of the AFDKO tx tool, and massages its output into a form that is much more human-readable, and which can be repurposed.
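The heart of such a report is grouping CIDs by FDArray element and collapsing them into ranges. What follows is a minimal Python sketch of that step only. It is not fdarray-check.pl, it does not parse tx output, and it assumes that the CID-to-FDArray-index mapping has already been extracted; the FDArray names and CIDs in the example are hypothetical.

```python
from typing import Dict, Iterable, List

def compress_ranges(cids: Iterable[int]) -> str:
    """Collapse CIDs into comma-separated ranges, e.g. [1, 2, 3, 7] -> '1-3,7'."""
    ordered = sorted(set(cids))
    if not ordered:
        return ""
    parts: List[str] = []
    start = prev = ordered[0]
    for cid in ordered[1:]:
        if cid == prev + 1:
            prev = cid  # still contiguous; extend the current range
            continue
        parts.append(f"{start}-{prev}" if start != prev else str(start))
        start = prev = cid
    parts.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(parts)

def report(fd_names: List[str], cid_to_fd: Dict[int, int]) -> List[str]:
    """One tab-separated line per FDArray element: index, name, CID ranges."""
    by_fd: Dict[int, List[int]] = {i: [] for i in range(len(fd_names))}
    for cid, fd in cid_to_fd.items():
        by_fd[fd].append(cid)
    return [f"{i}\t{name}\t{compress_ranges(by_fd[i])}"
            for i, name in enumerate(fd_names)]

# Hypothetical FDArray names and CID assignments.
for line in report(["Alphabetic", "Kanji"], {0: 0, 1: 0, 2: 0, 3: 1, 5: 1}):
    print(line)
```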
As I detailed in the February 24, 2012 CJK Type Blog post, the “first one wins” principle is useful when employing the AFDKO mergeFonts tool for replacing one or more glyphs in CIDFont resources. This comes at the expense of changing the FDArray indexes, at least for the example that was used. I noted that this is not a particularly important issue, but I felt that some clarification was necessary, hence the topic of today’s CJK Type Blog post.
What matters is the assignment of CIDs to FDArray elements (by name), along with their associated hinting parameters and other attributes, and these are unchanged even if the FDArray indexes change. The “first one wins” principle comes into play when two or more CIDFont resources include a glyph for the same CID: the glyph used in the resulting CIDFont resource is the first one encountered, as determined by the order of the merge fonts on the command line. However, there are very useful command-line options that allow one to exclude (or include) CIDs so that the FDArray indexes can be preserved, if that is important to you.
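The “first one wins” behavior, along with per-source exclusion, can be modeled in a few lines. This minimal Python sketch is a hypothetical model, not mergeFonts itself: each source is a (name, {CID: glyph}) pair listed in command-line order, and the optional exclude sets merely stand in for the real command-line options, whose names and syntax are not reproduced here.

```python
from typing import Dict, List, Optional, Set, Tuple

Source = Tuple[str, Dict[int, str]]  # (font name, {CID: glyph data}), hypothetical

def merge_first_wins(
    sources: List[Source],
    exclude: Optional[Dict[str, Set[int]]] = None,
) -> Dict[int, Tuple[str, str]]:
    """Merge sources in order; the first source that supplies a CID wins."""
    exclude = exclude or {}
    merged: Dict[int, Tuple[str, str]] = {}
    for name, glyphs in sources:
        skipped = exclude.get(name, set())
        for cid, glyph in glyphs.items():
            if cid in skipped or cid in merged:
                continue  # excluded, or already supplied by an earlier source
            merged[cid] = (name, glyph)
    return merged

fixes = ("fixes", {100: "fixed glyph"})
base = ("base", {99: "glyph 99", 100: "old glyph", 101: "glyph 101"})

result = merge_first_wins([fixes, base])
print(result[100])  # ('fixes', 'fixed glyph')
```

Listing the font with the corrected glyphs first is what makes its version of CID 100 win; reversing the order, or excluding CID 100 from the base font, changes the outcome accordingly.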
The process of building a CIDFont resource, which serves as the source file for the ‘CFF’ table of a CID-keyed OpenType/CFF font, usually entails “rolling up” or combining two or more name-keyed fonts into a larger CID-keyed one. Depending on which tools are used to build the CIDFont resource, fixing glyphs can become a cumbersome or time-consuming task. First, you need to map the CID to a glyph name in the name-keyed source fonts, and if you are fixing multiple glyphs, you may need to modify more than one name-keyed source font. Many font developers are not aware that some AFDKO tools, such as tx, mergeFonts, sfntedit, and autohint, can simplify this process, if used appropriately.
Unicode Version 6.1 was released on January 31, 2012, and now includes 74,617 CJK Unified Ideographs, along with 1,002 CJK Compatibility Ideographs. A total of 732 characters were added, bringing the standard to a staggering 110,116 characters.
Speaking of staggering, as Unicode grows, it becomes more important to keep track of what character is encoded where, and sometimes it is useful to know when a character was encoded. For this purpose, the DerivedAge.txt datafile is an incredibly useful resource.
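DerivedAge.txt is a plain-text file in which each data line gives a code point or range, a semicolon, and the Unicode version in which those code points were assigned, followed by a comment. A minimal Python sketch for loading it and looking up a code point might look like this. The two data lines in SAMPLE merely illustrate the file’s format; a real program would read the full file from the Unicode Character Database.

```python
import re
from typing import List, Optional, Tuple

# Data lines look like "4E00..9FA5    ; 1.1 # ..." or "9FCC          ; 6.1 # ...".
DATA_LINE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\d+\.\d+)")

def parse_derived_age(text: str) -> List[Tuple[int, int, str]]:
    """Return (first, last, version) tuples, skipping comments and blank lines."""
    ranges = []
    for line in text.splitlines():
        m = DATA_LINE.match(line)
        if not m:
            continue
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        ranges.append((first, last, m.group(3)))
    return ranges

def age_of(cp: int, ranges: List[Tuple[int, int, str]]) -> Optional[str]:
    """Version in which cp was assigned, or None if it is not listed."""
    for first, last, version in ranges:
        if first <= cp <= last:
            return version
    return None

SAMPLE = """\
# DerivedAge.txt format illustration
4E00..9FA5    ; 1.1 #[20902] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FA5
9FCC          ; 6.1 #       CJK UNIFIED IDEOGRAPH-9FCC
"""

ranges = parse_derived_age(SAMPLE)
print(age_of(0x9FCC, ranges))  # 6.1
```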
In terms of CJK Unified Ideographs and CJK Compatibility Ideographs, I spent part of the morning assembling a single-page PDF file that encapsulates many important details of their history. I hope that readers of this blog find it to be useful.
We have made AFDKO (Adobe Font Development Kit for OpenType) available for Mac OS X and Windows, but we also realize that these are not the only OSes that font developers prefer to use. In an effort to make AFDKO more appealing to more font developers, and more broadly available, we’d like to gauge the interest in supporting additional OSes, particularly Linux and other Unix-like OSes (mainly because AFDKO tools are, for the most part, batch- and command-line driven).
…Adobe were to host a font-development workshop in Japan, with a focus on leveraging specific AFDKO tools to simplify the effort needed to develop OpenType Japanese fonts? Tools such as tx, mergeFonts, rotateFont, autohint, and stemHist immediately come to mind. While there are currently no concrete plans in place, if there were to be sufficient demand for such an event, along with suggestions for specific topics to be covered, a tentative agenda could be produced.
Until such an event is scheduled and actually takes place, Adobe Tech Note #5900 (AFDKO Version 2.0 Tutorial: mergeFonts, rotateFont & autohint), which includes a Japanese translation, should prove to be useful. This document is included in AFDKO as part of its documentation, but its link is provided above for convenience.
I encourage anyone with an interest in attending such a workshop, to be held in Japan, to post comments that include suggestions for topics to be covered.
One of the benefits of OpenType/CFF, whether you’re building name- or CID-keyed fonts, is that the ‘CFF’ table can be subroutinized. The AFDKO makeotf tool can apply subroutinization when building OpenType/CFF fonts, and the tx tool can do so as well by using its “+S” option.
For the longest time I have felt that the names used for many of our CMap resources deserve some amount of explanation. I see these names written in books from time to time, and it usually gives me a chuckle, mainly because I am the one responsible for coining many of them. This post is an opportunity for me to provide (some) definitive answers, along with some history. Of course, if this post raises more questions, please submit a comment, and I will make an honest effort to provide a timely answer.
In general, and with few exceptions, a CMap resource name is composed of a character set name, an encoding name, and a writing direction. For the most part, it is the character set names that deserve some explanation, because the encoding and writing direction names are fairly straightforward. Also, almost every CMap resource name that I mention has a corresponding vertical CMap resource.
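As a rough illustration of that naming pattern, the following minimal Python sketch splits a CMap resource name into its three components and derives the name of its vertical counterpart. It handles only the general three-part case; names that don’t fit (the exceptions) simply yield None, and it does not check whether the vertical CMap resource actually exists.

```python
from typing import Optional, Tuple

def parse_cmap_name(name: str) -> Optional[Tuple[str, str, str]]:
    """Split 'Charset-Encoding-Direction' into its parts, or return None."""
    parts = name.split("-")
    if len(parts) != 3 or parts[2] not in ("H", "V"):
        return None  # one of the exceptions to the general pattern
    charset, encoding, direction = parts
    return charset, encoding, direction

def vertical_counterpart(name: str) -> Optional[str]:
    """Return the vertical CMap name for a horizontal one, e.g. ...-H -> ...-V."""
    parsed = parse_cmap_name(name)
    if parsed is None or parsed[2] != "H":
        return None
    return f"{parsed[0]}-{parsed[1]}-V"

print(parse_cmap_name("UniJIS2004-UTF32-H"))  # ('UniJIS2004', 'UTF32', 'H')
print(vertical_counterpart("90ms-RKSJ-H"))     # 90ms-RKSJ-V
```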
Unicode Version 6.1 was released today (January 31, 2012). This release triggered an update to the Unicode CMap resources for Adobe-Japan1-6 and Adobe-Korea1-2. The updated CMap resources are now available at the CMap Resources open source project that is hosted at Open @ Adobe. Details have been posted.
Given that Unicode has become the de facto encoding for digital text in modern environments, I encourage readers of this blog to explore for themselves what is new in Unicode Version 6.1.