Posts in Category "Essay"

Towards Breaking The 64K Glyph Barrier…

In the April 20, 2012 CJK Type Blog article, I wrote about the publishing of ISO/IEC 14496-28:2012 (Composite Font Representation), which provides a venue for breaking the 64K glyph barrier that is inherent in all sfnt-based font formats, including name- and CID-keyed PostScript fonts. If the total number of glyphs in the component fonts that are referenced by a CFR object exceeds 64K, that would constitute breaking the 64K glyph barrier.
Continue reading…

Making “Character Codes” Look Better

In my work, I need to deal with character codes on a regular basis, such as Unicode scalar values and hexadecimal values for legacy encodings. This includes writing documents that include them. For most purposes, especially when used in tables, tabular figures work best because they are monospaced. Of course, one could simply choose to use a monospaced font. But, unless a different font is actually desired for character codes, using the same typeface design is usually preferred, because it better matches the surrounding text. The issue is that very few, if any, fonts include tabular glyphs that support hexadecimal notation, specifically referring to ‘A’ through ‘F’ (or ‘a’ through ‘f’ for lowercase). Luckily, I was able to solve this particular dilemma.
Continue reading…

Never Say Never

In the realm of CJK Unified Ideographs, there is always talk about no more characters to encode, or that any new characters are simply unifiable variants. This is, in large part, merely wishful thinking.

In my experience, there are three important words to embrace: Never Say Never.
Continue reading…

The AFDKO ‘tx’ Tool

Among the many excellent and powerful tools included in AFDKO (Adobe Font Development Kit for OpenType) is one with a two-letter name: tx. Although it has the shortest name, it is arguably one of the most powerful AFDKO tools.

The tx tool is best thought of as a multi-purpose font-file–manipulation tool. For those who don’t leverage this tool in their font development activities, I strongly encourage you to explore its capabilities, which is best done by perusing its built-in help and by experimenting.
Continue reading…

The All-Important Macron

There are three systems for transliterating Japanese text using Latin characters. Of these, the Hepburn system (ヘボン式 hebon shiki) is the most commonly used one, and differs in one important way: long vowels are represented with a macron (U+00AF MACRON or U+0304 COMBINING MACRON) diacritic. Almost all signage in Japan that includes transliterated text, such as in train and subway stations, uses the Hepburn system. However, if we look back to the 1990s and earlier, it was not common to include glyphs for macroned vowels in fonts, whether they were for Latin or Japanese use.

The two other systems, the Kunrei system (訓令式 kunrei shiki) and the Nippon system (日本式 nippon shiki), represent long vowels with a circumflex (U+005E CIRCUMFLEX ACCENT or U+0302 COMBINING CIRCUMFLEX ACCENT) diacritic. It was common for Latin fonts to include glyphs for circumflexed vowels, meaning U+00C2/U+00E2 (Ââ), U+00CA/U+00EA (Êê), U+00CE/U+00EE (Îî), U+00D4/U+00F4 (Ôô), and U+00DB/U+00FB (Ûû), by virtue of being included in ISO/IEC 8859-1 (aka Latin 1). However, due to limitations of Shift-JIS encoding, even Japanese fonts did not include glyphs for these characters.
Continue reading…

ISO/IEC 14496-28:2012 Published

Born from the conclusion that OpenType’s 64K glyph barrier cannot be broken in the context of the format itself, ISO/IEC 14496-28:2012 (Composite Font Representation) was developed, and was subsequently published three days ago, on April 17, 2012, as a new ISO standard. As described in the January 26, 2012 CJK Type Blog article, CID-keyed fonts can include a maximum of 65,535 glyphs (CIDs 0 through 65534). Considering that Unicode Version 6.1 includes over 100K characters, approximately 75K of which are CJK Unified Ideographs, it becomes immediately apparent that a single font resource cannot support all of Unicode, let alone all of the characters for a single script (referring to CJK Unified Ideographs).
Continue reading…

Adobe-Japan1-6 Radical/Stroke Database

I spent approximately two weeks in August of 2004 developing a radical/stroke database for the 14,664 kanji in Adobe-Japan1-6 (CIDs 656, 1125–7477, 7633–7886, 7961–8004, 8266, 8267, 8284, 8285, 8359–8717, 13320–15443, 16779–20316, and 21071–23057), which is available as a tab-delimited text file that is keyed by Adobe-Japan1-6 CIDs, and as a PDF file that is keyed by indexing radical, then by the number of strokes of the indexing radical instance, followed by the number of remaining strokes, and finally by Adobe-Japan1-6 CID.
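As a quick sanity check on the figures above, the following sketch adds up the CID ranges listed in the paragraph and confirms that they cover 14,664 CIDs. The names KANJI_CID_RANGES, count_cids, and is_kanji_cid are hypothetical and chosen only for illustration; they are not taken from the database files themselves.

```python
# CID ranges as listed in the paragraph above (both ends inclusive).
KANJI_CID_RANGES = [
    (656, 656),
    (1125, 7477),
    (7633, 7886),
    (7961, 8004),
    (8266, 8267),
    (8284, 8285),
    (8359, 8717),
    (13320, 15443),
    (16779, 20316),
    (21071, 23057),
]


def count_cids(ranges):
    """Return the number of CIDs covered by a list of inclusive ranges."""
    return sum(last - first + 1 for first, last in ranges)


def is_kanji_cid(cid, ranges=KANJI_CID_RANGES):
    """Return True if the CID falls inside one of the listed ranges."""
    return any(first <= cid <= last for first, last in ranges)


print(count_cids(KANJI_CID_RANGES))  # 14664
```

The same range table can be used to filter a tab-delimited file keyed by CID, for example by skipping any row whose CID fails is_kanji_cid.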
Continue reading…


When working with OpenType/CFF fonts, particularly those that are CID-keyed, CIDs (Character IDs) and GIDs (Glyph IDs) are often referenced as ways to uniquely identify glyphs in a font resource. But, how are CIDs and GIDs different, and perhaps more importantly, under what circumstances are they different, or the same? These are good questions, and the answers can be found in today’s article.
Continue reading…

Advantages of Numeric Character References

Unicode has become the preferred way in which to represent text in digital form, and for good reason. Its broad coverage of our planet’s scripts and languages is the single greatest reason why this has happened. All of the major OSes have embraced Unicode. In other words, if you develop a product that makes use of text data, and if it doesn’t support Unicode, you’re doing something wrong.

Unicode comes in a variety of representations called encoding forms. The three most basic Unicode encoding forms are UTF-8, UTF-16, and UTF-32. The latter two are also available in explicit little- or big-endian flavors: UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE. These are covered in Chapter 4 of CJK Information Processing, Second Edition. But, there are times when a bomb-proof way of representing Unicode characters is needed, or when an otherwise ASCII-only web document requires the occasional non-ASCII character. For these purposes, and in the context of web documents, Numeric Character References (aka NCRs) have great advantages. One advantage is their human readability, in that each one conveys an explicit Unicode code point. Another advantage is that only ASCII characters are used for this notation, which is its bomb-proof aspect.
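To make the notation concrete, here is a minimal sketch that converts non-ASCII characters to NCRs and back. The function names to_ncr and from_ncr are hypothetical, and the sample string is only an illustration; the sketch assumes well-formed input and does not handle named character references.

```python
import re


def to_ncr(text, hexadecimal=True):
    """Replace every non-ASCII character with a numeric character reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)             # ASCII passes through unchanged
        elif hexadecimal:
            out.append(f"&#x{cp:X};")  # hexadecimal form, e.g. &#x6F22;
        else:
            out.append(f"&#{cp};")     # decimal form, e.g. &#28450;
    return "".join(out)


# Matches both the hexadecimal (&#x...;) and decimal (&#...;) forms.
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def from_ncr(text):
    """Replace every numeric character reference with its character."""
    def repl(match):
        hex_digits, dec_digits = match.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(cp)
    return _NCR.sub(repl, text)


sample = "kanji: \u6F22\u5B57, Ext B: \U00020000"
encoded = to_ncr(sample)
print(encoded)  # kanji: &#x6F22;&#x5B57;, Ext B: &#x20000;
```

The encoded string contains only ASCII, and the code point of each character can be read directly from the hexadecimal form. Python's standard library can also produce the decimal form via text.encode("ascii", "xmlcharrefreplace").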
Continue reading…

ATypI Hong Kong 2012

I am extraordinarily pleased that the upcoming ATypI (Association Typographique Internationale) conference will be held in Hong Kong: ATypI Hong Kong 2012. The dates are October 10th through the 14th, 2012, and the theme is between black and white (墨 in Chinese). For font developers who relish the thought of discussing font-related issues and ideas with others in the same industry, the annual ATypI conference represents a unique opportunity. And, given this year’s venue, a larger-than-usual number of CJK font developers are likely to attend, and the number of CJK-related presentations and workshops should be greater than usual.

In any case, I am planning to attend and present at this conference, and very much look forward to meeting other CJK font developers there.

Not One, But Three, IVD Code Charts

Thanks to an excellent suggestion from Taichi Kawabata (川幡太一), the 2012-03-02 version of the IVD (Ideographic Variation Database) includes three IVD Code Charts, which were released today. The two earlier versions of the IVD—2007-12-14 and 2010-11-14—included only one IVD Code Chart, named IVD_Charts.pdf.
Continue reading…

Morisawa Type Design Competition 2012

[I’d like to preface this article by stating that it was written and contributed by our esteemed colleague, Taro Yamamoto (山本太郎), who manages our Japanese typeface design efforts in our Tokyo office. — KL]

We were very pleased to hear the news that Morisawa announced the Morisawa Type Design Competition 2012 to be held this year. This triennial competition was held from 1984 to 2002, and this announcement means that they have reintroduced it. The type design categories for entries are Kanji and Latin.
Continue reading…

CJK Compatibility Ideographs

Unicode Version 6.1 includes a total of 1,002 CJK Compatibility Ideographs. The February 22, 2012 CJK Type Blog article includes a table that provides the details in terms of when they were added to Unicode, version-wise.

Of the 1,002 CJK Compatibility Ideographs that are in Unicode, 89 have Japanese sources. The Japanese sources are JIS X 0213:2004, Jinmei-yō Kanji (人名用漢字), IBM, and ARIB STD-B24. In addition, some of them have multiple Japanese sources, and while most of them are intended to use the same glyph regardless of the source, a very small number of them—three to be precise—do not.
Continue reading…


I am pleased to announce that Adobe once again has the privilege and honor of being a Gold Sponsor of the Internationalization & Unicode Conference, the 36th iteration of which will take place in October of this year.

For those who have had the opportunity to attend this conference in the past, I am preaching to the choir when I state that much of the benefit of attending is not from listening to the scheduled sessions—though they have incredible value—but rather that there is an opportunity to have face-to-face discussions with others in the industry.

If you plan to attend IUC36, I hope to see you there!

CJK Unified/Compatibility Ideographs in Unicode Version 6.1

Unicode Version 6.1 was released on January 31, 2012, and now includes 74,617 CJK Unified Ideographs, along with 1,002 CJK Compatibility Ideographs. 732 characters were added, and there are now a staggering 110,116 characters in the standard.

Speaking of staggering, as Unicode grows, it becomes more important to keep track of what character is encoded where, and sometimes it is useful to know when a character was encoded. For this purpose, the DerivedAge.txt datafile is an incredibly useful resource.
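As a rough illustration of how the datafile can be put to use, the sketch below parses a few lines written in the DerivedAge.txt layout (a code point or range, a semicolon, the version, and an optional comment after "#") and looks up the version in which a code point was first encoded. The SAMPLE text is a hand-written excerpt for demonstration, not the complete file, and the function names parse_derived_age and age_of are hypothetical.

```python
# Illustrative excerpt in the DerivedAge.txt layout (not the full file).
SAMPLE = """\
# Comment lines and blank lines are ignored.

3400..4DB5    ; 3.0
4E00..9FA5    ; 1.1
9FCC          ; 6.1 # CJK UNIFIED IDEOGRAPH-9FCC
"""


def parse_derived_age(text):
    """Return a list of (first, last, age) tuples, one per data line."""
    entries = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        code_field, age = (field.strip() for field in data.split(";"))
        first, _, last = code_field.partition("..")
        # A single code point has no "..", so it forms a one-element range.
        entries.append((int(first, 16), int(last or first, 16), age))
    return entries


def age_of(code_point, entries):
    """Return the version string for a code point, or None if not listed."""
    for first, last, age in entries:
        if first <= code_point <= last:
            return age
    return None


entries = parse_derived_age(SAMPLE)
print(age_of(0x4E00, entries))  # 1.1
print(age_of(0x9FCC, entries))  # 6.1
```

To work with the real file, read its contents into a string and pass that to parse_derived_age in place of SAMPLE.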

In terms of CJK Unified Ideographs and CJK Compatibility Ideographs, I spent part of the morning assembling a single-page PDF file that encapsulates many important details of their history. I hope that readers of this blog find it to be useful.

CMap Resource Names Explained

For the longest time I have felt that the names used for many of our CMap resources deserve some amount of explanation. I see these names written in books from time to time, and it usually gives me a chuckle, mainly because I am the one responsible for coining many of them. This post is an opportunity for me to provide (some) definitive answers, along with some history. Of course, if this post raises more questions, please submit a comment, and I will make an honest effort to provide a timely answer.

In general, and with few exceptions, a CMap resource name is composed of a character set name, an encoding name, and a writing direction. For the most part, it is the character set names that deserve some explanation, because the encoding and writing direction names are fairly straightforward. Also, whenever I mention a CMap resource name, it almost always has a corresponding vertical CMap resource.
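The three-part structure described above can be sketched as a naive split on hyphens. This is only an illustration under the assumption that the name follows the general pattern; the names CMapName, split_cmap_name, and vertical_counterpart are hypothetical, and names that do not fit the pattern (such as Identity-H) are simply reported as exceptions.

```python
from typing import NamedTuple, Optional


class CMapName(NamedTuple):
    character_set: str
    encoding: str
    direction: str  # "H" (horizontal) or "V" (vertical)


def split_cmap_name(name: str) -> Optional[CMapName]:
    """Split a CMap resource name into its three parts, if it has them."""
    parts = name.split("-")
    if len(parts) != 3 or parts[2] not in ("H", "V"):
        return None  # does not follow the general three-part pattern
    return CMapName(*parts)


def vertical_counterpart(name: str) -> Optional[str]:
    """Return the name of the corresponding vertical CMap resource."""
    parsed = split_cmap_name(name)
    if parsed is None or parsed.direction != "H":
        return None
    return "-".join(parsed._replace(direction="V"))


print(split_cmap_name("UniJIS-UTF16-H"))
print(vertical_counterpart("90ms-RKSJ-H"))  # 90ms-RKSJ-V
print(split_cmap_name("Identity-H"))        # None
```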
Continue reading…

Excruciating details about the Adobe Tech Note #5079 update

I spent the early part of this week updating Adobe Tech Note #5079 (The Adobe-GB1-5 Character Collection). The number of glyphs remained the same (30,284), as did the glyphs themselves. So, why the update? Well, mainly to bring it in line, format-wise, with the other three related Adobe Tech Notes: #5078 (The Adobe-Japan1-6 Character Collection), #5080 (The Adobe-CNS1-6 Character Collection), and #5093 (The Adobe-Korea1-2 Character Collection). The biggest effort was to create its 61-page glyph table. Besides announcing the update, building the glyph table is the substance of this blog post.
Continue reading…

Adobe-Japan1-6 Turns 20 Years Old

The Adobe-Japan1-6 Character Collection, which has become the de facto glyph set for today’s mainstream OpenType Japanese fonts, celebrates its 20th anniversary this year. This glyph set began its life in 1992, as Adobe-Japan1-0 (Supplement 0). Given that I have been at Adobe longer than 20 years, and was involved in the development of this glyph set, I will use this opportunity to detail some of its history, at least as seen through my eyes.
Continue reading…

Genuine Han Unification

I have been attending the Internationalization & Unicode Conference (aka, IUC) every year for the past several years, and I typically deliver a presentation (or two) during the two-day conference proper. I was given the opportunity to present about an intriguing and forward-looking topic at IUC35 last October that I entitled Genuine Han Unification (click on the title to view the presentation slides).
Continue reading…


