Posts in Category "Essay"

From The Archives: JIS2004 CMap Resource History

I was recently asked, indirectly via Twitter, about changes and additions that were made to our JIS2004-savvy CMap resources, specifically UniJIS2004-UTF32-H and UniJISX02132004-UTF32-H. The former also includes UTF-8 (UniJIS2004-UTF8-H) and UTF-16 (UniJIS2004-UTF16-H) versions that are kept in sync with the master UTF-32 version by being automagically generated by the CMap resource compiler (and decompiler), cmap-tool.pl, which I developed years ago.
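
As a rough illustration of why the UTF-8 and UTF-16 versions can be derived mechanically from the UTF-32 master, the following Python sketch re-encodes one code point in all three encoding forms. It does not reflect how cmap-tool.pl works internally; the function name `encoding_forms` is hypothetical, and the sample code point was picked arbitrarily from Plane 2.

```python
def encoding_forms(code_point: int) -> dict[str, str]:
    """Return big-endian hex strings for one Unicode scalar value."""
    ch = chr(code_point)
    return {
        "UTF-32": ch.encode("utf-32-be").hex().upper(),
        "UTF-16": ch.encode("utf-16-be").hex().upper(),
        "UTF-8": ch.encode("utf-8").hex().upper(),
    }


# U+20B9F is an arbitrary example outside the BMP, so the UTF-16
# form is a surrogate pair and the UTF-8 form is four bytes long.
forms = encoding_forms(0x20B9F)
for name, value in forms.items():
    print(f"{name}: <{value}>")
```

Because each form is a pure function of the code point, only the UTF-32 mappings need to be maintained by hand.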

Of course, all of these CMap resources also have vertical versions that use a “V” at the end of their names in lieu of the “H,” but in the context of OpenType font development, the vertical CMap resources are virtually unused and worthless because it is considered much better practice to explicitly define a ‘vert’ GSUB feature for handling vertical substitution. In the absence of an explicit definition, the AFDKO makeotf tool will synthesize a ‘vert’ GSUB feature by using the corresponding vertical CMap resources.

With all that being said, what follows in this article is a complete history of these two CMap resources, which also assigns dates, and sometimes notes, to each version.

Twenty Years Ago…

Twenty years ago this month, in September of 1993, something remarkable happened in my life. My first book, entitled Understanding Japanese Information Processing (日本語情報処理), was published by O’Reilly Media (called O’Reilly & Associates back then). It had a very distinctive cover, which is shown with the two subsequent books:

[Images: UJIP cover, CJKV cover, CJKV2E cover]


URO

The first set of ideographs to be encoded in Unicode (Version 1.1), which are referred to as CJK Unified Ideographs, are also referred to as the URO, which is an abbreviation for Unified Repertoire and Ordering. None of the other extensions are given this label. Extensions A through D have been standardized, and Extension E will soon be standardized. Only Extension A is in the BMP (Basic Multilingual Plane). Extension B and beyond are in Plane 2, which is called the SIP (Supplementary Ideographic Plane). What makes the URO special or unique?

Worlds Apart

I spent the second half of June in Korea (attending IRG 38) and Japan (to present at the Tokyo AFDKO Workshop), and am now spending the first two weeks of July in Hot Springs, South Dakota, on vacation. These places are worlds apart, in terms of location and cultural differences.

Still, I enjoy traveling to these places. Of course, not much happens in terms of font development in South Dakota, unless some crisis arises that requires my attention. There are many interesting places to visit in this area, such as Mount Rushmore (the photo below was taken in August of last year).

I am enjoying this vacation, but I also look forward to returning to work in about two weeks.

ISO/IEC 10646:2012 Published!

ISO/IEC 10646:2012 (Third Edition) was just published. This is the first version of the standard that includes multiple-column Code Charts for Extension B, and for CJK Compatibility Ideographs. Another significant aspect of ISO/IEC 10646:2012 is that it is equivalent to Unicode Version 6.1.

For Adobe, the publishing of this new version of the standard represents a significant milestone, because it means that every Adobe-Japan1-6 kanji is either directly encoded, or is directly associated with a registered IVS in the IVD (Ideographic Variation Database).

Speaking of Unicode Version 6.1, the printed version of the Core Specification is available via POD from Lulu, and at a very attractive price.

Hong Kong or Bust!

I just received good news, in the form of confirmation that both of my ATypI Hong Kong 2012 presentation abstracts were accepted, which means that I will definitely be attending this conference. I alluded to this in the March 30th, 2012 CJK Type Blog article. One of the abstracts is for a 30-minute presentation entitled Kazuraki: Under The Hood, which will immediately follow a 30-minute presentation entitled Kazuraki: Its Art & Design, which will be presented by my colleagues Taro Yamamoto (山本太郎) and Ryoko Nishizuka (西塚涼子). For those who are not aware, Ryoko is the typeface designer of Kazuraki (かづらき), which is the centerpiece of both 30-minute presentations. The other is for a three-hour workshop entitled Manipulating CID-keyed Fonts Using AFDKO Tools, which will be co-presented by my colleague Masataka Hattori (服部正貴).

I am very much looking forward to attending an ATypI conference for the first time, and meeting many people. If you are planning to attend ATypI Hong Kong 2012, please be sure to introduce yourself to me, in case I don’t introduce myself to you first.

CMap Resource Updates & Change Policies

For those font developers who are not aware, the official CMap resource repository for our public ROSes is the CMap Resources open source project at Open @ Adobe, which is hosted by SourceForge. When CMap resources are updated, in addition to providing the updates through this portal, an announcement is made in the CMap Resources Forum.

The UTF-16 and UTF-32 CMap resources were introduced in August of 2001, beginning with Adobe-CNS1-4. Those for Adobe-Korea1-2 and Adobe-Japan2-0 followed in January of 2002, followed by those for Adobe-GB1-4 in June of the same year. The UTF-16 and UTF-32 CMap resources for Adobe-Japan1-5 were not released until November of 2002. From that point, the UCS-2 CMap resources were deprecated, and were no longer updated. Clients that used the UCS-2 CMap resources were encouraged to use the UTF-16 or UTF-32 ones instead. For OpenType font development, in terms of building the Unicode (Format 4 and 12) ‘cmap’ subtables, the UTF-32 CMap resources are recommended.

Adobe-Japan1-6 Unicode Version 6.1 Tables

Years ago, I wrote a Perl script, called unicode-rows.pl, that takes a fully-qualified PostScript name—composed of a CIDFont resource name, two hyphens, and a UTF-32 CMap resource name—then generates a PostScript file that can be distilled into a PDF. The resulting PDF file is a Unicode table, arranged in groups of 256 code points. If the UTF-32 CMap resource includes even a single mapping for a particular group of 256 code points, a page is created.
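
The two mechanical steps described above, splitting the fully-qualified name and deciding which pages to emit, can be sketched in Python as follows. This is not the unicode-rows.pl source; the function names, the sample font name, and the sample code points are assumptions made purely for illustration.

```python
def split_font_name(full_name: str) -> tuple[str, str]:
    """Split 'CIDFontName--CMapName' at the first pair of hyphens."""
    cidfont, sep, cmap = full_name.partition("--")
    if not sep or not cidfont or not cmap:
        raise ValueError(f"expected 'CIDFont--CMap', got {full_name!r}")
    return cidfont, cmap


def pages_needed(code_points) -> list[int]:
    """Return the first code point of every 256-code-point group
    that contains at least one mapped code point."""
    return sorted({cp & ~0xFF for cp in code_points})


# Illustrative inputs only; a real run would read the mappings
# from the UTF-32 CMap resource itself.
cidfont, cmap = split_font_name("KozMinPr6N-Regular--UniJIS2004-UTF32-H")
mapped = [0x0020, 0x00A9, 0x4E00, 0x4E01, 0x20B9F]

for start in pages_needed(mapped):
    print(f"page for U+{start:04X}..U+{start + 0xFF:04X}")
```

Groups with no mappings never enter the set, so no page is produced for them.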

I have prepared examples that are based on the UniJIS2004-UTF32-H and UniJIS-UTF32-H CMap resources.

The Special-Purpose Adobe-Identity-0 ROS

Adobe has thus far released two CID-keyed OpenType/CFF fonts that use the special-purpose Adobe-Identity-0 ROS (“ROS” is an abbreviation for /Registry, /Ordering, and /Supplement, which represent the three /CIDSystemInfo dictionary elements that are present in CIDFont and CMap resources): Kazuraki SP2N L (かづらき SP2N L) and Kenten Generic. The former is a commercial OpenType/CFF font, and the latter is an open source one. I have also developed several Adobe-Identity-0 ROS OpenType/CFF fonts for testing purposes, many of which have been provided in recent CJK Type Blog articles, the most recent of which is the May 9th, 2012 article.

The big question that may be on a font developer’s mind is this: under what circumstances is it appropriate to use the Adobe-Identity-0 ROS?

Towards Breaking The 64K Glyph Barrier…

In the April 20, 2012 CJK Type Blog article, I wrote about the publishing of ISO/IEC 14496-28:2012 (Composite Font Representation), which provides a venue for breaking the 64K glyph barrier that is inherent in all sfnt-based font formats, including name- and CID-keyed PostScript fonts. If the number of glyphs in the combined component fonts that are referenced by a CFR object exceeds 64K, that would constitute breaking the 64K glyph barrier.

Making “Character Codes” Look Better

In my work, I need to deal with character codes on a regular basis, such as Unicode scalar values and hexadecimal values for legacy encodings. This includes writing documents that include them. For most purposes, especially when used in tables, tabular figures work best because they are monospaced. Of course, one could simply choose to use a monospaced font. But, unless a different font is actually desired for character codes, using the same typeface design is usually preferred, because it better matches the surrounding text. The issue is that very few, if any, fonts include tabular glyphs that support hexadecimal notation, specifically referring to ‘A’ through ‘F’ (or ‘a’ through ‘f’ for lowercase). Luckily, I was able to solve this particular dilemma.

Never Say Never

In the realm of CJK Unified Ideographs, there is always talk that there are no more characters to encode, or that any new characters are simply unifiable variants. This is, in large part, merely wishful thinking.

In my experience, there are three important words to embrace: Never Say Never.

The AFDKO ‘tx’ Tool

Among the many excellent and powerful tools included in AFDKO (Adobe Font Development Kit for OpenType) is one with a two-letter name: tx. Although it has the shortest name, it is arguably one of the most powerful AFDKO tools.

The tx tool is best thought of as a multi-purpose font-file–manipulation tool. For those who don’t leverage this tool in their font development activities, I strongly encourage you to explore its capabilities, which is best done by perusing its built-in help and through experimentation.

The All-Important Macron

When transliterating Japanese text using Latin characters, there are three systems or methods for doing so. Of these, the Hepburn system (ヘボン式 hebon shiki) is the most commonly used one, and differs in one important way: long vowels are represented with a macron (U+00AF MACRON or U+0304 COMBINING MACRON) diacritic. Almost all signage in Japan that includes transliterated text, such as in train and subway stations, uses the Hepburn system. However, if we look back to the 1990s and earlier, it was not common to include glyphs for macroned vowels in fonts, whether they were for Latin or Japanese use.

The two other systems, the Kunrei system (訓令式 kunrei shiki) and the Nippon system (日本式 nippon shiki), represent long vowels with a circumflex (U+005E CIRCUMFLEX ACCENT or U+0302 COMBINING CIRCUMFLEX ACCENT) diacritic. It was common for Latin fonts to include glyphs for circumflexed vowels, meaning U+00C2/U+00E2 (Ââ), U+00CA/U+00EA (Êê), U+00CE/U+00EE (Îî), U+00D4/U+00F4 (Ôô), and U+00DB/U+00FB (Ûû), by virtue of being included in ISO/IEC 8859-1 (aka Latin 1). However, due to limitations of Shift-JIS encoding, even Japanese fonts did not include glyphs for these characters.
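
The difference in Latin 1 coverage between the two diacritics can be checked with a short Python sketch. It builds each precomposed vowel from a base letter plus the combining mark named above, then tests whether the result is encodable as ISO/IEC 8859-1. The helper names are hypothetical, and this only demonstrates the code point facts, not any particular font's glyph coverage.

```python
import unicodedata

COMBINING_MACRON = "\u0304"      # used by the Hepburn system
COMBINING_CIRCUMFLEX = "\u0302"  # used by the Kunrei and Nippon systems
VOWELS = "AaEeIiOoUu"


def precomposed(vowels: str, mark: str) -> list[str]:
    """Compose each vowel with the combining mark via NFC normalization."""
    return [unicodedata.normalize("NFC", v + mark) for v in vowels]


def in_latin1(ch: str) -> bool:
    """True if the character can be encoded in ISO/IEC 8859-1."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


circumflexed = precomposed(VOWELS, COMBINING_CIRCUMFLEX)
macroned = precomposed(VOWELS, COMBINING_MACRON)

print("circumflex:", [(c, in_latin1(c)) for c in circumflexed])
print("macron:    ", [(c, in_latin1(c)) for c in macroned])
```

Every circumflexed vowel is in Latin 1, while none of the macroned vowels are.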

ISO/IEC 14496-28:2012 Published

Born from the conclusion that OpenType’s 64K glyph barrier cannot be broken in the context of the format itself, ISO/IEC 14496-28:2012 (Composite Font Representation) was developed, and was subsequently published three days ago, on April 17, 2012, as a new ISO standard. As described in the January 26, 2012 CJK Type Blog article, CID-keyed fonts can include a maximum of 65,535 glyphs (CIDs 0 through 65534). Considering that Unicode Version 6.1 includes over 100K characters, approximately 75K of which are CJK Unified Ideographs, it becomes immediately apparent that a single font resource cannot support all of Unicode, let alone all of the characters for a single script (referring to CJK Unified Ideographs).

Adobe-Japan1-6 Radical/Stroke Database

I spent approximately two weeks in August of 2004 developing a radical/stroke database for the 14,664 kanji in Adobe-Japan1-6 (CIDs 656, 1125–7477, 7633–7886, 7961–8004, 8266, 8267, 8284, 8285, 8359–8717, 13320–15443, 16779–20316, and 21071–23057), which is available as a tab-delimited text file that is keyed by Adobe-Japan1-6 CIDs, and as a PDF file that is keyed by indexing radical, then by the number of strokes of the indexing radical instance, followed by the number of remaining strokes, and finally by Adobe-Japan1-6 CID.
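
As a sanity check on the figure of 14,664 kanji, the CID ranges listed above can be summed with a few lines of Python. The ranges are copied from the text, with the en dashes replaced by ASCII hyphens; the function name `parse_ranges` is hypothetical.

```python
CID_RANGES = (
    "656, 1125-7477, 7633-7886, 7961-8004, 8266, 8267, 8284, 8285, "
    "8359-8717, 13320-15443, 16779-20316, 21071-23057"
)


def parse_ranges(text: str) -> list[tuple[int, int]]:
    """Parse 'a-b' ranges and single values into inclusive (first, last) pairs."""
    ranges = []
    for part in text.split(","):
        first, _, last = part.strip().partition("-")
        ranges.append((int(first), int(last or first)))
    return ranges


ranges = parse_ranges(CID_RANGES)
total = sum(last - first + 1 for first, last in ranges)
print(total)  # 14664
```

The same pairs could also serve as a membership test when validating the CID column of the tab-delimited file.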

CID vs GID

When working with OpenType/CFF fonts, particularly those that are CID-keyed, CIDs (Character IDs) and GIDs (Glyph IDs) are often referenced as ways to uniquely identify glyphs in a font resource. But, how are CIDs and GIDs different, and perhaps more importantly, under what circumstances are they different, or the same? These are good questions, and the answers can be found in today’s article.

Advantages of Numeric Character References

Unicode has become the preferred way in which to represent text in digital form, and for good reason. Its broad coverage of our planet’s scripts and languages is the single greatest reason why this has happened. All of the major OSes have embraced Unicode. In other words, if you develop a product that makes use of text data, and if it doesn’t support Unicode, you’re doing something wrong.

Unicode comes in a variety of representations called encoding forms. The three most basic Unicode encoding forms are UTF-8, UTF-16, and UTF-32. The latter two are also available in explicit little- or big-endian flavors: UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE. These are covered in Chapter 4 of CJK Information Processing, Second Edition. But, there are times when a bomb-proof way of representing Unicode characters is needed, or when an otherwise ASCII-only web document requires the occasional non-ASCII character. For these purposes, and in the context of web documents, Numeric Character References (aka, NCRs) have great advantages. One advantage is their human-readability in terms of conveying an explicit Unicode code point. Another advantage is that only ASCII characters are used for this notation, which is their bomb-proof aspect.
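
A minimal Python sketch of the idea follows: every non-ASCII character is replaced with a hexadecimal NCR, so the result is pure ASCII and each code point is visible in the markup. The function name `to_ncr` is hypothetical, and the sketch deliberately leaves ASCII characters such as the ampersand untouched, which a real HTML pipeline would need to escape separately.

```python
import html


def to_ncr(text: str) -> str:
    """Replace each non-ASCII character with a hexadecimal NCR."""
    return "".join(
        ch if ord(ch) < 0x80 else f"&#x{ord(ch):X};" for ch in text
    )


sample = "かづらき"
encoded = to_ncr(sample)
print(encoded)  # &#x304B;&#x3065;&#x3089;&#x304D;

# The standard library can reverse the transformation.
print(html.unescape(encoded) == sample)  # True
```

The encoded string survives any ASCII-only channel, regardless of which encoding form the surrounding document declares.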

ATypI Hong Kong 2012

I am extraordinarily pleased that the upcoming ATypI (Association Typographique Internationale) conference will be held in Hong Kong: ATypI Hong Kong 2012. The dates are October 10th through the 14th, 2012, and the theme is between black and white (墨 in Chinese). For font developers who relish the thought of discussing font-related issues and ideas with others in the same industry, the annual ATypI conference represents a unique opportunity. And, given this year’s venue, a larger-than-usual number of CJK font developers are likely to attend, and the number of CJK-related presentations and workshops should be greater than usual.

In any case, I am planning to attend and present at this conference, and very much look forward to meeting other CJK font developers there.

Not One, But Three, IVD Code Charts

Thanks to an excellent suggestion from Taichi Kawabata (川幡太一), the 2012-03-02 version of the IVD (Ideographic Variation Database) includes three IVD Code Charts, which were released today. The two earlier versions of the IVD—2007-12-14 and 2010-11-14—included only one IVD Code Chart, named IVD_Charts.pdf.