Although I am a couple of days late, I’d like to use this opportunity to welcome the Year of the Monkey, and to wish a happy Chinese New Year to all of my Chinese friends, colleagues, and blog readers. May this year be safe, prosperous, and enjoyable.
One of the fringe benefits of moving offices—especially when one has accumulated nearly 25 years of font-related material and it is thus not a pain-free exercise—is discovering historical documents, some of which turn out to be true gems. Our team is preparing to move from the Adobe East Tower to the West one, and part of the process is figuring out which material to keep, and which to put into File 13. Anyway, I had recently been looking for a particular presentation that I prepared many years ago, and was fortunate enough to come across it while sifting through my accumulated materials.
For those familiar with typeface design, there is no doubt that the Latin and Latin-like glyphs—including those for Greek and Cyrillic—in Source Han Sans are based on Source Sans Pro. One may also wonder about the half-width Latin glyphs in Source Han Sans and how they compare to those in Source Code Pro. The purpose of this short article is to make these relationships and differences clear, or at least clearer.
While it was not uncommon for early (pre-Unicode) CJK character set standards to include characters for the scripts of other languages or countries, such as the Japanese kana that were included in standards from China and Korea, it was not common for one of these countries to produce an entire standard for a seemingly different language. Enter GB 12052-89 (entitled Korean Character Coded Character Set for Information Interchange, or 信息交换用朝鲜文字编码字符集 in Chinese), which is a GB (PRC) standard that sort of broke this mold.
Our most recent project, Source Han Sans, led me to much closer collaboration with our three-person typeface design and development team in our Tokyo office, which is managed by Taro YAMAMOTO (山本太郎), with Ryoko NISHIZUKA (西塚涼子) as the primary typeface designer, and Masataka HATTORI (服部正貴) serving multiple roles, but mainly typeface design and production. The purpose of this article is to describe this team, with which I have worked for over 20 years on various projects, and its accomplishments from my perspective.
Although this article is not about CJK, its purpose is to describe how I was put onto the CJK path. I studied French in high school, but it was really my studies of Russian, courtesy of the United States Army, that eventually set me on that path. Immediately after graduating from high school in 1983, I entered US Army Basic Training at Fort Leonard Wood, Missouri, which was followed by Interrogator School at Fort Huachuca, Arizona. The third part of my training, which was associated with my MOS (Military Occupational Specialty), Interrogator, was to learn Russian.
Twenty years ago this month, in May of 1994, I successfully defended my PhD dissertation, entitled Prescriptive Kanji Simplification, which concluded my graduate studies at The University of Wisconsin-Madison’s Department of Linguistics. Madison is located approximately 20 miles from where I grew up (Mount Horeb, Wisconsin).
[For those who are interested in reading my own release notes for the Adobe-Japan1-6 UTF-32 CMap resource history, which includes the non-JIS2004 ones, I made them available here on January 20, 2016.]
I was recently asked, indirectly via Twitter, about changes and additions that were made to our JIS2004-savvy CMap resources, specifically UniJIS2004-UTF32-H and UniJISX02132004-UTF32-H. The former also includes UTF-8 (UniJIS2004-UTF8-H) and UTF-16 (UniJIS2004-UTF16-H) versions that are kept in sync with the master UTF-32 version by being automagically generated by the CMap resource compiler (and decompiler), cmap-tool.pl, which I developed years ago.
Of course, all of these CMap resources also have vertical versions that use a “V” at the end of their names in lieu of the “H,” but in the context of OpenType font development, the vertical CMap resources are virtually unused and worthless, because it is considered much better practice to explicitly define a ‘vert’ GSUB feature for handling vertical substitution. In the absence of an explicit definition, the AFDKO makeotf tool will synthesize a ‘vert’ GSUB feature by using the corresponding vertical CMap resources.
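As a minimal sketch of what an explicit definition looks like, here is a ‘vert’ GSUB feature in AFDKO feature-file syntax; the glyph names below are hypothetical, and a real font would enumerate its full set of vertical-alternate glyphs:

```
feature vert {
    # Replace horizontal forms with their vertical alternates
    # (glyph names are illustrative, not from an actual font):
    substitute comma by comma.vert;
    substitute period by period.vert;
    substitute bracketleft by bracketleft.vert;
} vert;
```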
With all that being said, what follows in this article is a complete history of these two CMap resources, which assigns dates, and sometimes notes, to each version.
Twenty years ago this month, in September of 1993, something remarkable happened in my life. My first book, entitled Understanding Japanese Information Processing (日本語情報処理), was published by O’Reilly Media (called O’Reilly & Associates back then). It had a very distinctive cover, which is shown with the two subsequent books:
The first set of ideographs to be encoded in Unicode (Version 1.1), which are referred to as CJK Unified Ideographs, is also referred to as the URO, which is an abbreviation for Unified Repertoire and Ordering. None of the subsequent extensions is given this label. Extensions A through D have been standardized, and Extension E will soon be standardized. Only Extension A is in the BMP (Basic Multilingual Plane); Extension B and beyond are in Plane 2, which is called the SIP (Supplementary Ideographic Plane). What makes the URO special or unique?
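For readers who want to check which of these blocks a given ideograph belongs to, the ranges can be expressed as a short lookup. This is an illustrative sketch; the boundaries are simply the published Unicode block ranges:

```python
# Unicode block ranges for CJK Unified Ideographs: the URO plus
# Extensions A through E.
CJK_BLOCKS = [
    ("URO",         0x4E00,  0x9FFF),   # BMP
    ("Extension A", 0x3400,  0x4DBF),   # BMP
    ("Extension B", 0x20000, 0x2A6DF),  # SIP (Plane 2)
    ("Extension C", 0x2A700, 0x2B73F),  # SIP
    ("Extension D", 0x2B740, 0x2B81F),  # SIP
    ("Extension E", 0x2B820, 0x2CEAF),  # SIP
]

def classify(cp):
    """Return the block name for a CJK Unified Ideograph code point."""
    for name, low, high in CJK_BLOCKS:
        if low <= cp <= high:
            return name
    return None

print(classify(0x4E00))   # URO
print(classify(0x20000))  # Extension B
```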
I spent the second half of June in Korea (attending IRG 38) and Japan (to present at the Tokyo AFDKO Workshop), and am now spending the first two weeks of July in Hot Springs, South Dakota, on vacation. These places are worlds apart, in terms of both location and culture.
Still, I enjoy traveling to these places. Of course, not much happens in terms of font development in South Dakota, unless some crisis arises that requires my attention. There are many interesting places to visit in this area, such as Mount Rushmore (the photo below was taken in August of last year).
I am enjoying this vacation, but I also look forward to returning to work in about two weeks.
ISO/IEC 10646:2012 (Third Edition) was just published. This is the first version of the standard that includes multiple-column Code Charts for Extension B, and for CJK Compatibility Ideographs. Another significant aspect of ISO/IEC 10646:2012 is that it is equivalent to Unicode Version 6.1.
For Adobe, the publishing of this new version of the standard represents a significant milestone, because it means that every Adobe-Japan1-6 kanji is either directly encoded, or is directly associated with a registered IVS in the IVD (Ideographic Variation Database).
Speaking of Unicode Version 6.1, the printed version of the Core Specification is available via POD from Lulu, and at a very attractive price.
I just received good news, in the form of confirmation that both of my ATypI Hong Kong 2012 presentation abstracts were accepted, which means that I will definitely be attending this conference. I alluded to this in the March 30th, 2012 CJK Type Blog article. One of the abstracts is for a 30-minute presentation entitled Kazuraki: Under The Hood, which will immediately follow a 30-minute presentation entitled Kazuraki: Its Art & Design, that will be presented by my colleagues Taro Yamamoto (山本太郎) and Ryoko Nishizuka (西塚涼子). For those who are not aware, Ryoko is the typeface designer of Kazuraki (かづらき), which is the centerpiece of both 30-minute presentations. The other is for a three-hour workshop entitled Manipulating CID-keyed Fonts Using AFDKO Tools, which will be co-presented by my colleague Masataka Hattori (服部正貴).
I am very much looking forward to attending an ATypI conference for the first time, and meeting many people. If you are planning to attend ATypI Hong Kong 2012, please be sure to introduce yourself to me, in case I don’t introduce myself to you first.
For those font developers who are not aware, the official CMap resource repository for our public ROSes is the CMap Resources open source project at Open @ Adobe, which is hosted by SourceForge. When CMap resources are updated, in addition to providing the updates through this portal, an announcement is made in the CMap Resources Forum.
The UTF-16 and UTF-32 CMap resources were introduced in August of 2001, beginning with Adobe-CNS1-4. Those for Adobe-Korea1-2 and Adobe-Japan2-0 followed in January of 2002, followed by those for Adobe-GB1-4 in June of the same year. The UTF-16 and UTF-32 CMap resources for Adobe-Japan1-5 were not released until November of 2002. From that point, the UCS-2 CMap resources were deprecated, and were no longer updated. Clients that used the UCS-2 CMap resources were encouraged to use the UTF-16 or UTF-32 ones instead. For OpenType font development, in terms of building the Unicode (Format 4 and 12) ‘cmap’ subtables, the UTF-32 CMap resources are recommended.
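One reason the UTF-32 CMap resources are convenient for this purpose can be sketched in a few lines: a Format 4 ‘cmap’ subtable covers only the BMP, so the mappings must be split by plane, which is trivial when the source data is keyed by UTF-32 code points. The mapping values below are purely illustrative:

```python
# Hypothetical code-point-to-glyph mappings (values are illustrative only):
mappings = {0x0041: 1, 0x4E00: 1200, 0x20000: 5000}

# A Format 4 'cmap' subtable covers only the BMP (U+0000 through U+FFFF);
# code points beyond the BMP require a Format 12 subtable.
format4 = {cp: gid for cp, gid in mappings.items() if cp <= 0xFFFF}
format12 = dict(mappings)  # Format 12 can cover the entire code space

print(sorted(hex(cp) for cp in format4))  # ['0x41', '0x4e00']
```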
Years ago, I wrote a Perl script, called unicode-rows.pl, that takes a fully-qualified PostScript name—composed of a CIDFont resource name, two hyphens, and a UTF-32 CMap resource name—then generates a PostScript file that can be distilled into a PDF. The resulting PDF file is a Unicode table, arranged in groups of 256 code points. If the UTF-32 CMap resource includes even a single mapping for a particular group of 256 code points, a page is created.
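The grouping logic the script uses to decide which pages to emit can be sketched in a few lines (shown in Python rather than Perl, with made-up mapping data):

```python
# Hypothetical code points that a UTF-32 CMap resource maps to CIDs:
mapped = [0x0041, 0x0042, 0x4E00, 0x4E8C, 0x20000]

# Each page covers a group of 256 code points, so grouping by the high
# bits (cp >> 8) yields one entry per page that needs to be generated.
pages = sorted({cp >> 8 for cp in mapped})
print([hex(page << 8) for page in pages])  # ['0x0', '0x4e00', '0x20000']
```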
Adobe has thus far released two CID-keyed OpenType/CFF fonts that use the special-purpose Adobe-Identity-0 ROS (“ROS” is an abbreviation for /Registry, /Ordering, and /Supplement, which represent the three /CIDSystemInfo dictionary elements that are present in CIDFont and CMap resources): Kazuraki SP2N L (かづらき SP2N L) and Kenten Generic. The former is a commercial OpenType/CFF font, and the latter is an open source one. I have also developed several Adobe-Identity-0 ROS OpenType/CFF fonts for testing purposes, many of which have been provided in recent CJK Type Blog articles, the most recent of which being the May 9th, 2012 article.
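For reference, the three /CIDSystemInfo dictionary elements for the Adobe-Identity-0 ROS take the following form, shown here as a fragment of the sort that appears in CMap resources, not as a complete resource:

```
/CIDSystemInfo 3 dict dup begin
  /Registry (Adobe) def
  /Ordering (Identity) def
  /Supplement 0 def
end def
```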
The big question that may be on a font developer’s mind is under what circumstances is it appropriate to use the Adobe-Identity-0 ROS?
In the April 20, 2012 CJK Type Blog article, I wrote about the publishing of ISO/IEC 14496-28:2012 (Composite Font Representation), which provides a venue for breaking the 64K glyph barrier that is inherent in all sfnt-based font formats, including name- and CID-keyed PostScript fonts. If the number of glyphs in the combined component fonts that are referenced by a CFR object exceeds 64K, the 64K glyph barrier is effectively broken.
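The barrier itself comes from the fact that glyph IDs in sfnt-based fonts are unsigned 16-bit integers; the arithmetic, using hypothetical component-font sizes, is simply:

```python
# Glyph IDs in sfnt-based fonts are unsigned 16-bit integers, which caps
# any single font at 2**16 = 65,536 glyphs (GIDs 0 through 65535).
MAX_GLYPHS = 2 ** 16

# Two hypothetical component fonts referenced by a CFR object:
component_glyph_counts = [40000, 30000]
total = sum(component_glyph_counts)
print(total, total > MAX_GLYPHS)  # 70000 True
```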
In my work, I need to deal with character codes on a regular basis, such as Unicode scalar values and hexadecimal values for legacy encodings. This includes writing documents that include them. For most purposes, especially when used in tables, tabular figures work best because they are monospaced. Of course, one could simply choose to use a monospaced font. But, unless a different font is actually desired for character codes, using the same typeface design is usually preferred, because it better matches the surrounding text. The issue is that very few, if any, fonts include tabular glyphs that support hexadecimal notation, specifically referring to ‘A’ through ‘F’ (or ‘a’ through ‘f’ for lowercase). Luckily, I was able to solve this particular dilemma.
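As a small aside on notation, the conventional way to write a Unicode scalar value is “U+” followed by at least four uppercase hexadecimal digits; a throwaway formatter (my own illustration, not part of any library) makes the rule concrete:

```python
def u_notation(cp):
    """Format a code point as U+XXXX, padding to at least four hex digits."""
    return f"U+{cp:04X}"

print(u_notation(0x3C))     # U+003C
print(u_notation(0x4E00))   # U+4E00
print(u_notation(0x20000))  # U+20000
```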
In the realm of CJK Unified Ideographs, there is always talk about no more characters to encode, or that any new characters are simply unifiable variants. This is, in large part, merely wishful thinking.
In my experience, there are three important words to embrace: Never Say Never.