As I described in an article earlier this year, GB 18030 artificially imposes a visual difference between Radicals #74 (⽉) and #130 (⾁) for character pairs that differ only in this component, though conventions for Simplified Chinese use a unified form that looks like Radical #74. In that article I pinpointed a case for which the character that uses Radical #130 is in error, because its left-side radical uses the Radical #74 form, and the corresponding character that uses Radical #74 is outside the scope of GB 18030 (at least for now).
Thanks to Jaemin Chung, I was able to find three errors within the scope of GB 18030, as shown below:
According to the principles imposed by GB 18030, the characters on the left are in error, and should be visually distinct from those on the right in terms of their left-side radical.
So, how do I feel about this? Captain Jean-Luc Picard below sums it up pretty well:
For the first time in my life, I visited three East Asian countries in a single trip: China, South Korea, and Japan. I have had trips that involved two countries (South Korea & Japan, and China & South Korea), but never three. This particular trip also took place within the span of only one week.
The purpose of this trip was to visit the three type foundries who were involved in the Source Han Sans/Noto Sans CJK project: Changzhou SinoType (常州华文) in Changzhou, China; Iwata (イワタ) in Tōkyō, Japan; and Sandoll Communication (산돌커뮤니케이션) in Seoul, South Korea. In addition to thanking each company in person, we also used the opportunity to discuss particulars of the project, in terms of what worked well and what didn’t, and I also demonstrated the processes that I used to take their raw glyph data and turn it into the final fonts. All three companies gave us a warm welcome, and were very gracious hosts. We had excellent lunches and dinners with all three companies, which allowed for greater social interaction.
Masataka HATTORI (服部正貴) from our Tōkyō office traveled with me to China and South Korea, and Jinho KANG from our Seoul office participated in the meeting with Sandoll Communication. In addition to Masataka, Taro YAMAMOTO (山本太郎) participated in the meeting with Iwata.
This week’s festivities have thus far included attending IUC38 in Santa Clara, California. I presented twice, both times about Source Han Sans and Noto Sans CJK development.
For those who were unable to attend this excellent conference, the slides for my two presentations, Developing & Deploying The World’s First Open Source Pan-CJK Typeface Family and Building Source Han Sans & Noto Sans CJK, are now available.
P.S. The image shown above, which was used on page 47 of my first presentation to describe the Super OTC deployment configuration, became popular during IUC38, and was used by at least three other presentations. ☺
In the spirit of team-building and developing new skills, my manager, David Lemon, invited Chris Stinehour of Christopher Stinehour Design to give the Adobe Type Team a two-day workshop on letter cutting in stone. The workshop took place on Wednesday and Thursday of this week. The result of my efforts, which most definitely involved learning a new skill, is shown below:
Our most recent project, Source Han Sans, led me to much closer collaboration with our three-person typeface design and development team in our Tokyo office. The team is managed by Taro YAMAMOTO (山本太郎), with Ryoko NISHIZUKA (西塚涼子) as the primary typeface designer, and Masataka HATTORI (服部正貴) serving multiple roles, but mainly typeface design and production. The purpose of this article is to describe this team, with which I have worked for over 20 years on various projects, and its accomplishments from my perspective.
Now that the Version 1.001 update for Source Han Sans (源ノ角ゴシック), in the form of sources and installable fonts, is under my belt, my attention has turned to my IUC38 (38th Internationalization & Unicode Conference) presentation, which is entitled Deploying & Developing The World’s First Open Source Pan-CJK Typeface Family. Much of what was involved in building these fonts, including some of the hurdles that needed to be overcome, will be detailed in this presentation.
For those who use social media, you can follow IUC38 developments via the #IUC38 hashtag.
Oh, and I am also pleased to state that Adobe is once again a Gold Sponsor of this event, for the eighth year in a row.
Before I begin the series of articles about what went into building Source Han Sans, I think that it is worth writing a few things about actually installing and using the fonts, including how to determine which of the four deployment formats best suits your needs.
(Uni-chan image designed by Mary Jenkins)
In addition to attending IRG43 (ISO/IEC JTC1/SC2/WG2/IRG Meeting #43) in November as a US/Unicode delegate, I will also be serving as the Adobe host for this meeting, which will take place at Adobe’s headquarters in downtown San José, California. It will be a busy week for me, because while I will need to stay focused on the meeting itself, I also need to be mindful of matters related to logistics, before and during the meeting. Extension F (called Extension F1 by the IRG) is in the process of being handed off to WG2, and work on Extension G (called Extension F2 by the IRG) is expected to begin in earnest before and during IRG43.
By the way, the last time that an IRG meeting was held in the US was IRG37, which was hosted by Google in Mountain View, California in November of 2011. Before that, it was IRG29, which was hosted by Adobe in November of 2007.
I am very much looking forward to the meeting, to getting together with the delegates, and to being part of important CJK Unified Ideograph work.
A commemorative T-shirt may be necessary… ☺
Unless you have been living in a cave or under a rock, you’ve no doubt heard of Source Han Sans or Noto Sans CJK, whether through the initial announcements from Adobe or Google, which jointly developed them, or elsewhere. These two Pan-CJK typeface families, which are joined at the hip because they differ only in name, were released to the world at large, as open source fonts, on the afternoon of July 15, 2014 in the US, which was the morning of July 16, 2014 in East Asia, their target audience. Click on the preview below to view a single-page PDF that shows all 65,535 glyphs from one of these fonts:
Over the next several months I plan to publish a series of articles on this blog that will detail various aspects of the development process that I employed for building these two typeface families. Although the subsequent articles will mention only Source Han Sans by name, they also pertain to its twin, Noto Sans CJK.
Although this article is not about CJK, its purpose is to describe how I was put onto the CJK path. I studied French in high school, but it was really my studies of Russian, courtesy of the United States Army, that eventually put me on the CJK path. Immediately after graduating high school in 1983, I entered US Army Basic Training at Fort Leonard Wood, Missouri, which was followed by Interrogator School at Fort Huachuca, Arizona. The third part of my training, which was associated with my MOS (Military Occupational Specialty), Interrogator, was to learn Russian.
Given that Unicode has moved to annual major releases in mid-year, we can expect Unicode Version 8.0 to be released in about a year, in mid-2015. In terms of ideographs, we can expect some additions, specifically a small number of UNC (Urgently Needed Character) additions to the URO (Unified Repertoire & Ordering) that were discussed in the June article, along with Extension E. This single-page PDF provides a tentative look at the CJK Unified Ideographs, along with CJK Compatibility Ideographs for good measure.
Unicode Version 7.0 was released on June 16, 2014.
One of the accomplishments at IRG #42 last month was the addition of 29 new CJK Unified Ideographs to the URO (Unified Repertoire & Ordering), specifically from U+9FCD through U+9FE9. The first four are shown above.
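As a quick check on the count, the range U+9FCD through U+9FE9 is inclusive, so its size is 0x9FE9 minus 0x9FCD plus one. The minimal Python sketch below simply enumerates the range; the helper name codepoint_range is hypothetical and not part of any library.

```python
def codepoint_range(first: int, last: int) -> list[str]:
    """Return U+XXXX labels for every code point from first to last, inclusive."""
    return [f"U+{cp:04X}" for cp in range(first, last + 1)]

# The URO additions from IRG #42, U+9FCD through U+9FE9.
additions = codepoint_range(0x9FCD, 0x9FE9)
print(len(additions))   # 29
print(additions[:4])    # ['U+9FCD', 'U+9FCE', 'U+9FCF', 'U+9FD0']
```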
Twenty years ago this month, in May of 1994, I successfully defended my PhD dissertation, entitled Prescriptive Kanji Simplification, which concluded my graduate studies at The University of Wisconsin-Madison’s Department of Linguistics. Madison is located approximately 20 miles from where I grew up (Mount Horeb, Wisconsin).
The Moji_Joho IVD Collection was first introduced via PRI 259 last December, which initiated a mandatory—according to UTS #37—90-day Public Review Period. The submitter received three sets of comments, and after making minor changes, submitted the materials for registering the new IVD collection, along with its initial set of IVSes. The Moji_Joho IVD Collection and its initial set of IVSes were officially registered on May 16, 2014, which represents the fourth version of the IVD (Ideographic Variation Database).
The 2014-05-16 version of the IVD thus registers a new IVD collection, Moji_Joho, along with its initial set of 10,710 IVSes, 9,685 of which are shared—through mutual agreement—with the registered Hanyo-Denshi IVD Collection. Some enhancements were also made to the IVD_Stats.txt file, specifically that the shared IVSes are explicitly listed at the end of the file.
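A shared IVS is a base-character/VS pair that is registered under more than one IVD collection. The minimal Python sketch below assumes the line layout used by IVD_Sequences.txt (base character and VS in hex, then the collection name and the glyph identifier, separated by semicolons, with # starting a comment) and reports every pair that appears under two or more collections. The function name shared_sequences and the sample lines are hypothetical, made up for illustration, and are not actual registrations.

```python
from collections import defaultdict

def shared_sequences(lines):
    """Return {(base, vs): collections} for every sequence registered by
    two or more collections, that is, the shared IVSes."""
    owners = defaultdict(set)
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # strip comments; skip blank lines
        if not line:
            continue
        sequence, collection, _identifier = (f.strip() for f in line.split(";"))
        base, vs = sequence.split()
        owners[(base, vs)].add(collection)
    return {seq: tuple(sorted(cols))
            for seq, cols in owners.items() if len(cols) > 1}

# Hypothetical sample lines in the IVD_Sequences.txt layout (not real data).
SAMPLE = """\
# base VS; collection; identifier
4E00 E0100; Collection_A; A-0001
4E00 E0100; Collection_B; B-0001
4E00 E0101; Collection_A; A-0002
"""

print(shared_sequences(SAMPLE.splitlines()))
# {('4E00', 'E0100'): ('Collection_A', 'Collection_B')}
```

Sorting the collection names makes the output deterministic, which matters when comparing results across runs.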
One additional statistic is that the highest VS (Variation Selector) used is currently VS48 (U+E011F), meaning that 32 of the 240 VSes allocated for IVS use are now being used. Of course, it is relatively easy to figure out with which BC (Base Character) VS48 is used, and an educated guess would be that it is either U+9089 (邉) or U+908A (邊). It is the former:
9089 E011F; Moji_Joho; MJ026193
The highest VS used with the latter BC is currently VS37 (U+E0114).
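The VS numbers map to code points in two blocks: VS1 through VS16 are U+FE00 through U+FE0F, and VS17 through VS256, the 240 selectors reserved for IVS use, are U+E0100 through U+E01EF. That is why VS48 is U+E011F and VS37 is U+E0114. The minimal Python sketch below performs this mapping; the function name vs_codepoint is hypothetical, and the in-use count assumes, as the text implies, that VS17 through VS48 are all in use.

```python
def vs_codepoint(n: int) -> int:
    """Return the code point of variation selector VSn."""
    if 1 <= n <= 16:
        return 0xFE00 + (n - 1)     # VS1..VS16 are U+FE00..U+FE0F
    if 17 <= n <= 256:
        return 0xE0100 + (n - 17)   # VS17..VS256 are U+E0100..U+E01EF (the IVS range)
    raise ValueError(f"VS{n} does not exist")

ivs_range_size = vs_codepoint(256) - vs_codepoint(17) + 1
in_use = 48 - 17 + 1                # assumes VS17 through VS48 are all in use

print(f"U+{vs_codepoint(48):X}")    # U+E011F
print(f"U+{vs_codepoint(37):X}")    # U+E0114
print(in_use, "of", ivs_range_size) # 32 of 240

# The registered Moji_Joho sequence 9089 E011F as a two-code-point string.
ivs = chr(0x9089) + chr(vs_codepoint(48))
print(len(ivs))                     # 2
```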
As the IVD Registrar, I’d like to use this opportunity to thank everyone who made the effort to review PRI 259. I’d also like to congratulate those who prepared the Moji_Joho IVD Collection for both review and registration.
Shown above is the top portion of the printed version of Taiwan’s MOE 國字標準字體方體母稿 (Fangti) standard. (For those who are interested, its ISBN is 957-00-8392-1.) What is provided online are effectively scans of the 常用字 and 次常用字 sections, which contain 4,808 and 6,343 hanzi, respectively. Although included in the printed version of the standard, the 罕用字 section, which contains 1,907 additional hanzi, is not provided online. In terms of sheer numbers, these 1,907 additional hanzi bring the total to 13,058, which appears to completely cover Big Five (both levels) and CNS 11643 Planes 1 and 2.
The purpose of today’s article is to describe two additional issues in this glyph standard that my new friend, Kuang-che Wu (吳光哲) of Google, recently found.
What is shown above is a trivial difference in a two-component structure that is present in many ideographs, such as 滕 (U+6ED5), 縢 (U+7E22), 螣 (U+87A3), 謄 (U+8B04), and 騰 (U+9A30). This difference is, of course, unifiable. This article is instead concerned with consistency within a standard, mainly the source standards from each region. Its focus is on the forms used in ROC (Republic of China; 中華民國 Zhōnghuá Mínguó), which is more commonly referred to as Taiwan (臺灣 or 台灣 Táiwān).
Recent work has led me to more closely explore U+4548 (☞䕈☜), which is in CJK Unified Ideographs Extension A. (What is shown in parentheses in the previous sentence is likely to differ from what is shown in the excerpt above.)
The image above is an excerpt from the latest Extension A Code Charts. At first glance, everything seems normal. The differences between the G (China) and T (Taiwan) glyphs are expected, and perhaps more importantly, unifiable.
In the previous article I mentioned that 85 kanji that correspond to JIS X 0213:2004 currently have kIRG_JSource JA source references, but I made no mention of possible glyph differences between what is shown in the Code Charts and JIS X 0213:2004. I found at least seven kanji, among these 85, that have significant glyph differences between these two Japanese sources. I prepared this table that shows these glyph differences, by using excerpts from the Extension A code charts for the kIRG_JSource JA glyphs and Heisei Mincho W3 (平成明朝W3) for the JIS X 0213:2004 glyphs.
To continue yesterday’s article about different prototypical glyphs for Unicode code points that are common between JIS X 0212-1990 and JIS X 0213:2004, today’s article will focus on the normative references that correspond to JIS X 0213:2004, or rather the lack thereof.
Most Japanese font developers are—perhaps painfully—aware of the 168 kanji whose prototypical glyphs changed in 2004 via the JIS X 0213:2004 standard. What is not broadly known are those kanji whose prototypical glyphs are different between JIS X 0212-1990 and JIS X 0213 (both versions).
JIS X 0212-1990 was established in 1990, and included 5,801 kanji in a single block. JIS X 0213:2000 was established a full ten years later, and included 3,685 kanji in two levels (1,249 in Level 3 and 2,436 in Level 4). Ten additional kanji were added in JIS X 0213:2004, bringing the total to 3,695. When the Unicode code points that correspond to these two JIS standards are compared, 2,743 of them are common, 3,058 are specific to JIS X 0212-1990, and 952 are specific to JIS X 0213:2004.
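These figures are related by simple set arithmetic: the kanji specific to each standard are its total minus the 2,743 shared code points. The short Python check below assumes that each kanji maps to exactly one Unicode code point; under that assumption, the two standards together would cover 6,753 distinct code points. The variable names are illustrative only.

```python
# Kanji counts as stated in the text.
JIS_X_0212 = 5801
JIS_X_0213_2000 = 1249 + 2436           # Level 3 + Level 4
JIS_X_0213_2004 = JIS_X_0213_2000 + 10  # ten kanji added in 2004

common = 2743                           # shared Unicode code points
only_0212 = JIS_X_0212 - common         # specific to JIS X 0212-1990
only_0213 = JIS_X_0213_2004 - common    # specific to JIS X 0213:2004

print(JIS_X_0213_2000, JIS_X_0213_2004) # 3685 3695
print(only_0212, only_0213)             # 3058 952
print(common + only_0212 + only_0213)   # 6753 code points in the union
```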
Interestingly, when the prototypical glyphs of the 2,743 kanji that are in common—in terms of having a shared Unicode code point—are compared, 30 of them are different. I prepared a single-page table that shows the differences using genuine Heisei Mincho W3 (平成明朝W3) glyphs, which also provides Adobe-Japan1-6 CIDs for all but three of the JIS X 0212-1990 prototypical glyphs (these three glyphs are thus candidates for Adobe-Japan1-7). Also, all of the JIS X 0213 kanji are from the original 2000 version, except for the one that corresponds to U+7626 that was introduced in 2004. This character’s entry is shaded in the PDF.