I am scheduled to present at IUC39 (The 39th Internationalization & Unicode Conference) in late October, and the title of my presentation is Pan-CJK Font Development Techniques, Tips, Tricks & Pitfalls. While the related presentations that I delivered at IUC38 last November focused on actual Source Han Sans and Noto Sans CJK development details, this presentation will be more general, and will instead focus more on techniques and best practices when developing large multilingual fonts, drawing on the experience of developing and deploying those two joined-at-the-hip typeface families when necessary.
I am currently dealing with properly categorizing the various tidbits of the presentation as Techniques, Tips, Tricks, or Pitfalls. I decided to combine Tips and Tricks into the single category Tips & Tricks, because they’re roughly the same, but mainly because I found an excellent image that conveys the meaning of tricks. ☺
Anyway, I still have a lot of work left to do on this presentation, but at least I have another two months to complete it.
As I may have mentioned in past articles, the benefits of this conference go beyond the scheduled presentations, and much of the value is the golden opportunity for face-to-face interaction with developers who are involved in the development of Unicode, or who are working with Unicode on a daily basis.
For those who are planning to attend IUC39, I look forward to meeting you there. 🍷
(Uni-chan image designed by Mary Jenkins)
IRG44 (ISO/IEC JTC1/SC2/WG2/IRG Meeting #44), which was originally scheduled to take place from 2015-06-15 through 2015-06-19 in Seoul, Republic of Korea and was canceled due to MERS, will instead take place during the first part of next week in Beijing, People’s Republic of China, from 2015-08-24 through 2015-08-26.
Besides the obvious work on Extensions F1 and F2, other items of interest are the three UNC (Urgently Needed Character) proposals that will be discussed, from the UTC (two characters in IRG N2068), Japan (five characters in IRG N2078), and Macao SAR (36 characters in IRG N2071). Of particular interest is IRG N2071, because Section C.2 of IRG Principles and Procedures (IRG N2016) states that UNC submissions should not include more than 30 characters.
2015-08-21 Update: The revised version of Macao SAR’s IRG N2017 (N2071R) includes only 23 characters, meaning that it is now within the terms set forth in Section C.2 of IRG Principles and Procedures.
I have personal interest in IRG N2074, which provides preliminary details about Hong Kong SAR’s forthcoming HKCS (Hong Kong Character Set) 2015 standard, which is intended to replace Hong Kong SCS-2008. One reason for my interest is that I plan to support HKCS 2015 in the Source Han Sans Version 2.000 glyph set.
Although I cannot attend IRG44, a colleague and friend who works in our Beijing office will be attending as my proxy.
While I won’t repeat here any of the exciting details in Typekit’s recent announcement for East Asia web font support (简体中文, 繁體中文, 日本語, 한국어) that employs dynamic kits, I’d like to seize this opportunity to demonstrate some of the default behavior that this new development exposes in various browsers.
Due to an inadvertent error on my part, the glyphs for the vertical-only kana were incorrect in Source Han Sans Version 1.002 (and, by extension, in Version 1.003 because there were no glyph changes). Many thanks to the person who identified and reported this issue, and I’d like to convey my sincere apologies to those who were affected by it.
[This article was written by Masataka HATTORI (服部正貴) and translated into English by yours truly.]
Source Han Code JP（日本語メニューネーム：源ノ角ゴシック Code JP）は、自分がほしくて個人的にはじめたオープンソースプロジェクトでした。Source Han Sans（源ノ角ゴシック）と Source Code Pro をフォールバックするエディタで使うと、漢字・仮名とくらべ英数字が小さくなってしまい全体的に読みにくいと感じていました。そんなとき、友人のプログラマーから、日本語も使えてコーディングにも適したフォントはないか？と相談されて、これは自分で作ってしまえと考えました。
オリジナルの Source Code Pro は、600 ユニット字幅を採用した欧文専用のモノスペースフォントで、まぎらわしいアルファベットや数字をディスプレイで判別しやすくするために、文字のデザインが工夫されています。それを、Source Han Sans JP（源ノ角ゴシック JP）の日本語と合わせてもフィットするようにサイズやウエイトを調整しました。文字幅は 660 ユニットあたりがちょうど良いと思いました。もともと読みやすさの観点から半角欧文はすこしコンデンスすぎると感じていたので、思い切って 2/3（667 ユニット）字幅を採用することにしました。一般的な半角（500 ユニット字幅）の等幅フォントにくらべ、全角文字との正確なインデントには向きませんが、読みやすさを確保しつつ、使い方次第で様々な表現ができると思いました。Source Han Code JP は、オリジナルの Source Han Sans JP と同じ７ウェイトのファミリーですが、ウェイトを切り替えても文字列の長さは変わりません。
結果的に、日本語を含むプログラミングやマークアップなどソースコードの表示や編集に使用できる Adobe Source シリーズの派生フォントとして、Adobe Fonts GitHub サイトから公開することになりました。
Read in English
Although it has been less than two months since the Source Han Sans Version 1.002 update was released, a Version 1.003 maintenance update was released on 2015-06-09 to address two particular issues. No glyphs nor Unicode mappings were added or modified.
Google’s corresponding Noto Sans CJK fonts, which continue to differ from Source Han Sans only by name, were also updated to Version 1.003 at the same time, and reflect the same changes.
Contrary to the opinion of some of those who have never participated in a UTC (Unicode Technical Committee) meeting, whose attendees include representatives from companies, organizations, and even governments, along with individual members, all of whom share a strong passion for the continued development of The Unicode Standard (TUS), which has become the de facto way in which to represent digital text on virtually all modern devices. These representatives and individual members are world-class experts who are also incredibly sensitive to cultural and regional issues that affect the interpretation and usefulness of the standard, and do everything in their power to ensure that it is usable in the broadest possible way. No other character set standard can even come close to making such a claim.
To put this into perhaps better perspective, standard-wise, it takes a typical government entity years to accomplish what The Unicode Consortium accomplishes in only one year.
The Source Han Sans Version 1.002 update was released on 2015-04-20, which involved turning a very large crank on something that has a very large number of moving parts. The updated region-specific subset OTFs are also available on Typekit via desktop sync.
Google’s corresponding Noto Sans CJK fonts, which differ from Source Han Sans only by name, were also updated to Version 1.002 at the same time, and reflect the same changes.
Yesterday morning I came up with the idea to produce a font for testing the extent to which applications and other text-handling environments support IVSes (Ideographic Variation Sequences), and ended up devoting the better part of this Easter weekend assembling, testing, and releasing the font as open source on GitHub. The font is named IVS Test, and as usual for me, it is an Adobe-Identity-0 ROS CID-keyed OpenType/CFF font.
I started the process of migrating to GitHub the font-related open source projects that I maintain, and recently finished. In some cases, the projects were split between SourceForge and GitHub, with the installable font resources (and sources) on the former, and only the sources on the latter. Some projects were available only on SourceForge.
There are a couple of motivations for this migration. First, GitHub provides a great user experience for posting, tracking, and responding to “Issues” for a project. In fact, I made good use of the mobile app for Android while vacationing in Wisconsin late last July. Second, we prefer the control that GitHub offers in terms of updating projects. I use the GitHub command-line tools, along with the SourceTree app for OS X, when initiating or updating projects on GitHub.
One of the questions one may ask about the Adobe-branded Source Han Sans and Google-branded Noto Sans CJK open source Pan-CJK typeface families is whether they are GB 18030–compliant. Compliant? Sort of. Certified? Not yet.
Let me explain…
Let it be known that the “OpenType Collection” (OTC) format was born on 09/21/2011 at Pho Minh Restaurant in Cupertino, California. Present from Adobe were the following: David Lemon, Ken Lunde, Sairus Patel, and Read Roberts. Present from Apple were Antonio Cavedoni, Julio Gonzalez, Yasuo Kida, Peter Lofting, and Tony Tseung. — Adobe & Apple
The above declaration paved the way for supporting (CFF-based) OpenType Collections in Apple’s OS X (beginning from Version 10.8) and in Adobe’s applications (beginning from CS6).
This is an update to the short article that I wrote last August.
Now that it is officially 2015, and given that Unicode Version 8.0 is scheduled to be released mid-year, exactly what is expected to be included—at least in terms of CJK Unified Ideographs—is becoming clearer. Perusing the still-in-process UCD (Unicode Character Database) sheds much light on this. (I also recommend checking the pipeline from time to time.)
I recently updated the single-page PDF that tallies the CJK Unified Ideographs and CJK Compatibility Ideographs that are in Unicode, to include the latest information for Version 8.0, along with what can be expected to be included in Version 9.0 (mid-2016).
The highlights for Unicode Version 8.0 include nine of the 29 Urgently Needed Characters being appended to the URO (Unified Repertoire & Ordering) and Extension E (5,762 characters). The remaining 20 Urgently Needed Characters, along with Extension F, are expected to be included in Unicode Version 9.0.
Source Han Sans (and Noto Sans CJK) supports the first four of the nine Urgently Needed Characters that are being fast-tracked for Version 8.0, along with 108 Extension E code points for supporting China’s 通用规范汉字表 (Tōngyòng Guīfàn Hànzìbiǎo).
While it was not uncommon for early (pre-Unicode) CJK character set standard to include characters that correspond to scripts of other languages or used in other countries, such as the extent to which Japanese kana were included in standards from China and Korea, it was not common for one of these countries to produce a standard for a seemingly different language. Enter GB 12052-89 (entitled Korean Character Coded Character Set for Information Interchange, or 信息交换用朝鲜文字编码字符集 in Chinese), which is a GB (PRC) standard that sort of broke this mold.
One of the reasons why Source Han Sans—and obviously the Google-branded Noto Sans CJK—can be considered the world’s first Pan-CJK typeface family is due to its support for Korean hangul. While it is common to support modern hangul in Korean fonts, supporting archaic hangul is relatively uncommon. One of the more challenging aspects of developing Source Han Sans was implementing support for archaic hangul, which also included handling 500 high-frequency archaic hangul syllables. This article will thus detail what went into supporting archaic hangul in Source Han Sans. I’d like to once again thank our talented friends at Sandoll Communication for designing the glyphs for these characters.
As I described in an article earlier this year, GB 18030 artificially imposes a visual difference between Radicals #74 (⽉) and #130 (⾁) for character pairs that differ only in this component, though conventions for Simplified Chinese use a unified form that looks like Radical #74. In that article I pinpointed a case for which the character that uses Radical #130 is in error, because its left-side radical uses the Radical #74 form, and the corresponding character that uses Radical #74 is outside the scope of GB 18030 (at least for now).
Thanks to Jaemin Chung, I was able to find three errors within the scope of GB 18030, as shown below:
According to the principles imposed by GB 18030, the characters on the left are in error, and should be visually distinct from those on the right in terms of their left-side radical.
(Uni-chan image designed by Mary Jenkins)
In addition to attending IRG43 (ISO/IEC JTC1/SC2/WG2/IRG Meeting #43) in November as a US/Unicode delegate, I will also be serving as the Adobe host for this meeting, which will take place at Adobe’s headquarters in downtown San José, California. It will be a busy week for me, because while I will need to stay focused on the meeting itself, I also need to be mindful of matters related to logistics, before and during the meeting. Extension F (called Extension F1 by the IRG) is in the process of being handed off to WG2, and work on Extension G (called Extension F2 by the IRG) is expected to begin in earnest before and during IRG43.
By the way, the last time that an IRG meeting was held in the US was IRG37, which was hosted by Google in Mountain View, California in November of 2011. Before that, it was IRG29, which was hosted by Adobe in November of 2007.
I am very much looking forward to the meeting, meeting with the delegates, and being part of important CJK Unified Ideograph work.
A commemorative T-shirt may be necessary… ☺
Unless you have been living in a cave or under a rock, you’ve no doubt heard of Source Han Sans or Noto Sans CJK through the initial announcements from Adobe or Google who jointly developed them, or elsewhere. These two Pan-CJK typeface families, which are joined at the hip because they differ only in name, were released to the world at large, as open source fonts, on the afternoon of July 15, 2014 in the US, which was the morning of July 16, 2014 in East Asia, their target audience. Click on the preview below to view a single-page PDF that shows all 65,535 glyphs from one of these fonts:
Over the next several months I plan to publish a series of articles on this blog that will detail various aspects of the development process that I employed for building these two typeface families. Although the subsequent articles will mention only Source Han Sans by name, they also pertain to its twin, Noto Sans CJK.
Given that Unicode has declared mid-year annual major releases, we can expect Unicode Version 8.0 to be released in about a year, in mid-2015. In terms of ideographs, we can expect some additions, specifically a small number of UNC (Urgently Needed Character) additions to the URO (Unified Repertoire & Ordering) that were discussed in the June article, along with Extension E. This single-page PDF provides a tentative look at the CJK Unified Ideographs, along with CJK Compatibility Ideographs for good measure.
Unicode Version 7.0 was release on June 16, 2014.
One of the accomplishments at IRG #42 last month was the addition of 29 new CJK Unified Ideographs to the URO (Unified Repertoire & Ordering), specifically from U+9FCD through U+9FE9. The first four are shown above.