Unlike the first and second similarly-titled articles that I published last month, this article will focus on a minor efficiency for the combining jamo feature of the Adobe-branded Source Han Sans and Google-branded Noto Sans CJK Pan-CJK typeface families.
Again. I arrived on the afternoon of 2016-10-16.
This month provided me with yet another opportunity to visit Japan, the Land of the Rising Sun and my wife’s home country, thanks to IRG #47 (Ideographic Rapporteur Group Meeting #47) being hosted there. This trip was also the first time for me to visit an island of Japan other than Honshū (本州), specifically Shikoku (四国).
One of my more popular open source fonts is Adobe Blank, and to a lesser extent the related Adobe Blank 2, because it uses a 'cmap' table format, Format 13, that is not broadly supported. Actually, Adobe Blank provides absolutely nothing, because it maps all 1,111,998 Unicode code points to a range of 2,048 non-spacing and non-marking glyphs, yet such a font is useful for particular scenarios, such as addressing the FOUT (Flash Of Unstyled Text) problem.
Allow me to introduce Adobe NotDef, which is modeled after Adobe Blank in that it covers all of Unicode and maps to a range of 2,048 glyphs, but differs in that the functional glyphs are spacing and marking. The original suggestion for Adobe NotDef came from Dave Crossland. The glyphs match the shape and advance width of the standard Adobe .notdef glyph that is invoked in environments that do not support font fallback when the selected font does not include a glyph for a particular character, and as Dave wrote, Adobe NotDef is useful for font fallback purposes in that it can be used to prevent the display of non-standard .notdef glyphs that may be present in some fonts in the font fallback chain.
It seems that I am on a roll, having released two new open source fonts on GitHub within the past week. The previous (and brief) article about the LOCL Test OpenType/CFF font simply pointed to the repository. This article will be longer. I promise.
Inspired by the font that I prepared for and referenced in the previous article, I decided to launch a dedicated open source project for this useful test font, LOCL Test.
Although this article shares its title with an article from four years ago that was about the excitement associated with attending ATypI Hong Kong 2012, this particular one will focus on efforts to properly support Hong Kong SAR (aka HK or Hong Kong) in the Adobe-branded Source Han Sans and Google-branded Noto Sans CJK typeface families, but also in infrastructure, such as OSes and apps.
In other words, this article is not about traveling to Hong Kong, but rather about properly supporting Hong Kong in OSes, apps, and fonts.
A peculiar series of events that took place on April 1st (no joke) and 2nd of this year led to the discovery of what can only be described as somewhat of a revelation: A small number of CJK Compatibility Ideographs are necessary for China. This is important, because I made the following statement on page 168 of CJKV Information Processing, Second Edition:
—Humans make mistakes—
—Anything made by humans has the potential to include mistakes—
The most important things about mistakes are that 1) we recognize them, lest they propagate; 2) we learn from them; 3) we make an effort not to repeat them; and 4) we try to fix them, if possible.
Some mistakes are more easily fixed than others. Mistakes that cannot be fixed must be worked around.
With that said, an interesting event of historical significance occurred in June of 2000:
The next UTC (Unicode Technical Committee) meeting, the 147th one, takes place during the week of May 9th, and will be hosted at the Adobe headquarters in San José, California. All members of the Unicode Consortium, especially voting members, are encouraged to attend.
Much of the thinking that I did with regard to this unregistered—but hopefully soon-to-be-registered—IVD (Ideographic Variation Database) collection was done while visiting my parents in South Dakota, with one of the highlights of that trip being a scenic drive through Badlands National Park.
First and foremost, please forget, or at least ignore, most everything that was written in the 2016-02-13 and 2016-02-20 articles (which makes one wonder why I am linking to them, but I digress). Far too many things have changed, and what I present in this article represents the IVD collection that I hope will be registered later this year.
Continuing where I left off with the first article about this subject, I’d like to point out some of the implementation details and their ramifications in this article.
One of my longer term goals for the open source Source Han Sans project has been to eventually register a Pan-CJK IVD (Ideographic Variation Database) collection that would allow the regional variants to display and be preserved in “plain text” environments, and I think that I may have achieved a breakthrough the other day.
One of the fringe benefits of moving offices, especially when one has accumulated nearly 25 years of font-related material and it is thus not a pain-free exercise, is discovering historical documents, some of which turn out to be true gems. Our team is preparing to move from the Adobe East Tower to the West one, and part of the process is figuring out which material to keep, and which to put into File 13. Anyway, I had recently been looking for a particular presentation that I prepared many years ago, and was fortunate enough to come across it while sifting through my accumulated materials.
In late 2015, I collaborated with Daisuke MIURA to submit a proposal (L2/15-328) to the UTC (Unicode Technical Committee) to encode the characters for four tally mark systems. The proposal was discussed during UTC #146, and the result was that the five ideographic tally mark characters were accepted. Good news.
The Script Ad Hoc Committee originally recommended in their report for UTC #146 (see page 9 of L2/15-037) that IDEOGRAPHIC TALLY DIGIT TWO not be encoded, because they felt that it could be unified with U+1D36E (COUNTING ROD TENS DIGIT SIX), but concerns over typographic consistency led to it being accepted as a separate character.
(The introductory graphic illustrates how the character 剣 (U+5263) is displayed using the fonts that are introduced in this article. The code point for this character maps to a glyph that displays as “63” in the FDArray Test 257 font, which is the hexadecimal equivalent of the decimal index of the FDArray element to which its glyph is assigned, which is 99. Likewise, the code point for this character maps to a glyph that displays as “52” in the FDArray Test 65535 font, which is the hexadecimal equivalent of the decimal index of the FDArray element to which its glyph is assigned, which is 82.)
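The two values quoted above for U+5263 can be reproduced with a few lines of Python. This is only a sketch under an assumption that the text does not state outright: that the FDArray Test 257 font keys the FDArray element on the low byte of the code point, and the FDArray Test 65535 font keys it on the high byte. The function name `fdarray_labels` is hypothetical and used for illustration only.

```python
def fdarray_labels(code_point: int) -> tuple[str, str]:
    """Return the two-digit hex labels for a BMP code point.

    ASSUMPTION (not confirmed by the article): the first label is the low
    byte of the code point and the second label is the high byte. This is
    merely consistent with the single U+5263 example.
    """
    low_byte = code_point & 0xFF          # e.g. 0x63 for U+5263
    high_byte = (code_point >> 8) & 0xFF  # e.g. 0x52 for U+5263
    return f"{low_byte:02X}", f"{high_byte:02X}"


label_257, label_65535 = fdarray_labels(0x5263)

# Hexadecimal label alongside the decimal FDArray index it stands for
print(label_257, int(label_257, 16))
print(label_65535, int(label_65535, 16))
```

For 剣 (U+5263) this yields the labels “63” and “52”, whose decimal equivalents are 99 and 82, matching the figures given above.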
I have built several CID-keyed OpenType/CFF fonts that are specifically designed to exercise various implementation limits, such as the number of glyphs (65,535 is the architectural limit), the number of FDArray elements (256 is the architectural limit), and the number of mappings in the 'cmap' table (when the surrogates and noncharacters are factored out, Unicode has 1,111,998 possible mappings in its 17 planes). I have sometimes made these fonts available, such as in this May 2012 article that explains how such fonts can be built.
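For readers who want to check the 1,111,998 figure, the arithmetic can be sketched in Python as follows. The variable names are mine; the ranges for surrogates and noncharacters are the standard Unicode ones.

```python
# Count the Unicode code points that remain once surrogates and
# noncharacters are factored out.
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536 code points per plane

total = PLANES * CODE_POINTS_PER_PLANE  # 1,114,112

# Surrogates occupy U+D800 through U+DFFF
surrogates = 0xDFFF - 0xD800 + 1  # 2,048

# Noncharacters: U+FDD0 through U+FDEF (32 code points), plus the
# last two code points of each of the 17 planes (34 code points)
noncharacters = (0xFDEF - 0xFDD0 + 1) + 2 * PLANES  # 66

mappable = total - surrogates - noncharacters
print(f"{mappable:,}")  # prints 1,111,998
```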
Anyway, I spent pretty much all day yesterday—except for a somewhat longer than usual lunch break that was actually used to watch The Martian (2015) with my wife—preparing a pair of open source CID-keyed OpenType/CFF fonts that exercise these limits but to different degrees, and I also managed to prepare and release the project on GitHub as FDArray Test.
I am scheduled to present at IUC39 (The 39th Internationalization & Unicode Conference) in late October, and the title of my presentation is Pan-CJK Font Development Techniques, Tips, Tricks & Pitfalls. While the related presentations that I delivered at IUC38 last November focused on actual Source Han Sans and Noto Sans CJK development details, this presentation will be more general, and will instead focus more on techniques and best practices when developing large multilingual fonts, drawing on the experience of developing and deploying those two joined-at-the-hip typeface families when necessary.
I am currently dealing with properly categorizing the various tidbits of the presentation as Techniques, Tips, Tricks, or Pitfalls. I decided to combine Tips and Tricks into the single category Tips & Tricks, because they’re roughly the same, but mainly because I found an excellent image that conveys the meaning of tricks. ☺
Anyway, I still have a lot of work left to do on this presentation, but at least I have another two months to complete it.
As I may have mentioned in past articles, the benefits of this conference go beyond the scheduled presentations, and much of the value is the golden opportunity for face-to-face interaction with developers who are involved in the development of Unicode, or who are working with Unicode on a daily basis.
For those who are planning to attend IUC39, I look forward to meeting you there. 🍷
Due to an inadvertent error on my part, the glyphs for the vertical-only kana were incorrect in Source Han Sans Version 1.002 (and, by extension, in Version 1.003 because there were no glyph changes). Many thanks to the person who identified and reported this issue, and I’d like to convey my sincere apologies to those who were affected by it.
[This article was written by Masataka HATTORI (服部正貴) and translated into English by yours truly.]
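Source Han Code JP (Japanese menu name: 源ノ角ゴシック Code JP) began as a personal open source project that I started because I wanted such a font for myself. When I used Source Han Sans (源ノ角ゴシック) and Source Code Pro together via fallback in an editor, the alphanumeric characters ended up smaller than the kanji and kana, and I found the text hard to read overall. Around that time, a programmer friend asked me whether there was a font that supported Japanese and was also suited to coding, and I decided that I should simply make one myself.
The original Source Code Pro is a Latin-only monospaced font that uses 600-unit advance widths, and its glyphs are designed so that easily confused letters and digits are easy to tell apart on displays. I adjusted its size and weight so that it would fit well when combined with the Japanese glyphs of Source Han Sans JP (源ノ角ゴシック JP). I felt that a width of around 660 units was just about right. Because I had long felt, from a readability standpoint, that half-width Latin glyphs are a little too condensed, I decided to take the plunge and adopt a 2/3-em (667-unit) width. Compared with typical monospaced fonts that use half-width (500-unit) glyphs, it is not suited to exact indentation alongside full-width characters, but I thought that it would preserve readability while allowing a variety of expression depending on how it is used. Source Han Code JP is a seven-weight family, the same as the original Source Han Sans JP, and the length of a string does not change when switching weights.
As a result, it ended up being released on the Adobe Fonts GitHub site as a derivative font of the Adobe Source series that can be used to display and edit source code that includes Japanese, such as programming and markup.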
Although it has been less than two months since the Source Han Sans Version 1.002 update was released, a Version 1.003 maintenance update was released on 2015-06-09 to address two particular issues. Neither glyphs nor Unicode mappings were added or modified.
Google’s corresponding Noto Sans CJK fonts, which continue to differ from Source Han Sans only by name, were also updated to Version 1.003 at the same time, and reflect the same changes.
The Source Han Sans Version 1.002 update was released on 2015-04-20, which involved turning a very large crank on something that has a very large number of moving parts. The updated region-specific subset OTFs are also available on Typekit via desktop sync.
Google’s corresponding Noto Sans CJK fonts, which differ from Source Han Sans only by name, were also updated to Version 1.002 at the same time, and reflect the same changes.