It is with great sadness that I write that Unicode Version 9.0, whose beta was authorized yesterday, on the last day of UTC #146, will include no additional CJK Unified Ideographs. The next opportunity for additional CJK Unified Ideographs is therefore Unicode Version 10.0, which is slated for a June 2017 release, and is expected to include 21 Urgently Needed Characters (UNCs) that are appended to the URO (Unified Repertoire & Ordering), along with Extension F (see IRG N2156 for the latest version) that currently includes 7,473 characters.
Interestingly, and as long as Extension F’s block remains stable, there are only 3,088 code points remaining in Plane 2 (SIP), specifically U+2EBF0 through U+2F7FF, along with 1,502 code points at the end of Plane 2, immediately following CJK Compatibility Ideographs Supplement, specifically U+2FA20 through U+2FFFD.
The image above is an excerpt of a PDF that shows what Unicode Version 10.0 is likely to include in terms of ideographs. If you click on the image, you will get the actual PDF. Of course, the yellow stuff is tentative and subject to change.
Updated on 2016-06-26 to reflect the additional UNC appended to the URO at U+9FEA, along with a net decrease of 12 characters in Extension F, reducing it to 7,473 characters.
In late 2015, I collaborated with Daisuke MIURA to submit a proposal (L2/15-328) to the UTC (Unicode Technical Committee) to encode the characters for four tally mark systems. The proposal was discussed during UTC #146, and the result was that the five ideographic tally mark characters were accepted. Good news.
The Script Ad Hoc Committee originally recommended in their report for UTC #146 (see page 9 of L2/15-037) that IDEOGRAPHIC TALLY DIGIT TWO not be encoded, because they felt that it could be unified with U+1D36E (COUNTING ROD TENS DIGIT SIX), but concerns over typographic consistency led to it being accepted as a separate character.
In December of 2015, Unicode launched their Adopt a Character campaign, whose goal is to raise funding for the purpose of encoding a large number of remaining scripts, along with encoding additional characters for scripts that are already encoded. In other words, to help Unicode do its important work. The Unicode Consortium has 501(c)(3) tax status, meaning that donations are tax-deductible in the US, and if your company supports matching grants, you can leverage that to significantly increase the effective donation.
By default, the AFDKO makeotf tool includes Macintosh (platformID=1, encodingID=0, languageID=0) ‘name‘ table strings, and if specified in the “FontMenuNameDB” or “features” files, localized Macintosh ‘name’ table strings will also be included. The next release of AFDKO will include “-omitMacNames” as a new command-line option for makeotf whose purpose is to exclude Macintosh ‘name’ table strings, other than any that are explicitly specified in the “features” file.
IUC39 (The 39th Internationalization & Unicode Conference) took place in Santa Clara earlier this week, and Adobe was once again proud to be a Gold Sponsor. It was another outstanding and successful conference, and as usual, one of the greatest benefits of the conference—besides the many excellent presentations—was the opportunity for face-to-face exchanges with Unicode leaders, experts, and enthusiasts.
(The introductory graphic illustrates how the character 剣 (U+5263) is displayed using the fonts that are introduced in this article. The code point for this character maps to a glyph that displays as “63” in the FDArray Test 257 font, which is the hexadecimal equivalent of the decimal index of the FDArray element to which its glyph is assigned, which is 99. Likewise, the code point for this character maps to a glyph that displays as “52” in the FDArray Test 65535 font, which is the hexadecimal equivalent of the decimal index of the FDArray element to which its glyph is assigned, which is 82.)
I have built several CID-keyed OpenType/CFF fonts that are specifically designed to test various limits, by exercising various implementation limits, such as the number of glyphs (65,535 is the architectural limit), the number of FDArray elements (256 is the architectural limit), and the number of mappings in the ‘cmap‘ table (when the surrogates and non-characters are factored out, Unicode has 1,111,998 possible mappings in its 17 planes). I have sometimes made these fonts available, such as in this May of 2012 article that explains how such fonts can be built.
Anyway, I spent pretty much all day yesterday—except for a somewhat longer than usual lunch break that was actually used to watch The Martian (2015) with my wife—preparing a pair of open source CID-keyed OpenType/CFF fonts that exercise these limits but to different degrees, and I also managed to prepare and release the project on GitHub as FDArray Test.
The Unicode Consortium is planning to once again propose the encoding of the well-attested ideograph whose reading is biáng. Previous attempts at encoding this ideograph have failed due to the lack of sufficient evidence, such as appearing in a dictionary or other printed source. This time, however, there is sufficient evidence, and the simplified form of this ideograph will also be included in the proposal. Both forms, along with their U-Source references UTC-00791 and UTC-01312, are depicted below:
Historically, there have been two methods of supporting vertical writing in CID-keyed OpenType/CFF fonts, in terms of specifying the ‘vert‘ (Vertical Alternates) GSUB feature. One method involved using a vertical CMap resource, which was supplied to the AFDKO makeotf tool as an argument to its no-longer-supported “-cv” command-line option, that was used to synthesize the ‘vert’ GSUB feature. The other method, which is the preferred one, involves defining a ‘vert’ GSUB feature in the “features” file that is supplied to the AFDKO makeotf tool. In this brief article, I will explain why the first method is no longer supported, but more importantly, why the second method is preferred.
I am scheduled to present at IUC39 (The 39th Internationalization & Unicode Conference) in late October, and the title of my presentation is Pan-CJK Font Development Techniques, Tips, Tricks & Pitfalls. While the related presentations that I delivered at IUC38 last November focused on actual Source Han Sans and Noto Sans CJK development details, this presentation will be more general, and will instead focus more on techniques and best practices when developing large multilingual fonts, drawing on the experience of developing and deploying those two joined-at-the-hip typeface families when necessary.
I am currently dealing with properly categorizing the various tidbits of the presentation as Techniques, Tips, Tricks, or Pitfalls. I decided to combine Tips and Tricks into the single category Tips & Tricks, because they’re roughly the same, but mainly because I found an excellent image that conveys the meaning of tricks. ☺
Anyway, I still have a lot of work left to do on this presentation, but at least I have another two months to complete it.
As I may have mentioned in past articles, the benefits of this conference go beyond the scheduled presentations, and much of the value is the golden opportunity for face-to-face interaction with developers who are involved in the development of Unicode, or who are working with Unicode on a daily basis.
For those who are planning to attend IUC39, I look forward to meeting you there. 🍷
(Uni-chan image designed by Mary Jenkins)
IRG44 (ISO/IEC JTC1/SC2/WG2/IRG Meeting #44), which was originally scheduled to take place from 2015-06-15 through 2015-06-19 in Seoul, Republic of Korea and was canceled due to MERS, will instead take place during the first part of next week in Beijing, People’s Republic of China, from 2015-08-24 through 2015-08-26.
Besides the obvious work on Extensions F1 and F2, other items of interest are the three UNC (Urgently Needed Character) proposals that will be discussed, from the UTC (two characters in IRG N2068), Japan (five characters in IRG N2078), and Macao SAR (36 characters in IRG N2071). Of particular interest is IRG N2071, because Section C.2 of IRG Principles and Procedures (IRG N2016) states that UNC submissions should not include more than 30 characters.
2015-08-21 Update: The revised version of Macao SAR’s IRG N2017 (N2071R) includes only 23 characters, meaning that it is now within the terms set forth in Section C.2 of IRG Principles and Procedures.
I have personal interest in IRG N2074, which provides preliminary details about Hong Kong SAR’s forthcoming HKCS (Hong Kong Character Set) 2015 standard, which is intended to replace Hong Kong SCS-2008. One reason for my interest is that I plan to support HKCS 2015 in the Source Han Sans Version 2.000 glyph set.
Although I cannot attend IRG44, a colleague and friend who works in our Beijing office will be attending as my proxy.
While I won’t repeat here any of the exciting details in Typekit’s recent announcement for East Asia web font support (简体中文, 繁體中文, 日本語, 한국어) that employs dynamic kits, I’d like to seize this opportunity to demonstrate some of the default behavior that this new development exposes in various browsers.
Due to an inadvertent error on my part, the glyphs for the vertical-only kana were incorrect in Source Han Sans Version 1.002 (and, by extension, in Version 1.003 because there were no glyph changes). Many thanks to the person who identified and reported this issue, and I’d like to convey my sincere apologies to those who were affected by it.
[This article was written by Masataka HATTORI (服部正貴) and translated into English by yours truly.]
Source Han Code JP（日本語メニューネーム：源ノ角ゴシック Code JP）は、自分がほしくて個人的にはじめたオープンソースプロジェクトでした。Source Han Sans（源ノ角ゴシック）と Source Code Pro をフォールバックするエディタで使うと、漢字・仮名とくらべ英数字が小さくなってしまい全体的に読みにくいと感じていました。そんなとき、友人のプログラマーから、日本語も使えてコーディングにも適したフォントはないか？と相談されて、これは自分で作ってしまえと考えました。
オリジナルの Source Code Pro は、600 ユニット字幅を採用した欧文専用のモノスペースフォントで、まぎらわしいアルファベットや数字をディスプレイで判別しやすくするために、文字のデザインが工夫されています。それを、Source Han Sans JP（源ノ角ゴシック JP）の日本語と合わせてもフィットするようにサイズやウエイトを調整しました。文字幅は 660 ユニットあたりがちょうど良いと思いました。もともと読みやすさの観点から半角欧文はすこしコンデンスすぎると感じていたので、思い切って 2/3（667 ユニット）字幅を採用することにしました。一般的な半角（500 ユニット字幅）の等幅フォントにくらべ、全角文字との正確なインデントには向きませんが、読みやすさを確保しつつ、使い方次第で様々な表現ができると思いました。Source Han Code JP は、オリジナルの Source Han Sans JP と同じ７ウェイトのファミリーですが、ウェイトを切り替えても文字列の長さは変わりません。
結果的に、日本語を含むプログラミングやマークアップなどソースコードの表示や編集に使用できる Adobe Source シリーズの派生フォントとして、Adobe Fonts GitHub サイトから公開することになりました。
Read in English
Although it has been less than two months since the Source Han Sans Version 1.002 update was released, a Version 1.003 maintenance update was released on 2015-06-09 to address two particular issues. No glyphs nor Unicode mappings were added or modified.
Google’s corresponding Noto Sans CJK fonts, which continue to differ from Source Han Sans only by name, were also updated to Version 1.003 at the same time, and reflect the same changes.
Contrary to the opinion of some of those who have never participated in a UTC (Unicode Technical Committee) meeting, whose attendees include representatives from companies, organizations, and even governments, along with individual members, all of whom share a strong passion for the continued development of The Unicode Standard (TUS), which has become the de facto way in which to represent digital text on virtually all modern devices. These representatives and individual members are world-class experts who are also incredibly sensitive to cultural and regional issues that affect the interpretation and usefulness of the standard, and do everything in their power to ensure that it is usable in the broadest possible way. No other character set standard can even come close to making such a claim.
To put this into perhaps better perspective, standard-wise, it takes a typical government entity years to accomplish what The Unicode Consortium accomplishes in only one year.
The Source Han Sans Version 1.002 update was released on 2015-04-20, which involved turning a very large crank on something that has a very large number of moving parts. The updated region-specific subset OTFs are also available on Typekit via desktop sync.
Google’s corresponding Noto Sans CJK fonts, which differ from Source Han Sans only by name, were also updated to Version 1.002 at the same time, and reflect the same changes.
Yesterday morning I came up with the idea to produce a font for testing the extent to which applications and other text-handling environments support IVSes (Ideographic Variation Sequences), and ended up devoting the better part of this Easter weekend assembling, testing, and releasing the font as open source on GitHub. The font is named IVS Test, and as usual for me, it is an Adobe-Identity-0 ROS CID-keyed OpenType/CFF font.
Well, it’s April 1st, which represents an ideal time to release and introduce Adobe Blank 2, which is a new version of the popular Adobe Blank font.
In early 2008, as part of writing and typesetting CJKV Information Processing, Second Edition and preparing the latest version of Adobe Tech Note #5078 (The Adobe-Japan1-6 Character Collection), I built a small—in terms of the number of glyphs—special-purpose font for displaying registration marks for glyphs, and named it Tombo. Such registration marks are incredibly useful for showing the relative position of a glyph within its em-box, and for conveying the visual horizontal advance (aka glyph width). The excerpt above shows this font’s use in the Source Han Sans ReadMe (note that the PDF file will download if clicked).
One of the questions one may ask about the Adobe-branded Source Han Sans and Google-branded Noto Sans CJK open source Pan-CJK typeface families is whether they are GB 18030–compliant. Compliant? Sort of. Certified? Not yet.
Let me explain…