Although this article shares its title with an article from four years ago about the excitement of attending ATypI Hong Kong 2012, this one focuses on efforts to properly support Hong Kong SAR (aka HK or Hong Kong), not only in the Adobe-branded Source Han Sans and Google-branded Noto Sans CJK typeface families, but also in infrastructure, such as OSes and apps.
In other words, this article is not about traveling to Hong Kong, but rather about properly supporting Hong Kong in OSes, apps, and fonts.
A peculiar series of events that took place on April 1st (no joke) and 2nd of this year led to the discovery of what can only be described as somewhat of a revelation: A small number of CJK Compatibility Ideographs are necessary for China. This is important, because I made the following statement on page 168 of CJKV Information Processing, Second Edition:
—Humans make mistakes—
—Anything made by humans has the potential to include mistakes—
The most important things about mistakes are that 1) we recognize them, lest they propagate; 2) we learn from them; 3) we make an effort not to repeat them; and 4) we try to fix them, if possible.
Some mistakes are more easily fixed than others. Mistakes that cannot be fixed must be worked around.
With that said, an interesting event of historical significance occurred in June of 2000:
The first version of the IVD (Ideographic Variation Database) was issued on 2007-12-14, meaning over eight years ago, and there have been three subsequent revisions, the latest being issued on 2014-05-16. There are currently three registered IVD collections: Adobe-Japan1, Hanyo-Denshi, and Moji_Joho. A significant number of IVSes are shared between the latter two IVD collections, 9,685 to be exact. While I cannot speak to the latter two IVD collections, the Adobe-Japan1 one is supported by hundreds of OpenType fonts via the Format 14 (Unicode Variation Sequences) ‘cmap’ subtable. Furthermore, the number of apps and OSes that support UVSes has reached critical mass.
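As a rough illustration of what an IVS looks like in "plain text," the following Python sketch scans a string for a base character followed by one of the variation selectors reserved for IVSes (VS17 through VS256, U+E0100 through U+E01EF). The function names are hypothetical, and the sample string is only illustrative: the sketch checks the shape of a sequence, not whether the sequence is actually registered in any IVD collection.

```python
from typing import List, Tuple

# Variation Selectors Supplement: VS17 through VS256, the selectors used in IVSes.
IVS_SELECTOR_FIRST = 0xE0100
IVS_SELECTOR_LAST = 0xE01EF


def is_ivs_selector(ch: str) -> bool:
    """Return True if ch is a variation selector in the range used for IVSes."""
    return IVS_SELECTOR_FIRST <= ord(ch) <= IVS_SELECTOR_LAST


def find_ivses(text: str) -> List[Tuple[int, int]]:
    """Return (base, selector) code point pairs for each IVS-shaped sequence.

    This only checks that a non-selector character is immediately followed
    by a selector; it does not consult the IVD itself.
    """
    pairs = []
    for base, selector in zip(text, text[1:]):
        if is_ivs_selector(selector) and not is_ivs_selector(base):
            pairs.append((ord(base), ord(selector)))
    return pairs


def format_ivs(base: int, selector: int) -> str:
    """Format a pair in the usual <U+XXXX, U+XXXXX> notation."""
    return f"<U+{base:04X}, U+{selector:04X}>"


# Illustrative input: U+845B followed by VS17, then U+845B followed by VS18.
sample = "\u845B\U000E0100 \u845B\U000E0101"
for base, selector in find_ivses(sample):
    print(format_ivs(base, selector))
```

An app or OS that does not support UVSes would simply ignore the selector and show the default glyph for the base character, which is what makes these sequences safe to carry in plain text.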
Much of the thinking that I did with regard to this unregistered—but hopefully soon-to-be-registered—IVD (Ideographic Variation Database) collection was done while visiting my parents in South Dakota, with one of the highlights of that trip being a scenic drive through Badlands National Park.
First and foremost, please forget, or at least ignore, most everything that was written in the 2016-02-13 and 2016-02-20 articles (which makes one wonder why I am linking to them, but I digress). Far too many things have changed, and what I present in this article represents the IVD collection that I hope will be registered later this year.
One of my longer-term goals for the open source Source Han Sans project has been to eventually register a Pan-CJK IVD (Ideographic Variation Database) collection that would allow the regional variants to display and be preserved in “plain text” environments, and I think that I may have achieved a breakthrough the other day.
CJK Unified Ideographs is a very deep and fascinating subject, and there are people who sometimes, and anecdotally, claim that it is a bottomless pit (or infinite tunnel, hence the quote in this article’s title). The latter may appear to be accurate, especially when one considers what is happening on that front, such as Extension F with its 7,473 new characters slated for Unicode Version 10.0 in June of 2017, and with work on Extension G commencing.
One of the fringe benefits of moving offices (which is not a pain-free exercise when one has accumulated nearly 25 years of font-related material) is discovering historical documents, some of which turn out to be true gems. Our team is preparing to move from the Adobe East Tower to the West one, and part of the process is figuring out which material to keep, and which to put into File 13. Anyway, I had recently been looking for a particular presentation that I prepared many years ago, and was fortunate enough to come across it while sifting through my accumulated materials.
It is with great sadness that I write that Unicode Version 9.0, whose beta was authorized yesterday, on the last day of UTC #146, will include no additional CJK Unified Ideographs. The next opportunity for additional CJK Unified Ideographs is therefore Unicode Version 10.0, which is slated for a June 2017 release, and is expected to include 21 Urgently Needed Characters (UNCs) that are appended to the URO (Unified Repertoire & Ordering), along with Extension F (see IRG N2156 for the latest version) that currently includes 7,473 characters.
Interestingly, and as long as Extension F’s block remains stable, only 3,088 unassigned code points remain in Plane 2 (SIP) between Extension F and CJK Compatibility Ideographs Supplement, specifically U+2EBF0 through U+2F7FF, along with 1,502 code points at the end of Plane 2, immediately following CJK Compatibility Ideographs Supplement, specifically U+2FA20 through U+2FFFD.
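For those who would like to check the arithmetic, the two figures follow directly from the code point ranges given above. This is a minimal Python sketch; the helper name is hypothetical, and the ranges are the ones quoted in this article, which assume that Extension F’s block remains stable.

```python
def span(first: int, last: int) -> int:
    """Number of code points in the inclusive range first..last."""
    return last - first + 1


# Range between Extension F and CJK Compatibility Ideographs Supplement.
gap_after_ext_f = span(0x2EBF0, 0x2F7FF)

# Range after CJK Compatibility Ideographs Supplement, up to U+2FFFD.
tail_of_plane_2 = span(0x2FA20, 0x2FFFD)

print(gap_after_ext_f)                    # 3088
print(tail_of_plane_2)                    # 1502
print(gap_after_ext_f + tail_of_plane_2)  # 4590
```

In other words, fewer than 5,000 code points would be left in Plane 2 for future ideographs.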
The image above is an excerpt of a PDF that shows what Unicode Version 10.0 is likely to include in terms of ideographs. If you click on the image, you will get the actual PDF. Of course, the yellow stuff is tentative and subject to change.
Updated on 2016-06-26 to reflect the additional UNC appended to the URO at U+9FEA, along with a net decrease of 12 characters in Extension F, reducing it to 7,473 characters.
In late 2015, I collaborated with Daisuke MIURA to submit a proposal (L2/15-328) to the UTC (Unicode Technical Committee) to encode the characters for four tally mark systems. The proposal was discussed during UTC #146, and the result was that the five ideographic tally mark characters were accepted. Good news.
The Script Ad Hoc Committee originally recommended in their report for UTC #146 (see page 9 of L2/16-037) that IDEOGRAPHIC TALLY DIGIT TWO not be encoded, because they felt that it could be unified with U+1D36E (COUNTING ROD TENS DIGIT SIX), but concerns over typographic consistency led to it being accepted as a separate character.
In December of 2015, Unicode launched their Adopt a Character campaign, whose goal is to raise funding for the purpose of encoding a large number of remaining scripts, along with encoding additional characters for scripts that are already encoded. In other words, to help Unicode do its important work. The Unicode Consortium has 501(c)(3) tax status, meaning that donations are tax-deductible in the US, and if your company supports matching grants, you can leverage that to significantly increase the effective donation.
By default, the AFDKO makeotf tool includes Macintosh (platformID=1, encodingID=0, languageID=0) ‘name’ table strings, and if specified in the “FontMenuNameDB” or “features” files, localized Macintosh ‘name’ table strings will also be included. The next release of AFDKO will include “-omitMacNames” as a new command-line option for makeotf whose purpose is to exclude Macintosh ‘name’ table strings, other than any that are explicitly specified in the “features” file.
IUC39 (The 39th Internationalization & Unicode Conference) took place in Santa Clara earlier this week, and Adobe was once again proud to be a Gold Sponsor. It was another outstanding and successful conference, and as usual, one of the greatest benefits of the conference—besides the many excellent presentations—was the opportunity for face-to-face exchanges with Unicode leaders, experts, and enthusiasts.
The Unicode Consortium is planning to once again propose the encoding of the well-attested ideograph whose reading is biáng. Previous attempts at encoding this ideograph have failed due to the lack of sufficient evidence, such as appearing in a dictionary or other printed source. This time, however, there is sufficient evidence, and the simplified form of this ideograph will also be included in the proposal. Both forms, along with their U-Source references UTC-00791 and UTC-01312, are depicted below:
Historically, there have been two methods of supporting vertical writing in CID-keyed OpenType/CFF fonts, in terms of specifying the ‘vert’ (Vertical Alternates) GSUB feature. One method involved a vertical CMap resource, supplied to the AFDKO makeotf tool as the argument of its no-longer-supported “-cv” command-line option, from which the ‘vert’ GSUB feature was synthesized. The other method, which is the preferred one, involves defining a ‘vert’ GSUB feature in the “features” file that is supplied to the AFDKO makeotf tool. In this brief article, I will explain why the first method is no longer supported, but more importantly, why the second method is preferred.
I am scheduled to present at IUC39 (The 39th Internationalization & Unicode Conference) in late October, and the title of my presentation is Pan-CJK Font Development Techniques, Tips, Tricks & Pitfalls. While the related presentations that I delivered at IUC38 last November focused on actual Source Han Sans and Noto Sans CJK development details, this presentation will be more general, and will instead focus more on techniques and best practices when developing large multilingual fonts, drawing on the experience of developing and deploying those two joined-at-the-hip typeface families when necessary.
I am currently dealing with properly categorizing the various tidbits of the presentation as Techniques, Tips, Tricks, or Pitfalls. I decided to combine Tips and Tricks into the single category Tips & Tricks, because they’re roughly the same, but mainly because I found an excellent image that conveys the meaning of tricks. ☺
Anyway, I still have a lot of work left to do on this presentation, but at least I have another two months to complete it.
As I may have mentioned in past articles, the benefits of this conference go beyond the scheduled presentations, and much of the value is the golden opportunity for face-to-face interaction with developers who are involved in the development of Unicode, or who are working with Unicode on a daily basis.
For those who are planning to attend IUC39, I look forward to meeting you there. 🍷
(Uni-chan image designed by Mary Jenkins)
IRG44 (ISO/IEC JTC1/SC2/WG2/IRG Meeting #44), which was originally scheduled to take place from 2015-06-15 through 2015-06-19 in Seoul, Republic of Korea and was canceled due to MERS, will instead take place during the first part of next week in Beijing, People’s Republic of China, from 2015-08-24 through 2015-08-26.
Besides the obvious work on Extensions F1 and F2, other items of interest are the three UNC (Urgently Needed Character) proposals that will be discussed, from the UTC (two characters in IRG N2068), Japan (five characters in IRG N2078), and Macao SAR (36 characters in IRG N2071). Of particular interest is IRG N2071, because its 36 characters exceed the limit stated in Section C.2 of IRG Principles and Procedures (IRG N2016), which says that UNC submissions should not include more than 30 characters.
2015-08-21 Update: The revised version of Macao SAR’s IRG N2071 (N2071R) includes only 23 characters, meaning that it is now within the terms set forth in Section C.2 of IRG Principles and Procedures.
I have a personal interest in IRG N2074, which provides preliminary details about Hong Kong SAR’s forthcoming HKCS (Hong Kong Character Set) 2015 standard, which is intended to replace HKSCS-2008. One reason for my interest is that I plan to support HKCS 2015 in the Source Han Sans Version 2.000 glyph set.
Although I cannot attend IRG44, a colleague and friend who works in our Beijing office will be attending as my proxy.