It seems that not a day goes by that I am not using Adobe InDesign CC.
It is my preferred document-authoring app, whether I am preparing a relatively simple single-page document or one that is much longer and complex.
Besides being the world’s first open source serif-style Pan-CJK typeface families, the Adobe-branded Source Han Serif and the Google-branded Noto Serif CJK also represent the first broad deployment of two highly-complex and related ideographs that are in the process of being encoded. Their glyphs are shown above in all seven weights. Although it may be hard to believe, the fourth line illustrates the simplified version.
Or, perhaps more accurately, the project that has been keeping me busy for the past couple of years.
The Adobe-branded Source Han Serif (named 源ノ明朝 in Japanese, 본명조 in Korean, 思源宋体 in Simplified Chinese, and 思源宋體 in Traditional Chinese) and Google-branded Noto Serif CJK open source Pan-CJK typeface families, which represent the serif-style counterparts to the similarly-named and also open source Source Han Sans and Noto Sans CJK Pan-CJK typeface families, were released on 2017-04-03. You can read more about the Source Han Serif release here (日本語—한국어—简体中文—繁體中文), which includes a six-minute promotional video.
This article provides information that you would not expect to find in the official announcements for Source Han Serif or Noto Serif CJK, mainly because such information is intended for a completely different audience, which is primarily comprised of font developers.
Unless noted otherwise, all further references to Source Han Serif or Source Han Sans will apply to Noto Serif CJK or Noto Sans CJK, respectively.
Perhaps as a continuation of this article from almost a year ago with a clever image, I’d like to use this opportunity to mention that the AFDKO tx tool is about to get a new and improved CFF subroutinizer.
The tx tool has actually had a CFF subroutinizer for quite some time, since late 2008 or so, which is invoked by using the “+S” command-line option in combination with the “-cff” command-line option, and while it was noticeably faster than the AFDKO makeotf tool’s built-in subroutinizer, there were issues that prevented me from using it, such as recursion depth and the inability to limit the number of local and global subroutines.
Based on my testing thus far—using my trusty 2014 Apple MacBook Pro—the tx tool’s new subroutinizer is over three orders of magnitude faster that the makeotf tool’s built-in one. Yes, over one-thousand times faster! CIDFont resources that once took hours to subroutinize now take mere seconds, and with comparable results both in terms of number of subroutines and reduced CFF size. The 65,535-glyph Source Han Sans CIDFont resources take approximately 30 seconds to become subroutinized CFFs, and the 23,058-glyph Kozuka Gothic Pr6N (小塚ゴシック Pr6N) and Kozuka Mincho Pr6N (小塚明朝 Pr6N) ones take less than 10 seconds each.
Anyway, the next release of AFDKO will include a version of the tx tool that includes this new and improved subroutinizer. Of course, the primary beneficiaries of this new version are those who build OpenType/CFF fonts that include thousands or tens of thousands of glyphs, like me.
In closing, I’d like to draw attention to the open source otfcc project on GitHub, which apparently provides similar CFF subroutinization results, in terms of speed and the end result.
The IVD (Ideographic Variation Database) is all about ideograph variants. Up until earlier this year, its scope was limited to CJK Unified Ideographs, per UTS #37 (Unicode Ideographic Variation Database). Its scope now includes characters with the Ideographic property that are not canonically nor compatibly decomposable, which still excludes CJK Compatibility Ideographs.
In an ideal world, a particular glyph—whether it’s considered the standard (aka encoded) form or an unencoded variant of the standard form—would be associated with a single registered IVS (Ideographic Variation Sequence) within an IVD collection. However, we do not live in a perfect world, and several real-world conditions can lead to duplicate sequence identifiers within an IVD collection.
I will open this article by stating that OpenType features are almost always GSUB (Glyph SUBstitution) or GPOS (Glyph POSitioning). The former table specifies features that substitute glyphs with other glyphs, usually in a 1:1 fashion, but not always. The latter table specifies features that alter the metrics of glyphs, or the inter-glyph metrics (aka kerning).
The focus of this particular article will be the 'vert' (Vertical Alternates) feature, which substitutes a glyph with the appropriate glyph for vertical writing, and is invoked when in vertical writing mode. In other words, it’s a GSUB feature, and one that needs to be invoked for proper vertical writing. Current implementations that support the 'vert' GSUB feature, which tend to be CJK fonts, substitute glyphs with their vertical forms on a 1:1 basis, though language-tagging may affect the outcome for Pan-CJK fonts, such as the Adobe-branded Source Han Sans and the Google-branded Noto Sans CJK, which support multiple languages.
This article is largely a test, but also serves to start the process of resurrecting L2/14-006 (Proposal to add standardized variation sequences for nine characters) for discussion at UTC #151 in early May.
Liang Hai (梁海) brought up this document for discussion at UTC #150 last week, and while I had an opportunity to have it accepted by the UTC, to be included in Unicode Version 10.0 (June, 2017), I decided that it was prudent to instead prepare a revised proposal that is more complete, mainly because L2/14-006 was submitted and discussed prior to the first release of the Adobe-branded Source Han Sans and Google-branded Noto Sans CJK Pan-CJK typeface families. This functionality was implemented in those typeface families via the 'locl' GSUB feature, which requires the text to be language-tagged. In other words, I learned a lot since L2/14-006 was discussed, and prefer to submit a more complete proposal, even if it means waiting for Unicode Version 11.0 (June, 2018).
As recorded on the very first page of Adobe Tech Note #5078, Adobe-Japan1-6 was released on 2004-03-05, and one of the glyphs that was added was CID+20958. According to the Adobe-Japan1-6 ordering file, its glyph name is freedial, and is assigned to the Dingbats FDArray element for the purpose of hinting. Of course, if you look for CID+20958 in Adobe Tech Note #5078, you can find it on the bottom of page 54, immediately to the right of CID+20957 that maps from U+26BD ⚽ SOCCER BALL, though it is blank. This is simply because Adobe does not have the rights to use NTT’s trademarked FreeDial mark. CID+20958 was included in Adobe-Japan1-6 for the benefit of font developers who do have the rights to use this mark, and can thus include the glyph in their fonts.
UTC #150, the 150th Unicode Technical Committee meeting, takes place later this month, from 2017-01-23 through 2017-01-26, and will be hosted by our friends at Apple in Sunnyvale, California. I will attend as Adobe’s representative. As usual, there will be CJK- and IRG-related items to discuss. One item will be the UTC’s review of IRG Working Set 2015 Version 3.0 (aka CJK Unified Ideographs Extension G), L2/17-006, which I recently finished.
A major focus of UTC #150 will be Unicode Version 10.0, which is scheduled to be released in June of this year. Unicode Version 10.0 will include 21 additional characters appended to the URO (Unified Repertoire & Ordering), along with Extension F (7,473 characters), as shown here.
While we’re on the subject of Unicode, be sure to explore the sidebar on the right side of this blog’s landing page, which includes links to several useful Unicode-related resources.
Prior to Japan’s script reform of 1900, there was more than one shape associated with each syllable of the hiragana syllabary. There is now only one shape associated with each syllable. The now-obsolete and nonstandard shapes are referred to as hentaigana (変体仮名), which simply means variant kana. Hentaigana are still in use today in Japan, but are limited to Japan’s family registry (戸籍 koseki in Japanese) and specialized uses, such as business signage and other decor that are specifically designed to convey a feeling of nostalgia or traditional charm.
In addition to the Wikipedia article that is linked from the previous paragraph, 『変体仮名のこれまでとこれから—情報交換のための標準化』 (The past, present, and future of hentaigana: Standardization for information processing) by TAKADA Tomokazu (高田智和) et al. and About the inclusion of standardized codepoints for Hentaigana by YADA Tsutomu (矢田勉) serve as excellent reading material.
To (significantly) expand yesterday’s super exciting article, and in the continued interest of (stress-)testing the extent to which combining jamo works in various browsers—and when being served as a fully-functional webfont via Adobe Typekit—if you click here, you will open a 40MB HTML file that includes all 1,626,875 possible three-character combining jamo sequences (125 leading consonants, 95 vowels, and 137 trailing consonants) rendered using Adobe Clean Han and its 'ljmo' (Leading Jamo Forms), 'vjmo' (Vowel Jamo Forms), and 'tjmo' (Trailing Jamo Forms) GSUB features.
In the interest of testing the extent to which combining jamo works in various browsers—and when being served as a fully-functional webfont via Adobe Typekit—if you click here, you will open a 200K HTML file that includes all 11,875 possible two-character combining jamo sequences (125 leading consonants and 95 vowels) rendered using Adobe Clean Han and its 'ljmo' (Leading Jamo Forms), 'vjmo' (Vowel Jamo Forms), and 'tjmo' (Trailing Jamo Forms) GSUB features.
Again. I arrived on the afternoon of 2016-10-16.
This month provided to me yet another opportunity to visit Japan, the Land of the Rising Sun and my wife’s home country, thanks to IRG #47 (Ideographic Rapporteur Group Meeting #47) being hosted there. This trip was also the first time for me to visit an island of Japan other than Honshū (本州), specifically Shikoku (四国).
As a follow on to our seven-year-old May of 2009 article of the same name, several things have happened with the Adobe Clean family that have yet to be reported, and which have CJK implications. Hence the reason for spending my Sunday morning writing this article.
In the following year, 2010, I developed and deployed a Japanese version of Adobe Clean named Ryo Clean PlusN (りょう Clean PlusN in Japanese), and then in 2015, I developed and deployed a Pan-CJK version named Adobe Clean Han (Adobe Clean 黑体 in Simplified Chinese, Adobe Clean 黑體 in Traditional Chinese, Adobe Clean 角ゴシック in Japanese, and Adobe Clean 고딕 in Korean). These typeface families are Adobe corporate fonts that are meant to be used for product literature, for serving to Adobe websites, and for use by Adobe apps. They are not meant to be used by our customers, but I suspect that the readership of this blog may be interested in some of the development details. If this interests you, please continue reading.
Attention, students! Class is in session.
In my experience, the following two statements about standards are seemingly conflicting yet accurate:
On one hand, developing products, such as typeface designs and their fonts, depends on standards.
On the other hand, standards themselves are developed by humans, meaning that they are prone to error, especially when they happen to be character set or glyph standards that include thousands or tens of thousands of representative glyphs.
The first day of Autumn this year is Thursday, September 22nd, and my schedule for this upcoming season is filled with several standards-related activities…
UTC #148 took place in Redmond, Washington last week, hosted by our friends at Microsoft. It was a four-day working meeting, and many important Unicode-related issues and proposals were discussed. A total of 7,888 new characters were formally accepted into the standard during this meeting. Among them were the 7,473 CJK Unified Ideographs of Extension F, along with the lone CJK Unified Ideograph U+9FEA that is appended to the URO (Unified Repertoire & Ordering) and is the result of the disunification of 㸂 U+3E02, which were accepted on 2016-08-04 for inclusion into Unicode Version 10.0. Version 10.0 is slated for a June 2017 release. This means that my table above is now less tentative (clicking on the image will reveal the entire PDF file that includes details about the unchanged CJK Compatibility Ideographs).
Other CJK Unified Ideographs that are slated to be included in Unicode Version 10.0 are the 20 characters, U+9FD6 through U+9FE9, which were accepted on 2014-10-28 (UTC #141).
This will bring the total number of CJK Unified Ideographs to 87,882, and as the table at the top of this article suggests, there is not much room left in Plane 2, and Extension G is just around the corner.