Er, um, oops.
One of my hobbies is apparently to explore various ways to stress-test Adobe products, and the target of today’s article happens to be recent adventures with Adobe InDesign and our Source Han families.
The background is that I produced Unicode-based glyph synopses as part of the Source Han Sans and Source Han Serif releases, but those PDFs show only up to 256 code points per page, and it takes several hundred pages to show their complete Unicode coverage. I also produced single-page PDFs that show all 65,535 glyphs. A Source Han Sans one is available here, and a Source Han Serif one is available here. However, they are not Unicode-based.
At seemingly every opportunity, whether via this blog or during public speaking engagements, I have made it abundantly clear that the Adobe-branded Source Han families share the same glyph set as the corresponding Google-branded Noto CJK families. That is simply because it is true. What requires a bit of explanation, however, is how the two typeface designs—Source Han Sans and Source Han Serif—differ. That is what this particular article is about.
As the Project Architect of these Pan-CJK typeface families, I have my fingers on all of the data that was used during their development, and for preparing each release. I can therefore impart some useful tidbits of information that cannot be found elsewhere.
The release of Source Han Serif earlier this month, on 2017-04-03, gave me an opportunity to build yet another resource for stress-testing environments, particularly those that consume OpenType/CFF Collections. (This also continues to simplify file management by combining three Super OTCs into a much larger one.)
Perhaps as a continuation of this article from almost a year ago with a clever image, I’d like to use this opportunity to mention that the AFDKO tx tool is about to get a new and improved CFF subroutinizer.
The tx tool has actually had a CFF subroutinizer for quite some time, since late 2008 or so, which is invoked by using the “+S” command-line option in combination with the “-cff” command-line option. While it was noticeably faster than the AFDKO makeotf tool’s built-in subroutinizer, there were issues that prevented me from using it, such as recursion depth and the inability to limit the number of local and global subroutines.
Based on my testing thus far, using my trusty 2014 Apple MacBook Pro, the tx tool’s new subroutinizer is over three orders of magnitude faster than the makeotf tool’s built-in one. Yes, over one thousand times faster! CIDFont resources that once took hours to subroutinize now take mere seconds, and with comparable results both in terms of number of subroutines and reduced CFF size. The 65,535-glyph Source Han Sans CIDFont resources take approximately 30 seconds to become subroutinized CFFs, and the 23,058-glyph Kozuka Gothic Pr6N (小塚ゴシック Pr6N) and Kozuka Mincho Pr6N (小塚明朝 Pr6N) ones take less than 10 seconds each.
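As a rough illustration of the invocation described above, the following Python sketch assembles a tx command line that combines the “-cff” and “+S” options. The file names are hypothetical placeholders, the helper function is not part of AFDKO, and the sketch only builds the argument list. Actually running it assumes that AFDKO is installed and that tx is on the PATH.

```python
from typing import List


def build_tx_subroutinize_command(src: str, dst: str) -> List[str]:
    """Return the argument list for: tx -cff +S <src> <dst>.

    "-cff" selects CFF output and "+S" turns on subroutinization,
    as described in the text. The helper itself is illustrative only.
    """
    return ["tx", "-cff", "+S", src, dst]


# Hypothetical input and output file names.
cmd = build_tx_subroutinize_command("cidfont.ps", "cidfont.cff")
print(" ".join(cmd))

# To run it for real (requires AFDKO):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to log or time the command when comparing subroutinizers.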
Anyway, the next release of AFDKO will include a version of the tx tool that includes this new and improved subroutinizer. Of course, the primary beneficiaries of this new version are those who build OpenType/CFF fonts that include thousands or tens of thousands of glyphs, like me.
In closing, I’d like to draw attention to the open source otfcc project on GitHub, which apparently provides similar CFF subroutinization results, in terms of speed and the end result.
I will open this article by stating that OpenType features are almost always GSUB (Glyph SUBstitution) or GPOS (Glyph POSitioning). The former table specifies features that substitute glyphs with other glyphs, usually in a 1:1 fashion, but not always. The latter table specifies features that alter the metrics of glyphs, or the inter-glyph metrics (aka kerning).
The focus of this particular article will be the 'vert' (Vertical Alternates) feature, which substitutes a glyph with the appropriate glyph for vertical writing, and is invoked when in vertical writing mode. In other words, it’s a GSUB feature, and one that needs to be invoked for proper vertical writing. Current implementations that support the 'vert' GSUB feature, which tend to be CJK fonts, substitute glyphs with their vertical forms on a 1:1 basis, though language-tagging may affect the outcome for Pan-CJK fonts, such as the Adobe-branded Source Han Sans and the Google-branded Noto Sans CJK, which support multiple languages.
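To make the 1:1 nature of the 'vert' substitution concrete, here is a minimal Python sketch that models the feature as a lookup table applied only in vertical writing mode. The glyph names and the table contents are hypothetical, and a real font stores this data in its GSUB table rather than in a dictionary.

```python
from typing import Dict, List

# Hypothetical glyph names: horizontal form -> vertical form.
VERT_SUBSTITUTIONS: Dict[str, str] = {
    "ideographiccomma": "ideographiccomma.vert",
    "ideographicperiod": "ideographicperiod.vert",
    "prolongedsound": "prolongedsound.vert",
}


def apply_vert(glyphs: List[str], vertical: bool) -> List[str]:
    """Substitute each glyph with its vertical form, one for one.

    Glyphs without an entry are passed through unchanged, and nothing
    is substituted when the writing mode is horizontal.
    """
    if not vertical:
        return list(glyphs)
    return [VERT_SUBSTITUTIONS.get(name, name) for name in glyphs]


run = ["ka", "prolongedsound", "ideographicperiod"]
print(apply_vert(run, vertical=True))
print(apply_vert(run, vertical=False))
```

Because the substitution is 1:1, the output always has the same number of glyphs as the input.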
This article is largely a test, but also serves to start the process of resurrecting L2/14-006 (Proposal to add standardized variation sequences for nine characters) for discussion at UTC #151 in early May.
Liang Hai (梁海) brought up this document for discussion at UTC #150 last week. While I had an opportunity to have it accepted by the UTC, to be included in Unicode Version 10.0 (June 2017), I decided that it was prudent to instead prepare a revised proposal that is more complete, mainly because L2/14-006 was submitted and discussed prior to the first release of the Adobe-branded Source Han Sans and Google-branded Noto Sans CJK Pan-CJK typeface families. This functionality was implemented in those typeface families via the 'locl' GSUB feature, which requires the text to be language-tagged. In other words, I have learned a lot since L2/14-006 was discussed, and prefer to submit a more complete proposal, even if it means waiting for Unicode Version 11.0 (June 2018).
To (significantly) expand yesterday’s super exciting article, and in the continued interest of (stress-)testing the extent to which combining jamo works in various browsers—and when being served as a fully-functional webfont via Adobe Typekit—if you click here, you will open a 40MB HTML file that includes all 1,626,875 possible three-character combining jamo sequences (125 leading consonants, 95 vowels, and 137 trailing consonants) rendered using Adobe Clean Han and its 'ljmo' (Leading Jamo Forms), 'vjmo' (Vowel Jamo Forms), and 'tjmo' (Trailing Jamo Forms) GSUB features.
In the interest of testing the extent to which combining jamo works in various browsers—and when being served as a fully-functional webfont via Adobe Typekit—if you click here, you will open a 200K HTML file that includes all 11,875 possible two-character combining jamo sequences (125 leading consonants and 95 vowels) rendered using Adobe Clean Han and its 'ljmo' (Leading Jamo Forms), 'vjmo' (Vowel Jamo Forms), and 'tjmo' (Trailing Jamo Forms) GSUB features.
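The sequence counts quoted for the two-character and three-character test files can be reproduced with a short Python sketch. The code point ranges below are an assumption on my part: they are the conjoining jamo ranges in the Hangul Jamo, Hangul Jamo Extended-A, and Hangul Jamo Extended-B blocks, fillers included, which happen to give exactly 125, 95, and 137 characters. The test files themselves may have been generated differently.

```python
from itertools import product
from typing import Iterator, List


def block(first: int, last: int) -> List[str]:
    """Return the characters from first to last, inclusive."""
    return [chr(cp) for cp in range(first, last + 1)]


# Assumed ranges (see the lead-in); fillers U+115F and U+1160 are included.
LEADING = block(0x1100, 0x115F) + block(0xA960, 0xA97C)
VOWELS = block(0x1160, 0x11A7) + block(0xD7B0, 0xD7C6)
TRAILING = block(0x11A8, 0x11FF) + block(0xD7CB, 0xD7FB)

two_char_count = len(LEADING) * len(VOWELS)
three_char_count = two_char_count * len(TRAILING)


def two_char_sequences() -> Iterator[str]:
    """Yield every leading consonant plus vowel sequence."""
    for lead, vowel in product(LEADING, VOWELS):
        yield lead + vowel


print(len(LEADING), len(VOWELS), len(TRAILING))
print(two_char_count, three_char_count)
```

A generator is used for the sequences so that the three-character case, with more than 1.6 million entries, could be streamed to a file in the same way without holding everything in memory.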
Attention, students! Class is in session.
In my experience, the following two statements about standards are seemingly conflicting yet accurate:
On one hand, developing products, such as typeface designs and their fonts, depends on standards.
On the other hand, standards themselves are developed by humans, meaning that they are prone to error, especially when they happen to be character set or glyph standards that include thousands or tens of thousands of representative glyphs.
One of the most powerful font-development tools available today is tx (Type eXchange), which is included in AFDKO (Adobe Font Development Kit for OpenType) and whose sources are available on GitHub. Despite its two-letter name, this command-line utility is packed with an enormous number of features.
Four years ago I wrote a similar article, but it seems like a good time to revisit tx and the useful things that it can do. I still recommend that its “-u” and “-h” command-line options be used to explore its vast capabilities.
(The introductory graphic illustrates how the character 剣 (U+5263) is displayed using the fonts that are introduced in this article. The code point for this character maps to a glyph that displays as “63” in the FDArray Test 257 font, which is the hexadecimal equivalent of the decimal index of the FDArray element to which its glyph is assigned, which is 99. Likewise, the code point for this character maps to a glyph that displays as “52” in the FDArray Test 65535 font, which is the hexadecimal equivalent of the decimal index of the FDArray element to which its glyph is assigned, which is 82.)
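The relationship between the FDArray index and the displayed label is a plain decimal-to-hexadecimal conversion, which this small Python sketch checks for the two values mentioned above. The function name is hypothetical, and the sketch says nothing about how the fonts assign glyphs to FDArray elements.

```python
def fdarray_label(index: int) -> str:
    """Return the uppercase hexadecimal label for an FDArray element index."""
    if index < 0:
        raise ValueError("FDArray index must be non-negative")
    return format(index, "X")


for index in (99, 82):
    print(index, "->", fdarray_label(index))
```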
I have built several CID-keyed OpenType/CFF fonts that are specifically designed to exercise various implementation limits, such as the number of glyphs (65,535 is the architectural limit), the number of FDArray elements (256 is the architectural limit), and the number of mappings in the 'cmap' table (when the surrogates and noncharacters are factored out, Unicode has 1,111,998 possible mappings in its 17 planes). I have sometimes made these fonts available, such as in this May 2012 article that explains how such fonts can be built.
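The figure of 1,111,998 possible mappings follows from simple arithmetic, worked through in the Python sketch below. It relies only on well-known properties of Unicode: 17 planes of 65,536 code points each, 2,048 surrogates, and 66 noncharacters (32 in the range U+FDD0 through U+FDEF, plus the last two code points of each plane).

```python
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

# Surrogates: U+D800 through U+DFFF.
SURROGATES = 0xDFFF - 0xD800 + 1

# Noncharacters: U+FDD0 through U+FDEF, plus U+xxFFFE and U+xxFFFF per plane.
NONCHARACTERS = (0xFDEF - 0xFDD0 + 1) + 2 * PLANES

total_code_points = PLANES * CODE_POINTS_PER_PLANE
mappable = total_code_points - SURROGATES - NONCHARACTERS

print(total_code_points, SURROGATES, NONCHARACTERS, mappable)
```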
Anyway, I spent pretty much all day yesterday—except for a somewhat longer than usual lunch break that was actually used to watch The Martian (2015) with my wife—preparing a pair of open source CID-keyed OpenType/CFF fonts that exercise these limits but to different degrees, and I also managed to prepare and release the project on GitHub as FDArray Test.
To follow up on my June 2011 article about managing XUID arrays in CIDFont resources, which still conveys accurate information, it has come to our attention that the integer values for the second and subsequent XUID array elements should not exceed seven digits, meaning that 9999999 is the largest integer value that should be used. Integer values that exceed seven digits can cause some implementations to treat the XUID arrays of different fonts within the same print job as identical, which affects font caching and can result in the wrong font being used to render some characters. This printing issue may happen even if the glyphs display correctly in the PDF file on screen.
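A check for the seven-digit guideline is easy to automate. The Python sketch below is a minimal example that assumes the XUID array appears in the CIDFont resource header in the form `/XUID [1 11 9273828] def`. The sample values and function names are hypothetical, and the regular expression is not a full PostScript parser.

```python
import re
from typing import List

MAX_XUID_ELEMENT = 9_999_999  # largest value with seven digits


def parse_xuid(header: str) -> List[int]:
    """Extract the integers of the first /XUID array found in header text."""
    match = re.search(r"/XUID\s*\[([^\]]*)\]", header)
    if match is None:
        return []
    return [int(token) for token in match.group(1).split()]


def oversized_elements(xuid: List[int]) -> List[int]:
    """Return the second and subsequent elements that exceed seven digits."""
    return [value for value in xuid[1:] if value > MAX_XUID_ELEMENT]


# Hypothetical header lines.
good = "/XUID [1 11 9273828] def"
bad = "/XUID [1 11 12345678] def"

print(oversized_elements(parse_xuid(good)))
print(oversized_elements(parse_xuid(bad)))
```

An empty result means that the array satisfies the guideline. A header with no XUID array at all also yields an empty result, which matches the alternative of omitting the array entirely.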
Another solution is to simply omit the XUID array from the CIDFont resource header, which effectively disables font caching. For modern printers, font caching has little or no benefit.
Lastly, for those font developers who still include a UIDBase value in their CIDFont resource headers, it can be safely removed. In fact, I strongly recommend that it be removed.
While I won’t repeat here any of the exciting details in Typekit’s recent announcement for East Asia web font support (简体中文, 繁體中文, 日本語, 한국어) that employs dynamic kits, I’d like to seize this opportunity to demonstrate some of the default behavior that this new development exposes in various browsers.
Yesterday morning I came up with the idea to produce a font for testing the extent to which applications and other text-handling environments support IVSes (Ideographic Variation Sequences), and ended up devoting the better part of this Easter weekend assembling, testing, and releasing the font as open source on GitHub. The font is named IVS Test, and as usual for me, it is an Adobe-Identity-0 ROS CID-keyed OpenType/CFF font.
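For readers unfamiliar with how an IVS is encoded, the Python sketch below builds one by appending a variation selector from the range U+E0100 through U+E01EF (VS17 through VS256) to a base ideograph. The helper names are hypothetical, and the sketch makes no claim about which sequences are registered or which ones the IVS Test font supports.

```python
IVS_FIRST = 0xE0100  # VS17
IVS_LAST = 0xE01EF   # VS256


def make_ivs(base: str, selector_index: int) -> str:
    """Return base followed by the selector at IVS_FIRST + selector_index."""
    if len(base) != 1:
        raise ValueError("base must be a single character")
    code_point = IVS_FIRST + selector_index
    if not IVS_FIRST <= code_point <= IVS_LAST:
        raise ValueError("selector_index must be in the range 0 through 239")
    return base + chr(code_point)


def is_ivs_selector(char: str) -> bool:
    """Report whether char is one of the 240 supplementary variation selectors."""
    return IVS_FIRST <= ord(char) <= IVS_LAST


sequence = make_ivs("\u845B", 0)  # U+845B followed by U+E0100
print(["U+%04X" % ord(c) for c in sequence])
```

An environment that supports IVSes should treat such a two-code-point sequence as a single unit for rendering, selection, and deletion, which is the kind of behavior a test font can expose.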
I am pleased to announce that the new CSS Orientation Test OpenType Fonts open source project was launched on Adobe’s open-source portal, Open@Adobe, today. This open source project consists of two OpenType/CFF fonts that were developed at the request of Koji Ishii (石井宏治), the editor of Unicode’s forthcoming UTR #50 (Unicode Vertical Text Layout). The purpose of these fonts is for developers to be able to more easily test whether glyph orientation in their implementation is correct or not.