Japanese line layout is very complex, and the first attempt to standardize its rules and principles was in the JIS X 4051 standard, which was first issued in 1993 with the title 日本語文書の行組版方法 (Line Composition Rules for Japanese Documents in English). There was a revision issued in 1995, and the latest version was issued in 2004 with the slightly different title 日本語文書の組版方法 (Formatting rules for Japanese documents). Another important document is the W3C Working Group Note JLREQ (Requirements for Japanese Text Layout), which provides much of what is described in JIS X 4051, but covers additional areas, and is tailored toward web technologies. Although still considered working drafts, W3C is also preparing similar documents for Chinese and Korean as CLREQ (Requirements for Chinese Text Layout) and KLREQ (Requirements for Hangul Text Layout and Typography), respectively.
This article is not about these standards per se, which are intended for apps and environments that implement sophisticated line layout. Rather, this article is about harsher “plain text” or comparable environments that generally do not need such treatment, yet still benefit from a modest amount of context-based spacing adjustment, particularly to get rid of unwanted space between full-width brackets and other punctuation whose glyphs generally fill half of the em-box. App menus, app dialogs, and simple text editors are examples of where such adjustments can improve text layout in these modest ways.
Today’s article provides useful details for our relatively small number of customers who author documents with our flagship Creative Cloud apps and make use of CID-keyed OpenType SVG fonts. A rather broadly-deployed CID-keyed OpenType SVG typeface is the open source Source Han Code JP family, whose development details are described in the very first section of this article.
While it is fully possible to build OpenType fonts—CID-keyed or otherwise—that include an 'SVG ' (Scalable Vector Graphics) table, the infrastructure to support them in apps is still maturing. That is the purpose of this article, so please continue reading if the details interest or otherwise affect you.
Per a suggestion by a friend named Leroy, I recently renamed the multiple-style and multiple-family OTCs (OpenType Collections) in this open source repository which includes such OTCs that are based on the Adobe-branded Source Han and Google-branded Noto CJK families. These multiple-style and multiple-family OpenType Collections were described in this article from April of this year. The purpose of this particular article is to introduce better names for them besides Super OTC.
First, some background about Super OTCs…
Shortly after Source Han Sans and Noto Sans CJK were released, I came up with the idea of creating a single OpenType Collection that includes all languages and all weights, and the name Super OTC was coined. This was included in the Version 1.001 update (2014-09-12) as a fourth deployment format for both families, and each one included 28 fonts. These were expanded to 36 fonts when the HW (half-width, ASCII-only) fonts, which covered only the Regular and Bold weights, were added as part of the Version 1.002 update (2015-04-20). Source Han Serif and Noto Serif CJK included a Super OTC in their Version 1.000 release (2017-04-03).
One of my hobbies is apparently to explore various ways to stress-test Adobe products, and the target of today’s article happens to be recent adventures with Adobe InDesign and our Source Han families.
The background is that I produced Unicode-based glyph synopses as part of the Source Han Sans and Source Han Serif releases, but those PDFs show only up to 256 code points per page, and it takes several hundred pages to show their complete Unicode coverage. I also produced single-page PDFs that show all 65,535 glyphs. A Source Han Sans one is available here, and a Source Han Serif one is available here. However, they are not Unicode-based.
At seemingly every opportunity, whether via this blog or during public speaking engagements, I have made it abundantly clear that the Adobe-branded Source Han families share the same glyph set as the corresponding Google-branded Noto CJK families. That is simply because it is true. What requires a bit of explanation, however, is how the two typeface designs—Source Han Sans and Source Han Serif—differ. That is what this particular article is about.
As the Project Architect of these Pan-CJK typeface families, I have my fingers on all of the data that was used during their development, and for preparing each release. I can therefore impart some useful tidbits of information that cannot be found elsewhere.
To take the previous article further—and because I tend to have an urge to stress-test environments—I added two more Super OTCs to the Source Han Super OTC open source project this morning.
The release of Source Han Serif earlier this month, on 2017-04-03, gave me an opportunity to build yet another resource for stress-testing environments, particularly those that consume OpenType/CFF Collections. (This also continues to simplify file management by combining three Super OTCs into a much larger one.)
Perhaps as a continuation of this article from almost a year ago with a clever image, I’d like to use this opportunity to mention that the AFDKO tx tool is about to get a new and improved CFF subroutinizer.
The tx tool has actually had a CFF subroutinizer for quite some time, since late 2008 or so, which is invoked by using the “+S” command-line option in combination with the “-cff” command-line option, and while it was noticeably faster than the AFDKO makeotf tool’s built-in subroutinizer, there were issues that prevented me from using it, such as recursion depth and the inability to limit the number of local and global subroutines.
Based on my testing thus far—using my trusty 2014 Apple MacBook Pro—the tx tool’s new subroutinizer is over three orders of magnitude faster that the makeotf tool’s built-in one. Yes, over one-thousand times faster! CIDFont resources that once took hours to subroutinize now take mere seconds, and with comparable results both in terms of number of subroutines and reduced CFF size. The 65,535-glyph Source Han Sans CIDFont resources take approximately 30 seconds to become subroutinized CFFs, and the 23,058-glyph Kozuka Gothic Pr6N (小塚ゴシック Pr6N) and Kozuka Mincho Pr6N (小塚明朝 Pr6N) ones take less than 10 seconds each.
Anyway, the next release of AFDKO will include a version of the tx tool that includes this new and improved subroutinizer. Of course, the primary beneficiaries of this new version are those who build OpenType/CFF fonts that include thousands or tens of thousands of glyphs, like me.
In closing, I’d like to draw attention to the open source otfcc project on GitHub, which apparently provides similar CFF subroutinization results, in terms of speed and the end result.
I will open this article by stating that OpenType features are almost always GSUB (Glyph SUBstitution) or GPOS (Glyph POSitioning). The former table specifies features that substitute glyphs with other glyphs, usually in a 1:1 fashion, but not always. The latter table specifies features that alter the metrics of glyphs, or the inter-glyph metrics (aka kerning).
The focus of this particular article will be the 'vert' (Vertical Alternates) feature, which substitutes a glyph with the appropriate glyph for vertical writing, and is invoked when in vertical writing mode. In other words, it’s a GSUB feature, and one that needs to be invoked for proper vertical writing. Current implementations that support the 'vert' GSUB feature, which tend to be CJK fonts, substitute glyphs with their vertical forms on a 1:1 basis, though language-tagging may affect the outcome for Pan-CJK fonts, such as the Adobe-branded Source Han Sans and the Google-branded Noto Sans CJK, which support multiple languages.
This article is largely a test, but also serves to start the process of resurrecting L2/14-006 (Proposal to add standardized variation sequences for nine characters) for discussion at UTC #151 in early May.
Liang Hai (梁海) brought up this document for discussion at UTC #150 last week, and while I had an opportunity to have it accepted by the UTC, to be included in Unicode Version 10.0 (June, 2017), I decided that it was prudent to instead prepare a revised proposal that is more complete, mainly because L2/14-006 was submitted and discussed prior to the first release of the Adobe-branded Source Han Sans and Google-branded Noto Sans CJK Pan-CJK typeface families. This functionality was implemented in those typeface families via the 'locl' GSUB feature, which requires the text to be language-tagged. In other words, I learned a lot since L2/14-006 was discussed, and prefer to submit a more complete proposal, even if it means waiting for Unicode Version 11.0 (June, 2018).
Please pardon the apparent non-CJK interruption in the form of this particular article, but I wanted to bring to the readership’s attention a new open source project that has a very long history: ehandler.ps.
Unlike the first and second similarly-titled articles that I published last month, this article will focus on a minor efficiency for the combining jamo feature of the Adobe-branded Source Han Sans and Google-branded Noto Sans CJK Pan-CJK typeface families.
To (significantly) expand yesterday’s super exciting article, and in the continued interest of (stress-)testing the extent to which combining jamo works in various browsers—and when being served as a fully-functional webfont via Adobe Typekit—if you click here, you will open a 40MB HTML file that includes all 1,626,875 possible three-character combining jamo sequences (125 leading consonants, 95 vowels, and 137 trailing consonants) rendered using Adobe Clean Han and its 'ljmo' (Leading Jamo Forms), 'vjmo' (Vowel Jamo Forms), and 'tjmo' (Trailing Jamo Forms) GSUB features.
In the interest of testing the extent to which combining jamo works in various browsers—and when being served as a fully-functional webfont via Adobe Typekit—if you click here, you will open a 200K HTML file that includes all 11,875 possible two-character combining jamo sequences (125 leading consonants and 95 vowels) rendered using Adobe Clean Han and its 'ljmo' (Leading Jamo Forms), 'vjmo' (Vowel Jamo Forms), and 'tjmo' (Trailing Jamo Forms) GSUB features.
Attention, students! Class is in session.
In my experience, the following two statements about standards are seemingly conflicting yet accurate:
- Standards are incredibly useful—and required—for product development.
- Standards cannot be completely trusted.
On one hand, developing products, such as typeface designs and their fonts, depends on standards.
On the other hand, standards themselves are developed by humans, meaning that they are prone to error, especially when they happen to be character set or glyph standards that include thousands or tens of thousands of representative glyphs.
Inspired by the font that I prepared for and referenced in the previous article, I decided to launch a dedicated open source project for this useful test font, LOCL Test.
One of the most powerful font-development tools available today is tx (Type eXchange), which is included in AFDKO (Adobe Font Development Kit for OpenType) and whose sources are available on GitHub. Despite its two-letter name, this command-line utility is packed with an enormous amount of features and functionality.
Four years ago I wrote a similar article, but it seems like a good time to revisit tx and the useful things that it can do. I still recommend that its “-u” and -h” command-line options be used to explore its vast capabilities.
(The introductory graphic illustrates how the character 剣 (U+5263) is displayed using the fonts that are introduced in this article. The code point for this character maps to a glyph that displays as “63” in the FDArray Test 257 font, which is the hexadecimal equivalent of the decimal index of the FDArray element to which its glyph is assigned, which is 99. Likewise, the code point for this character maps to a glyph that displays as “52” in the FDArray Test 65535 font, which is the hexadecimal equivalent of the decimal index of the FDArray element to which its glyph is assigned, which is 82.)
I have built several CID-keyed OpenType/CFF fonts that are specifically designed to test various limits, by exercising various implementation limits, such as the number of glyphs (65,535 is the architectural limit), the number of FDArray elements (256 is the architectural limit), and the number of mappings in the ‘cmap‘ table (when the surrogates and non-characters are factored out, Unicode has 1,111,998 possible mappings in its 17 planes). I have sometimes made these fonts available, such as in this May of 2012 article that explains how such fonts can be built.
Anyway, I spent pretty much all day yesterday—except for a somewhat longer than usual lunch break that was actually used to watch The Martian (2015) with my wife—preparing a pair of open source CID-keyed OpenType/CFF fonts that exercise these limits but to different degrees, and I also managed to prepare and release the project on GitHub as FDArray Test.
To follow up on my June 2011 article about managing XUID arrays in CIDFont resources, which still conveys accurate information, it has come to our attention that the integer values for the second and subsequent XUID array elements should not exceed seven digits, meaning that 9999999 is the largest integer value that should be used. Integer values that exceed seven digits can result in some implementations treating the XUID arrays of different fonts within the same printing job the same, which affects font caching, and which can result in the wrong font being used to render some characters. This printing issue may happen even if the glyphs display correctly in the PDF file on screen.
Another solution is to simply omit the XUID array from the CIDFont resource header, which effectively disables font caching. For modern printers, font caching has little or no benefit.
Lastly, for those font developers who still include a UIDBase value in their CIDFont resource headers, it can be safely removed. In fact, I strongly recommend that it be removed.