Earlier this month, I decided to move the Adobe-Japan1-6 character collection specification to the Adobe Type Tools organization on GitHub, which was partly motivated by constantly-changing URLs on our Font Technical Notes page. Another motivation was to make the specification itself easier to maintain. At some point, I will be adding a more complete list of Supplement 7 (aka Adobe-Japan1-7) candidates to its wiki.
To this end, I decided to do the same for the Adobe-CNS1-7 and Adobe-GB1-5 character collection specifications while on vacation in South Dakota. For the former, I also used the opportunity to update the specification to include Supplement 7 (aka Adobe-CNS1-7), by adding its representative glyphs and other details.
So, that’s three down, and one to go.
This is a very brief article whose purpose is to simply state that—due to recent events beyond my control*—the Adobe-Japan1-6 character collection specification is now an open source project that is hosted on GitHub as a new repository in the Adobe Type Tools organization.
Most of my morning was consumed by porting the original text from Adobe InDesign to GitHub-flavored Markdown, and, while I was touching the text, I decided to seize the opportunity to make several corrections and updates. The 500-glyphs-per-page representative glyph charts are now in a separate PDF file. I also used the opportunity to update the aj16-kanji.txt datafile, and also added the latest-and-greatest Adobe-Japan1-6 UVS (Unicode Variation Sequence) definition file. All good stuff, I think.
* Adobe’s IT folks apparently felt compelled to (once again) change the URLs for all of the font-related Adobe Tech Notes, including Adobe Tech Note #5078 (The Adobe-Japan1-6 Character Collection). Its URL is somewhat broadly referenced, including in the IVD_Collection.txt file of the latest version of the IVD (Ideographic Variation Database). The bottom line is that I needed a stable URL.
It is difficult to imagine that it has been over 20 years since a new RO—or Adobe CID-keyed glyph set—was born. Of course, I am referring to the static glyph sets, not the ones based on the special-purpose Adobe-Identity-0 ROS.
“RO” stands for Registry and Ordering, which represent compatibility names or identifiers for CID-keyed glyph sets that are referred to as character collections. Adobe CID-keyed glyph sets are usually referred to as ROSes, with the final “S” being an integer that refers to a specific Supplement. The first Supplement, of course, is 0 (zero).
One of my recent projects is to revitalize and modernize our Korean glyph set, Adobe-Korea1-2 (see Adobe Tech Note #5093), which was last modified on 1998-10-12 by defining Supplement 2 that added only pre-rotated versions of the proportional and half-width glyphs that are referenced by the effectively-deprecated 'vrt2' (Vertical Alternates and Rotation) GSUB feature. Instead of defining a new Supplement, I decided that it would be better to simply define a completely new glyph set for a variety of reasons. The tentative Registry and Ordering names are Adobe and KR (meaning “Adobe-KR”), and unlike other ROSes for which Supplements are defined incrementally, my current plan is to simultaneously define seven Supplements, 0 through 6.
I have attended every Internationalization & Unicode Conference (IUC) since IUC31 in 2007, and Adobe has been a continuous Gold Sponsor since IUC31. Unfortunately, duty calls, in the form of attending and hosting IRG #49 that takes place during the same week as IUC41, which means that I can neither attend nor present this year. Of course, Adobe continues to be a Gold Sponsor of this important event.
U+2F9B2 䕫 is a CJK Compatibility Ideograph, and like all CJK Compatibility Ideographs, it canonically decomposes to a CJK Unified Ideograph, and also has a Standardized Variation Sequence (SVS) that uses its canonical equivalent as its base character. This character also has a single source reference, H-8FA8, which corresponds to HKSCS (Hong Kong Supplementary Character Set) 0x8FA8.
So, what’s the problem? Put simply, its canonical equivalent, U+456B 䕫, is neither in HKSCS nor in its Big Five subset:
If this character is ever normalized—regardless of the normalization form—it is converted to its canonical equivalent, U+456B 䕫, which is not likely to be included in fonts that are designed for use in Hong Kong SAR. Furthermore, even if its SVS, <U+456B,U+FE00>, is used, there is a similar problem in that its base character is also not likely to be present in fonts that are designed for use in Hong Kong SAR.
There has been a flurry of IVD (Ideographic Variation Database) activity this year.
First, UTS #37 (Unicode Ideographic Variation Database) was updated at the end of January to allow characters with the “Ideographic” property to serve as valid base characters in an IVS (Ideographic Variation Sequence). This effectively means that the Tangut (西夏文) and Nüshu (女书/女書) scripts can now participate in the IVD.
Unlike Unicode, which has been on an annual release cycle from Version 7.0, mainly to provide predictability to the release schedule for the benefit of developers, national standards—particularly East Asian ones—are updated much less frequently.
The latest East Asian national standard to be updated is HKSCS (Hong Kong Supplementary Character Set). HKSCS-2016, the fifth version of this particular standard, was published in May of 2017. As a result, and for the benefit of font developers whose fonts are based on Adobe’s public glyph sets, I used the morning of 🇺🇸’s Fourth of July of this year to publish Adobe-CNS1-7 via the CMap Resources open source project.
One of my hobbies is apparently to explore various ways to stress-test Adobe products, and the target of today’s article happens to be recent adventures with Adobe InDesign and our Source Han families.
The background is that I produced Unicode-based glyph synopses as part of the Source Han Sans and Source Han Serif releases, but those PDFs show only up to 256 code points per page, and it takes several hundred pages to show their complete Unicode coverage. I also produced single-page PDFs that show all 65,535 glyphs. A Source Han Sans one is available here, and a Source Han Serif one is available here. However, they are not Unicode-based.
At seemingly every opportunity, whether via this blog or during public speaking engagements, I have made it abundantly clear that the Adobe-branded Source Han families share the same glyph set as the corresponding Google-branded Noto CJK families. That is simply because it is true. What requires a bit of explanation, however, is how the two typeface designs—Source Han Sans and Source Han Serif—differ. That is what this particular article is about.
As the Project Architect of these Pan-CJK typeface families, I have my fingers on all of the data that was used during their development, and for preparing each release. I can therefore impart some useful tidbits of information that cannot be found elsewhere.
Besides being the world’s first open source serif-style Pan-CJK typeface families, the Adobe-branded Source Han Serif and the Google-branded Noto Serif CJK also represent the first broad deployment of two highly-complex and related ideographs that are in the process of being encoded. Their glyphs are shown above in all seven weights. Although it may be hard to believe, the fourth line illustrates the simplified version.
Or, perhaps more accurately, the project that has been keeping me busy for the past couple of years.
The Adobe-branded Source Han Serif (named 源ノ明朝 in Japanese, 본명조 in Korean, 思源宋体 in Simplified Chinese, and 思源宋體 in Traditional Chinese) and Google-branded Noto Serif CJK open source Pan-CJK typeface families, which represent the serif-style counterparts to the similarly-named and also open source Source Han Sans and Noto Sans CJK Pan-CJK typeface families, were released on 2017-04-03. You can read more about the Source Han Serif release here (日本語—한국어—简体中文—繁體中文), which includes a six-minute promotional video.
This article provides information that you would not expect to find in the official announcements for Source Han Serif or Noto Serif CJK, mainly because such information is intended for a completely different audience, which is primarily comprised of font developers.
Unless noted otherwise, all further references to Source Han Serif or Source Han Sans will apply to Noto Serif CJK or Noto Sans CJK, respectively.
The IVD (Ideographic Variation Database) is all about ideograph variants. Up until earlier this year, its scope was limited to CJK Unified Ideographs, per UTS #37 (Unicode Ideographic Variation Database). Its scope now includes characters with the Ideographic property that are not canonically nor compatibly decomposable, which still excludes CJK Compatibility Ideographs.
In an ideal world, a particular glyph—whether it’s considered the standard (aka encoded) form or an unencoded variant of the standard form—would be associated with a single registered IVS (Ideographic Variation Sequence) within an IVD collection. However, we do not live in a perfect world, and several real-world conditions can lead to duplicate sequence identifiers within an IVD collection.
I will open this article by stating that OpenType features are almost always GSUB (Glyph SUBstitution) or GPOS (Glyph POSitioning). The former table specifies features that substitute glyphs with other glyphs, usually in a 1:1 fashion, but not always. The latter table specifies features that alter the metrics of glyphs, or the inter-glyph metrics (aka kerning).
The focus of this particular article will be the 'vert' (Vertical Alternates) feature, which substitutes a glyph with the appropriate glyph for vertical writing, and is invoked when in vertical writing mode. In other words, it’s a GSUB feature, and one that needs to be invoked for proper vertical writing. Current implementations that support the 'vert' GSUB feature, which tend to be CJK fonts, substitute glyphs with their vertical forms on a 1:1 basis, though language-tagging may affect the outcome for Pan-CJK fonts, such as the Adobe-branded Source Han Sans and the Google-branded Noto Sans CJK, which support multiple languages.
This article is largely a test, but also serves to start the process of resurrecting L2/14-006 (Proposal to add standardized variation sequences for nine characters) for discussion at UTC #151 in early May.
Liang Hai (梁海) brought up this document for discussion at UTC #150 last week, and while I had an opportunity to have it accepted by the UTC, to be included in Unicode Version 10.0 (June, 2017), I decided that it was prudent to instead prepare a revised proposal that is more complete, mainly because L2/14-006 was submitted and discussed prior to the first release of the Adobe-branded Source Han Sans and Google-branded Noto Sans CJK Pan-CJK typeface families. This functionality was implemented in those typeface families via the 'locl' GSUB feature, which requires the text to be language-tagged. In other words, I learned a lot since L2/14-006 was discussed, and prefer to submit a more complete proposal, even if it means waiting for Unicode Version 11.0 (June, 2018).
It is now January 28, 2017 in China and other Chinese-speaking regions.
I’d like to use this opportunity to welcome the Year of the Rooster, and to wish a Chinese New Year to all of my Chinese friends, colleagues, and blog readers. May this year be safe, prosperous, and enjoyable.
As recorded on the very first page of Adobe Tech Note #5078, Adobe-Japan1-6 was released on 2004-03-05, and one of the glyphs that was added was CID+20958. According to the Adobe-Japan1-6 ordering file, its glyph name is freedial, and is assigned to the Dingbats FDArray element for the purpose of hinting. Of course, if you look for CID+20958 in Adobe Tech Note #5078, you can find it on the bottom of page 54, immediately to the right of CID+20957 that maps from U+26BD ⚽ SOCCER BALL, though it is blank. This is simply because Adobe does not have the rights to use NTT’s trademarked FreeDial mark. CID+20958 was included in Adobe-Japan1-6 for the benefit of font developers who do have the rights to use this mark, and can thus include the glyph in their fonts.
UTC #150, the 150th Unicode Technical Committee meeting, takes place later this month, from 2017-01-23 through 2017-01-26, and will be hosted by our friends at Apple in Sunnyvale, California. I will attend as Adobe’s representative. As usual, there will be CJK- and IRG-related items to discuss. One item will be the UTC’s review of IRG Working Set 2015 Version 3.0 (aka CJK Unified Ideographs Extension G), L2/17-006, which I recently finished.
A major focus of UTC #150 will be Unicode Version 10.0, which is scheduled to be released in June of this year. Unicode Version 10.0 will include 21 additional characters appended to the URO (Unified Repertoire & Ordering), along with Extension F (7,473 characters), as shown here.
While we’re on the subject of Unicode, be sure to explore the sidebar on the right side of this blog’s landing page, which includes links to several useful Unicode-related resources.
Unlike the first and second similarly-titled articles that I published last month, this article will focus on a minor efficiency for the combining jamo feature of the Adobe-branded Source Han Sans and Google-branded Noto Sans CJK Pan-CJK typeface families.
Prior to Japan’s script reform of 1900, there was more than one shape associated with each syllable of the hiragana syllabary. There is now only one shape associated with each syllable. The now-obsolete and nonstandard shapes are referred to as hentaigana (変体仮名), which simply means variant kana. Hentaigana are still in use today in Japan, but are limited to Japan’s family registry (戸籍 koseki in Japanese) and specialized uses, such as business signage and other decor that are specifically designed to convey a feeling of nostalgia or traditional charm.
In addition to the Wikipedia article that is linked from the previous paragraph, 『変体仮名のこれまでとこれから—情報交換のための標準化』 (The past, present, and future of hentaigana: Standardization for information processing) by TAKADA Tomokazu (高田智和) et al. and About the inclusion of standardized codepoints for Hentaigana by YADA Tsutomu (矢田勉) serve as excellent reading material.