Preparing for May 1, 2019—2019年5月1日の準備

(After realizing that the retargeting of Adobe-Japan1-7 to include only two glyphs, with a fairly predictable release date range, exhibited the characteristics of a pregnancy, I was inspired to write the text for the Adobe-Japan1-6 is Expecting! article while flying from SJC to ORD on the morning of 2018-07-20. I also prepared the article’s images in flight. The passenger sitting next to me was justifiably giving me funny looks. My connecting flight to MSN, my final destination for attending my 35th high school class reunion in greater-metropolitan Mount Horeb, was delayed three hours, which gave me an opportunity to publish the article while still on the ground at ORD.)

What do we know about Japan’s new era name? First and foremost, its announcement is unlikely to occur before 2019-02-25, because doing so would divert attention away from the 30th anniversary of the enthronement, 2019-02-24, but it may occur as late as 2019-05-01, which is the date on which the new era begins. That’s effectively a two-month window of uncertainty.

Interestingly, 2019-05-01 falls not only during UTC #159, which I will host at Adobe, but also during Japan’s Golden Week (ゴールデンウィーク), which may begin early because of the imperial transition.

Two Separate Kanji vs Two-Kanji Square Ligature

It needs to be made absolutely clear that it is perfectly acceptable to represent Japan’s era names as two separate kanji, such as 平成 (Heisei) for the current era. However, Unicode has included the two-kanji square ligature form of 平成, as U+337B ㍻ SQUARE ERA NAME HEISEI, since Version 1.1 (1993), along with those for the three previous era names: U+337C ㍼ (Shōwa), U+337D ㍽ (Taishō), and U+337E ㍾ (Meiji). JIS X 0213:2000 was the first JIS standard in which these four characters appeared (JIS X 0221-1995 doesn’t count, because it is simply the JIS equivalent of ISO/IEC 10646). There is therefore a precedent for applying the same treatment to Japan’s forthcoming new era name.

I predict that the two-kanji square ligature form of Japan’s new era name will be used more frequently than U+337B ㍻ is for the current era, mainly because its use will be considered trendy. In addition, because it requires half the number of encoding units that the two separate kanji require, it may become popular or even preferred in length-challenged environments, such as Twitter.
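To make the "half the number of encoding units" point concrete, the Python sketch below counts code points, UTF-16 code units, and UTF-8 bytes for the current era name, written both ways. The new era name is not yet known, so 平成 and U+337B ㍻ stand in for it. The helper name encoding_units is my own. The sketch does not model how any particular service, Twitter included, weights characters when it counts length.

```python
# Compare how many encoding units the two-kanji form and the square
# ligature form of the current era name (Heisei) need in common encodings.
SEPARATE = "\u5E73\u6210"  # 平成, two separate kanji
LIGATURE = "\u337B"        # ㍻, U+337B SQUARE ERA NAME HEISEI


def encoding_units(text: str) -> dict:
    """Return code points, UTF-16 code units, and UTF-8 bytes for text."""
    return {
        "code points": len(text),
        # UTF-16BE without a BOM: every code unit is two bytes.
        "UTF-16 code units": len(text.encode("utf-16-be")) // 2,
        "UTF-8 bytes": len(text.encode("utf-8")),
    }


if __name__ == "__main__":
    for label, text in (("separate", SEPARATE), ("ligature", LIGATURE)):
        print(label, text, encoding_units(text))
```

The ligature needs one UTF-16 code unit and three UTF-8 bytes. The two separate kanji need two code units and six bytes.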

This also means that the JIS X 0213 standard, which was amended in 2004 and 2012, may be amended a third time to include this new character. If that happens, which seems very likely, my best guess is that its Plane-Row-Cell value will be 1-13-62, which is the code point immediately before 1-13-63 (aka U+337B ㍻).
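Plane-Row-Cell values are not bytes in any particular encoding, so a short sketch may help show how a plane-1 row-cell value becomes an encoded character. The Python below implements the standard Shift_JIS transformation for plane 1. It then checks the result against Python's built-in cp932 codec, which is Microsoft's Shift_JIS variant and also encodes the NEC row-13 characters, including U+337B ㍻. The function name kuten_to_sjis is my own. Plane 2 of JIS X 0213, which Shift_JIS-2004 maps to lead bytes 0xF0 through 0xFC, is not handled.

```python
# Sketch: convert a JIS X 0208/0213 plane-1 row-cell (kuten) value to
# Shift_JIS bytes, then decode them with Python's cp932 codec.


def kuten_to_sjis(row: int, cell: int) -> bytes:
    """Map a plane-1 row (1-94) and cell (1-94) to a two-byte Shift_JIS code."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in the range 1-94")
    # Each lead byte covers two rows: 0x81-0x9F for rows 1-62,
    # 0xE0-0xEF for rows 63-94.
    lead = (row + 1) // 2 + (0x80 if row <= 62 else 0xC0)
    if row % 2 == 1:
        # Odd rows use trail bytes 0x40-0x9E, skipping 0x7F.
        trail = cell + 0x3F if cell <= 63 else cell + 0x40
    else:
        # Even rows use trail bytes 0x9F-0xFC.
        trail = cell + 0x9E
    return bytes([lead, trail])


if __name__ == "__main__":
    code = kuten_to_sjis(13, 63)
    print(code.hex(), code.decode("cp932"))  # 877e ㍻
```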

Adobe’s Preparations

Adobe is already making preparations for its apps, fonts, and other pieces of infrastructure on which our customers and development partners depend.

To that end, and to minimize risk given the fairly fixed timeline, I decided to define Adobe-Japan1-7 to add exactly two glyphs, CIDs 23058 and 23059, for the horizontal and vertical forms, respectively, of the two-kanji square ligature that will represent Japan’s forthcoming new era name. Furthermore, the code point that will represent this ligature, U+32FF ㋿, has been reserved by both the UTC (aka Unicode) and WG2 (aka ISO/IEC 10646). Because the code point and CIDs are stable, I was able to release the Adobe-Japan1-7 versions of the CMap resources and the ToUnicode mapping file to the CMap Resources and Mapping Resources for PDF open source projects, respectively, on 2018-07-30. This allows our apps to update to the Adobe-Japan1-7 versions now. It also enables Japanese type foundries to prepare prototype Adobe-Japan1-7 fonts that use placeholder glyphs for CIDs 23058 and 23059. The updated ToUnicode mapping file, which maps Adobe-Japan1-7 CIDs to Unicode values, is particularly important for PDF workflows, especially for PDFs that do not include their own ToUnicode mapping table. Without this update, text from PDFs that include these glyphs but lack an embedded ToUnicode mapping table cannot be correctly Copy&Pasted as U+32FF ㋿.
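A ToUnicode mapping is what lets a PDF viewer turn CIDs back into text. The Python sketch below emits and then parses only the bfchar entries of a ToUnicode-style CMap for the two new CIDs. A real ToUnicode CMap also has header and codespace sections, which are omitted here. This is not the contents of the actual mapping file. I am assuming that both the horizontal and vertical CIDs map to U+32FF, and the function names are hypothetical.

```python
import re

# Hypothetical illustration: the two Adobe-Japan1-7 CIDs and the single
# Unicode code point that both of them are assumed to map to.
ERA_CIDS = (23058, 23059)  # horizontal and vertical forms
ERA_CODE_POINT = 0x32FF


def bfchar_block(mapping: dict) -> str:
    """Emit a ToUnicode-style bfchar section for 2-byte CID codes."""
    lines = [f"{len(mapping)} beginbfchar"]
    for cid, cp in sorted(mapping.items()):
        # Source codes are the CIDs as 2-byte hex strings; destinations are
        # UTF-16BE hex strings (a BMP code point needs one code unit).
        lines.append(f"<{cid:04X}> <{cp:04X}>")
    lines.append("endbfchar")
    return "\n".join(lines)


def parse_bfchar(text: str) -> dict:
    """Recover the CID-to-Unicode mapping from bfchar entries (BMP only)."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            mapping[int(src, 16)] = chr(int(dst, 16))
    return mapping


if __name__ == "__main__":
    block = bfchar_block({cid: ERA_CODE_POINT for cid in ERA_CIDS})
    print(block)
    print(parse_bfchar(block))
```

CID 23058 is 0x5A12 and CID 23059 is 0x5A13. Without entries like these, either embedded in the PDF or supplied by the viewer, the viewer has no way to recover U+32FF from the glyphs.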

The actual Adobe-Japan1-7 specification cannot be updated until shortly after the announcement, because its glyph charts require representative glyphs, and those cannot be designed until the two kanji are known. Its Wiki, however, already describes Adobe-Japan1-7. It also indicates that any glyphs that were previously candidates for Adobe-Japan1-7 are now candidates for Adobe-Japan1-8.

Adobe’s priority, in terms of updating key typeface families to include the glyphs for U+32FF ㋿, is Kozuka Mincho (小塚明朝), because its glyphs are needed as the representative glyphs for the glyph charts of the Adobe-Japan1-7 specification. Next will be the open source Source Han Sans (源ノ角ゴシック) and Noto Sans CJK Pan-CJK typeface families, primarily because our friends at Google will need the latter family’s fonts for their ecosystem. (I am planning to include the hooks for supporting U+32FF ㋿ in the Version 2.000 update, by including placeholder glyphs and their mappings, to make the already-planned dot-release much easier.) I will then turn my attention to the Kozuka Gothic (小塚ゴシック) typeface family.

For internal Adobe testing purposes, I have already built Adobe-Japan1-7 prototypes of the Kozuka fonts that use placeholder glyphs for CIDs 23058 and 23059. The 'cmap' table maps U+32FF ㋿ to CID+23058, and the 'vert' (Vertical Writing) GSUB feature replaces CID+23058 with CID+23059 in vertical writing mode. These fonts have already proven to be very useful. The JIS2004-savvy fonts include “Pr7N” in their names, and the JIS90-savvy ones include “Pr7” instead. I also used the opportunity to build two types of OpenType/CFF Collections. The first type includes two fonts, specifically the Pr7N and Pr7 versions of each family and weight, which share the same CFF. The second type simply includes all 24 Kozuka fonts and is a little less than 70MB. It’s not difficult to guess which one I am using.
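The glyph selection just described can be mimicked with a toy model. In an actual OpenType font, the two steps are a 'cmap' subtable and a GSUB single-substitution lookup registered under 'vert'. In the Python sketch below, plain dictionaries stand in for both. The names CMAP, VERT, and shape are hypothetical and do not correspond to any font library's API.

```python
# Toy model (not a real font): a 'cmap'-style lookup from code point to CID,
# followed by a 'vert'-style single substitution applied only in vertical mode.
CMAP = {0x32FF: 23058}   # U+32FF -> horizontal form (placeholder glyph)
VERT = {23058: 23059}    # horizontal form -> vertical form


def shape(text: str, vertical: bool = False) -> list:
    """Return the CIDs a renderer would use for text in a given writing mode."""
    cids = []
    for ch in text:
        cid = CMAP.get(ord(ch), 0)  # CID 0 is .notdef
        if vertical:
            cid = VERT.get(cid, cid)  # substitute only if 'vert' covers the CID
        cids.append(cid)
    return cids


if __name__ == "__main__":
    print(shape("\u32ff"))                 # [23058]
    print(shape("\u32ff", vertical=True))  # [23059]
```

Because 'vert' operates on glyphs rather than characters, the underlying text stays U+32FF in both writing modes. This is why the vertical CID also needs a ToUnicode mapping.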

Prototype Adobe-Japan1-7 Font

Although I cannot make the prototype Kozuka fonts available, because they are commercial fonts, I did build an open source font named Adobe Japan1 7 Heavy. Each of its CIDs uses a full-width glyph of the single digit that represents the Supplement to which that CID belongs. For example, the glyphs for CIDs 23058 and 23059 are displayed as the digit seven. I prepared test PDFs, exported from Adobe InDesign and Adobe Illustrator, that include five Japanese era name characters (U+337E ㍾, U+337D ㍽, U+337C ㍼, U+337B ㍻, and U+32FF ㋿) in both a horizontal and a vertical text frame. (Note that the original Adobe InDesign and Adobe Illustrator files are attached to the respective PDFs so that they can be repurposed.) The horizontal text displays as “00017” because the glyphs for the first three are in Supplement 0 (aka Adobe-Japan1-0), that of the fourth is in Supplement 1 (aka Adobe-Japan1-1), and that of the fifth is in Supplement 7 (aka Adobe-Japan1-7). The vertical text displays as “44447” because the vertical-form glyphs for the first four are in Supplement 4 (aka Adobe-Japan1-4), and that of the fifth is in Supplement 7 (aka Adobe-Japan1-7). The InDesign-exported PDF file includes an embedded ToUnicode mapping table. The Illustrator-exported one does not, and it therefore depends on Adobe Acrobat’s “Adobe-Japan1-UCS2” ToUnicode mapping file to correctly Copy&Paste the glyphs for U+32FF ㋿.
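The test font's behavior reduces to a lookup from CID to Supplement. The Python sketch below assumes the cumulative CID counts for each Supplement, running from 8,284 for Adobe-Japan1-0 to 23,060 for Adobe-Japan1-7. Please check these against the Adobe-Japan1 specification rather than taking them from here. The specific CIDs of the era-name glyphs in Supplements 0, 1, and 4 are not reproduced. The function names supplement and glyph_for_cid are hypothetical.

```python
from bisect import bisect_right

# Assumed cumulative CID counts for Adobe-Japan1-0 through Adobe-Japan1-7.
# Supplement n covers CIDs from the previous count up to, but not
# including, its own count.
SUPPLEMENT_END = (8284, 8359, 8720, 9354, 15444, 20317, 23058, 23060)


def supplement(cid: int) -> int:
    """Return the Adobe-Japan1 Supplement in which a CID first appeared."""
    if not 0 <= cid < SUPPLEMENT_END[-1]:
        raise ValueError(f"CID {cid} is outside Adobe-Japan1-7")
    return bisect_right(SUPPLEMENT_END, cid)


def glyph_for_cid(cid: int) -> str:
    """The character the test font shows: a full-width digit for the Supplement."""
    return chr(0xFF10 + supplement(cid))  # U+FF10 is FULLWIDTH DIGIT ZERO


if __name__ == "__main__":
    for cid in (0, 8284, 23057, 23058, 23059):
        print(cid, supplement(cid), glyph_for_cid(cid))
```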

Other Preparations

Even if Japan’s new era name were announced as early as 2019-02-25, it would still be too late to include it in Unicode Version 12.0, which is scheduled to be released on 2019-03-05. The UTC has therefore decided to issue a dot-release, Version 12.1, shortly after the announcement, and it will include a single character, specifically U+32FF ㋿. This isn’t the first time that Unicode has issued a dot-release that added only one character: Version 6.2, released in 2012, added only U+20BA ₺ TURKISH LIRA SIGN. Anyway, what makes this new character particularly problematic, in terms of not being able to include it in Version 12.0, is that its character name cannot be established until the two kanji that comprise it are announced, and that it requires a decomposition to those same two kanji. For example, U+337B ㍻ decomposes to 平成 according to NFKD (Normalization Form KD: Compatibility Decomposition) and NFKC (Normalization Form KC: Compatibility Decomposition, followed by Canonical Composition).
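Python's standard unicodedata module exposes the name and decomposition data that U+32FF will also need. The sketch below uses only the four existing era-name characters, because U+32FF has no name or decomposition yet. A Python build whose Unicode data predates Version 12.1 will not know about U+32FF at all. The helper name describe is my own.

```python
import unicodedata

# The four era-name square ligatures that Unicode has encoded since Version 1.1.
ERA_SQUARES = "\u337E\u337D\u337C\u337B"  # ㍾ ㍽ ㍼ ㍻


def describe(ch: str) -> tuple:
    """Return the name, raw decomposition, and NFKC/NFKD forms of ch."""
    return (
        unicodedata.name(ch),
        unicodedata.decomposition(ch),  # e.g. '<square> 5E73 6210'
        unicodedata.normalize("NFKC", ch),
        unicodedata.normalize("NFKD", ch),
    )


if __name__ == "__main__":
    for ch in ERA_SQUARES:
        print(ch, *describe(ch))
```

The `<square>` tag marks these as compatibility decompositions. This is why only NFKC and NFKD, and not NFC or NFD, turn ㍻ into 平成.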

Shortly after Unicode Version 12.1 is released, CLDR (Common Locale Data Repository) and ICU (International Components for Unicode) are expected to be updated to support the new era name. This is particularly important for the Japanese calendar, whose date formats depend on each era’s name and start date.
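To see why era data matters for date formatting, consider a minimal era-year calculation. The Python sketch below is in the spirit of Japanese calendar data, but it is not the CLDR data format or the ICU API. The Taishō, Shōwa, and Heisei start dates are historical. The 2019-05-01 entry comes from the text above, and its name is a placeholder, because the real name has not been announced. The function name japanese_era_year is hypothetical.

```python
from datetime import date

# Hypothetical era table: (first day of era, era name).
# The new era's name is a placeholder until it is announced.
ERAS = (
    (date(1912, 7, 30), "大正"),
    (date(1926, 12, 25), "昭和"),
    (date(1989, 1, 8), "平成"),
    (date(2019, 5, 1), "<new era>"),
)


def japanese_era_year(d: date) -> tuple:
    """Return (era name, era year) for a Gregorian date on or after 1912-07-30."""
    for start, name in reversed(ERAS):
        if d >= start:
            # The first (partial) calendar year of an era is year 1 (元年).
            return name, d.year - start.year + 1
    raise ValueError("dates before the Taisho era are not covered by this sketch")


if __name__ == "__main__":
    for d in (date(1989, 1, 7), date(1989, 1, 8), date(2019, 4, 30), date(2019, 5, 1)):
        print(d, *japanese_era_year(d))
```

Until the new era's entry exists in the data a system uses, dates from 2019-05-01 onward are formatted as Heisei 31, Heisei 32, and so on. Updating the data is therefore what matters, not just the character repertoire.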

In closing, if your company develops products that may be affected by Japan’s era name change, I strongly encourage you to start taking action now, at least to the extent possible. If you are a font developer, I hope that the preparations I have made thus far are helpful for your own efforts.

🐡
