The Missing Link

The first version of the IVD (Ideographic Variation Database) was issued on 2007-12-14, over eight years ago, and there have been three subsequent revisions, the latest issued on 2014-05-16. There are currently three registered IVD collections: Adobe-Japan1, Hanyo-Denshi, and Moji_Joho. The latter two collections share a significant number of IVSes (Ideographic Variation Sequences), 9,685 to be exact. While I cannot speak to those two collections, the Adobe-Japan1 collection is supported by hundreds of OpenType fonts via the Format 14 (Unicode Variation Sequences) ‘cmap’ subtable. Furthermore, the number of apps and OSes that support UVSes has reached critical mass.
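
A count such as the number of shared IVSes can be reproduced from the IVD's sequence list. The following is a minimal Python sketch, assuming the semicolon-separated layout of IVD_Sequences.txt (variation sequence, collection name, sequence identifier). The function name and the sample lines are invented for illustration and are not real registrations.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def parse_ivd_sequences(lines: Iterable[str]) -> Dict[str, Set[Tuple[int, int]]]:
    """Group (base, selector) code point pairs by IVD collection name."""
    collections: Dict[str, Set[Tuple[int, int]]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = [field.strip() for field in line.split(";")]
        sequence, collection = fields[0], fields[1]
        base, selector = (int(cp, 16) for cp in sequence.split())
        collections[collection].add((base, selector))
    return collections


# Invented sample lines in the assumed layout, NOT real registrations.
SAMPLE = """\
# base selector; collection; identifier
4E00 E0100; Hanyo-Denshi; SAMPLE-1
4E00 E0100; Moji_Joho; SAMPLE-2
4E00 E0101; Moji_Joho; SAMPLE-3
4E01 E0100; Adobe-Japan1; SAMPLE-4
"""

collections = parse_ivd_sequences(SAMPLE.splitlines())
shared = collections["Hanyo-Denshi"] & collections["Moji_Joho"]
print(f"{len(shared)} IVS(es) shared between Hanyo-Denshi and Moji_Joho")
```

Pointing the same function at the lines of the actual file, rather than the sample, should give the real figure for whichever IVD version is used.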

With all that said, there is a rather substantial missing link in terms of IVD support infrastructure: the all-important input method.

But, before we explore what it takes to be an IVS-savvy input method, let’s review some of the ways in which IVSes can be used to represent otherwise unencoded variant forms in “plain text.”

Below is a live table that shows the representative glyphs that are associated with the Adobe-Japan1 IVSes for three base characters (the default glyph, meaning the one that corresponds to the base character and does not require an IVS but uses one anyway, is shown in red):

Base Character IVSes VSes
劍 U+528D 劍󠄀劍󠄁 U+E0100, U+E0101
邉 U+9089 邉󠄀邉󠄁邉󠄂邉󠄃邉󠄄邉󠄅邉󠄆邉󠄇邉󠄈邉󠄉邉󠄊邉󠄋邉󠄌邉󠄍邉󠄎 U+E0100–U+E010E
邊 U+908A 邊󠄀邊󠄁邊󠄂邊󠄃邊󠄄邊󠄅邊󠄆邊󠄇 U+E0100–U+E0107
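
In terms of code points, each IVS in the table is simply the base character followed by one variation selector from Plane 14. A minimal Python sketch of that construction follows; the function name `ivs_sequences` is made up for illustration, and the ranges are copied from the table above.

```python
from typing import List


def ivs_sequences(base: int, first_vs: int, last_vs: int) -> List[str]:
    """Return base + variation selector strings for an inclusive selector range."""
    return [chr(base) + chr(vs) for vs in range(first_vs, last_vs + 1)]


# Base character -> (first selector, last selector), as listed in the table.
TABLE = {
    0x528D: (0xE0100, 0xE0101),
    0x9089: (0xE0100, 0xE010E),
    0x908A: (0xE0100, 0xE0107),
}

for base, (first, last) in TABLE.items():
    sequences = ivs_sequences(base, first, last)
    code_points = " ".join(f"{ord(ch):04X}" for ch in sequences[-1])
    print(f"U+{base:04X}: {len(sequences)} IVSes, last is <{code_points}>")
```

Whether any of these strings renders as a distinct glyph still depends on the app and on the font supporting the sequence.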

While it is important to discuss IVSes, we shouldn’t neglect the 1,002 Standardized Variants that correspond to the CJK Compatibility Ideographs, some of which happen to have registered IVSes in more than one IVD collection. These Standardized Variants use the VSes (Variation Selectors) in the BMP (VS1 through VS16), not the ones in Plane 14 (VS17 through VS256). The next table, which is also live, shows the base character U+6F22 (漢) followed by four ways in which its Japanese traditional form can be represented, including a CJK Compatibility Ideograph and a Standardized Variant:

Representation Character Code Point or Sequence
CJK Unified Ideograph 漢 U+6F22
CJK Compatibility Ideograph 漢 U+FA47
Standardized Variant 漢︀ <6F22 FE00>
Adobe-Japan1 IVS 漢󠄁 <6F22 E0101>
Hanyo-Denshi IVS 漢󠄃 <6F22 E0103>

It is actually intentional that the glyph associated with the Hanyo-Denshi IVS does not display correctly, because the font that is being served to this blog does not support the Hanyo-Denshi IVD collection, and the base character (U+6F22) displays using the default (non-traditional) glyph.
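
The split between the two selector ranges can be checked mechanically. The sketch below assumes only the standard ranges, VS1 through VS16 at U+FE00 through U+FE0F and VS17 through VS256 at U+E0100 through U+E01EF. The function names are hypothetical, and the labels say which kind of sequence a string could be, not whether it is actually registered.

```python
from typing import Optional


def variation_selector_number(cp: int) -> Optional[int]:
    """Return N for VS-N, or None if cp is not a variation selector."""
    if 0xFE00 <= cp <= 0xFE0F:      # VS1..VS16, in the BMP
        return cp - 0xFE00 + 1
    if 0xE0100 <= cp <= 0xE01EF:    # VS17..VS256, in Plane 14
        return cp - 0xE0100 + 17
    return None


def describe(text: str) -> str:
    """Label a string as a plain character, a possible Standardized Variant, or a possible IVS."""
    if len(text) == 1:
        return "single code point"
    if len(text) == 2:
        number = variation_selector_number(ord(text[1]))
        if number is not None and number <= 16:
            return f"possible Standardized Variant (VS{number})"
        if number is not None:
            return f"possible IVS (VS{number})"
    return "not a base + selector pair"


# The sequences from the table, written with escapes so the selectors are visible.
for text in ["\u6F22", "\uFA47", "\u6F22\uFE00", "\u6F22\U000E0101", "\u6F22\U000E0103"]:
    code_points = " ".join(f"{ord(ch):04X}" for ch in text)
    print(f"<{code_points}>: {describe(text)}")
```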

The tables above are useful in that they demonstrate that the glyphs associated with UVSes (Unicode Variation Sequences, a blanket term that covers both IVSes and Standardized Variants) can be properly displayed when the app and the selected font support them. The act of entering them, however, is still very much a manual process, and is non-intuitive for all but those who work closely with such data. What is sorely needed is a visual way to enter the characters that are associated with the IVSes of one or more IVD collections, along with the Standardized Variants that correspond to the CJK Compatibility Ideographs.

The first step toward the goal of UVS-savvy input methods is for the OSes to bundle fonts that support UVSes in their Format 14 ‘cmap’ subtables, which would give the input method a minimalistic way to display a representative glyph for each UVS. Another approach is for the input method to be aware of the selected font, to detect which UVSes it supports, and to present them to the user as alternates of the base character. Clearly, a non-zero amount of effort is involved, but it is not an impossible problem, and is arguably well within the realm of extreme possibilities.
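
The second approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list of supported (base, selector) pairs would come from the selected font's Format 14 ‘cmap’ subtable, but here it is hard-coded from the tables in this post, and the function names are hypothetical rather than part of any real input method API.

```python
from typing import Dict, Iterable, List, Tuple


def build_candidate_index(uvs_pairs: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Map each base code point to the sorted selectors that the font supports for it."""
    index: Dict[int, List[int]] = {}
    for base, selector in uvs_pairs:
        index.setdefault(base, []).append(selector)
    for selectors in index.values():
        selectors.sort()
    return index


def candidates_for(base_char: str, index: Dict[int, List[int]]) -> List[str]:
    """Return the bare base character followed by every supported variation sequence."""
    selectors = index.get(ord(base_char), [])
    return [base_char] + [base_char + chr(vs) for vs in selectors]


# Assumption: pairs as they might be read from a font; values taken from the tables in this post.
SUPPORTED_PAIRS = [
    (0x528D, 0xE0100),
    (0x528D, 0xE0101),
    (0x6F22, 0xFE00),
    (0x6F22, 0xE0101),
]

index = build_candidate_index(SUPPORTED_PAIRS)
for candidate in candidates_for("\u6F22", index):
    print(" ".join(f"{ord(ch):04X}" for ch in candidate))
```

A base character with no supported sequences simply yields itself, so the candidate window degrades gracefully for fonts without a Format 14 subtable.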

In any case, my ongoing hope is that our friends at Apple, Google, and Microsoft will step up to the proverbial plate and make their respective input methods UVS-savvy, even if only in baby steps.

It would be outstanding if independent input method developers would also consider doing the same, as JustSystems’ ATOK for Android remains my favorite mobile input method.

