Posted by Thomas Phinney
Every so often I get a request (either from within or outside Adobe) for a “Unicode font.” Unfortunately, that term is not very meaningful to me. The obvious interpretations are:
1) To me as a font geek, the phrase “a Unicode font” “logically” means “a font with a Unicode encoding (cmap table).” That would be pretty much every one of the 2400+ OpenType fonts in Adobe’s type library. So that interpretation doesn’t really narrow things down much.
2) They could mean “a font that covers all of Unicode.” However, Unicode today has over 100,000 defined code points, and as there is no font format that can include more than 65,535 glyphs, such a font is not technically possible. (There’s a separate question as to whether it would be desirable – see below.)
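To make the arithmetic concrete: the glyph count in an sfnt (TrueType/OpenType) font is stored as an unsigned 16-bit integer, which is where the 65,535 ceiling comes from. A quick sketch of the numbers involved:

```python
# The 'maxp' table's numGlyphs field is an unsigned 16-bit integer,
# so a single sfnt (TrueType/OpenType) font holds at most 2**16 - 1 glyphs.
MAX_GLYPHS = 2**16 - 1               # 65,535
UNICODE_CODE_SPACE = 0x10FFFF + 1    # 1,114,112 code points in Unicode's code space

print(MAX_GLYPHS)           # 65535
print(UNICODE_CODE_SPACE)   # 1114112

# Even at exactly one glyph per code point (many scripts need far more),
# covering the whole code space would take multiple fonts:
print(-(-UNICODE_CODE_SPACE // MAX_GLYPHS))  # 18
```

In practice the gap is even larger, since complex scripts routinely need several glyphs (ligatures, positional forms, variants) per encoded character.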
3) They could also mean “a font that covers some useful subset of Unicode that is more than just the basic WinANSI or MacRoman 8-bit (256-character) set.” However, for that to be meaningful, they’d have to define exactly which writing systems or languages are important to them.
In practice, people usually mean either #2 or #3, and those who mean #2 are willing to fall back to #3 once they discover #2 is impossible. So then they sort out more of which writing systems or languages they care about. But even then, things tend to remain complicated. (Wikipedia’s definition is a bit vague – they seem to want to say #2, but of course that’s impossible, and none of their examples fit that definition. I’m working on that.)
These days, there are a fair number of typefaces that have decent Latin, Greek and Cyrillic in a single font. This is reasonable from a design standpoint: the three writing systems share a fair number of character designs and have related origins.
However, there are plenty of other writing systems that are quite dissimilar from the Latin/Greek/Cyrillic triad, such as Arabic, Hebrew, Thai, the various writing systems of India, or the various Han-derived ideographic systems (Chinese, and the ideographic parts of Japanese and Korean). A typeface such as “Myriad” can meaningfully combine Latin, Greek and Cyrillic. But if somebody wants a “Japanese version of Myriad” or a Hindi version or whatever, that isn’t such a meaningful concept any more. For Japanese, one can reasonably make the serif/sans distinction and talk about the weight of the strokes, but it’s more a matter of the Japanese glyphs seeming reasonably compatible with Myriad than being a Japanese version of Myriad, if you get the distinction.
In such cases, it may make more sense to simply keep separate fonts and let the application or operating system provide some kind of composite font or font fallback mechanism, in which a series of physically distinct fonts are used in combination. Such a mechanism allows the user to specify a single “logical font” and be reasonably sure that they’ll get the font they named for the writing systems it covers well, and something reasonable (or at least better than nothing) for just about any other language. But that’s another long story.
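As a rough illustration of how such a fallback chain behaves (a toy sketch, not any real OS API – the font names and coverage sets here are made up), the system walks a prioritized list of physical fonts and uses the first one whose cmap covers each character:

```python
# Toy per-character font fallback: one "logical font" resolves to runs of
# several physical fonts, chosen in priority order by cmap coverage.

def assign_fonts(text, font_stack):
    """font_stack: ordered list of (name, set-of-covered-characters) pairs.
    Returns a list of (font_name, run_of_text) tuples; font_name is None
    for characters no font covers (where a renderer would show .notdef)."""
    runs = []
    for ch in text:
        chosen = next((name for name, cov in font_stack if ch in cov), None)
        # Merge consecutive characters resolved to the same font into one run.
        if runs and runs[-1][0] == chosen:
            runs[-1] = (chosen, runs[-1][1] + ch)
        else:
            runs.append((chosen, ch))
    return runs

# Hypothetical coverage sets, purely for illustration.
stack = [
    ("Myriad", set("ABCdef ")),
    ("KozukaGothic", set("日本語")),
]
print(assign_fonts("ABC 日本語 def", stack))
# [('Myriad', 'ABC '), ('KozukaGothic', '日本語'), ('Myriad', ' def')]
```

Real implementations are far more involved (they must keep combining marks with their bases, respect language preferences, and so on), but the basic idea is this simple.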
Perhaps the biggest problem in making an extensively multilingual font is Unicode’s Han unification of the ideographic East Asian languages. If you want a single font to support more than one of Simplified Chinese, Traditional Chinese, Japanese and Korean, you have a problem: certain characters have slightly (or sometimes very) different designs in each of these languages. Currently, the only functional way to distinguish them is to build an OpenType font, pick one of the languages to use for the default forms in the font, and use the OpenType ’locl’ (locale) feature to access the other forms as variants. This requires applications and/or operating systems which process that feature correctly for those languages.
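For the curious, here is roughly what that looks like in AFDKO feature-file syntax – a minimal sketch with made-up variant glyph names (uni9AA8.zhs and so on), assuming the Japanese designs are the font’s defaults:

```
feature locl {
    script hani;                 # Han ideographs
    language ZHS exclude_dflt;   # Simplified Chinese
    sub uni9AA8 by uni9AA8.zhs;  # U+9AA8 "bone": swap in the SC design
    language ZHT exclude_dflt;   # Traditional Chinese
    sub uni9AA8 by uni9AA8.zht;  # ... and the TC design
} locl;
```

A real pan-CJK font would of course have such substitutions for thousands of characters, not one.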
I’m not sure how widespread such app/OS support is for the ’locl’ feature with those languages, outside of InDesign CS3, but I know that such fonts are pretty much non-existent in the wild. AFAIK, thus far such fonts have only been built by mad scientists in labs (pace Adobe’s own Dr Lunde & Mr Meyer). [update 2008 08 19: It turns out that Arial Unicode MS has variant forms to deal with this problem. I don’t know of any other shipping/available fonts that do this. The main limitation of Arial Unicode is that it only covers characters encoded in Unicode 2.1, but that’s still interesting and potentially useful.]
Because of the potential for much more compact font files, I am sure we will see more such fonts in the wild; I just don’t know whether it will be next year or four years from now.
[updated later the same day of posting: corrected glyph count to 65,535, and added Wikipedia reference. Updated 21 Aug 2008 to tweak wording around writing systems.]