August 2, 2016 is the official release date for Microsoft’s Windows 10 Anniversary Update (aka Redstone or RS1). Although I do not use Windows OS, I am jumping for joy, for the benefit of those who do use this modern and world-class OS.
Thanks to our friends at Microsoft, the DirectWrite that ships with the Windows 10 Anniversary Update supports OpenType/CFF Collections (aka OTCs), such as those deployed as part of the Adobe-branded Source Han Sans and Google-branded Noto Sans CJK open source projects, including their all-inclusive “one font to rule them all” Super OTCs.
It seems that I am on a roll, having released two new open source fonts on GitHub within the past week. The previous, and brief, article about the LOCL Test OpenType/CFF font simply pointed to the repository. This article will be longer. I promise.
—Humans make mistakes—
—Anything made by humans has the potential to include mistakes—
The most important things about mistakes are that 1) we recognize them, lest they propagate; 2) we learn from them; 3) we make an effort not to repeat them; and 4) we try to fix them, if possible.
Some mistakes are more easily fixed than others. Mistakes that cannot be fixed must be worked around.
With that said, an interesting event of historical significance occurred in June of 2000:
The first version of the IVD (Ideographic Variation Database) was issued on 2007-12-14, meaning over eight years ago, and there have been three subsequent revisions, the latest being issued on 2014-05-16. There are currently three registered IVD collections: Adobe-Japan1, Hanyo-Denshi, and Moji_Joho. A significant number of IVSes are shared between the latter two IVD collections, 9,685 to be exact. While I cannot speak to the latter two IVD collections, the Adobe-Japan1 one is supported by hundreds of OpenType fonts via the Format 14 (Unicode Variation Sequences) ‘cmap’ subtable. Furthermore, the number of apps and OSes that support UVSes has reached critical mass.
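For readers who have not handled IVSes in plain text, here is a minimal Python sketch of what one looks like at the code point level: a base ideograph followed by one of the variation selectors VS17 through VS256 (U+E0100 through U+E01EF). The function names are hypothetical, U+845B is used only as an illustrative base character, and the code does not check whether a sequence is actually registered in any IVD collection.

```python
# Variation selectors used by IVSes: VS17..VS256 = U+E0100..U+E01EF.
IVS_FIRST, IVS_LAST = 0xE0100, 0xE01EF


def make_ivs(base: str, vs_number: int) -> str:
    """Return base followed by variation selector VS<vs_number> (17..256)."""
    if len(base) != 1:
        raise ValueError("base must be a single character")
    if not 17 <= vs_number <= 256:
        raise ValueError("IVSes use VS17 through VS256")
    return base + chr(IVS_FIRST + vs_number - 17)


def split_ivs(text: str):
    """Split text into (base, selector_or_None) pairs.

    A selector that is not preceded by a base character is treated as a
    base of its own; a real implementation would want to reject that.
    """
    pairs = []
    i = 0
    while i < len(text):
        base, selector = text[i], None
        if i + 1 < len(text) and IVS_FIRST <= ord(text[i + 1]) <= IVS_LAST:
            selector = text[i + 1]
            i += 1
        pairs.append((base, selector))
        i += 1
    return pairs


if __name__ == "__main__":
    sequence = make_ivs("\u845B", 17)  # illustrative base + VS17
    print([f"U+{ord(ch):04X}" for ch in sequence])  # ['U+845B', 'U+E0100']
```

Because the selector is an ordinary code point that follows the base character, the sequence survives in "plain text" as long as nothing along the way strips or ignores it.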
With all that said, there is a rather substantial missing link in terms of IVD support infrastructure: the all-important input method.
One of my longer term goals for the open source Source Han Sans project has been to eventually register a Pan-CJK IVD (Ideographic Variation Database) collection that would allow the regional variants to display and be preserved in “plain text” environments, and I think that I may have achieved a breakthrough the other day.
While I won’t repeat here any of the exciting details in Typekit’s recent announcement for East Asia web font support (简体中文, 繁體中文, 日本語, 한국어) that employs dynamic kits, I’d like to seize this opportunity to demonstrate some of the default behavior that this new development exposes in various browsers.
Due to an inadvertent error on my part, the glyphs for the vertical-only kana were incorrect in Source Han Sans Version 1.002 (and, by extension, in Version 1.003 because there were no glyph changes). Many thanks to the person who identified and reported this issue, and I’d like to convey my sincere apologies to those who were affected by it.
[This article was written by Masataka HATTORI (服部正貴) and translated into English by yours truly.]
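Source Han Code JP (Japanese menu name: 源ノ角ゴシック Code JP) began as a personal open source project, because I wanted such a font for myself. When I used Source Han Sans (源ノ角ゴシック) and Source Code Pro together in an editor via font fallback, the alphanumeric characters came out small compared with the kanji and kana, and I found the text hard to read overall. Around that time, a programmer friend asked me whether there was a font that could handle Japanese and was also suitable for coding, and I decided that I should simply make one myself.
The original Source Code Pro is a Latin-only monospaced font that uses 600-unit widths, and its glyphs are designed so that easily confused letters and digits can be told apart on a display. I adjusted their size and weight so that they fit alongside the Japanese glyphs of Source Han Sans JP (源ノ角ゴシック JP). I thought that a width of around 660 units would be about right. Because I had long felt, from a readability standpoint, that half-width Latin glyphs are a little too condensed, I went ahead and adopted two-thirds (667-unit) widths. Compared with typical monospaced fonts that use half-width (500-unit) glyphs, it is not suited to precise indentation alongside full-width characters, but I felt that it preserves readability while allowing a variety of uses. Source Han Code JP is a seven-weight family, like the original Source Han Sans JP, and the length of a string does not change when switching weights.
As a result, it was released on the Adobe Fonts GitHub site as a derivative font of the Adobe Source series that can be used for displaying and editing source code, such as programming and markup, that includes Japanese.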
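The widths quoted above (500, 600, 667 units) can be checked with a little arithmetic. The sketch below assumes a 1000-unit em for full-width glyphs, which is what the half-width and two-thirds figures imply; the helper names are hypothetical, and treating every non-ASCII character as full-width is a simplification.

```python
from fractions import Fraction

UNITS_PER_EM = 1000  # assumption: full-width glyphs advance one 1000-unit em


def advance(fraction_of_em: Fraction) -> int:
    """Round a fraction of the em to whole font units."""
    return round(UNITS_PER_EM * fraction_of_em)


HALF_WIDTH = advance(Fraction(1, 2))        # 500 units
TWO_THIRDS_WIDTH = advance(Fraction(2, 3))  # 667 units


def line_width(text: str, latin_width: int) -> int:
    """Sum advances, treating ASCII as Latin and everything else as full-width."""
    return sum(latin_width if ord(ch) < 0x80 else UNITS_PER_EM for ch in text)


if __name__ == "__main__":
    # Two half-width glyphs line up exactly with one full-width glyph.
    print(line_width("ab", HALF_WIDTH))         # 1000
    # Three two-thirds-width glyphs span about two full-width glyphs;
    # the extra unit comes from rounding 666.67 up to 667.
    print(line_width("abc", TWO_THIRDS_WIDTH))  # 2001
```

This is why indentation against full-width characters is less exact at two-thirds width: Latin and full-width text only realign every three Latin glyphs, and even then only approximately.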
Although it has been less than two months since the Source Han Sans Version 1.002 update was released, a Version 1.003 maintenance update was released on 2015-06-09 to address two particular issues. Neither glyphs nor Unicode mappings were added or modified.
Google’s corresponding Noto Sans CJK fonts, which continue to differ from Source Han Sans only by name, were also updated to Version 1.003 at the same time, and reflect the same changes.
The Source Han Sans Version 1.002 update, released on 2015-04-20, involved turning a very large crank on something that has a very large number of moving parts. The updated region-specific subset OTFs are also available on Typekit via desktop sync.
Google’s corresponding Noto Sans CJK fonts, which differ from Source Han Sans only by name, were also updated to Version 1.002 at the same time, and reflect the same changes.
Let it be known that the “OpenType Collection” (OTC) format was born on 09/21/2011 at Pho Minh Restaurant in Cupertino, California. Present from Adobe were the following: David Lemon, Ken Lunde, Sairus Patel, and Read Roberts. Present from Apple were Antonio Cavedoni, Julio Gonzalez, Yasuo Kida, Peter Lofting, and Tony Tseung. — Adobe & Apple
The above declaration paved the way for supporting (CFF-based) OpenType Collections in Apple’s OS X (beginning from Version 10.8) and in Adobe’s applications (beginning from CS6).
Before I begin the series of articles about what went into building Source Han Sans, I think that it is worth writing a few things about actually installing and using the fonts, including how to determine which of the four deployment formats best suits your needs.
For those who are not aware, there are twelve IDCs (Ideographic Description Characters) in Unicode, from U+2FF0 through U+2FFB, that are used in IDSes (Ideographic Description Sequences), which are intended to visually describe the structure of ideographs by enumerating their components and arrangement in a hierarchical fashion. Any Unicode character can serve as an IDS component, and the IDCs describe their arrangement. The IRG uses IDSes as a way to detect potentially duplicate characters in new submissions. All existing CJK Unified Ideographs have an IDS, and new submissions require an IDS.
This article describes a technique that uses IDSes combined with OpenType functionality to pseudo-encode glyphs that are unencoded or not yet encoded. If memory serves, it was Taichi KAWABATA (川幡太一) who originally suggested this technique.
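To make the hierarchical structure of an IDS concrete, here is a minimal Python sketch that parses one into a nested tuple. It covers only the twelve IDCs named above, of which U+2FF2 and U+2FF3 take three operands and the rest take two. The function name is hypothetical, and this is a structural check only: it says nothing about whether a given IDS is the accepted description of any particular ideograph.

```python
# Operand counts for the twelve IDCs, U+2FF0..U+2FFB.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3  # left, middle, right
IDC_ARITY["\u2FF3"] = 3  # above, middle, below


def parse_ids(ids: str):
    """Parse an IDS into a nested tuple (idc, operand, ...) or a single character."""

    def parse(pos: int):
        if pos >= len(ids):
            raise ValueError("IDS ended before all operands were supplied")
        ch = ids[pos]
        pos += 1
        if ch not in IDC_ARITY:
            return ch, pos  # a plain component
        operands = []
        for _ in range(IDC_ARITY[ch]):
            operand, pos = parse(pos)
            operands.append(operand)
        return (ch, *operands), pos

    tree, end = parse(0)
    if end != len(ids):
        raise ValueError("trailing characters after a complete IDS")
    return tree


if __name__ == "__main__":
    print(parse_ids("⿰日月"))      # ('⿰', '日', '月')
    print(parse_ids("⿱⿰木目心"))  # ('⿱', ('⿰', '木', '目'), '心')
```

Because the IDCs are prefix operators with fixed operand counts, no brackets are needed: the nesting is fully determined by the order of the characters.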
Not all PDF authoring applications are the same, in terms of the extent to which they preserve the text content of the original document. Of course, this is not necessarily the fault of the PDF authoring application, but rather it is due to a disconnect between the PDF authoring process and access to the text content of the original document.
The best example for demonstrating this is to create a document that includes the two kanji 一 (U+4E00) and ⼀ (U+2F00). These two characters make a good example because, in mainstream Japanese fonts, mainly those that are based on the Adobe-Japan1-x ROS, both map to the same glyph, specifically CID+1200.
If you download and unpack the 4E00vs2F00.zip file, you will find two PDF files, an Adobe InDesign file, and an MS Word file. If you open the original documents and search for 一 (U+4E00), you will find only a single instance, which is the one that is marked by the Unicode scalar value. However, if you open the respective PDF files, you will notice a difference. The one that is based on the MS Word file now includes two instances of 一 (U+4E00), and ⼀ (U+2F00) is no longer included in its content. You can search a PDF file by Unicode scalar value by using the “\uXXXX” notation, such as \u4E00 for U+4E00 (一). (Note: Depending on the version of MS Word that is being used, the PDF file may instead include two instances of ⼀ (U+2F00). I am using Microsoft Word for Mac 2011 Version 14.3.8.)
Adobe InDesign has a built-in PDF library that has direct access to the text content, and is thus able to inject it into the text layer of the PDF file that it produces. MS Word uses a different pathway for producing a PDF file, one that does not have access to the text content of the original document.
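The loss of distinction described above can be imitated in a few lines of Python. This sketch is not a claim about what any PDF authoring path does internally; it only uses NFKC normalization, under which U+2F00 maps to U+4E00, as a stand-in for any process that collapses the two characters. The helper name is hypothetical.

```python
import unicodedata

CJK_ONE = "\u4E00"     # 一 CJK Unified Ideograph
KANGXI_ONE = "\u2F00"  # ⼀ Kangxi Radical One


def count_scalars(text: str) -> dict:
    """Count occurrences of each character, keyed by its Unicode scalar value."""
    return {f"U+{ord(ch):04X}": text.count(ch) for ch in sorted(set(text))}


original = CJK_ONE + KANGXI_ONE
# Stand-in for a lossy conversion: compatibility normalization folds
# the Kangxi radical into the unified ideograph.
flattened = unicodedata.normalize("NFKC", original)

if __name__ == "__main__":
    print(count_scalars(original))   # {'U+2F00': 1, 'U+4E00': 1}
    print(count_scalars(flattened))  # {'U+4E00': 2}
```

Counting scalar values this way on text extracted from each PDF is one way to confirm which of the two files preserved the original content.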
UTC (Unicode Technical Committee) Meeting #136 took place last week, and one of the significant outcomes was that UTR (Unicode Technical Report) #50 was advanced from Draft to Approved status. Congratulations to Koji ISHII (石井宏治), its editor, and also to Eric Muller, who took the initiative to start this project and served as its first editor.
[This (Simplified) Chinese version of the May 1, 2013 Typblography article entitled Adobe contributes font rasterizer technology to FreeType is courtesy of Gu Hua (顾华).]
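Modern fonts can use one of two glyph outline formats: TrueType or CFF. TrueType was developed by Apple in 1990, while CFF (Compact Font Format) is the second-generation format that Adobe derived from the Type 1 format (commonly known as PostScript fonts), which was first released in 1984. Both TrueType and CFF can be used in OpenType fonts. They have much in common, but there are two major differences: they use different mathematics to describe glyph curves, and they use different hinting techniques (hinting provides rasterization hints to ensure that each glyph is displayed as accurately as possible within a limited number of pixels). TrueType emphasizes building instructions into the font, whereas Type 1 and CFF rely more on the intelligence of the rasterizer. This makes rasterizer quality especially important, and through this collaboration, Adobe expects a significant improvement in how CFF fonts are displayed in environments that use FreeType.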
[This Japanese version of the May 1, 2013 Typblography article entitled Adobe contributes font rasterizer technology to FreeType is courtesy of Hitomi Kudo (工藤仁美).]
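Fonts in recent years generally use either the TrueType or the CFF format. TrueType is a format that was developed by Apple in 1990, while CFF (Compact Font Format) is the second generation of the Type 1 font format (known as PostScript fonts) that Adobe released in 1984. Both TrueType and CFF can be used in OpenType fonts. The two formats have much in common, but the biggest differences are the following two: different mathematics is used to express curves, and the form of the “hints” differs. (“Hints” are instructions given to the rasterizer so that a typeface is rendered under the best possible conditions even within a limited number of pixels.) TrueType holds most of its hinting information as data within the font, whereas Type 1 and CFF fonts rely heavily on a rasterizer with a high degree of intelligence.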
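The difference in curve mathematics between the two formats is that TrueType outlines use quadratic Bézier curves while CFF outlines use cubic ones. As a small worked example, the Python sketch below evaluates both kinds of curve and shows that any quadratic segment can be restated exactly as a cubic one (degree elevation). The function names and sample points are hypothetical, and this is illustration only, not how any particular rasterizer or font tool is implemented.

```python
def quad_point(q0, q1, q2, t):
    """Point at parameter t on a quadratic Bezier curve (TrueType-style)."""
    u = 1 - t
    return tuple(u * u * a + 2 * u * t * b + t * t * c
                 for a, b, c in zip(q0, q1, q2))


def cubic_point(c0, c1, c2, c3, t):
    """Point at parameter t on a cubic Bezier curve (CFF-style)."""
    u = 1 - t
    return tuple(u ** 3 * a + 3 * u * u * t * b + 3 * u * t * t * c + t ** 3 * d
                 for a, b, c, d in zip(c0, c1, c2, c3))


def quad_to_cubic(q0, q1, q2):
    """Degree-elevate a quadratic segment to an equivalent cubic segment."""
    c1 = tuple(a + 2 * (b - a) / 3 for a, b in zip(q0, q1))
    c2 = tuple(c + 2 * (b - c) / 3 for b, c in zip(q1, q2))
    return q0, c1, c2, q2


if __name__ == "__main__":
    quad = ((0.0, 0.0), (50.0, 100.0), (100.0, 0.0))  # made-up sample points
    cubic = quad_to_cubic(*quad)
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(t, quad_point(*quad, t), cubic_point(*cubic, t))
```

The conversion is exact in this direction only; going from cubic to quadratic generally requires approximation with several segments.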
As I wrote nearly a year ago, the Adobe-Identity-0 ROS is useful for building special-purpose fonts, especially CJK ones whose glyph coverage does not match one of our public ROSes. Our latest Adobe-Identity-0 ROS font is the open-source Adobe Blank, whose purposes and implementation details are described on our sister blog, Typblography.
Years ago, I wrote a Perl script, called unicode-rows.pl, that takes a fully-qualified PostScript name—composed of a CIDFont resource name, two hyphens, and a UTF-32 CMap resource name—then generates a PostScript file that can be distilled into a PDF. The resulting PDF file is a Unicode table, arranged in groups of 256 code points. If the UTF-32 CMap resource includes even a single mapping for a particular group of 256 code points, a page is created.
I have prepared examples that are based on the UniJIS2004-UTF32-H and UniJIS-UTF32-H CMap resources.
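The page-selection rule described above (a page per group of 256 code points that has at least one mapping) can be sketched in Python as follows. This is not the Perl script itself; the function names are hypothetical, and the set of mapped code points is made up for illustration instead of being read from a real CMap resource.

```python
def pages_for(code_points):
    """Return the first code point of every 256-code-point group that has a mapping."""
    return sorted({cp & ~0xFF for cp in code_points})


def page_label(start: int) -> str:
    """Format the range covered by the page that begins at start."""
    return f"U+{start:04X}..U+{start + 0xFF:04X}"


if __name__ == "__main__":
    # Hypothetical mappings; a real run would take them from the UTF-32 CMap.
    mapped = [0x2F00, 0x4E00, 0x4E01, 0x20000]
    for start in pages_for(mapped):
        print(page_label(start))
```

Masking off the low eight bits is what makes a single mapping enough to produce a page: U+4E00 and U+4E01 fall into the same group and so yield one page, not two.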
In my work, I need to deal with character codes on a regular basis, such as Unicode scalar values and hexadecimal values for legacy encodings. This includes writing documents that include them. For most purposes, especially when used in tables, tabular figures work best because they are monospaced. Of course, one could simply choose to use a monospaced font. But, unless a different font is actually desired for character codes, using the same typeface design is usually preferred, because it better matches the surrounding text. The issue is that very few, if any, fonts include tabular glyphs that support hexadecimal notation, specifically referring to ‘A’ through ‘F’ (or ‘a’ through ‘f’ for lowercase). Luckily, I was able to solve this particular dilemma.