CMap Resources & Character Collections

The CMap resources that are associated with our public glyph sets—called character collections—were first open-sourced on 2009-09-21 via Adobe’s first open source portal, and about a year later the project was moved to SourceForge. I then migrated the project to GitHub on 2015-03-27 where it is likely to remain for the foreseeable future. The main purpose for open-sourcing our CMap resources was to make it easier for developers to include them in their own open source projects, many of which require that the components themselves be open source.

I then open-sourced three of our four character collections on GitHub—Adobe-GB1-5, Adobe-CNS1-7, and Adobe-Japan1-6—in October of last year. The Adobe-Korea1-2 character collection was intentionally not open-sourced, because it will soon be replaced by the Adobe-KR-9 character collection that is expected to be published in mid-May.

A recent addition to the CMap Resources project (as in, this morning) is UTF-32.pdf, a PDF file that visually represents the UTF-32 CMap resources and is included in the latest release. It is bookmarked, organized by character collection, and specifies the CMap resource version. Note that some character collections, such as Adobe-Japan1-6, include multiple UTF-32 CMap resources. To create this PDF file, I first used the script in the Command-line Perl Scripts project that generates a PostScript file, then fed that PostScript file to Adobe Acrobat Distiller. Below is what this PDF file looks like when opened:

If a code point maps to a glyph in a particular CMap resource, the CID is shown under the glyph and the Supplement is shown in the upper-right corner. If the glyph is proportional or half-width, a “P” or “H”, respectively, is shown in the upper-left corner.
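To make the cell layout concrete, here is a minimal Python sketch, under stated assumptions, of the two lookups that each cell implies: finding the CID for a code point in a UTF-32 CMap resource's `cidrange` and `cidchar` blocks, and deriving the Supplement from that CID. The parser and its function names are hypothetical and handle only those two block types. The sample CMap text and its CIDs are invented for illustration and are not real Adobe-Japan1 mappings. The cumulative CID counts for Adobe-Japan1 Supplements 0 through 6 should be checked against the official Adobe-Japan1 specification. The “P” and “H” flags depend on glyph metrics, not on the CMap resource, so they are not modeled.

```python
import bisect
import re

# Hypothetical minimal parser: only begincidrange/begincidchar blocks are handled.
RANGE_BLOCK = re.compile(r"begincidrange(.*?)endcidrange", re.S)
CHAR_BLOCK = re.compile(r"begincidchar(.*?)endcidchar", re.S)
RANGE_LINE = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*(\d+)")
CHAR_LINE = re.compile(r"<([0-9A-Fa-f]+)>\s*(\d+)")

# Cumulative CID counts for Adobe-Japan1 Supplements 0-6 (assumption: verify
# against the official character collection specification).
ADOBE_JAPAN1_COUNTS = [8284, 8359, 8720, 9354, 15444, 20317, 23058]


def parse_cmap(text):
    """Return (ranges, chars): a list of (lo, hi, start_cid) and a dict code -> CID."""
    ranges = []
    chars = {}
    for block in RANGE_BLOCK.findall(text):
        for lo, hi, cid in RANGE_LINE.findall(block):
            ranges.append((int(lo, 16), int(hi, 16), int(cid)))
    for block in CHAR_BLOCK.findall(text):
        for code, cid in CHAR_LINE.findall(block):
            chars[int(code, 16)] = int(cid)
    return ranges, chars


def lookup(code_point, ranges, chars):
    """Return the CID for a code point, or None if it is unmapped (an empty cell).

    Simplification: single-code cidchar entries are checked before ranges.
    """
    if code_point in chars:
        return chars[code_point]
    for lo, hi, start in ranges:
        if lo <= code_point <= hi:
            # Within a range, CIDs increase by one per code point.
            return start + (code_point - lo)
    return None


def supplement_of(cid, counts=ADOBE_JAPAN1_COUNTS):
    """Return the Supplement in which a CID was introduced."""
    idx = bisect.bisect_right(counts, cid)
    if idx >= len(counts):
        raise ValueError(f"CID {cid} is outside the character collection")
    return idx


# Invented sample in UTF-32 CMap syntax (illustrative values only).
SAMPLE = """
2 begincidrange
<00000020> <0000007e> 1
<00004e00> <00004e01> 1200
endcidrange
1 begincidchar
<000000a5> 97
endcidchar
"""

if __name__ == "__main__":
    ranges, chars = parse_cmap(SAMPLE)
    for cp in (0x41, 0x4E01, 0xA5, 0x7F):
        cid = lookup(cp, ranges, chars)
        if cid is None:
            print(f"U+{cp:04X}: unmapped")
        else:
            print(f"U+{cp:04X}: CID+{cid}, Supplement {supplement_of(cid)}")
```

Binary search over cumulative counts works because each Supplement only appends CIDs to the end of the previous one, so the Supplement is the first boundary that exceeds the CID.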

I have been using separate per–CMap resource versions of these PDFs for years, and have repurposed them for recent projects, such as Source Han Sans, Source Han Serif, and Ten Mincho. It is convenient to have a consolidated version that represents all of our latest-and-greatest UTF-32 CMap resources for easy reference, and I am pleased to be able to share it.

