IDML is designed to be generated and manipulated by XML tools and programmers. To support this, IDML can be validated against a RelaxNG schema.
When it comes to schemas and validation, there are two types of IDML files:
- There are many single-file variants (snippets, assignments, ICML, etc.). These files need to be validated with the snippet schema.
- Packages are multi-file ZIP archives that represent an entire InDesign document. Packages need to be validated with the package schema.
For more information on IDML files see the “IDML File Types” post.
Generating Schema Files
// Generate a non-package schema
// Generate package schema
The snippet schema is used to validate all single-file variants of IDML. It comprises two files:
|datatype.rnc|Shared data type file included by all schema files.|
|IDMarkupLanguage.rnc|Validates all single-file IDML variants (ICML, IDMS, ICMA, etc.).|
The package schema comprises one shared file and one schema file for each type of XML file that can appear in an IDML package:
|datatype.rnc|Shared data type file included by all schema files.|
|MasterSpreads/MasterSpread.rnc|Validates all master spread files in the MasterSpreads directory.|
|Spreads/Spread.rnc|Validates all spread files in the Spreads directory.|
|Stories/Story.rnc|Validates all story files in the Stories directory.|
Finding Errors in IDML
For demonstration purposes, we need files that contain errors. Imagine the following IDML fragment in both a snippet file (test.idms) and a package file (test.idml). The IDML contains four fairly obvious errors; try to spot all four.
<Rectangle foo="Test" Self="uec" …>
You may have found it difficult to spot all four errors. Imagine if this were buried in a huge XML file. Instead of trying to find errors by eye, we can use schema validation.
Schema Validation Basics
A RelaxNG schema can be used to verify the structural correctness of a document. It checks to make sure all XML nodes (elements, attributes, text data, etc.) are used at the right places in the document. It detects any unknown or unexpected nodes and ensures that required nodes are present.
InDesign’s RelaxNG schemas can be used to check the structure of a document; however, they do not check the content of these nodes. For example, they don’t check that all IDML references exist. It’s possible to do some non-RelaxNG-based error detection. This is discussed in “Additional Error Detection” below.
You can validate IDML files with any software that supports the compact form of RelaxNG. For snippet files, this is relatively straightforward: it amounts to pointing whatever validation engine you are using to the IDMarkupLanguage.rnc file (which includes the datatype.rnc file).
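As a rough illustration of pointing an engine at the snippet schema, the sketch below builds a command line for Jing, whose -c option selects RelaxNG compact syntax. The jar location, the schema folder, and the helper names jing_command and validate_snippet are assumptions made for this example; adapt them to whatever engine and layout you actually use.

```python
import subprocess
from pathlib import Path


def jing_command(jing_jar, schema_dir, xml_file):
    """Build the argument list for validating one snippet with Jing.

    All three paths are placeholders supplied by the caller.
    """
    schema = Path(schema_dir) / "IDMarkupLanguage.rnc"
    # -c tells Jing that the schema uses RelaxNG compact syntax (.rnc).
    return ["java", "-jar", str(jing_jar), "-c", str(schema), str(xml_file)]


def validate_snippet(jing_jar, schema_dir, xml_file):
    """Run Jing and return (passed, combined output). Requires Java and jing.jar."""
    result = subprocess.run(
        jing_command(jing_jar, schema_dir, xml_file),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr
```

Keeping the command construction separate from the call that runs it makes it easy to inspect the exact command before executing anything.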
Validating packages is more complex; there are two difficulties:
- Packages are ZIP archives, and most RelaxNG validation engines don’t deal with ZIP files.
- An IDML package comprises many XML files. The package schema comprises several schema files: there is one schema for each type of file that can appear in an IDML package. To validate a package, you need to match each XML file with its appropriate schema file.
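The two difficulties above can be sketched in a few lines of Python: open the archive, then pair each XML entry with a schema file according to the directory it lives in. This is only an illustration, not how IDMLTools does it; the mapping covers just the three schema files listed earlier in this post, and the names SCHEMA_BY_DIR and match_schemas are made up for the example.

```python
import zipfile

# Directory prefix inside the package -> schema file, relative to the
# package schema folder. Only the mappings listed in this post are included;
# a real package contains other file types that would need entries too.
SCHEMA_BY_DIR = {
    "MasterSpreads/": "MasterSpreads/MasterSpread.rnc",
    "Spreads/": "Spreads/Spread.rnc",
    "Stories/": "Stories/Story.rnc",
}


def match_schemas(package):
    """Return {XML entry name: schema path, or None when no mapping is known}.

    `package` is a path or a file-like object holding the IDML ZIP archive.
    """
    matches = {}
    with zipfile.ZipFile(package) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue  # skip non-XML entries
            matches[name] = next(
                (schema for prefix, schema in SCHEMA_BY_DIR.items()
                 if name.startswith(prefix)),
                None,
            )
    return matches
```

Entries that map to None are the ones this sketch does not know how to validate; a complete tool would need a schema for each of them.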
Validating with IDMLTools
The InDesign CS4 Products SDK includes a Java package called IDMLTools. This package contains a validation application based on the Jing RelaxNG Validator, which handles both snippets and package files. It’s especially handy for package files, because it unzips the files and matches XML files to the appropriate schema file.
For information about setting up IDMLTools, see the IDML ReadMe. This amounts to the following:
- Add the IDMLTOOLS_HOME environment variable. This should contain the path to your IDMLTools folder. (Do not terminate with a trailing \ or /.)
- Add the bin folder inside your IDMLTools folder (IDMLTOOLS_HOME/bin) to your PATH environment variable. This provides easy access to the validation script.
Once set up, you can validate files by running the appropriate platform script, validate.bat on Windows and validate.sh on Mac OS. These scripts set the appropriate Java classpath and run the validation application. The validation application can be used to validate both types of IDML files (snippets and package files). Running the platform scripts with no arguments produces the following usage message:
Validator SchemaPath PackagePath [PackagePath...]
This means you validate by specifying a path to the schema folder, followed by paths to one or more IDML files (snippets or packages) that you want to validate.
Validating a Snippet
To validate the test.idms snippet, specify the path to the snippet schema, followed by the path to the actual snippet:
validate.bat "c:\idml-schema\snippet" test.idms
The validation application writes errors to standard error. Here are the results from validating test.idms:
Test.idms:143:10: error: required attributes missing
Test.idms:144:527: error: attribute "foo" not allowed at this point; ignored
Test.idms:145:14: error: element "RectData" not allowed in this context
Test.idms:146:15: error: element "Propertie" not allowed in this context
Validating a Package
To validate a package, specify the path to the package schema, followed by the path to the IDML file:
validate.bat "c:\idml-schema\package" test.idml
The validation application unzips the archive to a temporary directory, validates each file against the appropriate schema file, then writes any errors to standard error.
Here are the results written when validating test.idml:
Spreads\Spread_ubd.xml:3:245: error: required attributes missing
Spreads\Spread_ubd.xml:27:527: error: attribute "foo" not allowed at this point; ignored
Spreads\Spread_ubd.xml:28:14: error: unknown element "RectData"
Spreads\Spread_ubd.xml:29:15: error: unknown element "Propertie"
Notice that the XML file containing the error is reported on the left. In this case, the error is in the Spread_ubd.xml file. Because package validation deals with multiple XML files in one pass, error results can come from several files.
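When a package produces errors in several files, it can help to group the messages by file. The sketch below assumes every message has the file:line:column: severity: message shape seen in the results above; the names ERROR_LINE and group_errors are invented for this example, and lines in any other shape are simply skipped.

```python
import re
from collections import defaultdict

# file:line:column: severity: message (the shape of the lines shown above).
# The file part is matched lazily so that a Windows drive letter such as
# "c:" does not get mistaken for the line-number separator.
ERROR_LINE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<severity>\w+): (?P<message>.*)$"
)


def group_errors(stderr_text):
    """Group validator messages by the file they refer to.

    Returns {file name: [(line, column, message), ...]} in input order.
    """
    by_file = defaultdict(list)
    for raw in stderr_text.splitlines():
        found = ERROR_LINE.match(raw.strip())
        if found is None:
            continue  # not a file:line:col message; ignore it
        by_file[found["file"]].append(
            (int(found["line"]), int(found["col"]), found["message"])
        )
    return dict(by_file)
```

To use it, capture the validation script's standard error (for example by redirecting it to a file) and pass the text to group_errors.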
Interpreting the Results
From the results above, we can deduce the four errors:
- The Spread element is missing a required attribute. Unfortunately, Jing does not report which attribute is missing, but it is easy enough to look at the schema or Adobe InDesign CS4 IDML File Format Specification and determine that it is the Self attribute that is missing.
- There is no “foo” attribute on the Rectangle element.
- “RectData” is not a valid child of the Rectangle element.
- Rectangle does not have a child element called “Propertie.” Looking at the schema, Adobe InDesign CS4 IDML File Format Specification, and numerous other examples, we can conclude that this element should be called “Properties” (with an ‘s’).
Additional Error Detection
The IDMLTools validation application includes some non-RelaxNG-based error detection. Currently, it checks for the following errors:
- Missing designmap.xml file.
- Missing or improper processing instruction at the top of the designmap.xml file.
- Missing package files that are referenced in designmap.xml; e.g., Spreads/Spread_ubd.xml.
Because Adobe distributes the source for IDMLTools, it’s possible to add further error detection. You’ll find the code for these items in the Validator.preVerifyPackage() method.
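To give a feel for what such checks involve, here is a minimal Python sketch of the three listed above. It is not the IDMLTools code, which is Java. It assumes that the processing instruction uses the target "aid" and appears before the root element, and that designmap.xml refers to package files through src attributes; the function name precheck_package is made up for the example.

```python
import re
import zipfile


def precheck_package(package):
    """Return a list of problems found before any schema validation.

    `package` is a path or a file-like object holding the IDML ZIP archive.
    """
    with zipfile.ZipFile(package) as archive:
        names = set(archive.namelist())
        if "designmap.xml" not in names:
            return ["missing designmap.xml"]
        designmap = archive.read("designmap.xml").decode("utf-8-sig")

    problems = []
    # Assumption: the processing instruction has the target "aid" and sits
    # in the prolog, i.e. before the first element start tag.
    root_start = re.search(r"<[A-Za-z_]", designmap)
    prolog = designmap[: root_start.start()] if root_start else designmap
    if "<?aid " not in prolog:
        problems.append("missing <?aid ...?> processing instruction")

    # Assumption: package files are referenced through src="..." attributes.
    for src in re.findall(r'\bsrc="([^"]+)"', designmap):
        if src not in names:
            problems.append("missing package file: " + src)
    return problems
```

An empty list means none of these three problems was found; it says nothing about whether the individual files are valid against the schema.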