Using XML Catalogs

XML Catalogs (defined by OASIS) provide a way to avoid hard-coding the locations of XML documents and other resources in your application. Instead, the application refers to the resource using a conventional system identifier (URI) or public identifier, and a local catalog is used to map the system and public identifiers to an actual location.

When running Saxon from the command line, you can specify a catalog with the option -catalog:files, where files is the catalog file to be searched, or a list of filenames separated by semicolons. This catalog is used to locate DTDs and external entities required by the XML parser, XSLT stylesheet modules requested using xsl:import and xsl:include, documents requested using the document() and doc() functions, and schema documents, however they are referenced.

The catalog is NOT currently used for non-XML resources, including JSON documents, query modules, unparsed text files, collations, and collections.

With Saxon on the Java platform, if the -catalog option is used on the command line, then the open-source Apache library resolver.jar must be present on the classpath. With Saxon on .NET, this module (cross-compiled to IL) is included within the Saxon DLL.

Setting the -catalog option is equivalent to setting the following options:

-r    org.apache.xml.resolver.tools.CatalogResolver
-x    org.apache.xml.resolver.tools.ResolvingXMLReader
-y    org.apache.xml.resolver.tools.ResolvingXMLReader

In addition, the system property xml.catalog.files is set to the supplied files value. If the -t option is also set, Saxon sets the verbosity level of the catalog manager to 2, causing it to report a message for each resolved URI. Saxon customizes the Apache resolver library to integrate these messages with the other output from the -t option: that is, by default they are sent to the standard error output.

Because -catalog sets these options itself, none of the options -r, -x, or -y can be used when the -catalog option is used.

When the -catalog option is used on the command line, this overrides the internal resolver used in Saxon (from 9.4) to redirect well-known W3C references (such as the XHTML DTD) to Saxon's local copies of these resources. Because both these features rely on setting the XML parser's EntityResolver, it is not possible to use them in conjunction.

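This support for OASIS catalogs is implemented only in the Saxon command line. To use catalogs from a Saxon application, it is necessary to configure the various options individually: the application itself supplies the URI resolver and the XML parsers that the -r, -x, and -y options would otherwise select.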
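As a rough illustration of wiring these pieces individually, the following sketch uses only standard JAXP interfaces: a URIResolver on the TransformerFactory (the role played by -r) and an EntityResolver on the XMLReader that parses the source document (the role played by -x). Several things here are assumptions rather than anything stated above. The JDK's built-in javax.xml.catalog.CatalogResolver (Java 9 and later) stands in for the Apache resolver classes so that the code runs without resolver.jar, the example.org identifiers and file names are hypothetical, and TransformerFactory.newInstance() returns whichever XSLT processor is on the classpath. With Saxon the same JAXP calls should apply, but check them against the release you are using.

```java
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class CatalogWiring {

    private static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    /** Transforms a document whose DTD and included stylesheet are both found via a catalog. */
    public static String transformWithCatalog() {
        try {
            Path dir = Files.createTempDirectory("catalog-demo");

            // Local resources that the catalog maps the remote-looking identifiers to.
            Files.writeString(dir.resolve("doc.dtd"),
                "<!ELEMENT doc (#PCDATA)>\n<!ENTITY greeting 'hello'>");
            Files.writeString(dir.resolve("common.xsl"),
                "<xsl:stylesheet version='1.0' xmlns:xsl='" + XSL_NS + "'>"
              + "<xsl:template match='/'><out><xsl:value-of select='doc'/></out></xsl:template>"
              + "</xsl:stylesheet>");
            Path catalog = Files.writeString(dir.resolve("catalog.xml"),
                "<catalog xmlns='urn:oasis:names:tc:entity:xmlns:xml:catalog'>"
              + "<system systemId='http://example.org/dtd/doc.dtd' uri='doc.dtd'/>"
              + "<uri name='http://example.org/xsl/common.xsl' uri='common.xsl'/>"
              + "</catalog>");

            // Inputs that refer to those resources by their conventional identifiers.
            Path stylesheet = Files.writeString(dir.resolve("main.xsl"),
                "<xsl:stylesheet version='1.0' xmlns:xsl='" + XSL_NS + "'>"
              + "<xsl:include href='http://example.org/xsl/common.xsl'/>"
              + "</xsl:stylesheet>");
            Path source = Files.writeString(dir.resolve("doc.xml"),
                "<!DOCTYPE doc SYSTEM 'http://example.org/dtd/doc.dtd'>"
              + "<doc>&greeting;</doc>");

            // "continue" lets unmatched references fall through to normal resolution.
            CatalogFeatures features = CatalogFeatures.builder()
                .with(CatalogFeatures.Feature.RESOLVE, "continue").build();
            CatalogResolver resolver = CatalogManager.catalogResolver(features, catalog.toUri());

            // Role of -r: resolve xsl:include/xsl:import and document() through the catalog.
            TransformerFactory factory = TransformerFactory.newInstance();
            factory.setURIResolver(resolver);
            Transformer transformer = factory.newTransformer(new StreamSource(stylesheet.toFile()));
            transformer.setURIResolver(resolver);

            // Role of -x: the source parser resolves DTDs and entities through the catalog.
            SAXParserFactory parsers = SAXParserFactory.newInstance();
            parsers.setNamespaceAware(true);
            XMLReader reader = parsers.newSAXParser().getXMLReader();
            reader.setEntityResolver(resolver);

            StringWriter out = new StringWriter();
            transformer.transform(
                new SAXSource(reader, new InputSource(source.toUri().toString())),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("catalog demo failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transformWithCatalog());
    }
}
```

Neither example.org URI is fetched: the transformation can only succeed if both the xsl:include reference and the DOCTYPE system identifier are redirected to the local files. The sketch does not cover the parser used for stylesheet modules themselves (the role of -y).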

Here is an example of a very simple catalog file. The publicId and systemId attributes give the public or system identifier as used in the source document; the uri attribute gives the location (in this case a relative location) where the actual resource will be found.

<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <group prefer="public" xml:base="file:///usr/share/xml/">
    <public publicId="-//OASIS//DTD DocBook XML V4.5//EN"
            uri="docbook45/docbookx.dtd"/>
    <system systemId="http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"
            uri="docbook45/docbookx.dtd"/>
  </group>
</catalog>
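To see how such a catalog maps identifiers to locations, the following minimal sketch loads the catalog above and looks up both identifiers. It assumes the JDK's own javax.xml.catalog API (Java 9 and later) rather than the Apache resolver that the -catalog option uses, simply because it needs no extra jars; the class and method names are hypothetical. No file under /usr/share/xml needs to exist, since a lookup only computes the target location.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.Catalog;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;

public class CatalogLookup {

    // The catalog shown above, with the same entries and attribute values.
    static final String CATALOG_XML = String.join("\n",
        "<?xml version=\"1.0\"?>",
        "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">",
        "  <group prefer=\"public\" xml:base=\"file:///usr/share/xml/\">",
        "    <public publicId=\"-//OASIS//DTD DocBook XML V4.5//EN\"",
        "            uri=\"docbook45/docbookx.dtd\"/>",
        "    <system systemId=\"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd\"",
        "            uri=\"docbook45/docbookx.dtd\"/>",
        "  </group>",
        "</catalog>");

    /** Writes the catalog to a temporary file and loads it. */
    static Catalog load() {
        try {
            Path file = Files.createTempFile("catalog", ".xml");
            Files.writeString(file, CATALOG_XML);
            return CatalogManager.catalog(CatalogFeatures.defaults(), file.toUri());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the location mapped to a public identifier, or null if there is no match. */
    public static String byPublicId(String publicId) {
        return load().matchPublic(publicId);
    }

    /** Returns the location mapped to a system identifier, or null if there is no match. */
    public static String bySystemId(String systemId) {
        return load().matchSystem(systemId);
    }

    public static void main(String[] args) {
        System.out.println(byPublicId("-//OASIS//DTD DocBook XML V4.5//EN"));
        System.out.println(bySystemId("http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"));
    }
}
```

Both lookups should report the same docbookx.dtd location, with the relative uri value interpreted against the xml:base of the enclosing group. An identifier that appears in no entry yields null.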

There are many tutorials for XML catalogs available on the web, including some that have information specific to Saxon, though this may well relate to earlier releases.