Resolving entities
XML documents may contain references to external entities, including general entities, DTDs, and parameter entities. Typically these references include a system identifier (URI) and/or a public ID. It is the responsibility of the XML parser to resolve these references, but the process can be influenced using Saxon interfaces.
On Java, most applications use a SAX parser such as Xerces. Entity resolution with a SAX parser is controlled by supplying an EntityResolver.
If no other EntityResolver is supplied, then when Saxon instantiates a SAX parser, it constructs an EntityResolver that invokes the configuration-level ResourceResolver, which has the capability to resolve entity references using catalogs (as well as supporting the classpath and data URI schemes).
If the SAX parser is instantiated by user code, however (for example when a SAXSource is supplied), then it is the responsibility of the user code to initialize the parser's EntityResolver as required: Saxon will not modify the settings.
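As a minimal sketch of the user-code case, the following builds a SAXSource whose parser has its own EntityResolver installed before the source is handed on. It uses only standard JAXP and SAX classes. The class name RestrictedSaxSource, the PrefixEntityResolver helper, and the trusted-prefix policy are hypothetical names and choices made for illustration; they are not part of Saxon.

```java
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class RestrictedSaxSource {

    /** Hypothetical policy: allow only system IDs that start with a trusted prefix. */
    static final class PrefixEntityResolver implements EntityResolver {
        private final String trustedPrefix;

        PrefixEntityResolver(String trustedPrefix) {
            this.trustedPrefix = trustedPrefix;
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            if (systemId != null && systemId.startsWith(trustedPrefix)) {
                // Returning null asks the parser to open the system ID itself.
                return null;
            }
            throw new SAXException("External entity not permitted: " + systemId);
        }
    }

    /** Creates a SAXSource whose XMLReader was configured by the caller, not by Saxon. */
    public static SAXSource newSource(InputSource input, String trustedPrefix)
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(new PrefixEntityResolver(trustedPrefix));
        return new SAXSource(reader, input);
    }
}
```

The resulting SAXSource can be supplied wherever a JAXP Source is accepted. Because the parser was instantiated by user code, the resolver installed here is the one that applies.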
Saxon will also accept input from a StAX parser. In this case, configuring the parser for entity resolution is entirely the responsibility of the calling application.
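For the StAX case, a sketch along the following lines shows the calling application configuring the parser itself. It uses only the standard javax.xml.stream API. The class name RestrictedStaxReader is hypothetical, and how the resulting reader is passed to Saxon is not shown here.

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class RestrictedStaxReader {

    /** Builds a StAX factory that does not fetch external entities. */
    public static XMLInputFactory newFactory() {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Standard StAX property: do not resolve external parsed entities.
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        // Defensive fallback: if the parser does ask for an external entity, refuse it.
        factory.setXMLResolver((publicId, systemId, baseUri, namespace) -> {
            throw new XMLStreamException("External entity not permitted: " + systemId);
        });
        return factory;
    }

    /** Creates a reader from the restricted factory. */
    public static XMLStreamReader newReader(Reader xml) throws XMLStreamException {
        return newFactory().createXMLStreamReader(xml);
    }
}
```

How an unexpanded external entity reference is reported to the application depends on the StAX implementation in use, so it is worth testing against the parser actually deployed.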
On .NET, Saxon always uses the Microsoft System.Xml parser. Entity resolution in this parser is controlled using the System.Xml.XmlResolver class. When Saxon instantiates a System.Xml parser, it constructs an XmlResolver that invokes the configuration-level ResourceResolver, which has the capability to resolve entity references using catalogs (as well as supporting the data URI scheme).
Many XML users are concerned about security vulnerabilities in the area of external entity references. If source documents containing untrusted entity references are accepted, it is possible for these to access files in local filestore that might contain sensitive data. It is good practice to configure an XML parser to avoid these risks. If all entity references are resolved using a user-supplied ResourceResolver, then the resolver has total control over which URIs are accepted and which are rejected.
On Java, JAXP interfaces provide a number of configuration properties to control this directly: see the JAXP Security Guide. Saxon recognizes the properties FEATURE_SECURE_PROCESSING, ACCESS_EXTERNAL_DTD, ACCESS_EXTERNAL_SCHEMA, and ACCESS_EXTERNAL_STYLESHEET in its implementations of relevant JAXP interfaces.
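As an illustration, the JAXP properties can be set on a TransformerFactory as sketched below, using the constants defined in javax.xml.XMLConstants. The class name SecureFactory and the particular values chosen are assumptions for the example. Which factory implementation TransformerFactory.newInstance() returns depends on the classpath and JAXP configuration.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;

public class SecureFactory {

    /** Returns a TransformerFactory with restrictive JAXP security settings. */
    public static TransformerFactory newSecureFactory()
            throws TransformerConfigurationException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Ask the implementation to apply its secure-processing defaults.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Empty string: no protocol may be used to fetch an external DTD.
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        // Example policy (an assumption): stylesheet modules may only be read from files.
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "file");
        return factory;
    }
}
```

In the JAXP convention, an empty value denies all protocols and a value such as "file" permits only that scheme. ACCESS_EXTERNAL_SCHEMA is omitted from this sketch because it applies to schema processing rather than to a TransformerFactory.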
Generalizing this mechanism, Saxon also provides a configuration property, ALLOWED_PROTOCOLS, which has the same format as the JAXP properties (a comma-separated list of permitted URI schemes) and is enforced by the default configuration-level ResourceResolver. Note that not all URIs are processed using this mechanism: for example, a URI that is resolved by a local ResourceResolver set on an XsltTransformer or XQueryEvaluator is able to bypass these checks.
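To make the property format concrete, the following is a small standalone sketch of how a comma-separated list of permitted URI schemes might be checked against a URI. It is not Saxon's implementation. The class name SchemeAllowList is hypothetical, and the handling of the keyword "all" and of the empty string follows the JAXP convention, which is assumed here to carry over.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class SchemeAllowList {
    private final boolean permitAll;
    private final Set<String> schemes;

    /** Parses a value such as "http,https" into a set of lower-case scheme names. */
    public SchemeAllowList(String commaSeparated) {
        String value = commaSeparated.trim();
        // JAXP convention (assumed here): the keyword "all" permits every scheme.
        this.permitAll = value.equalsIgnoreCase("all");
        this.schemes = Arrays.stream(value.split(","))
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    /** Returns true if the URI is absolute and its scheme is on the list. */
    public boolean permits(URI uri) {
        String scheme = uri.getScheme();
        if (scheme == null) {
            return false; // relative URI: no scheme to check
        }
        return permitAll || schemes.contains(scheme.toLowerCase(Locale.ROOT));
    }
}
```

With the value "http,https", a URI such as file:///etc/passwd is refused while https://example.com/a.dtd is accepted; with an empty value, nothing is accepted.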