Using XPath

This section describes how to use Saxon as a class library for XPath processing from Java, C#, C/C++, Python or PHP, without making any use of XSLT stylesheets or XQuery queries. It covers the XPath API and the API for the Saxon implementation of the XPath object model. Other sections cover Using XSLT, Using XQuery, and Using XML Schema from applications written in Java, C#, and so on.

For information about the different ways of loading source documents, see Handling XML documents.

Saxon supports two public Java APIs for XPath processing, as follows:

  • The preferred interface for XPath processing is Saxon's s9api interface ("snappy"), which also supports XSLT and XQuery processing, schema validation, and other Saxon functionality in an integrated set of interfaces. This is described at Evaluating XPath expressions using s9api.

  • The JAXP XPath API (package javax.xml.xpath) is a (supposedly) standard API introduced in Java 5. Saxon implements this interface; details of Saxon's implementation are described at JAXP XPath API. The Saxon implementation includes some extensions and other variations: some because Saxon supports XPath 2.0 (and higher), whereas JAXP 1.3 is designed primarily for XPath 1.0; some because Saxon supports multiple object models, not only DOM; some for backwards compatibility; and some to allow applications a finer level of control if required.

On .NET, XPath processing is available via classes in the Saxon.Api namespace, as described at Evaluating XPath expressions from a C# application.

For details about XPath processing from SaxonC, see Evaluating XPath expressions from a C/C++, Python or PHP application.

Saxon allows XPath expressions to be evaluated either against its own native tree models of XML (the tiny tree and linked tree), or against trees built using external third-party libraries:

Note that using a third-party tree implementation may carry a significant performance overhead compared with Saxon's native tree models. Furthermore, most of these implementations are not thread-safe, which can cause problems when Saxon's multithreading capability comes into play.

Namespaces

When XPath expressions use prefixed element or attribute names (such as p:foo//p:bar) it is necessary to declare the prefix (here p) so that the XPath processor knows which namespace it refers to. It's irrelevant what prefix is used in the source document: the vital thing is that the prefix used in the XPath expression maps to the same namespace as the prefix (or defaulted prefix) used in the source document.

All the APIs therefore provide a mechanism for binding prefixes to namespaces. Because Saxon supports XPath 3.1, it's also possible to use alternative mechanisms to refer to namespaced elements:

In XPath 1.0, an unprefixed name in an XPath expression was defined to match no-namespace elements in the source document. If the source document uses a namespace (even if it's the default namespace, used with no prefix), then the corresponding names in the XPath expression need to be prefixed.
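The XPath 1.0 rule above can be seen in a minimal sketch using the JAXP API. It assumes the XPath 1.0 implementation built into the JDK rather than Saxon, so that it runs with no extra libraries. The class name NamespacePrefixDemo, the namespace URI http://example.com/ns and the sample document are hypothetical, chosen only for illustration. The source document puts its elements in a default namespace with no prefix, while the expression binds the prefix p to the same URI.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespacePrefixDemo {

    // Hypothetical namespace URI, used only for this illustration.
    static final String NS = "http://example.com/ns";

    // The source document declares NS as its default namespace (no prefix).
    static final String XML = "<foo xmlns='" + NS + "'><bar/><bar/></foo>";

    /** Returns the number of nodes selected, with the prefix "p" bound to NS. */
    public static int count(String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // otherwise the DOM carries no namespace information
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "p".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String uri) {
                    return NS.equals(uri) ? "p" : null;
                }
                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return NS.equals(uri)
                            ? Collections.singletonList("p").iterator()
                            : Collections.<String>emptyIterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("foo//bar"));     // prints 0: unprefixed names match no-namespace elements only
        System.out.println(count("p:foo//p:bar")); // prints 2: the prefix p maps to the document's namespace
    }
}
```

The prefix p appears nowhere in the source document; the match succeeds because both sides resolve to the same namespace URI.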

XPath 2.0 introduced the idea of a default namespace for elements and types, allowing you to declare that unprefixed element names in the XPath expression refer to some specific namespace in the source document. This isn't supported in the JAXP API (which was never updated to handle XPath 2.0), but it's supported in the Saxon APIs: binding the zero-length prefix to a namespace URI has the effect of making that the default namespace for element and type names.

Saxon (from 11.0) goes a step further and allows you to declare an unprefixed element matching policy that determines how unprefixed element names are handled. The possible values are:

XPath versions

W3C has published four versions of XPath: 1.0, 2.0, 3.0, and 3.1. In addition, there is a W3C Community Group, led by Saxonica, working on ideas for a version 4.0, some of which are implemented experimentally in Saxon 11.

XPath variables

Very often you will want to execute the same XPath expression repeatedly, but with some parameter taking different values: for example //person[@id='A1234'] where the required ID changes each time.

Constructing an XPath expression by string concatenation, for example "//person[@id='" + id + "']", is bad practice for two reasons:

Instead, write the expression to contain a variable reference: "//person[@id=$requiredId]", compile it once, and then execute it repeatedly binding different values to the variable $requiredId. All the XPath APIs provide a mechanism for doing this.
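The compile-once, bind-many pattern can be sketched with the JAXP API as follows. This is an illustration under stated assumptions, not Saxon-specific code: it uses the XPath implementation built into the JDK so that it runs with no extra libraries. The class XPathVariableDemo, the helper IdResolver and the sample document are hypothetical. In JAXP the variable binding is supplied through an XPathVariableResolver, which is registered before the expression is compiled.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathVariableResolver;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathVariableDemo {

    // Hypothetical sample data, used only for this illustration.
    static final String XML =
            "<people>"
          + "<person id='A1234'><name>Ann</name></person>"
          + "<person id='B5678'><name>Bob</name></person>"
          + "</people>";

    /** Hypothetical resolver holding the current value of $requiredId. */
    static final class IdResolver implements XPathVariableResolver {
        private String requiredId;

        void setRequiredId(String id) {
            this.requiredId = id;
        }

        @Override
        public Object resolveVariable(QName name) {
            // Only $requiredId is known; any other variable is unresolved.
            return "requiredId".equals(name.getLocalPart()) ? requiredId : null;
        }
    }

    /** Compiles the expression once, then evaluates it once per supplied ID. */
    public static String[] lookUp(String... ids) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));

            IdResolver resolver = new IdResolver();
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setXPathVariableResolver(resolver); // register before compiling

            XPathExpression expr = xpath.compile("//person[@id=$requiredId]/name");

            String[] names = new String[ids.length];
            for (int i = 0; i < ids.length; i++) {
                resolver.setRequiredId(ids[i]);  // bind a new value
                names[i] = expr.evaluate(doc);   // string value of the first match, or "" if none
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String name : lookUp("A1234", "B5678")) {
            System.out.println(name); // prints Ann, then Bob
        }
    }
}
```

Because the ID is passed as a value rather than spliced into the expression text, an input containing quotes, such as x' or '1'='1, is compared as an ordinary string and simply matches nothing.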