saxonica.com

The legacy XQuery API

This page and the following pages describe Saxon's original native Java API for XQuery. For the new XQJ interface, see Invoking XQuery using the XQJ API. For .NET interfaces, see Saxon API for .NET.

Rather than using the query processor from the command line, you may want to issue queries from your own application, perhaps one that runs within an applet or servlet. If you run the processor repeatedly, this will be much faster than launching it from the command line each time, even if it handles a different query each time.

In the absence of a standard API for XQuery, Saxon provides its own. It is fully described in the JavaDoc included in the download: look for the package net.sf.saxon.query. The starting point is the class StaticQueryContext. What follows here is an overview. For an example of how the API can be used, take a look at the source code for the class QueryAPIExamples in the samples/java directory.

Getting started

The first thing you need to do is to create a net.sf.saxon.Configuration object. This holds the values of all the system settings, corresponding to the flags available on the command line. You don't need to set any properties in the Configuration object if you are happy with the default settings. However, there are many options that you can set by calling setter methods on the Configuration object or, if you prefer, by calling the general-purpose method setConfigurationProperty(name, value), where the name is a constant from the class net.sf.saxon.FeatureKeys. The available properties are described on the page Using XSLT from an Application.

You can also create a Configuration by reading a configuration file, using the static method Configuration.readConfiguration().

For schema-aware processing, you will need to create an instance of com.saxonica.config.EnterpriseConfiguration, which is a subclass of Configuration.

Then you need to create a net.sf.saxon.query.StaticQueryContext object, which you can do by calling the factory method newStaticQueryContext() on the Configuration. As the name implies, this holds information about the static (compile-time) context for a query. Most aspects of the static context can be defined in the Query Prolog, but this object allows you to initialize the static context from the application instead if you need to. Some of the facilities provided are very much for advanced users only, for example the ability to declare variables and functions, and the ability to specify a NamePool to be used. One aspect of the static context that you may need to use is the ability to declare collations. Using the method declareCollation you can create a mapping between a collation URI (which can then be used anywhere in the Query) and a Java Comparator object used to implement that collation.
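As a minimal sketch of the comparator side of a collation, the following uses only the JDK: java.text.Collator implements Comparator, so an instance configured with primary strength can serve as a case-insensitive and accent-insensitive comparison. The class name CollationSketch and the collation URI are hypothetical, chosen for illustration. The call to declareCollation itself is not shown, because the exact parameter type it accepts should be checked in the JavaDoc for your Saxon version.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical helper class for illustration; not part of Saxon.
final class CollationSketch {

    // Illustrative collation URI, invented for this example only.
    static final String COLLATION_URI =
            "http://example.com/collation/english-primary";

    // Returns a comparator that treats strings differing only in case
    // or accents as equal (primary strength, English locale).
    static Comparator<Object> englishPrimary() {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.PRIMARY);
        return collator; // Collator implements Comparator<Object>
    }

    public static void main(String[] args) {
        Comparator<Object> cmp = englishPrimary();
        String[] titles = {"zebra", "Apple", "apple", "Mango"};
        Arrays.sort(titles, cmp);
        System.out.println(COLLATION_URI + " -> " + Arrays.toString(titles));
        System.out.println("apple equals APPLE: "
                + (cmp.compare("apple", "APPLE") == 0));
    }
}
```

A comparator built this way would be the object associated with the collation URI; the query would then refer to the collation by that URI.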

Compiling the Query

The StaticQueryContext object can now be used to compile a Query. The text of the Query can be supplied either as a String or as a Java Reader. There are thus two different compileQuery methods. Each of them returns the compiled query in the form of an XQueryExpression. The XQueryExpression, as you would expect, can be executed repeatedly, as often as you want, in the same or in different threads.

For example:


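import net.sf.saxon.Configuration;
import net.sf.saxon.query.StaticQueryContext;
import net.sf.saxon.query.XQueryExpression;

Configuration config = new Configuration();
StaticQueryContext staticContext = config.newStaticQueryContext();
XQueryExpression exp = 
        staticContext.compileQuery("count(//ITEM)");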

Note: the StaticQueryContext object no longer gets updated by the query parser with additional information defined in the query prolog. It is therefore no longer necessary to create a new StaticQueryContext object for each query you compile. This also means that you can't use the StaticQueryContext to obtain information about the query you have just compiled; instead, use the internal StaticQueryContext object created by Saxon, which is available using the getStaticContext() method on the XQueryExpression object.

Note: since Saxon 9.2, the constructor new StaticQueryContext() will return a context that provides access to Saxon-HE functionality only. Use the factory method in the appropriate Configuration class to create a context with access to all functionality offered by the particular Configuration.

You can optionally register a ModuleURIResolver with the Configuration (using the setModuleURIResolver() method). This will be used to handle the URIs found in any import module declaration. The resolver returns a set of JAXP StreamSource objects, each containing either an InputStream or a Reader providing access to the text of the query module. The StreamSource must also contain a SystemId, representing the base URI of the query module. (Supply a Reader if you want to handle encoding issues yourself, or an InputStream if you want Saxon to deal with this.)
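The JAXP side of what a module resolver returns can be sketched with the JDK alone. The example below wraps the text of a query module in a StreamSource backed by a Reader and records the base URI as the SystemId. The class name ModuleSourceSketch, the method moduleSource, and the example URIs are hypothetical; the ModuleURIResolver interface itself is not implemented here, so consult the JavaDoc for its exact method signature.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper; shows only how the StreamSource is prepared.
final class ModuleSourceSketch {

    // Wraps query module text in a StreamSource and sets its base URI.
    static StreamSource moduleSource(String queryText, String baseUri) {
        // A Reader means the application has already handled encoding.
        Reader reader = new StringReader(queryText);
        StreamSource source = new StreamSource(reader);
        source.setSystemId(baseUri); // base URI of the query module
        return source;
    }

    public static void main(String[] args) {
        // A resolver returns a set of such sources, one per module.
        StreamSource[] modules = {
            moduleSource(
                "module namespace m = 'http://example.com/m'; "
                    + "declare function m:one() { 1 };",
                "http://example.com/modules/m.xq")
        };
        System.out.println(modules[0].getSystemId());
    }
}
```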

Building a Source Document

Before you run your query, you may want to build one or more trees representing XML documents that can be used as input to your query. You don't need to do this: if the query loads its source documents using the doc() function then this will be done automatically, but doing it yourself gives you more control. A document node at the root of a tree is represented in Saxon by the net.sf.saxon.om.DocumentInfo interface. The Configuration provides a convenience method, buildDocument(), that allows an instance of DocumentInfo to be constructed. The input parameter to this is defined by the interface javax.xml.transform.Source, which is part of the standard Java JAXP API. The Source interface is an umbrella for different kinds of XML document source, including StreamSource, which parses raw XML from a byte or character stream, SAXSource, which takes the input from a SAX parser (or an object that is simulating a SAX parser), and DOMSource, which provides the input from a DOM.
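The three standard kinds of Source mentioned above can be constructed with the JDK alone, as in the following sketch. Any of the resulting objects could then be passed to buildDocument(). The class name SourceKindsSketch and the sample XML are assumptions made for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; uses only JAXP.
final class SourceKindsSketch {

    static final String XML =
            "<BOOKS><ITEM><TITLE>Sample</TITLE></ITEM></BOOKS>";

    // Raw XML supplied as a character stream.
    static Source fromStream() {
        return new StreamSource(new StringReader(XML));
    }

    // Input delivered through a SAX parser reading an InputSource.
    static Source fromSax() {
        return new SAXSource(new InputSource(new StringReader(XML)));
    }

    // Input taken from an existing DOM tree.
    static Source fromDom() {
        try {
            DocumentBuilderFactory factory =
                    DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return new DOMSource(doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not build DOM", e);
        }
    }

    public static void main(String[] args) {
        for (Source s : new Source[] {fromStream(), fromSax(), fromDom()}) {
            System.out.println(s.getClass().getSimpleName());
        }
    }
}
```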

Saxon also provides several additional implementations of the Source interface that can be used as input to this method. Saxon's DocumentInfo and NodeInfo classes both implement this interface, though this isn't useful for this particular method because you will only have one of these once you have built the tree from some other source. There are a number of wrapper classes that allow trees in other object models to be treated as Saxon trees: net.sf.saxon.jdom.DocumentWrapper for wrapping a JDOM document, net.sf.saxon.xom.DocumentWrapper for XOM, net.sf.saxon.dom.DocumentWrapper for DOM, and net.sf.saxon.dom4j.DocumentWrapper for DOM4J.

The net.sf.saxon.AugmentedSource object can wrap any other kind of Source, and provides additional options as to how the Source should be processed, for example whether it should be validated against a schema, whether whitespace should be stripped, and whether XInclude processing should take place. Validation is only possible if you created an EnterpriseConfiguration.

Running the Query

To execute your compiled query, you need to create a DynamicQueryContext object that holds the run-time context information. The main thing you are likely to set in the run-time context is the context node for the query, using setContextNode() as in the examples that follow.

You are now ready to evaluate the query. There are several methods on the XQueryExpression object that you can use to achieve this. The evaluate() method returns the result sequence as a java.util.List. The evaluateSingle() method is suitable when you know that the result sequence will contain a single item: this returns the item as an Object, or returns null if the result is an empty sequence. There is also an iterator() method that returns an iterator over the results. This is a Saxon object of class net.sf.saxon.om.SequenceIterator: it is similar to the standard Java iterator, but not quite identical; for example, it can throw exceptions. Finally, there is a run() method, which executes the query, converts the results to an XML document, and writes this document to a JAXP Result object, which may represent a DOM, a SAX ContentHandler, or a serial output stream.

The evaluate() and evaluateSingle() methods return the result as a Java object of the most appropriate type: for example a String is returned as a java.lang.String, a boolean as a java.lang.Boolean. A node is returned using the Saxon representation of a node, net.sf.saxon.om.NodeInfo. With the standard and tinytree models, this object also implements the DOM Node interface (but any attempt to update the node throws an error).

The iterator() method, by contrast, does not do any conversion of the result. Each item is returned using its native Saxon representation: for example, a String is returned as an instance of net.sf.saxon.value.StringValue. You can then use all the methods available on this class to process the returned value.

The run() method is probably the most efficient in the case of queries that construct a new document as their output, because it allows the nodes of the result document to be serialized (or sent to the destination) as they are created, without creating a tree structure in memory first.

Here is a simple example for a query that returns a singleton integer result:


DynamicQueryContext dynamicContext = 
        new DynamicQueryContext(config);
dynamicContext.setContextNode(
        config.buildDocument(
                new StreamSource(new File("books.xml"))));
Long count = (Long)exp.evaluateSingle(dynamicContext);
System.out.println("There are " + count.intValue() + " books");
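The cast to Long in the example above is safe only because the query is known to return a single integer. When the result type is not known in advance, the Object returned by evaluateSingle() can be inspected before it is cast. The following sketch uses only the JDK and simulated values in place of a real query result; the class name ResultTypesSketch and the method describe are hypothetical, and the set of Java types covered here is limited to those mentioned in the text above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the values below stand in for query results.
final class ResultTypesSketch {

    // Describes a single result item according to its Java type.
    static String describe(Object item) {
        if (item == null) {
            return "empty sequence"; // evaluateSingle() returned null
        }
        if (item instanceof Long) {
            return "integer: " + item;
        }
        if (item instanceof Boolean) {
            return "boolean: " + item;
        }
        if (item instanceof String) {
            return "string: " + item;
        }
        // Nodes and other value types would be handled here.
        return "other (" + item.getClass().getName() + "): " + item;
    }

    public static void main(String[] args) {
        // Simulated items, as a list like the one evaluate() returns.
        List<Object> simulated = Arrays.<Object>asList(42L, true, "Sample");
        for (Object item : simulated) {
            System.out.println(describe(item));
        }
        System.out.println(describe(null));
    }
}
```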

Here is an example where the query returns a list of nodes:


XQueryExpression exp = staticContext.compileQuery("//ITEM/TITLE");
DynamicQueryContext dynamicContext = 
        new DynamicQueryContext(config);
dynamicContext.setContextNode(
        config.buildDocument(
                new StreamSource(new File("books.xml"))));
SequenceIterator books = exp.iterator(dynamicContext);
while (true) {
    NodeInfo book = (NodeInfo)books.next();
    if (book==null) break;
    String title = book.getStringValue();
    System.out.println(title);
}

Wrapped Output

If you want to process the results of the query in your application, that's all there is to it. But you may want to output the results as serialized XML. Saxon provides two ways of doing this: you can produce wrapped output, or raw output. Raw output works only if the result consists of a single document or element node, and it outputs the subtree rooted at that element node in the form of a serialized XML document. The simplest way to produce raw output is to use the run() method on the XQueryExpression object, but you can also do it by retrieving the result as a SequenceIterator and passing this to the serialize() method of the QueryResult class.

Wrapped output works for any result sequence, for example a sequence of integers or a sequence of attribute and comment nodes. It works by wrapping each item in the result sequence as an XML element, with details of its type and value. To produce wrapped output, you first wrap the result sequence as an XML tree, and then serialize the tree. This can be done using the QueryResult class, which doesn't need to be instantiated because its methods are static.

The method QueryResult.wrap takes as input the iterator produced by evaluating the query using the iterator() method, and produces as output a DocumentInfo object representing the results wrapped as an XML tree. The method QueryResult.serialize takes any document or element node as input, and writes it to a specified destination, using specified output properties.

The destination is supplied as an object of class javax.xml.transform.Result. Like the Source, this is part of the JAXP API, and allows the destination to be specified as a StreamResult (representing a byte stream or character stream), a SAXResult (which wraps a SAX ContentHandler), or a DOMResult (which delivers the result as a DOM). The output properties are used only when writing to a StreamResult: they correspond to the properties available in the xsl:output element for XSLT. The property names are defined by constants in the JAXP javax.xml.transform.OutputKeys class (or net.sf.saxon.event.SaxonOutputKeys for Saxon extensions): for details of the values that are accepted, see the JavaDoc documentation or the JAXP specification.
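To see how the JAXP output properties affect serialization without involving Saxon at all, the following sketch applies a Properties object to the JDK's identity Transformer and writes to a StreamResult. This is a stand-in technique, not QueryResult.serialize: it only illustrates how the OutputKeys constants and a StreamResult fit together. The class name OutputPropertiesSketch and its methods are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper; uses the JDK identity transformer as a stand-in.
final class OutputPropertiesSketch {

    // Output properties named by the JAXP OutputKeys constants.
    static Properties demoProperties() {
        Properties props = new Properties();
        props.setProperty(OutputKeys.METHOD, "xml");
        props.setProperty(OutputKeys.INDENT, "no");
        props.setProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        return props;
    }

    // Copies the XML text to a character stream using the properties.
    static String serialize(String xml, Properties props) {
        try {
            Transformer identity =
                    TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperties(props);
            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ITEM><TITLE>Sample</TITLE></ITEM>";
        System.out.println(serialize(xml, demoProperties()));
    }
}
```

The same Properties object, built the same way, is what the Saxon examples on this page pass as the output properties argument.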

Here is an example that produces wrapped output:


XQueryExpression exp = 
        staticContext.compileQuery("//ITEM");
DynamicQueryContext dynamicContext = 
        new DynamicQueryContext(config);
dynamicContext.setContextNode(
        config.buildDocument(
                new StreamSource(new File("books.xml"))));
SequenceIterator books = exp.iterator(dynamicContext);
DocumentInfo resultDoc = QueryResult.wrap(books, config);
Properties props = new Properties();
props.setProperty(OutputKeys.METHOD, "xml");
props.setProperty(OutputKeys.INDENT, "yes");
QueryResult.serialize(resultDoc, 
        new StreamResult(System.out), props);

This example produces output without wrapping:


XQueryExpression exp = staticContext.compileQuery("//ITEM");
DynamicQueryContext dynamicContext = 
        new DynamicQueryContext(config);
dynamicContext.setContextNode(
        config.buildDocument(
                new StreamSource(new File("books.xml"))));
SequenceIterator books = exp.iterator(dynamicContext);
Properties props = new Properties();
props.setProperty(OutputKeys.METHOD, "xml");
props.setProperty(OutputKeys.INDENT, "no");
int nr = 1;
while (true) {
    NodeInfo book = (NodeInfo)books.next();
    if (book==null) break;
    // Increment the counter so each book gets its own heading number
    System.out.println("===== BOOK " + nr++ + " =====");
    QueryResult.serialize(book, new StreamResult(System.out), props);
}

If the results do not need to be processed by the application, the same effect can be achieved more efficiently using the code shown below:


XQueryExpression exp = staticContext.compileQuery("//ITEM");
DynamicQueryContext dynamicContext = 
        new DynamicQueryContext(config);
dynamicContext.setContextNode(
        config.buildDocument(
                new StreamSource(new File("books.xml"))));
Properties props = new Properties();
props.setProperty(OutputKeys.METHOD, "xml");
props.setProperty(OutputKeys.INDENT, "no");
exp.run(dynamicContext, new StreamResult(System.out), props);           
