System Programming Interfaces
Nodes and Fingerprints
The gradual move to reduce dependence on the NamePool has continued. The methods NodeInfo.getFingerprint() and NodeInfo.getNameCode() have been dropped, except for nodes that implement the FingerprintedNode interface. This means that implementations of NodeInfo that wrap third-party XML tree models no longer need to implement these methods, and no longer need to be tied to a NamePool.
In earlier releases, document nodes were always represented by an object that implemented the DocumentInfo interface (which extended NodeInfo). The DocumentInfo object was used to hold information about the tree as a whole, for example keys and IDs. In Saxon 9.7, the class DocumentInfo is retained to provide a measure of compatibility for some commonly used interfaces, but it is no longer the case that every document node is represented by an instance of DocumentInfo; in fact DocumentInfo is now just a wrapper around a NodeInfo, designed to keep existing code working. Information about a tree as a whole is now contained in a new TreeInfo object; this exists for all trees, whether or not they are rooted at a document node. This provides a place to put information about accumulators, which can exist for any tree whether or not the root is a document node.
Collections
A number of changes have been made to the way collection URIs are handled, mainly: (a) to support the XPath 3.1 capability to return any kind of item in a collection, not only a node (for example, collections can now include maps derived from JSON files, unparsed text files, and binary objects); (b) to allow streamed processing of the documents in a collection; and (c) to conform with the rules in the specification as regards stability (that is, repeated calls returning the same results).
The CollectionURIResolver interface is superseded by a new, more flexible CollectionFinder. The old CollectionURIResolver is still supported, but provides less capability.
The new mechanism is described in the Javadoc documentation; for an outline, see Collections.
To handle the Saxon collection URIs with options such as validation=strict, the Source object that is returned can be an AugmentedSource, which holds parser options as well as the source information itself.
In Saxon-EE, fn:collection() is multi-threaded, parsing multiple documents simultaneously in different threads. This previously happened within the default collection URI resolver; it now happens within the code of the fn:collection() function itself, so it works even if a user-defined collection URI resolver is in use. An additional change in this release is that the order in which documents are returned in the result of fn:collection() is now always the same as the order in which they are delivered by the collection URI resolver, making the order more predictable at a slight cost in latency.
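The ordering guarantee can be illustrated independently of Saxon. The following is a minimal sketch in plain Java, not Saxon's implementation: OrderedParallelLoader and loadAll are hypothetical names, and the parser function stands in for whatever parses a single resource. Work runs in parallel, but results are collected in submission order, which is where the slight latency cost comes from: a document that finishes early still waits for those ahead of it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of Saxon: parses inputs on several threads
// but returns the results in the order the inputs were supplied.
final class OrderedParallelLoader {

    static <S, T> List<T> loadAll(List<S> sources, Function<S, T> parser, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every source; the list of futures preserves input order.
            List<Future<T>> pending = new ArrayList<>();
            for (S source : sources) {
                Callable<T> task = () -> parser.apply(source);
                pending.add(pool.submit(task));
            }
            // Collect in submission order, blocking on each future in turn.
            List<T> results = new ArrayList<>();
            for (Future<T> future : pending) {
                try {
                    results.add(future.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while loading", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("Loading failed", e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting results as they complete would reduce waiting, but the output order would then depend on thread timing.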
Collections can now be stable, meaning that multiple calls with the same collection URI are guaranteed to return the same results. Collection stability can be expensive, because the contents of a collection have to be maintained in memory just in case it is used again; it is therefore not the default, even though it is required for conformance with the W3C specifications. Collection stability can be switched on in several ways: the collection URI can include the query parameter stable=yes; the collection finder can return a ResourceCollection object whose isStable() method returns true; or the configuration property STABLE_COLLECTION_URI can be set to true. A collection is stable if any one of these mechanisms requests stability.
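The way the three switches combine can be sketched as a logical OR. This is an illustrative sketch, not Saxon code: the class CollectionStability and its methods are hypothetical, the two boolean arguments stand in for the result of isStable() and the value of the configuration property, and the assumption that query parameters are separated by ";" or "&" is made only for this example.

```java
import java.net.URI;

// Hypothetical helper, not Saxon code: combines the three stability switches.
final class CollectionStability {

    // True if the query part of the URI contains the parameter stable=yes.
    static boolean requestedInUri(URI collectionUri) {
        String query = collectionUri.getQuery();
        if (query == null) {
            return false;
        }
        // Assumption for this sketch: parameters are separated by ';' or '&'.
        for (String param : query.split("[;&]")) {
            if (param.equals("stable=yes")) {
                return true;
            }
        }
        return false;
    }

    // The collection is stable if any one of the three switches asks for it.
    static boolean isStable(URI collectionUri,
                            boolean resourceCollectionIsStable,
                            boolean configurationStable) {
        return requestedInUri(collectionUri)
                || resourceCollectionIsStable
                || configurationStable;
    }
}
```

With this rule, no single switch can turn stability off once another has turned it on.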
The option unparsed=true among the query parameters of the collection URI is no longer supported, as the functionality can now be achieved by calling fn:uri-collection() followed by fn:unparsed-text().
A new option for the collection URI query parameters is metadata=yes. When this is used, the items returned by the collection() function are maps; the entries in the map include properties of the resources within the collection, plus a function fetch() that can be called to fetch the actual content of the resource. For further details see Collections.
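The shape of such an item, a set of properties plus a function that loads the content only when called, can be modelled in plain Java. This is a conceptual sketch only: ResourceEntrySketch is a hypothetical class, and the property keys "name" and "content-type" are placeholders, not the keys Saxon actually uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model, not Saxon code: a collection item represented as a map
// of resource properties plus a "fetch" entry that loads content on demand.
final class ResourceEntrySketch {

    // The keys "name" and "content-type" are placeholders for illustration.
    static Map<String, Object> describe(String name, String contentType,
                                        Supplier<String> loader) {
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("name", name);
        entry.put("content-type", contentType);
        entry.put("fetch", loader); // nothing is loaded until fetch is called
        return entry;
    }

    // Calls the "fetch" entry of a map produced by describe().
    @SuppressWarnings("unchecked")
    static String fetch(Map<String, Object> entry) {
        return ((Supplier<String>) entry.get("fetch")).get();
    }
}
```

The benefit of this design is that a caller can filter resources by their properties and fetch only the ones it needs.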
The standard URIResolver and the standard ModuleURIResolver have been enhanced to recognize the classpath URI scheme. For example, in XSLT it is now possible to write <xsl:include href="classpath:utility.xsl"/>, which locates utility.xsl on the Java classpath. (The classpath URI scheme was introduced as part of the Spring framework, but Saxon's implementation is free-standing.) On the command line, in options such as -s, names prefixed classpath: are now recognized (along with http and file) as being URIs rather than filenames, avoiding the need to specify the -u option.
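As an illustration of both ideas, the sketch below distinguishes URIs from filenames by prefix and opens a classpath: reference through a class loader. It is not Saxon's implementation: SourceArgumentSketch and its methods are hypothetical, and the exact prefixes tested ("classpath:", "http:", "file:") are an assumption based on the description above.

```java
import java.io.InputStream;

// Hypothetical helper, not Saxon code.
final class SourceArgumentSketch {

    private static final String CLASSPATH_PREFIX = "classpath:";

    // True if the argument should be treated as a URI rather than a filename.
    static boolean looksLikeUri(String arg) {
        return arg.startsWith(CLASSPATH_PREFIX)
                || arg.startsWith("http:")
                || arg.startsWith("file:");
    }

    // Opens the named resource from the classpath; returns null if it is absent.
    static InputStream openClasspath(String uri) {
        if (!uri.startsWith(CLASSPATH_PREFIX)) {
            throw new IllegalArgumentException("Not a classpath URI: " + uri);
        }
        String path = uri.substring(CLASSPATH_PREFIX.length());
        if (path.startsWith("/")) {
            path = path.substring(1); // class loader resource names have no leading slash
        }
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResourceAsStream(path);
    }
}
```

Testing for specific scheme prefixes, rather than for any text before a colon, keeps a Windows path such as C:\data\a.xml from being mistaken for a URI.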
Location information
The Receiver interface has changed, so that location information is now passed with all events (for example, startElement) as a Location object, rather than as an integer locationId. This change was necessary because with independent compilation of packages, it becomes difficult to allocate globally unique location IDs at package compile time. The change also enables richer location information to be maintained, enabling more precise diagnostics, especially of dynamic errors.
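The motivation can be shown with a simplified model. The types below (Loc, EventSink, DiagnosticSink) are hypothetical stand-ins, not the Saxon Receiver or Location interfaces. Two independently compiled packages could each number their locations from 1, so an integer alone would be ambiguous; an object that carries its own system identifier and line number is not.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not the Saxon API.
final class LocationSketch {

    // A self-describing location: no lookup in a global table is needed.
    static final class Loc {
        final String systemId;
        final int line;

        Loc(String systemId, int line) {
            this.systemId = systemId;
            this.line = line;
        }

        @Override
        public String toString() {
            return systemId + "#" + line;
        }
    }

    // Simplified event sink: each event carries its own location object.
    interface EventSink {
        void startElement(String name, Loc location);
    }

    // Records a diagnostic message for every element start event.
    static final class DiagnosticSink implements EventSink {
        final List<String> messages = new ArrayList<>();

        @Override
        public void startElement(String name, Loc location) {
            messages.add("<" + name + "> at " + location);
        }
    }
}
```

In this model, events that originate in different packages remain distinguishable even when their line numbers coincide.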
The move away from integer location IDs to Location objects is fairly pervasive, and affects many interfaces that are important to products that interface intimately to Saxon, for example to provide debugging support. In particular, expressions in the expression tree now contain location information in the form of a Location object; they no longer implement the SourceLocator interface directly.
The Expression tree
There have been substantial changes to the internal structure of the Expression tree. These are only likely to affect applications that interface to Saxon at a very low level. Among the changes:
- The Container object has gone.
- Expressions now contain a reference to their parent expression in the tree.
- An expression now contains a reference to a RetainedStaticContext object, which holds that part of the static context that might be needed at execution time. To save space, an expression whose static context is the same as its parent or sibling expressions will generally share the same RetainedStaticContext object.
- Because expressions now hold more context information, the need to pass this information dynamically during the type-checking and optimization processes using the ExpressionVisitor object is diminished.
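The parent reference and the sharing of static context can be pictured with a small model. This is a conceptual sketch with hypothetical classes (Expr, StaticContextSketch), not Saxon's Expression or RetainedStaticContext; in particular, the rule that a child without its own context adopts its parent's is a simplification chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Saxon code.
final class ExpressionTreeSketch {

    // Stand-in for the part of the static context needed at execution time.
    static final class StaticContextSketch {
        final String baseUri;

        StaticContextSketch(String baseUri) {
            this.baseUri = baseUri;
        }
    }

    static final class Expr {
        private Expr parent;
        private StaticContextSketch staticContext;
        private final List<Expr> children = new ArrayList<>();

        Expr(StaticContextSketch staticContext) {
            this.staticContext = staticContext;
        }

        // Links the child to this expression; a child without a context of
        // its own shares this expression's context object to save space.
        Expr addChild(Expr child) {
            child.parent = this;
            if (child.staticContext == null) {
                child.staticContext = this.staticContext;
            }
            children.add(child);
            return child;
        }

        Expr getParent() {
            return parent;
        }

        StaticContextSketch getStaticContext() {
            return staticContext;
        }
    }
}
```

Because every node can reach its own context and its parent, a traversal in this model does not need to carry that information separately.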