Internal changes
There have been some changes to key internal interfaces which affect a great many classes throughout the product, and which also occasionally surface in APIs.
The SequenceIterator interface, which is widely used throughout the Saxon code, has been changed so that it no longer has a hasNext() method. Instead, the caller should invoke next() repeatedly; the end of the sequence is indicated by a return value of null. The purpose of this change is to reduce the number of method calls and, more importantly, to reduce the amount of state information that iterators have to hold. It also reduces the effect whereby each iterator in a pipeline looks ahead by one item, which wastes effort if the pipeline is abandoned early, as happens for example when finding the effective boolean value of a sequence.
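The calling convention can be sketched as follows. This is an illustration only: the generic SequenceIterator interface shown here is a cut-down stand-in, not the real Saxon interface, and ListSequenceIterator and IterationDemo are hypothetical names. It also assumes that a sequence never contains null as an item.

```java
import java.util.List;

// Cut-down stand-in for the real interface: only next() is shown.
interface SequenceIterator<T> {
    /** Returns the next item, or null when the sequence is exhausted. */
    T next();
}

// Hypothetical list-backed iterator. It holds only a position,
// with no look-ahead buffer of the kind hasNext() tends to require.
class ListSequenceIterator<T> implements SequenceIterator<T> {
    private final List<T> items;
    private int position = 0;

    ListSequenceIterator(List<T> items) {
        this.items = items;
    }

    public T next() {
        return position < items.size() ? items.get(position++) : null;
    }
}

class IterationDemo {
    // Full traversal: call next() until it returns null.
    static <T> int count(SequenceIterator<T> iter) {
        int n = 0;
        while (iter.next() != null) {
            n++;
        }
        return n;
    }

    // Early exit: a crude stand-in for an effective-boolean-value test,
    // which reads at most one item and then abandons the iterator.
    static <T> boolean isNonEmpty(SequenceIterator<T> iter) {
        return iter.next() != null;
    }
}
```

Because isNonEmpty() reads a single item, no iterator further up the pipeline is asked to compute an item that is never used.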
The internal representation of type information has changed, because of the need to accommodate user-defined types. A new type (actually an interface), ItemType, has been introduced; this and the occurrence indicator form the two parts of a SequenceType. The method getItemType on an expression now returns an object that implements this interface. For atomic values, this is an AtomicType object, which is also used in the hierarchy of schema types. In the case of user-defined atomic types, this object contains a reference to the SimpleType object held in the schema data model (which will be available only in the schema-aware version of the product).
For nodes, the ItemType interface is implemented by a NodeTest, which is also used to represent conditions in an AxisStep of a path expression, and which is a subclass of Pattern. In the case of node types that specify the required content type, for example attribute(*,xs:date), a ContentTypeTest is used.
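The relationship between an item type and an occurrence indicator might be pictured roughly as below. The class bodies are simplified assumptions for illustration and do not reproduce the Saxon classes: matchesItem(), the Occurrence enum, and the use of a Java Class to identify an atomic type are all invented here. A NodeTest would implement the same ItemType interface.

```java
import java.util.List;

// Simplified stand-in: an item type decides whether a single item conforms.
interface ItemType {
    boolean matchesItem(Object item);
}

// Assumption: an atomic type is identified here by a Java class.
final class AtomicType implements ItemType {
    private final Class<?> valueClass;

    AtomicType(Class<?> valueClass) {
        this.valueClass = valueClass;
    }

    public boolean matchesItem(Object item) {
        return valueClass.isInstance(item);
    }
}

// Hypothetical encoding of the occurrence indicators (none, ?, *, +).
enum Occurrence {
    EXACTLY_ONE(1, 1),
    ZERO_OR_ONE(0, 1),
    ZERO_OR_MORE(0, Integer.MAX_VALUE),
    ONE_OR_MORE(1, Integer.MAX_VALUE);

    private final int min;
    private final int max;

    Occurrence(int min, int max) {
        this.min = min;
        this.max = max;
    }

    boolean allows(int count) {
        return count >= min && count <= max;
    }
}

// A sequence type is the pair (item type, occurrence indicator).
final class SequenceType {
    private final ItemType itemType;
    private final Occurrence occurrence;

    SequenceType(ItemType itemType, Occurrence occurrence) {
        this.itemType = itemType;
        this.occurrence = occurrence;
    }

    // A sequence conforms if its length is allowed and every item matches.
    boolean matches(List<?> items) {
        if (!occurrence.allows(items.size())) {
            return false;
        }
        for (Object item : items) {
            if (!itemType.matchesItem(item)) {
                return false;
            }
        }
        return true;
    }
}
```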
A number of the implementations of the tree model create transient wrapper nodes whenever a path expression is used to select a set of nodes. A new optimization has been introduced so that in the case where the nodes are immediately atomized, the tree model is allowed to return the typed value of a node instead of returning the node. This avoids the cost of creating the wrapper node and, in the case where the typed value is a singleton, also avoids the cost of creating another iterator to process the typed value. This is currently done only in the common case where the typed value is actually untypedAtomic. Any user-defined implementation of the tree model that implements the interface AxisIterator will need to support the additional method setIsAtomizing(); however, an implementation that does nothing is acceptable.
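A do-nothing implementation could look something like this. The AxisIterator interface below is a cut-down stand-in, and the boolean parameter of setIsAtomizing() is an assumption, since the signature is not given above. ArrayAxisIterator is a hypothetical name.

```java
// Cut-down stand-in for the real interface.
interface AxisIterator {
    /** Returns the next node, or null at the end of the sequence. */
    Object next();

    /** Hint that the caller will atomize each node (parameter type assumed). */
    void setIsAtomizing(boolean atomizing);
}

// Hypothetical user-defined iterator over a fixed array of nodes.
class ArrayAxisIterator implements AxisIterator {
    private final Object[] nodes;
    private int position = 0;

    ArrayAxisIterator(Object[] nodes) {
        this.nodes = nodes;
    }

    public Object next() {
        return position < nodes.length ? nodes[position++] : null;
    }

    public void setIsAtomizing(boolean atomizing) {
        // Deliberately empty: the hint is ignored and nodes are always returned.
    }
}
```

Ignoring the hint only forgoes the optimization; the iterator still returns the nodes themselves.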
The method getAttributeValue(uri, localName) has been removed from the NodeInfo interface, so suppliers of this interface have one fewer method to provide. It is replaced by a helper method in the Navigator object.
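The idea of moving the lookup out of the node interface and into a shared helper can be sketched as below. Every name here is hypothetical (SimpleNodeInfo, SimpleNode, NavigatorSketch, and the accessor methods); the actual Navigator helper and NodeInfo methods may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal node interface: no attribute-lookup method is required.
interface SimpleNodeInfo {
    String getURI();
    String getLocalName();
    String getStringValue();
    List<SimpleNodeInfo> getAttributes();
}

// Hypothetical in-memory node used only to exercise the helper.
final class SimpleNode implements SimpleNodeInfo {
    private final String uri;
    private final String localName;
    private final String value;
    private final List<SimpleNodeInfo> attributes = new ArrayList<>();

    SimpleNode(String uri, String localName, String value) {
        this.uri = uri;
        this.localName = localName;
        this.value = value;
    }

    SimpleNode withAttribute(String attUri, String attLocalName, String attValue) {
        attributes.add(new SimpleNode(attUri, attLocalName, attValue));
        return this;
    }

    public String getURI() { return uri; }
    public String getLocalName() { return localName; }
    public String getStringValue() { return value; }
    public List<SimpleNodeInfo> getAttributes() { return attributes; }
}

// The helper works for any implementation, using only the generic accessors.
final class NavigatorSketch {
    private NavigatorSketch() {}

    /** Returns the value of the named attribute, or null if it is absent. */
    static String getAttributeValue(SimpleNodeInfo element, String uri, String localName) {
        for (SimpleNodeInfo att : element.getAttributes()) {
            if (att.getURI().equals(uri) && att.getLocalName().equals(localName)) {
                return att.getStringValue();
            }
        }
        return null;
    }
}
```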
The typeCode passed down the Receiver pipeline is now the name pool fingerprint of the actual type name. This is also the value that is stored as a type annotation in the data model. Currently this is supported only in the TinyTree. In the non-schema-aware product, the typeCode will always be -1, indicating that the node is untyped.
The way that standard names are handled in known namespaces such as XSLT, Saxon, and XML Schema has changed. The fingerprints for these names are now compile-time constants. The NamePool code has been adapted so that these namespaces are specially recognized, and the standard constants are returned. This saves time and space when building the NamePool. It also makes it possible to have a standard schema defined as a static Java object for the built-in types.
In response to suggestions from Karsten Rucker, I have made some changes designed to conserve memory in both the standard tree and tiny tree implementations of the data model. In the standard tree, the document node no longer contains a reference to the factory used to build it: this was preventing the XML parser and its buffers from being garbage-collected. In the tiny tree, the condense() operation is now called after building trees from source documents (it was previously called only for temporary trees). It also now condenses the buffer used for character data.