Optimizations and performance improvements
The strategy for lazy evaluation of variables has changed. In the past, Saxon made a compile-time decision whether to evaluate variables eagerly or lazily. The problem with this is that it's hard to get the decision right: lazy evaluation imposes a significant overhead (it has to save a copy of the evaluation context) which is not always justified. So Saxon 12 now uses a dynamic learning approach: if lazy evaluation of a variable, after the first few dozen attempts, looks as if it is giving no benefit, future evaluations of the same variable will be done eagerly. This can make significant differences to the execution time of a query or stylesheet, and also to its allocation of heap memory and hence garbage collection costs.
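The learning approach can be pictured with a minimal sketch such as the following. It is not Saxon's code: the class name AdaptiveVariable, the learning window of 32 bindings, and the rule used to judge "no benefit" (every lazily bound value was in fact read, so laziness never avoided an evaluation) are all assumptions made for illustration. The sketch is single-threaded.

```java
import java.util.function.Supplier;

/** Hypothetical sketch: learns per variable whether lazy evaluation pays off. */
final class AdaptiveVariable<T> {
    private static final int TRIAL_RUNS = 32;  // assumed "few dozen" learning window

    private final Supplier<T> initializer;     // the variable's initializing expression
    private int bindings = 0;                  // lazy bindings made so far
    private int demanded = 0;                  // lazy bindings whose value was actually read
    private boolean eager = false;             // the learned decision

    AdaptiveVariable(Supplier<T> initializer) {
        this.initializer = initializer;
    }

    /** Binds the variable once; the caller may or may not read the result. */
    Supplier<T> bind() {
        if (!eager && bindings >= TRIAL_RUNS && demanded == bindings) {
            eager = true;  // laziness never avoided an evaluation: stop paying for it
        }
        if (eager) {
            T value = initializer.get();       // evaluate immediately
            return () -> value;
        }
        bindings++;
        return new Supplier<T>() {
            private boolean done;
            private T value;

            @Override
            public T get() {
                if (!done) {                   // evaluate on first read only
                    value = initializer.get();
                    done = true;
                    demanded++;
                }
                return value;
            }
        };
    }

    boolean isEager() {
        return eager;
    }
}
```

A variable whose value is read on every binding switches to eager evaluation once the learning window has passed; a variable that is rarely read stays lazy.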
The evaluation of filter expressions with more than one predicate has changed. In some cases, predicates can be reordered to allow more efficient evaluation, taking advantage of indexes. For example, given the expression //item[contains(@description, "cheese")][@useByDate="2022-12-01"], the evaluation (in Saxon-EE only) might be rearranged to use an index on the value of @useByDate. The problem is that this can sometimes trigger dynamic errors that the code is written to prevent: consider //item[@code castable as xs:integer][xs:integer(@code)=4]. While this rewrite is explicitly permitted in XPath 3.0, it is recognized that it causes problems, so the rules have changed in the draft 4.0 specification: it is no longer permitted to rearrange the predicates if this might trigger a dynamic error. Saxon 12 implements the new rules. It will still change the order of evaluation of the predicates where appropriate, but if the second predicate throws an error, it will evaluate the first predicate and mask the error if the first predicate is false.
The same logic applies to "and" and "or" expressions. The effect is that although the operands may be evaluated in any order, an error in evaluating one operand will never be propagated if the other operand is false (in the case of "and"), or true (in the case of "or").
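The error-masking rule for reordered predicates and for "and" can be sketched as follows. This is an illustration under stated assumptions, not Saxon's implementation: the class GuardedAnd and its methods are hypothetical, Java's BooleanSupplier stands in for an operand, and a regular expression plus Integer.parseInt only loosely imitate "castable as xs:integer" and xs:integer().

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of "evaluate out of order, but mask guarded errors". */
final class GuardedAnd {

    /**
     * Computes "guard and guarded", deliberately trying the guarded operand
     * first (for example because an index would make it cheap).
     */
    static boolean reorderedAnd(BooleanSupplier guard, BooleanSupplier guarded) {
        boolean second;
        try {
            second = guarded.getAsBoolean();
        } catch (RuntimeException error) {
            // The reordered operand failed: consult the guard before reporting.
            if (!guard.getAsBoolean()) {
                return false;  // guard is false, so the error is masked
            }
            throw error;       // guard is true, so the error is genuine
        }
        return second && guard.getAsBoolean();
    }

    /** Loose analogue of [@code castable as xs:integer][xs:integer(@code)=4]. */
    static boolean codeIsFour(String code) {
        return reorderedAnd(
                () -> code.matches("[+-]?\\d+"),
                () -> Integer.parseInt(code) == 4);
    }
}
```

Here codeIsFour("abc") returns false rather than failing with a NumberFormatException, because the guard is false for that input; an error raised while the guard is true is still reported. The rule for "or" is the mirror image: the error is masked when the other operand is true.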
Bytecode generation is dropped from SaxonJ. Over time, as the JVM JIT compiler has improved, the benefits obtained from bytecode generation have been steadily diminishing, to the point where it is no longer worth maintaining the code. Internal changes in 12.0 to improve the interpreted code have further reduced any advantage obtained from bytecode generation, to the point where the majority of workloads gain no benefit at all. In addition, bytecode generation is not applicable for the newer platforms (SaxonCS is now generated from C# source code, while SaxonC uses the ahead-of-time code generation capabilities of GraalVM).
Internally, a number of code paths have been changed to avoid use of Class.newInstance(), which has been deprecated since Java 9, and which causes operational difficulties under GraalVM. For example, system functions were previously registered with a Class<? extends SystemFunction> object such as Replace.class, and were instantiated using newInstance(); they are now registered as a Supplier<? extends SystemFunction>, with a lambda function of the form () -> new Replace(), and are instantiated by invoking this factory method. (A consequence is that the same class can now implement several closely-related functions, such as fn:true() and fn:false(), or fn:exists() and fn:empty().)
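The registration pattern described above might look roughly like this. Everything here is a stand-in written for illustration: SketchFunction, BooleanConstant and SketchFunctionLibrary are hypothetical names, not Saxon classes, and the real SystemFunction API is not reproduced.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for a system function base class. */
abstract class SketchFunction {
    abstract Object call(Object... args);
}

/** One class serving two closely related functions, told apart by a constructor argument. */
final class BooleanConstant extends SketchFunction {
    private final boolean value;

    BooleanConstant(boolean value) {
        this.value = value;
    }

    @Override
    Object call(Object... args) {
        return value;
    }
}

/** Hypothetical registry keyed by function name, holding factories rather than Class objects. */
final class SketchFunctionLibrary {
    private final Map<String, Supplier<? extends SketchFunction>> registry = new HashMap<>();

    void register(String name, Supplier<? extends SketchFunction> factory) {
        registry.put(name, factory);
    }

    SketchFunction instantiate(String name) {
        Supplier<? extends SketchFunction> factory = registry.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        return factory.get();  // a plain constructor call, no reflection involved
    }

    static SketchFunctionLibrary standard() {
        SketchFunctionLibrary lib = new SketchFunctionLibrary();
        lib.register("fn:true", () -> new BooleanConstant(true));
        lib.register("fn:false", () -> new BooleanConstant(false));
        return lib;
    }
}
```

With Class.newInstance() only a no-argument constructor can be used, so each function needs its own class; a lambda can pass constructor arguments, which is what lets a single class serve both names in this sketch.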
There are significant changes in the implementation of XDM arrays:
- Saxon uses two main implementations of arrays: SimpleArrayItem is a wrapper over a Java list of GroundedValue objects representing its members; and ImmutableArrayItem is a structure that allows efficient modification (operations such as put, remove, and append do not require copying all the existing data).
- The ImmutableArrayItem implementation has been rewritten to use Saxon's ZenoChain internally: this is a completely different data structure. (This change was actually made in 11.4 to fix bugs.)
- The choice between the two structures is now in some cases made dynamically, based on accumulated experience. Specifically, if an expression delivers arrays which tend to be frequently modified by addition or removal of members, then Saxon will learn to use an ImmutableArrayItem for future evaluation of the same expression. At present this is implemented only for "square array constructors".
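The dynamic choice in the last item can be illustrated with a small sketch. How Saxon actually gathers and weighs this experience is not described here, so the class ArrayConstructorSite, the sample threshold of 20, and the "more than half were modified" rule are all assumptions.

```java
/** Hypothetical sketch of a per-expression choice between two array representations. */
final class ArrayConstructorSite {
    enum Representation { SIMPLE, IMMUTABLE }

    private static final int MIN_SAMPLES = 20;  // assumed learning window

    private int created = 0;   // arrays built by this expression so far
    private int modified = 0;  // how many of those were later modified

    /** Called each time the expression builds an array. */
    Representation chooseRepresentation() {
        created++;
        boolean mostlyModified = created > MIN_SAMPLES && modified * 2 > created;
        return mostlyModified ? Representation.IMMUTABLE : Representation.SIMPLE;
    }

    /** Called the first time one of this expression's arrays is modified (put, remove, append). */
    void recordModification() {
        modified++;
    }
}
```

An expression whose arrays are rarely modified keeps receiving the cheaper list-backed representation; one whose arrays are usually modified is switched to the representation that supports modification without copying.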