Java extension functions: full interface

With this approach, each extension function is implemented as a pair of Java classes. The first class, the ExtensionFunctionDefinition, provides general static information about the extension function (including its name, arity, and the types of its arguments and result). The second class, an ExtensionFunctionCall, represents a specific call on the extension function, and includes the call() method that Saxon invokes to evaluate the function.

The arguments passed in a call to an integrated extension function are type-checked against the declared types in the same way as for any other XPath function call, including the standard conversions such as atomization and numeric promotion. The return value is checked against the declared return type but is not converted: it is the responsibility of the function implementation to return a value of the correct type.

Here is an example extension written to the Java version of this interface. It takes two integer arguments and performs a "shift left" operation, shifting the first argument by the number of bit-positions indicated in the second argument:

class ShiftLeft extends ExtensionFunctionDefinition {

    @Override
    public StructuredQName getFunctionQName() {
        return new StructuredQName("eg", "http://example.com/saxon-extension", "shift-left");
    }

    @Override
    public SequenceType[] getArgumentTypes() {
        return new SequenceType[]{SequenceType.SINGLE_INTEGER, SequenceType.SINGLE_INTEGER};
    }

    @Override
    public SequenceType getResultType(SequenceType[] suppliedArgumentTypes) {
        return SequenceType.SINGLE_INTEGER;
    }

    @Override
    public ExtensionFunctionCall makeCallExpression() {
        return new ExtensionFunctionCall() {
            @Override
            public Sequence call(XPathContext context, Sequence[] arguments) throws XPathException {
                long v0 = ((IntegerValue)arguments[0]).longValue();
                long v1 = ((IntegerValue)arguments[1]).longValue();
                long result = v0<<v1;
                return Int64Value.makeIntegerValue(result);
            }
        };
    }
}

The extension must be registered with the configuration:

configuration.registerExtensionFunction(new ShiftLeft())

and it can then be called like this:

declare namespace eg="http://example.com/saxon-extension";
for $i in 1 to 10 return eg:shift-left(2, $i)
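The values this query should produce can be checked in plain Java, without Saxon on the classpath. The sketch below repeats the arithmetic used in the body of call() in the example above; the class ShiftLeftArithmetic and its methods are hypothetical names introduced only for illustration. It also shows two properties that the extension inherits from Java's << operator on long values: only the low six bits of the shift distance are used, and overflow is silent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Saxon: it only mirrors the arithmetic
// performed inside the call() method of the ShiftLeft example.
public class ShiftLeftArithmetic {

    // Same expression as in the extension function body: v0 << v1 on Java longs.
    static long shiftLeft(long v0, long v1) {
        return v0 << v1;
    }

    // Mirrors the query: for $i in 1 to 10 return eg:shift-left(2, $i)
    static List<Long> queryResults() {
        List<Long> results = new ArrayList<>();
        for (long i = 1; i <= 10; i++) {
            results.add(shiftLeft(2, i));
        }
        return results;
    }

    public static void main(String[] args) {
        // prints [4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]
        System.out.println(queryResults());

        // Java masks the shift distance of a long to its low six bits,
        // so shifting by 64 behaves like shifting by 0. prints 2
        System.out.println(shiftLeft(2, 64));

        // A bit shifted into position 63 becomes the sign bit. prints true
        System.out.println(shiftLeft(1, 63) == Long.MIN_VALUE);
    }
}
```

An extension function intended for general use would probably need to decide how to treat shift distances outside the range 0 to 63, for example by raising a dynamic error.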

The methods that must be implemented (or that may be implemented) by an integrated extension function are listed in the two tables below, one for each class. Further details are in the Javadoc.

First, the ExtensionFunctionDefinition class:

Method

Effect

getFunctionQName

Returns the name of the function, as a QName (represented by the Saxon class StructuredQName). Like all other functions, integrated extension functions must be in a namespace. The prefix part of the QName is immaterial.

getMinimumNumberOfArguments

Indicates the minimum number of arguments that must be supplied in a call to the function. A call with fewer arguments than this will be rejected as a static error.

getMaximumNumberOfArguments

Indicates the maximum number of arguments that may be supplied in a call to the function. A call with more arguments than this will be rejected as a static error.

getArgumentTypes

Returns the static type of each argument to the function, as an array with one member per argument. The type is returned as an instance of the Saxon class SequenceType. Some of the more commonly-used types are represented by static constants in the SequenceType class. If there are fewer members in the array than there are arguments in the function call, Saxon assumes that all arguments have the same type as the last one that is explicitly declared; this allows for functions with a variable number of arguments, such as concat().

getResultType

Returns the static type of the result of the function. The actual result returned at runtime will be checked against this declared type, but no conversion takes place. Like the argument types, the result type is returned as an instance of SequenceType. When Saxon calls this method, it supplies an array containing the inferred static types of the actual arguments to the function call. The implementation can use this information to return a more precise result, for example in cases where the value returned by the function is of the same type as the value supplied in the first argument.

trustResultType

This method normally returns false. It can return true if the implementor of the extension function is confident that no run-time checking of the function result is needed; that is, if the method is guaranteed to return a value of the declared result type.

dependsOnFocus

This method must return true if the implementation of the function accesses the context item, context position, or context size from the dynamic evaluation context. The method does not need to be implemented otherwise, as its default value is false.

hasSideEffects

This method should be implemented, and return true, if the function has side-effects of any kind, including constructing new nodes if the identity of the nodes is significant. When this method returns true, Saxon will try to avoid moving the function call out of loops or otherwise rearranging the sequence of calls. However, functions with side-effects are still discouraged, because the optimizer cannot always detect their presence if they are deeply nested within other calls.

makeCallExpression

This method must be implemented; it is called at compile time when a call to this extension function is identified, to create an instance of the relevant ExtensionFunctionCall object to hold details of the function call expression.
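The arity and argument-type rules described in this table can be modelled in a few lines of plain Java. This is only a sketch of the rules as stated here, not Saxon's implementation: the class ArityModel, its methods, and the use of strings in place of SequenceType are all assumptions made for illustration.

```java
// Hypothetical model, not Saxon code. It illustrates the rules described for
// getMinimumNumberOfArguments, getMaximumNumberOfArguments and getArgumentTypes.
public class ArityModel {

    // A call is acceptable only if the number of supplied arguments lies
    // between the declared minimum and maximum (both inclusive).
    static boolean acceptsArity(int minArgs, int maxArgs, int suppliedArgs) {
        return suppliedArgs >= minArgs && suppliedArgs <= maxArgs;
    }

    // When the call supplies more arguments than there are declared types,
    // the extra arguments take the type of the last declared entry.
    static String requiredType(String[] declaredTypes, int argIndex) {
        int last = declaredTypes.length - 1;
        return declaredTypes[Math.min(argIndex, last)];
    }

    public static void main(String[] args) {
        // A concat-like signature: at least two arguments, one declared type.
        String[] declared = {"xs:anyAtomicType?"};
        int min = 2;
        int max = Integer.MAX_VALUE;

        System.out.println(acceptsArity(min, max, 1)); // prints false
        System.out.println(acceptsArity(min, max, 5)); // prints true

        // The fifth argument (index 4) falls back to the last declared type.
        System.out.println(requiredType(declared, 4)); // prints xs:anyAtomicType?
    }
}
```

In a real ExtensionFunctionDefinition, the same effect would come from overriding the arity methods and returning a shorter array from getArgumentTypes.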

The methods defined on the second object, the ExtensionFunctionCall, are:

Method

Effect

supplyStaticContext

Saxon calls this method fairly early on during the compilation process to supply details of the static context in which the function call appears. The method may in some circumstances be called more than once; it will always be called at least once. As well as the static context information itself, the expressions supplied as arguments are also made available. If evaluation of the function depends on information in the static context, this information should be copied into private variables for use at run-time.

rewrite

Saxon calls this method at a fairly late stage during compilation to give the implementation the opportunity to optimize itself, for example by performing partial evaluation of intermediate results or, if all the arguments are compile-time constants (instances of Literal), even by early evaluation of the entire function call. The method can return any Expression (which includes the option of returning a Literal to represent the final result); the returned expression will then be evaluated at run-time in place of the original. It is entirely the responsibility of the implementation to ensure that the substitute expression is equivalent in every way, including the type of its result.

copyLocalData

Saxon occasionally needs to make a copy of an expression tree. When it copies an integrated function call it will invoke this method, which is responsible for ensuring that any local data maintained within the function call objects is correctly copied.

call

Saxon calls this method at run-time to evaluate the function. The value of each argument is supplied in the form of a Sequence, representing the sequence of items that makes up the value of the argument (this might of course be a single value, considered as a sequence of length one). This may use lazy evaluation, which means that a dynamic error can occur when reading the next item from the Sequence; it also means that if the implementation does not require all the items from the value of one of the arguments, they will not necessarily be evaluated at all.

The call method must also deliver the result in the form of a Sequence. Saxon provides many subclasses of Sequence that are available for use. To return a string S, use new net.sf.saxon.value.StringValue(S); to return a boolean B, use net.sf.saxon.value.BooleanValue.get(B); to return an integer I, use net.sf.saxon.value.Int64Value.makeIntegerValue(I). The Saxon class NodeInfo also implements Sequence, and can be used to return single nodes. To return an empty sequence, use EmptySequence.getInstance(). If you want to return a sequence of two or more items, a convenient class to use is ZeroOrMore, which has a constructor that takes a Java List<net.sf.saxon.om.Item>, where the items can be, for example, strings, booleans, integers, or nodes represented as described above.

For ultimate performance when returning a very long sequence (for example, the results of a database query), use the class LazySequence, which allows the items in the sequence to be computed on demand rather than being stored en bloc in memory.
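The effect of on-demand evaluation, both for lazily supplied arguments and for lazily returned results, can be illustrated without Saxon. The sketch below uses a plain java.util.Iterator as a stand-in for a lazily evaluated sequence; it is not LazySequence itself, and the class OnDemandItems and its methods are hypothetical. A consumer that stops early causes only the items it actually reads to be computed.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical illustration of on-demand evaluation; not Saxon's LazySequence.
public class OnDemandItems {

    // Counts how many items have actually been computed so far.
    private int computedCount = 0;

    int computedCount() {
        return computedCount;
    }

    // Produces the squares 1, 4, 9, ... up to limit*limit, one at a time.
    // Nothing is computed until next() is called.
    Iterator<Long> squares(final long limit) {
        return new Iterator<Long>() {
            private long current = 1;

            @Override
            public boolean hasNext() {
                return current <= limit;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                computedCount++;
                long value = current * current;
                current++;
                return value;
            }
        };
    }

    // A consumer that stops at the first item above the threshold,
    // or returns -1 if the items run out first.
    static long firstAbove(Iterator<Long> items, long threshold) {
        while (items.hasNext()) {
            long value = items.next();
            if (value > threshold) {
                return value;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        OnDemandItems source = new OnDemandItems();
        long hit = firstAbove(source.squares(1_000_000L), 50L);
        System.out.println(hit);                    // prints 64
        System.out.println(source.computedCount()); // prints 8
    }
}
```

Although the source could supply a million items, only eight are computed before the consumer stops; this is the saving that lazy evaluation is meant to provide.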

Once written, an integrated extension function must be registered with Saxon so that calls on the function are recognized by the parser. This is done using the registerExtensionFunction method, which is available on the Configuration class and also on the s9api Processor class. The function can also be registered via an entry in the configuration file. The function can be given any name, although names in the fn:, xs:, and saxon: namespaces are strongly discouraged and may not work.

It is also possible to register integrated extension functions under XQJ. To do this, cast the XQDataSource or XQConnection to the Saxon implementation class (SaxonXQDataSource or SaxonXQConnection), call getConfiguration() to obtain the underlying Configuration, and register the function with that Configuration.