Document Projection
Document Projection is a mechanism that analyzes a query to determine what parts of a document it can potentially access, and then while building a tree to represent the document, leaves out those parts of the tree that cannot make any difference to the result of the query.
Document projection can be enabled as an option on the XQuery command line interface: set -projection:on. It is only used if requested. The command line option affects both the primary source document supplied on the command line, and any calls on the doc() function within the body of the query that use a literal string argument for the document URI.
For feedback on the impact of document projection in terms of reducing the size of the source document in memory, use the -t option on the command line, which shows for each document loaded how many nodes from the input document were retained and how many discarded.
In the s9api interface, document projection can be invoked as an option on the DocumentBuilder. The call setDocumentProjectionQuery() supplies as its argument a compiled query (an XQueryExecutable), and the document built by the document builder is then projected to retain only the parts of the document that are accessed by this query when it operates on this document as the initial context item. For example, if the supplied query is count(//ITEM), then only the ITEM elements will be retained.
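The following is a minimal sketch of the s9api calls described above, not a definitive implementation. It assumes that Saxon-EE and a valid license are available on the classpath (the source text does not state which edition is required), and the class name ProjectedCount, the method name countItems and the file name catalog.xml are purely illustrative.

```java
import java.io.File;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XQueryCompiler;
import net.sf.saxon.s9api.XQueryEvaluator;
import net.sf.saxon.s9api.XQueryExecutable;
import net.sf.saxon.s9api.XdmAtomicValue;
import net.sf.saxon.s9api.XdmNode;

public class ProjectedCount {

    /** Builds a tree projected for count(//ITEM), then runs that same query on it. */
    public static long countItems(Source source) throws SaxonApiException {
        // Assumption: 'true' requests a licensed (Saxon-EE) configuration.
        Processor processor = new Processor(true);
        XQueryCompiler compiler = processor.newXQueryCompiler();
        XQueryExecutable query = compiler.compile("count(//ITEM)");

        // Ask the builder to retain only what this query can access.
        DocumentBuilder builder = processor.newDocumentBuilder();
        builder.setDocumentProjectionQuery(query);
        XdmNode projected = builder.build(source);

        // Evaluate the query with the projected document as the context item.
        XQueryEvaluator evaluator = query.load();
        evaluator.setContextItem(projected);
        return ((XdmAtomicValue) evaluator.evaluateSingle()).getLongValue();
    }

    public static void main(String[] args) throws SaxonApiException {
        // "catalog.xml" is a hypothetical input file.
        System.out.println(countItems(new StreamSource(new File("catalog.xml"))));
    }
}
```

The same XQueryExecutable is used both to drive the projection and to evaluate the result, which is the case in which the projected tree is guaranteed to be sufficient.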
It is also possible to request that a query should perform document projection on documents that it reads using the doc() function, provided this has a string-literal argument. This can be requested using the option setAllowDocumentProjection(true) on the XQueryExpression object. This is not available directly in the s9api interface, but the XQueryExpression is reachable from the XQueryExecutable using the accessor method getUnderlyingCompiledQuery().
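As a hedged sketch of the route just described, the helper below compiles a query through s9api and then switches on projection for doc() calls via the underlying XQueryExpression. The class and method names are hypothetical, and it is an assumption (not stated in the text above) that the call only succeeds under a licensed Saxon-EE configuration.

```java
import net.sf.saxon.query.XQueryExpression;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XQueryExecutable;

public class DocProjectionOption {

    /** Compiles the query and allows projection of documents read via doc(). */
    public static XQueryExecutable compileWithProjection(Processor processor, String queryText)
            throws SaxonApiException {
        XQueryExecutable executable = processor.newXQueryCompiler().compile(queryText);

        // Step outside s9api to reach the option on the underlying expression.
        XQueryExpression underlying = executable.getUnderlyingCompiledQuery();
        underlying.setAllowDocumentProjection(true);
        return executable;
    }
}
```

A call such as compileWithProjection(processor, "count(doc('catalog.xml')//ITEM)") would be a typical use, where catalog.xml is a placeholder URI. The URI must be written as a string literal for the projection to apply.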
The more complex the query, the less likely it is that Saxon will be able to analyze it to determine the subset of the document required. If precise analysis is not possible, document projection has no effect. Currently Saxon makes no attempt to analyze accesses made within user-defined functions. Also, of course, Saxon cannot analyze the expectations of external (Java) functions called from the query.
Currently document projection is supported only for XQuery, and it works only when a document is parsed and loaded for the purpose of executing a single query. It is possible, however, to use the mechanism to create a manual filter for source documents if the required subset of the document is known. To achieve this, create a query that selects the required parts of the document supplied as the context item, and compile it to an s9api XQueryExecutable. The query does not have to do anything useful: the only requirement is that the result of the query on the subset document must be the same as the result on the original document. Then supply this XQueryExecutable to the s9api DocumentBuilder used to build the document.
Of course, when document projection is used manually like this, it is entirely the user's responsibility to ensure that the selected part of the document contains all the nodes required.
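The manual filtering technique might be sketched as follows. This is an illustration under assumptions rather than a recommended implementation: the class name ManualProjectionFilter and the method name buildSubset are hypothetical, and a licensed Saxon-EE configuration is assumed to be needed.

```java
import javax.xml.transform.Source;
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XQueryExecutable;
import net.sf.saxon.s9api.XdmNode;

public class ManualProjectionFilter {

    /**
     * Builds a document that retains only the parts selected by filterQuery.
     * The filter query is never evaluated here; it only drives the projection.
     */
    public static XdmNode buildSubset(Processor processor, String filterQuery, Source source)
            throws SaxonApiException {
        XQueryExecutable filter = processor.newXQueryCompiler().compile(filterQuery);
        DocumentBuilder builder = processor.newDocumentBuilder();
        builder.setDocumentProjectionQuery(filter);
        return builder.build(source);
    }
}
```

For example, a filter query such as (//ITEM/NAME, //ITEM/PRICE), where the element names are placeholders, would be intended to keep the NAME and PRICE children of each ITEM. Any later processing that touches other parts of the document would then see an incomplete tree, which is the risk noted above.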