This article describes how an XSLT processor, in this case the author's open-source Saxon, actually works. Although several open-source XSLT implementations exist, no one, as far as we know, has published a description of how they work. This article is intended to fill that gap. It describes the internal workings of Saxon, and shows how this processor addresses XSLT optimization. It also shows how much more work remains to be done. This article assumes that you already know what XSLT is and how it works.
I hope this article serves a number of purposes. First, I hope it will give style sheet authors a feel for what kind of optimizations they can expect an XSLT processor to take care of, and by implication, some of the constructs that are not currently being optimized. Of course, the details of such optimizations vary from one processor to another and from one release to another, but I'm hoping that reading this account will give you a much better feel for the work that's going on behind the scenes. Second, it describes what I believe is the current state of the art in XSLT technology (I don't think Saxon is fundamentally more or less advanced than other XSLT processors in this respect), and describes areas where I think there is scope for further development of techniques. I hope this description might stimulate further work in this area by researchers with experience in compiler and database optimization.
Finally (last and also least), this article is intended to be a starting point for anyone who wants to study the Saxon source code. It isn't written as a tour of the code, and it doesn't assume that you want to go into that level of depth. But if you are interested in getting a higher-level overview than you can get by diving into the JavaDoc specs or the source code itself, you'll probably find this useful.
A couple of caveats: the version I describe is Saxon 6.1, and I
describe a functional breakdown of the code that doesn't always map
cleanly to the package and module structure. For example, this article
describes the compiler and interpreter as separate functional components.
But in the actual code, the module that handles the
<xsl:choose>
instruction, for example, contains both
compile-time and run-time code to support this instruction. In case you do
want to use this article as a guide to the code, I've included occasional
references to package and module names so you know where to look.
First I'll describe the design of the Saxon product. Saxon is an XSLT processor. That is, it is a program that takes an XML document and a style sheet as input and produces a result document as output.
Saxon includes a copy of the open-source AElfred XML parser originally written by David Megginson, although it can be used with any other XML parser that implements the Java SAX interface. Saxon also includes a serializer that converts the result tree to XML, HTML, or plain text. The serializer is not technically part of the XSLT processor, but it is essential for practical use.
Saxon implements the TrAX (transformation API for XML) interface defined as part of the JAXP 1.1 Java extensions. You don't need to know about this interface to appreciate this article, but understanding the architecture of TrAX would help you to understand the way Saxon is structured.
Saxon architecture: The main components of the Saxon software are shown in Figure 1.
Saxon Architecture
The tree constructor creates a tree representation of a source XML document. It is used to process both the source document and the style sheet. There are two parts to this:
com.icl.saxon.aelfred
)
reads the source document and notifies events such as the start and end
of an element. com.icl.saxon.Builder
)
is notified of these events, and uses them to construct an in-memory
representation of the XML document. The interface between the parser and the builder is the Java SAX2 API.
Although this SAX API has been only informally standardized, it is
implemented by half a dozen freely available XML parsers, allowing Saxon
to be used with any of these parsers interchangeably. In between the
parser and the tree builder sits a component which I call the
Stripper
(I couldn't resist the name): The
Stripper
performs the function of removing whitespace text
nodes before they are added to the tree, according to the
<xsl:preserve-space>
and
<xsl:strip-space>
directives in the style sheet (module
com.icl.saxon.Stripper
). The Stripper
is a good
example of a SAX filter, a piece of code that takes a stream of SAX events
as input and produces another stream of SAX events as output. At a more
macroscopic level, an entire Saxon transformation can also be manipulated
as a SAX filter. This approach makes it very easy to split up a complex
transformation into a series of simple transformations arranged in a
pipeline.
The tree navigator, as the name suggests, allows applications to select nodes from the tree by navigating through the hierarchy. The tree representation constructed by the builder component is proprietary to Saxon. This is an area where Saxon differs from some other XSLT processors: some of them use a general-purpose DOM model as their internal tree. The advantage of using the DOM is that the tree can then be produced by third-party software. Trees constructed for a different purpose can be supplied directly as input to a transformation, and equally, the output of a transformation can be used directly by DOM-based applications.
In Saxon I took the view that the interoperability offered by using the DOM comes at too high a cost. First, the DOM tree model differs subtly from the XPath model needed by an XSLT processor, and this difference imposes run-time costs in mapping one model to the other. For example, a DOM tree can contain information that the XPath model does not require, such as entity nodes. Second, DOM trees can be updated in place, whereas the XSLT processing model means that trees are written only sequentially. Designing a tree model that can be written only sequentially allows efficiencies to be achieved. For example, each node can contain a sequence number that makes it easy to sort nodes in their sequential document order, a frequent XSLT requirement. Finally, DOM implementations generally include a lot of synchronization code to make multithreaded access safe. Because the XSLT processing model is "write-once, read-many," the synchronization logic can be much simpler, leading to faster navigation of the tree.
Actually, as you will see, Saxon offers two different tree
implementations, each with its own builder and navigation classes
(packages com.icl.saxon.tree
and
com.icl.saxon.tinytree
). The two implementations offer
different performance trade-offs.
The style sheet compiler analyses the style sheet prior to
execution. It does not produce executable code; it produces a
decorated-tree representation of the style sheet in which all XPath
expressions are validated and parsed, all cross-references are resolved,
stack-frame slots are pre-allocated, and so on. The style sheet compiler
thereby performs the important function of constructing a decision tree to
use at execution time to find the right template rule to process each
input node; it would be grossly inefficient to try matching each node
against each possible pattern. The decorated tree then comes into play at
transformation time to drive the style sheet processing. (The compiler is
distributed across the classes in the com.icl.saxon.style
package, especially the methods prepareAttributes()
,
preprocess()
, and validate()
).
At one stage Saxon did actually include a style sheet compiler that produced executable Java code. However, it handled only a subset of the XSLT language, and as the technology developed, the performance gains achieved by full compilation were dwindling. Eventually I abandoned that approach as the development complexity grew while the performance benefits declined. There is currently no full XSLT compiler on the market. Sun has produced an alpha release of a compiler called XSLTC which looks promising (see Resources), though it is still at an early stage of development.
The decorated tree produced by Saxon's style sheet compiler (rooted at
class com.icl.saxon.style.XSLStyleSheet
) cannot be saved to
disk, because reading the tree back into memory takes longer than
recompiling the original (largely because of its increased size). You can
reuse the tree so long as it remains in memory. The tree is wrapped in an
object called the PreparedStyleSheet
, which implements the
javax.xml.transform.Templates
interface in JAXP 1.1. It is
quite common in a server environment to use the same style sheet
repeatedly to transform many different source documents. To allow this,
the compiled style sheet is strictly read-only at execution time, allowing
it to be used in multiple execution threads simultaneously.
The core of the Saxon processor is the style sheet interpreter
(class com.icl.saxon.Controller
, which implements the
javax.xml.transform.Transformer
interface in JAXP 1.1). This
interpreter uses the decorated style sheet tree to drive processing.
Following the processing model of the language, it first locates the
template rule for processing the root node of the input tree. Then it
evaluates that template rule (or it's "instantiated," in the jargon of the
standard).
The style sheet tree uses different Java classes to represent each XSL instruction type. For example, consider the instruction:
<xsl:if test="parent::section">
<h3><xsl:value-of select="../@title"/></h3>
</xsl:if>
The effect of this code fragment is to output an
<h3>
element to the result tree if the current node on
the source tree has a parent element of type <section>
;
the text content of the generated <h3>
node is the
value of the title
attribute of the parent
section
.
This code fragment is represented on the decorated style sheet tree by the structure shown in Figure 2.
Figure 2. The decorated style sheet tree
Elements in the style sheet map directly to nodes on the tree, as shown
in Table
1. All the Java objects that represent elements are subclasses of
com.icl.saxon.style.StyleElement
, which is a subclass of
com.icl.saxon.tree.ElementImpl
, the default implementation of
an element node in the Saxon tree structure. The two XPath expressions are
represented by com.icl.saxon.expr.Expression
objects
referenced from the nodes of the tree.
Table 1. Style sheet elements and their corresponding Java classes
Element or expression... | ...represented by an object in this Java class
(subclasses of com.icl.saxon.style.StyleElement ) |
<xsl:if> |
com.icl.saxon.style.XSLIf |
<h3> (output, not instruction) |
com.icl.saxon.style.LiteralResultElement |
<xsl:value-of> |
com.icl.saxon.style.XSLValueOf . |
XPath expressions | com.icl.saxon.expr.Expression |
Executing the <xsl:if>
instruction causes the
process()
method of the corresponding XSLIf
object to be executed. This method accesses the test
Expression
object, which has a method,
evaluateAsBoolean()
. evaluateAsBoolean()
is used
to evaluate the expression to return a Boolean result. (This is an
optimization: it would be possible to use a straightforward
evaluate()
call and then convert the result to a Boolean, as
described in the specification. But knowing that a Boolean is wanted often
enables faster evaluation. For example, when the actual value or the
expression is a node-set, the final Boolean result is known as soon as a
single member of the node-set is found).
If the result of evaluateAsBoolean()
is true, the
process()
method calls the process()
method of
all the child nodes of the XSLIf
node on the style sheet
tree. If the result not true, it simply exits.
Similarly, the process()
method for a
LiteralResultElement
copies the element to the result tree
and processes the children of the LiteralResultElement
, while
the process()
method of the XSLValueOf
object
evaluates the select expression as a string, and copies the result as a
text node to the result tree.
So the key components of the style sheet interpreter are:
The XPath interpreter (package com.icl.saxon.expr
) closely
follows the Interpreter design pattern, one of the 23 classic
patterns for object-oriented software described by Gamma, Helm, Johnson,
and Vlissides. Each construct in the XPath grammar has a corresponding
Java class. For example, the UnionExpr
construct (written as
A|B
, and representing the union of two node sets) is
implemented by the class com.icl.saxon.expr.UnionExpression
.
The XPath expression parser (module
com.icl.saxon.expr.ExpressionParser
), which is normally
executed when the style sheet is compiled, generates a data structure that
directly reflects the parse tree of the expression. To evaluate the
expression, each class in this structure has an evaluate()
method, which is responsible for returning its value. In the case of the
UnionExpression
class, the evaluate()
method
evaluates the two operands, checks that the result is in both cases a
node-set, and then forms the union using a sort-merge strategy.
As in the design pattern described by Gamma et al., the
evaluate()
method takes a Context
parameter. The
Context
object encapsulates all contextual information needed
to evaluate the expression.
This includes:
position()
and last()
) com.icl.saxon.Bindery
object, where
values of variables are held The XPath interpreter also includes optimization features that extend the basic Interpreter pattern of Gamma et al.
For example:
simplify()
method, to
allow expression rewriting. This enables context-independent
optimizations to be performed. Sometimes this results in transformation
to a different XPath expression (for example title[2=2]
is
rewritten as title
, while count($x)=0
is
rewritten as not($x)
). More often the expression rewrite
exploits internal classes that represent special cases. For example, the
expression section[@title]
returns all child
<section>
s of the current element that have a title
attribute. Because of the context in which the sub-expression
@title
appears, it is possible to rewrite it using a
special-purpose class that tests for the presence of a given attribute
on the current node and returns a Boolean value. evaluate()
method, and
an enumerate()
method. This (in the case of expressions
representing node sets) allows the nodes to be retrieved incrementally,
rather than all at once. This allows pipelined execution, in the typical
manner adopted by relational database systems. Calling
enumerate()
on a union expression, for example, works by
merging the enumerations of its two operands. So long as the operands
are both already sorted into document order, this avoids the need to
allocate memory for the intermediate node sets. getDependencies()
which returns information about the
aspects of the context that the method depends on. This already makes
certain optimizations possible. For example, if the expression doesn't
use the last()
function then it is not necessary to do
look-ahead processing to determine how many elements there are in the
context list. Further, each expression has a method
reduceDependencies()
, which returns an equivalent
expression in which specified dependencies are eliminated, while others
are retained. This is useful where the same expression is used
repeatedly. For example, before a sort is carried out, the sort key
expression is reduced to eliminate dependencies on variables (because
these will be the same for every node in the list) but not on the
current node (which will be different for each item in the list).
The XSLT language gives the processor great freedom to evaluate expressions in any order it chooses, because of the absence of side effects. The general policy in Saxon is that scalar values (strings, numbers, Booleans) are evaluated as early as possible, while node-set values are evaluated as late as possible. Evaluating scalar values early enables optimization by doing things only once. Delaying the evaluation of node-set values saves memory, by avoiding holding large lists in memory unnecessarily. It can also save time, if it turns out (as it often does) that the only thing done with the node-set is to test whether it is empty, or to get the value of its first element.
Finally, the outputter component (class
com.icl.saxon.output.Outputter
) is used to control the output
process. Saxon's result tree is not normally materialized in memory --
because its nodes are always written in document order, they can be
serialized as soon as they are output to the result tree. In practice the
transformation does not have a single result tree but a changing stack of
result trees, because XSL instructions such as
<xsl:message>
and <xsl:attribute>
effectively redirect the output to an internal destination, while
<xsl:variable>
constructs a result-tree fragment which
is actually a separate tree in its own right. The interpreter code for
these elements calls the outputter to switch to a new destination and
subsequently to revert to the original destination.
External output is written to a file using a serializer. Logically this
takes the result tree as input and turns it into a flat file document. In
practice, as you have seen, the result tree is not materialized in memory,
so the serializer is handed the nodes of the tree one at a time in
document order. This stream of nodes is presented using a SAX2-like
interface (com.icl.saxon.Emitter
): it differs from SAX2 in
the details of how names and namespaces are represented. As defined in the
XSLT Recommendation, there are separate serializers for XML, HTML, and
plain text output. Saxon also allows the tree to be supplied to
user-written code for further processing, or to be fed as input to another
style sheet. This allows a you to achieve a multiphase transformation by
applying several style sheets in sequence.
Performance
Good
performance is necessarily a driving factor in the design of Saxon, second
only to conformance with the XSLT specification. This is partly because it
is critical to users, but also because in a world where there are several
free XSLT processors available, performance will tend to be the main
distinguishing feature.
This section discusses some of the factors that affect the performance of an XSLT processor, as well as the strategies Saxon uses to improve speed in each case.
Java language issues
It is
often said that Java is slow. There is some justification in this, but the
statement needs to be carefully qualified.
Many people imagine Java is slow because it generates interpreted bytecode rather than native code. This used to be true, but not any longer with today's just-in-time compilers. Raw code execution speed is usually almost as good as -- sometimes better than -- the equivalent code written in a compiled language such as C.
Where Java can have a problem is with memory allocation. Unlike C and C++, Java takes care of memory itself, using a garbage collector to free unwanted objects. This brings great convenience to the programmer, but it is easy to create programs that are profligate with memory: they thrash due to excessive use of virtual memory, or they place great strain on the garbage collector due to the frequency with which objects are allocated and released.
Some coding techniques minimize the memory-allocation problems. For
example the use of StringBuffer
objects rather than
String
s, use of pools of reusable objects, and so on.
Diagnostic tools can help the programmer determine when to use those
techniques. Getting the code fast does require a lot of tuning, but that
is arguably still much easier than using a language such as C++, in which
you must manage all the memory allocation manually.
XSLT processing brings a particular challenge to Java in the
implementation of the tree structure. Java imposes considerable red-tape
overhead in the size of each object (up to 32 bytes, depending on the Java
VM used). This often yields a tree structure in memory many times larger
than the source XML file. For example, the empty element
<a/>
(four bytes in the source file) could expand to an
Element
object for the node, a Name
object for
its name, a String
object referenced by the Name
object, an empty AttributeList
node, an empty
NamespaceList
node, plus numerous 64-bit object references to
link these objects with each other and with the parent, sibling, and child
nodes in the tree. A nave implementation could easily generate 200 bytes
of tree storage from these four bytes of source. Given that some users are
trying to process XML documents whose raw size is 100MB or more, the
consequences are predictable and generally fatal.
This is one reason Saxon went down the route of having its own tree
structure. By removing the requirement to implement the full DOM
interface, I was able to eliminate some data from the tree. Removing the
requirement to support update is particularly useful. For example, Saxon
uses a different class for elements that have no attributes, knowing that
if an element has no attributes to start with, it will never acquire any
later. Another technique Saxon uses is to optimize the storage of the
common situation of an element that contains a single child text node, for
example <b>text</b>
.
The XPath tree model, as described in the W3C specification, includes nodes for attributes and namespaces. Because these nodes are rarely accessed in the course of a transformation, Saxon constructs these nodes on demand rather than having them permanently take up space on the tree. (This is the Flyweight design pattern of Gamma et al.)
The latest release of Saxon has gone one step further: using a tree
implementation in which the nodes are not represented by Java objects at
all. Instead, all the information in the tree is represented as arrays of
integers. All nodes are created as transient (or flyweight) objects,
constructed on demand as references into these arrays and discarded when
they are no longer needed. This tree implementation (package
com.icl.saxon.tinytree
) takes up far less memory and is
quicker to build, at the cost of slightly slower tree navigation. On
balance, it appears to perform better than the standard tree, and I
therefore provide it as the default.
The standard utility classes such as Hashtable
and
Vector
also affect Java program performance. Developers find
it tempting to use these convenient classes liberally throughout an
application. However, there is a price to pay. Partly because the classes
usually do more than you actually need, they impose more overhead than a
class designed for one purpose only. They are also designed to handle a
worst-case situation in terms of multithreading. If you know that a data
structure will not be accessed by multiple threads simultaneously, you can
spare yourself the synchronization costs by designing your own objects
rather than using these off-the-shelf classes. Replacing
Vector
s by arrays often pays dividends, the only downside
being that you need to handle expansion of the array manually whereas
Vectors
are self managing.
Location path
evaluation
The most characteristic kind of XPath expression
(the one from which XPath gets its name) is the location path. Location
paths are used to select nodes in the source tree. A location path
essentially consists of an origin and a number of steps. It is similar to
a UNIX filename or a URL, except that each step selects a set of nodes
rather than a single node. For example ./chapter/section
selects all the <section>
children of all the
<chapter>
children of the current node. The origin
identifies the start point for navigating the tree: it might be the
current node, the root node of the source tree, the root node of a
different tree, or a set of nodes located by value using a key. Each step
in the location path navigates from one node to a related set of nodes.
Each step is defined in terms of a navigation axis (the child axis being
the default): For example the ancestor axis selects all ancestor nodes,
the following-sibling axis selects all following siblings of the origin
node, the child axis selects its children. As well as specifying an axis,
each step may specify the type of node required (such as elements,
attributes, or text nodes), the name of the required nodes, and predicates
that the nodes must satisfy (for example, child text nodes whose value
begins with B).
Devising an execution strategy for a location path is equivalent to the
problem of optimizing a relational query, though the theory is currently
much less advanced, and most of the techniques used differ little from the
nave strategy of doing the navigation exactly the way it is described in
the specification. For example, although it is possible in a style sheet
to specify keys that must be built to support associative access (rather
like the CREATE INDEX
statement in SQL), Saxon currently uses
these indexes only to support queries that reference them explicitly (by
using the key()
function) and never to optimize a query that
uses straightforward predicates.
The optimization techniques currently used in Saxon for location paths include:
//item
(which is
defined to be an abbreviation for
/descendent-or-self::node()/item
) can be replaced by
/descendent::item
provided it uses no positional
predicates. The latter expression will naturally retrieve nodes in
document order, whereas the former might not. $x[count(.|$y)=count($y)]
(which is the only convenient way
in XSLT 1.0 of doing a set intersection operation), Saxon will evaluate
count($y)
only once. para[position() <= 3]
selects the first three
<para>
children of the current node. It is not
necessary to apply this predicate explicitly to every
<para>
element to see if it is true, since processing
can stop after the third node. child::title
scans all the child elements looking for those
whose name is title
, the similar expression
attribute::title
(usually abbreviated to
@title
) gets the relevant attribute directly. com.icl.saxon.expr.NodeSetIntent
) in which all the context
dependencies have been removed. It is only when the intensional node-set
is actually used that its members are enumerated: and depending on how
it is used, they may not need to be retrieved at all. For example if the
node-set is used in a Boolean context, the only processing needed is to
test whether it is empty. When an intensional node-set is used for the
third time, it is stored extensionally, trading memory for processing
time. This is like materializing a view in SQL. Style sheet compilation
I
have already described how the first thing Saxon does is to "compile" the
style sheet into a decorated tree for efficient execution subsequently.
This offers a great opportunity to do things once only rather than doing
them each time the relevant instructions are executed.
Some of the tasks done during the style sheet compilation phase are as follows:
<xsl:for-each
select="$x+2">
can be instantly recognized as an error because
the XPath expression $x+2
can never return a node-set. In
many cases it is even possible to detect that <xsl:for-each
select="$x">
is an error, because the absence of assignment
statements means that the type of the variable can often be inferred
from its declaration. <td width="{$x * 2}">
outputs a
<td>
element whose width
attribute is
computed at run time. An important compilation task is to convert
attribute value templates into an efficient structure for evaluation at
run time. In other cases, doing things at compile time is less feasible, but
savings can be made by avoiding repeated execution at run time. An example
is the format-number()
function, which takes as one of its
arguments a pattern describing the output format required for a decimal
number. Considerable savings are possible by detecting the common case
where the format pattern is the same as on the previous execution. The
only tricky aspect of such optimizations is to keep the memory of previous
executions in a place associated with the current thread: it cannot be
kept on the style sheet tree itself, as that needs to be thread safe.
Pattern matching for template rules
The pattern matching operation is potentially expensive, so it is vital to focus the search intelligently. The style sheet compiler therefore constructs a decision tree which is used at run time to decide which template rule to apply to a given node.
I'm using the term decision tree here loosely. This section
describes the actual data structures and algorithms in a little more
detail. (See modules com.icl.saxon.RuleManager
and
com.icl.saxon.Mode
in the source code.)
When the <xsl:apply-templates/>
instruction is
applied to a node, a template rule must be selected for that node. If
there is no matching rule, Saxon uses a built-in rule. If there is more
than one rule, the processor resorts to an algorithm from the XSLT
specification for deciding which rule takes precedence. This algorithm is
based on user-allocated priorities, system-allocated priorities, the
precedence relationships established when one style sheet imports another,
and -- if all else fails -- the relative order of rules in the source
style sheet.
In a typical style sheet, most template rules match element nodes in
the tree. Rules to match other nodes, such as text nodes, attributes, and
comments, are comparatively rare. Also, most template rules supply the
name of the element they must match in full. Rules for unnamed nodes are
allowed but not often used (for example, *[string-length(.) >
10]
, which matches any element with more than 10 characters of text
content).
Saxon's strategy is therefore to separate rules into two kinds: specific rules, where the node type and name are explicitly specified in the pattern, and general rules, where they aren't. The data structure for the decision tree contains a hash table for the specific rules, keyed on the node type and node name, in which each entry is a list of rules sorted by decreasing priority; plus a single list for all the general rules, again in priority order. To find the pattern for a particular node, the processor makes two searches: one for the highest-priority specific rule for the relevant node type and name that matches the node, and one for the highest-priority general rule that matches the node. Whichever of these has highest priority is then chosen.
For a multipart pattern such as chapter/title
, the
algorithm used is recursive: The match is true if the node being tested
matches title
and if its parent node matches
chapter
(module
com.icl.saxon.pattern.LocationPathPattern
). This simple
approach can't be used for patterns that use positional predicates; for
example chapter/para[last()]
, which only matches a
para
element if it is the last one in a chapter. Matching
these positional patterns is potentially very expensive, so it's worth
handling the common case of a pattern like para[1]
specially.
Numbering
Numbering the
nodes on the tree (using the <xsl:number/>
instruction)
poses a particular optimization challenge. This is because each execution
of <xsl:number/>
works independently to assign a number
to the current node, the number being defined by a complex algorithm using
various attributes on the <xsl:number/>
instruction.
Nothing inherent in the algorithm says that if the last node was numbered
19, the next one will be numbered 20, yet in most common cases that is
indeed the case. It is important to detect those common sequential cases.
Otherwise the numbering of a large node-set will have O(n2)
performance, which is what happens if the numbering algorithm as specified
in the XSLT Recommendation is applied to each node independently.
Saxon achieves this optimization for a small number of common cases,
where most of the attributes to the numbering algorithm are defaulted.
Specifically, it remembers the most recent result of an
<xsl:number/>
instruction, and if certain rather
complex but frequently satisfied conditions are true, it knows that it can
number a node by adding one to this remembered number.
Finally
I have tried in
this article to give an overview of the internals of the Saxon XSLT
processor, and in particular of some of the techniques it uses to improve
the speed of transformation. In the 18 months or so since I released the
first early versions of Saxon, performance has improved by a factor of 20
(or more, in the case of runs that were thrashing for lack of memory).
It's unlikely that the next 18 months will see a similar improvement.
However, there is still plenty of scope, especially for constructs like
<xsl:number/>
. To take another example, Saxon has not
even started to explore to possibilities opened by parallel execution,
something the language makes a highly attractive option.
Perhaps the biggest research challenge is to write an XSLT processor that can operate without building the source tree in memory. Many people would welcome such a development, but it certainly isn't an easy thing to do.