Saxonica: Publications

Publications

Blog

The Saxonica blogs contain in-depth entries about a variety of topics relating to the current development of Saxon.

Books

XSLT 2.0 Programmer's Reference 4th edition by Michael Kay, published by Wrox Press. This book is widely recognized as the authoritative reference on the XSLT 2.0 language, second only to the W3C specification itself. It covers every feature of the language comprehensively, while at the same time explaining the concepts behind the language design, and giving many examples of practical stylesheets to illustrate each language feature.

Michael Kay's XSLT 2.0 and XPath 2.0 (for XML, XSLT, and XPath) is some of the best money I've ever spent on XML-technology-related documentation - it is a fantastic piece of work.

— Bridger Dyson-Smith, posting on xsl-list, 2 August 2014

Find it on amazon.com

Previous editions

The third edition was published in two volumes, one covering XSLT 2.0 and the other XPath 2.0. This edition was produced before the final specifications were ratified by W3C, so there are some inaccuracies. The two-volume format was not especially popular with readers, particularly as many made the mistake of buying the XSLT volume on its own, without realising that it relied heavily on the reader also having access to the XPath book. Navigation in the book was also difficult because of the absence of running heads for the alphabetical chapters. The fourth edition corrects all these problems, and has received a much more enthusiastic reception.
The second edition remains in print, and is useful as the definitive reference to the XSLT 1.0 language (though it does include some features from the draft XSLT 1.1 specification, which W3C abandoned just before the book went to print).
The first edition was published in April 2000, very soon after the XSLT 1.0 specification was ratified. It quickly established itself as the definitive guide to the language and played a significant part in ensuring the rapid and successful adoption of XSLT by the user community.

Also available:

XQuery from the Experts: A Guide to the W3C XML Query Language http://www.amazon.com/exec/obidos/ASIN/0321180607

Eight chapters by members of W3C's Query Working Group provide an overview of XQuery designed to be of interest to programmers at every skill level. Coverage ranges from strictly technical subjects to historical essays on the language's ancestry and the process behind XQuery's design. The book presents its material in both tutorial and reference form.

Michael Kay's chapter provides a high-level comparison of XQuery and XSLT, looking both at the differences between the two languages and at their similarities.

Chapter Three is especially helpful for understanding the similarities and differences between XQuery, XPath and XSLT. To really understand where XQuery fits, you must understand this interrelationship. Not only does Mr. Kay do a great job explaining that, he actually makes it fun to read.

— A quote from a reader's review

Podcasts

In January 2022, Michael Kay was interviewed by Yegor Bugayenko for the Shift-M podcast. They discussed the history and the future of XSLT, the secrets of the Saxonica business, and software development in general. The podcast video is available on YouTube.

Published papers and articles

Schema-Aware Conversion of XML to JSON

Michael Kay. Presented at Balisage 2023, Washington DC.

A W3C Community Group has been formed to develop proposed specifications for 4.0 versions of XSLT, XPath, and XQuery. One of the aims is to provide improved capabilities for processing of JSON, and the associated constructs in the data model such as maps and arrays. One of the development threads is conversion between XML, JSON, and other data formats such as HTML and CSV. This paper looks at one particular aspect of the proposals, a new function for XML-to-JSON conversion.

Kay, Michael. "Schema-Aware Conversion of XML to JSON." Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). https://doi.org/10.4242/BalisageVol28.Kay01.

Ambiguity in iXML: And How to Control It

Norm Tovey-Walsh. Presented at Balisage 2023, Washington DC.

Humans are really good at resolving ambiguities. Our senses are trained for it: is that pattern of shadows in the forest dappled sunlight, or a tiger waiting to pounce? Our minds quickly and almost effortlessly adjust interpretations based on contextual clues that change over time. Parsers? Not so much. Our everyday languages and formats: XML, JSON, JavaScript, Java, etc. are rigorously defined to avoid ambiguity: you must put a quote here, a semicolon there. (Most) parsers reject anything that cannot be unambiguously identified within a small textual window. Invisible XML is an uncommon format in that it doesn’t reject grammars or parses that are ambiguous. That doesn’t mean ambiguity is a good thing, and it doesn’t mean authors wouldn’t like to control it.

Tovey-Walsh, Norm. "Ambiguity in iXML: And How to Control It." Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). https://doi.org/10.4242/BalisageVol28.Tovey-Walsh01.

XSLT Extensions for JSON Processing

Michael Kay. Presented at Balisage 2022, Washington DC.

XSLT 3.0 contains basic facilities for transforming JSON as well as XML. But looking at actual use cases, it's clear that some things are a lot harder than they need to be. How could we extend XSLT to make JSON transformations as easy as XML transformations, using the same rule-based tree-walking paradigm? Some of these extensions are already implemented in current Saxon releases, so we are starting to get user feedback.

Kay, Michael. "XSLT Extensions for JSON Processing." Presented at Balisage: The Markup Conference 2022, Washington, DC, August 1 - 5, 2022. In Proceedings of Balisage: The Markup Conference 2022. Balisage Series on Markup Technologies, vol. 27 (2022). https://doi.org/10.4242/BalisageVol27.Kay01.

Designing for change: Pragmas in Invisible XML as an extensibility mechanism

Norm Tovey-Walsh, Tomos Hillman, C. M. Sperberg-McQueen and Bethan Tovey-Walsh. Presented at Balisage 2022, Washington DC.

Invisible XML (ixml) is a method for treating non-XML documents as if they were XML. The 1.0 specification for Invisible XML was announced in June of this year. No technology foresees all of its use cases, especially in 1.0. How can ixml allow experimentation, and channel experimentation in useful ways, to allow ideas to be expressed in ixml grammars that go beyond what is foreseen, without compromising interoperability or the value of strict conformance to the specification?

Many programming languages (C, JavaScript, Pascal, XQuery, etc.) address this question with pragmas. A pragma is a semi-formal way to instruct a processor/compiler/interpreter how it should operate. Typical pragmas extend a specification but are not a part of it. We propose pragmas as an optional add-on to ixml to allow implementation of non-standardized functionality in a way that does not interfere with standard ixml processing. We describe our general framework for pragmas, some specific pragmas (to illustrate how pragmas can be used), and a few pragmatic implementations.

Hillman, Tomos, C. M. Sperberg-McQueen, Bethan Tovey-Walsh and Norm Tovey-Walsh. "Designing for change: Pragmas in Invisible XML as an extensibility mechanism." Presented at Balisage: The Markup Conference 2022, Washington, DC, August 1 - 5, 2022. In Proceedings of Balisage: The Markup Conference 2022. Balisage Series on Markup Technologies, vol. 27 (2022). https://doi.org/10.4242/BalisageVol27.Sperberg-McQueen01.

Invisible XML Coming into Focus: Status report from the community group

Norm Tovey-Walsh, Tomos Hillman, John Lumley, Steven Pemberton, C. M. Sperberg-McQueen and Bethan Tovey-Walsh. Presented at Balisage 2022, Washington DC.

Invisible XML has had a long incubation process, but in the last year things have heated up. A W3C Community Group has been formed, the spec has been improved, and implementations have been released or are in various stages of development. This paper gives an overview of iXML in its stable version 1.0 form, with discussion of some of the design decisions that have shaped it, and accounts from implementors of their practical experiences with iXML.

Hillman, Tomos, John Lumley, Steven Pemberton, C. M. Sperberg-McQueen, Bethan Tovey-Walsh and Norm Tovey-Walsh. "Invisible XML coming into focus: Status report from the community group." Presented at Balisage: The Markup Conference 2022, Washington, DC, August 1 - 5, 2022. In Proceedings of Balisage: The Markup Conference 2022. Balisage Series on Markup Technologies, vol. 27 (2022). https://doi.org/10.4242/BalisageVol27.Eccl01.

Expression Elaboration

Michael Kay. Presented at XML Prague 2022.

This paper describes an approach to evaluation of expression-based languages such as XSLT, XQuery, and XPath, in which nodes on the expression tree output by the language parser are converted to lambda expressions in Java, JavaScript, or C#, with the aim of doing as much work as possible once only, in advance of the actual expression evaluation.

Michael Kay. "Expression Elaboration". XML Prague 2022. https://archive.xmlprague.cz/2022/files/xmlprague-2022-proceedings.pdf
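
The idea in the abstract above, converting each node of an expression tree into a lambda once so that repeated evaluation does no tree inspection, can be illustrated with a minimal sketch. This is not Saxon's implementation: the node classes, the `elaborate` function, and the dictionary-based context are all hypothetical names chosen for illustration, and Python is used here only as a convenient notation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

Context = Dict[str, float]
Evaluator = Callable[[Context], float]


# Hypothetical expression-tree nodes, as a parser might produce them.
@dataclass(frozen=True)
class Literal:
    value: float


@dataclass(frozen=True)
class Variable:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Multiply:
    left: "Expr"
    right: "Expr"


Expr = Union[Literal, Variable, Add, Multiply]


def elaborate(expr: Expr) -> Evaluator:
    """Walk the tree once and return a closure that evaluates it.

    All type dispatch happens here, ahead of time; the returned
    closure only performs the actual computation.
    """
    if isinstance(expr, Literal):
        value = expr.value
        return lambda ctx: value
    if isinstance(expr, Variable):
        name = expr.name
        return lambda ctx: ctx[name]
    if isinstance(expr, Add):
        left, right = elaborate(expr.left), elaborate(expr.right)
        return lambda ctx: left(ctx) + right(ctx)
    if isinstance(expr, Multiply):
        left, right = elaborate(expr.left), elaborate(expr.right)
        return lambda ctx: left(ctx) * right(ctx)
    raise TypeError(f"unsupported expression node: {expr!r}")


# (x + 2) * y, elaborated once and then evaluated for several contexts.
tree = Multiply(Add(Variable("x"), Literal(2.0)), Variable("y"))
evaluate = elaborate(tree)
results = [evaluate({"x": x, "y": 3.0}) for x in (1.0, 2.0)]
```

In this sketch the `isinstance` checks run only during `elaborate`; each later call to `evaluate` executes nested closures directly, which is the "do the work once, in advance" property the abstract describes.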

ZenoString: A Data Structure for Processing XML Strings

Michael Kay. Presented at Balisage 2021, Washington DC.

This paper describes a novel data structure for the representation of Unicode strings, designed to efficiently support the usage patterns that arise when processing XML using languages such as XSLT, XPath, and XQuery.

Kay, Michael. "ZenoString: A Data Structure for Processing XML Strings." Presented at Balisage: The Markup Conference 2021, Washington, DC, August 2 - 6, 2021. In Proceedings of Balisage: The Markup Conference 2021. Balisage Series on Markup Technologies, vol. 26 (2021). https://doi.org/10.4242/BalisageVol26.Kay01.

Interactivity Three Ways

Norm Tovey-Walsh and Michael Sperberg-McQueen. Presented at Balisage 2021, Washington DC.

One of the most obvious differences between documents physically printed on pages of paper and documents displayed on electronic devices is that the latter can be interactive in ways that the former cannot. More than 50 years ago, this is what convinced Ted Nelson and others that when used well computers would dramatically change our relation with text. What kinds of interactivity are possible, and to what extent interactivity adds value to a document, are challenging questions that require careful analysis.

Deciding that some specific interactive feature would add value immediately raises a new challenge: how is that feature going to be realized? In this paper, we look at three different technologies that can be used to add interactivity to a document presented on the web: "plain old JavaScript", Saxon-JS, and XForms. We examine a specific feature and compare the differences between similar implementations across these three platforms.

Walsh, Norman, and C. M. Sperberg-McQueen. "Interactivity Three Ways." Presented at Balisage: The Markup Conference 2021, Washington, DC, August 2 - 6, 2021. In Proceedings of Balisage: The Markup Conference 2021. Balisage Series on Markup Technologies, vol. 26 (2021). https://doi.org/10.4242/BalisageVol26.Walsh01.

<transpile from="Java" to="C#" via="XML" with="XSLT"/>

Michael Kay. Presented at Markup UK 2021.

This paper describes a project to convert a substantial piece of software (an XSLT processor, as it happens, but it could have been anything) from Java to C#, using an XML representation as the intermediate format, and using XSLT as the transformation language.

Michael Kay. "<transpile from="Java" to="C#" via="XML" with="XSLT"/>". Markup UK 2021. https://markupuk.org/pdf/Markup-UK-2021-proceedings.pdf

Asynchronous XSLT

Michael Kay. Presented at Balisage 2020, Washington DC.

This paper describes a proposal for language extensions to XSLT 3.0, and to the XDM data model, to provide for asynchronous processing. The proposal is particularly motivated by the requirement for asynchronous retrieval of external resources on the JavaScript platform (whether client-side or server-side), but other use cases for asynchronous processing, and other execution platforms, are also considered.

Michael Kay. "Asynchronous XSLT." Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). https://doi.org/10.4242/BalisageVol25.Kay01.

A Proposal for XSLT 4.0

Michael Kay. Presented at XML Prague 2020.

This paper defines a set of proposed extensions to the XSLT 3.0 language, suitable for inclusion in version 4.0 of the language were that ever to be defined. The proposed features are described in sufficient detail to enable the functionality to be understood and assessed, but not in the microscopic detail needed for the eventual language specification. Brief motivation is given for each feature. The ideas have been collected by the author both from his own experience in using XSLT 3.0 to develop some sizable applications (such as an XSLT compiler), and also from feedback from users, reported either directly to Saxonica in support requests, or registered on internet forums such as StackOverflow.

Michael Kay. "A Proposal for XSLT 4.0". XML Prague 2020. http://archive.xmlprague.cz/2020/files/xmlprague-2020-proceedings.pdf

<Angle-brackets/> on the Branch Line

John Lumley. Presented at Markup UK 2019.

As a retirement 'hobby', somewhat removed from the computing milieu, the author has started building a model railway in his garden. Surveying the extant tools for designing such layouts and finding them not quite right, he started building a design tool himself, using the familiar technologies of XSLT3 and SVG executing in a browser, employing Saxon-JS as the processing platform. The results of this were demonstrated, with some success, at Markup UK in 2018. This paper describes the design of this tool in some detail, as well as developments since that demonstration.

John Lumley. "<Angle-brackets/> on the Branch Line". Markup UK 2019. https://markupuk.org/Markup-UK-2019-proceedings.pdf

An XSLT compiler written in XSLT: can it perform?

Michael Kay and John Lumley. Presented at XML Prague 2019.

This paper discusses the implementation of an XSLT 3.0 compiler written in XSLT 3.0. XSLT is a language designed for transforming XML trees, and since the input and output of the compiler are both XML trees, compilation can be seen as a special case of the class of problems for which XSLT was designed. Nevertheless, the peculiar challenges of multi-phase compilation in a declarative language create performance challenges, and much of the paper is concerned with a discussion of how the performance requirements were met.

Michael Kay and John Lumley. "An XSLT compiler written in XSLT: can it perform?". XML Prague 2019. http://archive.xmlprague.cz/2019/files/xmlprague-2019-proceedings.pdf

Task Abstraction for XPath Derived Languages

Debbie Lockett and Adam Retter. Presented at XML Prague 2019.

XPDLs (XPath Derived Languages) such as XQuery and XSLT have been pushed beyond the envisaged scope of their designers. Perversions such as processing Binary Streams, File System Navigation, and Asynchronous Browser DOM Mutation have all been witnessed. Many of these novel applications of XPDLs intentionally incorporate non-sequential and/or concurrent evaluation and embrace side effects to achieve their purpose. To arrive at a solution for safely managing side effects and concurrent execution, this paper first surveys both the available XPDL vendor extensions and approaches offered in non-XPDLs, and then describes EXPath Tasks, a novel solution derived for the safe evaluation of side effects in XPDLs which respects both sequential and concurrent execution.

Debbie Lockett and Adam Retter. "Task Abstraction for XPath Derived Languages". XML Prague 2019. http://archive.xmlprague.cz/2019/files/xmlprague-2019-proceedings.pdf

An XSD 1.1 Schema Validator Written in XSLT 3.0

Michael Kay. Presented at Markup UK 2018.

This paper presents a successfully completed project to write an XSD 1.1 validator using XSLT 3.0. There are several motivations for attempting this; the most immediate was the need for a schema validator to run in the browser, and given the existence of XSLT 3.0 in the browser (in the form of Saxon-JS) writing the validator in XSLT 3.0 seems a more attractive choice than the alternative of writing it in JavaScript. The portability benefits of being able to do schema validation anywhere you can run XSLT 3.0 are an additional factor. Possibly too, wider availability of XSD 1.1 validators will encourage those who publish XML Schemas for common standard vocabularies to take advantage of the powerful features introduced in version 1.1 of the XSD standard. The second motivation was simply as a usability test of XSLT 3.0: this is a complex application, and it is useful to see whether XSLT 3.0 is up to the job.

Michael Kay. "An XSD 1.1 Schema Validator Written in XSLT 3.0". Markup UK 2018. http://markupuk.org/2018/Markup-UK-2018-proceedings.pdf

Implementing XForms using interactive XSLT 3.0

O'Neil Delpratt and Debbie Lockett. Presented at XML Prague 2018.

In this paper, we discuss our experiences in developing Saxon-Forms, a new partial XForms implementation for browsers using "interactive" XSLT 3.0, and suggest some benefits of this implementation over others. Firstly we describe the mechanics of the implementation - how XForms features such as actions are implemented using the interactive XSLT extensions available with Saxon-JS, to update form data in the (X)HTML page, and handle user input using event handling templates. Secondly we discuss how Saxon-Forms can be used, namely by integrating it into the client-side XSLT of a web application, and examples of the advantages of this architecture. As a motivation and use case we use Saxon-Forms in our in-house license tool application.

O'Neil Delpratt and Debbie Lockett. "Implementing XForms using interactive XSLT 3.0". XML Prague 2018. http://archive.xmlprague.cz/2018/files/xmlprague-2018-proceedings.pdf

XML Tree Models for Efficient Copy Operations

Michael Kay. Presented at XML Prague 2018.

A large class of XML transformations involves making fairly small changes to a document. The functional nature of the XSLT and XQuery languages means that data structures must be immutable, so these operations generally involve physically copying the whole document, including the parts that are unchanged, which is expensive in time and memory. Although efficient techniques are well known for avoiding these overheads with data structures such as maps, these techniques are difficult to apply to the XDM data model because of two closely-related features of that model: it exposes node identity (so a copy of a node is distinguishable from the original), and it allows navigation upwards in the tree (towards the root) as well as downwards. This paper proposes mechanisms to circumvent these difficulties.

Michael Kay. "XML Tree Models for Efficient Copy Operations". XML Prague 2018. http://archive.xmlprague.cz/2018/files/xmlprague-2018-proceedings.pdf

Compiling XSLT3, in the browser, in itself

John Lumley. Presented at Balisage 2017, Washington DC.

This paper describes the development of a compiler for XSLT 3.0 which can run directly in modern browsers. It exploits a virtual machine written in JavaScript, Saxon-JS, which interprets an execution plan for an XSLT transform, consuming source documents and interpolating the results into the displayed web page. Ordinarily these execution plans (Stylesheet Export File, SEF), which are written in XML, are generated offline by the Java-based Saxon-EE product. Saxon-JS has been extended to handle dynamic XPath evaluation, by adding an XPath parser and a compiler from the XPath parse tree to SEF. By constructing an XSLT transform that consumes an XSLT stylesheet and creates an appropriate SEF, exploiting this XPath compiler, we have managed to construct an in-browser compiler for XSLT 3.0 with high levels of standards compliance. This opens the way to support dynamic transforms, in-browser stylesheet construction and execution, and a potential route to language-portable XSLT compiler technologies.

Lumley, John, Debbie Lockett and Michael Kay. "Compiling XSLT3, in the browser, in itself." Presented at Balisage: The Markup Conference 2017, Washington, DC, August 1 - 4, 2017. In Proceedings of Balisage: The Markup Conference 2017. Balisage Series on Markup Technologies, vol. 19 (2017). doi:10.4242/BalisageVol19.Lumley01.

Distributing XSLT Processing between Client and Server

O'Neil Delpratt and Debbie Lockett. Presented at XML London 2017.

This paper presents work on improving an existing in-house License Tool application. The current tool is a server-side web application, using XForms in the front end. The tool generates licenses for the Saxon commercial products using server-side XSLT processing. Our main focus is to move parts of the tool's architecture client-side, by using "interactive" XSLT 3.0 with Saxon-JS. A beneficial outcome of this redesign is that we have produced a truly XML end-to-end application.

O'Neil Delpratt and Debbie Lockett. "Distributing XSLT Processing between Client and Server". Presented at XML London 2017, June 10 - 11th, 2017. doi:10.14337/XMLLondon17.Lockett01.

Projection and Streaming: Compared, Contrasted, and Synthesized

Michael Kay. Presented at XML Prague 2017.

This paper describes, compares, and contrasts two techniques designed to enable an XML document to be processed without building an entire tree representation of the document in memory. Document projection analyses a query to determine which parts of the document are relevant to the query, and discards everything else during source document parsing. Streaming attempts to execute a stylesheet "on the fly" while the source document is being read. For both techniques, the paper describes the way that they are implemented in the Saxon XSLT and XQuery engine. Performance results are given that apply to both techniques, in relation to the queries in the XMark benchmark applied to a 118Mb source document. The paper concludes with a discussion of ideas for combining the benefits of both techniques and getting more synergy between them.

Michael Kay. "Projection and Streaming: Compared, Contrasted, and Synthesized". XML Prague 2017. http://archive.xmlprague.cz/2017/files/xmlprague-2017-proceedings.pdf

XPath 3.1 in the Browser

John Lumley, Debbie Lockett, Michael Kay. Presented at XML Prague 2017.

This paper discusses the implementation of an XPath 3.1 processor with high levels of standards compliance that runs entirely within current modern browsers. The runtime engine Saxon-JS, written in JavaScript and developed by Saxonica, used to run pre-compiled XSLT 3.0 stylesheets, is extended with a dynamic XPath parser and converter to the Saxon-JS compilation format. This is used to support both XSLT's xsl:evaluate instruction and a JavaScript API XPath.evaluate() which supports XPath outside an XSLT context.

John Lumley, Debbie Lockett, and Michael Kay. "XPath 3.1 in the Browser". XML Prague 2017. http://archive.xmlprague.cz/2017/files/xmlprague-2017-proceedings.pdf

Approximate CSS Styling in XSLT

John Lumley. Presented at Balisage 2016, Washington DC.

This paper discusses transforming a CSS stylesheet into an XSLT transform that projects an approximation of the styling from the CSS onto a target XML document. It was developed during several XSLT-based projects involving multi-dialect XML documents, where there was a need either to evaluate CSS properties for another external tool, such as in an HTML → XSL-FO → PDF pipeline, or where a document styling needed to be "fixed" for embedding in another document, such as examples in professional papers. The paper presents examples, explains the general architecture of the generated XSLT transform, discusses how that transform is itself constructed from the CSS stylesheet and outlines the strengths and weaknesses and some of the directions in which the tool could be developed. It is approximate in that it only supports some of the core CSS features, assumes the user is "skilled in the art" and is working with CSS stylesheets that are understood and visible, and that the execution speed of the CSS "projection" is not an issue. Nevertheless, in the author's experience the ability to mix CSS styling into the "XSLT researcher's toolbox" has proved to be of some utility.

Lumley, John. "Approximate CSS Styling in XSLT". Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5, 2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:10.4242/BalisageVol17.Lumley01.

Saxon-JS: XSLT 3.0 in the Browser

Debbie Lockett and Michael Kay. Presented at Balisage 2016, Washington DC.

We introduce Saxon-JS, an XSLT 3.0 run-time written in pure JavaScript. We've effectively split the Saxon product into its compile time and run time components. The compiler runs on the server, and generates an intermediate representation of the compiled and optimized stylesheet in a custom XML format. Saxon-JS, running on the browser, reads in the compiled stylesheet and executes it. We describe some particular features of Saxon-JS: the event-handling extensions to the XSLT language (as used for Saxon-CE), the way that XSLT and JavaScript can interwork, conformance to the W3C XSLT and XPath specifications, and some details of the internal implementation.

Lockett, Debbie, and Michael Kay. "Saxon-JS: XSLT 3.0 in the Browser." Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5, 2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:10.4242/BalisageVol17.Lockett01.

Transforming JSON using XSLT 3.0

Michael Kay. Presented at XML Prague 2016.

The XSLT 3.0 and XPath 3.1 specifications, now at Candidate Recommendation status, introduce capabilities for importing and exporting JSON data, either by converting it to XML, or by representing it natively using new data structures: maps and arrays. The purpose of this paper is to explore the usability of these facilities for tackling some practical transformation tasks. Two representative transformation tasks are considered, and solutions for each are provided either by converting the JSON data to XML and transforming that in the traditional way, or by transforming the native representation of JSON as maps and arrays. The exercise demonstrates that the absence of parent or ancestor axes in the native representation of JSON means that the transformation task needs to be approached in a very different way.

Kay, Michael. "Transforming JSON using XSLT 3.0". XML Prague 2016. http://archive.xmlprague.cz/2016/files/xmlprague-2016-proceedings.pdf
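
The point made in the abstract above, that maps and arrays have no parent or ancestor axis, can be illustrated outside XSLT with a minimal sketch. It uses Python dictionaries and lists as a stand-in for XDM maps and arrays; the sample data and the `walk` and `courses_with_faculty` functions are hypothetical and are not taken from the paper. Because a nested value cannot be asked for its container, the traversal has to carry the chain of containers downwards itself.

```python
import json
from typing import Any, Iterator, List, Tuple

# Hypothetical sample data: courses nested inside faculties.
DOC = json.loads("""
{"faculties": [
  {"name": "Science", "courses": [{"title": "Physics"}, {"title": "Botany"}]},
  {"name": "Arts", "courses": [{"title": "History"}]}
]}
""")


def walk(value: Any, ancestors: Tuple[Any, ...] = ()) -> Iterator[Tuple[Any, Tuple[Any, ...]]]:
    """Yield every value together with the containers that led to it.

    The ancestors tuple plays the role that the ancestor axis would
    play for XML nodes; it must be threaded through the recursion
    because the values themselves hold no upward reference.
    """
    yield value, ancestors
    if isinstance(value, dict):
        children = list(value.values())
    elif isinstance(value, list):
        children = value
    else:
        return
    for child in children:
        yield from walk(child, ancestors + (value,))


def courses_with_faculty(doc: Any) -> List[Tuple[str, str]]:
    """Pair each course title with the name of its enclosing faculty."""
    rows = []
    for value, ancestors in walk(doc):
        if isinstance(value, dict) and "title" in value:
            # ancestors[-1] is the courses list, ancestors[-2] the faculty map.
            faculty = ancestors[-2]
            rows.append((value["title"], faculty["name"]))
    return rows


rows = courses_with_faculty(DOC)
```

With an XML representation the same task could start at each course and navigate upwards; with the native representation the enclosing context has to be passed down explicitly, which is one way of reading the "very different" approach the abstract mentions.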

Two from Three (in XSLT)

John Lumley. Presented at Balisage 2015, Washington DC.

This paper discusses automated methods of 'downgrading' XSLT 3.0 programs into XSLT 2.0 syntax and semantics. The stimulus was running portions of a document processing system, that had been upgraded to use more coherent features of XSLT 3.0, in the environment of a browser-based standards-compliant XSLT 2.0 implementation (Saxon-CE). The work involves detailed knowledge of XSLT and is intended to automate significant sections of the 'downconversion', leaving other sections to conditional compilation directives. All conversion tools are of course written in XSLT and several aspects involve partial processing and evaluation of XSLT semantics within XSLT.

Lumley, John. "Two from Three (in XSLT)". Presented at Balisage: The Markup Conference 2015, Washington, DC, August 11 - 14, 2015. In Proceedings of Balisage: The Markup Conference 2015. Balisage Series on Markup Technologies, vol. 15 (2015). doi:10.4242/BalisageVol15.Lumley01.

Improving Pattern Matching Performance in XSLT

John Lumley and Michael Kay. Presented at XML London 2015 and again at Balisage 2015, Washington DC.

This paper discusses improving the performance of XSLT programs that use very large numbers of similar patterns in their push-mode templates. The experimentation focusses around stylesheets used for processing DITA document frameworks, where much of the document logical structure is encoded in @class attributes. The processing stylesheets, often defined in XSLT 1.0, use string-containment tests on these attributes to describe push-template applicability. For some cases this can mean a few hundred string tests have to be performed for every element node in the input document to determine which template to evaluate, which sometimes means up to 30% of the entire processing time is taken up with such pattern matching. The paper examines methods, within XSLT implementations, to ameliorate this situation, including using sets of pattern preconditions and pretokenization of the class-describing attributes. How such optimisation may be configured for an XSLT implementation is discussed.

Dr. John Lumley and Dr. Michael Kay. "Improving Pattern Matching Performance in XSLT". Presented at XML London 2015, June 6 - 7th, 2015. doi:10.14337/XMLLondon15.Lumley01.
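
The pretokenization idea in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the optimisation as implemented in any XSLT processor: the rule table, template names, and both matching functions are hypothetical, and the class values merely imitate the space-separated style of DITA @class attributes. The naive matcher performs one substring test per rule; the pretokenized matcher splits the attribute once and looks each token up in an index built in advance.

```python
from typing import Dict, Optional, Tuple

# Hypothetical rules in priority order: (required class token, template name).
RULES = [
    ("topic/title", "render-title"),
    ("topic/p", "render-paragraph"),
    ("topic/ul", "render-list"),
]


def match_naive(class_attr: str) -> Optional[str]:
    """One substring test per rule, in the style of contains(@class, ' topic/p ')."""
    for token, template in RULES:
        if f" {token} " in class_attr:
            return template
    return None


# Built once: token -> (rule position, template name).
RULE_INDEX: Dict[str, Tuple[int, str]] = {}
for position, (token, template) in enumerate(RULES):
    RULE_INDEX.setdefault(token, (position, template))


def match_pretokenized(class_attr: str) -> Optional[str]:
    """Tokenize the attribute once, then find the earliest matching rule."""
    candidates = [RULE_INDEX[t] for t in class_attr.split() if t in RULE_INDEX]
    return min(candidates)[1] if candidates else None


SAMPLES = ["- topic/p ", "- topic/ul task/steps ", "- map/topicref ", "- topic/title "]
naive = [match_naive(s) for s in SAMPLES]
indexed = [match_pretokenized(s) for s in SAMPLES]
```

The cost of the naive matcher grows with the number of rules, whereas the indexed matcher does work proportional to the number of tokens in the attribute, which is typically small. The two agree on the samples above because each class value is padded with spaces around every token.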

Parallel Processing in the Saxon XSLT Processor

Michael Kay. Presented at XML Prague 2015.

One of the supposed benefits of using declarative languages (like XSLT) is the potential for parallel execution, taking advantage of the multi-core processors that are now available in commodity hardware. This paper describes recent developments in one popular XSLT processor, Saxon, which start to exploit this potential. It outlines the challenges in implementing parallel execution, and reports on the benefits that have been observed.

Kay, Michael. "Parallel Processing in the Saxon XSLT Processor". XML Prague 2015. http://archive.xmlprague.cz/2015/files/xmlprague-2015-proceedings.pdf

Analysing XSLT Streamability

John Lumley. Presented at Balisage 2014, Washington DC.

Determining streamability of constructs in XSLT 3.0 involves the application of a set of rules that appear to be complex. A tool that analyses these rules on a given stylesheet has been developed to help developers understand why sections which were designed with streaming might fail the required conditions. This paper discusses the structure of this analysis tool.

Lumley, John. "Analysing XSLT Streamability". Presented at Balisage: The Markup Conference 2014, Washington, DC, August 5 - 8, 2014. In Proceedings of Balisage: The Markup Conference 2014. Balisage Series on Markup Technologies, vol. 13 (2014). doi:10.4242/BalisageVol13.Lumley01.

Benchmarking XSLT Performance

Michael Kay and Debbie Lockett. Presented at XML London 2014.

This paper presents a new benchmarking framework for XSLT. The project, called XT-Speedo, is open source and we hope that it will attract a community of developers. The tangible deliverable consists of a set of test material, a set of test drivers for various XSLT processors, and tools for analyzing the test results. Underpinning these deliverables is a methodology and set of measurement objectives that influence the design and selection of material for the test suite, which are also described in this paper.

Dr. Michael Kay and Dr. Debbie Lockett. "Benchmarking XSLT Performance". Presented at XML London 2014, June 7 - 8th, 2014. doi:10.14337/XMLLondon14.Kay01.

Streaming in the Saxon XSLT Processor

Michael Kay. Presented at XML Prague 2014.

Streaming is a major new feature of the XSLT 3.0 specification, currently a Last Call Working Draft. This paper discusses streaming as defined in the W3C specification, and as implemented in Saxon. Streaming refers to the ability to transform a document that is too big to fit in memory, which depends on the transformation itself being in some sense linear, so that pieces of the output appear in the same order as the pieces of the input on which they depend. This constraint is reflected in the W3C specification by a set of streamability rules that determine statically whether a stylesheet is streamable or not. This paper gives a tutorial introduction to the streamability rules and the way they are implemented in Saxon. It then goes on to describe the implementation architecture for streaming in the Saxon run-time, by means of push pipelines, and gives the rationale for this choice of architecture.

Kay, Michael. "Streaming in the Saxon XSLT Processor". XML Prague 2014. http://archive.xmlprague.cz/2014/files/xmlprague-2014-proceedings.pdf

Finalising a (small) Standard

John Lumley. Presented at XML Prague 2014.

This paper discusses issues and lessons that arose during the finalisation of a standard (library) for XSLT/XPath/XQuery extension functions to manipulate binary data. This process took place during 2013 in the EXPath community, through shared (mailing-list) commenting, specification redrafting, implementation experimentation and test suite development. The purpose, form and specification of the library (which isn’t technically difficult) are described briefly. Lessons and suggestions arising from the development are presented in four broad categories: establishing policies, concurrent implementation and application, using tools and declarative approaches, and pragmatic issues. None of these lessons are new, but they bear reinforcement. This work was performed under the auspices of the EXPath community and was funded by Saxonica Ltd.

Lumley, John. "Finalising a (small) Standard". XML Prague 2014. http://archive.xmlprague.cz/2014/files/xmlprague-2014-proceedings.pdf

XML on the Web: Is it still relevant?

O'Neil Delpratt. Presented at XML London 2013.

This paper discusses what is meant by the term XML on the Web and how this relates to the browser. The success of XSLT in the browser has so far been underwhelming; the paper examines the reasons for this and considers whether the situation might change. It describes the capabilities of the first XSLT 2.0 processor designed to run within web browsers, bringing not just the extra capabilities of a new version of XSLT, but also a new way of thinking about how XSLT can be used to create interactive client-side applications. Using this processor, the author demonstrates, as a use case, a technical documentation application that permits browsing and searching in an intuitive way, and shows its internals to illustrate how it works.

O'Neil Delpratt. "XML on the Web: Is it still relevant?". Presented at XML London 2013, June 15 - 16th, 2013. doi:10.14337/XMLLondon13.Delpratt01.

Multi-user interaction using client-side XSLT

O'Neil Delpratt and Michael Kay. Presented at XML Prague 2013.

This paper describes two use-case applications to illustrate the capabilities of the first XSLT 2.0 processor designed to run within web browsers. The first is a technical documentation application, which permits browsing and searching in an intuitive way. The second is a multi-player chess game application; although it uses the same XSLT 2.0 processor as the first application, it is very different in purpose and design, in that it provides multi-user interaction on the GUI and implements communication via a social media network, namely Twitter.

O'Neil Delpratt and Michael Kay. "Multi-user interaction using client-side XSLT". XML Prague 2013. http://archive.xmlprague.cz/2013/files/xmlprague-2013-proceedings.pdf

The Effects of Bytecode Generation in XSLT and XQuery

O'Neil Delpratt and Michael Kay. Presented at Balisage 2011, Montréal.

This paper discusses the highly efficient optimization of expressions in today's XSLT and XQuery processors, and presents further speed improvements that can be gained by generating bytecode rather than interpreting queries directly. Although optimization produces the most throughput gain, the gains from optimization and bytecode generation are orthogonal, and compilation can produce about 25% gain over and above gains from optimization. Tests with two variants of a well-known XSLT/XQuery processor, one with code generation and one with optimization alone, demonstrate the effect on a range of queries.

Delpratt, O'Neil Davion, and Michael Kay. "The Effects of Bytecode Generation in XSLT and XQuery". Presented at Balisage: The Markup Conference 2011, Montréal, Canada, August 2 - 5, 2011. In Proceedings of Balisage: The Markup Conference 2011. Balisage Series on Markup Technologies, vol. 7 (2011). doi:10.4242/BalisageVol7.Delpratt01.

A Streaming XSLT Processor

XSLT transformations can refer to any information in the source document from any point in the stylesheet, without constraint; XSLT implementations typically support this freedom by building a tree representation of the entire source document in memory and in consequence can process only documents which fit in memory. But many transformations can in principle be performed without storing the entire source tree. The paper (given at Balisage 2010, Montréal) reports on the progress of the W3C XSL Working Group implementation of a new version of XSLT, designed to make streamed implementations of XSLT feasible.

Kay, Michael. "A Streaming XSLT Processor". Presented at Balisage: The Markup Conference 2010, Montréal, Canada, August 3 - 6, 2010. In Proceedings of Balisage: The Markup Conference 2010. Balisage Series on Markup Technologies, vol. 5 (2010). doi:10.4242/BalisageVol5.Kay01.

You Pull, I’ll Push: On the Polarity of Pipelines

This paper (given at Balisage 2009, Montréal) discusses the most effective way to move XML data through a processing pipeline. It draws on the concept of program inversion, originally developed to eliminate bottlenecks in magnetic-tape-based processes, and ideas derived from Jackson Structured Programming which allow processes written in a convenient pull style to be compiled into push-style code; thus potentially reducing both coordination overhead and latency.

Kay, Michael. "You Pull, I’ll Push: on the Polarity of Pipelines". Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:10.4242/BalisageVol3.Kay01.

Ten Reasons Why Saxon XQuery is Fast

A paper written for the IEEE Data Engineering Bulletin, included in a special issue published in December 2008 and devoted to papers on the state-of-the-art in XQuery implementation. Most of what the paper says is of course equally applicable to XSLT.

Writing an XSLT Optimizer in XSLT

This paper (given at Extreme Markup 2007) explores the possibility that since query optimization is an exercise in transforming expression trees, and XSLT is a language for transforming trees, it ought to be possible to write an optimizer in XSLT. (The rendition of the paper is poor because it has been only partially recovered after IDEAlliance, the conference organizers, withdrew their public archive of the conference proceedings.)

C24 White Paper: Using XQuery with Financial Messages

Back in 2006-7, Saxonica collaborated with C24 to enable Saxon to be used as the query engine within the C24 Integration Objects product. (The company was subsequently acquired by Iona, which in turn was acquired by Progress, but it is now independent again and trading under its old name. In 2013 we resumed the collaboration, and we hope to move the technology forward to take advantage of all the things that have happened in Saxon in the meantime.) This May 2007 paper describes how such an integration enables XQuery to be used to access non-XML data such as SWIFT financial messages, and to convert data between different formats.

Positional Grouping in XQuery

Published at the XIME-P 2006 XQuery workshop at the SIGMOD Conference in Chicago, this paper proposes an extension to XQuery to handle positional grouping problems, derived from experience with the xsl:for-each-group construct in XSLT 2.0.

Using XSLT and XQuery for Life-Size Applications

This paper discusses the role of the XSLT 2.0 and XQuery 1.0 languages when it comes to writing real-life, sizeable applications for performing data transformations: especially factors such as error handling, debugging, performance, reuse and customization of code, relationships with XML Schema and other technologies such as XForms, and the use of pipeline-based application architectures.

Comparing XSLT and XQuery

This paper by Michael Kay was presented at XTech 2005 in Amsterdam. It compares XSLT and XQuery not just using a blow-by-blow feature comparison, but an assessment of the suitability of the languages for different tasks, and the kinds of users the two languages are aimed at.

Up-Conversion using XSLT 2.0

This paper by Michael Kay was presented at XML 2004 in Washington DC. By means of a case study, it shows how some of the new features in XSLT 2.0 (notably the grouping instructions and the facilities for handling regular expressions) make XSLT 2.0 suitable for applications such as up-conversion (creating structured XML from unstructured input) that were quite infeasible in XSLT 1.0.

XSLT and XPath Optimization

This paper by Michael Kay, presented at XML Europe 2004 in Amsterdam, looked at the techniques used inside an XSLT processor (Saxon, of course!) to optimize performance. It described some of the techniques actually used in the Saxon processor, and surveyed other ideas coming from academia.

XML Five Years On: a review of the achievements so far and the challenges ahead

Keynote address given by Michael Kay at the Document Engineering 2003 Conference in Grenoble, France.

XML & Co. - was bringt die Zukunft?

Article in ComputerWoche (in German): XML began as "SGML light" and was meant to be distinguished above all by its simplicity. In the meantime, however, a series of additional standards has increased its complexity considerably. While the core standard remains largely stable, major changes lie ahead in other areas.

Saxon: Anatomy of an XSLT Processor

This paper by Michael Kay, although published as long ago as 2001, remains a frequently cited description of how XSLT processing in a product like Saxon actually works.

What kind of a language is XSLT?

This paper by Michael Kay, published at the same time as the one above, gives an overview of the capabilities of the XSLT language.

Articles written for Stylus Studio

Saxonica has a close working relationship with the Stylus Studio team: Stylus Studio was the first XML development environment to offer Saxon-SA as a standard feature. As part of this collaboration, we wrote a regular column for their web site. The following articles have been published:

Demonstrations

In some of my tutorials and seminars I use a genealogy application to illustrate the features of XSLT 2.0. The files for this demonstration are available for download.
