Whitespace stripping

A number of factors combine to determine whether whitespace-only text nodes in the source document are visible to the user-written XSLT or XQuery code.

By default, if there is a DTD or schema, then ignorable whitespace is stripped from any source document loaded from a StreamSource or SAXSource. Ignorable whitespace is whitespace that separates the child elements of an element declared to have element-only content. This whitespace is removed regardless of any xml:space attributes in the source document.
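
The definition of ignorable whitespace can be illustrated with a small sketch. It uses the JDK's own SAX parser in validating mode rather than Saxon, and the class and method names (IgnorableWhitespaceDemo, Counter, parse) are invented for the illustration. The DTD declares list as element-only, so a validating parser reports the whitespace between its children as ignorable, while the whitespace inside item (mixed content) is ordinary character data.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class IgnorableWhitespaceDemo {

    /** Records how the parser classifies text: ignorable whitespace or character data. */
    public static final class Counter extends DefaultHandler {
        public int ignorableRuns = 0;                      // number of ignorableWhitespace events
        public final StringBuilder text = new StringBuilder(); // all ordinary character data

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorableRuns++;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void error(SAXParseException e) throws SAXParseException {
            throw e; // surface validity errors instead of ignoring them
        }
    }

    /** Parses the document with DTD validation switched on and returns the counts. */
    public static Counter parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        Counter counter = new Counter();
        parser.parse(new InputSource(new StringReader(xml)), counter);
        return counter;
    }

    public static void main(String[] args) throws Exception {
        String xml =
              "<!DOCTYPE list [\n"
            + "  <!ELEMENT list (item+)>\n"     // element-only content
            + "  <!ELEMENT item (#PCDATA)>\n"   // text content
            + "]>\n"
            + "<list>\n  <item> a </item>\n  <item>b</item>\n</list>";
        Counter c = parse(xml);
        System.out.println("ignorable whitespace events: " + c.ignorableRuns);
        System.out.println("character data: [" + c.text + "]");
    }
}
```

The whitespace around "a" survives as character data because item is not declared with element-only content; only the indentation between the item elements is classified as ignorable.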

It is possible to change this default behavior in several ways.

On the Query or Transform command line, the following options are available: -strip:all strips all whitespace text nodes, -strip:none strips no whitespace text nodes, and -strip:ignorable strips ignorable whitespace text nodes only (this is the default).
If the -p option is used on the command line, then query parameters are recognized in the URI passed to the document() or doc() function. The parameter strip-space=yes strips all whitespace text nodes, strip-space=no strips no whitespace text nodes, and strip-space=ignorable strips ignorable whitespace text nodes only. A strip-space parameter in the URI overrides any -strip option specified on the command line.
Options corresponding to the above can also be set on the TransformerFactory object or on the Configuration. These settings are global.
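
As a rough sketch of the TransformerFactory route, the code below tries to set a whitespace-stripping attribute on whatever JAXP factory is in use. The attribute name is an assumption: it is believed to correspond to the constant FeatureKeys.STRIP_WHITESPACE in Saxon, with the values "all", "none" and "ignorable", but it should be checked against the documentation for the Saxon release in use. The class and method names are invented for the example.

```java
import javax.xml.transform.TransformerFactory;

public class StripWhitespaceSetting {

    // Assumed attribute name; verify against the FeatureKeys documentation of your Saxon release.
    public static final String STRIP_WHITESPACE = "http://saxon.sf.net/feature/strip-whitespace";

    /**
     * Tries to apply the setting ("all", "none" or "ignorable").
     * Returns false if the factory does not recognize the attribute,
     * which is what a non-Saxon TransformerFactory is expected to do.
     */
    public static boolean trySetStripping(TransformerFactory factory, String value) {
        try {
            factory.setAttribute(STRIP_WHITESPACE, value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        TransformerFactory factory = TransformerFactory.newInstance();
        boolean accepted = trySetStripping(factory, "all");
        System.out.println(factory.getClass().getName()
                + (accepted ? " accepted" : " did not recognize") + " the setting");
    }
}
```

Because the setting is global, it affects every source document subsequently loaded through that factory.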

Whitespace stripping that is specified in any of the above ways applies not only when the source document is parsed under Saxon's control (that is, when it is supplied as a JAXP StreamSource or SAXSource), but also when the input is supplied in the form of a tree (for example, a DOM). In the latter case Saxon wraps the supplied tree in a virtual tree that provides a view of the original tree with whitespace text nodes omitted.
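
The idea of a view that omits whitespace text nodes without touching the underlying tree can be sketched as follows. This is not Saxon's implementation, only a minimal illustration over a W3C DOM; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class WhitespaceStrippedView {

    /** True if the node is a text node consisting only of XML whitespace characters. */
    public static boolean isWhitespaceText(Node node) {
        if (node.getNodeType() != Node.TEXT_NODE) {
            return false;
        }
        String s = node.getNodeValue();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
                return false;
            }
        }
        return true;
    }

    /** Returns the children seen through the view; the DOM itself is left unchanged. */
    public static List<Node> visibleChildren(Node parent) {
        List<Node> result = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (!isWhitespaceText(c)) {
                result.add(c);
            }
        }
        return result;
    }

    /** Helper for the example: parses a string into a DOM, preserving all whitespace. */
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<list>\n  <item>a</item>\n  <item>b</item>\n</list>");
        Node root = doc.getDocumentElement();
        System.out.println("children in the DOM:  " + root.getChildNodes().getLength());
        System.out.println("children in the view: " + visibleChildren(root).size());
    }
}
```

The filtering is repeated every time the children are requested, which gives some intuition for why navigating such a view costs more than navigating a tree that was stripped once while it was built.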

This whitespace stripping is in addition to, and takes place before, any stripping carried out as a result of the xsl:strip-space declaration in the stylesheet.
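
For comparison, the effect of the xsl:strip-space declaration itself can be seen with a short JAXP sketch. It runs with whichever XSLT processor TransformerFactory.newInstance() returns (not necessarily Saxon), the source document has no DTD so nothing is ignorable, and the class and method names are invented for the example. The stylesheet simply counts the text-node children of the document element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StripSpaceDemo {

    /** Builds a stylesheet that outputs the number of text-node children of the root element. */
    public static String stylesheet(boolean strip) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + (strip ? "<xsl:strip-space elements='*'/>" : "")
             + "<xsl:output method='text'/>"
             + "<xsl:template match='/'><xsl:value-of select='count(/*/text())'/></xsl:template>"
             + "</xsl:stylesheet>";
    }

    /** Runs the stylesheet against lexical XML supplied as a StreamSource. */
    public static String countTextNodes(String xml, boolean strip) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(stylesheet(strip))));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<list>\n  <item>a</item>\n  <item>b</item>\n</list>";
        System.out.println("without xsl:strip-space: " + countTextNodes(xml, false));
        System.out.println("with xsl:strip-space:    " + countTextNodes(xml, true));
    }
}
```

Without the declaration the three whitespace text nodes around the item elements are visible to the stylesheet; with it, none are.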

Saxon never modifies a supplied tree in situ: if a tree is supplied as input, and the stylesheet requests space stripping, then a virtual tree is created and whitespace is stripped on the fly as it is navigated. This is expensive (it can add 25% to processing time); it is therefore best to supply lexical XML as input to a transformation, so that Saxon can strip unwanted whitespace while the tree is being parsed and built.