In regular expressions, the rules on back-references defined in errata E4 and E24 have now been implemented. Back-references of more than two digits are now recognised: a back-reference is the longest sequence of digits after "\" that is either a single digit or denotes a number no greater than the number of preceding left parentheses. A back-reference must not start with the digit zero, and a back-reference N is an error if it appears before the closing parenthesis corresponding to the Nth opening parenthesis.
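The back-reference rules can be illustrated with a small scanner. This is a minimal Python sketch, not Saxon's implementation: the function name find_backreferences is hypothetical, the multi-digit rule is read as "extra digits are accepted only while the resulting number does not exceed the count of preceding left parentheses" (an interpretation assumed here), and parentheses inside square brackets are not treated specially.

```python
DIGITS = "0123456789"


def find_backreferences(regex):
    """Return the group numbers referenced by back-references, in order.

    Raises ValueError for a reference that starts with zero or that
    appears before its group has been closed.
    """
    refs = []
    open_count = 0   # left parentheses seen so far
    open_stack = []  # numbers of the groups that are still open
    closed = set()   # numbers of the groups already closed
    i = 0
    while i < len(regex):
        ch = regex[i]
        if ch == "\\" and i + 1 < len(regex) and regex[i + 1] in DIGITS:
            j = i + 1
            while j < len(regex) and regex[j] in DIGITS:
                j += 1
            digits = regex[i + 1:j]
            if digits[0] == "0":
                raise ValueError("back-reference must not start with zero")
            # Longest prefix that is a single digit, or whose value does
            # not exceed the number of preceding left parentheses.
            k = len(digits)
            while k > 1 and int(digits[:k]) > open_count:
                k -= 1
            n = int(digits[:k])
            if n not in closed:
                raise ValueError("back-reference %d precedes the end of group %d" % (n, n))
            refs.append(n)
            i += 1 + k  # any remaining digits are ordinary characters
        elif ch == "\\":
            i += 2      # skip any other escaped character
        elif ch == "(":
            open_count += 1
            open_stack.append(open_count)
            i += 1
        elif ch == ")":
            if open_stack:
                closed.add(open_stack.pop())
            i += 1
        else:
            i += 1
    return refs
```

Under these assumptions, `(a)\12` is read as a reference to group 1 followed by the literal digit 2, while the same `\12` after twelve or more groups would refer to group 12.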
The rules for regular expressions in the draft XML Schema 1.1 specification clarify the ways in which hyphens may be used
within a character class expression (that is, within square brackets). To implement these rules, Saxon now disallows an
unescaped hyphen at the start or end of a character range (for example [--a]).
The option alphanumeric=codepoint is now available in collation URIs to request alphanumeric collation (integers
embedded in the string are sorted as integers) with codepoint collation for the "alpha" parts of the string.
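The ordering that alphanumeric collation produces can be sketched with a sort key. This is an illustrative Python sketch only, not Saxon's code: the name alphanumeric_key is hypothetical, and placing a numeric chunk before an alphabetic chunk when the two meet at the same position is an assumption made here.

```python
import re

# A chunk is either a run of ASCII digits or a run of anything else.
_CHUNK = re.compile(r"[0-9]+|[^0-9]+")


def alphanumeric_key(s):
    """Build a sort key: digit runs compare as integers, the rest by codepoint."""
    key = []
    for chunk in _CHUNK.findall(s):
        if chunk[0] in "0123456789":
            key.append((0, int(chunk), ""))
        else:
            # Python compares str values codepoint by codepoint.
            key.append((1, 0, chunk))
    return key


names = ["file10", "file9", "file100", "File1"]
print(sorted(names, key=alphanumeric_key))
# ['File1', 'file9', 'file10', 'file100']
```

A plain codepoint sort would place "file10" and "file100" before "file9"; here the embedded integers decide the order, while "File1" still precedes the others because "F" has a lower codepoint than "f".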
The collection() function now allows directories of text files to be read, provided the text uses characters
that are legal in XML. This is achieved using the additional query parameter unparsed=yes in the collection
URI. The resulting files are returned in the form of document nodes, each having a single text node as a child.
The platform default encoding is assumed.
Since the unparsed-text() function is not available in XQuery, this also gives a way of reading
unparsed text files from XQuery. Simply use collection('file:/c:/my/dir/?select=filename.txt;unparsed=yes').
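Because text files read this way must contain only characters that are legal in XML, it can be useful to check a file before adding it to the directory. The following Python sketch tests a string against the Char production of XML 1.0; the function name first_illegal_xml_char is hypothetical, and the check is independent of whatever Saxon does internally.

```python
import re

# Matches any character outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def first_illegal_xml_char(text):
    """Return the index of the first character not allowed in XML 1.0, or None."""
    match = _ILLEGAL_XML_CHAR.search(text)
    return match.start() if match else None
```

A file whose decoded content yields None from this function contains no characters that XML 1.0 forbids, such as NUL or other control characters besides tab, line feed and carriage return.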