Arrays in XPath

XPath 3.1 introduces arrays as a new data type. (Unlike maps, arrays are not defined for use in XSLT 3.0 unless XPath 3.1 is supported. But Saxon's XSLT 3.0 implementation always supports XPath 3.1.)

The main reason arrays were introduced was to allow JSON data structures to be represented faithfully. However, there are many other ways you can take advantage of arrays.

Arrays differ from sequences in a number of ways:

Arrays can be nested (an array can have other arrays as its members). In fact, the members of an array can be arbitrary XDM values, including nodes, atomic values, sequences, functions, and maps. A member of an array can also be an empty sequence.
An array is a single item, so you can have a sequence of arrays.

Beware: because an array is a single item, $array[1] returns the entire array, not the first member of the array. Similarly, for $x in [1, 2, 3] return $x+1 throws a type error: $x is bound to the entire array, not to each of its members in turn, and the array atomizes to a sequence of three integers, which is not a valid operand for +.
To access the Nth member of an array, use the syntax $array($N). This works because an array is a function: given an integer $N as the argument, this function returns the member in position $N.
If you know which member you want statically, you can also use the syntax $array?1, which is equivalent to $array(1).
As with sequences, the members of an array are addressed by integers in the range 1 to N, where N is the length of the array. However, unlike sequences, any attempt to access a member using an out-of-range index is a dynamic error (FOAY0001).
Operators and functions designed to work on sequences, such as count(), for expressions, filter expressions, head(), tail(), remove(), insert-before() and so on, do not work as you might expect on arrays. They don't raise an error, because an array is a single item (a sequence of length one), but they don't return the result you might have expected: for example, count([1, 2, 3]) returns 1. Instead, there is a library of functions (listed below) that perform the analogous operations on arrays.
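The difference between an array (a single item) and its members can be illustrated with a toy Python model. This is a sketch under stated assumptions, not Saxon's API: the class XArray and the helper functions are hypothetical names, an XDM sequence is modelled as a Python tuple, and each array member is itself a tuple, so a member can be empty or hold several items.

```python
# Toy model of XPath 3.1 arrays versus sequences (hypothetical names, not Saxon's API).
# Assumption: an XDM sequence is a Python tuple; an array member is also a tuple,
# so a member can be empty or contain several items.
from dataclasses import dataclass


class ArrayIndexError(Exception):
    """Stands in for the dynamic error FOAY0001."""


@dataclass(frozen=True)
class XArray:
    members: tuple  # one tuple (sequence) per member

    def __call__(self, n: int) -> tuple:
        """Models $array($n): 1-based access; out of range is an error."""
        if not 1 <= n <= len(self.members):
            raise ArrayIndexError(f"FOAY0001: index {n} not in 1..{len(self.members)}")
        return self.members[n - 1]


def positional_filter(seq: tuple, n: int) -> tuple:
    """Models $seq[$n]: an out-of-range position just yields ()."""
    return seq[n - 1:n] if n >= 1 else ()


def count(seq: tuple) -> int:
    """Models fn:count, which counts items, not array members."""
    return len(seq)


arr = XArray(((1,), (2,), (3,)))  # [1, 2, 3]
as_sequence = (arr,)              # the array is a single item

print(count(as_sequence))                           # 1, not 3
print(positional_filter(as_sequence, 1) == (arr,))  # True: $array[1] is the whole array
print(arr(2))                                       # (2,)
print(positional_filter((1, 2, 3), 5))              # (): no error for sequences
```

The model makes the asymmetry visible: a positional filter on a sequence quietly returns () when the position is out of range, while member access on an array raises the analogue of FOAY0001.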

However, operators and functions designed to work on sequences of atomic values also produce useful results when applied to arrays of atomic values. This is because atomizing an array produces the sequence of its atomized members, flattening any nested arrays recursively. So, for example, sum([1,2,3,4]) returns 10, as does sum([[1,2], [3,4]]).
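A minimal sketch of why sum() works on nested arrays, assuming the same kind of hypothetical tuple-based model (XArray, atomize and xpath_sum are illustrative names, not Saxon classes). Only atomic values and arrays are modelled; nodes, maps, and functions are not.

```python
# Toy model of atomization applied to arrays (hypothetical names, not Saxon's API).
# Assumption: sequences are tuples, array members are tuples, and any non-array
# item is already an atomic value (nodes, maps and functions are not modelled).
from dataclasses import dataclass


@dataclass(frozen=True)
class XArray:
    members: tuple  # one tuple (sequence) per member


def atomize(seq: tuple) -> tuple:
    """Models fn:data: an array contributes the atomized values of its members,
    so nested arrays are flattened recursively."""
    out = []
    for item in seq:
        if isinstance(item, XArray):
            for member in item.members:
                out.extend(atomize(member))
        else:
            out.append(item)
    return tuple(out)


def xpath_sum(seq: tuple):
    """Models fn:sum, which atomizes its argument before adding."""
    return sum(atomize(seq))


flat = XArray(((1,), (2,), (3,), (4,)))  # [1, 2, 3, 4]
pair12 = XArray(((1,), (2,)))            # [1, 2]
pair34 = XArray(((3,), (4,)))            # [3, 4]
nested = XArray(((pair12,), (pair34,)))  # [[1, 2], [3, 4]]

print(atomize((nested,)))    # (1, 2, 3, 4)
print(xpath_sum((flat,)))    # 10
print(xpath_sum((nested,)))  # 10
```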

Arrays, like all other XDM values, are immutable. When you append, replace, or remove a member of an array, you get a new array; the original is unchanged. From Saxon 9.9, the implementation uses a "persistent immutable" data structure under the covers, to ensure that making a small change to an array (such as replacing a single member) does not require copying the entire array.

As with sequences and maps, arrays do not have an intrinsic type of their own; rather, they have a type that can be inferred from what they contain. An array conforms to the type array(T) if each of its members matches the sequence type T. For example, if the members are all single strings, then the array conforms to the type array(xs:string).
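The membership rule for array(T) can be sketched as a predicate over members. In this hypothetical model (not Saxon's type checker), a sequence type T is represented by a Python function that tests one member; the occurrence indicator matters, because a member that is () does not match xs:string but does match xs:string*.

```python
# Toy model of the array(T) type test (hypothetical names, not Saxon's API).
# Assumption: a member type T is modelled as a predicate over one member (a tuple).
from dataclasses import dataclass
from typing import Callable

Member = tuple
MemberType = Callable[[Member], bool]


@dataclass(frozen=True)
class XArray:
    members: tuple  # one tuple (sequence) per member


def is_string(item) -> bool:
    return isinstance(item, str)


def exactly_one(item_test) -> MemberType:
    """Models a sequence type with no occurrence indicator, such as xs:string."""
    return lambda seq: len(seq) == 1 and item_test(seq[0])


def zero_or_more(item_test) -> MemberType:
    """Models a sequence type with '*', such as xs:string*."""
    return lambda seq: all(item_test(i) for i in seq)


def instance_of_array(arr: XArray, member_type: MemberType) -> bool:
    """array(T) matches when every member matches T, so [] matches any array(T)."""
    return all(member_type(m) for m in arr.members)


words = XArray((("a",), ("b",)))  # ["a", "b"]
gappy = XArray(((), ("a",)))      # [(), "a"]
empty = XArray(())                # []

print(instance_of_array(words, exactly_one(is_string)))   # True:  array(xs:string)
print(instance_of_array(gappy, exactly_one(is_string)))   # False: () is not an xs:string
print(instance_of_array(gappy, zero_or_more(is_string)))  # True:  array(xs:string*)
print(instance_of_array(empty, exactly_one(is_string)))   # True:  vacuously
```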

There are several ways to create an array:

If the number of members is known, you can use the constructor syntax [ value, value, value ]. Here the values can be any "simple expression" (an expression not containing a top-level comma). If the values are all known statically, you might write: [1, 2, "a", "b"]. You can use this construct anywhere an XPath expression can be used, for example in the select attribute of an xsl:variable element. The members do not need to be single items: for example [(), 1, 2, 5 to 10] constructs an array with four members, the members being sequences of length 0, 1, 1, and 6 respectively. The construct [] returns an empty array.
If the number of members is unknown, but you know that each member of the array will be a single item, you can use the construct array{ value }, where value is any expression: each item of the resulting sequence becomes one member. For example, array{(), 1, 2, 5 to 10} constructs an array with eight members, these being the single integers 1, 2, 5, 6, 7, 8, 9, and 10.
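The difference between the two constructors can be modelled as follows. This is a hypothetical sketch using a tuple-based representation (names such as square_array and curly_array are illustrative only): in the square form each comma-separated expression becomes one member, whereas in the curly form the comma operator first concatenates everything into one sequence and each item then becomes a member.

```python
# Toy model of the two array constructors (hypothetical names, not Saxon's API).
# Assumption: sequences are tuples; array members are tuples.
from dataclasses import dataclass


@dataclass(frozen=True)
class XArray:
    members: tuple  # one tuple (sequence) per member


def comma(*seqs: tuple) -> tuple:
    """Models the comma operator: sequences are concatenated, never nested."""
    return tuple(item for s in seqs for item in s)


def to(lo: int, hi: int) -> tuple:
    """Models the range expression lo to hi."""
    return tuple(range(lo, hi + 1))


def square_array(*member_exprs: tuple) -> XArray:
    """Models [e1, e2, ...]: each comma-separated expression becomes one member."""
    return XArray(tuple(member_exprs))


def curly_array(seq: tuple) -> XArray:
    """Models array{ E }: every item of the sequence E becomes its own member."""
    return XArray(tuple((item,) for item in seq))


parts = ((), (1,), (2,), to(5, 10))
a = square_array(*parts)        # [(), 1, 2, 5 to 10]
b = curly_array(comma(*parts))  # array{(), 1, 2, 5 to 10}

print([len(m) for m in a.members])                        # [0, 1, 1, 6]: four members
print(len(b.members))                                     # 8
print(square_array().members == curly_array(()).members)  # True: both give []
```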

There are no XSLT 3.0 instructions for creating arrays analogous to the instructions xsl:map and xsl:map-entry. Saxon, however, fills the gap with the instructions saxon:array and saxon:array-member: see Extension instructions.

The functions that operate on arrays are summarized below; for full details see the Functions Library. The prefix array represents the namespace URI http://www.w3.org/2005/xpath-functions/array.

array:append: adds one new member to the end of an array. For example, array:append([], 5) returns [5].
array:filter: returns those members of an array that satisfy some condition, expressed as a function. For example, array:filter(array{1 to 5}, function($x){$x mod 2 = 1}) returns [1, 3, 5].
array:flatten: replaces an array by the sequence-concatenation of its members, recursively. For example, array:flatten(([1], [2 to 4], [3, [4, 5]])) returns (1, 2, 3, 4, 3, 4, 5).
array:fold-left: applies a function cumulatively to successive members of the array: each call is applied to two arguments, being the value so far, and the next member of the array. For example, array:fold-left(array{1 to 5}, 0, function($x, $y){$x + $y}) returns 15, while array:fold-left(array{1 to 3}, [], function($x, $y){[$x, $y]}) returns [[[[], 1], 2], 3].
array:fold-right: applies a function cumulatively to successive members of the array: each call is applied to two arguments, being the next member of the array, and the result of applying the function to the remainder of the array. For example, array:fold-right(array{1 to 5}, 0, function($x, $y){$x + $y}) returns 15, while array:fold-right(array{1 to 3}, [], function($x, $y){[$x, $y]}) returns [1, [2, [3, []]]].
array:for-each: applies a function to each member of an array, and returns an array containing the results. For example, array:for-each(array{1 to 5}, function($x){$x + 1}) returns [2, 3, 4, 5, 6].
array:for-each-pair: applies a function to pairs of corresponding members from two supplied arrays. For example, array:for-each-pair(['x', 'y', 'z'], [1, 2, 3], concat#2) returns ['x1', 'y2', 'z3'].
array:get: returns the member at a given position (starting from 1). For example, array:get([3, 4, 5], 2) returns 4.
array:head: returns the first member of the array. For example, array:head([1 to 5, 1 to 10]) returns (1 to 5) (a sequence, not an array).
array:insert-before: inserts a new member at a given position. For example, array:insert-before([1, 2, 3, 4], 3, ()) returns [1, 2, (), 3, 4].
array:join: joins a number of arrays end-to-end to create a new array. For example, array:join(([1], [2 to 4], [3, [4, 5]])) returns [1, (2, 3, 4), 3, [4, 5]].
array:put: replaces the member at a given position. For example, array:put([4, 5, 6], 2, 8) returns [4, 8, 6].
array:remove: removes the member at a given position. For example, array:remove([4, 5, 6], 2) returns [4, 6].
array:reverse: reverses the order of members. For example, array:reverse([[1,2], [3,4]]) returns [[3,4], [1,2]].
array:size: returns the number of members in the array. For example, array:size([[1,2], [3,4]]) returns 2.
array:sort: sorts the members of an array, either by their own atomic value, or by the result of applying a function to compute a sort key. A collation can also be specified. For example, array:sort([4, 8, 2]) returns [2, 4, 8].
array:subarray: returns members of the array starting at a specified position, either to the end of the array or up to a specified length. For example, array:subarray([1, 2, 3, 4], 2) returns [2, 3, 4], while array:subarray([1, 2, 3, 4], 2, 2) returns [2, 3].
array:tail: removes the first member from an array. For example, array:tail([1 to 5, 1 to 10]) returns [1 to 10] (an array with a single member, the sequence 1 to 10).
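Several of the examples above can be reproduced with a toy Python model, which may help in seeing how fold-left and fold-right nest their results and how array:flatten differs from array:join. The names below are hypothetical and the model is a sketch, not Saxon's implementation: sequences are tuples, array members are tuples, and a function that returns an array returns a one-item tuple holding an XArray.

```python
# Toy model of a few array functions (hypothetical names, not Saxon's API).
# Assumption: sequences are tuples, array members are tuples, and an
# array-valued result is a one-item sequence holding an XArray.
from dataclasses import dataclass


@dataclass(frozen=True)
class XArray:
    members: tuple  # one tuple (sequence) per member


def show(value) -> str:
    """Renders a sequence or array in XPath-like notation."""
    if isinstance(value, XArray):
        return "[" + ", ".join(show(m) for m in value.members) + "]"
    if isinstance(value, tuple):
        if len(value) == 1:
            return show(value[0])
        return "(" + ", ".join(show(i) for i in value) + ")"
    return repr(value)


def fold_left(arr: XArray, zero: tuple, f) -> tuple:
    """array:fold-left: f(f(f(zero, m1), m2), m3) ..."""
    acc = zero
    for m in arr.members:
        acc = f(acc, m)
    return acc


def fold_right(arr: XArray, zero: tuple, f) -> tuple:
    """array:fold-right: f(m1, f(m2, f(m3, zero))) ..."""
    acc = zero
    for m in reversed(arr.members):
        acc = f(m, acc)
    return acc


def flatten(seq: tuple) -> tuple:
    """array:flatten: replaces arrays by their members, recursively."""
    out = []
    for item in seq:
        if isinstance(item, XArray):
            for m in item.members:
                out.extend(flatten(m))
        else:
            out.append(item)
    return tuple(out)


def join(arrays: tuple) -> XArray:
    """array:join: concatenates members; the members themselves are not flattened."""
    return XArray(tuple(m for a in arrays for m in a.members))


def pair(x: tuple, y: tuple) -> tuple:
    """Models the inline function function($x, $y){[$x, $y]}."""
    return (XArray((x, y)),)


one_to_three = XArray(((1,), (2,), (3,)))  # array{1 to 3}
empty = (XArray(()),)                      # []
print(show(fold_left(one_to_three, empty, pair)))   # [[[[], 1], 2], 3]
print(show(fold_right(one_to_three, empty, pair)))  # [1, [2, [3, []]]]

arrays = (XArray(((1,),)),                          # [1]
          XArray(((2, 3, 4),)),                     # [2 to 4]
          XArray(((3,), (XArray(((4,), (5,))),))))  # [3, [4, 5]]
print(flatten(arrays))     # (1, 2, 3, 4, 3, 4, 5)
print(show(join(arrays)))  # [1, (2, 3, 4), 3, [4, 5]]
```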

Arrays: Technical Implementation Details

Internally, Saxon (from release 9.9) uses two implementations of arrays.

The first implementation, SimpleArrayItem, is backed by a Java ArrayList<Sequence>. An array constructor (either [x,y,z] or array{a,b,c}) will always deliver a SimpleArrayItem. This is economical on storage and has good performance for accessing specific members by index position, or for scanning the entire array. It is less well suited to incremental modification.

The other implementation, PersistentArrayItem, is backed by a persistent (immutable) list implemented as a tree structure. This makes it possible to add, replace, or remove entries in constant time without copying the entire array.
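The idea behind a persistent list can be sketched with a balanced binary tree and path copying. This is a hypothetical illustration of the general technique, not Saxon's PersistentArrayItem: only replacement is shown (insertion and removal would also need rebalancing), and in this particular sketch each update copies one root-to-leaf path, so its cost grows logarithmically with the length rather than linearly.

```python
# Sketch of a persistent (immutable) list as a balanced binary tree with
# path copying. Hypothetical code: it illustrates the general technique and
# is NOT Saxon's PersistentArrayItem. Only replacement (put) is shown;
# insertion and removal would also need rebalancing.
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass(frozen=True)
class Leaf:
    value: Any


@dataclass(frozen=True)
class Node:
    left: Any   # Leaf or Node
    right: Any  # Leaf or Node
    size: int   # number of leaves below this node


def size(t) -> int:
    return 1 if isinstance(t, Leaf) else t.size


def build(items: Sequence, lo: int, hi: int):
    """Builds a balanced tree over items[lo:hi] (assumes hi > lo)."""
    if hi - lo == 1:
        return Leaf(items[lo])
    mid = (lo + hi) // 2
    return Node(build(items, lo, mid), build(items, mid, hi), hi - lo)


def get(t, i: int):
    """0-based lookup: walks a single root-to-leaf path."""
    while isinstance(t, Node):
        if i < size(t.left):
            t = t.left
        else:
            i -= size(t.left)
            t = t.right
    return t.value


def put(t, i: int, value):
    """Returns a new tree; only the nodes on the path to index i are copied."""
    if isinstance(t, Leaf):
        return Leaf(value)
    if i < size(t.left):
        return Node(put(t.left, i, value), t.right, t.size)
    return Node(t.left, put(t.right, i - size(t.left), value), t.size)


def to_list(t, out=None) -> list:
    """In-order traversal into a flat list, visiting each node once."""
    if out is None:
        out = []
    if isinstance(t, Leaf):
        out.append(t.value)
    else:
        to_list(t.left, out)
        to_list(t.right, out)
    return out


original = build(["a", "b", "c", "d"], 0, 4)
updated = put(original, 1, "B")  # like array:put($a, 2, "B"), which is 1-based

print(to_list(original))                # ['a', 'b', 'c', 'd']: unchanged
print(to_list(updated))                 # ['a', 'B', 'c', 'd']
print(updated.right is original.right)  # True: the untouched half is shared
```

Because the untouched half of the tree is shared rather than copied, the original and the updated array can coexist without duplicating their common parts.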

The result of operations such as array:put(), array:insert-before(), and array:remove() is always a PersistentArrayItem, even if the input is a SimpleArrayItem. Converting one to the other takes time proportional to the size of the array.

The result of array:filter(), array:for-each(), array:for-each-pair(), and array:reverse() is always a SimpleArrayItem, even if the input is a PersistentArrayItem. In cases where an array goes through a construction phase and is then used heavily for retrieval-only access, it might be worth forcing it back to a SimpleArrayItem at the end of the construction phase: this can be achieved by means of a call on array:filter() with a predicate that selects all members, for example array:filter($array, function($m) { true() }).