Directories as collections
If the URI passed to the collection() function (still assuming a default CollectionFinder) identifies a directory, then the contents of the directory are returned. Such a URI may have a number of query parameters, written in the form file:///a/b/c/d?keyword=value;keyword=value;... The recognized keywords and their values are as follows:
keyword | values | effect |
recurse | yes, no (default no) | Determines whether subdirectories are searched recursively. |
strip-space | yes, ignorable, no | Determines whether whitespace text nodes are to be stripped. The default depends on the Configuration settings. |
validation | strip, preserve, lax, strict | Determines whether and how schema validation is applied to each document. The default depends on the Configuration settings. |
select | file name pattern ("glob") | Determines which files are selected (see below). |
match | regular expression | Determines which files are selected (see below). |
content-type | media type | Determines how the resource is processed. If this parameter is absent, the CollectionFinder attempts to discern the content type first by looking at the file extension and then, if necessary, by examining the initial bytes of the content itself. The set of content types that are recognized, and their mapping to implementations of the class ResourceFactory, is defined in the Configuration and can be changed programmatically. Available from Saxon 10.1. |
metadata | yes, no | If set to yes, each item returned by the collection() function is a map describing the resource. The value of the "fetch" entry is a function that can be called to retrieve the content (it returns the same item that would have been returned with the default setting of metadata=no). Failures in parsing a resource can be trapped by using try/catch around the call on the fetch function. Other entries in the returned map represent properties of the file obtained from the operating system. |
on-error | fail, warning, ignore | Determines the action to be taken if one of the files cannot be successfully parsed. |
parser | Java class name | Class name of the Java parser to be used for the selected documents. |
xinclude | yes, no | Determines whether XInclude processing should be applied to the selected documents. This overrides any setting in the Configuration (or any command line option). |
stable | yes, no | Determines whether the collection is to be stable. |
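The keyword=value;keyword=value syntax can be assembled mechanically. The following is a minimal Python sketch and not part of Saxon: the helper name build_collection_uri and the RECOGNIZED set are hypothetical, the keyword list is copied from the table, and the values are assumed to contain only characters that are already legal in the query part of a URI.

```python
# Keywords recognized for directory collections, as listed in the table.
RECOGNIZED = {
    "recurse", "strip-space", "validation", "select", "match",
    "content-type", "metadata", "on-error", "parser", "xinclude", "stable",
}


def build_collection_uri(directory_uri, **params):
    """Append keyword=value pairs to a directory URI, separated by ';'.

    Hypothetical helper for illustration only. Underscores in Python
    argument names are mapped to hyphens (on_error -> on-error).
    Values are assumed to be URI-safe already; no escaping is done here.
    """
    pairs = []
    for name, value in params.items():
        keyword = name.replace("_", "-")
        if keyword not in RECOGNIZED:
            raise ValueError(f"unrecognized keyword: {keyword}")
        pairs.append(f"{keyword}={value}")
    if not pairs:
        return directory_uri
    return directory_uri + "?" + ";".join(pairs)


uri = build_collection_uri("file:///a/b/c/d", recurse="yes", on_error="warning")
print(uri)  # file:///a/b/c/d?recurse=yes;on-error=warning
```

The resulting string is what would be passed to collection() in a query or stylesheet.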
The pattern used in the select parameter can use glob-like syntax; for example, *.xml selects all files with extension "xml". More generally, the pattern is converted to a regular expression by prepending "^", appending "$", and replacing "." by "\.", "*" by ".*", and "?" by ".?"; it is then used to match the file names appearing in the directory using the Java regular expression rules. So, for example, you can write ?select=*.(xml|xhtml) to match files with either of these two file extensions.
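The conversion rules just described can be sketched in a few lines. This is an illustration under stated assumptions, not Saxon's code: the function names glob_to_regex and select_files are hypothetical, and Python's re module stands in for the Java regular expression engine, which is assumed to behave the same way for patterns this simple.

```python
import re

# Character-by-character rewrites described in the text.
_GLOB_REWRITES = {".": r"\.", "*": ".*", "?": ".?"}


def glob_to_regex(pattern):
    """Convert a select pattern to an anchored regular expression."""
    body = "".join(_GLOB_REWRITES.get(ch, ch) for ch in pattern)
    return "^" + body + "$"


def select_files(pattern, names):
    """Return the file names matched by the converted pattern."""
    regex = re.compile(glob_to_regex(pattern))
    return [name for name in names if regex.search(name)]


names = ["a.xml", "b.xhtml", "c.txt", "notes.xml.bak"]
print(glob_to_regex("*.xml"))                # ^.*\.xml$
print(select_files("*.(xml|xhtml)", names))  # ['a.xml', 'b.xhtml']
```

Characters other than ".", "*" and "?" pass through unchanged, which is why regular expression syntax such as (xml|xhtml) can be used inside the pattern.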
Note, however, that special characters used in the URL (that is, characters such as backslash and curly braces that are not allowed in the query part of a URI) must be escaped using the %HH convention. For example, a vertical bar needs to be written as %7C. This escaping can be achieved using the encode-for-uri() function.
As an alternative to the select parameter, the match parameter can be used. This accepts a standard XPath 3.1 regular expression as its value. For example, .+\.xml selects all files with extension "xml". Again, characters that are not allowed in the query part of a URI, such as backslash, curly braces, and vertical bar, must be escaped using the %HH convention, which can be achieved using the encode-for-uri() function.
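When the URI is built outside XPath, the same %HH escaping can be produced by other means. The sketch below uses Python's urllib.parse.quote with safe="", which leaves only letters, digits, "-", "_", "." and "~" unescaped and so should give the same result as encode-for-uri() for these values. The helper name escape_parameter is hypothetical.

```python
from urllib.parse import quote, unquote


def escape_parameter(value):
    """Percent-encode everything except letters, digits, '-', '_', '.', '~'."""
    return quote(value, safe="")


select_value = escape_parameter("*.(xml|xhtml)")
match_value = escape_parameter(r".+\.xml")

print("file:///a/b/c/d?select=" + select_value)
# file:///a/b/c/d?select=%2A.%28xml%7Cxhtml%29
print("file:///a/b/c/d?match=" + match_value)
# file:///a/b/c/d?match=.%2B%5C.xml

# Decoding recovers the original regular expression.
assert unquote(match_value) == r".+\.xml"
```

Only the parameter value is escaped; the "?", "=" and ";" delimiters of the query are added afterwards and stay literal.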
A collection read in this way is not stable by default. (Stability can be expensive, and is rarely required, so the default setting is recommended.) Making a collection stable has the effect that the entire result of the collection() function is retained in a cache for the duration of the query or transformation, and any further calls on collection() with the same absolute URI return the saved collection from this cache.
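The effect of stability can be pictured with a toy model. This is a sketch of the caching idea only, not Saxon's implementation: the class CollectionCache, its loader argument and the stable flag are hypothetical simplifications (in a real URI the stable=yes setting is part of the query string).

```python
class CollectionCache:
    """Toy model of stable and non-stable collection() calls."""

    def __init__(self, loader):
        self._loader = loader  # function: absolute URI -> list of items
        self._cache = {}       # lives for one query or transformation

    def collection(self, absolute_uri, stable=False):
        if not stable:
            # Default: read the directory afresh on every call.
            return self._loader(absolute_uri)
        if absolute_uri not in self._cache:
            # First stable call: retain the entire result.
            self._cache[absolute_uri] = self._loader(absolute_uri)
        return self._cache[absolute_uri]


calls = []


def fake_loader(uri):
    """Stand-in for reading a directory; records each read."""
    calls.append(uri)
    return [f"doc-{len(calls)}"]


cache = CollectionCache(fake_loader)
first = cache.collection("file:///a/b/c/d", stable=True)
second = cache.collection("file:///a/b/c/d", stable=True)
print(first is second, len(calls))  # True 1
```

The cost mentioned in the text shows up here as the retained list: every item of a stable collection stays in memory until the cache is discarded.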