XML Schema 1.0 implementation
Saxon now implements enumeration facets on union and list types as the authors of the specification intended. Although the spec as written has problems (bug 5328 has been raised), the intent is that each enumeration value as written should be interpreted as an instance of the type being restricted, and compared with the instance in that type's value space. Previously, enumeration facets on union and list types were implemented as a string comparison on the lexical value.
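The difference between the two interpretations can be seen in a small stand-alone sketch. It does not use any Saxon API: the class ListEnumerationDemo and its methods are hypothetical, and the list-of-integers type is modelled by hand purely for illustration.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: this is not how Saxon represents schema types.
class ListEnumerationDemo {

    // Convert the lexical form of a hypothetical "list of xs:integer" into its value.
    static List<BigInteger> parseIntegerList(String lexical) {
        List<BigInteger> items = new ArrayList<>();
        String collapsed = lexical.trim();
        if (collapsed.isEmpty()) {
            return items; // an empty list has no items
        }
        for (String token : collapsed.split("\\s+")) {
            // BigInteger accepts leading zeros and an optional sign, as xs:integer does
            items.add(new BigInteger(token));
        }
        return items;
    }

    // Old behaviour as described above: plain string comparison of lexical forms.
    static boolean matchesLexically(String enumerationValue, String instanceValue) {
        return enumerationValue.equals(instanceValue);
    }

    // Intended behaviour: both sides are read as instances of the list type and compared as values.
    static boolean matchesByValue(String enumerationValue, String instanceValue) {
        return parseIntegerList(enumerationValue).equals(parseIntegerList(instanceValue));
    }

    public static void main(String[] args) {
        String facet = "1 2 3";
        String instance = "01  +2 3";
        System.out.println("lexical match: " + matchesLexically(facet, instance));
        System.out.println("value match:   " + matchesByValue(facet, instance));
    }
}
```

Under these assumptions, an instance written as "01  +2 3" fails a lexical comparison against the enumeration value "1 2 3" but matches it once both are treated as lists of integers.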
The reporting of keyRef validation errors has been improved. Multiple errors can now be reported in a single schema validation run, and the line number given with the error message reflects the location of the unresolved keyRef value, rather than the end of the document as before.
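One way to observe several validation errors from a single run is to register an error handler that records each error instead of stopping at the first. The sketch below uses only the standard JAXP validation API with whatever schema processor the JDK supplies, not Saxon itself; the class KeyrefErrorCollector, the schema and the instance document are invented for illustration, and the line number attached to each message depends on the processor in use.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: collects validation errors rather than failing on the first one.
class KeyrefErrorCollector {

    // A made-up schema in which every ref/@to must match some item/@id.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='root'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='item' maxOccurs='unbounded'>"
      + "<xs:complexType><xs:attribute name='id' type='xs:string' use='required'/></xs:complexType>"
      + "</xs:element>"
      + "<xs:element name='ref' maxOccurs='unbounded'>"
      + "<xs:complexType><xs:attribute name='to' type='xs:string' use='required'/></xs:complexType>"
      + "</xs:element>"
      + "</xs:sequence></xs:complexType>"
      + "<xs:key name='itemKey'><xs:selector xpath='item'/><xs:field xpath='@id'/></xs:key>"
      + "<xs:keyref name='itemRef' refer='itemKey'><xs:selector xpath='ref'/><xs:field xpath='@to'/></xs:keyref>"
      + "</xs:element>"
      + "</xs:schema>";

    // Two of the three references ("x" and "y") have no matching key.
    static final String XML =
        "<root>\n  <item id='a'/>\n  <ref to='a'/>\n  <ref to='x'/>\n  <ref to='y'/>\n</root>";

    static List<String> validate(String xsd, String xml) {
        List<String> errors = new ArrayList<>();
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Validator validator = factory.newSchema(new StreamSource(new StringReader(xsd))).newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
                @Override public void error(SAXParseException e) {
                    // Record the error and return normally so that validation continues
                    errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
                }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new RuntimeException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        validate(XSD, XML).forEach(System.out::println);
    }
}
```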
A new configuration option is available to control whether the schema processor takes notice of (and attempts to dereference) xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes encountered in an instance document that is being validated. This is available as the named property FeatureKeys.USE_XSI_SCHEMA_LOCATION on the TransformerFactory and Configuration classes, via methods on the S9API and .NET SchemaValidator classes and the XQJ class SaxonXQDataSource, and via the -xsiloc option on the command line interfaces Validate, Transform, and Query.
New methods have been added to class com.saxonica.schema.SchemaCompiler to allow setting of "deferred validation mode". In this mode a sequence of calls on readSchema() can be made, followed by a single call on compile(). The effect is to defer all generation of the finite state machines used for run-time validation until compile() is called. This avoids repeated (and wasted) recompilation of complex types every time new elements are added to a substitution group, or every time a new complex type is derived by extension from an existing type. This facility was developed with XBRL as the primary use case, and has the effect of reducing compilation time for this collection of schema documents from 400 seconds to 560 milliseconds.
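The saving from deferring compilation can be pictured with a toy model. This is not the SchemaCompiler API: the class DeferredCompilationDemo is hypothetical, its method names merely echo readSchema() and compile(), and it makes the pessimistic assumption that every newly read document forces every type seen so far to be rebuilt.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: counts how many times content models would be rebuilt.
class DeferredCompilationDemo {
    private final boolean deferred;
    private final List<String> schemaDocuments = new ArrayList<>();
    private int typeCompilations = 0;

    DeferredCompilationDemo(boolean deferred) {
        this.deferred = deferred;
    }

    // Load one schema document; in eager mode this immediately rebuilds everything loaded so far.
    void readSchema(String documentName) {
        schemaDocuments.add(documentName);
        if (!deferred) {
            rebuildAll();
        }
    }

    // In deferred mode, all the rebuilding happens exactly once, here.
    void compile() {
        if (deferred) {
            rebuildAll();
        }
    }

    // Assumption for this sketch: each loaded document contributes one type that must be rebuilt.
    private void rebuildAll() {
        typeCompilations += schemaDocuments.size();
    }

    int getTypeCompilations() {
        return typeCompilations;
    }

    // Read the given number of documents, then compile, and report the rebuild count.
    static int run(boolean deferred, int documents) {
        DeferredCompilationDemo compiler = new DeferredCompilationDemo(deferred);
        for (int i = 0; i < documents; i++) {
            compiler.readSchema("doc" + i + ".xsd");
        }
        compiler.compile();
        return compiler.getTypeCompilations();
    }

    public static void main(String[] args) {
        System.out.println("eager:    " + run(false, 100));
        System.out.println("deferred: " + run(true, 100));
    }
}
```

In this model the eager strategy performs n(n+1)/2 rebuilds for n documents, while the deferred strategy performs n, which is the kind of quadratic-to-linear saving that would matter for a large collection of interdependent schema documents.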
When minOccurs and numeric maxOccurs constraints (other than 0, 1, or unbounded) appear on an element or wildcard particle, Saxon now implements a finite state machine using simple counters to count the number of occurrences, rather than "unfolding" the FSM as previously. This removes the limits on the values of minOccurs and maxOccurs, as well as the cost in time and memory of handling large finite values of minOccurs and maxOccurs. The unfolding technique is still used when minOccurs and maxOccurs appear on other kinds of particle, specifically on sequence or choice groups, or when "vulnerable" repeated element and wildcard particles appear within a model group that can itself be repeated (a particle is vulnerable if all the other particles in the model group are optional).
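The contrast between counting and unfolding can be sketched for the simplest case, a content model consisting of a single repeated element particle. The class OccurrenceCounterDemo and its methods are hypothetical and greatly simplified; Saxon's actual finite state machine is not shown here.

```java
import java.util.Collections;
import java.util.List;

// Simplified illustration of a counter-based check for one repeated element particle.
class OccurrenceCounterDemo {
    static final int UNBOUNDED = -1;

    // Counter-based matching: one integer of state, whatever the values of min and max.
    static boolean matchesWithCounter(List<String> children, String name, int min, int max) {
        int count = 0;
        for (String child : children) {
            if (!child.equals(name)) {
                return false; // unexpected element
            }
            count++;
            if (max != UNBOUNDED && count > max) {
                return false; // too many occurrences
            }
        }
        return count >= min; // fails if there are too few occurrences
    }

    // Unfolding needs one state per possible count: 0..max, or 0..min with a loop when unbounded.
    static long unfoldedStateCount(int min, int max) {
        return (max == UNBOUNDED ? min : max) + 1L;
    }

    public static void main(String[] args) {
        List<String> children = Collections.nCopies(3, "a");
        System.out.println("a{2,5} accepts 3 occurrences: " + matchesWithCounter(children, "a", 2, 5));
        System.out.println("states when unfolding a{2,100000}: " + unfoldedStateCount(2, 100000));
    }
}
```

The counter version uses the same amount of memory for maxOccurs="5" as for maxOccurs="100000", whereas the number of states in an unfolded machine grows with the bound.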
A side-effect of this change is that the diagnostics are more specific when a validation failure occurs.
Another side-effect, hopefully temporary, is that some rather artificial type derivations are no longer allowed: specifically those where a wildcard with maxOccurs in the base type is specialized to a sequence of specific element particles in the derived type.