XML Schema 1.0 implementation
Saxon now recognizes an xs:import of the XML namespace, and no longer requires a schema location to be provided: the relevant schema components are constructed automatically. (However, if a schema location is supplied, then it is used.)
Durations can now be validated against value range facets (minInclusive, minExclusive, maxInclusive, maxExclusive). The algorithm for comparing durations that mix year/month and day/time components (for example 1 year compared with 365 days) differs from the one in the XML Schema specification in a few edge cases.
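For orientation, the comparison described in XML Schema Part 2 (not Saxon's own variant, which the note above says differs in a few edge cases) can be sketched as follows. The sketch assumes a duration is modelled as a hypothetical (months, seconds) pair; the function names are illustrative only. Each duration is added to four reference instants, and the order is defined only if all four comparisons agree.

```python
from datetime import datetime, timedelta

# The four reference instants used by the XML Schema Part 2 order relation.
REFERENCE_INSTANTS = [
    datetime(1696, 9, 1),
    datetime(1697, 2, 1),
    datetime(1903, 3, 1),
    datetime(1903, 7, 1),
]

DAY = 86400  # seconds in a day


def add_duration(start, months, seconds):
    """Add a (months, seconds) duration: months first, then the day/time part."""
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    # Every reference instant falls on day 1, so no end-of-month clamping is needed.
    shifted = start.replace(year=year, month=month_index + 1)
    return shifted + timedelta(seconds=seconds)


def compare_durations(a, b):
    """Return -1, 0 or 1, or None when the order is indeterminate."""
    outcomes = set()
    for instant in REFERENCE_INSTANTS:
        end_a = add_duration(instant, *a)
        end_b = add_duration(instant, *b)
        outcomes.add((end_a > end_b) - (end_a < end_b))
    return outcomes.pop() if len(outcomes) == 1 else None


one_year = (12, 0)
print(compare_durations(one_year, (0, 364 * DAY)))  # 1: always longer
print(compare_durations(one_year, (0, 365 * DAY)))  # None: depends on leap years
print(compare_durations(one_year, (0, 367 * DAY)))  # -1: always shorter
```

A facet such as maxInclusive can only be decided when the result is not None, which is why mixed durations are the awkward case.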
A pattern facet is now checked against the canonical lexical representation of the value, as defined in XML Schema part 2. Previously it was checked against the result of casting the value to a string according to the XPath rules. This makes a difference for the types xs:decimal, xs:float, and xs:double, where the two specifications differ.
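To illustrate why the choice of string matters, here is a minimal sketch for xs:decimal only. It assumes the XML Schema 1.0 canonical form (a decimal point with at least one digit on each side) and the XPath rule that a whole-number decimal is cast to a string without a decimal point. The function names are hypothetical, and Python's re module stands in for the schema regex dialect, which is not identical.

```python
import re
from decimal import Decimal


def canonical_decimal(lexical):
    """XML Schema 1.0 canonical form of an xs:decimal, e.g. '1' -> '1.0'."""
    value = Decimal(lexical.strip())
    if value == 0:
        return "0.0"
    sign = "-" if value < 0 else ""
    whole, _, fraction = format(abs(value), "f").partition(".")
    whole = whole.lstrip("0") or "0"        # no leading zeros
    fraction = fraction.rstrip("0") or "0"  # no trailing zeros, but one digit
    return f"{sign}{whole}.{fraction}"


def xpath_decimal_string(lexical):
    """String produced by an XPath cast: whole numbers lose the decimal point."""
    value = Decimal(lexical.strip())
    if value == value.to_integral_value():
        return str(int(value))
    return canonical_decimal(lexical)


pattern = re.compile(r"[0-9]+\.[0-9]+")  # requires a decimal point
print(bool(pattern.fullmatch(xpath_decimal_string("1.0"))))  # False: string is '1'
print(bool(pattern.fullmatch(canonical_decimal("1.0"))))     # True: string is '1.0'
```

The same value therefore passes or fails the pattern depending on which representation is tested.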
In most cases Saxon now uses the correct schema-defined semantics when comparing atomic values, for example in evaluating facets and in testing identity constraints. Previously Saxon used the XPath semantics. This means, for example, that when handling identity constraints the xs:integer 1 and the xs:double 1 are no longer considered equal.
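The difference can be sketched as below, assuming that under schema semantics two values can only be equal when they share a primitive type. The type table and function name are hypothetical, and the sketch simplifies: it ignores NaN handling and single precision for xs:float.

```python
from decimal import Decimal

# Hypothetical mapping from a few built-in types to their primitive type.
PRIMITIVE_OF = {
    "xs:integer": "xs:decimal",
    "xs:decimal": "xs:decimal",
    "xs:double": "xs:double",
    "xs:float": "xs:float",
    "xs:string": "xs:string",
}


def schema_equal(type_a, lexical_a, type_b, lexical_b):
    """Compare two typed values without promoting across primitive types."""
    prim_a, prim_b = PRIMITIVE_OF[type_a], PRIMITIVE_OF[type_b]
    if prim_a != prim_b:
        return False  # values of different primitive types are never equal
    if prim_a == "xs:decimal":
        return Decimal(lexical_a) == Decimal(lexical_b)
    if prim_a in ("xs:double", "xs:float"):
        return float(lexical_a) == float(lexical_b)
    return lexical_a == lexical_b


print(schema_equal("xs:integer", "1", "xs:double", "1"))     # False
print(schema_equal("xs:integer", "1", "xs:decimal", "1.0"))  # True
```

XPath semantics would promote the integer to a double and report the first pair as equal.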
Erratum E2-25 to XML Schema Part 2 has been implemented. This erratum changes the validation rules for the xs:language data type.
The rules for escaping of hyphens in regular expressions have changed. The rules in the specification are still unclear, but Saxon was disallowing some cases which clearly should be allowed, like the subtraction [\c-[X]]. The rules are now that within square brackets, an unescaped hyphen is taken as representing itself (that is, it matches a hyphen in the input) if it appears as the first character, or is followed by ']', or if it immediately follows a character range. (Thus [A-Z-0-9] allows A-Z, 0-9, or hyphen.) It is taken as a subtraction operator only if followed by '['. The new rules affect XPath as well as XML Schema.
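The hyphen rules above can be sketched as a small classifier. This is an illustration of the stated rules, not Saxon's regex parser: the function name is hypothetical, the input is assumed to be a single bracket expression, escapes are assumed to be a backslash plus one character, and the subtracted class is not analysed.

```python
def classify_hyphens(char_class):
    """Label each unescaped hyphen in a bracket expression such as '[A-Z-0-9]'."""
    roles = []
    i = 1                          # skip the opening '['
    if char_class[i:i + 1] == "^":
        i += 1                     # skip a negation marker
    first = True                   # nothing seen yet inside the brackets
    after_range = False            # previous token completed a range like A-Z
    while i < len(char_class) and char_class[i] != "]":
        ch = char_class[i]
        if ch == "-":
            following = char_class[i + 1:i + 2]
            if following == "[":
                roles.append("subtraction")
                break              # stop at the start of the subtracted class
            if first or following == "]" or after_range:
                roles.append("literal")
                i += 1
                first = after_range = False
                continue
            roles.append("range")
            i += 1                 # move onto the range end
            i += 2 if char_class[i:i + 1] == "\\" else 1
            after_range = True
            continue
        i += 2 if ch == "\\" else 1   # ordinary character or escape
        first = after_range = False
    return roles


print(classify_hyphens("[A-Z-0-9]"))  # ['range', 'literal', 'range']
print(classify_hyphens(r"[\c-[X]]"))  # ['subtraction']
```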
Saxon now enforces the constraint defined in the XML Schema specification that in a hierarchy of types, an element or attribute cannot be dropped at one level (by restriction) and then re-introduced at a deeper level (by extension) with an incompatible type.
XPath expressions used in identity constraints in a schema are now statically type-checked; this means that most errors in defining the path (for example, misspelt element names) will now result in a warning message. (This is a warning rather than an error because such path expressions are not disallowed by the XML Schema specification.)
Running the new W3C XML Schema Test Suite (some 40,000 test cases) revealed a number of cases where Saxon was doing insufficient checks of schema or document validity. These cases have been fixed. The main ones are:
- Additional cross-checks have been implemented between different facets on the same simple type, for example checking that minExclusive is less than maxExclusive, and that fractionDigits does not exceed totalDigits
- Saxon now checks that an element cannot have two ID attributes where one or both is defined as an ID by matching an xs:anyAttribute wildcard
- Saxon now treats a "prohibited" attribute use as if it were not present at all in the schema component model
- There are now more checks on the validity of xs:annotation elements (and their children and attributes)
- Whitespace is now consistently trimmed before checking ID/IDREF constraints
- Saxon was correctly disallowing xsi:nil="true" on a non-nillable element; it should also disallow xsi:nil="false"
- The block attribute on an element declaration was being checked but had no effect.
- xsi:nil should be allowed but ignored when validating against a type rather than against an element declaration
- A notation declaration must have a name, and either a system identifier or a public identifier
- Attributes of elements in a schema document must not be in the XML Schema namespace.
- Values of fixed attributes were not being passed through to a query or stylesheet
- There is a rule that the fields in an identity constraint must have a simple type; Saxon was checking only that the field had no child elements. In fact the test suite also lets through fields having a complex type with simple content, and after raising a bug report on the spec I have written the code on the basis that this is the intended meaning.
- The whiteSpace facet on a union type should be preserve rather than collapse.
- Saxon now disallows deriving a simple type as a direct restriction of xs:anySimpleType. This is another area where the spec is unclear, but this follows the practice of other schema validators.
- A union or list type can be validly derived from xs:anyType; this was previously disallowed
- Previous releases failed to detect one case of invalid complex type derivation, namely cases where the subtype differed from the supertype only in allowing an empty content model
- In previous releases, the algorithm for verifying that one content model was validly derived from another, in cases where one (but not both) models used xs:all, was incorrect. (This fix has required a substantial amount of new code).
- Documents using xsi:nil in conjunction with an xs:all content model were being reported as invalid.
- In testing whether one content model subsumes another, the algorithm needs to take account of the nillability and fixed value constraints of the element particles.
- The specification (almost certainly unintentionally) does not say that it is an error to derive a complex type with <complexContent> by extension from a complex type with simple content, provided that the content model of the extension is locally empty; the result is in fact a complex type with simple content.
- A bug was found in the Thompson and Tobin (2003) algorithm for determining type subsumption. The algorithm checks that every path from an initial state to a final state in the derived type corresponds to a path starting at an initial state in the base type, but it fails to check that this path ends in a final state in the base type.
- Within an xs:all group, specifying maxOccurs="0" on an element particle had no effect.
- Leading and trailing whitespace was not being trimmed from names in a schema document.
- When processContents="strict" is specified on an element wildcard, it is not necessary that there should be a global element declaration for the element found in the instance; it is acceptable as an alternative for the element to have an xsi:type attribute.
- The rules for testing whether two element wildcards overlapped did not cover all cases, in particular the case where both wildcards specify ##other, but from different target namespaces
- Previous releases ignored fixed and default values, and also the xsi:nil attribute, on elements having the type xs:anyType.
- An xsi:schemaLocation or xsi:noNamespaceSchemaLocation attribute is now disallowed if it appears after the first element or attribute in the specified namespace (or non-namespace).
- Rules concerning use of the whiteSpace facet are more strictly applied.
- The rules to test whether wildcards overlap have been refined.
- Code has been added to check that a non-self-referential redefined model group definition is a valid restriction of its alter ego.
- Code has been added to check that an attribute group, model group, or type is not redefined more than once in the same xs:redefine element (there's no clear ban on this in the spec, but the effect would be incoherent).
- Pattern and enumeration facets defined on a union type are now checked (previous releases allowed the facets to be defined but ignored them during validation)
- The length, minLength, and maxLength facets on a QName or NOTATION are now ignored.
- Leading and trailing whitespace is now allowed in a value of type xs:QName or xs:NOTATION (including the value of xsi:type)
- If xsi:type is used, and the element in question has a fixed or default value that comes into play, the fixed or default value is now checked for validity against the requested xsi:type
- Fixed and default values are now recognized in the case of an element that has an empty complex content model but allows mixed content
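The missing final-state check in the subsumption algorithm, noted in the list above, can be illustrated with a simplified sketch. This is not the Thompson and Tobin algorithm itself: it assumes each content model has already been compiled to a deterministic finite-state machine, represented by a hypothetical dict with "start", "final" (a set of states) and "edges" (mapping (state, element name) to the next state). The check_final switch exists only to show the effect of omitting the check.

```python
from collections import deque


def is_subsumed(derived, base, check_final=True):
    """Check that every complete path through `derived` is also one through `base`."""
    start = (derived["start"], base["start"])
    seen = {start}
    queue = deque([start])
    while queue:
        d_state, b_state = queue.popleft()
        # A path that may end here in the derived type must also be
        # allowed to end at the corresponding state in the base type.
        if check_final and d_state in derived["final"] and b_state not in base["final"]:
            return False
        for (state, name), d_next in derived["edges"].items():
            if state != d_state:
                continue
            b_next = base["edges"].get((b_state, name))
            if b_next is None:
                return False       # the base type has no matching transition
            pair = (d_next, b_next)
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True


# Base type requires the sequence (a, b); the derived type allows just (a).
base = {"start": 0, "final": {2}, "edges": {(0, "a"): 1, (1, "b"): 2}}
derived = {"start": 0, "final": {1}, "edges": {(0, "a"): 1}}

print(is_subsumed(derived, base))                     # False
print(is_subsumed(derived, base, check_final=False))  # True: the flawed result
```

Without the final-state test, the derived type is accepted even though it permits an instance that the base type would reject as incomplete.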
Internally, Saxon now maintains two copies of the content model of a complex type: the version that corresponds to the component model as defined in the XML Schema specification, and a simplified version in which group references are expanded and pointless particles are eliminated.