XML Schema 1.0 implementation
The command line interface com.saxonica.Validate
has been completely redesigned, allowing
multiple schema documents to be loaded and multiple instance documents to be validated.
This release of Saxon introduces preliminary support for assertions in a schema, based on the
current (31 August 2006) draft of XML Schema version 1.1. This allows a complex type to contain an assertion
about the content of the corresponding element expressed as an arbitrary XPath 2.0 expression. Please note that this
facility in the Working Draft is likely to change, and the Saxon implementation will change accordingly. For further
details see
The XML Schema specification imposes a rule that when one type R is derived from another type B by restriction, then every element particle ER in the content model of R must be compatible with the corresponding element particle EB in B. One aspect of this is that the identity constraints defined in the declaration of ER (that is, unique, key, and keyref) must be a superset of the constraints defined for EB. The specification doesn't say how to decide whether two constraints are equivalent for this purpose, and Saxon has previously ignored this requirement. At this release a check is introduced which partially implements the rule. Specifically, Saxon will count the number of constraints that are defined, and will report an error if EB has more constraints of any particular kind (unique, key, or keyref) than ER has. If EB has at least one constraint and ER has one or more, then Saxon will output a warning saying that it was unable to check whether the constraints were compatible with each other.
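The counting rule described above can be sketched in plain Java. This is only an illustration of the rule as stated here, not Saxon's code: the class `ConstraintCountCheck`, the enums `Kind` and `Outcome`, and the method names are all hypothetical.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Illustrative only: these types are hypothetical and are not Saxon classes.
class ConstraintCountCheck {

    enum Kind { UNIQUE, KEY, KEYREF }

    enum Outcome { OK, WARNING, ERROR }

    // Count how many constraints of each kind a declaration carries.
    static Map<Kind, Integer> count(List<Kind> constraints) {
        Map<Kind, Integer> counts = new EnumMap<>(Kind.class);
        for (Kind k : Kind.values()) {
            counts.put(k, 0);
        }
        for (Kind k : constraints) {
            counts.merge(k, 1, Integer::sum);
        }
        return counts;
    }

    // base holds the constraints declared on EB, restricted those on ER.
    static Outcome check(List<Kind> base, List<Kind> restricted) {
        Map<Kind, Integer> b = count(base);
        Map<Kind, Integer> r = count(restricted);
        for (Kind k : Kind.values()) {
            // EB has more constraints of this kind than ER: report an error.
            if (b.get(k) > r.get(k)) {
                return Outcome.ERROR;
            }
        }
        // Both sides declare constraints, but only the counts were compared,
        // so compatibility of the constraints themselves is unverified.
        if (!base.isEmpty() && !restricted.isEmpty()) {
            return Outcome.WARNING;
        }
        return Outcome.OK;
    }

    public static void main(String[] args) {
        System.out.println(check(java.util.Arrays.asList(Kind.KEY),
                                 java.util.Collections.<Kind>emptyList()));
        System.out.println(check(java.util.Arrays.asList(Kind.KEY),
                                 java.util.Arrays.asList(Kind.KEY)));
    }
}
```

Because only counts are compared, a restricted declaration that replaces a key with an unrelated key of the same kind passes this check with a warning rather than an error.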
It is now possible when requesting validation of an instance to specify the required name of the top-level element
in the document being validated. This is possible through the option -top:clarkname
on the
com.saxonica.Validate
command, or via a new property on the AugmentedSource
object.
The property is also available on the DocumentBuilder
in the .NET API and in the new s9api Java API.
A validation error occurs if the document being validated has a top-level element with a different name.
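To show what the check amounts to, here is a minimal sketch using only standard JAXP classes rather than the Saxon API. It assumes the required name is written in Clark notation (`{namespace-uri}local-name`), which `javax.xml.namespace.QName.valueOf` can parse. The class `TopLevelNameCheck`, the method `topLevelMatches`, and the example namespace are hypothetical.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Illustrative only: plain JAXP, not the Saxon implementation.
class TopLevelNameCheck {

    // Returns true if the document's top-level element has the required name.
    static boolean topLevelMatches(String xml, String clarkName) {
        // QName.valueOf accepts "{uri}local" or a bare "local" (no namespace).
        QName required = QName.valueOf(clarkName);
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            // DOM reports "no namespace" as null; QName uses the empty string.
            String uri = root.getNamespaceURI() == null ? "" : root.getNamespaceURI();
            return required.equals(new QName(uri, root.getLocalName()));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String doc = "<inv:invoice xmlns:inv='http://example.com/ns'/>";
        System.out.println(topLevelMatches(doc, "{http://example.com/ns}invoice"));
        System.out.println(topLevelMatches(doc, "invoice"));
    }
}
```

The comparison uses the namespace URI and local name only, so the prefix chosen in the instance document does not matter.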
I discovered that Saxon allows you to use the types xs:dayTimeDuration
and xs:yearMonthDuration
in a schema as built-in types. XML Schema 1.0 doesn't recognize these types (though I can't find a rule that says it is
absolutely non-conformant to accept them). I have changed the code to give an interoperability warning if they are
used. I have also disallowed the use of the type xs:anyAtomicType, which has no defined validation
semantics.
The mechanisms for comparing values in the course of schema validation and processing have now been separated completely from the mechanisms used when implementing XPath operators. This means that the semantics of comparison and ordering should now follow the XML Schema specification precisely. Previously some operations were implemented according to the XPath semantics.
A duplicate xsi:schemaLocation
or xsi:noNamespaceSchemaLocation
attribute is now
ignored (previously it was rejected under the rule that such an attribute cannot appear after the first element
in the relevant namespace). Duplicates can arise naturally from XInclude processing, so they are now accepted
and ignored. The schema specification permits this but does not require it. To be considered duplicates, the
declarations must match in the namespace URI and in the absolutized schemaLocation URI.
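The duplicate test can be sketched as follows, assuming each hint is reduced to a namespace URI plus a schema location that is resolved against the base URI of the element carrying the attribute. This is an illustration rather than Saxon's code: the class `SchemaLocationHints`, the method `register`, and the example URIs are hypothetical. A hint from `xsi:noNamespaceSchemaLocation` is represented here with an empty namespace string.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: a hypothetical helper, not part of Saxon.
class SchemaLocationHints {

    private final Set<String> seen = new HashSet<>();

    // Returns true if the hint is new, false if it duplicates an earlier one.
    // Two hints are duplicates when the namespace URI and the absolutized
    // schema location are both equal.
    boolean register(String namespace, String location, String baseUri) {
        String absolute = URI.create(baseUri).resolve(location).toString();
        // URIs cannot contain a space, so a space is a safe separator.
        return seen.add(namespace + " " + absolute);
    }

    public static void main(String[] args) {
        SchemaLocationHints hints = new SchemaLocationHints();
        System.out.println(hints.register(
                "urn:x", "schema.xsd", "http://example.com/a/doc.xml"));
        System.out.println(hints.register(
                "urn:x", "http://example.com/a/schema.xsd", "http://example.com/a/part.xml"));
    }
}
```

Comparing the absolutized form matters for the XInclude case: an included fragment may carry the same relative or absolute location as the including document, and both resolve to the same schema.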
Result tree validation
Saxon now does more extensive compile-time checking where an xsl:document
or xsl:result-document
instruction requests validation of the result tree. This means that validation errors that were previously detected
at stylesheet execution time are now sometimes detected at compile time. Previously these checks were only done when
validation was requested on an element-constructor instruction.
Expansion of attribute and element defaults
When the input or output of a query or transformation is validated, it is now possible to request that fixed and default
element and attribute values defined in the schema should not be expanded. This is done using the option -expand:off
on the command line, or equivalent options in the TransformerFactory
and Configuration
APIs.
The same option also applies to DTD-based attribute default expansion, provided that the XML parser reports sufficient information to the application.
Serializing a Schema Component Model
It is now possible to export the contents of the schema cache held in the Configuration
object to an XML file (with the conventional extension .scm
for Schema Component Model). The contents
can subsequently be reloaded. This is faster than reloading the original source schema documents,
because it allows most of the validation to be skipped, along with the sometimes expensive operation of constructing
and determinizing finite state machines. This facility is intended to be used in conjunction with XQuery
Java code generation: it allows the schemas that were imported by a compiled query to be saved on disk alongside
the compiled query itself, for rapid reloading at run time.
The serialized SCM file is also designed to be easy for applications to process. The representation of schema components is more uniform than in source .xsd documents (there are fewer defaults, and fewer alternative ways of expressing the same information). This makes it a suitable representation for applications that need to process or analyze schema information, as an alternative to using the Java API.
assert
and report
elements threatened to make this even more complex. So a simple XSLT transformation was written to take the finite state
machines in the SCM version of the schema-for-schemas and generate Java code from them. This means that Saxon's schema validation
logic is now derived directly from the published schema-for-schemas, while retaining the efficiency of hard-coded Java.
Changes to the Schema Component Model API
Changes have been made to the API for the schema component model (package com.saxonica.schema)
to align it more closely with the abstract model defined in the W3C specifications.
All named components now consistently expose
methods getName()
and getTargetNamespace()
to provide access to the local part of the name and the namespace URI respectively.
The wide variety of existing names for these accessors has been retained for the
time being as deprecated methods. The new names are chosen because they correspond
to the names used for these properties in the W3C schema component model.
The class FacetCollection has disappeared; its functionality has been merged into UserSimpleType.
The class Compositor has been renamed ModelGroup, and its subclasses such as ChoiceCompositor
have been renamed accordingly. In the W3C schema model, the compositor (all, choice, sequence)
is one of the properties of the ModelGroup. This is now available using the method
getCompositorName() on the ModelGroup object.
Particle is now an abstract class rather than an interface, and the previous abstract class
AbstractParticle no longer exists. There are three subclasses of Particle, namely ElementParticle,
ElementWildcard, and ModelGroupParticle. This means there is now a distinction between the
ModelGroupParticle, which represents a reference to a ModelGroup, and the ModelGroup itself.
The class ModelGroupDefinition (which represents a named model group) no longer
implements Particle; it is now a subclass of ModelGroup.
The class ModelGroupParticle replaces GroupReference; it is no longer necessarily a reference
to a (named) ModelGroupDefinition, but can now be a reference to any (named or unnamed) ModelGroup.
ElementWildcard and AttributeWildcard are no longer subclasses of Wildcard; instead Wildcard
is now a helper class to which these two classes delegate. ElementWildcard is now a subclass
of Particle.
The getTerm()
method of ElementWildcard
returns the Wildcard
object
(previously it returned the ElementWildcard
object itself).
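The relationships described in the preceding paragraphs can be summarized with a simplified class skeleton. This is a sketch for orientation only, not the actual com.saxonica.schema source: the class names, getTerm(), getCompositorName(), getName(), and getTargetNamespace() come from the text above, while the constructors, fields, return types, and the methods getWildcard(), addParticle(), and getParticles() are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration; NOT the actual com.saxonica.schema classes.

// Helper class to which both wildcard classes delegate.
class Wildcard { }

// Now an abstract class rather than an interface.
abstract class Particle {
    abstract Object getTerm();   // return type is an assumption
}

class ElementParticle extends Particle {
    private final String elementName;   // stands in for an element declaration
    ElementParticle(String elementName) { this.elementName = elementName; }
    @Override Object getTerm() { return elementName; }
}

class ElementWildcard extends Particle {
    private final Wildcard wildcard;
    ElementWildcard(Wildcard wildcard) { this.wildcard = wildcard; }
    // Returns the Wildcard helper, not the ElementWildcard itself.
    @Override Wildcard getTerm() { return wildcard; }
}

// Not a Particle and not a subclass of Wildcard; it delegates instead.
class AttributeWildcard {
    private final Wildcard wildcard;
    AttributeWildcard(Wildcard wildcard) { this.wildcard = wildcard; }
    Wildcard getWildcard() { return wildcard; }   // hypothetical accessor
}

// Formerly Compositor; the compositor is now a property of the group.
class ModelGroup {
    private final String compositorName;   // "all", "choice", or "sequence"
    private final List<Particle> particles = new ArrayList<>();
    ModelGroup(String compositorName) { this.compositorName = compositorName; }
    String getCompositorName() { return compositorName; }
    void addParticle(Particle p) { particles.add(p); }    // hypothetical
    List<Particle> getParticles() { return particles; }   // hypothetical
}

// A named model group: a subclass of ModelGroup, no longer a Particle.
class ModelGroupDefinition extends ModelGroup {
    private final String name;
    private final String targetNamespace;
    ModelGroupDefinition(String compositorName, String name, String targetNamespace) {
        super(compositorName);
        this.name = name;
        this.targetNamespace = targetNamespace;
    }
    String getName() { return name; }
    String getTargetNamespace() { return targetNamespace; }
}

// Formerly GroupReference; may refer to a named or unnamed ModelGroup.
class ModelGroupParticle extends Particle {
    private final ModelGroup group;
    ModelGroupParticle(ModelGroup group) { this.group = group; }
    @Override ModelGroup getTerm() { return group; }
}
```

In this sketch a ModelGroupParticle wrapping a ModelGroupDefinition and one wrapping an anonymous ModelGroup are handled identically, which is the point of separating the particle from the group it refers to.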
The use of exceptions SchemaException
and ValidationException
has been made
more consistent. A SchemaException
indicates that the schema is invalid, and should occur only
while the schema is being loaded and validated. A ValidationException
indicates that an instance
document is invalid against the schema, and should occur only during instance validation. Errors relating to the
consistency of a stylesheet or query against a valid schema should result in an XPathException
being thrown.
An inconsistency in the schema found during instance validation is an internal error, and should result in an
IllegalStateException. The exception to this is an unresolved reference to a missing schema component, which
the schema specification defines as not constituting a schema invalidity; this results in an
UnresolvedReferenceException. Because it can occur almost anywhere, UnresolvedReferenceException
is an unchecked exception.