Handling minOccurs and maxOccurs
Prior to release 9.1, Saxon used the validation algorithm described in Thompson and Tobin 2003. This algorithm can be very inefficient when large bounded values of minOccurs and maxOccurs are used in a content model. Indeed, it can be so inefficient that the finite state machine is too large to fit in memory, and an OutOfMemory exception occurs.
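To make the scaling problem concrete, here is a toy Java sketch of a state-per-count machine for a single element repeated between minOccurs and maxOccurs times. It is not Saxon's code and not the Thompson and Tobin construction itself; the class name ExpandedRepetition and the element name item are hypothetical. The point is only that the number of states grows linearly with maxOccurs, and the cost compounds when such particles are combined or nested.

```java
import java.util.List;

public class ExpandedRepetition {
    // One state per possible occurrence count 0..maxOccurs (bounded maxOccurs only).
    final int[] next;           // next[i] = state after one more occurrence, or -1 if none allowed
    final boolean[] accepting;  // accepting[i] is true when i >= minOccurs

    ExpandedRepetition(int minOccurs, int maxOccurs) {
        next = new int[maxOccurs + 1];
        accepting = new boolean[maxOccurs + 1];
        for (int i = 0; i <= maxOccurs; i++) {
            next[i] = (i < maxOccurs) ? i + 1 : -1;
            accepting[i] = i >= minOccurs;
        }
    }

    int stateCount() {
        return next.length;
    }

    // Accepts a list of child names consisting only of the repeated element name.
    boolean matches(List<String> children, String name) {
        int state = 0;
        for (String child : children) {
            if (!child.equals(name) || next[state] < 0) {
                return false;
            }
            state = next[state];
        }
        return accepting[state];
    }

    public static void main(String[] args) {
        ExpandedRepetition fsm = new ExpandedRepetition(2, 5000);
        System.out.println(fsm.stateCount());                              // 5001
        System.out.println(fsm.matches(List.of("item", "item"), "item"));  // true
        System.out.println(fsm.matches(List.of("item"), "item"));          // false
    }
}
```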
Since Saxon 9.1, many common cases of minOccurs and maxOccurs are handled using a finite state machine that makes use of counters at run-time. This eliminates the need to have one state in the machine for each possible number of occurrences of the repeating item. Instead, counters are maintained at run-time and compared against the minOccurs and maxOccurs values.
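A minimal sketch of the counter idea, again hypothetical rather than Saxon's implementation: a content model equivalent to a sequence of item (minOccurs to maxOccurs times) followed by a single footer element is recognized with two states and one integer counter, however large maxOccurs is. The names CountingSequence, item and footer, and the UNBOUNDED constant, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CountingSequence {
    // Hypothetical encoding of maxOccurs="unbounded".
    static final int UNBOUNDED = -1;

    private enum State { IN_ITEMS, DONE }

    private final String item;
    private final String footer;
    private final int minOccurs;
    private final int maxOccurs;

    CountingSequence(String item, int minOccurs, int maxOccurs, String footer) {
        this.item = item;
        this.minOccurs = minOccurs;
        this.maxOccurs = maxOccurs;
        this.footer = footer;
    }

    // Validates a list of child element names against (item{minOccurs,maxOccurs}, footer).
    boolean matches(List<String> children) {
        State state = State.IN_ITEMS;
        int count = 0;  // run-time counter: replaces one state per occurrence count
        for (String child : children) {
            if (state == State.DONE) {
                return false;  // nothing may follow the footer
            }
            boolean roomForMore = maxOccurs == UNBOUNDED || count < maxOccurs;
            if (child.equals(item) && roomForMore) {
                count++;  // stay in the same state, just increment the counter
            } else if (child.equals(footer) && count >= minOccurs) {
                state = State.DONE;  // minOccurs satisfied, move past the repetition
            } else {
                return false;
            }
        }
        return state == State.DONE;
    }

    public static void main(String[] args) {
        CountingSequence model = new CountingSequence("item", 2, 5000, "footer");
        List<String> ok = new ArrayList<>(Collections.nCopies(5000, "item"));
        ok.add("footer");
        List<String> tooMany = new ArrayList<>(Collections.nCopies(5001, "item"));
        tooMany.add("footer");
        System.out.println(model.matches(ok));                          // true
        System.out.println(model.matches(tooMany));                     // false
        System.out.println(model.matches(List.of("item", "footer")));   // false (below minOccurs)
    }
}
```

The design choice mirrors the description above: the occurrence count lives in a variable checked against minOccurs and maxOccurs, so the machine's size no longer depends on those values.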
This technique is used under the following circumstances:
- Either minOccurs > 1, or maxOccurs > 1 (and is not unbounded), or both.
- The minOccurs/maxOccurs values must be defined on an element (xs:element) or wildcard (xs:any) particle.
- If the repeating particle is vulnerable, then it must not be part of a model group that is itself repeatable. A particle is vulnerable if it is part of a choice group, or if it is part of a sequence group in which all the other particles are optional or emptiable, except in the case where minOccurs is equal to maxOccurs. The reason for this restriction is that in such situations there are two nested repetitions, and it is ambiguous whether a new instance of the repeating term should be treated as a repetition at the inner level or at the outer level.
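The three conditions can be read as a predicate over a particle and the model group that directly contains it. The sketch below is hypothetical and is not Saxon's code: the Particle and Group records, the UNBOUNDED constant, and the simple repeatable flag on the parent are assumptions, and it only inspects the immediately enclosing group. It also reads the "minOccurs equal to maxOccurs" exception as making the particle non-vulnerable, which is one interpretation of the rule above.

```java
import java.util.List;

public class CounterEligibility {
    // Hypothetical encoding of maxOccurs="unbounded".
    static final int UNBOUNDED = -1;

    enum Kind { ELEMENT, WILDCARD, MODEL_GROUP }

    enum Compositor { SEQUENCE, CHOICE }

    // Simplified view of a particle; emptiable means its term can match empty content.
    record Particle(Kind kind, int minOccurs, int maxOccurs, boolean emptiable) {
        boolean optionalOrEmptiable() {
            return minOccurs == 0 || emptiable;
        }
    }

    // The model group directly containing the particle; repeatable if it can occur more than once.
    record Group(Compositor compositor, List<Particle> particles, boolean repeatable) {}

    static boolean isVulnerable(Particle p, Group parent) {
        if (p.minOccurs() == p.maxOccurs()) {
            return false;  // assumption: the minOccurs == maxOccurs exception applies here
        }
        if (parent.compositor() == Compositor.CHOICE) {
            return true;
        }
        for (Particle other : parent.particles()) {
            if (other != p && !other.optionalOrEmptiable()) {
                return false;  // a required sibling means the particle is not vulnerable
            }
        }
        return true;  // every other particle in the sequence is optional or emptiable
    }

    static boolean usesCounters(Particle p, Group parent) {
        boolean bounded = p.maxOccurs() != UNBOUNDED;
        boolean countable = p.minOccurs() > 1 || (bounded && p.maxOccurs() > 1);
        boolean elementOrWildcard = p.kind() == Kind.ELEMENT || p.kind() == Kind.WILDCARD;
        boolean ambiguous = isVulnerable(p, parent) && parent.repeatable();
        return countable && elementOrWildcard && !ambiguous;
    }

    public static void main(String[] args) {
        Particle a = new Particle(Kind.ELEMENT, 2, 3, false);
        Particle b = new Particle(Kind.ELEMENT, 1, 1, false);
        // (a{2,3}, b) repeated: b is required, so a is not vulnerable
        System.out.println(usesCounters(a, new Group(Compositor.SEQUENCE, List.of(a, b), true)));  // true
        // (a{2,3}) repeated: a is alone in the sequence, so it is vulnerable
        System.out.println(usesCounters(a, new Group(Compositor.SEQUENCE, List.of(a), true)));     // false
        // a{2,3} inside a repeated choice
        System.out.println(usesCounters(a, new Group(Compositor.CHOICE, List.of(a, b), true)));    // false
    }
}
```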
In cases where counters cannot be used, Saxon will still attempt to compile a finite state machine, but will use configuration-defined limits on minOccurs and maxOccurs to approximate the values requested. If the values used in the schema exceed these limits, Saxon will therefore approximate them by generating a schema that does not strictly enforce the specified minOccurs and maxOccurs. The default limits are 100 for minOccurs and 250 for maxOccurs. Different limits can be set on the command line or via the Java API on the Configuration object. Note, however, that when several nested repeating groups are defined, it is still possible for out-of-memory conditions to occur, even with quite modest values of minOccurs and maxOccurs.
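As a rough, hypothetical illustration of why nesting matters, the sketch below counts how many element occurrences a content model contains once nested bounded repetitions are fully unrolled (the product of the maxOccurs values at each level). With a construction that needs on the order of one state per occurrence, this number gives a feel for the size of the machine. The class name and the measure itself are assumptions for illustration, not a metric Saxon reports.

```java
public class NestedExpansion {
    // Occurrences of the innermost particle after fully unrolling nested bounded repetitions:
    // the product of the maxOccurs values, from innermost to outermost level.
    static long unrolledOccurrences(int... maxOccursPerLevel) {
        long total = 1;
        for (int max : maxOccursPerLevel) {
            total = Math.multiplyExact(total, (long) max);  // throws on overflow instead of wrapping
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(unrolledOccurrences(250));             // 250
        System.out.println(unrolledOccurrences(20, 20, 20));      // 8000
        System.out.println(unrolledOccurrences(50, 50, 50, 50));  // 6250000
    }
}
```

Three levels with maxOccurs="20" already unroll to 8,000 occurrences, which suggests how modest values can still exhaust memory once groups are nested.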