Handling minOccurs and maxOccurs
Prior to release 9.1, Saxon used the validation algorithm described in
Thompson and Tobin 2003.
This algorithm can be very inefficient when large bounded values of minOccurs and maxOccurs are used
in a content model; indeed, the finite state machine can become too large to fit in
memory, causing an OutOfMemory exception.
From Saxon 9.1, many common cases of minOccurs and maxOccurs are handled using a finite state machine
that makes use of counters at run-time. This eliminates the need to have one state in the machine for each
possible number of occurrences of the repeating item. Instead, counters are maintained at run-time and
compared against the minOccurs and maxOccurs values.
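The idea of replacing per-occurrence states with a run-time counter can be sketched as follows. This is a minimal illustration, not Saxon's implementation: the class OccurrenceCounter and its methods accept and mayEnd are hypothetical names, and the sketch covers only a single repeating particle with a bounded maxOccurs.

```java
/**
 * Illustrative only: tracks one repeating particle with a counter
 * instead of one machine state per possible number of occurrences.
 * The class and method names are hypothetical, not part of Saxon.
 */
final class OccurrenceCounter {
    private final int minOccurs;
    private final int maxOccurs;
    private int count = 0;

    OccurrenceCounter(int minOccurs, int maxOccurs) {
        if (minOccurs < 0 || maxOccurs < minOccurs) {
            throw new IllegalArgumentException("require 0 <= minOccurs <= maxOccurs");
        }
        this.minOccurs = minOccurs;
        this.maxOccurs = maxOccurs;
    }

    /** Records one more occurrence; returns false if maxOccurs is already reached. */
    boolean accept() {
        if (count >= maxOccurs) {
            return false;
        }
        count++;
        return true;
    }

    /** True if enough occurrences have been seen for the repetition to end here. */
    boolean mayEnd() {
        return count >= minOccurs;
    }

    public static void main(String[] args) {
        // Memory use is the same whether maxOccurs is 3 or 100000.
        OccurrenceCounter c = new OccurrenceCounter(2, 100000);
        c.accept();
        System.out.println(c.mayEnd()); // one occurrence, minOccurs is 2
        c.accept();
        System.out.println(c.mayEnd()); // two occurrences
    }
}
```

The storage needed is a single integer per active repetition, regardless of how large minOccurs and maxOccurs are.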
This technique is used under the following circumstances:
Either minOccurs > 1, or maxOccurs > 1 (and is not unbounded), or both
The minOccurs/maxOccurs values must be defined on an element (xs:element) or wildcard (xs:any) particle
If the repeating particle is vulnerable, then it must not be part of a model group that is itself
repeatable. A particle is vulnerable if it is part of a choice group, or if it is part of a sequence group in which all
the other particles are optional or emptiable, except in the case where minOccurs is equal to maxOccurs. The reason
for this restriction is that in such situations there are two nested repetitions, and it is ambiguous whether a new instance
of the repeating term should be treated as a repetition at the inner level or at the outer level.
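The three conditions above can be read as a single predicate. The following sketch is one interpretation under stated assumptions, not Saxon's code: all names (CounterEligibility, TermKind, GroupKind, canUseCounter, isVulnerable) are hypothetical, all three conditions are assumed to be required together, and the exception for minOccurs equal to maxOccurs is assumed to apply to both the choice and the sequence case.

```java
/** Illustrative only: hypothetical names, one reading of the conditions above. */
final class CounterEligibility {
    static final int UNBOUNDED = -1; // stand-in for maxOccurs="unbounded"

    enum TermKind { ELEMENT, WILDCARD, MODEL_GROUP }
    enum GroupKind { SEQUENCE, CHOICE }

    /** Vulnerable: in a choice, or in a sequence whose other particles are all emptiable. */
    static boolean isVulnerable(GroupKind parent, boolean othersAllEmptiable,
                                int minOccurs, int maxOccurs) {
        if (minOccurs == maxOccurs) {
            return false; // assumption: the exception covers both group kinds
        }
        return parent == GroupKind.CHOICE
                || (parent == GroupKind.SEQUENCE && othersAllEmptiable);
    }

    /** True if all three conditions for using a counter hold. */
    static boolean canUseCounter(TermKind term, int minOccurs, int maxOccurs,
                                 GroupKind parent, boolean othersAllEmptiable,
                                 boolean parentRepeatable) {
        boolean counted = minOccurs > 1 || (maxOccurs != UNBOUNDED && maxOccurs > 1);
        boolean simpleTerm = term == TermKind.ELEMENT || term == TermKind.WILDCARD;
        boolean ambiguous = parentRepeatable
                && isVulnerable(parent, othersAllEmptiable, minOccurs, maxOccurs);
        return counted && simpleTerm && !ambiguous;
    }

    public static void main(String[] args) {
        // An element with minOccurs=2, maxOccurs=500 in a non-repeating sequence.
        System.out.println(canUseCounter(TermKind.ELEMENT, 2, 500,
                GroupKind.SEQUENCE, false, false));
        // The same particle inside a repeatable choice group.
        System.out.println(canUseCounter(TermKind.ELEMENT, 2, 500,
                GroupKind.CHOICE, false, true));
    }
}
```

Under this reading the first call reports that a counter can be used and the second reports that it cannot, because the particle is vulnerable and its enclosing group repeats.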
In cases where counters cannot be used, Saxon will still attempt to compile a finite state machine, but will use
configuration-defined limits on minOccurs and maxOccurs to approximate the values requested. If the values used in the schema
exceed these limits, Saxon will therefore approximate by generating a schema that does not strictly enforce the specified
minOccurs and maxOccurs.
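To make the effect of such limits concrete, here is a small sketch of one way the approximation could work. The text does not spell out the exact rule, so the behaviour shown is an assumption: a minOccurs above its limit is reduced to the limit, and a maxOccurs above its limit is treated as unbounded. The class OccurrenceLimits and its methods are hypothetical and are not Saxon API; the limits 100 and 250 are the defaults quoted in this section.

```java
/** Illustrative only: a hypothetical relaxation rule, not Saxon's implementation. */
final class OccurrenceLimits {
    static final int UNBOUNDED = -1; // stand-in for maxOccurs="unbounded"

    private final int minLimit;
    private final int maxLimit;

    OccurrenceLimits(int minLimit, int maxLimit) {
        this.minLimit = minLimit;
        this.maxLimit = maxLimit;
    }

    /** Assumption: a minOccurs above the limit is reduced to the limit. */
    int effectiveMin(int minOccurs) {
        return Math.min(minOccurs, minLimit);
    }

    /** Assumption: a maxOccurs above the limit is treated as unbounded. */
    int effectiveMax(int maxOccurs) {
        if (maxOccurs == UNBOUNDED || maxOccurs > maxLimit) {
            return UNBOUNDED;
        }
        return maxOccurs;
    }

    public static void main(String[] args) {
        OccurrenceLimits limits = new OccurrenceLimits(100, 250);
        // A particle declared with minOccurs=500, maxOccurs=1000.
        System.out.println(limits.effectiveMin(500));
        System.out.println(limits.effectiveMax(1000) == UNBOUNDED);
    }
}
```

Both adjustments make the content model more permissive than the declared one, which matches the statement that the specified values are not strictly enforced.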
The default limits are 100 for minOccurs and 250 for maxOccurs. Different limits can be set on the command line or via the
Java API on the Configuration object. Note, however, that when several nested repeating groups are
defined it is still possible for out-of-memory conditions to occur, even with quite modest values of minOccurs