Handling minOccurs and maxOccurs
Prior to release 9.1, Saxon used the validation algorithm described in Thompson and Tobin 2003. This algorithm can be very inefficient when large bounded values of minOccurs and maxOccurs are used in a content model; in the worst case the finite state machine is too large to fit in memory, and an OutOfMemoryError occurs.
Since Saxon 9.1, many common cases of minOccurs and maxOccurs are handled using a finite state machine that makes use of counters at run-time. This eliminates the need to have one state in the machine for each possible number of occurrences of the repeating item. Instead, counters are maintained at run-time and compared against the minOccurs and maxOccurs values.
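The idea can be illustrated with a minimal sketch. This is not Saxon's implementation: the class CountingMatcher and its methods are hypothetical names introduced here, and the content model is deliberately reduced to a single repeated element. It shows one counter taking the place of the chain of states that an unrolled machine would need.

```java
import java.util.List;

/**
 * Hypothetical illustration (not Saxon code): checks a content model of the
 * form name{minOccurs,maxOccurs} using a single run-time counter.
 */
final class CountingMatcher {
    private final String repeatedName;
    private final int minOccurs;
    private final int maxOccurs;

    CountingMatcher(String repeatedName, int minOccurs, int maxOccurs) {
        this.repeatedName = repeatedName;
        this.minOccurs = minOccurs;
        this.maxOccurs = maxOccurs;
    }

    /** True if every child has the expected name and the count is within bounds. */
    boolean matches(List<String> children) {
        int count = 0; // replaces one state per possible number of occurrences
        for (String child : children) {
            if (!child.equals(repeatedName)) {
                return false; // unexpected element
            }
            count++;
            if (count > maxOccurs) {
                return false; // too many occurrences; fail as soon as it is known
            }
        }
        return count >= minOccurs; // too few occurrences is detected at the end
    }

    /**
     * Rough number of states an unrolled machine needs for the same model:
     * one for each occurrence count from 0 to maxOccurs.
     */
    static long unrolledStates(int maxOccurs) {
        return (long) maxOccurs + 1;
    }
}
```

With maxOccurs="10000", the unrolled form needs on the order of ten thousand states for this one particle, whereas the counting form needs a single state and an integer.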
This technique is used under the following circumstances:
- Either minOccurs > 1, or maxOccurs > 1 (and is not unbounded), or both
- The minOccurs/maxOccurs values must be defined on an element (xs:element) or wildcard (xs:any) particle
- If the repeating particle is vulnerable, then it must not be part of a model group that is itself repeatable. A particle is vulnerable if it is part of a choice group, or if it is part of a sequence group in which all the other particles are optional or emptiable, except in the case where minOccurs is equal to maxOccurs. The reason for this restriction is that in such situations there are two nested repetitions, and it is ambiguous whether a new instance of the repeating term should be treated as a repetition at the inner level or at the outer level.
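The three conditions above can be read as a single predicate. The following is a sketch only: CounterEligibility, GroupKind and the parameter names are hypothetical and do not correspond to Saxon classes. It also assumes that the exception for minOccurs equal to maxOccurs applies to both the choice and the sequence case, which is one possible reading of the rule.

```java
/** Hypothetical illustration (not Saxon code) of the conditions listed above. */
final class CounterEligibility {
    enum GroupKind { SEQUENCE, CHOICE }

    /** Stand-in for maxOccurs="unbounded". */
    static final int UNBOUNDED = Integer.MAX_VALUE;

    /**
     * A particle is vulnerable if it sits in a choice group, or in a sequence
     * group whose other particles are all optional or emptiable.
     * ASSUMPTION: a fixed count (minOccurs == maxOccurs) is never vulnerable.
     */
    static boolean isVulnerable(GroupKind group, boolean otherParticlesAllEmptiable,
                                int minOccurs, int maxOccurs) {
        if (minOccurs == maxOccurs) {
            return false;
        }
        return group == GroupKind.CHOICE || otherParticlesAllEmptiable;
    }

    /** True if all three conditions for using a counter hold. */
    static boolean canUseCounter(boolean isElementOrWildcard, int minOccurs, int maxOccurs,
                                 GroupKind group, boolean otherParticlesAllEmptiable,
                                 boolean groupIsRepeatable) {
        // Condition 1: minOccurs > 1, or a bounded maxOccurs > 1, or both
        boolean needsCounting = minOccurs > 1 || (maxOccurs > 1 && maxOccurs != UNBOUNDED);
        if (!needsCounting || !isElementOrWildcard) {
            return false; // condition 1 or condition 2 fails
        }
        // Condition 3: a vulnerable particle must not be inside a repeatable group
        boolean vulnerable =
                isVulnerable(group, otherParticlesAllEmptiable, minOccurs, maxOccurs);
        return !(vulnerable && groupIsRepeatable);
    }
}
```

For example, an element with minOccurs="2" maxOccurs="5" inside a repeatable xs:choice fails the third condition, because a sixth occurrence could equally well begin a new iteration of the choice. The same element with minOccurs="3" maxOccurs="3" is unambiguous: the fourth occurrence can only start a new outer iteration.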
In cases where counters cannot be used, Saxon will still attempt to compile a finite state machine, but will use configuration-defined limits on minOccurs and maxOccurs to approximate the values requested. If the values used in the schema exceed these limits, Saxon will therefore generate a schema that does not strictly enforce the specified minOccurs and maxOccurs. The default limits are 100 for minOccurs and 250 for maxOccurs. Different limits can be set on the command line or via the Java API on the Configuration object. Note, however, that when several nested repeating groups are defined it is still possible for out-of-memory conditions to occur, even with quite modest values of minOccurs and maxOccurs.
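The following sketch illustrates both points: how limits might be applied, and why nested repetition remains expensive. It does not use the Saxon API, and OccurrenceLimits and its methods are hypothetical names. The approximation rule shown (an oversized minOccurs is lowered to the limit, an oversized maxOccurs is treated as unbounded) is an assumption made for illustration; the text above says only that the resulting schema is less strict than the one requested. The state estimate is likewise a deliberate simplification, not a description of Saxon's machine.

```java
/** Hypothetical illustration (not Saxon code) of occurrence limits and state growth. */
final class OccurrenceLimits {
    /** Stand-in for maxOccurs="unbounded". */
    static final int UNBOUNDED = Integer.MAX_VALUE;

    /** Default limits quoted in the text above. */
    static final int DEFAULT_MIN_LIMIT = 100;
    static final int DEFAULT_MAX_LIMIT = 250;

    /**
     * ASSUMED approximation rule: lower minOccurs to the limit, and treat a
     * maxOccurs above the limit as unbounded. Either change makes the model
     * accept more than the schema author asked for.
     * Returns {effectiveMin, effectiveMax}.
     */
    static int[] approximate(int minOccurs, int maxOccurs, int minLimit, int maxLimit) {
        int effectiveMin = Math.min(minOccurs, minLimit);
        int effectiveMax =
                (maxOccurs != UNBOUNDED && maxOccurs > maxLimit) ? UNBOUNDED : maxOccurs;
        return new int[] { effectiveMin, effectiveMax };
    }

    /**
     * Simplified estimate of the states needed when repetitions are unrolled:
     * each nesting level multiplies the size of the level inside it.
     * Throws ArithmeticException if the product overflows a long.
     */
    static long estimateUnrolledStates(int... nestedMaxOccurs) {
        long states = 1;
        for (int max : nestedMaxOccurs) {
            states = Math.multiplyExact(states, (long) max);
        }
        return states;
    }
}
```

Under this estimate, three nested groups each with maxOccurs="50" already give 50 × 50 × 50 = 125,000 states, even though every individual value is well below the default limit of 250.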