Saxonica.com

Internal changes

A new method getProperties() has been added to the SequenceIterator interface. This allows the user of the iterator to determine properties of the iterator, such as whether the number of items in the iteration is known. Previously such properties could only be determined by testing the class of the iterator, which in some circumstances is not sufficiently fine-grained: the same class of iterator might sometimes support a property and sometimes not.
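The following minimal sketch illustrates why a per-instance property query is finer-grained than testing the class. The interface PropertyIterator, its constant LAST_POSITION_FINDER, the helper hasProperty() and the class SimpleListIterator are all hypothetical and simplified; they are not the actual SequenceIterator API. The point shown is that two instances of the same class can report different properties.

```java
import java.util.List;

// Hypothetical, simplified stand-in for a sequence iterator (not Saxon's real interface).
interface PropertyIterator<T> {
    // Illustrative property bit: the iterator can report the total item count in advance.
    int LAST_POSITION_FINDER = 1;

    T next();             // returns null when the sequence is exhausted
    int getProperties();  // bit-significant set of properties for THIS instance

    // Test a property on the instance rather than with instanceof on the class.
    static boolean hasProperty(PropertyIterator<?> it, int property) {
        return (it.getProperties() & property) != 0;
    }
}

// One class whose properties depend on how each instance was constructed.
class SimpleListIterator<T> implements PropertyIterator<T> {
    private final List<T> items;
    private final boolean sizeKnown;
    private int pos = 0;

    SimpleListIterator(List<T> items, boolean sizeKnown) {
        this.items = items;
        this.sizeKnown = sizeKnown;
    }

    public T next() {
        return pos < items.size() ? items.get(pos++) : null;
    }

    public int getProperties() {
        return sizeKnown ? LAST_POSITION_FINDER : 0;
    }
}
```

A class test such as `it instanceof SimpleListIterator` would give the same answer for both instances, whereas `hasProperty()` distinguishes them.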

Following a suggestion from Wolfgang Hoschek, the Configuration now maintains a pool of parsers (XMLReader objects). Because initialization of a parser is typically expensive, this gives a significant improvement in the performance of parsing and tree-building when many documents are parsed under the control of the same Configuration, whether in the same thread or in different threads.
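As a rough illustration of the pooling idea, here is a sketch of a thread-safe pool of XMLReader objects built on the standard JAXP and SAX APIs. The class ParserPool and its borrow()/release() methods are hypothetical names chosen for this example; this is not the code inside Configuration.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical parser pool sketching the idea; not Saxon's implementation.
class ParserPool {
    // A do-nothing handler used to drop references to per-document handlers on release.
    private static final DefaultHandler NO_OP = new DefaultHandler();

    private final SAXParserFactory factory;
    private final ConcurrentLinkedQueue<XMLReader> idle = new ConcurrentLinkedQueue<>();

    ParserPool() {
        factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
    }

    // Reuse an idle parser if there is one; otherwise pay the initialization cost once.
    XMLReader borrow() {
        XMLReader reader = idle.poll();
        if (reader != null) {
            return reader;
        }
        // SAXParserFactory is not guaranteed to be thread-safe, so serialize creation.
        synchronized (factory) {
            try {
                return factory.newSAXParser().getXMLReader();
            } catch (ParserConfigurationException | SAXException e) {
                throw new IllegalStateException("cannot create parser", e);
            }
        }
    }

    // Hand a parser back after parsing, detaching the previous document's handlers.
    void release(XMLReader reader) {
        reader.setContentHandler(NO_OP);
        reader.setErrorHandler(NO_OP);
        idle.offer(reader);
    }
}
```

Because the queue is lock-free, threads contend only when a new parser actually has to be created, which is the expensive step the pool is meant to amortize.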

I have done some performance measurements to assess the impact of contention on the shared NamePool. These show that synchronization on the allocate() method causes the throughput of document construction to fall off once concurrency rises above about 10. To eliminate this effect, the class ReceivingContentHandler now maintains a local cache of allocated name codes, so that when the same name appears repeatedly in an input document, the critical allocate() method is invoked only once for that name. This removes the bottleneck. Because it also removes the need to parse the lexical QName on every occurrence, the change gives a small benefit even when running a single-threaded transformation or query.
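The caching technique can be sketched as follows. SharedNamePool and LocalNameCache are hypothetical, simplified classes, not the real NamePool or ReceivingContentHandler: the shared pool's allocate() is synchronized, and a per-document, single-threaded map in front of it means the lock is taken only on the first occurrence of each name. For brevity the cache is keyed on the lexical name alone; a real cache would also need to take the namespace URI into account, since the same prefix can be bound to different URIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shared pool: every allocate() call acquires the lock.
class SharedNamePool {
    private final Map<String, Integer> codes = new HashMap<>();
    private int allocateCalls = 0;  // counts lock acquisitions, for demonstration only

    synchronized int allocate(String qname) {
        allocateCalls++;
        Integer code = codes.get(qname);
        if (code == null) {
            code = codes.size();  // next unused code
            codes.put(qname, code);
        }
        return code;
    }

    synchronized int getAllocateCalls() {
        return allocateCalls;
    }
}

// Hypothetical per-document cache, used by one thread only, so it needs no locking.
class LocalNameCache {
    private final SharedNamePool pool;
    private final Map<String, Integer> local = new HashMap<>();

    LocalNameCache(SharedNamePool pool) {
        this.pool = pool;
    }

    int nameCode(String qname) {
        // The synchronized pool is consulted only on a cache miss.
        return local.computeIfAbsent(qname, pool::allocate);
    }
}
```

With this arrangement, a document containing a thousand `p` elements and a few other names takes the shared lock once per distinct name rather than once per element.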

The document map maintained by the NamePool has been removed. This was being used only by the XOM interface, but was maintained for all documents (even temporary trees), with a potential cost in NamePool contention and in garbage collection time. XOM now maintains its own document map.
