Internal changes
A new method getProperties() has been added to the SequenceIterator interface. This allows the user of the iterator to determine properties of the iterator, such as whether the number of items in the iteration is known. Previously such properties could only be determined by testing the class of the iterator, which in some circumstances is not sufficiently fine-grained: the same class of iterator might sometimes support a property and sometimes not.
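As an illustration of why a properties method is finer-grained than a class test, here is a minimal sketch. It does not reproduce the real SequenceIterator interface: the interface PropertyAwareIterator, the flag LAST_POSITION_KNOWN, and the classes WrappingIterator and PropertiesDemo are all hypothetical names invented for this example, and the use of an int bit mask is an assumption.

```java
import java.util.Collection;
import java.util.Iterator;

// Hypothetical, simplified stand-in for an iterator interface that reports its properties.
interface PropertyAwareIterator<T> {
    // Illustrative flag (an assumption): the number of items is known in advance.
    int LAST_POSITION_KNOWN = 1;

    T next();             // returns null when the iteration is exhausted
    int getProperties();  // bit mask of the flags that apply to this instance
}

// One class, two behaviours: the property depends on what the instance wraps.
final class WrappingIterator<T> implements PropertyAwareIterator<T> {
    private final Iterator<T> source;
    private final int properties;

    WrappingIterator(Iterable<T> items) {
        this.source = items.iterator();
        // Only a Collection can report its size without being consumed.
        this.properties = (items instanceof Collection) ? LAST_POSITION_KNOWN : 0;
    }

    @Override
    public T next() {
        return source.hasNext() ? source.next() : null;
    }

    @Override
    public int getProperties() {
        return properties;
    }
}

final class PropertiesDemo {
    // The caller tests a flag instead of testing the iterator's class.
    static boolean knowsLength(PropertyAwareIterator<?> it) {
        return (it.getProperties() & PropertyAwareIterator.LAST_POSITION_KNOWN) != 0;
    }
}
```

In this sketch a WrappingIterator built over a list reports the flag, while one built over a lazily generated Iterable does not, so an instanceof test on WrappingIterator alone could not distinguish the two cases.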
Following a suggestion from Wolfgang Hoschek, the Configuration now maintains a pool of parsers (XMLReader objects). Because initialization of a parser is typically expensive, this gives a significant improvement in the performance of parsing and tree-building when many documents are parsed under the control of the same Configuration, whether in the same thread or in different threads.
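The general idea of a parser pool can be sketched as follows. This is not the code in Configuration: the class ParserPool and its methods borrow() and giveBack() are hypothetical names, and the choice of a ConcurrentLinkedQueue and of the JDK's SAXParserFactory to create readers are assumptions made only to keep the example self-contained.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical pool of idle XMLReader instances, safe to share between threads.
final class ParserPool {
    private final ConcurrentLinkedQueue<XMLReader> idle = new ConcurrentLinkedQueue<>();

    // Reuse an idle reader if one is available; otherwise pay the creation cost once.
    XMLReader borrow() {
        XMLReader reader = idle.poll();
        return (reader != null) ? reader : createReader();
    }

    // Hand a reader back once parsing has finished, so that a later parse can reuse it.
    void giveBack(XMLReader reader) {
        idle.offer(reader);
    }

    private static XMLReader createReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Cannot create XMLReader", e);
        }
    }
}
```

A caller of such a pool would normally return the reader in a finally block, and would reset any handlers it had set on the reader before doing so, since a reader is only ever used by one thread at a time between borrow() and giveBack().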
I have done some performance measurements to assess the impact of contention on the shared NamePool. These show that synchronization on the allocate() method causes throughput of document construction to fall off when the level of concurrency rises above 10 or so. To eliminate this effect, the class ReceivingContentHandler now maintains a local cache of allocated name codes, so that when the same name appears repeatedly in an input document, the critical allocate() method is only invoked once. This removes the bottleneck. By removing the need to parse the lexical QName, this change also gives a (small) benefit when running a single-threaded transformation or query.
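The caching technique described here can be sketched roughly as below. This is not the actual ReceivingContentHandler or NamePool code: SharedNamePool, LocalNameCache, nameCode() and getAllocateCalls() are hypothetical names, and keying the cache on the namespace URI plus the lexical QName is an assumption about how such a cache might be organized.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a name pool shared by all threads.
final class SharedNamePool {
    private final Map<String, Integer> codes = new HashMap<>();
    private int allocateCalls = 0;

    // The contended method: every caller serializes on this lock.
    synchronized int allocate(String prefix, String uri, String localName) {
        allocateCalls++;
        String key = prefix + '|' + uri + '|' + localName;
        Integer code = codes.get(key);
        if (code == null) {
            code = codes.size();
            codes.put(key, code);
        }
        return code;
    }

    synchronized int getAllocateCalls() {
        return allocateCalls;
    }
}

// Hypothetical per-document cache, used by one thread only, so it needs no locking.
final class LocalNameCache {
    private final SharedNamePool pool;
    private final Map<String, Integer> cache = new HashMap<>();

    LocalNameCache(SharedNamePool pool) {
        this.pool = pool;
    }

    int nameCode(String uri, String qName) {
        String key = '{' + uri + '}' + qName;
        Integer code = cache.get(key);
        if (code == null) {
            // Cache miss: split the lexical QName and take the shared lock, once per distinct name.
            int colon = qName.indexOf(':');
            String prefix = (colon < 0) ? "" : qName.substring(0, colon);
            String local = (colon < 0) ? qName : qName.substring(colon + 1);
            code = pool.allocate(prefix, uri, local);
            cache.put(key, code);
        }
        return code;
    }
}
```

On a cache hit the sketch neither splits the QName nor enters the synchronized method, which corresponds to the two savings mentioned above: less lock contention under concurrency, and slightly less work per name even in a single thread.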
The document map maintained by the NamePool has been removed. This was being used only by the XOM interface, but was maintained for all documents (even temporary trees), with a potential cost in NamePool contention and in garbage collection time. XOM now maintains its own document map.