Saxon-SA 8.5 allows an XML document to be saved on disk in a format referred to as a PTree. This is a binary format designed for speed of loading. A document in PTree format takes about the same amount of disk space as the original source XML, but takes about half as long to load into memory. The saving is greater when the document contains type information, because this is retained in the PTree without the need to revalidate.
Two new commands are available, com.saxonica.ptree.PTreeWriter and com.saxonica.ptree.PTreeReader, to convert XML documents into PTrees and vice versa.
A PTree can be supplied as the input to a transformation or query using the class PTreeSource, which implements the JAXP Source interface.
A new command-line option is available on the commands com.saxonica.Transform and com.saxonica.Query. The option -p causes a URIResolver to be used that recognizes the file extension .ptree as representing a Saxon PTree. This option implicitly switches on the -u option, meaning that the source file name is interpreted as a URI. The PTreeURIResolver, as well as recognizing the .ptree file extension, also recognizes query parameters at the end of a URI. In particular it recognizes the parameters validation=strict, validation=lax, and validation=strip, which control how a source document is schema-validated. For example, doc('source.xml?validation=lax') loads a source document with lax validation. This option allows different validation to be applied to different source documents loaded by a single query or transformation.
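To make the URI conventions above concrete, the following is a minimal sketch of how a resolver might inspect a URI for the .ptree extension and a validation query parameter. It is not the Saxon implementation: the class UriQueryParams and its methods are hypothetical names invented for illustration, and the use of & as the parameter separator is an assumption. It uses only java.net.URI from the standard library.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Saxon API.
public class UriQueryParams {

    /** Splits the query part of a URI into name=value pairs (assumes '&' as separator). */
    public static Map<String, String> parse(String uri) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(uri).getQuery();
        if (query == null) {
            return params; // no '?' part present
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    /** Returns "strict", "lax" or "strip" if requested in the URI, otherwise null. */
    public static String validationMode(String uri) {
        String value = parse(uri).get("validation");
        if ("strict".equals(value) || "lax".equals(value) || "strip".equals(value)) {
            return value;
        }
        return null;
    }

    /** True when the path part of the URI (ignoring any query) ends with .ptree. */
    public static boolean isPTree(String uri) {
        String path = URI.create(uri).getPath();
        return path != null && path.endsWith(".ptree");
    }

    public static void main(String[] args) {
        System.out.println(validationMode("source.xml?validation=lax")); // prints "lax"
        System.out.println(isPTree("data/source.ptree"));                // prints "true"
    }
}
```

Because the extension check looks only at the path component, a URI such as source.ptree?validation=strict would still be treated as a PTree in this sketch. Whether Saxon combines the two features in that way is not stated here.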
The result of a query or transformation can be serialized as a PTree by specifying saxon:ptree as the serialization method. From the command line, use the parameter !method={http://saxon.sf.net/}ptree.
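The command-line value is the serialization method name written in the {namespace-uri}local-name form. As a small sketch of that notation, the standard javax.xml.namespace.QName class produces the same string from the namespace URI and local name. The class name PTreeMethodName is hypothetical, and the code only builds the string; it does not invoke Saxon.

```java
import javax.xml.namespace.QName;

// Illustrative only: builds the expanded method name used on the command line.
public class PTreeMethodName {

    // Namespace URI taken from the command-line parameter shown above.
    public static final String SAXON_NS = "http://saxon.sf.net/";

    /** Returns the method name in {namespace-uri}local-name form. */
    public static String clarkName() {
        return new QName(SAXON_NS, "ptree").toString();
    }

    public static void main(String[] args) {
        System.out.println("!method=" + clarkName()); // prints "!method={http://saxon.sf.net/}ptree"
    }
}
```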
The PTree format has been designed so that one Saxon release should normally be able to read PTree files created by an earlier release. It may not always be possible, however, to read PTrees created using a later Saxon release. The PTree is not dependent on any particular NamePool, and can be freely moved between different machines just as source XML can. It is a binary format, so there is no dependency on any particular character encoding or machine architecture. PTree files are not designed to be read or written directly by user applications, nor are they designed to provide an interchange format between Saxon and other products: the internal format is therefore not published.
When a PTree contains type information, the schema that defines those types must also be loaded. This doesn't happen automatically. At present, there is no way of storing a compiled schema on disk, so this will generally involve rebuilding the schema from its source representation. It is the user's responsibility to ensure that the loaded schema is consistent with the schema that was used to validate the original XML document.
For more information see PTree Files.