public class DocumentBuilder
extends java.lang.Object
This class has no public constructor. Users should construct a DocumentBuilder by calling the factory method Processor.newDocumentBuilder().
All documents used in a single Saxon query, transformation, or validation episode must be built with the same Configuration. However, there is no requirement that they should use the same DocumentBuilder.
Sharing of a DocumentBuilder across multiple threads is not recommended. However, in the current implementation sharing a DocumentBuilder (once initialized) will only cause problems if a SchemaValidator is used.
Modifier | Constructor and Description |
---|---|
protected | DocumentBuilder(Configuration config): Create a DocumentBuilder. |
Modifier and Type | Method and Description |
---|---|
XdmNode | build(java.io.File file): Build a document from a supplied XML file. |
XdmNode | build(javax.xml.transform.Source source): Load an XML document, to create a tree representation of the document in memory. |
java.net.URI | getBaseURI(): Get the base URI of documents loaded using this DocumentBuilder when no other URI is available. |
XQueryExecutable | getDocumentProjectionQuery(): Get the compiled query to be used for implementing document projection. |
SchemaValidator | getSchemaValidator(): Get the SchemaValidator used to validate documents loaded using this DocumentBuilder. |
TreeModel | getTreeModel(): Get the tree model to be used for documents constructed using this DocumentBuilder. |
WhitespaceStrippingPolicy | getWhitespaceStrippingPolicy(): Get the whitespace stripping policy applied when loading a document using this DocumentBuilder. |
boolean | isDTDValidation(): Ask whether DTD validation is to be applied to documents loaded using this DocumentBuilder. |
boolean | isLineNumbering(): Ask whether line numbering is enabled for documents loaded using this DocumentBuilder. |
BuildingContentHandler | newBuildingContentHandler(): Get a ContentHandler that may be used to build the document programmatically. |
BuildingStreamWriterImpl | newBuildingStreamWriter(): Get an XMLStreamWriter that may be used to build the document programmatically. |
void | setBaseURI(java.net.URI uri): Set the base URI of a document loaded using this DocumentBuilder. |
void | setDocumentProjectionQuery(XQueryExecutable query): Set a compiled query to be used for implementing document projection. |
void | setDTDValidation(boolean option): Set whether DTD validation should be applied to documents loaded using this DocumentBuilder. |
void | setLineNumbering(boolean option): Say whether line numbering is to be enabled for documents constructed using this DocumentBuilder. |
void | setSchemaValidator(SchemaValidator validator): Set the SchemaValidator to be used. |
void | setTreeModel(TreeModel model): Set the tree model to be used for documents constructed using this DocumentBuilder. |
void | setWhitespaceStrippingPolicy(WhitespaceStrippingPolicy policy): Set the whitespace stripping policy applied when loading a document using this DocumentBuilder. |
XdmNode | wrap(java.lang.Object node): Create a node by wrapping a recognized external node from a supported object model. |
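protected DocumentBuilder(Configuration config)
Create a DocumentBuilder. Users should construct a DocumentBuilder by calling the factory method Processor.newDocumentBuilder().
Parameters:
config - the Saxon configuration
public void setTreeModel(TreeModel model)
Set the tree model to be used for documents constructed using this DocumentBuilder.
Parameters:
model - typically one of the constants TreeModel.TINY_TREE, TreeModel.TINY_TREE_CONDENSED, or TreeModel.LINKED_TREE. It can also be an external object model such as XOMObjectModel.
public TreeModel getTreeModel()
Get the tree model to be used for documents constructed using this DocumentBuilder.
Returns:
the tree model in use, typically one of the constants TreeModel.TINY_TREE, TreeModel.TINY_TREE_CONDENSED, or TreeModel.LINKED_TREE. However, in principle a user-defined tree model can be used.
public void setLineNumbering(boolean option)
Say whether line numbering is to be enabled for documents constructed using this DocumentBuilder.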
By default, line numbers are not maintained.
Errors relating to document parsing and validation will generally contain line numbers whether or not this option is set, because such errors are detected during document construction.
Line numbering is not available for all kinds of source: for example, it is not available when loading from an existing DOM Document.
The resulting line numbers are accessible to applications using the XPath extension function saxon:line-number() applied to a node, or using the Java method NodeInfo.getLineNumber().
Line numbers are maintained only for element nodes; the line number returned for any other node will be that of the most recent element. For an element node, the line number is generally that of the closing angle bracket at the end of the start tag (this is what a SAX parser notifies).
Parameters:
option - true if line numbers are to be maintained, false otherwise.
public boolean isLineNumbering()
Ask whether line numbering is enabled for documents loaded using this DocumentBuilder.
By default, line numbering is disabled.
Line numbering is not available for all kinds of source: in particular, it is not available when loading from an existing DOM Document.
The resulting line numbers are accessible to applications using the extension function saxon:line-number() applied to a node, or using the Java method NodeInfo.getLineNumber().
Line numbers are maintained only for element nodes; the line number returned for any other node will be that of the most recent element. For an element node, the line number is generally that of the closing angle bracket at the end of the start tag (this is what a SAX parser notifies).
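The remark that a SAX parser notifies the line on which a start tag ends can be observed with the JDK's own SAX parser, independently of Saxon. The following is a minimal sketch under that assumption: the class StartTagLineDemo and its method are hypothetical names introduced for illustration, and the code shows only the parser-level behaviour that line numbering relies on, not Saxon's own implementation.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Saxon: records the line number that the
// SAX Locator reports when each element's startElement event is delivered.
final class StartTagLineDemo {

    static Map<String, Integer> startTagLines(String xml) {
        Map<String, Integer> lines = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Keep only the first occurrence of each element name.
                lines.putIfAbsent(qName, locator.getLineNumber());
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // The start tag of <item> begins on line 2 and its closing bracket is on line 3.
        String xml = "<root>\n  <item\n      id=\"1\">text</item>\n</root>";
        System.out.println(startTagLines(xml));
    }
}
```

Comparing the number reported for item with the line on which its start tag opens shows why a line number may differ from the one a reader would expect for a multi-line start tag.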
public void setSchemaValidator(SchemaValidator validator)
Set the SchemaValidator to be used.
This option requires the schema-aware version of the Saxon product (Saxon-EE).
Since a SchemaValidator is serially reusable but not thread-safe, using this method is not appropriate when the DocumentBuilder is shared between threads.
Parameters:
validator - the SchemaValidator to be used
public SchemaValidator getSchemaValidator()
Get the SchemaValidator used to validate documents loaded using this DocumentBuilder.
public void setDTDValidation(boolean option)
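Set whether DTD validation should be applied to documents loaded using this DocumentBuilder.
By default, no DTD validation takes place.
Parameters:
option - true if DTD validation is to be applied to the document
public boolean isDTDValidation()
Ask whether DTD validation is to be applied to documents loaded using this DocumentBuilder.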
public void setWhitespaceStrippingPolicy(WhitespaceStrippingPolicy policy)
Set the whitespace stripping policy applied when loading a document using this DocumentBuilder.
(New rule in 9.8:) If DTD or schema validation is applied, the only permitted setting is WhitespaceStrippingPolicy.IGNORABLE. Any other value results in an exception from the build(File) method.
Parameters:
policy - the policy for stripping whitespace-only text nodes from source documents
public WhitespaceStrippingPolicy getWhitespaceStrippingPolicy()
Get the whitespace stripping policy applied when loading a document using this DocumentBuilder.
public void setBaseURI(java.net.URI uri)
Set the base URI of a document loaded using this DocumentBuilder.
This is used for resolving any relative URIs appearing within the document, for example in references to DTDs and external entities.
This information is required when the document is loaded from a source that does not provide an intrinsic URI, notably when loading from a Stream or a DOMSource. The value is ignored when loading from a source that does have an intrinsic base URI.
Parameters:
uri - the base URI of documents loaded using this DocumentBuilder. This must be an absolute URI.
Throws:
java.lang.IllegalArgumentException - if the baseURI supplied is not an absolute URI
public java.net.URI getBaseURI()
Get the base URI of documents loaded using this DocumentBuilder when no other URI is available.
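The absolute-URI requirement on setBaseURI, and the way a base URI is used to resolve relative references, can be sketched with java.net.URI alone. This is an illustrative sketch only: BaseUriCheck and requireAbsolute are hypothetical names, the example.com URIs are placeholders, and the check is one a caller might perform before calling setBaseURI rather than a description of how Saxon validates the argument.

```java
import java.net.URI;

// Hypothetical helper, not part of Saxon: mirrors the documented rule that the
// base URI must be absolute, and shows how a relative reference is resolved.
final class BaseUriCheck {

    // Returns the URI unchanged if it is absolute; otherwise throws the same
    // exception type that setBaseURI is documented to throw.
    static URI requireAbsolute(URI candidate) {
        if (!candidate.isAbsolute()) {
            throw new IllegalArgumentException("Base URI is not absolute: " + candidate);
        }
        return candidate;
    }

    public static void main(String[] args) {
        URI base = requireAbsolute(URI.create("http://example.com/docs/input.xml"));
        // A relative reference, such as a DTD system identifier, resolves against the base.
        System.out.println(base.resolve("dtd/catalog.dtd"));
        // prints http://example.com/docs/dtd/catalog.dtd
    }
}
```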
public void setDocumentProjectionQuery(XQueryExecutable query)
Set a compiled query to be used for implementing document projection.
The query should be written to use the projected document as its initial context item. For example, if the query is //ITEM[COLOR='blue'], then only ITEM elements and their COLOR children will be retained in the projected document.
This facility is only available in Saxon-EE; if the facility is not available, calling this method has no effect.
Parameters:
query - the compiled query used to control document projection
public XQueryExecutable getDocumentProjectionQuery()
Get the compiled query to be used for implementing document projection.
Returns:
the query set using setDocumentProjectionQuery(net.sf.saxon.s9api.XQueryExecutable) if this has been called, or null otherwise
public XdmNode build(javax.xml.transform.Source source) throws SaxonApiException
Load an XML document, to create a tree representation of the document in memory.
Parameters:
source - A JAXP Source object identifying the source of the document. This can always be a StreamSource or a SAXSource.
Some kinds of Source are consumed by this method, and should only be used once.
If a SAXSource is supplied, the XMLReader held within the SAXSource may be modified (by setting features and properties) to reflect the options selected on this DocumentBuilder.
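Because some kinds of Source are consumed when they are read, one way to avoid accidental reuse is to create a new Source for every call to build. The sketch below uses only JAXP classes from the JDK; FreshSources and forString are hypothetical names, and the system ID is a placeholder chosen so that the source carries a URI of its own.

```java
import java.io.StringReader;
import java.util.function.Supplier;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of Saxon: hands out a new StreamSource for each
// build, because a StreamSource backed by a Reader can only be read once.
final class FreshSources {

    static Supplier<Source> forString(String xml, String systemId) {
        return () -> {
            StreamSource source = new StreamSource(new StringReader(xml));
            // A Reader has no URI of its own, so the system ID is set explicitly.
            source.setSystemId(systemId);
            return source;
        };
    }

    public static void main(String[] args) {
        Supplier<Source> sources = forString("<doc/>", "http://example.com/doc.xml");
        Source first = sources.get();   // pass to one build(Source) call
        Source second = sources.get();  // pass to a later call; 'first' is never reused
        System.out.println(first != second);
    }
}
```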
If the source is an instance of NodeInfo then the subtree rooted at this node will be copied (applying schema validation if requested) to create a new tree.
Saxon also accepts an instance of StAXSource or PullSource, which can be used to supply a document that is to be parsed using a StAX parser.
(9.8.0.5) This method now (once again) accepts an instance of AugmentedSource. If an AugmentedSource is supplied, the properties of the AugmentedSource take precedence over any properties set on this DocumentBuilder, which in turn take precedence over properties set at the Processor or Configuration level. The concept of "taking precedence" is explained more fully at ParseOptions.merge(ParseOptions).
Returns:
an XdmNode. This will be the document node at the root of the tree of the resulting in-memory document.
Throws:
java.lang.NullPointerException - if the source argument is null
java.lang.IllegalArgumentException - if the kind of source is not recognized
SaxonApiException - if any other failure occurs building the document, for example a parsing error
public XdmNode build(java.io.File file) throws SaxonApiException
Build a document from a supplied XML file.
Parameters:
file - the supplied file
Throws:
SaxonApiException - if any failure occurs retrieving or parsing the document
public BuildingContentHandler newBuildingContentHandler() throws SaxonApiException
Get a ContentHandler that may be used to build the document programmatically.
Returns:
a BuildingContentHandler, which implements the ContentHandler interface. If schema validation has been requested for this DocumentBuilder, then the document constructed using the ContentHandler will be validated as it is written.
Note that the returned ContentHandler expects namespace scopes to be indicated explicitly by calls to ContentHandler.startPrefixMapping(java.lang.String, java.lang.String) and ContentHandler.endPrefixMapping(java.lang.String).
If the stream of events supplied to the ContentHandler does not constitute a well-formed (and namespace-well-formed) document, the effect is undefined; Saxon may fail to detect the error, and construct an unusable tree.
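The requirement for explicit namespace scopes can be illustrated by the order of SAX events a caller would send. The sketch below is an assumption-laden illustration: PrefixMappingEvents, writeSample and recordSample are hypothetical names, the namespace URI is a placeholder, and a simple recording handler stands in for the BuildingContentHandler so that the code runs with the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Saxon: emits a small namespaced document
// with explicit, properly nested prefix-mapping events.
final class PrefixMappingEvents {

    static void writeSample(ContentHandler handler) throws SAXException {
        String ns = "http://example.com/ns";
        handler.startDocument();
        handler.startPrefixMapping("ex", ns);   // open the namespace scope first
        handler.startElement(ns, "item", "ex:item", new AttributesImpl());
        char[] text = "blue".toCharArray();
        handler.characters(text, 0, text.length);
        handler.endElement(ns, "item", "ex:item");
        handler.endPrefixMapping("ex");         // close the scope after the element ends
        handler.endDocument();
    }

    // Runs writeSample against a recording handler that stands in for the
    // BuildingContentHandler, and returns the events it received in order.
    static List<String> recordSample() {
        List<String> events = new ArrayList<>();
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                events.add("startPrefixMapping " + prefix);
            }

            @Override
            public void endPrefixMapping(String prefix) {
                events.add("endPrefixMapping " + prefix);
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("startElement " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement " + qName);
            }
        };
        try {
            writeSample(recorder);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(recordSample());
    }
}
```

Since writeSample accepts any ContentHandler, the handler returned by newBuildingContentHandler() could be passed to it in place of the recorder when Saxon is on the class path.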
Throws:
SaxonApiException - if any failure occurs
public BuildingStreamWriterImpl newBuildingStreamWriter() throws SaxonApiException
Get an XMLStreamWriter that may be used to build the document programmatically.
Returns:
a BuildingStreamWriter, which implements the XMLStreamWriter interface. If schema validation has been requested for this DocumentBuilder, then the document constructed using the XMLStreamWriter will be validated as it is written.
If the stream of events supplied to the XMLStreamWriter does not constitute a well-formed (and namespace-well-formed) document, the effect is undefined; Saxon may fail to detect the error, and construct an unusable tree.
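A well-formed sequence of XMLStreamWriter calls pairs every start element with an end element and brackets the whole with start and end document events. The sketch below shows such a sequence under stated assumptions: StreamWriterEvents, writeSample and serializeSample are hypothetical names, and the JDK's serializing XMLStreamWriter stands in for the writer returned by newBuildingStreamWriter() so that the example runs without Saxon.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, not part of Saxon: writes a small, balanced document
// through the standard XMLStreamWriter interface.
final class StreamWriterEvents {

    static void writeSample(XMLStreamWriter writer) throws XMLStreamException {
        writer.writeStartDocument();
        writer.writeStartElement("item");
        writer.writeAttribute("id", "1");
        writer.writeStartElement("color");
        writer.writeCharacters("blue");
        writer.writeEndElement();   // closes color
        writer.writeEndElement();   // closes item
        writer.writeEndDocument();
    }

    // Drives writeSample with the JDK's serializing writer as a stand-in and
    // returns the serialized text.
    static String serializeSample() {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writeSample(writer);
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(serializeSample());
    }
}
```

Because writeSample depends only on the XMLStreamWriter interface, the same calls could be directed at the writer obtained from a DocumentBuilder.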
Throws:
SaxonApiException - if any failure occurs
public XdmNode wrap(java.lang.Object node) throws java.lang.IllegalArgumentException
Create a node by wrapping a recognized external node from a supported object model.
If the supplied object implements the NodeInfo interface then it will be wrapped as an XdmNode without copying and without change. The NodeInfo must have been created using a Configuration compatible with the one used by this Processor (specifically, one that uses the same NamePool).
To wrap nodes from other object models, such as DOM, the support module for the external object model must be on the class path and registered with the Saxon configuration. The support modules for DOM, JDOM, DOM4J and XOM are registered automatically if they can be found on the classpath.
It is best to avoid calling this method repeatedly to wrap different nodes in the same document. Each such wrapper conceptually creates a new XDM tree instance with its own identity. Although the memory is shared, operations that rely on node identity might not have the expected result. It is best to create a single wrapper for the document node, and then to navigate to the other nodes in the tree using S9API interfaces.
Parameters:
node - the node in the external tree representation. Either an instance of NodeInfo, or an instance of a node in an external object model. Nodes in other object models (such as DOM, JDOM, etc) are recognized only if the support module for the external object model is known to the Configuration.
Throws:
java.lang.IllegalArgumentException - if the type of object supplied is not recognized. This may be because node was created using a different Saxon Processor, or because the required code for the external object model is not on the class path
Copyright (c) 2004-2018 Saxonica Limited. All rights reserved.