Class DocumentBuilder
- java.lang.Object
-
- net.sf.saxon.s9api.DocumentBuilder
-
public class DocumentBuilder extends java.lang.Object
A document builder holds properties controlling how a Saxon document tree should be built, and provides methods to invoke the tree construction.
This class has no public constructor. To construct a DocumentBuilder, use the factory method Processor.newDocumentBuilder().
All documents used in a single Saxon query, transformation, or validation episode must be built with the same Configuration. However, there is no requirement that they should use the same DocumentBuilder.
Sharing a DocumentBuilder across multiple threads is not recommended. However, in the current implementation, sharing a DocumentBuilder (once initialized) will only cause problems if a SchemaValidator is used.
- Since:
- 9.0
-
-
Constructor Summary
Modifier, Constructor, Description
protected DocumentBuilder(Configuration config)
Create a DocumentBuilder.
-
Method Summary
Modifier and Type, Method, Description
XdmNode build(java.io.File file)
Build a document from a supplied XML file.
XdmNode build(javax.xml.transform.Source source)
Load an XML document, to create a tree representation of the document in memory.
java.net.URI getBaseURI()
Get the base URI of documents loaded using this DocumentBuilder when no other URI is available.
XQueryExecutable getDocumentProjectionQuery()
Get the compiled query to be used for implementing document projection.
SchemaValidator getSchemaValidator()
Get the SchemaValidator used to validate documents loaded using this DocumentBuilder.
TreeModel getTreeModel()
Get the tree model to be used for documents constructed using this DocumentBuilder.
WhitespaceStrippingPolicy getWhitespaceStrippingPolicy()
Get the whitespace stripping policy applied when loading a document using this DocumentBuilder.
boolean isDTDValidation()
Ask whether DTD validation is to be applied to documents loaded using this DocumentBuilder.
boolean isLineNumbering()
Ask whether line and column numbering is enabled for documents loaded using this DocumentBuilder.
BuildingContentHandler newBuildingContentHandler()
Get a ContentHandler that may be used to build the document programmatically.
BuildingStreamWriterImpl newBuildingStreamWriter()
Get an XMLStreamWriter that may be used to build the document programmatically.
void parse(java.io.File file, Destination destination)
Parse a source document from a File, sending it to a supplied Destination.
void parse(javax.xml.transform.Source source, Destination destination)
Parse a source document, sending it to a supplied Destination.
void setBaseURI(java.net.URI uri)
Set the base URI of a document loaded using this DocumentBuilder.
void setDocumentProjectionQuery(XQueryExecutable query)
Set a compiled query to be used for implementing document projection.
void setDTDValidation(boolean option)
Set whether DTD validation should be applied to documents loaded using this DocumentBuilder.
void setLineNumbering(boolean option)
Say whether line and column numbering is to be enabled for documents constructed using this DocumentBuilder.
void setSchemaValidator(SchemaValidator validator)
Set options for schema validation.
void setTreeModel(TreeModel model)
Set the tree model to be used for documents constructed using this DocumentBuilder.
void setWhitespaceStrippingPolicy(WhitespaceStrippingPolicy policy)
Set the whitespace stripping policy applied when loading a document using this DocumentBuilder.
XdmNode wrap(java.lang.Object node)
Create a node by wrapping a recognized external node from a supported object model.
-
-
-
Constructor Detail
-
DocumentBuilder
protected DocumentBuilder(Configuration config)
Create a DocumentBuilder. This is a protected constructor. Users should construct a DocumentBuilder by calling the factory method Processor.newDocumentBuilder().
- Parameters:
config
- the Saxon configuration
-
-
Method Detail
-
setTreeModel
public void setTreeModel(TreeModel model)
Set the tree model to be used for documents constructed using this DocumentBuilder. By default, the TinyTree is used (irrespective of the TreeModel set in the underlying Configuration).
- Parameters:
model
- typically one of the constants TreeModel.TINY_TREE, TreeModel.TINY_TREE_CONDENSED, or TreeModel.LINKED_TREE. It can also be an external object model such as XOMObjectModel.
- Since:
- 9.2
-
getTreeModel
public TreeModel getTreeModel()
Get the tree model to be used for documents constructed using this DocumentBuilder. By default, the TinyTree is used (irrespective of the TreeModel set in the underlying Configuration).
- Returns:
- the tree model in use: typically one of the constants TreeModel.TINY_TREE, TreeModel.TINY_TREE_CONDENSED, or TreeModel.LINKED_TREE. However, in principle a user-defined tree model can be used.
- Since:
- 9.2
-
setLineNumbering
public void setLineNumbering(boolean option)
Say whether line and column numbering is to be enabled for documents constructed using this DocumentBuilder. This has the effect that the line and column number in the original source document is maintained in the constructed tree, for each element node (and only for elements). The line and column number in question are generally the position at which the closing ">" of the element start tag appears.
By default, line and column numbers are not maintained.
Errors relating to document parsing and validation will generally contain line numbers whether or not this option is set, because such errors are detected during document construction.
Line numbering is not available for all kinds of source: for example, it is not available when loading from an existing DOM Document.
The resulting line and column numbers are accessible to applications using the XPath extension functions saxon:line-number() and saxon:column-number() applied to a node, or using the Java methods NodeInfo.getLineNumber() and NodeInfo.getColumnNumber().
Line and column numbers are maintained only for element nodes; the line number returned for any other node will be that of the most recent element. For an element node, the line and column number are generally those of the closing angle bracket at the end of the start tag (this is what a SAX parser notifies).
- Parameters:
option
- true if line numbers are to be maintained, false otherwise.
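The remark that the recorded position is "what a SAX parser notifies" can be observed with the JDK's own SAX parser, without Saxon on the classpath. The sketch below is only an illustration under that assumption: the class name StartTagLocationSketch, the helper elementLines, and the sample document are hypothetical, and it shows the parser's Locator behaviour rather than Saxon's internal code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

class StartTagLocationSketch {

    /** Returns one "name@line" entry per element start tag, in document order. */
    static List<String> elementLines(String xml) throws Exception {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // The parser supplies the locator before any other event.
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // At this point the parser has consumed the whole start tag,
                // so the locator refers to the end of the tag, not its "<".
                out.add(qName + "@" + locator.getLineNumber());
            }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return out;
    }

    public static void main(String[] args) throws Exception {
        // The start tag of <doc> opens on line 1 but closes on line 2.
        String xml = "<doc\n  id=\"d1\">\n  <item/>\n</doc>";
        System.out.println(elementLines(xml));
    }
}
```

Because the start tag of doc spans two lines, the reported line is the one holding the closing ">" rather than the one holding the "<".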
-
isLineNumbering
public boolean isLineNumbering()
Ask whether line and column numbering is enabled for documents loaded using this DocumentBuilder.
By default, line and column numbering is disabled.
Line numbering is not available for all kinds of source: in particular, it is not available when loading from an existing DOM Document.
The resulting line and column numbers are accessible to applications using the extension functions saxon:line-number() and saxon:column-number() applied to a node, or using the Java methods NodeInfo.getLineNumber() and NodeInfo.getColumnNumber().
Line and column numbers are maintained only for element nodes; the line number returned for any other node will be that of the most recent element. For an element node, the line number is generally that of the closing angle bracket at the end of the start tag (this is what a SAX parser notifies).
- Returns:
- true if line numbering is enabled
-
setSchemaValidator
public void setSchemaValidator(SchemaValidator validator)
Set options for schema validation. This determines whether schema validation is applied to an input document and whether type annotations in a supplied document are retained. If no SchemaValidator is supplied, then schema validation does not take place.
This option requires the schema-aware version of the Saxon product (Saxon-EE).
The supplied SchemaValidator is not actually used directly when a document is built using parse(File, Destination) or parse(Source, Destination) (the SchemaValidator.validate(Source) method is never called). Rather, some of the properties of the SchemaValidator are used to control how schema validation is performed by the DocumentBuilder. The particular properties that take effect include:
- The validation mode (strict or lax)
- The required top-level element declaration (see SchemaValidator.setDocumentElementName(QName))
- The required type of the top-level element (see SchemaValidator.setDocumentElementTypeName(QName))
- The option SchemaValidator.isUseXsiSchemaLocation()
- The option SchemaValidator.isExpandAttributeDefaults()
- Validation parameters set using SchemaValidator.setParameter(net.sf.saxon.s9api.QName, net.sf.saxon.s9api.XdmValue)
- The InvalidityHandler
Properties that do NOT have any effect include:
- The option SchemaValidator.isCollectStatistics()
- Parameters:
validator
- the SchemaValidator to be used
-
getSchemaValidator
public SchemaValidator getSchemaValidator()
Get the SchemaValidator used to validate documents loaded using this DocumentBuilder.
- Returns:
- the SchemaValidator if one has been set; otherwise null.
-
setDTDValidation
public void setDTDValidation(boolean option)
Set whether DTD validation should be applied to documents loaded using this DocumentBuilder.
By default, no DTD validation takes place.
- Parameters:
option
- true if DTD validation is to be applied to the document
-
isDTDValidation
public boolean isDTDValidation()
Ask whether DTD validation is to be applied to documents loaded using this DocumentBuilder.
- Returns:
- true if DTD validation is to be applied
-
setWhitespaceStrippingPolicy
public void setWhitespaceStrippingPolicy(WhitespaceStrippingPolicy policy)
Set the whitespace stripping policy applied when loading a document using this DocumentBuilder.
If DTD or schema validation is applied, the only permitted setting is WhitespaceStrippingPolicy.IGNORABLE. Any other value results in an exception from the build(File) method.
- Parameters:
policy
- the policy for stripping whitespace-only text nodes from source documents
-
getWhitespaceStrippingPolicy
public WhitespaceStrippingPolicy getWhitespaceStrippingPolicy()
Get the whitespace stripping policy applied when loading a document using this DocumentBuilder.
- Returns:
- the policy for stripping whitespace-only text nodes
-
setBaseURI
public void setBaseURI(java.net.URI uri)
Set the base URI of a document loaded using this DocumentBuilder.
This is used for resolving any relative URIs appearing within the document, for example in references to DTDs and external entities.
This information is required when the document is loaded from a source that does not provide an intrinsic URI, notably when loading from a Stream or a DOMSource. The value is ignored when loading from a source that does have an intrinsic base URI.
- Parameters:
uri
- the base URI of documents loaded using this DocumentBuilder. This must be an absolute URI.
- Throws:
java.lang.IllegalArgumentException
- if the baseURI supplied is not an absolute URI
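The absolute-URI requirement, and the way a base URI is used to resolve relative references, can be sketched with java.net.URI alone. This is a minimal sketch that mirrors the documented contract, not Saxon's implementation; the class BaseUriSketch, its helper methods, and the example.com addresses are hypothetical.

```java
import java.net.URI;

class BaseUriSketch {

    /** Rejects relative URIs, mirroring the documented IllegalArgumentException. */
    static URI requireAbsolute(URI uri) {
        if (!uri.isAbsolute()) {
            throw new IllegalArgumentException("Base URI must be absolute: " + uri);
        }
        return uri;
    }

    /** Resolves a relative reference (for example a DTD system identifier) against the base. */
    static URI resolveAgainst(URI base, String relative) {
        return requireAbsolute(base).resolve(relative);
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.com/docs/book.xml");
        // The last path segment of the base is replaced by the relative reference.
        System.out.println(resolveAgainst(base, "dtd/book.dtd"));
    }
}
```

A caller that only has a relative path (such as "book.xml") would need to turn it into an absolute URI first, for example with File.toURI(), before passing it on.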
-
getBaseURI
public java.net.URI getBaseURI()
Get the base URI of documents loaded using this DocumentBuilder when no other URI is available.
- Returns:
- the base URI to be used, or null if no value has been set.
-
setDocumentProjectionQuery
public void setDocumentProjectionQuery(XQueryExecutable query)
Set a compiled query to be used for implementing document projection. The effect of using this option is that the tree constructed by the DocumentBuilder contains only those parts of the source document that are needed to answer this query. Running this query against the projected document should give the same results as against the raw document, but the projected document typically occupies significantly less memory. It is permissible to run other queries against the projected document, but unless they are carefully chosen, they will give the wrong answer, because the document being used is different from the original.
The query should be written to use the projected document as its initial context item. For example, if the query is //ITEM[COLOR='blue'], then only ITEM elements and their COLOR children will be retained in the projected document.
This facility is only available in Saxon-EE; if the facility is not available, calling this method has no effect.
- Parameters:
query
- the compiled query used to control document projection
- Since:
- 9.3
-
getDocumentProjectionQuery
public XQueryExecutable getDocumentProjectionQuery()
Get the compiled query to be used for implementing document projection.
- Returns:
- the query set using setDocumentProjectionQuery(net.sf.saxon.s9api.XQueryExecutable) if this has been called, or null otherwise
- Since:
- 9.3. In 9.4 the unused and undocumented first argument is removed.
-
build
public XdmNode build(javax.xml.transform.Source source) throws SaxonApiException
Load an XML document, to create a tree representation of the document in memory.
- Parameters:
source
- A JAXP Source object identifying the source of the document. This can always be a StreamSource or a SAXSource. Some kinds of Source are consumed by this method, and should only be used once.
If a SAXSource is supplied, the XMLReader held within the SAXSource may be modified (by setting features and properties) to reflect the options selected on this DocumentBuilder.
If the source is an instance of NodeInfo then the subtree rooted at this node will be copied (applying schema validation if requested) to create a new tree.
Saxon also accepts an instance of StAXSource or PullSource, which can be used to supply a document that is to be parsed using a StAX parser.
(9.8.0.5) This method now (once again) accepts an instance of AugmentedSource. If an AugmentedSource is supplied, the properties of the AugmentedSource take precedence over any properties set on this DocumentBuilder, which in turn take precedence over properties set at the Processor or Configuration level. The concept of "taking precedence" is explained more fully at ParseOptions.merge(ParseOptions).
- Returns:
- An XdmNode. This will be the document node at the root of the tree of the resulting in-memory document.
- Throws:
java.lang.NullPointerException
- if the source argument is null
java.lang.IllegalArgumentException
- if the kind of source is not recognized
SaxonApiException
- if any other failure occurs building the document, for example a parsing error
-
build
public XdmNode build(java.io.File file) throws SaxonApiException
Build a document from a supplied XML file.
- Parameters:
file
- the supplied file
- Returns:
- the XdmNode representing the root of the document tree
- Throws:
SaxonApiException
- if any failure occurs retrieving or parsing the document
-
newBuildingContentHandler
public BuildingContentHandler newBuildingContentHandler() throws SaxonApiException
Get a ContentHandler that may be used to build the document programmatically.
- Returns:
- a newly constructed BuildingContentHandler, which implements the ContentHandler interface. If schema validation has been requested for this DocumentBuilder, then the document constructed using the ContentHandler will be validated as it is written.
Note that the returned ContentHandler expects namespace scopes to be indicated explicitly by calls to ContentHandler.startPrefixMapping(java.lang.String, java.lang.String) and ContentHandler.endPrefixMapping(java.lang.String).
If the stream of events supplied to the ContentHandler does not constitute a well-formed (and namespace-well-formed) document, the effect is undefined; Saxon may fail to detect the error, and construct an unusable tree.
- Throws:
SaxonApiException
- if any failure occurs
- Since:
- 9.3
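The explicit namespace-scope calls described for newBuildingContentHandler() can be sketched with JDK classes only. In the sketch below the receiving handler is the JDK's identity TransformerHandler writing to a DOM, used purely as a stand-in for Saxon's BuildingContentHandler so that the code runs without Saxon on the classpath. The class name PrefixMappingEventsSketch, the namespace URI, and the element names are hypothetical.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

class PrefixMappingEventsSketch {

    static final String NS = "http://example.com/ns"; // hypothetical namespace

    /** Sends a small, well-formed event stream with explicit namespace scoping. */
    static void emit(ContentHandler h) throws SAXException {
        h.startDocument();
        h.startPrefixMapping("ex", NS);              // open the scope before the element
        h.startElement(NS, "root", "ex:root", new AttributesImpl());
        char[] text = "hello".toCharArray();
        h.characters(text, 0, text.length);
        h.endElement(NS, "root", "ex:root");
        h.endPrefixMapping("ex");                    // close the scope after the element
        h.endDocument();
    }

    /** Feeds the events to a JDK identity handler and returns the resulting root element. */
    static Element buildRoot() throws Exception {
        SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        DOMResult result = new DOMResult();
        handler.setResult(result);
        emit(handler);
        return ((Document) result.getNode()).getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element root = buildRoot();
        System.out.println(root.getNamespaceURI() + " " + root.getLocalName());
    }
}
```

The emit method takes any ContentHandler, so the same event sequence could in principle be sent to the handler returned by newBuildingContentHandler(); how the finished tree is then retrieved is specific to BuildingContentHandler and is not shown here.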
-
newBuildingStreamWriter
public BuildingStreamWriterImpl newBuildingStreamWriter() throws SaxonApiException
Get an XMLStreamWriter that may be used to build the document programmatically.
- Returns:
- a newly constructed BuildingStreamWriter, which implements the XMLStreamWriter interface. If schema validation has been requested for this DocumentBuilder, then the document constructed using the XMLStreamWriter will be validated as it is written.
If the stream of events supplied to the XMLStreamWriter does not constitute a well-formed (and namespace-well-formed) document, the effect is undefined; Saxon may fail to detect the error, and construct an unusable tree.
- Throws:
SaxonApiException
- if any failure occurs
- Since:
- 9.3
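A balanced XMLStreamWriter call sequence of the kind newBuildingStreamWriter() expects might look like the following. To keep the sketch runnable on a bare JDK, the writer is obtained from XMLOutputFactory and serializes to a string; it is a stand-in for Saxon's BuildingStreamWriter, and the class name StreamWriterEventsSketch and the element names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StreamWriterEventsSketch {

    /** Writes a small document; every start element is matched by an end element. */
    static void write(XMLStreamWriter w) throws XMLStreamException {
        w.writeStartDocument();
        w.writeStartElement("order");
        w.writeAttribute("id", "42");   // attributes must follow the start element directly
        w.writeStartElement("item");
        w.writeCharacters("widget");
        w.writeEndElement();            // closes item
        w.writeEndElement();            // closes order
        w.writeEndDocument();
        w.close();
    }

    /** Runs the sequence against a JDK serializing writer and returns the markup. */
    static String serialize() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        write(w);
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(serialize());
    }
}
```

Since write accepts any XMLStreamWriter, the same calls could be directed at the writer returned by newBuildingStreamWriter(). Keeping the start and end calls paired matters there, because the documentation above states that an ill-formed event stream has undefined effect.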
-
wrap
public XdmNode wrap(java.lang.Object node) throws java.lang.IllegalArgumentException
Create a node by wrapping a recognized external node from a supported object model.
If the supplied object implements the NodeInfo interface then it will be wrapped as an XdmNode without copying and without change. The NodeInfo must have been created using a Configuration compatible with the one used by this Processor (specifically, one that uses the same NamePool).
To wrap nodes from other object models, such as DOM, the support module for the external object model must be on the class path and registered with the Saxon configuration. The support modules for DOM, JDOM, DOM4J and XOM are registered automatically if they can be found on the classpath.
It is best to avoid calling this method repeatedly to wrap different nodes in the same document. Each such wrapper conceptually creates a new XDM tree instance with its own identity. Although the memory is shared, operations that rely on node identity might not have the expected result. It is best to create a single wrapper for the document node, and then to navigate to the other nodes in the tree using S9API interfaces.
- Parameters:
node
- the node in the external tree representation. Either an instance of NodeInfo, or an instance of a node in an external object model. Nodes in other object models (such as DOM, JDOM, etc.) are recognized only if the support module for the external object model is known to the Configuration.
- Returns:
- the supplied node wrapped as an XdmNode
- Throws:
java.lang.IllegalArgumentException
- if the type of object supplied is not recognized. This may be because node was created using a different Saxon Processor, or because the required code for the external object model is not on the class path
-
parse
public void parse(javax.xml.transform.Source source, Destination destination) throws SaxonApiException
Parse a source document, sending it to a supplied Destination.
The process is streamed; no tree is constructed in memory.
- Parameters:
source
- The source document to be parsed
destination
- The destination to which the document is to be sent
- Throws:
SaxonApiException
- if parsing fails, or if the destination reports an error
-
parse
public void parse(java.io.File file, Destination destination) throws SaxonApiException
Parse a source document from a File, sending it to a supplied Destination.
The process is streamed; no tree is constructed in memory.
- Parameters:
file
- The file containing the XML source document to be parsed
destination
- The destination to which the document is to be sent
- Throws:
SaxonApiException
- if parsing fails, or if the destination reports an error
-
-