saxon:parse-html
Parses HTML supplied as a string.
parse-html($html as xs:string) ➔ document-node()
Arguments
| $html | xs:string | The HTML content as a string |
Result
| document-node() |
Namespace
http://saxon.sf.net/
Notes on the Saxon implementation
Available since Saxon 9.2.
Details
This function takes a single argument, a string containing the source text of an HTML document. It returns the document node (root node) that results from parsing this text using the TagSoup parser.
On the Java platform, the TagSoup jar file must be on the classpath. It may be downloaded from https://mvnrepository.com/artifact/org.ccil.cowan.tagsoup/tagsoup/1.2.
On the .NET platform, the code of TagSoup 1.2 is available automatically: it has been compiled into the saxon-pe-10.#.dll and saxon-ee-10.#.dll assemblies.
This function is useful where an HTML document is embedded inside another document as character data, for example in a CDATA section. It can also be used in conjunction with the unparsed-text() function to read HTML from filestore. Note that the base URI of the document is not retained in this case.
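As an illustration of the embedded-HTML case, the following stylesheet is a minimal sketch rather than a definitive recipe. It assumes a hypothetical input document in which a description element holds HTML inside a CDATA section, and it assumes that TagSoup wraps the parsed content in html and body elements (typically in the XHTML namespace, hence the *: wildcard). The element name description is invented for the example.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Copy everything that is not matched by a more specific rule -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Hypothetical input element:
       <description><![CDATA[<p>Some <b>bold</b> text<br>second line]]></description> -->
  <xsl:template match="description">
    <description>
      <!-- Parse the string value as HTML, then copy the children of the
           body element that the parser is assumed to supply -->
      <xsl:copy-of select="saxon:parse-html(string(.))//*:body/node()"/>
    </description>
  </xsl:template>

</xsl:stylesheet>
```

For the filestore case, the equivalent call would be along the lines of saxon:parse-html(unparsed-text('page.html')), where page.html is a placeholder file name resolved against the static base URI of the stylesheet.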