Package org.jsoup.parser
Class Parser
java.lang.Object
    org.jsoup.parser.Parser

public class Parser extends java.lang.Object
-
Field Summary
Modifier and Type, Field
private ParseErrorList errors
private ParseSettings settings
private TreeBuilder treeBuilder
-
Constructor Summary
Constructor and Description
Parser(TreeBuilder treeBuilder)
    Create a new Parser, using the specified TreeBuilder.
-
Method Summary
Modifier and Type, Method, and Description
ParseErrorList getErrors()
    Retrieve the parse errors, if any, from the last parse.
TreeBuilder getTreeBuilder()
    Get the TreeBuilder currently in use.
static Parser htmlParser()
    Create a new HTML parser.
boolean isTrackErrors()
    Check if parse error tracking is enabled.
static Document parse(java.lang.String html, java.lang.String baseUri)
    Parse HTML into a Document.
static Document parseBodyFragment(java.lang.String bodyHtml, java.lang.String baseUri)
    Parse a fragment of HTML into the body of a Document.
static Document parseBodyFragmentRelaxed(java.lang.String bodyHtml, java.lang.String baseUri)
    Deprecated.
static java.util.List<Node> parseFragment(java.lang.String fragmentHtml, Element context, java.lang.String baseUri)
    Parse a fragment of HTML into a list of nodes.
static java.util.List<Node> parseFragment(java.lang.String fragmentHtml, Element context, java.lang.String baseUri, ParseErrorList errorList)
    Parse a fragment of HTML into a list of nodes.
java.util.List<Node> parseFragmentInput(java.lang.String fragment, Element context, java.lang.String baseUri)
Document parseInput(java.io.Reader inputHtml, java.lang.String baseUri)
Document parseInput(java.lang.String html, java.lang.String baseUri)
static java.util.List<Node> parseXmlFragment(java.lang.String fragmentXml, java.lang.String baseUri)
    Parse a fragment of XML into a list of nodes.
ParseSettings settings()
Parser settings(ParseSettings settings)
Parser setTrackErrors(int maxErrors)
    Enable or disable parse error tracking for the next parse.
Parser setTreeBuilder(TreeBuilder treeBuilder)
    Update the TreeBuilder used when parsing content.
static java.lang.String unescapeEntities(java.lang.String string, boolean inAttribute)
    Utility method to unescape HTML entities from a string.
static Parser xmlParser()
    Create a new XML parser.
-
-
-
Field Detail
-
treeBuilder
private TreeBuilder treeBuilder
-
errors
private ParseErrorList errors
-
settings
private ParseSettings settings
-
-
Constructor Detail
-
Parser
public Parser(TreeBuilder treeBuilder)
Create a new Parser, using the specified TreeBuilder.
Parameters:
treeBuilder - TreeBuilder to use to parse input into Documents.
-
-
Method Detail
-
parseInput
public Document parseInput(java.lang.String html, java.lang.String baseUri)
-
parseInput
public Document parseInput(java.io.Reader inputHtml, java.lang.String baseUri)
-
parseFragmentInput
public java.util.List<Node> parseFragmentInput(java.lang.String fragment, Element context, java.lang.String baseUri)
-
getTreeBuilder
public TreeBuilder getTreeBuilder()
Get the TreeBuilder currently in use.
Returns:
current TreeBuilder.
-
setTreeBuilder
public Parser setTreeBuilder(TreeBuilder treeBuilder)
Update the TreeBuilder used when parsing content.
Parameters:
treeBuilder - the TreeBuilder to use from now on
Returns:
this, for chaining
-
isTrackErrors
public boolean isTrackErrors()
Check if parse error tracking is enabled.
Returns:
current track error state.
-
setTrackErrors
public Parser setTrackErrors(int maxErrors)
Enable or disable parse error tracking for the next parse.
Parameters:
maxErrors - the maximum number of errors to track. Set to 0 to disable.
Returns:
this, for chaining
-
getErrors
public ParseErrorList getErrors()
Retrieve the parse errors, if any, from the last parse.
Returns:
list of parse errors, up to the size of the maximum errors tracked.
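The error-tracking methods can be combined roughly as in the sketch below. It assumes jsoup is on the classpath; the class name ErrorTrackingExample, the helper names, and the sample markup are made up for illustration, and the exact error messages and counts depend on the jsoup version.

```java
import java.util.List;
import org.jsoup.parser.ParseError;
import org.jsoup.parser.Parser;

// Hypothetical example class; only the Parser and ParseError calls come from jsoup.
final class ErrorTrackingExample {
    // Deliberately malformed: <b> is still open at </p>, and </span> has no opening tag.
    static final String BAD_HTML = "<p>Unclosed <b>bold</p><div></span></div>";

    // Parse the sample with tracking capped at maxErrors (0 disables tracking).
    static List<ParseError> collectErrors(int maxErrors) {
        Parser parser = Parser.htmlParser().setTrackErrors(maxErrors);
        parser.parseInput(BAD_HTML, "");
        return parser.getErrors();
    }

    static int errorCount(int maxErrors) {
        return collectErrors(maxErrors).size();
    }

    static boolean trackingEnabled(int maxErrors) {
        return Parser.htmlParser().setTrackErrors(maxErrors).isTrackErrors();
    }

    public static void main(String[] args) {
        for (ParseError error : collectErrors(10)) {
            System.out.println(error.getPosition() + ": " + error.getErrorMessage());
        }
    }
}
```

Because setTrackErrors returns the parser itself, it chains directly onto htmlParser(). The returned list never grows beyond maxErrors.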
-
settings
public Parser settings(ParseSettings settings)
-
settings
public ParseSettings settings()
-
parse
public static Document parse(java.lang.String html, java.lang.String baseUri)
Parse HTML into a Document.
Parameters:
html - HTML to parse
baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
Returns:
parsed Document
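As a minimal sketch of how the baseUri argument affects relative links, assuming jsoup is on the classpath (the class name ParseExample, the sample markup, and the example.com URL are illustrative only):

```java
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

// Hypothetical example class; the markup and base URI are made up.
final class ParseExample {
    static Document parseSample() {
        String html = "<html><head><title>Example</title></head>"
                + "<body><a href=\"/docs/page.html\">Docs</a></body></html>";
        // The base URI is used later to resolve the relative href.
        return Parser.parse(html, "https://example.com/");
    }

    static String title() {
        return parseSample().title();
    }

    static String rawHref() {
        return parseSample().select("a").first().attr("href");
    }

    static String absoluteHref() {
        return parseSample().select("a").first().absUrl("href");
    }

    public static void main(String[] args) {
        System.out.println(title());        // Example
        System.out.println(rawHref());      // /docs/page.html
        System.out.println(absoluteHref()); // https://example.com/docs/page.html
    }
}
```

The attribute value itself is stored unchanged; the base URI only comes into play when an absolute URL is requested.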
-
parseFragment
public static java.util.List<Node> parseFragment(java.lang.String fragmentHtml, Element context, java.lang.String baseUri)
Parse a fragment of HTML into a list of nodes. The context element, if supplied, supplies parsing context.
Parameters:
fragmentHtml - the fragment of HTML to parse
context - (optional) the element that this HTML fragment is being parsed for (i.e. for inner HTML). This provides stack context (for implicit element creation).
baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
Returns:
list of nodes parsed from the input HTML. Note that the context element, if supplied, is not modified.
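The role of the context element can be illustrated with table cells, which are only meaningful inside a row. The sketch below assumes jsoup is on the classpath; the class name ParseFragmentExample and the sample markup are illustrative, and the detached tr element is built by hand purely to act as context.

```java
import java.util.List;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.parser.Parser;
import org.jsoup.parser.Tag;

// Hypothetical example class; the fragment markup is made up.
final class ParseFragmentExample {
    // A detached <tr> used only to tell the parser "this is inner HTML of a row".
    static Element rowContext() {
        return new Element(Tag.valueOf("tr"), "");
    }

    static List<Node> parseCells(Element context) {
        return Parser.parseFragment("<td>One</td><td>Two</td>", context, "");
    }

    // Returns how many children the context has after parsing (expected to stay at 0).
    static int contextChildCountAfterParse() {
        Element context = rowContext();
        parseCells(context);
        return context.childNodeSize();
    }

    public static void main(String[] args) {
        for (Node node : parseCells(rowContext())) {
            System.out.println(node.nodeName() + " -> " + node.outerHtml());
        }
        System.out.println("context children: " + contextChildCountAfterParse());
    }
}
```

The parsed nodes are returned to the caller rather than attached to the context, so it is up to the caller to append them where needed.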
-
parseFragment
public static java.util.List<Node> parseFragment(java.lang.String fragmentHtml, Element context, java.lang.String baseUri, ParseErrorList errorList)
Parse a fragment of HTML into a list of nodes. The context element, if supplied, supplies parsing context.
Parameters:
fragmentHtml - the fragment of HTML to parse
context - (optional) the element that this HTML fragment is being parsed for (i.e. for inner HTML). This provides stack context (for implicit element creation).
baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
errorList - list to add errors to
Returns:
list of nodes parsed from the input HTML. Note that the context element, if supplied, is not modified.
-
parseXmlFragment
public static java.util.List<Node> parseXmlFragment(java.lang.String fragmentXml, java.lang.String baseUri)
Parse a fragment of XML into a list of nodes.
Parameters:
fragmentXml - the fragment of XML to parse
baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
Returns:
list of nodes parsed from the input XML.
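A short sketch of XML fragment parsing, assuming jsoup is on the classpath (the class name XmlFragmentExample and the item elements are invented for illustration):

```java
import java.util.List;
import org.jsoup.nodes.Node;
import org.jsoup.parser.Parser;

// Hypothetical example class; the XML snippet is made up.
final class XmlFragmentExample {
    static List<Node> parseItems() {
        String xml = "<item id=\"1\">A</item><item id=\"2\">B</item>";
        // An empty base URI is passed because the snippet has no relative URLs.
        return Parser.parseXmlFragment(xml, "");
    }

    public static void main(String[] args) {
        for (Node node : parseItems()) {
            System.out.println(node.nodeName() + " id=" + node.attr("id"));
        }
    }
}
```

Each top-level element of the fragment becomes one entry in the returned list; no html or body wrapper is added.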
-
parseBodyFragment
public static Document parseBodyFragment(java.lang.String bodyHtml, java.lang.String baseUri)
Parse a fragment of HTML into the body of a Document.
Parameters:
bodyHtml - fragment of HTML
baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
Returns:
Document, with empty head, and HTML parsed into body
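The following sketch shows the shape of the Document this produces, assuming jsoup is on the classpath (the class name BodyFragmentExample and the sample markup are illustrative):

```java
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

// Hypothetical example class; the fragment markup is made up.
final class BodyFragmentExample {
    static Document parseSample() {
        return Parser.parseBodyFragment("<p>Hello <b>world</b></p>", "https://example.com/");
    }

    static String bodyText() {
        return parseSample().body().text();
    }

    static int headChildCount() {
        return parseSample().head().childNodeSize();
    }

    static int bodyElementCount() {
        return parseSample().body().children().size();
    }

    public static void main(String[] args) {
        System.out.println(bodyText());         // Hello world
        System.out.println(headChildCount());   // 0
        System.out.println(bodyElementCount()); // 1
    }
}
```

This is convenient when the input is a snippet, such as user-supplied markup, rather than a full page.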
-
unescapeEntities
public static java.lang.String unescapeEntities(java.lang.String string, boolean inAttribute)
Utility method to unescape HTML entities from a string.
Parameters:
string - HTML escaped string
inAttribute - if the string is to be escaped in strict mode (as attributes are)
Returns:
an unescaped string
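A small usage sketch for the non-attribute case, assuming jsoup is on the classpath (the class name UnescapeExample and the sample string are illustrative):

```java
import org.jsoup.parser.Parser;

// Hypothetical example class; the input string is made up.
final class UnescapeExample {
    // Decode entities as they would appear in ordinary text content (inAttribute = false).
    static String decode(String escaped) {
        return Parser.unescapeEntities(escaped, false);
    }

    public static void main(String[] args) {
        System.out.println(decode("&lt;p&gt; Fish &amp; chips")); // <p> Fish & chips
    }
}
```

Passing true for inAttribute applies the stricter decoding rules used for attribute values; that variant is not shown here.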
-
parseBodyFragmentRelaxed
public static Document parseBodyFragmentRelaxed(java.lang.String bodyHtml, java.lang.String baseUri)
Deprecated.
Parameters:
bodyHtml - HTML to parse
baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
Returns:
parsed Document
-
htmlParser
public static Parser htmlParser()
Create a new HTML parser. This parser treats input as HTML5, and enforces the creation of a normalised document, based on a knowledge of the semantics of the incoming tags.
Returns:
a new HTML parser.
-
xmlParser
public static Parser xmlParser()
Create a new XML parser. This parser assumes no knowledge of the incoming tags and does not treat the input as HTML; instead, it creates a simple tree directly from the input.
Returns:
a new simple XML parser.
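The difference between the two factory methods can be seen by feeding the same input to both. This is a sketch that assumes jsoup is on the classpath; the class name ParserComparisonExample and the item element are invented for illustration.

```java
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

// Hypothetical example class; the input snippet is made up.
final class ParserComparisonExample {
    static final String INPUT = "<item>Text</item>";

    static Document asHtml() {
        return Parser.htmlParser().parseInput(INPUT, "");
    }

    static Document asXml() {
        return Parser.xmlParser().parseInput(INPUT, "");
    }

    // Number of <body> elements the HTML parser created while normalising the input.
    static int htmlBodyCount() {
        return asHtml().select("body").size();
    }

    // The XML parser adds no wrapper elements, so no <body> is expected.
    static int xmlBodyCount() {
        return asXml().select("body").size();
    }

    static String xmlRootName() {
        return asXml().child(0).tagName();
    }

    public static void main(String[] args) {
        System.out.println("HTML parser body elements: " + htmlBodyCount());
        System.out.println("XML parser body elements:  " + xmlBodyCount());
        System.out.println("XML root element:          " + xmlRootName());
    }
}
```

The HTML parser wraps the unknown element in the usual html, head, and body structure, while the XML parser leaves item as the top-level element.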
-
-