Package org.jsoup.examples
Class HtmlToPlainText
- java.lang.Object
  - org.jsoup.examples.HtmlToPlainText
public class HtmlToPlainText extends java.lang.Object
HTML to plain-text. This example program demonstrates the use of jsoup to convert HTML input to lightly-formatted plain-text. That differs from the general goal of jsoup's .text() methods, which is to get clean data from a scrape.

Note that this is a fairly simplistic formatter; for real-world use you'll want to embrace and extend it.
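To show the difference in practice, the sketch below formats the same two-paragraph fragment both ways. It assumes jsoup (including this example class) is on the classpath, and the class name `TextVersusPlainText` is hypothetical. `text()` joins the paragraphs into a single whitespace-normalized line, while `getPlainText` is expected to keep a line break between them.

```java
import org.jsoup.Jsoup;
import org.jsoup.examples.HtmlToPlainText;
import org.jsoup.nodes.Document;

// Hypothetical demo class comparing text() with HtmlToPlainText.getPlainText().
class TextVersusPlainText {
    static final String HTML = "<p>One</p><p>Two</p>";

    // jsoup's text(): whitespace-normalized, single-line output suited to scraping.
    static String scraped() {
        Document doc = Jsoup.parse(HTML);
        return doc.body().text();
    }

    // The example formatter: keeps some layout, such as breaks between paragraphs.
    static String formatted() {
        Document doc = Jsoup.parse(HTML);
        return new HtmlToPlainText().getPlainText(doc.body());
    }

    public static void main(String[] args) {
        System.out.println("text():         " + scraped());
        System.out.println("getPlainText(): " + formatted());
    }
}
```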
To invoke from the command line, assuming you've downloaded the jsoup jar to your current directory:

java -cp jsoup.jar org.jsoup.examples.HtmlToPlainText url [selector]

where url is the URL to fetch, and selector is an optional CSS selector.
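The same url/selector idea can be reproduced from Java code. This is a minimal sketch, assuming jsoup is on the classpath; it is not necessarily how `main` is implemented. To run without network access it takes HTML that has already been fetched (a live fetch would use `Jsoup.connect(url).get()` instead of `Jsoup.parse`). The class `SelectAndFormat` and its `format` helper are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.examples.HtmlToPlainText;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper mirroring the "url [selector]" arguments,
// but working on already-fetched HTML.
class SelectAndFormat {
    static String format(String html, String baseUri, String selector) {
        Document doc = Jsoup.parse(html, baseUri);
        HtmlToPlainText formatter = new HtmlToPlainText();
        if (selector == null) {
            // No selector: format the whole document (a Document is an Element).
            return formatter.getPlainText(doc);
        }
        StringBuilder out = new StringBuilder();
        for (Element match : doc.select(selector)) {
            // Format each element matched by the CSS selector separately.
            out.append(formatter.getPlainText(match));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<div><p class=intro>Hello</p><p>Goodbye</p></div>";
        System.out.println(format(html, "https://example.com/", "p.intro"));
    }
}
```

Passing `null` as the selector corresponds to omitting the optional argument on the command line.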
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
private class | HtmlToPlainText.FormattingVisitor |
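`FormattingVisitor` is private, so its exact formatting rules are not part of the public API. As a rough sketch of the general approach only, a jsoup `NodeVisitor` can copy text nodes through as the tree is walked and emit a newline when a block-level element is exited. This assumes a jsoup version that provides the static `NodeTraversor.traverse(NodeVisitor, Node)` and Java 9 or later for `Set.of`; the class `SimpleTextVisitor` and its set of block tags are hypothetical simplifications, not the real visitor's rules.

```java
import java.util.Set;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeTraversor;
import org.jsoup.select.NodeVisitor;

// Hypothetical, simplified visitor; NOT the actual FormattingVisitor.
class SimpleTextVisitor implements NodeVisitor {
    // Assumed set of tags that should end on their own line.
    private static final Set<String> BLOCK_TAGS = Set.of("p", "div", "h1", "h2", "h3", "li", "br");
    private final StringBuilder out = new StringBuilder();

    @Override
    public void head(Node node, int depth) {
        // Called when a node is entered: copy text content through.
        if (node instanceof TextNode) {
            out.append(((TextNode) node).text());
        }
    }

    @Override
    public void tail(Node node, int depth) {
        // Called when a node is exited: end block-level elements with a newline.
        if (node instanceof Element && BLOCK_TAGS.contains(((Element) node).tagName())) {
            out.append('\n');
        }
    }

    static String toText(Element root) {
        SimpleTextVisitor visitor = new SimpleTextVisitor();
        NodeTraversor.traverse(visitor, root);
        return visitor.out.toString();
    }

    public static void main(String[] args) {
        Element body = Jsoup.parse("<h1>Title</h1><p>First <b>para</b></p>").body();
        System.out.print(toText(body));
    }
}
```

Separating the head (enter) and tail (exit) callbacks is what lets a visitor place breaks after an element's content rather than before it.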
Constructor Summary
Constructors
Constructor | Description
HtmlToPlainText() |
Method Summary
Modifier and Type | Method | Description
java.lang.String | getPlainText(Element element) | Format an Element to plain-text.
static void | main(java.lang.String... args) |
Field Detail
userAgent
private static final java.lang.String userAgent
- See Also:
- Constant Field Values
timeout
private static final int timeout
- See Also:
- Constant Field Values
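The field names suggest that `main` fetches the page with a fixed user agent and timeout; their actual values are listed under Constant Field Values and are not reproduced here. The sketch below shows how such settings are typically applied with jsoup's `Connection` API. The agent string and the 5000 ms timeout are placeholder values, not the example's constants, and the class `FetchSettings` is hypothetical.

```java
import java.io.IOException;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

// Hypothetical class; values below are placeholders, not the real constants.
class FetchSettings {
    static final String USER_AGENT = "Mozilla/5.0 (placeholder)";
    static final int TIMEOUT_MILLIS = 5 * 1000;

    // Build (but do not execute) a request configured with the settings above.
    static Connection configure(String url) {
        return Jsoup.connect(url)
                .userAgent(USER_AGENT)
                .timeout(TIMEOUT_MILLIS);
    }

    public static void main(String[] args) throws IOException {
        // get() performs the HTTP request and may throw IOException,
        // which is why a fetching main declares "throws IOException".
        Connection conn = configure(args.length > 0 ? args[0] : "https://example.com/");
        System.out.println(conn.get().title());
    }
}
```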
Method Detail
main
public static void main(java.lang.String... args) throws java.io.IOException
- Throws:
java.io.IOException
getPlainText
public java.lang.String getPlainText(Element element)
Format an Element to plain-text.
- Parameters:
element - the root element to format
- Returns:
formatted text