Class HtmlToPlainText


  • public class HtmlToPlainText
    extends java.lang.Object
    HTML to plain-text. This example program demonstrates the use of jsoup to convert HTML input to lightly-formatted plain-text. That differs from the general goal of jsoup's .text() methods, which is to extract clean data from a scrape.

    Note that this is a fairly simplistic formatter; for real-world use you will want to adapt and extend it.

    To invoke from the command line, assuming you've downloaded the jsoup jar to your current directory:

    java -cp jsoup.jar org.jsoup.examples.HtmlToPlainText url [selector]

    where url is the URL to fetch, and selector is an optional CSS selector.
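The argument contract above (a required url, then an optional selector) can be sketched with the JDK alone. This is an illustrative sketch, not the class's actual main implementation: the CliArgs class, its fields, and the usage message are hypothetical names introduced here.

```java
import java.util.Optional;

// Hypothetical helper: models the "url [selector]" command-line contract.
final class CliArgs {
    final String url;                 // required: the URL to fetch
    final Optional<String> selector;  // optional: a CSS selector

    private CliArgs(String url, Optional<String> selector) {
        this.url = url;
        this.selector = selector;
    }

    // Accepts exactly one or two arguments; anything else is a usage error.
    static CliArgs parse(String... args) {
        if (args.length < 1 || args.length > 2) {
            throw new IllegalArgumentException(
                "usage: java -cp jsoup.jar org.jsoup.examples.HtmlToPlainText url [selector]");
        }
        Optional<String> selector =
            args.length == 2 ? Optional.of(args[1]) : Optional.empty();
        return new CliArgs(args[0], selector);
    }
}
```

With one argument the selector is absent, so a caller would presumably format the whole document; with two, it would format only the elements the selector matches.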
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static int timeout  
      private static java.lang.String userAgent  
    • Constructor Summary

      Constructors 
      Constructor Description
      HtmlToPlainText()  
    • Method Summary

      Modifier and Type Method Description
      java.lang.String getPlainText(Element element)
      Formats an Element as plain text.
      static void main(java.lang.String... args)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • HtmlToPlainText

        public HtmlToPlainText()
    • Method Detail

      • main

        public static void main(java.lang.String... args)
                         throws java.io.IOException
        Throws:
        java.io.IOException
      • getPlainText

        public java.lang.String getPlainText(Element element)
        Formats an Element as plain text.
        Parameters:
        element - the root element to format
        Returns:
        formatted text
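
A minimal sketch of calling getPlainText from your own code rather than from the command line. It assumes the jsoup jar (including the org.jsoup.examples package) is on the classpath; the PlainTextDemo class, the toPlainText helper, and the sample HTML are hypothetical names used only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.examples.HtmlToPlainText;
import org.jsoup.nodes.Document;

// Hypothetical demo class; requires jsoup on the classpath.
class PlainTextDemo {

    // Parses an HTML string and formats its body as plain text.
    static String toPlainText(String html) {
        Document doc = Jsoup.parse(html);
        HtmlToPlainText formatter = new HtmlToPlainText();
        // doc.body() is the root Element handed to the formatter.
        return formatter.getPlainText(doc.body());
    }

    public static void main(String[] args) {
        String html = "<h1>Title</h1><p>Hello <b>world</b></p>";
        System.out.println(toPlainText(html));
    }
}
```

Any Element can be passed as the root, so the same call should work on a single element picked out with a CSS selector instead of the whole body.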