org.htmlparser.parserapplications
Class StringExtractor

java.lang.Object
  extended by org.htmlparser.parserapplications.StringExtractor

public class StringExtractor
extends Object

Extract plaintext strings from a web page. An illustrative program that gathers the textual contents of a web page. It uses a StringBean to accumulate the user-visible text (what a browser would display) into a single string.


Constructor Summary
StringExtractor(String resource)
          Construct a StringExtractor to read from the given resource.
 
Method Summary
 String extractStrings(boolean links)
          Extract the text from a page.
static void main(String[] args)
          Mainline: the command-line entry point.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StringExtractor

public StringExtractor(String resource)
Construct a StringExtractor to read from the given resource.

Parameters:
resource - Either a URL or a file name.

Method Detail

extractStrings

public String extractStrings(boolean links)
                      throws ParserException
Extract the text from a page.

Parameters:
links - If true, include hyperlinks in the output.
Returns:
The textual contents of the page.
Throws:
ParserException - If a parse error occurs.
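
Because extractStrings declares a checked ParserException, callers have to decide what happens when a page cannot be parsed. The following is a minimal sketch of one way to handle that, not part of HTML Parser itself: the class SafeExtract and the method extractOrFallback are hypothetical names, and the extraction is passed in as a java.util.concurrent.Callable so the sketch compiles without the library on the classpath.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not part of HTML Parser): runs an extraction and
// substitutes a fallback string when the extraction throws.
class SafeExtract {

    // 'extraction' stands in for a call such as extractStrings(links),
    // which may throw a checked exception (e.g. ParserException).
    static String extractOrFallback(Callable<String> extraction, String fallback) {
        try {
            return extraction.call();
        } catch (Exception e) {
            // Report the failure and carry on with the fallback text.
            System.err.println("Extraction failed: " + e.getMessage());
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Stand-in extraction that always succeeds, for demonstration only.
        String text = extractOrFallback(() -> "visible page text", "");
        System.out.println(text);
    }
}
```

With the library available, the lambda would presumably wrap the documented calls, for example `extractOrFallback(() -> new StringExtractor(resource).extractStrings(true), "")`, where `resource` is a URL or file name as described for the constructor.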

main

public static void main(String[] args)
Mainline: the command-line entry point.

Parameters:
args - The command line arguments.
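
This page does not say which command line arguments main accepts. As an illustration only, the sketch below assumes an optional -links flag plus a single resource argument, mirroring the constructor and extractStrings parameters; the flag name and the class ExtractorArgs are assumptions, not documented behavior.

```java
// Hypothetical argument holder: the "-links" flag and the single positional
// resource are assumptions made for illustration, not documented behavior.
class ExtractorArgs {
    final boolean links;   // true if hyperlinks should be included
    final String resource; // URL or file name; null if none was given

    ExtractorArgs(boolean links, String resource) {
        this.links = links;
        this.resource = resource;
    }

    // Scan the arguments: "-links" sets the flag, any other argument is
    // taken as the resource (the last one wins).
    static ExtractorArgs parse(String[] args) {
        boolean links = false;
        String resource = null;
        for (String arg : args) {
            if (arg.equalsIgnoreCase("-links")) {
                links = true;
            } else {
                resource = arg;
            }
        }
        return new ExtractorArgs(links, resource);
    }

    public static void main(String[] args) {
        ExtractorArgs parsed = parse(args);
        if (parsed.resource == null) {
            System.err.println("Usage: ExtractorArgs [-links] <url-or-file>");
            return;
        }
        System.out.println("resource=" + parsed.resource + " links=" + parsed.links);
    }
}
```

Under these assumptions, the parsed values map directly onto the documented API: `parsed.resource` would go to the StringExtractor constructor and `parsed.links` to extractStrings.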

© 2005 Derrick Oswald
Jun 10, 2006

HTML Parser is an open-source library released under the LGPL.