How to Build the HTML Parser libraries

JDK

Set up Java. I won't repeat the installation instructions here; see the Sun J2SE site. I use version 1.5. You need a JDK (Java Development Kit), not a JRE (Java Runtime Environment).

Test your installation by typing the command:

javac

This should display help on the Java compiler options. If the command is not found, the JDK's bin directory is not on your PATH.
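A JRE supplies java but not javac or jar, and this guide uses both. The POSIX shell sketch below is one quick way to confirm that all three tools of a full JDK are on your PATH. The function names check_tool and check_jdk are hypothetical helpers, and the script assumes a Unix-like shell. On Windows, you can simply run javac and jar by hand.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command can be found on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found:   $1 -> $(command -v "$1")"
    return 0
  else
    echo "missing: $1"
    return 1
  fi
}

# Hypothetical helper: a JRE provides 'java' only; a JDK also provides 'javac' and 'jar'.
check_jdk() {
  status=0
  for tool in java javac jar; do
    check_tool "$tool" || status=1
  done
  return $status
}
```

Running check_jdk prints one line per tool and returns a non-zero status if any of them is missing.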

Ant

Set up Ant, the Java-based build tool from the Apache Jakarta project. It is similar to Make, but without Make's wrinkles. The build.xml file that the HTML Parser uses relies on tasks available in Ant version 1.4.1 or higher. The version currently used on the build machine is 1.6.2. The current version of Ant is available from the Apache Ant website.

Basically, you unzip the file into a directory, add an ANT_HOME environment variable that points at it, and add its bin subdirectory to your PATH. Test your installation by typing the command:

ant -help

This should display help on Ant's command-line options.
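On a Unix-like system, the Ant setup described above might look like the following lines in ~/.profile. This is a sketch: the directory /opt/apache-ant-1.6.2 is an assumption, so substitute the directory where you unzipped Ant. Putting $ANT_HOME/bin on the PATH is what lets the ant command be found.

```shell
# Assumption: Ant was unzipped to /opt/apache-ant-1.6.2; change this to your directory.
ANT_HOME=/opt/apache-ant-1.6.2
export ANT_HOME

# Put Ant's launcher scripts (in $ANT_HOME/bin) first on the PATH.
PATH="$ANT_HOME/bin:$PATH"
export PATH
```

On Windows, the equivalent is set ANT_HOME=C:\apache-ant-1.6.2 followed by set PATH=%ANT_HOME%\bin;%PATH%. The C:\apache-ant-1.6.2 directory is likewise only an example.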

Third Party Libraries

Any needed third-party libraries are included in the lib directory.

The unit test code relies on lib/junit.jar from the JUnit project. The version used on the build machine is 3.8.1, which is available from the JUnit project.

Sources

The distribution zip file contains a src.jar file. If you've unpacked the distribution, this file should be in the top-level directory you chose.

Unjar this file with the command:

jar -xf src.jar

There should now be a build.xml in the top level directory.

Building

The default ant target 'htmlparser' builds everything:

ant

If you want to build only some of the parts, see the list of targets:

ant -projecthelp
 Package        glom the release and source files into the distribution zip file
 Release        prepare the release files
 changelog      create the change log from CVS logs
 checkstyle     check source code adheres to coding standards
 clean          cleanup
 compile        compile all java files
 compilelexer   compile lexer java files
 compileparser  compile parser java files
 htmlparser     same as Package plus cleanup
 init           initialize version properties
 jar            create htmlparser.jar and htmllexer.jar
 jarlexer       create htmllexer.jar
 jarparser      create htmlparser.jar
 javadoc        create JavaDoc (API) documentation
 sources        create the source zip
 test           run the JUnit tests
 thumbelina     create thumbelina.jar
 versionSource  update the version in all java files
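Ant accepts several targets on one command line. It runs them in the order given and stops at the first failure, so you can combine targets from the list above. The helper below is a hypothetical sketch for a POSIX shell. It cleans, rebuilds both jars and then runs the JUnit tests, assuming ant is on the PATH.

```shell
# Hypothetical helper: clean, rebuild htmlparser.jar and htmllexer.jar, then run the JUnit tests.
build_jars_and_test() {
  # Fail early with a clear message if Ant is not installed or not on the PATH.
  if ! command -v ant >/dev/null 2>&1; then
    echo "ant not found on PATH; see the Ant section" >&2
    return 1
  fi
  # Targets run left to right; Ant stops if one of them fails.
  ant clean jar test
}
```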
  

Developing

For development you might want an Integrated Development Environment (IDE) such as NetBeans or Eclipse. Mount the org directory from where the HTML Parser was installed, along with the junit.jar file from the lib directory and the JDK's tools.jar file ([where java is installed]/lib/tools.jar). "Build All" should then work.

CVS

The most recent files are only available via CVS:
  server: cvs.htmlparser.sourceforge.net
  repository: /cvsroot/htmlparser
  
For read-only access, use the 'pserver' method with anonymous access and no password. For commit access you'll need to set up SSH (see the introduction to SSH on SourceForge and the guide to setting up SSH keys).
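A read-only, anonymous checkout might look like the POSIX shell sketch below. It combines the server and repository listed above with CVS's pserver syntax. The module name htmlparser is taken from the checkout command shown later in this guide, and the anon_checkout function name is hypothetical.

```shell
# Anonymous, read-only access via the pserver method (server and repository from the list above).
CVSROOT=":pserver:anonymous@cvs.htmlparser.sourceforge.net:/cvsroot/htmlparser"
export CVSROOT

# Hypothetical helper: log in anonymously, then check out the htmlparser module.
anon_checkout() {
  # 'cvs login' prompts for a password; there is none, so just press Enter.
  cvs login || return 1
  # -z3 compresses network traffic; the files land in ./htmlparser
  cvs -z3 checkout htmlparser
}
```

Run anon_checkout from an empty directory.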

Short instructions from Karle Kaila:

I have installed SSH software from www.f-secure.com

I think it was something like F-Secure SSH 5.2 for Win95/98/ME/NT4.0/2000/XP Client

It is a nice graphical SSH client, both for terminal use and file transfer,
and it also contains the command-line ssh2 software that CVS needs.

To access CVS, I first set it up with these commands:

set CVS_RSH=ssh2
set CVSROOT=username@cvs.htmlparser.sourceforge.net:/cvsroot/htmlparser

where username is your SourceForge user name.

Then, in an empty directory, I can give CVS commands such as:

cvs checkout htmlparser

It asks for your SourceForge password.

This retrieves the latest file versions.
For other CVS commands, check a CVS handbook; you can find several on the internet.
The manual I found is "Version Management with CVS" by Per Cederqvist et al.,
perhaps available from http://www.cvshome.org

Derrick says he needs these settings instead:
CVSROOT=:ext:username@cvs.htmlparser.sourceforge.net:/cvsroot/htmlparser
CVS_RSH=ssh
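In a Unix shell (sh or bash), Derrick's settings translate to the sketch below. Replace username with your SourceForge user name. The SF_USER variable and the dev_checkout function are hypothetical helpers. With the :ext: method, CVS runs the program named in CVS_RSH (here ssh) to connect to the server.

```shell
# Replace 'username' with your SourceForge user name.
SF_USER=username

# Developer (commit) access: the :ext: method connects through the program named in CVS_RSH.
CVS_RSH=ssh
CVSROOT=":ext:${SF_USER}@cvs.htmlparser.sourceforge.net:/cvsroot/htmlparser"
export CVS_RSH CVSROOT

# Hypothetical helper: check out the module; ssh asks for your password unless keys are set up.
dev_checkout() {
  cvs checkout htmlparser
}
```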

Other

Some of the build.xml targets (such as changelog) need Perl to run and a SourceForge login via SSH (secure shell). Casual users are unlikely to need these targets.