05-02-2007, 04:35 AM
derrickD
For any given page, you should use an HTML parser to parse the document and then process it in whatever way you see fit. This lets you retrieve all the links, etc. Apache also has some really nice libraries in the HttpComponents subproject.
HTML Parser
HttpComponents (overview)

Also, if you decide not to use Java for the task, I would suggest Python.
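
To give a rough idea of the "parse the page and pull out the links" step in Python, here is a minimal sketch. It assumes you only want the href of each <a> tag, and it uses the html.parser module from the Python standard library rather than the HTML Parser library linked above. LinkCollector and extract_links are names made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute ones.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url=""):
    """Return all link targets found in an HTML string, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    # Hypothetical page content, just for illustration.
    page = '<p><a href="/about.html">About</a> <a href="news/">News</a></p>'
    for link in extract_links(page, "http://example.com/"):
        print(link)
```

Fetching the page itself is a separate step (urllib.request in the standard library can do it), and a real crawler would also want to handle character encodings and skip things like mailto: links.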