  1. #1
    orchid is offline Member
    Join Date
    Apr 2007
    Location
    Midwest
    Posts
    60
    Rep Power
    0

    HTML web page parsing/scraping

    Hi, I am trying to automate some routine web browsing: logging in, entering information, and so on. The tricky part is that after I submit information from one of the pages, the links that come back are not known in advance; the number of links and their names vary from one request to the next. I need a way to access the returned links, read their text, and then continue on to specific links from there. Some of the screen scrapers out there come very close to doing what I want, except for that last part. Is there a Java API that handles this kind of thing? I've tried HttpUnit and something very similar (I forget the name), but they didn't work, I think because of issues with JavaScript. I'm looking for a language or a Java API specifically geared toward this type of task. If anyone has any insight, I would greatly appreciate it. Thanks.

  2. #2
    derrickD is offline Member
    Join Date
    Apr 2007
    Location
    USA
    Posts
    50
    Rep Power
    0

    For any given page, you should use an HTML parser to parse the document and process it however you see fit. That lets you retrieve all of the links, their text, and so on. Apache also has some really nice libraries in the HttpComponents subproject.
    HTML Parser
    HttpComponents Overview

    Also, if you decide not to use Java for the task, I would suggest Python.
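
To make the parser suggestion concrete, here is a minimal sketch of pulling every link and its text out of a page. It is an assumption on my part to use the HTML parser that ships with the JDK (`javax.swing.text.html`) instead of the libraries linked above, purely so the example has no dependencies; the class name `LinkExtractor` and its `Link` holder are made up for illustration. Note that this only sees links present in the HTML itself, not ones generated by JavaScript.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the href and visible text of every <a> element.
class LinkExtractor {

    /** One anchor element: its href attribute and its visible text. */
    static final class Link {
        final String href;
        final String text;

        Link(String href, String text) {
            this.href = href;
            this.text = text;
        }
    }

    static List<Link> extractLinks(String html) {
        final List<Link> links = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private String href;                               // href of the open <a>, or null
            private final StringBuilder text = new StringBuilder();

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object value = attrs.getAttribute(HTML.Attribute.HREF);
                    href = (value == null) ? null : value.toString();
                    text.setLength(0);
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Only keep text that appears inside an anchor with an href.
                if (href != null) {
                    text.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.A && href != null) {
                    links.add(new Link(href, text.toString().trim()));
                    href = null;
                }
            }
        };

        try {
            // The third argument tells the parser to ignore any charset declaration.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<html><body><a href=\"/a\">First</a> <a href=\"/b\">Second</a></body></html>";
        for (Link link : extractLinks(html)) {
            System.out.println(link.text + " -> " + link.href);
        }
    }
}
```

Because the result is just a list, it does not matter how many links come back or what they are called; you loop over whatever is there and inspect each one's text.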

  3. #3
    francojava1 is offline Member
    Join Date
    Sep 2010
    Posts
    26
    Rep Power
    0

    HTML web page parsing/scraping

    Hello orchid, this is francojava1. I suggest you take a look at this sample: HTML Scraper (Python recipes, ActiveState Code). Please tell me which part of that code you would like to have written for your case. I could also build a Java API based on this parser.

    Thanks.
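
For the last step in the original question (continuing to a specific link once its text is known), a Java version might look roughly like the sketch below. It assumes you already have the links from the results page as a map from link text to href; the class name `LinkPicker`, the method `pick`, and the example.com URLs are all made up for illustration. It finds the first link whose text contains a given phrase and resolves its href against the address of the page it came from, so a relative link becomes a full URL you can request next.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: choose a link by its text and turn it into an absolute URI.
class LinkPicker {

    /**
     * Returns the absolute URI of the first link whose text contains wantedText
     * (case-insensitive), or an empty Optional if no link matches.
     */
    static Optional<URI> pick(URI pageUri, Map<String, String> linksByText, String wantedText) {
        String wanted = wantedText.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : linksByText.entrySet()) {
            if (entry.getKey().toLowerCase(Locale.ROOT).contains(wanted)) {
                // URI.resolve handles both relative ("report.html") and
                // root-relative ("/account/logout") hrefs.
                return Optional.of(pageUri.resolve(entry.getValue()));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        URI page = URI.create("https://example.com/results/list?page=1");

        // LinkedHashMap keeps the links in the order they appeared on the page.
        Map<String, String> links = new LinkedHashMap<>();
        links.put("Monthly Report", "report.html?id=7");
        links.put("Log out", "/account/logout");

        System.out.println(pick(page, links, "report").orElse(null));
        System.out.println(pick(page, links, "log out").orElse(null));
    }
}
```

Matching on a phrase rather than an exact name is a deliberate choice here, since the question says the link names are not always the same; tighten or loosen the match to suit the site.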

  4. #4
    Fubarable is offline Moderator
    Join Date
    Jun 2008
    Posts
    19,316
    Blog Entries
    1
    Rep Power
    26
