Java Forums

#1
05-02-2007, 04:32 AM
orchid
Member
Join Date: Apr 2007
Location: Midwest
Posts: 60
HTML web page parsing/scraping
Hi, I am trying to automate some routine web browsing: logging in, entering information, and so on. The tricky part is that at some point, after I submit information from a page, the links returned are not known in advance. The results are not always the same (in number or naming), so I need a way to access the returned links, determine their text, and then continue to specific links from there. Certain screen scrapers out there come very close to doing what I want, with the exception of that last part. Is there any Java API out there that handles this type of thing? I've tried HttpUnit and something very similar (I forget the name), but they didn't work, I think because of issues with JavaScript. I'm looking for a language or a Java API specifically geared toward this type of task. If anyone has any insight, I would greatly appreciate it! Thanks.
#2
05-02-2007, 04:35 AM
derrickD
Member
Join Date: Apr 2007
Location: USA
Posts: 50
For any given page, you should use an HTML parser to parse the document; you can then process it in any way you see fit. This allows you to retrieve all links, their text, and so on. Apache also has some really nice libraries in the HttpComponents subproject.
HTML Parser
HttpComponents - HttpComponents Overview

Also, if you choose not to use Java for the task, I would suggest Python.
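
To make the "parse the page, then pick links by their text" idea concrete, here is a minimal sketch. It does not use the HTML Parser library linked above; as an assumption for illustration it swaps in the parser that ships with the JDK (javax.swing.text.html), so it runs with no extra jars. The class name LinkExtractor, the Link holder, and the findByText helper are all hypothetical names made up for this example. It works on HTML you have already fetched as a String, and it does not execute JavaScript, so links that a script generates in the browser will not appear.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects every <a href="..."> and its visible text.
public class LinkExtractor {

    /** One anchor found in the page: its href and its visible text. */
    public static final class Link {
        public final String href;
        public final String text;

        Link(String href, String text) {
            this.href = href;
            this.text = text;
        }

        @Override
        public String toString() {
            return text + " -> " + href;
        }
    }

    /** Parses the HTML and returns the links in document order. */
    public static List<Link> extractLinks(String html) throws IOException {
        final List<Link> links = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private String currentHref; // null while we are not inside <a href=...>
            private final StringBuilder text = new StringBuilder();

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    currentHref = (href == null) ? null : href.toString();
                    text.setLength(0);
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (currentHref != null) {
                    text.append(data); // accumulate the text shown for the link
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.A && currentHref != null) {
                    links.add(new Link(currentHref, text.toString().trim()));
                    currentHref = null;
                }
            }
        };

        // 'true' tells the parser to ignore charset declarations in the markup.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return links;
    }

    /** Returns the first link whose text contains the given words, or null. */
    public static Link findByText(List<Link> links, String words) {
        for (Link link : links) {
            if (link.text.contains(words)) {
                return link;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Made-up result page; in practice this is the response body you fetched.
        String page = "<html><body>"
                + "<p><a href=\"/orders/1\">Invoices</a></p>"
                + "<p><a href=\"/orders/2\">Reports</a></p>"
                + "</body></html>";
        for (Link link : extractLinks(page)) {
            System.out.println(link);
        }
    }
}
```

Because the list is built from whatever the page actually contains, it does not matter how many links come back or what they are named: you fetch the page, call extractLinks, choose an entry with something like findByText, and request its href next. Fetching pages and keeping the login session alive is a separate job, which is where an HTTP client such as HttpComponents comes in.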
All times are GMT +3.


Copyright ©2006 - 2007, www.java-forums.org