Java Forums

#1
01-08-2008, 01:02 AM
Rodija
Member
Join Date: Jan 2008
Posts: 1
Java and Web Crawlers...
Hi everyone,

I was looking into how to build a web crawler with Java and came across this old tutorial: Writing a Web Crawler in the Java Programming Language

After a few changes I got it to compile and run... but it doesn't work. Can anyone help?

Basically, my intention is to use this as a foundation for a small project I have going. Eventually I want to modify it so that it crawls just one website, defined in the code, and looks for a set of qualities that I can define rather than for URLs. Any suggestions for going about this?

Many thanks.
#2
01-08-2008, 04:54 AM
roots
Moderator
Join Date: Jan 2008
Location: Dallas
Posts: 263
I went through that code. Can you elaborate a bit more on what you want your web crawler to do?
Things to consider:
1. Java networking concepts
2. The HTTP protocol and HTML
3. Text parsing (remember how the tutorial extracted <a href ..

Commons HttpClient / HttpComponents, regex, and simple I/O are worth looking at as well.

The program you used is not robust enough. You would be better off starting from scratch.
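
To make the points above a little more concrete, here is a minimal sketch of a single-site crawl loop. It is not the tutorial's code. The class name SingleSiteCrawler, its method names, and the example.com pages are all made up for illustration. The page fetcher is passed in as a function so the example runs without a network; in a real crawler it would be backed by java.net or an HTTP client library. Extracting links with a regex is a simplification and will miss or misread some real-world HTML.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a crawler that stays on one host. */
final class SingleSiteCrawler {

    // Matches href="..." or href='...' inside an <a ...> tag, ignoring case.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    /** Returns the links in html as absolute URIs (fragments dropped), in page order. */
    static Set<URI> extractLinks(URI base, String html) {
        Set<URI> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(2).trim();
            int hash = href.indexOf('#');
            if (hash >= 0) href = href.substring(0, hash);
            if (href.isEmpty()) continue;
            try {
                links.add(base.resolve(href)); // resolves relative links against the page
            } catch (IllegalArgumentException e) {
                // Malformed link (for example, one containing spaces): skip it.
            }
        }
        return links;
    }

    /** Breadth-first crawl from start, following only links on the same host. */
    static Set<URI> crawl(URI start, Function<URI, String> fetch, int maxPages) {
        Set<URI> visited = new LinkedHashSet<>();
        Deque<URI> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            URI page = queue.remove();
            if (!visited.add(page)) continue;   // already crawled
            String html = fetch.apply(page);
            if (html == null) continue;         // fetch failed or page missing
            for (URI link : extractLinks(page, html)) {
                boolean sameHost = start.getHost().equalsIgnoreCase(link.getHost());
                if (sameHost && !visited.contains(link)) queue.add(link);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a website; a real fetcher would do an HTTP GET.
        Map<String, String> site = Map.of(
                "http://example.com/",
                "<a href=\"/about.html\">About</a> <A HREF='http://other.org/'>Other</A>",
                "http://example.com/about.html",
                "<a href=\"/#top\">Home</a>");
        Set<URI> seen = crawl(URI.create("http://example.com/"),
                uri -> site.get(uri.toString()), 10);
        System.out.println(seen);
    }
}
```

To look for something other than URLs, one option is to run a second check on each fetched page inside crawl (for example another Pattern) and collect the pages that match.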
__________________
Don't worry, newbie, we've got you covered.
All times are GMT +3.


Copyright ©2006 - 2007, www.java-forums.org