  1. #1
    Rodija (Member)
    Join Date: Jan 2008

    Java and Web Crawlers...

    Hi everyone,

    I was looking into how to build a web crawler in Java and came across this old tutorial: Writing a Web Crawler in the Java Programming Language.

    After a few changes I got it to compile and run, but it doesn't work. Can anyone help?

    Basically, I intend to use this as the foundation for a little project I have going. Eventually I want to modify it so that it crawls just one website, defined in the code, and looks for a set of qualities that I can define rather than for URLs. Any suggestions on how to go about this?
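    For the goal described above (crawl a single site and report pages that have certain qualities), one possible starting point is a breadth-first crawl that stays on the start URL's host and tests every fetched page against a set of named predicates. This is only a sketch under assumptions: `SingleSiteCrawler`, `crawl`, the `fetcher` hook and the example qualities are hypothetical names, links are found with a naive regex rather than a real HTML parser, and `maxPages` caps the crawl so it always terminates.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SingleSiteCrawler {
    // Naive link matcher: captures a quoted href value up to any '#' fragment.
    // Unquoted attributes, comments and scripts are not handled.
    private static final Pattern HREF = Pattern.compile(
        "<a\\s[^>]*href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    /**
     * Breadth-first crawl restricted to the host of startUrl.
     * fetcher (hypothetical hook): returns a page body, or null if the fetch failed.
     * qualities (hypothetical): named predicates applied to each page body.
     * Returns, for each quality name, the URLs of the pages that matched it.
     */
    public static Map<String, List<String>> crawl(String startUrl,
                                                  Function<String, String> fetcher,
                                                  Map<String, Predicate<String>> qualities,
                                                  int maxPages) {
        URI start = URI.create(startUrl);
        String host = start.getHost();   // only links to this host are followed
        Map<String, List<String>> matches = new LinkedHashMap<>();
        for (String name : qualities.keySet()) {
            matches.put(name, new ArrayList<>());
        }
        Deque<URI> queue = new ArrayDeque<>();
        Set<URI> seen = new HashSet<>();  // never enqueue the same URL twice
        queue.add(start);
        seen.add(start);
        int attempted = 0;
        while (!queue.isEmpty() && attempted < maxPages) {
            URI current = queue.poll();
            String body = fetcher.apply(current.toString());
            attempted++;
            if (body == null) {
                continue;  // fetch failed: skip this page, keep crawling
            }
            for (Map.Entry<String, Predicate<String>> q : qualities.entrySet()) {
                if (q.getValue().test(body)) {
                    matches.get(q.getKey()).add(current.toString());
                }
            }
            Matcher m = HREF.matcher(body);
            while (m.find()) {
                URI next;
                try {
                    // Resolve relative links against the current page.
                    next = current.resolve(m.group(1).trim()).normalize();
                } catch (IllegalArgumentException e) {
                    continue;  // malformed link: ignore it
                }
                // Stay on the one site; mailto: and similar links have no host.
                if (host != null && host.equalsIgnoreCase(next.getHost()) && seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return matches;
    }
}
```

    Passing the fetcher in as a function keeps the crawl logic separate from the networking, so it can be tried against an in-memory map of pages before pointing it at a real site; qualities such as `s -> s.contains("price")` are placeholders for whatever you want to look for.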

    Many thanks.

  2. #2
    roots (Moderator)
    Join Date: Jan 2008


    I went through that code. Could you elaborate a bit more on what you want your web crawler to do?
    Things to consider:
    1. Java networking concepts
    2. The HTTP protocol and HTML
    3. Text parsing (remember how he extracted the <a href ...> links)
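    On point 3: the tutorial extracted links by scanning the page text for `<a href`. A slightly sturdier variant is sketched below, assuming a regex is acceptable (it only approximates real HTML parsing). `LinkExtractor` and `extract` are hypothetical names; relative links are resolved against the page URL with `java.net.URI.resolve`, `#fragments` are dropped, and non-HTTP links such as `mailto:` are skipped.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // Matches href="..." or href='...' inside an <a ...> tag, case-insensitively.
    // Approximation only: unquoted values, comments and scripts are not handled.
    private static final Pattern HREF = Pattern.compile(
        "<a\\s[^>]*?href\\s*=\\s*(\"([^\"]*)\"|'([^']*)')",
        Pattern.CASE_INSENSITIVE);

    /** Returns the absolute http(s) links in html, resolved against pageUrl. */
    public static Set<String> extract(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        Set<String> links = new LinkedHashSet<>();  // keeps page order, drops duplicates
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            // Group 2 holds a double-quoted value, group 3 a single-quoted one.
            String raw = (m.group(2) != null ? m.group(2) : m.group(3)).trim();
            int hash = raw.indexOf('#');
            if (hash >= 0) {
                raw = raw.substring(0, hash);  // a fragment points inside the same page
            }
            if (raw.isEmpty()) {
                continue;  // e.g. href="#top"
            }
            try {
                URI abs = base.resolve(raw);  // also collapses "../" segments
                String scheme = abs.getScheme();
                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    links.add(abs.toString());
                }
            } catch (IllegalArgumentException e) {
                // Malformed link (e.g. an unescaped space): skip it, keep the rest.
            }
        }
        return links;
    }
}
```

    For example, on the page `http://example.com/dir/page.html`, the link `../about.html#team` becomes `http://example.com/about.html` and `news/` becomes `http://example.com/dir/news/`.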

    Commons HttpClient/HttpComponents, regular expressions, and simple I/O are welcome as well.
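    If you take the simple I/O route instead of Commons HttpClient/HttpComponents, a page can be downloaded with the JDK alone. The sketch below assumes Java 11+ (for `InputStream.readNBytes`) and UTF-8 pages; `PageFetcher`, the timeout, the size cap and the User-Agent string are hypothetical choices. Timeouts, a status check and a size limit are the kind of defensive details a more robust crawler needs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class PageFetcher {
    private static final int TIMEOUT_MS = 10_000;          // assumed value; tune as needed
    private static final int MAX_BYTES = 2 * 1024 * 1024;  // assumed cap on page size

    /**
     * Downloads the resource at url and decodes it as UTF-8 (an assumption:
     * the page's declared charset is ignored). Bodies longer than MAX_BYTES
     * are silently truncated. Throws IOException on network or HTTP errors.
     */
    public static String fetch(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(TIMEOUT_MS);  // don't hang forever on dead hosts
        conn.setReadTimeout(TIMEOUT_MS);     // or on servers that stop sending
        if (conn instanceof HttpURLConnection) {
            HttpURLConnection http = (HttpURLConnection) conn;
            http.setRequestProperty("User-Agent", "SimpleCrawler/0.1");  // hypothetical name
            int status = http.getResponseCode();  // redirects are followed by default
            if (status < 200 || status >= 300) {
                http.disconnect();
                throw new IOException("HTTP " + status + " for " + url);
            }
        }
        try (InputStream in = conn.getInputStream()) {
            byte[] body = in.readNBytes(MAX_BYTES);  // reads until EOF or MAX_BYTES
            return new String(body, StandardCharsets.UTF_8);
        }
    }
}
```

    A crawler can wrap this in a function that catches `IOException` and returns `null`, so one bad page is skipped instead of stopping the whole crawl.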

    The program you used is not robust enough; you'd be better off starting from scratch.
    Don't worry, newbie, we've got you covered.
