Java and Web Crawlers...
I was looking into how to build a web crawler with Java and came across this old tutorial: Writing a Web Crawler in the Java Programming Language.
After a few changes I got it to compile and run... but it doesn't work. Can anyone help?
Basically, my intention is to use this as a foundation for a little project I have going. Eventually I want to modify it so that it crawls just one website, defined in the code, and looks for a set of qualities that I can define, rather than for URLs. Any suggestions for going about this?
I went through that code. Could you elaborate a bit more on what you want your web crawler to do?
Things to consider:
1. Java networking concepts
2. The HTTP protocol and HTML
3. Text parsing (remember how he extracted <a href ..)
Apache Commons HttpClient/HttpComponents, regex, and simple I/O will be useful as well.
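For the text-parsing part, here is a minimal sketch of pulling <a href ...> targets out of a page with a regex and keeping only the links on the same host, since you want to stay on one website. The class name LinkExtractor and the method extractSameHostLinks are my own illustration, not from the tutorial, and the regex is a rough heuristic rather than a real HTML parser.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from the tutorial): regex-based link extraction.
class LinkExtractor {
    // Matches href="..." or href='...' inside an <a ...> tag.
    // Group 1 is the quote character, group 2 is the link target.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns absolute URLs found in html that point at the same host as baseUrl.
    static List<String> extractSameHostLinks(String baseUrl, String html) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String raw = m.group(2).trim();
            URI resolved;
            try {
                // Turn relative links such as "/about.html" into absolute ones.
                resolved = base.resolve(raw);
            } catch (IllegalArgumentException e) {
                continue; // skip hrefs that are not valid URIs
            }
            String host = resolved.getHost();
            // mailto: and similar links have no host and are dropped here.
            if (host != null && host.equalsIgnoreCase(base.getHost())) {
                links.add(resolved.toString());
            }
        }
        return links;
    }
}
```

You would call it as LinkExtractor.extractSameHostLinks(pageUrl, pageHtml) on each page you download. Unquoted href values and links generated by scripts are not handled, so treat it as a starting point only.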
The program you used is not robust enough. You would be better off starting from scratch.
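If you do start from scratch, the core loop could look something like the sketch below: a queue of pages to visit, a set of pages already seen, and a test you define for the "qualities" you are looking for. All the names here (SiteCrawler, crawl, fetch, extractLinks, matches, maxPages) are made up for illustration. The download step and the link-extraction step are passed in as functions, so you can plug in whatever networking and parsing code you end up writing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical skeleton (not from the tutorial): breadth-first crawl loop.
class SiteCrawler {
    // fetch:        URL -> page text, or null if the page could not be read
    // extractLinks: (URL, page text) -> URLs to follow next
    // matches:      page text -> true if the page has the qualities you want
    // Returns the URLs of matching pages, in the order they were visited.
    static List<String> crawl(String startUrl, int maxPages,
                              Function<String, String> fetch,
                              BiFunction<String, String, List<String>> extractLinks,
                              Predicate<String> matches) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> hits = new ArrayList<>();
        queue.add(startUrl);
        seen.add(startUrl);
        int fetched = 0;
        // maxPages caps the number of fetch attempts so the crawl always ends.
        while (!queue.isEmpty() && fetched < maxPages) {
            String url = queue.poll();
            String html = fetch.apply(url);
            fetched++;
            if (html == null) {
                continue; // unreachable page, move on
            }
            if (matches.test(html)) {
                hits.add(url);
            }
            for (String link : extractLinks.apply(url, html)) {
                // seen.add returns false for URLs that were already queued.
                if (seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return hits;
    }
}
```

The fetch function is where the networking goes (for example java.net.HttpURLConnection or HttpComponents), and extractLinks should return only links on your one site so the crawl does not wander off. A polite crawler would also pause between requests and respect robots.txt, which this sketch leaves out.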