  1. #1
    locke19 (Member)

    Web crawling and harvesting

    I do not know much HTML, and I need to write a program that goes through a web page and finds all of the links on it that point to other HTTP sites. How should I approach this problem?

  2. #2
    gcalvin (Senior Member)

    You need to learn about HTML. :) Hint: the HTML tag you are interested in is the anchor tag, whose href attribute holds the link target. It looks like this:

    HTML Code:
       <a href="http://www.somesite.tld/path/file.html">here is the link text</a>
    -Gary-
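
    Since the question asks for links to *other* HTTP sites, one way to test a single href value is sketched below. This is only an illustration: the class name `HrefCheck`, the method `isExternalHttpLink`, and the host `www.example.org` are hypothetical, and treating https as an "HTTP site" is an assumption. It relies on the standard `java.net.URI` class to split the value into scheme and host.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not from the thread: decides whether an href value
// is an absolute http/https link to a host other than the current page's host.
class HrefCheck {
    static boolean isExternalHttpLink(String href, String pageHost) {
        try {
            URI uri = new URI(href.trim());
            String scheme = uri.getScheme();
            String host = uri.getHost();
            // Relative links ("/path/file.html") have no scheme or host.
            if (scheme == null || host == null) {
                return false;
            }
            // Assumption: https counts as an "HTTP site" as well.
            boolean isHttp = scheme.equalsIgnoreCase("http")
                    || scheme.equalsIgnoreCase("https");
            return isHttp && !host.equalsIgnoreCase(pageHost);
        } catch (URISyntaxException e) {
            // Malformed href values are simply treated as "not a link we want".
            return false;
        }
    }

    public static void main(String[] args) {
        String href = "http://www.somesite.tld/path/file.html";
        System.out.println(isExternalHttpLink(href, "www.example.org"));
        System.out.println(isExternalHttpLink("/path/file.html", "www.example.org"));
    }
}
```

    A relative href would first have to be resolved against the page address (for example with `URI.resolve`) if links within the same site should also be followed.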

  3. #3
    Eranga (Moderator)

    To find the link text, you can use a regular expression together with your own string parser. My suggestion: if you really don't know about file reading and similar basics, write a small application first to practice them.
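
    The regular-expression approach could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not a full HTML parser: the class name `LinkExtractor` and the method `extractHttpLinks` are made up for this example, the pattern only handles quoted href values inside `<a>` tags, and https links are kept alongside http ones. It uses only `java.util.regex` from the standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class: pulls http/https link targets out of HTML text.
class LinkExtractor {
    // Matches <a ... href="..."> or <a ... href='...'>, ignoring case.
    // Group 1 is the quote character, group 2 is the link target.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<String> extractHttpLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String target = m.group(2).trim();
            String lower = target.toLowerCase();
            // Keep only absolute links; relative ones are skipped here.
            if (lower.startsWith("http://") || lower.startsWith("https://")) {
                links.add(target);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"http://www.somesite.tld/path/file.html\">"
                + "here is the link text</a> "
                + "<a href='/local/page.html'>local</a></p>";
        // prints [http://www.somesite.tld/path/file.html]
        System.out.println(extractHttpLinks(html));
    }
}
```

    In a real program the `html` string would come from a file or from the page itself, which is where the file-reading practice comes in. A regular expression like this will miss unquoted attributes and can be fooled by unusual markup, so a dedicated HTML parser is the sturdier choice once the basics work.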
