  1. #1
    pietertje, Member
    Join Date: Jul 2010

    [Java] getting links from website source

    So I'm writing this program in Java that connects to a website, gets its source and searches it for links (<a href="link" target="_blank">).
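    For the "connect to a website and get its source" part, here is a minimal sketch using java.net.http.HttpClient. That API was added in Java 11, long after this thread was written; on older versions, reading from URL.openStream() does the same job. The class name PageFetcher and the example.org default URL are placeholders:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PageFetcher {
    // One client can be reused for many requests; follow ordinary redirects (e.g. http -> https).
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    /** Downloads the page at url and returns its HTML source as a String. */
    public static String fetchSource(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode() + " for " + url);
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder default URL; pass the real page URL as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.org/";
        String source = fetchSource(url);
        System.out.println("Downloaded " + source.length() + " characters");
    }
}
```

    The returned String is what the regex (or a parser) then searches.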

    In PHP I would just do:
    PHP Code:
    <?php
    preg_match_all("/<a href=\"(.*?)\" target=\"_blank\">/", $source, $matches);
    ?>
    That puts all the links neatly into $matches[1] as an array. However, I have no idea how to do this in Java. I know I have to use java.util.regex.Matcher and java.util.regex.Pattern, but I'm stumped on how to go from there.
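    In Java, the rough equivalent of preg_match_all is a loop over Matcher.find() that collects group(1) on each match. A minimal sketch using the same pattern as the PHP snippet; the names LinkExtractor and extractLinks are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // Same regex as the PHP version; \" is only Java string escaping for a literal quote.
    // (.*?) is reluctant, so each capture stops at the first " target="_blank"> after it.
    private static final Pattern LINK =
            Pattern.compile("<a href=\"(.*?)\" target=\"_blank\">");

    /** Returns capture group 1 of every match, like $matches[1] in PHP. */
    public static List<String> extractLinks(String source) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(source);
        while (m.find()) {          // each call advances to the next match
            links.add(m.group(1));  // the text captured by (.*?)
        }
        return links;
    }

    public static void main(String[] args) {
        String source = "<p><a href=\"a.html\" target=\"_blank\">A</a> "
                + "<a href=\"b.html\" target=\"_blank\">B</a></p>";
        System.out.println(extractLinks(source)); // [a.html, b.html]
    }
}
```

    The Pattern is compiled once and reused; only the Matcher is created per input string.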

    Could anyone help me out here?

    Thanks.

  2. #2
    masijade, Senior Member
    Join Date: Jun 2008

    Google for a Java HTML parser, download one, parse the source with it, then pull the links from the parsed structure.
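    As a dependency-free stand-in for a downloaded parser, the JDK itself ships an old HTML parser in javax.swing.text.html.parser. It only understands HTML 3.2 and is no match for a modern third-party parser, but it shows the idea: the parser walks the tags and hands you each <a> tag's attributes, whatever their order, case or quoting. A sketch; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class ParserLinkExtractor {
    /** Collects the href attribute of every <a> start tag the parser reports. */
    public static List<String> extractHrefs(Reader html) throws IOException {
        List<String> hrefs = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        hrefs.add(href.toString());
                    }
                }
            }
        };
        // true = ignore any charset declared inside the document; the Reader already decodes text.
        new ParserDelegator().parse(html, callback, true);
        return hrefs;
    }

    public static void main(String[] args) throws IOException {
        String source = "<html><body>"
                + "<a target=\"_blank\" href=\"a.jpg\">A</a>"
                + "<A HREF='b.png'>B</A>"
                + "</body></html>";
        System.out.println(extractHrefs(new StringReader(source)));
    }
}
```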

    If you feel you must use regex, then simply use regex, but be warned that regex will not catch all of them, and the one you've posted above will actually only catch a select few (even in PHP). Use the parser and catch all of them (except, maybe, some of those that are "constructed" in JavaScript "dynamically" (God I hate that word)).
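    To make the "select few" point concrete: the pattern from the question only matches when the tag is a lowercase <a followed by one space, href comes first, the value is in double quotes, and exactly one space and target="_blank" follow. The sketch below compares it with a hypothetical, more tolerant pattern (LENIENT is an illustration, not a recommendation) on three equivalent tags. Even the tolerant version is not a parser; comments, scripts or an attribute such as data-href can still fool it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLimits {
    // The pattern from the question: attribute order, quoting, spacing and case must match exactly.
    static final Pattern STRICT = Pattern.compile("<a href=\"(.*?)\" target=\"_blank\">");

    // Hypothetical tolerant pattern: any case, other attributes before href, ' or " quotes.
    // Group 1 is the opening quote, \\1 requires the same quote to close; group 2 is the URL.
    static final Pattern LENIENT = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    /** Returns the given capture group of every match of p in s. */
    static List<String> all(Pattern p, int group, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(s);
        while (m.find()) {
            out.add(m.group(group));
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<a href=\"1.jpg\" target=\"_blank\">"
                + "<a target=\"_blank\" href=\"2.jpg\">"
                + "<A HREF='3.jpg' target='_blank'>";
        System.out.println(all(STRICT, 1, html));  // [1.jpg]
        System.out.println(all(LENIENT, 2, html)); // [1.jpg, 2.jpg, 3.jpg]
    }
}
```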

  3. #3
    Eranga, Moderator
    Join Date: Jul 2007
    Location: Colombo, Sri Lanka

    If your files aren't very large, I'd use regex: load the complete content at once, then process it line by line.
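    One way to read that advice as code: load the whole page into memory as a list of lines, then run the regex over each line. This assumes the page is small enough to hold in memory and that no link tag is split across two lines (such a tag would be missed). For a real page, the Reader could wrap the download, e.g. new InputStreamReader(url.openStream(), StandardCharsets.UTF_8). The class and method names are hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LoadThenScan {
    static final Pattern LINK = Pattern.compile("<a href=\"(.*?)\" target=\"_blank\">");

    /** Reads the complete input into memory at once (fine for small pages). */
    static List<String> loadAll(Reader in) throws IOException {
        try (BufferedReader br = new BufferedReader(in)) {
            return br.lines().collect(Collectors.toList());
        }
    }

    /** Applies the regex to each loaded line; a tag split across two lines is not found. */
    static List<String> linksPerLine(List<String> lines) {
        List<String> links = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINK.matcher(line);
            while (m.find()) {
                links.add(m.group(1));
            }
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        String page = "<a href=\"a.jpg\" target=\"_blank\">a</a>\n"
                + "<p>text</p><a href=\"b.jpg\" target=\"_blank\">b</a>\n";
        List<String> lines = loadAll(new StringReader(page));
        System.out.println(linksPerLine(lines)); // [a.jpg, b.jpg]
    }
}
```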

  4. #4
    pietertje, Member
    Join Date: Jul 2010

    Quote, originally posted by masijade:
    Google for a Java HTML parser, download one, parse the source with it, then pull the links from the parsed structure.

    If you feel you must use regex, then simply use regex, but be warned that regex will not catch all of them, and the one you've posted above will actually only catch a select few (even in PHP). Use the parser and catch all of them (except, maybe, some of those that are "constructed" in JavaScript "dynamically" (God I hate that word)).
    Well, yes, I know; that's the point. I only need the links that have specific characteristics.

    I'm making a 4chan image downloader, you see, so I need all the image links from a specific thread. I hope that makes my problem clearer.
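    Once the links are extracted (by regex or by a parser), picking out the image links can be a separate filtering step rather than part of the extraction regex. A sketch assuming, hypothetically, that an image link is one whose path ends in .jpg, .jpeg, .png or .gif; the thread does not say how 4chan's pages actually structure their links, and the URLs below are placeholders:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class ImageLinkFilter {
    // Assumption: an "image link" is one whose path ends in a common image extension.
    private static final Pattern IMAGE_EXT =
            Pattern.compile("\\.(jpe?g|png|gif)$", Pattern.CASE_INSENSITIVE);

    /** Keeps only image links, each URL once, in the order first seen. */
    public static List<String> imagesOnly(List<String> links) {
        Set<String> images = new LinkedHashSet<>();
        for (String link : links) {
            // Ignore any query string or fragment when checking the extension.
            String path = link.split("[?#]", 2)[0];
            if (IMAGE_EXT.matcher(path).find()) {
                images.add(link);
            }
        }
        return new ArrayList<>(images);
    }

    public static void main(String[] args) {
        // Placeholder URLs, not real 4chan addresses.
        List<String> links = List.of(
                "//images.example.org/src/123.jpg",
                "res/456.html",
                "//images.example.org/src/789.PNG?x=1",
                "//images.example.org/src/123.jpg");
        System.out.println(imagesOnly(links));
        // [//images.example.org/src/123.jpg, //images.example.org/src/789.PNG?x=1]
    }
}
```

    Keeping extraction and filtering apart means the same filter works whether the links come from a regex or from a parser.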

  5. #5
    masijade, Senior Member
    Join Date: Jun 2008

    "Links from a specific thread" makes no sense as far as HTML is concerned.

    In any case, see the API docs for the Pattern and Matcher classes (at least one of them contains a link to Sun's tutorials for them somewhere).
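    The core of the Pattern and Matcher API described in those docs fits in a few lines: compile a Pattern once, create a Matcher per input, use find() to search and group(n) to read capture groups. Note that matches() is different: it only succeeds if the entire input matches. The class MatcherBasics and its firstHref helper are hypothetical names for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherBasics {
    // Compile once and reuse: Pattern objects are immutable and safe to share between threads.
    static final Pattern HREF = Pattern.compile("href=\"(.*?)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the first href value in the text, or null if there is none. */
    static String firstHref(String text) {
        Matcher m = HREF.matcher(text);   // Matchers are stateful; create one per input
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String tag = "<A HREF=\"x.gif\" target=\"_blank\">";
        // matches() requires the ENTIRE input to match the pattern, so it is false here...
        System.out.println(HREF.matcher(tag).matches()); // false
        // ...while find() looks for the pattern anywhere in the input.
        Matcher m = HREF.matcher(tag);
        System.out.println(m.find());    // true
        System.out.println(m.group());   // HREF="x.gif"  (group 0 = the whole match)
        System.out.println(m.group(1));  // x.gif         (first capture group)
        System.out.println(m.start());   // 3             (index where the match begins)
        System.out.println(firstHref("no links here")); // null
    }
}
```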

    It is still, IMHO, the wrong way to go, but, hey, to each his own.

  6. #6
    JosAH, Moderator
    Join Date: Sep 2008
    Location: Voorschoten, the Netherlands

    Quote, originally posted by masijade:
    (except, maybe, some of those that are "constructed" in JavaScript "dynamically" (God I hate that word)).
    I just wanted to say that it's dynamically good weather over here and my liquor store owner has dynamically delivered a crate of Grolsch beer; in a moment I'm going to dynamically open one because I'm dynamically thirsty.

    kindest regards,

    Jos (<--- very dynamic ;-)

  7. #7
    masijade, Senior Member
    Join Date: Jun 2008
