- 07-01-2010, 12:34 PM #1
[Java] getting links from website source
So I'm writing a program in Java which will connect to a website, get its source and search it for links (<a href="link" target="_blank">).
Now in PHP I would just do
PHP Code:
<?php preg_match_all("/<a href=\"(.*?)\" target=\"_blank\">/", $source, $matches); ?>
Which would put all the links neatly in $matches[1] in array form. However, I have no idea how to do this in Java. I know I have to use java.util.regex.Matcher and java.util.regex.Pattern, but how to go from there has left me stumped.
Could anyone help me out here?
Thanks.
- 07-01-2010, 01:17 PM #2
Google "Java HTML Parser", download one, parse the source, then pull the links from the parsed structure.
If you feel you must use regex, then simply use regex, but be warned that regex will not catch all of them, and the one you've posted above will actually only catch a select few (even in PHP). Use the parser and catch all of them (except, maybe, some of those that are "constructed" in JavaScript "dynamically" (God I hate that word)).
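To make the parser route concrete, here is a rough sketch. It is only an illustration, not a recommendation of a particular library: it uses the HTML parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator) as a stand-in so that nothing has to be downloaded. That parser targets old HTML 3.2, so a dedicated parser library will probably cope better with real pages. The class name ParserLinkExtractor and the example.com URLs are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: pulls href values out of <a> tags using the JDK's bundled parser.
public class ParserLinkExtractor {

    public static List<String> extractLinks(String html) {
        final List<String> links = new ArrayList<>();

        // The parser calls back for every start tag; we only care about <a>.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };

        try {
            // 'true' tells the parser to ignore any charset declared in the document.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }

    public static void main(String[] args) {
        // Sample markup with different attribute order, case and quoting.
        String html = "<html><body>"
                + "<a href=\"http://example.com/a\" target=\"_blank\">A</a>"
                + "<A TARGET='_blank' HREF='http://example.com/b'>B</A>"
                + "<a href=\"http://example.com/c\">C</a>"
                + "</body></html>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

The point of the sample markup is that attribute order, upper/lower case and quote style do not matter to a parser, whereas the regex from the first post depends on all three.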
- 07-01-2010, 01:39 PM #3
If your files are not too large, I would use regex: load the complete content at once, then process it line by line.
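A minimal sketch of that approach, under some assumptions: the page source arrives through a java.io.Reader (a StringReader stands in for the real connection here), and the pattern is the one from the first post. The names LineByLineLinks, readAll and extractLinks are invented for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: load everything first, then scan it one line at a time.
public class LineByLineLinks {

    // Same pattern as the PHP snippet in the first post.
    private static final Pattern LINK =
            Pattern.compile("<a href=\"(.*?)\" target=\"_blank\">");

    // Reads the complete content into memory at once.
    public static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        try {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Applies the pattern to each line separately and collects group 1.
    public static List<String> extractLinks(String source) {
        List<String> links = new ArrayList<>();
        for (String line : source.split("\\r?\\n")) {
            Matcher m = LINK.matcher(line);
            while (m.find()) {
                links.add(m.group(1));
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Stand-in for the downloaded page source.
        String page = "<p>intro</p>\n"
                + "<a href=\"http://example.com/one\" target=\"_blank\">one</a>\n"
                + "<a href=\"http://example.com/two\" target=\"_blank\">two</a>\n";
        String source = readAll(new StringReader(page));
        for (String link : extractLinks(source)) {
            System.out.println(link);
        }
    }
}
```

One thing to keep in mind: because each line is matched on its own, a tag that is split across two lines will be missed.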
- 07-01-2010, 02:10 PM #4
- 07-01-2010, 02:13 PM #5
"links from a specific thread" makes no sense, as it pertains to HTML.
In any case, see the API docs for the Pattern and Matcher classes (at least one of them will contain a link to Sun's tutorials for them, somewhere in it).
It is still, IMHO, the wrong way to go, but, hey, to each his own.
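For reference, a rough Java counterpart of the preg_match_all call from the first post might look like the sketch below. Pattern.compile, Matcher.find and Matcher.group are the real java.util.regex calls; the class name RegexLinks, the method matchAll and the sample URLs are invented for the example. The returned list plays the role of $matches[1].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper mimicking preg_match_all with a single capture group.
public class RegexLinks {

    // Returns capture group 1 of every match of 'regex' in 'source'.
    public static List<String> matchAll(String regex, String source) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(source);
        List<String> results = new ArrayList<>();
        // find() advances to the next match on each call, much like
        // preg_match_all walks through the whole subject string.
        while (matcher.find()) {
            results.add(matcher.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        // Stand-in for the downloaded page source.
        String source = "<a href=\"http://example.com/one\" target=\"_blank\">one</a>"
                + "<a href=\"http://example.com/two\" target=\"_blank\">two</a>";
        // Note the Java string escaping: \" for each quote inside the pattern.
        List<String> links =
                matchAll("<a href=\"(.*?)\" target=\"_blank\">", source);
        for (String link : links) {
            System.out.println(link);
        }
    }
}
```

Be aware that (.*?) will happily run past the closing quote of a link that has no target="_blank" and keep going until the next one that does, so a tighter group such as ([^"]*) is probably safer if you stay with regex.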
- 07-01-2010, 03:27 PM #6
- 07-01-2010, 09:36 PM #7