  1. #1
    africanhacker is offline Senior Member
    Join Date
    Feb 2011
    Posts
    107
    Rep Power
    0

    String operations or regex please

    Java Code:
    <h1 class="headline1"><a href="article/2011-03-31-obama-saga-sucks-in-senior-democrats">
    Obama in serious trouble
    </a></h1>
    I am trying to parse a page that has headlines of this sort on it. Now my objective is to isolate all instances of links that are found in the h1 tag of class headline

    Java Code:
    article/2011-03-31-obama-saga-sucks-in-senior-democrats
    and then concatenate it with http://www.website.com/ so I end up with:

    Java Code:
     http://www.website.com/article/2011-03-31-obama-saga-sucks-in-senior-democrats
    I will then use this url to get the content of article in question.

    There are perhaps 15 links with the structure described above. How can I get these links and ignore all the other HTML on the page? Help please :(
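
    For the concatenation step described above, here is a minimal sketch. It assumes the base address is http://www.website.com/ as given in the question; the class and method names (LinkJoiner, toAbsolute) are hypothetical. It uses java.net.URI.resolve instead of plain string concatenation, so an href that starts with a slash is still joined correctly.

```java
import java.net.URI;

class LinkJoiner {
    // Base address from the question; keep the trailing slash so relative paths resolve under it.
    static final URI BASE = URI.create("http://www.website.com/");

    // Turns a relative href taken from the page into an absolute URL string.
    static String toAbsolute(String href) {
        return BASE.resolve(href).toString();
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute("article/2011-03-31-obama-saga-sucks-in-senior-democrats"));
    }
}
```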

  2. #2
    eRaaaa is offline Senior Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    787
    Rep Power
    6

    You could use the Scanner class.
    The following is not tested (and works only if the links are exactly as you have described).
    Try something like:
    Java Code:
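    		// needs: import java.net.URL; import java.util.Scanner;
    		Scanner sc = new Scanner(new URL("HERE YOUR URL").openStream());
    		// a horizon of 0 means: search the whole remaining input for the next match
    		while (sc.findWithinHorizon("<h1 class=\"headline1\"><a href=\"(.+?)\">", 0) != null) {
    			System.out.println("http://www.website.com/" + sc.match().group(1)); //do anything with the string :)
    		}
    		sc.close();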
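
    The same idea can be tried offline on a string that has already been downloaded. This is only a sketch under assumptions: the class name HeadlineLinks and the method extract are made up for illustration, the base address is the one from the question, and the pattern additionally tolerates whitespace between the h1 tag and the link, which the Scanner pattern above does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeadlineLinks {
    // Base address from the question (assumed to apply to every link on the page).
    static final String BASE = "http://www.website.com/";

    // <h1 class="headline1">, optional whitespace, then <a href="...">; group 1 is the href value.
    static final Pattern LINK = Pattern.compile(
            "<h1\\s+class=\"headline1\">\\s*<a\\s+href=\"([^\"]+)\"");

    // Returns the absolute URL of every link that sits directly inside an h1 of class headline1.
    static List<String> extract(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            urls.add(BASE + m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<h1 class=\"headline1\"><a href=\"article/2011-03-31-obama-saga-sucks-in-senior-democrats\">\n"
                + "Obama in serious trouble\n</a></h1>\n"
                + "<h2 class=\"other\"><a href=\"ignored\">not a headline</a></h2>";
        for (String url : extract(html)) {
            System.out.println(url);
        }
    }
}
```

    All other HTML on the page is simply skipped, because find() only stops where the whole pattern matches.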

  3. #3
    africanhacker is offline Senior Member
    Join Date
    Feb 2011
    Posts
    107
    Rep Power
    0

    Let me test this, thanks for spending your time on this

  4. #4
    ozzyman is offline Senior Member
    Join Date
    Mar 2011
    Location
    London, UK
    Posts
    797
    Blog Entries
    2
    Rep Power
    4

    eRaaaa, I'm not so good with regex, but is it possible to replace this part:
    <h1 class=\"headline1\">
    with this:
    <h1*>
    to eliminate the need for a particular CSS class?

  5. #5
    eRaaaa is offline Senior Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    787
    Rep Power
    6

    You mean <h1 .*?>, right?
    Yes, it's possible, but I thought he only wants to extract the links of class headline1:

    "Now my objective is to isolate all instances of links that are found in the h1 tag of class headline"
    Maybe he means <h1 class=\"headline.*?\"> too?! :)

  6. #6
    ozzyman is offline Senior Member
    Join Date
    Mar 2011
    Location
    London, UK
    Posts
    797
    Blog Entries
    2
    Rep Power
    4

    No, I think he only wanted the class headline like you originally posted, but I was just curious myself, thanks.

    What I meant was a pattern that would make the program recognize all forms of H1:

    <h1>
    <h1 class="head1">
    <h1 class="head2">
    <h1 color=red class="otherHead">

    I thought that * indicates any character or no character,
    but I guess that's not correct.
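
    To make the difference concrete, here is a small sketch (class and field names are hypothetical). In a Java regex, * means "zero or more of the preceding element", so <h1*> only allows the character 1 to repeat. One way to accept any h1 opening tag is <h1\b[^>]*>, which also matches a bare <h1>; the <h1 .*?> form mentioned earlier requires a space after h1.

```java
import java.util.regex.Pattern;

class H1Quantifier {
    // "<h1*>": the * applies to the preceding '1', i.e. "<h" + zero or more '1' + ">".
    static final Pattern LITERAL_STAR = Pattern.compile("<h1*>");

    // "<h1", a word boundary, then any run of characters other than '>', then the closing '>'.
    static final Pattern ANY_H1 = Pattern.compile("<h1\\b[^>]*>");

    // True if the whole tag string matches the pattern.
    static boolean matches(Pattern p, String tag) {
        return p.matcher(tag).matches();
    }

    public static void main(String[] args) {
        String[] tags = {
            "<h1>", "<h1 class=\"head1\">", "<h1 class=\"head2\">",
            "<h1 color=red class=\"otherHead\">", "<h>", "<h111>"
        };
        for (String tag : tags) {
            System.out.println(tag + "  <h1*>: " + matches(LITERAL_STAR, tag)
                    + "  <h1\\b[^>]*>: " + matches(ANY_H1, tag));
        }
    }
}
```

    With these patterns, <h1*> accepts <h>, <h1> and <h111> but none of the tags carrying attributes, while <h1\b[^>]*> accepts all four h1 forms listed above and rejects <h> and <h111>.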

  7. #7
    africanhacker is offline Senior Member
    Join Date
    Feb 2011
    Posts
    107
    Rep Power
    0

    Java Code:
    <h1 class="headline1">
    I want to do this operation on links within all such tags.

  8. #8
    eRaaaa is offline Senior Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    787
    Rep Power
    6

    Can you give us the URL, or is it a secret? =)
    And what's your problem now? Isn't my example working? :p
