  1. #1
    mobilityguy, Member (joined Mar 2009, 2 posts)

    Can't make regex ignore line terminator - fixed

    I've gotten this problem fixed in-house. Thanks for looking.

    Hi all,
    I'm not new to Java, but I am new to this community and I didn't see a more appropriate forum for this question. I'm trying to break apart a block of HTML code by extracting the text between "</table>" and "<br /><br />". In the middle of this block are several instances of "<br />\x0d\x0a<br />", the two break tags separated by a carriage return/line feed pair. I'm using the regex

    "</table>(.*?)<br /><br"

    which always matches the first instance of "<br />\x0d\x0a<br />". I'm running Eclipse 3.4, and I've turned on the Pattern.DOTALL and Pattern.CASE_INSENSITIVE flags. I tried turning on Pattern.MULTILINE, which, as expected, didn't help. I tried explicitly testing for the line terminator with

    "</table>(.*?)<br />[^\\x0d\\x0a]*?<br"

    The added regex terms have no effect - I get exactly the same match as I do without them. If I remove the *? quantifier, the regex fails. If I quote the "\\x0d\\x0a", the regex fails.
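A minimal sketch of how the patterns above behave, assuming the flags are passed to Pattern.compile as Pattern.DOTALL | Pattern.CASE_INSENSITIVE. The class name, the firstGroup helper, and the sample string are made up for illustration and are not from the original code. One thing it makes visible: [^\x0d\x0a] is a negated class, so it matches any character except CR or LF. With *? it may match zero characters, which would explain why adding it changes nothing, and without the quantifier it demands exactly one non-CR/LF character between the two tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineTerminatorClassDemo {
    // Flags named in the post; combining them with | is an assumption.
    static final int FLAGS = Pattern.DOTALL | Pattern.CASE_INSENSITIVE;

    // Returns group 1 of the first match, or null when the pattern does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex, FLAGS).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Makes CR and LF visible when printing.
    static String show(String s) {
        return s == null ? "null" : s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Hypothetical sample: two CRLF-separated break tags, then an adjacent pair.
        String html = "</table>one<br />\r\n<br />\r\ntwo<br /><br />three";

        // Original pattern: stops at the adjacent pair after "two".
        System.out.println(show(firstGroup("</table>(.*?)<br /><br", html)));

        // Negated class with lazy quantifier: same result, it matches zero characters.
        System.out.println(show(firstGroup("</table>(.*?)<br />[^\\x0d\\x0a]*?<br", html)));

        // Negated class without quantifier: no match on this sample.
        System.out.println(show(firstGroup("</table>(.*?)<br />[^\\x0d\\x0a]<br", html)));

        // Non-negated escapes: explicitly matches the CRLF between the tags.
        System.out.println(show(firstGroup("</table>(.*?)<br />\\x0d\\x0a<br", html)));
    }
}
```

To match a CR/LF pair on purpose, the escapes go outside a negated class, as in the last pattern.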

    I've hexdumped the HTML code and I can see that the CR and LF are there, but the regex seems blind to them. Any advice would be really appreciated. Thanks,
    Rich Stillman
    Last edited by mobilityguy; 03-12-2009 at 04:34 PM. Reason: Problem fixed locally

  2. #2
    masijade, Senior Member (joined Jun 2008, 2,571 posts)

    Post a sample of the text and the expected result.

  3. #3
    mobilityguy, Member (joined Mar 2009, 2 posts)

    The text I'm searching is:

    <td style="vertical-align: top;"><span class="title">Miss Saigon</span><br /><br /><table cellspacing="0" cellpadding="0" style="width: 98%"><tr><td><b>May 01 - May 16</b></td><td align="right"><b>Salt Lake City</b></td></tr></table>In a war-torn land, two lovers find each other, lose each other. And find each other again one last time.<br />
    <br />
    By the creators of Les Misérables, this epic musical of star-crossed lovers set in the closing days of the Vietnam War has moved and thrilled audiences around the world. <br />
    <br />

    Contains occasional strong language and mature themes.<br /><br />Phone: 801-581-6961<br /><br />Event Hours: M-Th 7:30pm, F 8:00pm, Sa 2:00 & 8:00pm<br /><br />Admission: $30-49<br /><br />Location: Pioneer Theatre, University of Utah<br /><br />Address: 300 S. 1400 E., Salt Lake City<br /><br />Web Site: <


    I'm trying to match the "<br /><br />" after "mature themes.". Instead, I match the "<br />\x0d\x0a<br />" after "one last time."
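For anyone trying to reproduce this, here is a rough, self-contained test harness. It assumes the input really does contain "\r\n" between the separated break tags and nothing between the adjacent ones. The sample string is a shortened stand-in for the text above, and the DescriptionExtractor class and its extract method are hypothetical names. Under those assumptions the pattern from the first post captures everything up to "mature themes.", so if your own run stops earlier it may be worth comparing the actual input string against the hexdump.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DescriptionExtractor {
    // Pattern from the first post, compiled with the two flags mentioned there.
    private static final Pattern DESCRIPTION = Pattern.compile(
            "</table>(.*?)<br /><br",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns the text between "</table>" and the first adjacent "<br /><br",
    // or null when the input contains no such span.
    static String extract(String html) {
        Matcher m = DESCRIPTION.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the posted text; the "\r\n" placement is an assumption.
        String html = "<b>Salt Lake City</b></td></tr></table>"
                + "In a war-torn land, two lovers find each other.<br />\r\n<br />\r\n"
                + "By the creators of this epic musical. <br />\r\n<br />\r\n\r\n"
                + "Contains occasional strong language and mature themes."
                + "<br /><br />Phone: 801-581-6961<br /><br />";

        String description = extract(html);

        // Check where the capture ends.
        System.out.println(description.endsWith("mature themes."));

        // Print with line terminators made visible, much as a hexdump would show them.
        System.out.println(description.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```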

    Thanks for looking.
