- 03-11-2009, 03:06 PM #1
Can't make regex ignore line terminator - fixed
I've gotten this problem fixed in-house. Thanks for looking.
Hi all,
I'm not new to Java, but I am new to this community, and I didn't see a more appropriate forum for this question. I'm trying to break apart a block of HTML code by extracting the text between "</table>" and "<br /><br />". In the middle of this block are several instances of "<br />\x0d\x0a<br />", that is, two break tags separated by a carriage return/line feed pair. I'm using the regex
"</table>(.*?)<br /><br"
which always matches the first instance of "<br />\x0d\x0a<br />". I'm running Eclipse 3.4, and I've turned on the Pattern.DOTALL and Pattern.CASE_INSENSITIVE flags. I tried turning on Pattern.MULTILINE, which, as expected, didn't help. I tried explicitly testing for the line terminator with
"</table>(.*?)<br />[^\\x0d\\x0a]*?<br"
The added regex terms have no effect - I get exactly the same match as I do without them. If I remove the *? quantifier, the regex fails. If I quote the "\\x0d\\x0a", the regex fails.
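For reference, here is a minimal sketch of how the two patterns above behave in `java.util.regex`, assuming both flags are set as described. The input string, the class name `NegatedClassDemo`, and the helper `firstGroup` are made up for illustration and are not the real page. Note that `[^\x0d\x0a]` is a negated class: it matches any character except CR and LF, so it cannot span the line terminator. A term such as `\s*` is what would allow the terminator between the two tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NegatedClassDemo {
    // Hypothetical input: one CR/LF-separated pair of break tags,
    // one break tag followed by CR/LF, then an adjacent pair.
    static final String HTML =
            "</table>one<br />\r\n<br />\r\ntwo<br /><br />three";

    // Hypothetical helper: returns group 1 of the first match, or null.
    static String firstGroup(String regex, String input) {
        Pattern p = Pattern.compile(regex,
                Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Original pattern: the literal "<br /><br" needs adjacent tags.
        String plain = firstGroup("</table>(.*?)<br /><br", HTML);
        // Negated class: allows anything EXCEPT CR/LF between the tags.
        String negated = firstGroup("</table>(.*?)<br />[^\\x0d\\x0a]*?<br", HTML);
        // \s* allows whitespace, including CR/LF, between the tags.
        String spaced = firstGroup("</table>(.*?)<br />\\s*<br", HTML);

        // On this input the first two patterns capture the same text.
        System.out.println(plain.equals(negated));
        // Only the \s* variant stops at the CR/LF-separated pair.
        System.out.println(spaced.equals("one"));
    }
}
```

On this input the first two patterns skip the CR/LF-separated pair and stop at the adjacent pair after "two", which is consistent with the added terms having no visible effect. If the original pattern stops at a CR/LF-separated pair in a real run, the string being searched or the compiled pattern probably differs from what is shown here.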
I've hexdumped the HTML code and I can see that the CR and LF are there, but the regex seems blind to them. Any advice would be really appreciated. Thanks,
Rich Stillman
Last edited by mobilityguy; 03-12-2009 at 03:34 PM. Reason: Problem fixed locally
- 03-11-2009, 04:16 PM #2
Post a sample of the text and the expected result.
- 03-11-2009, 04:27 PM #3
The text I'm searching is:
<td style="vertical-align: top;"><span class="title">Miss Saigon</span><br /><br /><table cellspacing="0" cellpadding="0" style="width: 98%"><tr><td><b>May 01 - May 16</b></td><td align="right"><b>Salt Lake City</b></td></tr></table>In a war-torn land, two lovers find each other, lose each other. And find each other again one last time.<br />
<br />
By the creators of Les Misérables, this epic musical of star-crossed lovers set in the closing days of the Vietnam War has moved and thrilled audiences around the world. <br />
<br />
Contains occasional strong language and mature themes.<br /><br />Phone: 801-581-6961<br /><br />Event Hours: M-Th 7:30pm, F 8:00pm, Sa 2:00 & 8:00pm<br /><br />Admission: $30-49<br /><br />Location: Pioneer Theatre, University of Utah<br /><br />Address: 300 S. 1400 E., Salt Lake City<br /><br />Web Site: <
I'm trying to match the "<br /><br />" after "mature themes.". Instead, I match the "<br />\x0d\x0a<br />" after "one last time."
Thanks for looking.
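A small reproduction along these lines may help narrow the problem down. It is only a sketch: `SAMPLE` is an abbreviated stand-in for the text above with the CR/LF pairs written out explicitly, and the class name `BreakPairRepro` and the method `extract` are hypothetical. The thread does not say what the in-house fix was, so this shows only what the posted pattern does on text shaped like the sample.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BreakPairRepro {
    // Abbreviated stand-in for the posted HTML (not the full page).
    static final String SAMPLE =
            "<tr><td><b>May 01 - May 16</b></td></tr></table>"
          + "And find each other again one last time.<br />\r\n<br />\r\n"
          + "By the creators of Les Mis\u00e9rables. <br />\r\n<br />\r\n"
          + "Contains occasional strong language and mature themes.<br /><br />"
          + "Phone: 801-581-6961<br /><br />";

    // Applies the pattern from the first post with both flags set and
    // returns the captured text, or null when nothing matches.
    static String extract(String html) {
        Pattern p = Pattern.compile("</table>(.*?)<br /><br",
                Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String body = extract(SAMPLE);
        // Prints "true": the capture runs through the CR/LF-separated
        // pairs and ends just before the first adjacent pair.
        System.out.println(body.endsWith("mature themes."));
    }
}
```

If the same call on the real page stops after "one last time.", comparing the real string with `SAMPLE` character by character around that spot would show whether the CR/LF pair is actually present in the string handed to the matcher.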