- 01-19-2012, 04:13 AM #1
Regular expressions - extracting paragraphs
Hi there. First post here. I was hoping someone could help me with a tiny problem.
I'm trying to extract a paragraph from a string of text. After hours of learning about regex and trying different combinations, I finally got close to what I wanted. There's just one small problem.
For the sake of simplicity, I'm defining a paragraph as a line or lines of text followed by a blank line or EOF. The problem with my expression is that the match includes the trailing \r\n\r\n. That would be OK if I were concerned with the text itself, but I'm really only concerned with the indices of the first and last letter/punctuation mark of the paragraph.
I know I could subtract the length of the terminators from the ending index, but that seems rather hackish.
This is what I have so far. If I'm going about this all wrong, please let me know.
((\r\n)?+.+)+\r\n\r\n
match = "foobar\\r\\n\\r\\n";
I only want "foobar", not "foobar\r\n\r\n", but I get the latter. How can I resolve this? I tried back-referencing it, but I apparently don't know what I'm doing, because every time I tried, the program hung (infinite loop?).
P.S. I know there are different line terminators on different platforms, but let's keep it simple and only use \r\n for now ;)
- 01-19-2012, 08:18 AM #2
Re: Regular expressions - extracting paragraphs
Quote: ... from a string of text.
You've left out one vital piece of information. Where does the "string of text" come from? Reading a file? getText() of a JTextComponent? Database field?
Quote: match = "foobar\\r\\n\\r\\n";
That String doesn't contain any carriage returns nor linefeeds.
A clearer description will get you more targeted advice, but in the meantime you might want to read up on non-capturing groups. And to get better help sooner, post an SSCCE (Short, Self Contained, Compilable and Executable) example that demonstrates the problem: a class with a main(...) method that members here can copy and run to see where it's going wrong.
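For illustration, here is a minimal sketch of what such an SSCCE might look like for this question. The class name ParagraphSscce and the helper firstMatch are made up for the example; the pattern is the one from the first post, written as a Java string literal. It assumes the intended input contains real carriage returns and linefeeds, and it also runs the pattern against the double-escaped literal to show the difference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParagraphSscce {
    // The pattern from the first post, written as a Java string literal.
    static final Pattern ORIGINAL = Pattern.compile("((\\r\\n)?+.+)+\\r\\n\\r\\n");

    // Hypothetical helper: returns {start, end} of the first match, or null.
    static int[] firstMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? new int[] { m.start(), m.end() } : null;
    }

    public static void main(String[] args) {
        String real = "foobar\r\n\r\n";        // real CR and LF characters
        String escaped = "foobar\\r\\n\\r\\n"; // backslash, 'r', backslash, 'n', twice

        System.out.println(real.length());     // 10
        System.out.println(escaped.length());  // 14

        int[] span = firstMatch(ORIGINAL, real);
        // The end index is exclusive and covers the two line terminators.
        System.out.println(span[0] + ".." + span[1]);              // 0..10
        // No CR/LF characters in the escaped literal, so nothing matches.
        System.out.println(firstMatch(ORIGINAL, escaped) == null); // true
    }
}
```

The nested quantifiers in ((\r\n)?+.+)+ can make a failing match very slow on longer inputs, which may be related to the hang described in the first post.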
- 01-19-2012, 09:13 AM #3
Re: Regular expressions - extracting paragraphs
Do you mean
match = "foobar\r\n\r\n";
?
In your example, why not simply remove the carriage returns and newlines? match = match.replaceAll("\\s", ""); (\s is equivalent to [ \t\n\x0B\f\r]; write [\n\r] to remove only the line terminators.)
If you really do mean match = "foobar\\r\\n\\r\\n"; then use match = match.replaceAll(Pattern.quote("\\r\\n"), "");
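A small sketch of the three replacements mentioned above, side by side. The class and method names are invented for the example. Note that \s also removes ordinary spaces between words, which is probably not wanted for a paragraph of text.

```java
import java.util.regex.Pattern;

class StripTerminators {
    // Removes real CR and LF characters only.
    static String stripLineBreaks(String s) {
        return s.replaceAll("[\\r\\n]", "");
    }

    // Removes every whitespace character, including spaces between words.
    static String stripWhitespace(String s) {
        return s.replaceAll("\\s", "");
    }

    // Removes the literal four-character text: backslash, 'r', backslash, 'n'.
    static String stripEscapedText(String s) {
        return s.replaceAll(Pattern.quote("\\r\\n"), "");
    }

    public static void main(String[] args) {
        System.out.println(stripLineBreaks("foo bar\r\n\r\n"));       // foo bar
        System.out.println(stripWhitespace("foo bar\r\n\r\n"));       // foobar
        System.out.println(stripEscapedText("foobar\\r\\n\\r\\n"));   // foobar
        // Real CR/LF characters are not the literal text \r\n, so nothing is removed.
        System.out.println(stripEscapedText("foobar\r\n\r\n").length()); // 10
    }
}
```

All three produce a new String, so indices into the original text are no longer directly comparable afterwards.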
- 01-19-2012, 10:26 AM #4
Re: Regular expressions - extracting paragraphs
Thank you!! (non-capturing groups)
Solved: ((\r\n)?+.+)+(?=\r\n\r\n) //although I have a feeling there's a more elegant way
But you bring up another question: when you asked where the string came from, what difference does it make? I have been testing by reading it in from a file stream one character at a time and appending each character to a string. Eventually I plan to use my regexes with RTF documents, and I have been somewhat worried about whether everything will work the way I planned once I get to that stage.
Btw: I used to program in C++ a long time ago but haven't programmed anything in about 6 years. I've only been learning Java for about 4 days. I'm sort of learning on the fly as I work on my project. It's not just regex that I'm unfamiliar with.
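As a rough sketch of how the lookahead version reports indices, the code below collects Matcher.start() and Matcher.end() for every paragraph in a sample text. The class name, the spans helper, and the sample text are made up for the example. The second pattern, LINE_RUNS, is not from this thread; it is one possible simpler alternative that matches a run of non-empty lines and so also accepts a paragraph that ends at EOF.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParagraphSpans {
    // Pattern from the post above: the lookahead checks for the blank line
    // without making it part of the match.
    static final Pattern WITH_LOOKAHEAD =
            Pattern.compile("((\\r\\n)?+.+)+(?=\\r\\n\\r\\n)");

    // Alternative (an assumption, not from the thread): one non-empty line,
    // then any number of "\r\n + non-empty line" continuations.
    static final Pattern LINE_RUNS = Pattern.compile(".+(?:\\r\\n.+)*");

    // Collects {start, end} for every match; end is exclusive.
    static List<int[]> spans(Pattern p, String text) {
        List<int[]> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(new int[] { m.start(), m.end() });
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "foobar\r\n\r\nsecond line\r\nstill second\r\n\r\nlast";
        for (int[] s : spans(WITH_LOOKAHEAD, text)) {
            System.out.println("lookahead: " + s[0] + ".." + s[1]);
        }
        for (int[] s : spans(LINE_RUNS, text)) {
            System.out.println("line runs: " + s[0] + ".." + s[1]);
        }
    }
}
```

Because end() is exclusive, the index of the last character of a paragraph is end() - 1. With the sample text, the lookahead pattern does not report the final paragraph "last", since it is followed by EOF rather than a blank line.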
- 01-19-2012, 10:33 AM #5
Re: Regular expressions - extracting paragraphs
Last edited by jeramy; 01-19-2012 at 10:36 AM.
- 01-20-2012, 06:52 AM #6
Re: Regular expressions - extracting paragraphs
Quote: When you asked about where the string came from, what difference does it make? I have been testing this reading it in from a file stream one character at a time and appending them to a string.
It makes a lot of difference. The backslash is an escape character in String literals, but not in Strings read in from a file.
A line in a file
Java Code:
FirstLine\r\nSecond Line
is equivalent to the String literal
Java Code:
"FirstLine\\r\\nSecond Line"
and contains no carriage return nor linefeed. The String literal
Java Code:
"FirstLine\r\nSecond Line"
would be one possibility* when read from a file content of
Java Code:
FirstLine
Second Line
*(depending on how the file was created/saved)
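To make the distinction concrete, here is a small sketch that compares the two literals and then writes each one to a temporary file and reads it back. The class and method names are invented for the example, and UTF-8 is an assumed encoding. Reading a file never interprets backslashes; only the compiler does, and only inside literals.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class LiteralVersusFile {
    // True if the string contains a real carriage return or linefeed character.
    static boolean hasLineBreak(String s) {
        return s.indexOf('\r') >= 0 || s.indexOf('\n') >= 0;
    }

    // Writes the text to a temporary file and reads it back, byte for byte.
    static String roundTrip(String content) throws IOException {
        Path tmp = Files.createTempFile("literal-demo", ".txt");
        try {
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        String escaped = "FirstLine\\r\\nSecond Line"; // file shows: FirstLine\r\nSecond Line
        String real = "FirstLine\r\nSecond Line";      // file shows two separate lines

        System.out.println(escaped.length());          // 24
        System.out.println(real.length());             // 22
        System.out.println(hasLineBreak(escaped));     // false
        System.out.println(hasLineBreak(real));        // true
        // The file content comes back exactly as written, backslashes included.
        System.out.println(roundTrip(escaped).equals(escaped)); // true
        System.out.println(roundTrip(real).equals(real));       // true
    }
}
```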