  1. #1
    jeramy is offline Member

    Default Regular expressions - extracting paragraphs

    Hi there. First post here. I was hoping someone could help me with a tiny problem.

    I'm trying to extract a paragraph from a string of text. After hours of learning about regex and trying different combinations, I finally got close to what I wanted. Just a small problem.

    For the sake of simplicity, I'm defining a paragraph as a line or lines of text followed by a blank line or EOF. The problem I'm having with my expression is that the match includes the trailing \r\n, which would be OK if I were concerned with the letters, but I'm really only concerned with the indices of the first and last letter/punctuation of the paragraph.
    I know I could subtract 2 from the ending index, but that seems rather hackish.

    This is what I have so far - if I'm going about this all wrong then please let me know.
    ((\r\n)?+.+)+\r\n\r\n

    match = "foobar\\r\\n\\r\\n";

    I only want "foobar", not "foobar\r\n\r\n", but I get the latter. How can I resolve this? I tried backreferencing it, but I apparently don't know what I'm doing, because every time I tried, the program hung (infinite loop?).


    P.S. I know there are different line terminators on different platforms, but let's keep it simple and only use \r\n for now ;)

  2. #2
    DarrylBurke is offline Forum Police

    Default Re: Regular expressions - extracting paragraphs

    ... from a string of text.
    You've left out one vital piece of information. Where does the "string of text" come from? Reading a file? getText() of a JTextComponent? Database field?

    match = "foobar\\r\\n\\r\\n";
    That String doesn't contain any carriage returns or linefeeds.

    A clearer description will get you more targeted advice, but in the meantime you might want to read up on non-capturing groups. And to get better help sooner, post an SSCCE (Short, Self Contained, Compilable and Executable example) that demonstrates the problem: a class with a main(...) method that members here can copy and run to see where it's going wrong.

    db
    If you're forever cleaning cobwebs, it's time to get rid of the spiders.

  3. #3
    eRaaaa is offline Senior Member

    Default Re: Regular expressions - extracting paragraphs

    Do you mean
    match = "foobar\r\n\r\n";
    ??
    In your example, why not replace/remove these carriage returns and newlines? match = match.replaceAll("\\s", ""); ?? (\s = [ \t\n\x0B\f\r], or write only [\n\r])
    If you really mean match = "foobar\\r\\n\\r\\n"; -> match = match.replaceAll(Pattern.quote("\\r\\n"), "");
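    A minimal sketch of the two suggestions above. The class and method names (StripLineBreaks, stripReal, stripEscaped) are made up for illustration. It contrasts a string holding real CR/LF characters with one holding the visible characters backslash, r, backslash, n.

```java
import java.util.regex.Pattern;

public class StripLineBreaks {
    // Removes real carriage returns and linefeeds (the narrower [\n\r] variant).
    static String stripReal(String s) {
        return s.replaceAll("[\\r\\n]", "");
    }

    // Removes the literal four-character sequence backslash-r-backslash-n.
    // Pattern.quote stops the regex engine from reading \r and \n as escapes.
    static String stripEscaped(String s) {
        return s.replaceAll(Pattern.quote("\\r\\n"), "");
    }

    public static void main(String[] args) {
        String real = "foobar\r\n\r\n";         // 6 letters + 4 control characters
        String escaped = "foobar\\r\\n\\r\\n";  // 6 letters + 8 visible characters
        System.out.println(stripReal(real));             // prints foobar
        System.out.println(stripEscaped(escaped));       // prints foobar
        System.out.println(stripReal(escaped).length()); // prints 14: nothing to remove
    }
}
```

    Note that "\\s" would also strip spaces and tabs inside the paragraph, and any replacement shifts the indices of the text that follows it.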

  4. #4
    jeramy is offline Member

    Default Re: Regular expressions - extracting paragraphs

    Quote Originally Posted by DarrylBurke
    You've left out one vital piece of information. Where does the "string of text" come from? Reading a file? getText() of a JTextComponent? Database field?


    That String doesn't contain any carriage returns or linefeeds.

    A clearer description will get you more targeted advice, but in the meantime you might want to read up on non-capturing groups. And to get better help sooner, post an SSCCE (Short, Self Contained, Compilable and Executable example) that demonstrates the problem: a class with a main(...) method that members here can copy and run to see where it's going wrong.

    db
    Thank you!! (non-capturing groups)

    Solved: ((\r\n)?+.+)+(?=\r\n\r\n) //although I have a feeling there's a more elegant way
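    A minimal sketch of how the lookahead version reports indices, assuming the text holds real \r\n characters. The class and method names (ParagraphSpan, firstParagraph) are made up for illustration. Because (?=...) only checks what follows without consuming it, the match ends right after the last character of the paragraph.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParagraphSpan {
    // The pattern from the post above as a Java string literal: every regex
    // backslash is doubled so the regex engine receives \r and \n.
    static final Pattern PARAGRAPH =
            Pattern.compile("((\\r\\n)?+.+)+(?=\\r\\n\\r\\n)");

    // Returns {start, end} of the first paragraph, or null if none is found.
    // The end index is exclusive, as with Matcher.end().
    static int[] firstParagraph(String text) {
        Matcher m = PARAGRAPH.matcher(text);
        return m.find() ? new int[] { m.start(), m.end() } : null;
    }

    public static void main(String[] args) {
        String text = "foobar\r\n\r\nsecond paragraph";
        int[] span = firstParagraph(text);
        System.out.println(span[0] + ".." + span[1]);         // prints 0..6
        System.out.println(text.substring(span[0], span[1])); // prints foobar
    }
}
```

    One thing to watch: a repeated group that itself contains .+ can backtrack very heavily on long lines when no blank line follows, which may be what caused the earlier hangs.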

    But you bring up another question - When you asked about where the string came from, what difference does it make? I have been testing this reading it in from a file stream one character at a time and appending them to a string. But I eventually plan to use my regexes with RTF documents, and I have been kind of worried about whether everything will work the way I planned once I get to that stage.

    Btw: I used to program in C++ a long time ago but haven't programmed anything in about 6 years. I've only been learning Java for about 4 days. I'm sort of learning on the fly as I work on my project. It's not just regex that I'm unfamiliar with.

  5. #5
    jeramy is offline Member

    Default Re: Regular expressions - extracting paragraphs

    Quote Originally Posted by eRaaaa
    In your example, why not replace/remove these carriage returns and newlines? match = match.replaceAll("\\s", ""); ?? (\s = [ \t\n\x0B\f\r], or write only [\n\r])
    Because I wanted to simplify it. I tried a million different things. I'm not familiar with regex, but I knew for sure \r\n\r\n would match.
    Also, I didn't want to replace because I want the indices, not the characters. I was afraid replacing would mess up the indices.
    Last edited by jeramy; 01-19-2012 at 11:36 AM.
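    A sketch of one way to collect the indices of every paragraph without modifying the text. This is a hypothetical alternative pattern, not the one posted in this thread, and the names (ParagraphIndices, paragraphSpans) are made up. It relies on '.' not matching \r or \n by default, so a match stops before a blank line or at the end of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParagraphIndices {
    // One non-empty line, then any number of further non-empty lines,
    // each introduced by \r\n. Assumes \r\n line terminators only.
    static final Pattern PARAGRAPH = Pattern.compile(".+(?:\\r\\n.+)*");

    // Returns one {start, end} pair per paragraph; end is exclusive,
    // so the last character of the paragraph sits at index end - 1.
    static List<int[]> paragraphSpans(String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = PARAGRAPH.matcher(text);
        while (m.find()) {
            spans.add(new int[] { m.start(), m.end() });
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "first line\r\nsecond line\r\n\r\nlast paragraph";
        for (int[] span : paragraphSpans(text)) {
            System.out.println(span[0] + ".." + span[1]);
        }
        // prints 0..23 and then 27..41
    }
}
```

    Since nothing is replaced, the indices refer to positions in the original string.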

  6. #6
    DarrylBurke is offline Forum Police

    Default Re: Regular expressions - extracting paragraphs

    When you asked about where the string came from, what difference does it make? I have been testing this reading it in from a file stream one character at a time and appending them to a string.
    It makes a lot of difference. The backslash is an escape character in String literals. Not in Strings read in from a file.

    A line in a file
    Java Code:
    FirstLine\r\nSecond Line
    is equivalent to the String literal
    Java Code:
    "FirstLine\\r\\nSecond Line"
    and contains neither a carriage return nor a linefeed. The String literal
    Java Code:
    "FirstLine\r\nSecond Line"
    would be one possibility* when read from a file content of
    Java Code:
    FirstLine
    Second Line
    *(depending on how the file was created/saved)
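    The difference can be checked directly. A small sketch, with a made-up class and method name (EscapeDemo, hasLineBreak), comparing the two literals above:

```java
public class EscapeDemo {
    // True if the string contains a real carriage return or linefeed character.
    static boolean hasLineBreak(String s) {
        return s.indexOf('\r') >= 0 || s.indexOf('\n') >= 0;
    }

    public static void main(String[] args) {
        // Backslashes escaped: the string holds the characters \ r \ n.
        String escaped = "FirstLine\\r\\nSecond Line";
        // Escape sequences: the string holds one CR and one LF character.
        String real = "FirstLine\r\nSecond Line";

        System.out.println(escaped.length());      // prints 24
        System.out.println(real.length());         // prints 22
        System.out.println(hasLineBreak(escaped)); // prints false
        System.out.println(hasLineBreak(real));    // prints true
    }
}
```

    A regex such as \r\n\r\n can only match the second kind of string, which is why the origin of the text matters.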

    db
