  1. #1
    RR_QQ is offline Member
    Join Date
    Sep 2008
    Posts
    25
    Rep Power
    0

    Regular expression tips or resources

    Hello! I'm having some issues implementing the appropriate regex pattern to eliminate unwanted characters from a string.

    Here is a sample string:
    Java Code:
    String str = "test-hello. me  please3, _dog[ -()";
    What I need is a regex that takes out all non-word characters except for spaces and except for the hyphen in strings like "test-hello", in other words a "-" (hyphen) that has word or digit characters on both sides. That last part is what is giving me trouble.

    What I have right now, which is perfect except for the "test-hello" type strings, is:
    Java Code:
    Pattern replace = Pattern.compile("([^\\w ])");
    Matcher matcher = replace.matcher(str);
    str = matcher.replaceAll("");
    The above code removes every character that is neither a word character (letter, digit or underscore) nor a space, so it obviously still removes the "-" in "test-hello".
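
    For reference, a minimal self-contained sketch of that behaviour on the sample string. The class and method names (StripDemo, strip) are made up for illustration; the pattern is the one from the snippet above without the capturing group, which does not change what is matched.

```java
import java.util.regex.Pattern;

class StripDemo {
    // Any single character that is neither a word character nor a space.
    static final Pattern NON_WORD = Pattern.compile("[^\\w ]");

    // Hypothetical helper: delete every match of the pattern.
    static String strip(String s) {
        return NON_WORD.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String str = "test-hello. me  please3, _dog[ -()";
        // Brackets make the trailing space visible.
        // Prints: [testhello me  please3 _dog ]
        System.out.println("[" + strip(str) + "]");
    }
}
```

    Note that the hyphen inside "test-hello" disappears together with the punctuation, while the underscore survives because \w matches it.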

    I'm not looking for the whole answer, of course, but any tips or resources you can point me to would be greatly appreciated. I'm not as familiar with the regex syntax in Java as I am with, say, PHP.

    Thank you!

  2. #2
    masijade is offline Senior Member
    Join Date
    Jun 2008
    Posts
    2,571
    Rep Power
    8

    There is not that much difference. See the API docs for Pattern, and remember that anywhere you would use a single "\" in other languages you use a double "\\" in a Java string literal. That's pretty much the extent of it. The other main difference is that look-behind patterns cannot be open-ended, i.e. they have to have a defined maximum length (I believe the limit is 256 characters, but I haven't done too many of those recently).
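
    A small sketch of the escaping point, assuming nothing beyond java.util.regex. The class name, constants and helper methods are hypothetical; the look-behind uses an explicit upper bound ({1,10}) chosen arbitrarily for the example.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // In Java source, "\\w+" is the three characters \ w +,
    // which is what the regex engine needs to see.
    static final String WORD_CLASS = "\\w+";

    static boolean isWord(String s) {
        return Pattern.matches(WORD_CLASS, s);
    }

    // A look-behind with an explicit maximum length (1 to 10 word characters).
    static final Pattern HYPHEN_AFTER_WORD = Pattern.compile("(?<=\\w{1,10})-");

    static String dropHyphenAfterWord(String s) {
        return HYPHEN_AFTER_WORD.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(WORD_CLASS.length());            // 3
        System.out.println(isWord("please3"));              // true
        System.out.println(dropHyphenAfterWord("a-b -c"));  // ab -c
    }
}
```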

  3. #3
    RR_QQ is offline Member
    Join Date
    Sep 2008
    Posts
    25
    Rep Power
    0

    Hmm..k thank you. I read some of the docs. I need to find real examples of advanced patters to understand it.

  4. #4
    Fubarable is offline Moderator
    Join Date
    Jun 2008
    Posts
    19,316
    Blog Entries
    1
    Rep Power
    25

  5. #5
    masijade is offline Senior Member
    Join Date
    Jun 2008
    Posts
    2,571
    Rep Power
    8

    You claimed (indirectly) to "know" regex in PHP. There is no large difference, so where's the problem? Design the regex in PHP, change all "\" to "\\", and it will probably work. Google for a few regex tutorials; there are a large number of them out there. Regex is, for the most part, regex. There are some differences, but usually no real "show-stoppers".
    Last edited by masijade; 09-19-2009 at 07:22 AM.

  6. #6
    CodesAway is offline Senior Member
    Join Date
    Sep 2009
    Location
    Texas
    Posts
    238
    Rep Power
    5

    So, it sounds like you want to remove all non-word characters (so not '_', since \w matches it, right?). However, you only want to remove hyphens if they are not part of a word.

    So, you have two cases, the non-word characters (other than spaces) and the hyphen, so handle them in two branches. In the first branch, ignore '-', and in the second branch, specify what you want to do with the hyphen. The regex below only removes a '-' if a non-word character (or the end of the string) follows, e.g. "a-]". You can change this behavior as needed.

    Java Code:
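    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String str = "test-hello. me  please3, _dog[ -()";
    // Branch 1: any character that is not a word character, space or hyphen.
    // Branch 2: a hyphen that is not followed by a word character.
    Pattern replace = Pattern.compile("([^\\w -]|-\\B)");
    Matcher matcher = replace.matcher(str);
    str = matcher.replaceAll("");

    // Prints "test-hello me  please3 _dog " (the double space and trailing space remain)
    System.out.println(str);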
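
    To see what the second branch does on its own, here is a small sketch that runs the same two-branch pattern over a few short inputs. The class name HyphenDemo and the clean helper are made up for the example.

```java
import java.util.regex.Pattern;

class HyphenDemo {
    // Branch 1: anything that is not a word character, space or hyphen.
    // Branch 2: a hyphen that is NOT immediately followed by a word character.
    static final Pattern UNWANTED = Pattern.compile("[^\\w -]|-\\B");

    // Hypothetical helper: delete every match of the pattern.
    static String clean(String s) {
        return UNWANTED.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("a-b")); // a-b  (hyphen between word characters is kept)
        System.out.println(clean("a-]")); // a    (hyphen followed by a non-word character)
        System.out.println(clean("a-"));  // a    (hyphen at the end of the string)
        System.out.println(clean("-b"));  // -b   (a leading hyphen is still kept)
    }
}
```

    The last case shows the one gap in this version: a hyphen that is followed by a word character but not preceded by one stays in the output.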

    Also, it would help immensely if you could tell us what you expect the result to be, so that a good regex can be developed.

    If you have any further questions, don't hesitate to ask.
    CodesAway - codesaway.info
    writing tools that make writing code a little easier

  7. #7
    RR_QQ is offline Member
    Join Date
    Sep 2008
    Posts
    25
    Rep Power
    0

    CodesAway thank you so much! The syntax is a little different from PHP. I was able to expand your regex a bit to make it just as I wanted it:

    Java Code:
    Pattern replace = Pattern.compile("([^\\w -]|-\\B|\\B-|_)");
    Thank you!
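
    A minimal sketch of what the two extra branches change, using the expanded pattern on the sample string from the first post. The class name ExpandedDemo and the clean helper are hypothetical; the capturing group is dropped because it does not affect matching.

```java
import java.util.regex.Pattern;

class ExpandedDemo {
    // Branches 1 and 2 as before, plus:
    // Branch 3: a hyphen that is NOT immediately preceded by a word character.
    // Branch 4: any underscore.
    static final Pattern UNWANTED = Pattern.compile("[^\\w -]|-\\B|\\B-|_");

    // Hypothetical helper: delete every match of the pattern.
    static String clean(String s) {
        return UNWANTED.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Brackets make the trailing space visible.
        // Prints: [test-hello me  please3 dog ]
        System.out.println("[" + clean("test-hello. me  please3, _dog[ -()") + "]");
        System.out.println(clean("-b"));  // b   (leading hyphen now removed)
        System.out.println(clean("a_b")); // ab  (underscore now removed)
    }
}
```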

  8. #8
    masijade is offline Senior Member
    Join Date
    Jun 2008
    Posts
    2,571
    Rep Power
    8

    Ah, so someone took pity on you and did it for you (which is really what you were after). Any differences in the syntax are clearly spelled out in the API docs for Pattern, and if you "knew" regex in PHP, as you claimed, and were able to do this particular expression in PHP, then reading that doc (especially with the comments on the differences given earlier) you would have had no problem doing this. You weren't interested in that, though. You simply wanted someone to do it for you.

  9. #9
    CodesAway is offline Senior Member
    Join Date
    Sep 2009
    Location
    Texas
    Posts
    238
    Rep Power
    5

    Come on, seriously, if they wanted someone to do it for them, they wouldn't have provided their own example regular expression.

    The problem that RR_QQ was having is that they didn't think to handle the '-' as a separate case. If you check my submission versus their original one, I only changed one small detail: I ignored the '-' in the first branch and handled it in the second.

    From there, they modified the regular expression I posted to fit their needs.

    Additionally, they must have some level of regex understanding, otherwise

    1) They couldn't have provided such a good example for me to use
    2) They wouldn't have known enough about Java regexes to write code that I could easily paste and test
    3) They couldn't have modified my response so easily to meet their needs

    So, what part of this means they didn't do their own research??

  10. #10
    masijade is offline Senior Member
    Join Date
    Jun 2008
    Posts
    2,571
    Rep Power
    8

    And I would have had no problem with that, except that if you read his first post, he strongly implies that he could do it in PHP but was completely unwilling to even try in Java. Come on, really, yourself. He had not a clue beyond the simplest types of regex and wanted someone to help him with regex itself, not with the difference between PHP and Java regex.

    He wanted to claim that he "knew" regex, just not in Java "syntax", and I called him on it.

    Had he not made that "PHP" claim I would have gladly helped him with the regex itself, but he was, seemingly, too proud to admit that it was regex itself he didn't know. It was an attempt to at least get the OP to admit what the real problem was and not hide behind a "syntax" excuse.

  11. #11
    CodesAway is offline Senior Member
    Join Date
    Sep 2009
    Location
    Texas
    Posts
    238
    Rep Power
    5

    So, you thought he was being too proud, so you didn't help him... The world works in very mysterious ways. However, I do see your point.

    But, what if it wasn't pride, what if he was forthcoming? Wouldn't that leave him confused? I mean, suppose that he does have experience in regexes, but just had trouble with Java syntax. He would interpret your action as being hostile, when he only was asking for help. I can tell that you're not the kind of person who would want that.

    How would I know this, you may ask? Your reply. By how you responded to my post, you showed your willingness to help; however, you didn't want to help someone who wasn't willing to help themselves.


    I'm naive, so I apologize in advance, but why not overlook his pride? Even if you thought his pride caused him not to ask the right question, why not help him anyway? I mean, I know I've had too much pride at times, but after being helped it "grounded me" and made me realize how foolish I was.

    That's why I posted. Because even if it was pride that prevented him from asking the right question, the fact that he got help, might have grounded him.

  12. #12
    masijade is offline Senior Member
    Join Date
    Jun 2008
    Posts
    2,571
    Rep Power
    8

    I was doing the same thing, just the other way around. He got pointers on the differences (since that is what he claimed not to know) and was told where to look to get more info. If the syntax was what was giving him problems, that should have been enough. I was trying to force him to admit that it was the regex, and not the syntax, that was giving him the problem. In all likelihood he was telling himself that everything would be okay if he were using "X", and so was ignoring the "real" problem, and I was trying (although in an @sshole-like manner) to get him to admit to himself what the problem was. If he is not able to admit to himself what the real problem is, he'll never be able to fix it himself. And that will apply to all future problems as well.

  13. #13
    RR_QQ is offline Member
    Join Date
    Sep 2008
    Posts
    25
    Rep Power
    0

    My last post before Codesaway replied says this:

    "Hmm..k thank you. I read some of the docs. I need to find real examples of advanced patters to understand it."

    Pretty clear, I NEED TO FIND REAL EXAMPLES OF ADVANCED PATTER[N]S TO UNDERSTAND IT.

    Let me say that again: "I NEED TO FIND". Not you, not Codesaway, not anyone. JUST ME.

    Codesaway decided to reply of his own volition (and thanks again for that) a FULL WEEK (7 DAYS) AFTER MY LAST POST. I don't believe I was desperately begging for help in those 7 days. I had already applied my own solution (which was a couple of pattern statements in parts) and had forgotten about this until I got an email from the forum saying there was a reply. That's when I thanked Codesaway. It's pretty obvious I was trying to find the solution myself and to learn the correct syntax. No need to be so rude, arrogant, and gauche. Besides, the post was about correct Java regex syntax, not about how incompetent and pitiful I am, I don't think. Let's keep to the topic and try not to scare people away from posting on this forum again.
