  1. #1
    Dan0100 is offline Member
    Join Date
    Aug 2010
    Posts
    18
    Rep Power
    0

    Default RegExp and UTF-8 Characters

    Hi everyone,
    I am relatively new to Java and RegExp.
    At the moment I am building a socket server where I use UTF-8 control characters (\u0000 - \u0007) for special messages.

    What I am looking for is a RegExp pattern to convert all of these to a $ symbol. My main problem is actually specifying the characters in the RegExp.

    This works:
    Java Code:
    myNewString = myString.replaceAll("\u0000", "$")
    But this doesn't work and it is what I need:
    Java Code:
    myNewString = myString.replaceAll("[\u0000-\u0007]", "$")
    Thanks for your help and reading,
    Dan

  2. #2
    Webuser is offline Senior Member
    Join Date
    Dec 2008
    Posts
    526
    Rep Power
    0

    Lightbulb

    Hello :)

    You can use code like this:

    Java Code:
    char d;
    void gogo(int c)
      {
       int a=(int)'\u0000';
       int b=(int)'\u0007';
       
       if(c>=a && c<=b){d='$';}
       
    
      }
    Last edited by Webuser; 08-14-2010 at 02:34 AM.
    If my answer helped you. Please click my "REP" button and add a comment
    Have a Good Java Coding :)

  3. #3
    Norm's Avatar
    Norm is offline Moderator
    Join Date
    Jun 2008
    Location
    SW Missouri
    Posts
    17,321
    Rep Power
    25

    Default

    I get the following error with these lines of code:
    String myString = "First this \u0000 for testing";
    String myNewString = myString.replaceAll("\u0000", "$"); // line 31


    Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 1
    at java.lang.String.charAt(String.java:687)
    at java.util.regex.Matcher.appendReplacement(Matcher.java:711)
    at java.util.regex.Matcher.replaceAll(Matcher.java:813)
    at java.lang.String.replaceAll(String.java:2190)
    at TestStatement.main(TestStatement.java:31)
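
    The failure above can be reproduced in isolation. The sketch below is only an illustration: the class and method names are made up here, and it catches RuntimeException broadly because the exact exception type seems to depend on the Java version (the trace above shows StringIndexOutOfBoundsException; newer releases appear to throw IllegalArgumentException instead).

```java
public class DollarReplacementDemo {

    // Hypothetical helper: returns the simple class name of whatever
    // replaceAll throws, or "none" if the call completes normally.
    static String tryReplace(String input, String regex, String replacement) {
        try {
            input.replaceAll(regex, replacement);
            return "none";
        } catch (RuntimeException e) {
            // The concrete type differs between Java versions, so it is
            // reported rather than assumed.
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        String myString = "First this \u0000 for testing";

        // A bare "$" in the replacement starts a group reference with no group number.
        System.out.println("bare $    -> " + tryReplace(myString, "\u0000", "$"));

        // An escaped dollar sign is treated as a literal character.
        System.out.println("escaped $ -> " + tryReplace(myString, "\u0000", "\\$"));
    }
}
```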

  4. #4
    Webuser is offline Senior Member
    Join Date
    Dec 2008
    Posts
    526
    Rep Power
    0

    Exclamation

    Or this one...

    Java Code:
    char gogo(int c)
      {
       int a=(int)'\u0000';
       int b=(int)'\u0007';
       char d=(char)c;
       if(c>=a && c<=b){d='$';}
    
       System.out.println(d);
       return d;
      }
    Last edited by Webuser; 08-14-2010 at 03:13 AM.
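
    For comparison, the same per-character check can be applied to a whole string without any regex. This is a minimal sketch, not Webuser's code: the class and method names are invented for illustration, and it assumes every character from U+0000 to U+0007 should become a dollar sign.

```java
public class ControlCharFilter {

    // Hypothetical helper: replaces every char in the range U+0000..U+0007
    // with '$' and copies all other chars unchanged.
    static String maskControlChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // char is unsigned, so only the upper bound needs checking.
            out.append(c <= '\u0007' ? '$' : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskControlChars("A\u0000B\u0007C")); // prints A$B$C
    }
}
```

    No escaping is needed here because the dollar sign is an ordinary char, not a regex replacement string.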

  5. #5
    DarrylBurke's Avatar
    DarrylBurke is offline Member
    Join Date
    Sep 2008
    Location
    Madgaon, Goa, India
    Posts
    11,197
    Rep Power
    19

    Default

    0.
    This works:
    Java Code:
    myNewString = myString.replaceAll("\u0000", "$")
    I don't think so. That code should produce the error Norm reported. Probably the code you posted isn't the code you ran. Don't do that -- it greatly reduces the chances of getting targeted help on a forum.

    1. The dollar sign is a metacharacter in the replacement String and needs to be quoted with a double backslash.

    2. There's no problem with the character class or the unicode characters.

    Java Code:
    myNewString = myString.replaceAll("[\u0000-\u0007]", "\\$")
    db

  6. #6
    JosAH's Avatar
    JosAH is offline Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,447
    Blog Entries
    7
    Rep Power
    20

    Default

    Quote Originally Posted by Webuser View Post
    Hello :)

    You can use code like a

    Java Code:
    char d;
    void gogo(int c)
      {
       int a=(int)'\u0000';
       int b=(int)'\u0007';
       
       if(c>=a && c<=b){d='$';}
       
    
      }
    Don't you people ever read the API documentation? The docs for the String.replaceAll( ... ) method clearly state:

    Quote Originally Posted by API
    Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll. Use Matcher.quoteReplacement(java.lang.String) to suppress the special meaning of these characters, if desired.
    So the dollar sign should be escaped, as in "\\$"
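
    The Matcher.quoteReplacement method mentioned in the API quote does the escaping for you, which helps when the replacement text is not a fixed literal. A small sketch, with a class name and helper invented for this example:

```java
import java.util.regex.Matcher;

public class QuoteReplacementDemo {

    // Hypothetical helper: replaces the control characters U+0000..U+0007
    // with the given text, treated literally rather than as a replacement pattern.
    static String replaceControlChars(String input, String literalReplacement) {
        return input.replaceAll("[\u0000-\u0007]",
                Matcher.quoteReplacement(literalReplacement));
    }

    public static void main(String[] args) {
        // quoteReplacement escapes the dollar sign, so "$" is safe to pass in.
        System.out.println(replaceControlChars("A\u0001B\u0007C", "$")); // prints A$B$C
    }
}
```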

    kind regards,

    Jos

  7. #7
    Fubarable's Avatar
    Fubarable is offline Moderator
    Join Date
    Jun 2008
    Posts
    19,316
    Blog Entries
    1
    Rep Power
    26

    Default

    Don't the Unicode escape sequences need to be escaped with double backslashes as well? i.e.,
    Java Code:
    public class UnicodeRegEx {
       public static void main(String[] args) {
          String test = "Hello World";
          
          System.out.println(test);
          
          String regex = "[\\u0061-\\u0079]";
          
          test = test.replaceAll(regex, "\\$");
          System.out.println(test);
       }
    }

  8. #8
    JosAH's Avatar
    JosAH is offline Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,447
    Blog Entries
    7
    Rep Power
    20

    Default

    Quote Originally Posted by Fubarable View Post
    Don't the unicode escape sequences need to be escaped or double backslashes as well? i.e.,
    Java Code:
    public class UnicodeRegEx {
       public static void main(String[] args) {
          String test = "Hello World";
          
          System.out.println(test);
          
          String regex = "[\\u0061-\\u0079]";
          
          test = test.replaceAll(regex, "\\$");
          System.out.println(test);
       }
    }
    Nope, those \uxxxx escape sequences are handled by javac, the Java compiler; the regexp compiler doesn't care about special characters in funny intervals, it just cares about its meta characters, so the regexp is:

    Java Code:
    String regex = "[\u0061-\u0079]";
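
    As far as the java.util.regex.Pattern documentation goes, the regex engine also understands its own four-hex-digit Unicode escape, so the doubled-backslash form appears to be accepted too; the difference is only who decodes it (javac or the regex compiler). The sketch below, with names invented for illustration, runs both forms on the test string from the earlier post:

```java
public class UnicodeEscapeForms {

    // The compiler decodes the escapes, so the regex engine sees the class [a-y].
    static String viaCompilerEscape(String s) {
        return s.replaceAll("[\u0061-\u0079]", "\\$");
    }

    // The doubled backslash survives compilation, so the regex engine
    // receives the escape text and decodes it itself.
    static String viaRegexEscape(String s) {
        return s.replaceAll("[\\u0061-\\u0079]", "\\$");
    }

    public static void main(String[] args) {
        String test = "Hello World";
        System.out.println(viaCompilerEscape(test)); // prints H$$$$ W$$$$
        System.out.println(viaRegexEscape(test));    // prints H$$$$ W$$$$
    }
}
```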
    kind regards,

    Jos

  9. #9
    Fubarable's Avatar
    Fubarable is offline Moderator
    Join Date
    Jun 2008
    Posts
    19,316
    Blog Entries
    1
    Rep Power
    26

  10. #10
    Dan0100 is offline Member
    Join Date
    Aug 2010
    Posts
    18
    Rep Power
    0

    Default

    Thanks for all the replies everyone!
    Oh no I was so unlucky to choose the $ dollar symbol lol.

    Ok so all I needed was to add the \\$ instead of $ and it works perfectly now, thank you very much guys.

    Working code:
    Java Code:
    msg = msg.replaceAll("[\u0000-\u0007]", "\\$");
    Norm + Darryl.Burke, I understand now that the code I ran SHOULDN'T have worked but for some reason it honestly DID work on my compiler.
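
    One possible explanation, offered only as a guess about what happened: the replacement string is processed only when the pattern actually matches, so an unescaped "$" goes unnoticed on input that contains no \u0000 at all. The sketch below uses invented names to show the difference:

```java
public class NoMatchDemo {

    // Same call as in the first post: unescaped "$" as the replacement.
    static String replaceNul(String input) {
        return input.replaceAll("\u0000", "$");
    }

    // Hypothetical helper: reports whether the call above throws for this input.
    static boolean replaceThrows(String input) {
        try {
            replaceNul(input);
            return false;
        } catch (RuntimeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // No NUL in the input: the replacement is never examined.
        System.out.println(replaceThrows("no control characters here")); // prints false

        // A NUL in the input: the bare "$" is parsed and rejected.
        System.out.println(replaceThrows("with \u0000 inside"));         // prints true
    }
}
```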

    JosAH, I did read the documentation A LOT and I can't find anything about $ symbols here: String (Java 2 Platform SE v1.4.2)

    Thanks again everyone!

  11. #11
    JosAH's Avatar
    JosAH is offline Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,447
    Blog Entries
    7
    Rep Power
    20

    Default

    Quote Originally Posted by Dan0100 View Post
    JosAH, I did read the documentation A LOT and I can't find anything about $ symbols here: String (Java 2 Platform SE v1.4.2))
    I even copied/pasted the relevant parts of the docs for you (see reply #6). By the way, Java 1.4.2 is dead; you should update your JVM and your documentation.

    kind regards,

    Jos

  12. #12
    Dan0100 is offline Member
    Join Date
    Aug 2010
    Posts
    18
    Rep Power
    0

    Default

    Oh OK. Java SE 6, sorry, but Google says... lol!
    Thanks.
