  1. #1
    LeonLanford

    JTextPane Unicode/UTF-8 problem?

    Hi, I'm confused by JTextPane's behavior. I have already followed the instructions from these links:

    Swing - unable to get chinese characters in JTextPane
    converting unicode to UTF-8

    But I still have the problem.

    My JTextPane shows Japanese characters perfectly if I don't encode my string to base64, but if I encode it to base64 and then decode it, the string is displayed garbled:

    Java Code:
    らã��☆ã�™ã�Ÿ ドラマCD :):d d das sd addaaa
    Here's the base64-encoded version of that string:
    Java Code:
    44KJ44GN4piG44GZ44GfIOODieODqeODnkNEIDopOkQgZCBkYXMgc2QgYWRkYWFh
    It decodes fine with an online base64 decoder:
    Base 64 Decoder
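    A minimal sketch of what goes wrong, assuming Java 8+ so that the JDK's own `java.util.Base64` can stand in for the public-domain Base64 class used in this thread (the `DecodeDemo` class is hypothetical). Base64 decoding gives back the same 48 bytes either way; only the charset used to turn those bytes into characters differs. UTF-8 yields the intended text, while windows-1252 (cp1252, the Eclipse default mentioned later in the thread) yields mojibake starting with "ã‚‰".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DecodeDemo {
    // The base64 string from the original post.
    static final String ENCODED =
        "44KJ44GN4piG44GZ44GfIOODieODqeODnkNEIDopOkQgZCBkYXMgc2QgYWRkYWFh";

    // Base64 only restores the raw bytes; the charset decides how they become chars.
    static String decode(String base64, Charset charset) {
        byte[] raw = Base64.getDecoder().decode(base64);
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        String right = decode(ENCODED, StandardCharsets.UTF_8);
        String wrong = decode(ENCODED, Charset.forName("windows-1252"));
        // Unicode escapes keep this source file independent of javac's own encoding.
        String expected = "\u3089\u304D\u2606\u3059\u305F \u30C9\u30E9\u30DE"
            + "CD :):D d das sd addaaa";
        System.out.println(right.equals(expected));               // prints true
        System.out.println(wrong.startsWith("\u00E3\u201A\u2030")); // prints true
    }
}
```

    The `\u` escapes are deliberate: javac also reads source files with a default encoding unless told otherwise, which is the same kind of trap.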

  2. #2
    JosAH

    Well, then your base64 en/decoder behaves differently from the other one. You'll have to debug it.

    kind regards,

    Jos

  3. #3
    LeonLanford

    Different? I got the encoder from here:
    Base64: Public Domain Base64 Encoder/Decoder

    If it's different, why does it show fine in the online decoder?

  4. #4
    Norm

    Can you make a short program that compiles and executes to demonstrate your problem?
    For example, there are en/decoders (charsets) for Strings that your data may need.

    Have you tested the base64 en/decoder by feeding it a String with char values from 0 to, say, 32K?
    Encode the string, decode it, and see whether the output matches the input.
    Last edited by Norm; 08-14-2010 at 04:15 PM.
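    Norm's suggested self-test could look roughly like the sketch below (class and method names are hypothetical; `java.util.Base64` from Java 8+ again replaces the linked class). "32K" is taken as 32 * 1024 = 32768 char values, all below the surrogate range that starts at 0xD800, so each value is a complete character on its own.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class RoundTripTest {
    // Builds a String containing every char value from 0 up to (but excluding) limit.
    static String allChars(int limit) {
        StringBuilder sb = new StringBuilder(limit);
        for (int c = 0; c < limit; c++) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Encodes text to bytes with the given charset, base64-encodes them,
    // then reverses both steps and reports whether the text survived.
    static boolean roundTrips(String text, Charset charset) {
        String encoded = Base64.getEncoder().encodeToString(text.getBytes(charset));
        String decoded = new String(Base64.getDecoder().decode(encoded), charset);
        return decoded.equals(text);
    }

    public static void main(String[] args) {
        String input = allChars(32 * 1024); // 0..32767 stays below the surrogate range
        System.out.println("UTF-8:        " + roundTrips(input, StandardCharsets.UTF_8));
        System.out.println("windows-1252: " + roundTrips(input, Charset.forName("windows-1252")));
    }
}
```

    With UTF-8 the text survives; with windows-1252 it does not, because `getBytes` silently substitutes a replacement byte (typically `?`) for every character the charset cannot represent.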

  5. #5
    LeonLanford

    I just tried making an SSCCE and was confused about why it works in the SSCCE but not in the long code. After a few seconds of thinking I figured out what I had done: I changed Eclipse's default character encoding from cp1252 to UTF-8. I don't know why changing it solved the problem; maybe someone can explain it to me?

    Thanks all :D
    Attached thumbnail: JTextPane Unicode/UTF-8 problem?-sda.jpg
    Last edited by LeonLanford; 08-15-2010 at 11:17 AM.

  6. #6
    JosAH

    When the world was simple and run by VAXen, a pointer was an int, an int was a word, and all characters fitted in 8-bit bytes. English characters, that is; they even fitted in 7 bits, so there was room for 128 'foreign' characters. The 'code page' was born, and many code pages saw the light because there are many more characters than just the English ones. We used encodings based on those code pages, but there were still characters we couldn't handle very well. Then Unicode saw the light, and a very Anglo-American encoding came with it: UTF-8. The English characters are encoded in one byte and the foreign characters use more than one byte per character.

    Now we had even more encoding schemes, and each operating system used one of them as its 'default' encoding. Java inherits that default, but you can still select another encoding; that's what you did in Eclipse. Your files were encoded in UTF-8, so you should decode them as such. I prefer to set the encoding explicitly in my code (read the API documentation for, e.g., InputStreamReader) because Eclipse won't always be there to run my code and I don't want to rely on the operating system's default encoding.

    kind regards,

    Jos
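    To make the one-byte versus multi-byte point concrete, a small sketch (hypothetical class name) that counts how many bytes UTF-8 uses for a few characters. The same arithmetic explains the base64 string in the first post: its 32 characters (8 Japanese, 24 ASCII) encode to 48 bytes, which base64 turns into 64 characters.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    // Number of bytes UTF-8 needs for a single BMP character.
    static int utf8Length(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        char[] samples = {'A', '\u00E9', '\u3089'}; // 'A', e-acute, hiragana RA
        for (char c : samples) {
            // prints U+0041 -> 1, U+00E9 -> 2, U+3089 -> 3
            System.out.printf("U+%04X -> %d byte(s)%n", (int) c, utf8Length(c));
        }
    }
}
```

    Characters outside the Basic Multilingual Plane (stored in Java as surrogate pairs) take four bytes.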

  7. #7
    LeonLanford

    Yes, you're right. If I run the jar outside Eclipse, the Unicode characters get garbled again. I'll look up how to encode and decode them explicitly then :(

    Thanks for the explanation

  8. #8
    JosAH

    Reread my previous reply; I hinted at the answer: use InputStreamReaders and OutputStreamWriters. You can specify the encoding those classes should use; FileReader and friends are just convenience classes.

    kind regards,

    Jos
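    A minimal sketch of the InputStreamReader/OutputStreamWriter approach, writing and reading a temporary file with UTF-8 named explicitly (class and method names are hypothetical). Because the charset is passed in, the result does not depend on Eclipse's setting or on the operating system's default.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitEncodingIO {
    // Writes text with an explicit UTF-8 encoder instead of the platform default.
    static void write(Path file, String text) throws IOException {
        try (Writer w = new BufferedWriter(new OutputStreamWriter(
                Files.newOutputStream(file), StandardCharsets.UTF_8))) {
            w.write(text);
        }
    }

    // Reads the whole file back with an explicit UTF-8 decoder, without splitting lines.
    static String read(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new BufferedReader(new InputStreamReader(
                Files.newInputStream(file), StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("encoding-demo", ".txt");
        String text = "\u3089\u304D\u2606\u3059\u305F \u30C9\u30E9\u30DE" + "CD";
        write(tmp, text);
        System.out.println(read(tmp).equals(text)); // prints true
        System.out.println(Files.size(tmp));        // prints 27 (8 x 3 bytes + 3 ASCII bytes)
        Files.delete(tmp);
    }
}
```

    Since Java 11, `FileReader` and `FileWriter` also have constructors that take a `Charset`, and since JDK 18 (JEP 400) the default charset is UTF-8 on all platforms; naming the charset explicitly still keeps code like this correct on older runtimes.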

  9. #9
    LeonLanford

    Yes, I meant I was going to look up how to use InputStream; I've never used it before.

    I don't know if it's correct yet, but what I do is: get the string's bytes, decode them, put them into an InputStream, wrap that in an InputStreamReader, wrap that in a BufferedReader, read the lines one by one and return the resulting string.

    That's quite a lot of work compared to just changing Eclipse's setting. The decoded content also differs a bit from the original; I think some whitespace is missing (maybe because I read it line by line?), but it doesn't affect the decoded content much.

    Thanks, it's working now :D
    Last edited by LeonLanford; 08-15-2010 at 08:04 PM.
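    The pipeline described above, sketched with `java.util.Base64` (Java 8+) and a hypothetical `ReadLineDemo` class. It also shows the likely source of the missing whitespace: `BufferedReader.readLine()` returns each line without its terminator, so concatenating the lines drops every line break.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ReadLineDemo {
    // base64 -> bytes -> InputStream -> InputStreamReader -> BufferedReader -> lines
    static String decodeByLines(String base64) throws IOException {
        byte[] raw = Base64.getDecoder().decode(base64);
        InputStream in = new ByteArrayInputStream(raw);
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line); // readLine() has already stripped "\n", "\r" or "\r\n"
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "first line\r\nsecond line\r\n";
        String encoded = Base64.getEncoder()
                .encodeToString(original.getBytes(StandardCharsets.UTF_8));
        String decoded = decodeByLines(encoded);
        // prints 25|36|21: the four line-break characters are gone
        System.out.println(original.length() + "|" + encoded.length() + "|" + decoded.length());
    }
}
```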

  10. #10
    Norm

    get the string bytes
    The String.getBytes() method encodes the characters into bytes using the platform's default charset.
    Better to use a charset:
    byte[] someCharsB = someCharsS.getBytes(charset);
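    That one-liner in context: a sketch (hypothetical class name; `StandardCharsets` needs Java 7+, `java.util.Base64` Java 8+) that encodes the text from the first post with an explicit UTF-8 charset. If the en/decoder is symmetrical, the output should be exactly the base64 string quoted in the first post. Passing a `Charset` object instead of a charset name also avoids the checked `UnsupportedEncodingException` that `getBytes(String)` declares.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EncodeDemo {
    // Encode chars to bytes with an explicit charset, then base64 the bytes.
    static String toBase64(String someCharsS) {
        byte[] someCharsB = someCharsS.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(someCharsB);
    }

    public static void main(String[] args) {
        String someCharsS =
            "\u3089\u304D\u2606\u3059\u305F \u30C9\u30E9\u30DE" + "CD :):D d das sd addaaa";
        // Should match the base64 string quoted in the first post; prints true
        System.out.println(toBase64(someCharsS).equals(
            "44KJ44GN4piG44GZ44GfIOODieODqeODnkNEIDopOkQgZCBkYXMgc2QgYWRkYWFh"));
    }
}
```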

    The decoded content also differs a bit from the original content,
    Can you post a short, simple program that compiles and executes to demo that? It should be symmetrical: what goes in comes out.

  11. #11
    LeonLanford

    I attached the code if you want to see it. You need to get the Base64 class from here:
    Base64: Public Domain Base64 Encoder/Decoder

    tespaste.txt is the text I used for testing; just paste it into the top box and press Send.

  12. #12
    Norm's Avatar
    Norm is offline Moderator
    Join Date
    Jun 2008
    Location
    Eastern Florida
    Posts
    16,620
    Rep Power
    23

    Default

    There don't appear to be any comments in the code. Where does "The decoded content also differs a bit from the original content" show up?
    Was the original content: tespaste.txt?
    Does your code read that file, encode it, decode it and compare the results with the original and find that it "differs a bit"?

    If I execute your program what should I look for?

    The first thing I see is that the newline characters are removed. Showing the String lengths gives:
    System.out.println("orgnlMsg.len=" + originalMessage.length()
    + ", dcdMsg.len=" + decodedMessage.length());
    //orgnlMsg.len=820, dcdMsg.len=784
    Last edited by Norm; 08-15-2010 at 09:07 PM.

  13. #13
    LeonLanford

    The original message is the one in the txt file. You need to paste it manually into the first box and press Send to run the action listener.

    The second box is the encoded content, the third is decoded content.

    If you can't see the difference visually, add this code to check the lengths:
    Java Code:
    System.out.println(originalMessage.length()+"|"+encodedMessage.length()+"|"+decodedMessage.length());

  14. #14
    cselic

    I attached the code if you want to see, you need to get the base64 from here
    Base64: Public Domain Base64 Encoder/Decoder
    I tried to compile your program, but I got an error that Base64 could not be resolved. What should I do?

  15. #15
    LeonLanford

    Norm can run it; I don't know why you can't.

    -----

    Anyway, I solved it: I append a newline every time I read a line.
    The content looks OK visually, but the length still differs from the original. It doesn't matter much though :D
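    One possible reason the lengths still differ: appending a newline after each line gives back the exact original only when every line, including the last, ended with a single `\n`. A sketch (hypothetical class name) assuming the pasted text used `\r\n` line endings, as Windows text often does, shows the rebuilt string coming out shorter. Decoding all the bytes at once with `new String(bytes, charset)` skips the line splitting and keeps every character.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class LineEndingDemo {
    // The workaround from the post: add "\n" back after every line read.
    static String joinWithNewlines(String text) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "line one\r\nline two\r\n"; // assumed CRLF line endings
        String rebuilt = joinWithNewlines(original);
        System.out.println(original.length() + " vs " + rebuilt.length()); // prints 20 vs 18

        // Skipping the line-by-line step avoids the problem: decode the bytes in one go.
        String encoded = Base64.getEncoder()
                .encodeToString(original.getBytes(StandardCharsets.UTF_8));
        String exact = new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
        System.out.println(exact.equals(original)); // prints true
    }
}
```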

  16. #16
    Norm

    @cselic
    I have got an error that Base64 could not be resolved
    You need to download and compile the Base64 program at the link given.
