  1. #1
    Dyukon is offline Member
    Join Date
    Nov 2011
    Posts
    2
    Rep Power
    0

    Problem with encoding Russian text between UTF-8 and Unicode

    Hello!

    I recently tried converting Russian text from a Java (Unicode) string to UTF-8 bytes and back, and found that Java seems to mishandle the Russian letter 'И' ('\u0418'). Here is my code and its output:

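    class Basic
    {
        public static void main(String[] args) throws Exception
        {
            // Build a string of the Russian letters U+0410..U+044F.
            String s = "";
            for(char ch=0x0410; ch<=0x044F; ch++)
                s += ch;
            System.out.println(s);
            // Encodes as UTF-8, but decodes with the platform default charset.
            s = new String(s.getBytes("UTF-8"));
            System.out.println(s);
            // Encodes with the platform default charset, but decodes as UTF-8.
            s = new String(s.getBytes(), "UTF-8");
            System.out.println(s);
        }
    }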

    What I see in my console:

    АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя
    РђРРРРРРРР?РРљРРњРќРћРџР*РЎРўРЈРРҐРРРЁ РРЄРРР*РРЇРРРІРіРґРРРРёР№РєРРјРЅРѕРїСЂСЃ ССѓСССССССЉССЊСЌСЋСЏ
    АБВГДЕЖЗ??ЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя

    So the only character distorted by this encoding/decoding process is 'И' ('\u0418'), but it bothers me a lot. What can I do to avoid this problem?

  2. #2
    JosAH is offline Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,732
    Blog Entries
    7
    Rep Power
    21

    Re: Problem with encoding Russian text between UTF-8 and Unicode

    The default encoding on your system is not UTF-8 (the second line of output tells you so). You should use UTF-8 for both the encoding and the decoding, as in:

    Java Code:
    s = new String(s.getBytes("UTF-8"), "UTF-8");
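
For anyone who wants to check the one-liner above, here is a minimal sketch of the same round trip. It assumes Java 7 or later so that `StandardCharsets.UTF_8` can replace the string name "UTF-8"; the class and method names are made up for illustration and are not from the thread.

```java
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {
    // Encode to UTF-8 bytes and decode with the same charset, so the
    // platform default charset never takes part in the conversion.
    static String roundTrip(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same range of Russian letters as in the original post.
        StringBuilder sb = new StringBuilder();
        for (char ch = 0x0410; ch <= 0x044F; ch++) {
            sb.append(ch);
        }
        String original = sb.toString();
        System.out.println(original.equals(roundTrip(original))); // prints "true"
    }
}
```

Comparing the strings with `equals` avoids depending on how the console itself renders Cyrillic, which is a separate encoding question.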
    kind regards,

    Jos
    cenosillicaphobia: the fear for an empty beer glass

  3. #3
    Dyukon is offline Member

    Re: Problem with encoding Russian text between UTF-8 and Unicode

    Thank you very much. I see that I should always specify the charset explicitly instead of relying on the system default.
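
The same rule applies to readers and writers, which also fall back to the platform default when no charset is given. Below is a rough sketch, assuming Java 7 or later; `ExplicitCharsetIo` and its methods are hypothetical names used only for this illustration, and in-memory streams stand in for a real file or socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetIo {
    // Write text through a Writer whose charset is named explicitly.
    static byte[] write(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(text);
        }
        return out.toByteArray();
    }

    // Read the bytes back through a Reader that names the same charset.
    static String read(byte[] bytes) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u0418"; // the letter that broke in the original post
        System.out.println(text.equals(read(write(text)))); // prints "true"
    }
}
```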

  4. #4
    JosAH is offline Moderator

    Re: Problem with encoding Russian text between UTF-8 and Unicode

    Originally posted by Dyukon:
    Thank you very much. I see that I should specify charset of encoding any time not relying on the system defaults.
    Yep, as long as you realize that the encoding and the decoding steps are equally important, that entire Unicode hoopla is easy.
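
To see why mismatched steps damaged only 'И', the sketch below reproduces the original program with the charsets spelled out. It rests on an assumption: the thread never names the poster's default charset, and windows-1251 is only a guess based on the shape of the garbled second output line. In UTF-8, 'И' is the byte pair D0 98, and 0x98 is unassigned in windows-1251, so the decoder substitutes a replacement character and the original byte is lost. `LegacyDefaultDemo` and `mismatchedRoundTrip` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LegacyDefaultDemo {
    // Assumption: stands in for the poster's platform default charset.
    static final Charset LEGACY = Charset.forName("windows-1251");

    // Mirrors the two mismatched conversions from the original post.
    static String mismatchedRoundTrip(String text) {
        // Step 1: UTF-8 bytes decoded with the legacy charset.
        String misread = new String(text.getBytes(StandardCharsets.UTF_8), LEGACY);
        // Step 2: legacy bytes decoded as UTF-8.
        return new String(misread.getBytes(LEGACY), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String a = "\u0410"; // 'А': both of its UTF-8 bytes exist in windows-1251
        String i = "\u0418"; // 'И': its second UTF-8 byte, 0x98, does not
        System.out.println(a.equals(mismatchedRoundTrip(a))); // prints "true"
        System.out.println(i.equals(mismatchedRoundTrip(i))); // prints "false"
    }
}
```

The other letters survive only by luck: every byte in their UTF-8 form happens to map to some windows-1251 character and back again.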

    kind regards,

    Jos
