  1. #1
    vaskarbasak is offline Member
    Join Date
    May 2008
    Posts
    13
    Rep Power
    0

    Identify the language type from a given String.

    Hi all,

    Do you have any sample source code, or any idea how I can identify the language of a given String?

    e.g.

    “林悦旻” - Chinese language
    “ABC” - English language
    etc.

    Thanks!
    vaskar

  2. #2
    kurenai is offline Member
    Join Date
    Jun 2008
    Posts
    9
    Rep Power
    0

    Default

    Where are you getting the string from?

  3. #3
    kurenai is offline Member
    Join Date
    Jun 2008
    Posts
    9
    Rep Power
    0

    Default

    Maybe this will help you: jchardet.sourceforge.net

    I don't know how it could possibly be accurate, though.

  4. #4
    Niveditha is offline Senior Member
    Join Date
    May 2008
    Posts
    307
    Rep Power
    7

    Thumbs up

    Hi,
    The code at this link is for Japanese, and it seems it could be adapted for Chinese as well. I don't know Chinese, so I can't help you with the code itself.
    Java - Chinese Language Processing and Chinese Computing
    To finish sooner, take your own time....
    Nivedithaaaa

  5. #5
    Norm is offline Moderator
    Join Date
    Jun 2008
    Location
    SW Missouri
    Posts
    17,305
    Rep Power
    25

    Default

    Since a String is made up of Unicode characters, convert one of the characters in the string to an int value and see where it falls in the range of two-byte char values. For example, ASCII/English chars range from 0 to 127 (0 to 255 if you include Latin-1). To guess the language/alphabet a char is from, you need a table that maps ranges of Unicode characters to each language.
    Something like: English 0-255, Japanese 1200-1400, etc. (illustrative numbers only) for the full range of char values, 0-64K.
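
A minimal sketch of the range-table idea, assuming a reasonably recent JDK. Java already ships such a table: Character.UnicodeBlock.of(int) maps a code point to its Unicode block. The class name BlockLookup and the method blockOf are made up for illustration. Keep in mind that a block identifies a script or character range, not a language, so this only narrows the guess.

```java
final class BlockLookup {
    // Returns the Unicode block of the first code point in s,
    // or null if s is null/empty or the code point is in no block.
    static Character.UnicodeBlock blockOf(String s) {
        if (s == null || s.isEmpty()) {
            return null;
        }
        return Character.UnicodeBlock.of(s.codePointAt(0));
    }

    public static void main(String[] args) {
        System.out.println(blockOf("ABC"));    // BASIC_LATIN
        System.out.println(blockOf("\u6797")); // CJK_UNIFIED_IDEOGRAPHS
        System.out.println(blockOf("\uAC00")); // HANGUL_SYLLABLES
    }
}
```

Looking only at the first character is the simplest form of the suggestion; a mixed string would need every code point checked.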

  6. #6
    vaskarbasak is offline Member
    Join Date
    May 2008
    Posts
    13
    Rep Power
    0

    Default

    Can you give me some sample code? Where do I get the Unicode range table? Can you give me a URL?

    Thanks!
    vaskar

  7. #7
    Eranga is offline Moderator
    Join Date
    Jul 2007
    Location
    Colombo, Sri Lanka
    Posts
    11,372
    Blog Entries
    1
    Rep Power
    20

  8. #8
    vaskarbasak is offline Member
    Join Date
    May 2008
    Posts
    13
    Rep Power
    0

    Default

    Where do I get this information:

    English 0-255, Japanese 1200-1400, etc.?

    Please help me.

  10. #10
    Nicholas Jordan is offline Senior Member
    Join Date
    Jun 2008
    Location
    Southwest
    Posts
    1,018
    Rep Power
    8

    Use Norm's suggestion.

    vaskarbasak, this is a remarkably involved subject. I found some work on it that runs to literally millions of lines of text, and there are issues that are not apparent to people who only know ISO Latin-1.

    Just digging through the available information would require writing specialized Java programs. It would be better if you told us what you are trying to achieve. Java has a remarkable ability to handle a String as a String without the coder having to disentangle the Unicode Consortium's work.

    Have you ever read an RFC?
    Introduction to Programming Using Java.
    Cybercartography: A new theoretical construct proposed by D.R. Fraser Taylor

  11. #11
    sravan_tel is offline Member
    Join Date
    Jul 2008
    Posts
    4
    Rep Power
    0

    Chinese character issue

    Hey Vaskarbasak,
    I need exactly the same thing now. Please help me; I hope you have solved it by now.
    If anybody has any idea, please tell me.

  13. #13
    sravan_tel is offline Member
    Join Date
    Jul 2008
    Posts
    4
    Rep Power
    0

    Red face

    I found the answer from Niveditha useful for my problem, for Japanese. I have the same requirement for Chinese and Korean as well.
    As for the character ranges mentioned above, I don't dare decide the language of a String just from a range of values.
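
The hesitation above is fair: a code point range identifies a script, not a language, and Han characters are shared by Chinese, Japanese and Korean text. One common workaround, sketched here under the assumption of Java 7 or later (for Character.UnicodeScript), is to look for scripts that only one of the languages uses. The class CjkGuess, the method guess and the returned labels are hypothetical, and the rule is a heuristic: a Japanese string written only in kanji would be reported as Chinese.

```java
import java.lang.Character.UnicodeScript;

final class CjkGuess {
    // Heuristic: kana implies Japanese, Hangul implies Korean,
    // Han with neither is assumed to be Chinese.
    static String guess(String s) {
        boolean han = false, kana = false, hangul = false;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp); // step over surrogate pairs correctly
            UnicodeScript script = UnicodeScript.of(cp);
            if (script == UnicodeScript.HAN) {
                han = true;
            } else if (script == UnicodeScript.HIRAGANA || script == UnicodeScript.KATAKANA) {
                kana = true;
            } else if (script == UnicodeScript.HANGUL) {
                hangul = true;
            }
        }
        if (kana) return "Japanese";
        if (hangul) return "Korean";
        if (han) return "Chinese"; // could also be kanji-only Japanese
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(guess("\u6797"));       // Chinese
        System.out.println(guess("\u6797\u3042")); // Japanese
        System.out.println(guess("\uD55C\uAE00")); // Korean
        System.out.println(guess("ABC"));          // unknown
    }
}
```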

  14. #14
    Eranga is offline Moderator
    Join Date
    Jul 2007
    Location
    Colombo, Sri Lanka
    Posts
    11,372
    Blog Entries
    1
    Rep Power
    20

    Default

    Niveditha's link talks about Unicode. So what you have to do is find the correct range of Unicode values for each language.

  15. #15
    sravan_tel is offline Member
    Join Date
    Jul 2008
    Posts
    4
    Rep Power
    0

    Default

    Hey Eranga,
    Thank you so much. I got it, and it's working fine. :)

  16. #16
    Eranga is offline Moderator
    Join Date
    Jul 2007
    Location
    Colombo, Sri Lanka
    Posts
    11,372
    Blog Entries
    1
    Rep Power
    20

    Default

    Nice to hear that. If you can, it would be good to briefly explain here how you did it, so that other members can later follow the approach you took to solve the problem.

  17. #17
    Niveditha is offline Senior Member
    Join Date
    May 2008
    Posts
    307
    Rep Power
    7

    Default

    Yes, Eranga's suggestion is right; at least no one else will have to spend another month finding the same solution. :)
    To finish sooner, take your own time....
    Nivedithaaaa

  18. #18
    sravan_tel is offline Member
    Join Date
    Jul 2008
    Posts
    4
    Rep Power
    0

    Default

    Here's a brief explanation of my problem and solution.
    I wanted to recognize Chinese (both traditional and simplified), Japanese and Korean. In our code we can't recognize these characters directly the way we do for English characters/strings. It is done with the help of the Unicode character set: each script has its own range of values for its characters. For example,
    for Korean the range is '\uAC00' to '\uD7A3', which means every Korean syllable has a value within this range. So if a character falls in this range, we conclude that it belongs to Korean.
    Please note that this range is the Hangul Syllables block, which is only one of the Unicode blocks used for Korean; I have seen that there are others (though in practice we won't see much difference).
    Please make sure your Java source file is saved as Unicode (UTF-8).
    More questions? Mail me.
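
The range check described in this post can be sketched as follows. The bounds U+AC00 and U+D7A3 are the ones quoted above; the class HangulCheck and its method names are made up for illustration, and treating "contains at least one syllable" as "is Korean" is an assumption, not something the post states.

```java
final class HangulCheck {
    // True if c lies in the Hangul Syllables range, U+AC00 through U+D7A3.
    static boolean isHangulSyllable(char c) {
        return c >= '\uAC00' && c <= '\uD7A3';
    }

    // True if at least one char of s is a Hangul syllable.
    static boolean containsHangul(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (isHangulSyllable(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsHangul("\uD55C\uAE00")); // true
        System.out.println(containsHangul("ABC"));          // false
    }
}
```

Writing the characters as \u escapes keeps the source file plain ASCII, which sidesteps the file-encoding issue mentioned in the post.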
