
06-25-2008, 01:08 PM
|
|
Member
|
|
Join Date: May 2008
Posts: 12
|
|
|
Identify the language type from a given String.
Hi all,
Do you have some sample source code, or any idea how I can identify the language of a given String?
e.g.
“林悦旻” - Chinese language
“ABC” - English language
etc.
Thanks!
vaskar
|
|

06-25-2008, 05:46 PM
|
|
Member
|
|
Join Date: Jun 2008
Posts: 9
|
|
|
Where are you getting the string from?
|
|

06-25-2008, 06:17 PM
|
|
Member
|
|
Join Date: Jun 2008
Posts: 9
|
|
|
Maybe this will help you: jchardet.sourceforge.net
I don't know how accurate it could possibly be, though...
|
|

06-25-2008, 07:33 PM
|
 |
Senior Member
|
|
Join Date: May 2008
Posts: 282
|
|
Hi,
The code at this link is for the Japanese language, and it seems it could be adapted for Chinese as well. As I don't know Chinese, I can't help you with the code.
Java - Chinese Language Processing and Chinese Computing
__________________
To finish sooner, take your own time....
Nivedithaaaa
|
|

06-27-2008, 05:33 PM
|
 |
Senior Member
|
|
Join Date: Jun 2008
Location: SW MO, USA
Posts: 975
|
|
|
Since a String is made up of Unicode characters, convert one of the characters in the string to an int value and see where that value falls within the full range of Unicode characters. For example, ASCII characters range from 0 to 127, and the Latin-1 characters used by most Western European languages extend that to 255. To guess the language/alphabet a char is from, you need a table that maps the ranges of Unicode values to each alphabet.
Something like: English 0-255, Japanese 1200-1400, etc. (illustrative numbers only, not the real ranges), for the full range of values 0-64K that a two-byte Java char can hold.
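The range-table idea above can be sketched roughly as follows. This is only a minimal illustration: the class name `UnicodeRangeTable` and the method `lookup` are made up for this example, and the table lists just a handful of blocks from the Unicode code charts rather than the full set. Note that the ranges identify a script or block, not a language: CJK ideographs, for instance, are shared by Chinese and Japanese.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: a tiny hand-written range table, far from complete.
class UnicodeRangeTable {

    // Maps a block name to its {first, last} code point (both inclusive).
    private static final Map<String, int[]> RANGES = new LinkedHashMap<>();
    static {
        RANGES.put("Basic Latin", new int[] {0x0000, 0x007F});
        RANGES.put("Hiragana", new int[] {0x3040, 0x309F});
        RANGES.put("Katakana", new int[] {0x30A0, 0x30FF});
        RANGES.put("CJK Unified Ideographs", new int[] {0x4E00, 0x9FFF});
        RANGES.put("Hangul Syllables", new int[] {0xAC00, 0xD7AF});
    }

    // Returns the name of the first range containing the code point, or "Unknown".
    static String lookup(int codePoint) {
        for (Map.Entry<String, int[]> entry : RANGES.entrySet()) {
            int[] range = entry.getValue();
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return entry.getKey();
            }
        }
        return "Unknown";
    }

    public static void main(String[] args) {
        // One Latin letter, one CJK ideograph, one Hangul syllable.
        String sample = "A\u6797\uD55C";
        sample.codePoints().forEach(cp ->
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase() + " -> " + lookup(cp)));
    }
}
```

Iterating with `codePoints()` rather than `charAt()` means characters outside the 0-64K range, which Java stores as two chars, are looked up as a single value.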
|
|

06-28-2008, 08:51 AM
|
|
Member
|
|
Join Date: May 2008
Posts: 12
|
|
|
Can you give me some sample code? Where do I get the Unicode range table? Can you give me a URL?
Thanks!
vaskar
|
|

06-28-2008, 08:56 AM
|
 |
Moderator
|
|
Join Date: Jul 2007
Location: Colombo, Sri Lanka
Posts: 3,065
|
|
__________________
Use an appropriate Subject. "Help, urgent!" isn't one. To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
Has someone helped you? Then you can To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts. their helpful post.
Want to make your IDE the best? To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts. To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts. (Close on September 4, 2008)
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
|
|

06-28-2008, 11:52 AM
|
|
Member
|
|
Join Date: May 2008
Posts: 12
|
|
|
Where do I get this information:
English 0-255, Japanese 1200-1400, etc.?
Please help me.
|
|

06-28-2008, 12:07 PM
|
 |
Moderator
|
|
Join Date: Jul 2007
Location: Colombo, Sri Lanka
Posts: 3,065
|
|
|
You have to use the Unicode tables.
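For what it's worth, the JDK already carries a copy of these tables, so they may not need to be typed in by hand. The sketch below assumes Java 7 or later (for `Character.UnicodeScript`; `Character.UnicodeBlock.of(int)` is older). The class name `BuiltInUnicodeTables` and its two wrapper methods are invented for this example.

```java
// Minimal sketch: ask the JDK which Unicode block and script a code point belongs to.
class BuiltInUnicodeTables {

    // Returns the Unicode block, or null if the code point is in no defined block.
    static Character.UnicodeBlock blockOf(int codePoint) {
        return Character.UnicodeBlock.of(codePoint);
    }

    // Returns the Unicode script (LATIN, HAN, HIRAGANA, HANGUL, ...). Requires Java 7+.
    static Character.UnicodeScript scriptOf(int codePoint) {
        return Character.UnicodeScript.of(codePoint);
    }

    public static void main(String[] args) {
        // Latin letter, CJK ideograph, Hiragana letter, Hangul syllable.
        String sample = "A\u6797\u3042\uD55C";
        sample.codePoints().forEach(cp ->
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase()
                + " block=" + blockOf(cp) + " script=" + scriptOf(cp)));
    }
}
```

Blocks are contiguous ranges from the code charts, while scripts group characters by writing system, which is usually closer to what a language guess needs.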
|
|

06-28-2008, 06:18 PM
|
 |
Senior Member
|
|
Join Date: Jun 2008
Location: Southwest
Posts: 422
|
|
|
Use Norm's suggestion.
vaskarbasak, this is a remarkably involved subject. I found some work on it that is literally millions of lines of text, and there are issues that are not apparent to native ISO Latin-1 speakers.
Just digging through the available information would require writing specialized Java programs. It would be better if you told us what you are trying to achieve. Java has a remarkable ability to handle a String as a String without the coder having to disentangle the Unicode Consortium.
Have you ever read an RFC?
__________________
Cybercartography: A new theoretical construct proposed by D.R. Fraser Taylor
|
|

07-24-2008, 06:58 AM
|
|
Member
|
|
Join Date: Jul 2008
Posts: 4
|
|
|
Chinese character issue
Hey Vaskarbasak,
I now need exactly the same thing. Please help me; I hope you have solved it by now.
If anybody has any idea, please tell me.
|
|

07-24-2008, 07:00 AM
|
 |
Moderator
|
|
Join Date: Jul 2007
Location: Colombo, Sri Lanka
Posts: 3,065
|
|
Did you go through all the replies in this thread? There are lots of hints for you. 
|
|

07-24-2008, 07:11 AM
|
|
Member
|
|
Join Date: Jul 2008
Posts: 4
|
|
|
The answer from Niveditha was useful for my problem, for Japanese. I have the same requirement for Chinese and Korean as well.
As for the character ranges given above, I'm not confident enough to decide the language of a String just from ranges of values.
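That caution seems justified: a range or script lookup says which writing system a character belongs to, not which language the text is in, and Han ideographs in particular appear in Chinese and in Japanese. One possible heuristic, sketched below under those caveats, is to tally the scripts across the whole string and let kana or Hangul tip the decision. The class `ScriptGuesser`, its methods, and the returned labels are all hypothetical, and the ordering of the checks is an assumption rather than an established rule.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical heuristic: guess a likely language from the scripts present in a string.
class ScriptGuesser {

    // Counts how many code points of each Unicode script occur in the text.
    static Map<Character.UnicodeScript, Integer> countScripts(String text) {
        Map<Character.UnicodeScript, Integer> counts =
            new EnumMap<>(Character.UnicodeScript.class);
        text.codePoints().forEach(cp ->
            counts.merge(Character.UnicodeScript.of(cp), 1, Integer::sum));
        return counts;
    }

    // Returns a rough label; this is a guess, not a reliable identification.
    static String guess(String text) {
        Map<Character.UnicodeScript, Integer> counts = countScripts(text);
        if (counts.containsKey(Character.UnicodeScript.HIRAGANA)
                || counts.containsKey(Character.UnicodeScript.KATAKANA)) {
            return "JAPANESE";   // kana is used only for Japanese in practice
        }
        if (counts.containsKey(Character.UnicodeScript.HANGUL)) {
            return "KOREAN";
        }
        if (counts.containsKey(Character.UnicodeScript.HAN)) {
            return "CHINESE";    // could also be Japanese text written without kana
        }
        if (counts.containsKey(Character.UnicodeScript.LATIN)) {
            return "LATIN";      // English, French, German, ...: script alone cannot tell
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        System.out.println(guess("\u6797\u60A6\u65FB")); // the name from the first post
        System.out.println(guess("ABC"));
    }
}
```

Telling simplified from traditional Chinese, or English from other Latin-script languages, needs more than this, for example word lists or statistical models.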
|
|

07-24-2008, 07:16 AM
|
 |
Moderator
|
|
Join Date: Jul 2007
Location: Colombo, Sri Lanka
Posts: 3,065
|
|
|
Niveditha's link talks about Unicode. So what you have to do is find the correct range of Unicode values for each language.
|
|

07-29-2008, 11:49 AM
|
|
Member
|
|
Join Date: Jul 2008
Posts: 4
|
|
Hey Eranga,
Thank you so much. I got it, and it is working fine.
|
|

07-29-2008, 12:13 PM
|
 |
Moderator
|
|
Join Date: Jul 2007
Location: Colombo, Sri Lanka
Posts: 3,065
|
|
|
Nice to hear that. If you can, I think it would be good to briefly explain here how you did it, so that other members can later follow the approach you took to solve the problem.
|
|

07-29-2008, 12:38 PM
|
 |
Senior Member
|
|
Join Date: May 2008
Posts: 282
|
|
Yes, Eranga's suggestion is right; at least no one else will have to spend another month finding the same solution...
__________________
To finish sooner, take your own time....
Nivedithaaaa
|
|

07-31-2008, 11:24 AM
|
|
Member
|
|
Join Date: Jul 2008
Posts: 4
|
|
|
Here's a brief explanation of my problem and solution.
I want to recognize Chinese (both traditional and simplified), Japanese and Korean. In our code we can't recognize these characters directly as we do for English characters/strings. This is done with the help of the Unicode character set. Each language has a different range of values to represent its characters. For example,
for the Korean language the range of values is '\uAC00' to '\uD7A3', which means every Korean letter has a value within this range. In this way we can conclude that a letter belongs to the Korean language.
Please note that the range above belongs to Hangul Syllables, which is one of several Korean character ranges I have seen (though in practice we won't see much difference).
Please make sure your Java file is saved in a Unicode (UTF-8) format.
More questions? Mail me.
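The poster's actual code is not shown in the thread; the check described above might look something like the following sketch. The class `HangulCheck` and its method names are invented for illustration, and only the Hangul Syllables range quoted in the post is tested.

```java
// Minimal sketch of the range check described above, for Hangul Syllables only.
class HangulCheck {

    // True if the char lies in the Hangul Syllables range U+AC00 to U+D7A3.
    static boolean isHangulSyllable(char c) {
        return c >= '\uAC00' && c <= '\uD7A3';
    }

    // True if the string is non-empty and every char is a Hangul syllable.
    static boolean isAllHangulSyllables(String text) {
        if (text.isEmpty()) {
            return false;
        }
        for (int i = 0; i < text.length(); i++) {
            if (!isHangulSyllable(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Escapes are used so the result does not depend on the source file encoding.
        System.out.println(isAllHangulSyllables("\uD55C\uAE00"));
        System.out.println(isAllHangulSyllables("ABC"));
    }
}
```

Writing the characters as Unicode escapes, as here, sidesteps the file-encoding concern; literal Korean text in the source would need the file saved and compiled as UTF-8.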
|
|

07-31-2008, 11:29 AM
|
 |
Moderator
|
|
Join Date: Jul 2007
Location: Colombo, Sri Lanka
Posts: 3,065
|
|
That's fine. Now anyone who refers to this thread can get a brief idea of what he/she has to do.
|
|