- 02-19-2009, 06:25 AM #1
Member
- Join Date
- Feb 2009
- Posts
- 3
Error while parsing HTML page in Java on Linux
I am parsing an HTML page with an HTML parsing utility; I am using cobra.jar and js.jar for that.
The page contains some unreadable special characters, such as ' � '. When I compile my program on Windows, it compiles properly and runs fine.
But when I compile it on Linux, javac gives me the following warning:
unmappable character for encoding UTF8
String stateZipArray[] = stateZip.trim().split(" � ");
Then, when I access elements of the stateZipArray array, it throws an ArrayIndexOutOfBoundsException.
For the InputStreamReader, I am using 'ISO-8859-1' as the charset name.
Can anyone please tell me what the problem is and how I can resolve it?
Thanks in advance.
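A minimal sketch of one common workaround follows. It assumes the unreadable character is a non-breaking space (U+00A0, byte 0xA0 in ISO-8859-1). The post does not say what the real character is, so substitute the correct code point. The separator is written as a Unicode escape, which keeps the source file pure ASCII, so it compiles the same under any ASCII-compatible encoding javac assumes. javac's `-encoding` option is the other way to tell it how a source file is encoded. The class and method names are hypothetical. The length check avoids the ArrayIndexOutOfBoundsException when the separator is not found.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class StateZipSplit {
    // Hypothetical separator: assumes the unreadable character is a
    // non-breaking space (U+00A0). Writing it as a Unicode escape keeps the
    // source file pure ASCII, so the platform's source encoding no longer matters.
    static final String SEPARATOR = " \u00A0 ";

    static String[] splitStateZip(String stateZip) {
        // Pattern.quote makes split() treat the separator literally, not as a regex.
        return stateZip.trim().split(Pattern.quote(SEPARATOR));
    }

    public static void main(String[] args) throws IOException {
        // Simulated page bytes in ISO-8859-1: "CA", space, 0xA0, space, "94105".
        byte[] page = {'C', 'A', ' ', (byte) 0xA0, ' ', '9', '4', '1', '0', '5'};
        InputStream in = new ByteArrayInputStream(page);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1))) {
            String stateZip = reader.readLine();
            String[] parts = splitStateZip(stateZip);
            // Guard against ArrayIndexOutOfBoundsException if the separator is missing.
            if (parts.length >= 2) {
                System.out.println("state=" + parts[0] + ", zip=" + parts[1]);
            } else {
                System.out.println("separator not found in: " + stateZip);
            }
        }
    }
}
```

Run as-is, this prints `state=CA, zip=94105`. If the separator is wrong, the program reports it instead of crashing.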
- 02-20-2009, 02:20 AM #2
Senior Member
- Join Date
- Jan 2009
- Posts
- 671
My guess is that your Linux machine has an English-language locale as its default.
If that's the problem, there are two ways to set the locale manually. See this article on the subject:
Setting the Default Locale (Java Developers Almanac Example)
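Below is a small sketch of checking and overriding the default from inside the program with `Locale.setDefault`. The class name and the choice of `Locale.US` are illustrative assumptions. The sketch also prints `Charset.defaultCharset()`, because the locale and the charset used to decode text are separate settings.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class LocaleCheck {
    public static void main(String[] args) {
        // Show what the JVM picked up from the operating system.
        System.out.println("default locale:  " + Locale.getDefault());
        System.out.println("default charset: " + Charset.defaultCharset());

        // Override the default locale for this JVM only
        // (US English is an arbitrary choice for illustration).
        Locale.setDefault(Locale.US);
        System.out.println("locale now:      " + Locale.getDefault());

        // Changing the locale does not change the default charset,
        // which is fixed when the JVM starts.
        System.out.println("charset still:   " + Charset.defaultCharset());
    }
}
```

The printed defaults depend on the machine, so running this on both the Windows and the Linux box is a quick way to see how they differ.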