  1. #1
    rdhaware, Member

    Error while parsing an HTML page in Java on Linux

    I am parsing an HTML page with an HTML parsing utility; I am using cobra.jar and js.jar for that.

    The page contains some unreadable special characters, like ' � '. When I compile my program on Windows, it compiles properly and runs fine.

    But when I compile it on Linux, it gives me the following warning:
    unmappable character for encoding UTF8
    String stateZipArray[] = stateZip.trim().split(" � ");

    Then, when accessing elements of the stateZipArray array, it throws an ArrayIndexOutOfBoundsException.

    In the InputStreamReader class I am using 'ISO-8859-1' as the charset name.

    Can anyone please tell me what the problem is and how I can resolve it?

    Thanks in advance.
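
One likely explanation for the warning and the exception described above: javac on the Linux machine decodes the source file as UTF-8, the byte for the special character inside the string literal is not valid UTF-8, and the literal no longer matches the text read at run time. The split then returns a single element, and indexing the second one fails. The sketch below is only an illustration under assumptions: it keeps the source file pure ASCII by writing the separator as a Unicode escape, and it checks the array length before indexing. The separator U+00A0 (non-breaking space), the class name `StateZipSplit`, and the sample text "CA ... 94105" are all hypothetical, since the real character was lost from the post. Another option is to tell javac the file's actual encoding with its `-encoding` option.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class StateZipSplit {
    // Hypothetical separator: space, non-breaking space (U+00A0), space.
    // The real character from the post is unknown; substitute its code point.
    // Written as an escape so the source compiles the same under any encoding.
    static final String SEPARATOR = " \u00A0 ";

    // Splits the text on the separator; callers must check the length.
    static String[] splitStateZip(String stateZip) {
        return stateZip.trim().split(SEPARATOR);
    }

    // Reads the first line of a stream, decoding the bytes as ISO-8859-1.
    static String readFirstLine(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the downloaded page: made-up sample data.
        byte[] page = "CA \u00A0 94105".getBytes(StandardCharsets.ISO_8859_1);
        String line = readFirstLine(new ByteArrayInputStream(page));

        String[] parts = splitStateZip(line);
        if (parts.length >= 2) {
            System.out.println(parts[0] + " / " + parts[1]);
        } else {
            // Avoids ArrayIndexOutOfBoundsException when nothing matched.
            System.out.println("Separator not found in: " + line);
        }
    }
}
```

The length check does not fix the encoding mismatch; it only turns a crash into a visible message, which makes a wrong separator easier to spot.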

  2. #2
    toadaly, Senior Member

    My guess is that your Linux machine has an English-language locale as its default.

    There are two ways to set the locale manually, if that's the problem. See this article on the subject:

    Setting the Default Locale (Java Developers Almanac Example)
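
For reference, two common ways to set the default locale can be sketched as follows. This is a minimal illustration, not necessarily what the linked article shows: the class name `DefaultLocaleDemo`, the helper `switchDefault`, and the choice of `Locale.FRANCE` are made up for the example, and the `user.language` / `user.country` system properties are an assumption about the command-line route.

```java
import java.util.Locale;

public class DefaultLocaleDemo {
    // Sets the JVM-wide default locale and returns the previous one
    // so the caller can restore it later.
    static Locale switchDefault(Locale newDefault) {
        Locale previous = Locale.getDefault();
        Locale.setDefault(newDefault);
        return previous;
    }

    public static void main(String[] args) {
        // Way 1 (outside the code): pass system properties at startup, e.g.
        //   java -Duser.language=fr -Duser.country=FR DefaultLocaleDemo
        System.out.println("Default at startup: " + Locale.getDefault());

        // Way 2 (inside the code): change the default programmatically.
        Locale previous = switchDefault(Locale.FRANCE);
        System.out.println("Default after change: " + Locale.getDefault());

        // Restore the original default.
        Locale.setDefault(previous);
    }
}
```

Setting the locale in code affects the whole JVM, so it is usually done once at startup.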
