Old 11-06-2009, 12:04 PM
Kaizah (Member, Join Date: Nov 2009, Posts: 2)
XML with special characters
Hello everyone,

I am trying to do the following:
- I have an XML document located somewhere on the web
- I want to get the XML's content (source) exactly as it is at that location
- The XML file is UTF-8 encoded

I can do all of the above, except for one odd thing I cannot seem to fix. I can get the XML's source, but whenever it contains special characters such as ö or é, each one gets garbled into two other characters. I know this has to do with the fact that the XML file is UTF-8 encoded and that I am probably reading it using an ISO encoding. However, I have been trying to read it as UTF-8 and have not succeeded.
Does anyone know how to do this?
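
For anyone wondering why one character turns into two: a minimal sketch, assuming the reader decodes with ISO-8859-1. The class and method names (`MojibakeDemo`, `misdecode`) are made up for illustration. é is stored as two bytes in UTF-8, and ISO-8859-1 maps every byte to its own character, so the two bytes come out as two characters.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    // Encodes the text as UTF-8, then (wrongly) decodes the bytes as ISO-8859-1.
    // This mimics reading a UTF-8 file with the wrong charset.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "\u00e9" is é; its UTF-8 form is the two bytes 0xC3 0xA9
        String garbled = misdecode("\u00e9");
        System.out.println(garbled.length()); // prints 2
    }
}
```

The two resulting characters are U+00C3 and U+00A9, which is the typical "Ã©" pattern seen when UTF-8 text is read as ISO-8859-1.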

My current code is:
Code:
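import java.net.URL;
import java.util.Scanner;

public String retrieveSource(String link) {

        String htmlCode = "";
        Scanner reader;
        StringBuilder builder;
        try {

            URL url = new URL(link);
            // No charset is given here, so Scanner decodes with the platform default
            reader = new Scanner(url.openStream( ) );
            builder = new StringBuilder( );

            // Read line by line, ending each line with "\n"
            while (reader.hasNextLine( )) {
                builder.append(reader.nextLine( )).append("\n");
            }
            reader.close( );

            htmlCode = builder.toString( );

        } catch (Exception e) {
            // Report the failure instead of silently returning ""
            e.printStackTrace( );
        }

        return htmlCode;

    }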
Thanks.
Old 11-06-2009, 03:26 PM
Kaizah
Never mind, simply modifying the line
Code:
reader = new Scanner(url.openStream( ) );
to
Code:
reader = new Scanner(url.openStream( ), "UTF-8" );
did the trick.
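
For reference, a self-contained sketch of the same fix, assuming the document really is UTF-8. The class name `Utf8SourceReader` and the helper `readUtf8` are hypothetical; only the `Scanner(InputStream, String)` constructor comes from the fix above. The stream-reading part is split out so it can be tried without a network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class Utf8SourceReader {

    // Reads the whole stream as UTF-8 text, one line at a time,
    // ending every line with "\n".
    static String readUtf8(InputStream in) {
        StringBuilder builder = new StringBuilder();
        // try-with-resources closes the Scanner and the underlying stream
        try (Scanner reader = new Scanner(in, "UTF-8")) {
            while (reader.hasNextLine()) {
                builder.append(reader.nextLine()).append("\n");
            }
        }
        return builder.toString();
    }

    // Hypothetical wrapper: opens the URL and decodes its content as UTF-8.
    static String retrieveSource(String link) throws IOException {
        return readUtf8(new URL(link).openStream());
    }

    public static void main(String[] args) {
        // Simulate a UTF-8 encoded document containing ö and é
        byte[] xml = "<name>\u00f6\u00e9</name>".getBytes(StandardCharsets.UTF_8);
        String text = readUtf8(new ByteArrayInputStream(xml));
        System.out.println(text.equals("<name>\u00f6\u00e9</name>\n")); // prints true
    }
}
```

Passing the charset name explicitly means the result no longer depends on the default encoding of the machine running the code.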
Copyright ©2006 - 2007, www.java-forums.org