Thread: [SOLVED] Codepage conversion
- 02-08-2009, 10:49 PM #1
I'm working on a little program that retrieves an HTML page using URL and BufferedReader.
But with some pages I'm having problems with special characters like 'ê'.
My guess is that it's an ISO-8859-1 page being read into a UTF-8 stream. Is it possible to convert the retrieved string?
Live long and prosper...
Last edited by flywheel; 02-09-2009 at 05:56 PM. Reason: Sufficient input to fix problem achieved
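A minimal sketch of the mismatch guessed at above, assuming the page really is ISO-8859-1. In ISO-8859-1 the letter 'ê' is the single byte 0xEA. On its own, that byte is not a valid UTF-8 sequence, so decoding it as UTF-8 yields the replacement character U+FFFD. That substitution throws the original byte away, so converting the already-decoded string afterwards generally cannot recover the text. The fix is to name the charset at the point where the bytes are decoded. The class and method names are hypothetical. `openLatin1Reader` only illustrates where the charset goes when wrapping `url.openStream()`. It is not called, so the example runs without network access.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMismatchDemo {

    // The bytes 't', U+00EA, 't' as encoded in ISO-8859-1 (U+00EA is the single byte 0xEA).
    static final byte[] LATIN1_BYTES = {(byte) 0x74, (byte) 0xEA, (byte) 0x74};

    // Decode raw bytes with an explicitly chosen charset.
    static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Hypothetical helper: name the page's charset when wrapping the stream,
    // instead of relying on the platform default or assuming UTF-8.
    static BufferedReader openLatin1Reader(URL url) throws IOException {
        return new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        // Wrong charset: 0xEA followed by 't' is malformed UTF-8, so it is
        // replaced by U+FFFD and the original byte is lost.
        String wrong = decodeAs(LATIN1_BYTES, StandardCharsets.UTF_8);
        // Right charset: the byte maps directly to U+00EA.
        String right = decodeAs(LATIN1_BYTES, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals(right));      // false
        System.out.println(right.equals("t\u00EAt")); // true
    }
}
```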
- 02-09-2009, 12:14 AM #2
Look at CharsetDecoder and ByteBuffer. The Java network I/O classes use these to convert code page data to Unicode. Of course, you are expected to know which code page the bytes are encoded in. CharsetDecoder also offers replacement options for malformed or unmappable characters.
This means you will have to read the data at the byte level, not as characters. ByteBuffer is a thin wrapper class, so once you have the bytes in an array, creating a ByteBuffer around the array is simple and low overhead.
This is obviously a different approach. I hope this helps...
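The byte-level approach described above can be sketched as follows. This is a minimal sketch that assumes the page's charset is already known. `ByteLevelDecode`, `readFully` and `decode` are hypothetical names. In practice the `InputStream` would come from `url.openStream()`; here a `ByteArrayInputStream` stands in for it so the example runs offline. `CodingErrorAction.REPLACE` provides the replacement behaviour mentioned: malformed or unmappable input becomes the decoder's replacement string (U+FFFD by default) instead of raising an exception.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class ByteLevelDecode {

    // Read the whole stream as raw bytes; no charset is applied yet.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // Decode with an explicit charset, replacing bad input instead of failing.
    static String decode(byte[] bytes, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            // ByteBuffer.wrap shares the array rather than copying it.
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected: both error actions are REPLACE, not REPORT.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for url.openStream(): 't', 0xEA, 't' in ISO-8859-1.
        InputStream in = new ByteArrayInputStream(new byte[] {0x74, (byte) 0xEA, 0x74});
        byte[] page = readFully(in);
        System.out.println(decode(page, StandardCharsets.ISO_8859_1).equals("t\u00EAt")); // true
        System.out.println(decode(page, StandardCharsets.UTF_8).contains("\uFFFD"));     // true
    }
}
```

Reading the bytes first keeps the decision about the code page separate from the I/O. The same byte array can be decoded again with a different charset if the first guess turns out to be wrong.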
- 02-09-2009, 05:54 PM #3