Thread: Charset problems (presumably?)

  1. #1
    Toll is offline Senior Member
    Join Date
    May 2011
    Location
    Sweden
    Posts
    392
    Rep Power
    4

    Charset problems (presumably?)

    After finding out that a couple of HTML files I got didn't follow the XHTML standard I needed, I decided to write a small program to "fix" them (ugly fixes, but fixes nonetheless). The fixes work (at the basic level at least; there are still a couple of cases I need to cover), but I'm having a problem with characters getting corrupted. I've never really had to worry about that before, so I'm kind of stumped at the moment. The worst thing is that not all non-ASCII characters are corrupted, but all of the corrupted characters are non-ASCII. Here's a quick SSCCE:

    Java Code:
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.BufferedWriter;
    import java.io.FileWriter;
    
    public class SCCE
    {
      BufferedWriter out;
      public static void main(String[] args)
      {
        new SCCE();
      }
      public SCCE()
      {
        try
        {
          System.out.println("Reading...");
          BufferedReader in=new BufferedReader(new FileReader("Chapter1.html"));
          out=new BufferedWriter(new FileWriter("Chapter1Fixed.html"));
          String fullyread="";
          String s=in.readLine();
          while (s!=null)
          {
            fullyread+=s;
            s=in.readLine();
          }
          System.out.println("Read: "+fullyread.length());
          System.out.println("Done reading");
          out.write(fullyread);
          out.flush();
          out.close();
        }
        catch (Exception e)
        {
          e.printStackTrace();
        }
      }
    }
    I've also attached the test file in question as a .zip archive. It's a rather minimal file (and nowhere near as large as the one I'm working with), but with the code above and that file, the problem reproduces, for me at least.

    Any hints appreciated, since I'm tearing my hair at this one right now.

  2. #2
    StormyWaters is offline Senior Member
    Join Date
    Feb 2009
    Posts
    306
    Rep Power
    6

    Re: Charset problems (presumably?)

    The FileReader and FileWriter classes, as used here, assume the platform's default character set. If your files use a different character set, you need to build the BufferedReader and BufferedWriter instances from other classes that let you specify the character set you want.
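To see which character set that default is on a given machine, and why only some characters break, a small sketch like the following may help. The class name DefaultCharsetDemo and the helper misdecode are made up for illustration, and the UTF-8 versus ISO-8859-1 pairing is only an assumption; the thread doesn't say which encodings are actually involved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo
{
  // Encodes text with one charset and decodes the bytes with another,
  // which is what happens when a file is read with the wrong charset.
  public static String misdecode(String text, Charset writtenAs, Charset readAs)
  {
    return new String(text.getBytes(writtenAs), readAs);
  }

  public static void main(String[] args)
  {
    // The charset that FileReader(String) and FileWriter(String) fall back on.
    System.out.println("Default charset: " + Charset.defaultCharset());

    // Assumption for illustration: UTF-8 content read as ISO-8859-1.
    String ascii = misdecode("<p>abc</p>", StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
    String accented = misdecode("\u00e5\u00e4\u00f6", StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

    // ASCII bytes mean the same thing in both charsets, so this text survives.
    System.out.println("ASCII unchanged: " + ascii.equals("<p>abc</p>"));
    // Each two-byte UTF-8 sequence is decoded as two separate characters.
    System.out.println("3 accented letters became " + accented.length() + " characters");
  }
}
```

Plain ASCII markup comes through untouched in this scenario, which would fit the observation that only non-ASCII characters get corrupted.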

    I would try switching from File => FileReader => BufferedReader to File => FileInputStream => InputStreamReader => BufferedReader for reading, and from File => FileWriter => BufferedWriter to File => FileOutputStream => OutputStreamWriter => BufferedWriter for writing. InputStreamReader and OutputStreamWriter both have constructors that take a character set.
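A minimal sketch of that chain might look like the following. It assumes the files are UTF-8, which the thread doesn't confirm, so swap in whatever charset the files really use. The class name CharsetCopy and the copy method are invented for the example; the file names are the ones from the first post.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCopy
{
  // Copies a text file, decoding and encoding with the same explicit charset.
  public static void copy(String from, String to, Charset cs) throws IOException
  {
    // try-with-resources closes both streams even if an exception is thrown
    try (BufferedReader in = new BufferedReader(
             new InputStreamReader(new FileInputStream(from), cs));
         BufferedWriter out = new BufferedWriter(
             new OutputStreamWriter(new FileOutputStream(to), cs)))
    {
      String s;
      while ((s = in.readLine()) != null)
      {
        out.write(s);
        out.newLine(); // readLine() strips the line terminator, so put one back
      }
    }
  }

  public static void main(String[] args) throws IOException
  {
    String from = "Chapter1.html";
    if (!new File(from).exists())
    {
      System.out.println("Input file not found: " + from);
      return;
    }
    // Assumption: the input is UTF-8 and the output should be UTF-8 too.
    copy(from, "Chapter1Fixed.html", StandardCharsets.UTF_8);
  }
}
```

Note that readLine() drops line breaks, so the sketch writes one back after each line; the loop in the first post concatenates the lines without them.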

    Hope that helps,

  3. #3
    Toll is offline Senior Member
    Join Date
    May 2011
    Location
    Sweden
    Posts
    392
    Rep Power
    4

    Re: Charset problems (presumably?)

    That definitely helped, yes! The output is no longer corrupted, which is a load off my mind (I was afraid I'd have to convert it all to ASCII, which was a horrifying thought). The only problem is that the file size doubled, but at least that's better than corrupt files. Thanks a bunch!
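On the doubled file size: the thread doesn't say which charset ended up being used, but a doubling is what one would expect if mostly-ASCII text were written as UTF-16. The sketch below only illustrates that arithmetic under that assumption; the class name EncodedSize and its helper methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class EncodedSize
{
  // Number of bytes the text occupies when encoded as UTF-8.
  public static int utf8Bytes(String s)
  {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  // Number of bytes as UTF-16 (little-endian variant, which adds no byte-order mark).
  public static int utf16Bytes(String s)
  {
    return s.getBytes(StandardCharsets.UTF_16LE).length;
  }

  public static void main(String[] args)
  {
    String markup = "<p>abc</p>"; // 10 ASCII characters
    System.out.println("UTF-8:  " + utf8Bytes(markup) + " bytes");
    System.out.println("UTF-16: " + utf16Bytes(markup) + " bytes");
  }
}
```

If that is the cause, writing the output as UTF-8 instead would likely keep the file close to its original size while still preserving the non-ASCII characters.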