  1. #1
    aneuryzma

    File format exception... MacRoman and UTF-8

    When I run my Lucene app and parse an XML file, I get the following error. It is caused by some special characters, such as "", in the text file.

    If I save the text file as UTF-8 with my text editor I don't have this issue, but when I create it with a Java app, it is saved as MacRoman.

    How can I specify a different encoding from Java instead?

    Thanks

    Java Code:
    Exception in thread "main" com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
    	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
    	at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:554)
    	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
    	at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1416)
    	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2792)
    	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
    	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
    	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
    	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
    	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
    	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
    	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
    	at org.apache.commons.digester.Digester.parse(Digester.java:1871)
    	at CollectionIndexer.main(CollectionIndexer.java:111)

  2. #2
    JosAH

    Use an OutputStreamWriter and specify the encoding (UTF-8).
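
    A minimal sketch of that suggestion, assuming Java 7 or later (for StandardCharsets and try-with-resources). The class name Utf8XmlWriter, the helper writeUtf8, and the sample XML are made up for illustration and are not from the original app. A FileWriter without an explicit charset uses the platform default encoding, which is MacRoman on some older Mac JVMs; wrapping a FileOutputStream in an OutputStreamWriter lets you name the charset yourself:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8XmlWriter {

    // Hypothetical helper: writes the text as UTF-8 bytes,
    // independent of the platform default encoding.
    public static void writeUtf8(Path file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Illustrative content only; \u00e9 is a non-ASCII character (e with acute accent).
        Path file = Files.createTempFile("sample", ".xml");
        writeUtf8(file,
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<doc>caf\u00e9</doc>\n");
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

    If the file starts with an XML declaration, its encoding attribute should name the same charset the writer uses, otherwise the parser can still reject the bytes.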

    kind regards,

    Jos
    cenosillicaphobia: the fear for an empty beer glass
