  1. #1
    utikawa is offline Member
    Join Date
    Feb 2013
    Posts
    4
    Rep Power
    0

    SAXParser - Use encoding from XML

    Hi all!

    I am starting to learn Java for Android and am working on an RSS reader. I am a programmer and work with C/C++, but I know almost nothing about Java (despite its similarity to C++).
    I think this is the right place for my question; sorry if I am wrong! I know there is a specific forum for Android, but I think my question is about Java itself. BTW, could someone please move it if necessary...

    My question is about XML encoding: the parser just ignores the encoding declared in the XML.
    I have already tried many solutions and none of them worked for me. Maybe it is a simple or noob question, but I have run out of ideas for solving it...
    I read that SAXParser only uses the encoding from the XML if I pass it a byte stream, and I think I am already passing a byte stream.
    If I force the encoding with setEncoding, everything works fine.

    Please see the code below:
    Java Code:
    	public List<Article> getLatestArticles(String feedUrl) {
    		URL url = null;
    		try {
    
    			SAXParserFactory spf = SAXParserFactory.newInstance();
    			SAXParser sp = spf.newSAXParser();
    			XMLReader xr = sp.getXMLReader();
    
    			url = new URL(feedUrl);
    			
    			URLConnection conn = url.openConnection();
    			InputStream stream = conn.getInputStream();
    			// HTTP Content-Encoding describes compression (e.g. gzip), not the character set
    			String ContentEncoding = conn.getContentEncoding();
    			if ("gzip".equals(ContentEncoding)) {
    				stream = new GZIPInputStream(stream);
    			}
    
    			InputSource source = new InputSource(stream);
    
    			InputStream ByteStream = source.getByteStream();
    			Reader CharStream = source.getCharacterStream();
    
    			String encoding = source.getEncoding();
    			// source.setEncoding("ISO-8859-1");
    
    			xr.setContentHandler(this);
    			xr.parse(source);
    
    
    		} catch (IOException e) {
    			Log.e("RSS Handler IO", e.getMessage() + " >> " + e.toString());
    		} catch (SAXException e) {
    			Log.e("RSS Handler SAX", e.toString());
    		} catch (ParserConfigurationException e) {
    			Log.e("RSS Handler Parser Config", e.toString());
    		}
    		
    		return articleList;
    	}
    Results:

    ContentEncoding = null
    ByteStream is valid
    CharacterStream = null
    encoding = null

    As you can see, it seems that I am using a byte stream, but the encoding is null.
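
    A note on the null values above: the getters on org.xml.sax.InputSource only report what was explicitly set on the object, so getEncoding() returning null does not by itself show that the parser ignored the declaration. The following is a minimal sketch on a desktop JVM, not Android; the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import org.xml.sax.InputSource;

// Hypothetical demo class; the name is not part of the original code.
class InputSourceGetters {
    // Summarises what an InputSource currently reports about itself.
    static String describe(InputSource source) {
        return "byteStream=" + (source.getByteStream() != null)
             + ", characterStream=" + (source.getCharacterStream() != null)
             + ", encoding=" + source.getEncoding();
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(new byte[0]);
        InputSource source = new InputSource(in);

        // Only the byte stream was supplied, so the other two getters return null.
        System.out.println(describe(source));

        // getEncoding() reflects an explicit setEncoding() call, nothing else.
        source.setEncoding("ISO-8859-1");
        System.out.println(describe(source));
    }
}
```

    So the values listed under "Results" are what one would expect from an InputSource built only from an InputStream, whatever the document declares.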

    Here are the two XML feeds that I am trying to parse:

    ( ISO-8859-1 ) - Megasena
    ( UTF-8 ) - Hardware.com.br

    PS: both XML files return encoding = null

    Thanks in advance!

    Best Regards,
    Marcelo Utikawa da Fonseca

  2. #2
    Tolls is online now Moderator
    Join Date
    Apr 2009
    Posts
    12,049
    Rep Power
    20

    Java Code:
    InputStream stream = conn.getInputStream(); 
    InputSource source = new InputSource(stream);
    That should be all you need.
    The parser is supposed to pick up the character encoding from the stream.
    From the docs:
    "
    The SAX parser will use the InputSource object to determine how to read XML input. ... If there is no character stream, but there is a byte stream, the parser will use that byte stream, using the encoding specified in the InputSource or else (if no encoding is specified) autodetecting the character encoding using an algorithm such as the one in the XML specification.
    "
    Now, if you get an exception when parsing without declaring any encoding, post it here, because the cause could be something else entirely.
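
    For reference, the autodetection described in the quoted docs can be checked in isolation. This is a minimal sketch that assumes the JDK's built-in SAX parser on a desktop JVM; Android's Expat-based parser may behave differently. The class name, method name, and sample document are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not taken from the code in this thread.
class AutodetectDemo {
    // Parses raw bytes through a byte-stream InputSource (no setEncoding call)
    // and returns all character data the handler received.
    static String parseText(byte[] xmlBytes) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per element, so append each chunk.
                text.append(ch, start, length);
            }
        };
        InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // \u00ea is e-circumflex, a single byte (0xEA) in ISO-8859-1.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                   + "<title>Pr\u00eamio</title>";
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseText(latin1));
    }
}
```

    If the same bytes parse cleanly here but fail on the device, that points at the platform's parser rather than at the way the InputSource is built.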
    Please do not ask for code as refusal often offends.

    ** This space for rent **

  3. #3
    utikawa is offline Member
    Join Date
    Feb 2013
    Posts
    4
    Rep Power
    0

    Hi Tolls!

    Thanks for the very quick reply! :-)

    I have the same problem when trying your solution. I get a SAXException during parse: "org.apache.harmony.xml.ExpatParser$ParserException: At line 9, column 23: not well-formed (invalid token)".
    That is the exact position of the first encoded character in the file (ISO-8859-1).
    The original code (without all my debugging code) produces the same exception; it is shown below to make the code easier to follow:
    Java Code:
    	public List<Article> getLatestArticles(String feedUrl) {
    		URL url = null;
    		try {
    
    			SAXParserFactory spf = SAXParserFactory.newInstance();
    			SAXParser sp = spf.newSAXParser();
    			XMLReader xr = sp.getXMLReader();
    
    			url = new URL(feedUrl);
    			xr.setContentHandler(this);
    			xr.parse(new InputSource(url.openStream()));
    
    
    		} catch (IOException e) {
    			Log.e("RSS Handler IO", e.getMessage() + " >> " + e.toString());
    		} catch (SAXException e) {
    			Log.e("RSS Handler SAX", e.toString());
    		} catch (ParserConfigurationException e) {
    			Log.e("RSS Handler Parser Config", e.toString());
    		}
    		
    		return articleList;
    	}
    Thanks a lot!

    Marcelo Fonseca

  4. #4
    Tolls is online now Moderator
    Join Date
    Apr 2009
    Posts
    12,049
    Rep Power
    20

    Declare the InputSource outside of the try/catch block (initialise it to null there and assign the real value inside the try, so the catch can reference it).
    In the relevant catch try:
    Java Code:
    Log.e("RSS Handler SAX", "Encoding is: " + inputSource.getEncoding());
    Just to see what it thinks it's working with.

  5. #5
    utikawa is offline Member
    Join Date
    Feb 2013
    Posts
    4
    Rep Power
    0

    The result is:

    Encoding is: null

    I can push my code to GitHub later if you want to see the full code. There are some changes from the current code. https://github.com/utikawa/AndroidRssReader.git

    Best regards,
    Marcelo Utikawa da Fonseca

  6. #6
    Tolls is online now Moderator
    Join Date
    Apr 2009
    Posts
    12,049
    Rep Power
    20

    All I can say then is it looks like InputSource is having problems identifying an encoding in the incoming stream.
    You could try looking at the source code for InputSource, or stepping through it in a debugger as it acts on the stream.
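
    Since forcing the encoding with setEncoding was reported to work, one possible workaround is to read the encoding from the XML declaration yourself and pass it on. The sketch below is only an illustration under stated assumptions: DeclaredEncoding, sniff and toInputSource are hypothetical names, the declaration is assumed to sit within the first 200 bytes, and the document is assumed to use an ASCII-compatible encoding (so not UTF-16).

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.xml.sax.InputSource;

// Hypothetical helper; not part of SAX or of the code in this thread.
class DeclaredEncoding {
    // Matches encoding="..." inside an XML declaration such as
    // <?xml version="1.0" encoding="ISO-8859-1"?>
    private static final Pattern ENCODING = Pattern.compile(
        "<\\?xml[^>]*encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

    // Peeks at the start of the stream and returns the declared encoding,
    // or null if none is found. The stream is rewound afterwards.
    static String sniff(BufferedInputStream in) throws IOException {
        in.mark(256);
        byte[] head = new byte[200];
        int total = 0;
        int n;
        while (total < head.length
                && (n = in.read(head, total, head.length - total)) > 0) {
            total += n;
        }
        in.reset();
        // ISO-8859-1 maps every byte to one char, so decoding never fails.
        String prolog = new String(head, 0, total, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(prolog);
        return m.find() ? m.group(1) : null;
    }

    // Wraps a raw stream in an InputSource, setting the encoding when declared.
    static InputSource toInputSource(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        InputSource source = new InputSource(in);
        String declared = sniff(in);
        if (declared != null) {
            source.setEncoding(declared);
        }
        return source;
    }
}
```

    In the code from this thread, the result of toInputSource(url.openStream()) would be passed to xr.parse(...) in place of new InputSource(url.openStream()).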

    Me looking at your code (even if I had the time) would not really help.

  7. #7
    utikawa is offline Member
    Join Date
    Feb 2013
    Posts
    4
    Rep Power
    0

    OK, I thought the same about you looking at my code...
    It is really strange!
    I will keep trying to find out what is wrong... if there is any news, I will post an update here!
    Thanks again!

    Marcelo Fonseca
