  1. #1
    ianstanton (Member; joined Apr 2011; 2 posts)

    SAX parsing an element containing HTML

    Hi
    I've been working on a program, built up from a Java SAX parser tutorial, that parses an XML file exported from a database.

    It has been going well so far, but I have now come across a problem with an element that contains HTML. I've found loads of articles online that discuss this problem, but can't seem to find a solution. Here is an extract of the XML file I'm trying to parse:

    <texts>
    <texts_row num="1">
    <texttype>Keynote</texttype>
    <textdescription>keynote</textdescription>
    <text>&lt;html>
    &lt;head>

    &lt;/head>
    &lt;body>
    &lt;p>
    A compellingly readable, agenda-setting account of how and why cities
    function as they do and why so many of us choose to live in them
    &lt;/p>
    &lt;/body>
    &lt;/html></text>
    </texts_row>
    <texts_row num="2">
    <texttype>Biographical Note</texttype>
    <textdescription>biog</textdescription>
    <text>&lt;html>
    &lt;head>

    &lt;/head>
    &lt;body>
    &lt;p style="margin-top: 0">
    Edward Glaeser is the Fred and Eleanor Glimp Professor of Economics at

    &lt;/p>
    &lt;/body>
    &lt;/html></text>

    </texts_row>
    </texts>
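One way to see what the parser actually hands over for a <text> element is to log every characters() call made while inside it. The sketch below is a hypothetical diagnostic (the class name ChunkLogger and the trimmed-down sample are made up for illustration) that uses only the JDK's built-in javax.xml.parsers SAX parser:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class ChunkLogger {

    /** Parses the XML and returns every characters() chunk reported inside <text>. */
    static List<String> textChunks(String xml) throws Exception {
        List<String> chunks = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inText = false; // true while between <text> and </text>

            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                if (qName.equals("text")) inText = true;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("text")) inText = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Record each piece separately instead of overwriting
                if (inText) chunks.add(new String(ch, start, length));
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<texts_row><text>&lt;p>A compellingly readable account&lt;/p></text></texts_row>";
        List<String> chunks = textChunks(xml);
        for (String c : chunks) System.out.println("chunk: [" + c + "]");
        System.out.println("joined: " + String.join("", chunks));
    }
}
```

Joined together, the chunks should be the decoded HTML, because the parser turns each &lt; back into <. With the JDK's default parser the text typically arrives in several pieces, with a break at each &lt; entity, so a handler that keeps only the most recent piece would end up holding just "/html>".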

    By reading qName, I can discover the elements <texts>, <texts_row>, <texttype>, <textdescription> and <text>, and can read the values contained within them, except for the value in <text>, which is the one that contains the HTML-marked-up text. For that element, the value I get is "/html>". I was hoping to get a String representing everything between <text> and </text>, on which I would then use String.replace() to strip out as much of the rubbish as I can.
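String.replace() only swaps literal substrings, so a regular expression via String.replaceAll() is usually less work for stripping tags. Below is a minimal sketch, assuming the markup is as simple as in the extract above; stripTags is a hypothetical helper, not a real HTML parser, and it will mishandle things like a > inside an attribute value or the contents of <script> blocks:

```java
public class HtmlText {

    /**
     * Crude tag stripper: replaces anything shaped like <...> with a space,
     * decodes a few common entities and collapses runs of whitespace.
     */
    static String stripTags(String html) {
        // [^>] also matches newlines, so tags spanning lines are removed too
        String noTags = html.replaceAll("<[^>]*>", " ");
        String decoded = noTags
                .replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&"); // decode &amp; last so "&amp;lt;" is not double-decoded
        return decoded.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html>\n<head>\n\n</head>\n<body>\n<p style=\"margin-top: 0\">\n"
                + "Edward Glaeser is the Fred and Eleanor Glimp Professor of Economics at\n"
                + "</p>\n</body>\n</html>";
        System.out.println(stripTags(html));
    }
}
```

If the stored HTML gets more complicated than a few <p> tags, a dedicated HTML parsing library is a safer choice than regular expressions.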

    Does anyone have any suggestions or help they can give me? I'm tearing my hair out here and can't work out how to get this data.
    It's as if the content is being hidden from me by the parser.

    Here is the tutorial URL, in case it helps to see the model I am following:

    XML and Java - Parsing XML using Java Tutorial

    And here is the code extract from the tutorial that I expect will be helpful (to save you looking it up in the tutorial):
    //Event Handlers
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        //reset
        tempVal = "";
        if (qName.equalsIgnoreCase("Employee")) {
            //create a new instance of employee
            tempEmp = new Employee();
            tempEmp.setType(attributes.getValue("type"));
        }
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        // Note: a SAX parser may deliver one element's text in several calls,
        // and each call here replaces tempVal, so only the last chunk is kept
        tempVal = new String(ch, start, length);
    }

    public void endElement(String uri, String localName,
            String qName) throws SAXException {
        if (qName.equalsIgnoreCase("Employee")) {
            //add it to the list
            myEmpls.add(tempEmp);
        } else if (qName.equalsIgnoreCase("Name")) {
            tempEmp.setName(tempVal);
        } else if (qName.equalsIgnoreCase("Id")) {
            tempEmp.setId(Integer.parseInt(tempVal));
        } else if (qName.equalsIgnoreCase("Age")) {
            tempEmp.setAge(Integer.parseInt(tempVal));
        }
    }
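A likely cause of the "/html>" result: the SAX contract allows a parser to report an element's text through several characters() calls, and the tutorial's characters() overwrites tempVal on each call, so only the final piece survives. The usual fix is to append every chunk to a StringBuilder, clear it in startElement, and read it in endElement. Here is a minimal sketch adapted to the texts_row layout above; TextsHandler, TextRow and parse() are hypothetical names standing in for the tutorial's handler and Employee class:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class TextsHandler extends DefaultHandler {

    /** Hypothetical holder for one <texts_row>, analogous to the tutorial's Employee. */
    static class TextRow {
        String num, type, description, text;
    }

    final List<TextRow> rows = new ArrayList<>();
    private TextRow current;
    // Accumulates every characters() chunk for the element being read
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        buffer.setLength(0); // reset, like tempVal = "" in the tutorial
        if (qName.equals("texts_row")) {
            current = new TextRow();
            current.num = attributes.getValue("num");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // append instead of overwrite
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = buffer.toString().trim();
        buffer.setLength(0); // don't let trailing whitespace leak into the next element
        switch (qName) {
            case "texts_row": rows.add(current); break;
            case "texttype": current.type = value; break;
            case "textdescription": current.description = value; break;
            case "text": current.text = value; break;
            default: break;
        }
    }

    static List<TextRow> parse(String xml) throws Exception {
        TextsHandler handler = new TextsHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<texts><texts_row num=\"1\"><texttype>Keynote</texttype>"
                + "<textdescription>keynote</textdescription>"
                + "<text>&lt;html>&lt;body>&lt;p>A compellingly readable account&lt;/p>&lt;/body>&lt;/html></text>"
                + "</texts_row></texts>";
        for (TextRow r : parse(xml)) {
            System.out.println(r.num + " " + r.type + ": " + r.text);
        }
    }
}
```

Because the parser has already decoded &lt; back to <, the text field holds ordinary HTML markup, ready for whatever cleanup comes next.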

  2. #2
    ianstanton

    An answer

    Well, I thought I'd leave that question up and see if I had any answers by morning. No answers, but now if I search for "java sax reading html", this thread comes up fourth on Google, which is an achievement in itself.

    Anyway, with no help forthcoming, I tried one of the articles I had found and managed to get it to work. So, if anyone with a similar problem ever discovers this thread, here is the article that answered my question:

    java - Retrieving HTML encoded text from XML using SAXParser - Stack Overflow
