SAX Parsing element containing html
Hi
I've been working on a program, built up from a Java SAX parser tutorial, to parse an XML file that has been exported from a database.
I've been doing well so far, but have now come across a problem with an element that contains HTML. I've found loads of articles online that discuss this problem, but can't seem to find a solution. Here is an extract of the XML file I'm trying to parse:
<texts>
<texts_row num="1">
<texttype>Keynote</texttype>
<textdescription>keynote</textdescription>
<text><html>
<head>
</head>
<body>
<p>
A compellingly readable, agenda-setting account of how and why cities
function as they do and why so many of us choose to live in them
</p>
</body>
</html></text>
</texts_row>
<texts_row num="2">
<texttype>Biographical Note</texttype>
<textdescription>biog</textdescription>
<text><html>
<head>
</head>
<body>
<p style="margin-top: 0">
Edward Glaeser is the Fred and Eleanor Glimp Professor of Economics at
…
</p>
</body>
</html></text>
</texts_row>
</texts>
By reading qName, I can discover the elements <texts>, <texts_row>, <texttype>, <textdescription> and <text>, and I can read the values contained in them, except for the value in <text>, which is the one that contains the HTML marked-up text. For that, the value I get is "/html>". I was hoping to get a String representing everything between <text> and </text>, which I would then clean up with String.replace() to strip out as much of the rubbish as I can.
Does anyone have any suggestions/help they can give me? I'm tearing my hair out here, and can't seem to work out how to get this data.
It's as if the content is being hidden from me by the parser.
Here is the tutorial URL, in case it helps to see the model I am following:
XML and Java - Parsing XML using Java Tutorial
And here is the code extract from the tutorial that I expect will be helpful (to save you looking in the tutorial):
//Event Handlers
public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {
    //reset
    tempVal = "";
    if (qName.equalsIgnoreCase("Employee")) {
        //create a new instance of employee
        tempEmp = new Employee();
        tempEmp.setType(attributes.getValue("type"));
    }
}

public void characters(char[] ch, int start, int length) throws SAXException {
    tempVal = new String(ch, start, length);
}

public void endElement(String uri, String localName,
        String qName) throws SAXException {
    if (qName.equalsIgnoreCase("Employee")) {
        //add it to the list
        myEmpls.add(tempEmp);
    } else if (qName.equalsIgnoreCase("Name")) {
        tempEmp.setName(tempVal);
    } else if (qName.equalsIgnoreCase("Id")) {
        tempEmp.setId(Integer.parseInt(tempVal));
    } else if (qName.equalsIgnoreCase("Age")) {
        tempEmp.setAge(Integer.parseInt(tempVal));
    }
}
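
One possible explanation for the "/html>" value, offered as a guess rather than a diagnosis: a SAX parser is allowed to deliver the text of a single element through several characters() calls, and it commonly starts a new call after each entity reference. If the HTML inside <text> is stored escaped in the real file (&lt;html&gt; and so on, which is an assumption, since the extract above shows it unescaped), then the characters() method above, which assigns to tempVal instead of appending, would keep only the last chunk. The sketch below accumulates the chunks in a StringBuilder instead. The names TextAccumulatorDemo, TextHandler and parseTexts are made up for illustration and are not part of the tutorial.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TextAccumulatorDemo {

    // Hypothetical handler: collects the complete character content of each <text> element.
    static class TextHandler extends DefaultHandler {
        final List<String> texts = new ArrayList<>();
        private final StringBuilder buffer = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) throws SAXException {
            // Reset at the start of every element, as the tutorial does with tempVal.
            // Assumption: <text> holds escaped HTML, so it has no child elements.
            buffer.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // Append rather than assign: one element's text may arrive in several calls.
            buffer.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if (qName.equalsIgnoreCase("text")) {
                texts.add(buffer.toString());
            }
        }
    }

    // Parses the given XML string and returns the content of each <text> element.
    static List<String> parseTexts(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextHandler handler = new TextHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.texts;
    }

    public static void main(String[] args) throws Exception {
        // Sample input with escaped HTML, modelled loosely on the extract above.
        String xml = "<texts><texts_row num=\"1\"><texttype>Keynote</texttype>"
                + "<text>&lt;html&gt;&lt;body&gt;&lt;p&gt;A readable account&lt;/p&gt;"
                + "&lt;/body&gt;&lt;/html&gt;</text></texts_row></texts>";
        for (String t : parseTexts(xml)) {
            System.out.println(t);
        }
    }
}
```

If the HTML turns out to be real, unescaped markup inside <text>, this sketch would not be enough: each nested tag would fire its own startElement and reset the buffer, so the handler would need to track that it is inside <text> and rebuild the tags itself.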