- 04-19-2011, 09:19 PM #1
SAX Parsing element containing html
Hi
I've been working on a program, built up from a Java SAX parser tutorial, that parses an XML file exported from a database.
I've been doing well so far, but have now come across a problem with an element that contains HTML. I've found loads of articles online that discuss this problem, but can't seem to find a solution. Here is an extract of the XML file I'm trying to parse:
<texts>
<texts_row num="1">
<texttype>Keynote</texttype>
<textdescription>keynote</textdescription>
<text><html>
<head>
</head>
<body>
<p>
A compellingly readable, agenda-setting account of how and why cities
function as they do and why so many of us choose to live in them
</p>
</body>
</html></text>
</texts_row>
<texts_row num="2">
<texttype>Biographical Note</texttype>
<textdescription>biog</textdescription>
<text><html>
<head>
</head>
<body>
<p style="margin-top: 0">
Edward Glaeser is the Fred and Eleanor Glimp Professor of Economics at
…
</p>
</body>
</html></text>
</texts_row>
</texts>
By reading qName, I can discover the elements <texts>, <texts_row>, <texttype>, <textdescription> and <text>, and can read the values contained within those, except for the value in <text>, which is the one that contains the HTML markup. For that element, the value I get is "/html>". I was hoping to get a String representing everything between <text> and </text>, which I would then clean up with String.replace() to strip out as much of the rubbish as I can.
Does anyone have any suggestions/help they can give me? I'm tearing my hair out here, and can't seem to work out how to get this data.
It's as if the content is being hidden from me by the parser.
Here is the tutorial URL, in case it helps to see the model I am following:
XML and Java - Parsing XML using Java Tutorial
And here is the code extract from the tutorial that I expect will be helpful (to save you looking in the tutorial).
//Event Handlers
public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {
    //reset
    tempVal = "";
    if (qName.equalsIgnoreCase("Employee")) {
        //create a new instance of employee
        tempEmp = new Employee();
        tempEmp.setType(attributes.getValue("type"));
    }
}

public void characters(char[] ch, int start, int length) throws SAXException {
    tempVal = new String(ch, start, length);
}

public void endElement(String uri, String localName,
        String qName) throws SAXException {
    if (qName.equalsIgnoreCase("Employee")) {
        //add it to the list
        myEmpls.add(tempEmp);
    } else if (qName.equalsIgnoreCase("Name")) {
        tempEmp.setName(tempVal);
    } else if (qName.equalsIgnoreCase("Id")) {
        tempEmp.setId(Integer.parseInt(tempVal));
    } else if (qName.equalsIgnoreCase("Age")) {
        tempEmp.setAge(Integer.parseInt(tempVal));
    }
}
- 04-20-2011, 08:21 AM #2
an answer
Well, I thought I'd leave that question up and see if I had any answers by morning. No answers, but now if I search for "java sax reading html", this article comes up 4th in Google, which is an achievement in itself.
Anyway, failing any help, I tried one of the articles that I had found and managed to get it to work. So if anyone ever discovers this thread with a similar problem, here is the article that answers my question:
java - Retrieving HTML encoded text from XML using SAXParser - Stack Overflow
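For anyone landing here later, here is a minimal sketch of the kind of fix that usually applies to this symptom. I can't vouch that it matches the linked article line for line. It assumes the HTML inside <text> is entity-escaped in the actual file (&lt;html&gt; and so on), in which case the parser may hand the content to characters() in several chunks, and the tutorial's handler keeps only the last chunk. Accumulating the chunks in a StringBuilder avoids that. TextSaxDemo, TextHandler and parseTexts are names made up for this example; they are not from the tutorial.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TextSaxDemo {

    // Hypothetical handler (not from the tutorial): collects the full
    // character content of every <text> element.
    static class TextHandler extends DefaultHandler {
        final List<String> texts = new ArrayList<>();
        private final StringBuilder buffer = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes attributes) {
            // Start a fresh buffer for each element. This assumes <text>
            // has no real child elements, i.e. its HTML is entity-escaped.
            buffer.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Append rather than overwrite: characters() may be called
            // more than once for the content of a single element.
            buffer.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equalsIgnoreCase("text")) {
                texts.add(buffer.toString().trim());
            }
        }
    }

    // Parses an XML string and returns the content of each <text> element.
    static List<String> parseTexts(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextHandler handler = new TextHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.texts;
    }

    public static void main(String[] args) throws Exception {
        // Cut-down sample in the shape of the file above, with escaped HTML.
        String xml = "<texts><texts_row num=\"1\">"
                + "<texttype>Keynote</texttype>"
                + "<text>&lt;html&gt;&lt;body&gt;&lt;p&gt;Hello&lt;/p&gt;"
                + "&lt;/body&gt;&lt;/html&gt;</text>"
                + "</texts_row></texts>";
        System.out.println(parseTexts(xml));
    }
}
```

The returned strings still contain the HTML tags, so any clean-up with String.replace() would happen afterwards on the complete value rather than on a fragment.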