- 10-26-2013, 03:31 AM #1
Parse XML SAX with no </> (end tags) for child nodes.
Hi,
I am trying to parse an XML-like file such as the one below:
<top>
<num> Number: 301
<title> International Organized Crime
<desc> Description:
Identify organizations that participate in international criminal
<narr> Narrative:
A relevant document must as a minimum identify the organization and the
type of illegal activity (e.g., Columbian cartel exporting cocaine).
</top>
<top>
<num> Number: 302
<title> Poliomyelitis and Post-Polio
<desc> Description:
Is the disease of Poliomyelitis (polio) under control in the
world?
<narr> Narrative:
Relevant documents should contain data or outbreaks of the
polio disease (large or small scale), medical protection
</top>
The parser fails with this error:
The element type "narr" must be terminated by the matching end-tag "</narr>"
Here is my code:
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class QueryExpantion {
    void run1() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            DefaultHandler handler = new DefaultHandler() {
                // Flags recording which elements the parser is currently inside.
                boolean top = false;
                boolean desc = false;
                boolean num = false;
                boolean narr = false;
                // Text collected for each element of the current <top> group.
                String topString = "";
                String descString = "";
                String numString = "";
                String narrString = "";

                @Override
                public void startElement(String uri, String localName,
                        String qName, Attributes attributes)
                        throws SAXException {
                    if (qName.equalsIgnoreCase("top")) {
                        top = true;
                        // Start each group with empty buffers.
                        topString = descString = numString = narrString = "";
                    }
                    if (qName.equalsIgnoreCase("desc")) {
                        desc = true;
                    }
                    if (qName.equalsIgnoreCase("num")) {
                        num = true;
                    }
                    if (qName.equalsIgnoreCase("narr")) {
                        narr = true;
                    }
                }

                @Override
                public void characters(char[] ch, int start, int length)
                        throws SAXException {
                    // SAX may deliver one element's text in several chunks,
                    // so append instead of overwriting.
                    String text = new String(ch, start, length);
                    if (top) {
                        topString += text;
                    }
                    if (desc) {
                        descString += text;
                    }
                    if (num) {
                        numString += text;
                    }
                    if (narr) {
                        narrString += text;
                    }
                }

                @Override
                public void endElement(String uri, String localName,
                        String qName) throws SAXException {
                    // Only </top> exists in the input, so reset every flag here.
                    if (qName.equalsIgnoreCase("top")) {
                        num = false;
                        desc = false;
                        narr = false;
                        top = false;
                    }
                }
            };
            saxParser.parse("/home/munish/Documents/trec-demo-master/test-data/topics.301-450", handler);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
- 10-26-2013, 09:45 AM #2
Re: Parse XML SAX with no </> (end tags) for child nodes.
That is not well-formed XML. You could (maybe) use an XML parser to pick out the separate "top" groups, but the rest will not parse with any XML parser. And because each group contains a bunch of opening tags with no closing tags, you probably can't even parse the "top" groups.
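Since no XML parser will accept this input, one option is to skip XML parsing and scan the file line by line. The following is only a rough sketch under some assumptions drawn from the sample above: every group starts with a line containing just <top> and ends with a line containing just </top>, and each field starts on a line beginning with a tag such as <num>. The class name TopicScanner and the method parse are made up for illustration; they are not part of any library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TopicScanner {
    /** Splits the text into one field map per <top> ... </top> group. */
    static List<Map<String, String>> parse(String text) {
        List<Map<String, String>> topics = new ArrayList<>();
        Map<String, StringBuilder> current = null; // fields of the open group
        String field = null;                       // field currently being filled

        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.equals("<top>")) {
                current = new LinkedHashMap<>();
                field = null;
            } else if (line.equals("</top>")) {
                if (current != null) {
                    Map<String, String> done = new LinkedHashMap<>();
                    for (Map.Entry<String, StringBuilder> e : current.entrySet()) {
                        done.put(e.getKey(), e.getValue().toString().trim());
                    }
                    topics.add(done);
                }
                current = null;
                field = null;
            } else if (current != null) {
                int close = line.indexOf('>');
                // Assumption: a line starting with "<name>" opens a new field.
                if (line.startsWith("<") && close > 1) {
                    field = line.substring(1, close);
                    current.put(field, new StringBuilder());
                    line = line.substring(close + 1).trim();
                }
                // Continuation lines are joined to the open field with a space.
                if (field != null && !line.isEmpty()) {
                    StringBuilder sb = current.get(field);
                    if (sb.length() > 0) {
                        sb.append(' ');
                    }
                    sb.append(line);
                }
            }
        }
        return topics;
    }
}
```

The labels "Number:", "Description:" and "Narrative:" are left in the values; strip them afterwards if you only want the text. To use it on the real file you would read the file into a String first, for example with java.nio.file.Files.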
- 10-26-2013, 10:07 AM #3
Re: Parse XML SAX with no </> (end tags) for child nodes.
XML is not HTML: it is very 'strict' when it comes to matching opening and closing tags. In HTML you can make a mess of it (some Lisps had a similar 'feature' where a single ']' matched all open '('); thank the gods for XML's 'strictness'.
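A small sketch of that strictness, in case it helps: the standard SAX parser rejects a document as soon as one opening tag is left unclosed. The class StrictnessDemo and the method wellFormednessError are hypothetical names used only for this illustration, and the exact wording of the error message depends on the parser implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class StrictnessDemo {
    /** Returns null if the text is well-formed XML, else the parser's message. */
    static String wellFormednessError(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // Thrown for a fatal error such as a missing end tag.
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // <num> is never closed, so this is rejected.
        System.out.println(wellFormednessError("<top><num>301</top>"));
        // Every tag is closed, so this is accepted and null is printed.
        System.out.println(wellFormednessError("<top><num>301</num></top>"));
    }
}
```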
kind regards,
Jos