Using XPath with an XHTML web site
Hi everyone,
I've been trying to use XPath with an XHTML web site but I can't get it working...
Here's the start of the web page:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"
id="template">
<head>
....
The error I'm getting is:
java.net.MalformedURLException: no protocol: (and then it lists the entire web document string)
The error happens on line: Document doc = builder.parse(string);
I'm guessing it's because of the DOCTYPE declaration, which I think makes the document invalid XML since it has no closing tag. How do I download an XHTML web page and then run XPath queries on it?
Code:
import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.methods.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

public class HTTPClientWithXPath {

    public static void main(String[] args) {
        HttpClient client = new HttpClient();
        GetMethod method = new GetMethod("http://myURL");
        try {
            // Execute the method.
            int statusCode = client.executeMethod(method);
            if (statusCode != HttpStatus.SC_OK) {
                System.out.println("Method failed: " + method.getStatusLine());
            }
            // Read the whole response body into a String.
            byte[] responseBody = method.getResponseBody();
            String string = new String(responseBody);

            XPathFactory factory = XPathFactory.newInstance();
            DocumentBuilderFactory domfactory = DocumentBuilderFactory.newInstance();
            domfactory.setNamespaceAware(true);
            try {
                DocumentBuilder builder = domfactory.newDocumentBuilder();
                // The MalformedURLException is thrown on the next line.
                Document doc = builder.parse(string);
                XPath xpath = factory.newXPath();
                XPathExpression expr = xpath.compile("//td");
                Object result = expr.evaluate(doc, XPathConstants.NODESET);
                NodeList nodes = (NodeList) result;
                for (int i = 0; i < nodes.getLength(); i++) {
                    // getTextContent() returns the cell text; getNodeValue() is null for elements.
                    System.out.println(nodes.item(i).getTextContent());
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            method.releaseConnection();
        }
    }
}
Thanks!
Chris
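For anyone who lands on this thread with the same error: DocumentBuilder.parse(String) treats its argument as a URI to load, not as the markup itself, which would explain "no protocol" followed by the whole page. The DOCTYPE declaration is legal XML and is probably not the cause. Below is a minimal sketch of parsing XHTML that is already held in a String. The class name XhtmlXPathSketch, the method cellTexts, the prefix "x", and the inline sample page are all made up for illustration. It also assumes the DTD is not needed, so it resolves external entities to an empty stream rather than downloading them from w3.org. Because the factory is namespace-aware, the query has to name the XHTML namespace; a plain //td would match nothing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name; not part of the original post.
class XhtmlXPathSketch {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Parses XHTML held in a String and returns the text of every td element. */
    public static List<String> cellTexts(String xhtml) throws Exception {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        DocumentBuilder builder = domFactory.newDocumentBuilder();

        // Assumption: the DTD is not needed, so every external entity
        // resolves to an empty stream instead of a network download.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));

        // Wrap the markup in an InputSource; parse(String) would expect a URI.
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the made-up prefix "x" to the XHTML namespace for the query.
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "x".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return XHTML_NS.equals(uri) ? "x" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                String prefix = getPrefix(uri);
                return prefix == null
                        ? Collections.<String>emptyIterator()
                        : Collections.singletonList(prefix).iterator();
            }
        });

        NodeList nodes = (NodeList) xpath.evaluate("//x:td", doc, XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample page standing in for the downloaded response body.
        String page =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
              + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
              + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>t</title></head>"
              + "<body><table><tr><td>one</td><td>two</td></tr></table></body></html>";
        System.out.println(cellTexts(page)); // prints [one, two]
    }
}
```

One caveat with the empty-DTD shortcut: named HTML entities such as &nbsp; are then undeclared, so a page that uses them will fail to parse unless you supply the real DTD (for example from a local copy). In the posted code, the equivalent change would be to pass new InputSource(new StringReader(string)) to builder.parse.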