Dear All:

I am using TagSoup + XOM as described in "BadMagicNumber: Using XPath on real-world HTML documents".
It seems to work well, except for the namespace problem described in "Dom4j + XPath + TagSoup – Namespaces = sweet!" by Kelvin Tan.
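For context, here is a minimal sketch of that namespace problem. It assumes TagSoup's default behaviour of placing every element in the XHTML namespace (http://www.w3.org/1999/xhtml). Under that assumption, an unprefixed XPath 1.0 expression such as //a only matches elements in no namespace, so it finds nothing. To keep the sketch runnable without extra jars, it uses only the JDK's javax.xml.xpath. The class name XhtmlXPathDemo, the prefix "h" and the hard-coded sample document (a stand-in for real TagSoup output) are hypothetical choices of mine.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlXPathDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical stand-in for TagSoup output: all elements are in the XHTML namespace.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
        + "<a href=\"/one\">one</a><a href=\"/two\">two</a>"
        + "</body></html>";

    // Binds the single (hypothetical) prefix "h" to the XHTML namespace.
    static final NamespaceContext XHTML_CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "h".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String uri) {
            return XHTML_NS.equals(uri) ? "h" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            return XHTML_NS.equals(uri)
                ? Collections.singletonList("h").iterator()
                : Collections.<String>emptyIterator();
        }
    };

    // Parses the document namespace-aware and counts the nodes matched by expr.
    static int countMatches(String xml, String expr, boolean bindXhtmlPrefix) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // keep namespace URIs, as TagSoup's output does
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        if (bindXhtmlPrefix) {
            xpath.setNamespaceContext(XHTML_CONTEXT);
        }
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countMatches(SAMPLE, "//a", false));  // 0: unprefixed name = no namespace
        System.out.println(countMatches(SAMPLE, "//h:a", true)); // 2: prefix bound to XHTML namespace
    }
}
```

In XOM itself, the equivalent fix should be to pass new XPathContext("h", "http://www.w3.org/1999/xhtml") to Node.query and write the expression as //h:a.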

There also seem to be other parsers available (see "Open Source HTML Parsers in Java"), some of which support XPath.

Does anyone know which of them is the fastest for real-world HTML?

And is XOM the best way to go, or Dom4j, etc.?

Thank you
Misha