Posted 05-12-2010, 11:11 AM (member since May 2010)
Parsing Real-World HTML with XPath Support
I am using TagSoup + XOM, following this article:
BadMagicNumber » Using XPath on real-world HTML documents
It seems to work well, except for the namespace problem described here:
Dom4j + XPath + TagSoup – Namespaces = sweet! :: Kelvin Tan - Lucene Solr Nutch Consultant
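For anyone hitting the same thing, here is a minimal JDK-only sketch of the namespace problem. It assumes, as the linked articles describe, that TagSoup puts every element in the XHTML namespace `http://www.w3.org/1999/xhtml`; the hard-coded `SAMPLE` string only stands in for TagSoup's output, and the class name, the `h` prefix and the helper methods are hypothetical. In XPath 1.0 an unprefixed name test such as `//a` means "element `a` in no namespace", so it matches nothing; binding a prefix to the XHTML namespace, or matching on `local-name()`, finds the links.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlXPathDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical stand-in for TagSoup output: every element is in the XHTML namespace.
    static final String SAMPLE =
        "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
        + "<a href='/one'>one</a><p><a href='/two'>two</a></p>"
        + "</body></html>";

    // Parses the markup into a namespace-aware DOM, so element namespace URIs are kept.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Binds the (arbitrary) prefix "h" to the XHTML namespace for XPath queries.
    static final NamespaceContext XHTML_CONTEXT = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "h".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }

        public String getPrefix(String uri) {
            return XHTML_NS.equals(uri) ? "h" : null;
        }

        public Iterator<String> getPrefixes(String uri) {
            return XHTML_NS.equals(uri)
                ? Collections.singletonList("h").iterator()
                : Collections.<String>emptyIterator();
        }
    };

    // Evaluates an XPath expression and returns how many nodes it selects.
    static int count(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(XHTML_CONTEXT);
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        // Unprefixed: looks for <a> in no namespace, so nothing matches.
        System.out.println("//a                   -> " + count(doc, "//a"));
        // Prefixed: looks for <a> in the XHTML namespace.
        System.out.println("//h:a                 -> " + count(doc, "//h:a"));
        // Namespace-agnostic workaround.
        System.out.println("//*[local-name()='a'] -> " + count(doc, "//*[local-name()='a']"));
    }
}
```

The sketch sticks to `javax.xml.xpath` only so it runs without the TagSoup or XOM jars. With XOM the same idea would be to pass an `XPathContext` that binds a prefix to the XHTML namespace as the second argument of `Node.query(...)`.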
Other HTML parsers also seem to be available, some of which support XPath:
Open Source HTML Parsers in Java
Does anyone know which of them is fastest on real-world HTML?
Is XOM the best way to go, or would Dom4j (or another library) be a better choice?