- 02-21-2009, 02:58 AM #1
Member
Java Classes to help extract metadata from Web pages
Hi,
I have a somewhat unusual problem. I am trying to write a program that can extract the title, date, and author from an online news article.
Are there any built-in classes in the Java libraries that would let me do this? This is a bit urgent, so any help would be greatly appreciated.
Thanks
- 02-21-2009, 03:44 AM #2
Senior Member
Look up HTML parsers that you can use with your program. I've only done XML parsing, and that was a year ago, so I don't remember much about it, but here's what a quick Google search for "java html parser" got me: Open Source HTML Parsers in Java
- 02-21-2009, 03:54 AM #3
Member
Thanks a million, emceenugget :) Absolute legend :)
This will help a lot. I will give HTML parsing a shot tonight.
In the meantime: I have never actually done this, and it is my first time trying to extract metadata from web pages. Would you or anyone else on this forum know of an example program I can study that does something similar, such as extracting data from a news article? It would give me some insight into how to approach this.
Much appreciated.
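For anyone landing on this thread with the same question, here is a minimal sketch of the HTML-parsing approach using only classes that ship with the JDK (the Swing HTML parser in javax.swing.text.html). It is not a definitive implementation. It assumes the page exposes the author and date as <meta name="author"> and <meta name="date"> tags, which many news sites do not; the real tag names vary from site to site and have to be checked by viewing the page source. The class name ArticleMetadata, the MetaCallback helper, and the sample HTML are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class ArticleMetadata {

    /** Collects the <title> text and every <meta name=... content=...> pair. */
    static class MetaCallback extends HTMLEditorKit.ParserCallback {
        final Map<String, String> meta = new HashMap<>();
        private final StringBuilder title = new StringBuilder();
        private boolean inTitle = false;

        @Override
        public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            if (tag == HTML.Tag.TITLE) inTitle = true;
        }

        @Override
        public void handleEndTag(HTML.Tag tag, int pos) {
            if (tag == HTML.Tag.TITLE) inTitle = false;
        }

        @Override
        public void handleText(char[] data, int pos) {
            // Only keep text that appears between <title> and </title>.
            if (inTitle) title.append(data);
        }

        @Override
        public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            // <meta> has no closing tag, so the parser reports it as a simple tag.
            if (tag == HTML.Tag.META) {
                Object name = attrs.getAttribute(HTML.Attribute.NAME);
                Object content = attrs.getAttribute(HTML.Attribute.CONTENT);
                if (name != null && content != null) {
                    meta.put(name.toString().toLowerCase(Locale.ROOT), content.toString());
                }
            }
        }

        String title() {
            return title.toString().trim();
        }
    }

    /** Parses HTML from the reader and returns the collected metadata. */
    static MetaCallback parse(Reader html) {
        MetaCallback callback = new MetaCallback();
        try {
            // The third argument (true) tells the parser to ignore charset declarations.
            new ParserDelegator().parse(html, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback;
    }

    public static void main(String[] args) {
        // Made-up sample page; the meta names "author" and "date" are assumptions.
        String sample = "<html><head><title>Sample headline</title>"
                + "<meta name=\"author\" content=\"Jane Doe\">"
                + "<meta name=\"date\" content=\"2009-02-21\">"
                + "</head><body><p>Body text</p></body></html>";
        MetaCallback result = parse(new StringReader(sample));
        System.out.println("Title:  " + result.title());
        System.out.println("Author: " + result.meta.get("author"));
        System.out.println("Date:   " + result.meta.get("date"));
    }
}
```

To try it on a live page, one option is to pass in a Reader built from the page's stream, for example new InputStreamReader(url.openStream()) on a java.net.URL. The Swing parser is old and targets HTML 3.2, so a dedicated HTML parser library like the ones in the link above is likely to cope better with messy real-world markup.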