- 02-23-2011, 01:54 PM #1
How to read SGML files using Java
Hi,
I'm doing my project on text categorization. I have a text categorization test collection called Reuters-21578 for my Information Retrieval project. It is distributed in 22 files. Each of the first 21 files (reut2-000.sgm through reut2-020.sgm) contains 1000 documents, while the last (reut2-021.sgm) contains 578 documents. The files are in SGML format. Each of the 22 files begins with a document type declaration line:
<!DOCTYPE lewis SYSTEM "lewis.dtd">
The DTD file lewis.dtd is included in the distribution. Following the document type declaration line are individual Reuters articles marked up with SGML tags.
I need help writing a Java program to read those 21578 documents or to transform them into 21578 separate text files.
Please help.
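One possible way to do the split described above is sketched below. It is only a starting point and relies on assumptions that are not stated in this post: that each article is wrapped in <REUTERS ...> ... </REUTERS> tags and that the article text sits inside <BODY> ... </BODY> (check the README shipped with the collection). Because the files have no single root element, a standard XML parser will probably reject them, so the sketch uses plain regular expressions instead of an SGML parser. The class name ReutersSplitter, the method names, and the doc-00001.txt output naming are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReutersSplitter {

    // Assumption: every article is delimited by <REUTERS ...> and </REUTERS>.
    private static final Pattern ARTICLE =
            Pattern.compile("<REUTERS\\b[^>]*>(.*?)</REUTERS>", Pattern.DOTALL);

    // Assumption: the article text is inside <BODY>...</BODY>; some articles may have none.
    private static final Pattern BODY =
            Pattern.compile("<BODY>(.*?)</BODY>", Pattern.DOTALL);

    /** Returns the inner markup of every article found in the content of one .sgm file. */
    static List<String> splitDocuments(String sgml) {
        List<String> docs = new ArrayList<>();
        Matcher m = ARTICLE.matcher(sgml);
        while (m.find()) {
            docs.add(m.group(1).trim());
        }
        return docs;
    }

    /** Returns the body text of one article, or an empty string if there is no BODY element. */
    static String extractBody(String article) {
        Matcher m = BODY.matcher(article);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical command line: <input directory> <output directory>
        Path inputDir = Paths.get(args.length > 0 ? args[0] : ".");
        Path outputDir = Paths.get(args.length > 1 ? args[1] : "out");
        Files.createDirectories(outputDir);

        int count = 0;
        for (int i = 0; i <= 21; i++) {
            Path file = inputDir.resolve(String.format("reut2-%03d.sgm", i));
            // ISO-8859-1 maps every byte to a character, so decoding never fails
            // on stray bytes; the real encoding of the files is an assumption here.
            String sgml = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
            for (String article : splitDocuments(sgml)) {
                count++;
                Path out = outputDir.resolve(String.format("doc-%05d.txt", count));
                Files.write(out, extractBody(article).getBytes(StandardCharsets.UTF_8));
            }
        }
        System.out.println("Wrote " + count + " files to " + outputDir);
    }
}
```

Run it with the folder containing the .sgm files as the first argument and an output folder as the second. Articles without a body produce empty files, so the numbering still follows the order of the articles in the collection. Character entities such as &lt; are left as they are; decoding them, or pulling out other elements such as the title or topics, would be a separate step.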