Thread: Parsing Text Files
- 02-17-2009, 01:35 AM #1
Member
- Join Date
- Jan 2009
- Posts
- 20
- Rep Power
- 0
Parsing Text Files
Hi,
If I have a text file containing a sentence, such as "Hello, how are you?", and I want to search another text file (with a couple of paragraphs of text) for occurrences of those words (in no particular order or sequence) and keep a count of how many times each word occurs, what would be the best approach to do this? Does anyone know of any good examples?
Thank you.
- 02-17-2009, 02:23 AM #2
First, read the first file as a String. Then split the String into an array with the split method, using a regular expression. The delimiters should be pretty much every kind of punctuation. Then read the other file as a String. After that, checking for occurrences shouldn't be too hard. Here's code for reading a file as a String:
Java Code:
private String readFileAsString(String filePath) throws IOException {
    StringBuilder fileData = new StringBuilder(1000);
    BufferedReader reader = new BufferedReader(new FileReader(filePath));
    char[] buf = new char[1024];
    int numRead = 0;
    // Read the file in chunks of up to 1024 chars until end of file.
    while ((numRead = reader.read(buf)) != -1) {
        fileData.append(buf, 0, numRead);
    }
    fileData.trimToSize();
    reader.close();
    return fileData.toString();
}
Then you could do something like
Java Code:
MyClass mc = new MyClass();
String[] words = mc.readFileAsString("../sample.txt").split(regex);
Replace 'regex' with the appropriate regular expression argument. Don't ask me about that, I don't know much about regular expressions.
Hope this helped.
-MK12
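For anyone following along, here is a minimal sketch of the split-and-count idea described in this post. The class name WordCounter, the method name countOccurrences, and the choice of "\\W+" (one or more non-word characters, which covers punctuation and whitespace) as the delimiter are assumptions made for illustration, not something from the thread. It also lowercases both inputs so that "Hello" and "hello" count as the same word, which may or may not be what you want.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class WordCounter {
    // Assumed delimiter: any run of non-word characters (punctuation, spaces, newlines).
    private static final String DELIMITERS = "\\W+";

    // Returns, for each distinct word in sentence, how often it appears in text.
    static Map<String, Integer> countOccurrences(String sentence, String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Register every search word with a starting count of zero.
        for (String word : sentence.toLowerCase(Locale.ROOT).split(DELIMITERS)) {
            if (!word.isEmpty()) {
                counts.put(word, 0);
            }
        }
        // Walk the words of the text and bump the count of any search word we meet.
        for (String word : text.toLowerCase(Locale.ROOT).split(DELIMITERS)) {
            Integer seen = counts.get(word);
            if (seen != null) {
                counts.put(word, seen + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String sentence = "Hello, how are you?";
        String text = "Hello there. How are you today? You said hello, and I wonder how you are.";
        System.out.println(countOccurrences(sentence, text)); // {hello=2, how=2, are=2, you=3}
    }
}
```

In a real program the two strings would come from the files, for example via the readFileAsString method above.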
- 02-18-2009, 02:41 AM #3
Member
- Join Date
- Jan 2009
- Posts
- 20
- Rep Power
- 0
Thanks, that did help. Although I'm not really having problems with reading in the file so much as with searching for a given set of words. I'm semi-familiar with StringTokenizer. So, for example, if I have the following line of text <a>"Hello, how are you?"</a>, how would I code a StringTokenizer to pick out only the words between the <a> tags? Also, assume that the same text file contains another set of words between tags that would need to be read in.
- 02-18-2009, 04:30 AM #4
Member
- Join Date
- Feb 2009
- Posts
- 32
- Rep Power
- 0
How about:
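import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

// Counts matches of delim in every .txt file in the "sample" directory.
// Note: String.split treats delim as a regular expression, not plain text.
public static void find(String delim) {
    File dir = new File("sample");
    if (dir.isDirectory()) { // listFiles() returns null for anything else
        String read;
        try {
            File[] files = dir.listFiles();
            for (int i = 0; i < files.length; i++) {
                File loaded = files[i];
                if (loaded.getName().endsWith(".txt")) {
                    //System.out.println("Searching " + loaded.getName());
                    BufferedReader in = new BufferedReader(new FileReader(loaded));
                    StringBuffer load = new StringBuffer();
                    while ((read = in.readLine()) != null) {
                        load.append(read + "\n");
                    }
                    in.close(); // release the file handle before moving on
                    // Each match cuts the text once, so pieces minus one
                    // approximates the number of matches.
                    String[] pieces = load.toString().split(delim);
                    if (pieces.length > 1) {
                        System.out.println("Found " + (pieces.length - 1) + " time(s) in " + loaded.getName() + "!");
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    } else {
        System.out.println("error: dir wasn't found!");
    }
}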
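One thing to watch with the split approach: the argument is a regular expression, so a search term such as "you?" or "." will not be matched literally. A possible alternative, sketched here under that assumption, is to quote the term and count matches directly with Matcher.find. The class name OccurrenceCounter and the method name countLiteral are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OccurrenceCounter {
    // Counts non-overlapping literal occurrences of term in text.
    static int countLiteral(String text, String term) {
        if (term.isEmpty()) {
            return 0; // an empty term would otherwise match at every position
        }
        // Pattern.quote makes regex metacharacters in term match literally.
        Matcher matcher = Pattern.compile(Pattern.quote(term)).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countLiteral("Hello. How are you? Fine.", ".")); // 2
        System.out.println(countLiteral("you? you! you?", "you?"));         // 2
    }
}
```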
- 02-18-2009, 04:32 AM #5
Member
- Join Date
- Feb 2009
- Posts
- 32
- Rep Power
- 0
Dunno if that helps, but I use it.
- 02-18-2009, 06:30 AM #6
Senior Member
- Join Date
- Jan 2009
- Posts
- 671
- Rep Power
- 5
Assuming you are parsing HTML, you really should use an HTML parser to help you out, such as javax.swing.text.html.parser.Parser. Then you don't need to worry about malformed HTML that might include an <a> type tag inside quotes.
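As a rough sketch of what that could look like: the usual public entry point to the Swing parser is javax.swing.text.html.parser.ParserDelegator, which drives a subclass of the Parser class mentioned above and reports tags and text to an HTMLEditorKit.ParserCallback. The class name AnchorTextExtractor and the method name anchorTexts are assumptions for illustration, and the code assumes <a> elements are not nested.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class AnchorTextExtractor {
    // Collects the text found inside each <a>...</a> element, in document order.
    static List<String> anchorTexts(String html) {
        final List<String> results = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private StringBuilder current; // non-null only while inside an <a> element

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (current != null) {
                    current.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.A && current != null) {
                    results.add(current.toString().trim());
                    current = null;
                }
            }
        };
        try {
            // The last argument tells the parser to ignore any charset declaration.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        String html = "<a>Hello, how are you?</a> some other text <a>Fine, thanks.</a>";
        for (String text : anchorTexts(html)) {
            System.out.println(text);
        }
    }
}
```

Each extracted string can then be split into words and counted like any other text.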
- 02-18-2009, 12:08 PM #7
Member
- Join Date
- Jan 2009
- Posts
- 20
- Rep Power
- 0