  1. #1
    coder09 is offline Member
    Join Date
    Jan 2009
    Posts
    20
    Rep Power
    0

    Parsing Text Files

    Hi,

    If I have a text file with a sentence such as "Hello, how are you?" and want to search another text file (with a couple of paragraphs of text) for occurrences of these words (in no particular order or sequence) and keep a count of how many times each word occurs, what would be the best approach to this? Does anyone know of any good examples?

    Thank you.

  2. #2
    MK12 is offline Senior Member
    Join Date
    Jan 2009
    Posts
    185
    Rep Power
    6


    First, read the first file as a String. Then split the String into an array of words with the split method, which takes a regular expression. The delimiters should be pretty much every kind of punctuation. Then read the other file as a String as well. After that, checking for occurrences shouldn't be too hard. Here's code for reading a file as a String:
    Java Code:
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    private String readFileAsString(String filePath) throws IOException {
        StringBuilder fileData = new StringBuilder(1000);
        // try-with-resources closes the reader even if read() throws
        try (BufferedReader reader = new BufferedReader(
                new FileReader(filePath))) {
            char[] buf = new char[1024];
            int numRead;
            // read() returns -1 once the end of the file is reached
            while ((numRead = reader.read(buf)) != -1) {
                fileData.append(buf, 0, numRead);
            }
        }
        return fileData.toString();
    }
    Then you could do something like
    Java Code:
    MyClass mc = new MyClass();
    String[] words = mc.readFileAsString("../sample.txt").split(regex);
    Replace 'regex' with the appropriate regular expression argument. Don't ask me about that, I don't know much about regular expressions.
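    As one possible way to fill in the 'regex' placeholder and finish the count, here is a minimal sketch. It splits on any run of characters that is not a letter, digit, or apostrophe, and compares words case-insensitively. The class name WordCounter, the method countOccurrences, and the delimiter pattern are all assumptions made for illustration; they are not part of the code above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class WordCounter {
    // Assumed delimiter: any run of characters that is not a letter, digit, or apostrophe.
    private static final String DELIMITERS = "[^\\p{L}\\p{N}']+";

    /** Counts how often each word of the sentence occurs in the text, ignoring case. */
    public static Map<String, Integer> countOccurrences(String sentence, String text) {
        // LinkedHashMap keeps the words in the order they appear in the sentence.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : sentence.toLowerCase(Locale.ROOT).split(DELIMITERS)) {
            if (!word.isEmpty()) {
                counts.put(word, 0);
            }
        }
        // Walk the second text once and bump the counter of every word we are tracking.
        for (String word : text.toLowerCase(Locale.ROOT).split(DELIMITERS)) {
            Integer current = counts.get(word);
            if (current != null) {
                counts.put(word, current + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String sentence = "Hello, how are you?";
        String text = "Hello there. How are you today? You said hello, and how!";
        // prints {hello=2, how=2, are=1, you=2}
        System.out.println(countOccurrences(sentence, text));
    }
}
```

    In practice the two String arguments would come from readFileAsString. Whether an apostrophe should count as part of a word is a judgment call, so adjust the pattern to suit your text.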
    Hope this helped.
    -MK12

  3. #3
    coder09 is offline Member
    Join Date
    Jan 2009
    Posts
    20
    Rep Power
    0


    Thanks, that did help. However, my problem is not so much reading in the file as searching for a given set of words. I'm semi-familiar with StringTokenizer. So, for example, if I have the following line of text <a>"Hello, how are you?"</a> how would I code a StringTokenizer to pick out only the words between the <a> tags? Also, assume that there is another set of words between tags that would need to be read in, as part of the same text file.
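    For simple, well-formed input like the line above, one option is a regular expression with a capturing group rather than a StringTokenizer. The sketch below is only that: the class name TagTextExtractor and the method textBetweenTags are made up for illustration, and it assumes the tags are written exactly as <a> and </a>, with no attributes and no nesting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagTextExtractor {
    // Reluctant match of everything between a literal <a> and the next </a>.
    // DOTALL lets the captured text span line breaks.
    private static final Pattern A_TAG = Pattern.compile("<a>(.*?)</a>", Pattern.DOTALL);

    /** Returns the text of every <a>...</a> pair, in the order they appear. */
    public static List<String> textBetweenTags(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = A_TAG.matcher(input);
        while (m.find()) {
            result.add(m.group(1)); // group 1 is the part between the tags
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "<a>\"Hello, how are you?\"</a>\n<a>Fine, thanks.</a>";
        for (String text : textBetweenTags(input)) {
            System.out.println(text);
        }
    }
}
```

    Each extracted String can then be split into words and counted like any other text.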

  4. #4
    Samgetsmoney is offline Member
    Join Date
    Feb 2009
    Posts
    32
    Rep Power
    0


    How about:

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;

    public static void find(String delim) {
        File dir = new File("sample");
        File[] files = dir.listFiles(); // null if "sample" is not a readable directory
        if (files == null) {
            System.out.println("error: dir wasn't found!");
            return;
        }
        for (File loaded : files) {
            if (!loaded.getName().endsWith(".txt")) {
                continue;
            }
            //System.out.println("Searching " + loaded.getName());
            StringBuilder load = new StringBuilder();
            // try-with-resources closes each file once it has been read
            try (BufferedReader in = new BufferedReader(new FileReader(loaded))) {
                String read;
                while ((read = in.readLine()) != null) {
                    load.append(read).append('\n');
                }
            } catch (IOException e) {
                e.printStackTrace();
                continue;
            }
            // delim is treated as a regular expression; the -1 limit keeps trailing
            // empty strings, so a match at the very end of the text is still counted
            String[] parts = load.toString().split(delim, -1);
            if (parts.length > 1) {
                System.out.println("Found " + (parts.length - 1) + " time(s) in " + loaded.getName() + "!");
            }
        }
    }

  5. #5
    Samgetsmoney is offline Member
    Join Date
    Feb 2009
    Posts
    32
    Rep Power
    0


    dunno if that helps but i use it

  6. #6
    toadaly is offline Senior Member
    Join Date
    Jan 2009
    Posts
    671
    Rep Power
    6


    Assuming you are parsing HTML, you really should use an HTML parser to help you out, such as javax.swing.text.html.parser.Parser. Then you don't need to worry about malformed HTML that might include an <a>-type tag inside quotes.
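    A rough sketch of what that could look like is below. It goes through ParserDelegator from the same package, which drives the Swing parser and reports tags and text to a callback, instead of using Parser directly. The class name AnchorTextParser and the method anchorTexts are hypothetical, and the sketch assumes <a> elements are not nested.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class AnchorTextParser {
    /** Collects the text found inside each <a> element of the given HTML. */
    public static List<String> anchorTexts(String html) {
        final List<String> texts = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean insideAnchor = false;
            private final StringBuilder current = new StringBuilder();

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    insideAnchor = true;
                    current.setLength(0);
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Only keep text that the parser reports while an <a> is open.
                if (insideAnchor) {
                    current.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.A && insideAnchor) {
                    insideAnchor = false;
                    texts.add(current.toString());
                }
            }
        };
        try {
            // The final argument tells the parser to ignore any charset declaration.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<html><body><a>\"Hello, how are you?\"</a> <a>Fine, thanks.</a></body></html>";
        for (String text : anchorTexts(html)) {
            System.out.println(text);
        }
    }
}
```

    Because the parser decides what counts as a tag, the callback only sees real <a> elements, which is the point of not doing the matching by hand.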

  7. #7
    coder09 is offline Member
    Join Date
    Jan 2009
    Posts
    20
    Rep Power
    0


    Thanks. I'll try it tonight after work and report back.

    Quote Originally Posted by Samgetsmoney View Post
    dunno if that helps but i use it
