
Thread: Word Occurrence

  1. #1
    DMKanz (Member, joined Oct 2011, 14 posts)

    Word Occurrence

    Hello all, I am working on an assignment that asks me to count the number of words in a text file, the number of occurrences of each word, and each word's position in the text file. Here is what I have so far, but I am not sure how to go about entering each word into an array. How do I create a String array to hold the words?

    Thanks in advance

    Java Code:
    
    import java.io.*; 
    import java.util.*; 
    
    
    public class WordEntry
    {
    	
    	public static void main(String[] args) throws IOException
    	{
    		// Declare variables
    		int words = 0; 
    	
    		File openFile = new File("File.txt"); 
    		Scanner readFile = new Scanner(openFile); 
    
    		
    		while (readFile.hasNext())
    		{
    			words++;
    			readFile.next();
    		}
        
    		
    		System.out.println("There are " + words + " words in the text.");
    		readFile.close(); 
    
    		System.exit(0);
    	} 
    }
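    A String[] has a fixed length, so when the number of words is not known in advance, one common approach is to add each token to an ArrayList<String> and convert it to an array at the end. A minimal sketch, assuming the words come from a Scanner as in the code above; a hypothetical in-memory string stands in for File.txt, and the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class WordArrayDemo {
    // Collects every whitespace-separated token into a String[].
    // An ArrayList is filled first because the word count is not known up front.
    static String[] readWords(Scanner in) {
        List<String> words = new ArrayList<>();
        while (in.hasNext()) {
            words.add(in.next()); // same next() call as the counting loop, but the token is kept
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Hypothetical sample text; the assignment would use new Scanner(new File("File.txt")).
        String[] words = readWords(new Scanner("the cat saw the dog"));
        System.out.println("There are " + words.length + " words in the text.");
        for (int i = 0; i < words.length; i++) {
            System.out.println(i + ": " + words[i]);
        }
    }
}
```

    The word count is then simply words.length, so the separate counter is no longer needed.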

  2. #2
    Junky (Grand Poobah, joined Jan 2011, location: Dystopia, 3,781 posts)

    Re: Word Occurrence

    OK, if you are now working on word occurrences, then the advice you got in your other thread about using a Map is a good idea. Alternatively, you can write your own class that has a String (the word) and an int (the count) as instance variables, then create objects of that class and store them in an array.

    Read a word
    Search array for word
    If it exists increment count
    Else create new object

  3. #3
    DMKanz (Member, joined Oct 2011, 14 posts)

    Re: Word Occurrence

    Ok - researching

    Thanks again, Junky

  4. #4
    kennyman94 (Member, joined Feb 2011, 24 posts)

    Re: Word Occurrence

    In that case, I would suggest an ArrayList rather than an array. Or you could use a Map<String, Integer> and just update the value for each word.
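    For comparison, a sketch of the Map<String, Integer> version. It uses Map.merge, which needs Java 8 or later; on older versions the get(...) == null ? 0 : ... pattern used later in this thread does the same job. The class name is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class MapWordCount {
    // Counts each word; a TreeMap keeps the keys in alphabetical order.
    static Map<String, Integer> count(String[] tokens) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : tokens) {
            // merge() stores 1 for a new word, otherwise adds 1 to the existing value.
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("the cat saw the dog".split("\\s+")));
        // prints {cat=1, dog=1, saw=1, the=2}
    }
}
```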

  5. #5
    DMKanz (Member, joined Oct 2011, 14 posts)

    Re: Word Occurrence

    I am dangerously close to figuring this out. How do I remove the punctuation characters , : ' ? ; and - from the words?

    Java Code:
    import java.io.*;
    import java.util.*;
    
    public class WordCount
    {
    	public TreeMap < String, Integer > wordMap = new TreeMap < String, Integer > ();
    
    	public static void main(String[] args) throws IOException
    	{
    		WordCount w = new WordCount();
    		w.countWords();
    		
    			for (Map.Entry<String, Integer> entry : w.wordMap.entrySet())
    			{
    				System.out.println("Word " + entry.getKey() + " appears " + entry.getValue() + " times at location");
    			}
    	}
    	
    	void countWords() throws IOException
    	{
    		BufferedReader br = new BufferedReader(new FileReader("File.txt"));
    		
    		String line = "";
    		
    		while((line = br.readLine()) != null)
    		{
    			
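    			String lower = line.toLowerCase().trim();
    			if (lower.isEmpty()) continue; // skip blank lines so "" is never counted as a word
    			
    			String[] tokens = lower.split("\\s+");
    			
    			for(int i = 0; i < tokens.length; i++)
    			{
    				int count = wordMap.get(tokens[i]) == null ? 0 : wordMap.get(tokens[i]);
    				wordMap.put(tokens[i], ++count);
    			}
    		}
    		
    		br.close(); // release the file once reading is done
    	}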
    }
    Thanks in advance
    Last edited by DMKanz; 10-21-2011 at 04:17 AM.

  6. #6
    Junky (Grand Poobah, joined Jan 2011, location: Dystopia, 3,781 posts)

    Re: Word Occurrence

    Check out the String.replaceAll method. It takes a regular expression, so read the Pattern class documentation for the various character classes you can use in regexes.
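    A sketch of the replaceAll approach, assuming that anything other than letters, digits, and whitespace should be treated as a word separator. The class name and the sample sentence are hypothetical.

```java
import java.util.Arrays;

class StripPunctuation {
    // Lowercases the line, turns every character that is not a letter (\p{L}),
    // digit (\p{N}) or whitespace (\s) into a space, then splits on runs of whitespace.
    static String[] tokenize(String line) {
        String cleaned = line.toLowerCase().replaceAll("[^\\p{L}\\p{N}\\s]", " ").trim();
        // An empty line would otherwise split into a single "" token.
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("Hello, world: it's a well-known fact; right?")));
        // prints [hello, world, it, s, a, well, known, fact, right]
    }
}
```

    Replacing with a space rather than an empty string keeps "well-known" from becoming "wellknown"; the trade-off is that "it's" splits into "it" and "s". Adding ' inside the brackets would keep apostrophes. By default, \W only treats [a-zA-Z_0-9] as word characters, whereas \p{L} also keeps accented letters.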

  7. #7
    kennyman94 (Member, joined Feb 2011, 24 posts)

    Re: Word Occurrence

    Does the String.replaceAll method parse the regex as if it invoked Pattern.compile? I was wondering whether you have to follow the regex API's syntax for String methods that take a String parameter named "regex".

  8. #8
    Junky (Grand Poobah, joined Jan 2011, location: Dystopia, 3,781 posts)

    Re: Word Occurrence

    From the Java API for String.replaceAll

    An invocation of this method of the form str.replaceAll(regex, repl) yields exactly the same result as the expression

    Pattern.compile(regex).matcher(str).replaceAll(repl)
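    A small sketch illustrating the equivalence quoted above. It also shows why the parameter name matters: because the argument is a regex, "." means "any character" in replaceAll, while String.replace treats its argument literally. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class ReplaceAllEquivalence {
    static String viaString(String str, String regex, String repl) {
        return str.replaceAll(regex, repl);
    }

    static String viaPattern(String str, String regex, String repl) {
        return Pattern.compile(regex).matcher(str).replaceAll(repl);
    }

    public static void main(String[] args) {
        String str = "well-known; right?";
        String regex = "[;?-]"; // a trailing - inside [] is a literal hyphen
        System.out.println(viaString(str, regex, " ").equals(viaPattern(str, regex, " "))); // true

        // The argument really is a regex: "." matches every character.
        System.out.println("a.b".replaceAll(".", "x")); // xxx
        System.out.println("a.b".replace(".", "x"));    // axb (literal replacement)
    }
}
```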

  9. #9
    DMKanz (Member, joined Oct 2011, 14 posts)

    Re: Word Occurrence

    Okay, I have figured out replaceAll. One more question: I need the output to tell me the location of each word
    (i.e. 'n' times at location 'x,x,x,x').

    Thanks again, Junky

  10. #10
    Junky (Grand Poobah, joined Jan 2011, location: Dystopia, 3,781 posts)

    Re: Word Occurrence

    You are going to have to determine what the location is and store that at the same time as you update the count.

  11. #11
    kennyman94 (Member, joined Feb 2011, 24 posts)

    Re: Word Occurrence

    Quote, originally posted by Junky:
    From the Java API for String.replaceAll

    An invocation of this method of the form str.replaceAll(regex, repl) yields exactly the same result as the expression

    Pattern.compile(regex).matcher(str).replaceAll(repl)

    I sort of figured. Thanks, that is good to know.

  12. #12
    DMKanz (Member, joined Oct 2011, 14 posts)

    Re: Word Occurrence

    Would I create another TreeMap, call it locationMap, and use the same process I used for the words to determine where in the file each word is located?
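    One way to do what is described here, as a sketch: keep wordMap and add a second TreeMap<String, List<Integer>> named locationMap, updating both in the same loop. This assumes "location" means the word's 1-based position among all words in the file, using a running counter that carries across lines; character offsets or line numbers would fit the same structure. The class name and sample text are hypothetical, and a StringReader stands in for File.txt.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordLocations {
    final TreeMap<String, Integer> wordMap = new TreeMap<>();
    // Assumption: a location is the word's 1-based position in the whole file.
    final TreeMap<String, List<Integer>> locationMap = new TreeMap<>();

    void countWords(BufferedReader br) throws IOException {
        int position = 0; // running word number, carried across lines
        String line;
        while ((line = br.readLine()) != null) {
            String st = line.toLowerCase().replaceAll("\\W", " ").trim();
            if (st.isEmpty()) continue; // skip blank or punctuation-only lines
            for (String token : st.split("\\s+")) {
                position++;
                wordMap.merge(token, 1, Integer::sum);
                // Record the location in the same step as the count update.
                locationMap.computeIfAbsent(token, k -> new ArrayList<>()).add(position);
            }
        }
        br.close();
    }

    public static void main(String[] args) throws IOException {
        WordLocations w = new WordLocations();
        w.countWords(new BufferedReader(new StringReader("The cat saw\nthe dog.")));
        for (Map.Entry<String, Integer> e : w.wordMap.entrySet()) {
            System.out.println("Word " + e.getKey() + " appears " + e.getValue()
                    + " times at locations " + w.locationMap.get(e.getKey()));
        }
        // e.g. "Word the appears 2 times at locations [1, 4]"
    }
}
```

    Since each word's count equals the size of its location list, wordMap is strictly redundant here; it is kept only to mirror the two-map plan.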

  13. #13
    kennyman94 (Member, joined Feb 2011, 24 posts)

    Re: Word Occurrence

    The regex API has ways of doing that. Use a Matcher object to search for matches, something like this:

    Java Code:
    ArrayList<Integer> indexes = new ArrayList<Integer>();
    Matcher m = Pattern.compile(regex).matcher(inputString);
    while (m.find()) {
        indexes.add(m.start()); // index where this match begins
    }

    Then do that for every regex that you need to search for.
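    The fragment above, made self-contained as a sketch (class and method names are hypothetical). Note that m.start() gives a character offset into the string, not a word number, and a plain pattern also matches inside longer words:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchIndexes {
    // Returns the 0-based character offset at which each match of regex begins.
    static List<Integer> indexesOf(String regex, String input) {
        List<Integer> indexes = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            indexes.add(m.start()); // offset of the first character of the match
        }
        return indexes;
    }

    public static void main(String[] args) {
        System.out.println(indexesOf("the", "the cat and the hat")); // [0, 12]
        System.out.println(indexesOf("the", "there"));               // [0], a partial-word match
    }
}
```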

  14. #14
    DMKanz (Member, joined Oct 2011, 14 posts)

    Re: Word Occurrence

    I am not sure I follow. Here is what I have so far; it gives me everything except the word locations. Are you saying I should build the ArrayList inside the while loop where I read in the file?

    Thanks in advance

    Java Code:
    
    import java.io.*; 
    import java.util.*; 
    
    public class WordEntry
    {
    	public TreeMap < String, Integer > wordMap = new TreeMap < String, Integer > ();
    
    	public static void main(String[] args) throws IOException
    	{
    		WordEntry w = new WordEntry();
    		w.countWords();
    		
    			for (Map.Entry<String, Integer> entry : w.wordMap.entrySet())
    			{
    				System.out.println("Word " + entry.getKey() + " appears " + entry.getValue() + " times at location ");
    			}
    	}
    	
    		void countWords() throws IOException
    		{
    			BufferedReader br = new BufferedReader(new FileReader("File.txt"));
    		
    			String line = "";
    		
    			while((line = br.readLine()) != null)
    			{
    			
    				String lower = line.toLowerCase();
    				String st = lower.replaceAll("\\W", " ").trim();
    				if (st.isEmpty()) continue; // skip lines with no words so "" is never counted
    			
    				String[] tokens = st.split("\\s+");
    			
    				for(int i = 0; i < tokens.length; i++)
    				{
    					int count = wordMap.get(tokens[i]) == null ? 0 : wordMap.get(tokens[i]);
    					wordMap.put(tokens[i], ++count);
    				}
    			}
    			
    			br.close();
    		}
    }

  15. #15
    kennyman94 (Member, joined Feb 2011, 24 posts)

    Re: Word Occurrence

    I was thinking of something more along the lines this:

    Note: I did this in WordPad without the benefit of a compiler/syntax checker, so please forgive and/or ignore any mistakes/typos.

    Java Code:
    import java.io.*;
    import java.util.*;
    import java.util.regex.*;
    
    public class WordEntry{
    	//don't forget about encapsulation. give everything the minimum accessibility required.
    	private TreeSet<String> words = new TreeSet<String>();
    	private TreeMap<String, Integer> wordCount = new TreeMap<String, Integer>();
    	private TreeMap<String, TreeSet<Integer>> indexes = new TreeMap<String, TreeSet<Integer>>();
    
    	public WordEntry(String fileName)throws IOException{
    		File file = new File(fileName);
    		BufferedReader in = new BufferedReader(new FileReader(file));
    		String input = "";
    		String temp = "";
    		while((temp = in.readLine()) != null){
    			input += temp + " "; //keep a space so the last word of a line doesn't merge with the next line
    		}
    		in.close();
    		parse(input.toLowerCase());
    	}
    
    	private void parse(String input){
    		//to check for the existence of words and count repeat occurrences
    		String[] tokens = input.trim().split("\\s+"); //\\s+ so runs of spaces don't produce empty tokens
    		for(String s : tokens){
    			if(!words.contains(s)){
    				words.add(s);
    				wordCount.put(s, 1);
    			}else{
    				wordCount.put(s, wordCount.get(s) + 1);
    			}
    		}
    		
    		//to get the index for each occurrence of each word
    		TreeSet<Integer> temp = null;
    		Matcher m = null;
    		for(String s : words){
    			temp = new TreeSet<Integer>();
    			m = Pattern.compile(Pattern.quote(s)).matcher(input); //quote so characters like ( or ? are literal
    			while(m.find()){
    				temp.add(m.start());
    			}
    			indexes.put(s, temp);
    		}
    	}
    
    	public String toString(){
    		return "put the string representation of whatever the results are here";
    	}
    	
    	public static void main(String[] args) throws IOException{
    		System.out.println(new WordEntry(args[0]));
    	}
    }
    Last edited by kennyman94; 10-21-2011 at 06:21 AM. Reason: proofreading

  16. #16
    DMKanz (Member, joined Oct 2011, 14 posts)

    Re: Word Occurrence

    Kennyman94, I tried running your code to step through the process, but I am getting an error:

    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at WordEntry.main(WordEntry.java:71)

    Here is the code with some modifications:

    Java Code:
    import java.io.*;
    import java.util.*;
    import java.util.regex.*;
    
    
     public class WordEntry
    {
    	 //don't forget about encapsulation. give everything the minimum accessibility required.
    	 private TreeSet<String> words = new TreeSet<String>();
    	 private TreeMap<String, Integer> wordCount = new TreeMap<String, Integer>();
    	 private TreeMap<String, TreeSet<Integer>> indexes = new TreeMap<String, TreeSet<Integer>>();
    
    	 public WordEntry(String fileName)throws IOException
    	 {
    		 File file = new File("File.txt");
    		 BufferedReader in = new BufferedReader(new FileReader(file));
    		 String input = "";
    		 String temp = "";
    
    		 while((temp = in.readLine()) != null)
    		 {
    			 input += temp;
    		 }
    
    		 	parse(input.toLowerCase());
    	 }
    
    	 private void parse(String input)
    	 {
    		 //to check for the existence of words and count repeat occurrences
    		 String[] tokens = input.split("\\s");
            
    		 for(String s : tokens)
    		 {
    			 if(!words.contains(s))
    			 {
    				 words.add(s);
    				 wordCount.put(s, 1);
    			 }
    			 else
    			 {
    				 wordCount.put(s, wordCount.get(s) + 1);
    			 }
    		 }
       
    		 //to get the index for each occurrence of each word
    
    		 TreeSet<Integer> temp = null;
    		 Matcher m = null;
    
    		 for(String s : words)
    		 {
    			 temp = new TreeSet<Integer>();
    			 m = Pattern.compile(s).matcher(input);
     
    			 while(m.find())
    			 {
    				 temp.add(m.start());
    			 }
    			 indexes.put(s, temp);
    		 }
    	 }
    
    	 public String toString()
    	 {
    		 return "put the string representation of whatever the results are here";
    	 }
             
    	 public static void main(String[] args) throws IOException
    	 {
    		 System.out.println(new WordEntry(args[0]));
    	 }
    }
    Also, I am not sure what I am supposed to put in place of return "put the string representation of whatever the results are here".

    I would like it to say "There are x words" on one line, and then "Word 'a' appears 'x' times at location 'x,x,x,x'" for each word.

    Thanks in advance

  17. #17
    kennyman94 (Member, joined Feb 2011, 24 posts)

    Re: Word Occurrence

    Quote, originally posted by DMKanz:
    I would like it to say There are x number of words on one line and then word 'a' appears 'x' times at location 'x,x,x,x'

    Then that is exactly what you put in there. For example:
    Java Code:
    public String toString(){
        String rv = "";
        for(String s : words){
            String temp = "There are " + wordCount.get(s) + " occurrences of " + s + " at: ";
            for(Integer i : indexes.get(s)){
                temp += i + ",";
            }
            rv += temp.substring(0, temp.length() - 1) + "\n"; //to get rid of the extra comma
        }
        return rv.isEmpty() ? rv : rv.substring(0, rv.length() - 1); //to get rid of the extra "\n"
    }
    Also, the reason you were getting the ArrayIndexOutOfBoundsException is that I wrote the class to take the name of the file from the command-line arguments, so running it with no arguments leaves args empty. Either run it as java WordEntry File.txt, or, if you want it to always use a specific file, replace args[0] with the file's name.
    Last edited by kennyman94; 10-21-2011 at 09:22 PM. Reason: forgot something
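    As an alternative to building the string with += and then trimming the trailing comma and newline, java.util.StringJoiner (Java 8+) only places the delimiter between elements. A sketch producing the format DMKanz asked for; the ReportFormatter name, the totalWords parameter, and the sample data are hypothetical (the class above does not store a total, though it could be computed by summing the counts):

```java
import java.util.Arrays;
import java.util.StringJoiner;
import java.util.TreeMap;
import java.util.TreeSet;

class ReportFormatter {
    // Builds the report; StringJoiner inserts "\n" and "," only between elements,
    // so nothing has to be chopped off afterwards.
    static String format(int totalWords, TreeMap<String, Integer> wordCount,
                         TreeMap<String, TreeSet<Integer>> indexes) {
        StringJoiner lines = new StringJoiner("\n");
        lines.add("There are " + totalWords + " words in the text.");
        for (String word : wordCount.keySet()) {
            StringJoiner locs = new StringJoiner(",");
            for (Integer i : indexes.get(word)) {
                locs.add(String.valueOf(i));
            }
            lines.add("Word " + word + " appears " + wordCount.get(word)
                    + " times at locations " + locs);
        }
        return lines.toString();
    }

    public static void main(String[] args) {
        // Hypothetical data for the text "the cat the" (character offsets).
        TreeMap<String, Integer> counts = new TreeMap<>();
        TreeMap<String, TreeSet<Integer>> idx = new TreeMap<>();
        counts.put("the", 2);
        idx.put("the", new TreeSet<>(Arrays.asList(0, 8)));
        counts.put("cat", 1);
        idx.put("cat", new TreeSet<>(Arrays.asList(4)));
        System.out.println(format(3, counts, idx));
    }
}
```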

  18. #18
    DMKanz (Member, joined Oct 2011, 14 posts)

    Re: Word Occurrence

    Got it, thank you for your help. One thing I noticed is that it seems to be showing the location of each letter rather than each word.

    For instance, here is my output:

    There are 119 words in the text.
    Word a appears 3 times at locations 21,93,97,111,133,137,142,144,150,154,168,219,226,238,254,258,263,276,281,285,297,322,331,391,402,407,435,449,455,461,466,483,509,531,559,563,569,571
    Word against appears 1 times at locations 142
    Word and appears 4 times at locations 93,168,219,263,276
    Last edited by DMKanz; 10-21-2011 at 10:41 PM.
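    The symptom above comes from the line m = Pattern.compile(s).matcher(input); in the version posted earlier: the pattern for the word "a" matches every letter a in the text, including the ones inside "against" and "and" (note that 142 and 93 appear in both lists), so the location list is much longer than the count. A sketch of one fix, assuming whole-word matching is what is wanted: surround the word with the \b word-boundary anchor and wrap it in Pattern.quote so characters like ? are treated literally. The class name and sample sentence are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordIndexes {
    // Plain pattern: matches the text anywhere, including inside longer words.
    static List<Integer> anywhere(String word, String input) {
        return starts(Pattern.compile(Pattern.quote(word)), input);
    }

    // \b on both sides restricts matches to whole words; Pattern.quote stops
    // regex metacharacters in the word from being interpreted.
    static List<Integer> wholeWord(String word, String input) {
        return starts(Pattern.compile("\\b" + Pattern.quote(word) + "\\b"), input);
    }

    private static List<Integer> starts(Pattern p, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            result.add(m.start()); // character offset where the match begins
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "a cat and a hat";
        System.out.println(anywhere("a", text));  // [0, 3, 6, 10, 13]
        System.out.println(wholeWord("a", text)); // [0, 10]
    }
}
```

    With \b the number of locations matches the word count. The positions are still character offsets into the joined text; if word numbers are wanted instead, recording a running word counter while splitting the text is a simpler source of locations than regex matching.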
