Thread: Word Occurrence
- 10-21-2011, 02:44 AM #1
Member
- Join Date
- Oct 2011
- Posts
- 14
- Rep Power
- 0
Word Occurrence
Hello all, I am working on an assignment that asks me to count the number of words in a text file, the number of occurrences of each word, and each word's position in the text file. Here is what I have so far, but I am not sure how to go about entering each word into an array. How do you create a String array for the words?
Thanks in advance
Java Code:
import java.io.*;
import java.util.*;

public class WordEntry {
    public static void main(String[] args) throws IOException {
        // Declare variables
        int words = 0;
        File openFile = new File("File.txt");
        Scanner readFile = new Scanner(openFile);
        while (readFile.hasNext()) {
            words++;
            readFile.next();
        }
        System.out.println("There are " + words + " words in the text.");
        readFile.close();
        System.exit(0);
    }
}
- 10-21-2011, 02:50 AM #2
Re: Word Occurrence
OK, if you are now working on word occurrences then the advice you got in your other thread about using a Map is a good idea. Alternatively what you can do is write your own class that has a String (the word) and an int (the count) as instance variables. What you then can do is create objects of your class and store them in an array.
Read a word
Search array for word
If it exists increment count
Else create new object
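The four steps above could be sketched roughly as follows. This is only an illustration, not the assignment's required design: the class name WordTally, the nested Entry class, and the methods add and countOf are all made up for the example, and the array is grown by hand because a plain array has a fixed size.

```java
import java.util.Arrays;

public class WordTally {
    // Hypothetical helper class: one word plus how many times it has been seen.
    static class Entry {
        final String word;
        int count;

        Entry(String word) {
            this.word = word;
            this.count = 1; // a new entry has been seen once
        }
    }

    private Entry[] entries = new Entry[8]; // doubled whenever it fills up
    private int size = 0;

    // Search the array for the word; increment if found, else create a new object.
    void add(String word) {
        for (int i = 0; i < size; i++) {
            if (entries[i].word.equals(word)) {
                entries[i].count++;
                return;
            }
        }
        if (size == entries.length) {
            entries = Arrays.copyOf(entries, size * 2);
        }
        entries[size++] = new Entry(word);
    }

    // Returns the count for a word, or 0 if it was never added.
    int countOf(String word) {
        for (int i = 0; i < size; i++) {
            if (entries[i].word.equals(word)) {
                return entries[i].count;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        WordTally tally = new WordTally();
        for (String w : "the cat and the hat".split("\\s+")) {
            tally.add(w);
        }
        System.out.println("the: " + tally.countOf("the")); // prints "the: 2"
        System.out.println("dog: " + tally.countOf("dog")); // prints "dog: 0"
    }
}
```

The linear search makes every add proportional to the number of distinct words, which is one reason a Map is usually the more convenient choice.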
- 10-21-2011, 02:52 AM #3
Member
- Join Date
- Oct 2011
- Posts
- 14
- Rep Power
- 0
Re: Word Occurrence
Ok - researching
Thanks again Junky
- 10-21-2011, 04:11 AM #4
Member
- Join Date
- Feb 2011
- Posts
- 24
- Rep Power
- 0
Re: Word Occurrence
In that case, I would suggest an ArrayList rather than an array. Alternatively, you could use a Map<String, Integer> and just update the value for each word.
- 10-21-2011, 04:15 AM #5
Member
- Join Date
- Oct 2011
- Posts
- 14
- Rep Power
- 0
Re: Word Occurrence
I am dangerously close to figuring this out. How do I remove , : ' ? ; - from the words?
Thanks in advance
Java Code:
import java.io.*;
import java.util.*;

public class WordCount {
    public TreeMap<String, Integer> wordMap = new TreeMap<String, Integer>();

    public static void main(String[] args) throws IOException {
        WordCount w = new WordCount();
        w.countWords();
        for (Map.Entry<String, Integer> entry : w.wordMap.entrySet()) {
            System.out.println("Word " + entry.getKey() + " appears " + entry.getValue() + " times at location");
        }
    }

    void countWords() throws IOException {
        BufferedReader br = new BufferedReader(new FileReader("File.txt"));
        String line = "";
        while ((line = br.readLine()) != null) {
            String lower = line.toLowerCase();
            String[] tokens = lower.split("\\s+");
            for (int i = 0; i < tokens.length; i++) {
                int count = wordMap.get(tokens[i]) == null ? 0 : wordMap.get(tokens[i]);
                wordMap.put(tokens[i], ++count);
            }
        }
    }
}
Last edited by DMKanz; 10-21-2011 at 04:17 AM.
- 10-21-2011, 04:28 AM #6
Re: Word Occurrence
Check out the String.replaceAll method. It requires a regular expression. Read the documentation for the Pattern class to see the various character classes you can use in regexes.
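As a small sketch of that suggestion, the six characters asked about can be listed in one character class. The class name PunctuationStripper and the method strip are invented for the example; only String.replaceAll comes from the advice above.

```java
public class PunctuationStripper {
    // Removes the characters , : ' ? ; - from the input.
    // The hyphen is placed last inside the brackets so it is treated
    // as a literal character rather than as a range operator.
    static String strip(String line) {
        return line.replaceAll("[,:'?;-]", "");
    }

    public static void main(String[] args) {
        // prints "Well is it so Yes its wellknown true"
        System.out.println(strip("Well, is it so? Yes; it's well-known: true"));
    }
}
```

Deleting the characters glues "well-known" into "wellknown". Replacing them with a space instead would split such words in two, so which behavior is right depends on how the assignment defines a word.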
- 10-21-2011, 04:39 AM #7
Member
- Join Date
- Feb 2011
- Posts
- 24
- Rep Power
- 0
Re: Word Occurrence
Does the String.replaceAll method parse the regex as if invoking Pattern.compile? I was wondering whether you have to follow the syntax of the regex API for the String methods that take a String parameter named "regex".
- 10-21-2011, 04:48 AM #8
Re: Word Occurrence
From the Java API for String.replaceAll
An invocation of this method of the form str.replaceAll(regex, repl) yields exactly the same result as the expression
Pattern.compile(regex).matcher(str).replaceAll(repl)
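The equivalence quoted from the API can be checked with a short sketch. The class name ReplaceAllEquivalence and the two helper methods are made up for the example; both call only the documented methods named above.

```java
import java.util.regex.Pattern;

public class ReplaceAllEquivalence {
    // The convenience form on String.
    static String viaString(String str, String regex, String repl) {
        return str.replaceAll(regex, repl);
    }

    // The expanded form from the API documentation.
    static String viaPattern(String str, String regex, String repl) {
        return Pattern.compile(regex).matcher(str).replaceAll(repl);
    }

    public static void main(String[] args) {
        String s = "one, two; three";
        // Both lines print "one two three"
        System.out.println(viaString(s, "\\W+", " "));
        System.out.println(viaPattern(s, "\\W+", " "));
    }
}
```

So the regex passed to the String method follows exactly the Pattern syntax. When the same regex is applied to many lines, compiling the Pattern once and reusing it avoids recompiling on every call.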
- 10-21-2011, 04:54 AM #9
Member
- Join Date
- Oct 2011
- Posts
- 14
- Rep Power
- 0
Re: Word Occurrence
Okay, I have figured out replaceAll. One more question: I need the output to tell me the location of each word
(i.e. 'n' times at location 'x,x,x,x')
Thanks again Junky
- 10-21-2011, 05:05 AM #10
Re: Word Occurrence
You are going to have to determine what the location is and store that at the same time as you update the count.
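One way to store the location at the same time as the count is to keep a list of positions per word, so the count is simply the size of the list. The sketch below assumes "location" means the word's number in the text (first word is 1); the class name WordLocations and the method locate are invented for the example, and it works on a String rather than a file to stay self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordLocations {
    // Maps each word to the 1-based word positions where it occurs.
    static Map<String, List<Integer>> locate(String text) {
        Map<String, List<Integer>> locations = new TreeMap<String, List<Integer>>();
        int position = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split can produce a leading empty string
            }
            position++;
            List<Integer> list = locations.get(token);
            if (list == null) {
                list = new ArrayList<Integer>();
                locations.put(token, list);
            }
            list.add(position); // record the location as the word is counted
        }
        return locations;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> result = locate("The cat saw the other cat.");
        for (Map.Entry<String, List<Integer>> entry : result.entrySet()) {
            System.out.println("Word " + entry.getKey() + " appears "
                    + entry.getValue().size() + " times at location " + entry.getValue());
        }
    }
}
```

Splitting on "\\W+" also breaks a word like "it's" into "it" and "s", so the split pattern may need adjusting for text with apostrophes.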
- 10-21-2011, 05:08 AM #11
Member
- Join Date
- Feb 2011
- Posts
- 24
- Rep Power
- 0
- 10-21-2011, 05:11 AM #12
Member
- Join Date
- Oct 2011
- Posts
- 14
- Rep Power
- 0
Re: Word Occurrence
Would I create another TreeMap, call it locationMap, and use the same process I used for the words to determine where in the file each word is located?
- 10-21-2011, 05:14 AM #13
Member
- Join Date
- Feb 2011
- Posts
- 24
- Rep Power
- 0
Re: Word Occurrence
The regex API has ways of doing that. Use the Matcher object to search for matches. Something like this:
ArrayList<Integer> indexes = new ArrayList<Integer>();
Matcher m = Pattern.compile(regex).matcher(inputString);
while (m.find()) {
    indexes.add(m.start());
}
And do that for every regex that you need to search for.
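A runnable version of the snippet above might look like this. The class name MatchIndexes and the method indexesOf are made up for the example; the loop itself is the same find/start loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchIndexes {
    // Collects the starting character offset of every match of regex in input.
    static List<Integer> indexesOf(String regex, String input) {
        List<Integer> indexes = new ArrayList<Integer>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            indexes.add(m.start());
        }
        return indexes;
    }

    public static void main(String[] args) {
        // prints "[0, 7, 16]"
        System.out.println(indexesOf("cat", "cat concatenate cat"));
    }
}
```

Two things are worth noticing in the result: start() returns a character offset rather than a word number, and the plain pattern "cat" also matches inside "concatenate" (offset 7).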
- 10-21-2011, 05:27 AM #14
Member
- Join Date
- Oct 2011
- Posts
- 14
- Rep Power
- 0
Re: Word Occurrence
I am not sure I follow. Here is what I have so far, and it gives me everything except the word location. Are you saying to build the ArrayList inside the while loop where I read in the file?
Thanks in advance
Java Code:
import java.io.*;
import java.util.*;

public class WordEntry {
    public TreeMap<String, Integer> wordMap = new TreeMap<String, Integer>();

    public static void main(String[] args) throws IOException {
        WordEntry w = new WordEntry();
        w.countWords();
        for (Map.Entry<String, Integer> entry : w.wordMap.entrySet()) {
            System.out.println("Word " + entry.getKey() + " appears " + entry.getValue() + " times at location ");
        }
    }

    void countWords() throws IOException {
        BufferedReader br = new BufferedReader(new FileReader("File.txt"));
        String line = "";
        while ((line = br.readLine()) != null) {
            String lower = line.toLowerCase();
            String st = lower.replaceAll("\\W", " ");
            String[] tokens = st.split("\\s+");
            for (int i = 0; i < tokens.length; i++) {
                int count = wordMap.get(tokens[i]) == null ? 0 : wordMap.get(tokens[i]);
                wordMap.put(tokens[i], ++count);
            }
        }
        br.close();
    }
}
- 10-21-2011, 06:19 AM #15
Member
- Join Date
- Feb 2011
- Posts
- 24
- Rep Power
- 0
Re: Word Occurrence
I was thinking of something more along the lines of this:
Note: I did this in WordPad without the benefit of a compiler/syntax checker, so please forgive and/or ignore any mistakes/typos.
Java Code:
import java.io.*;
import java.util.*;
import java.util.regex.*; // needed for Matcher and Pattern

public class WordEntry {
    // Don't forget about encapsulation: give everything the minimum accessibility required.
    private TreeSet<String> words = new TreeSet<String>();
    private TreeMap<String, Integer> wordCount = new TreeMap<String, Integer>();
    private TreeMap<String, TreeSet<Integer>> indexes = new TreeMap<String, TreeSet<Integer>>();

    public WordEntry(String fileName) throws IOException {
        File file = new File(fileName);
        BufferedReader in = new BufferedReader(new FileReader(file));
        String input = "";
        String temp = "";
        // BufferedReader provides readLine(), not nextLine()
        while ((temp = in.readLine()) != null) {
            input += temp;
        }
        parse(input.toLowerCase());
    }

    private void parse(String input) {
        // To check for the existence of words and count repeat occurrences
        String[] tokens = input.split("\\s");
        for (String s : tokens) {
            if (!words.contains(s)) {
                words.add(s);
                wordCount.put(s, 1);
            } else {
                wordCount.put(s, wordCount.get(s) + 1);
            }
        }
        // To get the index for each occurrence of each word
        TreeSet<Integer> temp = null;
        Matcher m = null;
        for (String s : words) {
            temp = new TreeSet<Integer>();
            m = Pattern.compile(s).matcher(input);
            while (m.find()) {
                temp.add(m.start());
            }
            indexes.put(s, temp);
        }
    }

    public String toString() {
        return "put the string representation of whatever the results are here";
    }

    // The constructor can throw IOException, so main must declare it
    public static void main(String[] args) throws IOException {
        System.out.println(new WordEntry(args[0]));
    }
}
Last edited by kennyman94; 10-21-2011 at 06:21 AM. Reason: proofreading
- 10-21-2011, 04:22 PM #16
Member
- Join Date
- Oct 2011
- Posts
- 14
- Rep Power
- 0
Re: Word Occurrence
Kennyman94 - I tried running your code to step through the process, but I am getting an error:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at WordEntry.main(WordEntry.java:71)
Here is the code with some modification. Also, I am not sure what I am supposed to put in the return "put the string representation of whatever the results are here" statement.
Java Code:
import java.io.*;
import java.util.*;
import java.util.regex.*;

public class WordEntry {
    // Don't forget about encapsulation: give everything the minimum accessibility required.
    private TreeSet<String> words = new TreeSet<String>();
    private TreeMap<String, Integer> wordCount = new TreeMap<String, Integer>();
    private TreeMap<String, TreeSet<Integer>> indexes = new TreeMap<String, TreeSet<Integer>>();

    public WordEntry(String fileName) throws IOException {
        File file = new File("File.txt");
        BufferedReader in = new BufferedReader(new FileReader(file));
        String input = "";
        String temp = "";
        while ((temp = in.readLine()) != null) {
            input += temp;
        }
        parse(input.toLowerCase());
    }

    private void parse(String input) {
        // To check for the existence of words and count repeat occurrences
        String[] tokens = input.split("\\s");
        for (String s : tokens) {
            if (!words.contains(s)) {
                words.add(s);
                wordCount.put(s, 1);
            } else {
                wordCount.put(s, wordCount.get(s) + 1);
            }
        }
        // To get the index for each occurrence of each word
        TreeSet<Integer> temp = null;
        Matcher m = null;
        for (String s : words) {
            temp = new TreeSet<Integer>();
            m = Pattern.compile(s).matcher(input);
            while (m.find()) {
                temp.add(m.start());
            }
            indexes.put(s, temp);
        }
    }

    public String toString() {
        return "put the string representation of whatever the results are here";
    }

    public static void main(String[] args) throws IOException {
        System.out.println(new WordEntry(args[0]));
    }
}
I would like it to say "There are x number of words" on one line, and then "word 'a' appears 'x' times at location 'x,x,x,x'".
Thanks in advance
- 10-21-2011, 09:20 PM #17
Member
- Join Date
- Feb 2011
- Posts
- 24
- Rep Power
- 0
Re: Word Occurrence
Then that is exactly what you put in there. For example:
Java Code:
public String toString() {
    String rv = "";
    for (String s : words) {
        String temp = "There are " + wordCount.get(s) + " occurrences of " + s + " at: ";
        for (Integer i : indexes.get(s)) {
            temp += i + ",";
        }
        rv += temp.substring(0, temp.length() - 1) + "\n"; // to get rid of the extra comma
    }
    // Guard against an empty result before trimming the extra "\n"
    return rv.isEmpty() ? rv : rv.substring(0, rv.length() - 1);
}
Also, the reason you were getting the array index out of bounds exception was that I wrote the class to take the name of the file from the passed-in arguments. If you want it to use a specific file instead, replace args[0] with the file's name.
Last edited by kennyman94; 10-21-2011 at 09:22 PM. Reason: forgot something
- 10-21-2011, 10:35 PM #18
Member
- Join Date
- Oct 2011
- Posts
- 14
- Rep Power
- 0
Re: Word Occurrence
Got it, thank you for your help. One thing I noticed is that it is showing, I think, the location of each letter and not each word.
For instance, here is my output:
There are 119 words in the text.
Word a appears 3 times at locations 21,93,97,111,133,137,142,144,150,154,168,219,226,238,254,258,263,276,281,285,297,322,331,391,402,407,435,449,455,461,466,483,509,531,559,563,569,571
Word against appears 1 times at locations 142
Word and appears 4 times at locations 93,168,219,263,276
Last edited by DMKanz; 10-21-2011 at 10:41 PM.
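A likely explanation for the output above is that Pattern.compile(s) treats the word as an unanchored regex, so "a" matches every letter a in the text, including those inside "against" and "and". One possible fix, sketched below, is to wrap the word in word boundaries (\b) and quote it with Pattern.quote so punctuation in a token is not read as regex syntax. The class name WholeWordIndexes and its method are invented for the example and are not part of the code posted in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeWordIndexes {
    // Returns the character offsets where word occurs as a whole word in input.
    static List<Integer> wholeWordIndexes(String word, String input) {
        // \b anchors the match at word boundaries; Pattern.quote makes the
        // word a literal, so characters like ? or ( cannot break the regex.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        List<Integer> indexes = new ArrayList<Integer>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            indexes.add(m.start());
        }
        return indexes;
    }

    public static void main(String[] args) {
        // prints "[0, 10]"; an unanchored "a" would also match at 3, 6 and 13
        System.out.println(wholeWordIndexes("a", "a cat and a hat"));
    }
}
```

The offsets are still character positions, not word numbers, and \b only behaves as expected when the word starts and ends with a letter, digit, or underscore.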