Treemap and sort 10 most used words
Hey all! I'm really new to Java and I've got some problems with my code.
I want to load a txt file and list the 10 most used words in descending order of frequency.
Code:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

// the class declaration was cut off in the post; the name WordCount is a placeholder
public class WordCount {
    public static void main(String[] args) throws FileNotFoundException {
        File file = new File("hitchhikersguide.txt");
        Map<String, Integer> map = new TreeMap<>();
        Scanner scan = new Scanner(file).useDelimiter("[^a-zA-Z]+");
        // count how often each word occurs
        while (scan.hasNext()) {
            String newWord = scan.next().toLowerCase();
            Integer number = 1;
            if (map.containsKey(newWord)) {
                number = map.get(newWord) + 1;
            }
            map.put(newWord, number);
        }
        scan.close();
        // print every word together with its count
        for (Map.Entry<String, Integer> pairs : map.entrySet()) {
            System.out.println(pairs.getKey() + " = " + pairs.getValue());
        }
    }
}
As it is right now, it sorts in alphabetical order.
Any tips on what I'm doing wrong?
Thanks!
Re: Treemap and sort 10 most used words
Read the API for TreeMap.
Specifically:
"The map is sorted according to the natural ordering of its keys"
Re: Treemap and sort 10 most used words
What I did was define a class called Counter that I use to count the occurrences of each word. I believe that the Integer class as a value in the map won't do it because Integer objects are immutable. I defined Counter to be Comparable so that it could easily be sorted by its count.
Here's the code:
Code:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Scanner;
import java.util.TreeMap;

class Counter implements Comparable<Counter> {
    private final String word;
    private int count;

    public Counter(String word) { this.word = word; count = 1; }
    public int getCount() { return count; }
    public void increment() { count++; }

    // order counters by their count, smallest first
    @Override
    public int compareTo(Counter that) {
        return Integer.compare(this.count, that.count);
    }

    @Override
    public String toString() {
        return word + " " + count;
    }
}

public class MostCommonWords {
    public static void main(String[] args) throws FileNotFoundException {
        // set up local variables
        File file = new File("ChristmasCarol.txt");
        TreeMap<String, Counter> map = new TreeMap<String, Counter>();
        Scanner scan = new Scanner(file).useDelimiter("[^a-zA-Z]+");
        // put all of the words into the TreeMap, counting the occurrences of each word
        while (scan.hasNext()) {
            String newWord = scan.next().toLowerCase();
            if (map.containsKey(newWord)) {
                Counter counter = map.get(newWord);
                counter.increment();
            } else {
                map.put(newWord, new Counter(newWord));
            }
        }
        scan.close();
        // get the values from the map
        Collection<Counter> values = map.values();
        // sort the values by occurrences instead of alphabetically
        ArrayList<Counter> list = new ArrayList<Counter>(values); // copy into an ArrayList so that the values can be sorted
        Collections.sort(list); // use Collections.sort() because it's easy; the result is ascending by count
        // print out the 10 most common words (fewer if the file has fewer distinct words)
        for (int i = list.size() - 1; i >= Math.max(0, list.size() - 10); i--) {
            Counter counter = list.get(i);
            System.out.println(counter);
        }
    }
}
Re: Treemap and sort 10 most used words
Quote:
I believe that the Integer class as a value in the map won't do it because Integer objects are immutable.
I don't think the immutability of Integer is a big problem. Ordinary numbers are immutable in the same way: 42 is always 42, and it won't ever change its value. But if I'm counting eggs or something and find I have 3 1/2 dozen, then 42 will do as a perfectly good counter value. As I count, each time I encounter a new egg I throw away the old (immutable) natural number I was using as the value and replace it with another (immutable) natural number that's one bigger. The OP's original code did much the same thing with Integer instances.
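To make the egg-counting point concrete, here is a minimal sketch of counting with plain Integer values. The class name ImmutableCounterDemo and the method countWords are made up for illustration, and the input is a string rather than the OP's file so it can be run on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not from the original posts.
class ImmutableCounterDemo {
    // Counts words using immutable Integer values: each update replaces the stored value.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("[^a-z]+")) {
            if (word.isEmpty()) {
                continue; // split() yields an empty first piece when the text starts with a non-letter
            }
            Integer old = counts.get(word);              // null the first time we see the word
            Integer next = (old == null) ? 1 : old + 1;  // a new value; 'old' itself is never modified
            counts.put(word, next);                      // the map now refers to the new value
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Egg, egg and EGG").get("egg")); // 3
    }
}
```

On Java 8 or later, counts.merge(word, 1, Integer::sum) does the same get-and-replace step in a single call.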
---
A little point, but since we've established that the SortedMap implementations sort on key rather than on value there doesn't seem much point in sticking with TreeMap. There might be some *other* reason why alphabetical listings could be needed, of course, but none have been given here.
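A small sketch of that point, with made-up counts and a hypothetical class name (KeyOrderDemo): the counting itself can happen in a HashMap, and copying the counts into a TreeMap only ever gives alphabetical order, whatever the values are.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; the words and counts are invented sample data.
class KeyOrderDemo {
    // Returns the keys in the order a TreeMap built from the given counts iterates them.
    static List<String> treeMapKeyOrder(Map<String, Integer> counts) {
        return new ArrayList<>(new TreeMap<>(counts).keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>(); // no ordering promised, none needed while counting
        counts.put("the", 50);
        counts.put("answer", 3);
        counts.put("panic", 12);
        // The most frequent word comes last because "the" sorts last alphabetically.
        System.out.println(treeMapKeyOrder(counts)); // [answer, panic, the]
    }
}
```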
---
Another way would be to use a <String,Integer> map as the OP has, and sort the map entries rather than the values (where an "entry" is a key/value pair). I haven't done this, but the collections framework seems to include the relevant pieces: (1) a way of obtaining the set of entries, (2) a way of constructing a list from the set, and (3) a way of sorting the list (by supplying a Comparator based on the fact that Integer is comparable).
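The three pieces listed above could be put together roughly like this. It is only a sketch: the class TopWords, the method topN, and the alphabetical tie-break for equal counts are my own assumptions, not something from the OP's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; not from the original posts.
class TopWords {
    // Returns up to n entries with the highest counts, largest count first.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        // (1) obtain the set of entries and (2) copy it into a list
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // (3) sort the list with a comparator built on Integer.compareTo
        entries.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue()); // larger counts first
            // assumption: words with equal counts are listed alphabetically
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 50);
        counts.put("answer", 3);
        counts.put("panic", 12);
        for (Map.Entry<String, Integer> entry : topN(counts, 10)) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

Math.min keeps the call safe when the text has fewer than n distinct words, and the map type no longer matters, so a HashMap works as well as a TreeMap.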
Re: Treemap and sort 10 most used words
Thanks for the help! I kept reading on some forums, and that helped me find a solution.
Code:
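import java.io.File;
import java.io.FileNotFoundException;
import java.util.Comparator;
import java.util.Iterator;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

// the class declaration was cut off in the post; the name WordCount is a placeholder
public class WordCount {
    public static void main(String[] args) throws FileNotFoundException {
        File file = new File("hitchhikersguide.txt");
        Map<String, Integer> map = new TreeMap<>();
        ValueComparator comp = new ValueComparator(map);
        TreeMap<String, Integer> sorted_map = new TreeMap<String, Integer>(comp);
        Scanner sc = new Scanner(file).useDelimiter("[^a-zA-Z]+");
        // count how often each word occurs
        while (sc.hasNext()) {
            String newWord = sc.next().toLowerCase();
            Integer number = 1;
            if (map.containsKey(newWord)) {
                number = map.get(newWord) + 1;
            }
            map.put(newWord, number);
        }
        sc.close();
        // re-insert everything into the map that orders its keys by count
        sorted_map.putAll(map);
        Iterator<Map.Entry<String, Integer>> it = sorted_map.entrySet().iterator();
        // print at most 10 entries, stopping early if there are fewer words
        for (int i = 0; i < 10 && it.hasNext(); i++) {
            Map.Entry<String, Integer> resultat = it.next();
            System.out.println("results: " + resultat.getKey() + " = " + resultat.getValue());
        }
    }
}

class ValueComparator implements Comparator<String> {
    Map<String, Integer> base;

    public ValueComparator(Map<String, Integer> base) {
        this.base = base;
    }

    @Override
    public int compare(String a, String b) {
        // higher counts first; ties are broken alphabetically so that the
        // comparator stays consistent and distinct words never compare as equal
        int byCount = base.get(b).compareTo(base.get(a));
        return byCount != 0 ? byCount : a.compareTo(b);
    }
}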