Thread: Problem with an exercise in Java

    Hello, I have an exercise in Java in which we want to find the word frequency across documents.

    E.g., we have 3 documents: a1.txt, a2.txt, a3.txt.

    a1 = {aaa, aaa, bbb, kkk, ccc, hhh}
    a2 = {kkk, aaa, hhh, ddd}
    a3 = {mmm, ccc, ccc, hhh}

    So the df (document frequency) values for the distinct words are:

    aaa = 2
    bbb = 1
    ccc = 2
    ddd = 1
    hhh = 3
    mmm = 1
    kkk = 2 etc
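
    The counts above can be reproduced with a minimal sketch like the following, assuming each document has already been split into a list of words. The names DocumentFrequencySketch and documentFrequency are illustrative and not part of the exercise. Each document is first reduced to a set, so a word repeated inside one document is counted only once for that document.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class DocumentFrequencySketch {
    // Counts, for each distinct word, the number of documents that contain it.
    static Map<String, Integer> documentFrequency(List<List<String>> documents) {
        Map<String, Integer> df = new TreeMap<>(); // TreeMap keeps the words sorted
        for (List<String> document : documents) {
            // A set drops repeats inside one document (e.g. "aaa, aaa" in a1).
            Set<String> seen = new HashSet<>(document);
            for (String w : seen) {
                df.merge(w, 1, Integer::sum); // add 1 per document containing w
            }
        }
        return df;
    }

    public static void main(String[] args) {
        List<List<String>> documents = Arrays.asList(
            Arrays.asList("aaa", "aaa", "bbb", "kkk", "ccc", "hhh"),  // a1
            Arrays.asList("kkk", "aaa", "hhh", "ddd"),                // a2
            Arrays.asList("mmm", "ccc", "ccc", "hhh"));               // a3
        System.out.println(documentFrequency(documents));
        // prints {aaa=2, bbb=1, ccc=2, ddd=1, hhh=3, kkk=2, mmm=1}
    }
}
```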

    I have an ArrayList of the class Word below, in which I store various information about the distinct words:

    Java Code:
    private ArrayList<Word> words = new ArrayList<Word>();
    public static class Word{
            public int lineNum;
            public int index;
            private int df; //document frequency
            public String name;
            public int wordcount;
            private String filename;
            public Word(int lineNum, int index, String name, int wordcount, String filename, int documentfrequency) {
                this.lineNum = lineNum;
                this.index = index;
                this.name = name;
                this.wordcount = wordcount;
                this.filename = filename;
                this.df = documentfrequency;
            }
            public int getLineNum() {
                return lineNum;
            }
            public int getIndex() {
                return index;
            }
            public String getName() {
                return name;
            }
            //class to sort words by lineNumber
            public static  class CompLineNum implements Comparator<Word> {
                public int compare(Word arg0, Word arg1) {
                    return arg0.lineNum - arg1.lineNum;
                }
            }
            //class to sort words by name (descending when desc is true)
            public static class CompName implements Comparator<Word> {
                private int mod = 1;
                public CompName(boolean desc) {
                    if (desc) mod = -1;
                }
                public int compare(Word arg0, Word arg1) {
                    //comparison by name is assumed from the class name
                    return mod * arg0.name.compareTo(arg1.name);
                }
            }
    }

    So, to find the df, I read the documents again and use the method below to find duplicate words across the documents:

    Java Code:
        public void findDfOfWord() throws FileNotFoundException, UnsupportedEncodingException, IOException{
            String wordtemp = null;
            String filetemp = null;
            String[] word = null;
            int duplicatewordcount = 0;
            Map<String, Integer> unique = new LinkedHashMap<String, Integer>();
            //scan all files that we choose for the word
            for(int k=0; k<filelist.length; k++){
                    filename = listOfFiles[filelist[k]].getName();
                    FileInputStream fstream = new FileInputStream(folderpath+"/"+filename);
                    DataInputStream in = new DataInputStream(fstream);
                    BufferedReader input = new BufferedReader(new InputStreamReader(in,"UTF-8"));
                    String wordd = input.readLine();
                    for(int j=0; j<wordswithoutduplicates.size(); j++){
                        wordtemp = wordswithoutduplicates.get(j).name;
                        filetemp = wordswithoutduplicates.get(j).filename;
                            word = wordd.split("\\s+");
                            //this is the condition that never finds duplicates
                            if(wordtemp.compareTo(word.toString()) ==0 && filename.compareTo(wordtemp)==0
                                   && duplicatewordcount<1){
                                if(unique.get(wordtemp) == null)
                                    unique.put(wordtemp, 1);
                                else
                                    unique.put(wordtemp, unique.get(wordtemp) + 1);
                                String uniqueString = join(unique.keySet(), ", ");
                                List<Integer> value = new ArrayList<Integer>(unique.values());
                                System.out.println("Output = " + uniqueString);
                                System.out.println("Values = " + value);
                            }
                            wordd = input.readLine();
                    }
            }
        }

    But the if condition (the one that compares against word.toString()) never finds the duplicates.

    What is the error?

    Use some debugging.
    Put println() statements in there so you can see what it's comparing, because I don't think the condition in that if statement is doing what you think it is.
    What do you think the output of 'word.toString()' is going to be?
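
    To make the hint concrete, here is a minimal sketch of what calling toString() on a String[] produces, and of comparing the individual tokens instead. The class ArrayToStringDemo and the helper lineContains are hypothetical names used only for illustration. Separately, the posted condition also compares filename with wordtemp (the word itself); filetemp may have been intended there, but that is only a guess from the variable names.

```java
import java.util.Arrays;

public class ArrayToStringDemo {
    // Returns true if any whitespace-separated token on the line equals target.
    static boolean lineContains(String line, String target) {
        for (String token : line.trim().split("\\s+")) {
            if (token.equals(target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] word = "aaa aaa bbb".split("\\s+");

        // Arrays inherit Object.toString(): a type tag plus a hash code,
        // something like [Ljava.lang.String;@1b6d3586, never the words themselves.
        System.out.println(word.toString());

        // Arrays.toString() shows the contents instead.
        System.out.println(Arrays.toString(word)); // prints [aaa, aaa, bbb]

        // Comparing a word against the array's toString() can never match.
        System.out.println("aaa".compareTo(word.toString()) == 0); // prints false

        // Comparing against each token does what was intended.
        System.out.println(lineContains("aaa aaa bbb", "bbb")); // prints true
    }
}
```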
