  1. #1
    GKUIUC

    String Intern & Java memory

    Hi all,
    I am desperately in need of your help. I am using Java to parse a very big file line by line, split each line into strings, and then put the strings in a hashmap. If I intern the strings before putting them in the hashmap, the program uses very little memory, but if I do not intern them, it uses almost twice as much memory. I don't understand this, since all these strings are temporary references and the hashmap should keep only one copy of a string based on content. My code is:

    package test;

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.InputStreamReader;
    import java.util.TreeMap;
    import java.util.TreeSet;

    public class Trouble
    {
        // Word -> occurrence count (never goes above 2; see the d == 1 check below).
        TreeMap<String, Integer> hs = new TreeMap<String, Integer>();
        // Words that occur at least twice.
        TreeSet<String> pruned = new TreeSet<String>();
        BufferedReader BR = null;
        int count = 0;

        public Trouble(String paramString)
        {
            String[] arrayOfString = null;
            String str = null;

            try
            {
                this.BR = new BufferedReader(new InputStreamReader(new FileInputStream(paramString)));
            }
            catch (Exception localException1)
            {
                localException1.printStackTrace();
                return; // the file could not be opened, so there is nothing to read
            }

            while (true)
            {
                try
                {
                    str = this.BR.readLine();
                    count++;
                }
                catch (Exception localException2)
                {
                    // Stop on a read error; otherwise str keeps its old value and the loop never ends.
                    localException2.printStackTrace();
                    break;
                }
                if (str == null)
                    break;
                if (count % 5000 == 0)
                    System.out.println(count);

                arrayOfString = str.split("\\s+");
                // Starts at 1, so the first token of each line is skipped.
                for (int i = 1; i < arrayOfString.length; i++)
                {
                    if (this.hs.containsKey(arrayOfString[i]))
                    {
                        int d = hs.get(arrayOfString[i]);
                        if (d == 1)
                        {
                            //String internedS = arrayOfString[i];
                            String internedS = arrayOfString[i].intern();
                            this.hs.put(internedS, hs.get(internedS) + 1);
                        }
                    }
                    else
                    {
                        //String internedS = arrayOfString[i];
                        String internedS = arrayOfString[i].intern();
                        this.hs.put(internedS, Integer.valueOf(1));
                    }
                }
            }

            // Keep only the words that were seen at least twice.
            arrayOfString = this.hs.keySet().toArray(new String[0]);
            for (int i = 0; i < arrayOfString.length; i++)
                if (this.hs.get(arrayOfString[i]).intValue() >= 2)
                    this.pruned.add(arrayOfString[i]);
            this.hs.clear();
            try
            {
                this.BR.close();
            }
            catch (Exception localException3)
            {
            }
        }

        public static void main(String[] args)
        {
            Trouble e = new Trouble(args[0]);
        }
    }
    If you uncomment the two commented lines and comment out the line immediately after each, the program will take a lot more memory. I am running it like this:

    java -Xmx5g -XX:MaxPermSize=2g -verbose:gc -cp bin:$CLASSPATH:$( echo Jars/*.jar . | sed 's/ /:/g') test.Trouble train-set

    Please help. I don't want to use intern() (because it's slow), but I want to reduce memory use, and I don't understand why this memory difference is happening.

    GK

  2. #2
    chyrl

    Quote, originally posted by GKUIUC:
    If I intern the strings before putting them in the hashmap, the program uses very little memory, but if I do not intern them, it uses almost twice as much memory.
    Will the output be different if you don't intern the strings?

    Quote, originally posted by GKUIUC:
    If you uncomment the two commented lines and comment out the line immediately after each, the program will take a lot more memory.
    Maybe try reorganizing the flow of the code.
    I'm seeing duplicate lines. These may be a factor in the memory issue.
    Every project, package, class, method, variable, syntax, algorithm, etc.
    are registered in my memory bank. Thanks to this thread.

  3. #3
    GKUIUC

    Hi chyrl,
    No, the output will not be different.

    The duplicate lines are inside an if-else, so only one block gets executed each time.

  4. #4
    chyrl

    Have you tried using another collection class?

  5. #5
    pbrockway2

    Could you repost using code tags so that the code is readable?

    From what I can gather your code seems to be reading some lines and adding each (whitespace separated) word to a TreeMap instance associating it with an integer value. I get lost at about this point!

    Note that split() returns an array of substrings of the string it was passed (the line in your case). My understanding is that each substring shares the character data of the original line, so for as long as a reference to that substring exists the whole line will be retained. If the line is long and the bits you want to retain are small, you can remove this overhead by storing a fresh copy:

    Java Code:
    String toStore = new String(arrayOfString[i]);
    hs.put(toStore, 1);

    I'm also not clear about why you are calling intern(). A TreeMap uses the natural ordering of its keys (compareTo()), not ==, to locate a key in get() and put().
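
To illustrate the point about key comparison, here is a minimal sketch. The class and method names are made up for this example and are not from the original code. It puts two distinct String objects with the same characters into a TreeMap and checks that they end up as a single entry, with the map holding on to the key object that was inserted first.

```java
import java.util.TreeMap;

// Hypothetical demo class; the names are illustrative only.
class TreeMapKeyDemo {

    // Builds a map from two equal-content but distinct String objects.
    static TreeMap<String, Integer> build(String first, String second) {
        TreeMap<String, Integer> counts = new TreeMap<String, Integer>();
        counts.put(first, 1);
        counts.put(second, 2); // compareTo() returns 0, so this only updates the value
        return counts;
    }

    // Number of entries after inserting two equal-content strings.
    static int distinctKeys() {
        return build(new String("word"), new String("word")).size();
    }

    // True if the key stored in the map is the first object that was inserted.
    static boolean keepsFirstKey() {
        String a = new String("word");
        String b = new String("word");
        return build(a, b).firstKey() == a;
    }

    public static void main(String[] args) {
        System.out.println(distinctKeys());  // prints 1
        System.out.println(keepsFirstKey()); // prints true
    }
}
```

So the map already keeps one key per distinct content without intern(). What differs is which String object ends up being kept as the key, and therefore what character data that object keeps alive.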

  6. #6
    Norm

    Comment on the code:
    Instead of using the Integer class to save the count, create your own class with mutator methods etc. so that you avoid creating a new Integer object for every update and having to do another put.
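
A minimal sketch of that suggestion, assuming a simple holder class. WordCounter, Counter, add and countOf are hypothetical names chosen for this example, not anything from the original code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical names throughout; one possible shape for a mutable counter.
class WordCounter {

    // Mutable holder: an update changes a field instead of replacing the map value.
    static final class Counter {
        private int value;

        void increment() { value++; }

        int get() { return value; }
    }

    private final Map<String, Counter> counts = new TreeMap<String, Counter>();

    // Records one occurrence of the given word.
    void add(String word) {
        Counter c = counts.get(word);
        if (c == null) {
            c = new Counter();
            counts.put(word, c); // one put per distinct word
        }
        c.increment();           // later occurrences need no further put
    }

    // Returns how often the word has been added, or 0 if it was never seen.
    int countOf(String word) {
        Counter c = counts.get(word);
        return c == null ? 0 : c.get();
    }
}
```

With this shape each word costs one lookup per occurrence, and the map is only modified the first time a word is seen.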
