  1. #1
    newsomjk is offline Member
    Join Date
    Jul 2012
    Posts
    4
    Rep Power
    0

    Default Parsing Huge Data Files

    My group is working on a project where we display a 3D neuron network which is loaded via txt file.

    The txt file is set up in the following manner:

    {Neuron ID} {x-pos} {y-pos} {z-pos} {neuron type} {{connecting id, charge}...{connecting id, charge}}
    (the average number of connections is 7)
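
    A line in this format could be parsed into a compact record along the following lines. This is only a sketch under assumptions that the thread does not confirm: the first five fields are taken to be plain whitespace-separated tokens, the IDs and coordinates are taken to be integers (as in the Neuron class posted later), the charge is taken to be numeric, and the connection list is read by simply picking out every number after the type field and pairing them up. NeuronRecord and its field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical compact record for one line of the network file. */
class NeuronRecord {
    // Matches integers and simple decimals, with an optional leading minus sign.
    private static final Pattern NUMBER = Pattern.compile("-?\\d+(?:\\.\\d+)?");

    final int id, x, y, z;
    final String type;
    final int[] targetIds;   // connecting ids, one per connection
    final float[] charges;   // charge of each connection (numeric type is an assumption)

    NeuronRecord(int id, int x, int y, int z, String type, int[] targetIds, float[] charges) {
        this.id = id;
        this.x = x;
        this.y = y;
        this.z = z;
        this.type = type;
        this.targetIds = targetIds;
        this.charges = charges;
    }

    /** Parses one line; the assumed layout is "id x y z type {{id, charge}...}". */
    static NeuronRecord parse(String line) {
        // Split off the five leading fields; the remainder (if any) is the connection list.
        String[] head = line.trim().split("\\s+", 6);
        if (head.length < 5) {
            throw new IllegalArgumentException("Expected at least 5 fields: " + line);
        }
        List<String> numbers = new ArrayList<>();
        if (head.length == 6) {
            Matcher m = NUMBER.matcher(head[5]);
            while (m.find()) {
                numbers.add(m.group());
            }
        }
        if (numbers.size() % 2 != 0) {
            throw new IllegalArgumentException("Unpaired connection value: " + line);
        }
        int n = numbers.size() / 2;
        int[] ids = new int[n];
        float[] charges = new float[n];
        for (int i = 0; i < n; i++) {
            ids[i] = Integer.parseInt(numbers.get(2 * i));
            charges[i] = Float.parseFloat(numbers.get(2 * i + 1));
        }
        return new NeuronRecord(Integer.parseInt(head[0]), Integer.parseInt(head[1]),
                Integer.parseInt(head[2]), Integer.parseInt(head[3]), head[4], ids, charges);
    }

    public static void main(String[] args) {
        NeuronRecord r = parse("12 3 4 5 FS {{7, 2}{9, -1.5}}");
        System.out.println(r.id + " has " + r.targetIds.length + " connections");
    }
}
```

    Keeping the connections in two primitive arrays avoids one object per connection, which matters far more at millions of neurons than at 4000.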

    The problem we're having is that when we load a network of 4000 neurons, the program crashes because it runs out of heap space. We're trying to find a better way of storing this kind of data than a huge array of Neuron objects.

    Any advice would be greatly appreciated!

    source can be viewed here:
    / - neuronnetworksim - Neuron Network Simulation - Google Project Hosting

  2. #2
    JosAH's Avatar
    JosAH is online now Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,361
    Blog Entries
    7
    Rep Power
    20

    Default Re: Parsing Huge Data Files

    4000 neurons isn't that much and shouldn't cause an OutOfMemoryError; something else is wrong.

    kind regards,

    Jos
    cenosillicaphobia: the fear for an empty beer glass

  3. #3
    kjkrum's Avatar
    kjkrum is offline Senior Member
    Join Date
    Apr 2011
    Location
    Tucson, AZ
    Posts
    1,060
    Rep Power
    6

    Default Re: Parsing Huge Data Files

    This seems obvious, but have you changed the heap space allocated to Java? I've found the default heap space on OS X to be too small even for some relatively trivial programs.
    Get in the habit of using standard Java naming conventions!

  4. #4
    newsomjk is offline Member
    Join Date
    Jul 2012
    Posts
    4
    Rep Power
    0

    Default Re: Parsing Huge Data Files

    I mentioned that, but the other guy in my group said we can't do that if we plan on distributing the program. I'll look into that some more.

    Is there any better way to manage the data, though, instead of holding a full array of 4000 Neuron objects? Does a String take less memory than an object? In that case I could save the strings and just keep a smaller array of "visible" neurons, or something along those lines.

  5. #5
    DarrylBurke's Avatar
    DarrylBurke is offline Member
    Join Date
    Sep 2008
    Location
    Madgaon, Goa, India
    Posts
    11,188
    Rep Power
    19

    Default Re: Parsing Huge Data Files

    Quote Originally Posted by newsomjk View Post
    does a string take less memory than an object?
    A String *is* an Object.

    db
    If you're forever cleaning cobwebs, it's time to get rid of the spiders.

  6. #6
    kjkrum's Avatar
    kjkrum is offline Senior Member
    Join Date
    Apr 2011
    Location
    Tucson, AZ
    Posts
    1,060
    Rep Power
    6

    Default Re: Parsing Huge Data Files

    Quote Originally Posted by newsomjk View Post
    I mentioned that but the other guy in my group said something about we can't do that if we plan on distributing the program?
    That's nonsense. Each particular Java installation has its own default heap size, either configured in a system environment variable or in a script/batch file that runs java. I'm sure there's a default/fallback size, too. You can change the heap size with a command line switch, and (I think - Google it) settings within the metadata of an executable jar. This isn't just something you can do; it's probably something you should do if you want to distribute your program without headaches.
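
    The command line switch in question is -Xmx (for example -Xmx512m). As far as I know the jar manifest itself has no attribute for the heap size, so a distributed program usually sets it through a start script or a small launcher that starts a second JVM. Below is a minimal sketch of such a launcher. HeapLauncher, buildCommand and the jar name neuronvisualizer.jar are hypothetical, and the actual launch is left commented out.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class HeapLauncher {
    /** Builds the command line for starting a jar with an explicit maximum heap size. */
    static List<String> buildCommand(String javaHome, int maxHeapMb, String jarPath) {
        List<String> cmd = new ArrayList<>();
        // Use the java executable of the given installation.
        cmd.add(javaHome + File.separator + "bin" + File.separator + "java");
        cmd.add("-Xmx" + maxHeapMb + "m");   // maximum heap size in megabytes
        cmd.add("-jar");
        cmd.add(jarPath);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        // "neuronvisualizer.jar" is a placeholder name, not taken from the project.
        List<String> cmd = buildCommand(System.getProperty("java.home"), 512, "neuronvisualizer.jar");
        System.out.println(cmd);
        // To actually start the second JVM, uncomment the next line:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```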

    Quote Originally Posted by newsomjk View Post
    Is there any better way to manage the data though? instead of holding a full array of 4000 neuron objects?
    That depends on what you're doing with it. Converting it to another format and outputting it serially? Then yeah, you could use a smaller buffer at the possible expense of slowing down the I/O. But if you need to work with the data as a whole, there's probably no way around keeping it all in memory. As JosAH said, 4000 objects doesn't seem like many, especially in light of what you told us about the data format. Can you post the source of your Neuron class? Maybe there's something wrong with it.

  7. #7
    Tolls is online now Moderator
    Join Date
    Apr 2009
    Posts
    11,820
    Rep Power
    19

    Default Re: Parsing Huge Data Files

    {Neuron ID}                long
    {x-pos}                    long
    {y-pos}                    long
    {z-pos}                    long
    {neuron type}              10 char String
    {connecting id, charge}    long + long, x 7

    This is being pessimistic, I suspect, but that lot there is at worst 0.25 KB.
    4000 of them is therefore no more than 1 MB.
    If that's throwing a heap problem then you're working on 20+ year old kit.
    I can't believe OSX would have such a small default heap.
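
    The arithmetic behind that estimate can be written out as follows. It is a rough sketch that counts only the raw field data (8 bytes per long, 2 bytes per char) and ignores object headers and padding, which is why the 0.25 KB figure above leaves some headroom. HeapEstimate and bytesPerNeuron are hypothetical names.

```java
class HeapEstimate {
    static final int LONG_BYTES = 8;
    static final int CHAR_BYTES = 2;

    /** Raw payload of one neuron: four longs, the type string, and (id, charge) pairs. */
    static long bytesPerNeuron(int typeChars, int connections) {
        long fixed = 4L * LONG_BYTES;                        // id, x, y, z
        long type = (long) typeChars * CHAR_BYTES;           // characters of the type string
        long conns = (long) connections * 2 * LONG_BYTES;    // connecting id + charge
        return fixed + type + conns;
    }

    public static void main(String[] args) {
        long perNeuron = bytesPerNeuron(10, 7);
        long payload = 4000 * perNeuron;
        long upperBound = 4000L * 256;   // 0.25 KB per neuron, the pessimistic figure
        System.out.println("Payload per neuron: " + perNeuron + " bytes");
        System.out.println("Payload for 4000 neurons: " + payload + " bytes");
        System.out.println("At 0.25 KB each: " + upperBound + " bytes");
    }
}
```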

    I would suggest taking a heap dump and analysing it, because at the moment you're guessing as to the cause.
    Please do not ask for code as refusal often offends.

    ** This space for rent **

  8. #8
    kjkrum's Avatar
    kjkrum is offline Senior Member
    Join Date
    Apr 2011
    Location
    Tucson, AZ
    Posts
    1,060
    Rep Power
    6

    Default Re: Parsing Huge Data Files

    (The OP didn't say they were using OSX; that was my guess based on the fact that when I tested a fairly trivial app of my own, it worked fine on Win7 and Linux, but needed the heap size enlarged on OSX.)

  9. #9
    newsomjk is offline Member
    Join Date
    Jul 2012
    Posts
    4
    Rep Power
    0

    Default Re: Parsing Huge Data Files

    Actually, the other guy is the one who has the most issues with the heap space, and he's on OSX. I guess I'll have to look into changing it in the jar's manifest file like you mentioned. Here's the code for Neuron.java:

    Java Code:
    package neuronvisualizer;
    
    import com.threed.jpct.Object3D;
    import com.threed.jpct.SimpleVector;
    
            /**
             * Neuron Class
             * @author John Newsom
             * 
             * This is the class that will hold all the data required
             * to represent a neuron in the network. The network in the
             * simulator will be a HashMap consisting of instances of 
             * this class, all linked to their ID for easy referencing.
             */
    
    
    
    public class Neuron{
            private int ID;
            private Type neuronType;
            private int xPos;
            private int yPos;
            private int zPos;
            private Layer layer;
            private Connection[] connections;
            private Object3D model;
            private boolean sectionShown, typeShown;
            
            /**
             * Default constructor, sets all properties to empty values
             */
            public Neuron(){
                    sectionShown = true;
                    typeShown = true;
                    ID = -1;
                    neuronType = Type.FS;
                    xPos = 0;
                    yPos = 0;
                    zPos = 0;
                    layer = Layer.II;
                    connections = null;
                    model = null;
            }
            
            /**
             * Main constructor, includes all necessary values
             */
            public Neuron(int IDnum,Type type,Layer lay,int x,int y,int z,Connection[] c,Object3D m){
                    sectionShown = true;
                    typeShown = true;
                    ID = IDnum;
                    neuronType = type;
                    layer = lay;
                    xPos = x;
                    yPos = y;
                    zPos = z;
                    connections = c;        
                    model = m;
                    model.setSelectable(Object3D.MOUSE_SELECTABLE);
    
                    model.translate(new SimpleVector(xPos,yPos,zPos));
            }
            
            /**
             * Replaces the neuron's connection list, e.g. after adding a
             * connection at run-time.
             */
            public void setConnections(Connection[] c){
                    connections = c;
            }
            
            /**
             * Returns the size of the connection list (0 if none has been set)
             */
            public int numberOfConnections(){
                    return connections == null ? 0 : connections.length;
            }
            
            /**
             * Returns the list of the connections
             */
            public Connection[] getConnections(){
                    return connections;
            }
            
            
            /**
             * Change or retrieve the 3d coordinates of the neuron.
             */
            public int getX(){
                    return xPos;
            }
            public void setX(int x){
                    xPos = x;
            }
            
            public int getY(){
                    return yPos;
            }
            public void setY(int y){
                    yPos = y;
            }
            
            public int getZ(){
                    return zPos;
            }
            public void setZ(int z){
                    zPos = z;
            }
            
            public void hideConnections(){
                    if (connections != null)
                            for(Connection c:connections){
                                    c.getModel().setVisibility(false);
                            }
            }
            
            public void showConnections(){
                    if (connections != null)
                            for(Connection c:connections){
                                    c.getModel().setVisibility(true);
                            }
            }
            
    
            /**
             * Change or retrieve the Neuron type
             */
            public Type getType(){
                    return neuronType;
            }
            public void setType(Type t){
                    neuronType = t;
            }
    
            
            /**
             * Change or retrieve the layer of the Neuron
             */
            public Layer getLayer(){
                    return layer;
            }
            public void setLayer(Layer l){
                    layer = l;
            }
            
            /**
             * Change or retrieve ID of the Neuron
             */
            public int getID(){
                    return ID;
            }
            public void setID(int id){
                    ID = id;
            }
            
            /**
             * Change or retrieve the neuron's model
             */
            public Object3D getModel(){
                    return model;
            }
            public void setModel(Object3D o){
                    model = o;
            }
            
            public void toggleVisibility(){
                    if(typeShown){
                            typeHide();
                    }
                    else{
                            typeShow();
                    }
            }
            public void typeShow(){
                    if(sectionShown){
                            model.setVisibility(true);
                            showConnections();
                    }
                    typeShown = true;
            }
            public void typeHide(){
                    model.setVisibility(false);
                    hideConnections();
                    typeShown = false;
            }
            public void hide(){
                    model.setVisibility(false);
                    hideConnections();
                    sectionShown = false;
            }
            public void show(){
                    if(typeShown){
                            model.setVisibility(true);
                            showConnections();
                    }
                    sectionShown = true;
            }
            public float getScale(){
                    return model.getScale();
            }
            public void setScale(float s){
                    model.scale(s);
            }
    
            
    }//END CLASS NEURON

  10. #10
    kjkrum's Avatar
    kjkrum is offline Senior Member
    Join Date
    Apr 2011
    Location
    Tucson, AZ
    Posts
    1,060
    Rep Power
    6

    Default Re: Parsing Huge Data Files

    At a glance I don't see anything wrong with your fields. I guess you could still be wasting memory if your Connection[] is unnecessarily large and initialized with duplicate elements.

    Try running this on the problem machine. I Googled for a few minutes and found some pages that suggest the default max heap size on some OSX systems could be as low as 64 or 128 MB. On my Linux box, it's some strange value just under 900 MB.

    Java Code:
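    public class MaxMem {
    	public static void main(String[] args) {
    		// Maximum heap the JVM will attempt to use, in bytes (Long.MAX_VALUE if there is no limit)
    		long maxBytes = Runtime.getRuntime().maxMemory();
    		System.out.println(maxBytes + " bytes (" + maxBytes / (1024 * 1024) + " MB)");
    	}
    }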

  11. #11
    Tolls is online now Moderator
    Join Date
    Apr 2009
    Posts
    11,820
    Rep Power
    19

    Default Re: Parsing Huge Data Files

    OSX is obviously still using the old base memory from 1.4(?) and prior, which was 64 MB I think.
    That's changed for every other distribution to the "server" setting, which is a percentage of the total memory.

    That should be sufficient in itself anyway for the above amount of data, but I'd still look at a heap dump, even just out of curiosity as to what exactly is taking up the space.

  12. #12
    newsomjk is offline Member
    Join Date
    Jul 2012
    Posts
    4
    Rep Power
    0

    Default Re: Parsing Huge Data Files

    Alright, I'll try it out on my computers to see what it's doing, and get him to do a heap dump. And I guess for good measure I'll look through all the initializing code and make sure we don't have the Neuron list duplicated somehow.

  13. #13
    Tolls is online now Moderator
    Join Date
    Apr 2009
    Posts
    11,820
    Rep Power
    19

    Default Re: Parsing Huge Data Files

    Assuming Type is an enum, the only things in that model that could be a cause of trouble are:
    private Layer layer;
    private Connection[] connections;
    private Object3D model;

    Object3D looks like a graphical class?
    Could that be a cause of trouble?
    Guesswork, though, where a heap dump and Eclipse-MAT would give you the actual cause.
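
    One way to get such a dump is to let the JVM write it when the error occurs, using the HotSpot option -XX:+HeapDumpOnOutOfMemoryError. Another is to trigger it from code just after the network has been loaded. The following is a minimal sketch of the second route, assuming a HotSpot-based JVM on Java 7 or later; HeapDumper and the output file name are hypothetical. The resulting .hprof file can then be opened in Eclipse MAT.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

class HeapDumper {
    /**
     * Writes a heap dump to the given file, which must not exist yet.
     * With liveOnly set, only objects that are still reachable are dumped.
     */
    static File dump(File target, boolean liveOnly) throws IOException {
        // This MXBean is specific to HotSpot JVMs (an assumption about the runtime).
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.getAbsolutePath(), liveOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Timestamped name so that repeated runs do not collide with an existing file.
        File out = new File("neurons-" + System.currentTimeMillis() + ".hprof");
        dump(out, true);
        System.out.println("Heap dump written to " + out.getAbsolutePath());
    }
}
```
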

  14. #14
    stathis is offline Member
    Join Date
    Aug 2012
    Posts
    4
    Rep Power
    0

    Default Re: Parsing Huge Data Files

    I hope this post isn't coming too late.

    First of all, I agree that 4,000 objects is almost nothing, even for not-so-modern computers. As a second thought: in a neuron network, every neuron can have arbitrary connections to other neurons. So 4,000 neurons could, in the worst case, produce a 4,000 x 4,000 array of connections, which is 16,000,000 entries.
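
    To put numbers on the difference between a full connection matrix and a per-neuron connection list, here is a small sketch. It assumes 4 bytes per matrix cell (one float charge) and 8 bytes per stored connection (an int id plus a float charge); those sizes and the names ConnectionStorage, denseMatrixBytes and sparseListBytes are assumptions made for illustration.

```java
class ConnectionStorage {
    /** Bytes needed for a full neurons x neurons matrix of charges. */
    static long denseMatrixBytes(int neurons, int bytesPerCell) {
        return (long) neurons * neurons * bytesPerCell;
    }

    /** Bytes needed when each neuron only stores the connections it actually has. */
    static long sparseListBytes(int neurons, int avgConnections, int bytesPerConnection) {
        return (long) neurons * avgConnections * bytesPerConnection;
    }

    public static void main(String[] args) {
        int neurons = 4000;
        // Dense: 16,000,000 cells of 4 bytes each.
        System.out.println("Dense matrix: " + denseMatrixBytes(neurons, 4) + " bytes");
        // Sparse: about 7 connections per neuron, 8 bytes each.
        System.out.println("Sparse lists: " + sparseListBytes(neurons, 7, 8) + " bytes");
    }
}
```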

    In your description you mention an average of 7 connections per neuron, but imagine a situation where A->B, B->C and C->A. Perhaps in that case your program could end up eating all your memory in an endless loop.

    Test your program and if you cannot find any problem on it, take a look at the following.

    In general, if you are facing memory problems with huge data, you have to use a cache library.

    Some open source cache libraries you can find here.
    Last edited by stathis; 08-07-2012 at 08:27 AM.

  15. #15
    Tolls is online now Moderator
    Join Date
    Apr 2009
    Posts
    11,820
    Rep Power
    19

    Default Re: Parsing Huge Data Files

    How would it eat the memory in an endless loop?
    A holds a reference to B, B to C, C to A.
    That's a grand total of 3 references...12 or 24 bytes depending on the system.
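
    A short sketch of the point: a cycle of references costs nothing extra by itself, and it only becomes a problem if code that follows the references never remembers where it has already been. The GraphNode and CycleDemo classes below are hypothetical stand-ins for the real Neuron and Connection classes; the traversal keeps a visited set, so it terminates on the A->B->C->A cycle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a neuron that only holds references to its targets. */
class GraphNode {
    final String name;
    final List<GraphNode> targets = new ArrayList<>();

    GraphNode(String name) {
        this.name = name;
    }
}

class CycleDemo {
    /** Counts the nodes reachable from start, visiting each node only once. */
    static int countReachable(GraphNode start) {
        // Identity-based set: a node counts as visited only if it is the same object.
        Set<GraphNode> visited =
                Collections.newSetFromMap(new IdentityHashMap<GraphNode, Boolean>());
        Deque<GraphNode> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            GraphNode node = stack.pop();
            if (!visited.add(node)) {
                continue;   // already seen, so the cycle is not followed again
            }
            for (GraphNode target : node.targets) {
                stack.push(target);
            }
        }
        return visited.size();
    }

    public static void main(String[] args) {
        GraphNode a = new GraphNode("A");
        GraphNode b = new GraphNode("B");
        GraphNode c = new GraphNode("C");
        a.targets.add(b);   // A -> B
        b.targets.add(c);   // B -> C
        c.targets.add(a);   // C -> A closes the cycle
        System.out.println(countReachable(a));   // prints 3
    }
}
```
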

  16. #16
    stathis is offline Member
    Join Date
    Aug 2012
    Posts
    4
    Rep Power
    0

    Default Re: Parsing Huge Data Files

    If it has just references.

    I meant that if there are circular references in their data, they have to check how their program creates its objects. Otherwise the program can easily end up eating all of their memory.

    That was one part of my answer, pointing to a direction for investigating a possible bug in their program. Another direction could be the mysterious Object3D, or even Type. (Finally, we can assume that there is no problem at all with the data size, because 4,000 objects are indeed NOTHING.)

    The second part was an answer to the title of this thread, Parsing Huge Data Files. Regardless of whether newsomjk needs a cache solution right now, he will have to consider one if their network grows to 4,000,000 neurons.

  17. #17
    Tolls is online now Moderator
    Join Date
    Apr 2009
    Posts
    11,820
    Rep Power
    19

    Default Re: Parsing Huge Data Files

    Ah, I think I see what you mean.
    But then it wouldn't be 4000 neurons (as I now see you pointed out).
    As I said earlier, though, this is all guesswork that a simple heap dump would answer.

  18. #18
    stathis is offline Member
    Join Date
    Aug 2012
    Posts
    4
    Rep Power
    0

    Default Re: Parsing Huge Data Files

    Indeed, I am almost sure that their real need is a good debugger, to investigate how a non-problem leads to a memory crash.
