  1. #1
    Nish_biz (Member, joined Dec 2011)

    Issue reading 375M text file with Java

    Hi,

    I am trying to read a 375 MB text file and generate an Excel file from its data (via Aspose).
    I've increased the heap to -Xmx4096m (4 GB) in the .sh file used to run this class.
    The text file contains 800,000 records, but the generated file contains only 200,000.
    And the issue is that Java doesn't output any sort of error.

    I am able to read a 200 MB text file without any issue, and it writes 400,000 records to the Excel file.

    I am not able to figure out what might be causing Java to cut down the number of records exported.
    1 - Are there any limits on how big a file Java can read?
    2 - Is there any way I can force Java to produce some kind of error message?
    3 - Also, is there a better way to process such a large amount of data than keeping it all in memory?

    Any help is very much appreciated.
    Thanks in advance!!

    here is the code:
    Java Code:
        String vals = values; // values = filepath
        try
        {
            BufferedReader in = new BufferedReader(new FileReader(values));
            String str;
            while ((str = in.readLine()) != null)
            {
                vals =str;
            }
            in.close();
        }
        catch (IOException e)
        {
        }

        String[] tempstr;
        String delimiter_one = ":";
        tempstr = vals.split(delimiter_one);
        for (int j = 0; j < tempstr.length; j++)
        {
            String[] tempval;
            String valdelimiter = "<";
            tempval = tempstr[j].split(valdelimiter);
            for (int k = 0; k < tempval.length; k++)
            {
                cells.get(rowIndex, k).setValue(tempval[k]);
            }
            rowIndex++;
        }
        wb.save(template_file_path + ".xlsx", FileFormatType.XLSX);
    Kind Regards,
    Nish
    Last edited by pbrockway2; 12-07-2011 at 06:13 AM. Reason: code tags added

  2. #2
    pbrockway2 (Moderator, joined Feb 2009, New Zealand)

    Is that really the code? It seems to throw most of the input away.

  3. #3
    Nish_biz (Member, joined Dec 2011)

    Yes, it is the code.

    Can you please point out why you think the data is getting thrown away?

    I actually modified the code a bit so that the whole file doesn't get read into memory. I also got rid of the string buffer and am now using just a String to collect the values read from the file.
    Now the file is read until it reaches a ":" character, and each such string goes into a variable.
    Each of these strings is then split on the delimiter "<", and the values are stored in Excel cells using Aspose (cells.get(rowIndex, k).setValue(tempval[k]);).
    I hope my explanation is clear.

    Here is the new code. But it still needs the 4 GB of memory.
    Java Code:
        File file = new File(values);
        int ch;
        String valdelimiter = "<";
        String[] tempval;

        FileInputStream fin = null;
        try {
            fin = new FileInputStream(file);
            while ((ch = fin.read()) != -1)
            {
                String vals = "";
                while ((ch = fin.read()) != ':')
                {
                    vals = vals+ (char)ch;
                }

                tempval = vals.split(valdelimiter);
                for (int k = 0; k < tempval.length; k++)
                {
                    cells.get(rowIndex, k).setValue(tempval[k]);
                }
                rowIndex++;
            }
            fin.close();
        } catch (Exception e) {
            System.out.println(e);
        }
        wb.save(template_file_path + ".xlsx", FileFormatType.XLSX);
    Last edited by pbrockway2; 12-07-2011 at 07:49 AM.

  4. #4
    pbrockway2 (Moderator, joined Feb 2009, New Zealand)

    Java Code:
    while ((str = in.readLine()) != null)
    {
        vals =str;
    }
    That was the code that caught my eye. Each pass through the loop reads a line into str and then assigns it to vals, so every iteration replaces the value assigned by the previous one. When the loop ends, vals holds only the last line that was read.
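
    To make the point concrete, here is a minimal sketch comparing the loop above with one that handles each line inside the loop. The class and method names are made up for illustration, and a StringReader stands in for the real file. Note that the overwrite only loses data when the input has more than one line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the original program.
final class ReadLoopDemo {

    // Same shape as the loop in post #1: each pass overwrites vals.
    static String keepLastLineOnly(String text) {
        String vals = "";
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String str;
            while ((str = in.readLine()) != null) {
                vals = str; // the previously read line is dropped here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return vals;
    }

    // Handling each line inside the loop keeps every line.
    static List<String> handleEachLine(String text) {
        List<String> seen = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String str;
            while ((str = in.readLine()) != null) {
                seen.add(str); // stand-in for "process this line now"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        String text = "first\nsecond\nthird";
        System.out.println(keepLastLineOnly(text)); // prints: third
        System.out.println(handleEachLine(text));   // prints: [first, second, third]
    }
}
```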

    -----

    I'll add "code" tags to your updated post so it's readable. When you post code put [code] at the start and [/code] at the end.

  5. #5
    pbrockway2 (Moderator, joined Feb 2009, New Zealand)

    I think it makes more sense to process the text and add it to the spreadsheet as you go - the way you are doing in the updated code.

    But if the source text is delimited by newlines (i.e. one record per line) then it would probably be better to use a BufferedReader and process the text one line at a time, rather than building up the input one character at a time. The basic process would be:

    repeat as required:
    * read a line
    * slice and dice it with whatever String methods will do the job
    * add the extracted data to the spreadsheet

    (Even reading a character at a time would be more efficient - I think - with a buffered reader.)
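
    If the file turns out to have ':' between records instead of newlines, the same read / slice / add loop might look roughly like the sketch below. RowHandler is a hypothetical callback standing in for the Aspose cell-writing code, and the ':' and '<' delimiters are taken from the code posted earlier in the thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch class; names are illustrative only.
final class RecordReaderSketch {

    /** Hypothetical callback; the real program would write one spreadsheet row here. */
    interface RowHandler {
        void handle(String[] fields);
    }

    // Reads ':'-separated records one character at a time through a buffered
    // reader, splits each record on '<', and passes the fields to the handler.
    static int readRecords(Reader source, RowHandler handler) throws IOException {
        int rows = 0;
        StringBuilder record = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            int ch;
            while ((ch = in.read()) != -1) {      // -1 signals end of input
                if (ch == ':') {
                    handler.handle(record.toString().split("<"));
                    record.setLength(0);          // reuse the same builder
                    rows++;
                } else {
                    record.append((char) ch);
                }
            }
        }
        if (record.length() > 0) {                // last record without a trailing ':'
            handler.handle(record.toString().split("<"));
            rows++;
        }
        return rows;
    }

    // Convenience wrapper for trying the sketch on an in-memory string.
    static List<List<String>> parse(String text) {
        List<List<String>> rows = new ArrayList<>();
        try {
            readRecords(new StringReader(text), fields -> rows.add(Arrays.asList(fields)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(parse("a<b<c:d<e<f")); // prints: [[a, b, c], [d, e, f]]
    }
}
```

    For a real file the Reader would be something like new FileReader(path); the parse helper exists only so the sketch can be tried quickly. Reusing one StringBuilder avoids creating a new String for every character read.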

  6. #6
    Tolls (Moderator, joined Apr 2009)

    I was going to say the same as pbrockway did in his first post, but is this file just one long line with no line breaks?
    It looks like that from your delimiters.

    If it's a memory problem then it is not likely to be the input file but the Excel file you are creating. Both POI and JExcel tend, because of the nature of Excel, to keep the whole document in memory, which means a massive footprint. That is why Excel is rather poor for this sort of thing.

    If (as seems to be the case from the code above) you are simply dumping data into a table in rows, then you'd be better off using a basic CSV; at least that way you can stream. If it requires formatting, that implies people reading the data, and I can tell you now, no one reads 800,000 rows of data like that. Utter waste of time.
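
    A rough sketch of the streaming CSV idea, assuming the ':' record and '<' field delimiters from the posted code. Each field is written out as soon as its delimiter is seen, so memory use is bounded by the longest field rather than by the file size. The class and method names are invented for illustration, and the quoting follows the common CSV convention of doubling embedded quotes.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical sketch class; names are illustrative only.
final class CsvStreamSketch {

    // Quotes a field when it contains a comma, a double quote, or a line break.
    static String csvField(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Copies '<'-separated fields and ':'-separated records to CSV, one field at a time.
    static int convert(Reader source, Writer sink) throws IOException {
        BufferedReader in = new BufferedReader(source);
        BufferedWriter out = new BufferedWriter(sink);
        StringBuilder field = new StringBuilder();
        int rows = 0;
        boolean pending = false;           // true once the current row has content
        int ch;
        while ((ch = in.read()) != -1) {
            if (ch == '<') {               // field separator in the input
                out.write(csvField(field.toString()));
                out.write(',');
                field.setLength(0);
                pending = true;
            } else if (ch == ':') {        // record separator in the input
                out.write(csvField(field.toString()));
                out.write('\n');
                field.setLength(0);
                pending = false;
                rows++;
            } else {
                field.append((char) ch);
                pending = true;
            }
        }
        if (pending) {                     // final record without a trailing ':'
            out.write(csvField(field.toString()));
            out.write('\n');
            rows++;
        }
        out.flush();
        return rows;
    }

    // Convenience wrapper for trying the sketch on an in-memory string.
    static String toCsv(String text) {
        StringWriter sink = new StringWriter();
        try {
            convert(new StringReader(text), sink);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.print(toCsv("a<b<c:d<e,f")); // two lines: a,b,c and d,"e,f"
    }
}
```

    With real files the Reader and Writer would wrap a FileReader and a FileWriter; nothing else in convert would need to change.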

  7. #7
    Nish_biz (Member, joined Dec 2011)

    Thank you for your replies pbrockway2 and Tolls.

    Yes, this input file is one long string with no line breaks. And I guess what Tolls is saying could be the issue: all the Excel cells, with the data from the input file, have to be held in memory until the workbook is saved at the end.
    The requirement, however, is to be able to produce an Excel file with over 800,000 records.
    The Aspose software I'm using to create the Excel file suggests using 'LightCells'; I might have to try that and see.

    I'll implement the character reading with a buffered reader to see if it makes a difference.
    Thanks again to you both.
    Please let me know of any other suggestions.

    Kind Regards,
    Nish

  8. #8
    Nish_biz (Member, joined Dec 2011)

    Btw, when I try to retrieve 800,000 records after reducing the Java heap to 3 GB and using the modified code above, the error I now get has to do with the garbage collector.
    (Line 164 is -> vals = vals+ (char)ch;)
    Some forum entries say this issue has to do with a "large number of temporary objects and while GC executes it takes most of the CPU time but recovers a very less amount of Heap"
    and that you can suppress it by adding "-XX:-UseGCOverheadLimit", but they also advise using that only as a last resort.

    - How would "-XX:-UseGCOverheadLimit" affect the program? What drawbacks could using it have?
    - I was wondering if there is any way to optimize this piece of code to create fewer temporary objects. But as I said earlier, the input file has no newlines.

    Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOf(Arrays.java:2894)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:407)
        at java.lang.StringBuilder.append(StringBuilder.java:136)
        at Main.CreateExcel(Main.java:164)
        at Main.main(Main.java:34)

    Kind Regards,
    Nish

  9. #9
    Tolls (Moderator, joined Apr 2009)

    GC has a catch in it to try to stop it thrashing in an attempt to reclaim memory. That is what the overhead limit is, i.e. "I'm doing lots of GC and not getting much space back, there may be a problem".
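
    One thing that may be worth checking in the modified code from post #3: its inner loop stops only on ':', never on -1. InputStream.read() returns -1 at end of file, and (char) -1 is '\uffff', so if the file does not end with ':' (an assumption, the thread does not say) the loop would keep appending that character until memory runs out. The outer fin.read() also consumes one character per record. The sketch below reproduces the loop shape with a length cap so that it terminates, next to a version that checks for end of file. All names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original program.
final class EofLoopDemo {

    // Same loop condition as the modified code, plus a length cap so the demo stops.
    static String readUnchecked(InputStream in, int cap) throws IOException {
        StringBuilder vals = new StringBuilder();
        int ch;
        while (vals.length() < cap && (ch = in.read()) != ':') {
            vals.append((char) ch); // at end of file ch is -1, appended as '\uffff'
        }
        return vals.toString();
    }

    // Stops on either the delimiter or end of file.
    static String readChecked(InputStream in) throws IOException {
        StringBuilder vals = new StringBuilder();
        int ch;
        while ((ch = in.read()) != -1 && ch != ':') {
            vals.append((char) ch);
        }
        return vals.toString();
    }

    static InputStream bytes(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.US_ASCII));
    }

    // Wrappers so the demo can be called without handling IOException.
    static String demoUnchecked(String text, int cap) {
        try {
            return readUnchecked(bytes(text), cap);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String demoChecked(String text) {
        try {
            return readChecked(bytes(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Input "ab" has no trailing ':'; the unchecked loop runs until the cap.
        System.out.println(demoUnchecked("ab", 5).length()); // prints: 5
        System.out.println(demoChecked("ab"));               // prints: ab
    }
}
```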

    Anyway, as I said, whoever decided an 800,000 row Excel report was a good idea is (frankly) a fool.
    Why can you not produce a CSV?

  10. #10
    d3n1s (Member, joined Apr 2011)

    Doesn't Excel have a 65,536 row limit anyway? I agree with the CSV idea, as such a huge Excel file is a pure waste of memory TBH.

  11. #11
    Nish_biz (Member, joined Dec 2011)

    The problem is solved, at least for now. After discussing it, we limited the maximum number of records that can be exported to 500,000, which works with 4 GB of memory.
    Btw, the 65,536 limit is for the 2003 format (xls). I'm creating a 2007-format xlsx file, which can hold up to 1,048,576 rows.
    Thanks everyone for your help.

    Kind Regards,
    Nish


  12. #12
    Tolls (Moderator, joined Apr 2009)

    It's still absurd to stick this stuff into Excel...
