- 12-07-2011, 05:24 AM #1
Member
- Join Date
- Dec 2011
- Posts
- 5
- Rep Power
- 0
Issue reading 375M text file with java
Hi,
I am trying to read a 375 MB text file and generate an Excel file from its data (via Aspose).
I've increased the heap to -Xmx4096m (4 GB) in the .sh file used to run this class.
The text file consists of 800,000 records, but the generated file contains only 200,000 records.
And the issue is that Java doesn't output any sort of error.
I am able to read a 200 MB text file without any issue, and it writes 400,000 records to the Excel file.
I am not able to figure out what might be causing Java to cut down the number of records exported.
1 - Are there any limitations on how big a file Java can read?
2 - Is there any way I can force Java to generate an error message?
3 - Also, is there a better way to process such a large amount of data than keeping it all in memory?
Any help is very much appreciated.
Thanks in advance!!
Here is the code:
Java Code:
String vals = values; // values = filepath
try {
    BufferedReader in = new BufferedReader(new FileReader(values));
    String str;
    while ((str = in.readLine()) != null) {
        vals = str;
    }
    in.close();
} catch (IOException e) {
}
String[] tempstr;
String delimiter_one = ":";
tempstr = vals.split(delimiter_one);
for (int j = 0; j < tempstr.length; j++) {
    String[] tempval;
    String valdelimiter = "<";
    tempval = tempstr[j].split(valdelimiter);
    for (int k = 0; k < tempval.length; k++) {
        cells.get(rowIndex, k).setValue(tempval[k]);
    }
    rowIndex++;
}
wb.save(template_file_path + ".xlsx", FileFormatType.XLSX);
Kind Regards,
Nish
Last edited by pbrockway2; 12-07-2011 at 06:13 AM. Reason: code tags added
- 12-07-2011, 06:15 AM #2
Moderator
- Join Date
- Feb 2009
- Location
- New Zealand
- Posts
- 4,547
- Rep Power
- 11
Re: Issue reading 375M text file with java
Is that really the code? It seems to throw most of the input away.
- 12-07-2011, 07:23 AM #3
Member
- Join Date
- Dec 2011
- Posts
- 5
- Rep Power
- 0
Re: Issue reading 375M text file with java
Yes, it is the code.
Can you please point out why you think the data is getting thrown away?
I actually modified the code a bit so that the whole file doesn't get read into memory. I also got rid of the string buffer, and I'm now using just a string to collect the values read from the file.
Now the file is read until a ":" character is reached, and each string up to that point goes into a variable.
Each of these strings is then split on the delimiter "<", and the resulting values are stored in the Excel cells using Aspose (cells.get(rowIndex, k).setValue(tempval[k]);).
Hope my explanation is clear.
Here is the new code. But it still needs the 4 GB of memory.
Java Code:
File file = new File(values);
int ch;
String valdelimiter = "<";
String[] tempval;
FileInputStream fin = null;
try {
    fin = new FileInputStream(file);
    while ((ch = fin.read()) != -1) {
        String vals = "";
        while ((ch = fin.read()) != ':') {
            vals = vals + (char) ch;
        }
        tempval = vals.split(valdelimiter);
        for (int k = 0; k < tempval.length; k++) {
            cells.get(rowIndex, k).setValue(tempval[k]);
        }
        rowIndex++;
    }
    fin.close();
} catch (Exception e) {
    System.out.println(e);
}
wb.save(template_file_path + ".xlsx", FileFormatType.XLSX);
Last edited by pbrockway2; 12-07-2011 at 07:49 AM.
- 12-07-2011, 07:49 AM #4
Moderator
- Join Date
- Feb 2009
- Location
- New Zealand
- Posts
- 4,547
- Rep Power
- 11
Re: Issue reading 375M text file with java
Java Code:
while ((str = in.readLine()) != null) {
    vals = str;
}
That was the code that caught my eye. You read a string and assign it to str, then assign it to vals, then repeat. But by repeating the action you are discarding (replacing) the values that you previously assigned to the variables.
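To make the effect concrete, here is a small self-contained sketch. The class name and sample text are made up for illustration, and a StringReader stands in for the file. The first method mirrors the posted loop and, if the input has more than one line, keeps only the last one; the second does its work inside the loop so nothing is lost.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the original program.
class OverwriteDemo {

    // Mirrors the posted loop: each iteration replaces vals,
    // so only the final line survives.
    static String lastLineOnly(BufferedReader in) throws IOException {
        String vals = null;
        String str;
        while ((str = in.readLine()) != null) {
            vals = str;
        }
        return vals;
    }

    // Handles every line inside the loop instead of overwriting.
    static List<String> everyLine(BufferedReader in) throws IOException {
        List<String> lines = new ArrayList<>();
        String str;
        while ((str = in.readLine()) != null) {
            lines.add(str);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String text = "first\nsecond\nthird";
        // prints: third
        System.out.println(lastLineOnly(new BufferedReader(new StringReader(text))));
        // prints: [first, second, third]
        System.out.println(everyLine(new BufferedReader(new StringReader(text))));
    }
}
```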
-----
I'll add "code" tags to your updated post so it's readable. When you post code put [code] at the start and [/code] at the end.
- 12-07-2011, 07:56 AM #5
Moderator
- Join Date
- Feb 2009
- Location
- New Zealand
- Posts
- 4,547
- Rep Power
- 11
Re: Issue reading 375M text file with java
I think it makes more sense to process the text and add it to the spreadsheet as you go - the way you are doing in the updated code.
But if the source text is delimited by new lines (i.e. one record per line) then it would probably be better to use a BufferedReader and process the text one line at a time, rather than building up the input one character at a time. The basic process would be:
repeat as required:
* read a line
* slice and dice it with whatever String methods will do the job
* add the extracted data to the spreadsheet
(Even reading a character at a time would be more efficient - I think - with a buffered reader.)
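The loop above could be sketched roughly as follows, assuming one record per line and "<" between fields. LineByLine and RowSink are hypothetical names: RowSink just stands in for whatever writes to the spreadsheet (in your code, the cells.get(rowIndex, k).setValue(...) calls), and the demo reads from a StringReader rather than a real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;

// Hypothetical sketch of "read a line, slice it, add it" processing.
class LineByLine {

    // Stand-in for the spreadsheet; an assumption, not an Aspose type.
    interface RowSink {
        void addRow(int rowIndex, String[] fields);
    }

    // Reads one line at a time and hands each record to the sink
    // immediately, so the full input is never held in memory.
    static int process(BufferedReader in, RowSink sink) throws IOException {
        int rowIndex = 0;
        String line;
        while ((line = in.readLine()) != null) {
            // Same split as the posted code; note that split drops
            // trailing empty fields.
            String[] fields = line.split("<");
            sink.addRow(rowIndex, fields);
            rowIndex++;
        }
        return rowIndex; // number of records processed
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("a<b<c\nd<e\n"));
        int rows = process(in, (row, fields) ->
                System.out.println(row + " -> " + Arrays.toString(fields)));
        System.out.println(rows + " rows");
    }
}
```

Returning the row count also gives you something to compare against the number of records you expect, which helps spot silently dropped data.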
- 12-07-2011, 09:36 AM #6
Moderator
- Join Date
- Apr 2009
- Posts
- 10,481
- Rep Power
- 16
Re: Issue reading 375M text file with java
I was going to say the same as pbrockway in the first post, but is this file just one long line with no line breaks?
Looks like that from your delimiters.
If it's a memory problem then it is not likely to be the input file, but the Excel file you are creating that's the problem. Both POI and JExcel tend to, because of the nature of Excel, keep the whole document in memory...which means it's a massive footprint. Which is why Excel is rather poor for this sort of thing.
If (as seems to be the case from the code above) you are simply dumping data into a table in rows then you'd be better off using a basic CSV...at least that way you can stream. If it requires formatting, that implies people reading the data...and I can tell you now, no one reads 800,000 rows of data like that. Utter waste of time.
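A minimal sketch of the streaming CSV idea, assuming the input uses ":" between records and "<" between fields as in the posted code. ToCsv is a made-up name, and the quoting rule (wrap a field in double quotes if it contains a comma, quote or line break) is the common CSV convention rather than anything from the original program. Only one field is held in memory at a time.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical sketch: stream ':' / '<' delimited text straight to CSV.
class ToCsv {

    // Quotes a field only when it contains a character that needs it.
    static String escape(String s) {
        if (s.indexOf(',') < 0 && s.indexOf('"') < 0
                && s.indexOf('\n') < 0 && s.indexOf('\r') < 0) {
            return s;
        }
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    // Copies records from in to out and returns how many were written.
    static int convert(Reader in, Writer out) throws IOException {
        StringBuilder field = new StringBuilder();
        boolean firstField = true; // no comma before the first field of a row
        boolean pending = false;   // unwritten data since the last ':'
        int records = 0;
        int ch;
        while ((ch = in.read()) != -1) {
            if (ch == '<' || ch == ':') {
                if (!firstField) out.write(',');
                out.write(escape(field.toString()));
                field.setLength(0);
                if (ch == ':') {       // end of record
                    out.write('\n');
                    records++;
                    firstField = true;
                    pending = false;
                } else {               // end of field
                    firstField = false;
                    pending = true;
                }
            } else {
                field.append((char) ch);
                pending = true;
            }
        }
        if (pending) {                 // last record had no trailing ':'
            if (!firstField) out.write(',');
            out.write(escape(field.toString()));
            out.write('\n');
            records++;
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        int n = convert(new StringReader("a<b:c,d<e"), out);
        System.out.print(out);
        System.out.println(n + " records");
    }
}
```

For a real file you would pass a BufferedReader wrapping a FileReader and a BufferedWriter wrapping a FileWriter instead of the string-based reader and writer used in main.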
- 12-08-2011, 06:29 AM #7
Member
- Join Date
- Dec 2011
- Posts
- 5
- Rep Power
- 0
Re: Issue reading 375M text file with java
Thank you for your replies pbrockway2 and Tolls.
Yes, this input file is one long string with no line breaks. And I guess what Tolls is saying could be the issue: all the Excel cells have to be held in memory, along with the data from the input file, until the workbook is saved at the end.
The requirement, however, is to be able to produce an Excel file with over 800,000 records.
The Aspose software I'm using to create the Excel file suggests using 'lightcells'; I guess I might have to try that and see.
I'll implement the character reading with a buffered reader to see if it makes a difference.
Thanks again to you both.
Please let me know of any other suggestions.
Kind Regards,
Nish
- 12-08-2011, 06:59 AM #8
Member
- Join Date
- Dec 2011
- Posts
- 5
- Rep Power
- 0
Re: Issue reading 375M text file with java
By the way, when I try to retrieve 800,000 records after reducing the Java heap to 3 GB and using the modified code above, the error I now get has to do with the garbage collector.
(Line 164 is -> vals = vals+ (char)ch;)
Some forum entries mention that this issue has to do with a "large number of temporary objects and while GC executes it takes most of the CPU time but recovers a very less amount of Heap"
and that you can suppress it by adding "-XX:-UseGCOverheadLimit", but they also advise using that only as a last resort.
- How would "-XX:-UseGCOverheadLimit" affect the program? What downsides could using it have?
- I was wondering if there is any way I could optimize this piece of code to use fewer temporary variables. But as I said earlier, the input file has no newlines.
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:2894)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:407)
at java.lang.StringBuilder.append(StringBuilder.java:136)
at Main.CreateExcel(Main.java:164)
at Main.main(Main.java:34)
Kind Regards,
Nish
- 12-08-2011, 01:12 PM #9
Moderator
- Join Date
- Apr 2009
- Posts
- 10,481
- Rep Power
- 16
Re: Issue reading 375M text file with java
GC has a catch in it to try and prevent it thrashing in an attempt to reclaim memory...this is what the overhead limit is. ie "I'm doing lots of gc and not getting much space back, there may be a problem"
Anyway, as I said...whoever decided an 800,000 row Excel report was a good idea is (frankly) a fool.
Why can you not produce a CSV?
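On the question of using fewer temporary objects, here is a minimal sketch, not a drop-in fix. It assumes records are separated by ":" as in your code; RecordReader and RecordHandler are names made up for this example. `vals = vals + (char) ch` builds a new StringBuilder and a new String for every character (that is the StringBuilder.append in your stack trace), whereas one reused StringBuilder avoids that. Also note that your inner loop never tests for -1: if the file doesn't end with ":", read() keeps returning -1 and vals grows until memory runs out, which may be what the trace is showing. The outer read() also swallows the first character of every record. The sketch below handles both.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical sketch: delimiter-based record reading with one reused buffer.
class RecordReader {

    // Stand-in for "split on '<' and write the cells"; an assumption.
    interface RecordHandler {
        void handle(String record);
    }

    // Single loop, single end-of-stream check, no character is skipped.
    static int readRecords(Reader in, char delimiter, RecordHandler handler)
            throws IOException {
        StringBuilder record = new StringBuilder(); // reused for every record
        int count = 0;
        int ch;
        while ((ch = in.read()) != -1) {
            if (ch == delimiter) {
                handler.handle(record.toString());
                record.setLength(0); // clear without allocating a new buffer
                count++;
            } else {
                record.append((char) ch);
            }
        }
        if (record.length() > 0) { // final record without a trailing delimiter
            handler.handle(record.toString());
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // A BufferedReader keeps the per-character read() calls cheap.
        Reader in = new BufferedReader(new StringReader("ab<c:de<f:g"));
        int n = readRecords(in, ':', r -> System.out.println(r));
        System.out.println(n + " records");
    }
}
```

In your program the handler would do the split on "<" and the cells.get(rowIndex, k).setValue(...) calls. This only reduces garbage on the reading side; the workbook itself still lives in memory until it is saved.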
- 12-09-2011, 12:54 PM #10
Member
- Join Date
- Apr 2011
- Posts
- 69
- Rep Power
- 0
Re: Issue reading 375M text file with java
Doesn't Excel have a 65,536 row limit anyway? I agree with the CSV idea, as such a huge Excel file is a pure waste of memory TBH.
- 12-11-2011, 11:50 PM #11
Member
- Join Date
- Dec 2011
- Posts
- 5
- Rep Power
- 0
Re: Issue reading 375M text file with java
The problem is solved, at least for now. After discussing it, we limited the maximum number of records that can be exported to 500,000, which works with 4 GB of memory.
By the way, the 65,536 limit is for the 2003 format (xls). I'm creating a 2007-format xlsx file, which can hold up to 1,048,576 rows.
Thanks everyone for your help.
Kind Regards,
Nish
- 12-12-2011, 12:33 PM #12
Moderator
- Join Date
- Apr 2009
- Posts
- 10,481
- Rep Power
- 16