Page 1 of 2
Results 1 to 20 of 37
  1. #1
    HeapSpace is offline Member
    Join Date
    Jun 2011
    Posts
    20
    Rep Power
    0

    How to best deal with large file uploads?

    Hello forum,

    Although I'm still a bit of a Java newbie, I think the advanced section is probably better suited to this question, so here goes....

    I've got the following code snippet that works beautifully, or at least it did until I threw a 2GB file at it and then it complained with the error Exception in thread "main" java.lang.OutOfMemoryError: Java heap space.

    Should I be using an alternative technique if I'm going to be dealing with large files? I am already calling my program with -Xms512m -Xmx1024m and don't really want to call it with more!

    Java Code:
        this.ssl.conn.connect();
        bos = new BufferedOutputStream(this.ssl.conn.getOutputStream());
        bis = new BufferedInputStream(new FileInputStream(this.fil));
        int i;
        while ((i = bis.read()) >= 0) {
            bos.write(i);
        }
        bos.close();
        bis.close();

  2. #2
    Norm is online now Moderator
    Join Date
    Jun 2008
    Location
    Eastern Florida
    Posts
    17,816
    Rep Power
    25

    Perhaps the buffering is reading too much of the file. Try it without the input file buffered.

  3. #3
    masijade is offline Senior Member
    Join Date
    Jun 2008
    Posts
    2,571
    Rep Power
    9

    Flush your output stream every now and then. Your program would also be much more efficient using read(byte[] b, int off, int len) and write(byte[] b, int off, int len), so that you only make two method calls for every b.length bytes rather than two method calls for every byte. For a few bytes, no problem; for 2 GB, good night.
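
A minimal sketch of the array-based copy described above, assuming plain java.io streams. The class name StreamCopy, the method name copy and the 8192-byte buffer size are illustrative choices, not something taken from the original program:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamCopy {
    /**
     * Copies everything from in to out using a reusable byte array,
     * so each read/write call moves up to buffer.length bytes.
     * Returns the number of bytes copied.
     */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // assumed size; tune as needed
        long total = 0;
        int n;
        // read() returns -1 at end of stream, otherwise the count of bytes read
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, n); // write only the bytes actually read
            total += n;
        }
        out.flush(); // push anything the output stream is still holding
        return total;
    }
}
```

Only one buffer's worth of data is held by this loop at any time, so if memory still grows with the file size, the storage is most likely happening inside one of the streams rather than in the copy itself.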

  4. #4
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,184
    Rep Power
    20

    As Norm says, why buffer?
    Just use a FileInputStream and the OutputStream you get from this.ssl.conn.getOutputStream().
    No need for the buffering on either of them.

    And what masijade says...though the read(byte[]) one is good enough for this really.

  5. #5
    HeapSpace is offline Member
    Join Date
    Jun 2011
    Posts
    20
    Rep Power
    0

    Thanks for all the super quick replies.

    Tolls, re: "Why buffer?" ... I can't really say it was a specific design decision; I'll put it down to my lack of experience.

    Norm & masijade, good food for thought there. I will go back and try again without buffering first.

    Thanks again

  6. #6
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,184
    Rep Power
    20

    Generally, if all you're doing is reading from a stream and writing straight away to another stream, then stick with the thing closest to the interfaces (InputStream and OutputStream), since you don't care about the data in there; you just want to move it. Buffering is useful if you want to do something with it.

  7. #7
    HeapSpace is offline Member
    Join Date
    Jun 2011
    Posts
    20
    Rep Power
    0

    Thanks for that, nice easy tip to remember!

    I suppose then, according to your motto about "just moving" the data, the stuff on the following website is not worth considering as a solution?

    Java tip: How to read files quickly | Nadeau Software

  8. #8
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,184
    Rep Power
    20

    They essentially come to the same conclusion we have here.
    Read using a byte array (coincidentally I use 8k by default, so I must have read something somewhere on that).
    I've not used the nio stuff, but those graphs are far too busy for me to see what's actually going on....I can't see massive differences to be honest once you start using the byte array.

  9. #9
    masijade is offline Senior Member
    Join Date
    Jun 2008
    Posts
    2,571
    Rep Power
    9

    Well, the smallest block size to use would be 512 bytes, as that is (was?) the standard disk sector size, but 4 KB or 8 KB is a much more efficient block size.

  10. #10
    JosAH is offline Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,734
    Blog Entries
    7
    Rep Power
    21

    I don't think the buffering is to blame; no matter the size of the file, those buffered streams (or readers) only buffer 8 KB in total. Buffering may be useless here, but I doubt you can blame it for the OOME ...

    kind regards,

    Jos
    cenosillicaphobia: the fear for an empty beer glass

  11. #11
    HeapSpace is offline Member
    Join Date
    Jun 2011
    Posts
    20
    Rep Power
    0

    Thanks Tolls.

    I tried without any buffering, but that broke too. So array is my next test....

  12. #12
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,184
    Rep Power
    20

    Quote Originally Posted by JosAH
    I don't think the buffering is to blame; no matter the size of the file, those buffered streams (or readers) only buffer 8 KB in total. Buffering may be useless here, but I doubt you can blame it for the OOME ...

    kind regards,

    Jos

    Well, no...but it is pointless in any case.

    In fact...:

    Quote Originally Posted by HeapSpace
    Thanks Tolls.

    I tried without any buffering, but that broke too. So array is my next test....

    You're storing something somewhere that doesn't need storing...
    I'm going to hazard a guess it's that OutputStream, since the FIS isn't going to read ahead...at least not up to 2 GB.

    So...are you flushing the output stream (as suggested)?
    Failing that, take a heap dump and see what's taking up the space.
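
For the heap dump suggestion, HotSpot JVMs can write a dump automatically when started with -XX:+HeapDumpOnOutOfMemoryError. A dump can also be triggered from inside the program; the following is a rough sketch under the assumption of a HotSpot-based JVM on Java 7 or later. The class name HeapDumper and its method are made up for illustration:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

class HeapDumper {
    /**
     * Writes a heap dump of the running JVM to hprofPath.
     * Assumptions: HotSpot-based JVM; the target file does not exist yet;
     * the file name ends in ".hprof" (newer JDKs insist on that suffix).
     */
    static void dump(String hprofPath, boolean liveObjectsOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // liveObjectsOnly = true dumps only objects that are still reachable
        bean.dumpHeap(hprofPath, liveObjectsOnly);
    }
}
```

The resulting .hprof file can then be opened in a heap analyser to see which objects are holding the memory.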

  13. #13
    HeapSpace is offline Member
    Join Date
    Jun 2011
    Posts
    20
    Rep Power
    0

    Am I just having a bad day or something.....

    Java Code:
        private byte[] barray = new byte[8024];
        this.inStream = new FileInputStream(this.fil);
        this.outStream = this.ssl.conn.getOutputStream();
        int r = 0;
        while ((r = this.inStream.read(this.barray)) > 0) {
            this.outStream.write(this.barray, 0, r);
        }
    Is still giving me grief. Is it because 8024 is too big a size, or am I just coding clumsily today?

  14. #14
    HeapSpace is offline Member
    Join Date
    Jun 2011
    Posts
    20
    Rep Power
    0

    Tolls,

    I think my reply crossed in cyberspace with yours.

    "So...are you flushing the output stream (as suggested)?"
    As you can now see from my later post, I've been naughty and haven't tried that yet .... but I've given myself a slap on the wrist and am going back now....

  15. #15
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,184
    Rep Power
    20

    As I said, what is that output stream?
    What is conn?
    And should that not give any pointers, have you taken a dump (stop snickering at the back there!) and analysed it in something like Eclipse MAT?

    ETA: And we crossed again.
    Bad things happen when you cross streams...

  16. #16
    HeapSpace is offline Member
    Join Date
    Jun 2011
    Posts
    20
    Rep Power
    0

    Aah..... Mr Tolls....we may well have been crossing recently, but I think I can see convergence at the end of the tunnel.

    I've been doing a little bit of digging and it seems URLConnection (conn = HttpsURLConnection) doesn't play ball with flushing and so aims to cache the whole lot in memory.

    Now, the problem I've got is that, although the common answer suggested by Mr Google is to run URLConnection with Transfer-Encoding: chunked, I can't do that because I need to send an ETag header with the MD5 hash of the file and a Content-Length header with its size.

    So that's where I'm at right now.
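
One avenue that fits the Content-Length requirement: HttpURLConnection has a setFixedLengthStreamingMode method, documented as enabling streaming of the request body without internal buffering when the length is known in advance. The sketch below is untested against any real server and assumes Java 7 or later (for the long overload and try-with-resources). The class name FixedLengthUpload, the PUT method and the bare-hex ETag format are assumptions for illustration only:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FixedLengthUpload {
    /** Computes the MD5 of a file by streaming it, never holding the whole file in memory. */
    static String md5Hex(File file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        try (InputStream in = new FileInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Uploads the file with a known Content-Length; returns the HTTP status code. */
    static int upload(File file, URL target) throws IOException, NoSuchAlgorithmException {
        HttpURLConnection conn = (HttpURLConnection) target.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("PUT"); // assumed; use whatever the server expects
        conn.setRequestProperty("ETag", md5Hex(file)); // header format is an assumption
        // Declares the body length up front so the body can be streamed
        conn.setFixedLengthStreamingMode(file.length());
        byte[] buffer = new byte[8192];
        try (InputStream in = new FileInputStream(file);
             OutputStream out = conn.getOutputStream()) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return conn.getResponseCode();
    }
}
```

The file is read twice here, once for the hash and once for the upload, which trades extra disk I/O for a small, constant memory footprint.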

  17. #17
    san_marcus is offline Member
    Join Date
    Jun 2011
    Posts
    5
    Rep Power
    0

    Hi, I'm also a newbie, but I came across this post because it seems to be related to my issue. I'm currently working on a web service for a national company in my country. The web service calls a method, and the query goes to the server and brings back a little more than 50,000 records in a single response. It happens in an asynchronous method, so I'm pretty sure no new data will be loaded while I'm showing the 50,000 records. Of course, the browser freezes when it tries to load so many records.

    My main question is: what can I do to page all those records and show only 300 per page, with the four usual buttons (Start, Back, Forward, Last)? Any help would be appreciated. Some people have already told me that I must load the records into temporary memory, but I don't know how to do that. I'm working with Java 1.5 and my SDK is JDeveloper 10.1.3.5.
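
A rough sketch of one way to page an in-memory list, assuming the records are already held in a java.util.List on the server side. The class name Pager and its method names are hypothetical, chosen to mirror the four buttons mentioned above, and the code sticks to Java 5 syntax:

```java
import java.util.Collections;
import java.util.List;

class Pager<T> {
    private final List<T> records;
    private final int pageSize;
    private int current = 0; // zero-based index of the page being shown

    Pager(List<T> records, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.records = records;
        this.pageSize = pageSize;
    }

    /** Number of pages, rounding up; an empty list still counts as one (empty) page. */
    int pageCount() {
        return Math.max(1, (records.size() + pageSize - 1) / pageSize);
    }

    /** Read-only view of the current page; no records are copied. */
    List<T> page() {
        int from = Math.min(current * pageSize, records.size());
        int to = Math.min(from + pageSize, records.size());
        return Collections.unmodifiableList(records.subList(from, to));
    }

    List<T> start()   { current = 0; return page(); }
    List<T> back()    { current = Math.max(0, current - 1); return page(); }
    List<T> forward() { current = Math.min(pageCount() - 1, current + 1); return page(); }
    List<T> last()    { current = pageCount() - 1; return page(); }
}
```

With 50,000 records and 300 per page this gives 167 pages, the last one holding 200 records; only the current page would need to be sent to the browser.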

  18. #18
    Norm is online now Moderator
    Join Date
    Jun 2008
    Location
    Eastern Florida
    Posts
    17,816
    Rep Power
    25

    "URLConnection (conn = HttpsURLConnection) doesn't play ball with flushing and so aims to cache the whole lot in memory."
    Can this be detected by looking at the output from the program?
    Or can you see the number of writes to the internet your system is making? My Local Connection Status has a packet count. Would that increase in proportion to what is written? Would the class cache stuff in case of retry?

  19. #19
    HeapSpace is offline Member
    Join Date
    Jun 2011
    Posts
    20
    Rep Power
    0

    My plan for today is to try with Apache HttpClient as that seems to be how others have resolved their issue. But if I get a chance I'll try a tcpdump to see if I can get an answer to Norm's question (my present assumption about URLConnection caching comes from a quick speed-read of descriptions such as Bug ID: 4212479 Data(or Buffered)OutputStream from a URLConnection does not flush writes and Bug ID: 5026745 Cannot flush output stream when writing to an HttpUrlConnection).

  20. #20
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,184
    Rep Power
    20

    Yep, both of those say use chunked to get around this.
    I don't know how HTTP/1.1 chunked works, but I would have thought your header and the like would still apply?
    Wouldn't the receiving end simply wait until it had received all the chunks?
