  1. piyush@java (Member)

    Splitting Large String Data (up to 2 GB)

    Hi All,

    If anyone has worked on splitting large amounts of data, please guide me here.

    I need to split large string data that comes in as input of up to 2 GB in size.

    What is the optimum way to split this data? I know of the options below.

    1) using String.split

    2) using StringTokenizer

    Let me know if there are any other approaches I could follow.

    Thanks,
    Piyush

  2. doWhile (Moderator)

    String.split works using a regular expression, while StringTokenizer splits on a fixed set of literal delimiter characters (each character in the delimiter string counts as a separate delimiter). Regular expressions add flexibility, but they also add overhead: they can be overkill and can bog down an application over the long run when the string could just as well be split on literal delimiters.
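    To make the difference concrete, here is a minimal sketch (the class name SplitComparison and its methods are hypothetical, chosen for illustration). It shows that split compiles its argument as a regex, so "." has to be escaped and consecutive delimiters produce empty strings, while StringTokenizer treats every character of its delimiter string as a separate delimiter and skips empty tokens. As far as I know, since Java 7 String.split also takes a fast path that bypasses the regex engine for a single-character delimiter that is not a regex metacharacter, and the StringTokenizer Javadoc describes it as a legacy class whose use is discouraged in new code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class SplitComparison {

    // String.split: the argument is a regular expression.
    static List<String> withSplit(String text, String regex) {
        return Arrays.asList(text.split(regex));
    }

    // StringTokenizer: each character in 'delims' is a delimiter on its own;
    // no regex is involved and empty tokens are skipped.
    static List<String> withTokenizer(String text, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delims);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "alpha,beta;;gamma";
        System.out.println(withSplit(line, "[,;]"));     // [alpha, beta, , gamma]
        System.out.println(withTokenizer(line, ",;"));   // [alpha, beta, gamma]
        // "." is a regex metacharacter: unescaped it matches every char,
        // so every piece is empty and split drops them all.
        System.out.println(withSplit("1.2.3", "."));     // []
        System.out.println(withSplit("1.2.3", "\\."));   // [1, 2, 3]
        System.out.println(withTokenizer("1.2.3", "."));  // [1, 2, 3]
    }
}
```

    Either way, both approaches need the whole input in memory as one String first, which is the real problem at 2 GB.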

  3. piyush@java (Member)

    What Is the Optimum Approach?

    Thanks for the response.

    So what is the optimum approach to splitting data this large (up to 2 GB)?

    Thanks,
    Piyush

  4. JosAH (Moderator)

    Quote, originally posted by piyush@java:
    Thanks for the response.

    So what is the optimum approach to splitting data this large (up to 2 GB)?
    The other way around: why did you store all that as one big String in memory in the first place? And how do you want to split it (i.e. on what conditions)?

    kind regards,

    Jos
    cenosillicaphobia: the fear for an empty beer glass

  5. Tolls (Moderator)

    That's the key point.

    If this is a stream being read in, split it as you read it and process the parts as required. If at all possible, don't read it all in first and split it afterwards.
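    A minimal sketch of splitting while reading, assuming the split condition is a single delimiter character (the class name StreamingSplitter and its split method are hypothetical). It reads fixed-size chunks from any java.io.Reader and hands each token to a callback as soon as its delimiter is seen, so only one buffer plus the current token is held in memory, no matter how large the input is.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class StreamingSplitter {

    // Reads 'in' chunk by chunk and passes each completed token to 'sink'.
    // Assumption: tokens are separated by one delimiter character.
    static void split(Reader in, char delimiter, Consumer<String> sink) throws IOException {
        char[] buf = new char[64 * 1024];           // one reusable read buffer
        StringBuilder current = new StringBuilder(); // token being assembled
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buf[i];
                if (c == delimiter) {
                    sink.accept(current.toString()); // token complete: process it now
                    current.setLength(0);
                } else {
                    current.append(c);               // tokens may span buffer boundaries
                }
            }
        }
        // Emit the trailing token, if any, after the last delimiter.
        if (current.length() > 0) {
            sink.accept(current.toString());
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> parts = new ArrayList<>();
        split(new StringReader("a,bb,,ccc"), ',', parts::add);
        System.out.println(parts); // [a, bb, , ccc]
    }
}
```

    For a file, the Reader could come from Files.newBufferedReader(path, StandardCharsets.UTF_8), where the charset is an assumption about the input. java.util.Scanner with useDelimiter(...) is a ready-made alternative that also reads incrementally, at the cost of regex matching.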

  6. quad64bit (Moderator)

    I think working with Strings is the wrong approach. Try reading in bytes and writing them out again at the same time. When you've read (and written) 2GB, start a new output file. Input and output streams with byte buffers of moderate size (a few megabytes; a power of 2 is best) would work very well.

    By venturing into String land and trying to use regex, you are going to incur MASSIVE overhead. Keep in mind too that a 32-bit JVM can only be given around 2GB of heap, and a Java String stores text as 16-bit chars (before Java 9's compact strings), so 2GB of single-byte text takes roughly 4GB as a String; you would almost certainly run out of memory. This application could avoid keeping more than a couple of megabytes of data in RAM at a time, and would thus run fast with a small footprint, being primarily I/O bound rather than memory bound.
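    A minimal sketch of the byte-copying approach, assuming the goal is fixed-size pieces rather than splitting on content (the class name ByteSplitter, the method splitFile, and the ".partN" file naming are hypothetical). The part size is a parameter instead of a hard-coded 2GB, and only one buffer is ever held in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ByteSplitter {

    // Copies 'source' into files "<name>.part0", "<name>.part1", ... in 'outDir',
    // each at most 'partSize' bytes. Returns the number of part files written.
    static int splitFile(Path source, Path outDir, long partSize, int bufferSize) throws IOException {
        if (partSize <= 0 || bufferSize <= 0) {
            throw new IllegalArgumentException("partSize and bufferSize must be positive");
        }
        byte[] buf = new byte[bufferSize];
        int parts = 0;
        try (InputStream in = Files.newInputStream(source)) {
            boolean eof = false;
            while (!eof) {
                long written = 0;
                OutputStream out = null;
                try {
                    while (written < partSize) {
                        // Never read past the end of the current part.
                        int toRead = (int) Math.min(buf.length, partSize - written);
                        int n = in.read(buf, 0, toRead);
                        if (n == -1) {
                            eof = true;
                            break;
                        }
                        if (out == null) { // open lazily so no empty trailing part is created
                            Path target = outDir.resolve(source.getFileName() + ".part" + parts);
                            out = Files.newOutputStream(target);
                            parts++;
                        }
                        out.write(buf, 0, n);
                        written += n;
                    }
                } finally {
                    if (out != null) {
                        out.close();
                    }
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ByteSplitter <source> <outDir> <partSizeBytes>
        int bufferSize = 4 * 1024 * 1024; // 4 MiB, a power of two
        int parts = splitFile(Paths.get(args[0]), Paths.get(args[1]),
                Long.parseLong(args[2]), bufferSize);
        System.out.println("Wrote " + parts + " part file(s)");
    }
}
```

    Because it cuts purely by byte count, a part boundary can fall in the middle of a line, a record, or a multi-byte character; if each piece must end on a delimiter, splitting as you read (as in the previous reply) fits better.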
