  1. #1
    murthybhat (Member, joined Oct 2009)

    Parsing Large PDF: Out of Memory Exception

    Hello,

    Background:
    This is related to PDF parsing in Java using iText.
    I have a requirement: given a PDF path and a bookmark (its title text), my application should open the PDF document at the chosen bookmark.

    My approach:
    I parse the PDF file with iText to build a mapping from bookmarks to page numbers. I then open the PDF with its 'page' parameter. Everything works fine until the document gets large.
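    For reference, a minimal sketch of the 'page' approach, assuming the PDF is opened through a viewer that honours Adobe's PDF open parameters (a #page=N fragment appended to the file's URL). PageLinks, put and linkFor are hypothetical names, and the title-to-page map is assumed to have been filled from iText beforehand.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: turns a bookmark title into a link that asks the
// viewer to open the PDF at that bookmark's page via the #page=N fragment.
final class PageLinks {
    private final String pdfUrl;
    private final Map<String, Integer> pageByTitle = new HashMap<>();

    PageLinks(String pdfUrl) {
        this.pdfUrl = pdfUrl;
    }

    // Records the 1-based page number that a bookmark title points to.
    void put(String title, int page) {
        pageByTitle.put(title, page);
    }

    // Returns e.g. "manual.pdf#page=12", or null if the title is unknown.
    String linkFor(String title) {
        Integer page = pageByTitle.get(title);
        return page == null ? null : pdfUrl + "#page=" + page;
    }
}
```

    Whether the fragment is honoured depends on the viewer, so this is worth testing with the viewer your users actually have.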

    The problem:
    iText fails to load/parse/handle PDFs this large. My PDF file runs into hundreds of MB, and Java throws an OutOfMemoryError while loading the file.

    I've tried increasing the Java heap size (-Xmx) and so on, but the OutOfMemoryError persists.
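    A quick way to confirm that a larger heap setting actually reached the JVM doing the parsing is to print Runtime.maxMemory(). This is a minimal sketch (HeapCheck is a hypothetical class name); note that if the whole file is still read into memory, a bigger heap can only postpone the error for a large enough file.

```java
// Prints the maximum heap the running JVM will use, so you can confirm
// that an -Xmx setting actually reached the process that parses the PDF.
final class HeapCheck {
    // Runtime.maxMemory() returns Long.MAX_VALUE if there is no limit.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

    Run it as java -Xmx1024m HeapCheck and check the printed value. If the parsing code runs inside another process, such as an application server, the flag has to be set for that JVM instead.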

    Could anyone please tell me how I should change my approach to solve this problem?

    Any help in this matter would be useful.

    Thanks in advance for your help.

    Regards,
    Bhat

  2. #2
    r035198x (Senior Member, joined Aug 2009)

    In your processing, do you have to load the whole file into memory, or can you get away with processing it in 8 KB chunks?
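    A minimal sketch of the 8 KB chunk idea, assuming the work can be done one chunk at a time (counting, hashing, searching for a byte pattern and so on). ChunkReader and ChunkHandler are hypothetical names. The same buffer is reused for every read, so memory use does not grow with the file size.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: stream a file through a fixed 8 KB buffer and hand each chunk to a
// callback, so memory use stays constant regardless of file size.
final class ChunkReader {
    // Hypothetical callback; receives the valid bytes buf[0..len) of one chunk.
    interface ChunkHandler {
        void handle(byte[] buf, int len) throws IOException;
    }

    static final int CHUNK_SIZE = 8 * 1024;

    // Returns the total number of bytes processed.
    static long process(String path, ChunkHandler handler) throws IOException {
        long total = 0;
        try (InputStream in = new FileInputStream(path)) {
            byte[] buf = new byte[CHUNK_SIZE];
            int len;
            // read() returns -1 at end of file
            while ((len = in.read(buf)) != -1) {
                // The buffer is reused, so the handler must copy anything it keeps.
                handler.handle(buf, len);
                total += len;
            }
        }
        return total;
    }
}
```

    This only helps when each chunk can be handled on its own; anything that needs to jump between distant parts of the file needs random access instead.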

  3. #3
    murthybhat

    Thank you for the reply.
    I am fine with reading only part of the file and avoiding loading the whole file into memory.

    I have tried reading the file as raw bytes, but the bytes were not usable, in the sense that the bookmark/page-number information could not be extracted from them.

    Kindly correct me if my understanding is wrong.

    My code:

    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Reads the file through a 1 KB buffer and returns its contents as a byte array
    private byte[] getByteArray(File file) {
        // try-with-resources closes the stream even if reading fails
        try (InputStream in = new FileInputStream(file)) {
            // Initial capacity reduced from the file length to 1024 bytes;
            // the stream still grows to hold the whole file
            ByteArrayOutputStream out = new ByteArrayOutputStream(1024);
            // Buffer for one read
            byte[] buf = new byte[1024];
            int len;
            // Copy the file chunk by chunk; read() returns -1 at end of file
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
            return out.toByteArray();
        } catch (IOException e) {
            // Also covers FileNotFoundException, a subclass of IOException
            e.printStackTrace();
            return null;
        }
    }

  4. #4
    r035198x

    You still have the method returning the full byte[], so all of those bytes are going to be loaded into memory.

  5. #5
    murthybhat

    Yes, I agree.
    But even with a limited number of bytes, the byte array (converted to a String) does not give me the bookmark info.

    I want to read only the bytes in the PDF file that relate to the bookmarks. Any idea how to do that?
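    Some background on why a raw slice of bytes does not show the bookmarks: in a PDF the bookmarks form the outline tree, reached through the /Outlines entry of the document catalog, and objects are located via a cross-reference table whose byte offset is written after the startxref keyword near the end of the file. Since PDF 1.5, objects can also sit inside compressed object streams, so bookmark titles are often not visible as plain text at all. A parser therefore seeks straight to the parts it needs instead of scanning from the start. The sketch below (XrefLocator is a hypothetical name) shows only that first step, reading the tail of the file with RandomAccessFile to find the offset, and assumes startxref lies within the last 1024 bytes. Following the cross-reference data to the catalog and outline objects is the work a library such as iText does.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Sketch: find the byte offset of a PDF's cross-reference section by reading
// only the tail of the file, which is where a random-access PDF parser starts.
final class XrefLocator {
    // Number of trailing bytes to inspect; assumes "startxref" lies within them.
    static final int TAIL_SIZE = 1024;

    static long startXref(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long length = raf.length();
            int n = (int) Math.min(TAIL_SIZE, length);
            byte[] tail = new byte[n];
            // Jump directly to the end; nothing before it is read.
            raf.seek(length - n);
            raf.readFully(tail);
            // ISO-8859-1 decodes each byte to exactly one char, so binary data is safe.
            String text = new String(tail, StandardCharsets.ISO_8859_1);
            // The last occurrence wins, which matches the newest incremental update.
            int keyword = text.lastIndexOf("startxref");
            if (keyword < 0) {
                throw new IOException("startxref not found near end of file");
            }
            // The offset is the first whitespace-separated token after the keyword.
            String rest = text.substring(keyword + "startxref".length()).trim();
            return Long.parseLong(rest.split("\\s+")[0]);
        }
    }
}
```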

  6. #6
    murthybhat

    Issue Resolved

    I was able to resolve my issue by using each bookmark's 'Named' destination instead of its page number. The PdfReader has to be created with a RandomAccessFileOrArray rather than with the complete file.

    I built a map from each title (in the bookmark section) to its named destination. A named destination can be passed as an argument when opening the PDF, so when a request arrives to open a specific bookmark, its named destination is looked up and passed.

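    Sample code snippet:

    ...
    // Classes from com.itextpdf.text.pdf (iText 5) or com.lowagie.text.pdf (iText 2.x)
    // forceRead = false: the file is accessed on demand instead of being
    // read completely into a byte array first
    PdfReader reader = new PdfReader(new RandomAccessFileOrArray(fileLocation, false, false), null);

    List<HashMap<String, Object>> bookmarks = SimpleBookmark.getBookmark(reader);
    if (bookmarks != null) {
        printBookmarks(bookmarks);
    }
    reader.close();
    ...

    // Prints "Title:Named" for every bookmark, recursing into the child
    // bookmarks stored under "Kids"
    @SuppressWarnings("unchecked")
    private static void printBookmarks(List<HashMap<String, Object>> bookmarks) {
        for (HashMap<String, Object> bookmark : bookmarks) {
            System.out.println(bookmark.get("Title") + ":" + bookmark.get("Named"));
            List<HashMap<String, Object>> kids =
                (List<HashMap<String, Object>>) bookmark.get("Kids");
            if (kids != null) {
                printBookmarks(kids);
            }
        }
    }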
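    To turn those title/destination pairs into the lookup described above, a minimal sketch might flatten the bookmark tree into a title-to-named-destination map and append the destination to the file URL with the nameddest open parameter. NamedDestinations, collect and linkFor are hypothetical names. It assumes the "Title", "Named" and "Kids" keys used in the post, that titles are unique (a later duplicate overwrites an earlier one), and that the viewer supports #nameddest= links. It needs no iText classes, so it works on whatever list of maps SimpleBookmark.getBookmark(reader) returns.

```java
import java.util.List;
import java.util.Map;

// Sketch: flatten a bookmark tree into a title -> named-destination map and
// build a link using the "nameddest" PDF open parameter.
final class NamedDestinations {
    // Walks the list recursively via "Kids"; bookmarks without a "Named"
    // entry (for example, plain page destinations) are skipped.
    @SuppressWarnings("unchecked")
    static void collect(List<? extends Map<String, Object>> bookmarks,
                        Map<String, String> out) {
        for (Map<String, Object> bookmark : bookmarks) {
            Object title = bookmark.get("Title");
            Object named = bookmark.get("Named");
            if (title != null && named != null) {
                out.put(title.toString(), named.toString());
            }
            Object kids = bookmark.get("Kids");
            if (kids instanceof List) {
                collect((List<? extends Map<String, Object>>) kids, out);
            }
        }
    }

    // Assumes the destination name needs no escaping in a URL fragment.
    static String linkFor(String pdfUrl, String namedDest) {
        return pdfUrl + "#nameddest=" + namedDest;
    }
}
```

    Destination names containing spaces or other special characters would need percent-encoding before being put in the link, which this sketch does not do.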

    Thanks for the help.

    Regards,
    Bhat
    Last edited by murthybhat; 11-03-2009 at 05:09 AM. Reason: Title was missing

Similar Threads

  1. Large data over RMI
    By JavaDesigner in forum New To Java
    Replies: 7
    Last Post: 10-16-2009, 08:48 PM
  2. Reading large XML
    By gkumar in forum XML
    Replies: 3
    Last Post: 08-06-2009, 04:38 AM
  3. parsing/storing large text data
    By hkansal in forum New To Java
    Replies: 4
    Last Post: 10-19-2008, 06:34 PM
  4. Replies: 2
    Last Post: 08-21-2008, 07:33 PM
  5. Latest XML parsing and memory usage benchmark
    By Jimmy Zhang in forum XML
    Replies: 0
    Last Post: 03-03-2008, 09:49 PM
