  1. #1
    umadas is offline Member
    Join Date
    Jul 2007
    Posts
    2
    Rep Power
    0

    Default Reading text using PDFBOX

    Hi Ranchers

    I am using PDFBox to read the text from a PDF file and display the x-y coordinates of each character. For a simple file it works fine. But when the PDF contains text in different fonts, tables, graphs, etc., the output is jumbled: it reads the first two paragraphs, then the last paragraphs, then the third paragraph. Also, text written vertically is not read properly; for example, "market" is read as "mark" and then "et" is printed on the next line. Is there any solution? Kindly help.

    Thanks

    Umadas

  2. #2
    sukatoa's Avatar
    sukatoa is offline Senior Member
    Join Date
    Jan 2008
    Location
    Cebu City, Philippines
    Posts
    556
    Rep Power
    7

    Default

    Have you read the PDFBox documentation?
    freedom exists in the world of ideas

  3. #3
    s_raja1999 is offline Member
    Join Date
    Sep 2009
    Posts
    1
    Rep Power
    0

    Question Extract Text from pdf with their x-y coordinates.

    Hi Umadas,

    This is Raja Subramanian.
    I have a task to extract text from a PDF along with its coordinates.
    I searched the net a lot for this and tried PDFBox, but I was not able to get the result, and I found no samples showing how to get the coordinates. Finally I saw your post. Can you please assist me? Please share sample code to achieve this task.

    Thanks in Advance.

    Thanks and Regards,
    Raja Subramanian.

  4. #4
    santhoshg is offline Member
    Join Date
    Sep 2009
    Posts
    3
    Rep Power
    0

    Default

    Hi,

    Please use the code below to read the PDF file; it worked fine for me (I was able to read the entire PDF text).

    PDDocument pddDocument = PDDocument.load(new File("a.pdf"));
    PDFTextStripper textStripper = new PDFTextStripper();
    System.out.println(textStripper.getText(pddDocument));
    pddDocument.close();


    For the above code, the supporting jar files are:
    pdfbox-0.8.0-incubating.jar
    fontbox-0.8.0-incubating.jar
    commons-logging-1.1.1.jar

  5. #5
    chiru is offline Member
    Join Date
    Oct 2010
    Posts
    5
    Rep Power
    0

    Smile java.lang.NoClassDefFoundError: org.

    Hi,
    I'm trying the same (the above code and jars).
    I'm getting the error in the title. How do I resolve the issue?
    10-07 12:43:42.456: ERROR/AndroidRuntime(18616): java.lang.NoClassDefFoundError: org.apache.pdfbox.pdmodel.PDDocument

  6. #6
    santhoshg is offline Member
    Join Date
    Sep 2009
    Posts
    3
    Rep Power
    0

    Default

    If you are getting the error java.lang.NoClassDefFoundError: org.apache.pdfbox.pdmodel.PDDocument, it means the jar file containing that class is not recognised or not set in the classpath. java.lang.NoClassDefFoundError is thrown when the class file cannot be found at runtime.

    You should also try the newest version of the jar file (here you have 0.8). I also see that the error message contains AndroidRuntime, so make sure all the jar files are configured in the Eclipse IDE when running your Android app.
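
    To narrow down a classpath problem like this, a small check can help. This is only a minimal sketch using the standard Class.forName call; the class name ClasspathCheck and the method isOnClasspath are made up for illustration. It prints the classpath the JVM actually sees and reports whether PDDocument can be located. It is meant for a desktop JVM, so treat its behaviour on Android as untested.

```java
// Sketch: check at runtime whether a class can be located, to help
// diagnose a NoClassDefFoundError. ClasspathCheck and isOnClasspath
// are hypothetical names chosen for this example.
class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Show the classpath the JVM was started with
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));

        String name = "org.apache.pdfbox.pdmodel.PDDocument";
        System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found"));
    }
}
```

    If the class is reported as not found, the PDFBox jar is missing from the classpath printed on the first line.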

    If you have any doubts, you can reply to me; always welcome :)

  7. #7
    chiru is offline Member
    Join Date
    Oct 2010
    Posts
    5
    Rep Power
    0

    Default

    Is PDFBox 1.3 the latest version?

  8. #8
    chiru is offline Member
    Join Date
    Oct 2010
    Posts
    5
    Rep Power
    0

    Default

    Could not find class 'org.apache.pdfbox.pdmodel.PDDocument', referenced from method org.apache.pdfbox.pdfparser.PDFParser.getPDDocument. I tried with PDFBox 1.2.1, commons-logging 1.1.1 and FontBox 1.2.1; it still shows the above error.

  9. #9
    santhoshg is offline Member
    Join Date
    Sep 2009
    Posts
    3
    Rep Power
    0

    Default

    Hi Chiru,

    I re-tested the code, and it is working fine. Please check what I have written:

    import java.io.File;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.util.PDFTextStripper;

    public class PdfBoxTest
    {
        public static void main(String args[])
        {
            try
            {
                // Load the PDF, extract all of its text, and print it
                PDDocument pddDocument = PDDocument.load(new File("a.pdf"));
                PDFTextStripper textStripper = new PDFTextStripper();
                System.out.println(textStripper.getText(pddDocument));
                pddDocument.close();
            }
            catch (Exception ex)
            {
                ex.printStackTrace();
            }
        }
    }

    In the classpath, I have included the following jar files:
    1) pdfbox-1.2.1.jar
    2) fontbox-1.2.1.jar
    3) commons-logging-1.1.1.jar

    I am not sure about the latest version of PDFBox; I guess it should be 1.2.1. The error you are getting is probably caused by the jar file not being set in the classpath: the exception is purely about being unable to find the class file, which means the jar file is not being found at that location. The rest will work once your jar file is properly recognized.

  10. #10
    chiru is offline Member
    Join Date
    Oct 2010
    Posts
    5
    Rep Power
    0

    Default

    It's working superbly in Java, thank you.
    But in Android it shows the no-class error I mentioned. I think Android doesn't support it. Anyway, thank you.

  11. #11
    chiru is offline Member
    Join Date
    Oct 2010
    Posts
    5
    Rep Power
    0

    Default Hidden classes of mydroid in android

    I am working with the default Android settings to change some properties. It involves hidden methods and classes of the mydroid source. How can I use the hidden API methods and hidden classes of the Android source?

    Regards,
    vinila.

  12. #12
    aanne is offline Member
    Join Date
    Jan 2011
    Posts
    1
    Rep Power
    0

    Exclamation How to get exact text from PDf

    Hi,
    I am now able to print the PDF document, thanks to Mr. Santosh.
    I need more help with using PDFBox.
    How can I read a particular line in a PDF?
    Can you please post sample code to print the X,Y coordinates? Are these coordinates helpful in retrieving exactly the data needed?
    Thanks in advance. :)
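
    On whether coordinates help in picking out a particular line: here is a rough sketch, not PDFBox code, of how positioned characters could be grouped into lines once you have them. The Glyph class, the toLines method and the sample values are all hypothetical; in PDFBox the positions would come from its TextPosition objects (for example getXDirAdj() and getYDirAdj()). It assumes horizontal text and a y value that grows down the page, so vertical text would need different handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch: group positioned characters into text lines by their y coordinate.
// Glyph is a hypothetical stand-in for whatever the PDF library reports.
class LineGrouper {

    /** One extracted character (or fragment) and its position on the page. */
    static final class Glyph {
        final String text;
        final float x;
        final float y;

        Glyph(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Glyphs whose y is within 'tolerance' of a line's first glyph are put on that line. */
    static List<String> toLines(List<Glyph> glyphs, float tolerance) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        // Assumption: smaller y means higher on the page
        sorted.sort(Comparator.comparingDouble((Glyph g) -> g.y));

        List<String> lines = new ArrayList<>();
        List<Glyph> current = new ArrayList<>();
        for (Glyph g : sorted) {
            if (!current.isEmpty() && Math.abs(g.y - current.get(0).y) > tolerance) {
                lines.add(join(current));
                current = new ArrayList<>();
            }
            current.add(g);
        }
        if (!current.isEmpty()) {
            lines.add(join(current));
        }
        return lines;
    }

    /** Orders one line's glyphs left to right and concatenates their text. */
    private static String join(List<Glyph> line) {
        line.sort(Comparator.comparingDouble((Glyph g) -> g.x));
        StringBuilder sb = new StringBuilder();
        for (Glyph g : line) {
            sb.append(g.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Deliberately out of order, as jumbled extraction output might be
        List<Glyph> glyphs = Arrays.asList(
                new Glyph("o", 10f, 120f),
                new Glyph("i", 16f, 100.4f),
                new Glyph("k", 16f, 120.2f),
                new Glyph("H", 10f, 100f));
        for (String line : toLines(glyphs, 2f)) {
            System.out.println(line); // prints "Hi", then "ok"
        }
    }
}
```

    Once text is grouped like this, "the particular line" is just an index into the returned list. The tolerance is a guess that depends on font size and line spacing.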

  13. #13
    AndroidDev is offline Member
    Join Date
    Jul 2011
    Posts
    1
    Rep Power
    0

    Default PDF to Text

    I used the same jar files you mentioned, but the application again has the same issue. Does this code also work on Android devices?

  14. #14
    ganga.kondati is offline Member
    Join Date
    Aug 2011
    Posts
    2
    Rep Power
    0

    Default

    Hey dude, I think you got the PDF format working in Android. Can you help me with how to read it? Please.
    Quote Originally Posted by aanne View Post
    Hi,
    I am now able to print the PDF document, thanks to Mr. Santosh.
    I need more help with using PDFBox.
    How can I read a particular line in a PDF?
    Can you please post sample code to print the X,Y coordinates? Are these coordinates helpful in retrieving exactly the data needed?
    Thanks in advance. :)

  15. #15
    agastheswar is offline Member
    Join Date
    Jan 2012
    Posts
    3
    Rep Power
    0

    Default Re: Reading text using PDFBOX

    Hi,
    I am looking for Java code that saves the text extracted from a PDF file using the PDFBox API.
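
    For the saving part, plain java.nio is probably enough. A minimal sketch, assuming the extracted text is already available as a String (for example, the value returned by textStripper.getText(pddDocument) in the code earlier in this thread). The class name SaveExtractedText, the method save and the output file name a.txt are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: write already-extracted text to a file as UTF-8.
class SaveExtractedText {

    /** Writes the text to the target file, creating it or replacing its contents. */
    static void save(String text, Path target) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Placeholder: in practice this would be the String returned by the text stripper
        String extracted = "text extracted from a.pdf";

        Path out = Paths.get("a.txt"); // hypothetical output file name
        save(extracted, out);
        System.out.println("Wrote " + Files.size(out) + " bytes to " + out.toAbsolutePath());
    }
}
```

    Writing explicit UTF-8 avoids depending on the platform default encoding, which matters if the PDF contains non-ASCII characters.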

  16. #16
    JosAH's Avatar
    JosAH is online now Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,338
    Blog Entries
    7
    Rep Power
    20

    Default Re: Reading text using PDFBOX

    Let this dead thread rest, start your own thread please; I'm closing this one.

    kind regards,

    Jos
    cenosillicaphobia: the fear for an empty beer glass
