Thread: Reading text using PDFBOX
- 05-19-2008, 12:47 PM #1
Member
- Join Date
- Jul 2007
- Posts
- 2
- Rep Power
- 0
Reading text using PDFBOX
Hi Ranchers
I am using PDFBox to read the text from a PDF file and display the x-y coordinates of each character. For simple files it works fine. But when the PDF file contains text in different fonts, tables, graphs, etc., the output is somewhat jumbled: it reads the first two paragraphs, then the last paragraphs, then the third paragraph. Also, text written vertically is not read properly; for example, "market" is read as "mark" and then "et" is printed on the next line. Is there any solution? Kindly help.
Thanks
Umadas
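On the ordering problem: PDFTextStripper emits text in the order it appears in the page's content stream, which need not match the visual order. Calling setSortByPosition(true) on the stripper before getText(...) asks PDFBox to sort by position instead, and is worth trying first. The plain-Java sketch below does not use PDFBox at all (the Glyph class and the tolerance values are hypothetical); it only illustrates the idea, and why "market" can split into "mark" and "et": glyphs join a line only when their y values are within a tolerance, so a small baseline shift on the last two letters starts a new line when the tolerance is too tight. PDFBox's own line detection uses its own heuristics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ReadingOrder {
    // Hypothetical glyph: one character plus the coordinates an extractor
    // reported for it (here y grows downward, top of page = 0).
    static final class Glyph {
        final char ch;
        final double x;
        final double y;

        Glyph(char ch, double x, double y) {
            this.ch = ch;
            this.x = x;
            this.y = y;
        }
    }

    // Sorts glyphs top to bottom, starts a new line whenever a glyph's y differs
    // from the line's first glyph by more than yTolerance, then reads each line
    // left to right.
    static List<String> reconstruct(List<Glyph> glyphs, double yTolerance) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble((Glyph g) -> g.y)
                .thenComparingDouble(g -> g.x));

        List<List<Glyph>> lines = new ArrayList<>();
        for (Glyph g : sorted) {
            List<Glyph> current = lines.isEmpty() ? null : lines.get(lines.size() - 1);
            if (current != null && Math.abs(g.y - current.get(0).y) <= yTolerance) {
                current.add(g);
            } else {
                List<Glyph> fresh = new ArrayList<>();
                fresh.add(g);
                lines.add(fresh);
            }
        }

        List<String> result = new ArrayList<>();
        for (List<Glyph> line : lines) {
            line.sort(Comparator.comparingDouble((Glyph g) -> g.x));
            StringBuilder text = new StringBuilder();
            for (Glyph g : line) {
                text.append(g.ch);
            }
            result.add(text.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Glyphs deliberately given out of order; "et" sits 0.6 units lower than "mark".
        List<Glyph> glyphs = Arrays.asList(
                new Glyph('t', 50, 100.6), new Glyph('m', 0, 100),
                new Glyph('e', 40, 100.6), new Glyph('a', 10, 100),
                new Glyph('r', 20, 100), new Glyph('k', 30, 100),
                new Glyph('o', 10, 120), new Glyph('n', 0, 120));
        System.out.println(reconstruct(glyphs, 1.0)); // [market, no]
        System.out.println(reconstruct(glyphs, 0.1)); // [mark, et, no]
    }
}
```

With a tolerance of 1.0 the scrambled input comes out as "market" and "no"; with 0.1 the same input reproduces the "mark" / "et" split described above.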
- 05-19-2008, 05:27 PM #2
- 09-25-2009, 08:33 AM #3
Member
- Join Date
- Sep 2009
- Posts
- 1
- Rep Power
- 0
Extract text from a PDF with x-y coordinates
Hi Umadas,
This is Raja Subramanian.
I have a task to extract text from a PDF together with its coordinates.
For that I searched a lot on the net and tried PDFBox, but I couldn't get the result, and there are no samples showing how to get the coordinates. Finally I saw your post. Can you please assist me? Please give me sample code to achieve this task.
Thanks in Advance.
Thanks and Regards,
Raja Subramanian.
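For the coordinate part of the question: in PDFBox the per-character positions arrive as TextPosition objects handed to a subclass of PDFTextStripper through one of its protected hooks; the hook names changed between PDFBox releases, so check the Javadoc for the version you use. Whatever the extractor, it helps to know that the default PDF coordinate system (user space) has its origin at the bottom-left of the page, with y increasing upward and 72 units per inch. The small helper below (hypothetical class and method names) flips a y value to a top-down convention and converts units to inches. It assumes an unrotated page whose box starts at (0, 0); check which convention your PDFBox version's getters already return before flipping anything.

```java
// Converts PDF user-space values, where the origin is the bottom-left corner,
// y increases upward and 1 unit = 1/72 inch.
final class PdfCoords {
    // Distance of a point from the top edge of the page (assumes the page box
    // starts at y = 0 and the page is not rotated).
    static double toTopDown(double yUserSpace, double pageHeight) {
        return pageHeight - yUserSpace;
    }

    // 72 user-space units per inch in the default PDF coordinate system.
    static double unitsToInches(double units) {
        return units / 72.0;
    }

    public static void main(String[] args) {
        double pageHeight = 792; // US Letter: 11 in * 72
        double baseline = 720;   // a baseline one inch below the top edge
        System.out.println(toTopDown(baseline, pageHeight));                // 72.0
        System.out.println(unitsToInches(toTopDown(baseline, pageHeight))); // 1.0
    }
}
```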
- 09-25-2009, 11:52 AM #4
Member
- Join Date
- Sep 2009
- Posts
- 3
- Rep Power
- 0
Hi,
Please use the code below to read the PDF file; it worked fine for me (I was able to read the entire PDF text).
PDDocument pddDocument=PDDocument.load(new File("a.pdf"));
PDFTextStripper textStripper=new PDFTextStripper();
System.out.println(textStripper.getText(pddDocument));
pddDocument.close();
The supporting jar files for the above code are:
pdfbox-0.8.0-incubating.jar
fontbox-0.8.0-incubating.jar
commons-logging-1.1.1.jar
- 10-07-2010, 01:57 PM #5
Member
- Join Date
- Oct 2010
- Posts
- 5
- Rep Power
- 0
java.lang.NoClassDefFoundError: org.
Hi,
I'm trying the same thing (the code and jars above).
I'm getting the error in the title. How do I resolve the issue?
10-07 12:43:42.456: ERROR/AndroidRuntime(18616): java.lang.NoClassDefFoundError: org.apache.pdfbox.pdmodel.PDDocument
- 10-07-2010, 04:50 PM #6
Member
- Join Date
- Sep 2009
- Posts
- 3
- Rep Power
- 0
If you are getting this error: java.lang.NoClassDefFoundError: org.apache.pdfbox.pdmodel.PDDocument, it means the jar file is not recognised or not set on the classpath. java.lang.NoClassDefFoundError means the class was available when the code was compiled but cannot be found at run time.
You should also check the newest version of the jar file (here you have version 0.8). Also, I see that the error message contains AndroidRuntime, so make sure you configure all the jar files in the Eclipse IDE when running your Android app.
If you have any doubts, you can reply to me. Always welcome :)
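A quick way to confirm this diagnosis is to ask the running JVM directly whether it can load the class. The sketch below (the ClasspathCheck name is hypothetical) calls Class.forName without initializing the class and reports whether PDDocument is visible at run time. It only checks the runtime that executes it; it does not verify how an Android project packages its jars.

```java
// Reports whether a class can be loaded at run time, a quick way to confirm
// that a jar (here, PDFBox) was actually picked up.
final class ClasspathCheck {
    static boolean isLoadable(String className) {
        try {
            // false = do not run static initializers; we only want to know if it is found.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.pdfbox.pdmodel.PDDocument";
        System.out.println(name + (isLoadable(name) ? " found" : " NOT on the classpath"));
    }
}
```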
- 10-08-2010, 06:13 AM #7
Member
- Join Date
- Oct 2010
- Posts
- 5
- Rep Power
- 0
Is PDFBox 1.3 the latest version?
- 10-08-2010, 06:26 AM #8
Member
- Join Date
- Oct 2010
- Posts
- 5
- Rep Power
- 0
Could not find class 'org.apache.pdfbox.pdmodel.PDDocument', referenced from method org.apache.pdfbox.pdfparser.PDFParser.getPDDocument. I tried with PDFBox 1.2.1, commons-logging 1.1.1 and FontBox 1.2.1, but it still shows the above error.
- 10-08-2010, 06:04 PM #9
Member
- Join Date
- Sep 2009
- Posts
- 3
- Rep Power
- 0
Hi Chiru,
I re-tested the code, and it is working fine. Please check what I have written:
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

public class PdfBoxTest
{
    public static void main(String args[])
    {
        PDDocument pddDocument = null;
        try
        {
            pddDocument = PDDocument.load(new File("a.pdf"));
            PDFTextStripper textStripper = new PDFTextStripper();
            System.out.println(textStripper.getText(pddDocument));
        }
        catch (Exception ex)
        {
            ex.printStackTrace();
        }
        finally
        {
            // Close the document even if extraction fails, so the file is released.
            if (pddDocument != null)
            {
                try
                {
                    pddDocument.close();
                }
                catch (Exception closeEx)
                {
                    closeEx.printStackTrace();
                }
            }
        }
    }
}
On the classpath, I have included the following jar files:
1) pdfbox-1.2.1.jar
2) fontbox-1.2.1.jar
3) commons-logging-1.1.1.jar
I am not sure about the latest version of PDFBox; I guess it should be 1.2.1. The error you are getting is probably because the jar file is not set on the classpath: the exception is purely about being unable to find the class file, which means the jar file is not being found at run time. Everything else will work fine once your jar file is properly recognized.
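To confirm that the three jars listed above are really present at run time, a plain JVM exposes its runtime classpath in the java.class.path system property. The sketch below (hypothetical class and method names) splits that property with File.pathSeparator and looks for entries whose file name contains "pdfbox", "fontbox" or "commons-logging". It is meant for a desktop JVM started with -cp; on Android the property may not reflect how libraries are packaged into the app.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class ClasspathDump {
    // Splits the runtime classpath (java.class.path) into its non-empty entries.
    static List<String> entries() {
        String classPath = System.getProperty("java.class.path", "");
        List<String> result = new ArrayList<>();
        for (String entry : classPath.split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }

    // Keeps the entries whose file name contains the fragment, e.g. "pdfbox".
    static List<String> matching(List<String> entries, String fragment) {
        List<String> result = new ArrayList<>();
        for (String entry : entries) {
            if (new File(entry).getName().contains(fragment)) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> all = entries();
        for (String jar : new String[] {"pdfbox", "fontbox", "commons-logging"}) {
            List<String> found = matching(all, jar);
            System.out.println(jar + ": " + (found.isEmpty() ? "MISSING" : found));
        }
    }
}
```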
- 10-13-2010, 12:51 PM #10
Member
- Join Date
- Oct 2010
- Posts
- 5
- Rep Power
- 0
It's working superbly in Java, thank you.
But on Android it still shows the "no class" error I mentioned. I think Android doesn't support it. Anyway, thank you.
- 10-16-2010, 12:40 PM #11
Member
- Join Date
- Oct 2010
- Posts
- 5
- Rep Power
- 0
Hidden classes of mydroid in android
I am working with the default settings of Android to change some properties. This involves hidden methods and classes of the mydroid source. How can I use the hidden API methods and hidden classes of the Android source?
Regards,
vinila.
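Android's "hidden" APIs are marked @hide in the platform source and left out of the public SDK, so code compiled against the SDK cannot call them directly; the usual workaround is Java reflection. Since Android 9 (API level 28) the platform restricts access to such non-SDK interfaces, so this may fail or be blocked on newer devices. The sketch below shows only the plain-Java reflection mechanics on a hypothetical class (HiddenHolder and its private secret method stand in for a real framework class); it is not a recipe for any specific Android API.

```java
import java.lang.reflect.Method;

// Hypothetical class standing in for a framework class whose method is not public.
final class HiddenHolder {
    private String secret(int n) {
        return "value-" + n;
    }
}

final class ReflectionDemo {
    // Looks up a declared (possibly private) method by name and parameter types,
    // makes it accessible, and invokes it on the given target.
    static Object callHidden(Object target, String name, Class<?>[] types, Object... args)
            throws ReflectiveOperationException {
        Method method = target.getClass().getDeclaredMethod(name, types);
        method.setAccessible(true);
        return method.invoke(target, args);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Object result = callHidden(new HiddenHolder(), "secret", new Class<?>[] {int.class}, 7);
        System.out.println(result); // value-7
    }
}
```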
- 01-08-2011, 07:35 AM #12
Member
- Join Date
- Jan 2011
- Posts
- 1
- Rep Power
- 0
How to get exact text from a PDF
Hi
I am now able to print the PDF document, thanks to Mr. Santosh.
I need more help using PDFBox.
How do I read a particular line in the PDF?
Can you please post sample code to print the X,Y coordinates? Are these coordinates helpful in retrieving exactly the data I need?
Thanks in advance. :)
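Coordinates are useful for exactly this: if the data you need always sits in the same part of the page, you can keep only the text inside that rectangle. PDFBox provides PDFTextStripperByArea for it, where you register named rectangles with addRegion(...), call extractRegions(page) and read back getTextForRegion(name); its package differs between PDFBox versions. The plain-Java sketch below shows the underlying filter only, using a hypothetical Glyph class and made-up coordinates. Characters come out in the order given, so combine it with position sorting if the source order is scrambled.

```java
import java.util.Arrays;
import java.util.List;

final class RegionFilter {
    // Hypothetical glyph with top-left-based coordinates (y grows downward).
    static final class Glyph {
        final char ch;
        final double x;
        final double y;

        Glyph(char ch, double x, double y) {
            this.ch = ch;
            this.x = x;
            this.y = y;
        }
    }

    // Returns, in input order, the characters whose (x, y) lies inside the
    // rectangle with top-left corner (left, top) and the given width and height.
    static String textInRegion(List<Glyph> glyphs, double left, double top,
                               double width, double height) {
        StringBuilder text = new StringBuilder();
        for (Glyph g : glyphs) {
            boolean insideX = g.x >= left && g.x < left + width;
            boolean insideY = g.y >= top && g.y < top + height;
            if (insideX && insideY) {
                text.append(g.ch);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = Arrays.asList(
                new Glyph('N', 0, 100), new Glyph('o', 10, 100),
                new Glyph('4', 300, 100), new Glyph('2', 310, 100));
        // Only the right-hand column of the row at y = 100.
        System.out.println(textInRegion(glyphs, 250, 90, 100, 20)); // 42
    }
}
```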
- 07-13-2011, 07:18 AM #13
Member
- Join Date
- Jul 2011
- Posts
- 1
- Rep Power
- 0
PDF to Text
I used the same jar files you mentioned, but the application still has the same issue. Does this code work on Android devices as well?
- 08-11-2011, 11:34 AM #14
Member
- Join Date
- Aug 2011
- Posts
- 2
- Rep Power
- 0
- 01-21-2012, 07:31 AM #15
Member
- Join Date
- Jan 2012
- Posts
- 3
- Rep Power
- 0
Re: Reading text using PDFBOX
Hi
I am looking for Java code to save the text extracted from a PDF file with the PDFBox API.
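Once getText(...) has returned a String (as in the PdfBoxTest example earlier in the thread), saving it is ordinary Java file I/O. The sketch below writes the text as UTF-8 using java.nio.file.Files; the SaveText class name and the a.txt output path are hypothetical. Alternatively, PDFTextStripper has a writeText(PDDocument, Writer) method that writes the extracted text straight to a Writer you supply.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class SaveText {
    // Writes the text as UTF-8, creating the file or replacing its contents.
    static void save(String text, Path target) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the String returned by textStripper.getText(pddDocument).
        String extracted = "text extracted from a.pdf";
        save(extracted, Paths.get("a.txt"));
    }
}
```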
- 01-21-2012, 08:47 AM #16
- Join Date
- Sep 2008
- Location
- Voorschoten, the Netherlands
- Posts
- 11,589
- Blog Entries
- 7
- Rep Power
- 17