Thread: read pdf
- 06-16-2010, 12:08 PM #1
read pdf
I want to read a PDF file and print only its text to the console. How can I accomplish this? Perhaps somebody can give me a hint, even using pdfbox-1.1.0.jar?
- 06-16-2010, 12:15 PM #2
- Join Date
- Sep 2008
- Location
- Voorschoten, the Netherlands
- Posts
- 11,375
- Blog Entries
- 7
- Rep Power
- 17
- 06-16-2010, 02:29 PM #3
- 06-16-2010, 02:48 PM #4
Moderator
- Join Date
- Apr 2009
- Posts
- 10,438
- Rep Power
- 16
According to the PDFBox documentation (well, the first page), it can extract text and has a command-line tool for that.
Since the source code is available, why not look at the source of ExtractText?
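A minimal sketch of the command-line route, assuming PDFBox 1.x. The class name org.apache.pdfbox.ExtractText is the one that appears in the stack trace further down this thread. The -console option (print the text instead of writing a .txt file) is an assumption based on the 1.x tool's usage text, so compare it with the usage your version prints when run without arguments. ExtractTextLauncher is a hypothetical name, and the two jar names are the ones mentioned in this thread, standing in for your real paths (add any other jars your PDFBox version depends on). The point is that every jar goes on one explicit -cp:

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: runs PDFBox's ExtractText in a child JVM with an explicit classpath. */
public class ExtractTextLauncher {

    // Class name as it appears in the stack trace in this thread (PDFBox 1.x)
    static final String TOOL = "org.apache.pdfbox.ExtractText";

    /** Builds: java -cp <jars joined by the platform separator> org.apache.pdfbox.ExtractText -console <pdf> */
    static List<String> buildCommand(List<String> jars, String pdfPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(String.join(File.pathSeparator, jars)); // ";" on Windows, ":" elsewhere
        cmd.add(TOOL);
        cmd.add("-console"); // assumed 1.x option: print the text instead of writing a .txt file
        cmd.add(pdfPath);
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: java ExtractTextLauncher <file.pdf>");
            System.exit(2);
        }
        // Jar names from this thread; replace them with the real paths on your machine
        List<String> jars = Arrays.asList("pdfbox-1.1.0.jar", "commons-logging-api-1.1.1.jar");
        Process process = new ProcessBuilder(buildCommand(jars, args[0]))
                .inheritIO() // the child's console output appears in this console
                .start();
        System.exit(process.waitFor());
    }
}
```

Typed directly at a Windows prompt, the same idea is java -cp pdfbox-1.1.0.jar;commons-logging-api-1.1.1.jar org.apache.pdfbox.ExtractText -console file.pdf (use : instead of ; on Unix-like systems).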
- 06-16-2010, 02:51 PM #5
How strange, because when I google for "Java pdf read" and follow the second link, I get this page. It has a fine manual stuffed with easy-to-read Java examples ...
kind regards,
Jos
- 06-16-2010, 02:56 PM #6
Moderator
To be fair, that's not exactly a free thing (outside of the trial).
- 06-16-2010, 02:59 PM #7
- 06-16-2010, 03:36 PM #8
Moderator
As I say, the Apache thing has a tool for extracting text, so I would argue it's just a case of opening up that source code and seeing what they do.
- 06-16-2010, 10:07 PM #9
I've tried out some jars Google suggested, and I could extract the text from a PDF, but it looked like this:
I1enti1iers 1nd Ke1word1
Al1 the J1va co1pone1ts we 1ust t1lked 1bout1clas1es, v1riab1es, a1d met1ods—1
nee1 name1. In J1va th1se na1es ar1 call1d ide1tifi1rs, a1d, as 1ou mi1ht ex1ect,1
the1e are 1ules 1or wh1t con1titu1es a l1gal J1va id1ntif1er. B1yond 1hat'1 lega1,
Unusable. I was not in the mood for trying every jar Google suggested, but now AspriseJavaPDF.jar works great and the output is clean! Thank you again.
The only drawback is this:
****** There are more text found, however this evaluation version only allows you to extract max. 1000 chars per page. Simply purchase a license to remove this restriction. Asprise Java PDF Library ******
And the price of the Java PDF Reader for a single-developer license is USD 998.00.
Last edited by j2me64; 06-16-2010 at 10:15 PM.
- 06-17-2010, 03:52 AM #10
Senior Member
- Join Date
- Dec 2008
- Posts
- 526
- Rep Power
- 0
Hello,
I recommend this library: Apache PDFBox - Java PDF Library
If my answer helped you, please click my "REP" button and add a comment.
Have a Good Java Coding :)
- 06-17-2010, 07:36 AM #11
- 06-17-2010, 07:39 AM #12
- 06-17-2010, 10:15 AM #13
When I execute PDFBox with java org.apache.pdfbox.ExtractText, everything works fine and I get the usage text on my console. As soon as I give it the PDF document to use, I get this:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
at org.apache.pdfbox.pdfparser.BaseParser.<clinit>(BaseParser.java:58)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:865)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:831)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:756)
at org.apache.pdfbox.ExtractText.main(ExtractText.java:179)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at sun.misc.Launcher$ExtClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 5 more
1) commons-logging-api-1.1.1.jar is on the classpath, and it does contain org.apache.commons.logging.LogFactory.
2) I don't understand what "AccessController.doPrivileged" means.
Can somebody help?
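One plausible reading of this trace: the ClassNotFoundException is thrown from sun.misc.Launcher$ExtClassLoader.findClass and no Launcher$AppClassLoader frame appears anywhere, which suggests the PDFBox classes themselves were loaded by the extension class loader (for example because pdfbox-1.1.0.jar was copied into the JRE's lib/ext directory). That loader only sees its own directories, not the ordinary classpath, so commons-logging can be "on the classpath" and still be invisible to PDFBox. The AccessController.doPrivileged frame is just URLClassLoader running its jar lookup in a privileged block; it is plumbing, not the cause. The hypothetical WhoLoadedIt class below is a stdlib-only sketch for checking this: it reports which loader and location each class comes from, and whether LogFactory is visible through PDFBox's own loader.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic: which class loader (and which jar) supplies a class? */
public class WhoLoadedIt {

    /** Reports whether className is visible from this class, and if so its loader and location. */
    static String describe(String className) {
        try {
            // false: look the class up without running its static initializer
            Class<?> c = Class.forName(className, false, WhoLoadedIt.class.getClassLoader());
            ClassLoader loader = c.getClassLoader(); // null means the bootstrap loader
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return className
                    + " loaded by " + (loader == null ? "the bootstrap loader" : loader.toString())
                    + " from " + (src == null ? "an unknown location" : String.valueOf(src.getLocation()));
        } catch (ClassNotFoundException e) {
            return className + " is NOT visible to " + WhoLoadedIt.class.getClassLoader();
        }
    }

    /** True if className can be found through the given loader (null = bootstrap loader). */
    static boolean visibleFrom(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        // java.ext.dirs exists only up to Java 8; newer JVMs print null here
        System.out.println("java.ext.dirs   = " + System.getProperty("java.ext.dirs"));

        String tool = "org.apache.pdfbox.ExtractText";
        String logging = "org.apache.commons.logging.LogFactory";
        System.out.println(describe(tool));
        System.out.println(describe(logging));

        try {
            ClassLoader pdfboxLoader =
                    Class.forName(tool, false, WhoLoadedIt.class.getClassLoader()).getClassLoader();
            // The lookup that fails in the trace: LogFactory requested through PDFBox's own loader
            System.out.println("LogFactory visible to PDFBox's loader: "
                    + visibleFrom(logging, pdfboxLoader));
        } catch (ClassNotFoundException e) {
            System.out.println(tool + " is not visible at all");
        }
    }
}
```

Run it with the same classpath settings you used for ExtractText, plus the directory containing WhoLoadedIt.class. If ExtractText is reported from a lib/ext path and the last line says false, the usual remedy is to take the PDFBox jar out of lib/ext and pass all jars with one explicit -cp instead.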
- 06-17-2010, 10:24 AM #14
Senior Member
- Join Date
- Aug 2009
- Posts
- 2,388
- Rep Power
- 6