Hi,
I'm developing an indexing structure (like an AVL tree) in Java, but I cannot find a way to make it properly read pdf, docx, xlsx... files. :confused:
Could someone give me an idea?
Thanks!
carlneto
:)
Not sure what you are looking for...so I suggest you start with this:
Lesson: Basic I/O (The Java™ Tutorials > Essential Classes)
Each file type has its own format, so if you want to parse out the text you will need a parser for each type. The Apache open source projects include several libraries that might be useful to you.
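To illustrate what "its own format" means in practice: a .docx file is a ZIP archive, and its body text normally lives in an entry named word/document.xml. The following is only a rough sketch using the standard library, with a hypothetical class name (DocxText); it strips XML tags crudely and does not decode entities such as &amp;amp;, so a real parser library would be the better choice for an index.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of any library.
final class DocxText {

    /** Reads word/document.xml from a .docx stream and returns its text with tags stripped. */
    static String extract(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Assumption: the main document part has its conventional name.
                if (entry.getName().equals("word/document.xml")) {
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = zip.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                    String xml = new String(buf.toByteArray(), StandardCharsets.UTF_8);
                    // Crude: paragraph ends become newlines, all other tags are dropped.
                    return xml.replaceAll("</w:p>", "\n")
                              .replaceAll("<[^>]+>", "")
                              .trim();
                }
            }
        }
        throw new IOException("word/document.xml not found; probably not a .docx file");
    }
}
```

Calling DocxText.extract(new FileInputStream(file)) would return the paragraphs separated by newlines. An .xlsx file is also a ZIP archive, but its layout is different, and PDF is not ZIP-based at all, which is why a separate parser per type is needed.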
I have already found Apache PDFBox, but I can't seem to incorporate it into the NetBeans IDE and use its methods. Hypothetically, my code would call the parser like this:
if (objFile.isFile()) {
    // case-insensitive check, so both ".pdf" and ".PDF" match
    if (objFile.getName().toLowerCase().endsWith(".pdf")) {
        // how do I call the text parser?
    }
}
- How do I install PDFBox in the NetBeans IDE?
- How do I call its methods?
You need to add the jar(s) to your classpath to be able to access the library. I don't use NetBeans, so I can't lead you through it, but I bet a quick Google search will lead you in the right direction.
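Once the jar is on the classpath, the endsWith(".pdf") check from the snippet above could hand each file to a parser chosen by its extension. This is only a sketch: TextExtractor and ExtractorRegistry are hypothetical names made up for the example, and the PDFBox call is left in a comment so the code compiles without the jar. The commented lines assume PDFBox 2.x (PDDocument.load and PDFTextStripper.getText); in 3.x, loading goes through Loader.loadPDF instead.

```java
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical interface: one implementation per file format. */
interface TextExtractor {
    String extract(File file) throws IOException;
}

/** Hypothetical registry mapping a file extension to its extractor. */
final class ExtractorRegistry {
    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the extension after the last dot, lower-cased, or "" if there is none. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Returns the extracted text, or null when no extractor is registered for the type. */
    String extract(File file) throws IOException {
        TextExtractor extractor = byExtension.get(extensionOf(file.getName()));
        return extractor == null ? null : extractor.extract(file);
    }

    // With PDFBox 2.x on the classpath, a PDF extractor could be registered like this
    // (imports: org.apache.pdfbox.pdmodel.PDDocument, org.apache.pdfbox.text.PDFTextStripper):
    //
    //   registry.register("pdf", file -> {
    //       try (PDDocument doc = PDDocument.load(file)) {
    //           return new PDFTextStripper().getText(doc);
    //       }
    //   });
}
```

The indexing code would then keep its objFile.isFile() check, call registry.extract(objFile), and skip the file when the result is null. Support for docx or xlsx becomes one more register call rather than another branch in the if statement.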
It has been very helpful, thank you very much.
My regards