Java: extract PDF data from an XY location
Does anyone know of any classes that would let me take a PDF as input, convert it to an image format, locate the data inside the rectangle XY-XXYY, and convert it to readable text?
Here's the scenario:
I need to read PDF bills and extract five fields of information to insert into a database. The PDFs can be generated by any application, e.g. Pastel Accounting. The XY-XXYY locations of the data to extract will be read from a database. Any suggestions or help?
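One way to use database-supplied XY-XXYY rectangles is to test whether each piece of text falls *inside* the rectangle, rather than matching an exact start coordinate. The sketch below assumes you already have the page's text as fragments with bounding boxes. When the PDF contains real text (not a scanned image), a library such as Apache PDFBox can report the position of each piece of text, which may make the image/OCR step unnecessary. The `TextChunk` and `FieldRegion` records are hypothetical stand-ins, not any library's API. Coordinates are assumed to be in points measured from the top-left corner, and records require Java 16+.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class RegionExtractor {

    // Hypothetical: one text fragment and its bounding box, in points,
    // with the origin at the top-left corner of the page.
    record TextChunk(String text, double x, double y, double width, double height) {}

    // Hypothetical: a field rectangle as loaded from the database (X,Y to XX,YY).
    record FieldRegion(String name, double x1, double y1, double x2, double y2) {

        // A chunk belongs to the field if its centre point lies inside the rectangle,
        // so right-aligned "1000.00" and "100.00" both match despite different start X.
        boolean contains(TextChunk c) {
            double cx = c.x() + c.width() / 2;
            double cy = c.y() + c.height() / 2;
            return cx >= x1 && cx <= x2 && cy >= y1 && cy <= y2;
        }
    }

    // Collects every chunk inside the region, orders them top-to-bottom then
    // left-to-right, and joins them with single spaces.
    static String extract(List<TextChunk> chunks, FieldRegion region) {
        return chunks.stream()
                .filter(region::contains)
                .sorted(Comparator.comparingDouble(TextChunk::y)
                        .thenComparingDouble(TextChunk::x))
                .map(TextChunk::text)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        FieldRegion total = new FieldRegion("total", 400, 690, 500, 715);

        // Two bills whose totals are right-aligned to x = 492 but start at different X.
        List<TextChunk> billA = List.of(
                new TextChunk("Invoice", 50, 40, 60, 12),
                new TextChunk("1000.00", 450, 700, 42, 10));
        List<TextChunk> billB = List.of(
                new TextChunk("Invoice", 50, 40, 60, 12),
                new TextChunk("100.00", 456, 700, 36, 10));

        System.out.println(extract(billA, total)); // prints 1000.00
        System.out.println(extract(billB, total)); // prints 100.00
    }
}
```

Using the centre point rather than the left edge is one design choice. Requiring the whole box to lie inside the rectangle would be stricter, but more sensitive to small positional differences between the applications that generate the bills.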
I have tried converting the PDF to an HTML document and reading the absolute positions, but the problem I'm having is that right-aligned amounts move: 1000.00 will have a different XY position than 100.00, for example. Also, because the PDF is an invoice/bill, I can't simply say "read the 20th line from the top", since each line item adds more lines, so that approach is unreliable.
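For the varying number of lines, one alternative to fixed coordinates or line numbers is to anchor each field on a label that always appears on the bill (a label such as "Total" is an assumption here, not something from the bills described) and read the value printed on the same line to its right. This sketch again uses a hypothetical `TextChunk` record (Java 16+) with top-left-origin coordinates in points. It also assumes the label comes out as a single fragment; real extractors may split or merge words differently, so treat it as a starting point only.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class AnchorExtractor {

    // Hypothetical: one text fragment and its bounding box, in points,
    // with the origin at the top-left corner of the page.
    record TextChunk(String text, double x, double y, double width, double height) {}

    // Finds the first fragment whose text equals the label (ignoring case), then
    // returns the closest fragment to its right whose vertical centre falls
    // within the label's height, i.e. the value printed on the same line.
    static Optional<String> valueRightOf(List<TextChunk> chunks, String label) {
        Optional<TextChunk> anchor = chunks.stream()
                .filter(c -> c.text().equalsIgnoreCase(label))
                .findFirst();
        if (anchor.isEmpty()) {
            return Optional.empty();
        }
        TextChunk a = anchor.get();
        double labelRight = a.x() + a.width();
        return chunks.stream()
                .filter(c -> c != a && c.x() >= labelRight)
                .filter(c -> {
                    double cy = c.y() + c.height() / 2;
                    return cy >= a.y() && cy <= a.y() + a.height();
                })
                .min(Comparator.comparingDouble(TextChunk::x))
                .map(TextChunk::text);
    }

    public static void main(String[] args) {
        // The "Total" line sits lower on the second bill because it has an extra item.
        List<TextChunk> twoItems = List.of(
                new TextChunk("Item A", 50, 500, 40, 10),
                new TextChunk("Item B", 50, 520, 40, 10),
                new TextChunk("Total", 350, 600, 30, 10),
                new TextChunk("1000.00", 450, 600, 42, 10));
        List<TextChunk> threeItems = List.of(
                new TextChunk("Item A", 50, 500, 40, 10),
                new TextChunk("Item B", 50, 520, 40, 10),
                new TextChunk("Item C", 50, 540, 40, 10),
                new TextChunk("Total", 350, 620, 30, 10),
                new TextChunk("100.00", 456, 620, 36, 10));

        System.out.println(valueRightOf(twoItems, "Total").orElse("?"));   // prints 1000.00
        System.out.println(valueRightOf(threeItems, "Total").orElse("?")); // prints 100.00
    }
}
```

Because the lookup is relative to the label, extra line items that push the total further down the page do not break it. The trade-off is that each field needs a reliable label, which could be stored in the database alongside, or instead of, the rectangle.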