I have a requirement to convert a PDF document to HTML5. I do not want to use an existing tool for this; I want to write my own code. Being a Java developer, I started with iText, but I saw that iText just extracts the text from the PDF and does not keep the formatting and layout of the PDF.
Can someone please advise which API I should use to achieve this? Below are my high-level requirements.
1-Extract the text from the PDF without losing its formatting and layout.
2-Extract the images, if any.
3-Retain the same formatting in the converted HTML5 page as in the PDF page.
Thanks in advance.
Do you understand the structure of a PDF document and how it handles layout and the like?
If not then you'll need to read up on that before you start.
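To give a rough idea of why this matters: a PDF page does not store paragraphs or flowing text. It stores runs of text placed at explicit coordinates, measured in points from the bottom-left corner of the page. Whatever library you use to read those runs, retaining the layout in HTML5 mostly comes down to mapping those coordinates onto absolutely positioned elements. Below is a minimal sketch of that mapping in plain Java. The `TextRun` class, the `toHtml` method and the sample coordinates are hypothetical and only for illustration; they are not part of iText or any other library, and the sketch ignores fonts, rotation, and images.

```java
import java.util.List;
import java.util.Locale;

public class PdfLayoutSketch {

    /** Hypothetical holder for one run of text as a PDF parser might report it. */
    static final class TextRun {
        final String text;
        final double x;        // left edge in points, from the left of the page
        final double y;        // baseline in points, from the BOTTOM of the page
        final double fontSize; // font size in points

        TextRun(String text, double x, double y, double fontSize) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.fontSize = fontSize;
        }
    }

    /** Escapes the characters that are significant in HTML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders the runs of one page as absolutely positioned spans. */
    static String toHtml(List<TextRun> runs, double pageWidth, double pageHeight) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format(Locale.ROOT,
                "<div style=\"position:relative;width:%.1fpt;height:%.1fpt\">%n",
                pageWidth, pageHeight));
        for (TextRun r : runs) {
            // PDF measures y upwards from the bottom of the page; CSS measures
            // top downwards. As a simplifying assumption, the top of the text
            // box is taken to be one font size above the baseline.
            double top = pageHeight - r.y - r.fontSize;
            sb.append(String.format(Locale.ROOT,
                    "  <span style=\"position:absolute;left:%.1fpt;top:%.1fpt;"
                            + "font-size:%.1fpt;white-space:pre\">%s</span>%n",
                    r.x, top, r.fontSize, escape(r.text)));
        }
        sb.append("</div>");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample data for a US Letter page (612 x 792 points).
        List<TextRun> runs = List.of(
                new TextRun("Hello & welcome", 72, 720, 12),
                new TextRun("Second line", 72, 700, 12));
        System.out.println(toHtml(runs, 612, 792));
    }
}
```

Absolute positioning keeps the page looking like the original, but the resulting HTML will not reflow. Rebuilding real paragraphs from the positioned runs is a separate and harder problem, which is why understanding the PDF structure first is worth the time.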