Has anyone seen any libs for pulling text from either .jpeg or standard office document files? My goal is strictly to rid the datastream of all formatting and get anything that is truly
1. char
2. char[]
3. Character
4. Character[]
5. String data = new String(bytes, "UTF-8");   // bytes is a byte[]
6. String data = new String(bytes, "UTF-16");  // bytes is a byte[]

Unihan, Unicode, 7- or 8-bit ASCII, or whatever, as long as it does not have any of the useless ancillary formatting codes that are endemic to office software. The goal here is to eliminate manual data entry from a datastore that is currently being conveyed in a format that has proven unreliable, in an industry where reliability is hyper-critical further down the datastream.
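
To make "no formatting codes" concrete, here is a rough sketch of the kind of output I am after. It assumes the text has already been decoded into a String, and the class and method names (PlainText, strip, decodeUtf8) are just ones I made up for illustration. It keeps ordinary characters plus newlines and tabs and drops control, format, private-use, surrogate and unassigned code points. Carriage returns count as control characters here, so they get dropped too.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reduces a decoded string to plain text.
final class PlainText {

    // Keep newlines, tabs and every code point that is not a
    // control/format/private-use/surrogate/unassigned character.
    static String strip(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            if (cp == '\n' || cp == '\t' || isPlain(cp)) {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    // True for code points whose Unicode general category is not one
    // of the "invisible" or undefined categories listed below.
    static boolean isPlain(int cp) {
        switch (Character.getType(cp)) {
            case Character.CONTROL:
            case Character.FORMAT:
            case Character.PRIVATE_USE:
            case Character.SURROGATE:
            case Character.UNASSIGNED:
                return false;
            default:
                return true;
        }
    }

    // Assumes the raw bytes are UTF-8; swap the charset for other encodings.
    static String decodeUtf8(byte[] raw) {
        return strip(new String(raw, StandardCharsets.UTF_8));
    }
}
```

This only cleans up text that is already text. It does nothing about getting the text out of the office file or the image in the first place, which is the part I am stuck on.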

One approach: I can take a JPEG of the 8 x 11 inch paper format.
Other approach: call a driver for the shrink-wrapped software.

A better solution would be established libraries with a proven track record of pulling the text from contemporary ..... I tried digging into OpenOffice, but that used up thirty or forty hours of research budget with nothing to show for it.
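
For one narrow case, the sort of thing I have in mind might look like the sketch below. It assumes the input is a .docx file (an assumption on my part, since the files could be in other formats), which is a ZIP archive whose main text lives in word/document.xml inside w:t elements. It uses only the JDK (Java 9 or later for readAllBytes), the class name DocxText is made up, and it ignores headers, footers, footnotes and text boxes, so treat it as a starting point rather than a proven solution.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: pulls paragraph text out of a .docx stream.
final class DocxText {
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static String extract(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals("word/document.xml")) {
                    continue;
                }
                // Read only this entry, then parse it as namespace-aware XML.
                byte[] xml = zip.readAllBytes();
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Refuse DOCTYPE declarations to avoid external-entity tricks.
                factory.setFeature(
                    "http://apache.org/xml/features/disallow-doctype-decl", true);
                Document doc = factory.newDocumentBuilder()
                                      .parse(new ByteArrayInputStream(xml));
                return paragraphsToText(doc);
            }
        }
        throw new IllegalArgumentException("no word/document.xml entry found");
    }

    // One output line per w:p paragraph, built from its w:t text runs.
    private static String paragraphsToText(Document doc) {
        StringBuilder out = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element paragraph = (Element) paragraphs.item(i);
            NodeList runs = paragraph.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                out.append(runs.item(j).getTextContent());
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Calling it would be something like DocxText.extract(new java.io.FileInputStream(path)) with path pointing at the document. The older binary formats and the JPEG route are not covered by this at all.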

Another idea I had was to look into OpenGL, but I have never done any work with that tool and would need to know where to start.