    May 2010
    [QUESTION]Get text from image?

    Hi guys, I'm currently trying to figure out a way of retrieving text from a JPEG.
    I have absolutely no clue how yet, but I have some ideas, and since Googling specific questions is time-consuming, I thought I would ask some questions related to the subject here.

    Basically, what I'm thinking is to read every pixel in the JPEG and then write some kind of recognition script that can determine each character based on how the pixel array looks.

    Since the text I need to get from the files is always in the same place, I don't need to scan the entire JPEG. So my question is this:

    How can I scan the pixels of a JPEG?
    - And how should I store them for optimal functionality?
    - Should I create a new, smaller JPEG and run that through a script, or should I keep everything in an array?
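
    As a rough illustration of the pixel-scanning question, here is a minimal sketch using the standard javax.imageio and java.awt.image classes. It reads only a fixed rectangle of the image into a 2D array of brightness values. The class name RegionPixels, the method grayRegion, and the region coordinates in main are hypothetical, chosen just for this example. Keeping the region in an array also avoids writing a second, smaller JPEG, which would add another round of lossy compression.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class RegionPixels {

    /**
     * Copies the rectangle (x, y, w, h) of the image into a
     * [row][column] array of brightness values in the range 0..255.
     */
    static int[][] grayRegion(BufferedImage img, int x, int y, int w, int h) {
        // getRGB returns the pixels row by row, each packed as 0xAARRGGBB.
        int[] argb = img.getRGB(x, y, w, h, null, 0, w);
        int[][] gray = new int[h][w];
        for (int row = 0; row < h; row++) {
            for (int col = 0; col < w; col++) {
                int p = argb[row * w + col];
                int r = (p >> 16) & 0xFF;
                int g = (p >> 8) & 0xFF;
                int b = p & 0xFF;
                gray[row][col] = (r + g + b) / 3; // plain average of the channels
            }
        }
        return gray;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage img = ImageIO.read(new File(args[0]));
        if (img == null) {
            throw new IOException("No image reader could decode " + args[0]);
        }
        // Hypothetical region; it must lie fully inside the image.
        int[][] gray = grayRegion(img, 10, 20, 200, 30);
        System.out.println("Brightness at the region's top-left pixel: " + gray[0][0]);
    }
}
```

    Run it with the path of a JPEG as the only argument. The coordinates would have to be adjusted to wherever the text actually sits in the documents.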

    Is this even a remotely feasible approach to this?
    - If not, I'm open to suggestions.

    I also need a way of scanning a folder for files. I can figure that part out later, but if you have some suggestions I'm not saying no :)

    I'm sorry for not posting any code, but I don't have any yet. I don't plan on starting before I have all the basic principles down, so I don't end up with wasted time and partially functioning software.

    Any help is greatly appreciated.
    - Thomas Kristian.

    Jun 2010
    Well, I think you are just trying to reinvent the wheel. The problem you describe is known as optical character recognition (OCR) and is well studied. It can get quite hard, since you have to take different fonts, colors, sizes and so on into account.

    The best way to start might be to take a look at the existing solutions; maybe the Java OCR Project will be useful for you.

    "I also need a way of scanning a folder for files"
    Well, all that you need is provided by Java's File class. Just define your own FileFilter and pass it to listFiles.
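
    A minimal sketch of that suggestion, assuming the files of interest end in .jpg or .jpeg. It uses java.io.FileFilter together with File.listFiles(FileFilter); the class name JpegLister and the helpers isJpegName and listJpegs are made up for this example.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Locale;

public class JpegLister {

    /** True if the name ends in .jpg or .jpeg, ignoring case. */
    static boolean isJpegName(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return lower.endsWith(".jpg") || lower.endsWith(".jpeg");
    }

    /** Accepts regular files (not directories) with a JPEG extension. */
    static final FileFilter JPEG_FILTER = new FileFilter() {
        @Override
        public boolean accept(File f) {
            return f.isFile() && isJpegName(f.getName());
        }
    };

    /** Lists the JPEG files directly inside dir (no recursion into subfolders). */
    static File[] listJpegs(File dir) {
        // listFiles returns null if dir is not a directory or cannot be read.
        File[] found = dir.listFiles(JPEG_FILTER);
        return found != null ? found : new File[0];
    }

    public static void main(String[] args) {
        for (File f : listJpegs(new File(args[0]))) {
            System.out.println(f.getPath());
        }
    }
}
```

    Run it with a folder path as the only argument; it prints one matching file per line.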

    Best regards,
    Herr K.

    May 2010
    Hi HerrK, thanks for your reply. After some research I learned about OCR, but I don't really need a very powerful OCR technology, since the text I want to grab is in the same font and in the same place on every document.
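
    For a single font in a fixed position, one simple technique that might be enough is template matching: keep one reference bitmap per character and pick the template that differs from the scanned cell in the fewest pixels. The sketch below is only an illustration under some assumptions: the region has already been read into a grayscale array, each character occupies a cell of known, fixed size, and all templates have that same size. The class GlyphMatcher and all of its methods are hypothetical names. A distance is used rather than exact equality because JPEG compression means the pixels will rarely match a template perfectly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GlyphMatcher {

    // One reference bitmap per known character; true means "ink".
    private final Map<Character, boolean[][]> templates = new LinkedHashMap<>();

    void addTemplate(char c, boolean[][] bitmap) {
        templates.put(c, bitmap);
    }

    /** Builds a bitmap from rows of text, where '#' marks an ink pixel. */
    static boolean[][] fromRows(String... rows) {
        boolean[][] bitmap = new boolean[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            bitmap[r] = new boolean[rows[r].length()];
            for (int c = 0; c < rows[r].length(); c++) {
                bitmap[r][c] = rows[r].charAt(c) == '#';
            }
        }
        return bitmap;
    }

    /** Marks pixels darker than the threshold as ink (dark text on a light page). */
    static boolean[][] binarize(int[][] gray, int threshold) {
        boolean[][] bitmap = new boolean[gray.length][];
        for (int r = 0; r < gray.length; r++) {
            bitmap[r] = new boolean[gray[r].length];
            for (int c = 0; c < gray[r].length; c++) {
                bitmap[r][c] = gray[r][c] < threshold;
            }
        }
        return bitmap;
    }

    /** Counts the pixels where two bitmaps of identical size disagree. */
    static int distance(boolean[][] a, boolean[][] b) {
        int mismatches = 0;
        for (int r = 0; r < a.length; r++) {
            for (int c = 0; c < a[r].length; c++) {
                if (a[r][c] != b[r][c]) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    /** Returns the character whose template is closest, or '?' if none are stored. */
    char classify(boolean[][] cell) {
        char best = '?';
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<Character, boolean[][]> e : templates.entrySet()) {
            int d = distance(cell, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

    The templates could be cut from one known-good scan of each character. Anything beyond that, such as varying sizes or shifted positions, is where the real OCR libraries start to earn their keep.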

    I could look into the OCR Project, but I'm the kind of guy who reinvents the wheel. My wheel might not be as sturdy as others, but it's my design, and I learn more about its inner workings.

    That is one of the main reasons why I'm trying to do this by myself: if I just use some pre-compiled piece of code, I don't get any better at Java's syntax and what it can be used for; I only get better at using the classes someone else has made for me.

    While this project has gone down on my priority list, I still want to try it, so if you or anyone else has an answer to the questions in the OP, I would appreciate it greatly.

    Thomas H

