Thread: OCR in java

  1. #1
    sunithamm is offline Member
    Join Date
    Dec 2009
    Posts
    19
    Rep Power
    0

    Default OCR in java

    Hi all,
    I want to develop code for Optical Character Recognition in Java. The software should be able to read both handwritten and printed documents. I also need to provide options for various formatting.
    I don't know where to start.
    Can anybody help?
    Thanks in advance.

    Regards,
    Sunitha.

  2. #2
    JosAH is online now Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,447
    Blog Entries
    7
    Rep Power
    20

    Default

    Quote Originally Posted by sunithamm View Post
    Hi all,
    I want to develop code for Optical Character Recognition in Java. The software should be able to read both handwritten and printed documents. I also need to provide options for various formatting.
    Implementing a (more or less) functional OCR system cannot be done in a week or so; it's a complicated matter, and the existing systems apply two different (sub)systems for handwritten and printed text. The two just don't work well together.

    Some form of neural network is used in most of the existing systems. Google for 'BAM' (Bidirectional Associative Memory); it's a neural net suitable for image recognition. You can also try to download an existing (free) OCR system; there are a couple of them floating around on the net.
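
    To give an idea of what a BAM does, here is a minimal sketch, assuming bipolar (+1/-1) vectors and plain Hebbian outer-product storage. The class name `BamSketch`, the 6-unit input patterns and the 2-unit labels are made up for illustration; a real character recogniser would need far larger patterns and a BAM's storage capacity is limited.

```java
import java.util.Arrays;

/** Minimal Bidirectional Associative Memory over bipolar (+1/-1) vectors. */
final class BamSketch {
    private final int inputSize;
    private final int outputSize;
    private final int[][] weights; // weights[i][j] links input unit i to output unit j

    BamSketch(int inputSize, int outputSize) {
        this.inputSize = inputSize;
        this.outputSize = outputSize;
        this.weights = new int[inputSize][outputSize];
    }

    /** Hebbian storage: add the outer product of the pair to the weight matrix. */
    void store(int[] input, int[] output) {
        for (int i = 0; i < inputSize; i++) {
            for (int j = 0; j < outputSize; j++) {
                weights[i][j] += input[i] * output[j];
            }
        }
    }

    /** Input layer to output layer; a zero sum keeps the unit's previous state. */
    private int[] forward(int[] x, int[] previousY) {
        int[] y = new int[outputSize];
        for (int j = 0; j < outputSize; j++) {
            int sum = 0;
            for (int i = 0; i < inputSize; i++) sum += x[i] * weights[i][j];
            y[j] = sum > 0 ? 1 : sum < 0 ? -1 : previousY[j];
        }
        return y;
    }

    /** Output layer back to input layer; a zero sum keeps the unit's previous state. */
    private int[] backward(int[] y, int[] previousX) {
        int[] x = new int[inputSize];
        for (int i = 0; i < inputSize; i++) {
            int sum = 0;
            for (int j = 0; j < outputSize; j++) sum += weights[i][j] * y[j];
            x[i] = sum > 0 ? 1 : sum < 0 ? -1 : previousX[i];
        }
        return x;
    }

    /** Bounces between the two layers until neither changes, then returns the output layer. */
    int[] recall(int[] noisyInput) {
        int[] x = noisyInput.clone();
        int[] y = new int[outputSize];
        Arrays.fill(y, 1);
        for (int round = 0; round < 100; round++) {
            int[] nextY = forward(x, y);
            int[] nextX = backward(nextY, x);
            boolean stable = Arrays.equals(nextX, x) && Arrays.equals(nextY, y);
            x = nextX;
            y = nextY;
            if (stable) break;
        }
        return y;
    }

    public static void main(String[] args) {
        BamSketch bam = new BamSketch(6, 2);
        int[] patternA = {1, 1, 1, -1, -1, -1};
        int[] patternB = {1, -1, 1, -1, 1, -1};
        bam.store(patternA, new int[] {1, -1});
        bam.store(patternB, new int[] {-1, 1});
        int[] noisyA = {-1, 1, 1, -1, -1, -1}; // patternA with its first unit flipped
        System.out.println(Arrays.toString(bam.recall(noisyA))); // prints [1, -1]
    }
}
```

    The point of the demo is that the damaged pattern still recalls the label stored with the clean one.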

    kind regards,

    Jos

  3. #3
    AndreB is offline Senior Member
    Join Date
    Dec 2009
    Location
    Stuttgart, Germany
    Posts
    114
    Rep Power
    0

    Default

    ;-)

    Well, although it's not a Java-specific topic (besides the use of the Java language), I want to offer some suggestions.

    I hope it's for your studies; otherwise stop reading and use some third-party tools.
    :p

    Like Jos said, an OCR system usually has two parts (quite apart from the split between handwritten and machine-printed characters), and it can't be built in a week or so.

    If your system is based on statistical learning, then the first part is the "learning" engine (see Supervised Learning in AI).
    The second one is the real AI, which does the actual work.

    You asked where to start. Start with the first part!
    1. Before you begin development, provide data for training and testing! (Usually a set of OCR data is available free of charge at universities.)
    2. Provide methods for data input/import.
    3. Now create your AI: there are several approaches (read the articles published by researchers; Google Scholar is your friend).

    To name a few that come to mind first:
    1. Using Genetic Algorithms (GA)
      • Especially Evolutionary Algorithms
    2. Using Neural Nets
      • Backpropagation networks
      • Self-organizing maps (very promising)
      • Radial basis functions
    3. Normalized Compression Distance + SVM

    Of course, an SVM can also be applied on its own in some way, but for now I don't know how ;-) (svmlight is a nice library)
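
    For the Normalized Compression Distance idea, a rough sketch: NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The class name `NcdSketch`, the choice of `java.util.zip.Deflater` as the compressor and the text-art glyphs are assumptions for illustration only; the SVM part is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** Normalized Compression Distance using Deflater as the (assumed) compressor. */
final class NcdSketch {

    /** Number of bytes the data occupies after deflate compression. */
    static int compressedSize(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer); // only the byte count matters here
        }
        deflater.end();
        return total;
    }

    /** Near 0 for very similar inputs, near 1 for unrelated ones. */
    static double ncd(byte[] x, byte[] y) {
        byte[] joined = new byte[x.length + y.length];
        System.arraycopy(x, 0, joined, 0, x.length);
        System.arraycopy(y, 0, joined, x.length, y.length);
        int cx = compressedSize(x);
        int cy = compressedSize(y);
        int cxy = compressedSize(joined);
        return (double) (cxy - Math.min(cx, cy)) / Math.max(cx, cy);
    }

    public static void main(String[] args) {
        // Hypothetical glyph bitmaps written as text, one row per line.
        byte[] letterI = "..#..\n..#..\n..#..\n..#..\n..#..\n".getBytes(StandardCharsets.US_ASCII);
        byte[] letterL = "#....\n#....\n#....\n#....\n#####\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println("I vs I: " + ncd(letterI, letterI));
        System.out.println("I vs L: " + ncd(letterI, letterL));
    }
}
```

    On glyphs this small the compressor's fixed overhead dominates, so treat the printed numbers as illustrative only; the distance becomes meaningful on larger inputs.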

    4. Now train and test your AI

    When the training is done, extract the parameters and configuration and transfer them to the final application, test the application on real acquired data, and you're done ;-)

    Sounds easy, hm?
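
    The steps above (provide training and test data, train, test, extract the parameters) can be sketched in miniature. This is only an illustration under loud assumptions: a single perceptron stands in for the networks listed earlier, the 3x3 bitmaps are invented, and the class name `PerceptronWorkflowSketch` is hypothetical.

```java
import java.util.Arrays;

/** Toy train/test workflow: a perceptron separating a vertical bar (+1) from a horizontal bar (-1). */
final class PerceptronWorkflowSketch {
    // Step 1: invented training and test data, flattened 3x3 bitmaps (1 = ink).
    static final int[][] TRAIN_X = {
        {0, 1, 0, 0, 1, 0, 0, 1, 0}, // vertical bar
        {0, 0, 0, 1, 1, 1, 0, 0, 0}, // horizontal bar
        {1, 1, 0, 0, 1, 0, 0, 1, 0}, // vertical bar plus one noise pixel
        {0, 0, 0, 1, 1, 1, 0, 0, 1}, // horizontal bar plus one noise pixel
    };
    static final int[] TRAIN_Y = {1, -1, 1, -1};
    static final int[][] TEST_X = {
        {0, 1, 1, 0, 1, 0, 0, 1, 0}, // unseen noisy vertical bar
        {0, 0, 0, 1, 1, 1, 1, 0, 0}, // unseen noisy horizontal bar
    };
    static final int[] TEST_Y = {1, -1};

    private final int[] weights;
    private int bias;

    PerceptronWorkflowSketch(int featureCount) {
        weights = new int[featureCount];
    }

    int predict(int[] pixels) {
        int score = bias;
        for (int i = 0; i < weights.length; i++) score += weights[i] * pixels[i];
        return score > 0 ? 1 : -1;
    }

    /** Classic perceptron rule: adjust the weights only when a sample is misclassified. */
    void train(int[][] samples, int[] labels, int maxEpochs) {
        for (int epoch = 0; epoch < maxEpochs; epoch++) {
            boolean anyMistake = false;
            for (int s = 0; s < samples.length; s++) {
                if (predict(samples[s]) != labels[s]) {
                    for (int i = 0; i < weights.length; i++) weights[i] += labels[s] * samples[s][i];
                    bias += labels[s];
                    anyMistake = true;
                }
            }
            if (!anyMistake) return; // every training sample is classified correctly
        }
    }

    /** Fraction of samples whose predicted label matches the given label. */
    double accuracy(int[][] samples, int[] labels) {
        int correct = 0;
        for (int s = 0; s < samples.length; s++) {
            if (predict(samples[s]) == labels[s]) correct++;
        }
        return (double) correct / samples.length;
    }

    /** The learned parameters one would carry over to the final application. */
    int[] weights() {
        return weights.clone();
    }

    public static void main(String[] args) {
        PerceptronWorkflowSketch model = new PerceptronWorkflowSketch(9);
        model.train(TRAIN_X, TRAIN_Y, 100);
        System.out.println("test accuracy: " + model.accuracy(TEST_X, TEST_Y));
        System.out.println("weights: " + Arrays.toString(model.weights()));
    }
}
```

    Keeping the test samples out of training is the important part; the classifier itself is interchangeable.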

  4. #4
    tim is offline Senior Member
    Join Date
    Dec 2007
    Posts
    435
    Rep Power
    7

    Default

    Quote Originally Posted by AndreB View Post
    Sounds easy, hm?
    Wow, this stuff really is advanced. :D I'm going to do my honors next year and it includes "Genetic Algorithms". Not sure if it includes "Neural Nets". Sounds awesome though. ;) Sorry for the off-topic post.

    Tim
    Eyes dwelling into the past are blind to what lies in the future. Step carefully.

  5. #5
    masternerdguy is offline Member
    Join Date
    Jan 2010
    Posts
    9
    Rep Power
    0

    Default

    Have a look at GOCR.

    (It won't let me post a link; google GOCR.)

    It's a functional open-source character recognition program. You could read the source code and get an idea of how it is done.

  6. #6
    sunithamm is offline Member
    Join Date
    Dec 2009
    Posts
    19
    Rep Power
    0

    Default

    Thanks a lot for all the help.
    I am using the PixelGrabber class to convert an image into an array of pixels. A document image contains many letters. Can anybody tell me how to get the individual letters?
    Thanks in advance.

    Regards,
    Sunitha.
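
    As a possible starting point for the question above, here is a minimal sketch that grabs pixels with `PixelGrabber` and then splits a single text line into letters wherever a whole column contains no ink (a vertical projection profile). This technique is not from the thread; it assumes dark text on a light background, one line of text, and letters that do not touch. The class name `LetterSegmentationSketch`, the brightness threshold of 128 and the synthetic test image are all made up for illustration.

```java
import java.awt.image.BufferedImage;
import java.awt.image.PixelGrabber;
import java.util.ArrayList;
import java.util.List;

/** Splits one line of dark-on-light text into letters at ink-free columns. */
final class LetterSegmentationSketch {

    /** Grabs the image as ARGB ints, stored row by row. */
    static int[] grabPixels(BufferedImage image) throws InterruptedException {
        int width = image.getWidth();
        int height = image.getHeight();
        int[] pixels = new int[width * height];
        PixelGrabber grabber = new PixelGrabber(image, 0, 0, width, height, pixels, 0, width);
        if (!grabber.grabPixels()) {
            throw new IllegalStateException("pixel grab failed");
        }
        return pixels;
    }

    /** Assumed rule: a pixel is ink when its average RGB brightness is below 128. */
    static boolean isInk(int argb) {
        int red = (argb >> 16) & 0xFF;
        int green = (argb >> 8) & 0xFF;
        int blue = argb & 0xFF;
        return (red + green + blue) / 3 < 128;
    }

    /** Returns {startColumn, endColumnExclusive} for each run of columns containing ink. */
    static List<int[]> findLetterColumns(int[] pixels, int width, int height) {
        List<int[]> spans = new ArrayList<>();
        int start = -1; // -1 means we are currently in a gap between letters
        for (int x = 0; x < width; x++) {
            boolean columnHasInk = false;
            for (int y = 0; y < height && !columnHasInk; y++) {
                columnHasInk = isInk(pixels[y * width + x]);
            }
            if (columnHasInk && start < 0) {
                start = x;
            } else if (!columnHasInk && start >= 0) {
                spans.add(new int[] {start, x});
                start = -1;
            }
        }
        if (start >= 0) spans.add(new int[] {start, width}); // letter touching the right edge
        return spans;
    }

    public static void main(String[] args) throws InterruptedException {
        // Synthetic 12x5 image with two black blocks standing in for two letters.
        BufferedImage image = new BufferedImage(12, 5, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 5; y++) {
            for (int x = 0; x < 12; x++) {
                boolean ink = (x >= 1 && x <= 3) || (x >= 7 && x <= 9);
                image.setRGB(x, y, ink ? 0x000000 : 0xFFFFFF);
            }
        }
        int[] pixels = grabPixels(image);
        for (int[] span : findLetterColumns(pixels, 12, 5)) {
            System.out.println("letter columns " + span[0] + " to " + (span[1] - 1));
        }
    }
}
```

    Running the same idea over rows first would separate the text lines; touching or slanted handwriting needs something more robust, such as connected-component labelling.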
