  1. #1
    gotito (Member)

    How to get the price (an image) from a webpage as a plain text?

    Hi!

    I need to get the price from a product's webpage. The price isn't displayed as plain characters, but as an image (inside an <img> tag). I know how to get the <img> tag from the website and also the URL in its "src" attribute, which points to the image that shows the price.

    For example, this is the <img> tag on the website:
    <img style="margin-bottom:-5px;" alt="Preis" src="Moderator edit: url removed">

    So I get the URL [Moderator edit: irrelevant link removed]. From there, how can I get the price as a String?

    Thanks.
    Last edited by DarrylBurke; 06-02-2013 at 09:48 AM. Reason: Removed links

  2. #2
    JosAH (Moderator)

    Re: How to get the price (an image) from a webpage as a plain text?

    Google for 'OCR' (Optical Character Recognition). It's a very complicated subject, but depending on the character font(s) used, it can be done.
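
    As a rough illustration of a common first step before any OCR, here is a minimal sketch that turns the price image into a black/white grid. It is not from the thread: the class name PriceImagePrep, the method binarize and the threshold parameter are hypothetical, and it assumes an opaque image with dark digits on a light background.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper (not from the thread): converts an image to an ink/no-ink grid. */
final class PriceImagePrep {

    /**
     * Returns a grid where ink[y][x] is true if the pixel is darker than the threshold.
     * Assumption: dark characters on a light background; threshold is in the range 0..255.
     * The alpha channel is ignored, so transparent pixels are judged by their RGB values.
     */
    static boolean[][] binarize(BufferedImage img, int threshold) {
        int width = img.getWidth();
        int height = img.getHeight();
        boolean[][] ink = new boolean[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceived brightness (BT.601 weights).
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                ink[y][x] = luma < threshold;
            }
        }
        return ink;
    }
}
```

    The image itself can be fetched with javax.imageio.ImageIO.read(URL) using the URL taken from the "src" attribute. A threshold around 128 is only a starting guess and would likely need tuning for the actual price images.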

    kind regards,

    Jos

  3. #3
    heatblazer (Senior Member)

    Re: How to get the price (an image) from a webpage as a plain text?

    Is this something like OpenCV, btw?

  4. #4
    JosAH (Moderator)

    Re: How to get the price (an image) from a webpage as a plain text?

    Quote: Originally posted by heatblazer
    Is this something like OpenCV, btw?
    I think OpenCV is much too powerful (and not aimed at OCR); it can fiddle with arbitrary images, whereas OCR only needs to recognize characters. Basically OCR works like this: 1) you train a neural network on a set of images of characters; 2) you show some characters (here: digits) to the network and the network tries to recognize them; 3) finding the individual characters in an image that contains more than one character is an entirely separate problem (it's a geometric problem).
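
    The steps above can be sketched in a few lines, under strong assumptions. This is not a neural network: nearest-template matching (counting differing pixels) is swapped in as a much simpler stand-in for steps 1 and 2, and step 3 is done by cutting the image at columns that contain no ink. The class DigitReader and all of its methods are hypothetical names. The input is assumed to be a boolean grid where true marks a dark pixel, with characters that do not touch each other and templates rendered in the same font and size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: column segmentation plus nearest-template matching. */
final class DigitReader {

    /** Builds a grid from text rows; '#' marks ink, any other character is background. */
    static boolean[][] parse(String... rows) {
        boolean[][] grid = new boolean[rows.length][];
        for (int y = 0; y < rows.length; y++) {
            grid[y] = new boolean[rows[y].length()];
            for (int x = 0; x < rows[y].length(); x++) {
                grid[y][x] = rows[y].charAt(x) == '#';
            }
        }
        return grid;
    }

    /** Step 3: returns {start, endExclusive} column ranges separated by ink-free columns. */
    static List<int[]> segmentColumns(boolean[][] ink) {
        List<int[]> spans = new ArrayList<>();
        int width = ink.length == 0 ? 0 : ink[0].length;
        int start = -1;
        // Iterate one column past the end so a glyph touching the right edge is closed.
        for (int x = 0; x <= width; x++) {
            boolean hasInk = false;
            if (x < width) {
                for (boolean[] row : ink) {
                    if (row[x]) { hasInk = true; break; }
                }
            }
            if (hasInk && start < 0) {
                start = x;
            } else if (!hasInk && start >= 0) {
                spans.add(new int[] {start, x});
                start = -1;
            }
        }
        return spans;
    }

    /** Counts pixels that differ between the glyph in columns [from, to) and a template. */
    static int distance(boolean[][] ink, int from, int to, boolean[][] template) {
        int templateWidth = template.length == 0 ? 0 : template[0].length;
        int height = Math.max(ink.length, template.length);
        int width = Math.max(to - from, templateWidth);
        int diff = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // Pixels outside either grid count as background.
                boolean a = y < ink.length && x < to - from && ink[y][from + x];
                boolean b = y < template.length && x < template[y].length && template[y][x];
                if (a != b) diff++;
            }
        }
        return diff;
    }

    /** Steps 1 and 2 (simplified): labels each segment with its closest template. */
    static String read(boolean[][] ink, Map<Character, boolean[][]> templates) {
        StringBuilder out = new StringBuilder();
        for (int[] span : segmentColumns(ink)) {
            char best = '?';
            int bestDiff = Integer.MAX_VALUE;
            for (Map.Entry<Character, boolean[][]> e : templates.entrySet()) {
                int d = distance(ink, span[0], span[1], e.getValue());
                if (d < bestDiff) { bestDiff = d; best = e.getKey(); }
            }
            out.append(best);
        }
        return out.toString();
    }
}
```

    With one template per character that can appear in a price (digits plus the decimal separator), read(...) returns the recognized characters from left to right. This only has a chance of working when the site renders every price in one fixed font and size; touching or anti-aliased characters would need a more robust segmentation and a real classifier.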

    kind regards,

    Jos
