Thread: Jsoup help

  1. #1
    mbschultz97 is offline Super OP Noob
    Join Date
    May 2014
    Location
    Virginia
    Posts
    65
    Rep Power
    0

    Default Jsoup help

    Hey guys... I've been messing around with Jsoup, trying to get it to tell me the current price of Microsoft stock :P

    Java Code:
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
    
    public class GetInfo{
    
       public static void main(String[] args) throws Exception{
       
          String example = "http://www.nasdaq.com/symbol/msft";
          Document document = Jsoup.connect(example).get();
          Element price = document.getElementById("qwidget_lastsale");
          System.out.println(price);
       }
    }
    It prints out:
    <div id="qwidget_lastsale" class="qwidget-dollar">
    $39.54
    </div>
    How do I get just 39.54 out of the price element? Also, I was extremely lucky to find the ID. Where in Inspect Element should I usually be looking to find the info I need for Jsoup? Thanks! :D

  2. #2
    mbschultz97 is offline Super OP Noob

    Default Re: Jsoup help

    Awesome! I just found out how to get it to only return $39.54...

    Java Code:
    // text() returns the element's text content without the surrounding tags
    String random = price.text();
    System.out.println(random);
    I could still use some tips, though, about where to look to find IDs and stuff from Inspect Element in Google Chrome... thanks :)

  3. #3
    jashburn is offline Senior Member
    Join Date
    Feb 2014
    Posts
    219
    Rep Power
    1

    Default Re: Jsoup help

    It depends on what you want to look for. If you're lucky, the data to extract is contained within an element with an ID, like the one you found. If not, you may need to use the CSS/jQuery-like selector syntax to walk down the branch of elements until you arrive at the unique element containing the required data.

    The cookbook is a good place to start. See also the Selector API documentation.

    Using Chrome, the simplest way to get to the relevant line of HTML code containing the required data is to highlight the data in the browser, then right-click > Inspect Element. E.g., doing this on the "Exchange: NASDAQ" text on the web page opens up Chrome's Elements window, automatically drilling into
    Java Code:
    <span id="qbar_exchangeLabel">
        <b>Exchange: </b>
        "NASDAQ"
    </span>
    If you select "NASDAQ" in the Elements window, notice that the status bar at the bottom of the window shows the progression of HTML elements leading up to "NASDAQ":
    Java Code:
    html  #body  ...  span#qbar_exchangeLabel  (text)
    At this point you have the choice of using either the DOM methods (e.g., getElementById(id)) or a selector (e.g., select(selector), using the progression of HTML elements to help you formulate your selector string) to get to the text contained here.

  4. #4
    mbschultz97 is offline Super OP Noob

    Default Re: Jsoup help

    I ran into another problem D: When I try to print out today's low, it prints out $39.37 when the NASDAQ website shows it as $ 39.37...
    Java Code:
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
    import java.util.*;
    
    public class GetInfo{
    
       public static void main(String[] args) throws Exception{
       
          Scanner scan = new Scanner(System.in);
          System.out.println("What stock would you like the price of?");
          String userCode = scan.nextLine();
          String example = "http://www.nasdaq.com/symbol/" + userCode;
          Document document = Jsoup.connect(example).get();
          Element price = document.getElementById("qwidget_lastsale");
          Element random = document.getElementById("Label1");
          System.out.println(price.text());
          System.out.println(random.text());
       }
    }
    How do I make it print a clean $39.37, without the odd character between the $ and the number? Thanks!

  5. #5
    SurfMan is offline Godlike
    Join Date
    Nov 2012
    Location
    The Netherlands
    Posts
    980
    Rep Power
    2

    Default Re: Jsoup help

    Just a comment: I am pretty sure screen scraping (as this process is called) is not allowed on the NASDAQ site, nor many other sites that provide numbers and things. You should check the Terms of Use or other documents on the website to check if you're doing "The Right Thing"™

  6. #6
    mbschultz97 is offline Super OP Noob

    Default Re: Jsoup help

    I read through their terms of use and looked at the fair use section under the copyright act, and I'm pretty sure I'm allowed to be doing this... Do you know why it doesn't print a clean $39.37? Thanks :)
    Last edited by mbschultz97; 05-12-2014 at 12:29 AM.

  7. #7
    jashburn is offline Senior Member

    Default Re: Jsoup help

    I think this is an encoding issue. The Content-Type header of the web page's HTTP response states that it uses the UTF-8 character set. jsoup automatically recognises this and uses the correct encoding when parsing the response (source: https://groups.google.com/forum/#!to...up/ZiuFi5BptQk , by Jonathan Hedley, jsoup author.)

    The HTML code for this is "$&nbsp;39.37", where &nbsp; is the HTML entity for a non-breaking space character (U+00A0). When rendered in a browser it looks like a space character, but it actually isn't. In a hex dump of the UTF-8 bytes, the non-breaking space is C2 A0 (as opposed to 20 for a "normal" space character). In your case, when you print it out, what you see depends on the encoding (or on Windows, the code page) used by your output console. Unless the console uses UTF-8, you will see character(s) other than the non-breaking space (this is typical with the Windows command prompt window).
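
    If you want to see those byte values for yourself, here is a minimal sketch using only the standard library. The class and method names (NbspDemo, utf8Hex) are made up for illustration, and the string "$\u00A039.37" is just a stand-in for what random.text() would return, assuming the page still contains the &nbsp; entity.

```java
import java.nio.charset.StandardCharsets;

class NbspDemo {

    // Render the UTF-8 bytes of a string as space-separated hex pairs.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String nbsp = "\u00A0";            // the character &nbsp; stands for
        String space = " ";                // an ordinary space
        String scraped = "$\u00A039.37";   // assumed stand-in for random.text()

        System.out.println(utf8Hex(nbsp));       // C2 A0
        System.out.println(utf8Hex(space));      // 20
        System.out.println(nbsp.equals(space));  // false
        System.out.println(utf8Hex(scraped));    // 24 C2 A0 33 39 2E 33 37
    }
}
```

    The two characters look alike on screen but compare as different, which is why code that splits or trims on an ordinary space may leave the non-breaking one in place.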

    Note: Don't worry if you don't understand the above. This is something you'll encounter if you work with software internationalisation (I18N).

    I wouldn't bother messing around with encoding. It's easier to just write the code to extract the numerical value that you want. E.g., you can do this:
    Java Code:
    // Extract price bypassing the $ and non-breaking space characters
    String lowPrice = random.text().substring(2); 
    System.out.println("$" + lowPrice);
    or if you're feeling adventurous,
    Java Code:
    // Use regular expression for more flexible search-and-replace
    String lowPrice = random.text().replaceAll("[^\\d]+([\\d\\.]+)", "\\$$1");
    System.out.println(lowPrice);
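
    If you would rather have the price as a number than as a string, a variation on the regular-expression idea is to pull out the first decimal number and parse it. This is only a rough sketch: the names PriceParser and parsePrice are invented for illustration, the input string is an assumed stand-in for random.text(), and thousands separators (e.g. "1,234.56") are not handled.

```java
import java.math.BigDecimal;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PriceParser {

    // One or more digits, optionally followed by a decimal part.
    private static final Pattern NUMBER = Pattern.compile("\\d+(?:\\.\\d+)?");

    // Returns the first number found in the text, or null if there is none.
    static BigDecimal parsePrice(String text) {
        Matcher m = NUMBER.matcher(text);
        return m.find() ? new BigDecimal(m.group()) : null;
    }

    public static void main(String[] args) {
        String scraped = "$\u00A039.37";  // assumed stand-in for random.text()
        BigDecimal low = parsePrice(scraped);
        System.out.println("$" + low);    // $39.37
    }
}
```

    Because the pattern only looks for digits, it does not matter whether the characters in front of the number are a dollar sign, an ordinary space, or a non-breaking space.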
    Btw on using data from NASDAQ's web site, I wouldn't worry too much if this is for your own education. However if this is to be used publicly, e.g., if you're writing an application that will be used in a web site or that will be publicly distributed, commercially or otherwise, you're most likely not allowed to do so without express consent from NASDAQ. When in doubt, consult Legal.

  8. #8
    mbschultz97 is offline Super OP Noob

    Default Re: Jsoup help

    thanks for the help :) and yah this program is just for me learning how to do this stuff and maybe show a couple people but nothing more :P