  1. #1
    gandalf001 is offline Member
    Join Date
    May 2014
    Posts
    11
    Rep Power
    0

    Java Jsoup with Javascript code

    Hello,
    I am trying to figure out a problem. My HTML page contains a JavaScript block, and I want to navigate through the links ending with "Doc". In this HTML there is only one such link, called SunnydataDoc. So I want to search the page for this string, and if there are any links ending with "Doc", I want to navigate further down into those pages. Could you please help me with this? I've heard I can use regex and match methods in combination with Jsoup. Here is my code.

    Java Code:
    <script>
        var data = {"totalRecords": 2, "sort": "name", "startIndex": 0, "dir": "asc", "records": [{"raw_name": "samia/export/sunnydata", "last_changeset": "\n  <div>\n      <pre><a title=\"ownerID:\n\nAdded tag V2.11.d50.mkt.001 for changeset 56e10a4864ff\" class=\"tooltip\" href=\"/samia/export/sunnydata/changeset/f602409eba261d749d23dc75551b2959425dfa8d\">r17:f602409eba26</a></pre>\n  </div>\n", "atom": "\n    <a title=\"Subscribe to samia/export/sunnydata atom feed\" href=\"/samia/export/sunnydata/feed/atom?api_key=e214ebea2335318bee1460a1fd33725ab3e1002e\"><i class=\"icon-rss-sign\"  style=\"color: #fa9b39\"></i></a>\n", "owner": "ownerID (Owner)", "rss": "\n    <a title=\"Subscribe to samia/export/sunnydata rss feed\" href=\"/samia/export/sunnydata/feed/rss?api_key=e214ebea2335318bee1460a1fd33725ab3e1002e\"><i class=\"icon-rss-sign\" style=\"color: #fa9b39\"></i></a>\n", "name": "\n    \n  <div style=\"white-space: nowrap; }\">\n        <a href=\"/samia/export/sunnydata\">\n\n        <span title=\"Mercurial repository\"><i class=\"icon-hg\" style=\"color: #316293; font-size: 14px;\"></i></span>\n\n      <span style=\"margin: 0px 8px 0px 8px\"></span>\n    Sunnydata\n    </a>\n  </div>\n", "last_rev_raw": 17, "state": "\n  <div>\n        <div class=\"btn btn-mini btn-success disabled\">Created</div>\n  </div>\n", "menu": "\n  <ul class=\"menu_items hidden\">\n\n    <li style=\"border-top:1px solid #003367;margin-left:18px;padding-left:-99px\"></li>\n    <li>\n       <a title=\"Summary\" href=\"/samia/export/sunnydata\">\n       <span class=\"icon\">\n           <i class=\"icon-file-text\"></i>\n       </span>\n       <span>Summary</span>\n       </a>\n    </li>\n    <li>\n       <a title=\"Changelog\" href=\"/samia/export/sunnydata/changelog\">\n       <span class=\"icon\">\n           <i class=\"icon-list-alt\"></i>\n       </span>\n       <span>Changelog</span>\n       </a>\n    </li>\n    <li>\n       <a title=\"Files\" 
href=\"/samia/export/sunnydata/files/tip/\">\n       <span class=\"icon\">\n           <i class=\"icon-file-alt\"></i>\n       </span>\n       <span>Files</span>\n       </a>\n    </li>\n    <li>\n       <a title=\"Fork\" href=\"/samia/export/sunnydata/fork\">\n       <span class=\"icon\">\n           <i class=\"icon-code-fork\"></i>\n       </span>\n       <span>Fork</span>\n       </a>\n    </li>\n  </ul>\n", "desc": "GHU Sunnydataimport", "last_change": "\n  <span class=\"tooltip\" date=\"2014-08-21 18:49:50\" title=\"Thu, 21 Aug 2014 18:49:50\">10 days and 16 hours ago</span>\n"}, {"raw_name": "samia/export/sunnydatadoc", "last_changeset": "\n  <div>\n      <pre><a title=\"ownerID;lt;owneremail;gt;:\n\nChangedokumentation\" class=\"tooltip\" href=\"/samia/export/sunnydataDoc/changeset/9ed1679c7a35b76e1402b540cee38000461fdfdd\">r0:9ed1679c7a35</a></pre>\n  </div>\n", "atom": "\n    <a title=\"Subscribe to samia/export/sunnydataDoc atom feed\" href=\"/samia/export/sunnydataDoc/feed/atom?api_key=e214ebea2335318bee1460a1fd33725ab3e1002e\"><i class=\"icon-rss-sign\"  style=\"color: #fa9b39\"></i></a>\n", "owner": "ownerID (Owner)", "rss": "\n    <a title=\"Subscribe to samia/export/sunnydataDoc rss feed\" href=\"/samia/export/sunnydataDoc/feed/rss?api_key=e214ebea2335318bee1460a1fd33725ab3e1002e\"><i class=\"icon-rss-sign\" style=\"color: #fa9b39\"></i></a>\n", "name": "\n    \n  <div style=\"white-space: nowrap; }\">\n        <a href=\"/samia/export/sunnydataDoc\">\n\n        <span title=\"Mercurial repository\"><i class=\"icon-hg\" style=\"color: #316293; font-size: 14px;\"></i></span>\n\n      <span style=\"margin: 0px 8px 0px 8px\"></span>\n    SunnydataDoc\n    </a>\n  </div>\n", "last_rev_raw": 0, "state": "\n  <div>\n        <div class=\"btn btn-mini btn-success disabled\">Created</div>\n  </div>\n", "menu": "\n  <ul class=\"menu_items hidden\">\n\n    <li style=\"border-top:1px solid #003367;margin-left:18px;padding-left:-99px\"></li>\n    <li>\n       <a 
title=\"Summary\" href=\"/samia/export/sunnydataDoc\">\n       <span class=\"icon\">\n           <i class=\"icon-file-text\"></i>\n       </span>\n       <span>Summary</span>\n       </a>\n    </li>\n    <li>\n       <a title=\"Changelog\" href=\"/samia/export/sunnydataDoc/changelog\">\n       <span class=\"icon\">\n           <i class=\"icon-list-alt\"></i>\n       </span>\n       <span>Changelog</span>\n       </a>\n    </li>\n    <li>\n       <a title=\"Files\" href=\"/samia/export/sunnydataDoc/files/tip/\">\n       <span class=\"icon\">\n           <i class=\"icon-file-alt\"></i>\n       </span>\n       <span>Files</span>\n       </a>\n    </li>\n    <li>\n       <a title=\"Fork\" href=\"/samia/export/sunnydataDoc/fork\">\n       <span class=\"icon\">\n           <i class=\"icon-code-fork\"></i>\n       </span>\n       <span>Fork</span>\n       </a>\n    </li>\n  </ul>\n", "desc": "GHU Sunnydataimport (Dokumentation)", "last_change": "\n  <span class=\"tooltip\" date=\"2014-04-25 11:03:45\" title=\"Fri, 25 Apr 2014 11:03:45\">4 months and 6 days ago</span>\n"}]};
        var myDataSource = new YAHOO.util.DataSource(data);
        myDataSource.responseType = YAHOO.util.DataSource.TYPE_JSON;


    So in this example I have this link: href=\"/samia/export/sunnydataDoc\". I want to extract this link and follow it from my code.

    And this is my Java code.

    Java Code:
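    import java.io.IOException;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.jsoup.Connection.Method;
    import org.jsoup.Connection.Response;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class JScripttest {

        public static void main(String[] args) throws IOException {
            // Log in and keep the session cookies ("url" and the credentials are placeholders).
            Response res = Jsoup.connect("url")
                    .data("username", "username", "password", "password")
                    .method(Method.POST)
                    .execute();
            Map<String, String> loginCookies = res.cookies();

            // Fetch the page that contains the script block, reusing the cookies.
            Document doc = Jsoup.connect("url")
                    .cookies(loginCookies)
                    .get();

            Element script = doc.select("href").last();

            // Look for href values ending in "Doc".
            Pattern p = Pattern.compile("href\\s*=\\s*\"([^\"]+Doc)\"");
            Matcher m = p.matcher(script.html());

            while (m.find()) {
                System.out.println(m.group());   // whole match
                System.out.println(m.group(1));  // captured link only
            }
        }

        private static void print(String msg, Object... args) {
            System.out.println(String.format(msg, args));
        }
    }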



    When I run the program I get "no match found". Can anyone help?

    Thanks

  2. #2
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,224
    Rep Power
    20

    Re: Java Jsoup with Javascript code

    Debug your code and see what 'script' actually contains...because I'm fairly sure JSoup will only parse HTML, and not anything in Javascript.
    So the 'select()' call will only return href's already in the page.
    Please do not ask for code as refusal often offends.

    ** This space for rent **

  3. #3
    gandalf001 is offline Member
    Join Date
    May 2014
    Posts
    11
    Rep Power
    0

    Re: Java Jsoup with Javascript code

    Hi Tolls,

    Thanks for replying. On another forum I saw that Javascript is working with Jsoup. See here: java - Parse JavaScript with jsoup - Stack Overflow

  4. #4
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,224
    Rep Power
    20

    Re: Java Jsoup with Javascript code

    And that Stack Overflow thread says exactly what I said. JSoup does not parse Javascript.
    The accepted answer there involves manually parsing the content of the <script> tag.

  5. #5
    gimbal2 is offline Just a guy
    Join Date
    Jun 2013
    Location
    Netherlands
    Posts
    4,354
    Rep Power
    6

    Re: Java Jsoup with Javascript code

    And it states that in impossible-to-miss size 7 headings. Even if you need glasses and didn't put them on, you could still read that.
    "Syntactic sugar causes cancer of the semicolon." -- Alan Perlis

  6. #6
    gandalf001 is offline Member
    Join Date
    May 2014
    Posts
    11
    Rep Power
    0

    Re: Java Jsoup with Javascript code

    Sorry guys, you are right; I was mistaken in saying that Javascript works fine with Jsoup. I should have said that Javascript can only be handled with Jsoup through manual parsing.
    But thank you, gimbal2, for your comment about glasses :-). As a professional, you should be looking at the code I am trying. Obviously I've chosen manual parsing and not chicky banana.

    So, pardon my expression, would you mind taking another look and helping me with the manual parsing? Possibly the regular expression is not the correct one, but I am not sure whether something else is wrong.

  7. #7
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,224
    Rep Power
    20

    Re: Java Jsoup with Javascript code

    As I said in my first post:
    Java Code:
    doc.select("href").last();
    That only looks at elements in the parsed HTML, and therefore does nothing with anything in the Javascript.

    You need to get the contents of the <script> tag and then manually parse that, looking for href attributes.

  8. #8
    gimbal2 is offline Just a guy
    Join Date
    Jun 2013
    Location
    Netherlands
    Posts
    4,354
    Rep Power
    6

    Re: Java Jsoup with Javascript code

    Take small steps. First write some code which finds any and all URLs in the code. I would start by looking for chunks of text which start with "href=". You don't necessarily need a regular expression to do that.
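
A minimal sketch of that first step, using plain `String.indexOf` rather than a regular expression. The class name `HrefScanner`, the method `findHrefValues` and the shortened sample string are made up for illustration. The sample mimics the script block from post #1, where every quote is preceded by a backslash because the HTML sits inside a JavaScript string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from the thread.
class HrefScanner {

    // Collects the value of every href= occurrence in the given text.
    // Quotes may be plain (") or backslash-escaped (\"), as in the script block.
    static List<String> findHrefValues(String text) {
        List<String> values = new ArrayList<>();
        int pos = 0;
        while ((pos = text.indexOf("href=", pos)) >= 0) {
            int start = pos + "href=".length();
            // Skip an optional backslash in front of the opening quote.
            if (start < text.length() && text.charAt(start) == '\\') {
                start++;
            }
            if (start >= text.length() || text.charAt(start) != '"') {
                pos = start;          // not a quoted value, keep scanning
                continue;
            }
            int end = text.indexOf('"', start + 1);
            if (end < 0) {
                break;                // unterminated value, stop
            }
            String value = text.substring(start + 1, end);
            // Drop the backslash that escapes the closing quote, if any.
            if (value.endsWith("\\")) {
                value = value.substring(0, value.length() - 1);
            }
            values.add(value);
            pos = end + 1;
        }
        return values;
    }

    public static void main(String[] args) {
        // Shortened, made-up sample of the script text from post #1.
        String script = "<a href=\\\"/samia/export/sunnydata\\\">"
                + "<a href=\\\"/samia/export/sunnydataDoc\\\">";
        for (String href : findHrefValues(script)) {
            System.out.println(href);
        }
    }
}
```

Once every link is printed this way, filtering for the ones ending in "Doc" is a separate, much smaller step.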

  9. #9
    gandalf001 is offline Member
    Join Date
    May 2014
    Posts
    11
    Rep Power
    0

    Re: Java Jsoup with Javascript code

    Thanks gimbal2,

    I changed it, but without success; I get the same error, "no match found". Could something be wrong with the regex "href\\s*=\\s*\"([^\"]+Doc)\""?
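
One possible reason for the missing match, sketched under the assumption that the pattern is run against the raw script text as posted in #1: there every quote is written as `\"`, so `href=` is followed by a backslash, and so is `Doc`, while the pattern expects a bare quote in both places. The class name `DocLinkRegex`, the `TOLERANT` pattern and the shortened sample are illustrative and not taken from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison of two patterns, not code from the thread.
class DocLinkRegex {

    // The pattern from the thread: expects a bare quote after '=' and after "Doc".
    static final Pattern ORIGINAL =
            Pattern.compile("href\\s*=\\s*\"([^\"]+Doc)\"");

    // Variant that also accepts a backslash in front of each quote.
    static final Pattern TOLERANT =
            Pattern.compile("href\\s*=\\s*\\\\?\"([^\"\\\\]+Doc)\\\\?\"");

    // Returns group 1 (the link itself) of every match in the text.
    static List<String> findDocLinks(Pattern pattern, String text) {
        List<String> links = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up one-link sample with escaped quotes, as in the script block.
        String script = "<a href=\\\"/samia/export/sunnydataDoc\\\">";
        System.out.println(findDocLinks(ORIGINAL, script)); // prints []
        System.out.println(findDocLinks(TOLERANT, script)); // prints [/samia/export/sunnydataDoc]
    }
}
```

In the full data from post #1 the same link appears more than once, so collecting the results in a `Set` instead of a `List` may be worthwhile.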

  10. #10
    gandalf001 is offline Member
    Join Date
    May 2014
    Posts
    11
    Rep Power
    0

    Re: Java Jsoup with Javascript code

    I can easily get all href links from the whole HTML page. But those links are in the script section, I guess in Javascript block. I struggle to get those links. If I can locate them correctly, the next step is to go into those links. Maybe you can also suggest in general how to navigate into links: I mean searching for links with a particular name, and if one is found, going into that subpage and searching it for further links. That is my approach.
    If you think this can be done better with another solution, please advise.
    Thanks.
    Last edited by gandalf001; 09-02-2014 at 01:56 PM.
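
For the follow-up step, a rough sketch of turning the relative links found in the script into absolute URLs that could then be fetched and searched again. The base URL `https://example.com/samia/export`, the class `LinkResolver` and its method are placeholders. Only `java.net.URI.resolve` from the standard library is used, and no pages are actually downloaded here.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not taken from the thread.
class LinkResolver {

    // Keeps links ending in "Doc", resolves them against the page URL,
    // and drops duplicates while preserving the order in which they were found.
    static Set<URI> resolveDocLinks(URI pageUrl, List<String> hrefs) {
        Set<URI> targets = new LinkedHashSet<>();
        for (String href : hrefs) {
            if (href.endsWith("Doc")) {
                targets.add(pageUrl.resolve(href));
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        URI page = URI.create("https://example.com/samia/export"); // placeholder URL
        List<String> hrefs = List.of(
                "/samia/export/sunnydata",
                "/samia/export/sunnydataDoc",
                "/samia/export/sunnydataDoc"); // the same link appears twice in the data
        for (URI target : resolveDocLinks(page, hrefs)) {
            System.out.println(target); // prints https://example.com/samia/export/sunnydataDoc
        }
    }
}
```

Each resolved URL could then be passed to `Jsoup.connect(...)` together with the login cookies, as in the code from post #1. Keeping a set of already visited URLs would stop the search from looping.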

  11. #11
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,224
    Rep Power
    20

    Re: Java Jsoup with Javascript code

    "I guess in Javascript block"

    That's the problem then.
    You're guessing as to where these are.

    If you know they're in script tags on the page then you can use JSoup to get those and then parse them by hand.
    If they're js files, not sure how you'd get at them. Presumably follow the <script> link and parse each file.

  12. #12
    gandalf001 is offline Member
    Join Date
    May 2014
    Posts
    11
    Rep Power
    0

    Re: Java Jsoup with Javascript code

    So you are saying there is a difference between <script type="text/javascript"> and just <script>?
    In my HTML page, the links I am looking for are only in a <script> tag.
    So can I parse them by hand? Is my approach above correct?

  13. #13
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,224
    Rep Power
    20

    Re: Java Jsoup with Javascript code

    No, I'm saying there's a difference between
    <script type="text/javascript" src="/scripts/something.js">
    and
    <script>

    The first holds the javascript in a separate file, which is not the HTML page and is downloaded on its own (if needed).
    The second is embedded in the page.
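
To make the distinction concrete, here is a small sketch that separates the two kinds of `<script>` tag in a page's source. It uses a regular expression as a stand-in so that the example runs with the standard library alone. With jsoup one would more likely select the `script` elements and read their content, and a regex like this is not a robust HTML parser. `ScriptSplitter` and the sample page are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the thread.
class ScriptSplitter {

    // Group 1: attributes of the opening tag, group 2: text between the tags.
    private static final Pattern SCRIPT = Pattern.compile(
            "<script\\b([^>]*)>(.*?)</script>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern SRC =
            Pattern.compile("\\bsrc\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Bodies of scripts embedded in the page; these can be parsed by hand.
    static List<String> inlineBodies(String html) {
        List<String> bodies = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            if (!SRC.matcher(m.group(1)).find()) {
                bodies.add(m.group(2).trim());
            }
        }
        return bodies;
    }

    // URLs of scripts kept in separate files; each would need its own download.
    static List<String> externalSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            Matcher src = SRC.matcher(m.group(1));
            if (src.find()) {
                sources.add(src.group(1));
            }
        }
        return sources;
    }

    public static void main(String[] args) {
        // Made-up page with one external and one embedded script.
        String html = "<script type=\"text/javascript\" src=\"/scripts/something.js\"></script>"
                + "<script>var data = {\"totalRecords\": 2};</script>";
        System.out.println(externalSources(html));
        System.out.println(inlineBodies(html));
    }
}
```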

  14. #14
    gandalf001 is offline Member
    Join Date
    May 2014
    Posts
    11
    Rep Power
    0

    Re: Java Jsoup with Javascript code

    OK, so how would you solve this problem? It's only on that one page that the href links are in a <script> section. All other child links are in normal tags.

  15. #15
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,224
    Rep Power
    20

    Re: Java Jsoup with Javascript code

    OK, so get the javascript text from the script tag and parse that.
    Which I think I've already suggested.

    And debug, debug, debug.
    Print out everything, never assume you've read in a particular piece of text...
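
As an illustration of that advice, a tiny helper that reports what a piece of text really contains before any pattern is applied to it. The helper name `describe`, the class `DebugDump` and the sample input are hypothetical. In the real program the argument would be whatever was read from the script tag.

```java
// Hypothetical debugging helper, not taken from the thread.
class DebugDump {

    // Builds a short report: null check, length, a preview, and whether
    // the text contains the substrings the later parsing step relies on.
    static String describe(String label, String text) {
        if (text == null) {
            return label + ": null";
        }
        String preview = text.length() > 60 ? text.substring(0, 60) + "..." : text;
        return label + ": length=" + text.length()
                + ", contains href=" + text.contains("href=")
                + ", contains Doc=" + text.contains("Doc")
                + ", preview=[" + preview + "]";
    }

    public static void main(String[] args) {
        // Nothing was read at all.
        System.out.println(describe("script", null));
        // Made-up fragment of the script text from post #1.
        System.out.println(describe("script",
                "var data = {\"name\": \"<a href=\\\"/samia/export/sunnydataDoc\\\">\"};"));
    }
}
```

If the report shows `null` or a text without `href=`, the problem lies in how the text was selected, not in the regular expression.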

  16. #16
    gandalf001 is offline Member
    Join Date
    May 2014
    Posts
    11
    Rep Power
    0

    Re: Java Jsoup with Javascript code

    OK, could you please demonstrate a small example of debugging? I beg your pardon, I am something of a newbie in this field.
    Much appreciated.

  17. #17
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,224
    Rep Power
    20

    Re: Java Jsoup with Javascript code

    So why are you trying to use a fairly complex tool like JSoup and Regex parsing if you can't do System.out.println()?

  18. #18
    gandalf001 is offline Member
    Join Date
    May 2014
    Posts
    11
    Rep Power
    0

    Re: Java Jsoup with Javascript code

    Sorry, I don't follow you now. Is it your understanding that debugging is just "system.out......"? I wouldn't believe it. I know Jsoup in general, but I have never used it with Javascript code. It is fairly easy to work with plain HTML. Only one part of this page uses Javascript; the rest is all HTML tags. So I think Jsoup would handle it if I do the script part with manual parsing. Don't get me wrong, I am not questioning your willingness to help. I really thank you for this. But having had so much bad experience with this on quite a few forums on the internet, I thought this forum really had the specialists and would not go off-topic. On the other hand, this question about the Javascript part is just a small part of the whole work I am doing. And if you have worked on something for some weeks and invested much time in it, you cannot just say, OK, let's use a different solution. I really have no time to waste on this.
    My initial question is how to grab those links ending with "Doc", navigate further down into each one, and look for more href links. As a specialist with many years experience, this should not be very difficult to solve.
    Sorry if I bothered you too much.

  19. #19
    gimbal2 is offline Just a guy
    Join Date
    Jun 2013
    Location
    Netherlands
    Posts
    4,354
    Rep Power
    6

    Re: Java Jsoup with Javascript code

    System.out.println() is the easiest form of debugging there is, yes. Since you make it really difficult to know what you do and don't know, I can understand that Tolls recommends that form of debugging and not something more complicated, such as using an actual debugger.

    As a specialist with many years experience, this should not be very difficult to solve.
    I'm pretty sure that Tolls can whip something up in under 15 minutes. Why do you say that as if it is going to help you? The whole point here is that -you- do it, not Tolls. So far you've made it very difficult for Tolls to help you with that.

  20. #20
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,224
    Rep Power
    20

    Re: Java Jsoup with Javascript code

    Java Code:
    Element script = doc.select("href").last();
    
    Pattern p = Pattern.compile("href\\s*=\\s*\"([^\"]+Doc)\"");
    Matcher m = p.matcher(script.html());
    That's your current bit of relevant code.
    You are getting the wrong tag...you still haven't shown anything getting the correct tag for parsing the Javascript.

    Second, do you really want to run a match on the html of the <script> tag?
    I would (and this might take some experimenting, hence my "debug debug debug" comment, and use of Sysout) expect text() would be a better fit...it all depends how JSoup views the contents of a <script> tag.

    So there you go.

    That is all you should need in order to at least attempt this.
