  1. #1
    basfot is offline Member
    Join Date
    Feb 2012
    Posts
    25
    Rep Power
    0

    How to get absolute URL of web sites without links to files

    I need to get the absolute URLs of the links on a page, excluding links to files. I have this code, which prints the links, but some of them are missing.

    Java Code:
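    import java.io.IOException;
    import java.net.URI;
    import java.net.URL;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class Main {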
    
    public static void main(String[] args) throws Exception {
        URI uri = new URI("http://www.niocchi.com/");
        printURLofPages(uri);
    }
    
    private static void printURLofPages(URI uri) throws IOException {
        Document doc = Jsoup.connect(uri.toString()).get();
        Elements links = doc.select("a[href~=^[^#]+$]");
    
        for (Element link : links) {
            String href = link.attr("abs:href");
            URL url = new URL(href);
            String path = url.getPath();
            int lastdot = path.lastIndexOf(".");
            if (lastdot > 0) {
                String extension = path.substring(lastdot);
                if (!extension.equalsIgnoreCase(".html") && !extension.equalsIgnoreCase(".htm"))
                    return;
            }
            System.out.println(href);
        }
    }
    }
    This code gives me the following links:

    http://www.enormo.com/
    http://www.vitalprix.com/
    http://www.niocchi.com/javadoc
    http://www.niocchi.com/

    I need to get these links:
    http://www.enormo.com/
    http://www.vitalprix.com/
    http://www.niocchi.com/javadoc
    http://www.linkedin.com/in/flmommens
    http://www.linkedin.com/in/ivanprado
    http://www.linkedin.com/in/marcgracia
    http://es.linkedin.com/in/tdibaja
    http://www.linkody.com
    http://www.niocchi.com/

    Thanks a lot for any advice.
    Last edited by basfot; 02-23-2015 at 05:46 PM.

  2. #2
    SurfMan's Avatar
    SurfMan is offline Godlike
    Join Date
    Nov 2012
    Location
    The Netherlands
    Posts
    1,989
    Rep Power
    8

    Re: How to get absolute URL of web sites without links to files

    Those missing links have HREF in caps.
    "It's not fixed until you stop calling the problem weird and you understand what was wrong." - gimbal2 2013

  3. #3
    basfot is offline Member

    Re: How to get absolute URL of web sites without links to files

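    Originally posted by SurfMan:
    "Those missing links have HREF in caps."

    Thanks for the advice, but my problem was somewhere else.

    I have solved it for now: the return inside the loop should have been continue. A return exits printURLofPages as soon as a link with an extension other than .html or .htm is reached, so every link after it is never printed; continue only skips that one link. My stupid mistake.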
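
For anyone who finds this thread later, here is a minimal sketch of the corrected loop. To keep it runnable without jsoup or a network connection, it works on a list of already-extracted absolute hrefs instead of a fetched page; the class name LinkFilter, the helper filterPageLinks and the sample .zip URL are hypothetical names made up for this illustration, not part of the original code. It also uses java.net.URI rather than java.net.URL to read the path, which is an assumption of this sketch.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LinkFilter {

    // Hypothetical helper: keeps links that look like pages and skips links to files.
    static List<String> filterPageLinks(List<String> hrefs) {
        List<String> pages = new ArrayList<>();
        for (String href : hrefs) {
            String path;
            try {
                path = new URI(href).getPath();
            } catch (URISyntaxException e) {
                continue; // malformed link: skip it and keep going
            }
            if (path == null) {
                continue; // opaque URI such as mailto:, which has no path
            }
            // Same extension check as in the original post.
            int lastDot = path.lastIndexOf('.');
            if (lastDot > 0) {
                String extension = path.substring(lastDot);
                if (!extension.equalsIgnoreCase(".html") && !extension.equalsIgnoreCase(".htm")) {
                    continue; // was "return", which stopped the whole loop at the first file link
                }
            }
            pages.add(href);
        }
        return pages;
    }

    public static void main(String[] args) {
        // Sample input; the .zip URL is invented to stand in for a link to a file.
        List<String> hrefs = Arrays.asList(
                "http://www.enormo.com/",
                "http://www.niocchi.com/niocchi-1.0.zip",
                "http://www.linkedin.com/in/flmommens",
                "http://www.niocchi.com/index.html");
        for (String page : filterPageLinks(hrefs)) {
            System.out.println(page);
        }
    }
}
```

With the sample input, the .zip link is skipped and the other three links are printed. With return in place of continue, the loop would have stopped at the .zip link and only the first link would have appeared.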
