  1. #1
    yasmin k is offline Member
    Join Date
    Mar 2009
    Posts
    23
    Rep Power
    0

    Hello, need help :)

    Hello, I am working on an application of recursion and sets.

    I need to write a program that, given a URL 'u', prints the set of all URLs that are reachable from 'u' and are also in the same domain. The domain is a command-line argument. I assume a link is in the domain if its string contains the domain, e.g. Google (two command-line arguments) should print all the URLs reachable from Google. I cannot visit links outside the domain (i.e. links that do not contain the second command-line argument).
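    The containment rule described above can be sketched roughly as follows. This is only a minimal illustration: the class name DomainCheck and the method name inDomain are hypothetical, and the URLs are made-up examples.

```java
class DomainCheck {
    // Hypothetical helper: a link counts as "in the domain" when the
    // domain string occurs anywhere in the URL (the rule assumed in this post).
    static boolean inDomain(String url, String domain) {
        return url != null && domain != null && url.contains(domain);
    }

    public static void main(String[] args) {
        System.out.println(inDomain("http://www.google.com/maps", "google")); // true
        System.out.println(inDomain("http://www.example.org/", "google"));    // false
    }
}
```

    Note that plain substring matching also accepts a URL such as http://www.example.org/?q=google, so it is only an approximation of "same domain".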




    I have already started coding and have done most of it. I have three methods and need help with the first two: the first method needs to take the string and the domain, and the second method needs to take a string, the domain and a hash set.

    Java Code:
    import org.htmlparser.util.*;
    import org.htmlparser.*;
    import org.htmlparser.tags.*;
    import org.htmlparser.filters.*;
    import java.util.HashSet;

    class betterRecursiveCrawler1
    {
        // Returns the set of links found on the page at url.
        // Returns an empty set if the page cannot be fetched or parsed.
        public static HashSet<String> visit(String url)
        {
            HashSet<String> s1 = new HashSet<String>();
            try
            {
                Parser parser1 = new Parser(url);
                // keep only links whose URL contains "http:"
                NodeList list1 = parser1.parse(new LinkStringFilter("http:"));
                for (int i = 0; i < list1.size(); i++)
                {
                    String st = ((LinkTag) list1.elementAt(i)).extractLink();
                    s1.add(st);
                }
                return s1;
            }
            catch (Exception e)
            {
                return new HashSet<String>();
            }
        }

        // Follows links recursively from url for depth levels, skipping URLs
        // already recorded in already; returns the URLs reached at depth 0.
        public static HashSet<String> visit(String url, int depth, HashSet<String> already)
        {
            HashSet<String> s = new HashSet<String>();
            if (depth == 0)
            {
                s.add(url);
                return s;
            }
            already.add(url);
            HashSet<String> t = visit(url);
            for (String u : t)
            {
                if (!already.contains(u))
                {
                    already.add(u);
                    s.addAll(visit(u, depth - 1, already));
                }
            }
            return s;
        }

        // args[0] is the start URL, args[1] is the depth.
        public static void main(String args[]) throws Exception
        {
            int depth = Integer.parseInt(args[1]);
            HashSet<String> already = new HashSet<String>();
            HashSet<String> s = visit(args[0], depth, already);
            //for (String u:s) System.out.println(u);
            System.out.println(s.size());
        }
    }

  2. #2
    Turtle is offline Member
    Join Date
    Nov 2007
    Location
    New Zealand
    Posts
    36
    Rep Power
    0


    Hi Yasmin_K,

    Neat program. But I fail to understand what your problem is.

    --- Instructions to others interested in running this code... ---

    download library from: HTML Parser
    compile using: javac -cp .;htmlexer.jar;htmlparser.jar betterRecursiveCrawler1.java
    run using: java -cp .;htmlexer.jar;htmlparser.jar betterRecursiveCrawler1 http://google.com 2
    (the ; classpath separator is for Windows; use : on Linux or macOS)

    --- end ---
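
    If the goal is to swap the depth limit for a domain check, here is a minimal sketch of one way it could look. It is a guess at what you are after, not a drop-in fix. The class DomainCrawlSketch, the methods reachable and linksOn, and the a.test / b.test URLs are all made up for illustration, and the LINKS map is a stand-in for the page fetching that your one-argument visit(url) does with HTML Parser.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DomainCrawlSketch {
    // Stand-in for fetching a page: maps a URL to the links found on it.
    // In the real program this lookup would be the one-argument visit(url).
    static final Map<String, List<String>> LINKS = new HashMap<String, List<String>>();
    static {
        LINKS.put("http://a.test/", Arrays.asList("http://a.test/x", "http://b.test/"));
        LINKS.put("http://a.test/x", Arrays.asList("http://a.test/", "http://a.test/y"));
        LINKS.put("http://b.test/", Arrays.asList("http://a.test/z"));
    }

    // Hypothetical helper: links on a page, or an empty list for unknown pages.
    static List<String> linksOn(String url) {
        List<String> links = LINKS.get(url);
        return links == null ? Collections.<String>emptyList() : links;
    }

    // First method: takes the start URL and the domain.
    static Set<String> reachable(String url, String domain) {
        Set<String> seen = new HashSet<String>();
        visit(url, domain, seen);
        return seen;
    }

    // Second method: takes the URL, the domain and the set of URLs seen so far.
    static void visit(String url, String domain, Set<String> seen) {
        // Stop if the URL is outside the domain or was already visited.
        if (!url.contains(domain) || !seen.add(url)) {
            return;
        }
        for (String next : linksOn(url)) {
            visit(next, domain, seen);
        }
    }

    public static void main(String[] args) {
        // Prints the in-domain URLs reachable from the start page.
        System.out.println(reachable("http://a.test/", "a.test"));
    }
}
```

    In this toy graph http://a.test/z is never reported, because the only page linking to it is http://b.test/, which is outside the domain and so is never visited. On a real site the recursion can get deep, so an explicit stack or queue may be safer than plain recursion.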
    Last edited by Turtle; 02-02-2010 at 10:09 PM.
