  1. #1
    bholzer is offline Member
    Join Date
    May 2011
    Posts
    3
    Rep Power
    0

    Reading values between HTML tags

    I want to write a program to run continuously in the background that will alert me when a value on a web page has changed.

    The value is within a unique tag on the page.

    How would I even begin to go about this?

  2. #2
    doWhile is offline Moderator
    Join Date
    Jul 2010
    Location
    California
    Posts
    1,642
    Rep Power
    7

    The regular expressions lesson in the Java Tutorials (Lesson: Regular Expressions, The Java™ Tutorials > Essential Classes) will let you search the HTML and find certain content. That being said, if by "change" you mean a change made by a script such as JavaScript, the problem becomes far more complicated, because the value never appears in the HTML the server sends.

  3. #3
    bholzer is offline Member
    Join Date
    May 2011
    Posts
    3
    Rep Power
    0

    Yeah, I just want to parse HTML generated by PHP.

  4. #4
    Solarsonic is offline Senior Member
    Join Date
    Mar 2011
    Posts
    261
    Rep Power
    4

    Use a URL and a BufferedReader like so:

    Java Code:
    URL url = new URL("direct url to page");
    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
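
    As a possible next step from the two lines above, here is a minimal sketch that reads the whole page into one String so it can be searched afterwards. The class name PageReader, the method names readAll and fetch, and the UTF-8 charset are assumptions made for illustration; none of them come from this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the thread's code.
public class PageReader {

    /** Reads everything from the given reader, ending each line with '\n'. */
    public static String readAll(Reader source) throws IOException {
        StringBuilder page = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    /** Downloads the page at the given address (assumes the page is UTF-8). */
    public static String fetch(String address) throws IOException {
        URL url = new URL(address);
        return readAll(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
    }
}
```

    Calling PageReader.fetch("direct url to page") would then hand back the HTML as a String, ready for string operations or a regular expression.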

  5. #5
    bholzer is offline Member
    Join Date
    May 2011
    Posts
    3
    Rep Power
    0

    I've gotten that far, but once I'm there, how do I designate exactly which values to pull?

  6. #6
    Maximus-EVG is offline Member
    Join Date
    Apr 2011
    Location
    Canada!
    Posts
    30
    Rep Power
    0

    Quote Originally Posted by bholzer
    I've gotten that far, but once I'm there, how do I designate exactly which values to pull?
    Well, the solution is to parse the page line by line and find what you need with regular expressions or simple string operations.
    Hehe, actually I'm writing an article on a multithreaded web crawler, so what I need are the URLs of the HTML pages inside <a href=...> </a> tags.
    Here's how I do it; perhaps you can adapt it to your own design.

    Java Code:
    // Needs: java.io.BufferedReader, java.io.IOException, java.io.InputStreamReader, java.net.URL
    // theURL, allSublinks, crawler and Link belong to the enclosing crawler class (not shown).
    public void scrapeYourself() throws IOException {
        // try-with-resources releases the reader and the underlying stream, even on error
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(theURL.openConnection().getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                /* See if an <a href=...> tag is present. Other stuff (link name, target
                 * attribute) can sit between <a href="" and </a>, but consider the
                 * simplest case for now. Note that "http:" does not match https: links. */
                if (line.contains("<a href=\"http:") && line.contains("</a>")
                        && line.indexOf("<a href=\"http:") < line.indexOf("</a>")) {
                    String[] linkTokensWithinLine = line.split("<a href=\"");
                    for (String token : linkTokensWithinLine) {
                        /* Ignore the part of the line before the first <a href=...> split */
                        if (!token.contains("</a>")) {
                            continue;
                        }
                        int start = token.indexOf("http:");
                        // Closing quote of the href value, searched for after "http://"
                        int end = (start < 0) ? -1 : token.indexOf("\"", start + 7);
                        if (end < 0) {
                            // Link is in javascript:/etc. format or malformed, so ignore it
                            // (the original fell through here and called new URL(null))
                            continue;
                        }
                        /* Add the proper URL to the list of this link's sublinks */
                        this.allSublinks.add(new Link(new URL(token.substring(start, end)), this.crawler));
                    }
                }
            }
        }
    }
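
    For the original goal of being alerted when the value changes, one possible shape is to re-read the value on a timer and compare it with the previous reading. The sketch below is only an illustration under stated assumptions: the class name ValueWatcher and its methods are hypothetical, the Supplier stands in for whatever fetch-and-parse code you end up with, and the Consumer stands in for however you want to be alerted.

```java
import java.util.Objects;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical class, not part of the thread's code.
public class ValueWatcher {
    private final Supplier<String> source;   // assumption: fetches the page and extracts the value
    private final Consumer<String> onChange; // assumption: receives the new value as the alert
    private String lastValue;
    private boolean seenFirst = false;

    public ValueWatcher(Supplier<String> source, Consumer<String> onChange) {
        this.source = source;
        this.onChange = onChange;
    }

    /** Checks once; returns true if the value differs from the previous check. */
    public synchronized boolean checkOnce() {
        String current = source.get();
        // The very first reading only sets the baseline and never alerts.
        boolean changed = seenFirst && !Objects.equals(current, lastValue);
        lastValue = current;
        seenFirst = true;
        if (changed) {
            onChange.accept(current);
        }
        return changed;
    }

    /** Polls in the background until the returned scheduler is shut down. */
    public ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                checkOnce();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so swallow it
                // (for example a network hiccup) and try again on the next tick.
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

    Something like new ValueWatcher(mySupplier, v -> System.out.println("Changed to " + v)).start(60) would then check once a minute, where mySupplier is your own code. Keep the polling interval polite to the site you are watching.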

  7. #7
    doWhile is offline Moderator
    Join Date
    Jul 2010
    Location
    California
    Posts
    1,642
    Rep Power
    7

    Have you looked at the link I provided above? Did you try using anything learned in it, and if so what? A Pattern/Matcher combo will allow you to parse text in ways unimaginable, and is a powerful tool a programmer should have in their arsenal. You've provided little insight into what tag you wish to grab and what about it you want, so posting an SSCCE or example will help as well.
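
    To make the Pattern/Matcher suggestion concrete, here is a minimal sketch. Since the actual tag has not been posted, it assumes the value sits in an element with a unique id attribute, such as <span id="price">; the class name TagValueExtractor, the method extract and the sample HTML are all hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; the tag name and id are assumptions, not from the thread.
public class TagValueExtractor {

    /** Returns the trimmed text between <tag ... id="id" ...> and the next </tag>, if found. */
    public static Optional<String> extract(String html, String tag, String id) {
        String quotedTag = Pattern.quote(tag);
        Pattern pattern = Pattern.compile(
                "<" + quotedTag + "\\b[^>]*\\bid=\"" + Pattern.quote(id) + "\"[^>]*>"
                        + "(.*?)"                      // reluctant: stop at the first closing tag
                        + "</" + quotedTag + ">",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL); // DOTALL lets the value span lines
        Matcher matcher = pattern.matcher(html);
        return matcher.find() ? Optional.of(matcher.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<html><body><span id=\"price\"> 42.50 </span></body></html>";
        System.out.println(extract(html, "span", "price").orElse("not found"));
    }
}
```

    This only copes with simple markup: it expects the id in double quotes and will stop early if the element contains a nested tag of the same name. For anything messier, a real HTML parser is the safer route.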
