  1. #1
    beijct (Member)

    Problems Reading HTML

    Hi all, I've been asked to write a program that pulls links out of the HTML of a pretty basic website and prints them out. I'm just starting, so for now I'm trying to print every line that contains the string "href", and I'm having some trouble. Here's the code:


    Java Code:
     import java.net.*;
    import java.io.*;
    
    public class readURLTest {
    
    
       public static void main(String args[])
         {
           try {
              URL interisle = null;
              DataInputStream dis = null;
    
              interisle = new URL("http://www.interisle.net");
             
              dis = new DataInputStream(interisle.openStream());
    
              String line = dis.readLine();
          	  String ref = "href";
              boolean b = line.contains(ref);
    
              while (line != null)
                {
            	line = dis.readLine();
                if(b==true)
                {
                  System.out.println(line);              
                }
                }
            }
          catch (IOException e)
            {
              System.out.println("Error:" + e.getMessage());
            }
         }
      }
    My understanding is that the program should read through the HTML line by line and print the lines that contain "href", but I get no output and no error, so I'm assuming b is always false. Can anyone explain what I'm doing wrong? Also, I'm sorry if this belongs in Advanced Java instead of New To Java. I was unsure where it falls, and since I have only been programming for a year I thought it would fit best here. Thanks.

  2. #2
    Fubarable (Moderator)

    Re: Problems Reading HTML

    For my money, I'd use a library that makes this easy to do, one that parses the HTML for you, such as the wonderful JSoup.

  3. #3
    Tolls (Moderator)

    Re: Problems Reading HTML

    Java Code:
    String line = dis.readLine();
    String ref = "href";
    boolean b = line.contains(ref);
    while (line != null) {
       line = dis.readLine();
       if(b==true) {
          System.out.println(line);              
       }
    }
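    You only check the first line read in for 'href'.
    That boolean never changes after that, so the check has to move inside the loop and run on each line as it is read.
    Note also that the loop reads the next line before printing, so even with b true it would skip the first line and print "null" at the end.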
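    A minimal sketch of the loop with the check moved inside it, under a few assumptions: the class name HrefLines and the helper linesContaining are made up for illustration, and BufferedReader is swapped in for DataInputStream because DataInputStream.readLine() is deprecated. It still only matches lines containing "href", which is not the same as extracting the links themselves.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this illustration.
class HrefLines {

    // Hypothetical helper: returns every line from the reader that contains needle.
    static List<String> linesContaining(BufferedReader reader, String needle) {
        List<String> matches = new ArrayList<>();
        try {
            String line;
            // Read and null-check in one step, so a null is never treated as a line.
            while ((line = reader.readLine()) != null) {
                // The test runs on every line, not once before the loop.
                if (line.contains(needle)) {
                    matches.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return matches;
    }

    public static void main(String[] args) {
        try {
            URL interisle = new URL("http://www.interisle.net");
            // try-with-resources closes the stream when the block exits.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(interisle.openStream()))) {
                for (String line : linesContaining(reader, "href")) {
                    System.out.println(line);
                }
            }
        } catch (IOException | UncheckedIOException e) {
            System.out.println("Error:" + e.getMessage());
        }
    }
}
```

    Keeping the matching in its own method means it can be tried on a StringReader without touching the network.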
