  1. #1
    Dark (Senior Member)
    Join Date: Apr 2011
    Location: Camp Lejeune, North Carolina
    Posts: 643
    Rep Power: 4

    Grabbing HTML source code from a URL

    Hello, and thank you for opening my thread; I'll assume you did so in an attempt to help me. Right now my goal is to grab the source code from a website and save it to a text file. This is only a small part of my end goal, but this is where I'm at right now.

    So far I have successfully managed to do everything I just described. There is just one issue: the next step in my code is to go through the file line by line and interpret what each line means, but the method I am using saves everything on one line. I am not sure why it is saving it on one line, or whether the method I'm using can even do this. My code is below, and I would definitely love some pointers on where to go from here.

    Java Code:
    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.MalformedURLException;
    import java.net.URL;

    public class HtmlGet {
        public static void main(String[] args) throws MalformedURLException, IOException {
            //Yes I know, these are only there like this to get my compiler to shut up for my SSCCE
            URL url = new URL("http://www.mytestaddress.com");

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream()));

            String inputLine;
            // Declared before the try block so that it is still in scope when it is closed
            BufferedWriter writer = null;
            try {
                writer = new BufferedWriter(new FileWriter("C:\\Users\\YourComputerName\\Desktop\\test.txt"));
                //If a change needs to be made, I believe it would be somewhere in this area.
                while ((inputLine = in.readLine()) != null)
                    writer.write(inputLine);
            } catch(IOException ex) {
                System.out.println("Couldn't write to file.");
                ex.printStackTrace();
            } finally {
                in.close();
                if (writer != null) {
                    writer.close();
                }
            }
        }
    }
    This code should compile; I rewrote it by hand because I am on a computer without a compiler to test it. If it doesn't, I will fix it when I have access to the resources.
    • Use [code][/code] tags when posting code. That way people don't want to stab their eyes out when trying to help you.
    • +Rep people for helpful posts.

  2. #2
    DarrylBurke (Member)
    Join Date: Sep 2008
    Location: Madgaon, Goa, India
    Posts: 11,244
    Rep Power: 19

    Quote Originally Posted by Dark
    There is just one issue: the next step in my code is to go through the file line by line and interpret what each line means, but the method I am using saves everything on one line. I am not sure why it is saving it on one line
    You're reading the input line by line -- which strips the linefeed at the end of each line -- and writing it out without adding a linefeed after each line.
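A minimal sketch of what db describes, with a StringReader standing in for the URL stream (an assumption so it runs offline). readLine() consumes each terminator ("\n", "\r" or "\r\n") and returns only the text before it, so writing the lines back with nothing in between glues them together. The class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;

public class ReadLineDemo {
    // Mimics the original loop: readLine() followed by write(), with no separator added.
    static String copyWithoutSeparators(String input) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader(input));
        StringWriter out = new StringWriter();
        String line;
        while ((line = in.readLine()) != null) {
            out.write(line); // the terminator consumed by readLine() is never written back
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for url.openStream() so the demo runs offline
        String html = "<html>\n<body>\r\n</body>\n</html>\n";
        System.out.println(copyWithoutSeparators(html)); // prints <html><body></body></html>
    }
}
```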

    db
    If you're forever cleaning cobwebs, it's time to get rid of the spiders.

  3. #3
    Dark

    Quote Originally Posted by DarrylBurke
    You're reading the input line by line -- which strips the linefeed at the end of each line -- and writing it out without adding a linefeed after each line.

    db
    OK, I figured that was the case. I have tried using the \n character to insert newlines, but it isn't working no matter which of the existing strings I add it to. Is there a different approach I should be looking into, or am I just not placing my + "\n" where it is supposed to be?

    Java Code:
    while (((inputLine = in.readLine()) + "\n") != null) {
        writer.write(inputLine);
    }
    Ends up crashing my text document every time I try to open it after running it.

    And this attempt does not place a \n anywhere but at the end of the document.

    Java Code:
    while ((inputLine = in.readLine()) != null) {
        writer.write(inputLine + "\n");
    }
    So I know the newline character needs to go in after inputLine is set from in.readLine() and before writer.write() is called. Am I at least in the right swimming pool here, DB? This is bugging me so much right now that I can't focus on the calculus review sheet for my exam, which is less than an hour away.
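To check whether the "\n" characters from the second attempt actually reach the file, independent of how any editor displays it, one option is to count the raw bytes. This is only a sketch; the class name and the default file name test.txt are hypothetical. A file with many LF bytes and no CR bytes has Unix-style line endings, which older versions of Notepad display as a single line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LineEndingCounter {
    // Returns {number of '\n' (LF) bytes, number of '\r' (CR) bytes} in the file.
    static int[] countLineEndings(Path file) throws IOException {
        int lf = 0;
        int cr = 0;
        for (byte b : Files.readAllBytes(file)) {
            if (b == '\n') {
                lf++;
            } else if (b == '\r') {
                cr++;
            }
        }
        return new int[] {lf, cr};
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the path of the generated file as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "test.txt");
        int[] counts = countLineEndings(file);
        System.out.println("LF: " + counts[0] + ", CR: " + counts[1]);
    }
}
```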

  4. #4
    DarrylBurke

    Quote Originally Posted by Dark
    Ends up crashing my text document every time I try to open it after running it.
    I haven't a clue what you mean by that. Care to share the details?

    db

  5. #5
    Dark

    Quote Originally Posted by DarrylBurke
    I haven't a clue what you mean by that. Care to share the details?

    db
    The text file opens, but then Notepad stops responding before any of the text shows up. It's almost as if my program were still running, but at the end of the code I have a System.out.println("finished") to reassure me that it finished running. That printed, and then I tried to open the .txt in Notepad. Waiting for Windows to figure out why Notepad stopped responding doesn't yield any additional information.

    I don't know if it is a limitation of the computer, since it's a netbook with only 2 GB of RAM, but the text file should only be a few KB, so that shouldn't be the issue.
    Last edited by Dark; 03-28-2013 at 07:11 PM.

  6. #6
    DarrylBurke

    Notepad has loads of shortcomings; try another text editor if you can.

    Also, Notepad doesn't recognize \n without \r as a line break. You could try appending System.getProperty("line.separator") instead of "\n" (see System Properties in The Java™ Tutorials > Essential Classes > The Platform Environment).
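A sketch of the same idea without manual string concatenation: BufferedWriter.newLine() writes the separator defined by the line.separator property ("\r\n" on Windows, "\n" on Linux and macOS), and System.lineSeparator() (Java 7 and later) normally returns that same value. The class and method names below are made up for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

public class LineSeparatorDemo {
    // Writes each line followed by the platform's line separator.
    static String joinWithPlatformSeparator(String... lines) throws IOException {
        StringWriter target = new StringWriter();
        try (BufferedWriter writer = new BufferedWriter(target)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine(); // same separator as System.getProperty("line.separator")
            }
        } // closing the BufferedWriter flushes everything into the StringWriter
        return target.toString();
    }

    public static void main(String[] args) throws IOException {
        String sep = System.getProperty("line.separator");
        System.out.println("separator matches lineSeparator(): " + sep.equals(System.lineSeparator()));
        System.out.print(joinWithPlatformSeparator("<html>", "</html>"));
    }
}
```

If the file must always open correctly in old Notepad, even when the program runs on Linux, writing "\r\n" explicitly is an alternative; that is a design choice, not something the platform separator gives you.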

    db

  7. #7
    pbrockway2 (Moderator)
    Join Date: Feb 2009
    Location: New Zealand
    Posts: 4,565
    Rep Power: 12

    "((inputLine = in.readLine())+"\n") != null" is a crazy condition to use. I don't think it will ever be false so if it appears that your application is still running that could well be because it is still running.

    The other one is the way to go. But as db says, Notepad won't recognise a bare '\n' character as meaning "new line".
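A quick check of why the first condition never fails: readLine() returns null at the end of the stream, but string concatenation converts null to the text "null", so the value being compared with != null is always a non-null String. The class name is hypothetical.

```java
public class NullConcatDemo {
    public static void main(String[] args) {
        String inputLine = null;          // what readLine() returns at the end of the stream
        String tested = inputLine + "\n"; // concatenation turns null into the text "null"
        System.out.println(tested.equals("null\n")); // true
        System.out.println(tested != null);          // true, so the loop condition is never false
    }
}
```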

    Good luck with the calculus test.

  8. #8
    Dark

    Quote Originally Posted by DarrylBurke
    Notepad has loads of shortcomings; try another text editor if you can.

    Also, Notepad doesn't recognize \n without \r as a line break. You could try appending System.getProperty("line.separator") instead of "\n" (see System Properties in The Java™ Tutorials > Essential Classes > The Platform Environment).

    db
    Java Code:
    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.MalformedURLException;
    import java.net.URL;

    public class HtmlGet {
        public static void main(String[] args) throws MalformedURLException, IOException {
            //Yes I know, these are only there like this to get my compiler to shut up for my SSCCE
            URL url = new URL("http://www.mytestaddress.com");

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream()));

            String inputLine;
            // Declared before the try block so that it is still in scope when it is closed
            BufferedWriter writer = null;
            try {
                writer = new BufferedWriter(new FileWriter("C:\\Users\\YourComputerName\\Desktop\\test.txt"));
                // The braces matter: without them only the first statement would run inside the loop
                while ((inputLine = in.readLine()) != null) {
                    inputLine = inputLine + System.getProperty("line.separator");
                    writer.write(inputLine);
                }
            } catch(IOException ex) {
                System.out.println("Couldn't write to file.");
                ex.printStackTrace();
            } finally {
                in.close();
                if (writer != null) {
                    writer.close();
                }
            }
        }
    }
    I got this to work. Whether or not "\n" worked in Notepad or Notepad++, I found I had to change the inputLine variable before it went into the writer. Just posting my final solution and my own personal discovery.
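For comparison, a more compact variant of the same program, sketched under the assumption of Java 7 or later (try-with-resources, java.nio.file) and a page encoded in UTF-8. The class name HtmlSaver and the command-line arguments are hypothetical; pulling the copy loop out into saveLines() lets it be tested without a network connection.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HtmlSaver {
    // Copies every line from 'in' to 'target', ending each line with the platform separator.
    // Returns the number of lines written.
    static int saveLines(BufferedReader in, Path target) throws IOException {
        int count = 0;
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                writer.write(line);
                writer.newLine();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java HtmlSaver <url> <output file>");
            return;
        }
        URL url = new URL(args[0]);
        // Assumes UTF-8; a real page may declare a different charset.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            int lines = saveLines(in, Paths.get(args[1]));
            System.out.println("finished: " + lines + " lines");
        }
    }
}
```

Usage (assumed): java HtmlSaver http://www.mytestaddress.com test.txt. try-with-resources closes both streams even when an exception is thrown, which removes the need for the null check in a finally block.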

    Thanks DB and pb, helpful as always.

  9. #9
    Tolls (Moderator)
    Join Date: Apr 2009
    Posts: 12,015
    Rep Power: 20

    You could have gone with a PrintWriter instead, and just used println().
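A sketch of the PrintWriter approach, assuming the lines come from any Reader (a StringReader stands in for the URL stream in main). PrintWriter.println() ends each line with the platform line separator. One caveat: PrintWriter methods never throw IOException, so write errors have to be detected with checkError(). The names PrintWriterCopy and copyLines are illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class PrintWriterCopy {
    // Copies lines from 'source' to 'dest'; println() appends the platform line separator.
    static void copyLines(Reader source, Writer dest) throws IOException {
        BufferedReader in = new BufferedReader(source);
        PrintWriter out = new PrintWriter(dest);
        String line;
        while ((line = in.readLine()) != null) {
            out.println(line);
        }
        // checkError() flushes the stream and reports whether any write failed
        if (out.checkError()) {
            throw new IOException("error while writing");
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter result = new StringWriter();
        copyLines(new StringReader("<p>one</p>\n<p>two</p>"), result);
        System.out.print(result);
    }
}
```

copyLines() deliberately does not close dest, so the caller that opened the file (or other Writer) stays responsible for closing it.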
    Please do not ask for code as refusal often offends.

    ** This space for rent **
