Thread: File Download

  1. #1
    AnGuRuSO is offline Member
    Join Date
    Oct 2008
    Posts
    9
    Rep Power
    0

    File Download

    So I'm using code from [somewhere] in the hope of inputting a URL and getting the file that URL points to. It works sometimes, but about 90% of the time it doesn't. Also, a lot of the pages I want to download do not end in .html, although I'm not sure whether that is a problem. Anyway, what I want to know is whether it's possible to supply a URL as a string and then download the file that URL points to. I have a feeling there's an easier way to do that, but I'm not sure.

    I'm writing a little program that downloads HTML files, scrapes the important information from them and writes it to a database. So far, as you might have guessed, things aren't going too well.

    Java Code:
    import java.io.*;
    import java.net.*;
    
    /*
     * Command line program to download data from URLs and save
     * it to local files. Run like this:
     * java FileDownload http://schmidt.devlib.org/java/file-download.html
     * @author Marco Schmidt
     */
    public class FileDownload {
    	public static void download(String address, String localFileName) {
    		OutputStream out = null;
    		URLConnection conn = null;
    		InputStream  in = null;
    		
    		SocketAddress sa = new InetSocketAddress("proxy.csu.edu.au", 8080);
    		Proxy proxy = new Proxy(Proxy.Type.HTTP, sa);
    		
    		try {
    			URL url = new URL(address);
    			out = new BufferedOutputStream(
    				new FileOutputStream(localFileName));
    			conn = url.openConnection(proxy);
    			in = conn.getInputStream();
    			byte[] buffer = new byte[1024];
    			int numRead;
    			long numWritten = 0;
    			while ((numRead = in.read(buffer)) != -1) {
    				out.write(buffer, 0, numRead);
    				numWritten += numRead;
    			}
    			System.out.println(localFileName + "\t" + numWritten);
    		} catch (Exception exception) {
    			exception.printStackTrace();
    		} finally {
    			try {
    				if (in != null) {
    					in.close();
    				}
    				if (out != null) {
    					out.close();
    				}
    			} catch (IOException ioe) {
    			}
    		}
    	}
    
    	public static void download(String address) {
    		int lastSlashIndex = address.lastIndexOf('/');
    		if (lastSlashIndex >= 0 &&
    		    lastSlashIndex < address.length() - 1) {
    			download(address, address.substring(lastSlashIndex + 1));
    		} else {
    			System.err.println("Could not figure out local file name for " +
    				address);
    		}
    	}
    
    	public static void main(String[] args) {
    
    			download("http://schmidt.devlib.org/java/file-download.html");
    		
    	}
    }



    Angus Cheng

  2. #2
    Eranga is offline Moderator
    Join Date
    Jul 2007
    Location
    Colombo, Sri Lanka
    Posts
    11,372
    Blog Entries
    1
    Rep Power
    20

  3. #3
    AnGuRuSO is offline Member
    Join Date
    Oct 2008
    Posts
    9
    Rep Power
    0

    Sure did:

    Java Code:
    java.io.IOException: Server returned HTTP response code: 407 for URL: [REMOVED]
        at [REMOVED](HttpURLConnection.java:1241)
        at FileDownload.download(FileDownload.java:24)
        at FileDownload.download(FileDownload.java:52)
        at FileDownload.main(FileDownload.java:61)

  4. #4
    Eranga is offline Moderator
    Join Date
    Jul 2007
    Location
    Colombo, Sri Lanka
    Posts
    11,372
    Blog Entries
    1
    Rep Power
    20

    Ah, you are getting a 407 response code. That means you need to authenticate the client with the proxy itself.
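
    One way this could be done is with the standard java.net.Authenticator class. The following is only a minimal sketch, assuming the proxy accepts a user name and password; the class name ProxyAuth, the method install and the credentials passed to it are hypothetical placeholders, not something from this thread:

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

final class ProxyAuth {
    /**
     * Registers a default Authenticator that answers proxy challenges only.
     * The user name and password are supplied by the caller (placeholders here).
     */
    static void install(final String user, final String password) {
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                // Respond only when the proxy asks (HTTP 407),
                // not when the target server asks (HTTP 401).
                if (getRequestorType() == RequestorType.PROXY) {
                    return new PasswordAuthentication(user, password.toCharArray());
                }
                return null;
            }
        });
    }
}
```

    Calling ProxyAuth.install(user, password) once before url.openConnection(proxy) should let the connection answer the proxy's challenge, provided the proxy uses a scheme the JDK supports.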

  5. #5
    AnGuRuSO is offline Member
    Join Date
    Oct 2008
    Posts
    9
    Rep Power
    0

    I thought I already did that with these two lines:

    Java Code:
    SocketAddress sa = new InetSocketAddress("proxy.csu.edu.au", 8080);
    Proxy proxy = new Proxy(Proxy.Type.HTTP, sa);

    Not saying you're wrong, just saying I'm not sure how to authenticate the client with the proxy itself.

  6. #6
    Eranga is offline Moderator
    Join Date
    Jul 2007
    Location
    Colombo, Sri Lanka
    Posts
    11,372
    Blog Entries
    1
    Rep Power
    20

  7. #7
    AnGuRuSO is offline Member
    Join Date
    Oct 2008
    Posts
    9
    Rep Power
    0

    Thanks for your help, Eranga, but I've found a way around that problem (I requested and was granted a direct connection) and have now found a new problem.

    A lot of pages, especially horse racing results pages, do not have URLs that end in a filename.

    The code fails if the URL does not end in a filename. So if I were to input:

    https://www.sportsbet.com.au/results/racing/Date/today

    the code will fail.

    Any ideas? In the meantime I'm scouring the internet looking for someone who has horse racing results that have links ending in .html

    Thanks in advance,
    Angus

  8. #8
    Eranga is offline Moderator
    Join Date
    Jul 2007
    Location
    Colombo, Sri Lanka
    Posts
    11,372
    Blog Entries
    1
    Rep Power
    20

    You cannot download a folder with an HTTP request the way your code above does. In that case you need a zip file, so that there is an extension to identify a file.

    To avoid that issue, take the last part of the URL after the / sign and check the extension. Basically, find the last index of the / sign; everything from that point to the end gives the file name.
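
    The last-slash idea could be sketched roughly as below. This is an illustrative helper rather than code from this thread: LocalFileNames, fromUrl and the fallback parameter are hypothetical names. It assumes the address is a syntactically valid URI, and it falls back to a caller-supplied name when the path is empty or ends in a slash, since the request itself does not depend on the URL having an extension:

```java
import java.net.URI;

final class LocalFileNames {
    /**
     * Returns the last path segment of the address, or the fallback when the
     * path is missing, empty, or ends with a slash. The query string is ignored.
     * URI.create throws IllegalArgumentException for a malformed address.
     */
    static String fromUrl(String address, String fallback) {
        String path = URI.create(address).getPath();
        if (path == null || path.isEmpty() || path.endsWith("/")) {
            return fallback;
        }
        // lastIndexOf returns -1 when there is no slash, so substring(0) keeps the whole path.
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

    With this sketch, a URL whose path ends in /Date/today yields the name "today", while a URL ending in a slash yields whatever fallback was passed in.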

  9. #9
    AnGuRuSO is offline Member
    Join Date
    Oct 2008
    Posts
    9
    Rep Power
    0

    One day I hope to look back on this thread and laugh at how bad a programmer I was. Anyway, problem solved, everything is GREAT. Thanks a lot for your help, Eranga.

    What I did was very simple.

    1. I didn't bother with proxy settings and used a direct internet connection (might bite me in the ass later).
    2. There are two download methods in the above code.

    download(String address);
    download(String address, String outputFileName);

    At first I was calling the first download function, which derives a file name from the address and then calls the second download function.

    So now I supply the address of the page I want to download along with a hardcoded outputFileName. Everything works, and just in case anyone out there is as stupid as me (not likely), here it is:

    Java Code:
    import java.io.*;
    import java.net.*;
    
    /*
     * Command line program to download data from URLs and save
     * it to local files. Run like this:
     * java FileDownload [SOME SORT OF ADDRESS]
     * @author Marco Schmidt
     */
    public class FileDownload {
    	public static void download(String address, String localFileName) {
    		OutputStream out = null;
    		URLConnection conn = null;
    		InputStream  in = null;
    		
    		//SocketAddress sa = new InetSocketAddress("proxy.csu.edu.au", 8080);
    		//Proxy proxy = new Proxy(Proxy.Type.HTTP, sa);
    		
    		try {
    			URL url = new URL(address);
    			out = new BufferedOutputStream(
    				new FileOutputStream(localFileName));
    			//conn = url.openConnection(proxy);
    			conn = url.openConnection();
    			in = conn.getInputStream();
    			byte[] buffer = new byte[1024];
    			int numRead;
    			long numWritten = 0;
    			while ((numRead = in.read(buffer)) != -1) {
    				out.write(buffer, 0, numRead);
    				numWritten += numRead;
    			}
    			System.out.println(localFileName + "\t" + numWritten);
    		} catch (Exception exception) {
    			exception.printStackTrace();
    		} finally {
    			try {
    				if (in != null) {
    					in.close();
    				}
    				if (out != null) {
    					out.close();
    				}
    			} catch (IOException ioe) {
    			}
    		}
    	}
    
    	public static void download(String address) {
    		int lastSlashIndex = address.lastIndexOf('/');
    		if (lastSlashIndex >= 0 &&
    		    lastSlashIndex < address.length() - 1) {
    			download(address, address.substring(lastSlashIndex + 1));
    		} else {
    			System.err.println("Could not figure out local file name for " +
    				address);
    		}
    	}
    
    	public static void main(String[] args) {
    
    			download("[ADDRESS]", "jur.txt");
    		
    	}
    }
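
    For comparison, the same two-argument download could be written more compactly on Java 7 or later. This is a sketch under that assumption, not the code that was actually used above; the class name SimpleDownload is hypothetical, and unlike the original it lets the IOException propagate to the caller instead of printing the stack trace:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class SimpleDownload {
    /** Copies the content behind the address into localFileName and returns the byte count. */
    static long download(String address, String localFileName) throws IOException {
        Path target = Paths.get(localFileName);
        // try-with-resources closes the stream even if the copy fails part-way.
        try (InputStream in = URI.create(address).toURL().openStream()) {
            // Overwrites an existing file, as FileOutputStream does in the original.
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Expects two arguments: the address and the local file name.
        long numWritten = download(args[0], args[1]);
        System.out.println(args[1] + "\t" + numWritten);
    }
}
```

    Files.copy handles the buffer loop internally, so the manual byte[] buffer and the finally block are no longer needed.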

  10. #10
    Eranga is offline Moderator
    Join Date
    Jul 2007
    Location
    Colombo, Sri Lanka
    Posts
    11,372
    Blog Entries
    1
    Rep Power
    20

    Great, that's the way to learn. Start from the basic things you know and implement them first. As you learn new things, you can move on to them. Nice work lol, and good luck with your Java studies. :)
