  1. #1
    wildheart is offline Member
    Join Date
    Apr 2009
    Posts
    12
    Rep Power
    0

    Viewing source code

    Hello, I'm Leo.
    I'm a lazy person. I have a RapidShare account, which is bliss, but sometimes it's annoying: a website might have 20+ links, most of which I have to copy into the address bar myself and click Go, and then (95% of the time) my Orbit download manager picks up the link by itself because it's monitoring my clipboard.
    So I thought I'd write a program that reads the source code of a website and extracts the RapidShare links to a separate file. Then I just have Orbit read that file and start the links automatically.
    Then I thought, why not just have Orbit read the source code itself? As simple as that would be, we all know that the links come wrapped in undesired text. For example, a token in the source code might be <b>rapidsharelink</b>, and Orbit doesn't read that.
    Anyway, so I thought I'd write my own application. All I have to do is enter the website address into my application and it reads the source code online, i.e. straight from wherever the website is stored (the server), instead of me having to save the source code to my desktop.

    How can I do it? If I've confused you, here are the steps I want to accomplish:
    1- Run the application.
    2- Insert the address.
    3- The application fetches the website and reads its source code.
    4- The application finds the RapidShare links and writes them to a separate file.
    5- I get Orbit to read that new file.
    6- Let the downloading begin.

    I want step 3 to be automated. The manual way: the user goes to the website, saves the source code as a txt file on the desktop, and then the application reads that txt file and does steps 4-6. I want it done automatically, with no user interaction.

    Help? :D
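
Steps 4 and 5 above can be sketched roughly as follows. This is only an illustration, assuming Java 7 or later and assuming the download manager accepts a plain text file with one link per line (the thread does not confirm what Orbit expects). The class name `LinkFileWriter`, the method `writeLinks`, the file name `links.txt` and the example.com addresses are all made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of any library.
class LinkFileWriter {

    // Writes one link per line, replacing the file if it already exists.
    static void writeLinks(Path target, List<String> links) throws IOException {
        Files.write(target, links, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder links; in the real program these would come from step 4.
        List<String> links = Arrays.asList(
                "http://example.com/files/1/a.rar",
                "http://example.com/files/2/b.rar");
        writeLinks(Paths.get("links.txt"), links);
    }
}
```

Keeping the file format this simple means any tool that can import a list of addresses should be able to read it.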

  2. #2
    OrangeDog is offline Senior Member
    Join Date
    Jan 2009
    Location
    Cambridge, UK
    Posts
    838
    Rep Power
    6


    Use a URLConnection or similar I/O object to connect to the webserver and download the page. The source will be available as the InputStream from the connection.
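
A minimal sketch of that suggestion, assuming Java 7 or later. The class name `PageFetcher` and its two methods are invented for the example; only `URL`, `URLConnection` and the stream classes come from the JDK. It also assumes the page is UTF-8, which a real program would want to check.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
class PageFetcher {

    // Reads everything from the stream into one String, one '\n' per line.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Opens a connection to the address and returns the page source.
    // Assumption: the server sends UTF-8 text.
    static String fetch(String address) throws IOException {
        URLConnection connection = new URL(address).openConnection();
        return readAll(connection.getInputStream());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PageFetcher <address>");
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

"Download" here only means the bytes of the page are read into memory over the connection; nothing is saved to disk unless the program writes it out itself.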

  3. #3
    wildheart is offline Member

    Thanks for the quick response, but although I understand the terms you mentioned, I really don't get what you want me to do. I guess I should google URLConnection and see what comes up. Sorry, I'm just starting. I wouldn't even be here if this problem hadn't affected my lazy lifestyle.
    So when you said it'll download the page, did you mean cached? Because I don't want anything downloaded. I THINK that when you view a website it gets cached first and then you view it, rather than viewing it directly from the web server. Or am I wrong?
    Thanks again.

  4. #4
    wildheart is offline Member

    OK, I found a class that I THINK might solve my problem.
    Here's the code:

    import java.net.*;
    import java.io.*;

    public class URLReader {
        public static void main(String[] args) throws Exception {
            // The address is spelled out with "dot" as posted; a real URL needs a
            // scheme (http://) and real dots, or new URL(...) throws MalformedURLException.
            URL page = new URL("rapidsharedownload dot net / videos/x-men-origins-wolverine-real-proper-workprint-xvid-ilg/");
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(page.openStream()));

            String inputLine;

            // Print the page source line by line.
            while ((inputLine = in.readLine()) != null) {
                System.out.println(inputLine);
            }

            in.close();
        }
    }

    I tried Yahoo first: the output and the source code were way, WAY different. Then I tried the link you saw in my code and the results were identical!
    So problem solved. Thank you for your help. I'll edit this code to include my search and write methods.
    Thank you again.

    By the way, maybe you can tell me why the output I got was nothing like the actual source code when I tried Yahoo. Thanks.

  5. #5
    OrangeDog is offline Senior Member

    First of all, I must remind you that downloading those files is rather illegal.
    Webpages are indeed often cached by browsers, but they are stored in files in various proprietary locations with odd filenames, so they are difficult to get at. The Yahoo homepage is generated using JavaScript; you can see this by looking at "view source" in your browser. The only way to see the results in a Java app would be to execute the JavaScript, at which point you may as well use an existing rendering engine.

  6. #6
    wildheart is offline Member

    Sorry, but what files are you talking about? All I'm doing is getting a program to copy the rapidshare links to my download manager instead of visiting the page and copying them myself. That's all.

    Next, an 'efficiency' question.
    There are two while loops. The first reads the lines of the txt file and checks whether each one contains rapidsharedotcom. If a line does, it goes through a second while loop where the line is tokenized and every token is checked for the string rapidsharedotcom. When a matching token is found, it's copied to a new txt file.
    My question: is there a way to treat the entire txt file as one line? That would save me one while loop. I'd just tokenize the whole thing and use a single while loop to check every token for whatever string I want.
    EDIT: Also, suppose I can't treat the whole txt file as one line. That's OK. Which is better: checking whether a line contains the string, then tokenizing it and checking every token until I find the string and copy that token? Or reading a line, tokenizing it, and checking every token straight away? Which is faster? If I check the line for the string before tokenizing it, I can save some time. But if I tokenize the line right away, I save myself an if statement, which algorithm-wise is no big deal because it's just a constant when you calculate the complexity of the program. So...? :D

    Thanks
    Last edited by wildheart; 04-02-2009 at 11:58 PM.
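
One way to get the "whole file as one line" effect, sketched under the assumption that the page source is already in a single String. `java.util.Scanner` splits on whitespace by default, and line breaks count as whitespace, so one loop walks every token in the text. The class name `TokenScan` and the method `tokensContaining` are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper, not part of any library.
class TokenScan {

    // Returns every whitespace-separated token that contains the needle.
    static List<String> tokensContaining(String text, String needle) {
        List<String> hits = new ArrayList<>();
        // The default delimiter is whitespace, which includes '\n',
        // so a multi-line text needs only this one loop.
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                String token = scanner.next();
                if (token.contains(needle)) {
                    hits.add(token);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "some text http://rapidshare.com/files/1/a.rar\n"
                + "more <b>http://rapidshare.com/files/2/b.rar</b> text";
        for (String token : tokensContaining(text, "rapidshare.com")) {
            System.out.println(token);
        }
    }
}
```

Either variant from the question reads every character about once, so both are linear in the size of the text; the extra `contains` check per line only changes the constant. Note that the second token still carries its `<b>` markup, which is the cleanup problem tokenizing alone does not solve.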

  7. #7
    OrangeDog is offline Senior Member

    Indeed, and I am only providing assistance in Java programming concepts.

    You can use regular expressions to check the whole thing in one go using java.util.regex.Pattern and java.util.regex.Matcher. There is no need to tokenize. A possible pattern would be
    Java Code:
    Pattern pattern = Pattern.compile("href=\"(http://rapidshare\\.com/.*?)\"");
    When matched, the link will be in capture group #1.
    Last edited by OrangeDog; 04-03-2009 at 04:31 AM.
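
A small sketch of how that pattern might be applied, assuming the page source is already in a String. `Pattern` has no public constructor, so the sketch builds it with `Pattern.compile`. The class name `HrefLinkFinder`, the method `findLinks` and the sample HTML are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class HrefLinkFinder {

    // The pattern from the post: the link itself is capture group 1.
    private static final Pattern LINK =
            Pattern.compile("href=\"(http://rapidshare\\.com/.*?)\"");

    // Collects every matching link in the order it appears.
    static List<String> findLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = LINK.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample markup.
        String html = "<a href=\"http://rapidshare.com/files/1/a.rar\">part 1</a>"
                + "<a href=\"http://example.com/other\">other</a>";
        for (String link : findLinks(html)) {
            System.out.println(link);
        }
    }
}
```

Because the match is taken from inside the `href` attribute, surrounding tags such as `<b>` never end up in the result.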

  8. #8
    wildheart is offline Member

    Yeah, I know about regular expressions, but I only checked 3 links and they weren't helpful at all, so I thought I'd do it the hard way. I was going to ask my professor about it tomorrow, but thanks.
    Doing it the long way (2 loops, I think) got me a clean link with no stray non-digit/non-letter characters around it.
    Basically the format is httpcolon//(or wwwdot)rapidsharedotcom/files/file#/name.ext.### where ext is any of the extensions I could think of from experience. The .### is there because people sometimes split the files and always tell you to use HJSplit, which always names the parts 001, 002 and so on.

    If I really want to make this easier for myself, I need to find the regular expression that accepts pretty much that format and nothing else.

    The next step is to modify a class I found online that keeps the clipboard under constant monitoring. As soon as you copy anything, an ugly interface appears where you paste whatever you copied into whatever directory you choose. I just need it to watch for copied strings that match that format :D so it'll take a while to modify this 663-line class.
    Worst case, I just do what I do now: enter the address manually and let the program do the rest.
    By the way, is copying "letter" from MS Word treated differently from copying wwwdotrapidshare/files/023024234/hellodotrar from the address bar? Or are both treated as copying text, whatever the source?

    As for the illegal thing, I literally don't know what files you were talking about, which is why I asked. What are they? Or is this entire program bad? :O

    Yes, you have been helpful indeed, thanks.
    Last edited by wildheart; 04-03-2009 at 04:46 AM.

  9. #9
    OrangeDog is offline Senior Member

    You know, that whole downloading films from sharing sites thing.

    The API doc for Pattern (Java Platform SE 6) has a guide to regular expressions. The pattern \d{3} would match 3 consecutive digits (equivalent to [0-9][0-9][0-9]), but something that does this
    Java Code:
    href="(http://(www\.)?rapidshare\.com/files/\d+?/NAME/.+?)"
    is probably the best for making sure you have the right links. N.B. you have to escape every '\' and '"' when you embed this in a Java String literal.
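
To illustrate the escaping, here is a hedged sketch of a similar pattern written as a Java String literal: every `\` becomes `\\` and every `"` becomes `\"`. The `NAME` placeholder is replaced by `[^"/]+` (any run of characters up to the closing quote or a slash), which is an assumption about the file-name part rather than something stated in the thread. The class name `StrictLinkFinder` and the sample markup are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class StrictLinkFinder {

    // Plain regex:  href="(http://(www\.)?rapidshare\.com/files/\d+/[^"/]+)"
    // Group 1 is the whole link; group 2 is the optional "www." prefix.
    private static final Pattern LINK = Pattern.compile(
            "href=\"(http://(www\\.)?rapidshare\\.com/files/\\d+/[^\"/]+)\"");

    // Collects every link that has the /files/<digits>/<name> shape.
    static List<String> findLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = LINK.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample markup, including a split-file part ending in .001.
        String html = "<a href=\"http://www.rapidshare.com/files/023024234/hello.rar.001\">1</a>"
                + "<a href=\"http://rapidshare.com/about\">about</a>";
        for (String link : findLinks(html)) {
            System.out.println(link);
        }
    }
}
```

Adding the `(www\.)?` group does not change where the link ends up: it is still capture group 1, because groups are numbered by their opening parenthesis.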

  10. #10
    OrangeDog is offline Senior Member

    Not sure about the clipboard stuff, but java.awt.datatransfer is the package to look at.
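
As a starting point, a minimal sketch of reading text from the system clipboard with `java.awt.datatransfer`. It only reads the clipboard once; constant monitoring would need polling or a listener on top, which is not shown. The class name `ClipboardPeek`, both method names and the link check are assumptions made for the example, and the clipboard call needs a desktop session (it fails in a headless environment).

```java
import java.awt.Toolkit;
import java.awt.datatransfer.Clipboard;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.UnsupportedFlavorException;
import java.io.IOException;

// Hypothetical helper, not part of any library.
class ClipboardPeek {

    // Returns the clipboard contents as text, or null if no text is available.
    static String readClipboardText() throws IOException, UnsupportedFlavorException {
        Clipboard clipboard = Toolkit.getDefaultToolkit().getSystemClipboard();
        if (!clipboard.isDataFlavorAvailable(DataFlavor.stringFlavor)) {
            return null;
        }
        return (String) clipboard.getData(DataFlavor.stringFlavor);
    }

    // Assumed link shape: http(s)://, optional www., rapidshare.com/files/...
    static boolean looksLikeRapidshareLink(String text) {
        return text != null
                && text.trim().matches("https?://(www\\.)?rapidshare\\.com/files/\\S+");
    }

    public static void main(String[] args) throws Exception {
        String text = readClipboardText();
        if (looksLikeRapidshareLink(text)) {
            System.out.println("link: " + text.trim());
        } else {
            System.out.println("clipboard does not hold a matching link");
        }
    }
}
```

Text copied from a word processor and text copied from an address bar should both be readable through `DataFlavor.stringFlavor`, so the same check can be applied to either; the source application may simply offer additional richer formats alongside the plain text.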

  11. #11
    wildheart is offline Member

    Yeah, that's the one. OK, so now I've learned that if a website is blocked then my program won't work.
    Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: site'sname
    at sundotnetdotwwwdotprotocoldothttpdotHttpURLConnection.getInputStream(HttpURLConnection.java:1241)
    at rapidshare.main(rapidshare.java:10)

    In cases like this I'll just have to view the source manually.

    Now I need to get some sleep. It's already 8am! I'll come back here tonight, and hopefully my professor will have checked my code and given me some feedback.

  12. #12
    OrangeDog is offline Senior Member

    If the website's returning a 403 then presumably you can't look at it in a browser either (but it is possible that it is rejecting UserAgents it doesn't like).
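
If the User-Agent is the cause, one thing to try is sending a different one. The sketch below is only an illustration: the class name `UserAgentRequest`, the method `prepare` and the header value are made up, and whether a given server accepts the value is unknown. By default Java announces itself with a User-Agent of the form `Java/<version>`, which some servers reject.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of any library.
class UserAgentRequest {

    // Placeholder value; which strings a particular server accepts is not known.
    static final String USER_AGENT = "Mozilla/5.0 (compatible; ExampleFetcher/0.1)";

    // Prepares (but does not yet send) an HTTP request carrying the header.
    static HttpURLConnection prepare(String address) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(address).openConnection();
        connection.setRequestProperty("User-Agent", USER_AGENT);
        return connection;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java UserAgentRequest <http address>");
            return;
        }
        HttpURLConnection connection = prepare(args[0]);
        // getResponseCode() sends the request; 403 means the server still refused.
        System.out.println("HTTP " + connection.getResponseCode());
    }
}
```

The header has to be set before the request is sent, which is why `prepare` sets it right after `openConnection()` and before anything reads from the connection.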

    TIP: If you use the "Go Advanced" controls you can turn off link embedding. I imagine it should let you post class names like the one above - makes everything easier to read.

  13. #13
    wildheart is offline Member

    I lost interest >.< There are still some kinks I need to work out in my own code, which sucks. I did take a look at that huge class file I mentioned and took some code out of it, but this is going to be added to the ton of applications in my incomplete folder. I've worked on a lot of different applications (small and big) in a few different languages, yet most of the time I just stop. It's like I get the gist of it, so I stop and move on >.< Sorry.
    But thanks. You really did help.
    Perhaps I'll be back another time with a new project :D
