  1. #1
    nickrowe_2k is offline Member
    Join Date
    May 2010
    Location
    Buckinghamshire
    Posts
    77
    Rep Power
    0

    Capturing Data From an HTML File

    Hi guys, I'm new to Java programming and my boss has asked me to do something that's puzzling me. I know the general approach from standard programming principles, but having never used Java I'm unsure how to go about it.

    Basically I need to read an HTML file into a Java program and extract the dynamichtml resource names from it (I'm guessing by using a wildcard). Once I have extracted them, I want to put them in an array container and sort them alphabetically before outputting them.

    If anyone can help me I'd be really grateful; it's giving me a hard time.
    Regards, Nick :)
    Last edited by nickrowe_2k; 05-19-2010 at 04:13 PM. Reason: update

  2. #2
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,224
    Rep Power
    20

    Default

    What have you done so far?

    And tell your boss that getting people to do work on Java who know nothing about Java without even giving them a training course is out and out silly.

  3. #3
    nickrowe_2k is offline Member
    Join Date
    May 2010
    Location
    Buckinghamshire
    Posts
    77
    Rep Power
    0

    Default

    So far I have a reader which I found online and an array which I created using a tutorial.

    I realise what I have isn't much to go on. With the programming knowledge I gained at uni I know what path I need to follow; I just don't have a clue how to go about it in Java. I wish I could just do it in JavaScript lol.

    Basically this is driving me mad and I agree it's silly. Within the HTML file I have, there are many references to resources such as dynamichtml ........name...
    I need to capture the names of all these instances and store them in an array until the reader/buffer reaches the end of the document. When all the instances have been collected I need to sort them alphabetically and print them out.

    I know I need some kind of autoarray, but so far this is all I've been able to find as a base. It's ridiculous, but when you're a junior and are getting thrown this stuff, what do you do? He's sort of a put-you-in-an-office-on-your-own-and-figure-it-out bloke lol.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    class ReadHtml {
        public static void main(String[] args) throws IOException {
            // Backslashes must be escaped inside a Java string literal.
            String path = "C:\\Documents and Settings\\Kieren McDonald\\Desktop\\Nick\\Java\\my.html";

            // try-with-resources closes the reader automatically when the block exits.
            try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
                String line;
                // readLine() returns null once the end of the file is reached.
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }

    class Array {
        public static void main(String[] args) {
            String[] anArray; // declares an array of strings

            anArray = new String[3]; // allocates memory for 3 strings

            anArray[0] = "dynamichtml TimelineManager_top_links"; // initialize first element
            anArray[1] = "dynamichtml TimelineManager_footer"; // initialize second element
            anArray[2] = "dynamichtml TimelineManager_quicksearch_form"; // etc.

            System.out.println("Resource Name 0: " + anArray[0]);
            System.out.println("Resource Name 1: " + anArray[1]);
            System.out.println("Resource Name 2: " + anArray[2]);
        }
    }
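
    The "autoarray" mentioned in this post corresponds roughly to java.util.ArrayList, which grows as elements are added, so the number of resources does not need to be known up front. The following is only a minimal sketch: the class name ResourceNames and the helper sorted() are made up for illustration, and the three names are the ones used in the array example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ResourceNames {
    // Hypothetical helper: returns a new, alphabetically sorted copy of the names.
    public static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy); // natural String order, which is case-sensitive
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(); // grows as elements are added
        names.add("TimelineManager_top_links");
        names.add("TimelineManager_footer");
        names.add("TimelineManager_quicksearch_form");

        for (String name : sorted(names)) {
            System.out.println("Resource Name: " + name);
        }
    }
}
```

    Collections.sort uses case-sensitive ordering by default; passing String.CASE_INSENSITIVE_ORDER as a second argument is one option if mixed-case names should sort together.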

  4. #4
    JosAH is offline Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,765
    Blog Entries
    7
    Rep Power
    21

    Default

    You have a lot to read: start with the API documentation for every class whose name starts with HTML, and read about the parser that does the job: the DocumentParser class. When such an object parses the HTML text it does all the nitty-gritty work for you and calls an HTMLEditorKit.ParserCallback object for everything interesting it finds. You have to write that callback class by (preferably?) extending the HTMLEditorKit.ParserCallback class.

    It may seem confusing at first but the entire scenario resembles the SAX parser approach (for XML).

    kind regards,

    Jos
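
    As a rough illustration of the callback approach described above, here is a small sketch that lists the tag names the Swing HTML parser reports. It goes through ParserDelegator, which drives a DocumentParser internally, rather than constructing a DocumentParser directly. The class name TagLister and the sample markup are assumptions for the example, and how the parser treats non-standard markers such as <@ ... @> is not covered here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TagLister {
    // Collects the name of every start tag and simple (empty) tag the parser reports.
    public static List<String> tagNames(String html) {
        final List<String> names = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                names.add(tag.toString());
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                names.add(tag.toString());
            }
        };
        try {
            // The final argument tells the parser to ignore charset directives.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "<html><body><form name=\"QUICK_SEARCHFORM\">"
                + "<input type=\"hidden\" name=\"QueryText\"></form></body></html>";
        System.out.println(tagNames(html));
    }
}
```

    The parser can also report tags it considers implied, so the printed list may contain more entries than appear literally in the input.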

  5. #5
    Norm is offline Moderator
    Join Date
    Jun 2008
    Location
    Eastern Florida
    Posts
    17,882
    Rep Power
    25

    Default

    dynamic html resource text references
    Could you define what it is you are trying to extract from the HTML page, perhaps with some examples?

  6. #6
    nickrowe_2k is offline Member
    Join Date
    May 2010
    Location
    Buckinghamshire
    Posts
    77
    Rep Power
    0

    Default Explanation

    OK, basically I have an HTML file which contains several resource references to components used on a content management system.

    Using my Java program I need to extract the names of these references, which all start with dynamichtml (followed by their name, e.g. quicksearch_form).

    So my Java program needs to read in the HTML file and, I'm guessing, use a wildcard that searches for whatever comes after dynamichtml and then stores it in a slot within an array. Once an instance has been stored, the process needs to loop until the end of the document. Once the document has been read, the array needs to be sorted alphabetically and then output.

    This is so that in component wizard the resources appear alphabetically instead of in the order they appear within the HTML file.

  7. #7
    Norm is offline Moderator
    Join Date
    Jun 2008
    Location
    Eastern Florida
    Posts
    17,882
    Rep Power
    25

    Default

    Sorry, my question is: What do you mean by dynamichtml? Is that javascript or ???
    And what are the resources? URLs or ???

    references which all start with dynamichtml
    Or is "dynamichtml" the prefix for some variable or data or what?

  8. #8
    nickrowe_2k is offline Member
    Join Date
    May 2010
    Location
    Buckinghamshire
    Posts
    77
    Rep Power
    0

    Smile

    Below is an example of a resource I have pasted in from the body of the HTML file.
    :)



    <@dynamichtml TimelineManager_quicksearch_form@>
    <div>
    <form name="QUICK_SEARCHFORM" method="GET" action="<$ssNodeLink(80013)$>" style="PADDING-RIGHT: 0px; DISPLAY: inline; PADDING-LEFT: 0px; PADDING-BOTTOM: 0px; MARGIN: 0px; PADDING-TOP: 0px">
    <input type=hidden name="QueryText" value="">
    <input type=hidden name="ResultCount" value="<$AdvancedSearch_ResultCount$>">
    <input class="searchField" accesskey="f" size="16" name="searchStr">&nbsp;
    <input class="searchButton" accesskey="g" type="button" value="Search" onclick="QuickSearch_BuildQueryTextAndSubmit()">
    </form>
    </div>
    <@end@>

    kind regards Nick :)

  9. #9
    Norm is offline Moderator
    Join Date
    Jun 2008
    Location
    Eastern Florida
    Posts
    17,882
    Rep Power
    25

    Default

    Thanks.
    Who/what processes the <@ ... @> tags? And the <$...$>

  10. #10
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,224
    Rep Power
    20

    Default

    I suspect the HTML parser stuff Jos mentioned won't handle that terribly well.
    I could be wrong, though.

    If, as asked by Norm, you knew what did the initial processing of these you might be able to nick the code that searches for these things? That would be half the battle sorted out...

  11. #11
    nickrowe_2k is offline Member
    Join Date
    May 2010
    Location
    Buckinghamshire
    Posts
    77
    Rep Power
    0

    Default

    I believe the tags are processed by the component wizard installed with the UCM, but my boss has explained that it can all be done via a Java program.

    What I don't understand is that, one, if I create an array for each resource in JavaScript rather than Java it's quicker, and two, I can easily go into each resource and manually arrange them so that they output in the same way anyway.

    The idea is that the Java program sees an instance of a resource, captures it, stores it, and then outputs them all in ascending order.

    Obviously this would be quicker once the program is completed, but having not worked with Java it's like a slap in the face. I figured that by using the reader to read the HTML file and convert it to a string I could simply look for an instance of dynamichtml as a string using some sort of wildcard.

    hope that makes sense :)
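
    The idea in this post (read the file into a string, then look for dynamichtml) could be sketched roughly as below. This only counts the markers, it does not extract names. The class name HtmlAsString and both helper methods are hypothetical, and UTF-8 is an assumed file encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HtmlAsString {
    // Reads the whole file into one String. UTF-8 is an assumption about the file.
    public static String readAll(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Counts non-overlapping occurrences of marker in text.
    public static int countOccurrences(String text, String marker) {
        if (marker.isEmpty()) {
            return 0; // an empty marker would otherwise never advance the search
        }
        int count = 0;
        int from = text.indexOf(marker);
        while (from >= 0) {
            count++;
            from = text.indexOf(marker, from + marker.length());
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HtmlAsString <file>");
            return;
        }
        String text = readAll(Paths.get(args[0]));
        System.out.println(countOccurrences(text, "<@dynamichtml") + " resource tag(s) found");
    }
}
```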

  12. #12
    nickrowe_2k is offline Member
    Join Date
    May 2010
    Location
    Buckinghamshire
    Posts
    77
    Rep Power
    0

    Default

    Hi Tolls, Norm

    Yeah, initially I thought the same. However, when I opened up the component wizard, looked at a resource file and then opened up the Java tab, there was nothing in there. I thought if I could read the code I could just modify it a little, but unfortunately no luck.

    Good idea though

  13. #13
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,224
    Rep Power
    20

    Default

    Bang goes my favourite trick...:)

  14. #14
    nickrowe_2k is offline Member
    Join Date
    May 2010
    Location
    Buckinghamshire
    Posts
    77
    Rep Power
    0

    Default

    Haha, yeah, mine too actually, sooooo much easier.
    Having not worked with Java before, I knew I'd at least be able to modify some existing code and find some bits and pieces on the net to guide me the rest of the way. Unfortunately I can't find ANY tutorials or examples of what I am trying to do.

    I simply can't believe that no one has attempted to take several occurrences of a string from an HTML file and bring them over to be sorted and output in a Java program. I would have thought that this type of thing would be common.

    I mean, if a reader converts EVERYTHING into a string then why can't I just search for a string of text referencing dynamichtml? Even if it doesn't bring up the variable resource it should still bring back the resource name, shouldn't it? Or is that just an uneducated answer to my Java problem?

  15. #15
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,224
    Rep Power
    20

    Default

    So you just want the <@dynamichtml blahblahblah@> bit?
    Not the stuff after it to the <@end@> part?

  16. #16
    nickrowe_2k is offline Member
    Join Date
    May 2010
    Location
    Buckinghamshire
    Posts
    77
    Rep Power
    0

    Default

    Yep, I only need the name(s), and to sort them alphabetically once I have them.
    I can't search for the names EXACTLY as they are different for each htm file. I'm simply working on the one at the moment. So I need to write some code that will extract the name AFTER dynamichtml and sort it. That's all :)
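
    One way to express "the name AFTER dynamichtml" is a regular expression with a capture group. This is only a sketch: it assumes every tag has the shape <@dynamichtml NAME@> with no whitespace or '@' inside the name, as in the sample pasted earlier in the thread, and the class name DynamicHtmlRegex is made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DynamicHtmlRegex {
    // Assumed tag shape: <@dynamichtml NAME@>; group 1 captures NAME.
    private static final Pattern TAG =
            Pattern.compile("<@dynamichtml\\s+([^\\s@]+)\\s*@>");

    // Returns every captured name in the text, sorted alphabetically.
    public static List<String> extractSorted(String text) {
        List<String> names = new ArrayList<>();
        Matcher matcher = TAG.matcher(text);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        String sample = "<@dynamichtml TimelineManager_top_links@>\n<div></div>\n<@end@>\n"
                + "<@dynamichtml TimelineManager_footer@>\n<div></div>\n<@end@>\n";
        for (String name : extractSorted(sample)) {
            System.out.println(name);
        }
    }
}
```

    Because the matcher scans the whole text, this also copes with more than one tag on a line.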

  17. #17
    nickrowe_2k is offline Member
    Join Date
    May 2010
    Location
    Buckinghamshire
    Posts
    77
    Rep Power
    0

    Default

    Dude, if you can help you're getting the biggest hi-5 of your life and definitely a gold star, possibly a chocolate biscuit and the title of absolute legend to go with it :)

  18. #18
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,224
    Rep Power
    20

    Default

    OK, re-reading your OP...

    Read the strings in.
    You can use indexOf() (I think that's the method name) to see if "<@dynamichtml" is in there... store the number.
    (Someone might have a funkier regex for hunting these down, but I'm just doing a brute force thing.)

    So you now know the index of the '<' character, which means you can offset to the bit you're interested in and run to the indexOf("@>"), assuming you don't have more than one of these things on a line.

    Store that resulting string.
    That is, use substring().

    Here's the String API, which has all this stuff.

    That'll allow you to populate your ArrayList.
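
    The brute-force steps above might look something like the sketch below. It makes the same assumption that each line holds at most one tag. The class name DynamicHtmlIndexOf and the helper names are made up for illustration, and the sample lines come from the names used earlier in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DynamicHtmlIndexOf {
    private static final String START = "<@dynamichtml";
    private static final String END = "@>";

    // Returns the resource name found on this line, or null if there is none.
    public static String nameIn(String line) {
        int start = line.indexOf(START);
        if (start < 0) {
            return null; // no opening marker on this line
        }
        int nameStart = start + START.length(); // offset past the marker
        int end = line.indexOf(END, nameStart);
        if (end < 0) {
            return null; // opening marker without a closing "@>"
        }
        String name = line.substring(nameStart, end).trim();
        return name.isEmpty() ? null : name;
    }

    // Collects the names from all lines into an ArrayList and sorts them.
    public static List<String> extractSorted(List<String> lines) {
        List<String> names = new ArrayList<>();
        for (String line : lines) {
            String name = nameIn(line);
            if (name != null) {
                names.add(name);
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "<@dynamichtml TimelineManager_top_links@>",
                "<div>",
                "<@end@>",
                "<@dynamichtml TimelineManager_footer@>");
        for (String name : extractSorted(lines)) {
            System.out.println(name);
        }
    }
}
```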

  19. #19
    Norm is offline Moderator
    Join Date
    Jun 2008
    Location
    Eastern Florida
    Posts
    17,882
    Rep Power
    25

    Default

    How does the data being in an HTML file affect the project? It looks like this is just a String search project.

  20. #20
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,224
    Rep Power
    20

    Default

    Yep, that's what it seems.
    I thought initially it might have been html tags they were looking for, but I misread that.
