  1. #1
    mblem22 is offline Member
    Join Date
    Jun 2012
    Posts
    9
    Rep Power
    0

    Word Frequency in News Articles

    Hello everyone. I was wondering if anyone could point me in the right direction. I am trying to write a Java method that takes two arguments: a string, 'searchText', and an array of strings, 'keyWords'.

    It will then open a URL connection to Google News or some other news website and perform a search using the given 'searchText' string. Then, it will open the first 50 news articles returned and count the number of times each of the 'keyWords' strings shows up in each article.

    I am fairly experienced with Java programming for local applications, but I have never tried to do anything extensive with web access.

    Can anyone point me in the right direction or give me some pointers? I would really appreciate it.
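
The counting step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses plain, case-insensitive substring matching, and the names KeywordCounter and countKeywords are made up here, not taken from any library or from this thread.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class KeywordCounter {

    // Counts non-overlapping, case-insensitive occurrences of each keyword.
    // Assumption: substring matching, so "art" also matches inside "article".
    public static Map<String, Integer> countKeywords(String text, String[] keyWords) {
        String haystack = text.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String keyWord : keyWords) {
            String needle = keyWord.toLowerCase(Locale.ROOT);
            int count = 0;
            if (!needle.isEmpty()) {
                int from = haystack.indexOf(needle);
                while (from >= 0) {
                    count++;
                    // Continue searching after the end of the current match.
                    from = haystack.indexOf(needle, from + needle.length());
                }
            }
            counts.put(keyWord, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        String article = "Java news: Java 7 ships. The news is good.";
        System.out.println(countKeywords(article, new String[] {"java", "news", "python"}));
    }
}
```

If whole-word matches are wanted instead, a regular expression with word boundaries would be one alternative to indexOf.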

  2. #2
    KevinWorkman is offline Crazy Cat Lady
    Join Date
    Oct 2010
    Location
    Washington, DC
    Posts
    3,701
    Rep Power
    8

    Re: Word Frequency in News Articles

    Which part of this are you stuck on?
    How to Ask Questions the Smart Way
    Static Void Games - Play indie games, learn from game tutorials and source code, upload your own games!

  3. #3
    mblem22 is offline Member

    Re: Word Frequency in News Articles

    Basically, I feel capable of opening a URL connection to the website, but I'm not sure how to:

    1. Execute the search
    2. Open/download the contents of each returned web page
    3. Isolate the main body of the article from the banners, sidebars, comments, etc.

    Once I have the raw content from the articles, it should be a breeze.

    Thanks for the reply and I really appreciate any help.

  4. #4
    mblem22 is offline Member

    Re: Word Frequency in News Articles

    I believe this may belong in the 'new to java' forum, so I am going to move it there.

  5. #5
    mblem22 is offline Member

    Word Frequency in News Articles

    Moved from Forum - Learn Java - Java Code - Web Service. The question text repeats posts #1 and #3.

  6. #6
    JosAH is offline Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,019
    Blog Entries
    7
    Rep Power
    20

    Re: Word Frequency in News Articles

    Given a URL object, you can get a URLConnection from it. That connection object can give you an ordinary InputStream from which you can read its (HTML) content. Filtering out the interesting part of the content differs per web site. Start by reading the API documentation for the classes mentioned.
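
A minimal sketch of that chain (URL, then URLConnection, then InputStream) might look like this. The class name PageFetcher, the 10-second timeouts, and the UTF-8 decoding are assumptions for illustration; a real page may declare a different charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class PageFetcher {

    // Reads every byte from the stream and decodes it as UTF-8 (an assumption).
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // URL -> URLConnection -> InputStream, then read the raw content.
    public static String fetch(URL url) throws IOException {
        URLConnection connection = url.openConnection();
        connection.setConnectTimeout(10_000); // milliseconds, arbitrary choice
        connection.setReadTimeout(10_000);
        try (InputStream in = connection.getInputStream()) {
            return readAll(in);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PageFetcher <url>");
            return;
        }
        System.out.println(fetch(URI.create(args[0]).toURL()));
    }
}
```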

    kind regards,

    Jos
    cenosillicaphobia: the fear for an empty beer glass

  7. #7
    Norm is online now Moderator
    Join Date
    Jun 2008
    Location
    Eastern Florida
    Posts
    16,607
    Rep Power
    23

    Re: Word Frequency in News Articles

    If you want to read an HTML page using a URL, create a URL pointing to the site, wrap the stream returned by its openStream() method in an InputStreamReader and a BufferedReader, and read the content of the page returned by the site.

    I don't know what parser to use to extract the content of an HTML page minus the tags.
    If you don't understand my response, don't ignore it, ask a question.
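
As a rough sketch of the two steps in this post: readPage reads a page through openStream() and a BufferedReader, and stripTags removes markup with regular expressions. The class and method names are invented for this example, UTF-8 is assumed, and the regex approach is a crude stand-in for a real HTML parser: it does not decode entities such as &amp;amp; and can be fooled by unusual markup.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PageText {

    // Reads the page line by line via url.openStream() wrapped in a BufferedReader.
    public static String readPage(URL url) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    // Crude tag removal with regular expressions; not a real HTML parser.
    public static String stripTags(String html) {
        return html
                // Drop script and style blocks, including their contents.
                .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1\\s*>", " ")
                // Drop HTML comments.
                .replaceAll("(?s)<!--.*?-->", " ")
                // Drop all remaining tags.
                .replaceAll("(?s)<[^>]*>", " ")
                // Collapse runs of whitespace into single spaces.
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello <b>world</b></p></body></html>";
        System.out.println(stripTags(html));
    }
}
```

This removes tags from the whole page; isolating the article body from banners and sidebars would still need site-specific logic.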

  8. #8
    mblem22 is offline Member

    Re: Word Frequency in News Articles

    Thanks for the replies. I am familiar with the URL, URLConnection, and BufferedReader classes. I am more concerned about how to execute the search (i.e., get the list of returned URLs) and parse the data from those URLs. Does anyone have any experience with this, or any ideas?

  9. #9
    doWhile is offline Moderator
    Join Date
    Jul 2010
    Location
    California
    Posts
    1,642
    Rep Power
    6

    Re: Word Frequency in News Articles

    Read about the structure of HTML code...

    ...and please be forthright when posting the same question to different forums.
    Word Frequency in News Articles

  10. #10
    cselic is offline Senior Member
    Join Date
    Apr 2010
    Location
    Belgrade, Serbia
    Posts
    278
    Rep Power
    5

    Re: Word Frequency in News Articles

    Originally posted by mblem22:
    Thanks for the replies. I am familiar with the URL, URLConnection, and BufferedReader classes. I am more concerned about how to execute the search (i.e., get the list of returned URLs) and parse the data from those URLs. Does anyone have any experience with this, or any ideas?
    Maybe this could help:
    Many HTML pages contain forms — text fields and other GUI objects that let you enter data to send to the server. After you type in the required information and initiate the query by clicking a button, your Web browser writes the data to the URL over the network. At the other end the server receives the data, processes it, and then sends you a response, usually in the form of a new HTML page.

    Many of these HTML forms use the HTTP POST METHOD to send data to the server. Thus writing to a URL is often called posting to a URL. The server recognizes the POST request and reads the data sent from the client...

    Reading from and Writing to a URLConnection (The Java™ Tutorials > Custom Networking > Working with URLs)
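
A hedged sketch of "posting to a URL" as the quoted tutorial text describes it: encode the form fields, write them to the connection's output stream, then read the response. The class name FormPoster and the field names "q" and "n" are hypothetical; the real field names depend on the form of the site being searched.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormPoster {

    // Encodes fields as name=value pairs joined by '&' (application/x-www-form-urlencoded).
    public static String encodeForm(Map<String, String> fields) throws IOException {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                .append('=')
                .append(URLEncoder.encode(field.getValue(), "UTF-8"));
        }
        return body.toString();
    }

    // Writes the encoded form to the URL, then reads the server's response as UTF-8.
    public static String post(URL url, Map<String, String> fields) throws IOException {
        URLConnection connection = url.openConnection();
        connection.setDoOutput(true); // for an HTTP URL this makes the request a POST
        connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = connection.getOutputStream()) {
            out.write(encodeForm(fields).getBytes(StandardCharsets.UTF_8));
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream in = connection.getInputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("q", "java news"); // hypothetical field names
        fields.put("n", "50");
        System.out.println(encodeForm(fields));
    }
}
```

If the site's search form uses GET rather than POST, the same encoded string would instead be appended to the URL after a "?".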

  11. #11
    mblem22 is offline Member

    Re: Word Frequency in News Articles

    Thanks, cselic. I suspect this will be very helpful. I was kind of hoping to find a class someone had written specifically to tackle this problem. Something with a 'getURLsFromSearchString()' method.

    Oh well, I guess I'll have to learn something...
    Last edited by mblem22; 06-27-2012 at 04:40 AM. Reason: Got it.
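
No ready-made class with a 'getURLsFromSearchString()' method turned up in this thread, but the link-collecting half of such a method could be sketched as below. It assumes the search results page has already been downloaded as a string, that result links are absolute http(s) URLs in quoted href attributes, and that a regular expression is good enough. LinkExtractor and extractLinks are hypothetical names, and deciding which links are actual results remains site-specific.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {

    // Matches href="..." or href='...' inside <a ...> tags.
    // Assumption: attribute values are quoted.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns up to 'limit' distinct absolute http(s) links, in document order.
    public static List<String> extractLinks(String html, int limit) {
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find() && links.size() < limit) {
            String link = matcher.group(2);
            boolean absolute = link.startsWith("http://") || link.startsWith("https://");
            if (absolute && !links.contains(link)) {
                links.add(link);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://a.example/1\">A</a> <a href='/local'>B</a>";
        System.out.println(extractLinks(html, 50));
    }
}
```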

  12. #12
    Norm is online now Moderator

    Re: Word Frequency in News Articles

    What is wrong with posting it to two different forums? Are the two affiliated? Why the two URLs?
    We'd like to know if others have answered this question, so we don't waste time working on an answer to a question that has already been answered on another forum. Also, anyone else looking for a solution to this problem can see all the responses that have been posted for it on all the forums where it has been asked.

  13. #13
    KevinWorkman is offline Crazy Cat Lady

    Re: Word Frequency in News Articles

    Originally posted by mblem22:
    I believe this may belong in the 'new to java' forum, so I am going to move it there.
    Please don't post multiple copies of the same question in different forums. That just creates more work for the moderators and makes it harder to track what's already been said.

    I merged your duplicate posts.
