Thread: Word Frequency in News Articles
- 06-26-2012, 08:36 PM #1
Member
- Join Date
- Jun 2012
- Posts
- 9
- Rep Power
- 0
Word Frequency in News Articles
Hello everyone. I was wondering if anyone could point me in the right direction. I am trying to write a Java method that takes two arguments: a string, 'searchText', and an array of strings, 'keyWords'.
It will then open a URL connection to Google News or some other news website and perform a search using the given 'searchText' string. Then it will open the first 50 news articles returned and count the number of times each of the 'keyWords' strings shows up in each article.
I am fairly experienced with Java programming for local applications, but I have never tried to do anything extensive with web access.
Can anyone point me in the right direction or give me some pointers? I would really appreciate it.
- 06-26-2012, 08:46 PM #2
Re: Word Frequency in News Articles
Which part of this are you stuck on?
How to Ask Questions the Smart Way
Static Void Games - GameDev tutorials, free Java and JavaScript hosting!
Static Void Games forum - Come say hello!
- 06-26-2012, 09:01 PM #3
Member
- Join Date
- Jun 2012
- Posts
- 9
- Rep Power
- 0
Re: Word Frequency in News Articles
Basically, I feel capable of opening a URL connection to the website, but I'm not sure how to:
1. Execute the search
2. Open/download the contents of each returned web page
3. Isolate the main body of the article from the banners, sidebars, comments, etc.
Once I have the raw content from the articles, it should be a breeze.
Thanks for the reply and I really appreciate any help.
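For reference, the counting step described above (the part that happens once the raw article text is available) might look roughly like the sketch below. The class and method names are made up for illustration, and it assumes case-insensitive substring matching, so "art" would also count inside "article"; whole-word matching would need a different approach.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class KeywordCounter {

    // Counts non-overlapping, case-insensitive occurrences of each keyword
    // in the given text. Matching is by substring, not by whole word.
    static Map<String, Integer> countKeywords(String text, String[] keyWords) {
        String haystack = text.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String keyWord : keyWords) {
            String needle = keyWord.toLowerCase(Locale.ROOT);
            int count = 0;
            if (!needle.isEmpty()) {
                int from = 0;
                int idx;
                // Jump past each hit so matches never overlap.
                while ((idx = haystack.indexOf(needle, from)) >= 0) {
                    count++;
                    from = idx + needle.length();
                }
            }
            counts.put(keyWord, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        String article = "Java news: the Java release adds new features.";
        // prints {java=2, release=1, python=0}
        System.out.println(countKeywords(article, new String[] {"java", "release", "python"}));
    }
}
```

Running this once per article and keeping one map per article would give the per-article counts the original post asks for.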
- 06-26-2012, 10:51 PM #4
Member
- Join Date
- Jun 2012
- Posts
- 9
- Rep Power
- 0
Re: Word Frequency in News Articles
I believe this may belong in the 'new to java' forum, so I am going to move it there.
- 06-26-2012, 10:52 PM #5
Member
- Join Date
- Jun 2012
- Posts
- 9
- Rep Power
- 0
Word Frequency in News Articles
Moved from Forum - Learn Java - Java Code - Web Service.
- 06-26-2012, 11:08 PM #6
- Join Date
- Sep 2008
- Location
- Voorschoten, the Netherlands
- Posts
- 14,422
- Blog Entries
- 7
- Rep Power
- 29
Re: Word Frequency in News Articles
Given a URL object, you can get a URLConnection from it. That connection object can give you an ordinary InputStream from which you can read the page's (HTML) content. Filtering out the interesting part of the content differs per web site. Start by reading the API documentation for the mentioned classes.
kind regards,
Jos
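A minimal sketch of the URL, URLConnection, InputStream chain described in this reply might look like the following. The class name and the 10-second timeouts are illustrative choices, and the code assumes the page is UTF-8, which a real site may not be.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class PageFetcher {

    // Reads the whole response body of the given URL into a String.
    // Assumption: the content is UTF-8; a real page may declare another charset.
    static String fetch(URL url) throws IOException {
        URLConnection connection = url.openConnection();
        connection.setConnectTimeout(10_000); // milliseconds, arbitrary choice
        connection.setReadTimeout(10_000);
        try (InputStream in = connection.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: PageFetcher <url>");
            return;
        }
        System.out.println(fetch(new URL(args[0])));
    }
}
```

Some sites reject requests that carry no browser-like User-Agent header; if that turns out to be a problem, `connection.setRequestProperty("User-Agent", ...)` before `getInputStream()` is the usual place to set one.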
- 06-26-2012, 11:12 PM #7
Re: Word Frequency in News Articles
If you want to read an HTML page using a URL, create a URL pointing to the site, wrap the stream returned by its openStream() method in a BufferedReader, and read the content of the page returned by the site.
I don't know what parser to use to extract the content of an HTML page minus the tags.
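As a rough illustration of both points, the sketch below reads a page through openStream() wrapped in a BufferedReader and then strips the tags with regular expressions. The regex approach is a crude stand-in, not a real parser: it does not decode entities such as `&amp;` and will trip over malformed markup. A dedicated HTML parser library (jsoup is a commonly used one) is likely the sturdier option. The class and method names are made up for this example, and UTF-8 is assumed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class HtmlText {

    // Reads a page line by line via URL.openStream() wrapped in a BufferedReader.
    // Assumption: the page is UTF-8.
    static String readPage(URL url) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    // Crude tag stripper: drops script/style blocks and comments, then every
    // remaining tag, then collapses runs of whitespace. Entities are left as-is.
    static String stripTags(String html) {
        return html
                .replaceAll("(?is)<(script|style)\\b.*?</\\1\\s*>", " ")
                .replaceAll("(?s)<!--.*?-->", " ")
                .replaceAll("<[^>]*>", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello <b>world</b></p></body></html>";
        System.out.println(stripTags(html)); // prints Hello world
    }
}
```

Note that this removes markup only. Separating the article body from banners, sidebars, and comments still depends on each site's page structure.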
- 06-26-2012, 11:40 PM #8
Member
- Join Date
- Jun 2012
- Posts
- 9
- Rep Power
- 0
Re: Word Frequency in News Articles
Thanks for the replies. I am familiar with the URL, URLConnection, and BufferedReader classes. I am more concerned about how to execute the search (i.e., get the list of returned URLs) and how to parse the data from those URLs. Does anyone have any experience with this, or any ideas?
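One possible shape for the "execute the search and get the list of returned URLs" step is sketched below, under loud assumptions: the search endpoint `https://news.example.com/search?q=` is a placeholder, not a real service, and the link extraction is a simple regex over anchor tags rather than a proper HTML parse. A real news site's result page will need site-specific filtering, and its terms of use may restrict automated access.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {

    // Hypothetical endpoint, used only to show how the query string is built.
    private static final String SEARCH_BASE = "https://news.example.com/search?q=";

    // Matches href="..." or href='...' inside anchor tags; crude by design.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // URL-encodes the search text so spaces and symbols are safe in a query string.
    static String buildSearchUrl(String searchText) {
        try {
            return SEARCH_BASE + URLEncoder.encode(searchText, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Returns up to 'limit' absolute http(s) links found in the given HTML.
    static List<String> extractLinks(String html, int limit) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (links.size() < limit && m.find()) {
            String href = m.group(2);
            if (href.startsWith("http://") || href.startsWith("https://")) {
                links.add(href);
            }
        }
        return links;
    }
}
```

With these two pieces, the flow would be: build the search URL, download the result page, call `extractLinks(resultHtml, 50)`, then download each returned link in turn.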
- 06-27-2012, 12:39 AM #9
Moderator
- Join Date
- Jul 2010
- Location
- California
- Posts
- 1,638
- Rep Power
- 13
Re: Word Frequency in News Articles
Read about the structure of HTML code...
...and please be forthright when posting the same question to different forums.
Word Frequency in News Articles
- 06-27-2012, 02:12 AM #10
Senior Member
- Join Date
- Apr 2010
- Location
- Belgrade, Serbia
- Posts
- 278
- Rep Power
- 11
Re: Word Frequency in News Articles
Maybe this could help:
Many HTML pages contain forms — text fields and other GUI objects that let you enter data to send to the server. After you type in the required information and initiate the query by clicking a button, your Web browser writes the data to the URL over the network. At the other end the server receives the data, processes it, and then sends you a response, usually in the form of a new HTML page.
Many of these HTML forms use the HTTP POST METHOD to send data to the server. Thus writing to a URL is often called posting to a URL. The server recognizes the POST request and reads the data sent from the client...
Reading from and Writing to a URLConnection (The Java™ Tutorials > Custom Networking > Working with URLs)
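To make the quoted description concrete, here is a small sketch of posting form data with HttpURLConnection. The class name and field names are placeholders, and whether a given site's search form uses POST at all (many use a plain GET with a query string) has to be checked in that site's HTML.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

class FormPoster {

    // Builds an application/x-www-form-urlencoded body: key1=value1&key2=value2
    static String encodeForm(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(encode(field.getKey()) + "=" + encode(field.getValue()));
        }
        return body.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Writes the form fields to the URL as an HTTP POST and returns the status code.
    // The response body can then be read from connection.getInputStream().
    static int post(URL url, Map<String, String> fields) throws IOException {
        byte[] body = encodeForm(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true); // required before writing a request body
        connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = connection.getOutputStream()) {
            out.write(body);
        }
        return connection.getResponseCode();
    }
}
```

The field names to send are whatever the form's `<input name="...">` attributes say, which is where reading the page's HTML source comes in.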
- 06-27-2012, 05:32 AM #11
Member
- Join Date
- Jun 2012
- Posts
- 9
- Rep Power
- 0
Re: Word Frequency in News Articles
Thanks, cselic. I suspect this will be very helpful. I was kind of hoping to find a class someone had written specifically to tackle this problem. Something with a 'getURLsFromSearchString()' method.
Oh well, I guess I'll have to learn something...
Last edited by mblem22; 06-27-2012 at 05:40 AM. Reason: Got it.
- 06-27-2012, 02:58 PM #12
Re: Word Frequency in News Articles
What is wrong with posting it to two different forums? Are the two affiliated? Why the two URLs?
- 06-27-2012, 03:12 PM #13
Re: Word Frequency in News Articles
- 06-27-2012, 04:09 PM #14
Member
- Join Date
- Jun 2012
- Posts
- 9
- Rep Power
- 0