Thread: Query to group URLs together
- 07-01-2011, 12:57 PM #1
Member
- Join Date
- Nov 2010
- Location
- Johannesburg
- Posts
- 23
- Rep Power
- 0
Query to group URLs together
Hi,
I've developed a Java application that processes Squid proxy log files (which contain information such as the websites visited, the bandwidth consumed and the date) and stores the entries in a database. All of these fields are stored in the same table. Now I'm struggling to group the URLs by website so that I can get the total bandwidth for a particular site. Below is an example of what I mean:
Record 1- http://www.somesite.com/retgdf
Record 2- http://www.somesite.com/party
Record 3- http://www.somesite.com/tryfsg
My main interest here is the main website (in this case www.somesite.com). I want to build a query that recognises that all 3 records have www.somesite.com in common, discards the text after the host name (everything following the last forward slash in these examples), and returns www.somesite.com as a single record along with the sum of the bandwidth consumed by all 3 URLs (bandwidth for Record 1 + Record 2 + Record 3).
The main use of this query would be to allow me to determine the bandwidth consumption by each site as I explained above...
Thank you
- 07-01-2011, 04:26 PM #2
Moderator
- Join Date
- Jul 2010
- Location
- California
- Posts
- 1,604
- Rep Power
- 5
Depending upon what database you are using, you could query via a regular expression. However, this would be quite inefficient if your database is huge. An alternative would be to create a mapping table that stores each domain name and maps it to its entries, in which case you can query that table and then join across the mapping to retrieve the individual pages.
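To make the mapping-table idea concrete, here is a minimal Java sketch that builds the same domain-to-entries mapping in memory. It is only an illustration under assumptions: the class name DomainMapping and its methods are hypothetical, the host is taken with java.net.URI, and strings without a scheme (or that do not parse) are simply skipped.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups full URLs under their host name, mimicking
// a mapping table of domain name -> log entries.
final class DomainMapping {
    private DomainMapping() {}

    // Returns the lower-cased host of a URL, or null if it has no
    // parseable host (for example, a string with no scheme).
    static String hostOf(String url) {
        try {
            String host = new URI(url.trim()).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // Builds host -> list of URLs; entries without a host are skipped.
    static Map<String, List<String>> groupByHost(List<String> urls) {
        Map<String, List<String>> mapping = new TreeMap<>();
        for (String url : urls) {
            String host = hostOf(url);
            if (host != null) {
                mapping.computeIfAbsent(host, k -> new ArrayList<>()).add(url);
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://www.somesite.com/retgdf",
                "http://www.somesite.com/party",
                "http://www.somesite.com/tryfsg");
        // Prints each host once, followed by the URLs mapped to it.
        groupByHost(urls).forEach((host, pages) ->
                System.out.println(host + " -> " + pages));
    }
}
```

In a database the keys of this map would be the rows of the domain table and each list would be the rows joined to it.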
- 07-04-2011, 10:56 AM #3
Moderator
- Join Date
- Apr 2009
- Posts
- 10,438
- Rep Power
- 16
The latter option would be my preference.
Get the data into the db (if possible) in a format you can then query easily (and can be indexed).
Should be easy enough to have the URL Strings broken up.
Trying to query substrings is simply not efficient.
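As a rough sketch of breaking the URL strings up at load time, the Java below splits each URL into a host and a path (two values that could go into separate, indexable columns) and then sums the bytes per host. The class and method names are hypothetical, the byte counts in main are made-up sample values, and the in-memory sum only stands in for what a GROUP BY on a host column would do in the database.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: splits URLs before storage and totals bandwidth per host.
final class HostBandwidth {
    private HostBandwidth() {}

    // Splits a URL into {host, path}; returns null when no host can be parsed.
    static String[] split(String url) {
        try {
            URI uri = new URI(url.trim());
            if (uri.getHost() == null) {
                return null;
            }
            String path = uri.getRawPath() == null ? "" : uri.getRawPath();
            return new String[] { uri.getHost().toLowerCase(Locale.ROOT), path };
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // Sums bytes per host; urls[i] pairs with bytes[i]. With a separate
    // (hypothetical) host column, the database equivalent would be a
    // SELECT of host and SUM(bytes) with GROUP BY host.
    static Map<String, Long> totalBytesByHost(String[] urls, long[] bytes) {
        if (urls.length != bytes.length) {
            throw new IllegalArgumentException("urls and bytes must have the same length");
        }
        Map<String, Long> totals = new LinkedHashMap<>();
        for (int i = 0; i < urls.length; i++) {
            String[] parts = split(urls[i]);
            if (parts != null) {
                totals.merge(parts[0], bytes[i], Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.somesite.com/retgdf",
            "http://www.somesite.com/party",
            "http://www.somesite.com/tryfsg"
        };
        long[] bytes = { 100L, 250L, 50L }; // made-up sample values
        System.out.println(totalBytesByHost(urls, bytes));
    }
}
```

Doing the split once in the Java loader means the database never has to run substring or regular-expression matching on the full URL at query time.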
- 07-04-2011, 01:02 PM #4
Member
- Join Date
- Nov 2010
- Location
- Johannesburg
- Posts
- 23
- Rep Power
- 0
Hi doWhile,
Thank you very much for your response. The only problem is that I have a huge database with tons of records, so I may also have a relatively large number of domain names.
I don't know how I can build a query that can retrieve all the different domain names for me. I'd really appreciate your input on that.
Thanks...