  1. #1
    Unnel

    Query to group URLs together

    Hi,

    I've developed a Java application that processes Squid proxy log files (which contain information such as the websites visited, the bandwidth consumed and the date) and stores the entries in a database. All of these fields are stored in the same table. Now I'm struggling to group the records by website so that I can get the total bandwidth for a particular site. Below is an example of what I mean:

    Record 1- http://www.somesite.com/retgdf

    Record 2- http://www.somesite.com/party

    Record 3- http://www.somesite.com/tryfsg


    My main interest here is the main website (in this case www.somesite.com). I want to build a query that recognises that all 3 records have www.somesite.com in common, discards everything from the first forward slash after the host name onwards, and returns www.somesite.com as a single record along with the sum of the bandwidth consumed by all 3 URLs (that is, the bandwidth for Record 1 + Record 2 + Record 3).

    This query would let me determine the bandwidth consumed by each site.

    Thank you

  2. #2
    doWhile

    Depending upon what database you are using, you could query via a regular expression. However, this would be quite inefficient if your database is huge. An alternative would be to create a mapping table that stores each domain name once and maps it to the log entries for that domain; you can then query that table and join across the mapping to retrieve the individual pages.
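
    To make the mapping idea a bit more concrete, here is a rough in-memory Java sketch rather than actual SQL. It assumes the domain has already been extracted from each URL, and the class name DomainMapping and its methods are made up for illustration. Each distinct domain gets an id the first time it is seen (the equivalent of a row in a domain table), and the byte totals are keyed by that id:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a domain table plus per-domain totals.
final class DomainMapping {
    // domain name -> id, like a "domain" table with a unique name column
    private final Map<String, Integer> domainIds = new LinkedHashMap<>();
    // domain id -> total bytes, like SUM(bytes) grouped by the foreign key
    private final Map<Integer, Long> bytesByDomainId = new LinkedHashMap<>();

    // Returns the id for a domain, assigning a new one the first time it is seen.
    int idFor(String domain) {
        Integer id = domainIds.get(domain);
        if (id == null) {
            id = domainIds.size() + 1;
            domainIds.put(domain, id);
        }
        return id;
    }

    // Adds the bytes of one log entry to the running total of its domain.
    void record(String domain, long bytes) {
        bytesByDomainId.merge(idFor(domain), bytes, Long::sum);
    }

    // Resolves ids back to names, like joining the totals to the domain table.
    Map<String, Long> totals() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : domainIds.entrySet()) {
            result.put(e.getKey(), bytesByDomainId.getOrDefault(e.getValue(), 0L));
        }
        return result;
    }
}
```

    In a real database the two maps would presumably become a domain table and a foreign key on the log table, and totals() would become a join with SUM and GROUP BY. The keys of the first map are also the list of distinct domain names.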

  3. #3
    Tolls

    The latter would be my preference.
    If possible, get the data into the db in a format that you can query easily and that can be indexed.
    It should be easy enough to break the URL strings up before storing them.

    Trying to query substrings is simply not efficient.
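
    For breaking the URLs up at load time, something along these lines might do, using java.net.URI from the standard library. HostExtractor and hostOf are hypothetical names, and the handling of entries without a scheme (such as host:port) is an assumption about what the log may contain:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper: pulls the host name out of a logged URL.
final class HostExtractor {
    // Returns the lower-cased host, or null when no host can be determined.
    static String hostOf(String url) {
        if (url == null || url.trim().isEmpty()) {
            return null;
        }
        String candidate = url.trim();
        // Assumption: entries without a scheme look like host:port,
        // so a scheme is prepended to let URI parse them.
        if (!candidate.contains("://")) {
            candidate = "http://" + candidate;
        }
        try {
            String host = new URI(candidate).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            // Malformed URLs are skipped here; the loader could log them instead.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(hostOf("http://www.somesite.com/retgdf"));
        System.out.println(hostOf("http://www.somesite.com/party"));
    }
}
```

    The result can then go into its own indexed column (or a separate domain table) when each log line is inserted, so the grouping query never has to look inside the URL string.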

  4. #4
    Unnel

    Originally posted by doWhile:
    Depending upon what database you are using, you could query via a regular expression. However, this would be quite inefficient if your database is huge. An alternative would be to create a mapping table that stores each domain name once and maps it to the log entries for that domain; you can then query that table and join across the mapping to retrieve the individual pages.

    Hi doWhile,

    Thank you very much for your response. The only problem is that I have a huge database with tons of records, so I may also have a large number of distinct domain names.
    I don't know how to build a query that retrieves all the different domain names for me. I'd really appreciate your input on that.

    Thanks...
