  1. #1
    Unnel is offline Member
    Join Date
    Nov 2010
    Location
    Johannesburg
    Posts
    23
    Rep Power
    0

    Java application for processing Squid proxy logs

    Good day,

    I would like to know whether it's possible to develop a Java application that reads Squid proxy log files and stores the data they contain in a database. Thank you for any pointers... :)

  2. #2
    FON
    FON is offline Senior Member
    Join Date
    Dec 2009
    Location
    Belgrade, Serbia
    Posts
    366
    Rep Power
    6


    There are so many log file analysis tools around; are you sure you want to make your own app?

    squid : Optimising Web Delivery

    Maybe you can just pick a tool from that list
    and find a way to run it and store the results of its analysis in a DB without writing a whole app, if it meets your needs of course...

  3. #3
    Unnel is offline Member

    Thanks

    Thanks for your response, FON. I really appreciate it.

    I knew about the link you provided. Unfortunately, some of the applications listed there are not free or do not do exactly what I want. That's why I thought I'd develop my own application.

    I am really just looking for some 'how to' tips, assuming of course that Java can be used to develop such an application...

    Thanks :)

  4. #4
    FON is offline Senior Member

    OK, then maybe you should paste a piece of an example log
    and explain what it is that you want to read and store in the DB.

    Have you considered the basic type of app:
    more API-like, or just a simple jar that will be run manually from time to time?
    Any GUI, or just some command-prompt commands?
    Size of the app?
    Using any out-of-the-box tokenizers, parsers...?

  5. #5
    Unnel is offline Member

    Thanks

    Hi FON,

    Thanks again for your input. Below is an example of the data contained in the Squid logs I want to analyze:

    1222952697.113 1321 146.141.28.175 TCP_MISS/200 5757 GET http://skins.gmodules.com/ig/skin_xml_to_css? STUDENTS\300160 DIRECT/209.85.129.99 text/css
    1222952699.681 420 146.141.28.175 TCP_MISS/200 486 GET http://skins.gmodules.com/ig/skin_fetch? STUDENTS\300160 DIRECT/209.85.129.147 image/gif
    1222952703.692 4909 146.141.28.175 TCP_MISS/200 63962 GET http://skins.gmodules.com/ig/skin_fetch? STUDENTS\300160 DIRECT/209.85.129.99 image/jpeg

    These are just a few lines... :) Basically, each line stores the time (UNIX time), the site the person visited, etc. I want to read these files and create a database that stores this data in a more intelligible format. These are just a few lines from probably one hour of Internet browsing, and I'm looking at processing Squid logs covering months of Internet browsing for quite a good number of individuals...
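As an illustration of the "more intelligible format" mentioned above, here is a minimal Java sketch that converts the first field of a log line (seconds since the Unix epoch, with milliseconds) into a readable UTC date. The class name `SquidTimestamp`, its method names, and the output pattern are assumptions made for this example; they do not come from Squid or from the thread.

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: turns a Squid timestamp field into a readable date. */
final class SquidTimestamp {
    // Output pattern and UTC zone are choices made for this sketch.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS").withZone(ZoneOffset.UTC);

    /** Parses a field such as "1222952697.113" (seconds.milliseconds). */
    static Instant parse(String field) {
        // BigDecimal avoids the rounding errors a double could introduce.
        long millis = new BigDecimal(field.trim()).movePointRight(3).longValue();
        return Instant.ofEpochMilli(millis);
    }

    /** Formats the timestamp field as a UTC date and time. */
    static String toReadable(String field) {
        return FORMAT.format(parse(field));
    }

    public static void main(String[] args) {
        System.out.println(toReadable("1222952697.113"));
    }
}
```

For the first sample line, `SquidTimestamp.toReadable("1222952697.113")` gives `2008-10-02 13:04:57.113` in UTC. A local time zone could be substituted for `ZoneOffset.UTC` if that suits the analysis better.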

  6. #6
    FON is offline Senior Member

    I'm not familiar with Squid, but I found this:

    "Using access_log tag you specify where to log, and the format to log in. "


    "access_log :

    These files log client request activities. Has a line every HTTP or
    ICP request. The format is:
    access_log <module>:<place> [<logformat name> [acl acl ...]]
    access_log none [acl acl ...]"
    So do you need to write your own parser/tokenizer that fully understands this format, and then decide what to store using some config params of your own?

    And please answer all of my questions from the previous post if you can, so I can get a picture of your application.

  7. #7
    Unnel is offline Member

    Hi FON,

    Yes!! You have the picture of what I basically want to do. I want to write my own parser that extracts only the information I need: the website visited, the time it was visited, and the bandwidth (amount of data) requested. To answer your previous questions: I want to develop an application that is more API-like, and a command prompt will do for me because I unfortunately don't have experience in graphical user interface design...

    Thanks a lot for your feedback. I really appreciate your input...

  8. #8
    FON is offline Senior Member

    OK then, to begin with, make sure you fully understand the format of each line in the Squid log file.

    Almost every logging API has some config XML file.
    Using params, you can define what to log.

    There can be many params, and each one has a different meaning, like these (from some Apache docs):
    %a Remote IP-address
    %T The time taken to serve the request, in seconds.
    ...

    You decide to use just some of them.

    And there are rules for how each log line is created, like:
    "fields are separated by spaces, and delimited by ; "

    Once you have studied the line format, you can start creating the parser. A parser can be written in many ways: maybe you can use regular expressions, or maybe simply tokenize each line based on a few simple rules. Choose whichever approach you are familiar with, and post your progress and any questions.
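To make the regular-expression option concrete, here is a minimal Java sketch that matches one line of the kind Unnel pasted earlier. The field order is inferred from those sample lines only, and the class name `SquidLogEntry` and its field names are hypothetical; a custom `logformat` in squid.conf would need a different pattern.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical value class for one access-log line in the layout shown in this thread. */
final class SquidLogEntry {
    // Assumed order: time elapsed client result/status bytes method url user hierarchy/peer type
    private static final Pattern LINE = Pattern.compile(
            "^\\s*(\\d+\\.\\d+)\\s+(\\d+)\\s+(\\S+)\\s+(\\w+)/(\\d+)\\s+(\\d+)"
            + "\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\w+)/(\\S+)\\s+(\\S+)\\s*$");

    final String timestamp;   // Unix time, e.g. "1222952697.113"
    final long elapsedMs;     // time taken to serve the request
    final String clientIp;
    final String resultCode;  // e.g. "TCP_MISS"
    final int httpStatus;     // e.g. 200
    final long bytes;         // amount of data sent to the client
    final String method;
    final String url;
    final String user;
    final String contentType;

    private SquidLogEntry(Matcher m) {
        timestamp = m.group(1);
        elapsedMs = Long.parseLong(m.group(2));
        clientIp = m.group(3);
        resultCode = m.group(4);
        httpStatus = Integer.parseInt(m.group(5));
        bytes = Long.parseLong(m.group(6));
        method = m.group(7);
        url = m.group(8);
        user = m.group(9);
        contentType = m.group(12); // groups 10 and 11 (hierarchy/peer) are skipped here
    }

    /** Returns the parsed entry, or empty if the line does not fit the assumed layout. */
    static Optional<SquidLogEntry> parse(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? Optional.of(new SquidLogEntry(m)) : Optional.empty();
    }
}
```

Returning an empty `Optional` for lines that do not match lets the caller count or log malformed lines instead of aborting a run over months of logs.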

    good luck!

  9. #9
    Unnel is offline Member

    Thank you so much, FON. You are a star. I truly appreciate your valuable input. I'm actually doing this as part of a research project. I'm currently busy with another aspect of it and will get to the Squid proxy log analysis at a later stage, but pretty soon. I will keep you posted on my progress and let you know if I have further questions.

    Cheers!

  10. #10
    Unnel is offline Member

    Hi FON,

    I've been having difficulty extracting particular data from the lines in the text file that I'm reading. Because not all the data in a line is relevant to me, I've been trying to extract only the fields that are important. If you look at the example pasted below, you'll see that the fields are delimited by spaces. I've been trying to read the lines and write the relevant elements to a new file, using the space character as the delimiter in my parser, but unfortunately I'm not getting what I want.

    For example, I want to write a loop that extracts, for each line, the Unix date (in the line below, 1222952697.113), the usage (5757) and the address (http://skins.gmodules.com....). These are the only fields I need to write to the new file, but the program that I wrote won't read them. Any hints?

    Thanks in advance,

    1222952697.113 1321 146.141.28.175 TCP_MISS/200 5757 GET http://skins.gmodules.com/ig/skin_xml_to_css? STUDENTS\300160 DIRECT/209.85.129.99 text/css
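The program in question was not posted, so the cause of the problem is unknown. One thing worth checking, as a guess only: Squid pads some columns with extra spaces, and splitting on a single space then yields empty tokens that shift the field positions. The sketch below splits on runs of whitespace instead and keeps fields 0, 4 and 6 (time, bytes, URL), with positions taken from the sample line above. The class name `SquidFieldExtractor`, its methods, and the tab-separated output are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical tokenizer-based extractor for the three fields of interest. */
final class SquidFieldExtractor {

    /** Returns "time<TAB>bytes<TAB>url", or null if the line has too few fields. */
    static String extract(String line) {
        // "\\s+" treats any run of spaces or tabs as a single delimiter.
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 7) {
            return null;
        }
        return fields[0] + "\t" + fields[4] + "\t" + fields[6];
    }

    /** Reads the log line by line and writes the extracted fields to a new file. */
    static void convert(Path in, Path out) throws IOException {
        // ISO-8859-1 accepts every byte value, so odd bytes in URLs cannot abort the read.
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.ISO_8859_1);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String row = extract(line);
                if (row != null) {   // skip blank or truncated lines
                    writer.write(row);
                    writer.newLine();
                }
            }
        }
    }
}
```

Reading one line at a time keeps memory use constant, which matters for logs that span months. A call could look like `SquidFieldExtractor.convert(Paths.get("access.log"), Paths.get("extracted.tsv"))`, where both file names are placeholders.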

  11. #11
    vishnu22001 is offline Member
    Join Date
    Mar 2011
    Location
    Sriharikota
    Posts
    3
    Rep Power
    0


    Unnel, this is vishnu...

    I am also working on the same project.

    I am developing in Java and using MySQL as the database.
    I have written the code for inserting values into the database, but I am having some difficulties...
    Can anyone help me out?
    Last edited by vishnu22001; 03-04-2011 at 09:26 PM.
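Since the insert code itself was not posted, the following is only a generic sketch of batching rows into a database with JDBC's `PreparedStatement`. The table `squid_access`, its columns, and the class name `SquidLogDao` are hypothetical, and each row is assumed to be a `{time, bytes, url}` array taken from a log line. Opening the `Connection` is left to the caller; for MySQL that requires the MySQL JDBC driver on the classpath.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.List;

/** Hypothetical data-access helper; table and column names are assumptions. */
final class SquidLogDao {
    static final String INSERT_SQL =
            "INSERT INTO squid_access (visited_at, bytes, url) VALUES (?, ?, ?)";

    /** Inserts rows of {unixTime, bytes, url} in one batch and returns the batch size. */
    static int insertAll(Connection conn, List<String[]> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (String[] row : rows) {
                // "1222952697.113" seconds -> 1222952697113 milliseconds
                long millis = new BigDecimal(row[0]).movePointRight(3).longValue();
                ps.setTimestamp(1, new Timestamp(millis));
                ps.setLong(2, Long.parseLong(row[1]));
                ps.setString(3, row[2]); // parameters are bound, never concatenated into SQL
                ps.addBatch();
            }
            return ps.executeBatch().length;
        }
    }
}
```

Binding values through `?` placeholders avoids quoting problems with URLs that contain apostrophes or other special characters, and batching reduces round trips when loading many lines.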

