Skip to content
This repository has been archived by the owner on Aug 8, 2024. It is now read-only.

amandasaurus/apache-log-parser

Repository files navigation

Parses log lines from an apache log file in (almost) any format possible

Build Status

Installation

pip install apache-log-parser

Usage

import apache_log_parser
line_parser = apache_log_parser.make_parser("%v %h %l %u %t \"%r\" %>s %b")

This creates & returns a function, line_parser, which accepts a line from an apache log file in that format, and will return the parsed values in a dictionary.

Example

>>> import apache_log_parser
>>> line_parser = apache_log_parser.make_parser("%h <<%P>> %t %Dus \"%r\" %>s %b  \"%{Referer}i\" \"%{User-Agent}i\" %l %u")
>>> log_line_data = line_parser('127.0.0.1 <<6113>> [16/Aug/2013:15:45:34 +0000] 1966093us "GET / HTTP/1.1" 200 3478  "https://example.com/" "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.18)" - -')
>>> from pprint import pprint
>>> pprint(log_line_data)
{'pid': '6113',
 'remote_host': '127.0.0.1',
 'remote_logname': '-',
 'remote_user': '',
 'request_first_line': 'GET / HTTP/1.1',
 'request_header_referer': 'https://example.com/',
 'request_header_user_agent': 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.18)',
 'request_header_user_agent__browser__family': 'Other',
 'request_header_user_agent__browser__version_string': '',
 'request_header_user_agent__is_mobile': False,
 'request_header_user_agent__os__family': 'Linux',
 'request_header_user_agent__os__version_string': '',
 'request_http_ver': '1.1',
 'request_method': 'GET',
 'request_url': '/',
 'response_bytes_clf': '3478',
 'status': '200',
 'time_received': '[16/Aug/2013:15:45:34 +0000]',
 'time_received_datetimeobj': datetime.datetime(2013, 8, 16, 15, 45, 34),
 'time_received_isoformat': '2013-08-16T15:45:34',
 'time_received_tz_datetimeobj': datetime.datetime(2013, 8, 16, 15, 45, 34, tzinfo='0000'),
 'time_received_tz_isoformat': '2013-08-16T15:45:34+00:00',
 'time_received_utc_datetimeobj': datetime.datetime(2013, 8, 16, 15, 45, 34, tzinfo='0000'),
 'time_received_utc_isoformat': '2013-08-16T15:45:34+00:00',
 'time_us': '1966093'}

There is a at least one key/value in the returned dictionary for each apache log placeholder. Some have more than one (e.g. all the time_received*).

The version numbers follow Semantic Versioning.

Supported values

    '%a'  #	Remote IP-address
    '%A'  #	Local IP-address
    '%B'  #	Size of response in bytes, excluding HTTP headers.
    '%b'  #	Size of response in bytes, excluding HTTP headers. In CLF format, i.e. a '-' rather than a 0 when no bytes are sent.
    '%D'  #	The time taken to serve the request, in microseconds.
    '%f'  #	Filename
    '%h'  #	Remote host
    '%H'  #	The request protocol
    '%k'  #	Number of keepalive requests handled on this connection. Interesting if KeepAlive is being used, so that, for example, a '1' means the first keepalive request after the initial one, '2' the second, etc...; otherwise this is always 0 (indicating the initial request). Available in versions 2.2.11 and later.
    '%l'  #	Remote logname (from identd, if supplied). This will return a dash unless mod_ident is present and IdentityCheck is set On.
    '%m'  #	The request method
    '%p'  #	The canonical port of the server serving the request
    '%P'  #	The process ID of the child that serviced the request.
    '%q'  #	The query string (prepended with a ? if a query string exists, otherwise an empty string)
    '%r'  #	First line of request
    '%R'  #	The handler generating the response (if any).
    '%s'  #	Status. For requests that got internally redirected, this is the status of the *original* request --- %>s for the last.
    '%t'  #	Time the request was received (standard english format)
    '%T'  #	The time taken to serve the request, in seconds.
    '%u'  #	Remote user (from auth; may be bogus if return status (%s) is 401)
    '%U'  #	The URL path requested, not including any query string.
    '%v'  #	The canonical ServerName of the server serving the request.
    '%V'  #	The server name according to the UseCanonicalName setting.
    '%X'  #	Connection status when response is completed:
              # X =	connection aborted before the response completed.
              # + =	connection may be kept alive after the response is sent.
              # - =	connection will be closed after the response is sent.
              # (This directive was %c in late versions of Apache 1.3, but this conflicted with the historical ssl %{var}c syntax.)
    '%I'  #	Bytes received, including request and headers, cannot be zero. You need to enable mod_logio to use this.
    '%O'  #	Bytes sent, including headers, cannot be zero. You need to enable mod_logio to use this.
    
    '%\{User-Agent\}i'  # Special case of below, for matching just user agent
    '%\{[^\}]+?\}i'  #	The contents of Foobar: header line(s) in the request sent to the server. Changes made by other modules (e.g. mod_headers) affect this. If you're interested in what the request header was prior to when most modules would have modified it, use mod_setenvif to copy the header into an internal environment variable and log that value with the %\{VARNAME}e described above.
    
    '%\{[^\}]+?\}C'  #	The contents of cookie Foobar in the request sent to the server. Only version 0 cookies are fully supported.
    '%\{[^\}]+?\}e'  #	The contents of the environment variable FOOBAR
    '%\{[^\}]+?\}n'  #	The contents of note Foobar from another module.
    '%\{[^\}]+?\}o'  #	The contents of Foobar: header line(s) in the reply.
    '%\{[^\}]+?\}p'  #	The canonical port of the server serving the request or the server's actual port or the client's actual port. Valid formats are canonical, local, or remote.
    '%\{[^\}]+?\}P'  #	The process ID or thread id of the child that serviced the request. Valid formats are pid, tid, and hextid. hextid requires APR 1.2.0 or higher.
    '%\{[^\}]+?\}t'  #	The time, in the form given by format, which should be in strftime(3) format. (potentially localized)
    '%\{[^\}]+?\}x'  # Extension value, e.g. mod_ssl protocol and cipher

Copyright

This package is © 2013-2015 Rory McCann, released under the terms of the GNU GPL v3 (or at your option a later version). If you'd like a different licence, please email [email protected]

Bitdeli Badge