NGinx access log conversion to CSV format #6

Open
fititnt opened this issue Jan 24, 2021 · 2 comments

fititnt commented Jan 24, 2021

Besides data from the database itself (and leaving aside external sources like Google Analytics and Google Search Console), extracting information from the access logs of an NGinx server (and later an Apache server) seems to be a common need.

For both Apache and NGinx access log files, the common data mining programs have no native importer. One quick and dirty way is to open the file in LibreOffice using space as the field separator and ignore the fact that the datetime inside [] gets broken into two columns; this actually works somewhat OK. BUT LibreOffice Calc, like Excel, has a limit of about 1 million rows, and a busy site can exceed that without much effort, especially because every request (images, CSS and JS) is logged, and a single page view can generate more than 100 of them.
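To make the column problem concrete, here is a small sketch of what a naive split on spaces does to one line. The sample line is made up and assumes the usual "combined" layout; a spreadsheet importer that honors quotes may keep the quoted fields together, but the bracketed datetime still ends up in two pieces.

```python
# Hypothetical sample line, assuming the common "combined" access log layout.
line = (
    '203.0.113.7 - - [24/Jan/2021:10:15:32 +0000] '
    '"GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; Linux x86_64)"'
)

# Splitting on single spaces, roughly what "space as separator" amounts to.
fields = line.split(" ")

# The datetime is cut at the space before the timezone offset.
date_part, offset_part = fields[3], fields[4]
print(date_part, offset_part)
```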

On a quick look, I did not find a simple way to just do a quick conversion. Some tools, like the fantastic goaccess (and also nginxtop and other tools like it), are able to parse NGinx/Apache files, but their export feature only covers already aggregated results. In other words, these tools do all the calculation themselves; they don't let you simply convert an Apache or NGinx access log file into something other tools can work with.
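As a rough idea of what a plain row-by-row conversion could look like, here is a minimal sketch. It is not the nginxlogs2csv script; it assumes the default NGinx "combined" `log_format`, and the function name `access_log_to_csv` and the column names are only illustrative. Lines are processed one at a time, so the whole file never has to fit in memory.

```python
import csv
import io
import re

# Assumption: the default NGinx "combined" log_format. Custom formats need a
# different pattern.
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)
FIELDS = [
    "remote_addr", "remote_user", "time_local", "request",
    "status", "body_bytes_sent", "http_referer", "http_user_agent",
]


def access_log_to_csv(lines, out):
    """Write one CSV row per matching log line; return how many were skipped."""
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    skipped = 0
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            skipped += 1  # line does not follow the assumed format
            continue
        writer.writerow(match.group(*FIELDS))
    return skipped


# Made-up sample input: one valid line and one that does not match.
sample = [
    '203.0.113.7 - - [24/Jan/2021:10:15:32 +0000] "GET /index.html HTTP/1.1" '
    '200 612 "-" "Mozilla/5.0 (X11; Linux x86_64)"\n',
    "not a log line\n",
]
buffer = io.StringIO()
skipped = access_log_to_csv(sample, buffer)
print(buffer.getvalue())
```

With a real file, `lines` would be the open log file and `out` a file opened with `newline=""`, as the `csv` module documentation recommends.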

fititnt added a commit that referenced this issue Jan 24, 2021

fititnt commented Jan 24, 2021

Some people, when confronted with a problem, think
“I know, I'll use regular expressions.” Now they have two problems.
http://regex.info/blog/2006-09-15/247

Oh boy


fititnt commented Jan 24, 2021

Maybe I will eventually move nginxlogs2csv (and later, likely, an apachelogs2csv) to a dedicated GitHub repository.

But for now, I understand why it is hard to find more than a bunch of regexes and small scripts to parse NGinx and Apache logs: there is a lot of difference between implementations. I think I will even implement more than one parsing strategy, such as one that literally falls back to just splitting out the IP and the date, because simply recommending that people change the script themselves would require them to know not only some Python, but both Python and regex.
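The "more than one strategy" idea could be sketched roughly like this: try a strict pattern first, then fall back to one that only pulls out the IP and the date. Both patterns and the name `parse_line` are hypothetical, not taken from the actual script, and the sample lines are made up.

```python
import re

# Strict strategy: assumes a "combined"-like prefix (IP, two fields, [date],
# "request", status, size).
STRICT = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+)'
)
# Fallback strategy: first token as the IP, first [...] group as the date.
FALLBACK = re.compile(r'(?P<ip>\S+).*?\[(?P<date>[^\]]+)\]')


def parse_line(line):
    """Return a dict of fields plus the strategy that matched, or None."""
    for name, pattern in (("strict", STRICT), ("fallback", FALLBACK)):
        match = pattern.match(line)
        if match:
            row = match.groupdict()
            row["strategy"] = name
            return row
    return None


standard = parse_line(
    '203.0.113.7 - - [24/Jan/2021:10:15:32 +0000] "GET / HTTP/1.1" 200 612'
)
custom = parse_line('::1 [24/Jan/2021:10:15:32 +0000] GET / 200')
print(standard["strategy"], custom["strategy"])
```

The fallback loses most columns, but it still gives something usable for a log format the strict pattern does not know, without asking the user to edit a regex.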

And in some places, the regex ONLY for IPv4 and IPv6 (just one of the fields of each line) is already bigger than the whole initial regex that is easiest to find when searching for this subject (which, by the way, failed on my test case, ::1).
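One possible way around a giant IP regex, assuming the script stays in Python, is to capture the first field loosely (for example with `\S+`) and let the standard library `ipaddress` module decide whether it is a valid address. The helper name `classify_ip` is made up for this sketch.

```python
import ipaddress


def classify_ip(token):
    """Return "IPv4" or "IPv6" for a valid address, or None otherwise."""
    try:
        return f"IPv{ipaddress.ip_address(token).version}"
    except ValueError:
        # Not a valid IPv4 or IPv6 address (hostname, garbage, bad octet...).
        return None


for token in ("::1", "203.0.113.7", "2001:db8::ff00:42:8329", "999.1.1.1"):
    print(token, classify_ip(token))
```

This keeps the line regex short and also handles compressed IPv6 forms like ::1.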
