The replication package for "Plug it and Play on Logs: A Configuration-Free Statistic-Based Log Parser".
The overview of our tool structure is shown in the following figure. The design of PIPLUP comprises two parsing stages, online log clustering and cluster updating, followed by a template matching stage in which, after all log lines are parsed, the results are stored in a CSV file for further verification. PIPLUP leverages a novel tree structure that does not pre-assume the position of constant tokens, and it enhances the template extraction approach based on template similarity and describability. Further, it uses a set of data-insensitive parameters, which enables users to directly "plug and play" on their log files without excessive configuration.
Online Log Clustering: Inspired by Drain, PIPLUP leverages a similar tree structure as a hashing function to find the most compatible leaf for an incoming log message and conduct further comparisons. Instead of hashing with tokens at fixed (e.g., leading) positions as Drain does, PIPLUP builds the tree without pre-assuming where constant tokens appear.
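For intuition, the following minimal sketch shows a Drain-style parse tree used as a hashing function. The class and method names are hypothetical, and the key-selection heuristic is only an assumption of how a position-free variant might pick its key; it is not PIPLUP's actual implementation.

```python
# Minimal sketch of a Drain-style parse tree used as a hashing function.
# Names (ParseTree, route) are hypothetical; the key-selection heuristic below
# is only an illustration of avoiding a fixed-position assumption.
from collections import defaultdict

class ParseTree:
    def __init__(self):
        # First level keys on token count, second level on a chosen constant token.
        self.root = defaultdict(lambda: defaultdict(list))

    def route(self, tokens):
        """Return the candidate cluster list for a tokenized log message."""
        length_node = self.root[len(tokens)]
        # Drain keys on the leading token(s); a position-free variant could
        # instead pick, e.g., the first token that contains no digits.
        key = next((t for t in tokens if not any(c.isdigit() for c in t)), "<*>")
        return length_node[key]

tree = ParseTree()
tokens = "Connection closed by 10.0.0.1".split()
candidates = tree.route(tokens)                           # clusters to compare against
candidates.append({"template": tokens, "line_ids": [1]})  # start a new cluster here
```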
Cluster Updating: The matched clusters are dynamically updated to allow online parsing. First, an in-cluster update is triggered to append the log line number to the cluster's log line list and to update the cluster's message example, path list, and template list. If the template list is updated, an inter-cluster template merging process among clusters under the same constant token node is triggered to reduce redundancy in event templates.
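The sketch below illustrates, with a hypothetical cluster structure and a single template per cluster, what such an in-cluster update can look like: the line number is recorded and the template is generalized token by token. PIPLUP additionally maintains a path list and a full template list and performs the inter-cluster merging described above, which this sketch omits.

```python
# Illustrative in-cluster update (hypothetical structure, single template only):
# record the log line, keep a message example, and generalize the template by
# marking token positions that disagree as <*>.
def update_cluster(cluster, line_id, tokens):
    cluster.setdefault("line_ids", []).append(line_id)
    cluster.setdefault("example", tokens)
    old = cluster.get("template", tokens)
    cluster["template"] = [a if a == b else "<*>" for a, b in zip(old, tokens)]
    return cluster

cluster = {}
update_cluster(cluster, 1, "Connection closed by 10.0.0.1".split())
update_cluster(cluster, 2, "Connection closed by 10.0.0.2".split())
print(" ".join(cluster["template"]))   # Connection closed by <*>
```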
Template Matching: PIPLUP may create multiple templates for a log cluster. If a cluster contains only one template, all of its log messages are matched with this event directly. Conversely, if multiple templates are inferred, we assign templates to in-cluster log messages using regex matching. The matched results are stored in a CSV file for further analysis.
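A minimal sketch of such regex matching is shown below; the helper name is hypothetical, and it simply assumes templates mark parameters with `<*>` placeholders.

```python
# Minimal sketch of regex-based template matching: each <*> placeholder becomes
# a non-greedy capture group, and a message is assigned to the first template
# whose pattern matches it end to end.
import re

def template_to_regex(template):
    escaped = re.escape(template)
    return re.compile("^" + escaped.replace(re.escape("<*>"), "(.*?)") + "$")

templates = ["Connection closed by <*>", "Failed password for <*> from <*>"]
patterns = [(t, template_to_regex(t)) for t in templates]

message = "Failed password for root from 10.0.0.1"
matched = next((t for t, p in patterns if p.match(message)), None)
print(matched)   # Failed password for <*> from <*>
```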
- python>=3.8
- chardet==5.1.0
- ipython==8.12.0
- matplotlib==3.7.2
- natsort==8.4.0
- numpy==1.24.4
- pandas==2.0.3
- regex==2022.3.2
- scipy
- tqdm==4.65.0
- rpy2
- spacy
For our experiments, we leveraged the log data from Loghub 2.0; please obtain the original data from the Loghub 2.0 repository before replicating the experiments. Before starting the experiments, we corrected the event templates with the latest rules provided by LOGPAI to ensure the quality of the ground truths. The ground truth templates can be automatically corrected with template_correction.py.
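The actual correction rules live in template_correction.py and follow the LOGPAI guidelines; the snippet below is only a made-up example of what one such normalization rule could look like, not the script itself.

```python
# Purely illustrative example of a ground-truth normalization rule; the real
# rules are implemented in template_correction.py following LOGPAI's guidelines.
import re

def merge_consecutive_wildcards(template):
    # Collapse runs such as "<*> <*>" or "<*>.<*>" into a single "<*>".
    return re.sub(r"<\*>(?:[ .,:]<\*>)+", "<*>", template)

print(merge_consecutive_wildcards("Deleting block <*> <*> from file <*>"))
# Deleting block <*> from file <*>
```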
To replicate the results in RQ1, change to the benchmark folder and run ./run_rq1_br_thresh.sh and ./run_rq1_sim_thresh.sh; to replicate the overall evaluation of RQ2 and RQ3, run ./run_all_full.sh. To conduct the Scott-Knott effect size difference (ESD) analysis, first follow the ScottKnottESD tutorial to install the package, then run ./sk_analysis.py.
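Since rpy2 appears in the requirements, the Scott-Knott ESD analysis presumably calls the R ScottKnottESD package from Python. The sketch below shows one common way to do that; it is not the repository's sk_analysis.py, and the scores are placeholders.

```python
# Hedged sketch: ranking parsers with the R ScottKnottESD package via rpy2.
# The data frame below is a placeholder; sk_analysis.py defines the real input.
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter
from rpy2.robjects.packages import importr

sk = importr("ScottKnottESD")  # R package installed per the ScottKnottESD tutorial

# Wide format: one column per parser, one row per dataset (placeholder values).
scores = pd.DataFrame({
    "PIPLUP": [0.95, 0.91, 0.88],
    "Drain":  [0.90, 0.85, 0.80],
    "LILAC":  [0.94, 0.92, 0.87],
})

with localconverter(ro.default_converter + pandas2ri.converter):
    r_scores = ro.conversion.py2rpy(scores)

ranking = sk.sk_esd(r_scores)  # Scott-Knott ESD grouping / rank per parser
print(ranking)
```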
Two parameters are required in PIPLUP's parsing process, namely sim_thresh and br_thresh.
Evaluating the impact of br_thresh (./run_rq1_br_thresh.sh):
Evaluating the impact of sim_thresh (./run_rq1_sim_thresh.sh):
Detailed results can be found under ./results/RQ1/. We inherit the parameter settings from RQ1 and use them to parse all 14 datasets in RQ2 and RQ3.
PIPLUP is compared with 7 state-of-the-art log parsers, including Drain, XDrain, Preprocessed-Drain, LILAC, LibreLog, LogBatcher, and LUNAR. Due to resource limitations, we did not replicate LibreLog; its parsing results and time consumption are therefore obtained from its original study, and the evaluations are re-run on the corrected ground truths. We also conducted the experiment for PILAR, another data-insensitive log parser; the replication code of PILAR can be found in the PILAR_implementation folder. The following table shows the parsing effectiveness of PIPLUP along with the seven benchmark parsers.
According to the table, PIPLUP obtains significantly better average performance than state-of-the-art statistic-based parsers on all four metrics. Moreover, even with LLM-powered semantic-based parsers included, the simple PIPLUP approach remains statistically optimal or near-optimal on all four metrics.
The datasets are sorted from the smallest (i.e., the fewest lines) to the largest (i.e., the most lines), and their time consumption is documented in the following table. As shown, all parsers exhibit anomalously high time consumption on the Thunderbird dataset. We therefore provide two versions of the statistical time rankings (i.e., all files, and all files excluding Thunderbird) to avoid misleading conclusions.
PIPLUP requires more processing time than Drain, but less time than XDrain and Preprocessed-Drain on average. It also has a much lower time consumption than the semantic-based parsers. According to the Scott-Knott ESD ranking, PIPLUP is the second most efficient parser. It requires only ~1.5 seconds to ~25 minutes to parse each of the studied datasets. Its time efficiency is statistically comparable to state-of-the-art statistic-based parsers and much better than semantic-based ones that rely on expensive computing resources (e.g., it uses only ~6% of LUNAR's parsing time).
Detailed results for RQ2 and RQ3 can be found under ./results/RQ2&RQ3/.
├── 2k_dataset                # Loghub-2k
├── PILAR_implementation      # PILAR evaluation with Loghub 2.0 evaluation functions
├── benchmark
│   ├── evaluation            # Configurations for the parsers
│   ├── logparser             # Main code for parsers
│   │   ├── Drain
│   │   ├── PIPLUP
│   │   ├── Preprocessed_Drain
│   │   ├── utils
│   │   ├── XDrain
│   │   └── __init__.py
│   ├── old_benchmark         # Default settings for the Drain series
│   ├── run_all_full.sh       # Script for running PIPLUP
│   ├── run_rq1_br_thresh.sh
│   ├── run_rq1_sim_thresh.sh
│   └── README.md
├── figures
├── result                    # Results for all parsers in CSV format (including PILAR)
│   ├── RQ1
│   └── RQ2&RQ3
├── sk_analysis.py            # Code for Scott-Knott ESD analysis
├── template_correction.py    # Code for ground truth template correction
└── README.md





