This repository provides an open-source toolkit for the LogIE framework, which is part of LogSummary as proposed by Meng et al. in "Summarizing Unstructured Logs in Online Services."
LogIE accurately and efficiently extracts information from logs in online services in the form of OpenIE triples by combining template matching with information extraction approaches.
Requirements are listed in requirements.txt. To install them, run:
pip install -r requirements.txt
Most methods also require Java in addition to Python, because they run third-party tools.
Install Stanford CoreNLP from here. Then update the openie.ini config file within the openie package.
Install Ollie using the "Local Machine" installation process you can find here. Then update the openie.ini config file within the openie package.
Install OpenIE5 using the pre-compiled stand-alone JAR you can find here.
Before using OpenIE5, you need to run it as a server by executing the downloaded stand-alone JAR with a command similar to this one: java -Xmx10g -XX:+UseConcMarkSweepGC -jar openie-assembly-5.0-SNAPSHOT.jar --httpPort 8000
This is explained here in their repo.
Additionally, it requires a Python wrapper you can find here, which is already installed when you install the packages in requirements.txt.
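Before running an extraction with OpenIE5, it can help to confirm that the server is actually accepting connections. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and the host and port simply mirror the --httpPort 8000 value from the example command above, so adjust them if you started the server differently.

```python
import socket


def is_server_listening(host: str = "localhost", port: int = 8000,
                        timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    This only checks that the port is open; it does not verify that the
    process behind it is OpenIE5.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print("OpenIE5 server reachable:", is_server_listening())
```

Note that the JVM may need some time to load its models after the java command starts, so a negative result right after launch does not necessarily mean the server failed.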
Install PredPatt as explained in their repo here.
Download ClauseIE from their source here. Update the openie.ini file with the directory where its JAR is located.
Clone this repo, in which PropS has been upgraded to Python 3.8. Then update the openie.ini config file and specify its package directory.
After installing the OpenIE methods above, make sure to update the openie.ini configuration file located inside the openie package according to your installation. It provides part of the settings for running the OpenIE methods that depend on Java, such as StanfordNLP, or on external Python packages, such as PropS.
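To review which settings need updating, one option is to list every entry in the INI file. The sketch below uses the standard configparser module and makes no assumption about the actual section or option names in openie.ini; the function name and the sample section and key in the usage example are hypothetical placeholders, not the real ones.

```python
import configparser
from typing import List, Tuple


def list_settings(ini_text: str) -> List[Tuple[str, str, str]]:
    """Return (section, option, value) for every entry in an INI document."""
    # Interpolation is disabled so that values containing '%' are read as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return [
        (section, option, value)
        for section in parser.sections()
        for option, value in parser.items(section)
    ]


if __name__ == "__main__":
    # Hypothetical content; replace with the text of your own openie.ini.
    sample = "[example_tool]\njar_path = /path/to/tool.jar\n"
    for section, option, value in list_settings(sample):
        print(f"[{section}] {option} = {value}")
```

To inspect the real file, read it with open(...).read() and pass the text to list_settings.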
{
"<id1>": [
"< online template >",
"< ground truth template >",
[
[
"arg1",
"predicate",
"arg2"
],
[< more triples >],
[< more triples >]
]
],
"<id2>":[...]
}
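The structure above can be checked before running LogIE. The following is a minimal sketch, assuming each entry is a three-element list (online template, ground truth template, list of triples) and that each triple holds exactly three strings, as in the layout shown. The function name and the sample entry are made up for illustration.

```python
import json


def validate_templates(data: dict) -> int:
    """Validate the templates structure and return the total triple count."""
    total = 0
    for entry_id, entry in data.items():
        if not (isinstance(entry, list) and len(entry) == 3):
            raise ValueError(f"{entry_id}: expected [online, ground truth, triples]")
        online, ground_truth, triples = entry
        if not (isinstance(online, str) and isinstance(ground_truth, str)):
            raise ValueError(f"{entry_id}: templates must be strings")
        if not isinstance(triples, list):
            raise ValueError(f"{entry_id}: triples must be a list")
        for triple in triples:
            # Assumption: a triple is [arg1, predicate, arg2], all strings.
            if not (isinstance(triple, list) and len(triple) == 3
                    and all(isinstance(part, str) for part in triple)):
                raise ValueError(f"{entry_id}: malformed triple {triple!r}")
            total += 1
    return total


if __name__ == "__main__":
    # Made-up sample entry, not taken from any real dataset.
    sample = json.loads(
        '{"1": ["Interface * is up", "Interface <*> is up",'
        ' [["Interface *", "is", "up"]]]}'
    )
    print("triples found:", validate_templates(sample))
```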
Log formats can vary, since preprocessing details are specific to each log type, but most logs are currently expected in the following format:
<log_idx1>\t<log_message1>
<log_idx2>\t<log_message2>
...
Only the original switch logs format is expected to have no index and to consist simply of the log message.
<log_message1>
<log_message2>
...
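The two layouts above can be read with a few lines of Python. This is a sketch rather than the parser LogIE itself uses: the function name and the has_index flag are hypothetical, and using the line position as a stand-in index for the index-free layout is a choice made here for illustration.

```python
from typing import Iterable, List, Tuple


def parse_log_lines(lines: Iterable[str],
                    has_index: bool = True) -> List[Tuple[str, str]]:
    """Parse raw log lines into (index, message) pairs.

    has_index=True expects "<log_idx>\\t<log_message>" per line;
    has_index=False treats the whole line as the message.
    """
    records = []
    for position, raw in enumerate(lines):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if has_index:
            idx, sep, message = line.partition("\t")
            if not sep:
                raise ValueError(f"line {position}: missing tab separator")
        else:
            # No index in the file, so use the line position instead.
            idx, message = str(position), line
        records.append((idx, message))
    return records


if __name__ == "__main__":
    print(parse_log_lines(["1\tlink up\n", "2\tlink down\n"]))
    print(parse_log_lines(["link up\n", "link down\n"], has_index=False))
```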
After the installation, run the following command in the home directory where this project is located.
python -m LogIE.run --templates "<Templates File>" --evaluation lexical --rules new --openie predpatt
Runs information extraction from logs.
arguments:
-h, --help show this help message and exit
--templates templates
input raw templates file path (default: None)
--raw_logs raw_logs input raw raw_logs file path (default: None)
--base_dir base_dir base output directory for output files (default: ['<Project Folder>\\output'])
--log_type log_type Input type of templates. (default: ['original'])
--rules rules Predefined rules to extract triples from templates. (default: None)
--evaluation evaluation [evaluation ...]
Triples extraction evaluation metrics. (default: [])
--openie openie OpenIE approach to be used for triple extraction. (default: ['stanford'])
--id id Experiment id. Automatically generated if not specified. (default: None)
--tag Tag variables in the output triples (i.e. [([variable])] ). (default: False)
--save_output Save the output of logs or templates triples. (default: False)
--force Force overwriting previous output with same id. (default: False)
This command only generates output from LogIE, without evaluation, using the online templates provided in the corresponding field of the JSON format specified above. The ground truth is disregarded when generating this output.
python -m LogIE.run --templates "<Templates File Path>" --raw_logs "<Raw Logs File Path>" --rules new --openie <OpenIE approach> --save_output --tag
Please note that --tag will tag the variables in the output triples (i.e. [([variable])] ).