
# BAnDIT: Business Process Anomaly Detection in Transactions


## Description

Repository supporting the CoopIS submission "BAnDIT: Business Process Anomaly Detection in Transactions".

### Abstract


Business process anomaly detection enables the prevention of misuse and failures. Existing approaches focus on detecting anomalies in the control, temporal, and resource behavior of individual instances, neglecting the data flow, collaborations, and choreographies involving multiple instances. Consequently, anomaly detection capabilities are limited, as culprits can strategically split their actions across multiple instances to evade detection. This study presents a novel neural network-based approach to detect anomalies in distributed business processes. Unlike existing methods, our solution considers the message data exchanged during process transactions. This allows the generation of detection profiles that incorporate the relationships between multiple instances, related services, and exchanged data to detect point and contextual anomalies during process runtime, thus reducing the likelihood of anomalies going unnoticed. To validate the proposed solution, it is demonstrated on a publicly available prototype implementation of a distributed system as well as on real-life and artificial execution logs with injected artificial anomalies.

## How to run

Basics and Dependency Installation:

```bash
# clone project
git clone https://github.com/nico-ru/BAnDIT
cd BAnDIT

# [OPTIONAL] create conda environment
conda create -n <env_name> python=3.10
conda activate <env_name>

# install pytorch according to instructions
# https://pytorch.org/get-started/
# install requirements
pip install -r requirements.txt
```
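
To verify the PyTorch installation before training (optional; the second value reports whether a CUDA GPU is visible):

```bash
# should print the torch version and whether a GPU is available
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```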

Train the model with the default configuration (some parameters will be required):

```bash
# train on CPU
python src/train.py trainer=cpu

# train on GPU
python src/train.py trainer=gpu
```

Train the model with a chosen experiment configuration from config/experiment/:

```bash
python src/train.py experiment=<experiment>
```

You can override any configuration parameter from the command line like this:

```bash
python src/train.py experiment=<experiment> trainer.max_epochs=20
```
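
Multiple overrides can be combined in a single call. A hypothetical example (`trainer.max_epochs` is shown above; `seed` and `data.batch_size` are assumed parameter names that depend on the concrete configs in this repository):

```bash
# hypothetical combined run: pick an experiment, train on GPU, override a few parameters
python src/train.py experiment=<experiment> trainer=gpu trainer.max_epochs=20 seed=42 data.batch_size=64
```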

Run example experiment:

Extract example data set

```bash
tar -xzvf data/example_dataset.tar.gz -C data/
```

Run training for point anomaly detection

```bash
python src/train.py experiment=point
```

Run training for context anomaly detection

```bash
python src/train.py experiment=context
```

Run evaluation:
The training script will print the checkpoint path of the best model weights for the training run.
You can run the evaluation on the entire data set by passing this checkpoint path as follows:

```bash
python src/eval.py experiment=point ckpt_path=<last/checkpoint/path>
# or
python src/eval.py experiment=sequence ckpt_path=<last/checkpoint/path>
```

You can view the training metrics by starting a TensorBoard server and passing the training run:

```bash
tensorboard --logdir=</path/to/train/run>
```
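
If the project follows the default layout of the underlying Lightning/Hydra template, the run directories are typically created under `logs/` (an assumption about this repository's logging config; point `--logdir` at your actual run directory):

```bash
# assumed default location of training runs; adjust as needed
tensorboard --logdir logs/train/runs
```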

View the evaluation results by running the Jupyter notebook notebooks/analyze_evaluation.ipynb.
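
Assuming Jupyter is available in the environment (install it first if it is not):

```bash
pip install notebook   # only if Jupyter is not installed yet
jupyter notebook notebooks/analyze_evaluation.ipynb
```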


The example experiment relies on an exemplary data set previously generated by the process simulation that is also provided in this repository.
This simulation environment can be viewed as a separate project included in this repository and lives under simulation/.
The description of the simulation environment can be found below or in simulation/README.md.

## Process Simulation

### Description

The simulated system mimics a webshop composed of individual services, each responsible for a specific domain in the ordering process.
Its purpose is to generate a log of the underlying workflow of the system, containing the data transferred in the communication between the services. The following figure illustrates the implemented system:

*(Figure: illustration of the implemented webshop microservice system)*

Given this system, an order results in the following process choreography:

*(Figure: illustration of the resulting process choreography)*

### Quickstart

Set up the project:

```bash
# switch to simulation project
cd simulation

# [OPTIONAL] create conda environment
conda create -n <env_name> python=3.10
conda activate <env_name>

# [OPTIONAL] install graphviz executables
# https://graphviz.org/download/

# install requirements
pip install -r requirements.txt

# install the package
pip install .

# set up the .env file according to env.example
cp env.example .env
sed -i "s|<base_dir>|$(pwd)|" .env   # note: if there is a | character in your path, change the delimiter of the sed command

# or edit the file with your preferred editor
vim .env   # --> edit the path variable
```
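
As the note above says, a path containing `|` needs a different sed delimiter; `#` works just as well:

```bash
# same substitution with '#' as the sed delimiter
sed -i "s#<base_dir>#$(pwd)#" .env
```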

Run the simulation:

```bash
# start the services
/bin/bash scripts/start_services.sh

# in case you ran the simulation before, clean the log directories first
/bin/bash scripts/clear_logs.sh

# start the simulation
python run_simulation.py --n_requests 100

# wait until all processes have stopped;
# the server logs can be observed with
tail -f logs/server/*.log
```

### Logs

The logs of the system are saved individually for each of the services. This includes the process event log in CSV format as well as the corresponding messages transferred between the services. To merge the logs of the process choreography, use:

```bash
python merge_logs.py
```

### Service Compounds

In case only the communication between a certain subset of services (i.e. a service compound) needs to be analyzed, the logs of just these services can be merged as follows:

```bash
# merge only order and inventory logs
python merge_logs.py --services order inventory

# results will be in logs/compound/
```
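
To take a quick look at the merged compound log (the exact file names in logs/compound/ are an assumption; check what merge_logs.py actually writes):

```bash
ls logs/compound/
head -n 5 logs/compound/*.csv   # assumes CSV output, like the per-service event logs
```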

### Export with messages

```bash
python export_logs.py

# export for only the order and inventory services
python export_logs.py --services order inventory
```

By default, the results will be copied to the /data directory of the main project. This can be changed by altering the RESULT_DIR environment variable in the /simulation/.env file.
You can specify the name of the exported data set by passing it as an argument as follows:

```bash
python export_logs.py --name <dataset_name>
```
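
Putting the simulation steps together, one plausible end-to-end sequence (all commands as introduced above; "example_run" is an arbitrary data set name):

```bash
/bin/bash scripts/start_services.sh
/bin/bash scripts/clear_logs.sh            # only needed when re-running
python run_simulation.py --n_requests 100
python merge_logs.py
python export_logs.py --name example_run   # arbitrary data set name
```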
