
# BAnDIT: Business Process Anomaly Detection in Transactions


## Description

Repository supporting the CoopIS submission "BAnDIT: Business Process Anomaly Detection in Transactions".

### Abstract


Business process anomaly detection enables the prevention of misuse and failures. Existing approaches focus on detecting anomalies in the control, temporal, and resource behavior of individual instances, neglecting the data flow, collaborations, and choreographies involving multiple instances. Consequently, anomaly detection capabilities are limited, as culprits can strategically split their actions across multiple instances to evade detection. This study presents a novel neural network-based approach to detect anomalies in distributed business processes. Unlike existing methods, our solution considers the message data exchanged during process transactions. This allows the generation of detection profiles that incorporate the relationships between multiple instances, related services, and exchanged data to detect point and contextual anomalies during process runtime, thus reducing the likelihood of anomalies going unnoticed. To validate the proposed solution, it is demonstrated on a publicly available prototype implementation of a distributed system as well as on real-life and artificial execution logs with injected artificial anomalies.

## How to run

Basics and Dependency Installation:

```bash
# clone project
git clone https://github.com/nico-ru/BAnDIT
cd BAnDIT

# [OPTIONAL] create conda environment
conda create -n <env_name> python=3.10
conda activate <env_name>

# install pytorch according to instructions
# https://pytorch.org/get-started/
# install requirements
pip install -r requirements.txt
```
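
To verify the PyTorch installation before training (optional; the second value reports whether a CUDA GPU is visible):

```bash
# should print the torch version and whether a GPU is available
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```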

Train the model with the default configuration (some parameters will be required):

```bash
# train on CPU
python src/train.py trainer=cpu

# train on GPU
python src/train.py trainer=gpu
```

Train the model with a chosen experiment configuration from config/experiment/:

```bash
python src/train.py experiment=<experiment>
```

You can override any configuration parameter from the command line like this:

```bash
python src/train.py experiment=<experiment> trainer.max_epochs=20
```
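
Multiple overrides can be combined in a single call. A hypothetical example (`trainer.max_epochs` is shown above; `seed` and `data.batch_size` are assumed parameter names that depend on the concrete configs in this repository):

```bash
# hypothetical combined run: pick an experiment, train on GPU, override a few parameters
python src/train.py experiment=<experiment> trainer=gpu trainer.max_epochs=20 seed=42 data.batch_size=64
```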

Run example experiment:

Extract example data set

```bash
tar -xzvf data/example_dataset.tar.gz -C data/
```

Run training for point anomaly detection

```bash
python src/train.py experiment=point
```

Run training for context anomaly detection

```bash
python src/train.py experiment=context
```

Run evaluation:
The training script will print the checkpoint path of the best model weights for the training run.
You can run the evaluation on the entire data set by passing this checkpoint path as follows:

```bash
python src/eval.py experiment=point ckpt_path=<last/checkpoint/path>
# or
python src/eval.py experiment=sequence ckpt_path=<last/checkpoint/path>
```

You can view the training metrics by starting a TensorBoard server and passing the training run:

```bash
tensorboard --logdir=</path/to/train/run>
```
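
If the project follows the default layout of the underlying Lightning/Hydra template, the run directories are typically created under `logs/` (an assumption about this repository's logging config; point `--logdir` at your actual run directory):

```bash
# assumed default location of training runs; adjust as needed
tensorboard --logdir logs/train/runs
```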

View the evaluation results by running the Jupyter notebook notebooks/analyze_evaluation.ipynb.
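
Assuming Jupyter is available in the environment (install it first if it is not):

```bash
pip install notebook   # only if Jupyter is not installed yet
jupyter notebook notebooks/analyze_evaluation.ipynb
```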


The example experiment relies on an exemplary data set previously generated by the process simulation that is also provided in this repository.
This simulation environment can be viewed as a separate project included in this repository and lives under simulation/.
The description of the simulation environment can be found below or in simulation/README.md.

## Process Simulation

### Description

The simulated system mimics a webshop composed of individual services, each responsible for a specific domain in the ordering process.
Its purpose is to generate a log of the underlying workflow of the system, containing the data transferred in the communication between the services. The following figure illustrates the implemented system:

*(Figure: illustration of the implemented webshop microservice system)*

Given this system, an order results in the following process choreography:

*(Figure: illustration of the resulting process choreography)*

### Quickstart

Set up the project:

```bash
# switch to simulation project
cd simulation

# [OPTIONAL] create conda environment
conda create -n <env_name> python=3.10
conda activate <env_name>

# [OPTIONAL] install graphviz executables
# https://graphviz.org/download/

# install requirements
pip install -r requirements.txt

# install the package
pip install .

# set up the .env file according to env.example
cp env.example .env
sed -i "s|<base_dir>|$(pwd)|" .env   # note: if there is a | character in your path, change the delimiter of the sed command

# or edit the file with your preferred editor
vim .env   # --> edit the path variable
```
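
As the note above says, a path containing `|` needs a different sed delimiter; `#` works just as well:

```bash
# same substitution with '#' as the sed delimiter
sed -i "s#<base_dir>#$(pwd)#" .env
```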

Run the simulation:

```bash
# start the services
/bin/bash scripts/start_services.sh

# in case you ran the simulation before, clean the log directories first
/bin/bash scripts/clear_logs.sh

# start the simulation
python run_simulation.py --n_requests 100

# wait until all processes have stopped;
# the server logs can be observed with
tail -f logs/server/*.log
```

### Logs

The logs of the system are saved individually for each of the services. This includes the process event log in CSV format as well as the corresponding messages transferred between the services. To merge the logs of the process choreography, use:

```bash
python merge_logs.py
```

### Service Compounds

In case only the communication between a certain subset of services (i.e. a service compound) needs to be analyzed, the logs of just these services can be merged as follows:

```bash
# merge only order and inventory logs
python merge_logs.py --services order inventory

# results will be in logs/compound/
```
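
To take a quick look at the merged compound log (the exact file names in logs/compound/ are an assumption; check what merge_logs.py actually writes):

```bash
ls logs/compound/
head -n 5 logs/compound/*.csv   # assumes CSV output, like the per-service event logs
```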

### Export with messages

```bash
python export_logs.py

# export for only the order and inventory services
python export_logs.py --services order inventory
```

By default, the results will be copied to the /data directory of the main project. This can be changed by altering the RESULT_DIR environment variable in the /simulation/.env file.
You can specify the name of the exported data set by passing it as an argument as follows:

```bash
python export_logs.py --name <dataset_name>
```
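
Putting the simulation steps together, one plausible end-to-end sequence (all commands as introduced above; "example_run" is an arbitrary data set name):

```bash
/bin/bash scripts/start_services.sh
/bin/bash scripts/clear_logs.sh            # only needed when re-running
python run_simulation.py --n_requests 100
python merge_logs.py
python export_logs.py --name example_run   # arbitrary data set name
```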
