MEGR-APT is a scalable APT hunting system that discovers suspicious subgraphs matching attack scenarios (query graphs) published in Cyber Threat Intelligence (CTI) reports. MEGR-APT hunts APTs in a twofold process: (i) memory-efficient suspicious subgraph extraction, and (ii) fast subgraph matching based on graph neural networks (GNNs) and attack representation learning.
The inputs to the system are kernel audit logs stored in a structured database (Postgres) and attack query graphs in JSON format. The system consists of multiple Python scripts, along with bash scripts that drive them interactively.
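For orientation, a query graph describes an attack scenario's entities and relations; the authoritative schema is whatever the files under `/dataset` use. The snippet below is only a hypothetical sketch (every field name and value here is illustrative, not the system's actual schema):

```json
{
  "query_graph": "example_scenario",
  "nodes": [
    {"id": 0, "type": "process", "ioc": "nginx"},
    {"id": 1, "type": "file", "ioc": "/tmp/dropper"},
    {"id": 2, "type": "flow", "ioc": "10.0.0.5"}
  ],
  "edges": [
    {"source": 0, "target": 1, "type": "write"},
    {"source": 0, "target": 2, "type": "connect"}
  ]
}
```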
- `/src` holds all Python scripts.
- `/bash_src` holds all bash scripts.
- `/technical_reports` contains separate documentation files explaining the scripts.
- `/logs` is the default location for all generated system logs.
- `/model` is the default location for all trained GNN models.
- `/dataset` is the default location for query graphs, IOC files, experiment checkpoints, results, and detected subgraphs.
- `Investigation_Reports.ipynb`: a notebook with scripts to generate investigation reports for detected subgraphs. The notebook includes a demo scenario with two query graphs from the DARPA TC3 CADETS host.
To set up the environment, install `requirements.txt` and then `torch_requirements.txt`. We prepared an example bash script, `setup_environment.sh`; please review it before using it.
The first step in MEGR-APT is to construct provenance graphs in the RDF graph engine.
- Use `construct_pg_cadets.py` to query kernel audit logs from the Postgres database and construct a provenance graph in NetworkX format.
- Use `construct_rdf_graph_cadets.py` to construct RDF-based provenance graphs and store them in the Stardog RDF graph engine.
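To illustrate the shape of a provenance graph (a sketch only, not the output of the scripts above; all node names and attributes here are hypothetical), kernel audit events can be modeled as typed nodes with syscall-labeled edges in NetworkX:

```python
import networkx as nx

# Hypothetical sketch: a tiny provenance graph with typed nodes
# (process / file / socket) and syscall-labeled edges, mirroring
# the NetworkX format the construction scripts produce.
pg = nx.MultiDiGraph()
pg.add_node("proc:1234", type="process", name="nginx")
pg.add_node("file:/etc/passwd", type="file")
pg.add_node("sock:10.0.0.5:443", type="socket")
pg.add_edge("proc:1234", "file:/etc/passwd", operation="read")
pg.add_edge("proc:1234", "sock:10.0.0.5:443", operation="connect")

print(pg.number_of_nodes(), pg.number_of_edges())  # 3 2
```

A `MultiDiGraph` is used here because the same pair of entities can be connected by many audit events (e.g., repeated reads).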
The MEGR-APT hunting pipeline consists of two steps:
- Use `extract_rdf_subgraphs_cadets.py` to extract suspicious subgraphs based on the IOCs of the given attack query graphs.
- Run `main.py` to find matches between suspicious subgraphs and attack query graphs using pre-trained GNN models (the script has to be run with the same parameters as the trained model; check the GNN matching documentation for more details).
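The extraction step in MEGR-APT itself runs against the Stardog RDF engine; purely to illustrate the idea of IOC-seeded subgraph extraction, here is a NetworkX sketch (all names and the radius parameter are hypothetical, not the system's actual logic):

```python
import networkx as nx

def extract_suspicious_subgraph(pg, iocs, radius=2):
    """Sketch: seed on nodes whose name matches a query-graph IOC,
    then take the union of their k-hop neighborhoods."""
    seeds = [n for n, d in pg.nodes(data=True) if d.get("name") in iocs]
    nodes = set()
    for s in seeds:
        # Treat edges as undirected so both ancestors and
        # descendants of the seed are captured.
        nodes |= set(nx.ego_graph(pg.to_undirected(), s, radius=radius).nodes)
    return pg.subgraph(nodes).copy()

# Toy provenance graph: p1 writes f1; p2 is unrelated background noise.
pg = nx.DiGraph()
pg.add_node("p1", name="powershell.exe")
pg.add_node("f1", name="creds.txt")
pg.add_node("p2", name="systemd")
pg.add_edge("p1", "f1")

sub = extract_suspicious_subgraph(pg, {"creds.txt"}, radius=1)
print(sorted(sub.nodes))  # ['f1', 'p1']
```

Note how the unrelated node `p2` is excluded: only the neighborhood around IOC-matching nodes is kept, which is what makes the subsequent GNN matching step tractable.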
The full hunting pipeline can be run with the `run-megrapt-on-a-query-graph.sh` bash script, which searches a provenance graph for a specific query graph.
For evaluation, `run-megrapt-per-host-for-evaluation.sh` can be used.
Use the `Investigation_Reports.ipynb` Jupyter notebook to investigate detected subgraphs and produce a report for the human analyst.
To train a GNN graph matching model for MEGR-APT, configure the training/testing details in the `get_training_testing_sets()` function in `dataset_config.py`. Then take the following training steps:
- Use `extract_rdf_subgraphs_[dataset].py` with the `--training` argument to extract a training/testing set of random benign subgraphs.
- Use `compute_ged_for_training.py` to compute GED for the training set (this step is computationally expensive and takes a long time, but it runs in parallel on multiple cores).
- Run `main.py` with the selected model training parameters as arguments (see the GNN matching documentation for more details).

The training pipeline can be run with the `train_megrapt_model.sh` bash script.
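To see why the GED step is costly, exact graph edit distance can be computed with NetworkX's built-in solver; it is exponential-time, which is feasible only for small subgraphs and is why `compute_ged_for_training.py` parallelizes the work across cores. This is a sketch of the metric itself, not of that script:

```python
import networkx as nx

# Two tiny graphs: g2 differs from g1 by exactly one extra edge,
# so their graph edit distance is 1.
g1 = nx.path_graph(3)   # edges: 0-1, 1-2
g2 = nx.path_graph(3)
g2.add_edge(0, 2)       # extra edge closes the triangle

print(nx.graph_edit_distance(g1, g1))  # 0.0
print(nx.graph_edit_distance(g1, g2))  # 1.0
```

These exact GED values serve as the supervision signal the GNN matching model is trained to approximate.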