Advanced Persistent Threats (APTs) are among the most sophisticated threats facing critical organizations worldwide. APTs employ specific tactics, techniques, and procedures (TTPs) which make them difficult to detect in comparison to frequent and aggressive attacks. In fact, current network intrusion detection systems struggle to detect APTs communications, allowing such threats to persist unnoticed on victims’ machines for months or even years. In this paper, we present EarlyCrow, an approach to detect APT malware Command and Control connections using contextual summaries. The design of EarlyCrow is informed by a novel threat model focused on TTPs present in traffic generated by tools recently used as part of APT campaigns. The threat model highlights the importance of the context around the malicious connections, and suggests traffic attributes which help APT detection. EarlyCrow defines a novel multipurpose network flow format called PairFlow, which is leveraged to build the contextual summary of a PCAP capture, representing key behavioral, statistical and protocol information relevant to APT TTPs. We evaluate the effectiveness of EarlyCrow on unseen APTs obtaining a headline macro average F1-score of 93.02% with FPR of 0.74%.
If you decide to use our code or data, please cite our paper.
@inproceedings{alageel2022earlycrow,
title={EarlyCrow: Detecting APT Malware Command and Control
Over HTTP(S) Using Contextual Summaries},
author={Alageel, Almuthanna and Maffeis, Sergio},
booktitle={International Conference on Information Security},
pages={},
year={2022},
organization={Springer}
}
-
Clone the repo
git clone https://github.com/ICL-ml4csec/EarlyCrowAPT.git
-
Prerequisites
1- Before running any files, please make sure to install prerequisites packages
pip install -r requirements.txt
2- It is required that PYTHONPATH be set up so that all EarlyCrow modules are available for use during the below commands
export PYTHONPATH=/home/[PC_NAME]/EarlyCrowAPT/
EarlyCrow is composed of four main processes, as depicted in Figure 3. First, it starts with buffering and dispatching using PairFlow, which summarizes a PCAP into contextually relevant fields including packet behavior, domain and URL list, User-Agent, status code, and content type for HTTP. After the PairFlow HTTP variant is generated, these flows are preprocessed for profile pivoting to generate three profiles: Host, Destination, and URL. Then, two types of feature extraction follow (PairFlow and profile features) to form a ContextualSummary which is then classified by a random forest. When another PairFlow is received, it will follow the same workflow. A further step is required when the new PairFlow matches one of the previous ContextualSummary ID in the repository. The ContextualSummary updating process is responsible for updating the matched ContextualSummary to maintain the contextualization and reclassify again.
If you are interested in running the experiments directly, you can skip this section. We have already attached the training and test set in our repository to be used for the Experiment section, so you do not need to download data from our Google Drive storage. Google Drive storage includes all datasets for each step. However, this section is to regenerate (replicate) datasets based on our PCAP files or anyone interested in using EarlyCrow.
You need first to use our proposed data format described in our paper which is called PairFlow . Please click here and follow the instructions.
After the PairFlow is prepared and stored in data/pairflows/
, the next step is to generate ContextualSummaries. Please run the following command, which includes pivoting, PairFlow and Profile feature generation, and Continual Updating process.
python EarlyCrow/dataflow.py
Now the data is ready to be used for the experiments. You may run the following command to evaluate EarlyCrow on Known malware (The first experiment on the paper).
python experiments/classification_performance_known_malware.py
Note: make sure to update the PATH and FILENAME in the file classification_performance_known_malware.py
to match your own generated data.
Each experiment can be run without performing any previous steps. You can follow the instruction here and generate the results for our work.
We provide several measurements taken on the training set summarized in Table 1 and described in Section 4. Since our scope is to detect APTs at the early stage, all measurements are observed during the first 15 minutes of each connection.
python experiments/measurements_study.py
Classifiers are evaluated in two modes. First, HTTP-Mode, which assumes the administrator connects the NIDS to a web proxy to decrypt HTTPS and accesses features such as UA, HTTP response codes, content type, and URL. Second, HTTPS-Mode, where the administrator places the NIDS at the network edge, without deciphering HTTPS.
We randomly split the training and testing sets ten times. Then, we take the average performance under two constraints. First, the malware should be presented in both sets. Second, the infected hosts and the destination C&C server should be unique and not leaked from training to testing.
python experiments/classification_performance_known_malware.py
We train our classifiers on the training set used for our measurement study. Then we evaluate the performance against unseen malware.
python experiments/classification_performance_unseen_malware.py
The detection APTs necessitates a spread of features, as presented in Section 2. In Figure 4, we show the extent to which additional features affect the performance of the various classifiers.
python experiments/unseen_malware_features_diversity.py
We investigate feature importance on the third dataset in HTTPS mode, which comprises APTs, botnets, and legitimate samples because it is the one closest to a realistic scenario for APT hunting.
python experiments/unseen_malware_top_features_distrbution.py