Relation-Extraction-and-Knowledge-Graph-Generation-on-MISP-Event-Reports

Master's thesis on extracting relations from CTI texts

Work on Repo in Progress

Abstract

The rapid growth of cyber-attacks requires organizations to be aware of the procedures currently employed by malicious actors. This awareness can be gained through the sharing of cyber threat intelligence (CTI). Because manually examining all CTI information is time-intensive for security analysts, this work investigates extracting the relevant relations present in CTI texts with deep learning models and using them to generate knowledge graphs that are easier to understand. No fitting dataset for relation extraction in the CTI domain was publicly available, so a dataset was created as part of this work. Using this dataset together with the New York Times relation extraction dataset, we fine-tuned Google's T5 language model to extract relations from CTI texts. This combination of model, data, and framing relation extraction as a sequence-to-sequence task proved to be the most effective for extracting relations from CTI texts. To improve the model’s performance on the highly nuanced texts found in CTI reports, a pre-processing pipeline was developed that replaces special words before the text is passed to the deep learning model. Results show that the pre-processing pipeline was highly effective, increasing the model’s performance by 27%. With the system, sensible relations can be extracted from reports. However, due to limitations in the dataset, the developed model cannot achieve human-level performance at generating knowledge graphs from event reports. The contributions of this work to natural language processing in the CTI domain are: a publicly available dataset for training named entity recognition and relation extraction models, a model that extracts relations from CTI texts with an F1 score of 0.38, and a pre-processing pipeline that introduces the concept of using wordlists of important CTI terms such as names of malware or threat actors.
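The wordlist-based replacement step described in the abstract can be illustrated with a small sketch. This is not the thesis implementation: the wordlist contents, the placeholder format (`malware1`, `threat_actor1`), and the function names `mask_terms` and `restore_terms` are all assumptions made for illustration. The idea shown is only that known CTI terms are swapped for generic placeholders before the text reaches the model, and swapped back in the extracted relations afterwards.

```python
import re

# Hypothetical wordlists for illustration; the real ones live in the
# repository's wordlists folder and may be organized differently.
WORDLISTS = {
    "malware": ["Emotet", "TrickBot"],
    "threat_actor": ["APT28", "Lazarus Group"],
}


def mask_terms(text, wordlists):
    """Replace known CTI terms with placeholders; return (masked_text, mapping)."""
    mapping = {}   # placeholder -> original surface form (first occurrence)
    seen = {}      # lowercased term -> placeholder, so repeats share one placeholder
    counters = {}  # category -> number of placeholders issued so far

    # Longest terms first, so multi-word names win over shorter overlapping ones.
    entries = sorted(
        ((term, cat) for cat, terms in wordlists.items() for term in terms),
        key=lambda entry: len(entry[0]),
        reverse=True,
    )
    if not entries:
        return text, mapping

    category_of = {term.lower(): cat for term, cat in entries}
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term, _ in entries) + r")\b",
        re.IGNORECASE,
    )

    def replace(match):
        key = match.group(0).lower()
        if key not in seen:
            cat = category_of[key]
            counters[cat] = counters.get(cat, 0) + 1
            placeholder = f"{cat}{counters[cat]}"
            seen[key] = placeholder
            mapping[placeholder] = match.group(0)
        return seen[key]

    return pattern.sub(replace, text), mapping


def restore_terms(text, mapping):
    """Put the original terms back into text produced by the model."""
    if not mapping:
        return text
    placeholders = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, placeholders)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(0)], text)


if __name__ == "__main__":
    report = "APT28 used Emotet. Later, emotet was seen with the Lazarus Group."
    masked, mapping = mask_terms(report, WORDLISTS)
    print(masked)
    print(restore_terms("threat_actor1 uses malware1", mapping))
```

In this sketch a repeated term always maps to the same placeholder, so a relation extracted from the masked text can be mapped back to the original names without ambiguity.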

Folders and files

img
test_code
train_code
wordlists
README.md
Thesis.pdf


System Overview

Hugging Face Models and Datasets

Base dataset created with Alexander Schwankner

Adapted Dataset

Models with and without the required preprocessing (pipe)

Repo Structure
