Commit
Merge pull request #12 from fkie-cad/issue-10-related-work
Add related work
ru37z authored Feb 21, 2024
2 parents 91653d8 + ad28729 commit 47170c3
Showing 10 changed files with 453 additions and 8 deletions.
1 change: 1 addition & 0 deletions _config.yml
@@ -23,6 +23,7 @@ navbar-links:
All Datasets: "content/all_datasets"
Contributing: "content/contributing"
About: "content/about"
+ Related Work: "content/related_work"
# Author's home: "https://deanattali.com"

################
33 changes: 33 additions & 0 deletions _posts/2024-02-21-related-work.md
@@ -0,0 +1,33 @@
---
layout: post
title: Related Work added
subtitle: A collection of related surveys and non-scientific collections of IDS datasets
gh-repo: fkie-cad/intrusion-detection-datasets
gh-badge: [ star, fork, follow ]
tags: [ website ]
comments: true
author: Philipp Bönninghausen
---

This update adds a new subpage for "Related Work", intended to provide additional source material and accessible via the navbar (or [this link](/intrusion-detection-datasets/content/related_work)).
Contents are divided into "Publications" and "Collections", where the former is any academic work that at least partially covers the topic of available IDS datasets.
Entries of this category, which are usually surveys, consist of the following:
- Publication title
- Citation
- Short description of the publication
- List of referenced datasets
- List of referenced collections

Referenced datasets link to their respective entries on this webpage, if available.
Those that are not (which are quite a few) will be looked at and possibly be added to the Intrusion Detection Datasets collection in the future.

The latter category, "Collections", simply features dataset collections not backed by a scientific publication.
These are maintained by individuals or organizations, and cover different types of datasets, ranging from "only pcaps" to "anything cybersecurity-related".
Entries consist of:
- Collection name
- Link
- Date of last update, i.e., the last time a new entry was added
- Short description of the focus of this collection

There is of course a significant overlap between the different publications/collections, for example, almost every survey references the age-old [KDD Cup 1999 dataset](/intrusion-detection-datasets/content/datasets/kdd_cup_1999).
The diversity of collections might nevertheless prove useful, as each resource provides a slightly different viewpoint upon the topic of IDS datasets.
6 changes: 6 additions & 0 deletions assets/css/beautifuljekyll.css
@@ -31,6 +31,8 @@ body > main {
p {
line-height: 1.5;
margin: 1.875rem 0;
+ margin-block-start: 0.6rem;
+ margin-block-end: 0.8rem;
}
h1,h2,h3,h4,h5,h6 {
font-family: 'Open Sans', 'Helvetica Neue', Helvetica, Arial, sans-serif;
@@ -73,6 +75,10 @@ hr.small {
border-color: inherit;
border-radius: 0.1875rem;
}
+ ul {
+ margin-block-start: 0.2em;
+ margin-block-end: 0.6em;
+ }

/* fix in-page anchors to not be behind fixed header */
:target:before {
2 changes: 1 addition & 1 deletion assets/img/ngids_ds.svg
2 changes: 1 addition & 1 deletion content/all_datasets.md
@@ -13,7 +13,7 @@ before-content: gh_buttons.html
| [AIT Alert Dataset](../datasets/ait_alert_dataset) | Host & Network | Alerts generated from the AIT log dataset, including labels. Only caveat is the lack of Windows machines | 2023 | Enterprise IT | Linux | 🟩 | Wazuh, Suricata and AMiner alerts | 96 MB | 2,9 GB |
| [AIT Log Dataset](../datasets/ait_log_dataset) | Host & Network | Huge variety of labeled logs collected from multiple simulation runs of an enterprise network under attack. With user emulation. but only Linux machines | 2023 | Enterprise IT | Linux | 🟩 | pcaps, Suricata alerts, misc. logs (Apache, auth, dns, vpn, audit, suricata, syslog) | 130 GB | 206 GB |
| [ASNM Datasets](../datasets/asnm_datasets) | Network | Specialized features extracted from instances of remote buffer overflow attacks for the purpose of anomaly-based detection | 2009-2018 | Mixed | Windows, Linux | 🟩 | Custom NetFlows | 21 MB | 95 GB |
- | [AWSCTD](/content/../datasets/awsctd) | Host | Syscalls collected from ~10k malware samples running on Windows 7, no user emulation | 2018 | Single OS | Windows | 🟩 | Sequences of syscall numbers | 10 MB | 558 MB |
+ | [AWSCTD](../datasets/awsctd) | Host | Syscalls collected from ~10k malware samples running on Windows 7, no user emulation | 2018 | Single OS | Windows | 🟩 | Sequences of syscall numbers | 10 MB | 558 MB |
| [BotsV3](../datasets/botsv3) [_ON HOLD_] | | _Requires usage of Splunk + a bunch of extensions, postponed_ | 2020 | | | | | 17 GB | - |
| [CDX CTF 2009](../datasets/cdx_2009) | Network | Dataset captured from a CTF event, generally intended to provide methods for reliable generating labeled datasets from such events | 2009 | Enterprise IT | Windows, Linux | 🟨 | pcaps, Snort IDS alerts, Apache logs, Splunk logs | 12 GB | 15,3 GB |
| [CIC-IDS2017](../datasets/cic_ids2017) | Network | Simulation of medium-sized company network under attack, focuses solely on network traffic | 2017 | Enterprise IT | Windows, Linux | 🟩 | pcaps, NetFlows, custom network features | 48,4 GB | 50 GB |
2 changes: 1 addition & 1 deletion content/datasets/ait_log_dataset.md
@@ -22,7 +22,7 @@ title: AIT Log Data Set
| **OS Types** | Ubuntu 20.04 |
| **Number of Machines** | 9-27 |
| **Total Runtime** | 4-6 days per sim, 8 simulations total |
- | **Year of Collection** | 2023 |
+ | **Year of Collection** | 2022 |
| **Attack Categories** | Reconnaissance <br> Privilege Escalation <br> Data Exfiltration <br> Web-based Attacks <Remote Command Execution> |
| **User Emulation** | Yes, models complex behavior |
| | |
6 changes: 3 additions & 3 deletions content/datasets/nigds_dataset.md
@@ -21,7 +21,7 @@ title: NGIDS Dataset
| **OS Types** | Ubuntu 14.04 |
| **Number of Machines** | _n/a_ |
| **Total Runtime** | ~5 days |
- | **Year of Collection** | 2018 |
+ | **Year of Collection** | 2016 |
| **Attack Categories** | DDoS<br/>Shellcode<br/>Worms<br/>Reconnaissance<br/>Exploits<br/>"Generic" |
| **User Emulation** | Yes, using IXIA PerfectStorm |
| | |
@@ -37,7 +37,7 @@ The Next-Generation Intrusion Detection System Dataset (NGIDS-DS) was created as
It attempts to improve upon major datasets of its time (namely KDD'98 and ADFA-LD), following a set of "requirements"
laid out in the paper, which are all aimed towards generating a more realistic dataset.
It is a collection of host and network logs from a simulated enterprise environment, generally intended to be used with
- anomaly-based detection methods, with the paper defining a novel "combined feature" for this purpose.
+ anomaly-based detection methods, with the paper defining a novel "combined feature" for this purpose, merging information about a system call and its execution time.
Their requirements for a simulation are:

- complete capture of OS audit logs and network packets
@@ -79,7 +79,7 @@ which acts as an all-in-one solution:
- generates ground truth for said attacks

Further details regarding user behavior is not provided.
- The entire simulation ran for a duration of approximately five days.
+ The entire simulation ran for a duration of approximately five days, from March 11, 2016, to March 16, 2016.

### Contained Data

2 changes: 1 addition & 1 deletion content/datasets/optc.md
@@ -21,7 +21,7 @@ title: OpTC
| **OS Types** | Windows 10 |
| **Number of Machines** | 1000 (only data for 500 included) |
| **Total Runtime** | 6 days |
- | **Year of Collection** | 2020 |
+ | **Year of Collection** | 2019 |
| **Attack Categories** | Powershell Empire<br/>Malicious Upgrades |
| **User Emulation** | Yes |
| | |
2 changes: 1 addition & 1 deletion content/datasets/pwnjutsu.md
@@ -23,7 +23,7 @@ title: PWNJUTSU
| **Total Runtime** | n/a |
| **Year of Collection** | 2022 |
| **Attack Categories** | Discovery<br/>Lateral Movement<br/>Credential Access<br/>Privilege Escalation |
- | **User Emulation** | |
+ | **User Emulation** | n/a |
| | |
| **Packed Size** | 82 GB |
| **Unpacked Size** | n/a |