Commit

Merge pull request #30 from fkie-cad/issue-13-add-datasets
Add new datasets
ru37z authored Apr 4, 2024
2 parents 8d85d16 + ec2e387 commit 702b6c3
Showing 11 changed files with 371 additions and 34 deletions.
7 changes: 5 additions & 2 deletions content/all_datasets.md
@@ -14,19 +14,21 @@ before-content: gh_buttons.html
| [AIT Log Dataset](../datasets/ait_log_dataset) | Host & Network | Huge variety of labeled logs collected from multiple simulation runs of an enterprise network under attack. With user emulation, but only Linux machines | 2023 | Enterprise IT | Linux | 🟩 | pcaps, Suricata alerts, misc. logs (Apache, auth, dns, vpn, audit, suricata, syslog) | 130 GB | 206 GB |
| [ASNM Datasets](../datasets/asnm_datasets) | Network | Specialized features extracted from instances of remote buffer overflow attacks for the purpose of anomaly-based detection | 2009-2018 | Mixed | Windows, Linux | 🟩 | Custom NetFlows | 21 MB | 95 GB |
| [AWSCTD](../datasets/awsctd) | Host | Syscalls collected from ~10k malware samples running on Windows 7, no user emulation | 2018 | Single OS | Windows | 🟩 | Sequences of syscall numbers | 10 MB | 558 MB |
| [BotsV3](../datasets/botsv3) [_ON HOLD_] | | _Requires usage of Splunk + a bunch of extensions, postponed_ | 2020 | | | | | 17 GB | - |
| [CDX CTF 2009](../datasets/cdx_2009) | Network | Dataset captured from a CTF event, generally intended to provide methods for reliably generating labeled datasets from such events | 2009 | Enterprise IT | Windows, Linux | 🟨 | pcaps, Snort IDS alerts, Apache logs, Splunk logs | 12 GB | 15,3 GB |
| [CIC DoS](../datasets/cic_dos) | Network | Dataset focusing on different DoS attacks targeting the application layer (instead of network layer), but no longer available | 2017 | Enterprise IT | Linux | 🟩 | Network traffic (unknown format) | - | 4,6 GB |
| [CIC-DDoS2019](../datasets/cic_ddos) | Network | Dataset focusing on various DDoS attacks, covering a broad range of categories. Includes benign behavior, but only for Pcaps, not NetFlows | 2019 | Enterprise IT | Windows, Linux | 🟩 | Pcaps, NetFlows, Windows events, Ubuntu events | 24,4 GB | - |
| [CIC-IDS2017](../datasets/cic_ids2017) | Network | Simulation of medium-sized company network under attack, focuses solely on network traffic | 2017 | Enterprise IT | Windows, Linux | 🟩 | pcaps, NetFlows, custom network features | 48,4 GB | 50 GB |
| [CIDD](../datasets/cidd) | - | Spin on the DARPA'98 dataset, correlating user behavior over different systems/environments for behavior-based IDSs | 2012 | Military IT | Unix | 🟩 | Sequences of user "audits" | - | 22 GB |
| [CLUE-LDS](../datasets/clue_lds) | - | Database of real user behavior without known attacks, for evaluation of methods detecting shifts in user behavior | 2022 | Enterprise Subsystem | - (hBox) | 🟥 | Custom event logs | 640 MB | 14,9 GB |
| [Comprehensive, Multi-Source Cyber-Security Events](../datasets/comp_multi_source_cybersec_events) | Host & Network | Various events from production network with red team activity, but extremely limited information per event | 2015 | Enterprise IT | Windows, Linux | 🟩 | Custom event logs (auth, proc, network flows, dns, redteam) | 12 GB | - |
-| [CSE-CIC-IDS2018](../datasets/cse_cic_ids2018) | Network | Simulation of large enterprise IT (450 machines) with user emulation and various attacks, includes host and network logs, but only the latter are labeled | 2018 | Enterprise IT | Windows, Linux, MacOS | 🟨 | pcaps, NetFlows, custom network features, Windows events, Ubuntu events | 220 GB | - |
+| [CSE-CIC-IDS2018](../datasets/cse_cic_ids2018) | Network | Simulation of large enterprise IT (450 machines) with user emulation and various attacks, includes host and network logs, but only the latter are labeled | 2018 | Enterprise IT | Windows, Linux, MacOS | 🟩 | pcaps, NetFlows, Windows events, Ubuntu events | 220 GB | - |
| [CTU 13](../datasets/ctu_13) | Network | Collection of various botnet behavior combined with loads of background traffic, but very limited feature space | 2011 | Enterprise IT | Windows, Undisclosed | 🟩 | pcaps, NetFlows, Bro logs | - | 697 GB |
| [DAPT 2020](../datasets/dapt2020) | Network | Focuses on attacks mimicking those of an APT group, executed in a rather small environment | 2020 | Enterprise IT | Undisclosed | 🟩 | NetFlows, misc. logs (DNS, syslog, auditd, apache, auth, various services) | 460 MB | - |
| [DARPA'98 Intrusion Detection Program](../datasets/darpa98) | Network | Simulation of a small U.S. Air Force network under attack. No longer appropriate to use for multiple reasons | 1998 | Military IT | Unix | 🟩 | tcpdumps, host audit logs, file system dumps | 5 GB | - |
| [DARPA TC3](../datasets/darpa_tc3) | Host | Custom event logs from network under attack, designed to facilitate provenance tracking | 2018 | Undisclosed | Undisclosed | 🟨 | Custom event logs | 115 GB | - |
| [DARPA TC5](../datasets/darpa_tc5) | Host | Custom event logs from network under attack from APT groups, designed to facilitate provenance tracking | 2019 | Undisclosed | Undisclosed | 🟨 | Custom event logs | - | - |
| [EVTX to MITRE ATT&CK](../datasets/evtx_to_mitre_attck) | Host | Small dataset providing various events corresponding to certain MITRE tactics/techniques | 2022 | Single OS | Windows | 🟩 | Windows events | <1 GB | <1 GB |
| [gureKDDCup](../datasets/gure_kddcup) | Network | An extension of the KDDCup 1999 dataset, adding additional information about payloads to each connection record | 2008 | Military IT | Unix | 🟩 | Connection records with payload information | 10 GB | - |
| [ISCX Intrusion Detection Evaluation](../datasets/iscx_ids_2012) | Network | Focus on realistic traffic generation in a company network, combined with some basic attacks | 2012 | Enterprise IT | Windows, Ubuntu | 🟩 | pcaps | 84 GB | 87 GB |
| [KDD Cup 1999](../datasets/kdd_cup_1999) | Network | Network connection events derived from simulated U.S. Air Force network under attack. No longer appropriate to use for multiple reasons | 1999 | Military IT | Unix | 🟩 | Connection records | 18 MB | 743 MB |
| [Kyoto Honeypot](../datasets/kyoto_honeypot) | Network | Collection of features derived from attack traffic targeting honeypots over the span of 9 years | 2006-2015 | Diverse | Windows, Unix, MacOS | 🟩 | Custom network features | 20 GB | - |
@@ -50,6 +52,7 @@ before-content: gh_buttons.html
| [Unified Host and Network Data Set](../datasets/unified_host_and_network_dataset) | - | Selection of network and host events collected from operational environment, but without any attacks | 2017 | Enterprise IT | Windows, Linux | 🟥 | NetFlows, Windows events | - | - |
| [Unraveled](../datasets/unraveled) | Host & Network | Large dataset with intricate labeling, though the focus seems to be on network flows. Mapping will be annoying. | 2021 | Enterprise IT | Windows, Ubuntu, Kali | 🟩 | pcaps, misc. logs (syslog, audit, auth, Snort) | - | 22 GB |
| [UNSW-NB15](../datasets/unsw_nb15) | Network | Custom network undergoing a variety of attacks using IXIA PerfectStorm hardware. Mostly geared towards anomaly-based NIDS | 2015 | Undisclosed | Undisclosed | 🟩 | pcaps, custom network features | >100 GB | - |
| [User-Computer Associations in Time](../datasets/user_computer_associations) | - | Large number of authentication events over a period of 9 months, but with very little detail and without any attacks | 2014 | Enterprise IT | Undisclosed | 🟥 | Custom auth event logs | 2,3 GB | - |
| [VAST Challenge 2011](../datasets/vast_2011) | Network | Originated from a challenge about data analytics, focus on network but also contains host logs. Labeling is a bit lacking | 2011 | Enterprise IT | Windows | 🟨 | pcaps, Windows events, misc. logs (firewall, Snort, Nessus) | 940 MB | 9,3 GB |
| [VAST Challenge 2012](../datasets/vast_2012) | Network | Originated from a challenge about data analytics, focus on a large network being the victim of a botnet | 2012 | Enterprise IT | Undisclosed | 🟨 | Snort alerts, firewall logs | 186 MB | 2,9 GB |

2 changes: 1 addition & 1 deletion content/datasets/botsv3.md
@@ -1,5 +1,5 @@
---
-title: BOTSv3
+title: BOTSv3 [UNLISTED ENTRY]
---

- [Overview](#overview)