Merge pull request #25 from fkie-cad/add-dataset-2-cic-ddos
Add "CIC-DDoS2019" dataset
ru37z authored Apr 3, 2024
2 parents 18cf960 + 45bb3aa commit ec2e387
Showing 4 changed files with 86 additions and 4 deletions.
3 changes: 2 additions & 1 deletion content/all_datasets.md
@@ -16,11 +16,12 @@ before-content: gh_buttons.html
| [AWSCTD](../datasets/awsctd) | Host | Syscalls collected from ~10k malware samples running on Windows 7, no user emulation | 2018 | Single OS | Windows | 🟩 | Sequences of syscall numbers | 10 MB | 558 MB |
| [CDX CTF 2009](../datasets/cdx_2009) | Network | Dataset captured from a CTF event, generally intended to provide methods for reliably generating labeled datasets from such events | 2009 | Enterprise IT | Windows, Linux | 🟨 | pcaps, Snort IDS alerts, Apache logs, Splunk logs | 12 GB | 15,3 GB |
| [CIC DoS](../datasets/cic_dos) | Network | Dataset focusing on different DoS attacks targeting the application layer (instead of network layer), but no longer available | 2017 | Enterprise IT | Linux | 🟩 | Network traffic (unknown format) | - | 4,6 GB |
| [CIC-DDoS2019](../datasets/cic_ddos) | Network | Dataset focusing on various DDoS attacks, covering a broad range of categories. Includes benign behavior, but only for Pcaps, not NetFlows | 2019 | Enterprise IT | Windows, Linux | 🟩 | Pcaps, NetFlows, Windows events, Ubuntu events | 24,4 GB | - |
| [CIC-IDS2017](../datasets/cic_ids2017) | Network | Simulation of medium-sized company network under attack, focuses solely on network traffic | 2017 | Enterprise IT | Windows, Linux | 🟩 | pcaps, NetFlows, custom network features | 48,4 GB | 50 GB |
| [CIDD](../datasets/cidd) | - | Spin on the DARPA'98 dataset, correlating user behavior over different systems/environments for behavior-based IDSs | 2012 | Military IT | Unix | 🟩 | Sequences of user "audits" | - | 22 GB |
| [CLUE-LDS](../datasets/clue_lds) | - | Database of real user behavior without known attacks, for evaluation of methods detecting shifts in user behavior | 2022 | Enterprise Subsystem | - (hBox) | 🟥 | Custom event logs | 640 MB | 14,9 GB |
| [Comprehensive, Multi-Source Cyber-Security Events](../datasets/comp_multi_source_cybersec_events) | Host & Network | Various events from production network with red team activity, but extremely limited information per event | 2015 | Enterprise IT | Windows, Linux | 🟩 | Custom event logs (auth, proc, network flows, dns, redteam) | 12 GB | - |
| [CSE-CIC-IDS2018](../datasets/cse_cic_ids2018) | Network | Simulation of large enterprise IT (450 machines) with user emulation and various attacks, includes host and network logs, but only the latter are labeled | 2018 | Enterprise IT | Windows, Linux, MacOS | 🟨 | pcaps, NetFlows, custom network features, Windows events, Ubuntu events | 220 GB | - |
| [CSE-CIC-IDS2018](../datasets/cse_cic_ids2018) | Network | Simulation of large enterprise IT (450 machines) with user emulation and various attacks, includes host and network logs, but only the latter are labeled | 2018 | Enterprise IT | Windows, Linux, MacOS | 🟩 | pcaps, NetFlows, Windows events, Ubuntu events | 220 GB | - |
| [CTU 13](../datasets/ctu_13) | Network | Collection of various botnet behavior combined with loads of background traffic, but very limited feature space | 2011 | Enterprise IT | Windows, Undisclosed | 🟩 | pcaps, NetFlows, Bro logs | - | 697 GB |
| [DAPT 2020](../datasets/dapt2020) | Network | Focuses on attacks mimicking those of an APT group, executed in a rather small environment | 2020 | Enterprise IT | Undisclosed | 🟩 | NetFlows, misc. logs (DNS, syslog, auditd, apache, auth, various services) | 460 MB | - |
| [DARPA'98 Intrusion Detection Program](../datasets/darpa98) | Network | Simulation of a small U.S. Air Force network under attack. No longer appropriate to use for multiple reasons | 1998 | Military IT | Unix | 🟩 | tcpdumps, host audit logs, file system dumps | 5 GB | - |
81 changes: 81 additions & 0 deletions content/datasets/cic_ddos.md
@@ -0,0 +1,81 @@
---
title: CIC-DDoS2019
---

- [Overview](#overview)
- [Environment](#environment)
- [Activity](#activity)
- [Contained Data](#contained-data)
- [Papers](#papers)
- [Links](#links)
- [Data Examples](#data-examples)

| <!-- --> | <!-- --> |
|--------------------------|---------------------------------------------------------------|
| **Network Log Source** | Pcaps, NetFlows |
| **Network Logs Labeled** | Flows are labeled |
| **Host Log Source** | Windows event logs, Ubuntu event logs |
| **Host Logs Labeled** | No |
| | |
| **Overall Setting** | Enterprise IT |
| **OS Types** | Windows Vista/7/8.1/10<br/>Ubuntu 16.04<br/>Fortinet |
| **Number of Machines** | 6 |
| **Total Runtime** | ~16 hours |
| **Year of Collection** | 2019 |
| **Attack Categories** | Various DDoS attacks |
| **User Emulation** | Yes, models complex behavior |
| | |
| **Packed Size** | 24,4 GB |
| **Unpacked Size** | n/a |
| **Download Link** | [goto](http://205.174.165.80/CICDataset/CICDDoS2019/Dataset/) |

***

### Overview
The CIC-DDoS2019 dataset, developed by the Canadian Institute for Cybersecurity (CIC), was created to enable the evaluation of new DDoS detection methods, which, according to the authors, was not possible with previously existing datasets containing DDoS attacks.
The dataset is accompanied by a newly proposed taxonomy that divides DDoS attacks into several subclasses.
These attacks are executed within a small testbed consisting of a victim network exhibiting benign behavior and a separate attacker network.
The simulation was run on two separate days, a training day and a testing day;
data was collected in the form of pcaps, which were then processed into labeled NetFlows.

### Environment
The victim network consists of four Windows machines (Vista/7/8.1/10), an Ubuntu 16.04 web server, and a firewall.
Information regarding installed software is not available; the IPs of individual machines can be found on the homepage.
Attacks originate from a separate attacker network, which is not described in further detail.

### Activity
So-called B(enign)-Profiles are leveraged to define the normal behavior performed during the collection period;
they simulate 25 distinct users interacting via HTTP, HTTPS, FTP, SSH, and email protocols.
Statistics for these interactions were derived from observing real human behavior.

The executed attacks are based on the newly proposed taxonomy of DDoS attacks; for details, refer to Chapter 3 of the cited paper.
On the first day (training day), 12 different DDoS attacks were executed at different points in time.
On the second day (testing day), a subset of five of these attacks was executed, plus a sixth one that was not performed previously.

### Contained Data
Attacks were executed exclusively within the collection period, i.e., no attack was running when data collection started.
Data is organized per day and consists of pcaps, which were processed into NetFlows using CICFlowMeter and subsequently labeled.
These flows are grouped by attack into separate `csv` files, but no flows are available for benign behavior.
While these could probably be extracted manually from the available pcaps, I'm honestly not quite sure why they weren't included in the first place.

A detailed analysis of these flows, especially with respect to the effects of individual attacks on certain features, is available in Chapter 5 of the paper.
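
As a rough illustration of working with the per-attack `csv` files, the following Python sketch reads labeled flows using only the standard library. Most column names in the header carry a leading space (e.g. ` Source IP`), so they are stripped first. The helper name `read_flows` and the shortened `SAMPLE` (only five of the original columns) are assumptions made for this example and are not part of the dataset or its tooling.

```python
import csv
import io
import math
from collections import Counter

# Hypothetical cut-down excerpt: five of the original columns, two sample rows.
SAMPLE = """Flow ID, Source IP, Flow Duration,Flow Bytes/s, Label
172.16.0.5-192.168.50.4-932-44723-17,172.16.0.5,1,4.58E8,Portmap
172.16.0.5-192.168.50.4-935-15051-17,172.16.0.5,0,Infinity,Portmap
"""


def read_flows(handle):
    """Yield one dict per flow, with whitespace stripped from column names."""
    reader = csv.reader(handle)
    header = [name.strip() for name in next(reader)]
    for row in reader:
        yield dict(zip(header, row))


flows = list(read_flows(io.StringIO(SAMPLE)))

# Count flows per label; a real file would be opened with open(path, newline="").
label_counts = Counter(flow["Label"] for flow in flows)

# float() parses "Infinity", so non-finite rates have to be filtered explicitly.
rates = [float(flow["Flow Bytes/s"]) for flow in flows]
finite_rates = [rate for rate in rates if math.isfinite(rate)]

print(label_counts)
print(finite_rates)
```

Since Python's `float()` accepts the `Infinity` literal found in some rate columns, such rows pass parsing silently and should be dropped or handled before computing statistics.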

### Papers
- [Developing Realistic Distributed Denial of Service (DDoS) Attack Dataset and Taxonomy (2019)](https://doi.org/10.1109/CCST.2019.8888419)

### Links
- [Homepage](https://www.unb.ca/cic/datasets/ddos-2019.html)
- [Download](http://205.174.165.80/CICDataset/CICDDoS2019/Dataset/)

### Data Examples
Labeled flows taken from `CSVs/CSV-03-11/Portmap.csv`
```
Unnamed: 0,Flow ID, Source IP, Source Port, Destination IP, Destination Port, Protocol, Timestamp, Flow Duration, Total Fwd Packets, Total Backward Packets,Total Length of Fwd Packets, Total Length of Bwd Packets, Fwd Packet Length Max, Fwd Packet Length Min, Fwd Packet Length Mean, Fwd Packet Length Std,Bwd Packet Length Max, Bwd Packet Length Min, Bwd Packet Length Mean, Bwd Packet Length Std,Flow Bytes/s, Flow Packets/s, Flow IAT Mean, Flow IAT Std, Flow IAT Max, Flow IAT Min,Fwd IAT Total, Fwd IAT Mean, Fwd IAT Std, Fwd IAT Max, Fwd IAT Min,Bwd IAT Total, Bwd IAT Mean, Bwd IAT Std, Bwd IAT Max, Bwd IAT Min,Fwd PSH Flags, Bwd PSH Flags, Fwd URG Flags, Bwd URG Flags, Fwd Header Length, Bwd Header Length,Fwd Packets/s, Bwd Packets/s, Min Packet Length, Max Packet Length, Packet Length Mean, Packet Length Std, Packet Length Variance,FIN Flag Count, SYN Flag Count, RST Flag Count, PSH Flag Count, ACK Flag Count, URG Flag Count, CWE Flag Count, ECE Flag Count, Down/Up Ratio, Average Packet Size, Avg Fwd Segment Size, Avg Bwd Segment Size, Fwd Header Length.1,Fwd Avg Bytes/Bulk, Fwd Avg Packets/Bulk, Fwd Avg Bulk Rate, Bwd Avg Bytes/Bulk, Bwd Avg Packets/Bulk,Bwd Avg Bulk Rate,Subflow Fwd Packets, Subflow Fwd Bytes, Subflow Bwd Packets, Subflow Bwd Bytes,Init_Win_bytes_forward, Init_Win_bytes_backward, act_data_pkt_fwd, min_seg_size_forward,Active Mean, Active Std, Active Max, Active Min,Idle Mean, Idle Std, Idle Max, Idle Min,SimillarHTTP, Inbound, Label
[...]
162471,172.16.0.5-192.168.50.4-932-44723-17,172.16.0.5,932,192.168.50.4,44723,17,2018-11-03 10:01:35.983831,1,2,0,458.0,0.0,229.0,229.0,229.0,0.0,0.0,0.0,0.0,0.0,4.58E8,2000000.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,40,0,2000000.0,0.0,229.0,229.0,229.0,0.0,0.0,0,0,0,0,0,0,0,0,0.0,343.5,229.0,0.0,40,0,0,0,0,0,0,2,458,0,0,-1,-1,1,20,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,1,Portmap
61268,172.16.0.5-192.168.50.4-933-39983-17,172.16.0.5,933,192.168.50.4,39983,17,2018-11-03 10:01:35.984211,1,2,0,458.0,0.0,229.0,229.0,229.0,0.0,0.0,0.0,0.0,0.0,4.58E8,2000000.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,40,0,2000000.0,0.0,229.0,229.0,229.0,0.0,0.0,0,0,0,0,0,0,0,0,0.0,343.5,229.0,0.0,40,0,0,0,0,0,0,2,458,0,0,-1,-1,1,20,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,1,Portmap
27258,172.16.0.5-192.168.50.4-934-26737-17,172.16.0.5,934,192.168.50.4,26737,17,2018-11-03 10:01:35.984213,1,2,0,458.0,0.0,229.0,229.0,229.0,0.0,0.0,0.0,0.0,0.0,4.58E8,2000000.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,40,0,2000000.0,0.0,229.0,229.0,229.0,0.0,0.0,0,0,0,0,0,0,0,0,0.0,343.5,229.0,0.0,40,0,0,0,0,0,0,2,458,0,0,-1,-1,1,20,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,1,Portmap
85566,172.16.0.5-192.168.50.4-648-21313-17,172.16.0.5,648,192.168.50.4,21313,17,2018-11-03 10:01:35.984783,2,2,0,458.0,0.0,229.0,229.0,229.0,0.0,0.0,0.0,0.0,0.0,2.29E8,1000000.0,2.0,0.0,2.0,2.0,2.0,2.0,0.0,2.0,2.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,40,0,1000000.0,0.0,229.0,229.0,229.0,0.0,0.0,0,0,0,0,0,0,0,0,0.0,343.5,229.0,0.0,40,0,0,0,0,0,0,2,458,0,0,-1,-1,1,20,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,1,Portmap
108025,172.16.0.5-192.168.50.4-935-15051-17,172.16.0.5,935,192.168.50.4,15051,17,2018-11-03 10:01:35.984786,0,2,0,530.0,0.0,265.0,265.0,265.0,0.0,0.0,0.0,0.0,0.0,Infinity,Infinity,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,40,0,0.0,0.0,265.0,265.0,265.0,0.0,0.0,0,0,0,0,0,0,0,0,0.0,397.5,265.0,0.0,40,0,0,0,0,0,0,2,530,0,0,-1,-1,1,20,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,1,Portmap
87041,172.16.0.5-192.168.50.4-936-49469-17,172.16.0.5,936,192.168.50.4,49469,17,2018-11-03 10:01:35.985305,2,2,0,458.0,0.0,229.0,229.0,229.0,0.0,0.0,0.0,0.0,0.0,2.29E8,1000000.0,2.0,0.0,2.0,2.0,2.0,2.0,0.0,2.0,2.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,40,0,1000000.0,0.0,229.0,229.0,229.0,0.0,0.0,0,0,0,0,0,0,0,0,0.0,343.5,229.0,0.0,40,0,0,0,0,0,0,2,458,0,0,-1,-1,1,20,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,1,Portmap
```
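
The rate features in these rows appear to follow directly from the byte and packet totals, assuming `Flow Duration` is reported in microseconds; that unit is inferred from the rows above rather than stated on this page. Under this assumption, a flow with a duration of zero explains the `Infinity` entries. A minimal Python sketch (the function name `rate_per_second` is hypothetical):

```python
import math

MICROSECONDS_PER_SECOND = 1_000_000


def rate_per_second(total, duration_us):
    """Scale a per-flow total to a per-second rate; zero duration gives inf."""
    if duration_us == 0:
        return math.inf
    return total * MICROSECONDS_PER_SECOND / duration_us


# (duration, packets, bytes, reported Flow Bytes/s, reported Flow Packets/s),
# copied from three of the sample rows above.
rows = [
    (1, 2, 458, 4.58e8, 2_000_000.0),
    (2, 2, 458, 2.29e8, 1_000_000.0),
    (0, 2, 530, math.inf, math.inf),
]

for duration, packets, total_bytes, bytes_per_s, packets_per_s in rows:
    matches = (
        rate_per_second(total_bytes, duration) == bytes_per_s
        and rate_per_second(packets, duration) == packets_per_s
    )
    print(duration, matches)
```

Multiplying before dividing keeps the arithmetic exact for these small integers, which makes the comparison against the reported values straightforward.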
2 changes: 1 addition & 1 deletion content/datasets/cic_dos.md
@@ -45,7 +45,7 @@ Further details are not available.
### Activity
The declared goal of executed attacks was to render services on the server side unresponsive while being as stealthy and resource-efficient as possible, including stopping attacks as soon as servers became unresponsive.
The authors state that attacks were selected to match the most common types of application layer DoS, resulting in a mix of high- and low-volume attacks.
These attacks were executed leveraging a several publicly available tools such as [Goldeneye](https://github.com/jseidl/GoldenEye) or [Slowloris](https://github.com/gkbrk/slowloris), for a total of eight attacks:
These attacks were executed leveraging several publicly available tools such as [Goldeneye](https://github.com/jseidl/GoldenEye) or [Slowloris](https://github.com/gkbrk/slowloris), for a total of eight attacks:
- High-volume HTTP attacks:
- DoS improved GET
- DDoS GET
4 changes: 2 additions & 2 deletions content/datasets/cse_cic_ids2018.md
@@ -13,8 +13,8 @@ title: CSE-CIC-IDS2018

| <!-- --> | <!-- --> |
|--------------------------|----------------------------------------------------------------------------------------------------------|
| **Network Log Source** | pcaps, network features |
| **Network Logs Labeled** | Only features are labeled |
| **Network Log Source** | pcaps, NetFlows |
| **Network Logs Labeled** | NetFlows are labeled |
| **Host Log Source** | Ubuntu event logs, Windows event logs |
| **Host Logs Labeled** | No |
| | |