Merge branch 'main' into issue-13-add-datasets
Showing 11 changed files with 261 additions and 115 deletions.
@@ -0,0 +1,74 @@
# This workflow uses actions that are not certified by GitHub.
# They are provided by a third-party and are governed by
# separate terms of service, privacy policy, and support
# documentation.

# Sample workflow for building and deploying a Jekyll site to GitHub Pages
name: Deploy Jekyll site to Pages

on:
  # Runs on pushes targeting the default branch
  push:
    branches: ["main"]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
permissions:
  contents: read
  pages: write
  id-token: write

# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  # Build job
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup Ruby
        uses: ruby/setup-ruby@8575951200e472d5f2d95c625da0c7bec8217c42 # v1.161.0
        with:
          ruby-version: '3.1' # Not needed with a .ruby-version file
          bundler-cache: true # runs 'bundle install' and caches installed gems automatically
          cache-version: 0 # Increment this number if you need to re-download cached gems
      - name: Setup Pages
        id: pages
        uses: actions/configure-pages@v5
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Install Python dependencies
        run: pip install pandas matplotlib seaborn
      - name: Generate CSV assets
        run: python tools/generate_csv.py
      - name: Generate plot assets
        run: python tools/generate_plots.py
      - name: Build with Jekyll
        # Outputs to the './_site' directory by default
        run: bundle exec jekyll build --baseurl "${{ steps.pages.outputs.base_path }}"
        env:
          JEKYLL_ENV: production
      - name: Upload artifact
        # Automatically uploads an artifact from the './_site' directory by default
        uses: actions/upload-pages-artifact@v3

  # Deployment job
  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
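
The `concurrency` comment in the workflow above describes a queueing rule that is easy to misread, so here is a rough Python sketch of it. It is a toy model, not GitHub's implementation: the class `ConcurrencyGroup` and the run ids are made up for illustration, and the model only assumes the documented behaviour for `cancel-in-progress: false`, where a group holds at most one running and one pending run and a newly queued run replaces an older pending one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConcurrencyGroup:
    """Toy model of one concurrency group with cancel-in-progress: false (hypothetical)."""

    running: Optional[str] = None
    pending: Optional[str] = None
    cancelled: List[str] = field(default_factory=list)
    completed: List[str] = field(default_factory=list)

    def queue(self, run_id: str) -> None:
        # Nothing is running: the new run starts immediately.
        if self.running is None:
            self.running = run_id
            return
        # A run is in progress and is never cancelled; only an older
        # pending run is dropped in favour of the newest one.
        if self.pending is not None:
            self.cancelled.append(self.pending)
        self.pending = run_id

    def finish_running(self) -> None:
        # The in-progress run completes and the pending run (if any) starts.
        if self.running is not None:
            self.completed.append(self.running)
        self.running, self.pending = self.pending, None


group = ConcurrencyGroup()
for run_id in ["run-1", "run-2", "run-3"]:
    group.queue(run_id)  # three pushes arrive while run-1 is still deploying
group.finish_running()   # run-1 completes, run-3 starts
group.finish_running()   # run-3 completes
print(group.completed, group.cancelled)
```

In this model the first and last runs deploy, while the run queued in between is skipped.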
This file was deleted.
Empty file.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -0,0 +1,39 @@
---
title: Dataset Statistics
---

The following plots are generated from the CSV file provided in [CSV Download](/intrusion-detection-datasets/content/csv_download).

### Distribution of datasets over time

This figure presents the distribution of currently surveyed datasets over time, where "time" refers to the year in which the underlying data was generated, which may differ from the year of publication -- if the generation year is not available, the publication year is used instead.
Datasets containing data from more than one year are represented accordingly.
Additionally, data sources and label availability are shown:
Data sources are grouped into "Network Data" (e.g., packet captures or network flows), "Host Data" (e.g., system logs or syscalls), and "Both" (any combination of the previous two);
label availability for each dataset is classified as "Direct" (explicit labels for at least a subset of data), "Indirect" (meta-information allowing for manual labeling), or "No Labels".

Although this classification simplifies certain aspects, the figure provides a reasonably broad overview of the current landscape of IDS-related datasets.
As an example, while the [DARPA '98](/intrusion-detection-datasets/content/datasets/darpa98) and [CSE-CIC-IDS2018](/intrusion-detection-datasets/content/datasets/cse_cic_ids2018) datasets contain both network and host data and are visualized as such, only their network data is labeled and thus typically used by other publications.
Still, declaring these datasets to contain only network data would go beyond the purpose of a survey, as it is up to other researchers to decide whether the (in this case host) data can be utilized for their purposes.

<p style="text-align: center;">
<img src="{{ "/assets/data/plots/datasets_over_years.png" | relative_url }}" alt="Figure 1: Distribution of datasets over time" />
</p>

<p style="text-align: center;font-size:0.8em;">
<a href="{{ site.baseurl }}/assets/data/plots/datasets_over_years.pdf" download>Download PDF</a>
</p>

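
The year assignment described for the first figure can be sketched roughly as follows. This is not the repository's `tools/generate_plots.py`, which is not shown here. The records, field names (`data_year_start`, `data_year_end`, `published`, `source`) and the helper `years_covered` are hypothetical, and the sketch assumes that a multi-year dataset is counted once for every year it covers.

```python
from collections import Counter

# Hypothetical records; names, fields and values are illustrative only.
datasets = [
    {"name": "example-a", "data_year_start": 2017, "data_year_end": 2018,
     "published": 2019, "source": "Network Data"},
    {"name": "example-b", "data_year_start": None, "data_year_end": None,
     "published": 2020, "source": "Host Data"},
    {"name": "example-c", "data_year_start": 2018, "data_year_end": 2018,
     "published": 2021, "source": "Both"},
]


def years_covered(record):
    """Return the years a dataset is counted in (assumed rule, see lead-in)."""
    start, end = record["data_year_start"], record["data_year_end"]
    if start is None:
        # Generation year unknown: fall back to the publication year.
        return [record["published"]]
    # Count the dataset once for every year in its generation range.
    return list(range(start, (end or start) + 1))


counts = Counter()
for record in datasets:
    for year in years_covered(record):
        counts[(year, record["source"])] += 1

for (year, source), n in sorted(counts.items()):
    print(year, source, n)
```

Because the two-year dataset contributes to both of its years, the bars in such a plot can add up to more than the number of datasets.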
### Dataset characteristics

This figure lists various characteristics of surveyed datasets, grouped into five categories: source of network data, source of host data, how benign activity was generated, which operating systems were included, and how many systems in total were part of the scenario.
Except for the final category, these classifications are not mutually exclusive -- consequently, the sum of a specific category might not align with the total number of datasets surveyed.
This discrepancy occurs because, for example, some datasets do not include network data (lowering that category's sum), while others include multiple operating systems (raising it).

<p style="text-align: center;">
<img src="{{ "/assets/data/plots/datatypes_count.png" | relative_url }}" alt="Figure 2: Characteristics of surveyed datasets, grouped into categories." />
</p>

<p style="text-align: center;font-size:0.8em;">
<a href="{{ site.baseurl }}/assets/data/plots/datatypes_count.pdf" download>Download PDF</a>
</p>
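
The point about category sums not matching the number of datasets can be checked with a small Python sketch. The records and their values are invented for illustration and are not taken from the survey's CSV. Only the two category names, network data source and operating systems, come from the text above.

```python
from collections import Counter

# Hypothetical records; each characteristic may hold zero, one or several values.
datasets = [
    {"name": "example-a", "network_source": ["Packet captures"],
     "operating_systems": ["Linux", "Windows"]},
    {"name": "example-b", "network_source": ["Packet captures"],
     "operating_systems": ["Linux"]},
    {"name": "example-c", "network_source": [],  # host-only dataset
     "operating_systems": ["Linux", "Windows"]},
]


def category_counts(records, category):
    """Count how often each value of a multi-valued category occurs."""
    return Counter(value for record in records for value in record[category])


network = category_counts(datasets, "network_source")
systems = category_counts(datasets, "operating_systems")

print(len(datasets), sum(network.values()), sum(systems.values()))
```

Here the network category sums to less than the number of datasets because one dataset has no network data, while the operating-system category sums to more because two datasets list two systems each.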