Below is code to experiment with different scenarios of the environment. This code also makes it possible to reproduce any results that may have been reported.
Experiments in version 0 environments, i.e., environments with the following network topology:
Start
|
|
v
Server
|
|
v
Data
Nodes are initialized with the following state (index of the defense values to set to zero is selected randomly):
attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [2,2,0,2,2,2,2,2,2,2]
det: 2
The environment has sparse rewards (+1,-1 rewards are given at the terminal state of each episode). The environment is partially observed (the attacker can only see attack attributes of neighboring nodes, the defender can only see defense attributes).
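For concreteness, the sketch below shows how such a version-0 environment could be instantiated and rolled out with a random attacker. The environment id (idsgame-random_defense-v0), the classic OpenAI Gym reset/step interface, and the scalar attacker reward are assumptions about the installed package and may need adjusting.

```python
# Minimal sketch of interacting with a version-0 environment.
# Assumptions (verify against your installed gym-idsgame version): the package
# registers Gym ids such as "idsgame-random_defense-v0" and follows the classic
# OpenAI Gym reset/step interface with a scalar attacker reward.
import gym
import gym_idsgame  # importing the package registers the idsgame environments

env = gym.make("idsgame-random_defense-v0")
obs = env.reset()
done, episode_reward = False, 0

while not done:
    action = env.action_space.sample()           # random attacker action
    obs, reward, done, info = env.step(action)   # sparse reward: +1/-1 at the terminal state
    episode_reward += reward

print("episode reward:", episode_reward)
env.close()
```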
Experiments where one or two of the agents are using some learning algorithm to update their policy.
Experiments in the random_defense-v0
environment.
An environment where the defender is following a random defense policy.
- tabular_q_learning_vs_random_defense-v0
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the random defender (see the Q-learning sketch after this list).
- This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the random defender.
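The tabular Q-learning experiments above rely on the standard one-step Q-learning backup, sketched below in a self-contained form; the action-space size, state encoding, and hyper-parameters are illustrative assumptions rather than the values used in the actual experiments.

```python
import numpy as np
from collections import defaultdict

# Illustrative tabular Q-learning update for the attacker (state keys, action
# count and hyper-parameters are assumptions made for this example).
n_actions = 30                                   # assumed size of the attack action space
Q = defaultdict(lambda: np.zeros(n_actions))     # Q-table, lazily initialized to zeros
alpha, gamma = 0.05, 0.99

def q_update(s, a, r, s_next, done):
    """One-step backup: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + (0.0 if done else gamma * Q[s_next].max())
    Q[s][a] += alpha * (target - Q[s][a])

def epsilon_greedy(s, eps=0.2):
    """Exploration policy used while training."""
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

# Example transition: from some encoded state the attacker took action 3,
# reached the data node, and received the sparse terminal reward of +1.
q_update(s=("start",), a=3, r=1.0, s_next=("data",), done=True)
```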
Experiments in the minimal_defense-v0
environment.
An environment where the defender is following the defend_minimal
defense policy.
The defend_minimal
policy entails that the defender will always
defend the attribute with the minimal value out of all of its neighbors.
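A minimal, illustrative sketch of this heuristic is shown below; the 2D layout of the defense values and the tie-breaking rule are assumptions and may differ from the environment's implementation.

```python
import numpy as np

def defend_minimal(defense_values):
    """Illustrative defend_minimal heuristic: find the (node, attribute) pair with
    the smallest defense value and strengthen it by one.
    defense_values is assumed to be a [num_nodes, num_attributes] array; ties are
    broken by np.argmin's first-occurrence rule, which may differ from the env."""
    node, attribute = np.unravel_index(np.argmin(defense_values), defense_values.shape)
    defense_values[node, attribute] += 1
    return node, attribute

# Example: a single server node with one weak attribute (index 2).
values = np.array([[2, 2, 0, 2, 2, 2, 2, 2, 2, 2]])
print(defend_minimal(values))   # -> (0, 2)
```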
- tabular_q_learning_vs_minimal_defense-v0
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
- This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the defender (a sketch of the DQN update used by the dqn_* experiments follows below).
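The dqn_* experiments replace the Q-table with a neural network. Below is a minimal PyTorch sketch of a DQN value network and its temporal-difference loss; the layer sizes, input dimension, and hyper-parameters are illustrative assumptions, not the configuration used in the experiments.

```python
import torch
import torch.nn as nn

obs_dim, n_actions = 33, 30          # assumed observation / action sizes

# Online and target Q-networks (architecture is an illustrative assumption).
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def dqn_loss(obs, actions, rewards, next_obs, dones):
    """TD error on a replay batch: (r + gamma * max_a' Q_target(s',a') - Q(s,a))^2."""
    q_sa = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * (1 - dones) * target_net(next_obs).max(dim=1).values
    return nn.functional.mse_loss(q_sa, target)

# One gradient step on a dummy batch of 4 transitions.
batch = (torch.randn(4, obs_dim), torch.randint(0, n_actions, (4,)),
         torch.randn(4), torch.randn(4, obs_dim), torch.zeros(4))
loss = dqn_loss(*batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```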
Experiments in the random_attack-v0
environment.
An environment where the attacker is following a random attack policy.
- random_attack_vs_tabular_q_learning-v0
- This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and defeat the random attacker.
Experiments in the maximal_attack-v0
environment.
An environment where the attacker is following the attack_maximal
attack policy.
The attack_maximal
policy entails that the attacker will always attack the attribute with
the maximum value out of all of its neighbors.
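A minimal, illustrative sketch of this heuristic follows; the array layout and the masking of unreachable nodes are assumptions made for the example.

```python
import numpy as np

def attack_maximal(attack_values, reachable_nodes):
    """Illustrative attack_maximal heuristic: among the nodes reachable from the
    attacker's current position, attack the attribute with the highest attack value.
    attack_values is assumed to be a [num_nodes, num_attributes] array and
    reachable_nodes a list of node indices; ties follow np.argmax's first-occurrence rule."""
    masked = np.full(attack_values.shape, -np.inf)
    masked[reachable_nodes] = attack_values[reachable_nodes]
    node, attribute = np.unravel_index(np.argmax(masked), masked.shape)
    return node, attribute

# Example: two neighboring servers, the second has the most attacked attribute so far.
values = np.array([[0, 1, 0], [0, 0, 2]], dtype=float)
print(attack_maximal(values, reachable_nodes=[0, 1]))   # -> (1, 2)
```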
- maximal_attack_vs_tabular_q_learning-v0
- This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and detect the attacker.
Experiments in the idsgame-v0
environment.
An environment where neither the attacker nor defender is part of the environment, i.e.
it is intended for 2-agent simulations or RL training.
- tabular_q_learning_vs_tabular_q_learning-v0
- This experiment trains both an attacker and a defender agent simultaneously against each other using tabular q-learning (a rough sketch of such a self-play loop follows below).
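The sketch below conveys the structure of such a simultaneous self-play loop. The environment id (idsgame-v0), the joint-action step interface, and the per-agent observation/reward tuples are assumptions about the 2-agent environment and should be checked against the package's actual API.

```python
import gym
import gym_idsgame
import numpy as np
from collections import defaultdict

# Self-play sketch (structure only). Assumptions: "idsgame-v0" is the registered id,
# step() accepts a joint (attack_action, defense_action) tuple, and observations and
# rewards come back as (attacker, defender) pairs. Verify against the package API.
env = gym.make("idsgame-v0")
n_actions = env.action_space.n
Q_att = defaultdict(lambda: np.zeros(n_actions))
Q_def = defaultdict(lambda: np.zeros(n_actions))
alpha, gamma, eps = 0.05, 0.99, 0.2
key = lambda o: tuple(np.asarray(o).flatten().tolist())   # hashable state key

for episode in range(20000):
    att_obs, def_obs = env.reset()
    done = False
    while not done:
        sa, sd = key(att_obs), key(def_obs)
        a = np.random.randint(n_actions) if np.random.rand() < eps else int(np.argmax(Q_att[sa]))
        d = np.random.randint(n_actions) if np.random.rand() < eps else int(np.argmax(Q_def[sd]))
        (att_obs, def_obs), (r_att, r_def), done, _ = env.step((a, d))
        sa2, sd2 = key(att_obs), key(def_obs)
        # Each agent performs an independent Q-learning update on its own reward.
        Q_att[sa][a] += alpha * (r_att + gamma * (0 if done else Q_att[sa2].max()) - Q_att[sa][a])
        Q_def[sd][d] += alpha * (r_def + gamma * (0 if done else Q_def[sd2].max()) - Q_def[sd][d])
```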
Experiments with pre-defined policies (no training)
Experiments in the idsgame-v0
environment.
An environment where neither the attacker nor defender is part of the environment, i.e.
it is intended for 2-agent simulations or RL training.
- In this experiment, the attacker is implemented with a random attack policy. Similarly, the defender is implemented with a random defense policy.
- In this experiment, the attacker is implemented with a random attack policy. The defender is implemented with the defend_minimal policy, which entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
- attack_maximal_vs_defend_minimal-v0
- In this experiment, the attacker is implemented with the attack_maximal policy, which entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. Similarly, the defender is implemented with the defend_minimal policy, which entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
- In this experiment, the attacker is implemented with the attack_maximal policy, which entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. The defender is implemented with a random defense policy.
- In this experiment, the attacker is implemented with a greedy policy based on a saved Q-table. The defender is implemented with a random defense policy.
- In this experiment, the defender is implemented with a greedy policy based on a saved Q-table. The attacker is implemented with a random attack policy.
Experiments in version 1 environments, i.e., environments with the following network topology:
Start
|
|
v
Server
|
|
v
Data
Nodes are initialized with the following state (index of the defense values to set to zero is selected randomly):
attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [4,0,0,4,4,0,4,4,0,4]
det: 3
The environment has sparse rewards (+1,-1 rewards are given at the terminal state of each episode). The environment is partially observed (the attacker can only see attack attributes of neighboring nodes, the defender can only see defense attributes).
Experiments where one or two of the agents are using some learning algorithm to update their policy.
Experiments in the random_defense-v1
environment.
An environment where the defender is following a random defense policy.
- tabular_q_learning_vs_random_defense-v1
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the random defender.
- This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the random defender.
Experiments in the minimal_defense-v1
environment.
An environment where the defender is following the defend_minimal
defense policy.
The defend_minimal
policy entails that the defender will always
defend the attribute with the minimal value out of all of its neighbors.
- tabular_q_learning_vs_minimal_defense-v1
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
- This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the defender.
Experiments in the random_attack-v1
environment.
An environment where the attacker is following a random attack policy.
- random_attack_vs_tabular_q_learning-v1
- This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and defeat the random attacker.
Experiments in the maximal_attack-v1
environment.
An environment where the attacker is following the attack_maximal
attack policy.
The attack_maximal
policy entails that the attacker will always attack the attribute with
the maximum value out of all of its neighbors.
- maximal_attack_vs_tabular_q_learning-v1
- This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and detect the attacker.
Experiments in the idsgame-v1
environment.
An environment where neither the attacker nor defender is part of the environment, i.e.
it is intended for 2-agent simulations or RL training.
- tabular_q_learning_vs_tabular_q_learning-v1
- This experiment trains both an attacker and a defender agent simultaneously against each other using tabular q-learning.
Experiments with pre-defined policies (no training)
Experiments in the idsgame-v1
environment.
An environment where neither the attacker nor defender is part of the environment, i.e.
it is intended for 2-agent simulations or RL training.
- In this experiment, the attacker is implemented with a random attack policy. Similarly, the defender is implemented with a random defense policy.
- In this experiment, the attacker is implemented with a random attack policy. The defender is implemented with the defend_minimal policy, which entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
- attack_maximal_vs_defend_minimal-v1
- In this experiment, the attacker is implemented with the attack_maximal policy, which entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. Similarly, the defender is implemented with the defend_minimal policy, which entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
- In this experiment, the attacker is implemented with the attack_maximal policy, which entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. The defender is implemented with a random defense policy.
Experiments in version 2 environments, i.e., environments with the following network topology:
Start
|
+--------+--------+
| |
Server Server
| |
+--------+--------+
|
Data
This is the standard network from Elderman et al. Nodes are initialized with the following state (index of the defense values to set to zero is selected randomly):
attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [2,2,0,2,2,2,2,2,2,2]
det: 2
The environment has sparse rewards (+1,-1 rewards are given at the terminal state of each episode). The environment is partially observed (attacker can only see attack attributes of neighboring nodes, defender can only see defense attributes)
Experiments where one or two of the agents are using some learning algorithm to update their policy.
Experiments in the random_defense-v2
environment.
An environment where the defender is following a random defense policy.
- tabular_q_learning_vs_random_defense-v2
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the random defender.
- This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the random defender.
Experiments in the minimal_defense-v2
environment.
An environment where the defender is following the defend_minimal
defense policy.
The defend_minimal
policy entails that the defender will always
defend the attribute with the minimal value out of all of its neighbors.
- tabular_q_learning_vs_minimal_defense-v2
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
- This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the defender.
Experiments in the random_attack-v2
environment.
An environment where the attacker is following a random attack policy.
- random_attack_vs_tabular_q_learning-v2
- This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and defeat the random attacker.
Experiments in the maximal_attack-v2
environment.
An environment where the attacker is following the attack_maximal
attack policy.
The attack_maximal
policy entails that the attacker will always attack the attribute with
the maximum value out of all of its neighbors.
- maximal_attack_vs_tabular_q_learning-v2
- This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and detect the attacker.
Experiments in the idsgame-v2
environment.
An environment where neither the attacker nor defender is part of the environment, i.e.
it is intended for 2-agent simulations or RL training.
- tabular_q_learning_vs_tabular_q_learning-v2
- This experiment trains both an attacker and a defender agent simultaneously against each other using tabular q-learning.
Experiments with pre-defined policies (no training)
Experiments in the idsgame-v2
environment.
An environment where neither the attacker nor defender is part of the environment, i.e.
it is intended for 2-agent simulations or RL training.
- In this experiment, the attacker is implemented with a random attack policy. Similarly, the defender is implemented with a random defense policy.
- In this experiment, the attacker is implemented with a random attack policy. The defender is implemented with the defend_minimal policy, which entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
- attack_maximal_vs_defend_minimal-v2
- In this experiment, the attacker is implemented with the attack_maximal policy, which entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. Similarly, the defender is implemented with the defend_minimal policy, which entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
- In this experiment, the attacker is implemented with the attack_maximal policy, which entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. The defender is implemented with a random defense policy.
Experiments in version 3 environments, i.e., environments with the following network topology:
Start
|
|
+-------------------+
| | |
v v v
Server Server Server
| | |
| | |
v v v
Server Server Server
| | |
| | |
+----------+--------+
|
v
Data
Nodes are initialized with the following state (index of the defense values to set to zero is selected randomly):
attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [2,2,0,2,2,2,2,2,2,2]
det: 2
The environment has sparse rewards (+1,-1 rewards are given at the terminal state of each episode). The environment is partially observed (attacker can only see attack attributes of neighboring nodes, defender can only see defense attributes)
Experiments where one or two of the agents are using some learning algorithm to update their policy.
Experiments in the random_defense-v3
environment.
An environment where the defender is following a random defense policy.
- tabular_q_learning_vs_random_defense-v3
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the random defender.
- This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the random defender.
Experiments in the minimal_defense-v3
environment.
An environment where the defender is following the defend_minimal
defense policy.
The defend_minimal
policy entails that the defender will always
defend the attribute with the minimal value out of all of its neighbors.
- tabular_q_learning_vs_minimal_defense-v3
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
- This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the defender.
Experiments in the random_attack-v3
environment.
An environment where the attacker is following a random attack policy.
- random_attack_vs_tabular_q_learning-v3
- This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and defeat the random attacker.
Experiments in the maximal_attack-v3
environment.
An environment where the attacker is following the attack_maximal
attack policy.
The attack_maximal
policy entails that the attacker will always attack the attribute with
the maximum value out of all of its neighbors.
- maximal_attack_vs_tabular_q_learning-v3
- This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and detect the attacker.
Experiments in the idsgame-v3
environment.
An environment where neither the attacker nor defender is part of the environment, i.e.
it is intended for 2-agent simulations or RL training.
- tabular_q_learning_vs_tabular_q_learning-v3
- This experiment trains both an attacker and a defender agent simultaneously against each other using tabular q-learning.
Experiments with pre-defined policies (no training)
Experiments in the idsgame-v3
environment.
An environment where neither the attacker nor defender is part of the environment, i.e.
it is intended for 2-agent simulations or RL training.
- In this experiment, the attacker is implemented with a random attack policy. Similarly, the defender is implemented with a random defense policy.
- In this experiment, the attacker is implemented with a random attack policy. The defender is implemented with the defend_minimal policy, which entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
- attack_maximal_vs_defend_minimal-v3
- In this experiment, the attacker is implemented with the attack_maximal policy, which entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. Similarly, the defender is implemented with the defend_minimal policy, which entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
- In this experiment, the attacker is implemented with the attack_maximal policy, which entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. The defender is implemented with a random defense policy.
- tabular_q_agent_vs_tabular_q_agent-v3
- In this experiment, both the attacker and the defender are implemented with a greedy policy based on a saved Q-table from previous Q-learning training.
Experiments in version 4 environments, i.e., environments with the following network topology:
Start
|
|
+-----------------------------+-----------------------------+-------------------------+-------------------------+
| | | | |
| | | | |
v v v v v
Server Server Server Server Server
| | | | |
| | | | |
| | | | |
v v v v v
Server Server Server Server Server
| | | | |
| | | | |
| | | | |
v v v v v
Server Server Server Server Server
| | | | |
| | | | |
| | | | |
v v v v v
Server Server Server Server Server
| | | | |
| | | | |
+-----------------------------+-----------------------------+-------------------------+-------------------------+
|
|
v
Data
Nodes are initialized with the following state (index of the defense values to set to zero is selected randomly):
attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [2,2,0,2,2,2,2,2,2,2]
det: 2
The environment has sparse rewards (+1,-1 rewards are given at the terminal state of each episode). The environment is partially observed (attacker can only see attack attributes of neighboring nodes, defender can only see defense attributes)
Experiments where one or two of the agents are using some learning algorithm to update their policy.
Experiments in the random_defense-v4
environment.
An environment where the defender is following a random defense policy.
- tabular_q_learning_vs_random_defense-v4
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the random defender.
- This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the random defender.
Experiments in the minimal_defense-v4
environment.
An environment where the defender is following the defend_minimal
defense policy.
The defend_minimal
policy entails that the defender will always
defend the attribute with the minimal value out of all of its neighbors.
- tabular_q_learning_vs_minimal_defense-v4
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
- This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the defender.
Experiments in the random_attack-v4
environment.
An environment where the attacker is following a random attack policy.
- random_attack_vs_tabular_q_learning-v4
- This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and defeat the random attacker.
Experiments in the maximal_attack-v4
environment.
An environment where the attacker is following the attack_maximal
attack policy.
The attack_maximal
policy entails that the attacker will always attack the attribute with
the maximum value out of all of its neighbors.
- maximal_attack_vs_tabular_q_learning-v4
- This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and detect the attacker.
Experiments in the idsgame-v4
environment.
An environment where neither the attacker nor defender is part of the environment, i.e.
it is intended for 2-agent simulations or RL training.
- tabular_q_learning_vs_tabular_q_learning-v4
- This experiment trains both an attacker and a defender agent simultaneously against each other using tabular q-learning.
Experiments with pre-defined policies (no training)
Experiments in the idsgame-v4
environment.
An environment where neither the attacker nor defender is part of the environment, i.e.
it is intended for 2-agent simulations or RL training.
- In this experiment, the attacker is implemented with a random attack policy. Similarly, the defender is implemented with a random defense policy.
- In this experiment, the attacker is implemented with a random attack policy. The defender is implemented with the defend_minimal policy, which entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
- attack_maximal_vs_defend_minimal-v4
- In this experiment, the attacker is implemented with the attack_maximal policy, which entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. Similarly, the defender is implemented with the defend_minimal policy, which entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
- In this experiment, the attacker is implemented with the attack_maximal policy, which entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. The defender is implemented with a random defense policy.
- tabular_q_agent_vs_defend_minimal-v4
- In this experiment, the attacker is implemented with a greedy policy based on a saved Q-table. The defender is implemented with the defend_minimal policy, which entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
Experiments in version 5 environments, i.e., environments with the following network topology:
Start
|
|
+-----------------------------+-----------------------------+-------------------------+-------------------------+
| | | | |
| | | | |
v v v v v
Server------------------------Server------------------------Server--------------------Server--------------------Server
| | | | |
| | | | |
| | | | |
v v v v v
Server------------------------Server------------------------Server--------------------Server--------------------Server
| | | | |
| | | | |
| | | | |
v v v v v
Server------------------------Server------------------------Server--------------------Server--------------------Server
| | | | |
| | | | |
| | | | |
v v v v v
Server------------------------Server------------------------Server--------------------Server--------------------Server
| | | | |
| | | | |
+-----------------------------+-----------------------------+-------------------------+-------------------------+
|
|
v
Data
Nodes are initialized with the following state (index of the defense values to set to zero is selected randomly):
attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [2,2,0,2,2,2,2,2,2,2]
det: 2
Moreover, only two nodes per layer have a vulnerability (defense value set to 0); all other nodes in the layer have their defense values initialized to 2 on all attributes.
The environment has sparse rewards (+1,-1 rewards are given at the terminal state of each episode). The environment is partially observed (attacker can only see attack attributes of neighboring nodes, defender can only see defense attributes)
Experiments where one or two of the agents are using some learning algorithm to update their policy.
Experiments in the random_defense-v5
environment.
An environment where the defender is following a random defense policy.
- tabular_q_learning_vs_random_defense-v5
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the random defender.
- This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the random defender.
Experiments in the minimal_defense-v5
environment.
An environment where the defender is following the defend_minimal
defense policy.
The defend_minimal
policy entails that the defender will always
defend the attribute with the minimal value out of all of its neighbors.
- tabular_q_learning_vs_minimal_defense-v5
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
- This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the defender.
Experiments in the random_attack-v5
environment.
An environment where the attacker is following a random attack policy.
- random_attack_vs_tabular_q_learning-v5
- This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and defeat the random attacker.
Experiments in the maximal_attack-v5
environment.
An environment where the attacker is following the attack_maximal
attack policy.
The attack_maximal
policy entails that the attacker will always attack the attribute with
the maximum value out of all of its neighbors.
- maximal_attack_vs_tabular_q_learning-v5
- This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and detect the attacker.
Experiments in the idsgame-v5
environment.
An environment where neither the attacker nor defender is part of the environment, i.e.
it is intended for 2-agent simulations or RL training.
- tabular_q_learning_vs_tabular_q_learning-v5
- This experiment trains both an attacker and a defender agent simultaneously against each other using tabular q-learning.
Experiments with pre-defined policies (no training)
Experiments in the idsgame-v5
environment.
An environment where neither the attacker nor defender is part of the environment, i.e.
it is intended for 2-agent simulations or RL training.
- In this experiment, the attacker is implemented with a random attack policy. Similarly, the defender is implemented with a random defense policy.
- In this experiment, the attacker is implemented with a random attack policy. The defender is implemented with the defend_minimal policy, which entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
- attack_maximal_vs_defend_minimal-v5
- In this experiment, the attacker is implemented with the attack_maximal policy, which entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. Similarly, the defender is implemented with the defend_minimal policy, which entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
- In this experiment, the attacker is implemented with the attack_maximal policy, which entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. The defender is implemented with a random defense policy.
Experiments in version 7 environments, i.e., environments with the following network topology:
Start
|
|
+-------------------+
| | |
v v v
Server Server Server
| | |
| | |
v v v
Server Server Server
| | |
| | |
+----------+--------+
|
v
Data
Nodes are initialized with the following state (index of the defense values to set to zero is selected randomly):
attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [2,2,0,2,2,2,2,2,2,2]
det: 2
The environment has dense rewards (+1,-1 given whenever the attacker reaches a new level in the network). The environment is partially observed (the attacker can only see attack attributes of neighboring nodes, the defender can only see defense attributes).
Experiments where one or two of the agents are using some learning algorithm to update their policy.
Experiments in the random_defense-v7
environment.
An environment where the defender is following a random defense policy.
- tabular_q_learning_vs_random_defense-v7
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the random defender.
- This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the random defender.
- reinforce_vs_random_defense-v7
- This experiment trains an attacker agent using REINFORCE to act optimally in the given environment and defeat the random defender (a sketch of the REINFORCE update used by the reinforce_* experiments follows after this list).
- actor_critic_vs_random_defense-v7
- This experiment trains an attacker agent using Actor-Critic to act optimally in the given environment and defeat the random defender.
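The reinforce_* experiments use the Monte-Carlo policy gradient. Below is a self-contained PyTorch sketch of the REINFORCE update; the network architecture, learning rate, and return computation are illustrative assumptions rather than the experiments' exact configuration.

```python
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 33, 30, 0.99          # assumed sizes and discount
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_update(observations, actions, rewards):
    """Monte-Carlo policy gradient: maximize sum_t log pi(a_t|s_t) * G_t."""
    # Discounted returns G_t computed backwards over the episode.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    log_probs = torch.log_softmax(policy(observations), dim=1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = -(chosen * returns).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Dummy 3-step episode ending with the dense +1 reward.
reinforce_update(torch.randn(3, obs_dim),
                 torch.randint(0, n_actions, (3,)),
                 [0.0, 0.0, 1.0])
```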
Experiments in the minimal_defense-v7
environment.
An environment where the defender is following the defend_minimal
defense policy.
The defend_minimal
policy entails that the defender will always
defend the attribute with the minimal value out of all of its neighbors.
- tabular_q_learning_vs_minimal_defense-v7
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
- This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the defender.
- reinforce_vs_minimal_defense-v7
- This experiment trains an attacker agent using REINFORCE to act optimally in the given environment and defeat the defender.
- actor_critic_vs_minimal_defense-v7
- This experiment trains an attacker agent using Actor-Critic to act optimally in the given environment and defeat the defender.
Experiments in the random_attack-v7
environment.
An environment where the attacker is following a random attack policy.
- random_attack_vs_tabular_q_learning-v7
- This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and defeat the random attacker.
- This experiment trains a defender agent using DQN to act optimally in the given environment and defeat the random attacker.
- This experiment trains a defender agent using REINFORCE to act optimally in the given environment and defeat the random attacker.
- random_attack_vs_actor_critic-v7
- This experiment trains a defender agent using Actor-Critic to act optimally in the given environment and defeat the random attacker.
Experiments in the maximal_attack-v7
environment.
An environment where the attacker is following the attack_maximal
attack policy.
The attack_maximal
policy entails that the attacker will always attack the attribute with
the maximum value out of all of its neighbors.
- maximal_attack_vs_tabular_q_learning-v7
- This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and detect the attacker.
- This experiment trains a defender agent using DQN to act optimally in the given environment and detect the attacker.
- maximal_attack_vs_reinforce-v7
- This experiment trains a defender agent using REINFORCE to act optimally in the given environment and detect the attacker.
- maximal_attack_vs_actor_critic-v7
- This experiment trains a defender agent using Actor-Critic to act optimally in the given environment and detect the attacker.
Experiments in the idsgame-v7
environment.
An environment where neither the attacker nor defender is part of the environment, i.e.
it is intended for 2-agent simulations or RL training.
- tabular_q_learning_vs_tabular_q_learning-v7
- This experiment trains both an attacker and a defender agent simultaneously against each other using tabular q-learning.
- This experiment trains both an attacker and a defender agent simultaneously against each other using DQN.
- This experiment trains both an attacker and a defender agent simultaneously against each other using REINFORCE.
- actor_critic_vs_actor_critic-v7
- This experiment trains both an attacker and a defender agent simultaneously against each other using Actor-Critic (a sketch of the advantage actor-critic update follows below).
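The actor_critic_* experiments use an advantage actor-critic style update, sketched below in a self-contained PyTorch form; the shared-trunk architecture, loss coefficients, and one-step advantage estimate are illustrative assumptions, not the experiments' exact configuration.

```python
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 33, 30, 0.99           # assumed sizes and discount
trunk = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
actor = nn.Linear(64, n_actions)                    # policy head
critic = nn.Linear(64, 1)                           # state-value head
params = list(trunk.parameters()) + list(actor.parameters()) + list(critic.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

def a2c_update(obs, action, reward, next_obs, done):
    """One-step advantage actor-critic: advantage = r + gamma*V(s') - V(s)."""
    h = trunk(obs)
    value = critic(h).squeeze(-1)
    with torch.no_grad():
        next_value = torch.tensor(0.0) if done else critic(trunk(next_obs)).squeeze(-1)
    advantage = reward + gamma * next_value - value
    log_prob = torch.log_softmax(actor(h), dim=-1)[action]
    actor_loss = -log_prob * advantage.detach()      # policy gradient term
    critic_loss = advantage.pow(2)                   # value regression term
    optimizer.zero_grad()
    (actor_loss + 0.5 * critic_loss).backward()
    optimizer.step()

# Dummy transition with the dense +1 reward for reaching a new network level.
a2c_update(torch.randn(obs_dim), action=5, reward=1.0,
           next_obs=torch.randn(obs_dim), done=False)
```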
Experiments in version 8 environments, i.e., environments with the following network topology:
Start
|
|
v
Server
|
|
v
Data
Nodes are initialized with the following state (index of the defense values to set to zero is selected randomly):
attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [2,2,0,2,2,2,2,2,2,2]
det: 2
The environment has dense rewards (+1,-1 given whenever the attacker reaches a new level in the network). The environment is partially observed (the attacker can only see attack attributes of neighboring nodes, the defender can only see defense attributes).
Experiments where one or two of the agents are using some learning algorithm to update their policy.
Experiments in the random_defense-v8
environment.
An environment where the defender is following a random defense policy.
- tabular_q_learning_vs_random_defense-v8
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the random defender.
- This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the random defender.
- reinforce_vs_random_defense-v8
- This experiment trains an attacker agent using REINFORCE to act optimally in the given environment and defeat the random defender.
- actor_critic_vs_random_defense-v8
- This experiment trains an attacker agent using Actor-Critic to act optimally in the given environment and defeat the random defender.
Experiments in the minimal_defense-v8
environment.
An environment where the defender is following the defend_minimal
defense policy.
The defend_minimal
policy entails that the defender will always
defend the attribute with the minimal value out of all of its neighbors.
- tabular_q_learning_vs_minimal_defense-v8
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
- This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the defender.
- reinforce_vs_minimal_defense-v8
- This experiment trains an attacker agent using REINFORCE to act optimally in the given environment and defeat the defender.
- actor_critic_vs_minimal_defense-v8
- This experiment trains an attacker agent using Actor-Critic to act optimally in the given environment and defeat the defender.
Experiments in the random_attack-v8
environment.
An environment where the attacker is following a random attack policy.
- random_attack_vs_tabular_q_learning-v8
- This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and defeat the random attacker.
- This experiment trains a defender agent using DQN to act optimally in the given environment and defeat the random attacker.
- This experiment trains a defender agent using REINFORCE to act optimally in the given environment and defeat the random attacker.
- random_attack_vs_actor_critic-v8
- This experiment trains a defender agent using Actor-Critic to act optimally in the given environment and defeat the random attacker.
Experiments in the maximal_attack-v8
environment.
An environment where the attacker is following the attack_maximal
attack policy.
The attack_maximal
policy entails that the attacker will always attack the attribute with
the maximum value out of all of its neighbors.
- maximal_attack_vs_tabular_q_learning-v8
- This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and detect the attacker.
- This experiment trains a defender agent using DQN to act optimally in the given environment and detect the attacker.
- maximal_attack_vs_reinforce-v8
- This experiment trains a defender agent using REINFORCE to act optimally in the given environment and detect the attacker.
- maximal_attack_vs_actor_critic-v8
- This experiment trains a defender agent using Actor-Critic to act optimally in the given environment and detect the attacker.
Experiments in the idsgame-v8
environment.
An environment where neither the attacker nor defender is part of the environment, i.e.
it is intended for 2-agent simulations or RL training.
- tabular_q_learning_vs_tabular_q_learning-v8
- This experiment trains both an attacker and a defender agent simultaneously against each other using tabular q-learning.
- This experiment trains both an attacker and a defender agent simultaneously against each other using DQN.
- This experiment trains both an attacker and a defender agent simultaneously against each other using REINFORCE.
- actor_critic_vs_actor_critic-v8
- This experiment trains both an attacker and a defender agent simultaneously against each other using advantage actor-critic.
Experiments in version 9 environments, i.e., environments with the following network topology:
Start
|
+--------+--------+
| |
Server Server
| |
+--------+--------+
|
Data
This is the standard network from Elderman et al. Nodes are initialized with the following state (index of the defense values to set to zero is selected randomly):
attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [2,2,0,2,2,2,2,2,2,2]
det: 2
The environment has dense rewards (+1,-1 given whenever the attacker reaches a new level in the network). The environment is partially observed (the attacker can only see attack attributes of neighboring nodes, the defender can only see defense attributes).
Experiments where one or two of the agents are using some learning algorithm to update their policy.
Experiments in the random_defense-v9
environment.
An environment where the defender is following a random defense policy.
- tabular_q_learning_vs_random_defense-v9
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the random defender.
- This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the random defender.
- reinforce_vs_random_defense-v9
- This experiment trains an attacker agent using REINFORCE to act optimally in the given environment and defeat the random defender.
- actor_critic_vs_random_defense-v9
- This experiment trains an attacker agent using Actor-Critic to act optimally in the given environment and defeat the random defender.
Experiments in the minimal_defense-v9
environment.
An environment where the defender is following the defend_minimal
defense policy.
The defend_minimal
policy entails that the defender will always
defend the attribute with the minimal value out of all of its neighbors.
- tabular_q_learning_vs_minimal_defense-v9
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
- This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the defender.
- reinforce_vs_minimal_defense-v9
- This experiment trains an attacker agent using REINFORCE to act optimally in the given environment and defeat the defender.
- actor_critic_vs_minimal_defense-v9
- This experiment trains an attacker agent using Actor-Critic to act optimally in the given environment and defeat the defender.
Experiments in the random_attack-v9
environment.
An environment where the attacker is following a random attack policy.
- random_attack_vs_tabular_q_learning-v9
- This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and defeat the random attacker.
- This experiment trains a defender agent using DQN to act optimally in the given environment and defeat the random attacker.
- This experiment trains a defender agent using REINFORCE to act optimally in the given environment and defeat the random attacker.
- random_attack_vs_actor_critic-v9
- This experiment trains a defender agent using Actor-Critic to act optimally in the given environment and defeat the random attacker.
Experiments in the maximal_attack-v9
environment.
An environment where the attacker is following the attack_maximal
attack policy.
The attack_maximal
policy entails that the attacker will always attack the attribute with
the maximum value out of all of its neighbors.
- maximal_attack_vs_tabular_q_learning-v9
- This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and detect the attacker.
- This experiment trains a defender agent using DQN to act optimally in the given environment and detect the attacker.
- maximal_attack_vs_reinforce-v9
- This experiment trains a defender agent using REINFORCE to act optimally in the given environment and detect the attacker.
- maximal_attack_vs_actor_critic-v9
- This experiment trains a defender agent using Actor-Critic to act optimally in the given environment and detect the attacker.
Experiments in the idsgame-v9
environment.
An environment where neither the attacker nor defender is part of the environment, i.e.
it is intended for 2-agent simulations or RL training.
- tabular_q_learning_vs_tabular_q_learning-v9
- This experiment trains both an attacker and a defender agent simultaneously against each other using tabular q-learning.
- This experiment trains both an attacker and a defender agent simultaneously against each other using DQN.
- This experiment trains both an attacker and a defender agent simultaneously against each other using REINFORCE.
- actor_critic_vs_actor_critic-v9
- This experiment trains both an attacker and a defender agent simultaneously against each other using Actor-Critic.
Experiments in version 10 environments, i.e., environments with the following network topology:
Start
|
|
v
Server
|
|
v
Data
Nodes are initialized with the following state (index of the defense values to set to zero is selected randomly):
attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [2,2,0,2,2,2,2,2,2,2]
det: 2
The environment has dense rewards (+1,-1 given whenever the attacker reaches a new level in the network).
The environment is fully observed for both the defender and the attacker.
Experiments where one or two of the agents are using some learning algorithm to update their policy.
Experiments in the minimal_defense-v10
environment.
An environment where the defender is following the defend_minimal
defense policy.
The defend_minimal
policy entails that the defender will always
defend the attribute with the minimal value out of all of its neighbors.
- actor_critic_vs_minimal_defense-v10
- This experiment trains an attacker agent using Actor-Critic to act optimally in the given environment and defeat the defender.
Experiments in the idsgame-v10
environment.
An environment where neither the attacker nor defender is part of the environment, i.e.
it is intended for 2-agent simulations or RL training.
- actor_critic_vs_actor_critic-v10
- This experiment trains both an attacker and a defender agent simultaneously against each other using advantage actor-critic.
Experiments in version 11 environments, i.e., environments with the following network topology:
Start
|
|
v
Server
|
|
v
Data
Nodes are initialized with the following state (index of the defense values to set to zero is selected randomly):
attack values: [0,0]
defense values: [0,0]
det: 1
The environment has dense rewards (+1,-1 given whenever the attacker reaches a new level in the network).
The environment is fully observed for both the defender and the attacker.
Experiments where one or two of the agents are using some learning algorithm to update their policy.
Experiments in the minimal_defense-v11
environment.
An environment where the defender is following the defend_minimal
defense policy.
The defend_minimal
policy entails that the defender will always
defend the attribute with the minimal value out of all of its neighbors.
- actor_critic_vs_minimal_defense-v11
- This experiment trains an attacker agent using Actor-Critic to act optimally in the given environment and defeat the defender.
- tabular_q_learning_vs_minimal_defense-v11
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
Experiments in the idsgame-v11
environment.
An environment where neither the attacker nor defender is part of the environment, i.e.
it is intended for 2-agent simulations or RL training.
- actor_critic_vs_actor_critic-v11
- This experiment trains both an attacker and a defender agent simultaneously against each other using advantage actor-critic.
Experiments in version 12 environments, i.e., environments with the following network topology:
Start
|
|
v
Server
|
|
v
Data
Nodes are initialized with the following state (index of the defense values to set to zero is selected randomly):
attack values: [(0-1) randomized, (0-1) randomized]
defense values: [(0-1) randomized, (0-1) randomized]
det: (0-1) randomized
The environment has dense rewards (+1,-1 given whenever the attacker reaches a new level in the network).
The environment is fully observed for both the defender and the attacker.
Experiments where one or two of the agents are using some learning algorithm to update their policy.
Experiments in the minimal_defense-v12
environment.
An environment where the defender is following the defend_minimal
defense policy.
The defend_minimal
policy entails that the defender will always
defend the attribute with the minimal value out of all of its neighbors.
- actor_critic_vs_minimal_defense-v12
- This experiment trains an attacker agent using Actor-Critic to act optimally in the given environment and defeat the defender.
- tabular_q_learning_vs_minimal_defense-v12
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
Experiments in the idsgame-v12
environment.
An environment where neither the attacker nor defender is part of the environment, i.e.
it is intended for 2-agent simulations or RL training.
- actor_critic_vs_actor_critic-v12
- This experiment trains both an attacker and a defender agent simultaneously against each other using advantage actor-critic.
Experiments in version 13 environments, i.e., environments with the following network topology:
Start
|
|
v
Server
|
|
v
Data
Nodes are initialized with the following state (index of the defense values to set to zero is selected randomly):
attack values: [0,0]
defense values: [0,0]
det: 10
The environment has dense rewards (+1,-1 given whenever the attacker reaches a new level in the network).
The environment is fully observed for both the defender and the attacker.
Experiments where one or two of the agents are using some learning algorithm to update their policy.
Experiments in the minimal_defense-v13
environment.
An environment where the defender is following the defend_minimal
defense policy.
The defend_minimal
policy entails that the defender will always
defend the attribute with the minimal value out of all of its neighbors.
- actor_critic_vs_minimal_defense-v13
- This experiment trains an attacker agent using Actor-Critic to act optimally in the given environment and defeat the defender.
- tabular_q_learning_vs_minimal_defense-v13
- This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
Experiments in the idsgame-v13
environment.
An environment where neither the attacker nor defender is part of the environment, i.e.
it is intended for 2-agent simulations or RL training.
- actor_critic_vs_actor_critic-v13
- This experiment trains both an attacker and a defender agent simultaneously against each other using advantage actor-critic.