
Experiments

Below is code for experimenting with different scenarios of the environment. The code also makes it possible to reproduce any reported results.

Experiments in Version 0 Environments

Experiments in version 0 environments, that is, environments with the following network topology:

                   Start
		     |
		     |
		     v
		   Server
		     |
		     |
		     v
		   Data

Nodes are initialized with the following state (the index of the defense value to set to zero is selected randomly):

attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [2,2,0,2,2,2,2,2,2,2]
det: 2

The environment has sparse rewards (+1,-1 rewards are given at the terminal state of each episode). The environment is partially observed (the attacker can only see the attack attributes of neighboring nodes, the defender can only see defense attributes).
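
A minimal sketch of interacting with one of these environments through the standard Gym API is shown below. The environment id, the import name gym_idsgame, and the exact observation format are assumptions here; check the ids registered by the package.

import gym
import gym_idsgame  # assumed import name; registers the idsgame environments

# Hypothetical environment id: the attacker is the agent, the defender is part of the environment.
env = gym.make("idsgame-random_defense-v0")
obs = env.reset()
done = False
episode_reward = 0
while not done:
    action = env.action_space.sample()        # random attacker action
    obs, reward, done, info = env.step(action)
    episode_reward += reward                   # sparse: +1/-1 arrives only at the terminal state
print("episode reward:", episode_reward)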

Training Experiments

Experiments where one or two of the agents are using some learning algorithm to update their policy.

Random Defense

Experiments in the random_defense-v0 environment. An environment where the defender is following a random defense policy.

  • tabular_q_learning_vs_random_defense-v0

    • This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the random defender (a minimal Q-learning update is sketched after this list).
  • dqn_vs_random_defense-v0

    • This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the random defender.
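
The tabular q-learning experiments use the standard one-step Q-learning update. Below is a minimal, generic sketch of such a training loop against a Gym-style environment; the dict-based Q-table, the flattening of observations into hashable tuples, and the hyperparameters are illustrative assumptions, not the repository's implementation.

import random
from collections import defaultdict

def tabular_q_learning(env, episodes=20000, alpha=0.05, gamma=0.99,
                       eps=1.0, eps_min=0.05, eps_decay=0.9995):
    # Q-table: maps a hashable observation to one Q-value per action.
    Q = defaultdict(lambda: [0.0] * env.action_space.n)
    for _ in range(episodes):
        s = tuple(env.reset().flatten())          # assumes array-like observations
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                a = env.action_space.sample()
            else:
                a = max(range(env.action_space.n), key=lambda i: Q[s][i])
            s2, r, done, _ = env.step(a)
            s2 = tuple(s2.flatten())
            # one-step Q-learning update towards r + gamma * max_a' Q(s', a')
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
        eps = max(eps_min, eps * eps_decay)
    return Q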

Minimal Defense

Experiments in the minimal_defense-v0 environment.
An environment where the defender is following the defend_minimal defense policy. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
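
A sketch of the defend_minimal rule described above. The data structure (a dict mapping node id to a list of defense values) is purely illustrative; the repository represents the game state differently.

def defend_minimal(defense_values):
    # Pick the (node, attribute) pair with the smallest defense value and strengthen it.
    node, attr = min(
        ((n, i) for n, values in defense_values.items() for i in range(len(values))),
        key=lambda pair: defense_values[pair[0]][pair[1]],
    )
    defense_values[node][attr] += 1
    return node, attr

# Example: node 1 has a zeroed attribute, so it is the first to be defended.
state = {0: [2, 2, 2], 1: [2, 0, 2]}
print(defend_minimal(state))  # -> (1, 1)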

  • tabular_q_learning_vs_minimal_defense-v0

    • This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
  • dqn_vs_minimal_defense-v0

    • This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the defender.

Random Attack

This is an experiment in the random_attack-v0 environment.
An environment where the attacker is following a random attack policy.

Maximal Attack

Experiments in the maximal_attack-v0 environment. An environment where the attacker is following the attack_maximal attack policy. The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors.
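
Analogously, a sketch of the attack_maximal rule, again with illustrative data structures: attack_values maps node id to a list of attack values and neighbors is the set of nodes reachable from the attacker's current position.

def attack_maximal(attack_values, neighbors):
    # Attack the (node, attribute) pair with the highest attack value among reachable nodes.
    node, attr = max(
        ((n, i) for n in neighbors for i in range(len(attack_values[n]))),
        key=lambda pair: attack_values[pair[0]][pair[1]],
    )
    return node, attr

# Example: among the reachable servers, node 2 has the highest-valued attribute.
values = {1: [0, 1, 0], 2: [3, 0, 0]}
print(attack_maximal(values, neighbors={1, 2}))  # -> (2, 0)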

Two Agents

Experiments in the idsgame-v0 environment. An environment where neither the attacker nor defender is part of the environment, i.e. it is intended for 2-agent simulations or RL training.

Simulation Experiments

Experiments with pre-defined policies (no training)

Two Agents

Experiments in the idsgame-v0 environment.
An environment where neither the attacker nor defender is part of the environment, i.e. it is intended for 2-agent simulations or RL training.

  • random_vs_random-v0

    • In this experiment, the attacker is implemented with a random attack policy. Similarly, the defender is implemented with a random defense policy.
  • random_vs_defend_minimal-v0

    • In this experiment, the attacker is implemented with a random attack policy. The defender is implemented with the policy defend_minimal. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
  • attack_maximal_vs_defend_minimal-v0

    • In this experiment, the attacker is implemented with the policy attack_maximal. The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. Similarly, the defender is implemented with the policy defend_minimal. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
  • attack_maximal_vs_random-v0

    • In this experiment, the attacker is implemented with the policy attack_maximal. The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. The defender is implemented with a random defense policy.
  • tabular_q_agent_vs_random-v0

    • In this experiment, the attacker is implemented with a greedy policy based on a saved Q-table. The defender is implemented with a random defense policy.
  • random_vs_tabular_q_agent-v0

    • In this experiment, the defender is implemented with a greedy policy based on a saved Q-table. The attacker is implemented with a random attack policy.

Experiments in Version 1 Environments

Experiments in version 1 environments, that is, environments with the following network topology:

                   Start
		     |
		     |
		     v
		   Server
		     |
		     |
		     v
		   Data

Nodes are initialized with the following state (the index of the defense value to set to zero is selected randomly):

attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [4,0,0,4,4,0,4,4,0,4]
det: 3

The environment has sparse rewards (+1,-1 rewards are given at the terminal state of each episode). The environment is partially observed (the attacker can only see the attack attributes of neighboring nodes, the defender can only see defense attributes).

Training Experiments

Experiments where one or two of the agents are using some learning algorithm to update their policy.

Random Defense

Experiments in the random_defense-v1 environment. An environment where the defender is following a random defense policy.

  • tabular_q_learning_vs_random_defense-v1

    • This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the random defender.
  • dqn_vs_random_defense-v1

    • This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the random defender.

Minimal Defense

Experiments in the minimal_defense-v1 environment.
An environment where the defender is following the defend_minimal defense policy. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.

  • tabular_q_learning_vs_minimal_defense-v1

    • This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
  • dqn_vs_minimal_defense-v1

    • This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the defender.

Random Attack

This is an experiment in the random_attack-v1 environment.
An environment where the attacker is following a random attack policy.

Maximal Attack

This is an experiment in the maximal_attack-v1 environment. An environment where the attacker is following the attack_maximal attack policy. The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors.

Two Agents

Experiments in the idsgame-v1 environment. An environment where neither the attacker nor defender is part of the environment, i.e. it is intended for 2-agent simulations or RL training.

Simulation Experiments

Experiments with pre-defined policies (no training)

Two Agents

Experiments in the idsgame-v1 environment.
An environment where neither the attacker nor defender is part of the environment, i.e. it is intended for 2-agent simulations or RL training.

  • random_vs_random-v1

    • In this experiment, the attacker is implemented with a random attack policy. Similarly, the defender is implemented with a random defense policy.
  • random_vs_defend_minimal-v1

    • In this experiment, the attacker is implemented with a random attack policy. The defender is implemented with the policy defend_minimal. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
  • attack_maximal_vs_defend_minimal-v1

    • In this experiment, the attacker is implemented with the policy attack_maximal. The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. Similarly, the defender is implemented with the policy defend_minimal. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
  • attack_maximal_vs_random-v1

    • In this experiment, the attacker is implemented with the policy attack_maximal. The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. The defender is implemented with a random defense policy.

Experiments in Version 2 Environments

Experiments in version 2 environments, that is, environments with the following network topology:

				 Start
				   |
			  +--------+--------+
			  |		    |
			Server            Server
			  |		    |
			  +--------+--------+
				   |
				  Data

This is the standard network from Elderman et al. Nodes are initialized with the following state (the index of the defense value to set to zero is selected randomly):

attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [2,2,0,2,2,2,2,2,2,2]
det: 2

The environment has sparse rewards (+1,-1 rewards are given at the terminal state of each episode). The environment is partially observed (the attacker can only see the attack attributes of neighboring nodes, the defender can only see defense attributes).

Training Experiments

Experiments where one or two of the agents are using some learning algorithm to update their policy.

Random Defense

Experiments in the random_defense-v2 environment. An environment where the defender is following a random defense policy.

  • tabular_q_learning_vs_random_defense-v2

    • This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the random defender.
  • dqn_vs_random_defense-v2

    • This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the random defender.

Minimal Defense

Experiments in the minimal_defense-v2 environment.
An environment where the defender is following the defend_minimal defense policy. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.

  • tabular_q_learning_vs_minimal_defense-v2

    • This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
  • dqn_vs_minimal_defense-v2

    • This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the defender.

Random Attack

This is an experiment in the random_attack-v2 environment.
An environment where the attacker is following a random attack policy.

Maximal Attack

This is an experiment in the maximal_attack-v2 environment. An environment where the attacker is following the attack_maximal attack policy. The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors.

Two Agents

This is an experiment in the idsgame-v2 environment. An environment where neither the attacker nor defender is part of the environment, i.e. it is intended for 2-agent simulations or RL training.

Simulation Experiments

Experiments with pre-defined policies (no training)

Two Agents

Experiments in the idsgame-v2 environment.
An environment where neither the attacker nor defender is part of the environment, i.e. it is intended for 2-agent simulations or RL training.

  • random_vs_random-v2

    • In this experiment, the attacker is implemented with a random attack policy. Similarly, the defender is implemented with a random defense policy.
  • random_vs_defend_minimal-v2

    • In this experiment, the attacker is implemented with a random attack policy. The defender is implemented with the policy defend_minimal. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
  • attack_maximal_vs_defend_minimal-v2

    • In this experiment, the attacker is implemented with the policy attack_maximal. The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. Similarly, the defender is implemented with the policy defend_minimal. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
  • attack_maximal_vs_random-v2

    • In this experiment, the attacker is implemented with the policy attack_maximal. The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. The defender is implemented with a random defense policy.

Experiments in Version 3 Environments

Experiments in version 3 environments, that is, environments with the following network topology:

				 Start
				  |
				  |
		       +-------------------+
		       |	  |	   |
		       v	  v	   v
		     Server     Server   Server
		       |	  |	   |
		       |	  |	   |
		       v	  v	   v
		     Server  	Server   Server
		       |	  |	   |
		       |	  |	   |
		       +----------+--------+
				  |
				  v
				 Data

Nodes are initialized with the following state (the index of the defense value to set to zero is selected randomly):

attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [2,2,0,2,2,2,2,2,2,2]
det: 2

The environment has sparse rewards (+1,-1 rewards are given at the terminal state of each episode). The environment is partially observed (the attacker can only see the attack attributes of neighboring nodes, the defender can only see defense attributes).

Training Experiments

Experiments where one or two of the agents are using some learning algorithm to update their policy.

Random Defense

Experiments in the random_defense-v3 environment. An environment where the defender is following a random defense policy.

  • tabular_q_learning_vs_random_defense-v3

    • This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the random defender.
  • dqn_vs_random_defense-v3

    • This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the random defender.

Minimal Defense

Experiments in the minimal_defense-v3 environment.
An environment where the defender is following the defend_minimal defense policy. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.

  • tabular_q_learning_vs_minimal_defense-v3

    • This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
  • dqn_vs_minimal_defense-v3

    • This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the defender.

Random Attack

Experiments in the random_attack-v3 environment.
An environment where the attacker is following a random attack policy.

Maximal Attack

Experiments in the maximal_attack-v3 environment. An environment where the attacker is following the attack_maximal attack policy. The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors.

Two Agents

Experiments in the idsgame-v3 environment. An environment where neither the attacker nor defender is part of the environment, i.e. it is intended for 2-agent simulations or RL training.

Simulation Experiments

Experiments with pre-defined policies (no training)

Two Agents

Experiments in the idsgame-v3 environment.
An environment where neither the attacker nor defender is part of the environment, i.e. it is intended for 2-agent simulations or RL training.

  • random_vs_random-v3

    • In this experiment, the attacker is implemented with a random attack policy. Similarly, the defender is implemented with a random defense policy.
  • random_vs_defend_minimal-v3

    • In this experiment, the attacker is implemented with a random attack policy. The defender is implemented with the policy defend_minimal. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
  • attack_maximal_vs_defend_minimal-v3

    • In this experiment, the attacker is implemented with the policy attack_maximal. The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. Similarly, the defender is implemented with the policy defend_minimal. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
  • attack_maximal_vs_random-v3

    • In this experiment, the attacker is implemented with the policy attack_maximal. The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. The defender is implemented with a random defense policy.
  • tabular_q_agent_vs_tabular_q_agent-v3

    • In this experiment, both the attacker and defender are implemented with a greedy policy based on a saved Q-table from previous Q-learning training.

Experiments in Version 4 Environments

Experiments in version 4 environments, that is, environments with the following network topology:


												 Start
												   |
							                          		   |
				       +-----------------------------+-----------------------------+-------------------------+-------------------------+
				       | 			     |			           |			     |			       |
				       | 			     |				   |			     |			       |
				       v			     v				   v			     v			       v
				     Server			   Server     			 Server      		   Server    		     Server
				       |			     |				   |			     |			       |
				       |			     |				   |			     |			       |
				       |			     |				   |			     |			       |
				       v			     v				   v			     v			       v
				     Server			   Server			 Server		           Server		     Server
				       |			     |				   |			     |			       |
				       |			     |				   |			     |			       |
				       |			     |				   |			     |			       |
				       v			     v				   v			     v			       v
				     Server			   Server			 Server		           Server		     Server
				       |			     |				   |			     |			       |
				       |			     |				   |			     |			       |
				       |			     |				   |			     |			       |
				       v			     v				   v			     v			       v
				     Server			   Server			 Server 		   Server		     Server
				       |			     |				   |			     |			       |
				       |			     |				   |			     |			       |
				       +-----------------------------+-----------------------------+-------------------------+-------------------------+
				       								   |
												   |
												   v
												  Data

Nodes are initialized with the following state (the index of the defense value to set to zero is selected randomly):

attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [2,2,0,2,2,2,2,2,2,2]
det: 2

The environment has sparse rewards (+1,-1 rewards are given at the terminal state of each episode). The environment is partially observed (the attacker can only see the attack attributes of neighboring nodes, the defender can only see defense attributes).

Training Experiments

Experiments where one or two of the agents are using some learning algorithm to update their policy.

Random Defense

Experiments in the random_defense-v4 environment. An environment where the defender is following a random defense policy.

  • tabular_q_learning_vs_random_defense-v4

    • This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the random defender.
  • dqn_vs_random_defense-v4

    • This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the random defender.

Minimal Defense

Experiments in the minimal_defense-v4 environment.
An environment where the defender is following the defend_minimal defense policy. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.

  • tabular_q_learning_vs_minimal_defense-v4

    • This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
  • dqn_vs_minimal_defense-v4

    • This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the defender.

Random Attack

Experiments in the random_attack-v4 environment.
An environment where the attacker is following a random attack policy.

Maximal Attack

Experiments in the maximal_attack-v4 environment. An environment where the attacker is following the attack_maximal attack policy.

  • maximal_attack_vs_tabular_q_learning-v4
    • The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and detect the attacker.

Two Agents

Experiments in the idsgame-v4 environment. An environment where neither the attacker nor defender is part of the environment, i.e. it is intended for 2-agent simulations or RL training.

Simulation Experiments

Experiments with pre-defined policies (no training)

Two Agents

Experiments in the idsgame-v4 environment.
An environment where neither the attacker nor defender is part of the environment, i.e. it is intended for 2-agent simulations or RL training.

  • random_vs_random-v4

    • In this experiment, the attacker is implemented with a random attack policy. Similarly, the defender is implemented with a random defense policy.
  • random_vs_defend_minimal-v4

    • In this experiment, the attacker is implemented with a random attack policy. The defender is implemented with the policy defend_minimal. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
  • attack_maximal_vs_defend_minimal-v4

    • In this experiment, the attacker is implemented with the policy attack_maximal. The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. Similarly, the defender is implemented with the policy defend_minimal. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
  • attack_maximal_vs_random-v4

    • In this experiment, the attacker is implemented with the policy attack_maximal. The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. The defender is implemented with a random defense policy.
  • tabular_q_agent_vs_defend_minimal-v4

    • In this experiment, the attacker is implemented with a greedy policy based on a saved Q-table. The defender is implemented with the policy defend_minimal. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.

Experiments in Version 5 Environments

Experiments in version 5 environments, that is, environments with the following network topology:


												 Start
												   |
							                          		   |
				       +-----------------------------+-----------------------------+-------------------------+-------------------------+
				       | 			     |			           |			     |			       |
				       | 			     |				   |			     |			       |
				       v			     v				   v			     v			       v
				     Server------------------------Server------------------------Server--------------------Server--------------------Server
				       |                	     |		             	   |			     |			       |
				       |			     |				   |			     |			       |
				       |			     |				   |			     |			       |
				       v			     v				   v			     v			       v
				     Server------------------------Server------------------------Server--------------------Server--------------------Server
				       |	         	     |		                   |	        	     |			       |
				       |			     |				   |			     |			       |
				       |			     |				   |			     |			       |
				       v			     v				   v			     v			       v
				     Server------------------------Server------------------------Server--------------------Server--------------------Server
				       |	         	     |				   |	                     |			       |
				       |			     |				   |			     |			       |
				       |			     |				   |			     |			       |
				       v			     v				   v			     v			       v
				     Server------------------------Server------------------------Server--------------------Server--------------------Server
				       |			     |				   |			     |			       |
				       |			     |				   |			     |			       |
				       +-----------------------------+-----------------------------+-------------------------+-------------------------+
				       								   |
												   |
												   v
												  Data

Nodes are initialized with the following state (the index of the defense value to set to zero is selected randomly):

attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [2,2,0,2,2,2,2,2,2,2]
det: 2

Moreover, only two nodes per layer have a vulnerability (a defense value set to 0); all other nodes in the layer have their defense values initialized to 2 on all attributes.

The environment has sparse rewards (+1,-1 rewards are given at the terminal state of each episode). The environment is partially observed (the attacker can only see the attack attributes of neighboring nodes, the defender can only see defense attributes).

Training Experiments

Experiments where one or two of the agents are using some learning algorithm to update their policy.

Random Defense

Experiments in the random_defense-v5 environment. An environment where the defender is following a random defense policy.

  • tabular_q_learning_vs_random_defense-v5

    • This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the random defender.
  • dqn_vs_random_defense-v5

    • This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the random defender.

Minimal Defense

Experiments in the minimal_defense-v5 environment.
An environment where the defender is following the defend_minimal defense policy. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.

Random Attack

Experiments in the random_attack-v5 environment.
An environment where the attacker is following a random attack policy.

Maximal Attack

Experiments in the maximal_attack-v5 environment. An environment where the attacker is following the attack_maximal attack policy.

  • maximal_attack_vs_tabular_q_learning-v5
    • The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and detect the attacker.

Two Agents

This is an experiment in the idsgame-v5 environment. An environment where neither the attacker nor defender is part of the environment, i.e. it is intended for 2-agent simulations or RL training.

Simulation Experiments

Experiments with pre-defined policies (no training)

Two Agents

Experiments in the idsgame-v5 environment.
An environment where neither the attacker nor defender is part of the environment, i.e. it is intended for 2-agent simulations or RL training.

  • random_vs_random-v5

    • In this experiment, the attacker is implemented with a random attack policy. Similarly, the defender is implemented with a random defense policy.
  • random_vs_defend_minimal-v5

    • In this experiment, the attacker is implemented with a random attack policy. The defender is implemented with the policy defend_minimal. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
  • attack_maximal_vs_defend_minimal-v5

    • In this experiment, the attacker is implemented with the policy attack_maximal. The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. Similarly, the defender is implemented with the policy defend_minimal. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.
  • attack_maximal_vs_random-v5

    • In this experiment, the attacker is implemented with the policy attack_maximal. The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors. The defender is implemented with a random defense policy.

Experiments in Version 7 Environments

Experiments in version 7 environments, that is, environments with the following network topology:

				 Start
				  |
				  |
		       +-------------------+
		       |	  |	   |
		       v	  v	   v
		     Server     Server   Server
		       |	  |	   |
		       |	  |	   |
		       v	  v	   v
		     Server  	Server   Server
		       |	  |	   |
		       |	  |	   |
		       +----------+--------+
				  |
				  v
				 Data

Nodes are initialized with the following state (the index of the defense value to set to zero is selected randomly):

attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [2,2,0,2,2,2,2,2,2,2]
det: 2

The environment has dense rewards (+1,-1 given whenever the attacker reaches a new level in the network). The environment is partially observed (the attacker can only see the attack attributes of neighboring nodes, the defender can only see defense attributes).

Training Experiments

Experiments where one or two of the agents are using some learning algorithm to update their policy.

Random Defense

Experiments in the random_defense-v7 environment. An environment where the defender is following a random defense policy.

  • tabular_q_learning_vs_random_defense-v7

    • This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the random defender.
  • dqn_vs_random_defense-v7

    • This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the random defender.
  • reinforce_vs_random_defense-v7

    • This experiment trains an attacker agent using REINFORCE to act optimally in the given environment and defeat the random defender (a REINFORCE sketch follows this list).
  • actor_critic_vs_random_defense-v7

    • This experiment trains an attacker agent using Actor-Critic to act optimally in the given environment and defeat the random defender.
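
A minimal sketch of the REINFORCE (Monte-Carlo policy gradient) setup used in these experiments, written with PyTorch. The network architecture, the hyperparameters, and the assumption of flat array observations are illustrative; this is not the repository's agent implementation.

import torch
import torch.nn as nn

def reinforce(env, episodes=5000, gamma=0.99, lr=1e-3, hidden=64):
    obs_dim = torch.as_tensor(env.reset(), dtype=torch.float32).numel()
    policy = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                           nn.Linear(hidden, env.action_space.n))
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(episodes):
        obs, done = env.reset(), False
        log_probs, rewards = [], []
        while not done:
            x = torch.as_tensor(obs, dtype=torch.float32).flatten()
            dist = torch.distributions.Categorical(logits=policy(x))
            action = dist.sample()
            obs, r, done, _ = env.step(action.item())
            log_probs.append(dist.log_prob(action))
            rewards.append(r)
        # Discounted return G_t for every step, computed backwards through the episode.
        returns, G = [], 0.0
        for r in reversed(rewards):
            G = r + gamma * G
            returns.insert(0, G)
        returns = torch.as_tensor(returns, dtype=torch.float32)
        # Policy-gradient loss: maximize sum_t log pi(a_t | s_t) * G_t.
        loss = -(torch.stack(log_probs) * returns).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy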

Minimal Defense

Experiments in the minimal_defense-v7 environment.
An environment where the defender is following the defend_minimal defense policy. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.

  • tabular_q_learning_vs_minimal_defense-v7

    • This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
  • dqn_vs_minimal_defense-v7

    • This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the defender (a minimal DQN sketch follows this list).
  • reinforce_vs_minimal_defense-v7

    • This experiment trains an attacker agent using REINFORCE to act optimally in the given environment and defeat the defender.
  • actor_critic_vs_minimal_defense-v7

    • This experiment trains an attacker agent using Actor-Critic to act optimally in the given environment and defeat the defender.
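
A minimal DQN sketch in the same spirit (replay buffer, target network, one-step TD targets). Again, the architecture and hyperparameters are illustrative assumptions, and flat array observations are assumed.

import random
from collections import deque
import torch
import torch.nn as nn

def dqn(env, episodes=5000, gamma=0.99, lr=1e-3, batch=64, eps=0.1,
        buffer_size=10000, target_sync=500, hidden=64):
    obs_dim = torch.as_tensor(env.reset(), dtype=torch.float32).numel()
    n_actions = env.action_space.n
    q_net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))
    target = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))
    target.load_state_dict(q_net.state_dict())
    opt = torch.optim.Adam(q_net.parameters(), lr=lr)
    buf, step = deque(maxlen=buffer_size), 0

    def flat(o):
        return torch.as_tensor(o, dtype=torch.float32).flatten()

    for _ in range(episodes):
        s, done = flat(env.reset()), False
        while not done:
            # epsilon-greedy action from the online network
            a = env.action_space.sample() if random.random() < eps else int(q_net(s).argmax())
            s2, r, done, _ = env.step(a)
            s2 = flat(s2)
            buf.append((s, a, r, s2, done))
            s = s2
            step += 1
            if len(buf) >= batch:
                sb, ab, rb, s2b, db = zip(*random.sample(buf, batch))
                sb, s2b = torch.stack(sb), torch.stack(s2b)
                ab = torch.tensor(ab)
                rb = torch.tensor(rb, dtype=torch.float32)
                db = torch.tensor(db, dtype=torch.float32)
                # one-step TD target from the (frozen) target network
                with torch.no_grad():
                    y = rb + gamma * (1 - db) * target(s2b).max(dim=1).values
                q = q_net(sb).gather(1, ab.unsqueeze(1)).squeeze(1)
                loss = nn.functional.mse_loss(q, y)
                opt.zero_grad()
                loss.backward()
                opt.step()
            if step % target_sync == 0:
                target.load_state_dict(q_net.state_dict())
    return q_net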

Random Attack

Experiments in the random_attack-v7 environment.
An environment where the attacker is following a random attack policy.

  • random_attack_vs_tabular_q_learning-v7

    • This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and defeat the random attacker.
  • random_attack_vs_dqn-v7

    • This experiment trains a defender agent using DQN to act optimally in the given environment and defeat the random attacker.
  • random_attack_vs_reinforce-v7

    • This experiment trains a defender agent using REINFORCE to act optimally in the given environment and defeat the random attacker.
  • random_attack_vs_actor_critic-v7

    • This experiment trains a defender agent using Actor-Critic to act optimally in the given environment and defeat the random attacker.

Maximal Attack

Experiments in the maximal_attack-v7 environment. An environment where the attacker is following the attack_maximal attack policy. The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors.

Two Agents

Experiments in the idsgame-v7 environment. An environment where neither the attacker nor defender is part of the environment, i.e. it is intended for 2-agent simulations or RL training.

  • tabular_q_learning_vs_tabular_q_learning-v7

    • This experiment trains both an attacker and a defender agent simultaneously against each other using tabular q-learning (a minimal self-play loop is sketched after this list).
  • dqn_vs_dqn-v7

    • This experiment trains both an attacker and a defender agent simultaneously against each other using DQN.
  • reinforce_vs_reinforce-v7

    • This experiment trains both an attacker and a defender agent simultaneously against each other using REINFORCE.
  • actor_critic_vs_actor_critic-v7

    • This experiment trains both an attacker and a defender agent simultaneously against each other using Actor-Critic.
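
A minimal sketch of the simultaneous (self-play) training loop these experiments follow. The joint-action interface shown here — env.step taking an (attacker action, defender action) pair and returning a reward pair — and the agent objects with act/update methods are assumptions for illustration; check the environment's actual API.

def train_self_play(env, attacker, defender, episodes=10000):
    # attacker and defender are hypothetical agent objects exposing act(obs) and update(transition).
    for _ in range(episodes):
        obs = env.reset()
        done = False
        while not done:
            attack_action = attacker.act(obs)
            defense_action = defender.act(obs)
            obs_next, (r_attacker, r_defender), done, _ = env.step((attack_action, defense_action))
            # Both agents learn from the same transition, each from its own reward (zero-sum game).
            attacker.update((obs, attack_action, r_attacker, obs_next, done))
            defender.update((obs, defense_action, r_defender, obs_next, done))
            obs = obs_next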

Experiments in Version 8 Environments

Experiments in version 8 environments, that is, environments with the following network topology:

                   Start
		     |
		     |
		     v
		   Server
		     |
		     |
		     v
		   Data

Nodes are initialized with the following state (the index of the defense value to set to zero is selected randomly):

attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [2,2,0,2,2,2,2,2,2,2]
det: 2

The environment has dense rewards (+1,-1 given whenever the attacker reaches a new level in the network). The environment is partially observed (the attacker can only see the attack attributes of neighboring nodes, the defender can only see defense attributes).

Training Experiments

Experiments where one or two of the agents are using some learning algorithm to update their policy.

Random Defense

Experiments in the random_defense-v8 environment. An environment where the defender is following a random defense policy.

  • tabular_q_learning_vs_random_defense-v8

    • This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the random defender.
  • dqn_vs_random_defense-v8

    • This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the random defender.
  • reinforce_vs_random_defense-v8

    • This experiment trains an attacker agent using REINFORCE to act optimally in the given environment and defeat the random defender.
  • actor_critic_vs_random_defense-v8

    • This experiment trains an attacker agent using Actor-Critic to act optimally in the given environment and defeat the random defender.

Minimal Defense

Experiments in the minimal_defense-v8 environment.
An environment where the defender is following the defend_minimal defense policy. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.

  • tabular_q_learning_vs_minimal_defense-v8

    • This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
  • dqn_vs_minimal_defense-v8

    • This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the defender.
  • reinforce_vs_minimal_defense-v8

    • This experiment trains an attacker agent using REINFORCE to act optimally in the given environment and defeat the defender.
  • actor_critic_vs_minimal_defense-v8

    • This experiment trains an attacker agent using Actor-Critic to act optimally in the given environment and defeat the defender.

Random Attack

Experiments in the random_attack-v8 environment.
An environment where the attacker is following a random attack policy.

  • random_attack_vs_tabular_q_learning-v8

    • This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and defeat the random attacker.
  • random_attack_vs_dqn-v8

    • This experiment trains a defender agent using DQN to act optimally in the given environment and defeat the random attacker.
  • random_attack_vs_reinforce-v8

    • This experiment trains a defender agent using REINFORCE to act optimally in the given environment and defeat the random attacker.
  • random_attack_vs_actor_critic-v8

    • This experiment trains a defender agent using Actor-Critic to act optimally in the given environment and defeat the random attacker.

Maximal Attack

Experiments in the maximal_attack-v8 environment. An environment where the attacker is following the attack_maximal attack policy. The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors.

Two Agents

Experiments in the idsgame-v8 environment. An environment where neither the attacker nor defender is part of the environment, i.e. it is intended for 2-agent simulations or RL training.

  • tabular_q_learning_vs_tabular_q_learning-v8

    • This experiment trains both an attacker and a defender agent simultaneously against each other using tabular q-learning.
  • dqn_vs_dqn-v8

    • This experiment trains both an attacker and a defender agent simultaneously against each other using DQN.
  • reinforce_vs_reinforce-v8

    • This experiment trains both an attacker and a defender agent simultaneously against each other using REINFORCE.
  • actor_critic_vs_actor_critic-v8

    • This experiment trains both an attacker and a defender agent simultaneously against each other using advantage actor-critic.

Experiments in Version 9 Environments

Experiments in version 9 environments, that is, environments with the following network topology:

				 Start
				   |
			  +--------+--------+
			  |		    |
			Server            Server
			  |		    |
			  +--------+--------+
				   |
				  Data

This is the standard network from Elderman et al. Nodes are initialized with the following state (the index of the defense value to set to zero is selected randomly):

attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [2,2,0,2,2,2,2,2,2,2]
det: 2

The environment has dense rewards (+1,-1 given whenever the attacker reaches a new level in the network). The environment is partially observed (the attacker can only see the attack attributes of neighboring nodes, the defender can only see defense attributes).

Training Experiments

Experiments where one or two of the agents are using some learning algorithm to update their policy.

Random Defense

Experiments in the random_defense-v9 environment. An environment where the defender is following a random defense policy.

  • tabular_q_learning_vs_random_defense-v9

    • This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the random defender.
  • dqn_vs_random_defense-v9

    • This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the random defender.
  • reinforce_vs_random_defense-v9

    • This experiment trains an attacker agent using REINFORCE to act optimally in the given environment and defeat the random defender.
  • actor_critic_vs_random_defense-v9

    • This experiment trains an attacker agent using Actor-Critic to act optimally in the given environment and defeat the random defender.

Minimal Defense

Experiments in the minimal_defense-v9 environment.
An environment where the defender is following the defend_minimal defense policy. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.

  • tabular_q_learning_vs_minimal_defense-v9

    • This experiment trains an attacker agent using tabular q-learning to act optimally in the given environment and defeat the defender.
  • dqn_vs_minimal_defense-v9

    • This experiment trains an attacker agent using DQN to act optimally in the given environment and defeat the defender.
  • reinforce_vs_minimal_defense-v9

    • This experiment trains an attacker agent using REINFORCE to act optimally in the given environment and defeat the defender.
  • actor_critic_vs_minimal_defense-v9

    • This experiment trains an attacker agent using Actor-Critic to act optimally in the given environment and defeat the defender.

Random Attack

Experiments in the random_attack-v9 environment.
An environment where the attacker is following a random attack policy.

  • random_attack_vs_tabular_q_learning-v9

    • This experiment trains a defender agent using tabular q-learning to act optimally in the given environment and defeat the random attacker.
  • random_attack_vs_dqn-v9

    • This experiment trains a defender agent using DQN to act optimally in the given environment and defeat the random attacker.
  • random_attack_vs_reinforce-v9

    • This experiment trains a defender agent using REINFORCE to act optimally in the given environment and defeat the random attacker.
  • random_attack_vs_actor_critic-v9

    • This experiment trains a defender agent using Actor-Critic to act optimally in the given environment and defeat the random attacker.

Maximal Attack

This is an experiment in the maximal_attack-v9 environment. An environment where the attacker is following the attack_maximal attack policy. The attack_maximal policy entails that the attacker will always attack the attribute with the maximum value out of all of its neighbors.

Two Agents

Experiments in the idsgame-v9 environment. An environment where neither the attacker nor defender is part of the environment, i.e. it is intended for 2-agent simulations or RL training.

  • tabular_q_learning_vs_tabular_q_learning-v9

    • This experiment trains both an attacker and a defender agent simultaneously against each other using tabular q-learning.
  • dqn_vs_dqn-v9

    • This experiment trains both an attacker and a defender agent simultaneously against each other using DQN.
  • reinforce_vs_reinforce-v9

    • This experiment trains both an attacker and a defender agent simultaneously against each other using REINFORCE.
  • actor_critic_vs_actor_critic-v9

    • This experiment trains both an attacker and a defender agent simultaneously against each other using Actor-Critic.

Experiments in Version 10 Environments

Experiments in version 10 environments, that is, environments with the following network topology:

                   Start
		     |
		     |
		     v
		   Server
		     |
		     |
		     v
		   Data

Nodes are initialized with the following state (the index of the defense value to set to zero is selected randomly):

attack values: [0,0,0,0,0,0,0,0,0,0]
defense values: [2,2,0,2,2,2,2,2,2,2]
det: 2

The environment has dense rewards (+1,-1 given whenever the attacker reaches a new level in the network).

The environment is fully observed for both the defender and the attacker.

Training Experiments

Experiments where one or two of the agents are using some learning algorithm to update their policy.

Minimal Defense

Experiments in the minimal_defense-v10 environment.
An environment where the defender is following the defend_minimal defense policy. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.

Two Agents

Experiments in the idsgame-v10 environment. An environment where neither the attacker nor defender is part of the environment, i.e. it is intended for 2-agent simulations or RL training.

Experiments in Version 11 Environments

Experiments in version 11 environments, that is, environments with the following network topology:

                   Start
		     |
		     |
		     v
		   Server
		     |
		     |
		     v
		   Data

Nodes are initialized with the following state (the index of the defense value to set to zero is selected randomly):

attack values: [0,0]
defense values: [0,0]
det: 1

The environment has dense rewards (+1,-1 given whenever the attacker reaches a new level in the network).

The environment is fully observed for both the defender and the attacker.

Training Experiments

Experiments where one or two of the agents are using some learning algorithm to update their policy.

Minimal Defense

Experiments in the minimal_defense-v11 environment.
An environment where the defender is following the defend_minimal defense policy. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.

Two Agents

Experiments in the idsgame-v11 environment. An environment where neither the attacker nor defender is part of the environment, i.e. it is intended for 2-agent simulations or RL training.

Experiments in Version 12 Environments

Experiments in version 12 environments, that is, environments with the following network topology:

                   Start
		     |
		     |
		     v
		   Server
		     |
		     |
		     v
		   Data

Nodes are initialized with the following state (the index of the defense value to set to zero is selected randomly):

attack values: [(0-1) randomized, (0-1) randomized]
defense values: [(0-1) randomized, (0-1) randomized]
det: (0-1) randomized

The environment has dense rewards (+1,-1 given whenever the attacker reaches a new level in the network).

The environment is fully observed for both the defender and the attacker.

Training Experiments

Experiments where one or two of the agents are using some learning algorithm to update their policy.

Minimal Defense

Experiments in the minimal_defense-v12 environment.
An environment where the defender is following the defend_minimal defense policy. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.

Two Agents

Experiments in the idsgame-v12 environment. An environment where neither the attacker nor defender is part of the environment, i.e. it is intended for 2-agent simulations or RL training.

Experiments in Version 13 Environments

Experiments in version 13 environments, that is, environments with the following network topology:

                   Start
		     |
		     |
		     v
		   Server
		     |
		     |
		     v
		   Data

Nodes are initialized with the following state (the index of the defense value to set to zero is selected randomly):

attack values: [0,0]
defense values: [0,0]
det: 10

The environment has dense rewards (+1,-1 given whenever the attacker reaches a new level in the network).

The environment is fully observed for both the defender and the attacker.

Training Experiments

Experiments where one or two of the agents are using some learning algorithm to update their policy.

Minimal Defense

Experiments in the minimal_defense-v13 environment.
An environment where the defender is following the defend_minimal defense policy. The defend_minimal policy entails that the defender will always defend the attribute with the minimal value out of all of its neighbors.

Two Agents

Experiments in the idsgame-v13 environment. An environment where neither the attacker nor defender is part of the environment, i.e. it is intended for 2-agent simulations or RL training.