Simplify MalSimulator and create wrappers for other interfaces #87
base: main
Conversation
Force-pushed aebfcb8 to 753ce55
Force-pushed 502b688 to fe27e46
Force-pushed ce89c22 to 950fed4
malsim/sims/mal_simulator.py
Outdated
```python
        if (reprint_header != 0) and ((entry_nr + 1) % reprint_header == 0):
            formatted_str += header
    return formatted_str

@dataclass
```
Having this be an immutable NamedTuple would be safer IMO
The dataclass can also become immutable with `@dataclass(frozen=True)`.
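For illustration, a minimal sketch of what `frozen=True` buys (the class and field names here are hypothetical, not from the PR):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentState:  # hypothetical name, for illustration only
    name: str
    reward: float = 0.0
    reached_steps: tuple[str, ...] = ()  # tuple, not list, so the field is immutable too

state = AgentState(name="attacker1")
# state.reward = 1.0  # would raise dataclasses.FrozenInstanceError
```

Note that `frozen=True` only blocks attribute assignment; a field holding a list could still be mutated in place, hence the tuple above.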
Sorry, why do we want them to be immutable?
The agent states are (at least in the current implementation) stored in the simulator and potentially updated/changed each step.
Are they not also sent outside the simulator, as per the function signature of the searchers? Sending references to mutable state outside of the simulator seems dangerous to me.
I see your point now. As you say, the agent states are not only internal, but used by the decision agents/searchers. I agree that it feels dangerous. Maybe they should be immutable, but it also feels a bit weird to create new Agent objects in every step. Another option is to use something else in the searchers than those objects, and keep them internal.
You can do:

```python
@property
def state(self):
    """Immutable state"""
    return tuple(self._mutable_state)
```

or something similar.
But the current implementation is not safe in this manner in a thousand places. It is also very easy for someone to bypass these safety measures. If it is too cumbersome to create such a dynamic property (because converting the internal data structure, a dataclass in this case, to something immutable is more involved), then it is not worth it. In this case, the simplest approach seems to be to convert the dataclass to a frozen one when reading the property. But I don't know the performance implications.
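A minimal sketch of that last idea, with hypothetical names: keep the mutable dataclass internal and build a frozen copy each time the property is read.

```python
from dataclasses import asdict, dataclass

@dataclass
class _AgentState:        # internal, mutable (hypothetical)
    reward: float = 0.0
    terminated: bool = False

@dataclass(frozen=True)
class AgentStateView:     # what callers get (hypothetical)
    reward: float
    terminated: bool

class Simulator:
    def __init__(self) -> None:
        self._state = _AgentState()

    @property
    def agent_state(self) -> AgentStateView:
        # A fresh frozen copy per read: callers cannot mutate the
        # simulator's internal state through the returned object.
        return AgentStateView(**asdict(self._state))
```

The cost is one copy per read, which is the performance question raised above.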
Yes, having the agent state object be mutable internally in the MalSimulator but return an immutable object when fetching it might be the easiest way. It would avoid the dangers and confusion of the agent states being changed outside of the simulator. Does that make sense?
That seems like an appropriate solution.
I added MalSimAgentView (a public interface for MalSimAgent). So every time an agent state is returned from MalSimulator it will be read-only.
The observations/infos in the SerializedObsEnv are now created without altering the MalSimAgent objects and instead stored in dicts.
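The actual MalSimAgentView is in the PR diff; as a rough sketch, a read-only view over a mutable state object can be done with delegation (attribute names assumed):

```python
class MalSimAgentView:
    """Read-only proxy over an internal, mutable agent state (sketch)."""

    def __init__(self, agent_state) -> None:
        # Bypass our own __setattr__ for the single internal reference.
        object.__setattr__(self, "_agent_state", agent_state)

    def __getattr__(self, name):
        # Reads are delegated to the wrapped state object.
        return getattr(self._agent_state, name)

    def __setattr__(self, name, value):
        raise AttributeError("MalSimAgentView is read-only")
```

Note this is shallow: a mutable attribute (e.g. a list of reached nodes) reached through the view could still be mutated by the caller.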
Bonus explanation about the searcher agents:
The searchers still depend on action_mask (which comes from the SerializedObsEnv), which is not what I/we actually want in the end (we want the decision agents to work with the MalSimulator without serialization/indexing/action masks). The problem was that when I converted the searcher agents to use the agent state only, they behaved a little differently than before, so until I solve that I will leave them as they are.
EDIT: last paragraph is not relevant any more, the searchers are now 'converted'
malsim/agents/searchers.py
Outdated
```diff
@@ -81,7 +85,7 @@ def select_next_target(
         return current_target, False


-class DepthFirstAttacker:
+class DepthFirstAttacker():
```
Shouldn't the DepthFirstAttacker also inherit DecisionAgent?
Yes, thank you. I just pushed quite a big change to the searcher agents (they will use attackgraph nodes instead of the serialized action masks) where I fixed that as well.
malsim/agents/decision_agent.py
Outdated
How about we consolidate the design of the agents, taking into account my PR here too? Otherwise, if you plan to merge that too, it does not make sense to have them redesigned here only to be redesigned again later.
I had not seen your PR.
In my opinion the agents should not depend on the obs/action mask. Partly because it is not needed, and partly because the obs/action mask will be specific to the MalSimSerializedEnv/ParallelEnv and not part of the base MalSimulator (as per this PR).
The searcher agents in this PR only depend on the action surface of an agent (a list of AttackGraphNodes). I am open to other ways of simplifying the searcher agents further, but I do want to get rid of using the obs/action mask. What do you think?
I'm all for removing this cryptic stuff. My comment was mostly about the design of the agents, which seems unnecessarily convoluted in its current form. There is really no need for 4-5 methods just to pass things around, or 5 variables to do what can be done with 2, etc.
Specifically to your changes here: is there an ambition behind the DecisionAgent? To me, as long as the interface of an agent is fixed and explicit (i.e. the init and the get-next-step method, whatever it's called), there is really no benefit in providing abstract classes or trying to standardize the internal structure of an agent class, especially when it is that small. Or are there requests for something like that by others that make it a necessity?
My approach would be to take my PR (because I think it's great 😃) and just remove the mask etc., unless I'm missing something bigger.
Also, I think it doesn't offer much to have three files to hold what is basically 200 lines of code. The perceived benefit of having the agent module be a package where things are split by conceptual separation does not add much in practice: it makes importing a tiny bit slower, requires 3 open files if someone is editing the package in an editor, etc., and more importantly it is undone by exposing the classes at the package level using `__all__` in `__init__.py`.
Yeah, I am all for simplifying it into one or two methods if possible. I also added two simple tests, so that makes it safer to make those changes.
Is there an ambition behind using the abstract base class DecisionAgent?
The main reason is that it makes it clear to the user/developer that every class that implements it shares the same interface and has the same method name. I agree that it might be a bit overkill since there will probably just be one method... but at the same time I don't think it hurts. Especially when using a linter, it becomes clear that, for example, in cli.py we can use `decision_agent.get_next_action(sim_agent_state)` even though we don't know what class decision_agent has (we type hint it with `DecisionAgent`).
Maybe it is not super pythonic, and maybe there is some better option? Or do you think it adds no value?
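For concreteness, the pattern under discussion looks roughly like this sketch (only `DecisionAgent` and `get_next_action` are named in the PR; the signature and the `action_surface` attribute are assumptions based on the discussion above):

```python
from abc import ABC, abstractmethod

class DecisionAgent(ABC):
    @abstractmethod
    def get_next_action(self, agent_state):
        """Return the next attack graph node to act on, or None."""

class BreadthFirstAttacker(DecisionAgent):
    def get_next_action(self, agent_state):
        # Sketch: pick the lowest-id node on the agent's action surface.
        surface = sorted(agent_state.action_surface, key=lambda n: n.id)
        return surface[0] if surface else None

def run_step(decision_agent: DecisionAgent, sim_agent_state):
    # With the type hint, a linter can verify that get_next_action
    # exists without knowing the concrete agent class.
    return decision_agent.get_next_action(sim_agent_state)
```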
I didn't see the second comment before sending that.
I think having separate files makes sense in this case for future proofing, instead of putting all agents in one file. We will probably add heuristics agents in there (agents that make decisions based on the state of the attack graph) that could be in a separate file, and we might add searcher agents that are specific to defenders as well, so it will grow.
But yeah, whether it makes sense to have a base DecisionAgent is more questionable.
Force-pushed d1b0cf8 to 2fe03ed
Throwing this into the mix:

```python
# Author: Jakob Nyberg, 2025
from collections.abc import Callable
import logging
from typing import Any

import numpy as np
import numpy.typing as npt

logger = logging.getLogger(__name__)

Array = npt.NDArray[np.int32]
BoolArray = npt.NDArray[np.bool_]


def get_new_targets(
    discovered_targets: npt.NDArray[np.int32],
    mask: tuple[npt.NDArray[np.bool_], npt.NDArray[np.bool_]],
) -> tuple[npt.NDArray[np.int32], npt.NDArray[np.int32]]:
    attack_surface = mask[1]
    surface_indexes = np.flatnonzero(attack_surface)
    new_targets = np.array(
        [idx for idx in surface_indexes if idx not in discovered_targets],
        dtype=np.int32,
    )
    return new_targets, surface_indexes


def move_target_to_back(
    current_target: np.int32 | None,
    targets: npt.NDArray[np.int32],
    attack_surface: npt.NDArray[np.int32],
) -> tuple[npt.NDArray[np.int32], np.int32 | None]:
    """
    If the current target was not compromised this turn, put it
    on the bottom of the stack and focus on the next target instead.
    """
    if current_target is None:  # explicit None check; target index 0 is falsy
        return targets, current_target
    if current_target in attack_surface:
        # Pop the top of the stack as the new current target and push the
        # unfinished one to the bottom (the scalar is wrapped in a list so
        # np.concatenate accepts it).
        new_current = targets[-1]
        targets = np.concatenate(([current_target], targets[:-1]))
        return targets, new_current
    return targets, current_target


def choose_target(
    targets: npt.NDArray[np.int32],
    attack_surface: npt.NDArray[np.int32],
) -> tuple[npt.NDArray[np.int32], np.int32, bool]:
    # targets that have not been removed from the attack surface by another agent
    valid_targets = np.array(
        [t for t in targets if t in attack_surface], dtype=np.int32
    )
    if len(valid_targets) == 0:
        return valid_targets, np.int32(0), True
    return valid_targets[:-1], valid_targets[-1], False


def compute_action(
    permute_func: Callable[[Array], Array],
    action_func: Callable[
        [np.int32 | None, Array, Array], tuple[Array, np.int32 | None, bool]
    ],
    add_targets_func: Callable[[Array, Array], Array],
):
    def _compute_action(
        targets: Array,
        mask: tuple[BoolArray, BoolArray],
        current_target: np.int32 | None,
    ):
        new_targets, surface_indexes = get_new_targets(targets, mask)
        targets, current_target, done = action_func(
            current_target,
            add_targets_func(targets, permute_func(new_targets)),
            surface_indexes,
        )
        action = 0 if done else 1
        if action == 0:
            logger.debug(
                'Attacker agent does not have any valid targets and will terminate'
            )
        logger.debug(f'Attacker targets: {targets}')
        logger.debug(f'Attacker current target: {current_target}')
        logger.debug(f'Attacker action: {action}')
        return targets, action, current_target

    return _compute_action


def create_permute_func(seed: int | None, randomize: bool) -> Callable[[Array], Array]:
    s = seed if seed else np.random.SeedSequence().entropy
    rng = np.random.default_rng(s) if randomize else None
    return rng.permutation if rng else lambda x: x


class BreadthFirstAttacker:
    def __init__(self, agent_config: dict[str, Any]) -> None:
        self.current_target: np.int32 | None = None
        self.targets: npt.NDArray[np.int32] = np.array([], dtype=np.int32)
        permute_func = create_permute_func(
            agent_config.get('seed', None), agent_config.get('randomize', False)
        )
        self.compute_action = compute_action(
            permute_func, self._action_func, self._add_new_targets_func
        )

    def __call__(
        self,
        _: dict[str, Any],
        mask: tuple[npt.NDArray[np.bool_], npt.NDArray[np.bool_]],
    ):
        self.targets, action, self.current_target = self.compute_action(
            self.targets, mask, self.current_target
        )
        return (action, self.current_target)

    @staticmethod
    def _action_func(
        current_target: np.int32 | None,
        targets: Array,
        surface_indexes: Array,
    ):
        targets, current_target = move_target_to_back(
            current_target, targets, surface_indexes
        )
        return choose_target(targets, surface_indexes)

    @staticmethod
    def _add_new_targets_func(targets: Array, new_targets: Array):
        new_targets = np.flip(new_targets)  # to comply with the original implementation
        return np.concatenate([new_targets, targets])


class DepthFirstAttacker:
    def __init__(self, agent_config: dict[str, Any]) -> None:
        self.current_target: np.int32 | None = None
        self.targets: npt.NDArray[np.int32] = np.array([], dtype=np.int32)
        permute_func = create_permute_func(
            agent_config.get('seed', None), agent_config.get('randomize', False)
        )
        self.compute_action = compute_action(
            permute_func, self._action_func, self._add_new_targets_func
        )

    def __call__(
        self,
        _: dict[str, Any],
        mask: tuple[npt.NDArray[np.bool_], npt.NDArray[np.bool_]],
    ):
        self.targets, action, self.current_target = self.compute_action(
            self.targets, mask, self.current_target
        )
        return (action, self.current_target)

    @staticmethod
    def _action_func(
        current_target: np.int32 | None,
        targets: Array,
        surface_indexes: Array,
    ):
        # keep working on a target unless it has been removed from the attack surface
        return (
            choose_target(targets, surface_indexes)
            if current_target not in surface_indexes
            else (targets, current_target, False)
        )

    @staticmethod
    def _add_new_targets_func(targets: Array, new_targets: Array):
        # push new targets onto the top of the stack (the end of the array,
        # which choose_target pops from) so the latest targets are worked on first
        return np.concatenate([targets, new_targets])
```
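For reference, a hypothetical driver for the sketch above, with a hand-made mask in the two-element shape the signatures assume (only `mask[1]`, the attack surface, is actually read):

```python
import numpy as np

# Five action indices; indices 1, 2 and 4 are on the attack surface.
mask = (
    np.zeros(5, dtype=np.bool_),
    np.array([False, True, True, False, True]),
)

agent = BreadthFirstAttacker({"seed": 42, "randomize": True})
action, target = agent({}, mask)  # the observation dict is unused, hence {}
print(action, target)  # 1 and one of the surface indices 1, 2, 4
```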
malsim/sims/mal_sim_agent.py
Outdated
```python
@dataclass
class MalSimAgent:
```
I think calling this MalSimAgent is confusing. This is a container for agent-related info. A name like MalSimAgentState seems more representative. The current name adds one more "agent" to the actual agent classes and the maltoolbox attacker.
Yes, this confusion is very annoying, and I think you are right about just calling it MalSimAgentState instead. Will change that.
Fixed
tests/sims/test_example_scenarios.py
Outdated
Can you please explain how this test works? Why doesn't it use searcher agents? BFS is mentioned here but you seem to read the attackers from the toolbox?
It doesn't use any searcher agents because I am not testing the searcher agents, just the MalSimulator functionality. I perform the steps one at a time, using steps that I already know are available to the defender/attacker (as per the model and the entrypoints). The scenario file will probably help to see the entrypoints.
Yes, all attacker agents in the MalSimulator must reference an Attacker object. This is so the simulator can know the entrypoints and reached attack steps, and perform 'compromise' on nodes. This is nothing new in my PR though.
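The shape of such a test, as a sketch only (the step names, agent names, and the exact `step` return shape here are assumptions, not the real test code; the PR text only says that `MalSimulator.step(actions)` returns the performed actions):

```python
def test_known_attack_path(sim, attacker_state, defender_state):
    # Steps picked by hand from the model and the attacker's entrypoints,
    # no searcher agent involved.
    for step_name in ["Host A:access", "Host A:read"]:  # hypothetical names
        node = next(
            n for n in attacker_state.action_surface if n.full_name == step_name
        )
        performed = sim.step({
            attacker_state.name: [node],
            defender_state.name: [],
        })
        # step() reports the actions actually performed
        assert node in performed[attacker_state.name]
```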
Force-pushed bf769ab to 1fa8373
… vectorized simulator

MalSimulator handles:
- agent registration
- simulation step
- action masks
- state of attack graph

But it does NOT handle the observation format/vectorization.

Other changes:
- Agent states are not dicts but MalSimAgent objects
- Add separate vectorized simulator
- Instead use field 'agents' to define any number of agents in scenario settings
…n the simulator
- MalSimAgent is now only used internally in the MalSimulator
- Every time an agent state is returned from the simulator it is converted to a MalSimAgentView (read only)
- Don't alter MalSimAgentView objects
- Instead store obs/infos in dicts that are returned on step/reset
… of code base
- As a temporary solution, to not change the behaviour of the searcher agents, I decided to still use the action mask and not the agent state in them
- End goal is to not use that kwarg, so I added a TODO for that
- Adapt the cli/envs/tests to work with the new searcher agents
…erentiate with malsimagent
… action mask
- And adapt everywhere the searcher decision agents are used (don't give them the kwarg anymore, just a MalSimAgentView)
Force-pushed d208cd3 to 45a1182
```python
# action surface does not have a guaranteed order,
# so for the agent to be deterministic we need to sort
action_surface.sort(key=lambda n: n.id)
```
If the order of the action surface cannot be guaranteed to be consistent (I personally think the simulator should handle such matters), then why type hint it as a list rather than a set?
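If it were a set, a deterministic order could still be recovered at the use site, e.g.:

```python
# action_surface: set[AttackGraphNode] (per the suggestion above)
# Sorting by node id gives a deterministic iteration order without
# the simulator having to maintain a sorted list.
ordered_surface = sorted(action_surface, key=lambda n: n.id)
```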
This is a rather big redesign of the simulator. MalSimulator is now very much simplified:

Things related to building up observations (specifically the ParallelEnv part that was previously in the MalSimulator) have now been moved to a wrapper, malsim_vectorized_obs_env.py (naming here is an open question). This wrapper builds up state step by step from the performed actions returned from MalSimulator.step(actions). Logging for that wrapper was also factored out into its own module (should it be?).

Added the class DecisionAgent (these are agents like BFS/DFS + keyboard + in the future more advanced heuristics). This was done to create a common interface for working with DecisionAgents. Our DecisionAgents are not all done yet: they are still tailored towards the ParallelEnv/MalSim Vectorized Obs Env, but the goal is to have them work with the regular MalSimulator (on attack graph state).

Any number of agents can now be specified in the scenario file. This is a cleaner solution in my opinion.

Let me know if something does not make sense.
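The wrapper split follows a pattern roughly like the sketch below; apart from MalSimulator.step(actions) and the obs/info dicts mentioned above, the names and method bodies are assumptions.

```python
class MalSimVectorizedObsEnv:
    """Sketch: builds serialized observations on top of the plain MalSimulator."""

    def __init__(self, sim) -> None:
        self.sim = sim
        # Per-agent observations live in the wrapper, not on the
        # agent state objects (as described in the comments above).
        self._obs: dict[str, dict] = {}

    def step(self, actions):
        # The core simulator only reports which actions were performed;
        # the wrapper incrementally folds that into vectorized observations.
        performed = self.sim.step(actions)
        for agent_name, nodes in performed.items():
            self._update_obs(agent_name, nodes)
        return self._obs

    def _update_obs(self, agent_name, nodes):
        ...  # indexing/serialization details omitted from this sketch
```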