
Simplify MalSimulator and create wrappers for other interfaces #87

Open · wants to merge 41 commits into base: main
Conversation

@mrkickling (Collaborator) commented Dec 17, 2024

This is a rather big redesign of the simulator.

MalSimulator is now greatly simplified:

  • It creates a base for other wrappers (each with its own observation format) to build on
  • It also makes it easier to run the simulator for purposes other than ML training
  • It works on AttackGraphs only
  • It stores agent states (Agent classes were added for this)
  • The MAL Simulator contains the state of the Attack Graph
  • step() takes AttackGraphNodes as actions and returns the nodes that were performed (not the full state). Does this make sense?
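To illustrate the step() contract described above, here is a minimal self-contained sketch. All class bodies and method behaviors here are stand-ins I made up for illustration, not the actual malsim/maltoolbox API:

```python
# Hypothetical sketch of the simplified MalSimulator loop.
# AttackGraphNode and MalSimulator below are illustrative stubs only.
from dataclasses import dataclass, field


@dataclass
class AttackGraphNode:           # stand-in for maltoolbox's AttackGraphNode
    id: int
    children: list = field(default_factory=list)


class MalSimulator:              # stub: the real simulator works on a full AttackGraph
    def __init__(self, entry_points):
        self.compromised = set()
        self.action_surface = list(entry_points)

    def step(self, actions):
        """Take AttackGraphNodes as actions; return the nodes actually performed."""
        performed = []
        for node in actions:
            if node in self.action_surface:
                self.compromised.add(node.id)
                performed.append(node)
                self.action_surface.remove(node)
                # performing a node exposes its children on the action surface
                self.action_surface.extend(node.children)
        return performed


leaf = AttackGraphNode(id=2)
root = AttackGraphNode(id=1, children=[leaf])
sim = MalSimulator(entry_points=[root])
first = sim.step([root])    # performs root, exposing its children
second = sim.step([leaf])   # leaf is now on the action surface
```

The key point of the design is that step() returns only the performed nodes, so a wrapper (like the vectorized-obs env) can incrementally build its own state from them.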

Things related to building up observations (specifically the ParallelEnv part that was previously in the MalSimulator) have now been moved to a wrapper, malsim_vectorized_obs_env.py (the naming here is an open question).
This wrapper builds up its state step by step from the performed actions returned by MalSimulator.step(actions).
Logging for that wrapper was also factored out into its own module (should it be?)

Added the class DecisionAgent (these are agents like BFS/DFS, the keyboard agent, and in the future more advanced heuristics).
This creates a common interface for working with DecisionAgents.
Our DecisionAgents are not all done yet; they are still tailored towards the ParallelEnv/MalSim Vectorized Obs Env, but the goal is to have them work with the regular MalSimulator (on attack graph state).

Any number of agents can now be specified in the scenario file. This is a cleaner solution in my opinion.
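The scenario format itself is not shown in this thread, so purely as an illustration, an agents list in a scenario file might look roughly like this (all keys and values here are hypothetical, not the actual malsim schema):

```yaml
# Hypothetical sketch - actual malsim scenario keys may differ
agents:
  - name: attacker1
    type: attacker
    agent_class: BreadthFirstAttacker
    entry_points: [Host:0:access]
  - name: defender1
    type: defender
    agent_class: KeyboardAgent
```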

Let me know if something does not make sense.

@mrkickling mrkickling force-pushed the add-base-mal-simulator branch 3 times, most recently from aebfcb8 to 753ce55 Compare December 20, 2024 12:33
@mrkickling mrkickling force-pushed the add-base-mal-simulator branch 2 times, most recently from 502b688 to fe27e46 Compare January 15, 2025 13:52
@mrkickling mrkickling marked this pull request as ready for review January 15, 2025 14:18
@mrkickling mrkickling force-pushed the add-base-mal-simulator branch from ce89c22 to 950fed4 Compare January 15, 2025 14:22
@mrkickling mrkickling requested a review from andrewbwm January 15, 2025 14:25
@mrkickling mrkickling changed the title Add base mal simulator Simplify MalSimulator and create wrappers for other interfaces Jan 15, 2025
(diff context shown with the review comment)

    if (reprint_header != 0) and ((entry_nr + 1) % reprint_header == 0):
        formatted_str += header
    return formatted_str

@dataclass
Collaborator:

Having this be an immutable NamedTuple would be safer IMO

Collaborator:

The dataclass can also become immutable with @dataclass(frozen=True).
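A quick runnable example of what `@dataclass(frozen=True)` gives you; the `AgentState` class here is illustrative, not the actual malsim class. `dataclasses.replace` is the idiomatic way to "update" a frozen instance, which is relevant to the per-step updates discussed below:

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class AgentState:          # illustrative example, not the actual malsim class
    name: str
    reward: float


state = AgentState(name="attacker", reward=0.0)

blocked = False
try:
    state.reward = 1.0     # any attribute assignment raises FrozenInstanceError
except FrozenInstanceError:
    blocked = True

# "updating" a frozen dataclass means creating a new object
new_state = replace(state, reward=1.0)
```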

Collaborator (Author):

Sorry, why do we want them to be immutable?
The agent states are (at least in the current implementation) stored in the simulator and potentially updated/changed each step.

Collaborator:

Are they not also sent outside the simulator, as per the function signature of the searchers? Sending references to mutable state outside of the simulator seems dangerous to me.

Collaborator (Author):

I see your point now. As you say, the agent states are not only internal, but also used by the decision agents/searchers. I agree that it feels dangerous. Maybe they should be immutable, but it also feels a bit weird to create new Agent objects every step. Another option is to use something other than these objects in the searchers and keep them internal.

Collaborator:

You can do:

@property
def state(self):
    """Immutable state"""
    return tuple(self._mutable_state)

or sth similar.

But the current implementation is not safe in this manner in a thousand places, and it is also very easy for someone to bypass these safety measures. If it is too cumbersome to create such a dynamic property (because converting the internal data structure, a dataclass in this case, to something immutable is more involved), then it is not worth it. In this case, the simplest approach seems to be to convert the dataclass to a frozen one when reading the property, but I don't know the performance implications.
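For dict-shaped internal state there is a cheap middle ground: `types.MappingProxyType` gives a zero-copy read-only view, avoiding the conversion cost mentioned above. A sketch (the `Simulator` class and its attributes are made up for illustration):

```python
from types import MappingProxyType


class Simulator:                      # illustrative stub, not the real MalSimulator
    def __init__(self):
        self._agent_rewards = {"attacker": 0.0}

    @property
    def agent_rewards(self):
        """Read-only, zero-copy view of the internal dict."""
        return MappingProxyType(self._agent_rewards)


sim = Simulator()
view = sim.agent_rewards

blocked = False
try:
    view["attacker"] = 1.0            # mappingproxy rejects item assignment
except TypeError:
    blocked = True

sim._agent_rewards["attacker"] = 2.0  # internal mutation is still visible via the view
```

Note that the proxy is a live view: internal updates each step remain visible to callers, while callers cannot write through it.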

Collaborator (Author):

Yes, having the agent state object be mutable internally in the MalSimulator but return an immutable object when fetching it might be the easiest way. It would avoid the dangers and confusion of the agent states being changed outside of the simulator. Does that make sense?

Collaborator:

That seems like an appropriate solution.

Collaborator (Author) @mrkickling commented Jan 17, 2025:

I added MalSimAgentView (a public interface for MalSimAgent). So every time an agent state is returned from MalSimulator it will be read-only.

The observations/infos in the SerializedObsEnv are now created without altering the MalSimAgent objects and instead stored in dicts.

Bonus explanation about the searcher agents:
The searchers still depend on action_mask (which comes from the SerializedObsEnv), which is not what I/we actually want in the end (we want the decision agents to work with the MalSimulator without serialization/indexing/action masks). The problem was that when I converted the searcher agents to use only the agent state, they behaved a little differently from before, so until I solve that I will let them stay that way.

EDIT: last paragraph is not relevant any more, the searchers are now 'converted'
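One common way to implement a read-only view like this is a wrapper that delegates attribute reads to the internal state object and rejects writes. This is a sketch of the pattern, not necessarily how MalSimAgentView is actually implemented:

```python
class AgentState:                 # illustrative mutable internal state
    def __init__(self):
        self.reward = 0.0


class AgentView:                  # read-only facade over AgentState, sketch only
    def __init__(self, state):
        # bypass our own __setattr__ to store the wrapped state
        object.__setattr__(self, "_state", state)

    def __getattr__(self, name):
        # delegate reads to the wrapped state
        return getattr(self._state, name)

    def __setattr__(self, name, value):
        raise AttributeError("AgentView is read-only")


state = AgentState()
view = AgentView(state)

state.reward = 5.0                # the simulator mutates its internal state
reward_seen = view.reward         # the view reflects the change

blocked = False
try:
    view.reward = 0.0             # writes through the view are rejected
except AttributeError:
    blocked = True
```

The simulator keeps mutating its internal object each step, while everything handed out to decision agents goes through the read-only facade.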

@mrkickling mrkickling requested a review from Hoclor January 15, 2025 14:45
@@ -81,7 +85,7 @@ def select_next_target(
return current_target, False


class DepthFirstAttacker:
class DepthFirstAttacker():
Collaborator:

Shouldn't the DepthFirstAttacker also inherit DecisionAgent?

Collaborator (Author):

Yes, thank you. I just pushed quite a big change to the searcher agents (they will use attackgraph nodes instead of the serialized action masks) where I fixed that as well.

Collaborator @nkakouros commented Jan 17, 2025:

How about we consolidate the design of the agents, taking into account my PR here too? Otherwise, if you plan to merge that as well, it does not make sense to redesign them here only to redesign them again later.

Collaborator (Author):

I had not seen your PR.

In my opinion the agents should not depend on the obs/action mask. Partly because it is not needed, and partly because the obs/action mask will be specific to the MalSimSerializedEnv/ParallelEnv and not part of the base MalSimulator (as per this PR).

The searcher agents in this PR only depend on the action surface of an agent (a list of AttackGraphNodes). I am open to other ways of simplifying the searcher agents further, but I do want to get rid of using the obs/action mask. What do you think?

Collaborator:

I'm all for removing this cryptic stuff. My comment was mostly about the design of the agents, which seems unnecessarily convoluted in its current form. There is really no need for 4-5 methods just to pass things around, or 5 variables to do what can be done with 2, etc.

Specifically to your changes here, is there an ambition behind the DecisionAgent? To me, as long as the interface of an agent is fixed and explicit (i.e. the init and the get-next-action method, or whatever it's called), there is really no benefit in providing abstract classes or trying to standardize the internal structure of an agent class, especially when it is this small. Or are there requests for something like that from others that make it a necessity?

My approach would be to take my PR (because I think it's great 😃) and just remove the mask etc, unless I'm missing sth bigger.

Collaborator:

Also, I don't think it offers much to have three files to hold what is basically 200 lines of code. The perceived benefit of making the agent module a package, split along a conceptual separation, does not add much in practice: it makes importing a tiny bit slower, requires 3 open files if someone is editing the package in an editor, etc., and more importantly it is undone by exposing the classes at the package level via __all__ in __init__.

Collaborator (Author):

Yeah, I am all for simplifying it into one or two methods if possible. I also added two simple tests, which makes those changes safer to make.

Is there an ambition behind using the abstract base class DecisionAgent?
The main reason is that it makes clear to the user/developer that every class implementing it shares the same interface and method names. I agree that it might be a bit overkill since there will probably be just one method, but at the same time I don't think it hurts. Especially when using a linter, it becomes clear that, for example, in cli.py we can call decision_agent.get_next_action(sim_agent_state) even though we don't know the concrete class of decision_agent (we type hint it with DecisionAgent).

Maybe it is not super pythonic, and maybe there is some better option? Or do you think it adds no value?
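The ABC pattern under discussion, as a minimal sketch. The names (DecisionAgent, get_next_action) come from the thread; the PassiveAgent implementation is made up for illustration:

```python
from abc import ABC, abstractmethod


class DecisionAgent(ABC):                 # sketch of the interface discussed above
    @abstractmethod
    def get_next_action(self, agent_state):
        """Return the next node to act on, or None to terminate."""


class PassiveAgent(DecisionAgent):        # hypothetical trivial implementation
    def get_next_action(self, agent_state):
        return None


agent = PassiveAgent()                    # OK: the interface is implemented

abstract_blocked = False
try:
    DecisionAgent()                       # the abstract class cannot be instantiated
except TypeError:
    abstract_blocked = True
```

The practical benefit is the enforced contract: a subclass that forgets (or misspells) get_next_action fails at instantiation rather than deep inside a simulation run, and type hints like `agent: DecisionAgent` become meaningful to linters.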

Collaborator (Author):

I didn't see the second comment before sending that.

I think having separate files makes sense in this case for future proofing, instead of putting all agents in one file. We will probably add heuristic agents (agents that take decisions based on the state of the attack graph) that could go in a separate file, and we might add searcher agents specific to defenders as well, so it will grow.

But yeah, whether it makes sense to have a base DecisionAgent is more questionable.

@mrkickling mrkickling force-pushed the add-base-mal-simulator branch from d1b0cf8 to 2fe03ed Compare January 17, 2025 15:07
@kasanari (Collaborator) commented Jan 17, 2025:

Throwing this into the mix:

# Author: Jakob Nyberg, 2025
from collections.abc import Callable
import logging


from typing import Any

import numpy as np
import numpy.typing as npt

logger = logging.getLogger(__name__)


Array = npt.NDArray[np.int32]
BoolArray = npt.NDArray[np.bool_]


def get_new_targets(
    discovered_targets: npt.NDArray[np.int32],
    mask: tuple[npt.NDArray[np.bool_], npt.NDArray[np.bool_]],
) -> tuple[npt.NDArray[np.int32], npt.NDArray[np.int32]]:
    attack_surface = mask[1]
    surface_indexes = np.flatnonzero(attack_surface)
    new_targets = np.array(
        [idx for idx in surface_indexes if idx not in discovered_targets],
        dtype=np.int32,
    )
    return new_targets, surface_indexes


def move_target_to_back(
    current_target: np.int32 | None,
    targets: npt.NDArray[np.int32],
    attack_surface: npt.NDArray[np.int32],
) -> tuple[npt.NDArray[np.int32], np.int32 | None]:
    """
    If the current target was not compromised this turn, put it
    on the bottom of the stack and focus on the next target instead
    """

    # 'is None' rather than falsiness: a target with index 0 is valid
    if current_target is None:
        return targets, current_target

    if current_target in attack_surface:
        # wrap the scalar target: np.concatenate cannot take 0-d arrays
        targets = np.concatenate(([current_target], targets[:-1]))
        return targets, targets[-1]

    return targets, current_target


def choose_target(
    targets: npt.NDArray[np.int32],
    attack_surface: npt.NDArray[np.int32],
) -> tuple[npt.NDArray[np.int32], np.int32, bool]:
    # targets that have not been removed from the attack surface by another agent
    valid_targets = np.array(
        [t for t in targets if t in attack_surface], dtype=np.int32
    )

    if len(valid_targets) == 0:
        return valid_targets, np.int32(0), True

    return valid_targets[:-1], valid_targets[-1], False


def compute_action(
    permute_func: Callable[[Array], Array],
    action_func: Callable[
        [np.int32 | None, Array, Array], tuple[Array, np.int32 | None, bool]
    ],
    add_targets_func: Callable[[Array, Array], Array],
):
    def _compute_action(
        targets: Array,
        mask: tuple[BoolArray, BoolArray],
        current_target: np.int32 | None,
    ):
        new_targets, surface_indexes = get_new_targets(targets, mask)

        targets, current_target, done = action_func(
            current_target,
            add_targets_func(targets, permute_func(new_targets)),
            surface_indexes,
        )

        action = 0 if done else 1
        if action == 0:
            logger.debug(
                'Attacker agent does not have any valid targets and will terminate'
            )

        logger.debug(f'Attacker targets: {targets}')
        logger.debug(f'Attacker current target: {current_target}')
        logger.debug(f'Attacker action: {action}')
        return targets, action, current_target

    return _compute_action


def create_permute_func(seed: int | None, randomize: bool) -> Callable[[Array], Array]:
    # 'is not None' so that an explicit seed of 0 is respected
    s = seed if seed is not None else np.random.SeedSequence().entropy
    rng = np.random.default_rng(s) if randomize else None
    return rng.permutation if rng else lambda x: x


class BreadthFirstAttacker:
    def __init__(self, agent_config: dict[str, Any]) -> None:
        self.current_target: np.int32 | None = None
        self.targets: npt.NDArray[np.int32] = np.array([], dtype=np.int32)

        permute_func = create_permute_func(
            agent_config.get('seed', None), agent_config.get('randomize', False)
        )

        self.compute_action = compute_action(
            permute_func, self._action_func, self._add_new_targets_func
        )

    def __call__(
        self,
        _: dict[str, Any],
        mask: tuple[npt.NDArray[np.bool_], npt.NDArray[np.bool_]],
    ):
        self.targets, action, self.current_target = self.compute_action(
            self.targets, mask, self.current_target
        )

        return (action, self.current_target)

    @staticmethod
    def _action_func(
        current_target: np.int32 | None,
        targets: Array,
        surface_indexes: Array,
    ):
        targets, current_target = move_target_to_back(
            current_target, targets, surface_indexes
        )
        return choose_target(targets, surface_indexes)

    @staticmethod
    def _add_new_targets_func(targets: Array, new_targets: Array):
        new_targets = np.flip(new_targets)  # to comply with the original implementation
        return np.concatenate([new_targets, targets])


class DepthFirstAttacker:
    def __init__(self, agent_config: dict[str, Any]) -> None:
        self.current_target: np.int32 | None = None
        self.targets: npt.NDArray[np.int32] = np.array([], dtype=np.int32)

        permute_func = create_permute_func(
            agent_config.get('seed', None), agent_config.get('randomize', False)
        )

        self.compute_action = compute_action(
            permute_func, self._action_func, self._add_new_targets_func
        )

    def __call__(
        self,
        _: dict[str, Any],
        mask: tuple[npt.NDArray[np.bool_], npt.NDArray[np.bool_]],
    ):
        self.targets, action, self.current_target = self.compute_action(
            self.targets, mask, self.current_target
        )

        return (action, self.current_target)

    @staticmethod
    def _action_func(
        current_target: np.int32 | None,
        targets: Array,
        surface_indexes: Array,
    ):
        # keep working on a target unless it has been removed from the attack surface
        return (
            choose_target(targets, surface_indexes)
            if current_target not in surface_indexes
            else (targets, current_target, False)
        )

    @staticmethod
    def _add_new_targets_func(targets: Array, new_targets: Array):
        # append new targets to the end; choose_target picks from the end,
        # so the agent works on the newest targets first (depth-first behavior)
        return np.concatenate([targets, new_targets])



@dataclass
class MalSimAgent:
Collaborator:

I think calling this MalSimAgent is confusing. This is a container for agent-related info; a name like MalSimAgentState seems more representative. The current name adds one more "agent" on top of the actual agent classes and the maltoolbox Attacker.

Collaborator (Author):

Yes, this confusion is very annoying, and I think you are right about just calling it MalSimAgentState instead. Will change that.

Collaborator (Author):

Fixed

Collaborator:

Can you please explain how this test works? Why doesn't it use searcher agents? BFS is mentioned here but you seem to read the attackers from the toolbox?

Collaborator (Author):

It doesn't use any searcher agents because I am not testing the searcher agents, just the MAL simulator functionality. I perform the steps one at a time, with steps that I already know are available to the defender/attacker (as per the model and the entry points). The scenario file will probably help to see the entry points.

Yes, all attacker agents in the MalSimulator must reference an Attacker object. This is so the simulator can know the entry points and reached attack steps, and can perform 'compromise' on nodes. This is nothing new in my PR though.

… vectorized simulator

MALSimulator handles:
- agent registration
- simulation step
- action masks
- state of attack graph
But it does NOT handle the observation format/vectorization.

Other changes:
- Agent states are no longer dicts but MalSimAgent objects
- Add vectorized separate simulator
- Instead use field 'agents' to define any number of agents in scenario settings
mrkickling and others added 24 commits January 29, 2025 09:50
…n the simulator

- MalSimAgent is now only internally used in the MalSimulator
- Every time an agent state is returned from the simulator
  it is converted to a MalSimAgentView (read only)
- Don't alter MalSimAgentView objects
- Instead store obs/infos in dicts that are returned on step/reset
… of code base

- As a temporary solution, to avoid changing the behaviour of the searcher agents,
  I decided to still use the action mask and not the agent state in them.
- End goal is to not use that kwarg so I added TODO for that
- Adapt the cli/envs/tests to work with the new searcher agents
… action mask

- And adapt everywhere where the searcher decision agents are used (don't give it kwarg anymore, just MalSimAgentView)
@mrkickling mrkickling force-pushed the add-base-mal-simulator branch from d208cd3 to 45a1182 Compare January 29, 2025 08:58

# action surface does not have a guaranteed order,
# so for the agent to be deterministic we need to sort
action_surface.sort(key=lambda n: n.id)
Collaborator @kasanari commented Jan 29, 2025:

If the order of the action surface cannot be guaranteed to be consistent (I personally think the simulator should handle such matters), then why type-hint it as a list rather than a set?
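The determinism concern can be seen with a small sketch: iterating a set of nodes has no guaranteed order, while sorting by id makes the choice reproducible either way (the Node class is a stand-in for AttackGraphNode):

```python
from dataclasses import dataclass


@dataclass(frozen=True)                        # frozen -> hashable, so it can live in a set
class Node:                                    # stand-in for AttackGraphNode
    id: int


surface = {Node(3), Node(1), Node(2)}          # set: iteration order is unspecified
ordered = sorted(surface, key=lambda n: n.id)  # deterministic regardless of container
first_target = ordered[0]
```

Typing the surface as a set would make the "no guaranteed order" property explicit in the interface, at the cost of forcing every order-sensitive consumer to sort, as the agent code above already does.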
