
Add maskable GraphPPO based on sb3_contrib.MaskablePPO + GNN for domains with graph observations #444

Draft: nhuet wants to merge 5 commits into master from gnn-sb3-maskable
Conversation

@nhuet (Contributor) commented Nov 29, 2024

  • Derive MaskableGraphPPO from MaskablePPO (and also from GraphOnPolicyAlgorithm, to reuse the work done for GraphPPO).
  • Derive MaskableGNNActorCriticPolicy from MaskableActorCriticPolicy (and similarly from _BaseGNNActorCriticPolicy, to share code with GNNActorCriticPolicy); likewise for MaskableMultiInputGNNActorCriticPolicy.
  • Update the stable_baselines3 scikit-decide wrapper with a new argument use_action_masking that wraps the domain in an environment exposing an action_masks() method (redirecting to domain.get_action_mask()).
  • Add examples for GraphMaze and GraphJspDomain so that the solver proposes only applicable actions (see the usage sketch below).
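A minimal usage sketch, assuming the solver is driven through scikit-decide's StableBaseline wrapper as in the existing GraphPPO examples; the import paths for MaskableGraphPPO and GraphJspDomain, the policy alias, and the problem instance are assumptions for illustration, not the PR's confirmed API:

```python
from skdecide.hub.solver.stable_baselines import StableBaseline

# MaskableGraphPPO and GraphJspDomain import paths are PR-specific and
# omitted here; jsp_instance is a hypothetical problem instance.
solver = StableBaseline(
    domain_factory=lambda: GraphJspDomain(jsp_instance),
    algo_class=MaskableGraphPPO,          # maskable PPO + GNN from this PR
    baselines_policy="GraphInputPolicy",  # assumed policy alias
    use_action_masking=True,              # expose action_masks() to the algorithm
    learn_config={"total_timesteps": 10_000},
)
solver.solve()
```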

@nhuet force-pushed the gnn-sb3-maskable branch 2 times, most recently from f52083e to 9550a07 on December 5, 2024 09:20
- Use it in rollouts so that they are aware of the current action mask.
- Add a `get_action_mask()` method to domains, by default converting the
  applicable-actions space into a 0-1 numpy array, provided that the
  action space of each agent is an EnumerableSpace (see the sketch below).
- Inherit from Maskable.
- No longer require the domain to be FullyObservable to use action
  masking, as get_action_mask() can be called without the solver knowing
  the current state (and since, in rollout, the actual domain is now
  used).
- Decide whether to use action masking directly in __init__() so that
  using_applicable_actions() can be overridden properly.
- Use common functions for unwrap_obs and wrap_action in the solver and
  the wrapper environment to avoid code duplication.
- Use domain.get_action_mask() to convert applicable actions into a mask
  (the method is more efficient, as it does not call get_applicable_actions()
  for each action).
This is more memory-efficient, since the mask stores only 0s and 1s.
It also seems to be the standard representation for action masks, at least
for ray.rllib, as shown in the `action_mask_key` documentation at
https://docs.ray.io/en/latest/rllib/rllib-training.html
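A minimal sketch of the default `get_action_mask()` behavior described above, assuming the agent's action space is an EnumerableSpace; the method and helper names follow scikit-decide conventions, but the PR's actual implementation may differ:

```python
import numpy as np

def get_action_mask(self) -> np.ndarray:
    # Enumerate the full action space (assumed to be an EnumerableSpace)
    # and mark applicable actions with 1, all others with 0.
    actions = self.get_action_space().get_elements()
    applicable = self.get_applicable_actions()
    # int8 keeps the 0-1 mask memory-friendly, matching the rllib convention.
    return np.array(
        [1 if applicable.contains(a) else 0 for a in actions],
        dtype=np.int8,
    )
```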
- We reuse our stable_baselines3 wrapper.
- The policy extracts features from the graph with a GNN.
- The GNN uses pytorch-geometric.
- We subclass:
  - ActorCriticPolicy:
    - feature extractor = GNN
    - custom conversion of observations to torch, turning them into
      torch_geometric.data.Data (see the sketch below)
  - PPO, to properly handle:
    - observation conversion
    - rollout buffer
- Current limitations:
  - we extract a fixed number of features (independent of node/edge
    counts) for now, as we end with a feature-reduction layer connected
    to a classic MLP (which knows nothing about the current graph structure)
- User input: the user can define the following (default choices are made otherwise):
  - the GNN (defaults to a 2-layer GCN), taking as inputs, following torch_geometric conventions:
    - x: node features
    - edge_index: edge indices or a sparse transposed adjacency matrix
    - edge_attr (optional): edge features
    - edge_weight (optional): edge weights (taken from the first dimension
      of edge_attr)
  - the feature-reduction layer from the GNN output to the fixed number of features
    (defaults to global_max_pool + linear layer + relu); see the sketch below
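A minimal sketch of these pieces, assuming torch and torch_geometric are installed; the class and function names here are illustrative, not the PR's actual identifiers:

```python
import torch
from torch import nn
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv, global_max_pool


def graph_obs_to_data(obs) -> Data:
    # Convert a gymnasium GraphInstance-like observation (nodes, edges,
    # edge_links) into a torch_geometric.data.Data object.
    return Data(
        x=torch.as_tensor(obs.nodes, dtype=torch.float32),
        edge_index=torch.as_tensor(obs.edge_links, dtype=torch.long).t().contiguous(),
        edge_attr=torch.as_tensor(obs.edges, dtype=torch.float32),
    )


class TwoLayerGCN(nn.Module):
    # Default GNN: a 2-layer GCN following torch_geometric conventions
    # (x, edge_index, optional edge_weight).
    def __init__(self, in_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)

    def forward(self, x, edge_index, edge_weight=None):
        h = self.conv1(x, edge_index, edge_weight).relu()
        return self.conv2(h, edge_index, edge_weight)


class FeatureReduction(nn.Module):
    # Default reduction: global_max_pool over nodes, then a linear layer
    # and a relu, yielding a fixed number of features for the MLP head.
    def __init__(self, gnn_out_dim: int, features_dim: int):
        super().__init__()
        self.linear = nn.Linear(gnn_out_dim, features_dim)

    def forward(self, h: torch.Tensor, batch: torch.Tensor) -> torch.Tensor:
        return self.linear(global_max_pool(h, batch)).relu()
```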

We also introduce a multi-input policy to take static graph features into
account. The observation space is a Dict space whose subspaces can include
Graph spaces (see the example below).
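An illustrative observation space for such a multi-input policy, mixing a Graph subspace with flat static features; the keys and shapes are arbitrary examples:

```python
import numpy as np
from gymnasium import spaces

observation_space = spaces.Dict(
    {
        # graph observation: 4 features per node, 1 feature per edge
        "graph": spaces.Graph(
            node_space=spaces.Box(low=-np.inf, high=np.inf, shape=(4,)),
            edge_space=spaces.Box(low=0.0, high=1.0, shape=(1,)),
        ),
        # static (non-graph) features
        "static": spaces.Box(low=0.0, high=1.0, shape=(3,)),
    }
)
```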