Add a characteristic for solvers using action masks and make use of it in rollout #445

nhuet · 2024-11-29T13:43:46Z

Use it in rollout to make solvers aware of the current action mask, by calling their retrieve_applicable_actions() method.
Add a get_action_mask() method to domains, which by default converts the applicable action space into a 0-1 numpy array, provided that the action space of each agent is an EnumerableSpace.
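
To illustrate the conversion described above, here is a minimal sketch of turning a set of applicable actions into a 0-1 array over an enumerable action space. The helper name `action_mask_from_applicable` and its arguments are hypothetical and only serve the example; this is not the code of the PR, and it assumes actions are hashable.

```python
import numpy as np


def action_mask_from_applicable(all_actions, applicable_actions):
    """Return a 0-1 array with one entry per enumerated action.

    Hypothetical helper: `all_actions` plays the role of the elements of an
    enumerable action space, `applicable_actions` those currently allowed.
    """
    applicable = set(applicable_actions)  # assumes hashable actions
    return np.array(
        [1 if action in applicable else 0 for action in all_actions],
        dtype=np.int8,
    )


all_actions = ["up", "down", "left", "right"]
mask = action_mask_from_applicable(all_actions, ["up", "left"])
print(mask)  # [1 0 1 0]
```

Building the set once and scanning the enumerated actions a single time avoids one applicability query per action.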
Use these new features to simplify how the RayRLlib solver handles action masking:
- inherit from Maskable
- no longer require FullObservable from the domain to use action
  masking, as get_action_mask() can be called without the solver knowing about
  the current state (and since the actual domain is now used in
  rollout)
- decide whether to use action masking directly in __init__() so that
  using_applicable_actions() can be overridden properly
- use common functions for unwrap_obs and wrap_action in the solver and the
  wrapper environment to avoid code duplication
- use domain.get_action_mask() to convert applicable actions into a mask
  (this is more efficient, as it does not call get_applicable_actions()
  for each action)
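
The interaction between rollout and a mask-aware solver can be sketched as follows. This is a toy illustration only: `ToyDomain`, `ToyMaskableSolver` and `rollout` are hypothetical stand-ins, and the signatures given here to `retrieve_applicable_actions()`, `using_applicable_actions()`, `get_action_mask()` and `sample_action()` are assumptions, not the library's actual API.

```python
import random

import numpy as np


class ToyDomain:
    """Hypothetical domain with 3 enumerable actions and a changing mask."""

    n_actions = 3

    def __init__(self):
        self.step_count = 0

    def get_action_mask(self):
        # Arbitrary toy rule: action 0 is forbidden on even steps,
        # action 2 on odd steps.
        mask = np.ones(self.n_actions, dtype=np.int8)
        mask[0 if self.step_count % 2 == 0 else 2] = 0
        return mask

    def step(self, action):
        self.step_count += 1


class ToyMaskableSolver:
    """Hypothetical solver that samples uniformly among unmasked actions."""

    def __init__(self, n_actions, use_action_masking=True, seed=0):
        # Masking is decided once, at construction time.
        self._n_actions = n_actions
        self._use_action_masking = use_action_masking
        self._mask = None
        self._rng = random.Random(seed)

    def using_applicable_actions(self):
        return self._use_action_masking

    def retrieve_applicable_actions(self, domain):
        # The solver asks the domain for the mask; it needs no state access.
        self._mask = domain.get_action_mask()

    def sample_action(self, observation):
        if self._mask is None:
            candidates = list(range(self._n_actions))
        else:
            candidates = np.flatnonzero(self._mask).tolist()
        return self._rng.choice(candidates)


def rollout(domain, solver, max_steps):
    """Refresh the solver's mask from the actual domain before each action."""
    actions = []
    for _ in range(max_steps):
        if solver.using_applicable_actions():
            solver.retrieve_applicable_actions(domain)
        action = solver.sample_action(observation=None)
        domain.step(action)
        actions.append(action)
    return actions


actions = rollout(ToyDomain(), ToyMaskableSolver(ToyDomain.n_actions), max_steps=6)
```

Because the mask is fetched from the domain itself at every step, the solver never has to reconstruct the current state, which is why full observability is not needed in this sketch.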


neo-alex

Great to have proper masking implemented, thank you! LGTM

nhuet marked this pull request as draft December 10, 2024 16:31

nhuet force-pushed the rollout-action-mask branch from f8826b1 to 3c73d1d Compare December 12, 2024 16:24

nhuet changed the title ~~Add option in rollout for sample_action kwargs (e.g. action masking)~~ Add a characteristic for solvers using action masks and make use of it in rollout Dec 12, 2024

nhuet marked this pull request as ready for review December 12, 2024 16:30

nhuet added 3 commits December 13, 2024 13:36

Use np.int8 instead of np.int64 for action mask dtype

85f6c59

This is more memory-efficient for a mask containing only 0s and 1s. It also seems to be the standard for action masks, at least for ray.rllib, as shown in the `action_mask_key` documentation at https://docs.ray.io/en/latest/rllib/rllib-training.html
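
As a quick check of the memory argument, the following sketch compares the footprint of the same mask stored with both dtypes. The mask size of 1000 actions is an arbitrary value chosen for the example.

```python
import numpy as np

n_actions = 1000  # arbitrary size, for illustration only

mask64 = np.ones(n_actions, dtype=np.int64)
mask8 = mask64.astype(np.int8)

# Same 0-1 content, but 8 bytes versus 1 byte per entry.
print(mask64.nbytes, mask8.nbytes)  # 8000 1000
```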

nhuet force-pushed the rollout-action-mask branch from 3c73d1d to 85f6c59 Compare December 13, 2024 12:36

neo-alex approved these changes Dec 16, 2024

neo-alex merged commit 491d3a1 into airbus:master Dec 16, 2024
26 of 33 checks passed

nhuet deleted the rollout-action-mask branch January 20, 2025 09:48
