From c61f076296e51dddb3dfb670f7cc6121e3503d22 Mon Sep 17 00:00:00 2001 From: Mikel Date: Wed, 26 Jun 2024 17:20:17 +0200 Subject: [PATCH] Add DiscreteActionWrapper (based on Discrete space) and its docs --- craftium-docs/docs/getting-started.md | 37 +++--------- craftium-docs/docs/obs-and-acts.md | 2 + craftium-docs/docs/reference.md | 3 + craftium-docs/docs/wrappers.md | 51 ++++++++++++++++ craftium/__init__.py | 10 +-- craftium/wrappers.py | 87 +++++++++++++++++++++------ mkdocs.yml | 1 + 7 files changed, 139 insertions(+), 52 deletions(-) create mode 100644 craftium-docs/docs/wrappers.md diff --git a/craftium-docs/docs/getting-started.md b/craftium-docs/docs/getting-started.md index cec2714db..e696927ae 100644 --- a/craftium-docs/docs/getting-started.md +++ b/craftium-docs/docs/getting-started.md @@ -49,6 +49,8 @@ env.close() The code above just starts an episode calling `env.reset`, then samples a random action, plots the current observation, and executes the action using `env.step`. If the episode finishes, `env.reset` is called again to start a new episode. Finally, when the loop ends, `env.close` cleanly closes the environment. +For a complete description of craftium's default observation and action spaces, see the [page](./obs-and-acts.md) on observations and actions. + ## Using `CraftiumEnv` The example above employs Gymnasium's `make` utility to load the environments registered by craftium. In this section we explain how to load environments without using the `make` utility, directly employing `CraftiumEnv`. `CraftiumEnv` is craftium's main class, wrapping the modified minetest game in the Gymnasium API. @@ -74,34 +76,11 @@ The first parameter, `env_dir` is the single mandatory parameter and specifies t The rest of the parameters are optional, where the ones in the code section above are the most common.
`render_mode` is a common parameter in Gymnasium environments (see [docs](https://gymnasium.farama.org/api/env/#gymnasium.Env.render) for more info) is used to set the rendering mode of the environment. Finally, `obs_width` and `obs_height` specify the size of the observations in pixels. -## Action wrappers - -Note that `CraftiumEnv` environments define a fairly large action space with discrete and continuous values. For a complete specification on the default action space see the dedicated [page](./obs-and-acts.md#action-space). - -However, many tasks don't require the complete action space and can be greatly simplified by considering only the relevant actions to solve the task at hand. For this reason, craftium comes with [`BinaryActionWrapper`](./reference.md), that can be used to convert the default [`Dict`](https://gymnasium.farama.org/api/spaces/composite/#dict) action space into a simplified [`MultiBinary`](https://gymnasium.farama.org/api/spaces/fundamental/#gymnasium.spaces.MultiBinary) space. - -For example, - -```python -from craftium.wrappers import BinaryActionWrapper - -env = BinaryActionWrapper( - env, - actions=["forward", "mouse x+", "mouse x-"], - mouse_mov=0.5, - ) -``` - -`BinaryActionWrapper` takes the `CraftiumEnv` to wrap as the first argument. Then, the `actions` parameters can be used to select the set of actions from the original action space that will be available in the wrapped environment (see the section on the [action space](./obs-and-acts.md#action-space) for the list of all available action names). In this example, the wrapped environment will only have 3 discrete actions: forward, move the mouse left, and move the mouse right. The last parameter, `mouse_mov` defines the magnitude of the mouse movement (must be in the [0, 1] range). 
+## Next steps -If we print `env.action_space` before applying the wrapper, we get the following Gymnasium space: -```python -Dict('aux1': Discrete(2), 'backward': Discrete(2), 'dig': Discrete(2), 'drop': Discrete(2), 'forward': Discrete(2), 'inventory': Discrete(2), 'jump': Discrete(2), 'left': Discrete(2), 'mouse': Box(-1.0, 1.0, (2,), float32), 'place': Discrete(2), 'right': Discrete(2), 'slot_1': Discrete(2), 'slot_2': Discrete(2), 'slot_3': Discrete(2), 'slot_4': Discrete(2), 'slot_5': Discrete(2), 'slot_6': Discrete(2), 'slot_7': Discrete(2), 'slot_8': Discrete(2), 'slot_9': Discrete(2), 'sneak': Discrete(2), 'zoom': Discrete(2)) -``` - -After wrapping `env` with `BinaryActionWrapper`, we get that `env.action_space` is: -```python -MultiBinary(3) -``` +These are some resources that might be interesting for your next steps! -Much simpler! The default action space has been reduced to a binary vector of only 3 elements. Finally, note that many of the [environments provided](./environments.md) by craftium employ `BinaryActionWrapper` to simplify their optimization. +- [Creating custom environments](./creating-envs.md) page. +- [Wrappers](./wrappers.md) page. +- [Craftium's training example](https://github.com/mikelma/craftium/blob/main/train_agent.py). +- [Craftium's environment implementations](https://github.com/mikelma/craftium/tree/main/craftium-envs). diff --git a/craftium-docs/docs/obs-and-acts.md b/craftium-docs/docs/obs-and-acts.md index 11476745c..42391bedc 100644 --- a/craftium-docs/docs/obs-and-acts.md +++ b/craftium-docs/docs/obs-and-acts.md @@ -57,3 +57,5 @@ In a nutshell, actions in Craftium are a dictionary of some key commands and mou ``` This action would cause the player to jump forward and rotate the mouse to the left. Note that it isn't neccessary to provide a value for each possible key. If the value for a key is not given, its default value will be used: `0` ("off") for keys and `[0, 0]` (no movement) for mouse movements. 
+ +Note that craftium's default `Dict` action space might be too complex for many tasks, where the useful actions are only a subset of the original space. For this purpose, craftium comes with different `ActionWrapper`s that can be used to customize and simplify the default action space. Check the dedicated [page](./wrappers.md) on wrappers and the API [reference](./reference.md) for more info. diff --git a/craftium-docs/docs/reference.md b/craftium-docs/docs/reference.md index e778de248..4055a8f2a 100644 --- a/craftium-docs/docs/reference.md +++ b/craftium-docs/docs/reference.md @@ -5,6 +5,9 @@ ::: craftium.craftium_env.CraftiumEnv
+ ## Wrappers ::: craftium.wrappers.BinaryActionWrapper + +::: craftium.wrappers.DiscreteActionWrapper diff --git a/craftium-docs/docs/wrappers.md b/craftium-docs/docs/wrappers.md new file mode 100644 index 000000000..6d632a29a --- /dev/null +++ b/craftium-docs/docs/wrappers.md @@ -0,0 +1,51 @@ +# Action wrappers + +`CraftiumEnv` environments define a fairly large action space with discrete and continuous values (see the dedicated [page](./obs-and-acts.md#action-space)). However, many tasks don't require the complete action space and can be greatly simplified by considering only the actions relevant to the task at hand. For this reason, craftium comes with `BinaryActionWrapper` and `DiscreteActionWrapper` (see API [docs](./reference.md)), which can be used to convert the default [`Dict`](https://gymnasium.farama.org/api/spaces/composite/#dict) action space into a simplified space. + +For example, + +```python +from craftium.wrappers import BinaryActionWrapper + +env = BinaryActionWrapper( + env, + actions=["forward", "mouse x+", "mouse x-"], + mouse_mov=0.5, +) +``` + +## BinaryActionWrapper + +`BinaryActionWrapper` takes the `CraftiumEnv` to wrap as the first argument. Then, the `actions` parameter can be used to select the set of actions from the original action space that will be available in the wrapped environment (see the section on the [action space](./obs-and-acts.md#action-space) for the list of all available action names). In this example, the wrapped environment will only have 3 discrete actions: forward, move the mouse left, and move the mouse right. The last parameter, `mouse_mov`, defines the magnitude of the mouse movement (must be in the [0, 1] range).
+ +If we print `env.action_space` before applying the wrapper, we get the following Gymnasium space: +```python +Dict('aux1': Discrete(2), 'backward': Discrete(2), 'dig': Discrete(2), 'drop': Discrete(2), 'forward': Discrete(2), 'inventory': Discrete(2), 'jump': Discrete(2), 'left': Discrete(2), 'mouse': Box(-1.0, 1.0, (2,), float32), 'place': Discrete(2), 'right': Discrete(2), 'slot_1': Discrete(2), 'slot_2': Discrete(2), 'slot_3': Discrete(2), 'slot_4': Discrete(2), 'slot_5': Discrete(2), 'slot_6': Discrete(2), 'slot_7': Discrete(2), 'slot_8': Discrete(2), 'slot_9': Discrete(2), 'sneak': Discrete(2), 'zoom': Discrete(2)) +``` + +After wrapping `env` with `BinaryActionWrapper`, we get that `env.action_space` is: +```python +MultiBinary(3) +``` + +The default action space has been reduced to a binary vector of only 3 elements (see [`MultiBinary`](https://gymnasium.farama.org/api/spaces/fundamental/#gymnasium.spaces.MultiBinary)). Note that `BinaryActionWrapper` allows for simultaneous actions. In the example above, the action `[1, 0, 1]` would generate a forward movement action and a mouse (camera) rotation at the same time. + +## DiscreteActionWrapper + +Another way of reducing the action space of a task is to map each action to a unique index. For example, the actions `["forward", "mouse x+", "mouse x-"]` would be translated to three different actions: `1`, `2`, and `3`, in other words: *a* ∈ {1,2,3}. `DiscreteActionWrapper` converts the default `Dict` space into [`Discrete`](https://gymnasium.farama.org/api/spaces/fundamental/#gymnasium.spaces.Discrete). Let's try: + +```python +from craftium.wrappers import DiscreteActionWrapper + +env = DiscreteActionWrapper( + env, + actions=["forward", "mouse x+", "mouse x-"], + mouse_mov=0.5, +) + +print(env.action_space) +``` + +The program prints: `Discrete(4)`. What a surprise! We provided three valid actions to the wrapper but it returned an action space of four unique actions 🤔.
This is because `DiscreteActionWrapper` adds an extra action (with index `0`) that does nothing (equivalent to the `NOP` action in other environments). + +Finally, note that many of the [environments provided](./environments.md) by craftium employ `DiscreteActionWrapper` to simplify their optimization. diff --git a/craftium/__init__.py b/craftium/__init__.py index 9774942e2..40ae716a4 100644 --- a/craftium/__init__.py +++ b/craftium/__init__.py @@ -1,5 +1,5 @@ from .craftium_env import CraftiumEnv -from .wrappers import BinaryActionWrapper +from .wrappers import BinaryActionWrapper, DiscreteActionWrapper from gymnasium.envs.registration import register, WrapperSpec @@ -25,8 +25,8 @@ entry_point="craftium.craftium_env:CraftiumEnv", additional_wrappers=[ WrapperSpec( - name="BinaryActionWrapper", - entry_point="craftium.wrappers:BinaryActionWrapper", + name="DiscreteActionWrapper", + entry_point="craftium.wrappers:DiscreteActionWrapper", kwargs=dict( actions=["forward", "mouse x+", "mouse x-"], mouse_mov=0.5, @@ -48,8 +48,8 @@ entry_point="craftium.craftium_env:CraftiumEnv", additional_wrappers=[ WrapperSpec( - name="BinaryActionWrapper", - entry_point="craftium.wrappers:BinaryActionWrapper", + name="DiscreteActionWrapper", + entry_point="craftium.wrappers:DiscreteActionWrapper", kwargs=dict( actions=["forward", "dig", "mouse x+", "mouse x-", "mouse y+", "mouse y-"], mouse_mov=0.5, diff --git a/craftium/wrappers.py b/craftium/wrappers.py index 036200aaf..14c154197 100644 --- a/craftium/wrappers.py +++ b/craftium/wrappers.py @@ -1,10 +1,29 @@ import numpy as np from gymnasium import ActionWrapper, Env -from gymnasium.spaces import MultiBinary +from gymnasium.spaces import MultiBinary, Discrete from .craftium_env import ACTION_ORDER +def check_actions_valid(actions): + # obtain valid action names + mouse_actions = [f"mouse {ax}{sign}" for ax, sign in zip(["x", "y", "x", "y"], ["+", "-", "-", "+"])] + valid_actions = ACTION_ORDER + mouse_actions + del 
valid_actions[valid_actions.index("mouse")] + + # check if the provided action names are valid + assert sum([a not in valid_actions for a in actions]) == 0, \ + f"Invalid action given. Valid actions are: {valid_actions}" + + +def clip_mouse(m): + mouse_mov = np.clip(m, 0., 1.) + if m != mouse_mov: + print(f"Warning (DiscreteActionWrapper): mouse_mov \ + is {m}, clipping in range 0-1.") + return mouse_mov + + class BinaryActionWrapper(ActionWrapper): """A Gymnasium `ActionWrapper` that translates craftium's `Dict` action space into a binary (discretized) action space [`MultiBiniary`](https://gymnasium.farama.org/api/spaces/fundamental/#gymnasium.spaces.MultiBinary). @@ -15,25 +34,10 @@ class BinaryActionWrapper(ActionWrapper): def __init__(self, env: Env, actions: list[str], mouse_mov: float = 0.5): ActionWrapper.__init__(self, env) + check_actions_valid(actions) self.actions = actions - - # obtain valid action names - mouse_actions = [f"mouse {ax}{sign}" for ax, sign in zip(["x", "y", "x", "y"], ["+", "-", "-", "+"])] - valid_actions = ACTION_ORDER + mouse_actions - del valid_actions[valid_actions.index("mouse")] - - # check if the provided action names are valid - assert sum([a not in valid_actions for a in actions]) == 0, \ - f"Invalid action given. Valid actions are: {valid_actions}" - - # define the action space for gymnasium self.action_space = MultiBinary(len(actions)) - - # clip the mouse movement if needed - self.mouse_mov = np.clip(mouse_mov, 0., 1.) 
- if self.mouse_mov != mouse_mov: - print(f"Warning (DiscreteActionWrapper): mouse_mov \ - is {mouse_mov}, clipping in range 0-1.") + self.mouse_mov = clip_mouse(mouse_mov) def action(self, action): assert len(action) == len(self.actions), \ @@ -59,3 +63,50 @@ def action(self, action): res["mouse"] = mouse return res + +class DiscreteActionWrapper(ActionWrapper): + """A Gymnasium `ActionWrapper` that translates craftium's `Dict` action space into a discretized action space [`Discrete`](https://gymnasium.farama.org/api/spaces/fundamental/#gymnasium.spaces.Discrete). + + Unlike `BinaryActionWrapper`, this wrapper adds an additional action to the action space in order to include the `NOP` action. This action is equivalent to `{}` in the `Dict` space or to a list of zeros in the `MultiBinary` space. The `NOP` action has index `0`, and the rest of the actions have consecutive indices starting at `1`. Thus, the number of actions of the environment will be `len(actions)+1`. + + :param env: The environment to wrap. + :param actions: A list of strings containing the names of the actions that will constitute the new action space. + :param mouse_mov: Magnitude of the mouse movement. Must be in the [0, 1] range, else it will be clipped.
+ """ + def __init__(self, env: Env, actions: list[str], mouse_mov: float = 0.5): + ActionWrapper.__init__(self, env) + + check_actions_valid(actions) + + self.actions = actions + self.action_space = Discrete(len(actions)+1) + self.mouse_mov = clip_mouse(mouse_mov) + + def action(self, action): + assert action >= 0 and action <= len(self.actions), \ + f"Action out of bound, got {action} but expected 0 <= action <= {len(self.actions)}" + + # if the action has index 0, return an empty action (NOP) + if action == 0: + return {} + + res = {} + + name = self.actions[action-1] + + mouse = [0, 0] + + if name == "mouse x+": + mouse[0] += self.mouse_mov + elif name == "mouse x-": + mouse[0] -= self.mouse_mov + elif name == "mouse y+": + mouse[1] += self.mouse_mov + elif name == "mouse y-": + mouse[1] -= self.mouse_mov + else: + res[name] = 1 + + res["mouse"] = mouse + + return res diff --git a/mkdocs.yml b/mkdocs.yml index 4ae498c6e..edfc73c95 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -11,6 +11,7 @@ nav: - General Information: - Observations and actions: docs/obs-and-acts.md - docs/environments.md + - Wrappers: docs/wrappers.md - Misc: - docs/troubleshooting.md - API Reference: docs/reference.md
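Reviewer note: the index-to-action mapping added in `DiscreteActionWrapper.action` can be checked without a running environment. The following is a minimal standalone sketch that mirrors that logic as written in this patch; it does not import craftium, and `discrete_to_dict` and `ACTIONS` are hypothetical names used only for illustration, not part of craftium's API.

```python
# Illustrative re-implementation of the mapping in DiscreteActionWrapper.action.
# Assumption: `discrete_to_dict` is a made-up helper name, not a craftium function.

ACTIONS = ["forward", "mouse x+", "mouse x-"]

# Mouse action name -> (axis index in the mouse vector, sign of the movement)
MOUSE_DELTAS = {
    "mouse x+": (0, 1),
    "mouse x-": (0, -1),
    "mouse y+": (1, 1),
    "mouse y-": (1, -1),
}


def discrete_to_dict(index, actions, mouse_mov=0.5):
    """Translate a Discrete index into a craftium-style Dict action."""
    if not 0 <= index <= len(actions):
        raise ValueError(
            f"Action out of bounds, got {index} but expected "
            f"0 <= action <= {len(actions)}"
        )

    # Index 0 is the extra NOP action: an empty Dict action.
    if index == 0:
        return {}

    # Indices 1..len(actions) map to the user-provided action names.
    name = actions[index - 1]
    mouse = [0.0, 0.0]
    res = {}

    if name in MOUSE_DELTAS:
        axis, sign = MOUSE_DELTAS[name]
        mouse[axis] = sign * mouse_mov
    else:
        res[name] = 1  # press the corresponding key

    res["mouse"] = mouse
    return res


if __name__ == "__main__":
    for i in range(len(ACTIONS) + 1):
        print(i, discrete_to_dict(i, ACTIONS))
```

With three action names the valid indices are `0` through `3`, which matches the `Discrete(len(actions)+1)` space defined in `__init__`.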