Add DiscreteActionWrapper (based on Discrete space) and its docs
Mikel committed Jun 26, 2024
1 parent a7962f2 commit c61f076
Showing 7 changed files with 139 additions and 52 deletions.
37 changes: 8 additions & 29 deletions craftium-docs/docs/getting-started.md
@@ -49,6 +49,8 @@ env.close()

The code above just starts an episode calling `env.reset`, then samples a random action, plots the current observation, and executes the action using `env.step`. If the episode finishes, `env.reset` is called again to start a new episode. Finally, when the loop ends, `env.close` cleanly closes the environment.

For a complete description of craftium's default observation and action spaces, see the [page](./obs-and-acts.md) on observations and actions.

## Using `CraftiumEnv`

The example above employs Gymnasium's `make` utility to load the environments registered by craftium. In this section we explain how to load environments without using the `make` utility, directly employing `CraftiumEnv`. `CraftiumEnv` is craftium's main class, wrapping the modified minetest game in the Gymnasium API.
@@ -74,34 +76,11 @@ The first parameter, `env_dir` is the single mandatory parameter and specifies t

The rest of the parameters are optional; the ones in the code section above are the most common. `render_mode` is a common parameter in Gymnasium environments (see the [docs](https://gymnasium.farama.org/api/env/#gymnasium.Env.render) for more info) and sets the rendering mode of the environment. Finally, `obs_width` and `obs_height` specify the size of the observations in pixels.

## Action wrappers

Note that `CraftiumEnv` environments define a fairly large action space with discrete and continuous values. For a complete specification on the default action space see the dedicated [page](./obs-and-acts.md#action-space).

However, many tasks don't require the complete action space and can be greatly simplified by considering only the relevant actions to solve the task at hand. For this reason, craftium comes with [`BinaryActionWrapper`](./reference.md), that can be used to convert the default [`Dict`](https://gymnasium.farama.org/api/spaces/composite/#dict) action space into a simplified [`MultiBinary`](https://gymnasium.farama.org/api/spaces/fundamental/#gymnasium.spaces.MultiBinary) space.

For example,

```python
from craftium.wrappers import BinaryActionWrapper

env = BinaryActionWrapper(
    env,
    actions=["forward", "mouse x+", "mouse x-"],
    mouse_mov=0.5,
)
```

`BinaryActionWrapper` takes the `CraftiumEnv` to wrap as the first argument. The `actions` parameter then selects the set of actions from the original action space that will be available in the wrapped environment (see the section on the [action space](./obs-and-acts.md#action-space) for the list of all available action names). In this example, the wrapped environment will only have 3 discrete actions: forward, move the mouse left, and move the mouse right. The last parameter, `mouse_mov`, defines the magnitude of the mouse movement (must be in the [0, 1] range).
## Next steps

If we print `env.action_space` before applying the wrapper, we get the following Gymnasium space:
```python
Dict('aux1': Discrete(2), 'backward': Discrete(2), 'dig': Discrete(2), 'drop': Discrete(2), 'forward': Discrete(2), 'inventory': Discrete(2), 'jump': Discrete(2), 'left': Discrete(2), 'mouse': Box(-1.0, 1.0, (2,), float32), 'place': Discrete(2), 'right': Discrete(2), 'slot_1': Discrete(2), 'slot_2': Discrete(2), 'slot_3': Discrete(2), 'slot_4': Discrete(2), 'slot_5': Discrete(2), 'slot_6': Discrete(2), 'slot_7': Discrete(2), 'slot_8': Discrete(2), 'slot_9': Discrete(2), 'sneak': Discrete(2), 'zoom': Discrete(2))
```

After wrapping `env` with `BinaryActionWrapper`, we get that `env.action_space` is:
```python
MultiBinary(3)
```
These are some resources that might be interesting for your next steps!

Much simpler! The default action space has been reduced to a binary vector of only 3 elements. Finally, note that many of the [environments provided](./environments.md) by craftium employ `BinaryActionWrapper` to simplify their optimization.
- [Creating custom environments](./creating-envs.md) page.
- [Wrappers](./wrappers.md) page.
- [Craftium's training example](https://github.com/mikelma/craftium/blob/main/train_agent.py).
- [Craftium's environment implementations](https://github.com/mikelma/craftium/tree/main/craftium-envs).
2 changes: 2 additions & 0 deletions craftium-docs/docs/obs-and-acts.md
@@ -57,3 +57,5 @@ In a nutshell, actions in Craftium are a dictionary of some key commands and mou
```

This action would cause the player to jump forward and rotate the mouse to the left. Note that it isn't necessary to provide a value for each possible key. If the value for a key is not given, its default value will be used: `0` ("off") for keys and `[0, 0]` (no movement) for mouse movements.
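
As a rough illustration of the default-filling behaviour described above, the sketch below completes a partial action with its defaults. The `fill_defaults` helper and the example values are hypothetical (they are not part of craftium's API, which handles missing keys internally); the key names are those of the default `Dict` action space.

```python
# Hypothetical sketch, not craftium's implementation.
# Key names follow the default `Dict` action space.
KEYS = [
    "forward", "backward", "left", "right", "jump", "aux1", "sneak",
    "zoom", "dig", "place", "drop", "inventory",
] + [f"slot_{i}" for i in range(1, 10)]


def fill_defaults(action):
    """Return a copy of `action` with every missing key set to its default."""
    full = {key: 0 for key in KEYS}  # keys default to 0 ("off")
    full["mouse"] = [0.0, 0.0]       # no mouse movement by default
    full.update(action)              # values given by the user take precedence
    return full


# Illustrative partial action: only three entries are provided.
partial = {"forward": 1, "jump": 1, "mouse": [-0.5, 0.0]}
full = fill_defaults(partial)
print(full["forward"], full["dig"], full["mouse"])
```

Only the entries present in `partial` differ from the defaults; every other key stays "off".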

Note that craftium's default `Dict` action space might be too complex for many tasks, where the useful actions are only a subset of the original space. For this purpose, craftium comes with several `ActionWrapper`s that can be used to customize and simplify the default action space. Check the dedicated [page](./wrappers.md) on wrappers and the API [reference](./reference.md) for more info.
3 changes: 3 additions & 0 deletions craftium-docs/docs/reference.md
@@ -5,6 +5,9 @@
::: craftium.craftium_env.CraftiumEnv

<br>

## Wrappers

::: craftium.wrappers.BinaryActionWrapper

::: craftium.wrappers.DiscreteActionWrapper
51 changes: 51 additions & 0 deletions craftium-docs/docs/wrappers.md
@@ -0,0 +1,51 @@
# Action wrappers

`CraftiumEnv` environments define a fairly large action space with discrete and continuous values (see the dedicated [page](./obs-and-acts.md#action-space)). However, many tasks don't require the complete action space and can be greatly simplified by considering only the actions relevant to the task at hand. For this reason, craftium comes with `BinaryActionWrapper` and `DiscreteActionWrapper` (see the API [docs](./reference.md)), which can be used to convert the default [`Dict`](https://gymnasium.farama.org/api/spaces/composite/#dict) action space into a simplified space.

For example,

```python
from craftium.wrappers import BinaryActionWrapper

env = BinaryActionWrapper(
    env,
    actions=["forward", "mouse x+", "mouse x-"],
    mouse_mov=0.5,
)
```

## BinaryActionWrapper

`BinaryActionWrapper` takes the `CraftiumEnv` to wrap as the first argument. The `actions` parameter then selects the set of actions from the original action space that will be available in the wrapped environment (see the section on the [action space](./obs-and-acts.md#action-space) for the list of all available action names). In this example, the wrapped environment will only have 3 discrete actions: forward, move the mouse left, and move the mouse right. The last parameter, `mouse_mov`, defines the magnitude of the mouse movement (must be in the [0, 1] range).

If we print `env.action_space` before applying the wrapper, we get the following Gymnasium space:
```python
Dict('aux1': Discrete(2), 'backward': Discrete(2), 'dig': Discrete(2), 'drop': Discrete(2), 'forward': Discrete(2), 'inventory': Discrete(2), 'jump': Discrete(2), 'left': Discrete(2), 'mouse': Box(-1.0, 1.0, (2,), float32), 'place': Discrete(2), 'right': Discrete(2), 'slot_1': Discrete(2), 'slot_2': Discrete(2), 'slot_3': Discrete(2), 'slot_4': Discrete(2), 'slot_5': Discrete(2), 'slot_6': Discrete(2), 'slot_7': Discrete(2), 'slot_8': Discrete(2), 'slot_9': Discrete(2), 'sneak': Discrete(2), 'zoom': Discrete(2))
```

After wrapping `env` with `BinaryActionWrapper`, we get that `env.action_space` is:
```python
MultiBinary(3)
```

The default action space has been reduced to a binary vector of only 3 elements (see [`MultiBinary`](https://gymnasium.farama.org/api/spaces/fundamental/#gymnasium.spaces.MultiBinary)). Note that `BinaryActionWrapper` allows for simultaneous actions. In the example above, the action `[1, 0, 1]` would generate a forward movement and a mouse (camera) rotation at the same time.
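
To make the simultaneous-action behaviour concrete, here is a minimal sketch of how a binary vector could be translated back into a `Dict`-style action. `binary_to_dict` is a hypothetical stand-alone helper written for illustration only. It assumes that each active entry either sets its key to `1` or shifts the mouse by `mouse_mov`; the wrapper's actual `action` method may differ in detail.

```python
def binary_to_dict(action, actions, mouse_mov=0.5):
    """Hypothetical helper: map a binary vector to a Dict-style action."""
    assert len(action) == len(actions), "one flag is expected per action name"

    res = {}
    mouse = [0.0, 0.0]  # accumulated [x, y] mouse movement
    for flag, name in zip(action, actions):
        if not flag:
            continue  # inactive entries are simply skipped
        if name == "mouse x+":
            mouse[0] += mouse_mov
        elif name == "mouse x-":
            mouse[0] -= mouse_mov
        elif name == "mouse y+":
            mouse[1] += mouse_mov
        elif name == "mouse y-":
            mouse[1] -= mouse_mov
        else:
            res[name] = 1  # regular key: switch it "on"

    res["mouse"] = mouse
    return res


ACTIONS = ["forward", "mouse x+", "mouse x-"]
print(binary_to_dict([1, 0, 1], ACTIONS))
```

Under these assumptions, `[1, 0, 1]` activates `forward` and `mouse x-` in the same step, while `[0, 0, 0]` leaves every key off and the mouse still.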

## DiscreteActionWrapper

Another way of reducing the action space of tasks is to discretize them as unique indices. For example, the actions `["forward", "mouse x+", "mouse x-"]` would be translated to three different actions: `1`, `2`, and `3`, in other words: *a* ∈ {1,2,3}. `DiscreteActionWrapper` converts the default `Dict` space into [`Discrete`](https://gymnasium.farama.org/api/spaces/fundamental/#gymnasium.spaces.Discrete). Let's try:

```python
from craftium.wrappers import DiscreteActionWrapper

env = DiscreteActionWrapper(
    env,
    actions=["forward", "mouse x+", "mouse x-"],
    mouse_mov=0.5,
)

print(env.action_space)
```

The program returns: `Discrete(4)`. What a surprise! We provided three valid actions to the wrapper but it returned an action space of four unique actions 🤔. This is because `DiscreteActionWrapper` adds an extra action (with index `0`) that does nothing (equivalent to the `NOP` action in other environments).
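
The index mapping can be sketched in a few lines. `discrete_to_dict` is a hypothetical stand-alone helper (not part of craftium) that mirrors the logic of the wrapper's `action` method as added in this commit, so it can be run without craftium or Gymnasium installed:

```python
def discrete_to_dict(action, actions, mouse_mov=0.5):
    """Hypothetical helper: map a discrete index to a Dict-style action."""
    assert 0 <= action <= len(actions), "index out of bounds"

    # Index 0 is the extra NOP action: an empty dict means "do nothing".
    if action == 0:
        return {}

    name = actions[action - 1]  # indices 1..len(actions) map to the given names
    res = {}
    mouse = [0.0, 0.0]

    if name == "mouse x+":
        mouse[0] += mouse_mov
    elif name == "mouse x-":
        mouse[0] -= mouse_mov
    elif name == "mouse y+":
        mouse[1] += mouse_mov
    elif name == "mouse y-":
        mouse[1] -= mouse_mov
    else:
        res[name] = 1  # regular key: switch it "on"

    res["mouse"] = mouse
    return res


ACTIONS = ["forward", "mouse x+", "mouse x-"]
for index in range(len(ACTIONS) + 1):
    print(index, discrete_to_dict(index, ACTIONS))
```

Unlike the binary wrapper, exactly one of the listed actions (or none, for index `0`) is executed per step.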

Finally, note that many of the [environments provided](./environments.md) by craftium employ `DiscreteActionWrapper` to simplify their optimization.
10 changes: 5 additions & 5 deletions craftium/__init__.py
@@ -1,5 +1,5 @@
from .craftium_env import CraftiumEnv
from .wrappers import BinaryActionWrapper
from .wrappers import BinaryActionWrapper, DiscreteActionWrapper

from gymnasium.envs.registration import register, WrapperSpec

@@ -25,8 +25,8 @@
entry_point="craftium.craftium_env:CraftiumEnv",
additional_wrappers=[
WrapperSpec(
name="BinaryActionWrapper",
entry_point="craftium.wrappers:BinaryActionWrapper",
name="DiscreteActionWrapper",
entry_point="craftium.wrappers:DiscreteActionWrapper",
kwargs=dict(
actions=["forward", "mouse x+", "mouse x-"],
mouse_mov=0.5,
@@ -48,8 +48,8 @@
entry_point="craftium.craftium_env:CraftiumEnv",
additional_wrappers=[
WrapperSpec(
name="BinaryActionWrapper",
entry_point="craftium.wrappers:BinaryActionWrapper",
name="DiscreteActionWrapper",
entry_point="craftium.wrappers:DiscreteActionWrapper",
kwargs=dict(
actions=["forward", "dig", "mouse x+", "mouse x-", "mouse y+", "mouse y-"],
mouse_mov=0.5,
87 changes: 69 additions & 18 deletions craftium/wrappers.py
@@ -1,10 +1,29 @@
import numpy as np
from gymnasium import ActionWrapper, Env
from gymnasium.spaces import MultiBinary
from gymnasium.spaces import MultiBinary, Discrete

from .craftium_env import ACTION_ORDER


def check_actions_valid(actions):
    # obtain valid action names
    mouse_actions = [f"mouse {ax}{sign}" for ax, sign in zip(["x", "y", "x", "y"], ["+", "-", "-", "+"])]
    valid_actions = ACTION_ORDER + mouse_actions
    del valid_actions[valid_actions.index("mouse")]

    # check if the provided action names are valid
    assert sum([a not in valid_actions for a in actions]) == 0, \
        f"Invalid action given. Valid actions are: {valid_actions}"


def clip_mouse(m):
    # clip the mouse movement magnitude to [0, 1], warning if the value changed
    mouse_mov = np.clip(m, 0., 1.)
    if m != mouse_mov:
        # shared by both wrappers, so the warning does not name a single class
        print(f"Warning: mouse_mov is {m}, clipping in range 0-1.")
    return mouse_mov


class BinaryActionWrapper(ActionWrapper):
    """A Gymnasium `ActionWrapper` that translates craftium's `Dict` action space into a binary (discretized) action space [`MultiBinary`](https://gymnasium.farama.org/api/spaces/fundamental/#gymnasium.spaces.MultiBinary).
@@ -15,25 +34,10 @@ class BinaryActionWrapper(ActionWrapper):
    def __init__(self, env: Env, actions: list[str], mouse_mov: float = 0.5):
        ActionWrapper.__init__(self, env)

        check_actions_valid(actions)
        self.actions = actions

        # obtain valid action names
        mouse_actions = [f"mouse {ax}{sign}" for ax, sign in zip(["x", "y", "x", "y"], ["+", "-", "-", "+"])]
        valid_actions = ACTION_ORDER + mouse_actions
        del valid_actions[valid_actions.index("mouse")]

        # check if the provided action names are valid
        assert sum([a not in valid_actions for a in actions]) == 0, \
            f"Invalid action given. Valid actions are: {valid_actions}"

        # define the action space for gymnasium
        self.action_space = MultiBinary(len(actions))

        # clip the mouse movement if needed
        self.mouse_mov = np.clip(mouse_mov, 0., 1.)
        if self.mouse_mov != mouse_mov:
            print(f"Warning (DiscreteActionWrapper): mouse_mov \
is {mouse_mov}, clipping in range 0-1.")
        self.mouse_mov = clip_mouse(mouse_mov)

    def action(self, action):
        assert len(action) == len(self.actions), \
@@ -59,3 +63,50 @@ def action(self, action):
        res["mouse"] = mouse

        return res

class DiscreteActionWrapper(ActionWrapper):
    """A Gymnasium `ActionWrapper` that translates craftium's `Dict` action space into a discretized action space [`Discrete`](https://gymnasium.farama.org/api/spaces/fundamental/#gymnasium.spaces.Discrete).
    Unlike `BinaryActionWrapper`, this wrapper adds an additional action to the action space in order to include the `NOP` action. This action is equivalent to `{}` in the `Dict` space or to a list of zeros in the `MultiBinary` space. The `NOP` action has index `0`, and the rest of the actions have the consecutive indexes. Thus, the number of actions of the environment will be `len(actions)+1`.
    :param env: The environment to wrap.
    :param actions: A list of strings containing the names of the actions that will constitute the new action space.
    :param mouse_mov: Magnitude of the mouse movement. Must be in the [0, 1] range, else it will be clipped.
    """
    def __init__(self, env: Env, actions: list[str], mouse_mov: float = 0.5):
        ActionWrapper.__init__(self, env)

        check_actions_valid(actions)

        self.actions = actions
        self.action_space = Discrete(len(actions)+1)
        self.mouse_mov = clip_mouse(mouse_mov)

    def action(self, action):
        assert action >= 0 and action <= len(self.actions), \
            f"Action out of bound, got {action} but expected 0 <= action <= {len(self.actions)}"

        # if the action has index 0, return an empty action (NOP)
        if action == 0:
            return {}

        res = {}

        # indices 1..len(actions) map to the provided action names
        name = self.actions[action-1]

        mouse = [0, 0]

        if name == "mouse x+":
            mouse[0] += self.mouse_mov
        elif name == "mouse x-":
            mouse[0] -= self.mouse_mov
        elif name == "mouse y+":
            mouse[1] += self.mouse_mov
        elif name == "mouse y-":
            mouse[1] -= self.mouse_mov
        else:
            res[name] = 1

        res["mouse"] = mouse

        return res
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -11,6 +11,7 @@ nav:
- General Information:
- Observations and actions: docs/obs-and-acts.md
- docs/environments.md
- Wrappers: docs/wrappers.md
- Misc:
- docs/troubleshooting.md
- API Reference: docs/reference.md
