[Feature Request] ActionDiscretizer scalar integration #2615

Open
oslumbers opened this issue Nov 28, 2024 · 2 comments
Labels: enhancement (New feature or request)

@oslumbers
Motivation

The ActionDiscretizer only gives the option of converting input_spec["full_action_spec"] to MultiCategorical or MultiOneHot. This introduces an extra dimension into the shape:

MultiCategorical(
    shape=torch.Size([1]),
    space=BoxList(boxes=[CategoricalBox(n=4)]),
    dtype=torch.int64,
    domain=discrete)

which, in my case, causes errors in the collector, which expects a scalar-shaped action:

  File "runner.py", line 347, in run
    rollout = next(self.collector_iter)
  File "torchrl/collectors/collectors.py", line 1031, in iterator
    tensordict_out = self.rollout()
  File "torchrl/_utils.py", line 481, in unpack_rref_and_invoke_function
    return func(self, *args, **kwargs)
  File "torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "torchrl/collectors/collectors.py", line 1162, in rollout
    env_output, env_next_output = self.env.step_and_maybe_reset(env_input)
  File "torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "torchrl/envs/batched_envs.py", line 67, in decorated_fun
    return fun(self, *args, **kwargs)
  File "torchrl/envs/batched_envs.py", line 1572, in step_and_maybe_reset
    shared_tensordict_parent.update_(
  File "tensordict/base.py", line 5339, in update_
    self._apply_nest(
  File "tensordict/_td.py", line 1330, in _apply_nest
    item_trsf = item._apply_nest(
  File "tensordict/_td.py", line 1330, in _apply_nest
    item_trsf = item._apply_nest(
  File "tensordict/_td.py", line 1330, in _apply_nest
    item_trsf = item._apply_nest(
  File "tensordict/_td.py", line 1350, in _apply_nest
    item_trsf = fn(
  File "tensordict/base.py", line 5318, in inplace_update
    dest.copy_(source, non_blocking=non_blocking)
RuntimeError: output with shape [2, 1] doesn't match the broadcast shape [2, 2]

Solution

To get around this issue, I can replace the MultiCategorical with a Categorical:

Categorical(
    shape=torch.Size([]),
    space=CategoricalBox(n=tensor([4])),
    device=cpu,
    dtype=torch.int64,
    domain=discrete)
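
For reference, a minimal sketch (using the torchrl.data spec constructors as I understand them) contrasting the sample shapes of the two specs; the extra trailing dimension of the first is what ends up clashing in the collector:

import torch
from torchrl.data import Categorical, MultiCategorical

# What ActionDiscretizer builds today: one categorical entry of size 4.
multi = MultiCategorical(nvec=[4])
# What a scalar-friendly option would build instead.
scalar = Categorical(n=4, shape=torch.Size(()))

print(multi.rand().shape)   # torch.Size([1])
print(scalar.rand().shape)  # torch.Size([])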

However, _inv_call() does not handle a scalar action, so I have to change line 8658 from

action = action.unsqueeze(-1)

to

action = action.unsqueeze(-1).unsqueeze(-1)

so that intervals.ndim == action.ndim.
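
Assuming the spec has already been made scalar as above (e.g. by overriding transform_input_spec), a possible way to apply the same _inv_call workaround without patching the library is a small subclass. This is only a sketch, assuming the default "action" key and a base env that expects a scalar continuous action:

import torch
from tensordict import TensorDictBase
from torchrl.envs import ActionDiscretizer


class ScalarActionDiscretizer(ActionDiscretizer):
    # Hypothetical subclass: give the parent _inv_call the trailing dim it
    # expects, then strip it from the continuous action it writes back.
    def _inv_call(self, tensordict: TensorDictBase) -> TensorDictBase:
        action = tensordict.get("action")  # assumes the default action key
        if action.ndim == tensordict.ndim:  # scalar action per batch element
            tensordict.set("action", action.unsqueeze(-1))
        tensordict = super()._inv_call(tensordict)
        out = tensordict.get("action")
        if out.shape[-1:] == torch.Size([1]):
            tensordict.set("action", out.squeeze(-1))
        return tensordict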

Alternatives

Could we either:

  1. Add an argument for selecting between MultiCategorical and Categorical
  2. Or move the creation of the new action_spec out of the transform_input_spec method, so that any child class of ActionDiscretizer can define the desired action_spec more specifically (see the sketch below). At the moment I have to override transform_input_spec entirely, which I would rather not have to maintain.

Also, within _inv_call can we add functionality to account for a scalar action?
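
To make option 2 concrete, here is a purely hypothetical sketch of what the refactor could enable; the hook name _make_discrete_action_spec and its signature are made up for illustration and do not exist in torchrl today:

from torchrl.data import Categorical
from torchrl.envs import ActionDiscretizer


class MyDiscretizer(ActionDiscretizer):
    # Hypothetical hook: if transform_input_spec delegated spec creation to a
    # method like this, a child class could pick the spec type (here a scalar
    # Categorical) without re-implementing transform_input_spec itself.
    def _make_discrete_action_spec(self, num_intervals, device, dtype):
        return Categorical(n=num_intervals, device=device, dtype=dtype)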

Checklist

  • I have checked that there is no similar issue in the repo (required)
oslumbers added the enhancement (New feature or request) label on Nov 28, 2024
@vmoens (Contributor) commented Nov 29, 2024

Looking at it

Here is an MRE (minimal reproducible example) for future use:

from typing import Optional

import torch
from tensordict import TensorDict, TensorDictBase
from torchrl.data import Bounded
from torchrl.envs import ActionDiscretizer, EnvBase


class EnvWithScalarAction(EnvBase):
    # Toy env: 3-dim observation, scalar continuous action.
    _batch_size = torch.Size(())

    def _reset(self, td: TensorDict):
        return TensorDict(
            observation=torch.randn(3),
            done=torch.zeros(1, dtype=torch.bool),
            truncated=torch.zeros(1, dtype=torch.bool),
            terminated=torch.zeros(1, dtype=torch.bool),
        )

    def _step(self, tensordict: TensorDictBase) -> TensorDictBase:
        return TensorDict(
            observation=torch.randn(3),
            reward=torch.zeros(1),
            done=torch.zeros(1, dtype=torch.bool),
            truncated=torch.zeros(1, dtype=torch.bool),
            terminated=torch.zeros(1, dtype=torch.bool),
        )

    def _set_seed(self, seed: Optional[int]):
        ...


def policy(td):
    # The policy writes a scalar (shape ()) action.
    td.set("action", torch.rand(()))
    return td


env = EnvWithScalarAction()
env.auto_specs_(policy=policy)
# Force a scalar action spec.
env.action_spec = Bounded(-1, 1, shape=())

tenv = env.append_transform(ActionDiscretizer(num_intervals=4))

print(tenv.rollout(4))

@vmoens (Contributor) commented Nov 29, 2024

This is a first stab
#2619

Needs more comprehensive tests etc.

Question: is it also breaking with action_spec with shape [1] or just []?
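
For what it's worth, the shape-[1] case could be checked with a small variation of the MRE above (reusing EnvWithScalarAction, with the policy adjusted to emit a matching shape):

def policy_1d(td):
    # Same toy policy, but emitting a shape-[1] action.
    td.set("action", torch.rand((1,)))
    return td


env = EnvWithScalarAction()
env.auto_specs_(policy=policy_1d)
env.action_spec = Bounded(-1, 1, shape=(1,))

tenv = env.append_transform(ActionDiscretizer(num_intervals=4))
print(tenv.rollout(4))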
