[BUG] SAC loss masking #2612

matteobettini · 2024-11-27T14:33:07Z

PR #2606

Introduces indexing of loss tensordict using done signals

rl/torchrl/objectives/sac.py

Line 718 in d537dcb

next_tensordict_select = next_tensordict[~done.squeeze(-1)]

rl/torchrl/objectives/sac.py

Line 1223 in d537dcb

next_tensordict_select = next_tensordict[~done.squeeze(-1)]

I have multiple concerns regarding this PR:

masking using done signals is not always possible as the done signal could have more dimensions than the loss tensordict, leading to errors in multiagent settings
the action network could be expecting a specific imput dimension, masking the input could lead to arbitrary unsupported shapes and crashes in the actor (also not in multiagent settings)
in continuous sac, the mask is applied to output actions and logprobabilities (with arbitrary 0 entries). this does not seem necessary, those values will be discarded anyway as the value_estimate() will read the dones
same in descrete sac, it seems to me that the value_estimate() already reads done and discards next_values for done states. plus the target of a done state should be the reward, so here by using 0s we are actually introducing a further bug

In my opinion this change was not needed as the done target_values of done states are already discarded in value_estimate().

Maybe I am wrong in this analysis, please let me know.

I do not think it is possible to avoid submitting inputs of done states to the policy without changing the input shape (which we should avoid as it could lead to errors)

The text was updated successfully, but these errors were encountered:

matteobettini · 2024-11-27T14:40:07Z

If the problem is that the observation of done states could be nan (which I argue it shouldn't) we could consider replacing nans with 0 so that the forward can run

vmoens · 2024-11-27T14:43:01Z

In my opinion this change was not needed as the done target_values of done states are already discarded in value_estimate().

This change was indeed required as some network cannot accept values of observations that are written when the environment is done. There should not be any forward pass for values that are nonsensical and where outputs will not be used.

See #2590 for context.

If the problem is that the observation of done states could be nan (which I argue it shouldn't) we could consider replacing nans with 0 so that the forward can run

NaN were there just to exemplify. In practice, the network should not be queried if the values are not used. For instance you could have a model that implements some sort of internal state update at each query and you woulnd't want this to be modified by values that will be discarded.

Re (1), we could decide not to mask with done if the shapes cannot be broadcast.
Re (2) I think this can be addressed to by checking that the squeezed done has the shape of the tensordict. If so, we're only modifying the batch size and things should work ok (maybe not with RNNs?)

If this can be addressed differently I'm happy to give it a look!

cc @fmeirinhos

vmoens · 2024-11-27T14:45:03Z

I think we also need more coverage of MARL usage of the losses in the tests, because this could have been easily spotted if running pytest test/test_cost.py -k SAC had any MARL setting in it

matteobettini · 2024-11-27T14:48:29Z

I understand the orginal issue, but this seems to be a very difficult pickle.

not all networks can be queried with sparse data or data of arbitrary shape (list is long)

matteobettini · 2024-11-27T14:49:46Z

There should not be any forward pass for values that are nonsensical and where outputs will not be used.

Not sure it makes sense to enforce this here without introducing problems bigger than the original

vmoens · 2024-11-27T14:53:14Z

Not sure it makes sense to enforce this here without introducing problems bigger than the original

You know what, I'm happy to revert that PR as soon as we can figure out what to do for people who have networks that simply cannot accept "done" observation values!

I do think it's a valid concern and it should be addressed, but obviously by a non-buggy solution.

matteobettini · 2024-11-27T15:08:08Z

I have been checking a bit how other libraries do it and they seem to pass the next obs anyway

maybe it is my opinion, but i don’t see what is particular about an observation of a done state, it should be part of the same observation space as the others

furthermore, for policies that have an internal state or counter, this in sac is a bit unnatural as the policy is called from 2 places anyway (actions and values) so keeping track of meaningful states is hard

vmoens · 2024-11-27T15:12:37Z

I have been checking a bit how other libraries do it and they seem to pass the next obs anyway

I don't think we should overfit too much to what other libs do but to issues our users are facing.

furthermore, for policies that have an internal state or counter

this is just an example. The point is that if an error is thrown when invalid data is passed to a network, we should never reach that error (or give the tooling necessary to avoid that).

We could add a flag in the constructor like skip_done_states which defaults to False.

Then we capture errors where relevant and if the actor network raises during a call on the next data we tell users about this flag. In all other cases the behaviour is unchanged.

I gave it a shot in #2613 (without the capture of the error)

vmoens · 2024-11-27T15:17:40Z

For the record, this is an example of a function that errors when there's a NaN

>>> import torch
>>> matrix_with_nan = torch.tensor([[1.0, 2.0], [float('nan'), 4.0]])
>>> 
>>> result = torch.linalg.cholesky(matrix_with_nan)

Note that replacing NaN with 0s with cholesky is also problematic

>>> torch.linalg.cholesky(torch.zeros(4, 4))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
torch._C._LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 1 is not positive-definite).

matteobettini · 2024-11-27T15:17:48Z

That seems a reasonable solution to me, happy to review.

i think we are just facing a difficult issue as i can clearly understand where both problems are coming from. I also don’t like padding or calls on useless data

matteobettini · 2024-11-27T16:30:04Z

Note that replacing NaN with 0s with cholesky is also problematic

Also flattening the cholesky input and removing the NaN value is problematic no?

vmoens · 2024-11-27T16:30:41Z

Also flattening the cholesky input and removing the NaN value is problematic no?

no the matrix is in the feature dim, not the batch dim. It isn't flattened

matteobettini added the bug Something isn't working label Nov 27, 2024

matteobettini assigned vmoens Nov 27, 2024

vmoens mentioned this issue Nov 27, 2024

[BugFix] skip_done_states in SAC #2613

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] SAC loss masking #2612

[BUG] SAC loss masking #2612

matteobettini commented Nov 27, 2024

matteobettini commented Nov 27, 2024 •

edited

Loading

vmoens commented Nov 27, 2024

vmoens commented Nov 27, 2024

matteobettini commented Nov 27, 2024 •

edited

Loading

matteobettini commented Nov 27, 2024

vmoens commented Nov 27, 2024

matteobettini commented Nov 27, 2024 •

edited

Loading

vmoens commented Nov 27, 2024

vmoens commented Nov 27, 2024

matteobettini commented Nov 27, 2024

matteobettini commented Nov 27, 2024

vmoens commented Nov 27, 2024

[BUG] SAC loss masking #2612

[BUG] SAC loss masking #2612

Comments

matteobettini commented Nov 27, 2024

matteobettini commented Nov 27, 2024 • edited Loading

vmoens commented Nov 27, 2024

vmoens commented Nov 27, 2024

matteobettini commented Nov 27, 2024 • edited Loading

matteobettini commented Nov 27, 2024

vmoens commented Nov 27, 2024

matteobettini commented Nov 27, 2024 • edited Loading

vmoens commented Nov 27, 2024

vmoens commented Nov 27, 2024

matteobettini commented Nov 27, 2024

matteobettini commented Nov 27, 2024

vmoens commented Nov 27, 2024

matteobettini commented Nov 27, 2024 •

edited

Loading

matteobettini commented Nov 27, 2024 •

edited

Loading

matteobettini commented Nov 27, 2024 •

edited

Loading