Add support for continuous actions in the ALE (CALE) #539

psc-g · 2024-07-11T19:04:44Z

This pull request enables the use of continuous actions in the ALE. It uses the same Python interface as the standard discrete actions, but now also accepts a numpy array of actions.

The idea is to simulate a real Atari joystick with a continuous unit circle, representing all the possible joystick movements. The unit circle is parameterized with polar coordinates, one dimension representing radius r (in [0, 1]), and another dimension representing theta (in [-π, π]). The fire button is a separate dimension (in [0, 1]). The action space is thus 3-dimensional.

The original Atari controller, despite having an analogous joystick, still only triggered one of 9 discrete events triggered by the joystick position. This is simulated with a threshold parameter continuous_action_threshold. For example: if the x position of the joystick is less than continuous_action_threshold then the LEFT event is triggered, and if the y position of the joystick is larger than continuous_action_threshold, the UP event is triggered (note these two events can trigger simultaneously).

The intent of this pull request is to enable the benchmarking of continuous controllers on the ALE, which would open the opportunity for more direct, and fair, comparisons between discrete- and continuous-control RL agents.

…ry call to actContinuous.

pseudo-rnd-thoughts

Thanks Pablo for the PR, it certainly looks interesting and we would be interested in adding this.

The CI has broken for some result, therefore, I'll try to fix the build CI in the next few days before properly reviewing the PR.
In the meantime, could you add some testing for this?

psc-g · 2024-07-15T10:19:12Z

Thanks Mark,
Would be happy to add tests for this. Could you let me know how you run the tests? When I try with pytest -q tests/python/test_atari_env.py I get a whole bunch of failures due to ROMs not found (even though I did pip install gymnasium[atari]).

pseudo-rnd-thoughts · 2024-07-15T11:01:03Z

The ROMs are now packaged within the PyPI but not the repo(so AutoROM is unnecessary), you should just need to run bash scripts/download_unpack_roms.sh which will download the ROMs and put them in the correct folder.

psc-g · 2024-07-15T11:06:45Z

Thanks Mark, that resolved that issue, but there's another one now, which seems to be unrelated to my changes (since I haven't added any tests yet and code defaults to discrete, not continuous, actions). All the tests raise this AsertionError:

AssertionError: The environment must specify an action space. https://gymnasium.farama.org/tutorials/gymnasium_basics/environment_creation/

Would it be possible for you to try running this on your end, to verify that they do still pass? That would confirm that it's an issue with my setup.

psc-g · 2024-07-15T17:11:01Z

Figured it out. I mistakenly added underscores before action_space in the first commit. Working on new tests for continuous actions now.

psc-g · 2024-07-15T17:31:37Z

Unit tests added! Let me know if you have any other comments on the PR.

psc-g · 2024-07-23T07:47:51Z

hey mark, any update on the build CI issue?

pseudo-rnd-thoughts · 2024-07-23T09:47:29Z

Thanks @psc-g for checking in, sadly not, in #535 I've fixed the building problem such that all the wheels are created. However, we added recently testing that the wheels actually work which shows they don't.
Unzipping the wheels to understand their content, for the windows and linux builds, they are all using the same version of python while the macos versions work. Very strange.
I've put a message on the CIBuildTools github page, pypa/cibuildwheel#1940, asking for support but haven't received any reply.
If you have any time to look and understand, that would be greatly appreciated.

@JesseFarebro Any ideas?

pseudo-rnd-thoughts · 2024-07-23T09:52:04Z

Reading through the logs more closely, I spotted

-- pybind11 v2.13.1 
  CMake Warning (dev) at build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/tools/FindPythonLibsNew.cmake:101 (message):
    Policy CMP0148 is not set: The FindPythonInterp and FindPythonLibs modules
    are removed.  Run "cmake --help-policy CMP0148" for policy details.  Use
    the cmake_policy command to set the policy and suppress this warning, or
    preferably upgrade to using FindPython, either by calling it explicitly
    before pybind11, or by setting PYBIND11_FINDPYTHON ON before pybind11.
  Call Stack (most recent call first):
    vcpkg/scripts/buildsystems/vcpkg.cmake:859 (_find_package)
    build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/tools/pybind11Tools.cmake:50 (find_package)
    build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/tools/pybind11Common.cmake:202 (include)
    build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/CMakeLists.txt:248 (include)
  This warning is for project developers.  Use -Wno-dev to suppress it.
  
  CMake Warning (dev) at vcpkg/scripts/buildsystems/vcpkg.cmake:859 (_find_package):
    Policy CMP0148 is not set: The FindPythonInterp and FindPythonLibs modules
    are removed.  Run "cmake --help-policy CMP0148" for policy details.  Use
    the cmake_policy command to set the policy and suppress this warning.
  
  Call Stack (most recent call first):
    build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/tools/FindPythonLibsNew.cmake:114 (find_package)
    vcpkg/scripts/buildsystems/vcpkg.cmake:859 (_find_package)
    build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/tools/pybind11Tools.cmake:50 (find_package)
    build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/tools/pybind11Common.cmake:202 (include)
    build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/CMakeLists.txt:248 (include)
  This warning is for project developers.  Use -Wno-dev to suppress it.
  
  -- Found PythonInterp: C:/hostedtoolcache/windows/Python/3.9.13/x64/python3.exe (found suitable version "3.9.13", minimum required is "3.7")
  -- Found PythonLibs: C:/hostedtoolcache/windows/Python/3.9.13/x64/libs/python39.lib
  -- Found Python3: C:\Users\runneradmin\AppData\Local\Temp\build-env-5ocq8zbx\Scripts\python.exe (found version "3.8.10") found components: Interpreter
  -- Configuring done (37.4s)
  -- Generating done (0.2s)
  -- Build files have been written to: D:/a/Arcade-Learning-Environment/Arcade-Learning-Environment/build/temp.win-amd64-cpython-38/Release/ale_py._ale_py
  MSBuild version 17.10.4+10fbfbf2e for .NET Framework

This appears to be the underlying problem, I'll see what I can do

pseudo-rnd-thoughts · 2024-07-23T14:43:35Z

@psc-g I've fixed the CI issues, thanks for poking me about it.
@jjshoots Has offered to review it in the next few days

I'm thinking about how we release this. I'm guessing you have a paper associated with this, how publicly do you wish for us to be about it. Our plan is to cut a release in the next few days, v0.9.1, that adds support for NumPy 2.0
We would then have a subsequence release when this PR is ready, v0.10.0 that can highlight this new feature and possibly a link to your paper for how it could be used

psc-g · 2024-07-23T15:05:20Z

thanks mark, great to hear you were able to resolve the issues!

yes, we have a paper for this that is under review, but wanted to put it up on arxiv soon. before doing that, however, we wanted this to be merged so that we can point to the main repo instead of my fork. would be happy to share the draft with you privately if that would help!

in terms of coordination with releases, happy to do whatever you think works best. would be awesome if we can have this out before RLC (happening august 9-12) so that we can start advertising it there.

pseudo-rnd-thoughts · 2024-07-23T15:26:55Z

yes, we have a paper for this that is under review, but wanted to put it up on arxiv soon. before doing that, however, we wanted this to be merged so that we can point to the main repo instead of my fork. would be happy to share the draft with you privately if that would help!

Cool, we can work with that. Once Jet finishes his review then we can cut a release, straight away

in terms of coordination with releases, happy to do whatever you think works best. would be awesome if we can have this out before RLC (happening august 9-12) so that we can start advertising it there.

I'll write a short summary of this PR using what you wrote above in the next day and include example code for people to utilise

jjshoots

This was an interesting PR to review, thanks for the contribution! I have just some minor comments that can possibly be ignored (except the action space one).

As I understand things correctly, CALE works by taking in a continuous action in the form of r, theta, fire, and then uses a thresholding parameter to convert that into a discrete action, in which everything else is similar to the normal Atari envs.
I question whether this needs to be a full feature within ALE at the C level, as opposed to simply using a Gymnasium wrapper that converts continuous action values into discrete action values in Python land. I see that ALE already provides the full action space to discrete agents that the continuous agent is allowed to have.
Having this in C land would mean that it is a baked-in feature of ALE and would probably run a little bit faster, but from a maintainability standpoint, ALE is already struggling from not-enough-developers-syndrome.

Regardless, the PR looks solid AFAICT, I'll leave the decision to include and merge up to @pseudo-rnd-thoughts.

src/environment/ale_state.cpp

jjshoots · 2024-07-24T02:01:48Z

src/python/env.py

+                action = tuple(action)
+                if len(action) != 3:
+                    raise error.Error("Actions must have 3-dimensions.")
+


I think this is not needed, Gymnasium's passive env checker automatically checks for this and line 275 will error out anyway. @pseudo-rnd-thoughts could you confirm?

sounds good, let me know and happy to remove if unnecessary.

src/python/env.py

psc-g · 2024-07-25T15:35:26Z

thanks for the review, @jjshoots ! i think i've addressed all your comments.

regarding your higher-level comment about having this just be a wrapper, i think it could be nice to keep it as is. the reason for this is that it will allow us to support paddles properly. currently paddles are using the same discrete actions and get mapped to a hardcoded value (PADDLE_DELTA). with continuous inputs we have the opportunity to have these be truly continuous controllers, which i think will be quite interesting.

for now i kept them as in the previous setup with hardcoded values to avoid being too disruptive. but having the c code available already will allow us to add it fairly easily in the future.

jjshoots · 2024-07-26T01:22:41Z

Gotcha, that makes sense. I think one more confirmation by @pseudo-rnd-thoughts and should be good to go. Looks like very interesting things are on the horizon!

pseudo-rnd-thoughts · 2024-07-26T14:15:30Z

regarding your higher-level comment about having this just be a wrapper, i think it could be nice to keep it as is. the reason for this is that it will allow us to support paddles properly. currently paddles are using the same discrete actions and get mapped to a hardcoded value (PADDLE_DELTA). with continuous inputs we have the opportunity to have these be truly continuous controllers, which i think will be quite interesting.

This makes sense however the current implementation uses fixed limits in both ALEState::setActionJoysticksContinuous and ALEState::applyActionPaddlesContinuous. I would prefer if the c++ has true continuous behaviour with the python being continuous to discrete that can be updated. For users, it will be easier to see this is what is happening behind the scenes as viewing the source code of Python is easier than compiled c++.

r, theta, fire = action.tolist()
AttributeError: 'list' object has no attribute 'tolist'

Could you fix the testing please before we do anything

psc-g · 2024-07-27T09:06:21Z

Hi @pseudo-rnd-thoughts , wondering if you're ok with the PR now? it seems there are still some CI issues, though.

psc-g · 2024-07-28T07:43:50Z

Thanks @pseudo-rnd-thoughts , tests fixed, apologies for that!

i'm not sure i totally understood your last comment regarding the c++/python question. are you ok with things as is, or are you suggesting changes?

jjshoots · 2024-07-30T13:24:51Z

@psc-g Apologies for leaving you hanging, I think what @pseudo-rnd-thoughts means is that it would be better if the CPP implementation was a pure continuous environment outright, where the thresholding is done on the Python end. As it currently stands, the CPP implementation is a harder-to-digest (harder, not impossible) implementation of essentially a Gymnasium Env Action Wrapper.

Unfortunately I tend to agree with him. I think there is definitely value in having something like this within ALE, but would appreciate seeing it with full continuous action space. At the current state, it feels like a half baked implementation waiting for someone in the future to pick up the pieces and complete the implementation.

I think having this PR open is good for now until it is revisited/completed downstream.

Edit: I've been informed this was a bit blunt. I mean this with zero offence at all.

jjshoots · 2024-07-31T12:41:41Z

@psc-g After some internal discussion, we think this is the best way forward for this PR:

We'll merge this PR into a separate branch within this repo.
ALE itself will be released off the master branch with updated Numpy 2.0 support.
We'll convert the CPP level implementation to pure continuous based on the work done here, and implement a discrete switch at the Python level.
We make another release for continuous action space ALE.

This would allow for users to have a pure-continuous action space, a clipped discrete action space, and a pure-discrete action space, while maintaining only one variant of the underlying ALE API.
Unfortunately, I don't think this timeline fits within the next week for RLC (I'll try tho).
Hopefully this works out well for everyone.

psc-g · 2024-08-03T11:32:43Z

Hey mark and jet, apologies for delayed response, I'm on vacation these two weeks 😄

Thank you for your help with this, I think your plan makes sense! Let me know if there's something I can do to help with the new plan, and no worries about RLC, I can point people to the CALE branch!

psc-g added 8 commits May 31, 2024 13:57

Changes to support continuous actions

a8b723c

Set continuous action threshold once in ALE constructor, not with eve…

4b44286

…ry call to actContinuous.

Set default value for continuous action threshold.

87af1ec

Use unified function for discrete and continuous actions.

26ef98f

Change step argument option from tuple to np.ndarray.

3c43735

Force action to tuple before splitting into three components.

bf6610d

Add a check to ensure that 3-dimensional actions are being passed in.

0eaf8db

Fixing style issues

11a980c

pseudo-rnd-thoughts reviewed Jul 15, 2024

View reviewed changes

Update action_space variable name for gymnasium compatibility.

f857a4e

Add unit tests for continuous action spaces.

f9c8560

pre-commit

5586b9b

jjshoots requested changes Jul 24, 2024

View reviewed changes

psc-g added 2 commits July 25, 2024 17:27

Fix action limits

76779d7

Use action.tolist() instead of converting to tuple

26f777a

Fix threshold checks for paddles.

5e42c6a

psc-g requested a review from jjshoots July 26, 2024 07:08

Fixing tests

6aadbb7

jjshoots changed the base branch from master to CALE July 31, 2024 12:42

jjshoots merged commit b24830f into Farama-Foundation:CALE Jul 31, 2024
28 checks passed

jjshoots mentioned this pull request Aug 3, 2024

Pure Continuous version of ALE on CPP #550

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for continuous actions in the ALE (CALE) #539

Add support for continuous actions in the ALE (CALE) #539

psc-g commented Jul 11, 2024

pseudo-rnd-thoughts left a comment

psc-g commented Jul 15, 2024

pseudo-rnd-thoughts commented Jul 15, 2024

psc-g commented Jul 15, 2024

psc-g commented Jul 15, 2024

psc-g commented Jul 15, 2024

psc-g commented Jul 23, 2024

pseudo-rnd-thoughts commented Jul 23, 2024

pseudo-rnd-thoughts commented Jul 23, 2024

pseudo-rnd-thoughts commented Jul 23, 2024

psc-g commented Jul 23, 2024

pseudo-rnd-thoughts commented Jul 23, 2024

jjshoots left a comment

jjshoots Jul 24, 2024

psc-g Jul 24, 2024

psc-g commented Jul 25, 2024

jjshoots commented Jul 26, 2024

pseudo-rnd-thoughts commented Jul 26, 2024

psc-g commented Jul 27, 2024

psc-g commented Jul 28, 2024

jjshoots commented Jul 30, 2024 •

edited

Loading

jjshoots commented Jul 31, 2024 •

edited

Loading

psc-g commented Aug 3, 2024

Add support for continuous actions in the ALE (CALE) #539

Add support for continuous actions in the ALE (CALE) #539

Conversation

psc-g commented Jul 11, 2024

pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

psc-g commented Jul 15, 2024

pseudo-rnd-thoughts commented Jul 15, 2024

psc-g commented Jul 15, 2024

psc-g commented Jul 15, 2024

psc-g commented Jul 15, 2024

psc-g commented Jul 23, 2024

pseudo-rnd-thoughts commented Jul 23, 2024

pseudo-rnd-thoughts commented Jul 23, 2024

pseudo-rnd-thoughts commented Jul 23, 2024

psc-g commented Jul 23, 2024

pseudo-rnd-thoughts commented Jul 23, 2024

jjshoots left a comment

Choose a reason for hiding this comment

jjshoots Jul 24, 2024

Choose a reason for hiding this comment

psc-g Jul 24, 2024

Choose a reason for hiding this comment

psc-g commented Jul 25, 2024

jjshoots commented Jul 26, 2024

pseudo-rnd-thoughts commented Jul 26, 2024

psc-g commented Jul 27, 2024

psc-g commented Jul 28, 2024

jjshoots commented Jul 30, 2024 • edited Loading

jjshoots commented Jul 31, 2024 • edited Loading

psc-g commented Aug 3, 2024

jjshoots commented Jul 30, 2024 •

edited

Loading

jjshoots commented Jul 31, 2024 •

edited

Loading