Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for continuous actions in the ALE (CALE) #539

Merged
merged 15 commits into from
Jul 31, 2024

Conversation

psc-g
Copy link
Contributor

@psc-g psc-g commented Jul 11, 2024

This pull request enables the use of continuous actions in the ALE. It uses the same Python interface as the standard discrete actions, but now also accepts a numpy array of actions.

The idea is to simulate a real Atari joystick with a continuous unit circle, representing all the possible joystick movements. The unit circle is parameterized with polar coordinates, one dimension representing radius r (in [0, 1]), and another dimension representing theta (in [-π, π]). The fire button is a separate dimension (in [0, 1]). The action space is thus 3-dimensional.

The original Atari controller, despite having an analogous joystick, still only triggered one of 9 discrete events triggered by the joystick position. This is simulated with a threshold parameter continuous_action_threshold. For example: if the x position of the joystick is less than continuous_action_threshold then the LEFT event is triggered, and if the y position of the joystick is larger than continuous_action_threshold, the UP event is triggered (note these two events can trigger simultaneously).

The intent of this pull request is to enable the benchmarking of continuous controllers on the ALE, which would open the opportunity for more direct, and fair, comparisons between discrete- and continuous-control RL agents.

Copy link
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Pablo for the PR, it certainly looks interesting and we would be interested in adding this.

The CI has broken for some result, therefore, I'll try to fix the build CI in the next few days before properly reviewing the PR.
In the meantime, could you add some testing for this?

@psc-g
Copy link
Contributor Author

psc-g commented Jul 15, 2024

Thanks Mark,
Would be happy to add tests for this. Could you let me know how you run the tests? When I try with pytest -q tests/python/test_atari_env.py I get a whole bunch of failures due to ROMs not found (even though I did pip install gymnasium[atari]).

@pseudo-rnd-thoughts
Copy link
Member

The ROMs are now packaged within the PyPI but not the repo(so AutoROM is unnecessary), you should just need to run bash scripts/download_unpack_roms.sh which will download the ROMs and put them in the correct folder.

@psc-g
Copy link
Contributor Author

psc-g commented Jul 15, 2024

Thanks Mark, that resolved that issue, but there's another one now, which seems to be unrelated to my changes (since I haven't added any tests yet and code defaults to discrete, not continuous, actions). All the tests raise this AsertionError:

AssertionError: The environment must specify an action space. https://gymnasium.farama.org/tutorials/gymnasium_basics/environment_creation/

Would it be possible for you to try running this on your end, to verify that they do still pass? That would confirm that it's an issue with my setup.

@psc-g
Copy link
Contributor Author

psc-g commented Jul 15, 2024

Figured it out. I mistakenly added underscores before action_space in the first commit. Working on new tests for continuous actions now.

@psc-g
Copy link
Contributor Author

psc-g commented Jul 15, 2024

Unit tests added! Let me know if you have any other comments on the PR.

@psc-g
Copy link
Contributor Author

psc-g commented Jul 23, 2024

hey mark, any update on the build CI issue?

@pseudo-rnd-thoughts
Copy link
Member

Thanks @psc-g for checking in, sadly not, in #535 I've fixed the building problem such that all the wheels are created. However, we added recently testing that the wheels actually work which shows they don't.
Unzipping the wheels to understand their content, for the windows and linux builds, they are all using the same version of python while the macos versions work. Very strange.
I've put a message on the CIBuildTools github page, pypa/cibuildwheel#1940, asking for support but haven't received any reply.
If you have any time to look and understand, that would be greatly appreciated.

@JesseFarebro Any ideas?

@pseudo-rnd-thoughts
Copy link
Member

Reading through the logs more closely, I spotted

-- pybind11 v2.13.1 
  CMake Warning (dev) at build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/tools/FindPythonLibsNew.cmake:101 (message):
    Policy CMP0148 is not set: The FindPythonInterp and FindPythonLibs modules
    are removed.  Run "cmake --help-policy CMP0148" for policy details.  Use
    the cmake_policy command to set the policy and suppress this warning, or
    preferably upgrade to using FindPython, either by calling it explicitly
    before pybind11, or by setting PYBIND11_FINDPYTHON ON before pybind11.
  Call Stack (most recent call first):
    vcpkg/scripts/buildsystems/vcpkg.cmake:859 (_find_package)
    build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/tools/pybind11Tools.cmake:50 (find_package)
    build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/tools/pybind11Common.cmake:202 (include)
    build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/CMakeLists.txt:248 (include)
  This warning is for project developers.  Use -Wno-dev to suppress it.
  
  CMake Warning (dev) at vcpkg/scripts/buildsystems/vcpkg.cmake:859 (_find_package):
    Policy CMP0148 is not set: The FindPythonInterp and FindPythonLibs modules
    are removed.  Run "cmake --help-policy CMP0148" for policy details.  Use
    the cmake_policy command to set the policy and suppress this warning.
  
  Call Stack (most recent call first):
    build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/tools/FindPythonLibsNew.cmake:114 (find_package)
    vcpkg/scripts/buildsystems/vcpkg.cmake:859 (_find_package)
    build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/tools/pybind11Tools.cmake:50 (find_package)
    build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/tools/pybind11Common.cmake:202 (include)
    build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/CMakeLists.txt:248 (include)
  This warning is for project developers.  Use -Wno-dev to suppress it.
  
  -- Found PythonInterp: C:/hostedtoolcache/windows/Python/3.9.13/x64/python3.exe (found suitable version "3.9.13", minimum required is "3.7")
  -- Found PythonLibs: C:/hostedtoolcache/windows/Python/3.9.13/x64/libs/python39.lib
  -- Found Python3: C:\Users\runneradmin\AppData\Local\Temp\build-env-5ocq8zbx\Scripts\python.exe (found version "3.8.10") found components: Interpreter
  -- Configuring done (37.4s)
  -- Generating done (0.2s)
  -- Build files have been written to: D:/a/Arcade-Learning-Environment/Arcade-Learning-Environment/build/temp.win-amd64-cpython-38/Release/ale_py._ale_py
  MSBuild version 17.10.4+10fbfbf2e for .NET Framework

This appears to be the underlying problem, I'll see what I can do

@pseudo-rnd-thoughts
Copy link
Member

@psc-g I've fixed the CI issues, thanks for poking me about it.
@jjshoots Has offered to review it in the next few days

I'm thinking about how we release this. I'm guessing you have a paper associated with this, how publicly do you wish for us to be about it. Our plan is to cut a release in the next few days, v0.9.1, that adds support for NumPy 2.0
We would then have a subsequence release when this PR is ready, v0.10.0 that can highlight this new feature and possibly a link to your paper for how it could be used

@psc-g
Copy link
Contributor Author

psc-g commented Jul 23, 2024

thanks mark, great to hear you were able to resolve the issues!

yes, we have a paper for this that is under review, but wanted to put it up on arxiv soon. before doing that, however, we wanted this to be merged so that we can point to the main repo instead of my fork. would be happy to share the draft with you privately if that would help!

in terms of coordination with releases, happy to do whatever you think works best. would be awesome if we can have this out before RLC (happening august 9-12) so that we can start advertising it there.

@pseudo-rnd-thoughts
Copy link
Member

yes, we have a paper for this that is under review, but wanted to put it up on arxiv soon. before doing that, however, we wanted this to be merged so that we can point to the main repo instead of my fork. would be happy to share the draft with you privately if that would help!

Cool, we can work with that. Once Jet finishes his review then we can cut a release, straight away

in terms of coordination with releases, happy to do whatever you think works best. would be awesome if we can have this out before RLC (happening august 9-12) so that we can start advertising it there.

I'll write a short summary of this PR using what you wrote above in the next day and include example code for people to utilise

Copy link
Member

@jjshoots jjshoots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an interesting PR to review, thanks for the contribution! I have just some minor comments that can possibly be ignored (except the action space one).

As I understand things correctly, CALE works by taking in a continuous action in the form of r, theta, fire, and then uses a thresholding parameter to convert that into a discrete action, in which everything else is similar to the normal Atari envs.
I question whether this needs to be a full feature within ALE at the C level, as opposed to simply using a Gymnasium wrapper that converts continuous action values into discrete action values in Python land. I see that ALE already provides the full action space to discrete agents that the continuous agent is allowed to have.
Having this in C land would mean that it is a baked-in feature of ALE and would probably run a little bit faster, but from a maintainability standpoint, ALE is already struggling from not-enough-developers-syndrome.

Regardless, the PR looks solid AFAICT, I'll leave the decision to include and merge up to @pseudo-rnd-thoughts.

src/environment/ale_state.cpp Outdated Show resolved Hide resolved
Comment on lines 271 to 274
action = tuple(action)
if len(action) != 3:
raise error.Error("Actions must have 3-dimensions.")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not needed, Gymnasium's passive env checker automatically checks for this and line 275 will error out anyway. @pseudo-rnd-thoughts could you confirm?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, let me know and happy to remove if unnecessary.

src/python/env.py Outdated Show resolved Hide resolved
src/python/env.py Outdated Show resolved Hide resolved
@psc-g
Copy link
Contributor Author

psc-g commented Jul 25, 2024

thanks for the review, @jjshoots ! i think i've addressed all your comments.

regarding your higher-level comment about having this just be a wrapper, i think it could be nice to keep it as is. the reason for this is that it will allow us to support paddles properly. currently paddles are using the same discrete actions and get mapped to a hardcoded value (PADDLE_DELTA). with continuous inputs we have the opportunity to have these be truly continuous controllers, which i think will be quite interesting.

for now i kept them as in the previous setup with hardcoded values to avoid being too disruptive. but having the c code available already will allow us to add it fairly easily in the future.

@jjshoots
Copy link
Member

Gotcha, that makes sense. I think one more confirmation by @pseudo-rnd-thoughts and should be good to go. Looks like very interesting things are on the horizon!

@psc-g psc-g requested a review from jjshoots July 26, 2024 07:08
@pseudo-rnd-thoughts
Copy link
Member

regarding your higher-level comment about having this just be a wrapper, i think it could be nice to keep it as is. the reason for this is that it will allow us to support paddles properly. currently paddles are using the same discrete actions and get mapped to a hardcoded value (PADDLE_DELTA). with continuous inputs we have the opportunity to have these be truly continuous controllers, which i think will be quite interesting.

This makes sense however the current implementation uses fixed limits in both ALEState::setActionJoysticksContinuous and ALEState::applyActionPaddlesContinuous. I would prefer if the c++ has true continuous behaviour with the python being continuous to discrete that can be updated. For users, it will be easier to see this is what is happening behind the scenes as viewing the source code of Python is easier than compiled c++.

r, theta, fire = action.tolist()
AttributeError: 'list' object has no attribute 'tolist'

Could you fix the testing please before we do anything

@psc-g
Copy link
Contributor Author

psc-g commented Jul 27, 2024

Hi @pseudo-rnd-thoughts , wondering if you're ok with the PR now? it seems there are still some CI issues, though.

@psc-g
Copy link
Contributor Author

psc-g commented Jul 28, 2024

Thanks @pseudo-rnd-thoughts , tests fixed, apologies for that!

i'm not sure i totally understood your last comment regarding the c++/python question. are you ok with things as is, or are you suggesting changes?

@jjshoots
Copy link
Member

jjshoots commented Jul 30, 2024

@psc-g Apologies for leaving you hanging, I think what @pseudo-rnd-thoughts means is that it would be better if the CPP implementation was a pure continuous environment outright, where the thresholding is done on the Python end. As it currently stands, the CPP implementation is a harder-to-digest (harder, not impossible) implementation of essentially a Gymnasium Env Action Wrapper.

Unfortunately I tend to agree with him. I think there is definitely value in having something like this within ALE, but would appreciate seeing it with full continuous action space. At the current state, it feels like a half baked implementation waiting for someone in the future to pick up the pieces and complete the implementation.

I think having this PR open is good for now until it is revisited/completed downstream.

Edit: I've been informed this was a bit blunt. I mean this with zero offence at all.

@jjshoots
Copy link
Member

jjshoots commented Jul 31, 2024

@psc-g After some internal discussion, we think this is the best way forward for this PR:

  1. We'll merge this PR into a separate branch within this repo.
  2. ALE itself will be released off the master branch with updated Numpy 2.0 support.
  3. We'll convert the CPP level implementation to pure continuous based on the work done here, and implement a discrete switch at the Python level.
  4. We make another release for continuous action space ALE.

This would allow for users to have a pure-continuous action space, a clipped discrete action space, and a pure-discrete action space, while maintaining only one variant of the underlying ALE API.
Unfortunately, I don't think this timeline fits within the next week for RLC (I'll try tho).
Hopefully this works out well for everyone.

@jjshoots jjshoots changed the base branch from master to CALE July 31, 2024 12:42
@jjshoots jjshoots merged commit b24830f into Farama-Foundation:CALE Jul 31, 2024
28 checks passed
@psc-g
Copy link
Contributor Author

psc-g commented Aug 3, 2024

Hey mark and jet, apologies for delayed response, I'm on vacation these two weeks 😄

Thank you for your help with this, I think your plan makes sense! Let me know if there's something I can do to help with the new plan, and no worries about RLC, I can point people to the CALE branch!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants