-
Notifications
You must be signed in to change notification settings - Fork 431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for continuous actions in the ALE (CALE) #539
Conversation
…ry call to actContinuous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Pablo for the PR, it certainly looks interesting and we would be interested in adding this.
The CI has broken for some result, therefore, I'll try to fix the build CI in the next few days before properly reviewing the PR.
In the meantime, could you add some testing for this?
Thanks Mark, |
The ROMs are now packaged within the PyPI but not the repo(so AutoROM is unnecessary), you should just need to run |
Thanks Mark, that resolved that issue, but there's another one now, which seems to be unrelated to my changes (since I haven't added any tests yet and code defaults to discrete, not continuous, actions). All the tests raise this
Would it be possible for you to try running this on your end, to verify that they do still pass? That would confirm that it's an issue with my setup. |
Figured it out. I mistakenly added underscores before action_space in the first commit. Working on new tests for continuous actions now. |
Unit tests added! Let me know if you have any other comments on the PR. |
hey mark, any update on the build CI issue? |
Thanks @psc-g for checking in, sadly not, in #535 I've fixed the building problem such that all the wheels are created. However, we added recently testing that the wheels actually work which shows they don't. @JesseFarebro Any ideas? |
Reading through the logs more closely, I spotted -- pybind11 v2.13.1
CMake Warning (dev) at build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/tools/FindPythonLibsNew.cmake:101 (message):
Policy CMP0148 is not set: The FindPythonInterp and FindPythonLibs modules
are removed. Run "cmake --help-policy CMP0148" for policy details. Use
the cmake_policy command to set the policy and suppress this warning, or
preferably upgrade to using FindPython, either by calling it explicitly
before pybind11, or by setting PYBIND11_FINDPYTHON ON before pybind11.
Call Stack (most recent call first):
vcpkg/scripts/buildsystems/vcpkg.cmake:859 (_find_package)
build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/tools/pybind11Tools.cmake:50 (find_package)
build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/tools/pybind11Common.cmake:202 (include)
build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/CMakeLists.txt:248 (include)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at vcpkg/scripts/buildsystems/vcpkg.cmake:859 (_find_package):
Policy CMP0148 is not set: The FindPythonInterp and FindPythonLibs modules
are removed. Run "cmake --help-policy CMP0148" for policy details. Use
the cmake_policy command to set the policy and suppress this warning.
Call Stack (most recent call first):
build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/tools/FindPythonLibsNew.cmake:114 (find_package)
vcpkg/scripts/buildsystems/vcpkg.cmake:859 (_find_package)
build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/tools/pybind11Tools.cmake:50 (find_package)
build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/tools/pybind11Common.cmake:202 (include)
build/temp.win-amd64-cpython-38/Release/ale_py._ale_py/_deps/pybind11-src/CMakeLists.txt:248 (include)
This warning is for project developers. Use -Wno-dev to suppress it.
-- Found PythonInterp: C:/hostedtoolcache/windows/Python/3.9.13/x64/python3.exe (found suitable version "3.9.13", minimum required is "3.7")
-- Found PythonLibs: C:/hostedtoolcache/windows/Python/3.9.13/x64/libs/python39.lib
-- Found Python3: C:\Users\runneradmin\AppData\Local\Temp\build-env-5ocq8zbx\Scripts\python.exe (found version "3.8.10") found components: Interpreter
-- Configuring done (37.4s)
-- Generating done (0.2s)
-- Build files have been written to: D:/a/Arcade-Learning-Environment/Arcade-Learning-Environment/build/temp.win-amd64-cpython-38/Release/ale_py._ale_py
MSBuild version 17.10.4+10fbfbf2e for .NET Framework This appears to be the underlying problem, I'll see what I can do |
@psc-g I've fixed the CI issues, thanks for poking me about it. I'm thinking about how we release this. I'm guessing you have a paper associated with this, how publicly do you wish for us to be about it. Our plan is to cut a release in the next few days, v0.9.1, that adds support for NumPy 2.0 |
thanks mark, great to hear you were able to resolve the issues! yes, we have a paper for this that is under review, but wanted to put it up on arxiv soon. before doing that, however, we wanted this to be merged so that we can point to the main repo instead of my fork. would be happy to share the draft with you privately if that would help! in terms of coordination with releases, happy to do whatever you think works best. would be awesome if we can have this out before RLC (happening august 9-12) so that we can start advertising it there. |
Cool, we can work with that. Once Jet finishes his review then we can cut a release, straight away
I'll write a short summary of this PR using what you wrote above in the next day and include example code for people to utilise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was an interesting PR to review, thanks for the contribution! I have just some minor comments that can possibly be ignored (except the action space one).
As I understand things correctly, CALE works by taking in a continuous action in the form of r, theta, fire
, and then uses a thresholding parameter to convert that into a discrete action, in which everything else is similar to the normal Atari envs.
I question whether this needs to be a full feature within ALE at the C level, as opposed to simply using a Gymnasium wrapper that converts continuous action values into discrete action values in Python land. I see that ALE already provides the full action space to discrete agents that the continuous agent is allowed to have.
Having this in C land would mean that it is a baked-in feature of ALE and would probably run a little bit faster, but from a maintainability standpoint, ALE is already struggling from not-enough-developers-syndrome.
Regardless, the PR looks solid AFAICT, I'll leave the decision to include and merge up to @pseudo-rnd-thoughts.
src/python/env.py
Outdated
action = tuple(action) | ||
if len(action) != 3: | ||
raise error.Error("Actions must have 3-dimensions.") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not needed, Gymnasium's passive env checker automatically checks for this and line 275 will error out anyway. @pseudo-rnd-thoughts could you confirm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, let me know and happy to remove if unnecessary.
thanks for the review, @jjshoots ! i think i've addressed all your comments. regarding your higher-level comment about having this just be a wrapper, i think it could be nice to keep it as is. the reason for this is that it will allow us to support paddles properly. currently paddles are using the same discrete actions and get mapped to a hardcoded value ( for now i kept them as in the previous setup with hardcoded values to avoid being too disruptive. but having the c code available already will allow us to add it fairly easily in the future. |
Gotcha, that makes sense. I think one more confirmation by @pseudo-rnd-thoughts and should be good to go. Looks like very interesting things are on the horizon! |
This makes sense however the current implementation uses fixed limits in both
Could you fix the testing please before we do anything |
Hi @pseudo-rnd-thoughts , wondering if you're ok with the PR now? it seems there are still some CI issues, though. |
Thanks @pseudo-rnd-thoughts , tests fixed, apologies for that! i'm not sure i totally understood your last comment regarding the c++/python question. are you ok with things as is, or are you suggesting changes? |
@psc-g Apologies for leaving you hanging, I think what @pseudo-rnd-thoughts means is that it would be better if the CPP implementation was a pure continuous environment outright, where the thresholding is done on the Python end. As it currently stands, the CPP implementation is a harder-to-digest (harder, not impossible) implementation of essentially a Gymnasium Env Action Wrapper. Unfortunately I tend to agree with him. I think there is definitely value in having something like this within ALE, but would appreciate seeing it with full continuous action space. At the current state, it feels like a half baked implementation waiting for someone in the future to pick up the pieces and complete the implementation. I think having this PR open is good for now until it is revisited/completed downstream. Edit: I've been informed this was a bit blunt. I mean this with zero offence at all. |
@psc-g After some internal discussion, we think this is the best way forward for this PR:
This would allow for users to have a pure-continuous action space, a clipped discrete action space, and a pure-discrete action space, while maintaining only one variant of the underlying ALE API. |
Hey mark and jet, apologies for delayed response, I'm on vacation these two weeks 😄 Thank you for your help with this, I think your plan makes sense! Let me know if there's something I can do to help with the new plan, and no worries about RLC, I can point people to the CALE branch! |
This pull request enables the use of continuous actions in the ALE. It uses the same Python interface as the standard discrete actions, but now also accepts a numpy array of actions.
The idea is to simulate a real Atari joystick with a continuous unit circle, representing all the possible joystick movements. The unit circle is parameterized with polar coordinates, one dimension representing radius
r
(in [0, 1]), and another dimension representingtheta
(in [-π, π]). The fire button is a separate dimension (in [0, 1]). The action space is thus 3-dimensional.The original Atari controller, despite having an analogous joystick, still only triggered one of 9 discrete events triggered by the joystick position. This is simulated with a threshold parameter
continuous_action_threshold
. For example: if thex
position of the joystick is less thancontinuous_action_threshold
then theLEFT
event is triggered, and if they
position of the joystick is larger thancontinuous_action_threshold
, theUP
event is triggered (note these two events can trigger simultaneously).The intent of this pull request is to enable the benchmarking of continuous controllers on the ALE, which would open the opportunity for more direct, and fair, comparisons between discrete- and continuous-control RL agents.