Multi_car_racing and Domain Randomization Progress #18
Here's the current state of my work on `multi_car_racing` and Domain Randomization:

Installation:
- Running the script still requires Docker for now. I also copied the `multi_car_racing` repository for convenience (for some reason, installing and importing it as usual didn't work). This will be cleaned up later, when we are ready to move on to later stages of the project.
- I upgraded PettingZoo to 1.23 and SuperSuit to 3.7.2, since PettingZoo < 1.23 had a typo that prevented an import (`BaseParallelWraper`, renamed to `BaseParallelWrapper`).
- There was a circular import in `curriculum_sync_wrapper.py`.

Script:
- The task wrapper for `multi_car_racing` seems to work as expected.
- The curriculum setup runs without any error, using the `ppo_continuous_action` architecture along with the `cleanrl_pettingzoo_pistonball_plr` training script (with minor adjustments).
- I'm still fixing bugs and working through the script. For now, it seems that the `end_step` variable prevents the loop containing the backward pass from running: `end_step` is equal to 0, so `b_obs = torch.flatten(rb_obs[:end_step], start_dim=0, end_dim=1)` is empty and `for start in range(0, len(b_obs), batch_size)` never iterates.

Could this be because the `pistonball_plr` script is unfinished? (In hindsight, I should have chosen a training loop that wasn't in the `experimental` folder.)
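
To make the failure mode concrete, here is a minimal pure-Python sketch of the shape arithmetic behind the empty loop. It assumes the rollout buffer `rb_obs` is laid out as `(steps, num_agents, ...)`, so that flattening `rb_obs[:end_step]` over its first two dimensions yields `end_step * num_agents` samples. The helper names (`num_flat_samples`, `minibatch_starts`, `check_rollout`) and the numbers used are hypothetical and not part of the training script; only `end_step`, `b_obs` and `batch_size` come from it.

```python
from typing import List


def num_flat_samples(end_step: int, num_agents: int) -> int:
    """Assumed len(b_obs) after flattening rb_obs[:end_step] over (step, agent)."""
    return end_step * num_agents


def minibatch_starts(num_samples: int, batch_size: int) -> List[int]:
    """Indices visited by `for start in range(0, len(b_obs), batch_size)`."""
    return list(range(0, num_samples, batch_size))


def check_rollout(end_step: int, num_agents: int, batch_size: int) -> List[int]:
    """Hypothetical guard: fail loudly instead of silently skipping the update."""
    n = num_flat_samples(end_step, num_agents)
    if n == 0:
        raise ValueError(
            "end_step is 0, so b_obs is empty and no minibatch would be processed"
        )
    return minibatch_starts(n, batch_size)


if __name__ == "__main__":
    # With end_step == 0 the slice is empty and the minibatch loop has no iterations.
    print(minibatch_starts(num_flat_samples(0, 2), 32))
    # With a non-zero end_step the loop visits one start index per minibatch.
    print(check_rollout(100, 2, 64))
```

A guard like this placed just before the update would turn the silent no-op into an explicit error. One possible cause worth checking, assuming the rollout loop only assigns `end_step` when an agent terminates or truncates, is that the episode never ends within the step budget and `end_step` keeps its initial value.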