Multi_car_racing and Domain Randomization Progress #18

Open · wants to merge 5 commits into base: nmmo

Conversation

@RPegoud commented Feb 29, 2024

Here's the current state of my work on multi_car_racing and Domain Randomization:

Installation:

  • Running the script still requires Docker for now. I also copied the multi_car_racing repository into this repo for convenience (for some reason, installing and importing it as usual didn't work); this will be cleaned up once we're ready to move on to later stages of the project

  • I upgraded PettingZoo to 1.23 and SuperSuit to 3.7.2, since PettingZoo < 1.23 had a typo that broke an import (BaseParallelWraper was later renamed to BaseParallelWrapper)

  • There was a circular import in curriculum_sync_wrapper.py, fixed by importing decorate_all_functions from .utils directly:

# Before (caused the circular import):
from syllabus.core import Curriculum, decorate_all_functions

# After (fixed):
from syllabus.core import Curriculum
from .utils import decorate_all_functions

Script:

  • The task wrapper for multi_car_racing seems to work as expected

  • The curriculum setup runs without errors:

env = MultiCarRacingParallelWrapper(env=env, n_agents=n_agents)
curriculum = DomainRandomization(env.task_space)
curriculum, task_queue, update_queue = make_multiprocessing_curriculum(curriculum)
  • However, I'm still unsure how to update the DR curriculum compared to PLR:
# TODO: adapt to DR (this is the PLR-style "on_demand" update taken from the pistonball_plr script)
if global_cycles % num_steps == 0:
    update = {
        "update_type": "on_demand",
        "metrics": {
            "action_log_dist": logprobs,
            "value": values,
            "next_value": (
                agent.get_value(next_obs)
                if step == num_steps - 1
                else None
            ),
            "rew": rb_rewards[step],
            "masks": torch.Tensor(1 - np.array(list(dones.values()))),
            "tasks": [env.unwrapped.task],
        },
    }
    curriculum.update_curriculum(update)

I'm still solving bugs and progressing through the script. For now, it seems that the end_step variable prevents the loop containing the backward pass from running: end_step is equal to 0, so b_obs = torch.flatten(rb_obs[:end_step], start_dim=0, end_dim=1) is empty and for start in range(0, len(b_obs), batch_size) never iterates.
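
A minimal, self-contained illustration of that failure mode (the buffer shapes below are hypothetical, just to show why the minibatch loop never iterates):

import torch

num_steps, n_agents, obs_dim = 128, 2, 4   # hypothetical rollout buffer shape
batch_size = 32
rb_obs = torch.zeros((num_steps, n_agents, obs_dim))

end_step = 0  # never advanced before the learn phase
b_obs = torch.flatten(rb_obs[:end_step], start_dim=0, end_dim=1)
print(len(b_obs))  # 0
for start in range(0, len(b_obs), batch_size):
    pass  # range(0, 0, batch_size) is empty, so the backward pass never runs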

Could this be due to the fact that the pistonball_plr script is unfinished? (In hindsight, I should've chosen a training loop that wasn't in the experimental folder.)

Review comment on the curriculum setup code:

""" CURRICULUM SETUP """
env = MultiCarRacingParallelWrapper(env=env, n_agents=n_agents)

Owner:
You'll need to wrap the environment in a PettingZooMultiProcessingSyncWrapper and then you should be done setting up Syllabus.
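
A minimal sketch of what that might look like, continuing the snippet above (the import paths and the sync wrapper's keyword arguments here are assumptions on my part, not the confirmed Syllabus API):

# Sketch only: check the Syllabus source for the exact module paths and signature.
from syllabus.core import make_multiprocessing_curriculum, PettingZooMultiProcessingSyncWrapper
from syllabus.curricula import DomainRandomization

env = MultiCarRacingParallelWrapper(env=env, n_agents=n_agents)
curriculum = DomainRandomization(env.task_space)
curriculum, task_queue, update_queue = make_multiprocessing_curriculum(curriculum)

# The sync wrapper is what connects the worker env to the curriculum process:
# it pulls new tasks from task_queue and reports progress through update_queue.
env = PettingZooMultiProcessingSyncWrapper(
    env,
    task_queue,
    update_queue,
    task_space=env.task_space,
)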

"tasks": [env.unwrapped.task],
},
}
curriculum.update_curriculum(update)

Owner:
You don't need any of this curriculum code for Domain Randomization. You do need it for PLR, but I'm working on a new version of PLR that will cut this out, hopefully done in a few days.
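
To make the point concrete: domain randomization samples tasks uniformly and never consumes rollout metrics, so there is nothing for the training loop to push back into the curriculum. A toy illustration of the idea (not the Syllabus implementation):

import random

class ToyDomainRandomization:
    """Uniform task sampling: no per-task scores, nothing to update."""

    def __init__(self, tasks):
        self.tasks = list(tasks)

    def sample(self):
        # Every task is equally likely, regardless of the agent's performance.
        return random.choice(self.tasks)

curriculum = ToyDomainRandomization(range(10))
print([curriculum.sample() for _ in range(3)])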
