Lasertag self play #26
base: main
Conversation
Looks great! I left a couple of comments and suggestions; I'll message you on Discord about the FSP/PFSP API.
lasertag_ppo.py
Outdated
self.task = None
self.episode_return = 0
self.task_space = TaskSpace(spaces.MultiDiscrete(np.array([[2], [5]])))
self.possible_agents = np.arange(self.n_agents)
Ideally this should be a list of agent names
- self.possible_agents = np.arange(self.n_agents)
+ self.possible_agents = [f"agent_{i}" for i in range(self.n_agents)]
lasertag_ppo.py
Outdated
out = {}
for idx, i in enumerate(array):
    out[str(idx)] = i
return out
Could these variable names be more descriptive? Also, I assume `str(idx)` should instead be `self.possible_agents[idx]` if you change that to strings.
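For reference, a minimal sketch of what that suggestion might look like, assuming this helper turns a stacked per-agent array into a dict and that `possible_agents` is a list of agent name strings (the function name `unbatchify` here is just illustrative):

```python
def unbatchify(self, array):
    """Convert a stacked per-agent array into a dict keyed by agent name."""
    # assumes self.possible_agents is a list of names, e.g. ["agent_0", "agent_1"]
    return {agent_name: value
            for agent_name, value in zip(self.possible_agents, array)}
```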
lasertag_ppo.py
Outdated
""" | ||
Broadcasts the `done` and `trunc` flags to dictionaries keyed by agent id. | ||
""" | ||
return {str(idx): value for idx in range(self.n_agents)} |
Same thing here; maybe just renaming `idx` to `agent_idx` is enough.
lasertag_ppo.py
Outdated
action = batchify(action, device)
obs, rew, done, info = self.env.step(action)
obs = obs["image"]
trunc = 0 # there is no `truncated` flag in this environment
- trunc = 0 # there is no `truncated` flag in this environment
+ trunc = False # there is no `truncated` flag in this environment
lasertag_ppo.py
Outdated
probs = Categorical(logits=logits)
if action is None:
    action = probs.sample()
return action, probs.log_prob(action), probs.entropy(), self.critic(hidden)
Didn't review the agent code super closely but it looks good
lasertag_ppo.py
Outdated
# convert to torch
obs = torch.tensor(obs).to(device)

return obs
Is there any reason not to just use `batchify`? They do the exact same thing, right?
That's right, the original PettingZoo code had an added `obs = obs.transpose(0, -1, 1, 2)` in `batchify_obs`, which I removed. So there's no need for this function anymore.
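For context, `batchify` in the CleanRL-style PettingZoo scripts is roughly the following (a sketch; the version in this PR may differ slightly):

```python
import numpy as np
import torch

def batchify(x, device):
    """Stack a dict of per-agent values into a single torch tensor."""
    # build an (n_agents, ...) array from the per-agent entries
    x = np.stack([x[agent] for agent in x], axis=0)
    return torch.tensor(x).to(device)
```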
lasertag_ppo.py
Outdated
x = x.cpu().numpy()
x = {a: x[i] for i, a in enumerate(env.possible_agents)}

return x
I assume this duplicated code isn't intentional
lasertag_ppo.py
Outdated
print(f"Approx KL: {approx_kl.item()}") | ||
print(f"Clip Fraction: {np.mean(clip_fracs)}") | ||
print(f"Explained Variance: {explained_var.item()}") | ||
print("\n-------------------------------------------\n") |
You could consider adding Weights & Biases (wandb) integration (it's only about 20 lines of code; check out cleanrl_procgen_plr.py). It's a good tool that generates plots for you automatically, and most RL tools integrate with it. You'll need to learn it eventually for this project, but if you're happy with how you're doing things you can try wandb out later.
Sounds like a good improvement. I just used whatever logging the CleanRL script used, but wandb should provide a clearer overview of training progress.
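A minimal sketch of the kind of wandb integration being suggested, assuming CleanRL-style variable names (`args`, `run_name`, `global_step`, and the project name are placeholders for whatever the script actually uses):

```python
import numpy as np
import wandb

wandb.init(
    project="syllabus-lasertag",  # assumed project name
    name=run_name,                # e.g. f"{env_id}__{seed}__{int(time.time())}"
    config=vars(args),            # log hyperparameters alongside metrics
)

# inside the training loop, next to the existing print statements
wandb.log(
    {
        "losses/approx_kl": approx_kl.item(),
        "losses/clip_fraction": np.mean(clip_fracs),
        "losses/explained_variance": explained_var.item(),
    },
    step=global_step,
)
```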
Could you put the agent curricula into a file `syllabus/curricula/selfplay.py`? Also please move the `DualCurriculumWrapper` to `syllabus/core/dual_curriculum_wrapper.py`.
Thanks for moving everything into Syllabus, it looks great! Could you also add some test cases to the multi-agent smoke tests (maybe using PettingZoo's chess environment) to test the self-play algorithms? We just want something quick that runs through the code to make sure there are no basic runtime errors.
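A rough sketch of what such a smoke test could look like, using PettingZoo's chess environment with random actions; the curriculum calls at the end (`sample`, `update_agent`) are placeholders for whatever the self-play API ends up being:

```python
import numpy as np
from pettingzoo.classic import chess_v6

def run_chess_smoke_test(curriculum, agent_checkpoint, episodes=2, seed=42):
    """Play a few random chess games and exercise the curriculum, just to
    surface basic runtime errors (not a correctness test)."""
    env = chess_v6.env()
    for episode in range(episodes):
        env.reset(seed=seed + episode)
        for agent in env.agent_iter():
            obs, reward, termination, truncation, info = env.last()
            if termination or truncation:
                action = None
            else:
                # pick a random legal move from the action mask
                legal_actions = np.flatnonzero(obs["action_mask"])
                action = int(np.random.choice(legal_actions))
            env.step(action)
        # placeholder calls into the self-play curriculum (names illustrative):
        # sample an opponent, then store the newest checkpoint
        curriculum.sample()
        curriculum.update_agent(agent_checkpoint)
    env.close()
```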
self.agent_mp_curriculum, self.agent_task_queue, self.agent_update_queue = (
    make_multiprocessing_curriculum(agent_curriculum)
)
self.sample() # initializes env_task and agent_task
This initializer seems hacky. I don't think we should be internally making multiprocessing curricula. Instead, let the user create two curricula, pass them to this, and then in their code wrap the dual curriculum in a multiprocessing wrapper. This wrapper should implement update functions that pass updates to both the agent and environment curricula. For example, the `update_on_step` function should call `self.env_curriculum.update_on_step` and `self.agent_curriculum.update_on_step`.
I also don't see why we need to call `sample` here.
The initializer might also need to update the task space, so the task space of this curriculum should be `Tuple(env_curriculum.task_space, agent_curriculum.task_space)`.
I'll try to merge Nistha's task space updates in today so that you can pull them in here.
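Putting both comments together, a rough sketch of the proposed shape. The `sample`/`update_on_step` signatures, the `gym_space` attribute, and the import paths are assumptions based on how Syllabus is used elsewhere in the PR, not the final API:

```python
from gymnasium import spaces
from syllabus.task_space import TaskSpace  # import path assumed

class DualCurriculumWrapper:
    """Pairs an environment curriculum with an agent (self-play) curriculum.
    The user constructs both curricula, wraps them with this class, and then
    wraps the result in the multiprocessing curriculum wrapper themselves."""

    def __init__(self, env_curriculum, agent_curriculum):
        self.env_curriculum = env_curriculum
        self.agent_curriculum = agent_curriculum
        # combined task space, as suggested above (attribute names assumed)
        self.task_space = TaskSpace(
            spaces.Tuple((
                env_curriculum.task_space.gym_space,
                agent_curriculum.task_space.gym_space,
            ))
        )

    def sample(self, k=1):
        # pair each environment task with an opponent
        env_tasks = self.env_curriculum.sample(k=k)
        agent_tasks = self.agent_curriculum.sample(k=k)
        return list(zip(env_tasks, agent_tasks))

    def update_on_step(self, *args, **kwargs):
        # forward step updates to both the env and agent curricula
        self.env_curriculum.update_on_step(*args, **kwargs)
        self.agent_curriculum.update_on_step(*args, **kwargs)
```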
syllabus/curricula/selfplay.py
Outdated
its priority.
"""
if self.n_stored_agents < self.max_agents:
    # TODO: define the expected behaviour when the limit is exceeded
We probably should delete or overwrite agents when this happens.
Then I suggest this:
joblib.dump(
agent,
filename=f"{self.storage_path}/{self.name}_agent_checkpoint_{self.current_agent_index % self.max_agents}.pkl",
)
self.current_agent_index += 1
If we go this route, you should save the most recent agent to a file somewhere, since it won't be obvious from the filenames. It might be better to just delete a file and write a new one, but this is fine for now.
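A small sketch of that bookkeeping, building on the snippet above (the "latest" file name is purely illustrative):

```python
import joblib

# overwrite the oldest checkpoint slot, as suggested above
slot = self.current_agent_index % self.max_agents
joblib.dump(
    agent,
    filename=f"{self.storage_path}/{self.name}_agent_checkpoint_{slot}.pkl",
)

# also record which index/slot is the most recent, since the slot number
# alone no longer makes that obvious (file name illustrative)
joblib.dump(
    {"latest_index": self.current_agent_index, "slot": slot},
    filename=f"{self.storage_path}/{self.name}_latest_agent.pkl",
)
self.current_agent_index += 1
```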
…wrapper API changes (still ongoing)
…lasertag_self_play
…ort, added Dict task space to dual curriculum
…pt to automate slurm jobs
…ments, refactoring
…lasertag_self_play
New features: