
sb3 #38

Open: AdrianHuang2002 wants to merge 32 commits into main

Conversation

AdrianHuang2002 (Author):

No description provided.

RyanNavillus (Owner) left a comment:

Left some comments. Please make the requested changes to simplify the PR a bit, and let me know if you have any questions about the callbacks.

RyanNavillus (Owner), on four separate files:

Remove any changes to this file

)
env = openai_gym.make(f"procgen-{env_id}-v0", distribution_mode="easy", start_level=start_level, num_levels=num_levels)
env = GymV21CompatibilityV0(env=env)
components = MultiProcessingComponents(task_queue=task_queue, update_queue=update_queue)
RyanNavillus (Owner):

Suggested change:
-   components = MultiProcessingComponents(task_queue=task_queue, update_queue=update_queue)
+   components = curriculum.get_components()

RyanNavillus (Owner):

Remove this file: git rm --cached syllabus/examples/training_scripts/wandb/run-20240423_020001-cymykoqj/files/events.out.tfevents.1713852002.WenranLaoGong.157219.0

RyanNavillus (Owner):

Remove this file: git rm --cached profiling_results.prof

model.policy.train()
return mean_returns, stddev_returns, normalized_mean_returns


class CustomCallback(BaseCallback):
RyanNavillus (Owner):

Look at the documentation here to change the behavior of the callback https://stable-baselines3.readthedocs.io/en/v1.0/guide/callbacks.html
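
For reference, here is a minimal sketch of the SB3 v1.0 callback hooks being discussed; this is not the PR's actual code, and the eval_fn and SummaryWriter arguments are illustrative assumptions:

from stable_baselines3.common.callbacks import BaseCallback


class ExampleEvalCallback(BaseCallback):
    """Sketch of the callback hooks SB3 exposes, for the discussion below."""

    def __init__(self, eval_fn, writer, verbose: int = 0):
        super().__init__(verbose)
        self.eval_fn = eval_fn  # hypothetical: returns a mean episode return for a model
        self.writer = writer    # hypothetical: a SummaryWriter, mirroring the training script

    def _on_step(self) -> bool:
        # Called after every environment step; return False to stop training early.
        return True

    def _on_rollout_end(self) -> None:
        # Called once per rollout (n_steps * n_envs timesteps), a natural place to evaluate.
        mean_return = self.eval_fn(self.model)
        self.writer.add_scalar("test_eval/mean_episode_return", mean_return, self.num_timesteps)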

Comment on lines 238 to 239
mean_eval_returns, _, _ = level_replay_evaluate_sb3(args.env_id, model, args.num_eval_episodes, num_levels=0)
writer.add_scalar("test_eval/mean_episode_return", mean_eval_returns, self.global_step)
RyanNavillus (Owner):

This code should only be run once every update. There is a different callback method, _on_training_end, that you should probably use.

If you need access to any data from training, try printing out self.locals or self.globals from within the callback method to see what is available.
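
For example, a throwaway way to inspect what SB3 exposes (a sketch, to be dropped into the callback temporarily):

def _on_rollout_end(self) -> None:
    # Print the data SB3 makes available to the callback at this point in training.
    print(sorted(self.locals.keys()))
    print(sorted(self.globals.keys()))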

RyanNavillus (Owner) left a comment:

I checked the hyperparameters you included, but I didn't review any that you excluded. I'll revisit that later.

"""
return True

def _on_rollout_end(self) -> None:
RyanNavillus (Owner):

I'm not sure, but I think you can put this function in the CustomCallback rather than creating 2 separate ones.

return True

def _on_rollout_end(self) -> None:
    if self.num_timesteps % self.eval_freq == 0:
RyanNavillus (Owner):

This isn't necessary; we should just evaluate every time this function is called. It should happen every 16,000 steps, but try running it and make sure that's what happens.
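
A sketch of that simplification, assuming level_replay_evaluate_sb3, args, and writer are in scope as elsewhere in this script:

def _on_rollout_end(self) -> None:
    # _on_rollout_end already fires once per update, so no eval_freq check is needed.
    mean_eval_returns, _, _ = level_replay_evaluate_sb3(
        args.env_id, self.model, args.num_eval_episodes, num_levels=0
    )
    # self.num_timesteps is the step counter SB3 maintains on the callback.
    writer.add_scalar("test_eval/mean_episode_return", mean_eval_returns, self.num_timesteps)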

Comment on lines 199 to 202
def wrap_vecenv(vecenv):
    vecenv.is_vector_env = True
    vecenv = VecMonitor(venv=vecenv, filename=None)
    vecenv = VecNormalize(venv=vecenv, norm_obs=False, norm_reward=True)
    vecenv = VecNormalize(venv=vecenv, norm_obs=False, norm_reward=True, training=False)
    return vecenv
RyanNavillus (Owner):

This function is used for both eval and training, so we should probably pass training as an argument to wrap_vecenv, so that training=True for the training envs, right?
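
A minimal sketch of that suggestion, using the same wrappers; the train_env/eval_env names are just for illustration:

def wrap_vecenv(vecenv, training=True):
    vecenv.is_vector_env = True
    vecenv = VecMonitor(venv=vecenv, filename=None)
    # Only update reward-normalization statistics on the training envs.
    vecenv = VecNormalize(venv=vecenv, norm_obs=False, norm_reward=True, training=training)
    return vecenv

train_env = wrap_vecenv(train_env, training=True)
eval_env = wrap_vecenv(eval_env, training=False)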

RyanNavillus (Owner), on three separate files:

Remove these changes

n_epochs=3,
clip_range_vf=0.2,
ent_coef=0.01,
batch_size=256 * 64,
RyanNavillus (Owner):

Suggested change:
-   batch_size=256 * 64,
+   batch_size=2048,

batch_size is actually the minibatch size. We want 8 batches for 256*64 steps, so 2048 steps per minibatch.
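
To make the arithmetic concrete, a hedged configuration sketch (n_steps=256 and 64 envs are inferred from the 256*64 figure above, not confirmed elsewhere):

# 256 steps per env * 64 envs = 16384 transitions collected per update.
# With 8 minibatches, that is 16384 / 8 = 2048 transitions per minibatch.
model = PPO(
    "CnnPolicy",
    env,
    n_steps=256,
    batch_size=2048,   # SB3's batch_size is the minibatch size, not the rollout size
    n_epochs=3,
    ent_coef=0.01,
    clip_range_vf=0.2,
)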


print("Creating model")
model = PPO(
    "CnnPolicy",
RyanNavillus (Owner):

You're going to need to find a way to replace this with the ProcgenAgent model: https://stable-baselines3.readthedocs.io/en/v1.0/guide/custom_policy.html

Take a look at the advanced example: https://stable-baselines3.readthedocs.io/en/v1.0/guide/custom_policy.html#advanced-example

I think if you replace the CustomNetwork with our Policy (the parent class of ProcgenAgent) then the code they have here might just work out of the box.

class Policy(nn.Module):

RyanNavillus (Owner) left a comment:

I'm not sure if this works, but you should try it. It would be a lot simpler this way.

from typing import Callable, Dict, List, Optional, Type, Union

import gym
import torch.nn as nn
from stable_baselines3.common.policies import ActorCriticPolicy

from syllabus.examples.models.procgen_model import Policy

class CustomActorCriticPolicy(ActorCriticPolicy):
    def __init__(
        self,
        observation_space: gym.spaces.Space,
        action_space: gym.spaces.Space,
        lr_schedule: Callable[[float], float],
        net_arch: Optional[List[Union[int, Dict[str, List[int]]]]] = None,
        activation_fn: Type[nn.Module] = nn.Tanh,
        *args,
        **kwargs,
    ):

        super(CustomActorCriticPolicy, self).__init__(
            observation_space,
            action_space,
            lr_schedule,
            net_arch,
            activation_fn,
            # Pass remaining arguments to base class
            *args,
            **kwargs,
        )
        # Disable orthogonal initialization
        self.ortho_init = False

    def _build_mlp_extractor(self) -> None:
        self.mlp_extractor = Policy(...)

This is a small change to the example in the documentation here: https://stable-baselines3.readthedocs.io/en/v1.0/guide/custom_policy.html#advanced-example
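
If that works, usage would presumably look like this (a sketch; env stands in for whatever vectorized Procgen env the script builds):

# Pass the custom policy class to PPO in place of the "CnnPolicy" string.
model = PPO(CustomActorCriticPolicy, env, n_steps=256, batch_size=2048, verbose=1)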

return value, action_log_probs, dist_entropy


class Sb3ProcgenAgent(CustomPolicy):
RyanNavillus (Owner):

I think SB3's model will exclusively call forward, so this class isn't necessary.

Comment on lines 49 to 59
def get_value(self, input):
    value, _, _ = self.network(input)
    return value

def evaluate_actions(self, input, rnn_hxs, masks, action):
    value, actor_features = self.network(input, rnn_hxs, masks)
    dist = self.dist(actor_features)

    action_log_probs = dist.log_prob(action)
    dist_entropy = dist.entropy().mean()
    return value, action_log_probs, dist_entropy
RyanNavillus (Owner):

See the comment above; I'm not sure you need to add these methods.
