Testing works, but training fails with an error #2

aijunzhao · 2023-05-20T03:12:44Z

(SnakeAI) E:\snake-ai-master\main>python train_cnn.py
Using cuda device
Wrapping the env in a VecTransposeImage.
Process SpawnProcess-5:
Traceback (most recent call last):
File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\multiprocessing\process.py", line 315, in _bootstrap
self.run()
File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\multiprocessing\process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\site-packages\stable_baselines3\common\vec_env\subproc_vec_env.py", line 30, in _worker
observation, reward, done, info = env.step(data)
File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\site-packages\stable_baselines3\common\monitor.py", line 95, in step
observation, reward, done, info = self.env.step(action)
File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\site-packages\gym\core.py", line 289, in step
return self.env.step(action)
File "E:\snake-ai-master\main\snake_game_custom_wrapper_cnn.py", line 47, in step
self.done, info = self.game.step(action) # info = {"snake_size": int, "snake_head_pos": np.array, "prev_snake_head_pos": np.array, "food_pos": np.array, "food_obtained": bool}
File "E:\snake-ai-master\main\snake_game.py", line 96, in step
self.sound_game_over.play()
AttributeError: 'SnakeGame' object has no attribute 'sound_game_over'
Traceback (most recent call last):
File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\multiprocessing\connection.py", line 312, in _recv_bytes
nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] The pipe has been ended.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train_cnn.py", line 95, in <module>
main()
File "train_cnn.py", line 82, in main
model.learn(
File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\site-packages\sb3_contrib\ppo_mask\ppo_mask.py", line 525, in learn
continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, self.n_steps, use_masking)
File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\site-packages\sb3_contrib\ppo_mask\ppo_mask.py", line 305, in collect_rollouts
new_obs, rewards, dones, infos = env.step(actions)
File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\site-packages\stable_baselines3\common\vec_env\base_vec_env.py", line 163, in step
return self.step_wait()
File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\site-packages\stable_baselines3\common\vec_env\vec_transpose.py", line 95, in step_wait
observations, rewards, dones, infos = self.venv.step_wait()
File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\site-packages\stable_baselines3\common\vec_env\subproc_vec_env.py", line 121, in step_wait
results = [remote.recv() for remote in self.remotes]
File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\site-packages\stable_baselines3\common\vec_env\subproc_vec_env.py", line 121, in <listcomp>
results = [remote.recv() for remote in self.remotes]
File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\multiprocessing\connection.py", line 250, in recv
buf = self._recv_bytes()
File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\multiprocessing\connection.py", line 321, in _recv_bytes
raise EOFError
EOFError

Han-duoduo · 2023-05-20T04:59:22Z

+1, looking for a solution too.

Chapoii · 2023-05-20T05:11:18Z

Add a check for silent_mode before playing the sound. Training does not need any sound playback.

BeiYining · 2023-05-20T16:56:24Z

The exact location is snake_game.py, around line 95.
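
The guard suggested above could look like the following minimal sketch. The class is a stripped-down stand-in, not the project's actual SnakeGame. The names silent_mode and sound_game_over come from the thread and the traceback; `_FakeSound`, `SnakeGameSketch`, and `on_game_over` are hypothetical and exist only so the example runs without pygame.

```python
class _FakeSound:
    """Hypothetical stand-in for a pygame sound object."""

    def __init__(self):
        self.play_count = 0

    def play(self):
        self.play_count += 1


class SnakeGameSketch:
    """Illustrative only: sounds are loaded only when the game is not silent."""

    def __init__(self, silent_mode=True):
        self.silent_mode = silent_mode
        if not silent_mode:
            # Assumption: the sound attribute exists only outside silent mode,
            # which would explain the AttributeError seen during training.
            self.sound_game_over = _FakeSound()

    def on_game_over(self):
        # Guard the call so that silent (training) runs never touch the sound.
        if not self.silent_mode:
            self.sound_game_over.play()
        return True
```

In the real snake_game.py the same `if not self.silent_mode:` check would presumably wrap the existing `self.sound_game_over.play()` call.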

shironghe · 2023-05-21T08:20:09Z

sound_game_over has no effect on training the model. You can comment out self.sound_game_over.play() and put pass in its place, then restore it when you run the test scripts.

aijunzhao · 2023-05-21T15:33:48Z

That worked. Thanks, man!

1816705 · 2023-05-26T05:09:19Z

Why am I stuck here?

shironghe · 2023-05-26T05:20:48Z

The MLP training script prints no progress on the command line. On Windows, press Ctrl+Shift+Esc to check CPU and GPU usage. On Ubuntu, use htop to check CPU usage and watch -n 1 nvidia-smi to check GPU usage. If the program shows high utilization, it is training, not stuck.
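
Besides watching CPU and GPU usage, another way to confirm progress is to read the end of the training log. This sketch assumes your script redirects stdout to a file, as the train_mlp-based script posted later in this thread does (trained_models_mlp/training_log.txt); `tail_lines` is a hypothetical helper, not part of the project.

```python
from collections import deque
from pathlib import Path


def tail_lines(path, n=20):
    """Return the last n lines of a text file, or [] if it does not exist yet."""
    p = Path(path)
    if not p.exists():
        return []
    with p.open("r", encoding="utf-8", errors="replace") as f:
        # deque with maxlen keeps only the final n lines while streaming the file.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


if __name__ == "__main__":
    # Path taken from the script in this thread; adjust it if yours differs.
    for line in tail_lines("trained_models_mlp/training_log.txt"):
        print(line)
```

Because file output is buffered, the log may lag somewhat behind the actual training state.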

zjhcwjb · 2023-05-30T18:00:59Z

How can I visualize the training process?

shironghe · 2023-05-31T03:09:28Z

After training, note that a new folder is created under logs. The files in it can be opened with TensorBoard for visualization.
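
As a small sketch of how those logs might be opened, the helper below only builds the TensorBoard command line. The directory name logs matches LOG_DIR in the training script; the function name `tensorboard_command` and the port value are assumptions for illustration.

```python
def tensorboard_command(log_dir="logs", port=6006):
    """Build the command line that serves the training logs with TensorBoard."""
    # --logdir and --port are standard TensorBoard command-line flags.
    return ["tensorboard", "--logdir", str(log_dir), "--port", str(port)]
```

Passing the result to `subprocess.run(...)` from the project's main folder, or typing the equivalent command in a terminal, should start a local server you can open in a browser.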

zjhcwjb · 2023-05-31T05:33:32Z

Thanks for the reply. But what should I do if I want to watch the actual game screen for every game the AI plays?

shironghe · 2023-05-31T05:40:31Z

You can borrow from the test_mlp code and use the env.render() function to render the game screen.
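
A rendering loop of that kind might be sketched as follows. It assumes the older Gym API visible in the traceback above, where step() returns (obs, reward, done, info). `play_episode` and `CountdownEnv` are hypothetical names; the toy environment only stands in for the real Snake environment so the example can run on its own.

```python
import time


def play_episode(env, choose_action, frame_delay=0.0, max_steps=10_000):
    """Run one episode and call env.render() after reset and after every step."""
    obs = env.reset()
    env.render()
    total_reward, steps, done = 0.0, 0, False
    while not done and steps < max_steps:
        action = choose_action(obs)
        obs, reward, done, _info = env.step(action)
        total_reward += reward
        steps += 1
        env.render()
        time.sleep(frame_delay)  # slow playback down so a human can follow it
    return total_reward, steps


class CountdownEnv:
    """Hypothetical toy environment that ends after a fixed number of steps."""

    def __init__(self, length=3):
        self.length = length
        self.remaining = length
        self.frames_rendered = 0

    def reset(self):
        self.remaining = self.length
        return self.remaining

    def step(self, action):
        self.remaining -= 1
        return self.remaining, 1.0, self.remaining <= 0, {}

    def render(self):
        self.frames_rendered += 1
```

With a trained model, `choose_action` could wrap `model.predict(obs, action_masks=...)`. The real environment would presumably have to be created in non-silent mode for a window to appear, which is an assumption about the wrapper's constructor.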

zjhcwjb · 2023-05-31T06:34:03Z

I've added the rendering code. When I run it, GPU usage is high, so it should be training, but I still can't see any game screen.

shironghe · 2023-05-31T06:38:34Z

If you can, please share your code.

zjhcwjb · 2023-05-31T06:41:14Z


import os
import sys
import random
import time

from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import SubprocVecEnv
from stable_baselines3.common.callbacks import CheckpointCallback
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker

from snake_game_custom_wrapper_mlp import SnakeEnv

NUM_ENV = 32
LOG_DIR = "logs"
os.makedirs(LOG_DIR, exist_ok=True)

# Linear scheduler
def linear_schedule(initial_value, final_value=0.0):

    if isinstance(initial_value, str):
        initial_value = float(initial_value)
        final_value = float(final_value)
        assert (initial_value > 0.0)

    def scheduler(progress):
        return final_value + progress * (initial_value - final_value)

    return scheduler

def make_env(seed=0):
    def _init():
        env = SnakeEnv(seed=seed)
        env = ActionMasker(env, SnakeEnv.get_action_mask)
        env = Monitor(env)
        env.seed(seed)
        return env
    return _init

def main():

    # Generate a list of random seeds for each environment.
    seed_set = set()
    while len(seed_set) < NUM_ENV:
        seed_set.add(random.randint(0, int(1e9)))

    # Create the Snake environment.
    env = SubprocVecEnv([make_env(seed=s) for s in seed_set])

    lr_schedule = linear_schedule(2.5e-4, 2.5e-6)
    clip_range_schedule = linear_schedule(0.15, 0.025)

    # Instantiate a PPO agent
    model = MaskablePPO(
        "MlpPolicy",
        env,
        device="cuda",
        verbose=1,
        n_steps=2048,
        batch_size=512,
        n_epochs=4,
        gamma=0.94,
        learning_rate=lr_schedule,
        clip_range=clip_range_schedule,
        tensorboard_log=LOG_DIR
    )

    # Set the save directory
    save_dir = "trained_models_mlp"
    os.makedirs(save_dir, exist_ok=True)

    checkpoint_interval = 15625 # checkpoint_interval * num_envs = total_steps_per_checkpoint
    checkpoint_callback = CheckpointCallback(save_freq=checkpoint_interval, save_path=save_dir, name_prefix="ppo_snake")

    # Writing the training logs from stdout to a file
    original_stdout = sys.stdout
    log_file_path = os.path.join(save_dir, "training_log.txt")
    with open(log_file_path, 'w') as log_file:
        sys.stdout = log_file

        model.learn(
            total_timesteps=int(100000000),
            callback=[checkpoint_callback]
        )
        env.close()

    # Restore stdout
    sys.stdout = original_stdout

    # Save the final model
    model.save(os.path.join(save_dir, "ppo_snake_final.zip"))

    demo_env = make_env()()

    with open(log_file_path, 'w') as log_file:
        sys.stdout = log_file

        for i in range(100):
            model.learn(
                total_timesteps=int(1000000),
                callback=[checkpoint_callback]
            )

            obs = demo_env.reset()
            demo_env.render()
            time.sleep(0.5)
            done = False
            while not done:
                action, _ = model.predict(obs)
                obs, _, done, _ = demo_env.step(action)
                demo_env.render()
                time.sleep(0.5)

if __name__ == "__main__":
    main()
Yes, I just added the rendering part on top of train_mlp. There are no error messages at all, but the game screen never shows up.

shironghe · 2023-05-31T06:44:47Z

This formatting is really hard for me to read. Could you send me another copy with the formatting preserved?

zjhcwjb · 2023-05-31T07:04:03Z

Hello, is this okay? I've only just started with reinforcement learning, so my questions may be rather silly. Sorry to bother you.

shironghe · 2023-05-31T07:13:26Z

Sorry, man. I was reading the code in my mailbox just now, and the mail client dropped the code formatting. It looks fine on GitHub. I took another careful look at the train_mlp code and found that the whole training process runs inside MaskablePPO. I could not find a parameter in its API for calling env.render, so I think the training gameplay cannot be displayed. But you can picture what it looks like: the Snake game repeated n times, with the snake continually eating food for rewards and continually dying for penalties.
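
One observation about the script posted earlier in this thread: its demo loop comes after a model.learn call with total_timesteps=100000000, so the loop is not reached until that call returns, and env.close() runs before the later learn calls. A possible alternative, offered here only as a sketch, is to train in short chunks and play one rendered episode between chunks. The function shows the control flow only; `train_with_demos`, `learn_chunk`, and `show_episode` are hypothetical names you would wire up yourself.

```python
def train_with_demos(learn_chunk, show_episode, rounds, steps_per_round):
    """Alternate between a training chunk and one rendered demo episode.

    learn_chunk(n):  trains for n timesteps (hypothetical callable).
    show_episode():  plays and renders one episode and returns its score.
    """
    scores = []
    for _ in range(rounds):
        learn_chunk(steps_per_round)
        scores.append(show_episode())
    return scores


# With Stable-Baselines3 this might be wired up as follows (untested assumption):
#   learn_chunk = lambda n: model.learn(total_timesteps=n, reset_num_timesteps=False)
#   show_episode = a function that steps a separate, non-silent env and calls render()
```

The vectorized training environment would need to stay open across rounds, so env.close() would move to the very end.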

zjhcwjb · 2023-05-31T07:20:51Z

Right. I'm new to reinforcement learning, but I'd still like to display the game to confirm that it really is training. train_cnn does call env.render, but for some reason nothing shows up there either. I asked ChatGPT, and it couldn't tell me how to fix it.
