LLM alignment@360,
prev.@miHoYo & 4Paradigm.
PhD@THU, advised by Prof. Jun Zhu.
Pinned
-
reversi-alpha-zero (Public, forked from mokemokechicken/reversi-alpha-zero)
Reversi reinforcement learning by AlphaGo Zero methods.
Python
-
tianshou (Public, forked from thu-ml/tianshou)
An elegant PyTorch deep reinforcement learning platform.
Python
-
schroederdewitt/multiagent_mujoco (Public)
Benchmark for Continuous Multi-Agent Robotic Control, based on OpenAI's Mujoco Gym environments.