Problem encountered when simulating the DDPG algorithm from the easyrl book #175
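According to the DDPG pseudocode and the definitions in the code, the actor loss should be -Q, and the critic loss is the difference between the target network's value estimate and the current network's value estimate.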
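To make the two loss definitions concrete, here is a minimal sketch in plain Python that operates on Q-values the networks have already produced. The function names, the `gamma` default and the list-based batch layout are illustrative assumptions, not the easyrl code itself. The critic target follows the usual DDPG form y = r + γ(1 − done)·Q'(s', μ'(s')), and the actor loss is the negated mean of Q(s, μ(s)).

```python
# Minimal sketch of the two DDPG loss terms on a batch, using plain lists.
# Assumption: all Q-values were already computed by the (hypothetical) networks.

def critic_loss(rewards, dones, q_current, q_next_target, gamma=0.99):
    """Mean squared TD error against the target-network estimate.

    q_current     -> Q(s, a) from the current critic
    q_next_target -> Q'(s', mu'(s')) from the target critic and target actor
    """
    total = 0.0
    for r, d, q, q_next in zip(rewards, dones, q_current, q_next_target):
        y = r + gamma * (1.0 - d) * q_next  # TD target; no bootstrap on terminal steps
        total += (y - q) ** 2
    return total / len(rewards)


def actor_loss(q_of_actor_actions):
    """Negative mean of Q(s, mu(s)): minimizing it pushes the actor toward higher Q."""
    return -sum(q_of_actor_actions) / len(q_of_actor_actions)


if __name__ == "__main__":
    print(critic_loss([1.0, 0.0], [0.0, 1.0], [1.5, 0.2], [1.0, 5.0], gamma=0.5))
    print(actor_loss([2.0, 4.0]))
```

Because the actor loss is just -Q, its value moves with the critic's estimates rather than with actual policy quality, so its curve can drift in either direction while the policy improves.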
Answered by johnjim0816 on Nov 13, 2023
Replies: 1 comment
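Look at the reward instead. The loss behaves differently depending on the specific algorithm, so it is not a reliable progress indicator; evaluating with action entropy works better.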
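As a rough sketch of the two suggested metrics, the snippet below smooths per-episode rewards with a trailing moving average and computes the Shannon entropy of a discrete action distribution. The function names and the `window` parameter are hypothetical. Note that DDPG's actor is deterministic, so action entropy is only directly defined for a stochastic policy; applying it to DDPG would require an extra assumption such as binning sampled actions into a histogram, which this sketch leaves out.

```python
import math


def moving_average(values, window=10):
    """Trailing mean over the last `window` episode rewards (shorter at the start)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def action_entropy(probs):
    """Shannon entropy in nats of a discrete action distribution.

    Zero-probability actions contribute nothing (p * log p -> 0 as p -> 0).
    """
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    print(moving_average([1.0, 2.0, 3.0, 4.0], window=2))
    print(action_entropy([0.5, 0.5]))
```

A rising smoothed reward is the direct signal that the policy is improving, while the entropy indicates how exploratory (high) or deterministic (low) the action distribution has become.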
Answer selected by Sm1les