
Problems encountered while reproducing the DDPG algorithm from the EasyRL book #175

Closed · Answered by johnjim0816
yxz777 asked this question in 问题求助 (Help Wanted)

According to the DDPG pseudocode and the definitions in the code, the actor loss should be -Q, and the critic loss is the gap between the target network's estimate and the current network's estimate. During training, however, the actor loss keeps rising and the critic loss fluctuates without settling. That would mean the Q-value of the actions produced by the actor network is getting smaller and smaller, and the critic loss never reaches a stable value. So when judging whether the agent is learning well, should we look at the trend of the reward or at the convergence of the loss?
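For reference, a minimal PyTorch sketch of the two losses being described: the critic minimizes the MSE against the bootstrapped target Q, and the actor minimizes -Q. The network sizes, class names, and the fake mini-batch below are illustrative assumptions, not the EasyRL code itself.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh()  # actions in [-1, 1]
        )
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1)
        )
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

state_dim, action_dim, gamma = 3, 1, 0.99
actor, critic = Actor(state_dim, action_dim), Critic(state_dim, action_dim)
target_actor, target_critic = Actor(state_dim, action_dim), Critic(state_dim, action_dim)
target_actor.load_state_dict(actor.state_dict())
target_critic.load_state_dict(critic.state_dict())

# Fake mini-batch standing in for samples from the replay buffer.
s  = torch.randn(32, state_dim)
a  = torch.rand(32, action_dim) * 2 - 1
r  = torch.randn(32, 1)
s2 = torch.randn(32, state_dim)
done = torch.zeros(32, 1)

# Critic loss: MSE between the current Q and the bootstrapped target Q.
with torch.no_grad():
    target_q = r + gamma * (1 - done) * target_critic(s2, target_actor(s2))
critic_loss = nn.functional.mse_loss(critic(s, a), target_q)

# Actor loss: -Q of the actor's own action; minimizing it pushes Q up.
actor_loss = -critic(s, actor(s)).mean()
print(critic_loss.item(), actor_loss.item())
```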

Look at the reward. The loss behaves differently depending on the specific algorithm; evaluating with the action entropy is even better.
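A minimal sketch of tracking a smoothed episode-reward curve as the learning signal; the smoothing factor and function name are illustrative assumptions, not code from the book.

```python
def smooth(rewards, alpha=0.9):
    """Exponential moving average of per-episode rewards for plotting."""
    smoothed = []
    for r in rewards:
        prev = smoothed[-1] if smoothed else r
        smoothed.append(alpha * prev + (1 - alpha) * r)
    return smoothed

# e.g. smooth([10, 12, 30, 55, 80]) trends upward if the agent is improving
```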

Answer selected by Sm1les