
Potential bug during training? #11

Open
liubaoryol opened this issue Feb 3, 2023 · 6 comments

Comments

@liubaoryol

Is there a reason you calculate the reward the way you do on line 69?

rewards = self.disc.calculate_reward(

My models were able to learn after I changed that line to

        with torch.no_grad():
            rewards = self.disc.g(states)

This gives the unshaped rewards.
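For context, here is a minimal, self-contained sketch of the two options being compared. It assumes an AIRL-style discriminator whose logits are f(s, s') - log_pi, with f(s, s') = g(s) + gamma * (1 - done) * h(s') - h(s); the class and argument names below are illustrative stand-ins, not the repo's exact code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyAIRLDisc(nn.Module):
        """Illustrative stand-in for an AIRL-style discriminator."""

        def __init__(self, state_dim, gamma=0.99, hidden=64):
            super().__init__()
            self.gamma = gamma
            # g approximates the (unshaped) reward, h the shaping potential.
            self.g = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))
            self.h = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))

        def f(self, states, dones, next_states):
            return (self.g(states)
                    + (1 - dones) * self.gamma * self.h(next_states)
                    - self.h(states))

        def calculate_reward(self, states, dones, log_pis, next_states):
            # Shaped reward: logits = f - log_pi, reward = -log(1 - sigmoid(logits)).
            logits = self.f(states, dones, next_states) - log_pis
            return -F.logsigmoid(-logits)

    disc = ToyAIRLDisc(state_dim=4)
    states = torch.randn(8, 4)
    # The change suggested above: take only the unshaped reward g(s).
    with torch.no_grad():
        unshaped_rewards = disc.g(states)  # shape (8, 1)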

@Charlesyyun

Did that work out for you? I found that my actor loss was unable to converge.

@liubaoryol
Author

Yes it did, although I was running it on discrete state and action environments. Which env are you using?

@mikhail-vlasenko

@liubaoryol It's great to hear that you got it working with a discrete action space! Could you please share your code? I think it would be valuable, as multiple people here have already asked about discrete action support. Thanks in advance.

@liubaoryol
Author

Of course! Let me clean it up and I'll share it next week :)

@jagandecapri

I'm also interested in the implementation for discrete action support. :)

@ChenYunan

reward = -logsigmoid(-logits) = -log[1 - sigmoid(logits)] = -log(1 - D), which corresponds to the generator's objective of minimizing log(1 - D).
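For anyone double-checking the algebra, the identity is easy to verify numerically (a small sketch in PyTorch):

    import torch
    import torch.nn.functional as F

    logits = torch.randn(5)
    D = torch.sigmoid(logits)
    lhs = -F.logsigmoid(-logits)   # reward as computed from the logits
    rhs = -torch.log(1 - D)        # -log(1 - D)
    print(torch.allclose(lhs, rhs, atol=1e-6))  # True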
