Potential bug during training? #11

liubaoryol · 2023-02-03T04:00:57Z

Is there a reason you calculate the reward the way you do in line 69?

gail-airl-ppo.pytorch/gail_airl_ppo/algo/airl.py

Line 69 in 4e13a23

rewards = self.disc.calculate_reward(

My models were able to learn after I changed that line to

        with torch.no_grad():
            rewards = self.disc.g(states)

This gives the unshaped rewards

The text was updated successfully, but these errors were encountered:

Charlesyyun · 2023-04-27T06:22:27Z

Did that work out for you? I found my actor loss unable to converge.

liubaoryol · 2023-04-27T10:02:20Z

Yes it did, although I was running it on discrete state and action environments. Which env are you using?

mikhail-vlasenko · 2023-06-03T13:14:36Z

@liubaoryol it is great to hear that you got it working with discrete action space! could you please share your code? i think it will be valuable, as multiple people here already asked about discrete action support. Thanks in advance

liubaoryol · 2023-06-03T14:32:31Z

Of course! Let me clean it up and I'll share it next week:)

jagandecapri · 2023-07-22T16:54:47Z

I'm interested to know about the implementation for discrete action support too. :)

ChenYunan · 2024-08-13T16:00:49Z

reward = -logsigmoid(-logits) = -log[1 - sigmoid(logits)] = -log(1 - D), which corresponds the objective of G is minimize log(1-D).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential bug during training? #11

Potential bug during training? #11

liubaoryol commented Feb 3, 2023

Charlesyyun commented Apr 27, 2023

liubaoryol commented Apr 27, 2023

mikhail-vlasenko commented Jun 3, 2023

liubaoryol commented Jun 3, 2023

jagandecapri commented Jul 22, 2023

ChenYunan commented Aug 13, 2024

Potential bug during training? #11

Potential bug during training? #11

Comments

liubaoryol commented Feb 3, 2023

Charlesyyun commented Apr 27, 2023

liubaoryol commented Apr 27, 2023

mikhail-vlasenko commented Jun 3, 2023

liubaoryol commented Jun 3, 2023

jagandecapri commented Jul 22, 2023

ChenYunan commented Aug 13, 2024