Need some advice for GAIL implementation #40
-
By the way, I want to know the differences between …
-
I would like to recommend the following:
If it's OK with you and you can make the changes (particularly points 1 and 2), it would be great if you could share the code so I can try to help with it. Regarding the distinction between …
-
Hi @Toni-SM, thank you for your reply! After I changed the discriminator's … I would like to learn from your experience tuning parameters for different tasks. They play an important role in PPO, GAIL and AMP, especially … Thank you so much, you have helped me a lot!
-
Good to hear it works. Yes, some algorithms require hyperparameter fine-tuning to work well. I hope to do it when I have some free time :)
-
Hi @Toni-SM
I'm trying to use the skrl library to implement GAIL for the `FactoryTaskNutBoltPick` task in Isaac Gym, but it performs badly, so I need some advice.

First, my idea is to use `PPO` as the generator for `GAIL`, so I define a `GAIL` class that inherits from the `PPO` class. I define the `discriminator` model in `GAIL` and set the related configurations. `memory_exp` holds the expert demonstrations, and its size is 300000. The policy and value models are the same as in `PPO`, and the discriminator model is defined as follows:

So, `_update()` in GAIL includes two update steps. The first is `_update_disc`, which updates the discriminator; its inputs are states and actions sampled from the policy and from the demonstrations. After `_update_disc`, I compute `disc_reward` from the discriminator and run the PPO update `super()._update(timestep, timesteps)` with `disc_reward` (instead of the policy's original rewards). The process of `super()._update(timestep, timesteps)` is basically the same as `_update()` in PPO: `disc_reward` is used to calculate the GAE, then batches are sampled and the policy, value and entropy losses are computed (same as PPO).

I'm not sure whether my GAIL implementation is correct, but when I used it to train `FactoryTaskNutBoltPick` it performed badly. Maybe some related parameters are wrong? I need your advice.
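For reference, here is a minimal sketch of the two discriminator quantities described above: the loss optimized in a step like `_update_disc`, and a `disc_reward` that replaces the environment reward. It is plain Python on scalar logits so the arithmetic is easy to check; it is not skrl's API or the poster's code, and the helper names (`softplus`, `bce_with_logits`, `discriminator_loss`, `disc_reward`) are hypothetical. It assumes the discriminator outputs a logit, that expert pairs are labelled 1 and policy pairs 0, and uses the reward r = -log(1 - D(s, a)), which is one of several common GAIL reward choices (log D(s, a) and log D - log(1 - D) are others). If the labels are flipped, the reward must be flipped with them; otherwise the policy is rewarded for looking unlike the expert.

```python
import math
from typing import List


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bce_with_logits(logit: float, label: float) -> float:
    """Binary cross-entropy of sigmoid(logit) against a 0/1 label."""
    # -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))] simplifies to softplus(x) - y*x
    return softplus(logit) - label * logit


def discriminator_loss(expert_logits: List[float], policy_logits: List[float]) -> float:
    """Mean BCE loss; ASSUMED convention: expert pairs -> 1, policy pairs -> 0."""
    expert_term = sum(bce_with_logits(x, 1.0) for x in expert_logits) / len(expert_logits)
    policy_term = sum(bce_with_logits(x, 0.0) for x in policy_logits) / len(policy_logits)
    return expert_term + policy_term


def disc_reward(policy_logits: List[float]) -> List[float]:
    """Hypothetical GAIL reward r = -log(1 - D(s, a)) with D = sigmoid(logit).

    -log(1 - sigmoid(x)) equals softplus(x), which avoids evaluating log(0)
    when the discriminator is very confident.
    """
    return [softplus(x) for x in policy_logits]


# An uninformative discriminator (all logits 0) gives a loss of 2*log(2)
print(round(discriminator_loss([0.0], [0.0]), 4))  # 1.3863
```

In PyTorch the same quantities map onto `torch.nn.functional.binary_cross_entropy_with_logits` for the loss and `torch.nn.functional.softplus` applied to the policy logits for the reward.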
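Because `disc_reward` simply replaces the environment reward before the PPO update, the advantage computation itself does not change. The sketch below shows standard Generalized Advantage Estimation over one environment's rollout in plain Python. It assumes per-step lists of rewards, value estimates and done flags, plus a bootstrap value for the state after the last step; the function and argument names are illustrative, not skrl's internals.

```python
from typing import List


def compute_gae(
    rewards: List[float],
    values: List[float],
    dones: List[bool],
    last_value: float,
    discount_factor: float = 0.99,
    lambda_: float = 0.95,
) -> List[float]:
    """Generalized Advantage Estimation over a single rollout.

    In the GAIL setup, `rewards` would hold the discriminator rewards
    instead of the environment rewards; the rest is the usual PPO computation.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # TD error: r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
        delta = rewards[t] + discount_factor * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors, reset at episode ends
        gae = delta + discount_factor * lambda_ * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


print(compute_gae([1.0, 1.0], [0.0, 0.0], [False, True],
                  last_value=5.0, discount_factor=0.5, lambda_=1.0))  # [1.5, 1.0]
```

The value-function targets (returns) are then `advantages[t] + values[t]`, exactly as in plain PPO.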