Are you requesting a feature or an implementation?

To handle partially observable MDP tasks, recurrent policies are currently quite popular. We need to add an LSTM layer after the original conv (or MLP) policy, and store the hidden states for training. But in SLM-Lab, the RecurrentNet class has limited abilities: it is more like a concatenation of a series of input states, and the hidden states of the RNN are not stored, which seriously weakens the recurrent policy.

For example, I used it with the default parameters to solve the CartPole task, and it failed. Even when I changed the max_frame parameter of the env from 500 to 50000, the RecurrentNet still couldn't solve it.

If you have any suggested solutions

I'm afraid of introducing more bugs, so I'm sorry I can't add this feature myself. But I can point to two reference implementations:

OpenAI baselines
pytorch-a2c-ppo-acktr-gail

With this feature, I believe SLM-Lab will be the top RL library in PyTorch.

Thanks in advance!
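(To make the request concrete, here is a minimal sketch of the pattern being asked for: an LSTM head whose forward pass accepts and returns its hidden state, so the caller can carry it across environment steps and store it for training. The class and names below are illustrative, not SLM-Lab's actual API.)

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Illustrative recurrent policy: MLP body -> LSTM -> action logits.
    The forward pass threads the LSTM hidden state explicitly."""

    def __init__(self, state_dim, hidden_dim, action_dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU())
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, action_dim)

    def forward(self, state, hidden=None):
        # state: (batch, seq_len, state_dim); hidden: (h, c) tuple or None (zeros)
        x = self.body(state)
        x, hidden = self.lstm(x, hidden)
        return self.head(x), hidden  # return hidden for the next call

# Acting loop: the hidden state persists across single-step forward passes
policy = RecurrentPolicy(state_dim=4, hidden_dim=64, action_dim=2)
hidden = None
state = torch.zeros(1, 1, 4)  # (batch=1, seq_len=1, state_dim=4)
for _ in range(5):
    logits, hidden = policy(state, hidden)
    action = torch.distributions.Categorical(logits=logits.squeeze(1)).sample()
```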
Hi @yangysc, thanks for testing the RNN. The shared network from the spec ppo_rnn_shared_cartpole works slightly better because there are fewer hyperparameters to tune, and it yields slightly better results.
We have not thoroughly tested RNNs yet, but your observation is correct: the RecurrentNet class is limited in that sense. The hidden state is discarded rather than reused as input in the next forward pass. We can implement this by storing the hidden state alongside the state in agent Memory, and retrieving it during memory.sample().
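A rough sketch of that idea (hypothetical names, not the actual SLM-Lab Memory API), where each stored transition carries the hidden state that was active when it was collected:

```python
class RecurrentMemory:
    """Illustrative memory that stores the LSTM hidden state
    alongside each transition (not SLM-Lab's Memory class)."""

    def __init__(self):
        self.transitions = []

    def add(self, state, action, reward, next_state, done, hidden):
        h, c = hidden
        # Detach so stored hidden states do not keep the autograd graph alive
        self.transitions.append(
            (state, action, reward, next_state, done, (h.detach(), c.detach()))
        )

    def sample(self):
        # Return all stored transitions, each paired with its hidden state,
        # so training can re-seed the LSTM at the start of a sampled sequence
        batch = tuple(zip(*self.transitions))
        self.transitions.clear()
        return batch
```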
This will take some time to implement, and we're currently busy with benchmarking tasks. I'm marking this issue as a feature request so we can get to it as soon as we have time.
In the meantime, you could try increasing the sequence length (seq_len) in the net component of the spec file. This will persist the hidden state for more steps.
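For reference, seq_len sits in the net section of the spec file; an illustrative excerpt (other net keys elided, and the exact schema depends on the SLM-Lab version):

```json
"net": {
  "type": "RecurrentNet",
  "seq_len": 8
}
```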