Colab Notebook #1

Open

araffin opened this issue May 18, 2019 · 4 comments

araffin commented May 18, 2019

Hello,

I set up a colab notebook, so you can train your agents online on flappy envs ;) : https://colab.research.google.com/drive/13mJ1bU2tKVurG9chNhM0U7ivgVKlzPu7

Also, I have some questions about the training:

  • How many timesteps did you use for the hover and maneuver envs?
  • What was the final performance?

It seems that your maneuver env does not follow the gym interface: the reward must be a float, but it is currently a numpy array (I had to use a reward wrapper to work around the error).
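For reference, a minimal sketch of such a wrapper (the class name is mine, and it assumes the reward comes back as a 0-d or single-element numpy array):

```python
import numpy as np
import gym


class FloatRewardWrapper(gym.RewardWrapper):
    """Cast a reward returned as a numpy array to a plain Python float."""

    def reward(self, reward):
        # .item() handles both 0-d and single-element arrays
        return float(np.asarray(reward).item())
```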

I would also normalize the reward using the opposite of the cost instead of the inverse (otherwise the reward magnitude is really huge), and maybe add a "life bonus" (+1 for each timestep) for the hover env, see here for an example ;)
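As a rough sketch of what I mean, assuming the env currently returns reward = 1 / cost (adapt it if the cost is exposed differently):

```python
import gym


class NegativeCostReward(gym.RewardWrapper):
    """Illustrative reshaping: reward = -cost + 1 "life bonus" per timestep,
    instead of reward = 1 / cost."""

    def reward(self, reward):
        cost = 1.0 / float(reward)  # recover the cost from the original 1/cost reward
        return -cost + 1.0          # opposite of the cost, plus the life bonus
```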

@ffnc1020 (Collaborator)

Hi,
Thank you for your interest, and sorry about the delayed reply. The notebook is a great idea! You can make a pull request and add it to the markdown, or however you see fit.

The hovering control of the flapping-wing robot is still an open problem, so I just have a feedback controller for the demo, which was already not easy to achieve. The system is extremely unstable, so it is very difficult to control.

The maneuver env is trained for 5 million steps, using default hyperparameters with a reward scaling of 0.05. Yes, the inverse creates a huge reward at the target position and pose, which helps attract the robot to the target and improves convergence.
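Roughly, the training setup looks like this (a sketch assuming the stable-baselines PPO2 from the notebook stands in for the actual algorithm; the env id below is a placeholder and the scaling wrapper is illustrative):

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv


class ScaleReward(gym.RewardWrapper):
    """Scale the raw env reward by a constant factor."""

    def __init__(self, env, scale=0.05):
        super().__init__(env)
        self.scale = scale

    def reward(self, reward):
        return self.scale * float(reward)


def make_env():
    # Placeholder id -- use the maneuver env id actually registered by the repo
    return ScaleReward(gym.make("fwmav_maneuver-v0"), scale=0.05)


env = DummyVecEnv([make_env])
model = PPO2("MlpPolicy", env, verbose=1)  # default hyperparameters
model.learn(total_timesteps=5_000_000)     # 5 million steps
```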

I'll fix the reward so it's a float instead of an array.

I'll post the training performance and some demo clips in the next update.

juanmed commented Jun 6, 2019

@araffin Hi, thanks for setting up the colab notebook. As I explained in #2, pydart2 is deprecated for the latest dartsim version, v6.9. To run successfully, one needs to install dartsim<=6.8.2 from source, as indicated in my last comment in #2. Could you update the notebook to reflect this change so that it runs successfully? Thank you.

araffin (Author) commented Jun 6, 2019

> Could you update the notebook to reflect this change so that it runs successfully?

Well, you can copy and update the notebook yourself (and post the link here afterward ;) ). I don't have the time to do that now.

@SaltedfishLZX

It seems that the reward type error still exists: the type is np.ndarray most of the time. BTW, the maneuver env seems to train the model to correct the output of the ARC controller. I'm wondering whether there is a successful example of training without a feedback controller, or is such control just too difficult to achieve?
