Colab Notebook #1

Open

araffin opened this issue May 18, 2019 · 4 comments

araffin commented May 18, 2019

Hello,

I set up a colab notebook, so you can train your agents online on flappy envs ;) : https://colab.research.google.com/drive/13mJ1bU2tKVurG9chNhM0U7ivgVKlzPu7

Also, I have some questions about the training:

  • How many timesteps did you use for the hover and maneuver envs?
  • What was the final performance?

It seems that your maneuver env does not follow the gym interface: the reward must be a float, but it is currently a numpy array (I had to use a reward wrapper to work around the error).
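For reference, a minimal sketch of such a wrapper (the class name is mine, and it assumes the reward comes back as a 0-d or single-element numpy array):

```python
import numpy as np
import gym


class FloatRewardWrapper(gym.RewardWrapper):
    """Cast a reward returned as a numpy array to a plain Python float."""

    def reward(self, reward):
        # .item() handles both 0-d and single-element arrays
        return float(np.asarray(reward).item())
```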

I would also normalize the reward using the opposite of the cost instead of the inverse (otherwise the reward magnitude is really huge), and maybe add a "life bonus" (+1 for each timestep) for the hover env, see here for an example ;)
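As a rough sketch of what I mean, assuming the env currently returns reward = 1 / cost (adapt it if the cost is exposed differently):

```python
import gym


class NegativeCostReward(gym.RewardWrapper):
    """Illustrative reshaping: reward = -cost + 1 "life bonus" per timestep,
    instead of reward = 1 / cost."""

    def reward(self, reward):
        cost = 1.0 / float(reward)  # recover the cost from the original 1/cost reward
        return -cost + 1.0          # opposite of the cost, plus the life bonus
```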

@ffnc1020 (Collaborator)

Hi,
Thank you for your interest, and sorry about the delayed reply. The notebook is a great idea! You can make a pull request and add it to the markdown, or however you see fit.

The hovering control of the flapping-wing robot is still an open problem, so I just have a feedback controller for the demo, which was already not easy to achieve. The system is extremely unstable, so it is very difficult to control.

The maneuver env is trained for 5 million steps, using default hyperparameters with a reward scaling of 0.05. Yes, the inverse creates a huge reward at the target position and pose, which helps attract the robot to the target and improves convergence.
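Roughly, the training setup looks like this (a sketch assuming the stable-baselines PPO2 from the notebook stands in for the actual algorithm; the env id below is a placeholder and the scaling wrapper is illustrative):

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv


class ScaleReward(gym.RewardWrapper):
    """Scale the raw env reward by a constant factor."""

    def __init__(self, env, scale=0.05):
        super().__init__(env)
        self.scale = scale

    def reward(self, reward):
        return self.scale * float(reward)


def make_env():
    # Placeholder id -- use the maneuver env id actually registered by the repo
    return ScaleReward(gym.make("fwmav_maneuver-v0"), scale=0.05)


env = DummyVecEnv([make_env])
model = PPO2("MlpPolicy", env, verbose=1)  # default hyperparameters
model.learn(total_timesteps=5_000_000)     # 5 million steps
```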

I'll fix the reward so it's a float instead of an array.

I'll post the training performance and some demo clips in the next update.

juanmed commented Jun 6, 2019

@araffin Hi, thanks for setting up the colab notebook. As I explained in #2, pydart2 is deprecated for the latest dartsim version, v6.9. To run successfully, one needs to install dartsim<=6.8.2 from source, as indicated in my last comment in #2. Could you update the notebook to reflect this change so that it runs successfully? Thank you.

araffin (Author) commented Jun 6, 2019

> Could you update the notebook to reflect this change so that it runs successfully?

Well, you can copy and update the notebook yourself (and post the link here afterward ;) ). I don't have the time to do that now.

@SaltedfishLZX

It seems that the reward type error still exists: the type is np.ndarray most of the time. BTW, the maneuver env seems to train the model to correct the output of the ARC controller. I'm wondering whether there is a successful example of training without a feedback controller, or is such control just too difficult to achieve?
