Gymnasium api #1

korneelf1 · 2024-08-14T16:31:25Z

This pull request adds a file that extends the simulator with:

Reward calculation
Episode management
Reset management
Gymnasium integration

Offers a step function that operates as expected: it performs a single step for all N_drones drones.
When using it with conventional RL frameworks, stick to N_drones = 1.
When using N_drones > 1, apply a post-processor that adds each drone's transitions to the buffer (the details depend on how your framework operates).
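A minimal sketch of such a post-processor, assuming the batched step returns arrays whose first axis indexes the drones. The function name `split_transitions`, the dictionary keys, and the array shapes are hypothetical and not taken from this PR; adapt them to whatever your framework's buffer expects.

```python
import numpy as np

def split_transitions(obs, actions, rewards, next_obs, dones):
    """Turn one batched step (leading axis = drone) into per-drone transitions."""
    n_drones = obs.shape[0]
    return [
        {
            "obs": obs[i],
            "action": actions[i],
            "reward": float(rewards[i]),
            "next_obs": next_obs[i],
            "done": bool(dones[i]),
        }
        for i in range(n_drones)
    ]

# Example with 3 drones, a 4-dim observation and a 2-dim action (made-up sizes).
obs = np.zeros((3, 4))
actions = np.ones((3, 2))
rewards = np.array([0.1, 0.2, 0.3])
next_obs = np.ones((3, 4))
dones = np.array([False, True, False])

buffer = []  # stand-in for a framework-specific replay buffer
buffer.extend(split_transitions(obs, actions, rewards, next_obs, dones))
```

Each drone contributes its own transition, so a single-environment learner sees the batch as N_drones independent samples.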

Offers a step_rollout function that acts similarly to the step function but accepts a controller as input to accelerate rollout collection. While this method allows a 50x increase in speed, take care with, e.g., exploration vs. exploitation (no noise is currently injected) and with how resets are handled for each independent sub-environment (see reset_subenvs).
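The idea of running the controller inside the rollout loop, and of adding exploration noise on the caller's side, can be sketched as follows. This is an illustration only and does not reproduce step_rollout: `toy_dynamics`, `rollout`, the proportional controller, and all sizes are hypothetical placeholders.

```python
import numpy as np

def toy_dynamics(state, action, dt=0.01):
    """Placeholder dynamics (single integrator); not the drone model."""
    return state + dt * action

def rollout(controller, state, n_steps, noise_std=0.0, rng=None):
    """Run the controller in the loop for n_steps and record the trajectory."""
    rng = np.random.default_rng() if rng is None else rng
    states, actions = [state.copy()], []
    for _ in range(n_steps):
        action = controller(state)
        if noise_std > 0.0:
            # Exploration noise is added here, outside the controller.
            action = action + rng.normal(0.0, noise_std, size=action.shape)
        state = toy_dynamics(state, action)
        states.append(state.copy())
        actions.append(action)
    return np.stack(states), np.stack(actions)

def controller(state):
    """Proportional controller driving each drone's state toward zero."""
    return -2.0 * state

start = np.ones((4, 3))  # 4 drones, 3-dim state (made-up sizes)

greedy_states, greedy_actions = rollout(controller, start, n_steps=100)
noisy_states, noisy_actions = rollout(
    controller, start, n_steps=100, noise_std=0.1, rng=np.random.default_rng(0)
)
```

With `noise_std=0.0` the collected data is purely exploitative, which is the situation the description warns about.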

TODO:
Reset sub-envs using Numba (easier, faster)
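One way this TODO might look is a compiled loop that re-samples only the sub-environments flagged as done. The sketch below is an assumption, not code from this PR: `reset_done_envs`, the uniform sampling range, and the state layout are hypothetical, and the import falls back to plain Python when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python if Numba is not installed
    def njit(func):
        return func

@njit
def reset_done_envs(states, dones, low, high):
    """Re-sample the state of every sub-environment flagged as done, in place."""
    n_envs, state_dim = states.shape
    for i in range(n_envs):
        if dones[i]:
            for j in range(state_dim):
                states[i, j] = np.random.uniform(low, high)
            dones[i] = False

states = np.full((4, 3), 5.0)  # 4 sub-envs, 3-dim state (made-up sizes)
dones = np.array([True, False, True, False])
reset_done_envs(states, dones, -1.0, 1.0)
```

Sub-environments that are still running keep their state untouched, so a rollout can continue without a global reset. Uniform sampling is only a placeholder and may need constraints to keep the sampled states physically plausible.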


korneelf1 · 2024-08-14T16:33:20Z

The Kerneller and Jitter are now created in gym_sim.
Potentially create them in init so that they become available to all code in the repo?

korneelf1 · 2024-08-14T20:20:58Z

Adding basic unit tests to ensure the same output as the original sim
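The pattern behind such a test can be sketched as below: step the wrapped environment and the original update with identical inputs and compare the results numerically. `reference_step` and `ToyEnv` are hypothetical stand-ins for the original simulator and the Gymnasium-style wrapper; the actual tests in this PR may be structured differently.

```python
import unittest
import numpy as np

def reference_step(state, action, dt=0.01):
    """Stand-in for the original simulator's update."""
    return state + dt * action

class ToyEnv:
    """Stand-in for the Gymnasium-style wrapper around the simulator."""

    def __init__(self, state):
        self.state = state.copy()

    def step(self, action):
        self.state = reference_step(self.state, action)
        return self.state

class TestSameOutput(unittest.TestCase):
    def test_single_step_matches_reference(self):
        rng = np.random.default_rng(0)
        state = rng.normal(size=(2, 3))
        action = rng.normal(size=(2, 3))
        env = ToyEnv(state)
        # The wrapper must reproduce the reference update for identical inputs.
        np.testing.assert_allclose(env.step(action), reference_step(state, action))

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSameOutput)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```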

Korneel Van den Berghe and others added 30 commits June 10, 2024 10:23

Custom Gym Env, Crazyflie simulator, sb3/tianshou progress, custom kernels

147b61a

gymnasium API simulator

3169338

Improved docstrings

2ae4e38

support for action history, numpy version of deque, slows down slightly

c2056c6

adapt reward function to target stabilizing controller rather than position controller

a700384

Warn if action not in actionspace, might cause NaN values

2837bcf

enable position term in reward, test and train scripts

909e07b

updates on n_episodes (for testing)

d4d56c3

trained policy

06fd50b

imu model, changed reward structure

0895da6

changed reward str

70fdab0

Merge branch 'gymnasium_env' of https://github.com/korneelf1/fastPyDroneSim_gym into gymnasium_env

e4c9bfd

matplotlib rendering

3949c2d

mac policy

cf6bd60

fix test envs + stabilization reward

92e2f00

adapt train script

e52ecd2

improved collector, progressed gpu kernels

ddd1219

stabilization ctrl

cdd7482

Prioritized buffer, plot origin point for rendering

61da397

Cuda support for kernels

6ce17a5

Recurrent training

e15b93d

Trained policies

4be9604

position control

6b8fb11

position targets added to observation space

a6537ca

Forgot to actually add the actions to the history........

2904f7e

For training purposes, use stabilization reward

ca06898

sim on cpu, train on gpu

649590d

changes ext

71cd14f

restore training characteristics to last_working branch?

3d6577b

Spiking Actors, Updated Evotorch

fa52928

korneelf1 added 26 commits August 8, 2024 21:08

specify observation space, implement safety feature for out of bound actions, try different collector, skip action weight (not even sure what it should be as we use default drone)

5761d52

clean evolearning file

94919a9

implement standard collector compatibility

9e523a5

remove device arg in collector, change hyperparameters to ensure at least 3M interactions (see learning to fly)

9d9e9b0

batch size of 1 in PPO apparently causes it to crash :)

048aed6

batch size of 1 does NOT work!

247c160

ANN evo learning

48a7226

Fix logging, fix PPO (action bounded)

6c2b343

Increase learning rate

374c6e1

Tianshou to evotorch wrapper

04e5e89

Normalize motor speeds

0939233

Remove test statements for reward printing

3481caa

Cant remove psets for now

1aafac8

N=1 different return in step

32a1376

reward function from git reward_squared_fast_learning

795a669

dont normalize, 1e-4 learning rate with no scheduling

7ad36bd

allow tianshou policy with evotorch

03926f8

Improved reset mechanism, initially completely random, but might reset to impossible states

b6ae688

Allow multidim quaternion init

9842ea6

allow multiple lr schedulers

d4fb96e

step size of 1k for lr schedulers

9c58498

format gym env

a17f5c5

CLEANUP: clean up gym_sim file, helpers file

def3b11

FEATURE: time limit in done function

Something of a test

9549371

Cleaned up file structure

ac0b222

Syncing with OG repo

dece0bf

test dones and single steps

67d0e6a

Fix of unittests

7174ac3
