Gymnasium API #1

Open · wants to merge 76 commits into main
Commits (76)
147b61a
Custom Gym Env, Crazyflie simulator, sb3/tianshou progress, custom ke…
Jun 10, 2024
3169338
gymnasium API simulator
Jun 17, 2024
2ae4e38
Improved docstrings
Jun 17, 2024
c2056c6
support for action history, numpy version of deque, slows down slightly
Jun 17, 2024
a700384
adapt reward function to target stabilizing controller rather than po…
Jun 17, 2024
2837bcf
Warn if action not in action space, might cause NaN values
Jun 17, 2024
909e07b
enable position term in reward, test and train scripts
Jun 18, 2024
d4d56c3
updates on n_episodes (for testing)
Jun 19, 2024
06fd50b
trained policy
korneelf1 Jun 19, 2024
0895da6
imu model, changed reward structure
Jun 19, 2024
70fdab0
changed reward str
Jun 19, 2024
e4c9bfd
Merge branch 'gymnasium_env' of https://github.com/korneelf1/fastPyDr…
Jun 19, 2024
3949c2d
matplotlib rendering
Jun 19, 2024
cf6bd60
mac policy
korneelf1 Jun 19, 2024
92e2f00
fix test envs + stabilization reward
Jun 19, 2024
e52ecd2
adapt train script
Jun 19, 2024
ddd1219
improved collector, progressed gpu kernels
korneelf1 Jun 24, 2024
cdd7482
stabilization ctrl
korneelf1 Jun 24, 2024
61da397
Prioritized buffer, plot origin point for rendering
korneelf1 Jun 24, 2024
6ce17a5
Cuda support for kernels
korneelf1 Jun 24, 2024
e15b93d
Recurrent training
korneelf1 Jun 25, 2024
4be9604
Trained policies
korneelf1 Jun 25, 2024
6b8fb11
position control
korneelf1 Jun 25, 2024
a6537ca
position targets added to observation space
korneelf1 Jun 26, 2024
2904f7e
Forgot to actually add the actions to the history........
korneelf1 Jun 26, 2024
ca06898
For training purposes, use stabilization reward
korneelf1 Jun 26, 2024
649590d
sim on cpu, train on gpu
korneelf1 Jun 26, 2024
71cd14f
changes ext
korneelf1 Jun 26, 2024
3d6577b
restore training characteristics to last_working branch?
korneelf1 Jun 27, 2024
fa52928
Spiking Actors, Updated Evotorch
korneelf1 Jul 25, 2024
e7854c9
wandb logging of evotorch
korneelf1 Jul 25, 2024
31e9e44
conda setup
korneelf1 Jul 26, 2024
1ae8af6
remove nullable slice in collector (unused)
korneelf1 Jul 26, 2024
8ec0218
make reqs file
korneelf1 Jul 26, 2024
fa19cc3
Merge remote-tracking branch 'refs/remotes/origin/gymnasium_env' into…
korneelf1 Jul 26, 2024
9863f5a
spiking actor implemented
korneelf1 Jul 29, 2024
0da6c0d
SMLP is now nn.Module, evotorch adapted to support SNN
korneelf1 Jul 30, 2024
43a7b19
save the evolutionary model
korneelf1 Jul 30, 2024
c30c968
debugging by adding option to specify task
korneelf1 Jul 31, 2024
18794db
swap around n_states
korneelf1 Jul 31, 2024
b63284b
more info on what's happening in terminal
korneelf1 Jul 31, 2024
e7863a8
debugging....
korneelf1 Aug 1, 2024
a87fd1b
Fix speed issue, add masked networks
korneelf1 Aug 2, 2024
e1ff0ab
!!! thrust constant and torque constant were swapped around (k is thr…
korneelf1 Aug 4, 2024
ba4f150
Trying out og drone
korneelf1 Aug 5, 2024
3fb2767
less strict termination conditions to speed up training
korneelf1 Aug 5, 2024
ec19a26
I wasn't logging the args...
korneelf1 Aug 5, 2024
54595d2
Cumulative reward if evaluation env
korneelf1 Aug 5, 2024
5761d52
specify observation space, implement safety feature for out of bound …
korneelf1 Aug 8, 2024
94919a9
clean evolearning file
korneelf1 Aug 8, 2024
9e523a5
implement standard collector compatibility
korneelf1 Aug 9, 2024
9d9e9b0
remove device arg in collector, change hyperparameters to ensure at l…
korneelf1 Aug 9, 2024
048aed6
batch size of 1 in PPO apparently causes it to crash :)
korneelf1 Aug 9, 2024
247c160
batch size of 1 does NOT work!
korneelf1 Aug 9, 2024
48a7226
ANN evo learning
korneelf1 Aug 9, 2024
6c2b343
Fix logging, fix PPO (action bounded)
korneelf1 Aug 9, 2024
374c6e1
Increase learning rate
korneelf1 Aug 12, 2024
04e5e89
Tianshou to evotorch wrapper
korneelf1 Aug 12, 2024
0939233
Normalize motor speeds
korneelf1 Aug 13, 2024
3481caa
Remove test statements for reward printing
korneelf1 Aug 13, 2024
1aafac8
Can't remove psets for now
korneelf1 Aug 13, 2024
32a1376
N=1 different return in step
korneelf1 Aug 13, 2024
795a669
reward function from git reward_squared_fast_learning
korneelf1 Aug 13, 2024
7ad36bd
don't normalize, 1e-4 learning rate with no scheduling
korneelf1 Aug 13, 2024
03926f8
allow tianshou policy with evotorch
korneelf1 Aug 13, 2024
b6ae688
Improved reset mechanism, initially completely random, but might rese…
korneelf1 Aug 14, 2024
9842ea6
Allow multidim quaternion init
korneelf1 Aug 14, 2024
d4fb96e
allow multiple lr schedulers
korneelf1 Aug 14, 2024
9c58498
step size of 1k for lr schedulers
korneelf1 Aug 14, 2024
a17f5c5
format gym env
korneelf1 Aug 14, 2024
def3b11
CLEANUP: clean up gym_sim file, helpers file
korneelf1 Aug 14, 2024
9549371
Something of a test
korneelf1 Aug 14, 2024
ac0b222
Cleaned up file structure
korneelf1 Aug 14, 2024
dece0bf
Syncing with OG repo
korneelf1 Aug 14, 2024
67d0e6a
test dones and single steps
korneelf1 Aug 14, 2024
7174ac3
Fix unit tests
korneelf1 Aug 15, 2024
Empty file added __init__.py
10 changes: 10 additions & 0 deletions crafts.py
@@ -22,6 +22,16 @@

class Rotor:
    def __init__(self, x=[0., 0., 0.], wmax=4000., Tmax=4., k=0., cm=0.01, tau=0.03, Izz=1e-6, dir='cw'):
        '''
        x: position of the rotor in the body frame
        wmax: maximum angular velocity
        Tmax: maximum thrust
        k: thrust coefficient
        cm: moment arm
        tau: time constant
        Izz: moment of inertia
        dir: direction of rotation
        '''
        self.x = np.asarray(x)
        self.wmax = wmax
        self.Tmax = Tmax
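For orientation, a minimal, hypothetical instantiation of the Rotor above (the rest of the constructor body is truncated in this diff, so only the visible parameters are exercised; the value of k is a placeholder, not taken from the repo):

from crafts import Rotor

# sketch: a single clockwise rotor mounted 10 cm along body x
rotor = Rotor(x=[0.1, 0.0, 0.0], wmax=4000., Tmax=4.,
              k=2.5e-7,  # thrust coefficient (placeholder value)
              cm=0.01, tau=0.03, Izz=1e-6, dir='cw')
print(rotor.x, rotor.wmax, rotor.Tmax)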
974 changes: 974 additions & 0 deletions gym_sim.py

Large diffs are not rendered by default.

101 changes: 101 additions & 0 deletions helpers.py
@@ -0,0 +1,101 @@
import numpy as np
import torch

# tianshou imports
from tianshou.policy import SACPolicy, BasePolicy
from tianshou.utils.net.continuous import ActorProb, Critic
from tianshou.utils.net.common import Net
from tianshou.data import VectorReplayBuffer
from tianshou.trainer import OffpolicyTrainer
from tianshou.highlevel.logger import LoggerFactoryDefault
from tianshou.utils import WandbLogger


class NumpyDeque(object):
    '''Fixed-size rolling buffer backed by a numpy array.

    Rows (axis 0) are independent channels/envs; columns (axis 1) hold the
    history, with the newest entries in the leftmost columns.
    '''
    def __init__(self, shape: tuple, device='cpu') -> None:
        # device is currently unused; kept for API compatibility
        self.shape_arr = shape
        self.array = np.zeros(self.shape_arr, dtype=np.float32)

    def __len__(self):
        # history length (number of columns)
        return self.shape_arr[1]

    def append(self, els):
        # els must be 2D: (n_rows, n_new_entries)
        assert els.shape[0] == self.shape_arr[0]
        # shift the history right and write the new entries into the front columns
        self.array = np.roll(self.array, els.shape[1], axis=1)
        self.array[:, 0:els.shape[1]] = els.astype(np.float32)

    def reset(self, vecs=None):
        if vecs is None:
            self.array = np.zeros(self.shape_arr, dtype=np.float32)
        elif isinstance(vecs, np.ndarray):
            # zero only the rows flagged with 1 (e.g. envs that terminated)
            self.array[vecs == 1] = 0.

    def __getitem__(self, idx):
        # delegate indexing to the underlying array; imu.py indexes the
        # deque directly, which would otherwise raise TypeError
        return self.array[idx]

    def __call__(self):
        return self.array

    def __repr__(self):
        return str(self.array)

    def __array__(self, dtype=None):
        if dtype:
            return self.array.astype(dtype)
        return self.array

    @property
    def shape(self):
        return self.shape_arr


if __name__ == '__main__':
    def create_policy(env):
        observation_space = env.observation_space.shape or env.observation_space.n
        action_space = env.action_space.shape or env.action_space.n
        # create the networks behind the actor and critics
        net_a = Net(state_shape=observation_space,
                    hidden_sizes=[64, 64], device='cpu')
        net_c1 = Net(state_shape=observation_space, action_shape=action_space,
                     hidden_sizes=[64, 64],
                     concat=True)
        net_c2 = Net(state_shape=observation_space, action_shape=action_space,
                     hidden_sizes=[64, 64],
                     concat=True)

        # create the actor and critics
        actor = ActorProb(
            net_a,
            action_space,
            unbounded=True,
            conditioned_sigma=True,
        )
        critic1 = Critic(net_c1, device='cpu')
        critic2 = Critic(net_c2, device='cpu')

        # create the optimizers
        actor_optim = torch.optim.Adam(actor.parameters(), lr=1e-3)
        critic_optim = torch.optim.Adam(critic1.parameters(), lr=1e-3)
        critic2_optim = torch.optim.Adam(critic2.parameters(), lr=1e-3)

        # create the policy
        policy = SACPolicy(actor=actor, actor_optim=actor_optim,
                           critic=critic1, critic_optim=critic_optim,
                           critic2=critic2, critic2_optim=critic2_optim,
                           action_space=env.action_space,
                           observation_space=env.observation_space,
                           action_scaling=True)  # make sure actions are scaled properly
        return policy

    def forward_policy(policy, state):
        return policy(state)

    # quick demo of the rolling buffer
    test_queue = NumpyDeque((3, 5))
    print(type(test_queue.array))
    print(test_queue)
    ones_1 = np.ones((3, 1))
    twos_2 = np.ones((3, 2)) * 2
    test_queue.append(ones_1)
    print(test_queue)
    test_queue.append(twos_2)
    print(test_queue)
    test_queue.reset(np.array([0, 1, 0]))
    print(test_queue)
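For reference, with the shapes above the demo should end with output along these lines (newest entries occupy the leftmost columns, and the partial reset zeroes only the flagged row):

# after append(ones_1):           every row is [1. 0. 0. 0. 0.]
# after append(twos_2):           every row is [2. 2. 1. 0. 0.]
# after reset(np.array([0,1,0])): row 1 is all zeros, rows 0 and 2 keep [2. 2. 1. 0. 0.]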

111 changes: 111 additions & 0 deletions imu.py
@@ -0,0 +1,111 @@
import numpy as np
from helpers import NumpyDeque
import matplotlib.pyplot as plt

def quaternion_rotation_matrix(Q):
    """
    Convert a quaternion into a full three-dimensional rotation matrix.

    Input
    :param Q: A 4 element array representing the quaternion (q0,q1,q2,q3)

    Output
    :return: A 3x3 element matrix representing the full 3D rotation matrix.
             This rotation matrix converts a point in the local reference
             frame to a point in the global reference frame.
    """
    # Extract the values from Q
    q0 = Q[0]
    q1 = Q[1]
    q2 = Q[2]
    q3 = Q[3]

    # First row of the rotation matrix
    r00 = 2 * (q0 * q0 + q1 * q1) - 1
    r01 = 2 * (q1 * q2 - q0 * q3)
    r02 = 2 * (q1 * q3 + q0 * q2)

    # Second row of the rotation matrix
    r10 = 2 * (q1 * q2 + q0 * q3)
    r11 = 2 * (q0 * q0 + q2 * q2) - 1
    r12 = 2 * (q2 * q3 - q0 * q1)

    # Third row of the rotation matrix
    r20 = 2 * (q1 * q3 - q0 * q2)
    r21 = 2 * (q2 * q3 + q0 * q1)
    r22 = 2 * (q0 * q0 + q3 * q3) - 1

    # 3x3 rotation matrix
    rot_matrix = np.array([[r00, r01, r02],
                           [r10, r11, r12],
                           [r20, r21, r22]])

    return rot_matrix
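A quick sanity check for the conversion above (not part of the diff): the identity quaternion should map to the identity matrix, and any unit quaternion should yield an orthonormal matrix:

import numpy as np

R = quaternion_rotation_matrix(np.array([1., 0., 0., 0.]))
assert np.allclose(R, np.eye(3))         # identity quaternion -> identity matrix

q = np.random.randn(4)
q /= np.linalg.norm(q)                   # normalize to a unit quaternion
R = quaternion_rotation_matrix(q)
assert np.allclose(R @ R.T, np.eye(3))   # rotation matrices are orthonormal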

class IMU:
    def __init__(self, noise=np.array([0., 0., 0., 0., 0., 0.]), bias=np.array([0., 0., 0., 0., 0., 0.]), dt=0.01, offset=np.array([0., 0., 0.])):
        '''
        noise: standard deviation of the noise, [accel (3), gyro (3)]
        bias: initial value of the bias, modelled as Brownian motion, [accel (3), gyro (3)]
        offset: offset of the sensor in the body frame
        '''
        assert len(noise) == 6
        assert len(bias) == 6
        assert len(offset) == 3

        self.dt = dt

        self.noise = noise / np.sqrt(self.dt)
        # keep the bias as floats so the Brownian updates can accumulate in place
        self.bias = np.asarray(bias, dtype=np.float64)

        self.accel = np.zeros(3)
        self.gyro = np.zeros(3)

        self.offset = offset

        # histories are (components, timesteps): the newest entry sits in column 0,
        # matching NumpyDeque's roll along axis 1
        self.vel_history = NumpyDeque((3, 100))
        self.accel_history = NumpyDeque((3, 100))
        self.gyro_history = NumpyDeque((3, 100))

    def add_brownian_bias(self):
        self.bias += np.random.normal(0, 0.01, size=(6,)) * np.sqrt(self.dt)

    def add_noise(self):
        self.add_brownian_bias()
        # the first three entries of noise/bias belong to the accelerometer,
        # the last three to the gyroscope
        self.accel = np.random.normal(self.accel, self.noise[:3]) - self.bias[:3]
        self.gyro = np.random.normal(self.gyro, self.noise[3:]) - self.bias[3:]

    def simulate(self, state):
        '''
        state: [x, y, z, vx, vy, vz, qw, qx, qy, qz, wx, wy, wz]
        '''
        # update acceleration
        vel = state[3:6]
        self.vel_history.append(vel.reshape(3, 1))
        q = state[6:10]
        R = quaternion_rotation_matrix(q)

        # transform velocity to the body frame (R maps body -> world per the
        # docstring above, so R.T maps world -> body)
        vel_body = np.dot(R.T, vel)
        # gravity expressed in the body frame
        accel_grav = np.dot(R.T, np.array([0., 0., -9.81]))
        # finite-difference acceleration from the velocity history (world frame),
        # rotated into the body frame as well
        accel_vel_change = np.dot(R.T, (self.vel_history[:, 0] - self.vel_history[:, 1]) / self.dt)

        # accelerometer model: linear acceleration plus the gravity term
        # (sign convention follows the original code)
        self.accel = accel_grav + accel_vel_change

        # the gyroscope measures the body rates directly
        self.gyro = state[10:13]

        self.add_noise()
        return np.concatenate([self.accel, self.gyro])

    def reset(self):
        self.accel = np.zeros(3)
        self.gyro = np.zeros(3)
        self.bias = np.zeros(6)
        # also clear the rolling velocity history
        self.vel_history.reset()

    def render(self):
        pass
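A usage sketch for the model above, assuming the 13-element state layout documented in simulate (hover at the origin with identity attitude):

import numpy as np

imu = IMU(noise=np.full(6, 0.01), dt=0.01)
state = np.zeros(13)
state[6] = 1.0                     # qw = 1: identity attitude
measurement = imu.simulate(state)  # -> [ax, ay, az, wx, wy, wz]
print(measurement.shape)           # (6,); accel z reads near -9.81 at rest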