Simplex-Cartpole

[Badges: TensorFlow | Python | Linux | Gym | MATLAB]


This repository implements the cart-pole framework in simulation to validate the feasibility of the Simplex-enabled Safe Continual Learning Machine (SeCLM) for safety-critical systems. If you find this work helpful, please cite the paper below.

@misc{cai2024simplexenabledsafecontinuallearning,
      title={Simplex-enabled Safe Continual Learning Machine}, 
      author={Yihao Cai and Hongpeng Cao and Yanbing Mao and Lui Sha and Marco Caccamo},
      year={2024},
      eprint={2409.05898},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2409.05898}, 
}

Table of Contents

  • Code Structure
  • Environment Setup
  • Experiment
  • Results
  • Misc

Code Structure

The codebase uses Hydra to configure hyperparameters in YAML style; a minimal entry-point sketch follows the directory tree below. The structure is formatted as follows:

├── config                                <- Configure files for the framework
├── results   
│      ├── hydra                          <- Hydra log for each runtime
│      ├── logs                           <- Logs for training/testing/evaluation
│      ├── models                         <- Trained weights files
│      └── plots                          <- Plots for cartpole phase/trajectory
├── scripts                              
│      ├── test                           <- For testing
│      └── train                          <- For training
│           ├── pretrain.sh                   <- Pretrain a policy using Phy-DRL
│           ├── seclm_safe_learn.sh           <- Continual learn with seclm
│           ├── seclm_safe_only.sh            <- Continual learn with seclm (only use seclm for safety)
│           └── unsafe_continual_learn.sh     <- Continual learn without seclm (no safety guarantee)
├── src                              
│    ├── envs                             <- Environment of real plant (cartpole)
│    ├── ha_teacher                  
│           ├── matlab                    <- m files for solving LMIs
│           ├── ha_teacher.py             <- High Assurance Teacher
│           └── mat_engine.py             <- Matlab engine interface
│    ├── hp_student                               
│           ├── agents                    <- Phy-DRL agent (High Performance Student)
│           ├── networks                  <- Phy-DRL network structure
│           └── ... 
│    ├── trainer                  
│           └── trainer.py                <- Training/Testing/Evaluation loop                               
│    ├── ... 
│    └── physical_design.py               <- Physical matrix design for cartpole     
├── main.py                               <- Main file
└── requirements.txt                      <- Dependencies for code environment
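
At runtime, Hydra composes the YAML files under config/ into a single configuration object. Below is a minimal sketch of such an entry point; the decorator arguments are illustrative, and main.py in this repository may differ:

    import hydra
    from omegaconf import DictConfig, OmegaConf

    # version_base requires Hydra >= 1.2; drop it for older versions.
    @hydra.main(config_path="config", config_name="base_config", version_base=None)
    def main(cfg: DictConfig) -> None:
        # Hydra merges the YAML files under ./config into one config object
        # and logs each run (see results/hydra).
        print(OmegaConf.to_yaml(cfg))

    if __name__ == "__main__":
        main()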

Environment Setup

It is recommended to create a conda environment for development. The required packages have been tested under Python 3.9.5, though they should be compatible with other Python versions.

Python Package

Follow the steps below to build the Python environment:

  1. First, download the appropriate version of Anaconda. After installation, create a virtual environment:

    conda create --name cartpole python==3.9.5
  2. Second, activate the conda environment you created:

    conda activate cartpole
  3. Finally, install all dependent packages by running:

    pip install -r requirements.txt
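
To quickly sanity-check the environment, you can import the main packages; the package names below follow the badges above and may differ from the exact pins in requirements.txt:

    import tensorflow as tf
    import gym

    print("TensorFlow:", tf.__version__)
    print("Gym:", gym.__version__)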

Matlab Interface

Solving the LMIs requires MATLAB. Please install MATLAB and check the version requirements to make sure your Python version is compatible with the installed MATLAB release. After that, build the MATLAB Engine API for Python:

  1. Navigate to the engine folder under the MATLAB root path:

    cd <matlab_root>/extern/engines/python
  2. Use pip to install:

    python -m pip install .
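
You can then verify that the engine is reachable from Python (a minimal check; starting the engine may take a few seconds):

    import matlab.engine

    eng = matlab.engine.start_matlab()  # launches a headless MATLAB session
    print(eng.sqrt(4.0))                # prints 2.0 if the engine responds
    eng.quit()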

Experiment

Pretrain a Policy


Use Phy-DRL to pretrain a policy in a friction-free environment (around 1 million steps):

bash scripts/train/pretrain.sh

You can monitor the training status with TensorBoard:

tensorboard --logdir ./results/logs

To test the trained Phy-DRL policy, set CHECKPOINT to the model path in scripts/test/pretrain.sh, set WITH_FRICTION and ACTUATOR_NOISE to false, and run:

bash scripts/test/pretrain.sh

The cartpole system will safely converge to the set point using control actions from the Phy-DRL agent:

[animation: ani_pretrain | trajectory: traj_pretrain]
Fig 1. A Well-trained Agent Provides Safety and Stability

We now create a more realistic environment by introducing friction and actuator noise: in scripts/test/pretrain.sh, set WITH_FRICTION and ACTUATOR_NOISE to true and run the script again. Due to the sim-to-real gap, the system fails from the same initial condition:

[animation: ani_pretrain_gap | trajectory: traj_pretrain_gap]
Fig 2. System Failure due to Large Sim-to-Real Gap
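
For intuition only, the sketch below shows one common way friction and actuator noise can perturb a nominal cart-pole force command; the function and parameter names are illustrative and not the repository's environment code:

    import numpy as np

    def perturbed_force(nominal_force, cart_velocity,
                        friction_coeff=0.1, noise_std=0.5,
                        rng=np.random.default_rng()):
        # Add Coulomb-style friction and Gaussian actuator noise to a force command.
        friction = -friction_coeff * np.sign(cart_velocity)  # opposes the cart's motion
        noise = rng.normal(0.0, noise_std)                    # actuator noise
        return nominal_force + friction + noise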

Continual Learning


1. Unsafe continual learning

Continual learning without a safety guarantee:

bash scripts/train/unsafe_continual_learn.sh 

2. SeCLM only for safety

Use SeCLM for continual learning, where the teacher only guarantees safety (the agent does not learn from the teacher):

bash scripts/train/seclm_safe_only.sh 

3. SeCLM for safe continual learning

Use SeCLM for continual learning, where the teacher guarantees safety and the agent learns from the teacher's behavior:

bash scripts/train/seclm_safe_learn.sh 

In SeCLM, the teacher always provides a safety guarantee for the student (agent) during continual learning:

[animation: ani_seclm_train | trajectory: traj_seclm_train]
Fig 3. Teacher Guarantees Safety During Agent Learning (and Inference)

Below are the two different system phase portraits during training. The polygon represents the Safety Set (hard constraints), and the ellipse represents the Safety Envelope (soft constraints):

[phase portraits: phase_unsafe_learn | phase_seclm]
Fig 4. Phase Behavior of Unsafe Learn (left) and SeCLM (right)
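
Conceptually, the Simplex logic hands control to the HA teacher whenever the state leaves the safety envelope and returns it to the HP student otherwise. Below is a minimal sketch, assuming an ellipsoidal envelope {x : xᵀ P x ≤ 1} obtained from the LMIs; the names are illustrative and not the repository's API:

    import numpy as np

    def select_action(x, student_action, teacher_action, P):
        # Inside the safety envelope: trust the learning (HP student) action.
        # Outside it: fall back to the verified high-assurance (HA teacher) action.
        inside_envelope = float(x.T @ P @ x) <= 1.0
        return student_action if inside_envelope else teacher_action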

Results


To show the agent's learning performance with SeCLM, we select the same (unsafe) initial condition and continually train for 10 episodes, either with or without SeCLM.

  • Unsafe Learn

During the first 10 episodes, the system frequently failed, preventing the agent from gathering sufficient data to learn a safe policy.

[animation: ani_unsafe_learn | trajectory: traj_unsafe_learn]
Fig 5. Agent Random Exploration Causes System Failure

  • SeCLM

With SeCLM, the cartpole always remains in a safe condition. To validate the training performance, we disable the teacher module during testing; the result shows that the agent has learned safe behavior from the teacher:

[animation: ani_seclm_eval_10 | trajectory: traj_seclm_eval_10]
Fig 6. Agent Inference after Training 10 Episodes with the SeC-Learning Machine

Misc


  • To plot the cartpole phase/trajectory or show its animation/trajectory live, check the corresponding fields in config/logger/logger.yaml
  • To choose between training by steps or by episodes, set the field training_by_steps to true or false in config/base_config.yaml
  • The repository uses the logging package for debugging. Set the debug mode in config/base_config.yaml
  • If you run into issues with the live plot, check which matplotlib backend suits your machine and set it via matplotlib.use(...) in src/logger/live_plotter.py, as shown in the snippet below
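
For example, a backend can be selected before any figure is created (the backend name depends on your machine):

    import matplotlib
    matplotlib.use("TkAgg")  # e.g. "TkAgg", "Qt5Agg", or "Agg" for headless runs
    import matplotlib.pyplot as plt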
