This repository implements a simulated cart-pole framework to validate the feasibility of the Simplex-enabled Safe Continual Learning Machine (SeCLM) for safety-critical systems. If you find this work helpful, please cite the paper below:
@misc{cai2024simplexenabledsafecontinuallearning,
  title={Simplex-enabled Safe Continual Learning Machine},
  author={Yihao Cai and Hongpeng Cao and Yanbing Mao and Lui Sha and Marco Caccamo},
  year={2024},
  eprint={2409.05898},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2409.05898},
}
The codebase uses Hydra to configure hyperparameters in YAML files. The repository is structured as follows:
├── config                  <- Configuration files for the framework
├── results
│   ├── hydra               <- Hydra logs for each run
│   ├── logs                <- Logs for training/testing/evaluation
│   ├── models              <- Trained weight files
│   └── plots               <- Plots of cartpole phase/trajectory
├── scripts
│   ├── test                <- For testing
│   └── train               <- For training
│       ├── pretrain.sh                 <- Pretrain a policy using Phy-DRL
│       ├── seclm_safe_learn.sh         <- Continual learning with SeCLM
│       ├── seclm_safe_only.sh          <- Continual learning with SeCLM (only use SeCLM for safety)
│       └── unsafe_continual_learn.sh   <- Continual learning without SeCLM (no safety guarantee)
├── src
│   ├── envs                <- Environment of the real plant (cartpole)
│   ├── ha_teacher
│   │   ├── matlab          <- MATLAB .m files for solving LMIs
│   │   ├── ha_teacher.py   <- High-Assurance Teacher
│   │   └── mat_engine.py   <- MATLAB engine interface
│   ├── hp_student
│   │   ├── agents          <- Phy-DRL agent (High-Performance Student)
│   │   ├── networks        <- Phy-DRL network structure
│   │   └── ...
│   ├── trainer
│   │   └── trainer.py      <- Training/testing/evaluation loop
│   ├── ...
│   └── physical_design.py  <- Physical matrix design for the cartpole
├── main.py                 <- Main entry point
└── requirements.txt        <- Dependencies for the code environment
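As noted above, Hydra composes the YAML files under config/ at startup. For reference only, a Hydra entry point typically looks like the minimal sketch below; the actual contents of main.py and the exact config name and keys in this repository may differ.

```python
# Minimal Hydra entry-point sketch (illustrative only; the repository's
# main.py, config name, and keys may differ).
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(version_base=None, config_path="config", config_name="base_config")
def main(cfg: DictConfig) -> None:
    # Hydra composes config/base_config.yaml (plus any command-line overrides)
    # into a single DictConfig object.
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    main()
```

With this pattern, individual fields can also be overridden on the command line, e.g. python main.py training_by_steps=false (assuming that key sits at the top level of base_config.yaml).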
It is recommended to create a conda environment for development. The required packages have been tested under Python 3.9.5, though they should be compatible with other Python versions.
Follow the steps below to build the Python environment:

- First, download and install the appropriate version of Anaconda. After installation, create a virtual environment:

  conda create --name cartpole python==3.9.5

- Second, activate the conda environment you created:

  conda activate cartpole

- Finally, install all dependencies by running:

  pip install -r requirements.txt
Solving the LMIs requires MATLAB. Please install MATLAB and check the version requirements to ensure your Python version is compatible with the installed MATLAB release. Afterwards, build the MATLAB Engine API for Python:

- Locate the engine folder under the MATLAB root path:

  cd <matlab_root>\extern\engines\python

- Install it with pip:

  python -m pip install .
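To confirm that the engine is importable from the cartpole environment, a quick sanity check like the following can be used (illustrative only):

```python
# Quick sanity check for the MATLAB Engine API (illustrative;
# starting the engine can take several seconds).
import matlab.engine

eng = matlab.engine.start_matlab()  # launch a background MATLAB session
print(eng.sqrt(4.0))                # should print 2.0
eng.quit()                          # shut the session down
```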
Use Phy-DRL to pretrain a policy in a friction-free environment (around 1 million steps):
bash scripts/train/pretrain.sh
You can monitor the training status with TensorBoard:
tensorboard --logdir ./results/logs
To test the trained Phy-DRL policy, assign the model path to CHECKPOINT in scripts/test/pretrain.sh, set WITH_FRICTION and ACTUATOR_NOISE to false, and run:
bash scripts/test/pretrain.sh
The cartpole system will safely converge to the set point using control actions from the Phy-DRL agent:
Fig 1. A Well-trained Agent Provides Safety and Stability
We now create a more realistic environment by introducing friction and actuator noise: in scripts/test/pretrain.sh, set WITH_FRICTION and ACTUATOR_NOISE to true and run the script again. Due to the sim-to-real gap, the system now fails from the same initial condition:
Fig 2. System Failure due to Large Sim-to-Real Gap
Continual learning without a safety guarantee:
bash scripts/train/unsafe_continual_learn.sh
Continual learning with SeCLM, where the teacher only guarantees safety (the agent does not learn from the teacher):
bash scripts/train/seclm_safe_only.sh
Continual learning with SeCLM, where the teacher guarantees safety and the agent also learns from the teacher's behavior:
bash scripts/train/seclm_safe_learn.sh
In SeCLM, the teacher always provides a safety guarantee for the student (agent) during continual learning:
Fig 3. Teacher Guarantees Safety During Agent Learning (and Inference)
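Conceptually, SeCLM follows the Simplex pattern: the High-Assurance teacher monitors the plant state and takes over from the High-Performance student whenever the state leaves a verified safe region, handing control back once it is safe again. The sketch below illustrates this switching logic only; the names and interfaces are illustrative, not the repository's actual API (see src/ha_teacher and src/hp_student for the real implementation).

```python
# Schematic of the Simplex switching logic used by SeCLM (illustrative
# pseudocode; names and interfaces are not the repository's actual API).
def select_action(state, student, teacher):
    """Pick the control action applied to the plant for one time step."""
    if teacher.is_unsafe(state):
        # State has left the verified safe region: the HA teacher takes over
        # with its verified safety controller.
        return teacher.action(state), "teacher"
    # Otherwise the HP student (the learning-based Phy-DRL policy) acts.
    return student.action(state), "student"
```

In the safe_learn mode the student is additionally trained on the teacher's corrective actions, whereas in the safe_only mode the teacher is used purely as a safety backup.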
Below are the two different system phase portraits during training. The polygon represents the Safety Set (hard constraints), and the ellipse represents the Safety Envelope (soft constraints):
Fig 4. Phase Behavior of Unsafe Learn (left) and SeCLM (right)
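For orientation, the two regions can be written in the form commonly used in the Phy-DRL/SeCLM papers (the notation below is illustrative; see the cited paper and src/ha_teacher/matlab for the exact constraints and matrices):

```latex
% Safety set (polygon): hard bounds on the state, e.g. cart position x and pole angle \theta
\mathcal{S} = \{\, s : |x| \le x_{\max},\ |\theta| \le \theta_{\max} \,\}
% Safety envelope (ellipse): an invariant ellipsoid inside the safety set,
% with P obtained from the LMIs solved in MATLAB
\Omega = \{\, s : s^{\top} P s \le 1 \,\} \subseteq \mathcal{S}
```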
To show the agent's learning performance with SeCLM, we select the same (unsafe) initial condition and continually train for 10 episodes, either with or without SeCLM.
Without SeCLM, the system fails frequently during these 10 episodes, preventing the agent from gathering enough data to learn a safe policy.
Fig 5. Agent Random Exploration Causes System Failure
With SeCLM, the cartpole always remains in a safe condition. To validate the training performance, we disable the teacher module during testing; the result shows that the agent has learned safe behavior from the teacher:
Fig 6. Agent Inference after training 10 episodes by SeC-Learning Machine
- To plot the cartpole phase/trajectory, or to show its animation/trajectory live, check the corresponding fields in config/logger/logger.yaml
- To choose between training by steps or by episodes, set the field training_by_steps to true or false in config/base_config.yaml
- The repository uses the logging package for debugging. Set the debug mode in config/base_config.yaml
- If you run into issues during live plotting, check which matplotlib backend suits your system and set it with matplotlib.use(xxx) in src/logger/live_plotter.py (a minimal example follows this list)
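A minimal example of forcing a specific backend, assuming an interactive backend such as TkAgg is available on your machine (the call must come before pyplot is imported):

```python
# Force a specific matplotlib backend before pyplot is imported
# (TkAgg is only an example; choose a backend available on your system).
import matplotlib
matplotlib.use("TkAgg")

import matplotlib.pyplot as plt  # import pyplot only after the backend is set
```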