This repository implements a simulated cart-pole framework to validate the feasibility of the Simplex-enabled Safe Continual Learning Machine (SeCLM) for safety-critical systems. If you find this work helpful, please cite the paper below:
@misc{cai2024simplexenabledsafecontinuallearning,
  title={Simplex-enabled Safe Continual Learning Machine},
  author={Yihao Cai and Hongpeng Cao and Yanbing Mao and Lui Sha and Marco Caccamo},
  year={2024},
  eprint={2409.05898},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2409.05898},
}
The codebase uses Hydra to configure hyperparameters in YAML files. The repository is structured as follows:
├── config                  <- Configuration files for the framework
├── results
│   ├── hydra               <- Hydra logs for each run
│   ├── logs                <- Logs for training/testing/evaluation
│   ├── models              <- Trained weight files
│   └── plots               <- Plots of cartpole phase/trajectory
├── scripts
│   ├── test                <- For testing
│   └── train               <- For training
│       ├── pretrain.sh                 <- Pretrain a policy using Phy-DRL
│       ├── seclm_safe_learn.sh         <- Continual learning with SeCLM
│       ├── seclm_safe_only.sh          <- Continual learning with SeCLM (only use SeCLM for safety)
│       └── unsafe_continual_learn.sh   <- Continual learning without SeCLM (no safety guarantee)
├── src
│   ├── envs                <- Environment of the real plant (cartpole)
│   ├── ha_teacher
│   │   ├── matlab          <- MATLAB .m files for solving LMIs
│   │   ├── ha_teacher.py   <- High-Assurance Teacher
│   │   └── mat_engine.py   <- MATLAB engine interface
│   ├── hp_student
│   │   ├── agents          <- Phy-DRL agent (High-Performance Student)
│   │   ├── networks        <- Phy-DRL network structure
│   │   └── ...
│   ├── trainer
│   │   └── trainer.py      <- Training/testing/evaluation loop
│   ├── ...
│   └── physical_design.py  <- Physical matrix design for the cartpole
├── main.py                 <- Main entry point
└── requirements.txt        <- Dependencies for the code environment
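As noted above, Hydra composes the YAML files under config/ at startup. For reference only, a Hydra entry point typically looks like the minimal sketch below; the actual contents of main.py and the exact config name and keys in this repository may differ.

```python
# Minimal Hydra entry-point sketch (illustrative only; the repository's
# main.py, config name, and keys may differ).
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(version_base=None, config_path="config", config_name="base_config")
def main(cfg: DictConfig) -> None:
    # Hydra composes config/base_config.yaml (plus any command-line overrides)
    # into a single DictConfig object.
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    main()
```

With this pattern, individual fields can also be overridden on the command line, e.g. python main.py training_by_steps=false (assuming that key sits at the top level of base_config.yaml).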
It is recommended to create a conda environment for development. The required packages have been tested under Python 3.9.5, though they should be compatible with other Python versions.
Follow the steps below to build the Python environment:

- First, download and install the appropriate version of Anaconda. After installation, create a virtual environment:

  conda create --name cartpole python==3.9.5

- Second, activate the conda environment you created:

  conda activate cartpole

- Finally, install all dependencies by running:

  pip install -r requirements.txt
Solving the LMIs requires MATLAB. Please install MATLAB and check the version requirements to ensure your Python version is compatible with the installed MATLAB release. Afterwards, build the MATLAB Engine API for Python:

- Locate the engine folder under the MATLAB root path:

  cd <matlab_root>\extern\engines\python

- Install it with pip:

  python -m pip install .
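To confirm that the engine is importable from the cartpole environment, a quick sanity check like the following can be used (illustrative only):

```python
# Quick sanity check for the MATLAB Engine API (illustrative;
# starting the engine can take several seconds).
import matlab.engine

eng = matlab.engine.start_matlab()  # launch a background MATLAB session
print(eng.sqrt(4.0))                # should print 2.0
eng.quit()                          # shut the session down
```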
Use Phy-DRL to pretrain a policy in a friction-free environment (around 1 million steps):
bash scripts/train/pretrain.sh
You can monitor the training status with TensorBoard:
tensorboard --logdir ./results/logs
To test the trained Phy-DRL policy, assign the model path to CHECKPOINT in scripts/test/pretrain.sh, set WITH_FRICTION and ACTUATOR_NOISE to false, and run:
bash scripts/test/pretrain.sh
The cartpole system will safely converge to the set point using control actions from the Phy-DRL agent:
Fig 1. A Well-trained Agent Provides Safety and Stability
We now create a more realistic environment by introducing friction and actuator noise: in scripts/test/pretrain.sh, set WITH_FRICTION and ACTUATOR_NOISE to true and run the script again. Due to the sim-to-real gap, the system now fails from the same initial condition:
Fig 2. System Failure due to Large Sim-to-Real Gap
Continual learning without a safety guarantee:
bash scripts/train/unsafe_continual_learn.sh
Continual learning with SeCLM, where the teacher only guarantees safety (the agent does not learn from the teacher):
bash scripts/train/seclm_safe_only.sh
Continual learning with SeCLM, where the teacher guarantees safety and the agent also learns from the teacher's behavior:
bash scripts/train/seclm_safe_learn.sh
In SeCLM, the teacher always provides a safety guarantee for the student (agent) during continual learning:
Fig 3. Teacher Guarantees Safety During Agent Learning (and Inference)
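Conceptually, SeCLM follows the Simplex pattern: the High-Assurance teacher monitors the plant state and takes over from the High-Performance student whenever the state leaves a verified safe region, handing control back once it is safe again. The sketch below illustrates this switching logic only; the names and interfaces are illustrative, not the repository's actual API (see src/ha_teacher and src/hp_student for the real implementation).

```python
# Schematic of the Simplex switching logic used by SeCLM (illustrative
# pseudocode; names and interfaces are not the repository's actual API).
def select_action(state, student, teacher):
    """Pick the control action applied to the plant for one time step."""
    if teacher.is_unsafe(state):
        # State has left the verified safe region: the HA teacher takes over
        # with its verified safety controller.
        return teacher.action(state), "teacher"
    # Otherwise the HP student (the learning-based Phy-DRL policy) acts.
    return student.action(state), "student"
```

In the safe_learn mode the student is additionally trained on the teacher's corrective actions, whereas in the safe_only mode the teacher is used purely as a safety backup.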
Below are the two different system phase portraits during training. The polygon represents the Safety Set (hard constraints), and the ellipse represents the Safety Envelope (soft constraints):
Fig 4. Phase Behavior of Unsafe Learn (left) and SeCLM (right)
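For orientation, the two regions can be written in the form commonly used in the Phy-DRL/SeCLM papers (the notation below is illustrative; see the cited paper and src/ha_teacher/matlab for the exact constraints and matrices):

```latex
% Safety set (polygon): hard bounds on the state, e.g. cart position x and pole angle \theta
\mathcal{S} = \{\, s : |x| \le x_{\max},\ |\theta| \le \theta_{\max} \,\}
% Safety envelope (ellipse): an invariant ellipsoid inside the safety set,
% with P obtained from the LMIs solved in MATLAB
\Omega = \{\, s : s^{\top} P s \le 1 \,\} \subseteq \mathcal{S}
```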
To show the agent's learning performance with SeCLM, we select the same (unsafe) initial condition and continually train for 10 episodes, either with or without SeCLM.
Without SeCLM, the system fails frequently during these 10 episodes, preventing the agent from gathering enough data to learn a safe policy.
Fig 5. Agent Random Exploration Causes System Failure
With SeCLM, the cartpole always remains in a safe condition. To validate the training performance, we disable the teacher module during testing; the result shows that the agent has learned safe behavior from the teacher:
Fig 6. Agent Inference after training 10 episodes by SeC-Learning Machine
- To plot the cartpole phase/trajectory, or to show its animation/trajectory live, check the corresponding fields in config/logger/logger.yaml
- To choose between training by steps or by episodes, set the field training_by_steps to true or false in config/base_config.yaml
- The repository uses the logging package for debugging. Set the debug mode in config/base_config.yaml
- If you run into issues during live plotting, check which matplotlib backend suits your system and set it with matplotlib.use(xxx) in src/logger/live_plotter.py (a minimal example follows this list)
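A minimal example of forcing a specific backend, assuming an interactive backend such as TkAgg is available on your machine (the call must come before pyplot is imported):

```python
# Force a specific matplotlib backend before pyplot is imported
# (TkAgg is only an example; choose a backend available on your system).
import matplotlib
matplotlib.use("TkAgg")

import matplotlib.pyplot as plt  # import pyplot only after the backend is set
```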