Language Instructed Reinforcement Learning for Human-AI Coordination

This is the code for Language Instructed Reinforcement Learning for Human-AI Coordination (ICML 2023).

The code has been tested with PyTorch 2.0.1.

Get Started

Clone the repo with --recursive to include the submodules:

git clone --recursive git@github.com:hengyuan-hu/instruct-rl.git
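If the repo was already cloned without --recursive, the submodules can be fetched afterwards with standard git commands. The sketch below is one way to do that; the helper name `init_missing_submodules` is hypothetical and not part of this repo. It relies on `git submodule status` prefixing uninitialized submodules with `-`.

```shell
# Hypothetical helper (not part of this repo): initialize submodules only
# when at least one of them is still uninitialized.
init_missing_submodules() {
    repo_dir="${1:-.}"

    # Bail out early if the directory is not a git checkout.
    if ! git -C "$repo_dir" rev-parse --git-dir >/dev/null 2>&1; then
        echo "not a git repository: $repo_dir" >&2
        return 1
    fi

    # `git submodule status` marks uninitialized submodules with a leading "-".
    if git -C "$repo_dir" submodule status --recursive | grep -q '^-'; then
        git -C "$repo_dir" submodule update --init --recursive
    else
        echo "all submodules already initialized"
    fi
}

# Usage (path is an example):
#   init_missing_submodules path/to/instruct-rl
```

Running `git submodule update --init --recursive` directly inside the clone has the same effect; the wrapper only avoids the network call when nothing is missing.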

Dependencies

pip install tqdm scipy matplotlib 'transformers[torch]'
pip install openai

The Say-Select Experiments

cd say-select

# train instruct-rl policies using default hyper-parameters
python train.py

# train vanilla rl policies
python train.py --lmd 0

The Hanabi Experiments

Prepare

To train or evaluate models, first build the C++ part of the repo:

# under the root folder of the repo, compile
make

# Run this line before running any training code to restrict tensor operations
# to a single thread, because our code already uses multi-threading internally
# to run a large number of environments in parallel.
# Add it to your bashrc for convenience.
export OMP_NUM_THREADS=1
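If you would rather not depend on your bashrc, a small guard at the top of a launch script can enforce the same setting as the export above. This is a minimal sketch; the function name `ensure_single_omp_thread` is hypothetical and not part of this repo.

```shell
# Hypothetical helper (not part of this repo): force OMP_NUM_THREADS=1 and
# report on stderr when an inherited value had to be overridden.
ensure_single_omp_thread() {
    if [ "${OMP_NUM_THREADS:-}" != "1" ]; then
        echo "OMP_NUM_THREADS was '${OMP_NUM_THREADS:-unset}', setting it to 1" >&2
    fi
    export OMP_NUM_THREADS=1
}

ensure_single_omp_thread
```

Calling the function before any `python ...` training command in the same script means the child process inherits the single-thread setting.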

Download the pretrained OBL models and the fully trained models used in the ICML paper:

https://drive.google.com/file/d/1KezlTZ86zP6hdWIKNlfStvchUATuyjkl/view?usp=sharing

Run the code

Then go to the pyhanabi folder to run the code.

cd pyhanabi

Generate the language observations and language descriptions of the possible actions.

python gen_all_langs.py

The openai_api.py file contains code to evaluate prompts using the OpenAI API. Please check that file for detailed instructions. It is designed to be run interactively in VSCode's Python interactive window. The pre-generated prior policies used in the paper are stored in pyhanabi/openai.

To train the model, run:

export OMP_NUM_THREADS=1  # if you have not put this into bashrc
# ppo, the config uses color-instruction
python ppo_main.py --config configs/ppo.yaml

# iql, the config uses color-instruction
python r2d2_main.py --config configs/iql.yaml

Additional resources

To evaluate a trained model:

# inside pyhanabi folder
python tools/eval_model.py --weight1 ../models/icml/iql_rank/iql1_pkr_load_pikl_lambda0.15_seeda_num_epoch50/model0.pthw

To examine the conditional action matrix:

python tools/action_matrix.py --model ../models/icml/iql_color/iql1_pkc_load_pikl_lambda0.15_seeda_num_epoch50/model0.pthw

To train a belief model:

python train_belief.py --policy ../models/icml/iql_color/iql1_pkc_load_pikl_lambda0.15_seeda_num_epoch50/model0.pthw

To run SPARTA for the fast adaptation experiments in the appendix:

python sparta.py

To host a bot online so that people can play with it:

cd live_bot
pip install websocket-client requests
python main.py --name Bot-Color --login_name Bot-Something --password agoodpassword