Belief-Grounded Networks for Accelerated Robot Learning under Partial Observability

This repository contains the code for our paper Belief-Grounded Networks for Accelerated Robot Learning under Partial Observability, accepted at CoRL 2020. It is a joint repository with contributions from Brett and Song. If you use this repository in published work, please cite the paper:

@article{nguyen2020belief,
  title={Belief-Grounded Networks for Accelerated Robot Learning under Partial Observability},
  author={Nguyen, Hai and Daley, Brett and Song, Xinchao and Amato, Christopher and Platt, Robert},
  journal={arXiv preprint arXiv:2010.09170},
  year={2020}
}

Setup

Install gym-pomdps from https://github.com/abaisero/gym-pomdps with pip install -e .
Install the dependencies: pip install -r requirements.txt
Install MuJoCo
After that:
- Copy .pomdp domain files in folder domains/pomdp_files to gym_pomdps/pomdps
- Copy the domains' folders in domains/pomdp_files to gym/envs/
- Register new domains with gym by adding the content in modifications/__init__.py to gym/envs/__init__.py
- Modify the following baselines files to match the versions in the modifications folder:
  - baselines/bench/monitor.py - adds a discounted reward calculation
  - baselines/common/vec_env/dummy_vec_env.py - adds get-states and get-belief functions
  - baselines/common/vec_env/shmem_vec_env.py - adds get-states and get-belief functions
- Modify line 96 in gym-pomdps/gym_pomdps/pomdp.py from state_next = -1 to state_next = self.state_space.n
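
The monitor.py change listed above adds a discounted reward calculation. The actual patch is the one in the modifications folder; purely as an illustration of the quantity involved, a discounted episode return can be computed as in the sketch below. The function name `discounted_return` and the default `gamma` are assumptions made for this example and are not taken from the repository.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum over t of gamma**t * rewards[t] for one episode.

    Hypothetical helper for illustration; gamma=0.99 is an assumed default.
    """
    total = 0.0
    # Walk the episode backwards so each step needs one multiply and one add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

With `gamma=1.0` this reduces to the plain sum of rewards, which is a quick way to sanity-check the logged values.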

Train

Algorithm names: ab-cb, ah-cb, ah-ch, ah-cs
Domain names: PomdpHallway-v0, PomdpHallway-v2, PomdpRs44-v0, PomdpRS55-v0, MomdpBumps-v0, MomdpPlates-v0, MomdpTopPlate-v0
Running modes: train, simulate (replay a policy)

Commands (tee saves the output to a file for plotting later):

Train:

python3 -u main.py --algo algo-name --num-env-steps num-steps --seed 0 --env-name name --running-mode train | tee log.txt

Simulate a saved policy:

python3 main.py --algo algo-name --num-env-steps num-steps --seed 0 --env-name name --running-mode simulate --policy-file file --eval-interval 100

Running BGN with an ah-ch agent:

python3 -u main.py --algo ah-ch --num-env-steps num-steps --seed 0 --env-name name --running-mode train --belief-loss-coef 1.0 | tee log.txt

For all training commands, the policy is saved automatically at scripts/logs/env-name/algo-name.#seed.mdl
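
The plotting scripts expect runs for several seeds, so it can help to generate the training command once per seed. The sketch below only assembles the flags documented above (--algo, --num-env-steps, --seed, --env-name, --running-mode); the helper name `build_train_command`, the step count, and the `log.<seed>.txt` file names are assumptions for illustration.

```python
import shlex


def build_train_command(algo, env_name, num_env_steps, seed):
    """Assemble the documented training command as an argument list."""
    return [
        "python3", "-u", "main.py",
        "--algo", algo,
        "--num-env-steps", str(num_env_steps),
        "--seed", str(seed),
        "--env-name", env_name,
        "--running-mode", "train",
    ]


if __name__ == "__main__":
    # Print one shell line per seed; 1000000 steps is an arbitrary placeholder.
    for seed in range(3):
        cmd = build_train_command("ah-ch", "PomdpHallway-v0", 1000000, seed)
        # log.<seed>.txt is a hypothetical naming scheme for the tee output.
        print(shlex.join(cmd), "| tee", f"log.{seed}.txt")
```

The printed lines can be pasted into a shell or written to a script, keeping one log file per seed for plotting.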

Visualize

Plot using the scripts in the plot folder. They take the saved text logs as input, with options to plot training or testing (validation) results and to set the smoothing window:
- Plot a single folder: sub-folders must have names such as ahcb, abcb, ahcs, ahch, bgn, each containing the runs for different seeds (set --mode to either training or testing):
```
python3 plot_folder.py --folder hallway --window 10 --mode training/testing
```
- Plot multiple folders:
```
python3 plot_folders.py --folder hallway hallway2 rs44 rs55 --window 10 10 10 10 --mode testing testing training training
```
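
The --window option controls how much the curves are smoothed. How the plotting scripts smooth internally is not described here; the sketch below assumes a simple trailing moving average just to show what a window of a given size does to a noisy reward curve. The function name `moving_average` is hypothetical.

```python
def moving_average(values, window):
    """Smooth a sequence with a trailing mean over `window` points.

    Assumed smoothing scheme for illustration, not necessarily the one
    used by the plotting scripts.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the point that just left the window.
            running_sum -= values[i - window]
        # Early points average over fewer than `window` values.
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed
```

A larger window gives smoother curves at the cost of lagging behind sudden changes in performance.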

License

This code is released under the MIT License.

Acknowledgments

This codebase evolved from pytorch-a2c-ppo-acktr-gail, but has been heavily modified.
