This repository is the official implementation of *Boosting Efficient Reinforcement Learning for Vision-and-Language Navigation With Open-Sourced LLM* (IEEE Robotics and Automation Letters, 2024).
1. Use Anaconda to create a Python 3.8 environment:
conda create -n vln python=3.8
conda activate vln
2. Install CLIP:
pip install git+https://github.com/openai/CLIP.git
3. Install the remaining Python requirements:
pip install -r python_requirements.txt
4. Install the Matterport3D simulator (v0.1):
sudo apt-get install libjsoncpp-dev libepoxy-dev libglm-dev libosmesa6 libosmesa6-dev libglew-dev
mkdir build && cd build
cmake -DEGL_RENDERING=ON ..
make -j8
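After the build finishes, you can sanity-check that the simulator's Python bindings are importable. The snippet below is a minimal sketch that assumes the compiled MatterSim module ends up in the build/ directory; adjust the path to wherever your build actually places it.

```python
# Minimal import check for the Matterport3D simulator bindings.
# Assumes the compiled MatterSim module lives in build/ (adjust if your layout differs).
import sys
sys.path.append('build')

import MatterSim

sim = MatterSim.Simulator()  # constructing a Simulator confirms the bindings load
print('MatterSim bindings loaded:', sim)
```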
6. Download the CLIP model (optional)
Download the CLIP model here and place it under img_features/, or run the script to download the model automatically.
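If you prefer to fetch the weights programmatically, the sketch below uses the standard OpenAI CLIP API to download a model into img_features/ and encode a single image. The ViT-B/32 backbone and the image path are placeholder assumptions; check the repository's feature-extraction script for the exact model it expects.

```python
# Hedged sketch: download a CLIP model into img_features/ and encode one image.
# "ViT-B/32" and "example.jpg" are placeholders, not necessarily what this repo uses.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device, download_root="img_features/")

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    image_features = model.encode_image(image)  # shape (1, 512) for ViT-B/32
print(image_features.shape)
```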
7. Download the ChatGLM-6B model (optional)
Download the ChatGLM-6B model here for online instruction decomposition. This step is not strictly necessary, because the decomposed instructions have already been pre-processed and stored in the JSON files under tasks/R2R/data/.
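For reference, online decomposition with ChatGLM-6B can be run through the Hugging Face transformers interface as sketched below. The prompt wording is purely illustrative and not the exact prompt used in the paper; the training scripts consume the pre-processed sub-instructions under tasks/R2R/data/.

```python
# Hedged sketch: decompose an R2R instruction into sub-instructions with ChatGLM-6B.
# The prompt is illustrative only; the repo ships pre-processed results in tasks/R2R/data/.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()

instruction = ("Walk past the kitchen counter, turn left into the hallway, "
               "and stop next to the sofa in the living room.")
prompt = ("Split the following navigation instruction into an ordered list of "
          "short sub-instructions, one action per line:\n" + instruction)

response, history = model.chat(tokenizer, prompt, history=[])
print(response)
```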
Train the agent by
bash run/agent.bash 0
0 is the GPU id. It will train the agent and save snapshots under snap/agent/.
After training the agent, test it by
bash run/test_agent.bash 0
0 is the GPU id. It will load the trained agent and evaluate it on the test set.
We opt for a simple reward function for two main reasons. First, this reward design sufficiently supports the agent in learning effective policies and facilitates fair comparisons with existing methods. Second, we aim to minimize task-specific customizations to maintain the model's generalizability; overly complex or inaccurate rewards could diminish the model's performance.
| Methods | Validation Seen: NL↓ | NE↓ | SR↑ | SPL↑ | Validation Unseen: NL↓ | NE↓ | SR↑ | SPL↑ |
|---|---|---|---|---|---|---|---|---|
| DILLM-VLN | 12.8 | 4.74 | 57.2 | 0.51 | 11.4 | 5.31 | 49.4 | 0.44 |
| + SGS | 12.3 | 5.15 | 53.5 | 0.48 | 12.8 | 5.37 | 47.5 | 0.41 |
| + OGS | 11.8 | 5.27 | 52.2 | 0.47 | 11.7 | 5.66 | 46.1 | 0.40 |
We have incorporated the scene grounding score (SGS, which assesses whether the agent has reached the scene described by the sub-instruction) and the object grounding score (OGS, which determines whether the agent has found the target object described in the sub-instruction) into the reward function. The table above presents the experimental results, showing a decline in navigation performance when SGS and OGS are added. This indicates that the design of the reward function directly influences the learning objectives of the agent. Our task design already decomposes the navigation task into multiple simple sub-instructions, focusing the agent on completing each sub-instruction sequentially. The additional reward signals introduce unnecessary distractions, hindering the agent's learning of efficient navigation policies.
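To make the distinction concrete, the sketch below shows the general shape of a simple distance-based VLN reward and how grounding scores such as SGS and OGS would enter as extra shaping terms. It is an illustration of the trade-off discussed above, not the exact reward implemented in this repository; the 3.0 m success threshold and the shaping weights are assumed values.

```python
# Illustrative sketch of a simple distance-based VLN reward, plus optional
# grounding-score shaping. NOT the exact reward used in this repository;
# the success threshold and shaping weights below are assumed values.

def simple_reward(prev_dist: float, curr_dist: float, done: bool,
                  success_threshold: float = 3.0) -> float:
    """Progress toward the goal, plus a terminal success/failure bonus."""
    reward = prev_dist - curr_dist              # positive if the agent moved closer
    if done:
        reward += 2.0 if curr_dist < success_threshold else -2.0
    return reward

def shaped_reward(prev_dist: float, curr_dist: float, done: bool,
                  sgs: float = 0.0, ogs: float = 0.0,
                  w_sgs: float = 0.5, w_ogs: float = 0.5) -> float:
    """Base reward plus scene/object grounding scores (the "+ SGS" / "+ OGS" rows)."""
    return simple_reward(prev_dist, curr_dist, done) + w_sgs * sgs + w_ogs * ogs
```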
If you find this work helpful, please consider citing:
@article{wang2024boosting,
  title={Boosting Efficient Reinforcement Learning for Vision-and-Language Navigation With Open-Sourced LLM},
  author={Wang, Jiawei and Wang, Teng and Cai, Wenzhe and Xu, Lele and Sun, Changyin},
  journal={IEEE Robotics and Automation Letters},
  year={2024},
  doi={10.1109/LRA.2024.3511402}
}