-
(Sim-to-Real) Complex Instruction VLN
-
Real-world demos by following complex instructions, which consist of several simple instructions.
+
Sim-to-Real Demos: Complex Instruction VLN
+
In these demos, the agent navigates according to complex instructions composed of multiple simple instructions in sequence. NaVid can accurately execute them in the correct order.
+
@@ -344,13 +347,13 @@
Method Overview
Data Collection
-
+
We co-train NaVid using real-world caption data (763k) and simulated VLN data (510k). The simulated VLN data consists of 500k action planning samples and 10k instruction reasoning samples.
- We initialize the encoders and Vicuna-7B using pre-trained weights, and our model requires only one epoch for the training process.
+
@@ -388,7 +391,8 @@
Caption Results Visualization
-
Caption Results Visualization
+
Results of Navigation Video Captioning
+
Given an egocentric RGB video, describe the trajectory using NaVid.
@@ -504,7 +508,8 @@
R2R Data Visualization
-
R2R train split (Training) -> R2R val-unseen split (Evaluation)
+
Cross-scene Generalization Results on R2R
+
(R2R training split -> R2R validation unseen split)
@@ -565,7 +570,8 @@
R2R train split (Training) -> R2R val-unseen split (Eval
-
R2R train split (Training) -> RxR val-unseen split (Evaluation)
+
Cross-scene Generalization Results from R2R to RxR
+
(R2R training split -> RxR validation unseen split)
diff --git a/static/images/data.png b/static/images/data.png
deleted file mode 100644
index d26b8cc..0000000
Binary files a/static/images/data.png and /dev/null differ
diff --git a/static/images/data_collection.png b/static/images/data_collection.png
new file mode 100644
index 0000000..c5e6adc
Binary files /dev/null and b/static/images/data_collection.png differ