Performance Evaluation
We benchmarked our system in terms of ATE (Absolute Trajectory Error), RPE (Relative Pose Error), and computational cost against other top-performing open-source implementations, namely OKVIS [Leutenegger et al.], VINS-Mono [Qin et al.], and ROVIO [Bloesch et al.], on publicly available datasets. Our implementation achieves comparable accuracy at a fraction of the computational cost. On a desktop PC equipped with an Intel Core i7 CPU @ 3.6 GHz, our system runs at around 140 Hz with low CPU usage. By comparison, OKVIS and VINS-Mono run at around 20 Hz, and ROVIO runs at around 60 Hz. The runtime of our system could be further improved by better utilizing CPU cache and memory.
OKVIS and VINS-Mono are optimization-based: they iteratively optimize over keyframes, which in general yields more accurate pose estimates at the price of higher latency and computational cost. ROVIO and XIVO are filtering-based: they are causal and much cheaper computationally, yet they produce pose estimates comparable to those of optimization-based methods.
In addition, OKVIS runs on stereo images, whereas the other three methods use only monocular images.
We benchmarked the runtime of OKVIS, VINS-Mono, ROVIO and XIVO on a desktop machine equipped with an Intel Core i7 CPU @ 3.6 GHz. The table below shows the runtime of the feature processing and state update modules.
Module | OKVIS (Stereo+Keyframe) | VINS-Mono (Keyframe) | ROVIO | XIVO |
---|---|---|---|---|
Feature detection & matching | 15 ms | 20 ms | 1 ms* | 3 ms |
State update | 42 ms | 50 ms | 13 ms | 4 ms |
* ROVIO is a 'direct' method that skips the feature matching step and directly uses the photometric error as the innovation term in the EKF update step. Because it uses an Iterated Extended Kalman Filter (IEKF) for the state update, it is slower than our EKF-based method.
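The cost difference between a single EKF update and an iterated one can be illustrated with a toy scalar example. The sketch below is a generic textbook-style illustration, not code from ROVIO or XIVO; the function name `iekf_update`, the measurement model `h(x) = x^2`, and all numeric values are assumptions chosen for the example.

```python
def iekf_update(x_prior, P, z, R, h, h_jac, iterations=1):
    """Scalar (iterated) EKF measurement update.

    With iterations=1 this is the standard EKF update. Each extra
    iteration relinearizes h around the latest estimate, which costs
    one more Jacobian evaluation and gain computation.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    x = x_prior
    for _ in range(iterations):
        H = h_jac(x)                     # linearize at the current iterate
        K = P * H / (H * P * H + R)      # Kalman gain for this linearization
        # Right-hand side uses the previous iterate x before it is overwritten.
        x = x_prior + K * (z - h(x) - H * (x_prior - x))
    P_post = (1.0 - K * H) * P           # posterior variance from the last linearization
    return x, P_post


# Hypothetical nonlinear measurement model and its derivative.
h = lambda x: x * x
h_jac = lambda x: 2.0 * x

# Prior far from the value implied by the measurement z = 4.0.
x_ekf, _ = iekf_update(1.0, 1.0, 4.0, 0.01, h, h_jac, iterations=1)
x_iekf, _ = iekf_update(1.0, 1.0, 4.0, 0.01, h, h_jac, iterations=5)

print(f"EKF  estimate: {x_ekf:.4f}, residual {abs(4.0 - h(x_ekf)):.4f}")
print(f"IEKF estimate: {x_iekf:.4f}, residual {abs(4.0 - h(x_iekf)):.4f}")
```

In this toy case the iterated update ends with a much smaller measurement residual, at the price of repeating the linearization and gain computation on every iteration.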
OKVIS and VINS-Mono (marked with Keyframe) perform iterative nonlinear least-squares optimization on keyframes for state estimation, and are thus much slower in the state update step.
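As a rough sanity check, the per-module timings in the table can be converted into an approximate frame rate. The sketch below assumes that the two listed modules dominate the per-frame cost and run back to back, which is a simplification; the helper name `frame_rate_hz` is hypothetical.

```python
# Per-frame runtimes in milliseconds, copied from the table above as
# (feature detection & matching, state update).
runtimes_ms = {
    "OKVIS": (15, 42),
    "VINS-Mono": (20, 50),  # state-update entry read as 50 ms
    "ROVIO": (1, 13),
    "XIVO": (3, 4),
}


def frame_rate_hz(feature_ms, update_ms):
    """Approximate frame rate if the two modules run sequentially."""
    return 1000.0 / (feature_ms + update_ms)


for name, (feature_ms, update_ms) in runtimes_ms.items():
    total_ms = feature_ms + update_ms
    print(f"{name}: {total_ms} ms/frame, about {frame_rate_hz(feature_ms, update_ms):.0f} Hz")
```

Under these assumptions XIVO's 7 ms per frame corresponds to roughly 143 Hz, in line with the reported rate of around 140 Hz.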
We compared the performance of our system in terms of ATE and RPE on two publicly available datasets: TUM-VI and EuRoC. We achieve comparable pose estimation accuracy at a fraction of the computational cost of the top-performing open-source implementations.
The following table shows the performance on 6 indoor sequences where ground-truth poses are available. The numbers for OKVIS, VINS-Mono, and ROVIO are taken from the TUM-VI benchmark paper. The evaluation script for XIVO can be found in misc/run_all.sh.
Sequence | length | OKVIS (Stereo+Keyframe) | VINS-Mono (Keyframe) | ROVIO | XIVO |
---|---|---|---|---|---|
room1 | 156m | 0.06m | 0.07m | 0.16m | 0.06m |
room2 | 142m | 0.11m | 0.07m | 0.33m | 0.11m |
room3 | 135m | 0.07m | 0.11m | 0.15m | 0.16m |
room4 | 68m | 0.03m | 0.04m | 0.09m | 0.07m |
room5 | 131m | 0.07m | 0.20m | 0.12m | 0.11m |
room6 | 67m | 0.04m | 0.08m | 0.05m | 0.05m |
Table 1. RMSE ATE in meters. Methods marked with Keyframe are keyframe-based; the others are recursive approaches.
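For readers unfamiliar with the metric, the following is a minimal sketch of how an RMSE ATE value is computed from two trajectories. It assumes the estimated positions have already been time-associated with the ground truth and aligned to the ground-truth frame (ATE evaluations normally perform such an alignment first). The function name `ate_rmse` and the sample trajectories are hypothetical; this is not the evaluation script used for the table.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of position errors between two pre-aligned trajectories.

    Both arguments are equally long sequences of (x, y, z) positions
    in meters, where entry i of each refers to the same timestamp.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and equally long")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est_pos, gt_pos))
        for est_pos, gt_pos in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy data: the estimate deviates 0.1 m sideways from a straight path.
gt_positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est_positions = [(0.0, 0.1, 0.0), (1.0, -0.1, 0.0), (2.0, 0.1, 0.0)]

print(f"ATE RMSE: {ate_rmse(est_positions, gt_positions):.3f} m")
```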
Sequence | OKVIS (Stereo+Keyframe) | VINS-Mono (Keyframe) | ROVIO | XIVO |
---|---|---|---|---|
room1 | 0.013m/0.43° | 0.015m/0.44° | 0.029m/0.53° | 0.020m/0.53° |
room2 | 0.015m/0.62° | 0.017m/0.63° | 0.030m/0.67° | 0.048m/0.72° |
room3 | 0.012m/0.64° | 0.023m/0.63° | 0.027m/0.66° | 0.069m/0.74° |
room4 | 0.012m/0.57° | 0.015m/0.41° | 0.022m/0.61° | 0.022m/0.64° |
room5 | 0.012m/0.47° | 0.026m/0.47° | 0.031m/0.60° | 0.025m/0.57° |
room6 | 0.012m/0.49° | 0.014m/0.44° | 0.019m/0.50° | 0.022m/0.53° |
Table 2. RMSE RPE in translation (meters) and rotation (degrees). Methods marked with Keyframe are keyframe-based; the others are recursive approaches.
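The RPE compares relative motions over a fixed interval rather than absolute positions, so it does not require a global alignment. The sketch below shows one common way to compute it, assuming time-associated poses given as (R, t) pairs with R a 3x3 rotation matrix and t a translation in meters. The names `relative_pose`, `rpe_rmse`, `rot_z`, and the frame offset `delta` are hypothetical, and the interval used for the numbers in the table is not stated here.

```python
import math


def transpose(R):
    return [[R[j][i] for j in range(3)] for i in range(3)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def relative_pose(pose_a, pose_b):
    """Return inverse(pose_a) * pose_b for poses given as (R, t)."""
    R_a, t_a = pose_a
    R_b, t_b = pose_b
    R_a_inv = transpose(R_a)
    return matmul(R_a_inv, R_b), matvec(R_a_inv, [t_b[i] - t_a[i] for i in range(3)])


def rpe_rmse(estimated, ground_truth, delta=1):
    """RMSE of relative pose error as (translation in m, rotation in deg)."""
    if len(estimated) != len(ground_truth) or len(ground_truth) <= delta:
        raise ValueError("need equally long trajectories with more than delta poses")
    trans_sq, rot_sq = [], []
    for i in range(len(ground_truth) - delta):
        gt_rel = relative_pose(ground_truth[i], ground_truth[i + delta])
        est_rel = relative_pose(estimated[i], estimated[i + delta])
        R_err, t_err = relative_pose(gt_rel, est_rel)  # inverse(gt_rel) * est_rel
        trans_sq.append(sum(c * c for c in t_err))
        # Rotation angle from the trace, clamped against rounding error.
        cos_angle = (R_err[0][0] + R_err[1][1] + R_err[2][2] - 1.0) / 2.0
        rot_sq.append(math.degrees(math.acos(max(-1.0, min(1.0, cos_angle)))) ** 2)
    n = len(trans_sq)
    return math.sqrt(sum(trans_sq) / n), math.sqrt(sum(rot_sq) / n)


def rot_z(deg):
    """Rotation about the z axis by the given angle in degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Toy data: ground truth moves 1 m per step along x without rotating;
# the estimate overshoots each step and drifts 1 degree in yaw per step.
gt_poses = [(rot_z(0), [0.0, 0.0, 0.0]), (rot_z(0), [1.0, 0.0, 0.0]), (rot_z(0), [2.0, 0.0, 0.0])]
est_poses = [(rot_z(0), [0.0, 0.0, 0.0]), (rot_z(1), [1.1, 0.0, 0.0]), (rot_z(2), [2.2, 0.0, 0.0])]

rpe_trans, rpe_rot = rpe_rmse(est_poses, gt_poses)
print(f"RPE RMSE: {rpe_trans:.3f} m / {rpe_rot:.2f} deg")
```

Because each error term is built from relative motions, a constant offset between the two trajectories cancels out, which is why RPE is often reported alongside ATE as a measure of local drift.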
Benchmark results on the EuRoC dataset will be available soon.