Depth and ego-motion estimation are crucial tasks for laparoscopic navigation and robotic-assisted surgery. Most current self-supervised methods warp one frame onto an adjacent frame using the estimated depth and camera pose, and use the photometric loss between the warped and original frames as the training signal. However, these methods face major challenges from non-Lambertian reflective regions and the textureless surfaces of organs, leading to significant performance degradation and scale ambiguity in monocular depth estimation. In this paper, we introduce a network that predicts depth and ego-motion using spatial-temporal consistency constraints. Spatial consistency is derived from the left and right views of stereo laparoscopic image pairs, while temporal consistency comes from consecutive frames. To better capture semantic information in surgical scenes, we employ the Swin Transformer as the encoder and decoder for depth estimation, owing to its strong semantic segmentation capability. To address illumination variance and scale ambiguity, we incorporate a SIFT loss term to eliminate oversaturated regions in laparoscopic images. Our method is evaluated on the SCARED dataset and achieves strong results.
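For readers unfamiliar with this self-supervised setup, below is a minimal PyTorch sketch of the warping-based photometric supervision described above. The function names (`warp_to_target`, `photometric_loss`), tensor shapes, and the bare L1 term are illustrative assumptions for exposition, not the exact implementation used in this repo.

```python
import torch
import torch.nn.functional as F


def warp_to_target(src_img, tgt_depth, T_tgt_to_src, K, K_inv):
    """Synthesize the target view by sampling the source frame at pixel
    locations obtained from the predicted depth and relative pose.

    src_img:      (B, 3, H, W) adjacent (source) frame
    tgt_depth:    (B, 1, H, W) predicted depth of the target frame
    T_tgt_to_src: (B, 4, 4) predicted relative camera pose
    K, K_inv:     (B, 3, 3) camera intrinsics and their inverse
    """
    B, _, H, W = src_img.shape
    device = src_img.device

    # Homogeneous pixel grid, shape (B, 3, H*W)
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device), torch.arange(W, device=device), indexing="ij"
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()
    pix = pix.view(1, 3, -1).expand(B, -1, -1)

    # Back-project to 3D, transform into the source camera, re-project
    cam = tgt_depth.view(B, 1, -1) * (K_inv @ pix)                     # (B, 3, H*W)
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=device)], dim=1)
    src_cam = (T_tgt_to_src @ cam_h)[:, :3, :]                         # (B, 3, H*W)
    uv = K @ src_cam
    uv = uv[:, :2, :] / (uv[:, 2:3, :] + 1e-7)

    # Normalize sampling grid to [-1, 1] for grid_sample
    u = uv[:, 0].view(B, H, W) / (W - 1) * 2 - 1
    v = uv[:, 1].view(B, H, W) / (H - 1) * 2 - 1
    grid = torch.stack([u, v], dim=-1)                                 # (B, H, W, 2)
    return F.grid_sample(src_img, grid, padding_mode="border", align_corners=True)


def photometric_loss(tgt_img, warped_img):
    # Bare L1 photometric difference; full objectives usually also add SSIM
    # and edge-aware depth smoothness terms.
    return (tgt_img - warped_img).abs().mean()
```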
| Method | Year | Abs Rel | Sq Rel | RMSE | RMSE log | δ |
|---|---|---|---|---|---|---|
| Fang et al. | 2020 | 0.078 | 0.794 | 6.794 | 0.109 | 0.946 |
| Endo-SfM | 2021 | 0.062 | 0.606 | 5.726 | 0.093 | 0.957 |
| AF-SfMLearner | 2022 | 0.063 | 0.538 | 5.597 | 0.089 | 0.974 |
| Yang et al. | 2024 | 0.062 | 0.558 | 5.585 | 0.090 | 0.962 |
| Ours | - | 0.057 | 0.436 | 4.972 | 0.081 | 0.972 |
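The table reports the standard monocular depth metrics. As a reference, here is a minimal NumPy sketch of how these measures are typically computed (assuming `gt` and `pred` are arrays of valid depths); this mirrors common practice rather than the exact code in `evaluate_depth.py`.

```python
import numpy as np

def depth_metrics(gt, pred):
    """Standard depth metrics: Abs Rel, Sq Rel, RMSE, RMSE log, and δ (< 1.25)."""
    thresh = np.maximum(gt / pred, pred / gt)
    delta = (thresh < 1.25).mean()

    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    return abs_rel, sq_rel, rmse, rmse_log, delta

# For scale-ambiguous monocular predictions, median scaling is commonly applied first:
# pred *= np.median(gt) / np.median(pred)
```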
Install required dependencies with pip:
```bash
pip install -r requirements.txt
```
Download the pretrained model from depth_anything_vitb14, create a folder named `pretrained_model` in this repo, and place the downloaded weights in it.
Please follow AF-SfMLearner to prepare the SCARED dataset.
Train the model end to end:
```bash
CUDA_VISIBLE_DEVICES=0 python train_end_to_end.py --data_path <your_data_path> --log_dir './logs'
```
Export ground truth depth and pose before evaluation:
```bash
python export_gt_depth.py --data_path PATH_TO_YOUR_DATA --split endovis
python export_gt_pose.py --data_path PATH_TO_YOUR_DATA --split endovis --sequence YOUR_SEQUENCE
```
To evaluate depth estimation:
```bash
python evaluate_depth.py --data_path PATH_TO_YOUR_DATA --load_weights_folder PATH_TO_YOUR_MODEL --eval_mono
```
To evaluate pose estimation:
```bash
python evaluate_pose.py --data_path PATH_TO_YOUR_DATA --load_weights_folder PATH_TO_YOUR_MODEL --eval_mono
```
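Pose evaluation for monocular methods is commonly reported as absolute trajectory error (ATE) over short snippets, with a single scale factor fitted because monocular predictions are scale-ambiguous. A rough NumPy sketch of that computation is below; the function name `compute_ate` and the snippet-based alignment are assumptions and may differ from what `evaluate_pose.py` actually does.

```python
import numpy as np

def compute_ate(gt_xyz, pred_xyz):
    """ATE between ground-truth and predicted camera positions, each of shape (N, 3),
    after translating to a common origin and fitting one global scale factor."""
    pred = pred_xyz - pred_xyz[0] + gt_xyz[0]            # align starting positions
    scale = np.sum(gt_xyz * pred) / np.sum(pred ** 2)    # least-squares scale fit
    return np.sqrt(np.mean(np.sum((gt_xyz - scale * pred) ** 2, axis=-1)))
```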
To generate depth maps:
```bash
python generate_pred.py
```
To generate a point cloud map:
```bash
python generate_pred_nocolor.py
cd depth2pointcloud
python generate_depthmap.py
python generate_pc_rgb.py
```
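As a reference for what the depth-to-point-cloud step does, here is a minimal NumPy sketch that back-projects a depth map into 3D points using the camera intrinsics and optionally attaches RGB colour. The function and argument names are illustrative, not the actual interface of `generate_pc_rgb.py`.

```python
import numpy as np

def depth_to_pointcloud(depth, K, rgb=None):
    """Back-project a depth map (H, W) into an (N, 3) point cloud with intrinsics K (3, 3).
    If an aligned RGB image (H, W, 3) is given, returns (N, 6) with colours appended."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))     # pixel coordinates
    z = depth.reshape(-1)
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]        # X = (u - cx) * Z / fx
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]        # Y = (v - cy) * Z / fy
    points = np.stack([x, y, z], axis=-1)
    if rgb is not None:
        points = np.concatenate([points, rgb.reshape(-1, 3)], axis=-1)
    return points
```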
Our code builds on the implementations of AF-SfMLearner and Depth-Anything. We thank the authors for their excellent work.