Multi-view Reconstruction via SfM-guided Monocular Depth Estimation
Haoyu Guo*, He Zhu*, Sida Peng, Haotong Lin, Yunzhi Yan, Tao Xie, Wenguan Wang, Xiaowei Zhou, Hujun Bao
CVPR 2025 Oral
git clone https://github.com/zju3dv/Murre.git
conda create -n murre python=3.10
conda activate murre
conda install cudatoolkit=11.8 pytorch==2.0.1 torchvision=0.15.2 torchtriton=2.0.0 -c pytorch -c nvidia # use the correct version of cuda for your system
pip install -r requirements.txt
The pretrained model weights can be downloaded from here.
cd sfm_depth
python get_sfm_depth.py --input_sfm_dir ${your_input_path} --output_sfm_dir ${your_output_path} --processing_res ${your_desired_resolution}
Make sure that the input directory follows the COLMAP output format. You can set the processing resolution to trade off inference speed against reconstruction precision.
The parsed sparse depth maps, camera intrinsics, and camera poses will be stored in ${your_output_path}/sparse_depth, ${your_output_path}/intrinsic, and ${your_output_path}/pose, respectively.
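To illustrate what a sparse depth map contains, here is a minimal sketch of projecting SfM points into a single view. It assumes COLMAP's world-to-camera convention (X_cam = R X_world + t) and a pinhole intrinsic matrix. The helper name sparse_depth_from_points and the dictionary output are hypothetical; get_sfm_depth.py may differ in rounding, storage format, and other details.

```python
def sparse_depth_from_points(points_world, K, R, t, width, height):
    """Project 3D points into one view and keep the nearest depth per pixel.

    Hypothetical helper for illustration; not the repository's implementation.
    K is a 3x3 pinhole intrinsic matrix, (R, t) maps world to camera coordinates.
    Returns a dict {(row, col): depth} holding only the pixels hit by a point.
    """
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    depth = {}
    for X in points_world:
        # World -> camera: X_cam = R @ X + t
        xc, yc, zc = (sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3))
        if zc <= 0:  # point is behind the camera
            continue
        col = int(round(fx * xc / zc + cx))
        row = int(round(fy * yc / zc + cy))
        if 0 <= col < width and 0 <= row < height:
            # If several points land on one pixel, keep the closest one.
            depth[(row, col)] = min(zc, depth.get((row, col), float("inf")))
    return depth


# Toy example: identity pose, three points, one of them behind the camera.
K = [[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
points = [(0.0, 0.0, 2.0), (0.2, 0.0, 4.0), (0.0, 0.0, -1.0)]
sparse = sparse_depth_from_points(points, K, R, t, width=64, height=48)
print(sparse)
```

Only two pixels receive a depth value here, which reflects why the map is called sparse: most pixels have no SfM point and are left empty for the depth model to fill in.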
Run the Murre model to perform SfM-guided monocular depth estimation:
python run.py --checkpoint ${your_ckpt_path} --input_rgb_dir ${your_rgb_path} --input_sdpt_dir ${your_sparse_depth_path} --output_dir ${your_output_path} --denoise_steps 10 --ensemble_size 5 --processing_res ${your_desired_resolution} --max_depth 10.0
For indoor scenes, we recommend setting --max_depth=10.0. For outdoor scenes, consider increasing this value (for example, 80.0).
To filter unreliable SfM depth estimates, adjust --err_thr=${your_error_thresh} (reprojection error threshold) and --nviews_thr=${your_nviews_thresh} (minimum co-visible views).
This improves robustness by excluding depth values from points with high reprojection error or too few observations.
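The filtering rule can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the SfmPoint type and filter_points function are hypothetical, and whether the real thresholds are inclusive or exclusive is not stated in this README.

```python
from dataclasses import dataclass


@dataclass
class SfmPoint:
    """Minimal stand-in for a COLMAP 3D point (hypothetical type for this sketch)."""
    xyz: tuple
    error: float      # mean reprojection error in pixels
    num_views: int    # number of images that observe the point


def filter_points(points, err_thr, nviews_thr):
    """Keep points with low reprojection error and enough observing views.

    Assumption: both comparisons are inclusive.
    """
    return [p for p in points if p.error <= err_thr and p.num_views >= nviews_thr]


points = [
    SfmPoint((0.0, 0.0, 2.0), error=0.4, num_views=5),   # kept
    SfmPoint((1.0, 0.0, 3.0), error=3.2, num_views=6),   # dropped: error too high
    SfmPoint((0.5, 0.2, 2.5), error=0.6, num_views=2),   # dropped: too few views
]
kept = filter_points(points, err_thr=1.0, nviews_thr=3)
print(len(kept))
```

Stricter thresholds leave fewer but more reliable sparse depth values, so the right setting depends on how noisy the SfM reconstruction is.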
Make sure to use the same processing resolution as in the first step.
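One way to avoid a resolution mismatch is to build both commands from a single value. The sketch below only assembles the argument lists using the flags shown in this README; the build_commands helper, the murre_depth output subdirectory, and the resolution value 768 are assumptions made for illustration.

```python
import shlex


def build_commands(sfm_dir, out_dir, rgb_dir, ckpt, processing_res, max_depth=10.0):
    """Return argument lists for steps 1 and 2 that share one processing resolution.

    Hypothetical helper; note that get_sfm_depth.py is run from the sfm_depth directory.
    """
    res = str(processing_res)
    parse_cmd = [
        "python", "get_sfm_depth.py",
        "--input_sfm_dir", sfm_dir,
        "--output_sfm_dir", out_dir,
        "--processing_res", res,
    ]
    run_cmd = [
        "python", "run.py",
        "--checkpoint", ckpt,
        "--input_rgb_dir", rgb_dir,
        "--input_sdpt_dir", f"{out_dir}/sparse_depth",   # written by step 1
        "--output_dir", f"{out_dir}/murre_depth",        # assumed output location
        "--denoise_steps", "10",
        "--ensemble_size", "5",
        "--processing_res", res,
        "--max_depth", str(max_depth),
    ]
    return parse_cmd, run_cmd


parse_cmd, run_cmd = build_commands(
    sfm_dir="data/scene/sparse", out_dir="data/scene/parsed",
    rgb_dir="data/scene/images", ckpt="checkpoints/murre",
    processing_res=768,  # arbitrary example value
)
print(shlex.join(parse_cmd))
print(shlex.join(run_cmd))
```

The printed lines can be pasted into a shell, or the lists can be passed to subprocess.run once the paths point at real data.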
Run the following to perform TSDF fusion on depth maps produced by Murre:
python tsdf_fusion.py --image_dir ${your_rgb_path} --depth_dir ${your_depth_path} --intrinsic_dir ${your_intrinsic_path} --pose_dir ${your_pose_path}
Pass in the depth maps produced by Murre and the camera parameters parsed in the first step.
Adjust --res to balance reconstruction resolution with performance. Set --depth_max to clip depth maps based on your scene type (e.g., lower values for indoor scenes, higher for outdoor).
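For intuition about what fusion and depth clipping do, here is a one-dimensional sketch of standard truncated signed distance averaging along a single camera ray. It is not the code in tsdf_fusion.py: the function names, the truncation distance, and the choice to discard measurements beyond depth_max (rather than clamp them) are all assumptions.

```python
def fuse_ray(depths, voxel_z, trunc, depth_max):
    """Fuse depth observations along one ray into TSDF values and weights."""
    tsdf = [1.0] * len(voxel_z)
    weight = [0.0] * len(voxel_z)
    for d in depths:
        if d <= 0.0 or d > depth_max:   # invalid or clipped measurement
            continue
        for i, z in enumerate(voxel_z):
            sdf = d - z                 # positive in front of the surface
            if sdf < -trunc:            # far behind the surface: unobserved
                continue
            value = min(1.0, sdf / trunc)
            # Running weighted average with unit weight per observation.
            tsdf[i] = (tsdf[i] * weight[i] + value) / (weight[i] + 1.0)
            weight[i] += 1.0
    return tsdf, weight


def surface_depth(tsdf, weight, voxel_z):
    """Locate the first zero crossing by linear interpolation, or return None."""
    for i in range(len(voxel_z) - 1):
        a, b = tsdf[i], tsdf[i + 1]
        if weight[i] > 0 and weight[i + 1] > 0 and a > 0.0 >= b:
            return voxel_z[i] + (voxel_z[i + 1] - voxel_z[i]) * a / (a - b)
    return None


voxel_z = [0.1 * i for i in range(31)]   # voxel centers from 0 m to 3 m
depths = [1.02, 0.98, 1.00, 25.0]        # the last value exceeds depth_max
tsdf, weight = fuse_ray(depths, voxel_z, trunc=0.3, depth_max=10.0)
print(surface_depth(tsdf, weight, voxel_z))
```

The three consistent measurements average to a surface close to 1 m, while the 25 m outlier is ignored because it exceeds depth_max. A finer voxel spacing plays the role of a higher --res: it localizes the surface more precisely at the cost of more voxels to update.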
If you find this code useful for your research, please use the following BibTeX entry.
@inproceedings{guo2025murre,
title={Multi-view Reconstruction via SfM-guided Monocular Depth Estimation},
author={Guo, Haoyu and Zhu, He and Peng, Sida and Lin, Haotong and Yan, Yunzhi and Xie, Tao and Wang, Wenguan and Zhou, Xiaowei and Bao, Hujun},
booktitle={CVPR},
year={2025},
}
We sincerely thank the following excellent projects, from which our work has greatly benefited.