Code for the ICCV 2023 paper "Deep geometry-aware camera self-calibration from video" by Annika Hagemann, Moritz Knorr and Christoph Stiller.
This codebase allows estimating camera intrinsics from monocular video without calibration targets. It is derived from DROID-SLAM by Teed et al. (https://github.com/princeton-vl/DROID-SLAM) and extends the deep visual SLAM system with a self-calibrating bundle adjustment layer.
So far, the repository contains a demo script for inference and code for reproducing the results from our paper. An extended version, containing the training code for self-calibration, will be made available soon.
This software is a research prototype, solely developed for and published as part of the publication "Deep geometry-aware camera self-calibration from video", Hagemann et al. ICCV 2023. It will not be maintained or monitored.
To run the demo and to test on your own sequences, you need a GPU with 12 GB of memory. To reproduce the results from the paper, a GPU with 16 GB is required.
- Clone the additional thirdparty requirements:
  git submodule update --init --recursive
- Create an anaconda environment with all requirements:
  conda env create -f environment_novis.yaml
  conda activate droidenv
  pip install evo --upgrade --no-binary evo
  If you get stuck at "Solving environment", try using our detailed exported environment misc/environment_detailed_vis.yaml (with visualization) or misc/environment_detailed.yaml (without visualization) instead of environment_novis.yaml.
- Compile the extensions (this takes several minutes):
  python setup.py install
- Download the exemplary sequence abandonedfactory, unzip it, and put it into the folder datasets/demo.
- Run the demo script:
  python demo.py --imagedir=datasets/demo/abandonedfactory --opt_intr --num_images=300
The estimated intrinsics will appear in the terminal. To output a video with the estimated intrinsics, use the "--video_calib" flag when running the demo; note that this slows down inference. To run the 3D visualization, use the "--visualize" flag. Pressing the key "r" in the Open3D viewer allows you to interact with the visualization during inference.
You can use the "--num_images" flag to adjust the number of images. For suitable sequences (diverse motion, structured environment), around 300 images are often sufficient. Furthermore, you can adjust the image size to reduce computation time, and the stride to only use every n-th image.
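For example, a run on the demo sequence that also writes the calibration video and shows the 3D visualization combines the flags described above:
  python demo.py --imagedir=datasets/demo/abandonedfactory --opt_intr --num_images=300 --video_calib --visualize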
- To reproduce the results from the paper, download the different datasets:
  - TartanAir monocular test sequences from the CVPR 2020 SLAM challenge with groundtruth poses; put them into datasets/TartanAir
  - EuRoC sequences (ASL) and groundtruth poses; put them into datasets/EuRoC
  - TUM-RGBD fr1 sequences; put them into datasets/TUM-RGBD
  The expected folder structure of each dataset can be seen in the file evaluation_scripts/DroidCalib/config.yaml.
- Run the evaluation:
  python evaluation_scripts/DroidCalib/eval_script.py
  This will create files figures/*.csv containing the evaluation results for the different settings.
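  As a minimal sketch of how these results could be inspected afterwards (assuming only that standard CSV files are written to figures/, as stated above; the column layout depends on the evaluation settings):
  import glob
  import pandas as pd

  # Load every evaluation result file produced by eval_script.py and print a short summary.
  for path in sorted(glob.glob("figures/*.csv")):
      df = pd.read_csv(path)
      print(path)
      print(df.head())      # first rows of the results table
      print(df.describe())  # basic statistics over the numeric columns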
To run DroidCalib on your own data, you only need a monocular video, stored as an ordered set of images. If the images are distortion-free, use the pinhole model:
python demo.py --imagedir=YOUR_IMAGE_PATH --opt_intr
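If your recording is a single video file rather than a folder of images, you can first extract an ordered image sequence, for example with ffmpeg (file names and frame rate here are only illustrative):
  mkdir -p YOUR_IMAGE_PATH
  ffmpeg -i YOUR_VIDEO.mp4 -vf fps=10 YOUR_IMAGE_PATH/%06d.png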
To approximate radial distortion, we have implemented the unified camera model ("mei"):
python demo.py --imagedir=YOUR_IMAGE_PATH --opt_intr --camera_model=mei
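For reference, below is a minimal sketch of how the unified (Mei) camera model maps a 3D point to a pixel; the function and parameter names are illustrative and do not refer to variables in this codebase:
  import numpy as np

  def project_unified(point_3d, fx, fy, cx, cy, xi):
      # Unified camera model: project onto the unit sphere, then onto the image plane.
      # xi is the distortion parameter; xi = 0 reduces to the pinhole model.
      x, y, z = point_3d
      d = np.sqrt(x**2 + y**2 + z**2)
      mx = x / (z + xi * d)
      my = y / (z + xi * d)
      return fx * mx + cx, fy * my + cy

  # Example: a point one meter in front of the camera, slightly off-center.
  print(project_unified((0.1, 0.05, 1.0), fx=500.0, fy=500.0, cx=320.0, cy=240.0, xi=0.8))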
To estimate only the focal length, use the "focal" camera model:
python demo.py --imagedir=YOUR_IMAGE_PATH --opt_intr --camera_model=focal
This can be useful if the camera motion is not diverse enough to make all intrinsic parameters observable (e.g. planar motion). Currently, this is only supported for distortion-free images.
- The video should contain diverse camera motion (translation and rotations around the different axes) for the intrinsics to be observable.
- Avoid motion blur and other artifacts in the images.
- There should be some structure in the scene (e.g. not just the sky or an unstructured wall).
This repository is derived from DROID-SLAM by Teed et al. (https://github.com/princeton-vl/DROID-SLAM); we thank the authors for making their source code available. All files without a header are left unchanged and originate from the authors of DROID-SLAM. Files with a header were adapted for the self-calibration functionality.
DroidCalib is open-sourced under the AGPL-3.0 license. See the LICENSE file for details.
For a list of other open source components included in DroidCalib, see the file 3rd-party-licenses.txt.