Official repository of the paper "CamI2V: Camera-Controlled Image-to-Video Diffusion Model"
Project page (GitHub Pages): https://zgctroy.github.io/CamI2V/
Abstract: Recent advancements have integrated camera pose as a user-friendly and physics-informed condition in video diffusion models, enabling precise camera control. In this paper, we identify one of the key challenges as effectively modeling noisy cross-frame interactions to enhance geometry consistency and camera controllability. We innovatively associate the quality of a condition with its ability to reduce uncertainty and interpret noisy cross-frame features as a form of noisy condition. Recognizing that noisy conditions provide deterministic information while also introducing randomness and potential misguidance due to added noise, we propose applying epipolar attention to only aggregate features along corresponding epipolar lines, thereby accessing an optimal amount of noisy conditions. Additionally, we address scenarios where epipolar lines disappear, commonly caused by rapid camera movements, dynamic objects, or occlusions, ensuring robust performance in diverse environments. Furthermore, we develop a more robust and reproducible evaluation pipeline to address the inaccuracies and instabilities of existing camera control metrics. Our method achieves a 25.64% improvement in camera controllability on the RealEstate10K dataset without compromising dynamics or generation quality and demonstrates strong generalization to out-of-domain images. Training and inference require only 24GB and 12GB of memory, respectively, for 16-frame sequences at 256×256 resolution. We will release all checkpoints, along with training and evaluation code. Dynamic videos are available for viewing on our supplementary anonymous web page.
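The core idea above is to restrict cross-frame attention to corresponding epipolar lines. The following is a minimal sketch of how such an epipolar attention mask can be built from intrinsics `K` and a relative pose `(R, t)`; it is not the released implementation, and the function names, tensor shapes, and pixel threshold are our own assumptions. The paper's handling of disappearing epipolar lines (e.g. under rapid camera motion or occlusion) is omitted here.

```python
# Sketch: epipolar-constrained attention mask (assumed shapes; not the released code).
import torch


def skew(t: torch.Tensor) -> torch.Tensor:
    """Skew-symmetric matrix [t]_x such that [t]_x v = t x v, for a 3-vector t."""
    t0, t1, t2 = t.unbind()
    zero = torch.zeros_like(t0)
    return torch.stack([
        torch.stack([zero, -t2, t1]),
        torch.stack([t2, zero, -t0]),
        torch.stack([-t1, t0, zero]),
    ])


def fundamental_matrix(K_src, K_tgt, R_rel, t_rel):
    """F = K_tgt^{-T} [t]_x R K_src^{-1}, where (R_rel, t_rel) maps
    source-camera coordinates into the target camera."""
    E = skew(t_rel) @ R_rel                      # essential matrix
    return torch.linalg.inv(K_tgt).T @ E @ torch.linalg.inv(K_src)


def epipolar_attention_mask(F, hw, threshold=1.0):
    """Boolean mask of shape (H*W, H*W): entry (i, j) is True when target pixel j
    lies within `threshold` pixels of the epipolar line of source pixel i, so
    cross-frame attention only aggregates features along epipolar lines."""
    H, W = hw
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([xs.reshape(-1), ys.reshape(-1), torch.ones(H * W)], dim=-1)
    lines = pix @ F.T                            # epipolar line l_i = F x_i per source pixel
    num = (lines @ pix.T).abs()                  # |a*x + b*y + c| for every (source, target) pair
    den = lines[:, :2].norm(dim=-1, keepdim=True).clamp(min=1e-6)
    return (num / den) < threshold               # True = keep this pair in attention
```

The resulting mask would be applied on the cross-frame attention logits (masked positions set to -inf before softmax), which is the "optimal amount of noisy conditions" intuition described in the abstract.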
- 2024-11-16: !!!!! The code is not yet complete or cleaned up. Evaluation code, an environment installer, bash scripts, and Gradio demo code are on the way. In addition, we implement the camera control methods via code injection into LVDM, which is not easy for Python beginners. We will restructure the code in about three weeks. !!!!!
- 2024-11-16: Released most of the code, including implementations of MotionCtrl, CameraCtrl, and CamI2V, along with training, inference, and test code.
- 2024-10-14: Checkpoints and the training and evaluation code will be released within a month.
Method $(c_\text{txt,img}=7.5,\ c_\text{cam}=1.0)$ | Parameters | Generation Time $\downarrow$ | RotErr $\downarrow$ | TransErr $\downarrow$ | CamMC $\downarrow$ | FVD (VideoGPT) $\downarrow$ | FVD (StyleGAN) $\downarrow$ |
---|---|---|---|---|---|---|---|
DynamiCrafter | 1.4 B | 8.14 s | 3.3772 | 9.7700 | 11.544 | 117.785 | 103.510 |
DynamiCrafter + MotionCtrl | + 63.4 M | 8.27 s | 0.9771 | 2.4435 | 3.0235 | 68.545 | 61.027 |
DynamiCrafter + CameraCtrl | + 211 M | 8.38 s | 0.6984 | 1.8658 | 2.2445 | 68.422 | 60.235 |
DynamiCrafter + CamI2V | + 261 M | 10.3 s | 0.4257 | 1.4226 | 1.6277 | 63.940 | 54.897 |
DynamiCrafter + CamI2V (Plücker only, no epipolar) | | | 0.7624 | 2.0397 | 2.4542 | 66.237 | 58.179 |
DynamiCrafter + CamI2V (epipolar only, no Plücker) | | | 1.5905 | 5.2980 | 6.2457 | 87.248 | 77.236 |
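For reference, below is a hedged sketch of how the camera-control metrics in the table (RotErr, TransErr, CamMC) are commonly computed from estimated vs. ground-truth extrinsics. The exact definitions, units, scale normalization, and pose alignment follow the paper's evaluation pipeline and may differ from this; the function names here are hypothetical.

```python
# Sketch of camera pose error metrics (one common formulation; not the official evaluation code).
import numpy as np


def rot_err(R_gt: np.ndarray, R_est: np.ndarray) -> float:
    """Geodesic rotation error between two 3x3 rotation matrices."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def trans_err(t_gt: np.ndarray, t_est: np.ndarray) -> float:
    """L2 distance between (scale-normalized) camera translations."""
    return float(np.linalg.norm(t_gt - t_est))


def cam_mc(P_gt, P_est) -> float:
    """One plausible reading of CamMC: L2 distance between the full 3x4 [R|t]
    camera matrices, summed over frames."""
    return float(sum(np.linalg.norm(g[:3, :4] - e[:3, :4]) for g, e in zip(P_gt, P_est)))
```

In practice the estimated poses would come from a structure-from-motion tool run on the generated video, accumulated over all frames of each clip and averaged over the test set.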
zoom in + zoom out
Also see the 512-resolution section of https://zgctroy.github.io/CamI2V/
See the 256-resolution section of https://zgctroy.github.io/CamI2V/
CameraCtrl: https://github.com/hehao13/CameraCtrl
MotionCtrl: https://github.com/TencentARC/MotionCtrl/tree/animatediff
@inproceedings{anonymous2025camiv,
title={CamI2V: Camera-Controlled Image-to-Video Diffusion Model},
author={Anonymous},
booktitle={Submitted to The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=dIZB7jeSUv},
note={under review}
}