Official repository of the paper "CamI2V: Camera-Controlled Image-to-Video Diffusion Model"
Project page (GitHub Pages): https://zgctroy.github.io/CamI2V/
Abstract: Recent advancements have integrated camera pose as a user-friendly and physics-informed condition in video diffusion models, enabling precise camera control. In this paper, we identify one of the key challenges as effectively modeling noisy cross-frame interactions to enhance geometry consistency and camera controllability. We innovatively associate the quality of a condition with its ability to reduce uncertainty and interpret noisy cross-frame features as a form of noisy condition. Recognizing that noisy conditions provide deterministic information while also introducing randomness and potential misguidance due to added noise, we propose applying epipolar attention to only aggregate features along corresponding epipolar lines, thereby accessing an optimal amount of noisy conditions. Additionally, we address scenarios where epipolar lines disappear, commonly caused by rapid camera movements, dynamic objects, or occlusions, ensuring robust performance in diverse environments. Furthermore, we develop a more robust and reproducible evaluation pipeline to address the inaccuracies and instabilities of existing camera control metrics. Our method achieves a 25.64% improvement in camera controllability on the RealEstate10K dataset without compromising dynamics or generation quality and demonstrates strong generalization to out-of-domain images. Training and inference require only 24GB and 12GB of memory, respectively, for 16-frame sequences at 256×256 resolution. We will release all checkpoints, along with training and evaluation code. Dynamic videos are available for viewing on our supplementary anonymous web page.
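The core idea above is to restrict cross-frame attention to corresponding epipolar lines. The following is a minimal sketch of how such an epipolar attention mask can be built from intrinsics `K` and a relative pose `(R, t)`; it is not the released implementation, and the function names, tensor shapes, and pixel threshold are our own assumptions. The paper's handling of disappearing epipolar lines (e.g. under rapid camera motion or occlusion) is omitted here.

```python
# Sketch: epipolar-constrained attention mask (assumed shapes; not the released code).
import torch


def skew(t: torch.Tensor) -> torch.Tensor:
    """Skew-symmetric matrix [t]_x such that [t]_x v = t x v, for a 3-vector t."""
    t0, t1, t2 = t.unbind()
    zero = torch.zeros_like(t0)
    return torch.stack([
        torch.stack([zero, -t2, t1]),
        torch.stack([t2, zero, -t0]),
        torch.stack([-t1, t0, zero]),
    ])


def fundamental_matrix(K_src, K_tgt, R_rel, t_rel):
    """F = K_tgt^{-T} [t]_x R K_src^{-1}, where (R_rel, t_rel) maps
    source-camera coordinates into the target camera."""
    E = skew(t_rel) @ R_rel                      # essential matrix
    return torch.linalg.inv(K_tgt).T @ E @ torch.linalg.inv(K_src)


def epipolar_attention_mask(F, hw, threshold=1.0):
    """Boolean mask of shape (H*W, H*W): entry (i, j) is True when target pixel j
    lies within `threshold` pixels of the epipolar line of source pixel i, so
    cross-frame attention only aggregates features along epipolar lines."""
    H, W = hw
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([xs.reshape(-1), ys.reshape(-1), torch.ones(H * W)], dim=-1)
    lines = pix @ F.T                            # epipolar line l_i = F x_i per source pixel
    num = (lines @ pix.T).abs()                  # |a*x + b*y + c| for every (source, target) pair
    den = lines[:, :2].norm(dim=-1, keepdim=True).clamp(min=1e-6)
    return (num / den) < threshold               # True = keep this pair in attention
```

The resulting mask would be applied on the cross-frame attention logits (masked positions set to -inf before softmax), which is the "optimal amount of noisy conditions" intuition described in the abstract.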
- 2024-11-16: !!!!! The code is not yet complete or cleaned up. Evaluation code, an environment installer, bash scripts, and Gradio demo code are on the way. In addition, we implement the camera control methods via code injection into LVDM, which is not easy for Python beginners. We will restructure the code in about three weeks. !!!!!
- 2024-11-16: Released most of the code, including implementations of MotionCtrl, CameraCtrl, and CamI2V, along with training, inference, and test code.
- 2024-10-14: Checkpoints and the training and evaluation code will be released within a month.
Method $(c_\text{txt,img}=7.5,\ c_\text{cam}=1.0)$ | Parameters | Generation Time $\downarrow$ | RotErr $\downarrow$ | TransErr $\downarrow$ | CamMC $\downarrow$ | FVD (VideoGPT) $\downarrow$ | FVD (StyleGAN) $\downarrow$ |
---|---|---|---|---|---|---|---|
DynamiCrafter | 1.4 B | 8.14 s | 3.3772 | 9.7700 | 11.544 | 117.785 | 103.510 |
DynamiCrafter + MotionCtrl | + 63.4 M | 8.27 s | 0.9771 | 2.4435 | 3.0235 | 68.545 | 61.027 |
DynamiCrafter + CameraCtrl | + 211 M | 8.38 s | 0.6984 | 1.8658 | 2.2445 | 68.422 | 60.235 |
DynamiCrafter + CamI2V | + 261 M | 10.3 s | 0.4257 | 1.4226 | 1.6277 | 63.940 | 54.897 |
DynamiCrafter + CamI2V (Plücker only, no epipolar) | | | 0.7624 | 2.0397 | 2.4542 | 66.237 | 58.179 |
DynamiCrafter + CamI2V (epipolar only, no Plücker) | | | 1.5905 | 5.2980 | 6.2457 | 87.248 | 77.236 |
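For reference, below is a hedged sketch of how the camera-control metrics in the table (RotErr, TransErr, CamMC) are commonly computed from estimated vs. ground-truth extrinsics. The exact definitions, units, scale normalization, and pose alignment follow the paper's evaluation pipeline and may differ from this; the function names here are hypothetical.

```python
# Sketch of camera pose error metrics (one common formulation; not the official evaluation code).
import numpy as np


def rot_err(R_gt: np.ndarray, R_est: np.ndarray) -> float:
    """Geodesic rotation error between two 3x3 rotation matrices."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def trans_err(t_gt: np.ndarray, t_est: np.ndarray) -> float:
    """L2 distance between (scale-normalized) camera translations."""
    return float(np.linalg.norm(t_gt - t_est))


def cam_mc(P_gt, P_est) -> float:
    """One plausible reading of CamMC: L2 distance between the full 3x4 [R|t]
    camera matrices, summed over frames."""
    return float(sum(np.linalg.norm(g[:3, :4] - e[:3, :4]) for g, e in zip(P_gt, P_est)))
```

In practice the estimated poses would come from a structure-from-motion tool run on the generated video, accumulated over all frames of each clip and averaged over the test set.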
zoom in + zoom out
Also see the 512-resolution section of https://zgctroy.github.io/CamI2V/
See the 256-resolution section of https://zgctroy.github.io/CamI2V/
CameraCtrl: https://github.com/hehao13/CameraCtrl
MotionCtrl: https://github.com/TencentARC/MotionCtrl/tree/animatediff
@inproceedings{anonymous2025camiv,
title={CamI2V: Camera-Controlled Image-to-Video Diffusion Model},
author={Anonymous},
booktitle={Submitted to The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=dIZB7jeSUv},
note={under review}
}