
CamI2V: Camera-Controlled Image-to-Video Diffusion Model

Official repository for the paper "CamI2V: Camera-Controlled Image-to-Video Diffusion Model".

Project page: https://zgctroy.github.io/CamI2V/

Abstract: Recent advancements have integrated camera pose as a user-friendly and physics-informed condition in video diffusion models, enabling precise camera control. In this paper, we identify one of the key challenges as effectively modeling noisy cross-frame interactions to enhance geometry consistency and camera controllability. We innovatively associate the quality of a condition with its ability to reduce uncertainty and interpret noisy cross-frame features as a form of noisy condition. Recognizing that noisy conditions provide deterministic information while also introducing randomness and potential misguidance due to added noise, we propose applying epipolar attention to only aggregate features along corresponding epipolar lines, thereby accessing an optimal amount of noisy conditions. Additionally, we address scenarios where epipolar lines disappear, commonly caused by rapid camera movements, dynamic objects, or occlusions, ensuring robust performance in diverse environments. Furthermore, we develop a more robust and reproducible evaluation pipeline to address the inaccuracies and instabilities of existing camera control metrics. Our method achieves a 25.64% improvement in camera controllability on the RealEstate10K dataset without compromising dynamics or generation quality and demonstrates strong generalization to out-of-domain images. Training and inference require only 24GB and 12GB of memory, respectively, for 16-frame sequences at 256×256 resolution. We will release all checkpoints, along with training and evaluation code. Dynamic videos are available for viewing on our supplementary anonymous web page.
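The epipolar constraint described in the abstract can be illustrated with a small NumPy sketch. It is not the repository's implementation. It assumes 4×4 world-to-camera extrinsics, 3×3 pinhole intrinsics, and pixel-center coordinates, and the function names are hypothetical. For a query pixel x_i in frame i, the fundamental matrix F maps it to the line F x_i in frame j, and the query may only attend to keys lying within `threshold` pixels of that line. To cover the case where the line misses frame j entirely, the sketch appends one always-visible "register" key/value so that every softmax row stays well defined. This fallback is an assumption chosen for simplicity and may differ from how the paper handles disappearing epipolar lines.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def fundamental_matrix(K_i, K_j, w2c_i, w2c_j):
    """F with x_j^T F x_i = 0 for corresponding homogeneous pixels in frames i and j."""
    rel = w2c_j @ np.linalg.inv(w2c_i)   # camera-i coordinates -> camera-j coordinates
    R, t = rel[:3, :3], rel[:3, 3]
    E = skew(t) @ R                      # essential matrix
    return np.linalg.inv(K_j).T @ E @ np.linalg.inv(K_i)


def epipolar_mask(K_i, K_j, w2c_i, w2c_j, h, w, threshold=1.0):
    """Boolean (h*w, h*w) mask: entry [q, k] is True if key pixel k of frame j lies
    within `threshold` pixels of the epipolar line of query pixel q of frame i."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pix = np.stack([xs.ravel() + 0.5, ys.ravel() + 0.5, np.ones(h * w)], axis=1)  # (N, 3)
    F = fundamental_matrix(K_i, K_j, w2c_i, w2c_j)
    lines = pix @ F.T                    # row q is the line F @ x_q = (a, b, c)
    norm = np.linalg.norm(lines[:, :2], axis=1, keepdims=True)
    # Point-to-line distance |a*x + b*y + c| / sqrt(a^2 + b^2). If both cameras share a
    # center (t == 0), F is zero, every distance is 0, and no constraint is applied.
    dist = np.abs(lines @ pix.T) / np.maximum(norm, 1e-8)
    return dist < threshold


def epipolar_attention(q, k, v, mask, reg_k, reg_v):
    """Masked softmax attention plus one always-visible register token (assumed design).

    q: (Nq, d); k, v: (Nk, d); mask: (Nq, Nk) bool; reg_k, reg_v: (d,).
    """
    d = q.shape[-1]
    k_all = np.vstack([k, reg_k[None]])
    v_all = np.vstack([v, reg_v[None]])
    m_all = np.hstack([mask, np.ones((mask.shape[0], 1), dtype=bool)])
    logits = np.where(m_all, q @ k_all.T / np.sqrt(d), -np.inf)
    logits -= logits.max(axis=1, keepdims=True)  # finite thanks to the register column
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v_all
```

With two cameras that differ only by a sideways translation, the epipolar lines are image rows, so each query pixel is allowed to attend only to the keys in its own row; a query whose mask row is empty falls back entirely to the register value.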

News and ToDo List

  • 2024-11-16: Note that the code is not yet complete or clean. Evaluation code, an environment installer, bash scripts, and Gradio code are on the way. In addition, the camera control methods are implemented by injecting code into LVDM, which is hard for Python beginners to follow. We plan to restructure the code in about three weeks.
  • 2024-11-16: Released most of the code, including implementations of MotionCtrl, CameraCtrl, and CamI2V, along with training, inference, and test code.
  • 2024-10-14: Checkpoints, training code, and evaluation code will be released within a month.

256×256 resolution, 25 steps, RTX 3090, 16 frames

| Method $(c_\text{txt,img}=7.5,c_\text{cam}=1.0)$ | Parameters | Generation Time $\downarrow$ | RotErr $\downarrow$ | TransErr $\downarrow$ | CamMC $\downarrow$ | FVD (VideoGPT) $\downarrow$ | FVD (StyleGAN) $\downarrow$ |
|---|---|---|---|---|---|---|---|
| DynamiCrafter | 1.4 B | 8.14 s | 3.3772 | 9.7700 | 11.544 | 117.785 | 103.510 |
| DynamiCrafter + MotionCtrl | + 63.4 M | 8.27 s | 0.9771 | 2.4435 | 3.0235 | 68.545 | 61.027 |
| DynamiCrafter + CameraCtrl | + 211 M | 8.38 s | 0.6984 | 1.8658 | 2.2445 | 68.422 | 60.235 |
| DynamiCrafter + CamI2V | + 261 M | 10.3 s | 0.4257 | 1.4226 | 1.6277 | 63.940 | 54.897 |
| DynamiCrafter + CamI2V (Plücker only, no epipolar) | | | 0.7624 | 2.0397 | 2.4542 | 66.237 | 58.179 |
| DynamiCrafter + CamI2V (no Plücker, epipolar only) | | | 1.5905 | 5.2980 | 6.2457 | 87.248 | 77.236 |
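The RotErr, TransErr, and CamMC columns compare camera poses recovered from the generated videos (for example with a structure-from-motion tool) against the ground-truth trajectory. Below is a minimal sketch, assuming CameraCtrl-style definitions (summed geodesic rotation angle and summed translation L2 distance) and taking CamMC as the summed Frobenius norm of the per-frame [R|t] difference. The function names are hypothetical, and the repository's evaluation pipeline may additionally align or normalize the estimated poses, which is not shown here.

```python
import numpy as np


def rot_err(R_gen, R_gt):
    """Sum over frames of the geodesic angle (radians) between rotations. R_*: (F, 3, 3)."""
    rel = R_gen @ np.transpose(R_gt, (0, 2, 1))
    # trace(R_gen @ R_gt^T) = 1 + 2 * cos(theta); clip guards against rounding error.
    cos = (np.trace(rel, axis1=1, axis2=2) - 1.0) / 2.0
    return float(np.sum(np.arccos(np.clip(cos, -1.0, 1.0))))


def trans_err(t_gen, t_gt):
    """Sum over frames of the L2 distance between translations. t_*: (F, 3)."""
    return float(np.sum(np.linalg.norm(t_gen - t_gt, axis=1)))


def cam_mc(R_gen, t_gen, R_gt, t_gt):
    """Sum over frames of the Frobenius norm of the difference of 3x4 [R|t] matrices."""
    P_gen = np.concatenate([R_gen, t_gen[:, :, None]], axis=2)
    P_gt = np.concatenate([R_gt, t_gt[:, :, None]], axis=2)
    return float(np.sum(np.linalg.norm(P_gen - P_gt, axis=(1, 2))))
```

All three are errors, so lower is better, which matches the ↓ arrows in the table header.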

Performance

Visualization

1024×576

Zoom in + zoom out

512×320

See also the 512-resolution section of https://zgctroy.github.io/CamI2V/

256×256

See the 256-resolution section of https://zgctroy.github.io/CamI2V/

Related Repositories

CameraCtrl: https://github.com/hehao13/CameraCtrl

MotionCtrl: https://github.com/TencentARC/MotionCtrl/tree/animatediff

Citation

@inproceedings{anonymous2025camiv,
    title={CamI2V: Camera-Controlled Image-to-Video Diffusion Model},
    author={Anonymous},
    booktitle={Submitted to The Thirteenth International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=dIZB7jeSUv},
    note={under review}
}
