Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment

Lijie Liu^*, Tianxiang Ma^*, Bingchuan Li^{* †}, Zhuowei Chen^*, Jiawei Liu, Qian He, Xinglong Wu
^*Equal contribution, ^†Project lead
Intelligent Creation Team, ByteDance

🔥 Latest News!

Phantom-wan is coming soon! We are adapting the Phantom framework into the Wan2.1 video generation model. The inference code and model will be open-sourced.

Overview

Phantom is a unified video generation framework for single- and multi-subject references, built on existing text-to-video and image-to-video architectures. It redesigns the joint text-image injection model and uses text-image-video triplet data to achieve cross-modal alignment. It also emphasizes subject consistency in human generation, which improves ID-preserving video generation.

🆚 Comparative Results

Identity Preserving Video Generation.
Single Reference Subject-to-Video Generation.
Multi-Reference Subject-to-Video Generation.

Acknowledgements

We would like to express our gratitude to the SEED team for their support. Special thanks to Lu Jiang, Haoyuan Guo, Zhibei Ma, and Sen Wang for their assistance with the model and data. We are also very grateful to Siying Chen, Qingyang Li, and Wei Han for their help with the evaluation.

BibTeX

@article{liu2025phantom,
  title={Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment},
  author={Liu, Lijie and Ma, Tianxiang and Li, Bingchuan and Chen, Zhuowei and Liu, Jiawei and He, Qian and Wu, Xinglong},
  journal={arXiv preprint arXiv:2502.11079},
  year={2025}
}
