
DRA-Ctrl

This is the official implementation of DRA-Ctrl.

Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis

by Hengyuan Cao, Yutong Feng, Biao Gong, Yijing Tian, Yunhong Lu, Chuang Liu, and Bin Wang

Links: arXiv paper (https://arxiv.org/abs/2505.23325) · Project Page · HuggingFace · GitHub

Updates

[2025-07-10] Models are now moved to the CPU when not in use to reduce GPU memory usage, and quantization is applied to further decrease memory requirements, so our work should run on consumer-grade GPUs. For specific usage, see the Get Started section below. Please note: you need to update your environment dependencies against requirements.txt for the new features to function properly.

[2025-07-01] Added a new Gradio app (gradio_app_hf.py), designed similarly to our HuggingFace Space, that makes it easier to switch tasks, adjust parameters, and test the examples directly. The previous Gradio app (gradio_app.py) remains unchanged.

✅ TODOs

  • release code
  • release checkpoints
  • use quantized version to save VRAM
  • use FramePack as the base model

🔍 Introduction

Abstract

Video generative models can be regarded as world simulators due to their ability to capture dynamic, continuous changes inherent in real-world environments. These models integrate high-dimensional information across visual, temporal, spatial, and causal dimensions, enabling predictions of subjects in various states. A natural and valuable research direction is to explore whether a fully trained video generative model in high-dimensional space can effectively support lower-dimensional tasks such as controllable image generation. In this work, we propose a paradigm for video-to-image knowledge compression and task adaptation, termed Dimension-Reduction Attack (DRA-Ctrl), which utilizes the strengths of video models, including long-range context modeling and flattened full attention, to perform various generation tasks. Specifically, to address the challenging gap between continuous video frames and discrete image generation, we introduce a mixup-based transition strategy that ensures smooth adaptation. Moreover, we redesign the attention structure with a tailored masking mechanism to better align text prompts with image-level control. Experiments across diverse image generation tasks, such as subject-driven and spatially conditioned generation, show that repurposed video models outperform those trained directly on images. These results highlight the untapped potential of large-scale video generators for broader visual applications. DRA-Ctrl provides new insights into reusing resource-intensive video models and lays the foundation for future unified generative models across visual modalities.

🚀 Quick Start

Hardware Requirements

Our method is implemented on Linux with an H800 80GB GPU. Peak VRAM consumption stays below 45GB.

Dependencies

conda create --name dra_ctrl python=3.12
conda activate dra_ctrl
pip install -r requirements.txt

Checkpoints

We use the community fork of tencent/HunyuanVideo-I2V, which provides Diffusers-format weights, to initialize the model.

You can download the LoRA weights for the various DRA-Ctrl tasks at this link. A scripted download sketch follows the directory tree below.

The checkpoint directory is shown below.

DRA-Ctrl/
└── ckpts/
    ├── HunyuanVideo-I2V/
    │   ├── image_processor/
    │   ├── scheduler/
    │   └── ...
    ├── depth-anything-small-hf/
    │   ├── model.safetensors
    │   └── ...
    ├── canny.safetensors
    ├── coloring.safetensors
    ├── deblurring.safetensors
    ├── depth.safetensors
    ├── depth_pred.safetensors
    ├── fill.safetensors
    ├── sr.safetensors
    ├── subject_driven.safetensors
    └── style_transfer.safetensors
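
If you prefer to fetch the weights with a script, here is a minimal sketch using huggingface_hub. The repo ids below are assumptions (only the Depth Anything id is a known public checkpoint); substitute the ids from the links above if they differ.

# Hedged download sketch; repo ids marked below are assumptions.
from huggingface_hub import snapshot_download, hf_hub_download

# Diffusers-format base weights (assumed community fork repo id)
snapshot_download(
    repo_id="hunyuanvideo-community/HunyuanVideo-I2V",
    local_dir="ckpts/HunyuanVideo-I2V",
)

# Depth Anything checkpoint used for depth-map extraction
snapshot_download(
    repo_id="LiheYoung/depth-anything-small-hf",
    local_dir="ckpts/depth-anything-small-hf",
)

# One LoRA file per task (repo id is a placeholder; use the link above)
hf_hub_download(
    repo_id="Kunbyte-AI/DRA-Ctrl",  # hypothetical id
    filename="subject_driven.safetensors",
    local_dir="ckpts",
)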

Get Started

To reduce GPU memory requirements, we provide a vram_optimization parameter that selects among several memory-optimization schemes:

  • No_Optimization: no optimization is applied; 48GB of VRAM is sufficient to run the code.
  • HighRAM_HighVRAM: no more than 20GB of VRAM is required.
  • HighRAM_LowVRAM: no more than 8GB of VRAM is required.
  • LowRAM_HighVRAM: no more than 20GB of VRAM is required.
  • LowRAM_LowVRAM: no more than 8GB of VRAM is required.
  • VerylowRAM_LowVRAM: no more than 8GB of VRAM is required.

Note: reduced resources lead to increased generation time.

python gradio_app_hf.py --vram_optimization SET_YOUR_OPTIMIZATION_SCHEME_HERE
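
For example, to keep VRAM usage below 8GB on a consumer GPU with ample system RAM (scheme names as listed above):

python gradio_app_hf.py --vram_optimization HighRAM_LowVRAM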

Here is the command to run the legacy Gradio app, which we do not recommend using; for easier switching between tasks, adjusting parameters, and testing examples, as well as better VRAM optimization, use the command above.

python gradio_app.py --config configs/gradio.yaml

In spatially-aligned image generation tasks, when passing the condition image to gradio_app, there is no need to manually input edge maps, depth maps, or other condition images; only the original image is required, and the corresponding condition images are extracted automatically. A minimal sketch of what this extraction can look like follows.
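
As an illustration only (not the repository's exact code; the input path and Canny thresholds below are assumptions), condition images can be derived with OpenCV and the bundled Depth Anything checkpoint:

# Hedged sketch of automatic condition extraction.
import cv2
import numpy as np
from PIL import Image
from transformers import pipeline

image = Image.open("assets/canny_test.jpg").convert("RGB")  # example input

# Canny edge map: grayscale first; thresholds are illustrative
gray = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2GRAY)
canny_condition = Image.fromarray(cv2.Canny(gray, 100, 200))

# Depth map from the bundled checkpoint (see ckpts/ above)
depth_estimator = pipeline("depth-estimation", model="ckpts/depth-anything-small-hf")
depth_condition = depth_estimator(image)["depth"]  # a PIL Image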

You can use the *_test.jpg or *_test.png images from the assets folder as condition images for gradio_app, which will generate the following examples:

Examples (the condition and target images for each row are omitted here; see the assets folder and the project page):

  • Canny to Image. Target prompt: "Mosquito frozen in clear ice cube on sand, glowing sunset casting golden light with misty halo around ice."
  • Colorization. Target prompt: "A vibrant young woman with rainbow glasses, yellow eyes, and colorful feather accessory against a bright yellow background."
  • Deblurring. Target prompt: "Vibrant rainbow ball creates dramatic splash in clear water, bubbles swirling against crisp white background."
  • Depth to Image. Target prompt: "Golden-brown cat-shaped bread loaf with closed eyes rests on wooden table, soft kitchen blur in background."
  • Depth Prediction. Target prompt: "Steaming bowl of ramen with pork slices, soft-boiled egg, greens, and scallions in rich broth on wooden table."
  • In/Out-painting. Target prompt: "Her left hand emerges at the frame's lower right, delicately cradling a vibrant red flower against the black void."
  • In/Out-painting. Target prompt: "Mona Lisa dons a medical mask, her enigmatic smile now concealed beneath crisp white fabric."
  • Super-resolution. Target prompt: "Crispy buffalo wings and golden fries rest on a red-and-white checkered paper lining a gleaming metal tray, with creamy dip."
  • Subject-driven image generation. Target prompt: "The woman stands in a snowy forest, captured in a half-portrait." Condition image prompt: "Woman in cream knit sweater sits calmly by a crackling fireplace, surrounded by warm candlelight and rustic wooden shelves."
  • Subject-driven image generation. Target prompt: "a cat in a chef outfit." Condition image prompt: "a cat."
  • Style Transfer. Target prompt: "bitmoji style. An orange cat sits quietly on the stone slab. Beside it are the green grasses. With its ears perked up, it looks to one side." Condition image prompt: "An orange cat sits quietly on the stone slab. Beside it are the green grasses. With its ears perked up, it looks to one side."

📋 Citation

If you find our work helpful, please cite:

@misc{cao2025dimensionreductionattackvideogenerative,
      title={Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis},
      author={Hengyuan Cao and Yutong Feng and Biao Gong and Yijing Tian and Yunhong Lu and Chuang Liu and Bin Wang},
      year={2025},
      eprint={2505.23325},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.23325},
}

Attribution

This project uses code from the following sources:

Acknowledgements

We would like to thank the contributors to the HunyuanVideo, HunyuanVideo-I2V, diffusers, and HuggingFace repositories for their open research and exploration.
