```bash
cd framebridge-cogvideox
conda create -n framebridge-cogvideox python=3.10
conda activate framebridge-cogvideox
pip install -r requirements.txt
```

First, download the CogVideoX-2B model. For our I2V fine-tuning, we slightly modify the transformer of CogVideoX (doubling the number of input channels of the input layer so it can receive the image input). You can directly download our modified version, CogVideoX-2B-modified.
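If you want to fetch the original base weights from the Hugging Face Hub first, here is a minimal sketch (assuming the official `THUDM/CogVideoX-2b` repo id and the `huggingface-cli` tool; the modified checkpoint itself comes from the link above):

```bash
# Sketch: download the original CogVideoX-2B weights from the Hugging Face Hub.
pip install "huggingface_hub[cli]"
huggingface-cli download THUDM/CogVideoX-2b --local-dir ./CogVideoX-2B
```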
Download the metadata file `results_2M_train.csv` of WebVid-2M and put all the videos in the folder `2M_train` with the following structure:

```
2M_train/
├── 00001.mp4
├── 00002.mp4
│   ......
└── *****.mp4
```
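As a quick sanity check (a hypothetical snippet, assuming the paths above), you can confirm that the metadata and the video folder line up:

```bash
# Count the downloaded clips and peek at the first metadata rows.
find 2M_train -name '*.mp4' | wc -l
head -n 3 results_2M_train.csv
```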
Modify the following path arguments in `finetune_single_rank_i2v_bridge_2b.sh` (see the sketch after this list):

- `MODEL_PATH`: path to CogVideoX-2B-modified
- `DATASET_PATH`: path to `results_2M_train.csv`
- `--video_folder`: path to `2M_train`
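For orientation, the corresponding assignments inside the script might look like this (a hypothetical sketch; the actual layout of `finetune_single_rank_i2v_bridge_2b.sh` may differ):

```bash
# Hypothetical excerpt of finetune_single_rank_i2v_bridge_2b.sh.
MODEL_PATH="/path/to/CogVideoX-2B-modified"
DATASET_PATH="/path/to/results_2M_train.csv"
# ...and, among the flags passed to the training entry point:
#   --video_folder /path/to/2M_train
```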
Run the script:
```bash
bash finetune_single_rank_i2v_bridge_2b.sh
```

The fine-tuned FrameBridge-CogVideoX model can be downloaded from the Google Drive link. (Unfortunately, due to limited computational resources and dataset quality, the performance of the FrameBridge model is not as satisfactory as the official I2V version of CogVideoX.)
Before running inference, update the following arguments in `sample.sh` (see the sketch after this list):

- `--model_path`: path to the original CogVideoX-2B model or CogVideoX-2B-modified (either option is ok, as the transformer will be reloaded with the fine-tuned bridge model)
- `--image_or_video_path`: path to the image prompt
- `--transformer_path`: path to the FrameBridge-CogVideoX model (or the `transformer` subfolder from fine-tuned models)
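The flags inside `sample.sh` might then look like this (a hypothetical sketch; the entry-point name `sample.py` is a guess, and all paths are placeholders):

```bash
# Hypothetical excerpt of sample.sh; the python entry point is assumed.
python sample.py \
  --model_path /path/to/CogVideoX-2B-modified \
  --image_or_video_path /path/to/image_prompt.png \
  --transformer_path /path/to/FrameBridge-CogVideoX/transformer
```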
Run the script:
```bash
bash sample.sh
```

Next, set up the FrameBridge-VideoCrafter environment:

```bash
cd framebridge-videocrafter
conda create -n framebridge-videocrafter python=3.10
conda activate framebridge-videocrafter
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```

Download the VideoCrafter-1 model (256 × 256 resolution).
Prepare the WebVid-2M dataset in the same way as described in the previous section.
Modify the following path arguments in `configs/training_256_bridge/config.yaml` (see the sketch after this list):

- `pretrained_checkpoint` and `t2v_diffusion_checkpoint`: path to the downloaded VideoCrafter-1 checkpoint
- `data.params.train.params.data_dir`: path to `2M_train`
- `data.params.train.params.meta_path`: path to `results_2M_train.csv`
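For orientation, the relevant fields might look like this (a hypothetical sketch; the key nesting follows the dotted names above, and paths are placeholders):

```yaml
# Hypothetical excerpt of configs/training_256_bridge/config.yaml.
pretrained_checkpoint: /path/to/videocrafter1_256/model.ckpt
t2v_diffusion_checkpoint: /path/to/videocrafter1_256/model.ckpt
data:
  params:
    train:
      params:
        data_dir: /path/to/2M_train
        meta_path: /path/to/results_2M_train.csv
```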
Run the script:
```bash
bash configs/training_256_bridge/run.sh
```

The fine-tuned FrameBridge-VideoCrafter model can be downloaded from the Google Drive link. (Unfortunately, due to limitations in computational resources and dataset quality, the performance of our model may not be satisfactory compared with some state-of-the-art models.)
Before running inference, update the following arguments in `scripts/run_bridge.sh` (see the sketch after this list):

- `ckpt`: path to the fine-tuned FrameBridge model
- `prompt_dir`: path to the image and text prompts (structured in the same way as in the sampling process of DynamiCrafter)
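Inside the script these are plain shell variables; a hypothetical sketch with placeholder paths:

```bash
# Hypothetical excerpt of scripts/run_bridge.sh.
ckpt="/path/to/FrameBridge-VideoCrafter.ckpt"
prompt_dir="/path/to/prompts"  # image + text prompts, DynamiCrafter-style layout
```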
Run the script:
```bash
bash scripts/run_bridge.sh
```

Finally, set up the FrameBridge-Latte environment:

```bash
cd framebridge-latte
conda create -n framebridge-latte python=3.9
conda activate framebridge-latte
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```

Download the UCF-101 dataset from https://www.crcv.ucf.edu/data/UCF101/UCF101.rar.
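A minimal download-and-extract sketch (assuming `wget` and `unrar` are installed; `--no-check-certificate` is only needed if the server's certificate causes trouble):

```bash
# Fetch and unpack UCF-101.
wget --no-check-certificate https://www.crcv.ucf.edu/data/UCF101/UCF101.rar
unrar x UCF101.rar
```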
Download the VAE model `stabilityai/sd-vae-ft-ema` and put the files into the folder `checkpoints/vae`.
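For example, with the Hugging Face CLI (repo id as given above):

```bash
# Download the VAE into the expected folder.
huggingface-cli download stabilityai/sd-vae-ft-ema --local-dir checkpoints/vae
```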
We have three different config files for training different models:

- `configs/ucf101/ucf101_train_bridge.yaml`: training vanilla FrameBridge
- `configs/ucf101/ucf101_train_nwarp.yaml`: training the neural prior model
- `configs/ucf101/ucf101_train_bridge_nwarp.yaml`: training FrameBridge with neural prior
Choose the corresponding config file based on the model you want to train, and set the path arguments inside:

- `data_path`: path to the extracted UCF-101 folder
- `pretrained_model_path`: path to the `checkpoints` folder which includes the downloaded VAE
If you want to train FrameBridge with neural prior, you also need to set (see the sketch after this list):

- `nwarp_config`: path to `configs/ucf101/ucf101_sample_nwarp.yaml`
- `nwarp_ckpt`: path to the checkpoint of the trained neural prior model
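A hypothetical excerpt of those two keys in `configs/ucf101/ucf101_train_bridge_nwarp.yaml` (the checkpoint path is a placeholder):

```yaml
# Hypothetical excerpt; key names follow the list above.
nwarp_config: configs/ucf101/ucf101_sample_nwarp.yaml
nwarp_ckpt: /path/to/neural_prior.ckpt
```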
```bash
# Train vanilla FrameBridge
bash train_scripts/ucf101_bridge_train.sh
# Train the neural prior model
bash train_scripts/ucf101_nwarp_train.sh
# Train FrameBridge with neural prior
bash train_scripts/ucf101_bridge_nwarp_train.sh
```

You can download FrameBridge checkpoints from the Google Drive link.
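If you are on a headless machine, `gdown` is one way to pull files from Google Drive (a sketch; `<FILE_ID>` is a placeholder for the id in the shared link, and the output name is hypothetical):

```bash
pip install gdown
gdown <FILE_ID> -O framebridge_latte.ckpt  # hypothetical output filename
```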
There are also different config files for sampling with different models:

- `configs/ucf101/ucf101_sample_bridge.yaml`: sampling with vanilla FrameBridge
- `configs/ucf101/ucf101_sample_bridge_nwarp.yaml`: sampling with FrameBridge with neural prior
Similarly, in the chosen config file:

- set `pretrained_model_path` to the path of the `checkpoints` folder containing the VAE
- set `data_path` to the UCF-101 folder (the UCF-101 dataset is needed to obtain image prompts during sampling)
To use FrameBridge with neural prior, you also need to set the following in the config file `configs/ucf101/ucf101_sample_bridge_nwarp.yaml`:

- `nwarp_config`: path to `configs/ucf101/ucf101_sample_nwarp.yaml`
- `nwarp_ckpt`: path to the checkpoint of the trained neural prior model
Set the `--ckpt` argument in the corresponding script to a downloaded or trained checkpoint, and run the script:
```bash
# vanilla FrameBridge
bash sample/ucf101_bridge.sh
# FrameBridge with neural prior
bash sample/ucf101_bridge_neural_prior.sh
```

This repository is built upon the excellent work of Latte, CogVideoX, cogvideox-factory, DynamiCrafter and VideoCrafter. We sincerely thank the authors and contributors of these projects for their open-source efforts and valuable resources.