
3.17 commit
YuyangYin committed Mar 17, 2024
1 parent e54900f commit 6fc3160
Showing 2 changed files with 25 additions and 6 deletions.
18 changes: 12 additions & 6 deletions README.md
@@ -8,12 +8,12 @@ Authors: Yuyang Yin, Dejia Xu, Zhangyang Wang, Yao Zhao, Yunchao Wei
![overview](docs/static/media/task.png)

## News
- `2023/12/28` Release code and paper.
- `2024/2/1` Enhance the coherence of video outputs at low fps.
- `2023/12/28` First release of code and paper.
- `2024/2/14` Update text-to-4d and image-to-4d functions and cases.
- `2024/3/17` Add a complete example script.

## Task Type
As shown in the figure above, we define grounded 4D generation, which focuses on video-to-4D generation. The video does not have to be user-specified; it can also be generated by video diffusion. With the help of [stable video diffusion](https://github.com/nateraw/stable-diffusion-videos), we implement image-to-video-to-4D and text-to-image-to-video-to-4D generation. Because of the unsatisfactory performance of current text-to-video models, we use [stable diffusion-XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) together with [stable video diffusion](https://github.com/nateraw/stable-diffusion-videos) to implement text-to-image-to-video-to-4D. Therefore, our model supports **text-to-4D** and **image-to-4D** tasks.
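As a rough sketch of the image-to-4D path, the two main steps chain together the commands that appear in `main.bash` and in the Training section below (paths and the case name are placeholders; data preprocessing and arrangement are covered in the sections that follow):

```bash
# Sketch assembled from main.bash; paths and the case name are placeholders
# 1) Turn the (preprocessed) input image into a short video with stable video diffusion
python image_to_video.py --data_path exp_data/fish.jpg_pose0/fish.png --name fish
# 2) Optimize the 4D representation from the generated video
python train.py --configs arguments/i2v.py -e fish
```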



@@ -37,6 +37,10 @@ pip install ./simple-knn
# pip install kaolin -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-1.12.1_cu116.html
```

## Example Case Script
We have organized a complete pipeline script in **main.bash** for your reference.
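Note that the script hard-codes machine-specific absolute paths (e.g. under `/data/users/...`), so treat it as a template: adapt the paths to your setup and then run it end to end, roughly as:

```bash
# After replacing the hard-coded data paths in main.bash with your own:
bash main.bash
```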


## Data Preparation

We release our collected data in [Google Drive](https://drive.google.com/drive/folders/1-lbtj-YiA7d0Nbe6Qcc_t0W_CKKEw_bm?usp=drive_link). Some of these data are user-specified, while others are generated.
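To prepare your own inputs instead, the preprocessing entry points visible in this commit can be used along the following lines (a sketch; paths are placeholders):

```bash
# Single input image: remove the background and recenter (flags as used in main.bash)
python preprocess.py --path exp_data/fish.jpg --recenter True
# Frame sequence of a case: preprocess all frames (path is a placeholder)
python preprocess_sync.py --path data/<your_case>
```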
@@ -70,7 +74,7 @@ python preprocess_sync.py --path xxx
## Training

```bash
python train.py --configs arguments/i2v.py -e rose --name_override rose
```

## Rendering
@@ -84,14 +88,16 @@ python render.py --skip_train --configs arguments/i2v.py --skip_test --model_pat


## Evaluation
Please see main.bash.

<!-- For the CLIP loss, we calculate the CLIP distance between rendered images and reference images. The reference images are the n input frames; the rendered images cover 10 viewpoints at each timestep.
For the CLIP-T loss, we measure the CLIP-T distance at different viewpoints, not only the frontal view but also the back and side views.
```bash
cd evaluation
bash eval.bash #please change file paths before running
``` -->
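The corresponding evaluation calls in `main.bash` look roughly like the lines below; the comments are our reading of the flags and of the folder layout, so double-check against the script itself:

```bash
# CLIP-T evaluation of one viewpoint's rendered frames (adapted from main.bash)
name="fish"                                          # case name (placeholder)
pred_list_data_path="./output/<experiment>/render"   # placeholder: folder with rendered frames
input_data_path="${pred_list_data_path}/back/front"  # back-view renders, layout as in main.bash
python evaluation.py --model clip_t --input_data_path $input_data_path --dataset $name --direction back --save_name ${name}
```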


## Result ##
13 changes: 13 additions & 0 deletions main.bash
@@ -1,3 +1,13 @@
# optional image-to-4D data preprocess
#remove background
python preprocess.py --path /data/users/yyy/4DGen_git/4DGen/exp_data/fish.jpg --recenter True

#generate videos by svd
python image_to_video.py --data_path /data/users/yyy/4DGen_git/4DGen/exp_data/fish.jpg_pose0/fish.png --name clown_fish
# SVD results depend heavily on the random seed; pick the best result.



cd 4DGen
mkdir data
export name="fish"
@@ -57,3 +67,6 @@ python evaluation.py --model clip_t --input_data_path $input_data_path --datase
input_data_path="${pred_list_data_path}/back/front"
python evaluation.py --model clip_t --input_data_path $input_data_path --dataset $name --direction back --save_name ${name}

#xclip
python xclip.py --video_path ./output/fish16_13:50:01/video/ours_3000/multiview.mp4 --prompt "a swimming fish"
