
3.17 commit
YuyangYin committed Mar 17, 2024
1 parent e54900f commit 6fc3160
Showing 2 changed files with 25 additions and 6 deletions.
18 changes: 12 additions & 6 deletions README.md
@@ -8,12 +8,12 @@ Authors: Yuyang Yin, Dejia Xu, Zhangyang Wang, Yao Zhao, Yunchao Wei
![overview](docs/static/media/task.png)

## News
- `2023/12/28` Release code and paper.
- `2024/2/1` Enhance the coherence of video outputs at low fps.
- `2023/12/28` First release of code and paper.
- `2024/2/14` Update text-to-4d and image-to-4d functions and cases.
- `2024/3/17` Add a complete example script.

## Task Type
As shown in the figure above, we define grounded 4D generation, which focuses on video-to-4D generation. The video does not have to be user-specified; it can also be generated by video diffusion. With the help of [stable video diffusion](https://github.com/nateraw/stable-diffusion-videos), we implement image-to-video-to-4D and text-to-image-to-video-to-4D generation. Because of the unsatisfactory performance of current text-to-video models, we use [stable diffusion-XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) together with [stable video diffusion](https://github.com/nateraw/stable-diffusion-videos) to implement text-to-image-to-video-to-4D. Therefore, our model supports **text-to-4D** and **image-to-4D** tasks.
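As a rough sketch of the image-to-4D path, the two main steps chain together the commands that appear in `main.bash` and in the Training section below (paths and the case name are placeholders; data preprocessing and arrangement are covered in the sections that follow):

```bash
# Sketch assembled from main.bash; paths and the case name are placeholders
# 1) Turn the (preprocessed) input image into a short video with stable video diffusion
python image_to_video.py --data_path exp_data/fish.jpg_pose0/fish.png --name fish
# 2) Optimize the 4D representation from the generated video
python train.py --configs arguments/i2v.py -e fish
```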



@@ -37,6 +37,10 @@ pip install ./simple-knn
# pip install kaolin -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-1.12.1_cu116.html
```

## Example Case Script
We have organized a complete pipeline script in **main.bash** for your reference.
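Note that the script hard-codes machine-specific absolute paths (e.g. under `/data/users/...`), so treat it as a template: adapt the paths to your setup and then run it end to end, roughly as:

```bash
# After replacing the hard-coded data paths in main.bash with your own:
bash main.bash
```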


## Data Preparation

We release our collected data in [Google Drive](https://drive.google.com/drive/folders/1-lbtj-YiA7d0Nbe6Qcc_t0W_CKKEw_bm?usp=drive_link). Some of these data are user-specified, while others are generated.
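To prepare your own inputs instead, the preprocessing entry points visible in this commit can be used along the following lines (a sketch; paths are placeholders):

```bash
# Single input image: remove the background and recenter (flags as used in main.bash)
python preprocess.py --path exp_data/fish.jpg --recenter True
# Frame sequence of a case: preprocess all frames (path is a placeholder)
python preprocess_sync.py --path data/<your_case>
```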
@@ -70,7 +74,7 @@ python preprocess_sync.py --path xxx
## Training

```bash
python train.py --configs arguments/i2v.py -e rose --name_override rose
```

## Rendering
@@ -84,14 +88,16 @@ python render.py --skip_train --configs arguments/i2v.py --skip_test --model_pat


## Evaluation
Please see main.bash.

<!-- For the CLIP loss, we calculate the CLIP distance between rendered images and reference images. The reference images are the n input frames; the rendered images cover 10 viewpoints at each timestep.
For the CLIP-T loss, we measure the CLIP-T distance at different viewpoints, not only the frontal view but also the back and side views.
```bash
cd evaluation
bash eval.bash #please change file paths before running
``` -->
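The corresponding evaluation calls in `main.bash` look roughly like the lines below; the comments are our reading of the flags and of the folder layout, so double-check against the script itself:

```bash
# CLIP-T evaluation of one viewpoint's rendered frames (adapted from main.bash)
name="fish"                                          # case name (placeholder)
pred_list_data_path="./output/<experiment>/render"   # placeholder: folder with rendered frames
input_data_path="${pred_list_data_path}/back/front"  # back-view renders, layout as in main.bash
python evaluation.py --model clip_t --input_data_path $input_data_path --dataset $name --direction back --save_name ${name}
```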


## Result ##
13 changes: 13 additions & 0 deletions main.bash
@@ -1,3 +1,13 @@
# optional image-to-4D data preprocess
#remove background
python preprocess.py --path /data/users/yyy/4DGen_git/4DGen/exp_data/fish.jpg --recenter True

#generate videos by svd
python image_to_video.py --data_path /data/users/yyy/4DGen_git/4DGen/exp_data/fish.jpg_pose0/fish.png --name clown_fish
# SVD results depend heavily on the random seed; pick the best result.



cd 4DGen
mkdir data
export name="fish"
@@ -57,3 +67,6 @@ python evaluation.py --model clip_t --input_data_path $input_data_path --datase
input_data_path="${pred_list_data_path}/back/front"
python evaluation.py --model clip_t --input_data_path $input_data_path --dataset $name --direction back --save_name ${name}

#xclip
python xclip.py --video_path ./output/fish16_13:50:01/video/ours_3000/multiview.mp4 --prompt "a swimming fish"
