GitHub - TencentARC/StereoCrafter: A framework to convert any 2D videos to immersive stereoscopic 3D

StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos

Sijie Zhao* Wenbo Hu* Xiaodong Cun* Yong Zhang† Xiaoyu Li†
Zhe Kong Xiangjun Gao Muyao Niu Ying Shan

* equal contribution † corresponding author

Tencent AI Lab; ARC Lab, Tencent PCG

💡 Abstract

We propose a novel framework that converts any 2D video into immersive stereoscopic 3D that can be viewed on different display devices, such as 3D glasses, Apple Vision Pro, and 3D displays. It can be applied to various video sources, such as movies, vlogs, 3D cartoons, and AIGC videos.

📣 News

2024/12/27 We released our inference code and model weights.
2024/09/11 We submitted our technical report on arXiv and released our project page.

🎞️ Showcases

Here we show some examples of input videos and their corresponding stereo outputs in Anaglyph 3D format.

🛠️ Installation

1. Set up the environment

We run our code on Python 3.8 and CUDA 11.8. You can use Anaconda or Docker to build this basic environment.

2. Clone the repo

# use --recursive to clone the dependent submodules
git clone --recursive https://github.com/TencentARC/StereoCrafter
cd StereoCrafter

3. Install the requirements

pip install -r requirements.txt

4. Install the customized 'Forward-Warp' package for forward splatting

cd ./dependency/Forward-Warp
chmod a+x install.sh
./install.sh

📦 Model Weights

1. Download the SVD img2vid model for the image encoder and VAE.

# in StereoCrafter project root directory
mkdir weights
cd ./weights
git lfs install
git clone https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1

2. Download the DepthCrafter model for the video depth estimation.

git clone https://huggingface.co/tencent/DepthCrafter

3. Download the StereoCrafter model for the stereo video generation.

git clone https://huggingface.co/TencentARC/StereoCrafter
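
Before running inference, it can help to confirm that all three repositories were cloned into `./weights`. The following is a minimal sketch, not part of the repository: the helper name `missing_weights` is hypothetical, and the directory names are simply the last path components of the clone URLs above.

```python
from pathlib import Path

# Directory names that `git clone` creates for the three URLs above.
EXPECTED_WEIGHTS = [
    "stable-video-diffusion-img2vid-xt-1-1",
    "DepthCrafter",
    "StereoCrafter",
]


def missing_weights(weights_dir="./weights"):
    """Return the expected weight directories that do not exist yet."""
    root = Path(weights_dir)
    return [name for name in EXPECTED_WEIGHTS if not (root / name).is_dir()]
```

Calling `missing_weights()` from the project root returns an empty list once all three clones are in place. It only checks that the directories exist, not that Git LFS actually fetched the large files.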

🔄 Inference

Script:

# in StereoCrafter project root directory
sh run_inference.sh

There are two main steps in this script for generating stereo video.
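
The two steps can also be chained from Python. The sketch below builds both command lines using only the flags documented in the two subsections that follow. It is an illustration under stated assumptions, not the contents of `run_inference.sh`: the helper names `build_commands` and `run_pipeline` are hypothetical, and the intermediate file name follows the `camel_splatting_results.mp4` example used in this README.

```python
import subprocess
from pathlib import Path


def build_commands(input_video, output_dir="./outputs", weights="./weights",
                   max_disp=20, tile_num=1):
    """Build the argument lists for the splatting and inpainting steps."""
    # Intermediate splatting video, named after the README's example.
    splat = f"{output_dir}/{Path(input_video).stem}_splatting_results.mp4"
    svd = f"{weights}/stable-video-diffusion-img2vid-xt-1-1"
    step1 = [
        "python", "depth_splatting_inference.py",
        "--pre_trained_path", svd,
        "--unet_path", f"{weights}/DepthCrafter",
        "--input_video_path", input_video,
        "--output_video_path", splat,
        "--max_disp", str(max_disp),
    ]
    step2 = [
        "python", "inpainting_inference.py",
        "--pre_trained_path", svd,
        "--unet_path", f"{weights}/StereoCrafter",
        "--input_video_path", splat,  # output of step 1 feeds step 2
        "--save_dir", output_dir,
        "--tile_num", str(tile_num),
    ]
    return step1, step2


def run_pipeline(input_video, **kwargs):
    """Run both steps in order, stopping if either one fails."""
    for cmd in build_commands(input_video, **kwargs):
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline("./source_video/camel.mp4")` would run both steps from the project root with the default disparity and tile settings.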

1. Depth-Based Video Splatting Using the Video Depth from DepthCrafter

Execute the following command:

python depth_splatting_inference.py --pre_trained_path [PATH] --unet_path [PATH] \
                                    --input_video_path [PATH] --output_video_path [PATH]

Arguments:

--pre_trained_path: Path to the SVD img2vid model weights (e.g., ./weights/stable-video-diffusion-img2vid-xt-1-1).
--unet_path: Path to the DepthCrafter model weights (e.g., ./weights/DepthCrafter).
--input_video_path: Path to the input video (e.g., ./source_video/camel.mp4).
--output_video_path: Path to the output video (e.g., ./outputs/camel_splatting_results.mp4).
--max_disp: Parameter controlling the maximum disparity between the generated right video and the input left video. Default value is 20 pixels.
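
To give an intuition for what `--max_disp` controls, here is a simplified NumPy sketch of depth-based forward splatting. It is not the repository's implementation, which relies on the 'Forward-Warp' package. The function name `splat_right_view`, the "larger depth value means closer" convention, and the min-max normalization are all assumptions made for illustration.

```python
import numpy as np


def splat_right_view(left, depth, max_disp=20.0):
    """Forward-splat a left frame into a right view (nearest-pixel sketch).

    left:  (H, W, 3) image array.
    depth: (H, W) float array; larger values are assumed to be closer.
    Returns the warped right frame and a boolean mask (True = hole).
    """
    h, w = depth.shape
    # Normalize to [0, 1] so the closest pixel shifts by max_disp pixels.
    span = max(float(depth.max() - depth.min()), 1e-8)
    disp = np.rint((depth - depth.min()) / span * max_disp).astype(int)

    ys, xs = np.indices((h, w))
    ys, xs, d = ys.ravel(), xs.ravel(), disp.ravel()
    # Process far pixels first: with repeated target indices, NumPy keeps
    # the last assignment, so nearer pixels overwrite farther ones.
    order = np.argsort(d, kind="stable")
    ys, xs, d = ys[order], xs[order], d[order]

    xt = xs - d          # closer content moves further left in the right view
    keep = xt >= 0       # drop pixels that leave the frame

    right = np.zeros_like(left)
    filled = np.zeros((h, w), dtype=bool)
    right[ys[keep], xt[keep]] = left[ys[keep], xs[keep]]
    filled[ys[keep], xt[keep]] = True
    return right, ~filled
```

The returned mask marks pixels that no source pixel landed on. These are the disoccluded regions that the second step has to inpaint, and they grow as `max_disp` increases.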

The first step generates a video grid containing the input video, the visualized depth map, the occlusion mask, and the splatted right video.

2. Stereo Video Inpainting of the Splatting Video

Execute the following command:

python inpainting_inference.py --pre_trained_path [PATH] --unet_path [PATH] \
                               --input_video_path [PATH] --save_dir [PATH]

Arguments:

--pre_trained_path: Path to the SVD img2vid model weights (e.g., ./weights/stable-video-diffusion-img2vid-xt-1-1).
--unet_path: Path to the StereoCrafter model weights (e.g., ./weights/StereoCrafter).
--input_video_path: Path to the splatting video result generated by the first stage (e.g., ./outputs/camel_splatting_results.mp4).
--save_dir: Directory for the output stereo video (e.g., ./outputs).
--tile_num: Number of tiles along the width and height dimensions for tiled processing, which lets the model handle high-resolution input without requiring more GPU memory. The default value is 1 (a $1 \times 1$ grid, i.e., no tiling). For input videos with a resolution of 2K or higher, use more tiles to avoid running out of memory.
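
The idea behind `--tile_num` can be sketched as splitting each frame into a `tile_num` by `tile_num` grid and processing one tile at a time. The helper below is a hypothetical illustration: the name `tile_boxes` and the `overlap` parameter are assumptions, and the repository's actual tiling and blending logic may differ.

```python
def tile_boxes(height, width, tile_num, overlap=0):
    """Return (top, bottom, left, right) boxes for a tile_num x tile_num grid.

    `overlap` (hypothetical) extends every tile by that many pixels on each
    side, clipped to the frame, so neighbouring tiles could be blended.
    """
    boxes = []
    for i in range(tile_num):
        for j in range(tile_num):
            top = i * height // tile_num
            bottom = (i + 1) * height // tile_num
            left = j * width // tile_num
            right = (j + 1) * width // tile_num
            boxes.append((
                max(top - overlap, 0), min(bottom + overlap, height),
                max(left - overlap, 0), min(right + overlap, width),
            ))
    return boxes
```

With `tile_num=2`, a 1920 by 1080 frame is split into four tiles of roughly 960 by 540 pixels, so each forward pass sees about a quarter of the pixels, at the cost of running the model once per tile.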

The stereo video inpainting step generates the stereo video result in both side-by-side format and anaglyph 3D format.
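
The two output formats can be illustrated with a minimal NumPy sketch for a single pair of frames. This assumes RGB channel order and a plain red-cyan anaglyph (red from the left view, green and blue from the right view). The function names are hypothetical, and the repository may compose its anaglyph differently.

```python
import numpy as np


def side_by_side(left, right):
    """Place the left and right views next to each other along the width."""
    return np.concatenate([left, right], axis=1)


def anaglyph_red_cyan(left, right):
    """Red channel from the left view, green and blue from the right view."""
    out = right.copy()
    out[..., 0] = left[..., 0]  # assumes channel 0 is red (RGB order)
    return out
```

For frames of shape `(H, W, 3)`, `side_by_side` returns an `(H, 2W, 3)` frame, while `anaglyph_red_cyan` keeps the original `(H, W, 3)` shape.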

🤝 Acknowledgements

We would like to express our gratitude to the following open-source projects:

Stable Video Diffusion: A latent diffusion model trained to generate video clips from an image or text conditioning.
DepthCrafter: A novel method to generate temporally consistent depth sequences from videos.

📚 Citation

@article{zhao2024stereocrafter,
  title={Stereocrafter: Diffusion-based generation of long and high-fidelity stereoscopic 3d from monocular videos},
  author={Zhao, Sijie and Hu, Wenbo and Cun, Xiaodong and Zhang, Yong and Li, Xiaoyu and Kong, Zhe and Gao, Xiangjun and Niu, Muyao and Shan, Ying},
  journal={arXiv preprint arXiv:2409.07447},
  year={2024}
}
