StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos
Sijie Zhao*
Wenbo Hu*
Xiaodong Cun*
Yong Zhang†
Xiaoyu Li†
Zhe Kong
Xiangjun Gao
Muyao Niu
Ying Shan
* equal contribution † corresponding author
We propose a novel framework to convert any 2D video into an immersive stereoscopic 3D one that can be viewed on different display devices, such as 3D glasses, Apple Vision Pro, and 3D displays. It can be applied to various video sources, such as movies, vlogs, 3D cartoons, and AIGC videos.
- 2024/12/27: We released our inference code and model weights.
- 2024/09/11: We submitted our technical report on arXiv and released our project page.
Here we show some examples of input videos and their corresponding stereo outputs in Anaglyph 3D format.
We run our code with Python 3.8 and CUDA 11.8. You can use Anaconda or Docker to set up this basic environment.
# use --recursive to clone the dependent submodules
git clone --recursive https://github.com/TencentARC/StereoCrafter
cd StereoCrafter
pip install -r requirements.txt
cd ./dependency/Forward-Warp
chmod a+x install.sh
./install.sh
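Before downloading the weights, you can optionally confirm that the CUDA build of PyTorch works in the new environment. This quick check is only a convenience, not part of the official setup:
# Optional sanity check (assumption, not part of the official setup):
# confirm the CUDA build of PyTorch is available before running inference.
import torch
print(torch.__version__, torch.version.cuda)
assert torch.cuda.is_available(), "No CUDA device found; inference will be very slow or fail."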
1. Download the SVD img2vid model for the image encoder and VAE.
# in StereoCrafter project root directory
mkdir weights
cd ./weights
git lfs install
git clone https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1
2. Download the DepthCrafter model for the video depth estimation.
git clone https://huggingface.co/tencent/DepthCrafter
3. Download the StereoCrafter model for the stereo video generation.
git clone https://huggingface.co/TencentARC/StereoCrafter
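After these downloads, the weights directory should look roughly like the following (layout inferred from the clone commands above):
weights/
├── stable-video-diffusion-img2vid-xt-1-1/
├── DepthCrafter/
└── StereoCrafter/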
Script:
# in StereoCrafter project root directory
sh run_inference.sh
There are two main steps in this script for generating the stereo video: depth-based video splatting and stereo video inpainting.
For the first step, depth-based video splatting, execute the following command:
python depth_splatting_inference.py --pre_trained_path [PATH] --unet_path [PATH] \
    --input_video_path [PATH] --output_video_path [PATH]
Arguments:
- --pre_trained_path: Path to the SVD img2vid model weights (e.g., ./weights/stable-video-diffusion-img2vid-xt-1-1).
- --unet_path: Path to the DepthCrafter model weights (e.g., ./weights/DepthCrafter).
- --input_video_path: Path to the input video (e.g., ./source_video/camel.mp4).
- --output_video_path: Path to the output video (e.g., ./outputs/camel_splatting_results.mp4).
- --max_disp: Maximum disparity, in pixels, between the generated right-view video and the input left-view video. The default value is 20.
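For example, using the default weights layout and the example paths above, the first step can be run as:
python depth_splatting_inference.py \
    --pre_trained_path ./weights/stable-video-diffusion-img2vid-xt-1-1 \
    --unet_path ./weights/DepthCrafter \
    --input_video_path ./source_video/camel.mp4 \
    --output_video_path ./outputs/camel_splatting_results.mp4 \
    --max_disp 20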
The first step generates a video grid containing the input video, the visualized depth map, the occlusion mask, and the splatted right-view video, as shown below:
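To give an intuition for what this grid contains, the simplified sketch below shows how a right-view frame and its occlusion mask can be derived from the left view and a depth map via disparity-based forward warping. This is a hypothetical, plain-NumPy illustration with made-up function names, not the repository's actual implementation, which uses the Forward-Warp dependency installed above.
# Hypothetical sketch of disparity-based forward splatting (not the project's code).
import numpy as np

def splat_right_view(left, depth, max_disp=20):
    # left: (H, W, 3) uint8 left-view frame; depth: (H, W) floats in [0, 1], larger = closer.
    h, w, _ = left.shape
    disparity = depth * max_disp                  # nearer pixels get larger horizontal shifts
    right = np.zeros_like(left)
    z_buffer = np.full((h, w), -np.inf)           # keep only the nearest surface per target pixel
    filled = np.zeros((h, w), dtype=bool)
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.round(xs - disparity).astype(int)     # right-view pixels shift left w.r.t. the left view
    valid = (xt >= 0) & (xt < w)
    for y, x, x_dst, d in zip(ys[valid], xs[valid], xt[valid], depth[valid]):
        if d > z_buffer[y, x_dst]:
            z_buffer[y, x_dst] = d
            right[y, x_dst] = left[y, x]
            filled[y, x_dst] = True
    occlusion_mask = ~filled                      # holes left by the warp, to be inpainted in step 2
    return right, occlusion_mask

# Toy example: a random 4x8 frame and a synthetic horizontal depth ramp.
frame = np.random.randint(0, 256, (4, 8, 3), dtype=np.uint8)
depth = np.tile(np.linspace(0.0, 1.0, 8), (4, 1))
right, mask = splat_right_view(frame, depth, max_disp=3)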
For the second step, stereo video inpainting, execute the following command:
python inpainting_inference.py --pre_trained_path [PATH] --unet_path [PATH] \
    --input_video_path [PATH] --save_dir [PATH]
Arguments:
- --pre_trained_path: Path to the SVD img2vid model weights (e.g., ./weights/stable-video-diffusion-img2vid-xt-1-1).
- --unet_path: Path to the StereoCrafter model weights (e.g., ./weights/StereoCrafter).
- --input_video_path: Path to the splatting video result generated in the first step (e.g., ./outputs/camel_splatting_results.mp4).
- --save_dir: Directory for the output stereo video (e.g., ./outputs).
- --tile_num: Number of tiles along the width and height dimensions for tiled processing, which allows handling high-resolution input without requiring more GPU memory. The default value is 1 (a 1×1 tile). For input videos with a resolution of 2K or higher, you can use more tiles to avoid running out of memory.
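For example, to inpaint the splatting result from the first step with a 2×2 tiling (useful for 2K or higher input), the command could be:
python inpainting_inference.py \
    --pre_trained_path ./weights/stable-video-diffusion-img2vid-xt-1-1 \
    --unet_path ./weights/StereoCrafter \
    --input_video_path ./outputs/camel_splatting_results.mp4 \
    --save_dir ./outputs \
    --tile_num 2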
The stereo video inpainting step generates the stereo video results in side-by-side and anaglyph 3D formats, as shown below:
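For reference, a red-cyan anaglyph frame like the ones shown here can be composed from the two views of the stereo pair. The snippet below is a minimal illustration with a hypothetical helper name, not code from the repository:
# Minimal sketch (assumption): compose a red-cyan anaglyph from left/right RGB frames.
import numpy as np

def make_anaglyph(left, right):
    # left, right: (H, W, 3) uint8 RGB frames of identical size
    anaglyph = np.empty_like(left)
    anaglyph[..., 0] = left[..., 0]      # red channel from the left view
    anaglyph[..., 1:] = right[..., 1:]   # green and blue channels from the right view
    return anaglyph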
We would like to express our gratitude to the following open-source projects:
- Stable Video Diffusion: A latent diffusion model trained to generate video clips from an image or text conditioning.
- DepthCrafter: A novel method to generate temporally consistent depth sequences from videos.
@article{zhao2024stereocrafter,
  title={StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos},
  author={Zhao, Sijie and Hu, Wenbo and Cun, Xiaodong and Zhang, Yong and Li, Xiaoyu and Kong, Zhe and Gao, Xiangjun and Niu, Muyao and Shan, Ying},
  journal={arXiv preprint arXiv:2409.07447},
  year={2024}
}