diff --git a/.all-contributorsrc b/.all-contributorsrc new file mode 100644 index 00000000..f3245869 --- /dev/null +++ b/.all-contributorsrc @@ -0,0 +1,4 @@ +{ + "projectName": "Open-Sora", + "projectOwner": "hpcaitech" +} diff --git a/README.md b/README.md index 3dfa8789..3b9af882 100644 --- a/README.md +++ b/README.md @@ -20,9 +20,7 @@ Open-Sora not only democratizes access to advanced video generation techniques, streamlined and user-friendly platform that simplifies the complexities of video production. With Open-Sora, we aim to inspire innovation, creativity, and inclusivity in the realm of content creation. -[[中文文档]](/docs/zh_CN/README.md) - -[潞晨云部署Open-Sora保姆级视频教程](https://www.bilibili.com/video/BV141421R7Ag) +[[中文文档]](/docs/zh_CN/README.md) [[潞晨云部署视频教程]](https://www.bilibili.com/video/BV141421R7Ag)

Open-Sora is still at an early stage and under active development.

@@ -41,24 +39,24 @@ With Open-Sora, we aim to inspire innovation, creativity, and inclusivity in the ## 🎥 Latest Demo -🔥 You can experinece Open-Sora on our [🤗 Gradio application on Hugging Face](https://huggingface.co/spaces/hpcai-tech/open-sora) - -More samples are available in our [gallery](https://hpcaitech.github.io/Open-Sora/). - +🔥 You can experience Open-Sora on our [🤗 Gradio application on Hugging Face](https://huggingface.co/spaces/hpcai-tech/open-sora). More samples are available in our [Gallery](https://hpcaitech.github.io/Open-Sora/). | **2s 240×426** | **2s 240×426** | | ----------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/c31ebc52-de39-4a4e-9b1e-9211d45e05b2) | [](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/c31ebc52-de39-4a4e-9b1e-9211d45e05b2) | +| [](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/c31ebc52-de39-4a4e-9b1e-9211d45e05b2) | [](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/c31ebc52-de39-4a4e-9b1e-9211d45e05b2) | | [](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/f7ce4aaa-528f-40a8-be7a-72e61eaacbbd) | [](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/5d58d71e-1fda-4d90-9ad3-5f2f7b75c6a9) | -| **2s 426×240** | **4s 480×854** | -| ---------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **2s 426×240** | **4s 480×854** | +| 
---------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- | | [](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/34ecb4a0-4eef-4286-ad4c-8e3a87e5a9fd) | [](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/c1619333-25d7-42ba-a91c-18dbc1870b18) | -| **16s 320×320** | **16s 224×448** | **2s 426×240** | -| ----------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [](https://github.com/hpcaitech/Open-Sora/assets/99191637/3cab536e-9b43-4b33-8da8-a0f9cf842ff2) | [](https://github.com/hpcaitech/Open-Sora/assets/99191637/9fb0b9e0-c6f4-4935-b29e-4cac10b373c4) | [](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/3e892ad2-9543-4049-b005-643a4c1bf3bf) | +| **16s 320×320** | **16s 224×448** | **2s 426×240** | +| ------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------- | +| [](https://github.com/hpcaitech/Open-Sora/assets/99191637/3cab536e-9b43-4b33-8da8-a0f9cf842ff2) | 
[](https://github.com/hpcaitech/Open-Sora/assets/99191637/9fb0b9e0-c6f4-4935-b29e-4cac10b373c4) | [](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/3e892ad2-9543-4049-b005-643a4c1bf3bf) |
OpenSora 1.0 Demo @@ -68,12 +66,11 @@ More samples are available in our [gallery](https://hpcaitech.github.io/Open-Sor | [](https://github.com/hpcaitech/Open-Sora/assets/99191637/de1963d3-b43b-4e68-a670-bb821ebb6f80) | [](https://github.com/hpcaitech/Open-Sora/assets/99191637/13f8338f-3d42-4b71-8142-d234fbd746cc) | [](https://github.com/hpcaitech/Open-Sora/assets/99191637/fa6a65a6-e32a-4d64-9a9e-eabb0ebb8c16) | | A serene night scene in a forested area. [...] The video is a time-lapse, capturing the transition from day to night, with the lake and forest serving as a constant backdrop. | A soaring drone footage captures the majestic beauty of a coastal cliff, [...] The water gently laps at the rock base and the greenery that clings to the top of the cliff. | The majestic beauty of a waterfall cascading down a cliff into a serene lake. [...] The camera angle provides a bird's eye view of the waterfall. | | [](https://github.com/hpcaitech/Open-Sora/assets/99191637/64232f84-1b36-4750-a6c0-3e610fa9aa94) | [](https://github.com/hpcaitech/Open-Sora/assets/99191637/983a1965-a374-41a7-a76b-c07941a6c1e9) | [](https://github.com/hpcaitech/Open-Sora/assets/99191637/ec10c879-9767-4c31-865f-2e8d6cf11e65) | -| A bustling city street at night, filled with the glow of car headlights and the ambient light of streetlights. [...] | The vibrant beauty of a sunflower field. The sunflowers are arranged in neat rows, creating a sense of order and symmetry. [...] | A serene underwater scene featuring a sea turtle swimming through a coral reef. The turtle, with its greenish-brown shell [...] | +| A bustling city street at night, filled with the glow of car headlights and the ambient light of streetlights. [...] | The vibrant beauty of a sunflower field. The sunflowers are arranged in neat rows, creating a sense of order and symmetry. [...] | A serene underwater scene featuring a sea turtle swimming through a coral reef. The turtle, with its greenish-brown shell [...] 
| Videos are downsampled to `.gif` for display. Click for original videos. Prompts are trimmed for display, see [here](/assets/texts/t2v_samples.txt) for full prompts. -
## 🔆 New Features/Updates @@ -117,8 +114,8 @@ see [here](/assets/texts/t2v_samples.txt) for full prompts. ### TODO list sorted by priority * [ ] Training Video-VAE and adapt our model to new VAE. **[WIP]** -* [ ] Incoporate a better scheduler, e.g., rectified flow in SD3. -* [ ] Scaling model parameters and dataset size. +* [ ] Scaling model parameters and dataset size. **[WIP]** +* [ ] Incorporate a better scheduler, e.g., rectified flow in SD3. **[WIP]**
View more @@ -187,21 +184,25 @@ pip install -v . ### Open-Sora 1.1 Model Weights -| Resolution | Data | #iterations | Batch Size | URL | -| ------------------ | -------------------------- | ----------- | ------------------------------------------------- | -------------------------------------------------------------------- | -| mainly 144p & 240p | 10M videos + 2M images | 100k | [dynamic](/configs/opensora-v1-1/train/stage2.py) | [:link:](https://huggingface.co/hpcai-tech/OpenSora-STDiT-v2-stage2) | -| 144p to 720p | 500K HQ videos + 1M images | 4k | [dynamic](/configs/opensora-v1-1/train/stage3.py) | [:link:](https://huggingface.co/hpcai-tech/OpenSora-STDiT-v2-stage3) | +| Resolution | Model Size | Data | #iterations | Batch Size | URL | +| ------------------ | ---------- | -------------------------- | ----------- | ------------------------------------------------- | -------------------------------------------------------------------- | +| mainly 144p & 240p | 700M | 10M videos + 2M images | 100k | [dynamic](/configs/opensora-v1-1/train/stage2.py) | [:link:](https://huggingface.co/hpcai-tech/OpenSora-STDiT-v2-stage2) | +| 144p to 720p | 700M | 500K HQ videos + 1M images | 4k | [dynamic](/configs/opensora-v1-1/train/stage3.py) | [:link:](https://huggingface.co/hpcai-tech/OpenSora-STDiT-v2-stage3) | + +See our **[report 1.1](docs/report_02.md)** for more information. + +:warning: **LIMITATION**: This version has known issues that we plan to fix in the next release (we are conserving compute for it). In particular, video generation may fail for long durations, and high-resolution outputs can be noisy. ### Open-Sora 1.0 Model Weights
View more -| Resolution | Data | #iterations | Batch Size | GPU days (H800) | URL | -| ---------- | ------ | ----------- | ---------- | --------------- | --------------------------------------------------------------------------------------------- | -| 16×512×512 | 20K HQ | 20k | 2×64 | 35 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x512x512.pth) | -| 16×256×256 | 20K HQ | 24k | 8×64 | 45 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x256x256.pth) | -| 16×256×256 | 366K | 80k | 8×64 | 117 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-16x256x256.pth) | +| Resolution | Model Size | Data | #iterations | Batch Size | GPU days (H800) | URL | +| ---------- | ---------- | ------ | ----------- | ---------- | --------------- | --------------------------------------------------------------------------------------------- | +| 16×512×512 | 700M | 20K HQ | 20k | 2×64 | 35 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x512x512.pth) | +| 16×256×256 | 700M | 20K HQ | 24k | 8×64 | 45 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x256x256.pth) | +| 16×256×256 | 700M | 366K | 80k | 8×64 | 117 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-16x256x256.pth) | Training orders: 16x256x256 $\rightarrow$ 16x256x256 HQ $\rightarrow$ 16x512x512 HQ. @@ -219,7 +220,7 @@ on improving the quality and text alignment. ### Gradio Demo -🔥 You can experinece Open-Sora on our [🤗 Gradio application](https://huggingface.co/spaces/hpcai-tech/open-sora) on Hugging Face online. +🔥 You can experience Open-Sora on our [🤗 Gradio application](https://huggingface.co/spaces/hpcai-tech/open-sora) on Hugging Face online. If you want to deploy Gradio locally, we also provide a [Gradio application](./gradio) in this repository; you can use the following command to start an interactive web application and experience video generation with Open-Sora. 
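The checkpoints listed in the tables above can also be fetched programmatically before running the demo. A minimal sketch, assuming the `huggingface_hub` client is installed (the `fetch_checkpoint` helper is ours for illustration, not part of the repository):

```python
from huggingface_hub import hf_hub_download

# Open-Sora 1.0 checkpoint locations, taken from the table above.
CHECKPOINTS = {
    "16x512x512-hq": ("hpcai-tech/Open-Sora", "OpenSora-v1-HQ-16x512x512.pth"),
    "16x256x256-hq": ("hpcai-tech/Open-Sora", "OpenSora-v1-HQ-16x256x256.pth"),
    "16x256x256": ("hpcai-tech/Open-Sora", "OpenSora-v1-16x256x256.pth"),
}

def fetch_checkpoint(name: str) -> str:
    """Download one of the checkpoints above and return its local cache path."""
    repo_id, filename = CHECKPOINTS[name]
    return hf_hub_download(repo_id=repo_id, filename=filename)
```

For example, `fetch_checkpoint("16x256x256")` downloads the weight file into the local Hugging Face cache and returns its path.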
@@ -270,12 +271,12 @@ To lower the memory usage, set a smaller `vae.micro_batch_size` in the config (s
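As an illustration of that knob, a sketch of the relevant config fragment; only `micro_batch_size` is the setting named above, while the other fields are assumed values in the style of Open-Sora's dict-based configs:

```python
# Illustrative inference-config fragment. Only micro_batch_size is the knob
# discussed in the text; the other fields are assumptions for context.
vae = dict(
    type="VideoAutoencoderKL",
    from_pretrained="stabilityai/sd-vae-ft-ema",
    micro_batch_size=4,  # encode/decode a few frames at a time to cap peak memory
)
```

Smaller values trade encoding speed for a lower peak memory footprint.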
## Data Processing + High-quality data is crucial for training good generation models. To this end, we establish a complete pipeline for data processing, which could seamlessly convert raw videos to high-quality video-text pairs. The pipeline is shown below. For detailed information, please refer to [data processing](docs/data_processing.md). Also check out the [datasets](docs/datasets.md) we use. - ![Data Processing Pipeline](assets/readme/report_data_pipeline.png) ## Training @@ -331,24 +332,6 @@ following [all-contributors](https://github.com/all-contributors/all-contributor - - - - - - - - - - - - - - - - - -
- zhengzangw: 💻 📖 🤔 📹 🚧
- ver217: 💻 🤔 📖 🐛
- FrankLeeeee: 💻 🚇 🔧
- xyupeng: 💻 📖 🎨
- Yanjia0: 📖
- binmakeswell: 📖
- eltociear: 📖
- ganeshkrishnan1: 📖
- fastalgo: 📖
- powerzbt: 📖
diff --git a/docs/datasets.md b/docs/datasets.md index bdf54713..ca096357 100644 --- a/docs/datasets.md +++ b/docs/datasets.md @@ -3,36 +3,50 @@ For Open-Sora 1.1, we conduct mixed training with both images and videos. The main datasets we use are listed below. Please refer to [README](/README.md#data-processing) for data processing. -## Panda-70M +## Video + +### Panda-70M + [Panda-70M](https://github.com/snap-research/Panda-70M) is a large-scale dataset with 70M video-caption pairs. -We use the [training-10M subset](https://github.com/snap-research/Panda-70M/tree/main/dataset_dataloading) for training, +We use the [training-10M subset](https://github.com/snap-research/Panda-70M/tree/main/dataset_dataloading) for training, which contains ~10M videos of better quality. -## Pexels -[Pexels](https://www.pexels.com/) is a popular online platform that provides high-quality stock photos, videos, and music for free. +### Pexels + +[Pexels](https://www.pexels.com/) is a popular online platform that provides high-quality stock photos, videos, and music for free. Most videos from this website are of high quality. Thus, we use them for both pre-training and HQ fine-tuning. We really appreciate the great platform and the contributors! -## Inter4K +### Inter4K + [Inter4K](https://github.com/alexandrosstergiou/Inter4K) is a dataset containing 1K video clips with 4K resolution. The dataset is proposed for super-resolution tasks. We use the dataset for HQ fine-tuning. +### HD-VG-130M -## HD-VG-130M -[HD-VG-130M](https://github.com/daooshee/HD-VG-130M?tab=readme-ov-file) comprises 130M text-video pairs. -The caption is generated by BLIP-2. +[HD-VG-130M](https://github.com/daooshee/HD-VG-130M?tab=readme-ov-file) comprises 130M text-video pairs. +The caption is generated by BLIP-2. We find the scene and the text quality are relatively poor. For OpenSora 1.0, we only use ~350K samples from this dataset. 
-## Midjourney-v5-1.7M +## Image + +### Midjourney-v5-1.7M + [Midjourney-v5-1.7M](https://huggingface.co/datasets/wanng/midjourney-v5-202304-clean) includes 1.7M image-text pairs. In detail, this dataset introduces two subsets: original and upscale. This dataset is proposed for exploring the relationship of prompts and high-quality images. -## Midjourney-kaggle-clean +### Midjourney-kaggle-clean + [Midjourney-kaggle-clean](https://huggingface.co/datasets/wanng/midjourney-kaggle-clean) is a reconstructed version of [Midjourney User Prompts & Generated Images (250k)](https://www.kaggle.com/datasets/succinctlyai/midjourney-texttoimage?select=general-01_2022_06_20.json%5D), which is cleaned by rules. Moreover, this dataset is divided into two subsets: original and upscale. This dataset is proposed for enabling research on text-to-image model prompting. -## upsplash-lite -The [Unsplash-lite](https://github.com/unsplash/datasets) Dataset comprises 25k nature-themed Unsplash photos, 25k keywords, and 1M searches. +### Unsplash-lite + +The [Unsplash-lite](https://github.com/unsplash/datasets) dataset comprises 25k nature-themed Unsplash photos, 25k keywords, and 1M searches. This dataset covers a vast range of uses and contexts. Its extensive scope in intent and semantics opens new avenues for research and learning. + +### LAION-AESTHETICS 6.5+ + +The LAION-Aesthetics 6.5+ dataset is a subset of LAION containing 625K high-quality images with aesthetic scores above 6.5. However, as LAION is currently not publicly available, we use this 168k [subset](https://huggingface.co/datasets/bhargavsdesai/laion_improved_aesthetics_6.5plus_with_images).