Update main & examples readme (#737)
* update readme

* fix format

* fix linting

* update

* fix typo

* update readme

* update readme

* fix linting

* rm detect_chatgpt, tango, tuneavideo

* Unify v020 readmes (#730)

* Unify v020 readmes (#730)

Update docs for pku-opensora

* update pixart sigma readme

* fix typo

* update

* rm redundency

---------

Co-authored-by: fzilan <[email protected]>
SamitHuang and Fzilan authored Nov 22, 2024
1 parent 931bde6 commit 0ad4991
Showing 95 changed files with 249 additions and 143,003 deletions.
56 changes: 36 additions & 20 deletions README.md
@@ -1,30 +1,43 @@
# MindONE

This repository contains SoTA algorithms, models, and interesting projects in the area of multimodal understanding and content generation.

ONE is short for "ONE for all"

## News
- [2024.11.06] MindONE [v0.2.0](https://github.com/mindspore-lab/mindone/releases/tag/v0.2.0) is released

## Quick tour

To install MindONE v0.2.0, please install [MindSpore 2.3.1](https://www.mindspore.cn/install) and run `pip install mindone`

Alternatively, to install the latest version from the `master` branch, please run:
```bash
git clone https://github.com/mindspore-lab/mindone.git
cd mindone
pip install -e .
```

We support state-of-the-art diffusion models for generating images, audio, and video. Let's get started using [Stable Diffusion 3](https://huggingface.co/stabilityai/stable-diffusion-3-medium) as an example.

**Hello MindSpore** from **Stable Diffusion 3**!

<div>
<img src="https://github.com/townwish4git/mindone/assets/143256262/8c25ae9a-67b1-436f-abf6-eca36738cd17" alt="sd3" width="512" height="512">
</div>

- [mindone/diffusers](mindone/diffusers) now supports [Stable Diffusion 3](https://huggingface.co/stabilityai/stable-diffusion-3-medium). Give it a try yourself!

```py
import mindspore
from mindone.diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    mindspore_dtype=mindspore.float16,
)
prompt = "A cat holding a sign that says 'Hello MindSpore'"
image = pipe(prompt)[0][0]
image.save("sd3.png")
```

### supported models under mindone/examples
| model | features
@@ -37,23 +50,26 @@ ONE is short for "ONE for all"
| [pllava](https://github.com/mindspore-lab/mindone/blob/master/examples/pllava) | working on it |
| [hpcai open sora](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_hpcai) | support v1.0/1.1/1.2 large scale training with dp/sp/zero |
| [open sora plan](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_pku) | support v1.0/1.1/1.2 large scale training with dp/sp/zero |
| [stable diffusion](https://github.com/mindspore-lab/mindone/blob/master/examples/stable_diffusion_v2) | support sd 1.5/2.0/2.1, vanilla fine-tune, lora, dreambooth, text inversion |
| [stable diffusion xl](https://github.com/mindspore-lab/mindone/blob/master/examples/stable_diffusion_xl) | support sai style (Stability AI) vanilla fine-tune, lora, dreambooth |
| [dit](https://github.com/mindspore-lab/mindone/blob/master/examples/dit) | support text to image fine-tune |
| [latte](https://github.com/mindspore-lab/mindone/blob/master/examples/latte) | support unconditional text to image fine-tune |
| [animate diff](https://github.com/mindspore-lab/mindone/blob/master/examples/animatediff) | support motion module and lora training |
| [video composer](https://github.com/mindspore-lab/mindone/tree/master/examples/videocomposer) | support conditional video generation with motion transfer, etc. |
| [ip adapter](https://github.com/mindspore-lab/mindone/blob/master/examples/ip_adapter) | refactoring |
| [t2i-adapter](https://github.com/mindspore-lab/mindone/blob/master/examples/t2i_adapter) | refactoring |
| [dynamicrafter](https://github.com/mindspore-lab/mindone/blob/master/examples/dynamicrafter) | support image to video generation |
| [hunyuan_dit](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuan_dit) | support text to image fine-tune |
| [pixart_sigma](https://github.com/mindspore-lab/mindone/blob/master/examples/pixart_sigma) | support text to image fine-tune at different aspect ratio |

### run hf diffusers on mindspore
mindone diffusers is under active development; most tasks were tested with mindspore 2.3.1 and ascend 910 hardware.

| component | features
| :--- | :--
| [pipeline](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/pipelines) | support 30+ text2image, text2video, and text2audio pipelines
| [models](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/models) | support autoencoder & transformer base models, same as hf diffusers
| [schedulers](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/schedulers) | support 10+ schedulers (ddpm, dpm solver, etc.), same as hf diffusers

#### TODO
* [ ] mindspore 2.3.0 version adaptation
* [ ] hf diffusers 0.30.0 version adaptation
5 changes: 4 additions & 1 deletion examples/README.md
@@ -1,4 +1,4 @@
### multi-modal understanding and generation model examples supported by mindone
| project | introduction | original repo
| :--- | :-- | :-
| [mindone.diffusers](https://github.com/mindspore-lab/mindone/blob/master/examples/diffusers) | run hf diffusers on mindspore | https://github.com/huggingface/diffusers |
@@ -21,3 +21,6 @@
| [llava](https://github.com/mindspore-lab/mindone/blob/master/examples/llava) | Haotian-Liu official | https://github.com/haotian-liu/LLaVA
| [vila](https://github.com/mindspore-lab/mindone/blob/master/examples/vila) | Nvidia Lab official | https://github.com/NVlabs/VILA
| [pllava](https://github.com/mindspore-lab/mindone/blob/master/examples/pllava) | Magic Research official | https://github.com/magic-research/PLLaVA
| [dynamicrafter](https://github.com/mindspore-lab/mindone/blob/master/examples/dynamicrafter) | Tencent Research official | https://github.com/Doubiiu/DynamiCrafter
| [hunyuan_dit](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuan_dit) | Tencent Research official | https://github.com/Tencent/HunyuanDiT
| [pixart_sigma](https://github.com/mindspore-lab/mindone/blob/master/examples/pixart_sigma) | Noah Lab official | https://github.com/PixArt-alpha/PixArt-sigma
38 changes: 19 additions & 19 deletions examples/animatediff/README.md
@@ -252,41 +252,41 @@ Here are some generation results after lora fine-tuning on 512x512 resolution an

## Performance (AnimateDiff v2)

Experiments are tested on ascend 910* graph mode.

### Inference

- mindspore 2.3.1

| model name     | cards | resolution | scheduler | steps | s/step | s/video |
|:--------------:|:-----:|:----------:|:---------:|:-----:|:------:|:-------:|
| AnimateDiff v2 | 1     | 512x512x16 | DDIM      | 30    | 0.60   | 18.00   |

- mindspore 2.2.10

| model name     | cards | resolution | scheduler | steps | s/step | s/video |
|:--------------:|:-----:|:----------:|:---------:|:-----:|:------:|:-------:|
| AnimateDiff v2 | 1     | 512x512x16 | DDIM      | 30    | 1.20   | 25.00   |

### Training

- mindspore 2.3.1

| method                       | cards | batch size | resolution | flash attn | jit level | s/step | img/s |
|:----------------------------:|:-----:|:----------:|:----------:|:----------:|:---------:|:------:|:-----:|
| MM training                  | 1     | 1          | 16x512x512 | ON         | O0        | 1.320  | 0.75  |
| Motion Lora                  | 1     | 1          | 16x512x512 | ON         | O0        | 1.566  | 0.64  |
| MM training w/ Embed. cached | 1     | 1          | 16x512x512 | ON         | O0        | 1.004  | 0.99  |
| Motion Lora w/ Embed. cached | 1     | 1          | 16x512x512 | ON         | O0        | 1.009  | 0.99  |

- mindspore 2.2.10

| method                       | cards | batch size | resolution | flash attn | jit level | s/step | img/s |
|:----------------------------:|:-----:|:----------:|:----------:|:----------:|:---------:|:------:|:-----:|
| MM training                  | 1     | 1          | 16x512x512 | OFF        | N/A       | 1.29   | 0.78  |
| Motion Lora                  | 1     | 1          | 16x512x512 | OFF        | N/A       | 1.26   | 0.79  |
| MM training w/ Embed. cached | 1     | 1          | 16x512x512 | ON         | N/A       | 0.75   | 1.33  |
| Motion Lora w/ Embed. cached | 1     | 1          | 16x512x512 | ON         | N/A       | 0.71   | 1.41  |

> MM training: Motion Module training.
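The img/s column in the training tables above is simply throughput derived from step time: batch size divided by seconds per step. A minimal sketch of that relationship (the `throughput` helper is hypothetical, for illustration only, and not part of the repo):

```python
# Hypothetical helper: relate the s/step and img/s columns of the
# training tables above. Throughput = batch size / step time.
def throughput(s_per_step: float, batch_size: int = 1) -> float:
    """Images processed per second for one training step."""
    return batch_size / s_per_step

# e.g. MM training on mindspore 2.2.10: 1.29 s/step at batch size 1
print(round(throughput(1.29), 2))  # -> 0.78 img/s, matching the table
```

Minor differences in the last digit versus the tables come from rounding of the reported step times.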
2 changes: 0 additions & 2 deletions examples/detect_chatgpt/README.md

This file was deleted.

29 changes: 0 additions & 29 deletions examples/detect_chatgpt/config.json

This file was deleted.

35 changes: 0 additions & 35 deletions examples/detect_chatgpt/config_zh.json

This file was deleted.

