Update main & examples readme (#737)
* update readme

* fix format

* fix linting

* update

* fix typo

* update readme

* update readme

* fix linting

* rm detect_chatgpt, tango, tuneavideo

* Unify v020 readmes (#730)

* Unify v020 readmes (#730)

Update docs for pku-opensora

* update pixart sigma readme

* fix typo

* update

* rm redundency

---------

Co-authored-by: fzilan <[email protected]>
SamitHuang and Fzilan authored Nov 22, 2024
1 parent 931bde6 commit 0ad4991
Showing 95 changed files with 249 additions and 143,003 deletions.
56 changes: 36 additions & 20 deletions README.md
@@ -1,30 +1,43 @@
# MindONE

This repository contains SoTA algorithms, models, and interesting projects in the area of multimodal understanding and content generation.

ONE is short for "ONE for all"

## News
- [2024.11.06] MindONE [v0.2.0](https://github.com/mindspore-lab/mindone/releases/tag/v0.2.0) is released

## Quick tour

To install MindONE v0.2.0, please install [MindSpore 2.3.1](https://www.mindspore.cn/install) and run `pip install mindone`

Alternatively, to install the latest version from the `master` branch, please run:
```bash
git clone https://github.com/mindspore-lab/mindone.git
cd mindone
pip install -e .
```

We support state-of-the-art diffusion models for generating images, audio, and video. Let's get started using [Stable Diffusion 3](https://huggingface.co/stabilityai/stable-diffusion-3-medium) as an example.

**Hello MindSpore** from **Stable Diffusion 3**!

<div>
<img src="https://github.com/townwish4git/mindone/assets/143256262/8c25ae9a-67b1-436f-abf6-eca36738cd17" alt="sd3" width="512" height="512">
</div>

- [mindone/diffusers](mindone/diffusers) now supports [Stable Diffusion 3](https://huggingface.co/stabilityai/stable-diffusion-3-medium). Give it a try yourself!

```py
import mindspore
from mindone.diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    mindspore_dtype=mindspore.float16,
)
prompt = "A cat holding a sign that says 'Hello MindSpore'"
image = pipe(prompt)[0][0]
image.save("sd3.png")
```

### supported models under mindone/examples
| model | features
@@ -37,23 +50,26 @@ ONE is short for "ONE for all"
| [pllava](https://github.com/mindspore-lab/mindone/blob/master/examples/pllava) | working on it |
| [hpcai open sora](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_hpcai) | support v1.0/1.1/1.2 large scale training with dp/sp/zero |
| [open sora plan](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_pku) | support v1.0/1.1/1.2 large scale training with dp/sp/zero |
| [stable diffusion](https://github.com/mindspore-lab/mindone/blob/master/examples/stable_diffusion_v2) | support sd 1.5/2.0/2.1, vanilla fine-tune, lora, dreambooth, text inversion |
| [stable diffusion xl](https://github.com/mindspore-lab/mindone/blob/master/examples/stable_diffusion_xl) | support sai style (Stability AI) vanilla fine-tune, lora, dreambooth |
| [dit](https://github.com/mindspore-lab/mindone/blob/master/examples/dit) | support text to image fine-tune |
| [latte](https://github.com/mindspore-lab/mindone/blob/master/examples/latte) | support unconditional text to image fine-tune |
| [animate diff](https://github.com/mindspore-lab/mindone/blob/master/examples/animatediff) | support motion module and lora training |
| [video composer](https://github.com/mindspore-lab/mindone/tree/master/examples/videocomposer) | support conditional video generation with motion transfer, etc. |
| [ip adapter](https://github.com/mindspore-lab/mindone/blob/master/examples/ip_adapter) | refactoring |
| [t2i-adapter](https://github.com/mindspore-lab/mindone/blob/master/examples/t2i_adapter) | refactoring |
| [dynamicrafter](https://github.com/mindspore-lab/mindone/blob/master/examples/dynamicrafter) | support image to video generation |
| [hunyuan_dit](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuan_dit) | support text to image fine-tune |
| [pixart_sigma](https://github.com/mindspore-lab/mindone/blob/master/examples/pixart_sigma) | support text to image fine-tune at different aspect ratio |

### run hf diffusers on mindspore
mindone diffusers is under active development; most tasks were tested with mindspore 2.3.1 and ascend 910 hardware.

| component | features
| :--- | :--
| [pipeline](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/pipelines) | support 30+ text2image, text2video, and text2audio pipelines
| [models](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/models) | support autoencoder & transformer base models, same as hf diffusers
| [schedulers](https://github.com/mindspore-lab/mindone/tree/master/mindone/diffusers/schedulers) | support 10+ schedulers (ddpm, dpm solver, etc.), same as hf diffusers

#### TODO
* [ ] mindspore 2.3.0 version adaptation
* [ ] hf diffusers 0.30.0 version adaptation
5 changes: 4 additions & 1 deletion examples/README.md
@@ -1,4 +1,4 @@
### multi-modal understanding and generation model examples supported by mindone
| project | introduction | original repo
| :--- | :-- | :-
| [mindone.diffusers](https://github.com/mindspore-lab/mindone/blob/master/examples/diffusers) | run hf diffusers on mindspore | https://github.com/huggingface/diffusers |
@@ -21,3 +21,6 @@
| [llava](https://github.com/mindspore-lab/mindone/blob/master/examples/llava) | Haotian-Liu official | https://github.com/haotian-liu/LLaVA
| [vila](https://github.com/mindspore-lab/mindone/blob/master/examples/vila) | Nvidia Lab official | https://github.com/NVlabs/VILA
| [pllava](https://github.com/mindspore-lab/mindone/blob/master/examples/pllava) | Magic Research official | https://github.com/magic-research/PLLaVA
| [dynamicrafter](https://github.com/mindspore-lab/mindone/blob/master/examples/dynamicrafter) | Tencent Research official | https://github.com/Doubiiu/DynamiCrafter
| [hunyuan_dit](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuan_dit) | Tencent Research official | https://github.com/Tencent/HunyuanDiT
| [pixart_sigma](https://github.com/mindspore-lab/mindone/blob/master/examples/pixart_sigma) | Noah Lab official | https://github.com/PixArt-alpha/PixArt-sigma
38 changes: 19 additions & 19 deletions examples/animatediff/README.md
@@ -252,41 +252,41 @@ Here are some generation results after lora fine-tuning on 512x512 resolution an

## Performance (AnimateDiff v2)

Experiments are tested on ascend 910* graph mode.

### Inference

- mindspore 2.3.1

| model name     | cards | resolution | scheduler | steps | s/step | s/video |
|:--------------:|:-----:|:----------:|:---------:|:-----:|:------:|:-------:|
| AnimateDiff v2 | 1     | 512x512x16 | DDIM      | 30    | 0.60   | 18.00   |

- mindspore 2.2.10

| model name     | cards | resolution | scheduler | steps | s/step | s/video |
|:--------------:|:-----:|:----------:|:---------:|:-----:|:------:|:-------:|
| AnimateDiff v2 | 1     | 512x512x16 | DDIM      | 30    | 1.20   | 25.00   |

### Training

- mindspore 2.3.1

| method                       | cards | batch size | resolution | flash attn | jit level | s/step | img/s |
|:----------------------------:|:-----:|:----------:|:----------:|:----------:|:---------:|:------:|:-----:|
| MM training                  | 1     | 1          | 16x512x512 | ON         | O0        | 1.320  | 0.75  |
| Motion Lora                  | 1     | 1          | 16x512x512 | ON         | O0        | 1.566  | 0.64  |
| MM training w/ Embed. cached | 1     | 1          | 16x512x512 | ON         | O0        | 1.004  | 0.99  |
| Motion Lora w/ Embed. cached | 1     | 1          | 16x512x512 | ON         | O0        | 1.009  | 0.99  |

- mindspore 2.2.10

| method                       | cards | batch size | resolution | flash attn | jit level | s/step | img/s |
|:----------------------------:|:-----:|:----------:|:----------:|:----------:|:---------:|:------:|:-----:|
| MM training                  | 1     | 1          | 16x512x512 | OFF        | N/A       | 1.29   | 0.78  |
| Motion Lora                  | 1     | 1          | 16x512x512 | OFF        | N/A       | 1.26   | 0.79  |
| MM training w/ Embed. cached | 1     | 1          | 16x512x512 | ON         | N/A       | 0.75   | 1.33  |
| Motion Lora w/ Embed. cached | 1     | 1          | 16x512x512 | ON         | N/A       | 0.71   | 1.41  |

> MM training: Motion Module training.
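The img/s column in the training tables above is simply throughput derived from step time: batch size divided by seconds per step. A minimal sketch of that relationship (the `throughput` helper is hypothetical, for illustration only, and not part of the repo):

```python
# Hypothetical helper: relate the s/step and img/s columns of the
# training tables above. Throughput = batch size / step time.
def throughput(s_per_step: float, batch_size: int = 1) -> float:
    """Images processed per second for one training step."""
    return batch_size / s_per_step

# e.g. MM training on mindspore 2.2.10: 1.29 s/step at batch size 1
print(round(throughput(1.29), 2))  # -> 0.78 img/s, matching the table
```

Minor differences in the last digit versus the tables come from rounding of the reported step times.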
2 changes: 0 additions & 2 deletions examples/detect_chatgpt/README.md

This file was deleted.

29 changes: 0 additions & 29 deletions examples/detect_chatgpt/config.json

This file was deleted.

35 changes: 0 additions & 35 deletions examples/detect_chatgpt/config_zh.json

This file was deleted.

