diff --git a/examples/moviegen/README.md b/examples/moviegen/README.md
index 182aca797f..58dbe232f5 100644
--- a/examples/moviegen/README.md
+++ b/examples/moviegen/README.md
@@ -1,55 +1,271 @@
-# Movie Gen Video based on MindSpore
+# Movie Gen based on MindSpore
 
-This project is built on the [Movie Gen](https://arxiv.org/abs/2410.13720) paper by Meta for video generation, personalization, and editing. We aim to explore an efficient implementation based on MindSpore and Ascend NPUs. See our [report](docs/report.md) for more details.
+This repository implements the [Movie Gen](https://arxiv.org/abs/2410.13720) model presented by Meta.
+
+Movie Gen is a family of foundation models that can natively generate high-fidelity images and videos
+while also possessing the abilities to edit and personalize the videos.
+
+We aim to explore an efficient implementation based on MindSpore and Ascend NPUs.
+See our [report](docs/report.md) for more details.
 
 ## 📑 Development Plan
 
-This project is in an early stage and under active development. We welcome the open-source community to contribute to this project!
+This project is in an early stage and under active development. We welcome the open-source community to contribute to
+this project!
 
 - Temporal Autoencoder (TAE)
-  - [x] Inference
-  - [x] Training
-- MovieGenVideo-5B (T2I/V)
-  - [x] Inference
-  - [x] Training stage 1: T2I 256px
-  - [x] Training stage 2: T2I/V 256px 256frames
-  - [ ] Training stage 3: T2I/V 768px 256frames (under training)
-  - [x] Web Demo (Gradio)
-- MovieGenVideo-30B (T2I/V)
-  - [x] Inference
-  - [ ] Mixed parallelism training (support DP+SP+CP+TP+MP+Zero3, under training)
-- Personalized-MovieGenVideo (PT2V)
-  - [ ] Inference
-  - [ ] Training
-- MovieGen-Edit
-  - [ ] Inference
-  - [ ] Training
+    - [x] Inference
+    - [x] Training
+- Movie Gen 5B (T2I/V)
+    - [x] Inference
+    - [x] Training stage 1: T2I 256px
+    - [x] Training stage 2: T2I/V 256px 256frames
+    - [ ] Training stage 3: T2I/V 768px 256frames (under verification)
+    - [x] Web Demo (Gradio)
+- Movie Gen 30B (T2I/V)
+    - [x] Inference
+    - [x] Mixed parallelism training (support Ulysses-SP + ZeRO-3)
+    - [x] Training stage 1: T2I 256px
+    - [x] Training stage 2: T2V 256px 256frames
+    - [ ] Training stage 3: T2I/V 768px 256frames
+- Training with Buckets
+    - [ ] Support variable resolutions and aspect ratios
+    - [ ] Support variable number of frames
+- Video Personalization (PT2V)
+    - [ ] Inference
+    - [ ] Training
+- Video Editing
+    - [ ] Inference
+    - [ ] Training
+- Video Super-Resolution
+    - [ ] Inference
+    - [ ] Training
+
+## Demo
+
+| 256x256x455 | 256x256x455 |
+|:---:|:---:|
+|