Nexus-Gen: A Unified Model for Image Understanding, Generation, and Editing

TODO

Release the training and inference code.
Release the model checkpoint.
Release the technical report.
Release the training datasets.

What is Nexus-Gen

Nexus-Gen is a unified model that combines the language reasoning capabilities of LLMs with the image synthesis power of diffusion models. To align the embedding spaces of the LLM and the diffusion model, we use a dual-phase alignment training process: (1) the autoregressive LLM learns to predict image embeddings conditioned on multimodal inputs, and (2) the vision decoder is trained to reconstruct high-fidelity images from these embeddings. While training the LLM, we identified a critical discrepancy between the training and inference phases of the autoregressive paradigm: at inference time, errors accumulate in the continuous embedding space and severely degrade generation quality. To avoid this issue, we introduce a prefilled autoregression strategy that prefills the input sequence with position-embedded special tokens instead of continuous embeddings. Through this dual-phase training, Nexus-Gen gains an integrated capability to handle image understanding, generation, and editing tasks.
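The prefilled autoregression idea can be illustrated with a minimal, framework-free sketch. It is not the Nexus-Gen implementation: the placeholder token ID (IMAGE_PAD_ID) is hypothetical, embeddings are modeled as scalars, and the two rollouts use a toy error model. build_prefilled_input shows how a prompt is extended with identical special tokens that differ only in position, and the rollouts contrast feeding predictions back as inputs (errors carried forward) with prefilled placeholders (errors stay per-step).

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical ID of the special image placeholder token; the actual token
# and ID used by Nexus-Gen are not specified here.
IMAGE_PAD_ID = 99999


@dataclass
class PrefilledInput:
    input_ids: List[int]     # prompt tokens followed by image placeholder tokens
    position_ids: List[int]  # each placeholder gets its own position
    image_slots: List[int]   # positions whose outputs are read as image embeddings (assumption)


def build_prefilled_input(prompt_ids: Sequence[int], num_image_tokens: int,
                          image_pad_id: int = IMAGE_PAD_ID) -> PrefilledInput:
    """Append num_image_tokens identical special tokens after the prompt.

    The placeholders differ only through their position ids, so the inputs are
    the same at training and inference time: no predicted embeddings are fed back.
    """
    input_ids = list(prompt_ids) + [image_pad_id] * num_image_tokens
    position_ids = list(range(len(input_ids)))
    start = len(prompt_ids)
    image_slots = list(range(start, start + num_image_tokens))
    return PrefilledInput(input_ids, position_ids, image_slots)


def naive_rollout(targets: Sequence[float], step_error: float) -> List[float]:
    """Toy model: each step consumes the previous (imperfect) prediction,
    so the error it carries is added to the fresh per-step error."""
    preds: List[float] = []
    carried = 0.0
    for target in targets:
        pred = target + carried + step_error
        carried = pred - target
        preds.append(pred)
    return preds


def prefilled_rollout(targets: Sequence[float], step_error: float) -> List[float]:
    """Toy model: inputs are fixed placeholder tokens, so each step only
    incurs its own error."""
    return [target + step_error for target in targets]


if __name__ == "__main__":
    seq = build_prefilled_input([101, 102, 103], num_image_tokens=4)
    print(seq.image_slots)                      # [3, 4, 5, 6]
    print(naive_rollout([0.0] * 4, 0.1))        # errors grow: about 0.1, 0.2, 0.3, 0.4
    print(prefilled_rollout([0.0] * 4, 0.1))    # [0.1, 0.1, 0.1, 0.1]
```

In the naive rollout the error at step k is roughly k times the per-step error, while the prefilled rollout keeps it constant; this is the intuition behind replacing continuous embeddings with position-embedded special tokens.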

Getting Started

Installation

Install DiffSynth-Studio from source:

git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .

Install the requirements:

pip install -r requirements.txt

Install ms-swift if you want to fine-tune Nexus-Gen:

pip install ms-swift -U
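To confirm that the installation steps succeeded, a small check such as the following may help. It is a sketch that assumes the import names diffsynth (for DiffSynth-Studio) and swift (for ms-swift); since ms-swift is only needed for fine-tuning, a missing swift package is expected if you skipped that step.

```python
import importlib.util
from typing import Dict, Iterable

# Assumed import names: DiffSynth-Studio provides "diffsynth",
# ms-swift provides "swift".
DEFAULT_PACKAGES = ("diffsynth", "swift")


def check_packages(names: Iterable[str] = DEFAULT_PACKAGES) -> Dict[str, bool]:
    """Report whether each top-level package can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name}: {'installed' if found else 'missing'}")
```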

Prepare models

python download_models.py

Image Understanding

python image_understanding.py

Image Generation

Generate images from a detailed prompt:

python image_generation.py

Polish the prompt and generate images with Nexus-Gen:

python image_generation_with_selfpolish.py

Image Editing

python image_editing.py

Gradio demo

python app.py

Training Codes

Nexus-Gen is trained based on ms-swift and DiffSynth-Studio. The training scripts are train/scripts/train_decoder.sh and train_llm.sh.

Citation

@article{zhang2025nexus-gen,
      title={Nexus-Gen: A Unified Model for Image Understanding, Generation, and Editing}, 
      author={Hong Zhang and Zhongjie Duan and Xingjun Wang and Yingda Chen and Yuze Zhao and Yu Zhang},
      journal={arXiv preprint arXiv:2504.21356},
      year={2025}
}
