News and ToDo List

Improve latent-space training (for a fair comparison with previous methods, we train from scratch on COCO-stuff rather than fine-tuning from Stable Diffusion)
Release the pretrained LayoutDiffusion on latent space !!!COMING SOON!!!
Improve README and code usage instructions
Clean up code
Code for Training on Latent Space using AutoEncoderKL
Release tools for evaluation
2023-04-09: Release pre-trained model
2023-04-09: Release instructions for environment and training
2023-04-09: Release Gradio Webui Demo
2023-03-30: Publish complete code
2023-02-27: Accepted by CVPR2023
2022-11-11: Submitted to CVPR2023
2022-07-08: Publish initial code

Introduction

This repository is the official implementation of CVPR2023: LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation.

papers with code
arxiv
cvpr open access paper pdf
cvpr open access supplement pdf

The code is heavily based on openai/guided-diffusion, with the following modifications:

Added support for PyTorch distributed training.
Added support for OmegaConf configuration files in ./configs for easy control.
Added support for layout-to-image generation by introducing a layout encoder (layout fusion module, or LFM) and object-aware cross-attention (OaCA).

Gradio Webui Demo

Pipeline

Visualizations on COCO-stuff

Setup Environment

conda create -y -n LayoutDiffusion python=3.8
conda activate LayoutDiffusion

conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
conda install -y imageio==2.9.0
pip install omegaconf opencv-python h5py==3.2.1 gradio==3.38.0 
# If you run into bugs, try: pip install -U gradio
pip install -e ./repositories/dpm_solver

python setup.py build develop

Gradio Webui Demo (No need for setup of dataset)

  python scripts/launch_gradio_app.py  \
  --config_file configs/COCO-stuff_256x256/LayoutDiffusion_large.yaml \
  sample.pretrained_model_path=./pretrained_models/COCO-stuff_256x256_LayoutDiffusion_large_ema_1150000.pt

Add '--share' after '--config_file XXX' to get a shareable link for remote access.
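The trailing `sample.pretrained_model_path=...` argument in the command above appears to be a dotted-key override of a value from the YAML config. The sketch below illustrates that general idea with plain dictionaries. It is not OmegaConf's implementation, and `apply_overrides` and the `base` dictionary are hypothetical names used only for illustration.

```python
from copy import deepcopy


def apply_overrides(config, overrides):
    """Return a copy of `config` with 'a.b.c=value' style overrides applied."""
    merged = deepcopy(config)
    for item in overrides:
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = merged
        for name in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(name, {})
        node[leaf] = value  # values stay strings in this sketch
    return merged


# Hypothetical stand-in for the parsed YAML config.
base = {"sample": {"pretrained_model_path": None}}
cfg = apply_overrides(
    base,
    ["sample.pretrained_model_path=./pretrained_models/model.pt"],
)
print(cfg["sample"]["pretrained_model_path"])
```

The original dictionary is left untouched, so the same base config can be reused with different overrides.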

Setup Dataset

See here

Pretrained Models

Dataset	Resolution	Sampling steps and FID (number of sampled images x times)	Link (TODO)
COCO-Stuff 2017 segmentation challenge (deprecated coco-stuff, not full coco-stuff)	256 x 256	steps=25: FID=15.61 (3097 x 5); FID=31.68 (2048 x 1)	Google drive
COCO-Stuff 2017 segmentation challenge (deprecated coco-stuff, not full coco-stuff)	256 x 256	waiting	Google drive
COCO-Stuff 2017 segmentation challenge (deprecated coco-stuff, not full coco-stuff)	128 x 128	steps=25: FID=16.57 (3097 x 5)	Google drive
VG	256 x 256	steps=25: FID=15.63 (5097 x 1)	Google drive
VG	128 x 128	steps=25: FID=16.35 (5097 x 1)	Google drive

Training on Latent Space

Download the first-stage model (vae-8):

    cd pretrained_models
    git clone https://huggingface.co/stabilityai/sd-vae-ft-ema
    cd sd-vae-ft-ema
    wget https://huggingface.co/stabilityai/sd-vae-ft-ema/resolve/main/diffusion_pytorch_model.bin -O diffusion_pytorch_model.bin
    wget https://huggingface.co/stabilityai/sd-vae-ft-ema/resolve/main/diffusion_pytorch_model.safetensors -O diffusion_pytorch_model.safetensors
    pip install --upgrade diffusers[torch]

python -m torch.distributed.launch \
       --nproc_per_node 8 \
       scripts/image_train_for_layout.py \
       --config_file ./configs/COCO-stuff_256x256/latent_LayoutDiffusion_large.yaml

Training on Image Space

python -m torch.distributed.launch \
       --nproc_per_node 8 \
       scripts/image_train_for_layout.py \
       --config_file ./configs/COCO-stuff_256x256/LayoutDiffusion_large.yaml

Sampling

pip install --upgrade diffusers[torch]

Use bash/quick_sample.bash for a quick sample.
Use bash/sample.bash to sample the entire test dataset.

Evaluation

[Important] For each metric, first configure the environment according to the specified repo.

FID

Fréchet Inception Distance (FID) was evaluated using TTUR.

After sampling, use the following command to measure the FID score:

CUDA_VISIBLE_DEVICES=0 python fid.py path/to/generated_imgs path/to/gt_imgs --gpu 0
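For intuition about what the score measures: FID is the Fréchet distance between two Gaussians fitted to Inception features of the generated and real images. The sketch below covers only the simplified case of diagonal covariances, where the trace term reduces to a sum over per-dimension standard deviations. The real metric uses full covariance matrices, and `frechet_distance_diag` is a hypothetical helper, not part of TTUR.

```python
import math


def frechet_distance_diag(mu1, var1, mu2, var2):
    """Fréchet distance (squared form, as used by FID) between
    N(mu1, diag(var1)) and N(mu2, diag(var2))."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    # With diagonal covariances, Tr(S1 + S2 - 2 (S1 S2)^(1/2)) reduces to
    # the sum of squared differences of the standard deviations.
    cov_term = sum(
        (math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var1, var2)
    )
    return mean_term + cov_term


# Identical statistics give a distance of zero.
print(frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))
```

Lower values mean the generated feature statistics are closer to the real ones.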

IS

Inception Score (IS) was evaluated using Improved-GAN.

After sampling, use the following command to measure the IS:

cd inception_score
CUDA_VISIBLE_DEVICES=0 python model.py --path path/to/generated_imgs
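As a rough illustration of the quantity being computed, the Inception Score is the exponential of the average KL divergence between each image's predicted class distribution and the marginal class distribution. The sketch below takes precomputed class probabilities as plain lists. It omits the Inception network and the averaging over splits that reference implementations typically use, and `inception_score` here is a hypothetical helper.

```python
import math


def inception_score(probs):
    """probs: list of per-image class-probability lists (each sums to 1)."""
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[c] for p in probs) / n for c in range(num_classes)]
    kl_sum = 0.0
    for p in probs:
        # KL(p(y|x) || p(y)); zero-probability terms contribute nothing.
        kl_sum += sum(
            pc * math.log(pc / mc) for pc, mc in zip(p, marginal) if pc > 0
        )
    return math.exp(kl_sum / n)


# Confident predictions spread evenly over two classes.
print(inception_score([[1.0, 0.0], [0.0, 1.0]]))
```

Higher values indicate predictions that are individually confident and collectively diverse.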

DS

Diversity Score (DS) was evaluated using PerceptualSimilarity.

We modified lpips_2dirs.py to make it easier to calculate the mean and variance of DS automatically; please refer to this.

After sampling, use the following command to measure the DS:

CUDA_VISIBLE_DEVICES=0 python lpips_2dirs.py -d0 path/to/generated_imgs_0 -d1 path/to/generated_imgs_1 -o imgs/example_dists.txt --use_gpu
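If you want to summarize a distances file yourself, a minimal sketch follows. It assumes each line of the output file has the form `<filename>: <distance>`, which may not match the modified script's actual output, and `summarize_distances` is a hypothetical helper.

```python
import statistics


def summarize_distances(lines):
    """Return (mean, population variance) of the per-pair LPIPS distances."""
    distances = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumed line format: "<filename>: <distance>"
        distances.append(float(line.rsplit(":", 1)[1]))
    return statistics.mean(distances), statistics.pvariance(distances)


# Hypothetical file contents, shown inline for illustration.
mean, variance = summarize_distances(["0001.png: 0.5", "0002.png: 0.7"])
print(mean, variance)
```

To use it on a real file, pass the open file object, for example `summarize_distances(open("imgs/example_dists.txt"))`.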

YOLO Score

YOLO Score was evaluated using LAMA.

Since we filter the objects and images in the datasets, we think it is better to evaluate bbox mAP only on the filtered annotations. We therefore modified test.py to measure YOLO Score on both the full annotations (using instances_val2017.json from the COCO dataset) and the filtered annotations.

After sampling, use the following command to measure the YOLO Score:

cd yolo_experiments
cd data
CUDA_VISIBLE_DEVICES=0 python test.py --image_path path/to/generated_imgs
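To make the idea of "filtered annotations" concrete, the sketch below filters a COCO-style annotation dictionary down to a chosen set of images and to boxes above a minimum relative area. The filtering criteria, the default threshold, and the name `filter_coco_annotations` are all assumptions for illustration. They are not the filtering rules used by this repository's test.py.

```python
def filter_coco_annotations(coco, keep_image_ids, min_area_ratio=0.0):
    """Keep only the given images and boxes covering at least
    `min_area_ratio` of their image (COCO bbox format: [x, y, w, h])."""
    images = [img for img in coco["images"] if img["id"] in keep_image_ids]
    image_area = {img["id"]: img["width"] * img["height"] for img in images}
    annotations = []
    for ann in coco["annotations"]:
        if ann["image_id"] not in image_area:
            continue  # annotation belongs to a dropped image
        _, _, w, h = ann["bbox"]
        if w * h / image_area[ann["image_id"]] >= min_area_ratio:
            annotations.append(ann)
    # Other top-level keys (e.g. "categories") are carried over unchanged.
    return {**coco, "images": images, "annotations": annotations}
```

The returned dictionary has the same layout as the input, so it can be written back out with `json.dump` and passed to any tool that reads COCO-style annotations.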

CAS

Classification Score (CAS) was evaluated using pytorch_image_classification.

We crop the ground-truth (GT) box regions from the images and resize each object to 32×32, labeled with its class. We then train a ResNet101 classifier on crops from generated images and test it on crops from real images. Finally, we measure CAS using the generated images.

CUDA_VISIBLE_DEVICES=0 python evaluate.py --config configs/test.yaml

You should configure the ckpt path and dataset info in configs/test.yaml.

For beginners

Layout-to-image generation is closely related to scene-graph-to-image generation, and some confusing issues remain in the field. You can refer to issues such as:

the deprecated coco-stuff 2017
FID, IS, LPIPS, CAS of LostGAN-v2
IS, FID, LPIPS, CAS of Grid2Im
IS, SceneIS, FID, SceneFID, LPIPS, CAS of AttrLostGAN

However, we recommend ignoring the confusing history and following recent work such as LDM and Frido, which use a relatively new benchmark.

Cite

@InProceedings{Zheng_2023_CVPR,
    author    = {Zheng, Guangcong and Zhou, Xianpan and Li, Xuewei and Qi, Zhongang and Shan, Ying and Li, Xi},
    title     = {LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {22490-22499}
}
