# Add-SD: Rational Generation without Manual Reference

This is the official repository with the PyTorch implementation of *Add-SD: Rational Generation without Manual Reference*.

☀️ If you find this work useful for your research, please kindly star our repo and cite our paper! ☀️

## Catalogue

1. Introduction
2. Method
3. Overall Pipeline
4. Main Results
5. References
6. Citation

## 1. Introduction

We propose Add-SD, a novel visual generation method for instruction-based object addition, demonstrating significant advancements in seamlessly integrating objects into realistic scenes using only textual instructions.

## 2. Method

Add-SD consists of three essential stages to complete the object addition task:

  1. Creating image pairs by removing objects,
  2. Fine-tuning Add-SD,
  3. Generating synthetic data for downstream tasks.

![Main architecture of Add-SD.](main_architecture.svg)

## 3. Overall Pipeline

### Step 0: Creating image pairs by removing objects

  1. Follow the instructions in the Inpaint-Anything repository to install the necessary dependencies.

  2. Download the pretrained models, including sam_vit_h_4b8939 and big-lama, into the pretrained directory.

  3. Navigate to the 0_inpaint_anything directory and run the script to process the COCO and LVIS data:

```sh
cd 0_inpaint_anything
sh script/remove_anything_with_GTbox.sh  # covers both COCO and LVIS
```
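For intuition, here is a minimal sketch of what this step does for a single image: SAM segments the object from its ground-truth box, and LaMa erases it. The `inpaint_img_with_lama` helper, config path, and file names are assumptions based on the Inpaint-Anything repository, not the exact contents of `remove_anything_with_GTbox.sh`.

```python
# Sketch: turn (image, GT box) into an "object removed" counterpart image.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor
from lama_inpaint import inpaint_img_with_lama  # Inpaint-Anything helper (assumed import path)

image = np.array(Image.open("data/coco/train2017/000000000009.jpg").convert("RGB"))
gt_box = np.array([100, 50, 300, 220])  # example ground-truth box in XYXY format

# Segment the object to remove, prompting SAM with its ground-truth box.
sam = sam_model_registry["vit_h"](checkpoint="pretrained/sam_vit_h_4b8939.pth").to("cuda")
predictor = SamPredictor(sam)
predictor.set_image(image)
masks, _, _ = predictor.predict(box=gt_box, multimask_output=False)

# Erase the object with LaMa; the result pairs with the original for training.
removed = inpaint_img_with_lama(
    image, masks[0].astype(np.uint8) * 255,
    config_p="lama/configs/prediction/default.yaml",  # assumed config location
    ckpt_p="pretrained/big-lama")
Image.fromarray(removed.astype(np.uint8)).save(
    "data/coco/train2017_remove_image/000000000009.jpg")
```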

### Step 1: Fine-tuning Add-SD

  1. Follow the installation instructions in the instruct-pix2pix repository.

  2. Download the pretrained model, v1-5-pruned-emaonly.ckpt, into the pretrained directory.

  3. Download the required JSON files and organize them as follows:

```
1_AddSD/data/
└── json/
    ├── seeds_coco_vanilla.json
    ├── seeds_coco_multi_vanilla.json
    ├── seeds_lvis_vanilla.json
    ├── seeds_lvis_multi_vanilla.json
    ├── seeds_refcoco_vanilla.json
    ├── seeds_vg_vanilla.json
    └── seeds_vgcut_vanilla.json
```
  4. (Optional) If you want to build your own training annotations, run:

```sh
cd 1_AddSD
python utils/gen_train_data_annos.py
```

  5. Train Add-SD:

```sh
cd 1_AddSD
sh run_train.sh
```
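To make the training objective concrete, here is a rough sketch of how one training triplet could be assembled: the object-removed image is the input, the original image is the reconstruction target, and the instruction asks to add the removed object back. The seeds-JSON field names below are hypothetical; consult `utils/gen_train_data_annos.py` for the actual format.

```python
# Sketch: an Add-SD training triplet = (object-removed input, original target, "add ..." edit).
import json
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset

class AddPairDataset(Dataset):
    """Pairs COCO images with their object-removed counterparts (field names assumed)."""

    def __init__(self, seeds_json, coco_root):
        self.samples = json.load(open(seeds_json))  # e.g. seeds_coco_vanilla.json
        self.root = Path(coco_root)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, i):
        name = self.samples[i]["name"]          # hypothetical key: image file stem
        category = self.samples[i]["category"]  # hypothetical key: removed object's class
        src = Image.open(self.root / "train2017_remove_image" / f"{name}.jpg")
        tgt = Image.open(self.root / "train2017" / f"{name}.jpg")
        return dict(input=src, target=tgt, edit=f"add a {category}")
```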

Make sure to place the datasets (COCO, LVIS, VG, VGCUT, RefCOCO, RefCOCO+, and RefCOCOg) in the data directory with the following structure:

```
1_AddSD/data/
├── coco/
│   ├── train2017/
│   ├── val2017/
│   ├── train2017_remove_image/           # COCO single-object removal data
│   ├── train2017_remove_image_multiobj/  # COCO multi-object removal data
│   ├── lvis_remove_image/                # LVIS single-object removal data
│   ├── lvis_remove_image_multiobj/       # LVIS multi-object removal data
│   └── annotations/
│       ├── instances_train2017.json
│       └── instances_val2017.json
├── lvis/
│   ├── lvis_v1_train.json
│   └── lvis_v1_val.json
├── refcoco/
│   ├── refcoco/
│   │   └── instances.json
│   ├── refcoco+/
│   │   └── instances.json
│   └── refcocog/
│       └── instances.json
├── refcoco_remove/
├── vg/
│   ├── images/
│   ├── metas/
│   ├── caption_vg_all.json
│   └── caption_vg_train.json
├── vg_remove/
├── vgcut/
│   ├── refer_train.json
│   ├── refer_val.json
│   ├── refer_input_train.json
│   └── refer_input_val.json
└── vgcut_remove/
```
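A quick way to confirm the layout before launching training; the checked paths are taken directly from the tree above (spot-checking a representative subset).

```python
# Sanity check: verify the expected dataset layout before training.
from pathlib import Path

root = Path("1_AddSD/data")
required = [
    "coco/train2017", "coco/val2017",
    "coco/train2017_remove_image", "coco/train2017_remove_image_multiobj",
    "coco/annotations/instances_train2017.json",
    "lvis/lvis_v1_train.json", "lvis/lvis_v1_val.json",
    "refcoco/refcoco/instances.json",
    "vg/images", "vgcut/refer_train.json",
]
missing = [p for p in required if not (root / p).exists()]
print("all datasets found" if not missing else f"missing: {missing}")
```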
  6. Generate synthetic data:

Download the pretrained models from Google Drive.

Run the dataset generation script:

```sh
cd 1_AddSD
sh utils/gen_datasets.sh
```

Here are examples of generation on the COCO and LVIS datasets.

#### COCO object generation

```sh
python edit_cli_datasets.py --config configs/generate.yaml \
    -n $NNODES -nr $NODE_RANK --addr $ADDR --port $PORT \
    --input $INPUT --output $OUTPUT --ckpt $MODEL --seed $SEED
```

  - By default, a super-label-based sampling strategy restricts the category of the added object. To disable it, add the `--no_superlabel` parameter.

  - By default, a single object is generated. To generate multiple objects, add the `--multi` parameter.
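As a rough illustration of the super-label idea, the sketch below draws the added object's category only from COCO super-categories already present in the image; the "already present" restriction is our reading of the strategy, not necessarily the repo's exact rule.

```python
# Sketch of super-label-based sampling: pick the added object's class from
# COCO super-categories that the image already contains.
import json
import random

coco = json.load(open("data/coco/annotations/instances_train2017.json"))
cat_info = {c["id"]: c for c in coco["categories"]}
by_super = {}
for c in coco["categories"]:
    by_super.setdefault(c["supercategory"], []).append(c["name"])

def sample_added_category(image_category_ids):
    # Restrict candidates to super-labels the image already contains.
    supers = {cat_info[cid]["supercategory"] for cid in image_category_ids}
    candidates = [name for s in supers for name in by_super[s]]
    return random.choice(candidates)

# An image containing a cat (id 17) and a dog (id 18) can only gain another "animal".
print(sample_added_category([17, 18]))
```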

#### LVIS object generation

```sh
python edit_cli_datasets.py --config configs/generate.yaml \
    -n $NNODES -nr $NODE_RANK --addr $ADDR --port $PORT \
    --input $INPUT --output $OUTPUT --ckpt $MODEL --seed $SEED \
    --is_lvis --lvis_label_selection r
```

  - The `--is_lvis` parameter is required when generating on the LVIS dataset.

  - By default, objects are added from rare classes. To use common or frequent classes instead, change the `--lvis_label_selection` parameter, which accepts `f`, `c`, and `r` for frequent, common, and rare classes, respectively.
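LVIS v1 tags every category with a frequency bucket (`"r"`, `"c"`, or `"f"` in the annotation file), so the class filter can be sketched directly from the annotations; whether `--lvis_label_selection` is implemented exactly this way is an assumption.

```python
# Sketch: filter LVIS candidate classes by their frequency bucket.
import json

lvis = json.load(open("data/lvis/lvis_v1_train.json"))

def classes_for(selection="r"):
    # selection may combine buckets, e.g. "fcr" keeps every class
    return [c["name"] for c in lvis["categories"] if c["frequency"] in selection]

rare_classes = classes_for("r")
print(len(rare_classes), rare_classes[:5])
```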

### Step 2: Postprocessing synthetic data to localize the objects added by Add-SD

  1. Follow the installation instructions in the GroundingDINO repository.

  2. Download the pretrained model, groundingdino_swinb_cogcoor.pth, into the pretrained directory.

  3. Navigate to the 2_grounding_dino directory and run the inference script:

```sh
cd 2_grounding_dino
sh run_infer_with_GT_for_AddSD.sh
```
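In essence, this step prompts GroundingDINO with the added object's name to recover a pseudo box for it. Below is a minimal sketch using GroundingDINO's `groundingdino.util.inference` utilities, not the exact logic of `run_infer_with_GT_for_AddSD.sh`; the config path, output path, and thresholds are assumptions.

```python
# Sketch: localize the added object in a synthetic image with GroundingDINO.
from groundingdino.util.inference import load_model, load_image, predict

model = load_model(
    "groundingdino/config/GroundingDINO_SwinB_cfg.py",  # config path is an assumption
    "pretrained/groundingdino_swinb_cogcoor.pth")
image_source, image = load_image("outputs/coco_addsd/000000000009.jpg")  # hypothetical path
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="dog",          # the object named in the edit instruction
    box_threshold=0.35,     # typical defaults, not necessarily the repo's settings
    text_threshold=0.25)
print(boxes, phrases)       # normalized cxcywh boxes for the added object
```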

### Step 3: Train detectors with original data and synthetic data from Add-SD

  1. Follow the installation instructions in the XPaste repository.

  2. Navigate to the 3_XPaste directory and run the training script:

```sh
cd 3_XPaste
sh train.sh
```
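As a rough illustration of the underlying idea (not the repo's exact procedure, which `train.sh` delegates to XPaste), one way to combine the original annotations with the Add-SD pseudo-labeled synthetic data is to merge them into a single COCO-format file; the synthetic file name here is hypothetical.

```python
# Sketch: merge original and synthetic COCO-format annotations for detector training.
import json

orig = json.load(open("data/coco/annotations/instances_train2017.json"))
synth = json.load(open("data/coco/annotations/instances_train2017_addsd.json"))  # assumed name

# Offset synthetic ids so they never collide with the originals.
img_off = max(im["id"] for im in orig["images"]) + 1
ann_off = max(a["id"] for a in orig["annotations"]) + 1
for im in synth["images"]:
    im["id"] += img_off
for a in synth["annotations"]:
    a["id"] += ann_off
    a["image_id"] += img_off

merged = dict(orig,
              images=orig["images"] + synth["images"],
              annotations=orig["annotations"] + synth["annotations"])
with open("data/coco/annotations/instances_train2017_merged.json", "w") as f:
    json.dump(merged, f)
```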

## 4. Main Results

![Visualization on image editing.](visualization.svg)

![Visualization under different instructions.](visualization_instruction.svg)

## 5. References

Our project builds on the following public works with code: Inpaint-Anything, instruct-pix2pix, GroundingDINO, and XPaste.

## 6. Citation

If you find this code useful in your research, please kindly consider citing our paper:

```bibtex
@article{yang2024add,
    title={Add-SD: Rational Generation without Manual Reference},
    author={Yang, Lingfeng and Zhang, Xinyu and Li, Xiang and Chen, Jinwen and Yao, Kun and Zhang, Gang and Liu, Lingqiao and Wang, Jingdong and Yang, Jian},
    journal={arXiv preprint arXiv},
    year={2024}
}
```