
Awesome PR's Welcome

A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models

Xincheng Shuai · Henghui Ding · Xingjun Ma · Rongcheng Tu · Yu-Gang Jiang · Dacheng Tao

arXiv PDF


This repo records and tracks recent multimodal-guided image editing methods built on T2I diffusion models, as a supplement to our survey.
If you find any work missing or have any suggestions, feel free to open a pull request; we will add the missing papers to this repo as soon as possible.

🔥News

[1] We have uploaded our evaluation dataset!!

🔥Highlight!!

[1] Two concurrent works (Huang et al., Cao et al.) are related to our survey. Huang et al. review the application of diffusion models to image editing, while Cao et al. focus on controllable image generation. Compared to the review by Huang et al. and other previous literature, we investigate image editing in a more general context. Our discussion extends beyond low-level semantics and encompasses customization tasks that align with our topic. We integrate existing general editing methods into a unified framework and provide a design space for users through qualitative and quantitative analyses.

[2] In this repo, we organize the reviewed methods by editing task and present their inversion and editing algorithms along with their guidance sets. It is worth noting that many of these studies employ multiple editing algorithms simultaneously; for simplicity, we currently indicate only the primary technique each method uses.

[3] We hope our work will assist researchers in exploring novel combinations within our framework, thereby enhancing performance in challenging scenarios.

Editing Tasks Discussed in Our Survey

*(figure: overview of the editing tasks discussed in our survey)*

Unified Framework

*(figure: the unified framework)*

Notation

Inversion Algorithm:

- $F_{inv}^{T}$: Tuning-Based Inversion.
- $F_{inv}^{F}$: Forward-Based Inversion.

Editing Algorithm:

- $F_{edit}^{Norm}$: Normal Editing.
- $F_{edit}^{Attn}$: Attention-Based Editing.
- $F_{edit}^{Blend}$: Blending-Based Editing.
- $F_{edit}^{Score}$: Score-Based Editing.
- $F_{edit}^{Optim}$: Optimization-Based Editing.
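
To make the notation concrete, here is a minimal, framework-agnostic Python sketch of how one inversion algorithm composes with one editing algorithm, roughly $x_{edit} = F_{edit}(F_{inv}(x, c_{src}), c_{tgt})$. The `denoiser`, `invert_forward`, and `edit_normal` names are hypothetical placeholders introduced only for illustration; they are not from the survey or any particular codebase, and a real implementation would plug in an actual T2I diffusion model (e.g., Stable Diffusion) with its noise schedule.

```python
# Schematic sketch of the unified framework: x_edit = F_edit(F_inv(x, c_src), c_tgt).
# All functions below are illustrative placeholders, not an actual library API.
import numpy as np

def denoiser(z_t, t, condition):
    """Stand-in for a conditional noise predictor eps_theta(z_t, t, c)."""
    seed = abs(hash((t, condition))) % (2**32)
    rng = np.random.default_rng(seed)
    return 0.1 * z_t + 0.01 * rng.standard_normal(z_t.shape)

def invert_forward(x, c_src, num_steps=50):
    """F_inv^F (forward-based inversion): run the deterministic sampler in reverse
    with the source condition to obtain a latent z_T that reconstructs x."""
    z = x.copy()
    for t in range(num_steps):              # t: 0 -> T
        z = z + denoiser(z, t, c_src)       # toy reverse update; real DDIM uses the alpha schedule
    return z

def edit_normal(z_T, c_tgt, num_steps=50):
    """F_edit^Norm (normal editing): plain sampling from z_T under the target condition.
    Attention-, blending-, score-, and optimization-based editing modify this loop instead
    (e.g., injecting source attention maps, blending latents inside a mask, mixing scores)."""
    z = z_T.copy()
    for t in reversed(range(num_steps)):    # t: T -> 0
        z = z - denoiser(z, t, c_tgt)       # toy denoising update
    return z

# A row tagged $F_{inv}^F+F_{edit}^{Norm}$ corresponds to exactly this pairing.
x = np.random.default_rng(0).standard_normal((4, 64, 64))    # source latent (toy)
z_T = invert_forward(x, c_src="a photo of a cat")            # F_inv^F
x_edit = edit_normal(z_T, c_tgt="a photo of a dog")          # F_edit^Norm
print(x_edit.shape)                                          # (4, 64, 64)
```

Reading the tables below, a combination such as $F_{inv}^F+F_{edit}^{Attn}$ therefore means: forward-based inversion of the source image, followed by an attention-based variant of the editing loop; tuning-based inversion ($F_{inv}^{T}$) would replace `invert_forward` with per-image fine-tuning of the model or text embedding.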

Table of contents

- Content-Aware Editing
- Content-Free Editing
- Experiment and Data


Object and Attribute Manipulation

1. Training-Free Approaches

| Publication | Paper Title | Guidance Set | Combination | Code/Project |
| --- | --- | --- | --- | --- |
| TOG 2023 | UniTune: Text-Driven Image Editing by Fine Tuning a Diffusion Model on a Single Image | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| CVPR 2024 | Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation | instruction | $F_{inv}^T+F_{edit}^{Attn}$ | Code |
| CVPR 2023 | Imagic: Text-Based Real Image Editing with Diffusion Models | text | $F_{inv}^T+F_{edit}^{Blend}$ | Code |
| arXiv 2023 | Forgedit: Text Guided Image Editing via Learning and Forgetting | text | $F_{inv}^T+F_{edit}^{Blend}$ | Code |
| CVPR 2024 | Doubly Abductive Counterfactual Inference for Text-based Image Editing | text | $F_{inv}^T+F_{edit}^{Blend}$ | Code |
| CVPR 2024 | ZONE: Zero-Shot Instruction-Guided Local Editing | instruction | $F_{inv}^T+F_{edit}^{Blend}$ | Code |
| CVPR 2023 | SINE: SINgle Image Editing with Text-to-Image Diffusion Models | text | $F_{inv}^T+F_{edit}^{Score}$ | Code |
| CVPR 2023 | EDICT: Exact Diffusion Inversion via Coupled Transformations | text | $F_{inv}^F+F_{edit}^{Norm}$ | Code |
| arXiv 2023 | Exact Diffusion Inversion via Bi-directional Integration Approximation | text | $F_{inv}^F+F_{edit}^{Norm}$ | Code |
| CVPR 2023 | Null-text Inversion for Editing Real Images using Guided Diffusion Models | text | $F_{inv}^F+F_{edit}^{Attn}$ | Code |
| arXiv 2023 | Negative-prompt Inversion: Fast Image Inversion for Editing with Text-guided Diffusion Models | text | $F_{inv}^F+F_{edit}^{Attn}$ | Code |
| arXiv 2023 | Fixed-point Inversion for Text-to-image diffusion models | text | $F_{inv}^F+F_{edit}^{Attn}$ | Code |
| NeurIPS 2023 | Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing | text | $F_{inv}^F+F_{edit}^{Attn}$ | Code |
| ICLR 2023 | Prompt-to-Prompt Image Editing with Cross-Attention Control | text | $F_{inv}^F+F_{edit}^{Attn}$ | Code |
| CVPR 2023 | Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation | text | $F_{inv}^F+F_{edit}^{Attn}$ | Code |
| arXiv 2023 | StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing | text | $F_{inv}^F+F_{edit}^{Attn}$ | Code |
| WACV 2024 | ProxEdit: Improving Tuning-Free Real Image Editing with Proximal Guidance | text | $F_{inv}^F+F_{edit}^{Attn}$ | Code |
| ICLR 2024 | PnP Inversion: Boosting Diffusion-based Editing with 3 Lines of Code | text | $F_{inv}^F+F_{edit}^{Attn}$ | Code |
| CVPR 2024 | An Edit Friendly DDPM Noise Space: Inversion and Manipulations | text | $F_{inv}^F+F_{edit}^{Attn}$ | Code |
| CVPR 2024 | Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing | text | $F_{inv}^F+F_{edit}^{Attn}$ | Code |
| ICCV 2023 | Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models | text | $F_{inv}^F+F_{edit}^{Blend}$ | Code |
| ICLR 2023 | DiffEdit: Diffusion-based semantic image editing with mask guidance | text | $F_{inv}^F+F_{edit}^{Blend}$ | Code |
| arXiv 2023 | PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing | text | $F_{inv}^F+F_{edit}^{Blend}$ | Code |
| CVPR 2023 | Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models | text | $F_{inv}^F+F_{edit}^{Blend}$ | Code |
| ICLR 2024 | Object-aware Inversion and Reassembly for Image Editing | text | $F_{inv}^F+F_{edit}^{Blend}$ | Code |
| arXiv 2022 | The Stable Artist: Steering Semantics in Diffusion Latent Space | text | $F_{inv}^F+F_{edit}^{Score}$ | Code |
| SIGGRAPH 2023 | Zero-shot Image-to-Image Translation | text | $F_{inv}^F+F_{edit}^{Score}$ | Code |
| NeurIPS 2023 | SEGA: Instructing Diffusion using Semantic Dimensions | text | $F_{inv}^F+F_{edit}^{Score}$ | Code |
| ICCV 2023 | Effective Real Image Editing with Accelerated Iterative Diffusion Inversion | text | $F_{inv}^F+F_{edit}^{Score}$ | Code |
| arXiv 2023 | LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance | text | $F_{inv}^F+F_{edit}^{Score}$ | Code |
| ICLR 2024 | Noise Map Guidance: Inversion with Spatial Context for Real Image Editing | text | $F_{inv}^F+F_{edit}^{Score}$ | Code |
| CVPR 2024 | LEDITS++: Limitless Image Editing using Text-to-Image Models | text | $F_{inv}^F+F_{edit}^{Score}$ | Code |
| ICLR 2024 | MagicRemover: Tuning-free Text-guided Image inpainting with Diffusion Models | text | $F_{inv}^F+F_{edit}^{Score}$ | Code |
| arXiv 2023 | Region-Aware Diffusion for Zero-shot Text-driven Image Editing | text | $F_{inv}^F+F_{edit}^{Optim}$ | Code |
| ICCV 2023 | Delta Denoising Score | text | $F_{inv}^F+F_{edit}^{Optim}$ | Code |
| CVPR 2024 | Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing | text | $F_{inv}^F+F_{edit}^{Optim}$ | Code |
| arXiv 2024 | Ground-A-Score: Scaling Up the Score Distillation for Multi-Attribute Editing | text + mask | $F_{inv}^F+F_{edit}^{Optim}$ | Code |
| NeurIPS 2024 | Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models | text | $F_{inv}^F+F_{edit}^{Optim}$ | Code |
| CVPR 2023 | Custom-Edit: Text-Guided Image Editing with Customized Diffusion Models | text + image | $F_{inv}^T+F_{inv}^F+F_{edit}^{Attn}$ | Code |
| NeurIPS 2023 | Photoswap: Personalized Subject Swapping in Images | text + image | $F_{inv}^T+F_{inv}^F+F_{edit}^{Attn}$ | Code |
| TMLR 2023 | DreamEdit: Subject-driven Image Editing | text + image | $F_{inv}^T+F_{inv}^F+F_{edit}^{Blend}$ | Code |

2. Training-Based Approaches

| Publication | Paper Title | Guidance Set | Code/Project |
| --- | --- | --- | --- |
| CVPR 2023 | InstructPix2Pix: Learning to Follow Image Editing Instructions | instruction | Code |
| NeurIPS 2023 | MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing | instruction | Code |
| arXiv 2023 | HIVE: Harnessing Human Feedback for Instructional Visual Editing | instruction | Code |
| arXiv 2023 | Emu Edit: Precise Image Editing via Recognition and Generation Tasks | instruction | Code |
| ICLR 2024 | Guiding Instruction-Based Image Editing via Multimodal Large Language Models | instruction | Code |
| CVPR 2024 | SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models | instruction | Code |
| CVPR 2024 | Referring Image Editing: Object-level Image Editing via Referring Expressions | instruction | Code |
| arXiv 2024 | EditWorld: Simulating World Dynamics for Instruction-Following Image Editing | instruction | Code |

Attribute Manipulation:

1. Training-Free Approaches

| Publication | Paper Title | Guidance Set | Combination | Code/Project |
| --- | --- | --- | --- | --- |
| PRCV 2023 | KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing | text | $F_{inv}^F+F_{edit}^{Attn}$ | Code |
| ICCV 2023 | Localizing Object-level Shape Variations with Text-to-Image Diffusion Models | text | $F_{inv}^F+F_{edit}^{Attn}$ | Code |
| ICCV 2023 | MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing | text | $F_{inv}^F+F_{edit}^{Attn}$ | Code |
| AAAI 2023 | Tuning-Free Inversion-Enhanced Control for Consistent Image Editing | text | $F_{inv}^F+F_{edit}^{Attn}$ | Code |
| SIGGRAPH 2024 | Cross-Image Attention for Zero-Shot Appearance Transfer | image | $F_{inv}^F+F_{edit}^{Attn}$ | Code |

Spatial Transformation:

1. Training-Free Approaches

| Publication | Paper Title | Guidance Set | Combination | Code/Project |
| --- | --- | --- | --- | --- |
| arXiv 2024 | DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing | user interface | $F_{inv}^F+F_{edit}^{Blend}$ | Code |
| NeurIPS 2023 | Diffusion Self-Guidance for Controllable Image Generation | text + image + user interface | $F_{inv}^F+F_{edit}^{Score}$ | Code |
| ICLR 2024 | DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models | image + user interface | $F_{inv}^F+F_{edit}^{Score}$ | Code |
| ICLR 2024 | DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing | mask + user interface | $F_{inv}^T+F_{inv}^F+F_{edit}^{Optim}$ | Code |
| ICLR 2024 | DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing | image + user interface | $F_{inv}^T+F_{inv}^F+F_{edit}^{Score}$ | Code |

Inpainting:

1. Training-Free Approaches

| Publication | Paper Title | Guidance Set | Combination | Code/Project |
| --- | --- | --- | --- | --- |
| arXiv 2023 | HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models | text + mask | $F_{inv}^T+F_{edit}^{Attn}$ | Code |
| ICCV 2023 | TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition | text + image | $F_{inv}^F+F_{edit}^{Attn}$ | Code |
| TOG 2023 | Blended Latent Diffusion | text + mask | $F_{inv}^F+F_{edit}^{Blend}$ | Code |
| arXiv 2023 | High-Resolution Image Editing via Multi-Stage Blended Diffusion | text + mask | $F_{inv}^F+F_{edit}^{Blend}$ | Code |
| arXiv 2023 | Differential Diffusion: Giving Each Pixel Its Strength | text + mask | $F_{inv}^F+F_{edit}^{Blend}$ | Code |
| CVPR 2024 | Tuning-Free Image Customization with Image and Text Guidance | text + image + mask | $F_{inv}^F+F_{edit}^{Blend}$ | Code |
| TMLR 2023 | DreamEdit: Subject-driven Image Editing | text + image + mask | $F_{inv}^T+F_{inv}^F+F_{edit}^{Blend}$ | Code |

2. Training-Based Approaches

| Publication | Paper Title | Guidance Set | Code/Project |
| --- | --- | --- | --- |
| CVPR 2024 | Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting | text + mask | Code |
| CVPR 2023 | SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model | text + mask | Code |
| arXiv 2023 | A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting | text + mask | Code |
| CVPR 2023 | Paint by Example: Exemplar-based Image Editing with Diffusion Models | image + mask | Code |
| CVPR 2023 | ObjectStitch: Object Compositing with Diffusion Model | image + mask | Code |
| CVPR 2023 | Reference-based Image Composition with Sketch via Structure-aware Diffusion Model | image + mask | Code |
| ICASSP 2024 | Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model | text + image + mask | Code |
| CVPR 2024 | AnyDoor: Zero-shot Object-level Image Customization | image + mask | Code |

Style Change:

1. Training-Free Approaches

| Publication | Paper Title | Guidance Set | Combination | Code/Project |
| --- | --- | --- | --- | --- |
| CVPR 2023 | Inversion-Based Style Transfer with Diffusion Models | text + image | $F_{inv}^T+F_{inv}^F+F_{edit}^{Norm}$ | Code |
| arXiv 2023 | Z∗: Zero-shot Style Transfer via Attention Rearrangement | image | $F_{inv}^F+F_{edit}^{Attn}$ | Code |
| CVPR 2024 | Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer | image | $F_{inv}^F+F_{edit}^{Attn}$ | Code |

Image Translation:

1. Training-Free Approaches

| Publication | Paper Title | Guidance Set | Combination | Code/Project |
| --- | --- | --- | --- | --- |
| CVPR 2024 | FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition | text | $F_{inv}^F+F_{edit}^{Score}$ | Code |

2. Training-Based Approaches

| Publication | Paper Title | Guidance Set | Code/Project |
| --- | --- | --- | --- |
| ICCV 2023 | Adding Conditional Control to Text-to-Image Diffusion Models | text | Code |
| NeurIPS 2023 | Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation | text | Code |
| NeurIPS 2023 | Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Model | text | Code |
| NeurIPS 2023 | CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation | text | Code |
| AAAI 2024 | T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models | text | Code |
| CVPR 2024 | SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing | text | Code |
| arXiv 2024 | One-Step Image Translation with Text-to-Image Models | text | Code |

Subject-Driven Customization:

1. Training-Free Approaches

| Publication | Paper Title | Guidance Set | Combination | Code/Project |
| --- | --- | --- | --- | --- |
| ICLR 2023 | An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| arXiv 2022 | DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Positive-Negative Prompt-Tuning | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| arXiv 2023 | Highly Personalized Text Embedding for Image Manipulation by Stable Diffusion | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| arXiv 2023 | P+: Extended Textual Conditioning in Text-to-Image Generation | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| TOG 2023 | A Neural Space-Time Representation for Text-to-Image Personalization | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| CVPR 2023 | DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| CVPR 2023 | Multi-Concept Customization of Text-to-Image Diffusion | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| ICML 2023 | Cones: Concept Neurons in Diffusion Models for Customized Generation | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| ICCV 2023 | SVDiff: Compact Parameter Space for Diffusion Fine-Tuning | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| | Low-Rank Adaptation for Fast Text-to-Image Diffusion Fine-Tuning | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| arXiv 2023 | A Closer Look at Parameter-Efficient Tuning in Diffusion Models | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| SIGGRAPH 2023 | Break-a-scene: Extracting multiple concepts from a single image | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| arXiv 2023 | CLiC: Concept Learning in Context | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| arXiv 2023 | Disenbooth: Disentangled parameter-efficient tuning for subject-driven text-to-image generation | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| AAAI 2024 | Decoupled Textual Embeddings for Customized Image Generation | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| ICLR 2024 | A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| CVPR 2024 | FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| arXiv 2023 | ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation | text | $F_{inv}^T+F_{edit}^{Attn}$ | Code |
| CVPR 2024 | DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization | text | $F_{inv}^T+F_{edit}^{Attn}$ | Code |
| arXiv 2024 | Direct Consistency Optimization for Compositional Text-to-Image Personalization | text | $F_{inv}^T+F_{edit}^{Score}$ | Code |
| arXiv 2024 | Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization | text | $F_{inv}^F+F_{edit}^{Optim}$ | Code |

2. Training-Based Approaches

| Publication | Paper Title | Guidance Set | Code/Project |
| --- | --- | --- | --- |
| arXiv 2023 | Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models | text | Code |
| arXiv 2023 | FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention | text | Code |
| arXiv 2023 | PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding | text | Code |
| arXiv 2023 | PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models | text | Code |
| ICCV 2023 | ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation | text | Code |
| NeurIPS 2023 | BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing | text | Code |
| SIGGRAPH 2023 | Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models | text | Code |
| arXiv 2023 | IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models | text | Code |
| NeurIPS 2023 | Subject-driven Text-to-Image Generation via Apprenticeship Learning | text | Code |
| arXiv 2023 | Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation | text | Code |
| arXiv 2023 | Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning | text | Code |
| arXiv 2024 | Instruct-Imagen: Image Generation with Multi-modal Instruction | instruction | Code |
| arXiv 2024 | InstantID: Zero-shot Identity-Preserving Generation in Seconds | text | Code |
| ICLR 2024 | Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models | text | Code |
| CVPR 2024 | InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning | text | Code |
| ICLR 2024 | Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach | text | Code |

Attribute-Driven Customization:

1. Training-Free Approaches

| Publication | Paper Title | Guidance Set | Combination | Code/Project |
| --- | --- | --- | --- | --- |
| arXiv 2023 | ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| arXiv 2023 | An Image is Worth Multiple Words: Multi-attribute Inversion for Constrained Text-to-Image Synthesis | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| TOG 2023 | Concept Decomposition for Visual Exploration and Inspiration | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| arXiv 2023 | ReVersion: Diffusion-Based Relation Inversion from Images | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| arXiv 2023 | Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| arXiv 2023 | Lego: Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |
| NeurIPS 2023 | StyleDrop: Text-to-Image Generation in Any Style | text | $F_{inv}^T+F_{edit}^{Norm}$ | Code |

2. Training-Based Approaches

| Publication | Paper Title | Guidance Set | Code/Project |
| --- | --- | --- | --- |
| arXiv 2023 | ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation | text | Code |
| arXiv 2023 | DreamCreature: Crafting Photorealistic Virtual Creatures from Imagination | text | Code |
| ICLR 2024 | Language-Informed Visual Concept Learning | text | Code |
| arXiv 2024 | pOps: Photo-Inspired Diffusion Operators | text | Code |

Acknowledgement

If you find our survey and repository useful for your research project, please consider citing our paper:

@article{ImgEditing,
      title={A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models}, 
      author={Shuai, Xincheng and Ding, Henghui and Ma, Xingjun and Tu, Rongcheng and Jiang, Yu-Gang and Tao, Dacheng},
      journal={arXiv},
      year={2024}
}

Contact

henghui.ding[AT]gmail.com