This is the GitHub repository of our work "A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models".
Content-Aware Editing
- Object Manipulation
- Attribute Manipulation
- Spatial Transformation
- Inpainting
- Style Change
- Image Translation

Content-Free Editing

Experiment and Data
- UniTune: Text-Driven Image Editing by Fine Tuning a Diffusion Model on a Single Image | TOG 2023
- Highly Personalized Text Embedding for Image Manipulation by Stable Diffusion | arXiv 2023
- Imagic: Text-Based Real Image Editing with Diffusion Models | CVPR 2023
- Forgedit: Text Guided Image Editing via Learning and Forgetting | arXiv 2023
- Doubly Abductive Counterfactual Inference for Text-based Image Editing | CVPR 2024
- SINE: SINgle Image Editing with Text-to-Image Diffusion Models | CVPR 2023
- EDICT: Exact Diffusion Inversion via Coupled Transformations | CVPR 2023
- Exact Diffusion Inversion via Bi-directional Integration Approximation | arXiv 2023
- Effective Real Image Editing with Accelerated Iterative Diffusion Inversion | ICCV 2023
- Null-text Inversion for Editing Real Images using Guided Diffusion Models | CVPR 2023
- Negative-prompt Inversion: Fast Image Inversion for Editing with Text-guided Diffusion Models | arXiv 2023
- ProxEdit: Improving Tuning-Free Real Image Editing with Proximal Guidance | WACV 2024
- Fixed-point Inversion for Text-to-Image Diffusion Models | arXiv 2023
- PnP Inversion: Boosting Diffusion-based Editing with 3 Lines of Code | ICLR 2024
- Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing | NeurIPS 2023
- An Edit Friendly DDPM Noise Space: Inversion and Manipulations | CVPR 2024
- Prompt-to-Prompt Image Editing with Cross-Attention Control | ICLR 2023
- Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation | CVPR 2023
- Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing | CVPR 2024
- StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing | arXiv 2023
- Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models | ICCV 2023
- Object-aware Inversion and Reassembly for Image Editing | ICLR 2024
- DiffEdit: Diffusion-based Semantic Image Editing with Mask Guidance | ICLR 2023
- PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing | arXiv 2023
- Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models | CVPR 2023
- Noise Map Guidance: Inversion with Spatial Context for Real Image Editing | ICLR 2024
- Zero-shot Image-to-Image Translation (pix2pix-zero) | SIGGRAPH 2023
- SEGA: Instructing Diffusion using Semantic Dimensions | NeurIPS 2023
- The Stable Artist: Steering Semantics in Diffusion Latent Space | arXiv 2022
- LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance | arXiv 2023
- LEDITS++: Limitless Image Editing using Text-to-Image Models | CVPR 2024
- MagicRemover: Tuning-free Text-guided Image Inpainting with Diffusion Models | ICLR 2024
- Region-Aware Diffusion for Zero-shot Text-driven Image Editing | arXiv 2023
- Delta Denoising Score | ICCV 2023
- Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing | CVPR 2024
- Ground-A-Score: Scaling Up the Score Distillation for Multi-Attribute Editing | arXiv 2024
- Custom-Edit: Text-Guided Image Editing with Customized Diffusion Models | CVPR 2023
- Photoswap: Personalized Subject Swapping in Images | NeurIPS 2023
- DreamEdit: Subject-driven Image Editing | TMLR 2023
- InstructPix2Pix: Learning to Follow Image Editing Instructions | CVPR 2023 | Code
- MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing | NeurIPS 2023 | Code
- HIVE: Harnessing Human Feedback for Instructional Visual Editing | arXiv 2023 | Code
- Emu Edit: Precise Image Editing via Recognition and Generation Tasks | arXiv 2023 | Code
- Guiding Instruction-based Image Editing via Multimodal Large Language Models | ICLR 2024 | Code
- SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models | CVPR 2024 | Code
- Referring Image Editing: Object-level Image Editing via Referring Expressions | CVPR 2024 | Code
- KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing | PRCV 2023
- Localizing Object-level Shape Variations with Text-to-Image Diffusion Models | ICCV 2023
- MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing | ICCV 2023
- Tuning-Free Inversion-Enhanced Control for Consistent Image Editing | AAAI 2023
- Cross-Image Attention for Zero-Shot Appearance Transfer | SIGGRAPH 2024
- DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing | arXiv 2024
- Diffusion Self-Guidance for Controllable Image Generation | NeurIPS 2023
- DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models | ICLR 2024
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing | ICLR 2024
- DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing | ICLR 2024
- HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models | arXiv 2023
- TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition | ICCV 2023
- Blended Latent Diffusion | TOG 2023
- High-Resolution Image Editing via Multi-Stage Blended Diffusion | arXiv 2022
- Differential Diffusion: Giving Each Pixel Its Strength | arXiv 2023
- Tuning-Free Image Customization with Image and Text Guidance | CVPR 2024
- Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting | CVPR 2024 | Code
- SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model | CVPR 2023 | Code
- A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting | arXiv 2023 | Code
- Paint by Example: Exemplar-based Image Editing with Diffusion Models | CVPR 2023 | Code
- ObjectStitch: Object Compositing with Diffusion Model | CVPR 2023 | Code
- Reference-based Image Composition with Sketch via Structure-aware Diffusion Model | CVPR 2023 | Code
- Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model | ICASSP 2024 | Code
- AnyDoor: Zero-shot Object-level Image Customization | CVPR 2024 | Code
- Inversion-Based Style Transfer with Diffusion Models | CVPR 2023
- Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer | CVPR 2024
- Z*: Zero-shot Style Transfer via Attention Rearrangement | arXiv 2023
- FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition | CVPR 2024 | Code
- Adding Conditional Control to Text-to-Image Diffusion Models | ICCV 2023 | Code
- T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models | AAAI 2024 | Code
- SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing | CVPR 2024 | Code
- Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation | NeurIPS 2023 | Code
- Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Model | NeurIPS 2023 | Code
- CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation | NeurIPS 2023 | Code
- One-Step Image Translation with Text-to-Image Models | arXiv 2024 | Code
- An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion | ICLR 2023
- DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Positive-Negative Prompt-Tuning | arXiv 2022
- P+: Extended Textual Conditioning in Text-to-Image Generation | arXiv 2023
- A Neural Space-Time Representation for Text-to-Image Personalization | TOG 2023
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation | CVPR 2023
- A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization | ICLR 2024
- FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation | CVPR 2024
- Multi-Concept Customization of Text-to-Image Diffusion | CVPR 2023
- Cones: Concept Neurons in Diffusion Models for Customized Generation | ICML 2023
- SVDiff: Compact Parameter Space for Diffusion Fine-Tuning | ICCV 2023
- Low-Rank Adaptation for Fast Text-to-Image Diffusion Fine-Tuning
- A Closer Look at Parameter-Efficient Tuning in Diffusion Models | arXiv 2023
- Break-A-Scene: Extracting Multiple Concepts from a Single Image | SIGGRAPH 2023
- CLiC: Concept Learning in Context | arXiv 2023
- DisenBooth: Disentangled Parameter-Efficient Tuning for Subject-Driven Text-to-Image Generation | arXiv 2023
- Decoupled Textual Embeddings for Customized Image Generation | AAAI 2024
- ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation | arXiv 2023
- DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization | CVPR 2024
- Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization | arXiv 2024
- Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models | ICLR 2024 | Code
- InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning | CVPR 2024 | Code
- Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models | arXiv 2023 | Code
- Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach | ICLR 2024 | Code
- FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention | arXiv 2023 | Code
- PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding | arXiv 2023 | Code
- PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models | arXiv 2023 | Code
- InstantID: Zero-shot Identity-Preserving Generation in Seconds | arXiv 2024 | Code
- ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation | ICCV 2023 | Code
- BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing | NeurIPS 2023 | Code
- Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models | SIGGRAPH 2023 | Code
- Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation | arXiv 2023 | Code
- Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning | arXiv 2023 | Code
- Instruct-Imagen: Image Generation with Multi-modal Instruction | arXiv 2024 | Code
- ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models | arXiv 2023
- An Image is Worth Multiple Words: Multi-attribute Inversion for Constrained Text-to-Image Synthesis | arXiv 2023
- Concept Decomposition for Visual Exploration and Inspiration | TOG 2023
- ReVersion: Diffusion-Based Relation Inversion from Images | arXiv 2023
- Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation | arXiv 2023
- Lego: Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models | arXiv 2023
- StyleDrop: Text-to-Image Generation in Any Style | NeurIPS 2023
- ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation | arXiv 2023 | Code
- DreamCreature: Crafting Photorealistic Virtual Creatures from Imagination | arXiv 2023 | Code
- Language-Informed Visual Concept Learning | ICLR 2024 | Code
- pOps: Photo-Inspired Diffusion Operators | arXiv 2024 | Code