DirectedDiffusion

Paper · Project Page · HuggingFace · Video

This repository contains the implementation of the following paper:

Directed Diffusion: Direct Control of Object Placement through Attention Guidance
Wan-Duo Kurt Ma¹, Avisek Lahiri², J. P. Lewis³*, Thomas Leung², W. Bastiaan Kleijn¹
Victoria University of Wellington¹, Google Research², NVIDIA Research³

* work done at Google Research

🔥 Overview

(Teaser figure)

Text-guided diffusion models such as DALL-E 2, Imagen, eDiff-I, and Stable Diffusion are able to generate an effectively endless variety of images given only a short text prompt describing the desired image content. In many cases the images are of very high quality. However, these models often struggle to compose scenes containing several key objects such as characters in specified positional relationships. The missing capability to direct the placement of characters and objects both within and across images is crucial in storytelling, as recognized in the literature on film and animation theory. In this work, we take a particularly straightforward approach to providing the needed direction. Drawing on the observation that the cross-attention maps for prompt words reflect the spatial layout of objects denoted by those words, we introduce an optimization objective that produces "activation" at desired positions in these cross-attention maps. The resulting approach is a step toward generalizing the applicability of text-guided diffusion models beyond single images to collections of related images, as in storybooks. Directed Diffusion provides easy high-level positional control over multiple objects, while making use of an existing pre-trained model and maintaining a coherent blend between the positioned objects and the background. Moreover, it requires only a few lines to implement.
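The core idea can be illustrated with a minimal sketch, assuming Stable Diffusion-style cross-attention maps of shape (heads, pixels, tokens); the function name, its arguments, and the boost factor below are illustrative, not the repository's actual API:

    import torch

    def direct_attention(attn, token_ids, region, boost=2.0):
        """Strengthen cross-attention for selected prompt tokens inside a box.

        attn:      (heads, H*W, seq_len) cross-attention probabilities
        token_ids: indices of the prompt words to reposition
        region:    (top, bottom, left, right) in attention-map coordinates
        """
        heads, hw, seq_len = attn.shape
        side = int(hw ** 0.5)                 # assume a square latent map
        mask = torch.zeros(side, side, device=attn.device)
        top, bottom, left, right = region
        mask[top:bottom, left:right] = 1.0
        mask = mask.flatten()                 # (H*W,)
        for tid in token_ids:
            attn[:, :, tid] *= 1.0 + (boost - 1.0) * mask
        return attn / attn.sum(dim=-1, keepdim=True)  # renormalize over tokens

In the paper, guidance of this kind is applied only during the early denoising steps, after which sampling proceeds as usual.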

🔥 Requirements

The codebase was tested on an NVIDIA GeForce RTX 3090 with pytorch-2.1.2+cu121 and diffusers-0.21.4. We strongly recommend pinning Diffusers to this version, as the library is continuously evolving. Other PyTorch 2.x versions will probably work. On the RTX 3090, we followed this post to avoid the sm_86 compatibility issue.
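As a convenience, a quick check against the tested configuration (this snippet is ours, not part of the repository):

    # Minimal environment check against the versions named above.
    import torch
    import diffusers

    print("torch:", torch.__version__)          # tested: 2.1.2+cu121
    print("diffusers:", diffusers.__version__)  # tested: 0.21.4
    assert torch.__version__.startswith("2."), "PyTorch 2.x expected"
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))  # tested: GeForce RTX 3090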

🔥 Timeline

  • [2024/02/22]: We will present our poster at Poster Session 1 at the Vancouver Convention Centre.
  • [2023/12/08]: Directed Diffusion has been accepted at AAAI 2024.
  • [2023/09/28]: Version v3 of our arXiv paper is out, with more validations, more ablations, and an enhanced methodology.
  • [2023/03/16]: Gradio Web UI integrated! See the walk-through for more info.
  • [2023/03/01]: First version submitted to arXiv.

🔥 Usage

Please refer to our walk-through to get started from scratch. Once you are familiar with it, you can reproduce our paper results and vary some of the parameters in paper-result. Just for fun, we also provide some experiments not listed in the paper in other-prompt! A baseline setup sketch follows below.
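For orientation, the sketch below loads a plain Stable Diffusion pipeline with the tested diffusers version; the model ID is an assumption, and the actual Directed Diffusion entry points live in the walk-through:

    # Hypothetical baseline, not the repository's entry point; Directed
    # Diffusion additionally hooks the UNet's cross-attention before sampling.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    image = pipe("a cat sitting on a car").images[0]
    image.save("baseline.png")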

🔥 Acknowledgements

We thank Jason Baldridge and Arkanath Pathak for helpful feedback.

🔥 TODO

We apologize that this repository is not fully updated and still reflects our v1 arXiv version (link). We are actively working on enhancing the codebase; the updates are scheduled for release in March 2024, along with an online HuggingFace demo.

🔥 Citation

TrailBlazer is a recent descendant project that extends Directed Diffusion to video generation. If you find our work useful for your research, please consider citing our papers.

    @article{ma2023directed,
        title={Directed Diffusion: Direct Control of Object Placement through Attention Guidance},
        author={Wan-Duo Kurt Ma and J. P. Lewis and Avisek Lahiri and Thomas Leung and W. Bastiaan Kleijn},
        year={2023},
        eprint={2302.13153},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
    }

    @article{ma2023trailblazer,
        title={TrailBlazer: Trajectory Control for Diffusion-Based Video Generation},
        author={Wan-Duo Kurt Ma and J. P. Lewis and W. Bastiaan Kleijn},
        year={2023},
        eprint={2401.00896},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
    }
