
Stable Diffusion 3 local support #1018

Merged
merged 28 commits into from
Jul 29, 2024

Member

@andrewfrench andrewfrench commented Jul 25, 2024

Describe your changes

This PR introduces drivers that can be used to generate images using Stable Diffusion 3 locally.

A model-agnostic HuggingFaceDiffusionPipelineImageGenerationDriver manages creating a HuggingFace diffusers pipeline and running inference on it. Three new model drivers (StableDiffusion3PipelineImageGenerationModelDriver, StableDiffusion3Img2ImgPipelineImageGenerationModelDriver, and StableDiffusion3ControlNetPipelineImageGenerationDriver) extend BaseDiffusionPipelineImageGenerationModelDriver to specify how to prepare the inference pipeline and format pipeline inputs.

@andrewfrench andrewfrench changed the title Sd3 local Stable Diffusion 3 Local support Jul 25, 2024
@andrewfrench andrewfrench changed the title Stable Diffusion 3 Local support Stable Diffusion 3 local support Jul 25, 2024
Comment on lines 33 to 35
raise NotImplementedError(
    "StableDiffusion3Img2ImgPipeline does not yet support loading from a single file."
)
Member Author

For eventual ComfyUI convenience, we accept three model input types:

  • path to a single file containing a model
  • path to a directory containing model files
  • HuggingFace model repo name

The one exception to this is the StableDiffusion3Img2ImgPipeline, which doesn't support .from_single_file(). Models can still be loaded by path to a local directory or by model repo name (and not downloaded again if they're cached locally).
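
The dispatch between the three input types can be sketched as follows. This is a minimal illustration, not the PR's code: it assumes a pipeline class that exposes from_single_file() and from_pretrained() class methods, as diffusers pipelines do. FakePipeline and load_pipeline are hypothetical names, used only so the example runs without diffusers installed.

```python
import os
import tempfile
from typing import Any


class FakePipeline:
    """Hypothetical stand-in for a diffusers pipeline class."""

    def __init__(self, source: str, loader: str) -> None:
        self.source = source
        self.loader = loader  # records which loader was chosen

    @classmethod
    def from_single_file(cls, path: str, **params: Any) -> "FakePipeline":
        return cls(path, "from_single_file")

    @classmethod
    def from_pretrained(cls, name_or_dir: str, **params: Any) -> "FakePipeline":
        return cls(name_or_dir, "from_pretrained")


def load_pipeline(pipeline_cls: Any, model: str, **params: Any) -> Any:
    """Pick a loader based on what `model` points to (hypothetical helper)."""
    if os.path.isfile(model):
        # A single checkpoint file on disk.
        return pipeline_cls.from_single_file(model, **params)
    # Anything else is treated as a local directory or a HuggingFace repo name.
    return pipeline_cls.from_pretrained(model, **params)


with tempfile.NamedTemporaryFile(suffix=".safetensors") as checkpoint:
    single_file_loader = load_pipeline(FakePipeline, checkpoint.name).loader

with tempfile.TemporaryDirectory() as model_dir:
    directory_loader = load_pipeline(FakePipeline, model_dir).loader

repo_loader = load_pipeline(FakePipeline, "some-org/some-model").loader
```

A pipeline class without from_single_file(), such as the Img2Img case described above, would need to reject the single-file branch explicitly instead.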


codecov bot commented Jul 25, 2024

@andrewfrench andrewfrench marked this pull request as ready for review July 25, 2024 20:03
pyproject.toml Outdated
@@ -134,6 +138,14 @@ drivers-observability-datadog = [
"opentelemetry-exporter-otlp-proto-http",
]

drivers-imagegen-huggingface = [
Member

Should be named drivers-image-generation-huggingface

Comment on lines 23 to 24
@abstractmethod
def get_output_image_dimensions(self) -> Optional[tuple[int, int]]: ...
Member

Should maybe be a @property?
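
For reference, an abstract property could look roughly like the sketch below. The class names and the attribute name output_image_dimensions are hypothetical, and the classes are plain Python rather than the attrs-based classes used in the PR.

```python
from abc import ABC, abstractmethod
from typing import Optional


class BasePipelineDriver(ABC):
    """Hypothetical base class; stands in for the driver under review."""

    @property
    @abstractmethod
    def output_image_dimensions(self) -> Optional[tuple[int, int]]:
        """Return (width, height), or None if the pipeline decides the size."""


class FixedSizeDriver(BasePipelineDriver):
    """Hypothetical subclass that always reports a fixed output size."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height

    @property
    def output_image_dimensions(self) -> Optional[tuple[int, int]]:
        return (self.width, self.height)


# Callers read the value as an attribute, without parentheses.
dimensions = FixedSizeDriver(1024, 768).output_image_dimensions
```

The trade-off is mostly about call-site style: a property reads like data, while a get_ method signals that some work may be done.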



@define
class HuggingFaceDiffusionPipelineImageGenerationDriver(BaseImageGenerationDriver, ABC):
Member

Should this subclass BaseImageGenerationDriver instead?

Member Author

I assume you mean BaseMultiModelImageGenerationDriver. That was the original intention, but the Pipeline drivers need a substantially different interface from the model drivers that BaseMultiModelImageGenerationDriver requires, which all inherit from BaseImageGenerationModelDriver.



@define
class HuggingFaceDiffusionPipelineImageGenerationDriver(BaseImageGenerationDriver, ABC):
Member

Wondering if we should just rename to HuggingFacePipelineImageGenerationDriver for similarity with the HuggingFacePipelinePromptDriver that uses transformers.

# as a path to a local file or as a HuggingFace model repo name.
# We use the from_single_file method if the model is a local file and the
# from_pretrained method if the model is a local directory or hosted on HuggingFace.
sd3_controlnet_model = import_optional_dependency("diffusers.models.controlnet_sd3").SD3ControlNetModel
Member

Imports should happen at the top of the function as if it were a regular import.
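
A rough sketch of that convention: the import_optional_dependency below is a hypothetical stand-in for the project's helper of the same name, built on importlib, and the json module stands in for an optional package so the example runs anywhere.

```python
from importlib import import_module
from types import ModuleType


def import_optional_dependency(name: str) -> ModuleType:
    """Hypothetical stand-in for the project's helper of the same name."""
    try:
        return import_module(name)
    except ImportError as exc:
        raise ImportError(f"Optional dependency '{name}' is not installed.") from exc


def dumps_sorted(data: dict) -> str:
    # Resolve optional dependencies first, as if they were regular imports.
    json = import_optional_dependency("json")

    # The rest of the function body follows the imports.
    return json.dumps(data, sort_keys=True)


encoded = dumps_sorted({"b": 1, "a": 2})
```

Grouping the lookups at the top of the function keeps a missing dependency failing early, before any other work is done.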

Comment on lines 56 to 58
sd3_controlnet_pipeline = import_optional_dependency(
    "diffusers.pipelines.controlnet_sd3.pipeline_stable_diffusion_3_controlnet"
).StableDiffusion3ControlNetPipeline
Member

Imports should happen at the top of the function as if it were a regular import.

).StableDiffusion3ControlNetPipeline
if os.path.isfile(model):
    pipeline = sd3_controlnet_pipeline.from_single_file(model, **pipeline_params)

Member

Weird that the autoformatter didn't format away this whitespace

def prepare_pipeline(self, model: str, device: Optional[str]) -> Any: ...

@abstractmethod
def make_image_param(self, image: Optional[Image]) -> Optional[dict[str, Image]]: ...
Member

Why is image optional if every implementation seems to require it?

return pipeline

def make_image_param(self, image: Optional[Image]) -> Optional[dict[str, Image]]:
    return None
Member

I guess this is why image is optional, but this feels odd.

Member Author

This is odd, and a half-measure. Zooming out, I think the time has come to admit that the design where each driver implements a suite of common image generation types (prompt, variation, inpainting, outpainting) isn't working so well: it forces NotImplementedErrors and backwards workarounds like this one. We can move this conversation to another venue, but we should maybe consider making image generation drivers more granular.

Member

Can you please create a ticket to track this work? This feels like an important refactor pre-1.0.

if TYPE_CHECKING:
    from PIL.Image import Image
else:
    StableDiffusion3ControlNetPipeline = import_optional_dependency(
Member

Having 3 in the file/class name feels odd, maybe let's remove?

Member Author

@andrewfrench andrewfrench Jul 25, 2024

Stable Diffusion's pipelines include versions in their class names and are distinct enough that we would need a driver for each, much in the same way that ControlNet and Img2Img are distinct here. Amaru pointed out that Stable Diffusion 3 is still quite new and many people still prefer their Stable Diffusion XL workflows. To support those, we'd need another driver typed for StableDiffusionXL (same story for SD1.5 and SD2). I don't like the aesthetics of it, but this naming leaves some space for other SD versions.

@andrewfrench
Member Author

It seemed like I was trying to fit a square peg (model drivers that were actually specific to pipeline types, not models) into a round hole (model drivers meant to define interactions with a model), so I refactored these out into a new type, BaseImageGenerationPipelineDriver, which encapsulates the preparation and inputs specific to each diffusion pipeline image generation flow.
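
A minimal sketch of what such a type might look like, assuming the prepare_pipeline and make_image_param signatures from the earlier review snippets carried over to the refactor. The two subclasses, the placeholder pipeline object, and the "image" parameter key are hypothetical, and Any replaces the PIL Image type so the example runs without Pillow.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class BaseImageGenerationPipelineDriver(ABC):
    """Sketch only; method signatures are assumed from the review snippets."""

    @abstractmethod
    def prepare_pipeline(self, model: str, device: Optional[str]) -> Any: ...

    @abstractmethod
    def make_image_param(self, image: Optional[Any]) -> Optional[dict[str, Any]]: ...


class TextToImagePipelineDriver(BaseImageGenerationPipelineDriver):
    """Hypothetical prompt-only flow: no input image is passed to the pipeline."""

    def prepare_pipeline(self, model: str, device: Optional[str]) -> Any:
        # Placeholder for constructing a real diffusers pipeline.
        return {"model": model, "device": device}

    def make_image_param(self, image: Optional[Any]) -> Optional[dict[str, Any]]:
        return None


class ImageToImagePipelineDriver(TextToImagePipelineDriver):
    """Hypothetical image-to-image flow: an input image is required."""

    def make_image_param(self, image: Optional[Any]) -> Optional[dict[str, Any]]:
        if image is None:
            raise ValueError("An input image is required for image-to-image generation.")
        # The "image" key is an assumption about the pipeline's keyword argument.
        return {"image": image}


text_param = TextToImagePipelineDriver().make_image_param(None)
image_param = ImageToImagePipelineDriver().make_image_param("pixels")
```

Keeping the image handling inside each pipeline driver lets the outer driver stay agnostic about which flows need an input image.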

collindutter previously approved these changes Jul 26, 2024
@andrewfrench andrewfrench merged commit 9f9ac91 into dev Jul 29, 2024
13 checks passed
@andrewfrench andrewfrench deleted the sd3-local branch July 29, 2024 19:30
collindutter pushed a commit that referenced this pull request Aug 2, 2024