
feat(nodes): migrate cnet nodes away from controlnet_aux #6831

Merged
hipsterusername merged 30 commits into main from psyche/feat/migrate-cnet-nodes on Sep 11, 2024

Conversation

psychedelicious
Collaborator

Summary

This gets us closer to removing our dependency on the controlnet_aux package. That package provides a handful of classes that perform "controlnet preprocessing"; most of these classes run an ML model.

Why

Those classes have baked in image resizing logic, apparently intended for use in the A1111 controlnet extension:

  • detect_resolution (int arg): resizes the image to fit within the given dimension before running it through the processor
  • image_resolution (int arg): resizes the image to fit within the given dimension after running it through the processor
  • Some models require input image dimensions to be multiples of 8, but the resizing logic snaps images to the nearest multiple of 64
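To make the round-trip problem concrete, here is a simplified, hypothetical sketch of this style of resizing (the real controlnet_aux/A1111 helper differs in its details; `fit_to_resolution` is an illustrative name, not an actual function in either codebase):

```python
def fit_to_resolution(h: int, w: int, resolution: int) -> tuple[int, int]:
    """Scale (h, w) so the short side is roughly `resolution`, then snap
    both sides to the nearest multiple of 64, as A1111-style preprocessors
    do. A simplified sketch, not the exact controlnet_aux implementation."""
    k = resolution / min(h, w)
    return round(h * k / 64) * 64, round(w * k / 64) * 64

# An odd-sized image does not survive the round trip:
h1, w1 = fit_to_resolution(1000, 600, 512)  # detect_resolution step
h2, w2 = fit_to_resolution(h1, w1, 600)     # image_resolution step
print((h1, w1))  # (832, 512)
print((h2, w2))  # (960, 576) -- not the original (1000, 600)
```

Even with carefully chosen resolution args, the snapping to multiples of 64 means the original dimensions generally cannot be recovered.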

There are issues with this API:

  • You have to carefully select the resolution args to get the right size of image back
  • It seems that occasionally, even when you provide the right dimension to the resolution args, the output is just slightly off
  • The models that require image dimensions to be multiples of 8 would sometimes not resize the image at the end
  • The extra resizing takes some amount of time
  • The extra resizing causes minor degradation in image quality
  • The models are loaded directly from HF, bypassing the model manager & its cache

Up until this point, these issues have not caused problems, because our controlnet implementation automatically resizes control images just before generation. This meant that the dimensions of the processed control images didn't really matter.

With Canvas v2, control image processing is implemented as layer filters. After processing, the image is put back onto the canvas, and should be the exact same size as it was before processing. It is not acceptable for the images to be differently sized.

How

Instead of making potentially breaking and hairy changes to the existing "controlnet processor" nodes, I've created a new set of nodes to be on filter duty in Canvas v2:

  • Canny Edge Detection
  • Color Map
  • Content Shuffle
  • Depth Anything
  • HED Edge Detection
  • Lineart Anime Edge Detection
  • Lineart Edge Detection
  • MediaPipe Face Detection
  • MLSD Edge Detection
  • Normal Map Generation
  • PiDiNet Edge Detection

I have not migrated:

  • The OG SAM node (I don't think it has any use-case currently)
  • Leres, Midas and Zoe depth nodes (Depth Anything is superior, there's no good reason to use any of these over it)

Nodes that run models now have revised, separate classes that give our model manager control over model downloading, caching and loading.
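As a rough illustration of that pattern (all class and method names here are hypothetical, not Invoke's actual API): the invocation obtains the weights through the model manager, and the detector class only wraps an already-loaded model rather than downloading from HF itself.

```python
from dataclasses import dataclass

@dataclass
class LoadedModel:
    """Stand-in for whatever the model manager hands back (a loaded,
    device-placed, cached model). Purely illustrative."""
    model: object

class EdgeDetector:
    """Wraps an already-loaded model; does no downloading or caching of
    its own. The real revised classes live under backend/image_util/."""
    def __init__(self, model: object) -> None:
        self.model = model

    @classmethod
    def from_loaded_model(cls, loaded: LoadedModel) -> "EdgeDetector":
        return cls(loaded.model)

# Inside an invocation, the flow would be roughly (pseudocode, since the
# exact model-manager API is not spelled out in this PR description):
#   loaded = model_manager.load(model_key)  # downloads/caches as needed
#   detector = EdgeDetector.from_loaded_model(loaded)
#   result = detector.run(image)
```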

This change does bring in a lot of code from controlnet_aux. The biggest chunk is the backend support for NormalBAE. Turns out there's a whole git repo embedded in controlnet_aux for the EfficientNet architecture... I understand the timm package can be used instead but I didn't pursue this.

As mentioned, the new nodes skip all image resizing except where necessary for the model to run, in which case the images are resized back to the original dimensions before the node finishes. This makes it a lot easier to use these nodes too - just provide the image and settings. No need to futz around with image_resolution and detect_resolution.
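That resizing contract can be sketched as follows (a hypothetical stand-in operating on (height, width) tuples rather than real images; `run_filter` and `snap_down_to_multiple` are illustrative names and `run_model` is a stub):

```python
def snap_down_to_multiple(x: int, m: int = 8) -> int:
    """Round x down to a multiple of m, with a floor of m."""
    return max((x // m) * m, m)

def run_filter(size: tuple[int, int], run_model) -> tuple[int, int]:
    """Resize to model-friendly dimensions only because the model needs
    them, run the model, then restore the exact original size. Sizes
    stand in for images here; the real nodes operate on actual images."""
    h, w = size
    working = (snap_down_to_multiple(h), snap_down_to_multiple(w))
    run_model(working)  # the model sees multiple-of-8 dimensions
    return (h, w)       # the output is always the exact input size

seen = []
out = run_filter((1001, 599), seen.append)
print(seen[0])  # (1000, 592) -- what the model ran on
print(out)      # (1001, 599) -- matches the input exactly
```

The key property is the last line: whatever resizing happens internally, the caller always gets back the original dimensions.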

There are no changes to the existing nodes, so existing workflows will not break. That said, we may want to add a new value to the Classification enum used by nodes so that we can deprecate the existing nodes. Eventually we can drop the controlnet_aux dependency entirely.
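A possible shape for that enum value (a hypothetical sketch; the member names of Invoke's actual Classification enum are assumptions here and may differ):

```python
from enum import Enum

class Classification(str, Enum):
    """Illustrative sketch of a node classification enum with a new
    Deprecated member."""
    Stable = "stable"
    Beta = "beta"
    Prototype = "prototype"
    Deprecated = "deprecated"  # proposed: hidden from the node library

def visible_in_node_library(c: Classification) -> bool:
    # Deprecated nodes would stay loadable for existing workflows but
    # would not be offered when adding new nodes.
    return c is not Classification.Deprecated
```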

Related Issues / Discussions

Discord & offline discussion

QA Instructions

I did a few rounds of testing:

  • With an input image with multiple of 8 width/height
  • With an input image with odd number width/height
  • Side-by-side with the other versions of the nodes
  • With all permutations of settings (not comparing outputs - just making sure they did something and didn't break anything)

In all cases, the outputs were identical where possible (the controlnet_aux resizing logic makes this impossible in some situations).

Merge Plan

Once this merges, I'll update all of the linear UI to use the new nodes. Maybe there are some default workflows to update also?

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)

@github-actions bot added the python, Root, invocations, backend, and python-deps labels on Sep 10, 2024
@psychedelicious
Collaborator Author

Note for reviewers: The vast majority of this PR is copy-pasted code from controlnet_aux (nearly all of which was itself copied from some other source).

Review should focus on:

  • The new invocations
  • backend/image_util/*.py
  • The __init__.py files in the newly-added folders in backend/image_util/, which contain the revised model loading and execution classes for the nodes

@hipsterusername
Member

A deprecation class should be instituted. We probably want to de-clutter the node library sooner rather than later; it will be confusing for users otherwise.

Or, maybe, deprecated nodes don't show up in the node library and are only visible on old workflows?

Similar to the existing node, but without any resizing. The backend logic was consolidated and modified so that model loading can be managed by the model manager.

The ONNX Runtime `InferenceSession` class was added to the `AnyModel` union to satisfy the type checker.
Use a generic to narrow the `type` field from `string` to a literal. Now you can do e.g. `adapter.type === 'control_layer_adapter'` and TS narrows the type.
They will still be usable if a workflow uses one. You just cannot add them directly.
It's a line segment detector, not a general edge detector.
- Add backcompat for cnet model default settings
- Default filter selection based on model type
- Updated UI components to use new filter nodes
- Added handling for failed filter executions, preventing filter from getting stuck in case it failed for some reason
- New translations for all filters & fields
@github-actions bot added the frontend label on Sep 11, 2024
@psychedelicious
Collaborator Author

  • Added DW Openpose filter
  • Renamed some of the cryptic filter settings from ML research abbreviations to human words
  • Added Classification.Deprecated, applied it to the old cnet processor nodes, and hid them from the UI (they will still work if an existing workflow that uses them is loaded)
  • Fixed a couple of bugs in the MLSD detector
  • Fixed a race condition with the progress bar/queue count that tended to happen when executing filters (which execute really fast)
  • Updated the UI to use the new filters
  • Default filter is now linked to the control model
  • Added a filter button next to the control model dropdown
  • Control & raster layers can have an image dropped on them to replace the layer's content
  • Added a "pull bbox into" button for control layers, global IP adapters, and regional IP adapters

@hipsterusername hipsterusername merged commit 88dcb38 into main Sep 11, 2024
14 checks passed
@hipsterusername hipsterusername deleted the psyche/feat/migrate-cnet-nodes branch September 11, 2024 12:12