Redo custom attention processor to support other attention types #6550

Open

StAlKeR7779 wants to merge 27 commits into main

Conversation

@StAlKeR7779 (Contributor) commented Jun 27, 2024

Summary

The current custom attention processor implements only the torch-sdp attention type, so whenever an IP-Adapter or a regional prompt is used, we override the model to run torch-sdp attention.
The new attention processor combines the four attention processors (normal, sliced, xformers, torch-sdp) by moving the parts that differ between them (mask preparation and the attention computation itself) into a separate function call, where the required implementation is executed.
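
As an illustration of the approach (not the PR's actual code), here is a minimal sketch in which the implementation-specific step is factored into a single method. Class name, argument names, and tensor layout are assumptions:

```python
import torch
import torch.nn.functional as F


class CombinedAttnProcessor:
    """Minimal sketch: one processor, with only the step that differs between
    attention types factored into a single method. Tensors are assumed to have
    shape (batch, heads, seq_len, head_dim)."""

    def __init__(self, attention_type: str = "torch-sdp", slice_size: int = 4):
        self.attention_type = attention_type
        self.slice_size = slice_size  # number of heads processed per slice

    def run_attention(
        self,
        query: torch.Tensor,
        key: torch.Tensor,
        value: torch.Tensor,
        attn_mask: torch.Tensor | None = None,
    ) -> torch.Tensor:
        if self.attention_type == "torch-sdp":
            return F.scaled_dot_product_attention(query, key, value, attn_mask=attn_mask)

        if self.attention_type == "sliced":
            # Process a few heads at a time to bound the size of the attention matrix.
            scale = query.shape[-1] ** -0.5
            out = torch.empty_like(query)
            for start in range(0, query.shape[1], self.slice_size):
                end = min(start + self.slice_size, query.shape[1])
                scores = query[:, start:end] @ key[:, start:end].transpose(-1, -2) * scale
                if attn_mask is not None:
                    scores = scores + attn_mask  # mask assumed broadcastable over heads
                out[:, start:end] = scores.softmax(dim=-1) @ value[:, start:end]
            return out

        raise ValueError(f"Unknown attention type: {self.attention_type}")
```

The real processor in this PR also handles IP-Adapter and regional-prompt masking; the point of the sketch is only that the per-backend logic lives behind one call.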

Related Issues / Discussions

None

QA Instructions

Change attention_type in invokeai.yaml, then run a generation with an IP-Adapter or a regional prompt.
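
For example, a change like the following in invokeai.yaml (these are the existing attention settings; the exact set of allowed values depends on the final state of this PR):

```yaml
# invokeai.yaml
attention_type: sliced          # or torch-sdp
attention_slice_size: balanced  # only used by sliced attention
```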

Merge Plan

None?

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)

@dunkeroni @RyanJDick

@github-actions bot added the python and backend labels Jun 27, 2024
@StAlKeR7779 marked this pull request as ready for review June 27, 2024 14:02
@RyanJDick (Collaborator)

I haven't looked at the code yet, but do you know if there are still use cases for attention processors other than Torch 2.0 SDP? Based on the benchmarking that diffusers has done, it seems like the all-around best choice. But maybe there are still reasons to use other implementations, e.g. on very-low-VRAM systems?

@StAlKeR7779 (Contributor, Author)

I thought roughly the same:
normal - generally no need for it
xformers - if torch-sdp is on par or even faster, then it too can be removed
sliced - yes, it's suitable for low-memory situations, and I think it's the main attention type for MPS

@psychedelicious (Collaborator)

On CUDA, torch's SDP was faster than xformers for me when I last checked a month or so back. IIRC it was just a couple % faster.

@RyanJDick (Collaborator)

I thought about this some more, and I'm hesitant to proceed with trying to merge this until we have more clarity around which attention implementations we actually want to support.

Right now, we have _adjust_memory_efficient_attention, which tries to configure attention based on the config and the system properties. The logic in this function is outdated, and I think there has been hesitation to change it out of fear of causing a regression on some systems. Let's get to the bottom of this, before deciding how to proceed with this PR.

My current guess is that supporting just torch SDP and sliced attention would cover all use cases, but we need to do some testing to determine whether that's accurate.

A few data points to consider:

@StAlKeR7779 do you want to look into this?

@github-actions bot added the invocations label Jul 27, 2024
@StAlKeR7779 (Contributor, Author) commented Jul 28, 2024

@RyanJDick OK, I removed the normal and xformers attention types.
But some parts are related to the frontend, so I hope @psychedelicious will look at them.
The other question is what to do with the config. I think old configs can be migrated to convert the normal/xformers values, but I'm not familiar with this part and will look at it more closely later. Maybe @lstein can suggest how to do it.
Update: I've already added a config migration, but I welcome feedback if I've done something wrong in it.
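
For reference, a minimal sketch of the kind of migration described here; the field name and mapping are assumptions, not the actual InvokeAI migration code:

```python
def migrate_attention_type(config: dict) -> dict:
    """Map attention types removed by this PR onto torch-sdp."""
    migrated = dict(config)
    if migrated.get("attention_type") in ("normal", "xformers"):
        migrated["attention_type"] = "torch-sdp"
    return migrated
```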

@github-actions bot added the services label Jul 28, 2024
@StAlKeR7779 requested a review from ebr as a code owner July 29, 2024 11:02
@github-actions bot added the docker, Root, installer, and python-deps labels Jul 29, 2024
@psychedelicious (Collaborator) left a comment

Did some smoke tests covering various permutations of attention type (normal vs torch-sdp) and slice size configs (none, max, balanced, 1, 3, 6) - all good.

@psychedelicious (Collaborator)

Please run scripts/update_config_docstring.py to update the config docstrings.

@RyanJDick (Collaborator) left a comment

Looks good to me. I just did a basic smoke test - looks like others have done more rigorous testing.

A few minor things:

  • Delete _ignore_xformers_triton_message_on_windows
  • Delete logging.getLogger("xformers").addFilter(lambda record: "A matching Triton is not available" not in record.getMessage())
  • Remove xformers instructions from 020_INSTALL_MANUAL.md

@github-actions bot added the docs label Aug 6, 2024
@ebr (Member) left a comment

Only requesting changes here to ensure we don't merge this before testing on a couple of older GPUs. Will test it ASAP.

@ebr (Member) commented Aug 7, 2024

How critical is it to remove xformers completely vs leaving it as an option?

torch-sdp is 4.8x slower on older-generation Pascal GPUs, like the P40, P100, or the Nvidia 10xx cards. For SDXL, this means 8.8 vs 1.8 seconds/iter.

Other than that, torch-sdp is slightly faster on Ampere and Turing. Likely also on Ada, but I haven't tested that one.

@StAlKeR7779 (Contributor, Author) commented Aug 7, 2024

As I said, it costs nothing to keep supporting it in the code. But it looked like most people thought it should be removed, so I removed it.
I can easily bring it back.
Should we select the attention type based on CUDA compute capability, so that from 7.5 onwards the default is torch-sdp and for older GPUs the default is xformers (if available)?

@hipsterusername (Member)

That seems reasonable, with a configurable override if the user wants to force one.

It does look like we should add it back.

@ebr (Member) commented Aug 7, 2024

The PyTorch blog says Flash Attention is supported from sm80 compute capability onwards: https://pytorch.org/blog/accelerated-pytorch-2/, so perhaps we should default to xformers for anything lower than that. (As @hipsterusername said, with a configurable override).
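
A minimal sketch of the selection logic being discussed (the threshold and the config values are assumptions, and a user-facing override would take precedence):

```python
import importlib.util

import torch


def default_attention_type(device_index: int = 0) -> str:
    """Pick a default attention backend from the GPU's compute capability."""
    if not torch.cuda.is_available():
        return "torch-sdp"
    major, minor = torch.cuda.get_device_capability(device_index)
    # Per the PyTorch 2.0 blog post, flash attention requires sm80+.
    if (major, minor) >= (8, 0):
        return "torch-sdp"
    # Older GPUs (e.g. Pascal): prefer xformers if it is installed.
    if importlib.util.find_spec("xformers") is not None:
        return "xformers"
    return "torch-sdp"
```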

@StAlKeR7779 requested a review from ebr August 7, 2024 18:24
@RyanJDick (Collaborator)

Not for this PR, but I did some performance testing and we'll probably want to address this at some point:

SDXL:

>>> Time taken to prepare attention processors: 0.10069823265075684s
>>> Time taken to prepare attention processors: 0.07877492904663086s
>>> Time taken to set attention processors: 0.1278061866760254s
>>> Time taken to reset attention processors: 0.13225793838500977s

Code used to measure:

    # Excerpted method: in context this needs `import time` and
    # `from contextlib import contextmanager` at module level.
    @contextmanager
    def apply_custom_attention(self, unet: UNet2DConditionModel):
        """A context manager that patches `unet` with CustomAttnProcessor2_0 attention layers."""
        start = time.time()
        attn_procs = self._prepare_attention_processors(unet)
        time_1 = time.time()
        print(f">>> Time taken to prepare attention processors: {time_1 - start}s")
        orig_attn_processors = unet.attn_processors
        time_2 = time.time()
        print(f">>> Time taken to prepare attention processors: {time_2 - time_1}s")

        try:
            # Note to future devs: set_attn_processor(...) does something slightly unexpected - it pops elements from
            # the passed dict. So, if you wanted to keep the dict for future use, you'd have to make a
            # moderately-shallow copy of it. E.g. `attn_procs_copy = {k: v for k, v in attn_procs.items()}`.
            unet.set_attn_processor(attn_procs)
            time_3 = time.time()
            print(f">>> Time taken to set attention processors: {time_3 - time_2}s")
            yield None
        finally:
            time_4 = time.time()
            unet.set_attn_processor(orig_attn_processors)
            time_5 = time.time()
            print(f">>> Time taken to reset attention processors: {time_5 - time_4}s")

@ebr (Member) left a comment

Tested after changes, seeing expected performance increases on Ampere and no performance degradation on Pascal. LGTM!!

@RyanJDick (Collaborator) left a comment

I think this is just about good-to-go. There are a couple minor requests for docs, but otherwise the code looks good to me. (I haven't tested all cases myself, but it sounds like others have.)
