Load non-quantized T5 encoder using the same dtype the model is saved in #7140
Summary
While testing the possibility of using GGUF to load the T5 encoder, I noticed that an fp16 T5 encoder in GGUF format wasn't using the extreme amount of memory on macOS that the base encoder was, despite the base encoder being bf16.
I tried forcing the base encoder to load as bfloat16, and that had the same effect: it reduced memory usage and cut model load times by 60-80%.
The transformers documentation says that by default a model is loaded in the torch default dtype, not the dtype the model was saved in.
So I changed the code to use auto, which loads the model in its saved dtype.
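For reference, a minimal sketch of the kind of change involved (the actual file and surrounding code in the repo will differ; the model path below is a placeholder):

```python
from transformers import T5EncoderModel

# torch_dtype="auto" tells transformers to load the weights in the dtype
# recorded in the checkpoint/config (e.g. bfloat16 for the Flux T5 encoder)
# instead of upcasting to the torch default dtype (float32).
# NOTE: the path is illustrative, not the actual model location.
t5_encoder = T5EncoderModel.from_pretrained(
    "path/to/t5_encoder",
    torch_dtype="auto",
)
print(t5_encoder.dtype)  # e.g. torch.bfloat16
```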
Related Issues / Discussions
No issue raised, as macOS requires #7113 to use Flux by default (upgrading to torch nightlies works as well), and ideally you'd want to combine this PR with that one.
I did bring this up on Discord: https://discord.com/channels/1020123559063990373/1049495067846524939/1296111527044186144
QA Instructions
Tested on macOS with and without #7113.
Needs testing on non-macOS systems to make sure it doesn't break anything.
There may be some difference in resulting images if the same thing is happening on those systems,
since float32 and bfloat16 calculations differ and the resulting images will differ accordingly.
Merge Plan
Should be a straightforward merge, as it's only one line.