Load non-quantized T5 encoder using the same dtype the model is saved in #7140
Summary
While testing the possibility of using GGUF to load the T5 encoder, I noticed that an fp16 T5 encoder in GGUF format wasn't using the extreme amount of memory on macOS that the base encoder was, despite the base encoder being bf16.
I tried forcing the base encoder to load as bfloat16, and that had the same effect: it reduced memory usage and cut model load times by 60-80%.
The transformers documentation says that by default a model is loaded in the torch default dtype, not the dtype the model was saved in.
So I changed the code to use auto, which loads the model in its saved dtype.
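For reference, a minimal sketch of the kind of change involved (the actual file and surrounding code in the repo will differ; the model path below is a placeholder):

```python
from transformers import T5EncoderModel

# torch_dtype="auto" tells transformers to load the weights in the dtype
# recorded in the checkpoint/config (e.g. bfloat16 for the Flux T5 encoder)
# instead of upcasting to the torch default dtype (float32).
# NOTE: the path is illustrative, not the actual model location.
t5_encoder = T5EncoderModel.from_pretrained(
    "path/to/t5_encoder",
    torch_dtype="auto",
)
print(t5_encoder.dtype)  # e.g. torch.bfloat16
```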
Related Issues / Discussions
No issue raised, as macOS requires #7113 to use Flux by default (upgrading to torch nightlies works as well), and ideally you'd want to combine this PR with that one.
I did bring this up on Discord: https://discord.com/channels/1020123559063990373/1049495067846524939/1296111527044186144
QA Instructions
Tested on macOS with and without #7113.
Needs testing on non-macOS systems to make sure it doesn't break anything.
There may be some difference in resulting images if the same thing is happening on those systems,
since float32 and bfloat16 calculations differ and the resulting images will differ accordingly.
Merge Plan
Should be a straightforward merge, as it's only one line.