Replies: 3 comments
-
I'm following the progress, but I don't see it as production-ready just yet: torch/CUDA does not have FP8 compute capabilities, so this applies to storage only and relies on autocast to FP16 at runtime for processing. And the exact GPUs that are likely to be memory-starved are the same ones that are likely to have either autocast issues (e.g. DirectML) or FP16 precision issues (e.g. NVIDIA 1xxx series).
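To make that storage-vs-compute distinction concrete, here is a minimal sketch (my own illustration, assuming PyTorch >= 2.1 and its `torch.float8_e4m3fn` dtype; the tensor sizes are arbitrary): the weight sits in fp8, but it has to be cast back up to a 16/32-bit dtype right before the matmul, because there is no general fp8 compute path.

```python
import torch

# Minimal sketch, assuming PyTorch >= 2.1 (which adds torch.float8_e4m3fn).
# fp8 is used purely as a *storage* dtype; the matmul still runs in a
# higher-precision compute dtype, mirroring the autocast-to-fp16 behaviour
# described above.
device = "cuda" if torch.cuda.is_available() else "cpu"
compute_dtype = torch.float16 if device == "cuda" else torch.float32

weight = torch.randn(4096, 4096, dtype=compute_dtype, device=device)
weight_fp8 = weight.to(torch.float8_e4m3fn)  # 1 byte/element vs. 2 for fp16

x = torch.randn(1, 4096, dtype=compute_dtype, device=device)

# Compute path: upcast right before the GEMM; only the storage is fp8.
y = torch.nn.functional.linear(x, weight_fp8.to(compute_dtype))
print(y.shape, weight_fp8.element_size(), weight.element_size())
```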
-
Honestly, for memory-starved setups the best answer may come when stable-diffusion.cpp matures to a usable point, since it uses the GGML framework from llama.cpp, which allows really low-bit quantizations (among other cool things).
-
[Edit] Changed the title: FP8 Precision -> fp8 dtype support.
The proposal was merged into their dev branch.
-
In AUTOMATIC1111's WebUI repo, FP8 dtype support has been proposed.
It reduces VRAM usage by almost HALF compared to FP16 with a speed decrease of only 5% or less.
It needs PyTorch 2.1.0 or newer; so far AUTOMATIC1111's WebUI doesn't support 2.1.0, but SD.Next does.
What do you think of FP8? And can it be implemented in SD.Next?
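For reference, a hedged sketch of what the idea looks like in code (my own illustration under stated assumptions, not the WebUI's actual implementation): Linear weights are kept in `torch.float8_e4m3fn` and upcast inside `forward()`, so weight memory is roughly halved while the GEMM still runs in a 16-bit dtype. Requires PyTorch >= 2.1.

```python
import torch

# Hedged sketch of the fp8-storage idea (not the WebUI's actual code):
# keep nn.Linear weights in fp8 and upcast only inside forward(), so weight
# memory is roughly halved while the matmul still runs in a 16-bit dtype.
COMPUTE_DTYPE = torch.float16 if torch.cuda.is_available() else torch.bfloat16

class FP8StorageLinear(torch.nn.Module):
    def __init__(self, linear: torch.nn.Linear):
        super().__init__()
        # Weight lives in fp8 (1 byte/element); bias stays 16-bit, it is tiny.
        self.weight_fp8 = linear.weight.data.to(torch.float8_e4m3fn)
        self.bias = (linear.bias.data.to(COMPUTE_DTYPE)
                     if linear.bias is not None else None)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Per-call upcast: this is where the small runtime cost comes from,
        # paid in exchange for halving the resident weight memory.
        w = self.weight_fp8.to(COMPUTE_DTYPE)
        return torch.nn.functional.linear(x.to(COMPUTE_DTYPE), w, self.bias)

# Usage: wrap an existing layer.
layer = FP8StorageLinear(torch.nn.Linear(768, 768))
out = layer(torch.randn(2, 768))
print(out.dtype, layer.weight_fp8.dtype)
```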