Currently, BitsAndBytesLinearQuant4bit always calls bitsandbytes.functional.quantize_4bit on submodule weights. This is somewhat touchy for CPU tensors, because quantize_4bit only works on GPU tensors, and it is outright broken for meta tensors, where we would only need to compute the right shapes.
I don't think we need a new class, just functions that complement bitsandbytes.functional.quantize_4bit(w, quant_type="nf4") for meta and CPU inputs (returning a tensor on w.device and a quant state whose tensors are also on w.device).
Ideally, the quantize_weight function should have exactly the same inputs and outputs as bitsandbytes.functional.quantize_4bit, except that all tensors stay on the device they are already on: the same shapes and the same quant state as a direct call would produce.
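For the meta case, such a helper could be a pure shape computation. The sketch below is an assumption, not the actual implementation: the function name quantize_4bit_meta is made up, and the output layout (packed uint8 column vector, one float32 absmax per block of 64 elements) is based on how nf4 packs two 4-bit codes per byte; the real quantize_4bit additionally returns a full QuantState (code table, blocksize, original dtype and shape), which a complete helper would also have to populate.

```python
import torch

def quantize_4bit_meta(w: torch.Tensor, blocksize: int = 64):
    """Hypothetical helper: return tensors with the shapes and dtypes
    bitsandbytes.functional.quantize_4bit would produce, but on the
    meta device (metadata only, no data movement, no GPU needed)."""
    n = w.numel()
    # nf4 packs two 4-bit codes into each uint8 byte; the packed
    # weight is assumed to be a column vector, as bitsandbytes returns
    packed = torch.empty(((n + 1) // 2, 1), dtype=torch.uint8, device="meta")
    # one absmax scale per quantization block of `blocksize` elements
    n_blocks = (n + blocksize - 1) // blocksize
    absmax = torch.empty((n_blocks,), dtype=torch.float32, device="meta")
    return packed, absmax
```

A CPU input could then reuse the same signature but route through a GPU round-trip (or a CPU kernel), so the caller never has to branch on w.device.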
We could also offer it to bitsandbytes if they're interested.
Relevant code: lightning-thunder/thunder/transforms/quantization.py, lines 93 to 103 at e64d347.