Better Bfloat16 support #777

Giuseppe5 · 2023-12-06T14:50:23Z

No description provided.

src/brevitas/export/onnx/standard/function.py

src/brevitas_examples/imagenet_classification/ptq/ptq_evaluate.py

nickfraser · 2023-12-13T17:18:50Z

I think this is good, but there are a few things that we may want to revisit down the line:

The function name float32_kthvalue:
- There is a slight mismatch between the function name and what actually runs, e.g., the function name implies that kthvalue will always run in float32, but I believe it will only do that when specific datatypes / devices have been encountered - a float64 input will still run the algorithm at float64
  - It may be better to simply call the function kthvalue, or change the functionality
The upconversion of datatypes during QuantTensorBase._pre_round_int_value() seems to mean that if x.value is a bfloat16 tensor then the result of the x._pre_round_int_value() will be a float32 type, but otherwise the returned int_value will be the same type as the input (e.g., float16, float32 or float64)
- I'm not saying that this is wrong, but it may have some undesirable side-effects down-the-line where downstream values are being upcast to higher precision values, especially if we want to take advantage of an accelerated bfloat16 backend. Consider the examples below*. I'm just cautious that this may cause downstream operations to be different to the ones we desire...

*

>>> import torch
>>> x = torch.rand((1,),dtype=torch.bfloat16,device="cuda:0")
>>> y = torch.rand((1,),dtype=torch.bfloat16,device="cuda:0")
>>> r = x + y # Output type == input type
>>> r.dtype
torch.bfloat16

>>> import torch
>>> x = torch.rand((1,),dtype=torch.bfloat16,device="cuda:0")
>>> y = torch.rand((1,),dtype=torch.float32,device="cuda:0")
>>> r = x + y # Implicit upcast of x
>>> r.dtype
torch.float32

Giuseppe5 · 2023-12-13T18:04:23Z

Regarding 1, I will rename the function to match its functionality.

Regarding 2, In this current implementation, actually the output of QuantTensor.int() will always be float32 (even though the original QuantTensor was in float16, for example).
I believe the solution could be to cast the output after rounding to the original dtype (being that bfloat16, float16, or float64), basically the behaviour that you thought it is currently happening (but it's actually not).
In this way we would maintain consistency in terms of dtype during computation, also considering that the output of _pre_round_int_value is always used internally, so we just need to be sure that post rounding value has the correct dtype (when we have integer, and don't have the issue of erroneous representation of floating point values in (b)float16).

src/brevitas_examples/imagenet_classification/ptq/ptq_evaluate.py

src/brevitas/quant_tensor/__init__.py

Giuseppe5 commented Dec 8, 2023

View reviewed changes

src/brevitas/export/onnx/standard/function.py Outdated Show resolved Hide resolved

Giuseppe5 commented Dec 8, 2023

View reviewed changes

src/brevitas/export/onnx/standard/function.py Outdated Show resolved Hide resolved

Giuseppe5 commented Dec 8, 2023

View reviewed changes

src/brevitas_examples/imagenet_classification/ptq/ptq_evaluate.py Outdated Show resolved Hide resolved

Giuseppe5 commented Dec 8, 2023

View reviewed changes

src/brevitas_examples/imagenet_classification/ptq/ptq_evaluate.py Outdated Show resolved Hide resolved

Giuseppe5 changed the title ~~Bfloat16 support in ptq evaluate~~ Better Bfloat16 support Dec 11, 2023

Giuseppe5 commented Dec 17, 2023

View reviewed changes

src/brevitas_examples/imagenet_classification/ptq/ptq_evaluate.py Outdated Show resolved Hide resolved

Giuseppe5 added the next release PRs which should be merged for the next release label Dec 20, 2023

Giuseppe5 requested review from nickfraser and volcacius and removed request for nickfraser December 21, 2023 10:18

Giuseppe5 commented Dec 21, 2023

View reviewed changes

src/brevitas/quant_tensor/__init__.py Outdated Show resolved Hide resolved

Giuseppe5 requested review from nickfraser and volcacius and removed request for volcacius and nickfraser December 21, 2023 13:22

Feat (pytorch_utils): kthvalue for (b)float16

6a5f496

Giuseppe5 force-pushed the bfloat16_support branch from d5f6a3c to 024563c Compare December 21, 2023 23:07

Giuseppe5 added 2 commits December 22, 2023 09:16

Feat (quant_tensor): support for bfloat16

50ad4ce

Feat (ptq/evaluate): support for bfloat16

e3f98d4

Giuseppe5 force-pushed the bfloat16_support branch from 024563c to e3f98d4 Compare December 22, 2023 09:17

Giuseppe5 merged commit ade1036 into Xilinx:dev Dec 22, 2023
22 checks passed

Giuseppe5 deleted the bfloat16_support branch December 22, 2023 09:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better Bfloat16 support #777

Better Bfloat16 support #777

Giuseppe5 commented Dec 6, 2023

nickfraser commented Dec 13, 2023

Giuseppe5 commented Dec 13, 2023

Better Bfloat16 support #777

Better Bfloat16 support #777

Conversation

Giuseppe5 commented Dec 6, 2023

nickfraser commented Dec 13, 2023

Giuseppe5 commented Dec 13, 2023