error when using fp8 #412

Closed
mxjmtxrm opened this issue Jan 15, 2025 · 9 comments

@mxjmtxrm

Hi, I tried to run quantization with FP8 and hit the following error:

RuntimeError: "fill_empty_deterministic_" not implemented for 'Float8_e4m3fn'

Why is torch.use_deterministic_algorithms(True, warn_only=True) set?

@wenhuach21
Contributor

This is primarily for reproducibility, enabling deterministic algorithms whenever possible. You may set it to False if needed. Could you let me know which device you are using? We have tested this on A100 and Gaudi, and both work fine.
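
For context, a minimal sketch of the manual workaround (torch.use_deterministic_algorithms is the standard PyTorch toggle; turning it off before tuning avoids the fill_empty_deterministic_ error at the cost of bit-exact reproducibility):

import torch

# Turn the global deterministic mode off before tuning. This is the standard
# PyTorch API, not an AutoRound-specific flag; results may then vary slightly
# from run to run.
torch.use_deterministic_algorithms(False)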

@mxjmtxrm
Author

H100. As far as I know, FP8 is not supported on the A100. How do you run FP8 quantization on an A100, or is it already supported there?

@mxjmtxrm
Author

mxjmtxrm commented Jan 15, 2025

BTW, I have a question about int_sym: why doesn't max_v simply use abs_max instead of the following code? And why can max_v be negative?

max_v = (2 * (wmax_abs < wmin_abs).int() - 1) * torch.max(wmax_abs, wmin_abs)

@wenhuach21
Contributor

> H100. As far as I know, FP8 is not supported on the A100. How do you run FP8 quantization on an A100, or is it already supported there?

Although the quantized model cannot run on an A100, the tuning process can still be performed on one.

@mxjmtxrm
Author

> H100. As far as I know, FP8 is not supported on the A100. How do you run FP8 quantization on an A100, or is it already supported there?
>
> Although the quantized model cannot run on an A100, the tuning process can still be performed on one.

There is a cast op in float8_e4m3fn_ste, x.to(torch.float8_e4m3fn). Is that supported on an A100?

@wenhuach21
Contributor

Yes.
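
For reference, a quick sanity check (a sketch, assuming a PyTorch build with the float8 dtypes, roughly 2.1 or newer): float8_e4m3fn is a storage dtype in PyTorch, so the cast is an ordinary elementwise conversion and does not require H100 FP8 tensor cores.

import torch

# Round-trip cast check; runs on an A100 or even on CPU, because the cast is
# an elementwise dtype conversion, not a native FP8 matmul.
x = torch.randn(4, 4, device="cuda" if torch.cuda.is_available() else "cpu")
x_fp8 = x.to(torch.float8_e4m3fn)   # the cast in question
x_back = x_fp8.to(x.dtype)          # cast back for higher-precision compute
print((x - x_back).abs().max())     # small rounding error, no crash

# A common straight-through-estimator pattern (hypothetical sketch, not
# necessarily the exact float8_e4m3fn_ste implementation): the forward pass
# sees the FP8-rounded values, while gradients pass through unchanged.
x_ste = x + (x_back - x).detach()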

@wenhuach21
Contributor

> BTW, I have a question about int_sym: why doesn't max_v simply use abs_max instead of the following code? And why can max_v be negative?
>
> max_v = (2 * (wmax_abs < wmin_abs).int() - 1) * torch.max(wmax_abs, wmin_abs)

This variant, known as Full Range Sym, is detailed in our blog: https://medium.com/@NeuralCompressor/10-tips-for-quantizing-llms-and-vlms-with-autoround-923e733879a7 (or, in Chinese, https://zhuanlan.zhihu.com/p/13291803189).
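
To make the trick concrete, here is a minimal per-tensor sketch (a hypothetical helper, assuming the scale denominator is 2 ** (bits - 1), consistent with the snippet above; the real code works per group and handles edge cases such as all-zero tensors):

import torch

def full_range_sym_quant(w: torch.Tensor, bits: int = 8):
    maxq = 2 ** (bits - 1)                # 128 for int8
    wmin_abs = (-w.min()).clamp(min=0)    # |most negative value|
    wmax_abs = w.max().clamp(min=0)       # |most positive value|
    # Sign trick: max_v is positive when |wmin| dominates, negative otherwise.
    max_v = (2 * (wmax_abs < wmin_abs).int() - 1) * torch.max(wmax_abs, wmin_abs)
    scale = max_v / maxq                  # may be negative by design
    q = torch.clamp(torch.round(w / scale), -maxq, maxq - 1)
    return q, scale                       # dequantize with q * scale

Either way the dominant extreme lands exactly on -maxq (-128 for int8): when |wmin| dominates, scale > 0 and wmin / scale = -128; when |wmax| dominates, scale < 0 and wmax / scale = -128. A plain abs_max scale with a clamp to [-127, 127] would never emit the -128 code, so the sign trick spends that otherwise-wasted level on the largest-magnitude weight, which is also why max_v (and the scale) can be negative.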

@wenhuach21
Contributor

@WeiweiZhang1, please help add an arg to disable use_deterministic_algorithms.

@wenhuach21
Contributor

Workaround: #417
