error when using fp8 #412

Closed
mxjmtxrm opened this issue Jan 15, 2025 · 9 comments

@mxjmtxrm

Hi, I tried to run quantization with FP8 and hit the following error:

RuntimeError: "fill_empty_deterministic_" not implemented for 'Float8_e4m3fn'

Why is torch.use_deterministic_algorithms(True, warn_only=True) set?

@wenhuach21
Contributor

This is primarily for reproducibility, enabling deterministic algorithms whenever possible. You may set it to False if needed. Could you let me know which device you are using? We have tested this on A100 and Gaudi, and both work fine.
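
For context, a minimal sketch of the manual workaround (torch.use_deterministic_algorithms is the standard PyTorch toggle; turning it off before tuning avoids the fill_empty_deterministic_ error at the cost of bit-exact reproducibility):

import torch

# Turn the global deterministic mode off before tuning. This is the standard
# PyTorch API, not an AutoRound-specific flag; results may then vary slightly
# from run to run.
torch.use_deterministic_algorithms(False)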

@mxjmtxrm
Author

H100. As far as I know, FP8 is not supported on the A100. How do you run FP8 quantization on an A100, or is it already supported there?

@mxjmtxrm
Author

mxjmtxrm commented Jan 15, 2025

BTW, I have a question about int_sym: why doesn't max_v simply use abs_max instead of the following code? And why can max_v be negative?

max_v = (2 * (wmax_abs < wmin_abs).int() - 1) * torch.max(wmax_abs, wmin_abs)

@wenhuach21
Contributor

> H100. As far as I know, FP8 is not supported on the A100. How do you run FP8 quantization on an A100, or is it already supported there?

Although the quantized model cannot run on an A100, the tuning process can still be performed on one.

@mxjmtxrm
Author

> H100. As far as I know, FP8 is not supported on the A100. How do you run FP8 quantization on an A100, or is it already supported there?
>
> Although the quantized model cannot run on an A100, the tuning process can still be performed on one.

There is a cast op in float8_e4m3fn_ste, x.to(torch.float8_e4m3fn). Is that supported on an A100?

@wenhuach21
Contributor

Yes.
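
For reference, a quick sanity check (a sketch, assuming a PyTorch build with the float8 dtypes, roughly 2.1 or newer): float8_e4m3fn is a storage dtype in PyTorch, so the cast is an ordinary elementwise conversion and does not require H100 FP8 tensor cores.

import torch

# Round-trip cast check; runs on an A100 or even on CPU, because the cast is
# an elementwise dtype conversion, not a native FP8 matmul.
x = torch.randn(4, 4, device="cuda" if torch.cuda.is_available() else "cpu")
x_fp8 = x.to(torch.float8_e4m3fn)   # the cast in question
x_back = x_fp8.to(x.dtype)          # cast back for higher-precision compute
print((x - x_back).abs().max())     # small rounding error, no crash

# A common straight-through-estimator pattern (hypothetical sketch, not
# necessarily the exact float8_e4m3fn_ste implementation): the forward pass
# sees the FP8-rounded values, while gradients pass through unchanged.
x_ste = x + (x_back - x).detach()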

@wenhuach21
Contributor

> BTW, I have a question about int_sym: why doesn't max_v simply use abs_max instead of the following code? And why can max_v be negative?
>
> max_v = (2 * (wmax_abs < wmin_abs).int() - 1) * torch.max(wmax_abs, wmin_abs)

This variant, known as Full Range Sym, is detailed in our blog: https://medium.com/@NeuralCompressor/10-tips-for-quantizing-llms-and-vlms-with-autoround-923e733879a7 (or, in Chinese, https://zhuanlan.zhihu.com/p/13291803189).
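
To make the trick concrete, here is a minimal per-tensor sketch (a hypothetical helper, assuming the scale denominator is 2 ** (bits - 1), consistent with the snippet above; the real code works per group and handles edge cases such as all-zero tensors):

import torch

def full_range_sym_quant(w: torch.Tensor, bits: int = 8):
    maxq = 2 ** (bits - 1)                # 128 for int8
    wmin_abs = (-w.min()).clamp(min=0)    # |most negative value|
    wmax_abs = w.max().clamp(min=0)       # |most positive value|
    # Sign trick: max_v is positive when |wmin| dominates, negative otherwise.
    max_v = (2 * (wmax_abs < wmin_abs).int() - 1) * torch.max(wmax_abs, wmin_abs)
    scale = max_v / maxq                  # may be negative by design
    q = torch.clamp(torch.round(w / scale), -maxq, maxq - 1)
    return q, scale                       # dequantize with q * scale

Either way the dominant extreme lands exactly on -maxq (-128 for int8): when |wmin| dominates, scale > 0 and wmin / scale = -128; when |wmax| dominates, scale < 0 and wmax / scale = -128. A plain abs_max scale with a clamp to [-127, 127] would never emit the -128 code, so the sign trick spends that otherwise-wasted level on the largest-magnitude weight, which is also why max_v (and the scale) can be negative.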

@wenhuach21
Contributor

@WeiweiZhang1, please help add an arg to disable use_deterministic_algorithms.

@wenhuach21
Contributor

Workaround: #417
