[Feature Request] Add support for fp8 QDQ models. #348

Open

hopef opened this issue Dec 24, 2024 · 2 comments

hopef commented Dec 24, 2024

With the latest version of onnx-simplifier, I run into errors when simplifying fp8 QDQ models.

  • Error 1: shape inference
$> onnxsim fp8.onnx fp8-sim.onnx
Simplifying...
Traceback (most recent call last):
  File "/opt/conda3/bin/onnxsim", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/conda3/lib/python3.11/site-packages/onnxsim/onnx_simplifier.py", line 453, in main
    model_opt, check_ok = simplify(
                          ^^^^^^^^^
  File "/opt/conda3/lib/python3.11/site-packages/onnxsim/onnx_simplifier.py", line 187, in simplify
    model_opt_bytes = C.simplify(
                      ^^^^^^^^^^^
onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] (op_type:QuantizeLinear, node name: pts_bbox_head.transformer.decoder.layers.0.attentions.0.attn.query_quantizer/QuantizeLinear1): [TypeInferenceError] Inferred elem type differs from existing elem type: (17) vs (INT8)
  • Error 2: CSETensorHash
$> onnxsim fp8.onnx fp8-sim.onnx --skip-shape-inference
Simplifying...
Traceback (most recent call last):
  File "/opt/conda3/bin/onnxsim", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/conda3/lib/python3.11/site-packages/onnxsim/onnx_simplifier.py", line 453, in main
    model_opt, check_ok = simplify(
                          ^^^^^^^^^
  File "/opt/conda3/lib/python3.11/site-packages/onnxsim/onnx_simplifier.py", line 187, in simplify
    model_opt_bytes = C.simplify(
                      ^^^^^^^^^^^
RuntimeError: no supported data type: 17

Could you please add support for fp8 QDQ models? (Data type 17 in the errors above is TensorProto.FLOAT8E4M3FN.)

The fp8 QDQ models follow the ModelOpt convention and have the structure shown below. Compared to int8 QDQ models, the only difference is the data type of the zero points (int8 vs. float8_e4m3fn).

# Q / DQ below stand for ONNX QuantizeLinear / DequantizeLinear nodes.
# Scales are float32; zero points are float8_e4m3fn.
x_scales = torch.ones(1, dtype=torch.float32)
x_zero_points = torch.zeros(1, dtype=torch.float8_e4m3fn)
w_scales = torch.ones(32, dtype=torch.float32)
w_zero_points = torch.zeros(32, dtype=torch.float8_e4m3fn)

x = Q(x, x_scales, x_zero_points)
x = DQ(x, x_scales, x_zero_points)

quant_weights = Q(weights, w_scales, w_zero_points)
quant_weights = DQ(quant_weights, w_scales, w_zero_points)
y = Conv(x, quant_weights)
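
For reference, here is a minimal repro sketch (illustrative names and shapes only, not my actual export) that builds a single Conv with this fp8 QDQ structure via onnx.helper, assuming onnx >= 1.14 so that TensorProto.FLOAT8E4M3FN and the opset-19 QuantizeLinear/DequantizeLinear are available:

import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

# float32 scales, float8_e4m3fn zero points (the only difference from int8 QDQ)
x_scale = numpy_helper.from_array(np.ones(1, dtype=np.float32), "x_scale")
w_scale = numpy_helper.from_array(np.ones(32, dtype=np.float32), "w_scale")
x_zp = helper.make_tensor("x_zp", TensorProto.FLOAT8E4M3FN, [1], [0])
w_zp = helper.make_tensor("w_zp", TensorProto.FLOAT8E4M3FN, [32], [0] * 32)
weight = numpy_helper.from_array(
    np.random.randn(32, 16, 3, 3).astype(np.float32), "weight")

nodes = [
    # activation Q/DQ (per-tensor)
    helper.make_node("QuantizeLinear", ["x", "x_scale", "x_zp"], ["x_q"]),
    helper.make_node("DequantizeLinear", ["x_q", "x_scale", "x_zp"], ["x_dq"]),
    # weight Q/DQ (per output channel)
    helper.make_node("QuantizeLinear", ["weight", "w_scale", "w_zp"], ["w_q"], axis=0),
    helper.make_node("DequantizeLinear", ["w_q", "w_scale", "w_zp"], ["w_dq"], axis=0),
    helper.make_node("Conv", ["x_dq", "w_dq"], ["y"], pads=[1, 1, 1, 1]),
]
graph = helper.make_graph(
    nodes, "fp8_qdq_conv",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 16, 8, 8])],
    [helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 32, 8, 8])],
    initializer=[x_scale, w_scale, x_zp, w_zp, weight],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 19)])
onnx.save(model, "fp8.onnx")  # then: onnxsim fp8.onnx fp8-sim.onnx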
@OValery16

I also have the same problem.

@congyang12345

You can try using onnxslim.
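
For example (assuming onnxslim is installed from PyPI), something like:

$> pip install onnxslim
$> onnxslim fp8.onnx fp8-slim.onnx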
