[Feature Request] Add support for fp8 QDQ models. #348

Open

hopef opened this issue Dec 24, 2024 · 2 comments

hopef commented Dec 24, 2024

With the latest version of onnx-simplifier, I run into errors when simplifying fp8 QDQ models.

  • Error 1: shape inference
$> onnxsim fp8.onnx fp8-sim.onnx
Simplifying...
Traceback (most recent call last):
  File "/opt/conda3/bin/onnxsim", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/conda3/lib/python3.11/site-packages/onnxsim/onnx_simplifier.py", line 453, in main
    model_opt, check_ok = simplify(
                          ^^^^^^^^^
  File "/opt/conda3/lib/python3.11/site-packages/onnxsim/onnx_simplifier.py", line 187, in simplify
    model_opt_bytes = C.simplify(
                      ^^^^^^^^^^^
onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] (op_type:QuantizeLinear, node name: pts_bbox_head.transformer.decoder.layers.0.attentions.0.attn.query_quantizer/QuantizeLinear1): [TypeInferenceError] Inferred elem type differs from existing elem type: (17) vs (INT8)
  • Error 2: CSETensorHash
$> onnxsim fp8.onnx fp8-sim.onnx --skip-shape-inference
Simplifying...
Traceback (most recent call last):
  File "/opt/conda3/bin/onnxsim", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/conda3/lib/python3.11/site-packages/onnxsim/onnx_simplifier.py", line 453, in main
    model_opt, check_ok = simplify(
                          ^^^^^^^^^
  File "/opt/conda3/lib/python3.11/site-packages/onnxsim/onnx_simplifier.py", line 187, in simplify
    model_opt_bytes = C.simplify(
                      ^^^^^^^^^^^
RuntimeError: no supported data type: 17

Could you please add support for fp8 QDQ models? (Data type 17 in the errors above is TensorProto.FLOAT8E4M3FN.)

The fp8 QDQ models follow the ModelOpt convention and have the structure shown below. Compared to int8 QDQ models, the only difference is the data type of the zero points (int8 vs. float8_e4m3fn).

# Q / DQ below stand for ONNX QuantizeLinear / DequantizeLinear nodes.
# Scales are float32; zero points are float8_e4m3fn.
x_scales = torch.ones(1, dtype=torch.float32)
x_zero_points = torch.zeros(1, dtype=torch.float8_e4m3fn)
w_scales = torch.ones(32, dtype=torch.float32)
w_zero_points = torch.zeros(32, dtype=torch.float8_e4m3fn)

x = Q(x, x_scales, x_zero_points)
x = DQ(x, x_scales, x_zero_points)

quant_weights = Q(weights, w_scales, w_zero_points)
quant_weights = DQ(quant_weights, w_scales, w_zero_points)
y = Conv(x, quant_weights)
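
For reference, here is a minimal repro sketch (illustrative names and shapes only, not my actual export) that builds a single Conv with this fp8 QDQ structure via onnx.helper, assuming onnx >= 1.14 so that TensorProto.FLOAT8E4M3FN and the opset-19 QuantizeLinear/DequantizeLinear are available:

import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

# float32 scales, float8_e4m3fn zero points (the only difference from int8 QDQ)
x_scale = numpy_helper.from_array(np.ones(1, dtype=np.float32), "x_scale")
w_scale = numpy_helper.from_array(np.ones(32, dtype=np.float32), "w_scale")
x_zp = helper.make_tensor("x_zp", TensorProto.FLOAT8E4M3FN, [1], [0])
w_zp = helper.make_tensor("w_zp", TensorProto.FLOAT8E4M3FN, [32], [0] * 32)
weight = numpy_helper.from_array(
    np.random.randn(32, 16, 3, 3).astype(np.float32), "weight")

nodes = [
    # activation Q/DQ (per-tensor)
    helper.make_node("QuantizeLinear", ["x", "x_scale", "x_zp"], ["x_q"]),
    helper.make_node("DequantizeLinear", ["x_q", "x_scale", "x_zp"], ["x_dq"]),
    # weight Q/DQ (per output channel)
    helper.make_node("QuantizeLinear", ["weight", "w_scale", "w_zp"], ["w_q"], axis=0),
    helper.make_node("DequantizeLinear", ["w_q", "w_scale", "w_zp"], ["w_dq"], axis=0),
    helper.make_node("Conv", ["x_dq", "w_dq"], ["y"], pads=[1, 1, 1, 1]),
]
graph = helper.make_graph(
    nodes, "fp8_qdq_conv",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 16, 8, 8])],
    [helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 32, 8, 8])],
    initializer=[x_scale, w_scale, x_zp, w_zp, weight],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 19)])
onnx.save(model, "fp8.onnx")  # then: onnxsim fp8.onnx fp8-sim.onnx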
@OValery16

I also have the same problem.

@congyang12345

You can try using onnxslim.
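
For example (assuming onnxslim is installed from PyPI), something like:

$> pip install onnxslim
$> onnxslim fp8.onnx fp8-slim.onnx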
