Add ONNX and ORT support for Falcon #1391
Conversation
Fixes #1172

# we need to set output_attentions=True in the model input to avoid calling
# torch.nn.functional.scaled_dot_product_attention that is not supported by the ONNX export
nit: I would move this comment inside the method.
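For context on the diff comment above: passing output_attentions=True forces the model down its eager attention path instead of the fused scaled_dot_product_attention kernel, which the ONNX exporter did not support at the time. A minimal sketch of this dual-path dispatch pattern, using a hypothetical toy module (not the actual Falcon code):

```python
import torch


class DummyAttention(torch.nn.Module):
    """Toy module mimicking the dual-path attention dispatch discussed above."""

    def forward(self, q, k, v, output_attentions=False):
        if output_attentions:
            # Eager path: explicit matmul + softmax, exportable to ONNX.
            scores = torch.softmax(
                q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1
            )
            return scores @ v
        # Fused path: was not supported by the ONNX exporter at the time.
        return torch.nn.functional.scaled_dot_product_attention(q, k, v)


q = k = v = torch.randn(1, 4, 8, 16)
out_eager = DummyAttention()(q, k, v, output_attentions=True)
out_fused = DummyAttention()(q, k, v, output_attentions=False)
```

Both paths compute the same attention output; only the eager one traces into plain MatMul/Softmax ONNX ops.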
generation_config=generation_config,
**kwargs,
)
# self.num_kv_heads = config.num_kv_heads if (config.new_decoder_architecture or not config.multi_query) else 1
To remove?
Let's keep it for now
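The commented-out line under discussion encodes how Falcon's effective number of key/value heads depends on two config flags. A sketch of that logic in isolation (FalconLikeConfig is a hypothetical stand-in for the real FalconConfig):

```python
from dataclasses import dataclass


@dataclass
class FalconLikeConfig:
    # Hypothetical stand-in for transformers' FalconConfig.
    num_kv_heads: int = 8
    new_decoder_architecture: bool = False
    multi_query: bool = True


def effective_num_kv_heads(config: FalconLikeConfig) -> int:
    # New decoder architecture (falcon-40b style): grouped-query attention,
    # so the configured num_kv_heads applies. Multi-query without the new
    # architecture (falcon-7b style): a single shared key/value head.
    if config.new_decoder_architecture or not config.multi_query:
        return config.num_kv_heads
    return 1


print(effective_num_kv_heads(FalconLikeConfig(new_decoder_architecture=True)))  # 8
print(effective_num_kv_heads(FalconLikeConfig()))  # 1
```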
@@ -211,7 +211,7 @@ class NormalizedConfigManager:
    "blenderbot": BartLikeNormalizedTextConfig,
    "blenderbot_small": BartLikeNormalizedTextConfig,
    "bloom": NormalizedTextConfig.with_args(num_layers="n_layer"),
-   "falcon": NormalizedTextConfig.with_args(num_layers="num_hidden_layers", num_attention_heads="num_kv_heads"),
+   "falcon": NormalizedTextConfig,
Question: does NormalizedConfig have a NUM_KV_HEADS attribute to normalize it or not?
No
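For readers unfamiliar with the with_args call in the diff above: it builds a config wrapper whose normalized attribute names are remapped to model-specific keys. A simplified sketch of the idea (NormalizedTextConfigSketch is illustrative only, not optimum's actual implementation):

```python
class NormalizedTextConfigSketch:
    # Default mapping: normalized attribute name -> model config key.
    NUM_LAYERS = "num_hidden_layers"
    NUM_ATTENTION_HEADS = "num_attention_heads"

    def __init__(self, config):
        self._config = config

    @classmethod
    def with_args(cls, **overrides):
        # Build a subclass with remapped keys, e.g. num_layers="n_layer"
        # for Bloom-style configs.
        attrs = {name.upper(): key for name, key in overrides.items()}
        return type("CustomNormalizedConfig", (cls,), attrs)

    def __getattr__(self, name):
        # Resolve e.g. num_layers -> NUM_LAYERS -> "n_layer" -> value.
        key = getattr(type(self), name.upper())
        return self._config[key]


BloomNormalized = NormalizedTextConfigSketch.with_args(num_layers="n_layer")
cfg = BloomNormalized({"n_layer": 30, "num_attention_heads": 32})
print(cfg.num_layers)  # 30
```

The question above asks whether such a normalized NUM_KV_HEADS mapping exists; per the answer, it does not, which is why the falcon entry falls back to the plain NormalizedTextConfig.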
This one was more painful than it should have been because:
- Using the position_ids input together with generate is bugged: the position_ids generated in the model (https://github.com/huggingface/transformers/blob/0a55d9f7376f72ad3ff296d4249840021b03bcc4/src/transformers/models/falcon/modeling_falcon.py#L932) have a different shape than the position_ids generated in generate. I believe this is a bug in many Transformers models.
- normalized_config: this should be refactored with inheritance to avoid any control flow at all.
- The Trilu op is not implemented in ONNX Runtime for this case: onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for Trilu(14) node with name '/decoder/Trilu' (microsoft/onnxruntime#16189)

Remaining issue: I think the repeat_interleave ONNX export inserts a Loop node in the ONNX graph, which we may want to avoid. EDIT: fixed in PyTorch 2.1.
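On the repeat_interleave point: the call that expands key/value heads for grouped-query attention can be rewritten with expand + reshape, which traces to static Expand/Reshape ONNX ops instead of a Loop. A hypothetical sketch of that equivalent rewrite (not the code merged in this PR):

```python
import torch


def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Equivalent of x.repeat_interleave(n_rep, dim=1) via expand + reshape.

    x has shape (batch, num_kv_heads, seq_len, head_dim).
    """
    batch, num_kv_heads, seq_len, head_dim = x.shape
    # Insert a repeat axis after the head axis, broadcast it, then merge it
    # back into the head axis: [h0, h0, h0, h1, h1, h1, ...].
    x = x[:, :, None, :, :].expand(batch, num_kv_heads, n_rep, seq_len, head_dim)
    return x.reshape(batch, num_kv_heads * n_rep, seq_len, head_dim)


x = torch.randn(2, 4, 5, 8)
expanded = repeat_kv(x, 3)
```

Both forms produce identical tensors; the difference is only in what the ONNX exporter emits for them on pre-2.1 PyTorch.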