
Remove attn mask patching #1473

Closed

Conversation

baskrahmer
Contributor

What does this PR do?

Removes attention mask patching for specific models when doing an ONNX export.

Picked this up but I am not sure about:

  1. Whether to log a warning or raise an error when exporting with sequence_length=1, and what the exact scope of such a check should be. Right now it raises a warning for any model whose task is prefixed with text-generation; maybe this should be more specific (see the sketch below).
  2. Whether to add a warning/error for previously exported models with such a configuration.
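A minimal sketch of what option 1 could look like (the function and parameter names below are illustrative, not Optimum's actual API):

import logging

logger = logging.getLogger(__name__)

def check_decoder_export_shapes(task: str, sequence_length: int) -> None:
    # Warn when a decoder-only (text-generation) model is exported with sequence_length=1,
    # since tracing would then skip the causal-mask branch needed for the prefill step.
    if task.startswith("text-generation") and sequence_length == 1:
        logger.warning(
            "Exporting a text-generation model with sequence_length=1 may trace the wrong "
            "attention-mask branch; consider using sequence_length > 1 instead."
        )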

Fixes #1461

@fxmarty
Contributor

fxmarty commented Oct 24, 2023

Not sure if this context is given anywhere in the code base, but anyway:

That's great @baskrahmer, thank you for the simplification! For context, @echarlaix introduced a simplification for the ONNX export of decoder-only models in #1257, where a single ONNX model without subgraphs can be used to handle both the prefill and decode steps (contrary to the previous decoder_model_merged.onnx, which handled both with subgraphs).

However, to do that, the model traced during the ONNX export needs to encompass the causal mask generation. Unfortunately, some architectures such as Llama gate it behind a control flow that depends on the sequence length: https://github.com/huggingface/transformers/blob/fc142bd775ae4639f80a8b0085a5df33bd2853ce/src/transformers/models/llama/modeling_llama.py#L139-L147. So to export models with the new structure, we either need to patch the models to remove this control flow (what was done so far), or simply use sequence_length > 1 to go into the control flow during tracing. That is what I was suggesting in #1461 for simplification purposes.
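A minimal, self-contained sketch of the kind of sequence-length-dependent control flow described above (simplified for illustration; the function names here are not the actual transformers code):

from typing import Optional

import torch

def make_causal_mask(seq_len: int, dtype: torch.dtype = torch.float32) -> torch.Tensor:
    # Positions above the diagonal get the most negative representable value; the rest stay 0.
    mask = torch.full((seq_len, seq_len), torch.finfo(dtype).min, dtype=dtype)
    return torch.triu(mask, diagonal=1)

def prepare_decoder_attention_mask(input_shape) -> Optional[torch.Tensor]:
    causal_mask = None
    # This branch is only taken when the sequence length is greater than 1. If the model is
    # traced with sequence_length=1 during the ONNX export, the causal mask generation never
    # makes it into the exported graph, which breaks the prefill step.
    if input_shape[-1] > 1:
        causal_mask = make_causal_mask(input_shape[-1])
    return causal_mask

Tracing with sequence_length > 1 goes into that branch, so the mask construction is kept in the exported graph without any model patching.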

@baskrahmer baskrahmer force-pushed the remove_attn_mask_patching branch from eab6299 to 30a922c on October 26, 2023 at 16:40
@baskrahmer baskrahmer marked this pull request as ready for review on October 26, 2023 at 16:40
@baskrahmer
Contributor Author

@fxmarty thanks for the context. Sounds sensible :)

@fxmarty fxmarty requested review from echarlaix and fxmarty and removed request for echarlaix October 27, 2023 07:29
@fxmarty fxmarty left a comment (Contributor)


I'll test a bit more later!

Review comment on optimum/utils/modeling_utils.py (resolved)
@fxmarty
Contributor

fxmarty commented Oct 31, 2023

Hi @baskrahmer, following huggingface/transformers#27086, quite a few _make_causal and _prepare_attention_mask functions were removed or moved elsewhere, so this PR should fix the issue. I believe you can also remove the patching of _prepare_attn_mask and _make_causal_mask for Falcon (they don't exist anymore).

@fxmarty
Contributor

fxmarty commented Oct 31, 2023

see #1495

@baskrahmer
Contributor Author

I believe you can also remove the patching of _prepare_attn_mask and _make_causal_mask for Falcon (that don't exist anymore).

Not sure if I follow - you mean also removing this?

@baskrahmer baskrahmer force-pushed the remove_attn_mask_patching branch from 1c1a4be to 2df564d on October 31, 2023 at 19:55
@fxmarty
Contributor

fxmarty commented Nov 2, 2023

Hi @baskrahmer, sorry for the late reply. I meant this:

# In order to use a single decoder, we need to patch the _prepare_attn_mask function to behave independently of the sequence length.
if isinstance(self._model, FalconModel):
    self._model._prepare_attn_mask = _falcon_prepare_attn_mask
else:
    self._model.transformer._prepare_attn_mask = _falcon_prepare_attn_mask

EDIT: Nevermind, you already removed it!

I'm preparing a release for today in sync with the Transformers release, and we'll need this PR in. In the interest of time, I'll be pushing to your branch to get this PR merged; apologies in advance for that!

@baskrahmer
Contributor Author

@fxmarty thanks for the reply. All good, you are definitely more in the details here so feel free to change anything :)

@fxmarty
Contributor

fxmarty commented Nov 2, 2023

@baskrahmer #1509 is merged, based on your branch (I could not push to your branch); sorry for the rush, and thank you for your contribution!

@fxmarty fxmarty closed this Nov 2, 2023
Linked issue: Remove unnecessary _prepare_decoder_attention_mask patching