@kunal-vaishnavi Thank you for the PR. Isn't `past_key_values` expected to be `None` in the first forward pass, and prepared later at `use_cache_branch, past_key_values, known_output_shapes = self.prepare_past_key_values(input_ids, past_key_values, use_torch)`?

cc @echarlaix
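For context, a minimal sketch of the call pattern being asked about, assuming a GPT-2 checkpoint (the `prepare_past_key_values` line is the one quoted above; everything else here is illustrative, not the actual code under discussion):

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = ORTModelForCausalLM.from_pretrained("gpt2", export=True)

inputs = tokenizer("Hello world", return_tensors="pt")

# First forward pass: there is no cache yet, so past_key_values is None
# and is expected to be prepared inside forward(), i.e.
#   use_cache_branch, past_key_values, known_output_shapes = \
#       self.prepare_past_key_values(input_ids, past_key_values, use_torch)
outputs = model(**inputs, past_key_values=None)
```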
Or did you mean `decoder_model_merged.onnx`?
Yes
Not sure when this could happen. Could you share an example so that we can reproduce, @kunal-vaishnavi?
Yes, this is with `decoder_model_merged.onnx`. When running the forward pass, each `past_key` and `past_value` in `past_key_values` has shape `(batch_size, num_heads, past_seq_len, head_size)`. For the prompt step, `past_seq_len = 0`, so each `past_key` and `past_value` input shape is `(batch_size, num_heads, 0, head_size)`.

Since `self.use_cache` is true for `ORTModelForCausalLM` models and the merged model always has `past_key_values`, both `if` statements are true. The `input_ids` therefore currently get modified for both prompt and token generation.
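To make those shapes concrete, here is a sketch of the prompt-step inputs with assumed GPT-2-like dimensions (`num_heads=12`, `head_size=64`, 12 layers; the dimensions are assumptions for illustration):

```python
import torch

batch_size, num_heads, head_size, num_layers = 1, 12, 64, 12

# Prompt step: past_seq_len = 0, so every past_key / past_value tensor is
# empty along the sequence axis but still present as an input.
past_key_values = tuple(
    (
        torch.zeros(batch_size, num_heads, 0, head_size),  # past_key
        torch.zeros(batch_size, num_heads, 0, head_size),  # past_value
    )
    for _ in range(num_layers)
)

# Because the cache is non-None even at the prompt step, a check like
# `if past_key_values is not None` cannot distinguish prompt processing
# from token generation, so input_ids get truncated in both cases.
```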
@kunal-vaishnavi I think that is not true. While `decoder_model_merged.onnx` always takes `past_key_values` as inputs, this is only true at the `session.run` call, not at the `forward` call, which is typically called from transformers' `GenerationMixin.generate`. Do you have a different use case?

@echarlaix loading `decoder_model_merged.onnx` following #1257 prints the following output; is this expected?
@kunal-vaishnavi Apart from backward compatibility reasons, do you have limiting factors on your side that prevent you from simply using the new version of the exported model (with an empty KV cache at the first iteration)?
In the PR linked below, we want to benchmark exported models using Optimum. To have an equal comparison, we currently evaluate with `model(...)` and not `model.generate(...)`. We pass both `input_ids` and `past_key_values` to the forward call so that the merged ONNX model receives both inputs. When calling the forward pass, the `input_ids` and `past_key_values` are passed with the right shapes. However, the `input_ids` shape is changed in Optimum, so they reach the ONNX model with the wrong shape.
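As a rough sketch of that benchmarking pattern (the model name and tensor dimensions here are assumptions, not the actual benchmark code):

```python
import torch
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained("gpt2", export=True, use_cache=True)

input_ids = torch.randint(0, 50257, (1, 16))  # (batch_size, seq_len)

# Empty KV cache for the prompt step, shaped (batch, num_heads, 0, head_size).
empty_past = tuple(
    (torch.zeros(1, 12, 0, 64), torch.zeros(1, 12, 0, 64))
    for _ in range(12)
)

# Expectation: input_ids reach the ONNX session with shape (1, 16).
# The reported issue is that Optimum slices them down to the last token
# before session.run because the cache is not None.
outputs = model(input_ids=input_ids, past_key_values=empty_past)
```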
Thank you, makes sense indeed.