Describe the feature request
For the WebNN EP, the graph builder does not accept inputs or outputs with dynamic shapes, so after applying FreeDimensionOverride all shapes/dims are expected to be static.
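To illustrate the constraint (a standalone sketch, not the actual onnxruntime implementation — the helper names here are invented for demonstration): a free-dimension override replaces each named symbolic dimension with a fixed value, after which every dim is a concrete integer and the graph is acceptable to a builder that requires static shapes:

```python
# Illustrative sketch of what a free-dimension override does to a shape.
# Not the onnxruntime implementation; function names are hypothetical.

def apply_free_dimension_overrides(shape, overrides):
    """Replace named (symbolic) dims with fixed values from `overrides`."""
    return [overrides.get(d, d) if isinstance(d, str) else d for d in shape]

def is_static(shape):
    """A shape is static when every dim is a concrete integer."""
    return all(isinstance(d, int) for d in shape)

# A typical attention input shape: (batch_size, sequence_length, hidden_size)
query_shape = ["batch_size", "sequence_length", 4096]
resolved = apply_free_dimension_overrides(
    query_shape, {"batch_size": 1, "sequence_length": 128})
print(resolved)             # [1, 128, 4096]
print(is_static(resolved))  # True
```

The issue below is that one op's *outputs* can remain dynamic even after all inputs are overridden, because its shape inference declines to run.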
There is already a shape inference function for the GroupQueryAttention op in BaseGroupQueryAttentionTypeAndShapeInference() in onnxruntime/core/graph/contrib_ops/bert_defs.cc. However, the use_max_past_present_buffer parameter is set to -1 in every case, as in the following code (onnxruntime/core/graph/contrib_ops/bert_defs.cc, lines 319 to 323 at 81cd6ea).
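For context, here is a simplified sketch of how such a flag could gate the present-KV shape inference. This is hypothetical Python, not the actual C++ in bert_defs.cc, and the real inference handles more cases; it only illustrates why -1 (unknown) forces inference to be skipped while 1 (shared buffer) makes the output shape fully determined by the past buffer:

```python
# Hypothetical sketch of GroupQueryAttention present-KV shape inference,
# gated on a use_max_past_present_buffer flag. Simplified for illustration;
# the real logic lives in C++ in bert_defs.cc.

def infer_present_kv_shape(past_kv_shape, new_seq_len,
                           use_max_past_present_buffer):
    """past_kv_shape: (batch, num_kv_heads, past_or_max_seq_len, head_size)."""
    if use_max_past_present_buffer == 1:
        # Shared buffer: present reuses the max-length past buffer, so the
        # output shape equals the (already static) past shape.
        return list(past_kv_shape)
    if use_max_past_present_buffer == 0:
        # No shared buffer: the present cache grows by the new sequence length.
        b, h, past_len, d = past_kv_shape
        return [b, h, past_len + new_seq_len, d]
    # -1: buffer usage unknown, so shape inference is skipped entirely.
    return None

print(infer_present_kv_shape([1, 8, 2048, 128], 16, 1))   # [1, 8, 2048, 128]
print(infer_present_kv_shape([1, 8, 2048, 128], 16, 0))   # [1, 8, 2064, 128]
print(infer_present_kv_shape([1, 8, 2048, 128], 16, -1))  # None
```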
So I was wondering whether it is possible to pass an argument/flag that gives the function a chance to perform shape inference, at least for EPs that use the shared buffer.
Describe scenario use case
When an EP uses a shared buffer for the key/value cache, it would pass this flag/argument to set use_max_past_present_buffer to 1, which enables shape inference for GroupQueryAttention ops.
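The proposed plumbing might look like the following sketch. Every name here is illustrative, not an existing onnxruntime API: the idea is simply that a caller who knows the EP's KV-cache behavior up front can pin the flag instead of leaving it at the hard-coded -1:

```python
# Hypothetical sketch of the proposed flag plumbing: an EP (or a session
# option) declares whether it uses a shared KV-cache buffer, which resolves
# use_max_past_present_buffer before shape inference runs. All names are
# illustrative; nothing here is an existing onnxruntime API.

def resolve_use_max_past_present_buffer(ep_uses_shared_kv_buffer):
    # Today the value is effectively hard-coded to -1 (unknown), so
    # inference is skipped; the request is to let the caller pin it.
    if ep_uses_shared_kv_buffer is None:
        return -1
    return 1 if ep_uses_shared_kv_buffer else 0

print(resolve_use_max_past_present_buffer(True))   # 1  -> inference enabled
print(resolve_use_max_past_present_buffer(False))  # 0  -> growing cache
print(resolve_use_max_past_present_buffer(None))   # -1 -> inference skipped
```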