[MultiThreshold] Generalize data layouts for node execution #143

Open: wants to merge 7 commits into base: main

iksnagreb
Contributor

The relevant aspect of the data layout annotation seems to be which axis is labeled as the channel dimension "C": We do not actually have to care about the total number and ordering of the other axes, as long as we can find the index of the "C" axis and swap to have "C" at index 1 for node execution (and swap it back afterwards).

If there is no layout annotation, execution falls back to the default assumption that "C" is at index 1, which is equivalent to the "NCHW" or "NC" layouts.
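The swap described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the helper names `channel_axis` and `run_channels_first` are hypothetical, and `fn` stands in for whatever channels-at-index-1 execution the node already implements.

```python
import numpy as np


def channel_axis(layout, default=1):
    """Return the index of the "C" axis in a layout annotation.

    `layout` may be a string such as "NHWC" or a list such as
    ["N", "H", "W", "C"]. Falls back to `default` (index 1) when
    there is no annotation. Hypothetical helper, for illustration only.
    """
    if not layout:
        return default
    return list(layout).index("C")


def run_channels_first(x, layout, fn):
    """Apply `fn`, which expects channels at index 1, to `x` in any layout."""
    c = channel_axis(layout)
    # Move "C" to index 1; this is a no-op when it is already there
    x_cf = np.swapaxes(x, c, 1)
    y_cf = fn(x_cf)
    # Swap back so the output has the same layout as the input
    return np.swapaxes(y_cf, 1, c)


# Example: add a per-channel offset to an "NHWC" tensor with 5 channels
x = np.zeros((2, 3, 4, 5))
offsets = np.arange(5).reshape(1, 5, 1, 1)  # broadcast along index 1
y = run_channels_first(x, "NHWC", lambda t: t + offsets)
```

Because only the position of "C" is looked up, the same sketch works for 2-, 3- and 4-dimensional layouts without enumerating them.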

This is a rather experimental change which might break existing code and is currently still restricted to the well-known 2-, 3- and 4-dimensional layouts.

This PR is based on #92, which is less experimental and should be merged first (#92 also does not risk breaking existing code, as it only adds new special cases).

This allows node execution of MultiThreshold operators with an arbitrary
number of dimensions, as long as the channel dimension is last. This is
necessary to run some verification steps of attention operators which,
at least for some intermediate steps, have 3-dimensional data layouts.

This does not change the behavior of execution on the already existing
2d and 4d data layouts.
As we only really care for the index of the "C" axis there is no need to
restrict the set of valid layouts here.
Note: Only covers data layouts for tensors with fewer than five axes.
@maltanar
Collaborator

Instead of relying on data layout strings, how about we switch to an attribute axis (like many standard ONNX ops do) to indicate the location (index) of the channels axis? I'd actually prefer to deprecate the old data_layout attribute, and I think we can still keep backwards compatibility by treating data_layout=NCHW as axis=1 and data_layout=NHWC as axis=-1. If the interpretations of the two attributes disagree, axis takes precedence.
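For illustration, the precedence rule suggested here might look something like the sketch below. The function name `resolve_channel_axis` and its signature are hypothetical and not part of the existing operator. Defaulting to index 1 when neither attribute is set mirrors the fallback in the PR description, and rejecting unrecognized layout strings is an assumption.

```python
def resolve_channel_axis(axis=None, data_layout=None, ndim=4):
    """Return a non-negative channel axis index for a tensor with `ndim` axes.

    Hypothetical helper: `axis` wins whenever it is set; otherwise the
    legacy `data_layout` string is translated; otherwise index 1 is used.
    """
    # Backwards-compatible mapping named in the comment above
    legacy = {"NCHW": 1, "NHWC": -1}
    if axis is not None:
        chosen = axis  # axis dominates if both attributes are given
    elif not data_layout:
        chosen = 1  # default: channels at index 1
    elif data_layout in legacy:
        chosen = legacy[data_layout]
    else:
        # Assumption: fail loudly rather than guess for other layouts
        raise ValueError(f"unsupported data_layout: {data_layout!r}")
    if not -ndim <= chosen < ndim:
        raise ValueError(f"axis {chosen} out of range for {ndim} axes")
    # Normalize negative indices, e.g. -1 becomes ndim - 1
    return chosen % ndim
```

With this, `resolve_channel_axis(data_layout="NHWC")` and `resolve_channel_axis(axis=-1)` pick the same axis, while `resolve_channel_axis(axis=1, data_layout="NHWC")` follows `axis`.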
