Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Substrait] Create emit op to represent output mapping. #825

Merged
merged 3 commits into from
May 30, 2024

Conversation

ingomueller-net
Copy link
Collaborator

This PR is based on and, therefore, includes #820 and its dependencies.

The commit introduces handling for the emit_kind field of the
RelCommon message, which is a common field of all (?) cases of the
Rel message. The current design models this field as a dedicated op
such that the output mapping is only ever present in that op, emit.

There are at least two alternatives to this IR design. The first one
consists of making the output mapping part of each of the ops the
represnet Rel messages and expose it through the RelOpInterface.
However, this would mean that (1) the custom assembly of each op would
have to represent the mapping, which is manual effort and a possible
source for inconsistencies, (2) each op would have to implement type
inference in the presence of a mapping, and (3) most rewrites of all ops
would have to take that mapping into account for their semantics. Having
the mapping in one place makes all of this simpler. The downside is that
what is kept in a single place in the Substrait protobuf format is now
spread across two ops in the MLIR representation. However, I believe
that this is the smaller of two evils and the current import and export
seems to work.

Another alternative would be to combine the two: make the mapping part
of all ops but also introduce a dedicated emit op. Then, two passes
could move the mapping from one to the other depending on which of the
two representations would be more convenient. However, this would not
get rid of Issues (1) and (2) above and lead to more concepts and code.

Comment on lines 158 to 209
/// We just forward to the overload for `RelOpInterface`, which will have to
/// export this op. We can't (easily) do it here because the emit op is
/// represented as part of the `RelCommon` message of one of the cases of the
/// `Rel` message but there is no generic way to access the `common` field of
/// the various cases.
FailureOr<std::unique_ptr<Rel>> exportOperation(EmitOp op) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% happy with the whole structure. It is somewhat dangerous to rely on implementations doing a particular thing without being able to enforce it.

The problem is, I think, that I alluded to here: there is no proper way to the access the common field of different message types. Maybe it's worth to consider the following alternative: implement a accessCommonField function with a big switch that accesses that field for any of the cases we support. This is ugly but in concentrates the ugliness in one place. Then exportOperation(EmitOp) could first export the op that produces its input and then modify the common field of the resulting message of that export. I suppose something similar could work for the import.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is all local here right? Meaning, your unit tests should cover this and its an invariant one has to satisfy here/when adding a new export type and the export test should cover it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, it isn't quite local. You spotted to problem below: the need to call this function for every op type. And the problem is that the tests also need to be done for each op type, so I wouldn't notice if I just forget to add both...

Comment on lines 158 to 209
/// We just forward to the overload for `RelOpInterface`, which will have to
/// export this op. We can't (easily) do it here because the emit op is
/// represented as part of the `RelCommon` message of one of the cases of the
/// `Rel` message but there is no generic way to access the `common` field of
/// the various cases.
FailureOr<std::unique_ptr<Rel>> exportOperation(EmitOp op) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is all local here right? Meaning, your unit tests should cover this and its an invariant one has to satisfy here/when adding a new export type and the export test should cover it.

lib/Target/SubstraitPB/Export.cpp Outdated Show resolved Hide resolved
/// Imports the provided `RelCommon` message by producing an `EmitOp` that
/// expresses the `Emit` message if it exists.
///
/// **This function must be called at the end of the import function of every
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to have some top level helper function?

say

template <typename T>
mlir::FailureOr<RelOpInterface> importRel(..., T inputOp) {
  ... importRelImpl<T>(...)
  return importMaybeEmit(....)
}

Not sure if it helps that much, but makes a structure where more difficult to forget it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that's what I should aim for. I'll have a stab at that. The problem I'll have to solve for that is what I sketched above: I'll need to access the common field of the different message types but because they are different message types, there is no one function that does that. I think that the solution is to encapsulate that functionality in a function that does just that (and is implemented with one big switch).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that's what I ended up doing. I think that that was an important thing to clean up -- the new version is quite a bit more elegant.

using ReferenceSegment = Expression::ReferenceSegment;

MLIRContext *context = builder.getContext();
Location loc = UnknownLoc::get(context);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we do better here?  even just file name

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think yes, but this would require some code reorganization. A few notes before I forget: I'd need to implement the translation using llvm::SourceMgr and access BufferBuffer:: getBufferIdentifier to get the filename.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created #839 to track this effort.

return emitError(loc) << "only direct reference supported";

// Traverse list to extract indexes.
llvm::SmallVector<int64_t> indexes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indices ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(i'm not sure which one is used more db side)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is due to me not being a native speaker, I think. Fixed in #840.

The commit introduces handling for the `emit_kind` field of the
`RelCommon` message, which is a common field of all (?) cases of the
`Rel` message. The current design models this field as a dedicated op
such that the output mapping is only ever present in that op, `emit`.

There are at least two alternatives to this IR design. The first one
consists of making the output mapping part of each of the ops the
represnet `Rel` messages and expose it through the `RelOpInterface`.
However, this would mean that (1) the custom assembly of each op would
have to represent the mapping, which is manual effort and a possible
source for inconsistencies, (2) each op would have to implement type
inference in the presence of a mapping, and (3) most rewrites of all ops
would have to take that mapping into account for their semantics. Having
the mapping in one place makes all of this simpler. The downside is that
what is kept in a single place in the Substrait protobuf format is now
spread across two ops in the MLIR representation. However, I believe
that this is the smaller of two evils and the current import and export
seems to work.

Another alternative would be to combine the two: make the mapping part
of all ops but *also* introduce a dedicated `emit` op. Then, two passes
could move the mapping from one to the other depending on which of the
two representations would be more convenient. However, this would not
get rid of Issues (1) and (2) above and lead to more concepts and code.

Signed-off-by: Ingo Müller <[email protected]>
Copy link
Collaborator Author

@ingomueller-net ingomueller-net left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! I'll address the points shortly.

Comment on lines 158 to 209
/// We just forward to the overload for `RelOpInterface`, which will have to
/// export this op. We can't (easily) do it here because the emit op is
/// represented as part of the `RelCommon` message of one of the cases of the
/// `Rel` message but there is no generic way to access the `common` field of
/// the various cases.
FailureOr<std::unique_ptr<Rel>> exportOperation(EmitOp op) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, it isn't quite local. You spotted to problem below: the need to call this function for every op type. And the problem is that the tests also need to be done for each op type, so I wouldn't notice if I just forget to add both...

/// Imports the provided `RelCommon` message by producing an `EmitOp` that
/// expresses the `Emit` message if it exists.
///
/// **This function must be called at the end of the import function of every
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that's what I should aim for. I'll have a stab at that. The problem I'll have to solve for that is what I sketched above: I'll need to access the common field of the different message types but because they are different message types, there is no one function that does that. I think that the solution is to encapsulate that functionality in a function that does just that (and is implemented with one big switch).

The first attempt had been rather akward, requiring the import and
export of each specific op to cooperate with the corresponding logic of
the `emit` op. This was due to the fact that there is no out-of-the-box
way to access the `RelCommon` message of the `rel_type` message without
being specific to the `rel_type` case. In this version, two utility
functions provide access to that message (one read-only, one mutable).
This allows the import and export of `emit` to be local, which removes a
whole class of difficult to detect bugs.

Signed-off-by: Ingo Müller <[email protected]>
@ingomueller-net ingomueller-net merged commit 53adb5e into iree-org:main May 30, 2024
4 checks passed
@ingomueller-net ingomueller-net deleted the substrait-emit branch May 30, 2024 14:41
ingomueller-net added a commit to substrait-io/substrait-mlir-contrib that referenced this pull request Oct 15, 2024
…e-llvm-sandbox#825)

* [Substrait] Create `emit` op to represent output mapping.

The commit introduces handling for the `emit_kind` field of the
`RelCommon` message, which is a common field of all (?) cases of the
`Rel` message. The current design models this field as a dedicated op
such that the output mapping is only ever present in that op, `emit`.

There are at least two alternatives to this IR design. The first one
consists of making the output mapping part of each of the ops the
represnet `Rel` messages and expose it through the `RelOpInterface`.
However, this would mean that (1) the custom assembly of each op would
have to represent the mapping, which is manual effort and a possible
source for inconsistencies, (2) each op would have to implement type
inference in the presence of a mapping, and (3) most rewrites of all ops
would have to take that mapping into account for their semantics. Having
the mapping in one place makes all of this simpler. The downside is that
what is kept in a single place in the Substrait protobuf format is now
spread across two ops in the MLIR representation. However, I believe
that this is the smaller of two evils and the current import and export
seems to work.

Another alternative would be to combine the two: make the mapping part
of all ops but *also* introduce a dedicated `emit` op. Then, two passes
could move the mapping from one to the other depending on which of the
two representations would be more convenient. However, this would not
get rid of Issues (1) and (2) above and lead to more concepts and code.

Signed-off-by: Ingo Müller <[email protected]>
ingomueller-net added a commit to substrait-io/substrait-mlir-contrib that referenced this pull request Oct 15, 2024
…e-llvm-sandbox#825)

* [Substrait] Create `emit` op to represent output mapping.

The commit introduces handling for the `emit_kind` field of the
`RelCommon` message, which is a common field of all (?) cases of the
`Rel` message. The current design models this field as a dedicated op
such that the output mapping is only ever present in that op, `emit`.

There are at least two alternatives to this IR design. The first one
consists of making the output mapping part of each of the ops the
represnet `Rel` messages and expose it through the `RelOpInterface`.
However, this would mean that (1) the custom assembly of each op would
have to represent the mapping, which is manual effort and a possible
source for inconsistencies, (2) each op would have to implement type
inference in the presence of a mapping, and (3) most rewrites of all ops
would have to take that mapping into account for their semantics. Having
the mapping in one place makes all of this simpler. The downside is that
what is kept in a single place in the Substrait protobuf format is now
spread across two ops in the MLIR representation. However, I believe
that this is the smaller of two evils and the current import and export
seems to work.

Another alternative would be to combine the two: make the mapping part
of all ops but *also* introduce a dedicated `emit` op. Then, two passes
could move the mapping from one to the other depending on which of the
two representations would be more convenient. However, this would not
get rid of Issues (1) and (2) above and lead to more concepts and code.

Signed-off-by: Ingo Müller <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants