-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Substrait] Create emit op to represent output mapping. #825
[Substrait] Create emit op to represent output mapping. #825
Conversation
lib/Target/SubstraitPB/Export.cpp
Outdated
/// We just forward to the overload for `RelOpInterface`, which will have to | ||
/// export this op. We can't (easily) do it here because the emit op is | ||
/// represented as part of the `RelCommon` message of one of the cases of the | ||
/// `Rel` message but there is no generic way to access the `common` field of | ||
/// the various cases. | ||
FailureOr<std::unique_ptr<Rel>> exportOperation(EmitOp op) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not 100% happy with the whole structure. It is somewhat dangerous to rely on implementations doing a particular thing without being able to enforce it.
The problem is, I think, that I alluded to here: there is no proper way to the access the common
field of different message types. Maybe it's worth to consider the following alternative: implement a accessCommonField
function with a big switch that accesses that field for any of the cases we support. This is ugly but in concentrates the ugliness in one place. Then exportOperation(EmitOp)
could first export the op that produces its input and then modify the common
field of the resulting message of that export. I suppose something similar could work for the import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is all local here right? Meaning, your unit tests should cover this and its an invariant one has to satisfy here/when adding a new export type and the export test should cover it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, it isn't quite local. You spotted to problem below: the need to call this function for every op type. And the problem is that the tests also need to be done for each op type, so I wouldn't notice if I just forget to add both...
d411841
to
c30ab73
Compare
c30ab73
to
9f58c07
Compare
9f58c07
to
dfd6939
Compare
lib/Target/SubstraitPB/Export.cpp
Outdated
/// We just forward to the overload for `RelOpInterface`, which will have to | ||
/// export this op. We can't (easily) do it here because the emit op is | ||
/// represented as part of the `RelCommon` message of one of the cases of the | ||
/// `Rel` message but there is no generic way to access the `common` field of | ||
/// the various cases. | ||
FailureOr<std::unique_ptr<Rel>> exportOperation(EmitOp op) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is all local here right? Meaning, your unit tests should cover this and its an invariant one has to satisfy here/when adding a new export type and the export test should cover it.
lib/Target/SubstraitPB/Import.cpp
Outdated
/// Imports the provided `RelCommon` message by producing an `EmitOp` that | ||
/// expresses the `Emit` message if it exists. | ||
/// | ||
/// **This function must be called at the end of the import function of every |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to have some top level helper function?
say
template <typename T>
mlir::FailureOr<RelOpInterface> importRel(..., T inputOp) {
... importRelImpl<T>(...)
return importMaybeEmit(....)
}
Not sure if it helps that much, but makes a structure where more difficult to forget it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that's what I should aim for. I'll have a stab at that. The problem I'll have to solve for that is what I sketched above: I'll need to access the common
field of the different message types but because they are different message types, there is no one function that does that. I think that the solution is to encapsulate that functionality in a function that does just that (and is implemented with one big switch).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, that's what I ended up doing. I think that that was an important thing to clean up -- the new version is quite a bit more elegant.
lib/Target/SubstraitPB/Import.cpp
Outdated
using ReferenceSegment = Expression::ReferenceSegment; | ||
|
||
MLIRContext *context = builder.getContext(); | ||
Location loc = UnknownLoc::get(context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do better here? even just file name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think yes, but this would require some code reorganization. A few notes before I forget: I'd need to implement the translation using llvm::SourceMgr
and access BufferBuffer:: getBufferIdentifier
to get the filename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created #839 to track this effort.
lib/Target/SubstraitPB/Import.cpp
Outdated
return emitError(loc) << "only direct reference supported"; | ||
|
||
// Traverse list to extract indexes. | ||
llvm::SmallVector<int64_t> indexes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indices ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(i'm not sure which one is used more db side)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is due to me not being a native speaker, I think. Fixed in #840.
The commit introduces handling for the `emit_kind` field of the `RelCommon` message, which is a common field of all (?) cases of the `Rel` message. The current design models this field as a dedicated op such that the output mapping is only ever present in that op, `emit`. There are at least two alternatives to this IR design. The first one consists of making the output mapping part of each of the ops the represnet `Rel` messages and expose it through the `RelOpInterface`. However, this would mean that (1) the custom assembly of each op would have to represent the mapping, which is manual effort and a possible source for inconsistencies, (2) each op would have to implement type inference in the presence of a mapping, and (3) most rewrites of all ops would have to take that mapping into account for their semantics. Having the mapping in one place makes all of this simpler. The downside is that what is kept in a single place in the Substrait protobuf format is now spread across two ops in the MLIR representation. However, I believe that this is the smaller of two evils and the current import and export seems to work. Another alternative would be to combine the two: make the mapping part of all ops but *also* introduce a dedicated `emit` op. Then, two passes could move the mapping from one to the other depending on which of the two representations would be more convenient. However, this would not get rid of Issues (1) and (2) above and lead to more concepts and code. Signed-off-by: Ingo Müller <[email protected]>
dfd6939
to
6ccb6d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! I'll address the points shortly.
lib/Target/SubstraitPB/Export.cpp
Outdated
/// We just forward to the overload for `RelOpInterface`, which will have to | ||
/// export this op. We can't (easily) do it here because the emit op is | ||
/// represented as part of the `RelCommon` message of one of the cases of the | ||
/// `Rel` message but there is no generic way to access the `common` field of | ||
/// the various cases. | ||
FailureOr<std::unique_ptr<Rel>> exportOperation(EmitOp op) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, it isn't quite local. You spotted to problem below: the need to call this function for every op type. And the problem is that the tests also need to be done for each op type, so I wouldn't notice if I just forget to add both...
lib/Target/SubstraitPB/Import.cpp
Outdated
/// Imports the provided `RelCommon` message by producing an `EmitOp` that | ||
/// expresses the `Emit` message if it exists. | ||
/// | ||
/// **This function must be called at the end of the import function of every |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that's what I should aim for. I'll have a stab at that. The problem I'll have to solve for that is what I sketched above: I'll need to access the common
field of the different message types but because they are different message types, there is no one function that does that. I think that the solution is to encapsulate that functionality in a function that does just that (and is implemented with one big switch).
The first attempt had been rather akward, requiring the import and export of each specific op to cooperate with the corresponding logic of the `emit` op. This was due to the fact that there is no out-of-the-box way to access the `RelCommon` message of the `rel_type` message without being specific to the `rel_type` case. In this version, two utility functions provide access to that message (one read-only, one mutable). This allows the import and export of `emit` to be local, which removes a whole class of difficult to detect bugs. Signed-off-by: Ingo Müller <[email protected]>
Signed-off-by: Ingo Müller <[email protected]>
…e-llvm-sandbox#825) * [Substrait] Create `emit` op to represent output mapping. The commit introduces handling for the `emit_kind` field of the `RelCommon` message, which is a common field of all (?) cases of the `Rel` message. The current design models this field as a dedicated op such that the output mapping is only ever present in that op, `emit`. There are at least two alternatives to this IR design. The first one consists of making the output mapping part of each of the ops the represnet `Rel` messages and expose it through the `RelOpInterface`. However, this would mean that (1) the custom assembly of each op would have to represent the mapping, which is manual effort and a possible source for inconsistencies, (2) each op would have to implement type inference in the presence of a mapping, and (3) most rewrites of all ops would have to take that mapping into account for their semantics. Having the mapping in one place makes all of this simpler. The downside is that what is kept in a single place in the Substrait protobuf format is now spread across two ops in the MLIR representation. However, I believe that this is the smaller of two evils and the current import and export seems to work. Another alternative would be to combine the two: make the mapping part of all ops but *also* introduce a dedicated `emit` op. Then, two passes could move the mapping from one to the other depending on which of the two representations would be more convenient. However, this would not get rid of Issues (1) and (2) above and lead to more concepts and code. Signed-off-by: Ingo Müller <[email protected]>
…e-llvm-sandbox#825) * [Substrait] Create `emit` op to represent output mapping. The commit introduces handling for the `emit_kind` field of the `RelCommon` message, which is a common field of all (?) cases of the `Rel` message. The current design models this field as a dedicated op such that the output mapping is only ever present in that op, `emit`. There are at least two alternatives to this IR design. The first one consists of making the output mapping part of each of the ops the represnet `Rel` messages and expose it through the `RelOpInterface`. However, this would mean that (1) the custom assembly of each op would have to represent the mapping, which is manual effort and a possible source for inconsistencies, (2) each op would have to implement type inference in the presence of a mapping, and (3) most rewrites of all ops would have to take that mapping into account for their semantics. Having the mapping in one place makes all of this simpler. The downside is that what is kept in a single place in the Substrait protobuf format is now spread across two ops in the MLIR representation. However, I believe that this is the smaller of two evils and the current import and export seems to work. Another alternative would be to combine the two: make the mapping part of all ops but *also* introduce a dedicated `emit` op. Then, two passes could move the mapping from one to the other depending on which of the two representations would be more convenient. However, this would not get rid of Issues (1) and (2) above and lead to more concepts and code. Signed-off-by: Ingo Müller <[email protected]>
This PR is based on and, therefore, includes #820 and its dependencies.
The commit introduces handling for the
emit_kind
field of theRelCommon
message, which is a common field of all (?) cases of theRel
message. The current design models this field as a dedicated opsuch that the output mapping is only ever present in that op,
emit
.There are at least two alternatives to this IR design. The first one
consists of making the output mapping part of each of the ops the
represnet
Rel
messages and expose it through theRelOpInterface
.However, this would mean that (1) the custom assembly of each op would
have to represent the mapping, which is manual effort and a possible
source for inconsistencies, (2) each op would have to implement type
inference in the presence of a mapping, and (3) most rewrites of all ops
would have to take that mapping into account for their semantics. Having
the mapping in one place makes all of this simpler. The downside is that
what is kept in a single place in the Substrait protobuf format is now
spread across two ops in the MLIR representation. However, I believe
that this is the smaller of two evils and the current import and export
seems to work.
Another alternative would be to combine the two: make the mapping part
of all ops but also introduce a dedicated
emit
op. Then, two passescould move the mapping from one to the other depending on which of the
two representations would be more convenient. However, this would not
get rid of Issues (1) and (2) above and lead to more concepts and code.