[Substrait] Create emit op to represent output mapping. #825

ingomueller-net · 2024-05-14T09:18:36Z

This PR is based on and, therefore, includes #820 and its dependencies.

The commit introduces handling for the emit_kind field of the
RelCommon message, which is a common field of all (?) cases of the
Rel message. The current design models this field as a dedicated op
such that the output mapping is only ever present in that op, emit.

There are at least two alternatives to this IR design. The first one
consists of making the output mapping part of each of the ops that
represent Rel messages and exposing it through the RelOpInterface.
However, this would mean that (1) the custom assembly of each op would
have to represent the mapping, which is manual effort and a possible
source for inconsistencies, (2) each op would have to implement type
inference in the presence of a mapping, and (3) most rewrites of all ops
would have to take that mapping into account for their semantics. Having
the mapping in one place makes all of this simpler. The downside is that
what is kept in a single place in the Substrait protobuf format is now
spread across two ops in the MLIR representation. However, I believe
that this is the lesser of two evils and the current import and export
seem to work.

Another alternative would be to combine the two: make the mapping part
of all ops but also introduce a dedicated emit op. Then, two passes
could move the mapping from one to the other depending on which of the
two representations would be more convenient. However, this would not
get rid of Issues (1) and (2) above and would lead to more concepts and code.
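
To make the notion of an output mapping concrete, here is a minimal sketch of how such a mapping determines the fields of the result: output field `i` is input field `mapping[i]`, so fields can be dropped, reordered, or duplicated. The names `FieldTypes` and `applyOutputMapping` are hypothetical and not part of this PR; the real dialect works on MLIR types rather than strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the field types of a relation. The real
// dialect uses MLIR types; strings keep the sketch self-contained.
using FieldTypes = std::vector<std::string>;

// Computes the field types produced by an output mapping:
// result[i] = input[mapping[i]]. Returns std::nullopt if any index
// does not refer to an existing input field.
std::optional<FieldTypes>
applyOutputMapping(const FieldTypes &input,
                   const std::vector<int64_t> &mapping) {
  FieldTypes result;
  result.reserve(mapping.size());
  for (int64_t index : mapping) {
    if (index < 0 || index >= static_cast<int64_t>(input.size()))
      return std::nullopt;
    result.push_back(input[static_cast<std::size_t>(index)]);
  }
  return result;
}
```

Type inference for a dedicated `emit` op only needs this one computation, whereas the first alternative would require every op to repeat it.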

ingomueller-net · 2024-05-14T11:30:14Z

lib/Target/SubstraitPB/Export.cpp

+/// We just forward to the overload for `RelOpInterface`, which will have to
+/// export this op. We can't (easily) do it here because the emit op is
+/// represented as part of the `RelCommon` message of one of the cases of the
+/// `Rel` message but there is no generic way to access the `common` field of
+/// the various cases.
+FailureOr<std::unique_ptr<Rel>> exportOperation(EmitOp op) {


I am not 100% happy with the whole structure. It is somewhat dangerous to rely on implementations doing a particular thing without being able to enforce it.

The problem is, I think, the one I alluded to here: there is no proper way to access the common field of the different message types. Maybe it's worth considering the following alternative: implement an accessCommonField function with a big switch that accesses that field for any of the cases we support. This is ugly, but it concentrates the ugliness in one place. Then exportOperation(EmitOp) could first export the op that produces its input and then modify the common field of the message resulting from that export. I suppose something similar could work for the import.

But this is all local here, right? Meaning, your unit tests should cover this, and it's an invariant one has to satisfy here/when adding a new export type, and the export test should cover it.

Unfortunately, it isn't quite local. You spotted the problem below: the need to call this function for every op type. And the problem is that the tests also need to be done for each op type, so I wouldn't notice if I just forgot to add both...
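
The alternative discussed in this thread can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the PR: `RelCommon`, `Rel`, and the per-case structs are hand-written stand-ins for the generated protobuf classes, and `getCommon`, `getMutableCommon`, and `attachEmit` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-ins for the generated protobuf messages (assumption: the real
// classes expose the common field through per-case accessors instead).
struct RelCommon { std::vector<int32_t> outputMapping; };
struct ReadRel { RelCommon common; };
struct FilterRel { RelCommon common; };
struct ProjectRel { RelCommon common; };

struct Rel {
  enum class Kind { Read, Filter, Project, Unsupported };
  Kind kind = Kind::Unsupported;
  ReadRel read;
  FilterRel filter;
  ProjectRel project;
};

// The "big switch": the only place that knows where each case keeps its
// common field. Returns nullptr for cases that are not supported.
const RelCommon *getCommon(const Rel &rel) {
  switch (rel.kind) {
  case Rel::Kind::Read:
    return &rel.read.common;
  case Rel::Kind::Filter:
    return &rel.filter.common;
  case Rel::Kind::Project:
    return &rel.project.common;
  case Rel::Kind::Unsupported:
    return nullptr;
  }
  return nullptr;
}

// Mutable variant; reuses the read-only switch so there is one list of cases.
RelCommon *getMutableCommon(Rel &rel) {
  return const_cast<RelCommon *>(getCommon(rel));
}

// Export of an emit: take the already exported input message and set the
// mapping on its common field. Fails if the case is unsupported.
bool attachEmit(Rel &exportedInput, const std::vector<int32_t> &mapping) {
  RelCommon *common = getMutableCommon(exportedInput);
  if (!common)
    return false;
  common->outputMapping = mapping;
  return true;
}
```

With this structure, supporting a new `Rel` case means adding one branch to the switch, and the export of the emit itself no longer depends on the cooperation of each op's exporter.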

jpienaar · 2024-05-29T09:55:16Z

lib/Target/SubstraitPB/Export.cpp

+/// We just forward to the overload for `RelOpInterface`, which will have to
+/// export this op. We can't (easily) do it here because the emit op is
+/// represented as part of the `RelCommon` message of one of the cases of the
+/// `Rel` message but there is no generic way to access the `common` field of
+/// the various cases.
+FailureOr<std::unique_ptr<Rel>> exportOperation(EmitOp op) {


But this is all local here, right? Meaning, your unit tests should cover this, and it's an invariant one has to satisfy here/when adding a new export type, and the export test should cover it.


jpienaar · 2024-05-29T10:08:35Z

lib/Target/SubstraitPB/Import.cpp

+/// Imports the provided `RelCommon` message by producing an `EmitOp` that
+/// expresses the `Emit` message if it exists.
+///
+/// **This function must be called at the end of the import function of every


Is it possible to have some top level helper function?

say

template <typename T> mlir::FailureOr<RelOpInterface> importRel(..., T inputOp) { ... importRelImpl<T>(...) return importMaybeEmit(....) }

Not sure if it helps that much, but it makes for a structure where it is more difficult to forget.

Yes, I think that's what I should aim for. I'll have a stab at that. The problem I'll have to solve for that is what I sketched above: I'll need to access the common field of the different message types but because they are different message types, there is no one function that does that. I think that the solution is to encapsulate that functionality in a function that does just that (and is implemented with one big switch).

OK, that's what I ended up doing. I think that that was an important thing to clean up -- the new version is quite a bit more elegant.
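
The shape of the top-level helper suggested above can be sketched as follows. The names `importRel`, `importRelImpl`, and `importMaybeEmit` come from the review comment; everything else (the stand-in message structs, the `OpList` type, and all signatures) is a hypothetical simplification and not the code that was merged, which, per the reply above, uses a switch-based accessor.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Stand-ins for protobuf messages; `emit` is unset if there is no mapping.
struct RelCommon { std::optional<std::vector<int32_t>> emit; };
struct ReadRel { RelCommon common; };
struct FilterRel { RelCommon common; };

// Hypothetical stand-in for the IR being built: op names in creation order.
using OpList = std::vector<std::string>;

// Case-specific import logic, one overload per message type. None of
// these needs to know about the emit handling.
void importRelImpl(const ReadRel &, OpList &ops) { ops.push_back("read"); }
void importRelImpl(const FilterRel &, OpList &ops) { ops.push_back("filter"); }

// Creates an emit op if, and only if, the common message carries a mapping.
void importMaybeEmit(const RelCommon &common, OpList &ops) {
  if (common.emit)
    ops.push_back("emit");
}

// Single entry point: the emit handling always runs after the
// case-specific import, so new cases cannot forget to call it.
template <typename RelT>
void importRel(const RelT &message, OpList &ops) {
  importRelImpl(message, ops);
  importMaybeEmit(message.common, ops);
}
```

The invariant from the doc comment ("must be called at the end of the import function of every ...") is then enforced by construction rather than by convention.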

jpienaar · 2024-05-29T10:09:40Z

lib/Target/SubstraitPB/Import.cpp

+  using ReferenceSegment = Expression::ReferenceSegment;
+
+  MLIRContext *context = builder.getContext();
+  Location loc = UnknownLoc::get(context);


Could we do better here? even just file name

I think so, but this would require some code reorganization. A few notes before I forget: I'd need to implement the translation using llvm::SourceMgr and call MemoryBuffer::getBufferIdentifier to get the file name.

I've created #839 to track this effort.

jpienaar · 2024-05-29T10:10:22Z

lib/Target/SubstraitPB/Import.cpp

+    return emitError(loc) << "only direct reference supported";
+
+  // Traverse list to extract indexes.
+  llvm::SmallVector<int64_t> indexes;


(I'm not sure which one is used more on the DB side.)

This is due to me not being a native speaker, I think. Fixed in #840.


ingomueller-net

Thanks for the review! I'll address the points shortly.

ingomueller-net · 2024-05-29T14:50:09Z

lib/Target/SubstraitPB/Export.cpp

+/// We just forward to the overload for `RelOpInterface`, which will have to
+/// export this op. We can't (easily) do it here because the emit op is
+/// represented as part of the `RelCommon` message of one of the cases of the
+/// `Rel` message but there is no generic way to access the `common` field of
+/// the various cases.
+FailureOr<std::unique_ptr<Rel>> exportOperation(EmitOp op) {


Unfortunately, it isn't quite local. You spotted the problem below: the need to call this function for every op type. And the problem is that the tests also need to be done for each op type, so I wouldn't notice if I just forgot to add both...

ingomueller-net · 2024-05-29T15:01:10Z

lib/Target/SubstraitPB/Import.cpp

+/// Imports the provided `RelCommon` message by producing an `EmitOp` that
+/// expresses the `Emit` message if it exists.
+///
+/// **This function must be called at the end of the import function of every


Yes, I think that's what I should aim for. I'll have a stab at that. The problem I'll have to solve for that is what I sketched above: I'll need to access the common field of the different message types but because they are different message types, there is no one function that does that. I think that the solution is to encapsulate that functionality in a function that does just that (and is implemented with one big switch).

The first attempt was rather awkward, requiring the import and export of each specific op to cooperate with the corresponding logic of the `emit` op. This was because there is no out-of-the-box way to access the `RelCommon` message of the `rel_type` message without being specific to the `rel_type` case. In this version, two utility functions provide access to that message (one read-only, one mutable). This allows the import and export of `emit` to be local, which removes a whole class of difficult-to-detect bugs. Signed-off-by: Ingo Müller <[email protected]>



ingomueller-net force-pushed the substrait-emit branch from d411841 to c30ab73 Compare May 15, 2024 12:19

ingomueller-net mentioned this pull request May 16, 2024

[Substrait] Add folder for EmitOp with identity mapping. #827

Merged

ingomueller-net force-pushed the substrait-emit branch from c30ab73 to 9f58c07 Compare May 17, 2024 08:40

ingomueller-net force-pushed the substrait-emit branch from 9f58c07 to dfd6939 Compare May 27, 2024 14:55

jpienaar approved these changes May 29, 2024


ingomueller-net force-pushed the substrait-emit branch from dfd6939 to 6ccb6d5 Compare May 29, 2024 12:35


ingomueller-net added 2 commits May 30, 2024 12:07

Add test for export of emit after cross.

cf885d7

Signed-off-by: Ingo Müller <[email protected]>

ingomueller-net mentioned this pull request May 30, 2024

Pass file name into SubstraitPB import. #839

Open

ingomueller-net merged commit 53adb5e into iree-org:main May 30, 2024
4 checks passed

ingomueller-net deleted the substrait-emit branch May 30, 2024 14:41
