-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add ibis-substrait join operator example #2
Conversation
This patch adds a walkthrough example of converting a simple `join` operator from [Ibis][0] through the [Substrait relalg compute IR][1]. We use the [ibis-substrait][2] compiler to bootstrap a substrait `Extension` that supports some primitive operators (including join) so that we can study the IR. Having an example to study / walkthrough should help us better understand the translation of `substrait` IR into the MLIR ecosystem. Example protobuf text for the substrait join can be found here: https://gist.github.com/bsarden-rivos/280109a64f1f3a2c497d5d2de2f53cd6 [0]: https://github.com/ibis-project/ibis [1]: https://github.com/substrait-io/substrait [2]: https://github.com/ibis-project/ibis-substrait
We can't safely assert that internal ordinal values of table indices will match, since that is an internal representation. However, we can assert that both column fields *exist*.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this patch! I think this is a great starting point.
I had a chance to look closer at substrait and it seems like a good
starting point for the conversion to MLIR since it has a concise
definition of the compute ops (which is easier to parse than pandas for example).
The ibis-substrait compiler also seems like a good direction to go because it breaks the ops into the subtrait representation which consist of type annotations, group operations into the core types etc.
Regarding mapping subtrait types to mlir,
- plan -> mlir::ModuleOp
This makes sense. Seems like the plan is the main top-level definition of the compute consisting of one or more relation trees so this sounds like a 1-1 mapping, where each relation tree could either be a separate function or they could all be grouped into one function. - rel -> mlir::Value
Looking at your example, seems like rel represents a relation such as a join/projection/read/select. In the case of the read type, I would agree the rel would be an mlir::Value. But for relations like join, wouldn't it make sense to make the arguments of the scalar function mlir::Values? - expression
So this seems to be a decomposition of the core ops into even more specialized categories like scalar functions, window functions etc. We could definitely add this as an attribute and tag the ops with it, but we could also think of these as separate ops. So for example, join is a scalar function op with a different body of computation than the cross product. I might not be understanding this right so let me know if this makes sense to you. Also do you know why we would need anything
other than a scalar function? I'm not sure why we would need aggregate or window functions. - function
I didn't see an example of this on the subtrait website or in your test. Is this the scalar function?
input_: Rel = result.project.input | ||
assert(input_.WhichOneof("rel_type") == "join") | ||
|
||
join: Rel = input_.join |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From here it seems like Rel defines a relation and its type can be determined by rel_type(project, join). Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, I believe we can treat the Rel
as semantically equivalent to an OpInterface in TableGen, where each *Rel variant would just be an mlir::Operation
that inherits from the Relational
interface (or trait).
Also another interesting data point - some folks have already started lowering from ibis to mlir here: https://github.com/google/iree-llvm-sandbox |
Nice find! I'll try and bringup an example of the ibis-to-mlir lowering as well, and see if that path makes more sense. My gut feels like it makes sense to join forces here, but wdyt? It looks like they were bootstrapping with the IRDL Dialect which just came up in the ODM today (I couldn't attend so will have to catch up). |
I think it makes sense to join forces since we are all trying to accomplish similar things. The more the merrier :) I will create an issue on that repo and cc you and them and propose a meeting to brainstorm and align our efforts. |
Here is the link to the issue: iree-org/iree-llvm-sandbox#506 |
This patch adds a walkthrough example of converting a simple
join
operator from Ibis through the Substrait relalg compute IR.
We use the ibis-substrait compiler to bootstrap a substrait
Extension
that supports some primitive operators (including join) sothat we can study the IR. Having an example to study / walkthrough
should help us better understand the translation of
substrait
IRinto the MLIR ecosystem.
Example protobuf text for the substrait join can be found here:
https://gist.github.com/bsarden-rivos/280109a64f1f3a2c497d5d2de2f53cd6
From this deep-dive and a walkthrough of the comments, I think we can make
the following high-level mapping between Substrait and MLIR constructs. Here's
an initial semantic mapping.
Plan
mlir::ModuleOp
*Rel
mlir::Value
Expression
mlir::Attribute
Function
mlir::Operation
An
Expression
is really like a combination ofmlir::Operation
andmlir::Attribute
, sincethey map both to Input / Output Descriptors and to an underlying
Function
throughthe
rex_type
Union.