feat: add ibis-substrait join operator example #2

bsarden-rivos · 2022-06-01T19:46:26Z

This patch adds a walkthrough example of converting a simple join
operator from Ibis through the Substrait relalg compute IR.
We use the ibis-substrait compiler to bootstrap a substrait
Extension that supports some primitive operators (including join) so
that we can study the IR. Having an example to study / walkthrough
should help us better understand the translation of substrait IR
into the MLIR ecosystem.

Example protobuf text for the substrait join can be found here:
https://gist.github.com/bsarden-rivos/280109a64f1f3a2c497d5d2de2f53cd6

From this deep-dive and a walkthrough of the comments, I think we can make
the following high-level mapping between Substrait and MLIR constructs. Here's
an initial semantic mapping.

Substrait Protobuf Type	MLIR Type
`Plan`	`mlir::ModuleOp`
`*Rel`	`mlir::Value`
`Expression`	`mlir::Attribute`
`Function`	`mlir::Operation`

An Expression is really like a combination of mlir::Operation and mlir::Attribute, since
they map both to Input / Output Descriptors and to an underlying Function through
the rex_type Union.

This patch adds a walkthrough example of converting a simple `join` operator from [Ibis][0] through the [Substrait relalg compute IR][1]. We use the [ibis-substrait][2] compiler to bootstrap a substrait `Extension` that supports some primitive operators (including join) so that we can study the IR. Having an example to study / walkthrough should help us better understand the translation of `substrait` IR into the MLIR ecosystem. Example protobuf text for the substrait join can be found here: https://gist.github.com/bsarden-rivos/280109a64f1f3a2c497d5d2de2f53cd6 [0]: https://github.com/ibis-project/ibis [1]: https://github.com/substrait-io/substrait [2]: https://github.com/ibis-project/ibis-substrait

We can't safely assert that internal ordinal values of table indices will match, since that is an internal representation. However, we can assert that both column fields *exist*.

harsh-nod

Thanks for this patch! I think this is a great starting point.

I had a chance to look closer at substrait and it seems like a good
starting point for the conversion to MLIR since it has a concise
definition of the compute ops (which is easier to parse than pandas for example).

The ibis-substrait compiler also seems like a good direction to go because it breaks the ops into the subtrait representation which consist of type annotations, group operations into the core types etc.

Regarding mapping subtrait types to mlir,

plan -> mlir::ModuleOp
This makes sense. Seems like the plan is the main top-level definition of the compute consisting of one or more relation trees so this sounds like a 1-1 mapping, where each relation tree could either be a separate function or they could all be grouped into one function.
rel -> mlir::Value
Looking at your example, seems like rel represents a relation such as a join/projection/read/select. In the case of the read type, I would agree the rel would be an mlir::Value. But for relations like join, wouldn't it make sense to make the arguments of the scalar function mlir::Values?
expression
So this seems to be a decomposition of the core ops into even more specialized categories like scalar functions, window functions etc. We could definitely add this as an attribute and tag the ops with it, but we could also think of these as separate ops. So for example, join is a scalar function op with a different body of computation than the cross product. I might not be understanding this right so let me know if this makes sense to you. Also do you know why we would need anything
other than a scalar function? I'm not sure why we would need aggregate or window functions.
function
I didn't see an example of this on the subtrait website or in your test. Is this the scalar function?

python/test/ibis/test_ibis_substrait.py

harsh-nod · 2022-06-02T15:35:59Z

python/test/ibis/test_ibis_substrait.py

+    input_: Rel = result.project.input
+    assert(input_.WhichOneof("rel_type") == "join")
+
+    join: Rel = input_.join


From here it seems like Rel defines a relation and its type can be determined by rel_type(project, join). Is this correct?

Correct, I believe we can treat the Rel as semantically equivalent to an OpInterface in TableGen, where each *Rel variant would just be an mlir::Operation that inherits from the Relational interface (or trait).

python/test/ibis/test_ibis_substrait.py

harsh-nod · 2022-06-02T17:43:13Z

Also another interesting data point - some folks have already started lowering from ibis to mlir here: https://github.com/google/iree-llvm-sandbox

bsarden-rivos · 2022-06-02T21:02:06Z

Nice find! I'll try and bringup an example of the ibis-to-mlir lowering as well, and see if that path makes more sense. My gut feels like it makes sense to join forces here, but wdyt? It looks like they were bootstrapping with the IRDL Dialect which just came up in the ODM today (I couldn't attend so will have to catch up).

harsh-nod · 2022-06-02T22:11:47Z

I think it makes sense to join forces since we are all trying to accomplish similar things. The more the merrier :) I will create an issue on that repo and cc you and them and propose a meeting to brainstorm and align our efforts.

harsh-nod · 2022-06-02T22:25:34Z

I think it makes sense to join forces since we are all trying to accomplish similar things. The more the merrier :) I will create an issue on that repo and cc you and them and propose a meeting to brainstorm and align our efforts.

Here is the link to the issue: iree-org/iree-llvm-sandbox#506

bsarden-rivos added 2 commits June 1, 2022 14:36

chore: add venv to gitignore

7a6bc57

bsarden-rivos requested a review from harsh-nod June 1, 2022 19:46

bsarden-rivos self-assigned this Jun 1, 2022

bsarden-rivos added 2 commits June 1, 2022 14:51

test(substrait): fixup bad assert

578b810

We can't safely assert that internal ordinal values of table indices will match, since that is an internal representation. However, we can assert that both column fields *exist*.

doc: fixup stale comment

bd6eaa1

harsh-nod approved these changes Jun 2, 2022

View reviewed changes

harsh-nod merged commit f3a85f1 into main Jun 20, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add ibis-substrait join operator example #2

feat: add ibis-substrait join operator example #2

bsarden-rivos commented Jun 1, 2022

harsh-nod left a comment

harsh-nod Jun 2, 2022

bsarden-rivos Jun 6, 2022

harsh-nod commented Jun 2, 2022

bsarden-rivos commented Jun 2, 2022

harsh-nod commented Jun 2, 2022

harsh-nod commented Jun 2, 2022

feat: add ibis-substrait join operator example #2

feat: add ibis-substrait join operator example #2

Conversation

bsarden-rivos commented Jun 1, 2022

harsh-nod left a comment

Choose a reason for hiding this comment

harsh-nod Jun 2, 2022

Choose a reason for hiding this comment

bsarden-rivos Jun 6, 2022

Choose a reason for hiding this comment

harsh-nod commented Jun 2, 2022

bsarden-rivos commented Jun 2, 2022

harsh-nod commented Jun 2, 2022

harsh-nod commented Jun 2, 2022