Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add ibis-substrait join operator example #2

Merged
merged 4 commits into from
Jun 20, 2022

Conversation

bsarden-rivos
Copy link
Contributor

This patch adds a walkthrough example of converting a simple join
operator from Ibis through the Substrait relalg compute IR.
We use the ibis-substrait compiler to bootstrap a substrait
Extension that supports some primitive operators (including join) so
that we can study the IR. Having an example to study / walkthrough
should help us better understand the translation of substrait IR
into the MLIR ecosystem.

Example protobuf text for the substrait join can be found here:
https://gist.github.com/bsarden-rivos/280109a64f1f3a2c497d5d2de2f53cd6

From this deep-dive and a walkthrough of the comments, I think we can make
the following high-level mapping between Substrait and MLIR constructs. Here's
an initial semantic mapping.

Substrait Protobuf Type MLIR Type
Plan mlir::ModuleOp
*Rel mlir::Value
Expression mlir::Attribute
Function mlir::Operation

An Expression is really like a combination of mlir::Operation and mlir::Attribute, since
they map both to Input / Output Descriptors and to an underlying Function through
the rex_type Union.

This patch adds a walkthrough example of converting a simple `join`
operator from [Ibis][0] through the [Substrait relalg compute IR][1].
We use the [ibis-substrait][2] compiler to bootstrap a substrait
`Extension` that supports some primitive operators (including join) so
that we can study the IR. Having an example to study / walkthrough
should help us better understand the translation of `substrait` IR
into the MLIR ecosystem.

Example protobuf text for the substrait join can be found here:
https://gist.github.com/bsarden-rivos/280109a64f1f3a2c497d5d2de2f53cd6

[0]: https://github.com/ibis-project/ibis
[1]: https://github.com/substrait-io/substrait
[2]: https://github.com/ibis-project/ibis-substrait
@bsarden-rivos bsarden-rivos requested a review from harsh-nod June 1, 2022 19:46
@bsarden-rivos bsarden-rivos self-assigned this Jun 1, 2022
We can't safely assert that internal ordinal values of table indices
will match, since that is an internal representation. However, we can
assert that both column fields *exist*.
Copy link
Contributor

@harsh-nod harsh-nod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this patch! I think this is a great starting point.

I had a chance to look closer at substrait and it seems like a good
starting point for the conversion to MLIR since it has a concise
definition of the compute ops (which is easier to parse than pandas for example).

The ibis-substrait compiler also seems like a good direction to go because it breaks the ops into the subtrait representation which consist of type annotations, group operations into the core types etc.

Regarding mapping subtrait types to mlir,

  1. plan -> mlir::ModuleOp
    This makes sense. Seems like the plan is the main top-level definition of the compute consisting of one or more relation trees so this sounds like a 1-1 mapping, where each relation tree could either be a separate function or they could all be grouped into one function.
  2. rel -> mlir::Value
    Looking at your example, seems like rel represents a relation such as a join/projection/read/select. In the case of the read type, I would agree the rel would be an mlir::Value. But for relations like join, wouldn't it make sense to make the arguments of the scalar function mlir::Values?
  3. expression
    So this seems to be a decomposition of the core ops into even more specialized categories like scalar functions, window functions etc. We could definitely add this as an attribute and tag the ops with it, but we could also think of these as separate ops. So for example, join is a scalar function op with a different body of computation than the cross product. I might not be understanding this right so let me know if this makes sense to you. Also do you know why we would need anything
    other than a scalar function? I'm not sure why we would need aggregate or window functions.
  4. function
    I didn't see an example of this on the subtrait website or in your test. Is this the scalar function?

python/test/ibis/test_ibis_substrait.py Show resolved Hide resolved
python/test/ibis/test_ibis_substrait.py Show resolved Hide resolved
python/test/ibis/test_ibis_substrait.py Show resolved Hide resolved
input_: Rel = result.project.input
assert(input_.WhichOneof("rel_type") == "join")

join: Rel = input_.join
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From here it seems like Rel defines a relation and its type can be determined by rel_type(project, join). Is this correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I believe we can treat the Rel as semantically equivalent to an OpInterface in TableGen, where each *Rel variant would just be an mlir::Operation that inherits from the Relational interface (or trait).

python/test/ibis/test_ibis_substrait.py Show resolved Hide resolved
@harsh-nod
Copy link
Contributor

Also another interesting data point - some folks have already started lowering from ibis to mlir here: https://github.com/google/iree-llvm-sandbox

@bsarden-rivos
Copy link
Contributor Author

Nice find! I'll try and bringup an example of the ibis-to-mlir lowering as well, and see if that path makes more sense. My gut feels like it makes sense to join forces here, but wdyt? It looks like they were bootstrapping with the IRDL Dialect which just came up in the ODM today (I couldn't attend so will have to catch up).

@harsh-nod
Copy link
Contributor

I think it makes sense to join forces since we are all trying to accomplish similar things. The more the merrier :) I will create an issue on that repo and cc you and them and propose a meeting to brainstorm and align our efforts.

@harsh-nod
Copy link
Contributor

I think it makes sense to join forces since we are all trying to accomplish similar things. The more the merrier :) I will create an issue on that repo and cc you and them and propose a meeting to brainstorm and align our efforts.

Here is the link to the issue: iree-org/iree-llvm-sandbox#506

@harsh-nod harsh-nod merged commit f3a85f1 into main Jun 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants