Collaboration on data analytics workloads in MLIR #506

harsh-nod · 2022-06-02T22:25:01Z

@bsarden-rivos and I are interested in running data analytics workloads (such as those found in popular frameworks like pandas) on IREE. To do that, we have been trying to flesh out a path from pandas to MLIR. I have a simple prototype that takes element-wise addition and lowers it to linalg here: https://github.com/nod-ai/pandas-mlir. Recently, based on @bsarden-rivos's findings, we have been thinking of using Substrait (https://substrait.io), and more specifically the ibis-substrait compiler (https://github.com/ibis-project/ibis-substrait), as a starting point for lowering to MLIR (linalg on tensors).

Looking at your commits in this repo, it seems like you are exploring an alternate path to MLIR, so we would love to chat and brainstorm with you all about your project goals, roadmap, and current state, and about how we can align efforts and collaborate.

Thanks and looking forward to collaborating with you all,
Harsh

nicolasvasilache · 2022-06-03T16:15:50Z

Hello @harsh-nod, thanks for sharing your plans!

Would you be available to meet next week with @webmiche?
Since Harsh is on the West Coast, I could meet Thursday 5-7pm CEST (8-10am PST) or after 8pm CEST (11am PST).

Are you available in any of these slots?

harsh-nod · 2022-06-03T16:38:51Z

Hi @nicolasvasilache,

Unfortunately, I am out of the office on Thursday the 9th, but 8am PST works for me on Monday, Tuesday, and Wednesday of next week. Alternatively, if only Thursdays work, I can do Thursday, June 16 at 8am PST. Do any of those days work for you?

webmiche · 2022-06-04T10:42:04Z

Hey, that sounds interesting, looking forward to the meeting!

In terms of timing, I can't make Thursday the 9th since I will be on military service Thursday and Friday. Other than that, I can make 8am PST on any day except Fridays.

nicolasvasilache · 2022-06-06T17:21:46Z

OK, let's do June 16th at 8am PST?
I can also do other days, but it seems that this particular day has already been identified as working for everyone.

bsarden-rivos · 2022-06-06T18:14:12Z

Hi all, I'm excited for our chat! I can also do other days, but earlier than June 16th works better for me since I have some free cycles to work on this early this week and next week. My schedule is wide open; how does Monday (6/13) at 8am work for folks? A few questions for @webmiche in the meantime:

Where would be the best place to start contributing? Looking at some of the PRs in flight, I'm also interested in running a TPC-H query through MLIR, but I'm not sure where to start.
What would be the best path for running a query end to end? Does extending alp and the AlpRuntime to execute a query make sense, or is there already something in the works that I can help flesh out?

harsh-nod · 2022-06-06T18:43:10Z

Unfortunately, I am out of the office on Monday 6/13, Tuesday 6/14, and Wednesday 6/15. If we want something sooner, I can meet at 8am PST tomorrow (6/7) or the day after (6/8).

bsarden-rivos · 2022-06-06T19:24:19Z

> I can meet 8am PST tomorrow 6/7 or the day after 6/8?

Either time works for me!

webmiche · 2022-06-07T06:49:31Z

> Unfortunately I am out of the office on Monday 6/13, 14 and 15. If we want something sooner, I can meet 8am PST tomorrow 6/7 or the day after 6/8?

For me, that time window would only work tomorrow (6/8).

> Where would be the best place to start contributing? Looking at some of the PRs in flight I'm also interested in running a TPC-H query through MLIR, but not sure where to start.

I think it would be very useful for our meeting if you could look through the TPC-H queries and think a bit about the challenges of modeling and running them with MLIR. I found "hard to solve" problems for most of them, and I feel that Q6 is the most reasonable one to get running first, but I would welcome a second opinion.
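To make the Q6 discussion concrete, here is a minimal sketch of TPC-H Q6 evaluated over columnar NumPy arrays, using the specification's default substitution parameters (date 1994-01-01, discount 0.06, quantity 24). Q6 touches a single table (lineitem) with no joins or grouping: it is a conjunctive filter followed by one sum, which is one plausible reason it is a good first target. The sketch masks rows with `np.where` instead of boolean indexing, so every intermediate keeps the input's static shape, roughly like a fused element-wise op followed by a reduction. The function name `tpch_q6` and its signature are illustrative assumptions, and floats stand in for the DECIMAL columns the spec uses.

```python
import numpy as np


def tpch_q6(l_shipdate, l_discount, l_quantity, l_extendedprice):
    """Sketch of TPC-H Q6 over lineitem columns stored as NumPy arrays.

    Default substitution parameters: DATE = 1994-01-01, DISCOUNT = 0.06,
    QUANTITY = 24. Floats stand in for DECIMAL columns (an assumption).
    """
    start = np.datetime64("1994-01-01")
    end = np.datetime64("1995-01-01")  # DATE + interval '1' year
    # Conjunctive WHERE clause, evaluated element-wise over whole columns.
    mask = (
        (l_shipdate >= start)
        & (l_shipdate < end)
        & (l_discount >= 0.05)  # DISCOUNT - 0.01
        & (l_discount <= 0.07)  # DISCOUNT + 0.01
        & (l_quantity < 24)
    )
    # Zero out non-matching rows instead of compacting them, so the
    # intermediate keeps the input's shape; then reduce with one sum.
    revenue = np.where(mask, l_extendedprice * l_discount, 0.0)
    return float(revenue.sum())
```

For example, with four rows where only the first satisfies every predicate, the result is that row's `l_extendedprice * l_discount`.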

> What would be the best path for running a query e2e? Does extending alp and the AlpRuntime to execute a query make sense, or is there already something in the works that I can help flesh out?

This is still very much an open question. The broad idea we have is that, since pandas stores data in columnar form and these columns are NumPy arrays, we extract the NumPy arrays from pandas and pass them to our MLIR functions (find the file here). This approach piggybacks on parts of the sandbox. AFAIK, we have not yet developed a more concrete or complete picture of what this should look like in the end.
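A minimal sketch of that approach, assuming numeric columns: each column is pulled out of the DataFrame as a NumPy array with `Series.to_numpy` and handed to a compiled kernel. The kernel here, `compiled_add`, is a hypothetical NumPy stand-in for an MLIR-compiled function, and `column_add` is likewise an illustrative name; the actual calling convention of the MLIR functions mentioned above is not shown in this thread.

```python
import numpy as np
import pandas as pd


def compiled_add(lhs: np.ndarray, rhs: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for an MLIR-compiled element-wise add kernel."""
    # A real compiled function would receive the raw buffers; NumPy fills in here.
    out = np.empty_like(lhs)
    np.add(lhs, rhs, out=out)
    return out


def column_add(df: pd.DataFrame, a: str, b: str, out_col: str) -> pd.DataFrame:
    """Add two numeric columns by passing their NumPy buffers to the kernel."""
    # Extract each column as a contiguous float64 NumPy array.
    lhs = np.ascontiguousarray(df[a].to_numpy(dtype=np.float64))
    rhs = np.ascontiguousarray(df[b].to_numpy(dtype=np.float64))
    # Wrap the kernel's output back into a copy of the DataFrame.
    result = df.copy()
    result[out_col] = compiled_add(lhs, rhs)
    return result
```

`np.ascontiguousarray` ensures a C-contiguous buffer, which native kernels typically expect. Columns with pandas extension dtypes (for example nullable integers or categoricals) are not plain NumPy arrays, so `to_numpy` may copy them or need an explicit `na_value` when missing values are present.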

nicolasvasilache · 2022-06-07T08:17:20Z

6/8 at 8am PST works great; I'll post a link here.

harsh-nod · 2022-06-07T16:55:03Z

Not sure if you have all read this (it just came out a few days ago), but I found an interesting paper on implementing relational operators in PyTorch and running TPC-H queries (including Q6), where they outperform DuckDB: Query Processing on Tensor Computation Runtimes.
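As a rough illustration of how a relational operator can be phrased as tensor operations (not necessarily the technique used in that paper), here is a sort-based inner equi-join in NumPy. It assumes the keys on one side are unique, as in a primary-key/foreign-key join; the function name `pk_fk_join` is hypothetical. The join uses only argsort, binary search, gathers, and a boolean mask, and PyTorch has counterparts such as `torch.argsort` and `torch.searchsorted`.

```python
import numpy as np


def pk_fk_join(fk: np.ndarray, pk: np.ndarray):
    """Inner equi-join on fk == pk using sort + binary search.

    Returns (left_idx, right_idx): row indices into fk and pk for each match.
    Assumes the values in pk are unique (primary-key side).
    """
    if pk.size == 0:
        empty = np.empty(0, dtype=np.intp)
        return empty, empty
    # Sort the primary keys once, remembering their original positions.
    order = np.argsort(pk, kind="stable")
    sorted_pk = pk[order]
    # Binary-search every foreign key; clip so out-of-range probes index safely.
    pos = np.minimum(np.searchsorted(sorted_pk, fk), sorted_pk.size - 1)
    # A probe matches only if the key found at its insertion point is equal.
    matched = sorted_pk[pos] == fk
    left_idx = np.nonzero(matched)[0]
    right_idx = order[pos[matched]]
    return left_idx, right_idx
```

Many-to-many joins need extra work (for example, expanding the ranges between `searchsorted(..., side="left")` and `side="right"`), which this sketch omits.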

nicolasvasilache · 2022-06-08T08:16:00Z

Here is the link for today's meeting.
Video call link: https://meet.google.com/ndw-fzsv-hqb
Or dial: (CH) +41 31 560 24 00 PIN: 295 558 240 8107#
More phone numbers: https://tel.meet/ndw-fzsv-hqb?pin=2955582408107

harsh-nod · 2022-06-08T15:03:19Z

I'm trying to join the meeting, but it's stuck at "Asking to join...".

harsh-nod mentioned this issue on Jun 2, 2022 in nod-ai/pandas-mlir#2, "feat: add ibis-substrait join operator example" (merged).
