Collaboration on data analytics workloads in MLIR #506
Hello @harsh-nod, thanks for sharing your plans! Would you be available to meet next week with @webmiche? Do any of these slots work for you?
Hi @nicolasvasilache, unfortunately I am out of the office on Thursday the 9th, but 8am PST works for me on Monday, Tuesday, or Wednesday of next week. Alternatively, if only Thursdays work, I can do Thursday, June 16 at 8am PST. Do any of those days work for you?
Hey, that sounds interesting; looking forward to the meeting! As for timing, I can't make Thursday the 9th, since I will be on military service Thursday and Friday. Other than that, I can make 8am PST on any day except Fridays.
OK, let's do June 16th at 8am PST?
Hi all, I'm excited for our chat! I can also do other days, but earlier than June 16th works better for me, since I have some free cycles to work on this early this week and next week. My schedule is wide open; how does Monday (6/13) at 8am work for folks? A few questions for @webmiche in the meantime:
Unfortunately, I am out of the office on Monday 6/13, Tuesday 6/14, and Wednesday 6/15. If we want something sooner, I can meet at 8am PST tomorrow (6/7) or the day after (6/8).
Either time works for me!
For me, that time window would only work tomorrow (6/8).
I think it would be very useful for our meeting if you could look through the TPC-H queries and think a bit about the challenges of modeling and running them with MLIR. I found "hard to solve" problems for most of them, and I feel that Q6 is the most reasonable one to get running first, but I would be happy to get a second opinion.
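For reference, Q6 is a single scan over `lineitem` with a conjunctive filter and one scalar aggregate (no joins, no group-by), which is plausibly why it is the easiest starting point. A minimal NumPy sketch of it over columnar arrays, assuming the columns have already been extracted and using the standard validation parameters (DATE = 1994-01-01, DISCOUNT = 0.06, QUANTITY = 24); `tpch_q6` is a hypothetical name, not code from this repo:

```python
import numpy as np

def tpch_q6(l_shipdate, l_discount, l_quantity, l_extendedprice,
            date_lo=np.datetime64("1994-01-01"),
            date_hi=np.datetime64("1995-01-01"),  # DATE + interval '1' year
            discount_lo=0.05, discount_hi=0.07,   # DISCOUNT -/+ 0.01
            quantity=24):
    """revenue = SUM(l_extendedprice * l_discount) over rows passing Q6's filter."""
    # Each predicate is an element-wise comparison over a whole column;
    # '&' combines them into one boolean mask (the WHERE clause).
    mask = (
        (l_shipdate >= date_lo)
        & (l_shipdate < date_hi)
        & (l_discount >= discount_lo)
        & (l_discount <= discount_hi)
        & (l_quantity < quantity)
    )
    # The aggregate is a masked multiply followed by a single reduction.
    return float(np.sum(np.where(mask, l_extendedprice * l_discount, 0.0)))
```

The discount bounds are written as literals (0.05 and 0.07) to sidestep floating-point surprises such as `0.06 + 0.01 < 0.07` evaluating to `True`; a real implementation would likely want decimal or fixed-point columns instead.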
This is still very much an open question. The broad idea we have is that, since pandas stores data in columnar form and these columns are backed by NumPy arrays, we extract the NumPy arrays from pandas and pass them to our MLIR functions (find the file here). This approach piggybacks on parts of the sandbox. AFAIK, we have not yet developed a more concrete or complete idea of what this should look like in the end.
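A rough sketch of that hand-off, assuming the MLIR function has been compiled into something callable that takes one dense array per column. `extract_columns` and `call_kernel` are hypothetical helpers, and `add_kernel` is a NumPy stand-in for the compiled function, not the actual file referenced above:

```python
import numpy as np
import pandas as pd

def extract_columns(df: pd.DataFrame, names: list[str]) -> dict[str, np.ndarray]:
    """Pull each requested column out of `df` as a dense, C-contiguous NumPy array."""
    cols = {}
    for name in names:
        # to_numpy() returns the column's values; for numeric dtypes this is often
        # a view of pandas' internal storage rather than a copy.
        arr = df[name].to_numpy()
        # A compiled kernel will most likely expect contiguous buffers; copy only if needed.
        cols[name] = np.ascontiguousarray(arr)
    return cols

def call_kernel(kernel, df: pd.DataFrame, names: list[str]):
    """Hand the extracted columns, in order, to a (hypothetical) compiled kernel."""
    cols = extract_columns(df, names)
    return kernel(*(cols[n] for n in names))

def add_kernel(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Stand-in for an MLIR-compiled element-wise add; a real one would come from
    # whatever compilation and runtime path the project settles on.
    return a + b

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
print(call_kernel(add_kernel, df, ["a", "b"]))
```

Keeping the kernel signature to "one array per column" keeps pandas entirely on the Python side, so the compiled code only ever sees plain buffers.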
6/8 at 8am PST is great; I'll post a link here.
Not sure if you have all read this (it came out just a few days ago), but I found an interesting paper on implementing relational operators in PyTorch and running TPC-H queries (including Q6) with them, where they outperform DuckDB: Query Processing on Tensor Computation Runtimes
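The core idea in that line of work is expressing relational operators as bulk tensor operations. As an illustration only (NumPy instead of PyTorch, and not the paper's implementation), a GROUP BY with SUM can be built from a sort-based `unique` and a scatter-add; `groupby_sum` is a hypothetical helper:

```python
import numpy as np

def groupby_sum(keys: np.ndarray, values: np.ndarray):
    """SELECT key, SUM(value) ... GROUP BY key, using only bulk array operations."""
    # np.unique sorts the distinct keys and, with return_inverse=True, gives each
    # row the index of its group in that sorted list.
    uniq, group_idx = np.unique(keys, return_inverse=True)
    # bincount with weights is a scatter-add: sums[g] += values[i] for each row i in group g.
    sums = np.bincount(group_idx.ravel(), weights=values, minlength=len(uniq))
    return uniq, sums

keys = np.array([3, 1, 3, 2, 1])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
uniq, sums = groupby_sum(keys, values)
print(uniq, sums)  # [1 2 3] [7. 4. 4.]
```

No hash table or per-row control flow is involved, which is what makes this style a candidate for tensor runtimes (and, by extension, for linalg on tensors).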
Here is the link for today's meeting.
I'm trying to join the meeting, but it's stuck at "Asking to join...".
Hi folks @ingomueller-net @webmiche,
@bsarden-rivos and I are interested in running data analytics workloads (as found in popular frameworks such as pandas) on IREE. To do that, we have been trying to flesh out a path from pandas to MLIR. I have a simple prototype that takes element-wise addition and lowers it to linalg here: https://github.com/nod-ai/pandas-mlir. But recently, based on @bsarden-rivos's findings, we have been thinking of using Substrait (https://substrait.io), and more specifically the ibis-substrait compiler (https://github.com/ibis-project/ibis-substrait), as a starting point for lowering to MLIR (linalg on tensors).
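For readers less familiar with linalg: element-wise addition maps onto a `linalg.generic` whose operands all use identity indexing maps and whose single iterator is `parallel`. A rough Python model of those semantics, ignoring the `outs` init operand (this is not MLIR code and not how the prototype is implemented; `generic_elementwise` is a hypothetical name):

```python
import numpy as np

def generic_elementwise(body, *inputs: np.ndarray) -> np.ndarray:
    """Model of a 1-D linalg.generic with identity indexing maps and one parallel
    iterator: `body` is applied independently at every index i."""
    n = inputs[0].shape[0]
    # Identity indexing maps mean every operand is read at the same index i,
    # so all operands must have the same shape.
    assert all(x.shape == (n,) for x in inputs), "operands must have matching shapes"
    out = np.empty(n, dtype=np.result_type(*inputs))
    for i in range(n):  # 'parallel': no iteration depends on another
        out[i] = body(*(x[i] for x in inputs))
    return out

lhs = np.array([1, 2, 3])
rhs = np.array([10, 20, 30])
print(generic_elementwise(lambda a, b: a + b, lhs, rhs))  # [11 22 33]
```

Because the iterator is parallel, a compiler is free to tile, vectorize, or distribute this loop, which is a large part of the appeal of targeting linalg.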
Looking at your commits in this repo, it seems like you are exploring an alternative path to MLIR, so we would love to chat and brainstorm with you about your project goals, roadmap, and the current state of things, and about how we can align efforts and collaborate.
Thanks and looking forward to collaborating with you all,
Harsh