GPU-Accelerated Backend #9986

Closed
1 task done
adamamer20 opened this issue Sep 1, 2024 · 6 comments
Labels
feature Features or general enhancements new backend PRs or issues related to adding new backends

Comments

@adamamer20

adamamer20 commented Sep 1, 2024

Which new backend would you like to see in Ibis?

I think it would be useful to have a GPU-accelerated backend for operations on big DataFrames. In this paper, DuckDB was tested against GPU-accelerated databases, and the performance difference is significant. Since Ibis used to support pandas, RAPIDS cuDF would be an obvious choice, as it probably wouldn't need too much refactoring.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@lostmygithubaccount
Member

lostmygithubaccount commented Sep 5, 2024

hi @adamamer20, thanks for opening this issue! apologies for the slow reply

you are right that with the pandas backend, you could use cuDF (and in theory other pandas-API compatible tools) -- this was shown here: https://voltrondata.com/blog/ibis-cudf-pandas
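For readers unfamiliar with the approach in that post, here is a minimal sketch of switching on cuDF's pandas accelerator mode. It assumes RAPIDS cuDF is installed and an NVIDIA GPU is present; the helper name `enable_gpu_pandas` is made up for this example, and the function simply reports `False` when the accelerator cannot be loaded.

```python
def enable_gpu_pandas() -> bool:
    """Try to turn on cuDF's pandas accelerator; report whether it worked."""
    try:
        import cudf.pandas  # assumption: RAPIDS cuDF is installed, NVIDIA GPU present

        # install() has to run before pandas is imported anywhere in the process
        cudf.pandas.install()
    except Exception:
        # no cuDF, no usable GPU, or a driver problem: stay on regular pandas
        return False
    return True


accelerated = enable_gpu_pandas()
print("cudf.pandas active:", accelerated)
```

Any `import pandas` that follows a successful call is served by cuDF where possible, with fallback to regular pandas for unsupported operations.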

while supporting GPU-based execution engines is important to us, the pandas backend with Ibis (and the pandas API in general) leaves a ton of performance on the table, largely negating the purpose of using hardware acceleration. the pandas API assumes that all data fits in memory and that execution is single-threaded and eager
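As a rough illustration of the eager, in-memory model described above, here is a toy sketch in plain Python. It is not how pandas or any query engine is actually implemented, and the function names are invented for the example. The eager version builds a full intermediate list at every step; the lazy version streams one row at a time through the same steps.

```python
from typing import Iterable


def eager_pipeline(rows: list[int]) -> int:
    """Eager style: every step materializes a complete intermediate result."""
    doubled = [r * 2 for r in rows]            # full copy held in memory
    kept = [r for r in doubled if r % 3 == 0]  # another full copy
    return sum(kept)


def lazy_pipeline(rows: Iterable[int]) -> int:
    """Lazy style: steps are chained and rows flow through one at a time."""
    doubled = (r * 2 for r in rows)            # nothing is computed yet
    kept = (r for r in doubled if r % 3 == 0)  # still nothing computed
    return sum(kept)                           # work happens here, row by row


data = range(1_000_000)
# Both styles give the same answer; only the memory profile differs.
assert eager_pipeline(list(data)) == lazy_pipeline(data)
```

A lazy engine can also inspect the whole chain before running it, which is what allows the reordering and fusing of steps that an eager API cannot do.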

Ibis is an independently governed open source project, with its main sponsor being Voltron Data -- BlazingSQL (mentioned in the paper you link) was effectively merged (idk the exact corporate language here) into Voltron Data: https://voltrondata.com/news/fundinglaunch. a lot of the founders and engineers at Voltron Data that many of the Ibis contributors work with were largely responsible for RAPIDS and cuDF and BlazingSQL

separately, Polars is working with the RAPIDS team at NVIDIA to bring a new version of GPU execution that presumably improves on the pandas API version: https://pola.rs/posts/polars-on-gpu/. our tentative plan is to leverage this via the Polars backend for Ibis once it becomes available for single-node NVIDIA GPU execution

so our general thinking on this is:

  • a pandas-API GPU backend isn't performant and isn't worth keeping our pandas backend around for
  • Polars is adding GPU support that Ibis gets "for free", covering the single-node GPU query engine case
  • Ibis already supports a distributed GPU query engine with Voltron Data's Theseus engine

we also have maintainers who were heavily involved in Dask and may have more thoughts on cuDF via Dask, though my understanding would be you generally still suffer the performance hit of the pandas API

@adamamer20
Author

hey @lostmygithubaccount, thanks for the detailed response! In my tests, pandas-cudf was also slower than eager Polars, so I'm glad to hear it wasn't due to my implementation. I hadn't heard about Theseus; that looks promising. From what I understand, it is not publicly available yet, right? I would keep the issue open until RAPIDS Polars comes out or Theseus is a supported backend, but you can close it if you'd like (since it's not dependent on Ibis itself).

@lostmygithubaccount
Member

yep, Theseus is not public (and probably won't be anytime soon) -- the general thinking is these modern single-node OLAP engines like DuckDB, DataFusion, and eventually Polars (once it implements its new "streaming" engine that works like the other two) are sufficient for 90-99% of data use cases, as they allow you to scale up to ~10TB size queries

it's unclear if single-node GPU w/ current tooling would be of much use in real world scenarios. we'll keep an eye on developments with Polars and look to add via the Polars backend if it's compelling

I'll close this out because there's nothing to do immediately, just waiting and watching. Ibis already does support Theseus (the repo for that backend is private, though could be made public eventually)

@lostmygithubaccount closed this as not planned on Sep 6, 2024
@github-project-automation bot moved this from backlog to done in Ibis planning and roadmap on Sep 6, 2024
@adamamer20
Author

Wanted to leave a note that the Polars GPU engine has just been released in open beta:

https://pola.rs/posts/gpu-engine-release/

@lostmygithubaccount
Member

yep! in theory this should "just work" through the Ibis backend already (we might need to bump the supported version of Polars or something), though I don't have an NVIDIA GPU readily available to try it out
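For reference, a minimal sketch of what selecting the GPU engine looks like in Polars itself, assuming a Polars release that includes the GPU engine, the `cudf-polars` package, and an NVIDIA GPU. How the Ibis Polars backend forwards this option is not spelled out in this thread, so only the Polars side is shown. `run_query` and the sample data are hypothetical.

```python
def run_query(use_gpu: bool = False):
    """Build a small lazy Polars query and collect it on the CPU or the GPU."""
    import polars as pl  # imported here so the sketch loads even without Polars

    lf = pl.LazyFrame({"key": ["a", "b", "a"], "value": [1, 2, 3]})
    query = (
        lf.group_by("key")
        .agg(pl.col("value").sum().alias("total"))
        .sort("key")
    )
    if use_gpu:
        # assumption: cudf-polars is installed and an NVIDIA GPU is available
        return query.collect(engine="gpu")
    return query.collect()  # default CPU engine
```

Calling `run_query()` runs on the CPU; `run_query(use_gpu=True)` asks Polars to run the same lazy plan on the GPU and needs the extra package and hardware.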

@lostmygithubaccount
Member

I believe this will allow you to use it: #10151
