[EPIC] DataFusion on Ray 0.1.0 release #2

Open
10 of 18 tasks
andygrove opened this issue Sep 21, 2024 · 0 comments

andygrove commented Sep 21, 2024

Manage donation

Initial tasks once the donation has been accepted

Benchmarking

  • Add documentation for running benchmarks
  • Automate running benchmarks against PRs using compute infrastructure provided by @andygrove
  • Update performance charts in README

Set up release process

  • Add scripts for creating and publishing source releases
  • Add CI scripts for building Python wheels
  • Add CI scripts for building Docker images
  • Set up RAT checks in CI

First Release

andygrove added a commit that referenced this issue Sep 30, 2024
* Initial commit

* Basic project structure

* gitignore

* Add protobuf plumbing (#2)

* Implement protobuf codec

* Wire up query execution (#4)

* query runs end to end (#5)

* re-organize python code (#6)

* Implement shuffle more fully (#7)

* update README (#8)

* Bug fix (#9)

* Support multiple shuffle partitions (#10)

* More shuffle fixes (#11)

* fix readme (#12)

* add perf chart (#13)

* Remove hard-coded temp dir (#14)

* bug fix (#15)

* New results (#16)

* Upgrade to DataFusion 17, fix a couple of bugs, add some tests (#18)

* Remove debug logging (#19)

* update README (#21)

* Make better use of futures (#23)

* Documentation & bug fixes (#24)

* Update README.md

* [WIP] Use Ray object store for shuffle exchange (#28)

* Fixes for Ray-based shuffle (#29)

* Small fixes for Context (#30)

* Make distributed execution work (#33)

* Make distributed execution work

* fix tips.py

* fixes; incorporate changes from #32

* Upgrade to DataFusion 20 (#31)

* Add support for DDL statements, such as `CREATE VIEW` (#35)

* Experimenting with supporting DDL

* update docs

* Use PyArrow for zero-copy interaction with the Ray Object Store (#36)

* Optimize Ray shuffle with zero-copy object store

* remove more clones

* change bytes to pyarrow.array

* revert /tmp

* remove empty_result_set

* remove empty_result_set

* Fix input partition count bug

* Add Frank as author (#37)

* fix hyperlink of issue 22 in docs/README.txt (#40)

Co-authored-by: ivanfan

* delta lake and iceberg table support (#43)

* delta support

* imports

* Update DataFusion version to 28.0.0 (#41)

* Update DataFusion version

* update example

* Upgrade to DataFusion 33 (#45)

* Upgrade to DataFusion 33

* undo release profile change

* Add basic GitHub workflow to compile code (#47)

* Create rust.yml

* install protobuf

* fix

* fix

* fix

* fix

* fix

* fix

* Add ASF license header

Signed-off-by: Austin Liu

* Remove ASF header for generated code

Signed-off-by: Austin Liu

---------

Signed-off-by: Austin Liu
Co-authored-by: Andy Grove
Co-authored-by: Frank Luan
Co-authored-by: Ivankings
Co-authored-by: ivanfan
Co-authored-by: raviranak