Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release DataFusion 46.0.0 #14123

Closed
23 of 26 tasks
alamb opened this issue Jan 14, 2025 · 59 comments · Fixed by #15073
Closed
23 of 26 tasks

Release DataFusion 46.0.0 #14123

alamb opened this issue Jan 14, 2025 · 59 comments · Fixed by #15073
Assignees
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Jan 14, 2025

Is your feature request related to a problem or challenge?

Tracking ticket for next release, also a place to track desired inclusions

Previous release will be https://crates.io/crates/datafusion/45.0.0 (likely Feb 1, 2025) December 31, 2024 so next major release would be around March 1, 2025

Steps:

Prior release tickets:

Bugs that would be good to fix

Nice to have but non blockers

Needs some traige (should it be a blocker?)

TODO

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

@alamb alamb added the enhancement New feature or request label Jan 14, 2025
@alamb
Copy link
Contributor Author

alamb commented Jan 14, 2025

I think @xudong963 said they might be interested in running this one:

Thanks, alamb, I booked 46 in advance!

@xudong963
Copy link
Member

@alamb I'll also do some updates in the issue summary.

Considering that this is the first time I've been involved in this process, could you please remind me if some critical time comes up and I don't do anything about it?

@alamb
Copy link
Contributor Author

alamb commented Feb 10, 2025

@alamb I'll also do some updates in the issue summary.

Considering that this is the first time I've been involved in this process, could you please remind me if some critical time comes up and I don't do anything about it?

Yes of course. Thank you @xudong963

I am planning to be involved in this one too -- perhaps I can help by testing the release e.g. with delta-rs and influxdb_iox

@alamb
Copy link
Contributor Author

alamb commented Feb 10, 2025

@xudong963 when would you like to start making the release? Maybe we should targe the week of Feb 24 🤔

@xudong963
Copy link
Member

@xudong963 when would you like to start making the release? Maybe we should targe the week of Feb 24 🤔

yes, the week is suitable.

@andygrove
Copy link
Member

I've created a draft PR to upgrade Comet to use latest DataFusion:
apache/datafusion-comet#1423

@andygrove
Copy link
Member

I noticed that older releases have not been removed from https://dist.apache.org/repos/dist/release/datafusion/. Perhaps this can be cleaned up as part of this release.

Image

@alamb
Copy link
Contributor Author

alamb commented Feb 23, 2025

@xudong963 are we still thinking of trying to get the release ready this upcoming week?

I will try and focus my efforts starting tomorrow on ensuring the bugs listed in "Bugs that would be good to fix" are in

@xudong963
Copy link
Member

xudong963 commented Feb 23, 2025

@alamb In Bugs that would be good to fix, four issues already have PRs, and one does not four issues and all have PRs. I'll focus on reviewing the four in the next two days and plan to update version and changelog on Wed. What do you think?

From experience, It seems that we should leave one week to test with Comet, delta, SailHQ, and InfluxData, so the final release may be delayed to next Wed, about 3.5.

@alamb
Copy link
Contributor Author

alamb commented Feb 24, 2025

@alamb In Bugs that would be good to fix, four issues already have PRs, and one does not four issues and all have PRs. I'll focus on reviewing the four in the next two days and plan to update version and changelog on Wed. What do you think?

I think it is a good idea. I'll try and make a PR to delta-rs as well

Maybe @shehabgamin can help with testing SailHQ

Looks like @andygrove has made a PR for testing with comet (linked above): apache/datafusion-comet#1423

@alamb
Copy link
Contributor Author

alamb commented Feb 24, 2025

I made a PR for testing in delta here:

Still has some issues to work out

@shehabgamin
Copy link
Contributor

I will test with Sail by Wednesday!

@andygrove
Copy link
Member

I am running into one error in Comet: Function sha256 does not implement invoke but called. I will investigate this today.

@xudong963
Copy link
Member

I'll open the changelog PR tomorrow

@shehabgamin
Copy link
Contributor

Is there a reason why some arrow crates are using version 54.2.0 while others are using 54.1.0?

arrow = { version = "54.2.0", features = [
    "prettyprint",
    "chrono-tz",
] }
arrow-buffer = { version = "54.1.0", default-features = false }
arrow-flight = { version = "54.2.0", features = [
    "flight-sql-experimental",
] }
arrow-ipc = { version = "54.2.0", default-features = false, features = [
    "lz4",
] }
arrow-ord = { version = "54.1.0", default-features = false }
arrow-schema = { version = "54.1.0", default-features = false }

https://github.com/apache/datafusion/blob/main/Cargo.toml#L83-L95

@alamb
Copy link
Contributor Author

alamb commented Mar 3, 2025

Thanks @xudong963!

I also pushed a note about the upgrade guide into the branch

Would you like to try and make the release candidate now?

### Create, sign, and upload artifacts
Run `create-tarball.sh` with the `<version>` tag and `<rc>` and you found in previous steps:
```shell
GH_TOKEN=<TOKEN> ./dev/release/create-tarball.sh 38.0.0 0
```
The `create-tarball.sh` script
1. creates and uploads all release candidate artifacts to the [datafusion
dev](https://dist.apache.org/repos/dist/dev/datafusion) location on the
apache distribution svn server
2. provide you an email template to
send to [email protected] for release voting.

(I can do this too if you prefer)

@xudong963
Copy link
Member

Would you like to try and make the release candidate now?

Yes, but need to wait a bit. I'm out now

@alamb
Copy link
Contributor Author

alamb commented Mar 3, 2025

Would you like to try and make the release candidate now?

Yes, but need to wait a bit. I'm out now

SOunds good -- let me know if you hit issues or want me to try.

Thanks again for doing this! It will be great to have someone else with experience

@xudong963
Copy link
Member

@alamb My pleasure

I sent the release email to [email protected], did you see it?

@alamb
Copy link
Contributor Author

alamb commented Mar 3, 2025

@alamb My pleasure

I sent the release email to [email protected], did you see it?

Yup! I am looking at it now

@alamb
Copy link
Contributor Author

alamb commented Mar 3, 2025

@alamb
Copy link
Contributor Author

alamb commented Mar 3, 2025

Update : the error with validation is related to a change in rustup, not the release candidate directly. I am working on a fix

@alamb
Copy link
Contributor Author

alamb commented Mar 4, 2025

We have merged the fix here

To simplify votiing I suggest we backport that fix to the branch-46 line and make a second release candidate (RC2)

Is this something you can do @xudong963 ?

@xudong963
Copy link
Member

xudong963 commented Mar 4, 2025

okay, wait a bit

Ing

done @alamb

@Blizzara
Copy link
Contributor

Blizzara commented Mar 6, 2025

I tested the upgrade on our system, and ran into bunch of runtime errors with Function X does not implement invoke but called. They seem to come from us wrapping some DF UDFs inside our own, and us calling their invoke_batch methods, which then fail to work as that delegates to invoke but the DF UDFs only implement invoke_with_args.

Should ScalarUDFImpl::invoke_batch be marked as deprecated? (There is already a comment saying that in the docstring, but the compiler doesn't catch that 😅 ) That way we'd at least get some compile-time warning that we're doing something bad, now there was only runtime failures.

@alamb
Copy link
Contributor Author

alamb commented Mar 6, 2025

Should ScalarUDFImpl::invoke_batch be marked as deprecated? (There is already a comment saying that in the docstring, but the compiler doesn't catch that 😅 ) That way we'd at least get some compile-time warning that we're doing something bad, now there was only runtime failures.

Thanks @Blizzara !

I think we should mark it deprecated (though we had bad luck in the past with the compiler not complaining about implementing a deprecated trait method 🤔 )

We also tried to document this in the (new for the first time) upgrade guide:
https://datafusion.apache.org/library-user-guide/upgrading.html#use-invoke-with-args-instead-of-invoke-and-invoke-batch

I think we may simply want to remove invoke_with_batch / other older methods ahead of deprecation schedule to avoid the confusion

Blizzara added a commit to Blizzara/datafusion that referenced this issue Mar 6, 2025
@Blizzara
Copy link
Contributor

Blizzara commented Mar 6, 2025

I think we should mark it deprecated

Filed a PR here: #15049. Though I see there's already a release branch, how does that work?

(though we had bad luck in the past with the compiler not complaining about implementing a deprecated trait method 🤔 )

Yea, in this case implementing the method is fine, since the way the method chain is setup means calling the newer methods automatically delegates to the older, deprecated methods if needed. However custom code calling those deprecated methods is not fine, and that'll be caught by properly marking the deprecation.

The earlier case I noted around the implementing was for return_type stuff, where the chain isn't setup correctly (as it cannot really be as the args changed in an incompatible way).

We also tried to document this in the (new for the first time) upgrade guide: https://datafusion.apache.org/library-user-guide/upgrading.html#use-invoke-with-args-instead-of-invoke-and-invoke-batch

Yup, that's where I originally saw this and wanted to test it, thanks! :) Still even with migration guide, people may not see or understand it, so avoiding silent runtime breaks is preferrable when ever possible (I'm sure we all agree on that, just saying it for completeness 😄 )

I think we may simply want to remove invoke_with_batch / other older methods ahead of deprecation schedule to avoid the confusion

I'd +1 that, especially for the return_type things, but also for invokes. It'll be a compile break but that's better than any weirdness happening at runtime due to mismatching/wrong implementations.

FWIW, after fixing all the invokes, all our tests seem to pass :)

@alamb
Copy link
Contributor Author

alamb commented Mar 6, 2025

I think we should mark it deprecated

Filed a PR here: #15049. Though I see there's already a release branch, how does that work?

Realistically, I think what is on branch-46 will be what is released as DataFusion 46. New PRs will go into datafusion 47.

I'd +1 that, especially for the return_type things, but also for invokes. It'll be a compile break but that's better than any weirdness happening at runtime due to mismatching/wrong implementations.

FWIW, after fixing all the invokes, all our tests seem to pass :)

Nice!

@xudong963
Copy link
Member

@alamb i will catch a flight today and have a short vacation, do you have time to do the final release?

@alamb
Copy link
Contributor Author

alamb commented Mar 7, 2025

@alamb i will catch a flight today and have a short vacation, do you have time to do the final release?

Yes, absolutely -- thank you for all your help @xudong963 (I think a PMC member needs to do the final release anyways).

@alamb
Copy link
Contributor Author

alamb commented Mar 7, 2025

The release was approved and published to crates.io

See thread here: https://lists.apache.org/thread/rmvsc1ffpqd2z3m0c9lptjypzl83k2j8

I have filed a ticket for 47

I also have a small PR with cleanups from this release:
-#15073

Thanks again @xudong963 for running this release

@alamb alamb closed this as completed Mar 7, 2025
Weijun-H pushed a commit that referenced this issue Mar 8, 2025
* fix: mark ScalarUDFImpl::invoke_batch as deprecated

should use invoke_with_args instead

See #14123 (comment)

* fix deprecated usage that clippy warns about

* fix another deprecated usage that clippy warns about

* fix the rest of benches

* fix two more implementations - now all that's left is in udf.rs

* fix clippy

* cleanup some leftover comments
AdamGS added a commit to spiraldb/vortex that referenced this issue Mar 9, 2025
I mostly just like to see changes in Github.
We can merge this once Datafusion 46 released, assuming it includes the
following PRs:
- apache/datafusion#14754
- apache/datafusion#14671
- also hoping for apache/datafusion#14798

Seems like everything we wanted to get in made it, this branch now
compiles AND passes all checks, so we're only waiting for the actual
release in apache/datafusion#14123

---------

Co-authored-by: Alexander Droste <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants