Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade window UDF api #730

Closed
Michael-J-Ward opened this issue Jun 14, 2024 · 3 comments · Fixed by #741 or #747
Closed

Upgrade window UDF api #730

Michael-J-Ward opened this issue Jun 14, 2024 · 3 comments · Fixed by #741 or #747
Labels
enhancement New feature or request

Comments

@Michael-J-Ward
Copy link
Contributor

Michael-J-Ward commented Jun 14, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

datafusion is in process of removing their built-in window-functions, which our current window function relies on. sum has already been removed which brakes the q11_important_stock_identification.py tpch example.

Additional context

AggregateFunction::Sum enum variant is still defined, but can not be used.

I suspect that the proper solution is to register the new UDAFs with the function registry.

As a workaround for releasing 39, I explicitly match on name == "sum" and redirect to the UDAF.

UDAF Migration Epic: apache/datafusion#8709

@Michael-J-Ward Michael-J-Ward added the enhancement New feature or request label Jun 14, 2024
Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this issue Jun 14, 2024
@Michael-J-Ward
Copy link
Contributor Author

@timsaucer could you take a look at this once #728 is merged?

andygrove pushed a commit that referenced this issue Jun 14, 2024
* deps: update datafusion to 39.0.0, pyo3 to 0.21, and object_store to 0.10.1

`datafusion-common` also depends on `pyo3`, so they need to be upgraded together.

* feat: remove GetIndexField

datafusion replaced Expr::GetIndexField with a FieldAccessor trait.

Ref apache/datafusion#10568
Ref apache/datafusion#10769

* feat: update ScalarFunction

The field `func_name` was changed to `func` as part of removing `ScalarFunctionDefinition` upstream.

Ref apache/datafusion#10325

* feat: incorporate upstream array_slice fixes

Fixes #670

* update ExectionPlan::children impl for DatasetExec

Ref apache/datafusion#10543

* update value_interval_daytime

Ref apache/arrow-rs#5769

* update regexp_replace and regexp_match

Fixes #677

* add gil-refs feature to pyo3

This silences pyo3's deprecation warnings for its new Bounds api.

It's the 1st step of the migration, and should be removed before merge.

Ref https://pyo3.rs/v0.21.0/migration#from-020-to-021

* fix signature for octet_length

Ref apache/datafusion#10726

* update signature for covar_samp

AggregateUDF expressions now have a builder API design, which removes arguments like filter and order_by

Ref apache/datafusion#10545
Ref apache/datafusion#10492

* convert covar_pop to expr_fn api

Ref: https://github.com/apache/datafusion/pull/10418/files

* convert median to expr_fn api

Ref apache/datafusion#10644

* convert variance sample to UDF

Ref apache/datafusion#10667

* convert first_value and last_value to UDFs

Ref apache/datafusion#10648

* checkpointing with a few todos to fix remaining compile errors

* impl PyExpr::python_value for IntervalDayTime and IntervalMonthDayNano

* convert sum aggregate function to UDF

* remove unnecessary clone on double reference

* apply cargo fmt

* remove duplicate allow-dead-code annotation

* update tpch examples for new pyarrow interval

Fixes #665

* marked q11 tpch example as expected fail

Ref #730

* add default stride of None back to array_slice
@timsaucer
Copy link
Contributor

timsaucer commented Jun 14, 2024 via email

Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this issue Jun 25, 2024
Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this issue Jun 25, 2024
andygrove pushed a commit that referenced this issue Jun 26, 2024
* provides workaround for half-migrated UDAF `sum`

Ref #730

* provide compatibility for sqlparser::ast::NullTreatment

This is now exposed as part of the API to `first_value` and `last_value` functions.

If there's a more elegant way to achieve this, please let me know.
@timsaucer
Copy link
Contributor

@Michael-J-Ward I know this issue was already closed but I have proposed a method to remove the work around.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
2 participants