feat: python udf implementation #703

Merged Aug 28, 2023 (18 commits)

@jordanrfrazier (Collaborator) commented Aug 23, 2023

Adds the ability for users to define Python user-defined functions (UDFs) that operate on pandas Series and interoperate with Timestreams. For example:

import kaskada as kd
import pandas as pd

@kd.udf("add<N: number>(x: N, y: N) -> N")
def add(x: pd.Series, y: pd.Series) -> pd.Series:
    return x + y  # element-wise sum of the two Series

The UDF can then be called directly on Timestreams:

add(Foo.m, Foo.n)

This approach has some limitations:

  1. A UDF operates on whole Series, not on individual elements.
  2. Writing the signature annotation requires knowing Fenl syntax and types.

A possible improvement is to have the user supply only the return type and to check the argument types at runtime. This is similar to how PySpark handles it.
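To make the "return type only" idea concrete, here is a minimal sketch. It is not Kaskada's or PySpark's API: the `udf(return_type=...)` decorator is hypothetical, and plain Python lists stand in for pandas Series so the example has no dependencies. The decorator assumes numeric columns and validates the arguments when the function is called, not when it is declared.

```python
from functools import wraps
from typing import Callable, List, Sequence


def udf(return_type: type) -> Callable:
    """Hypothetical decorator: only the return element type is declared."""

    def decorate(fn: Callable[..., Sequence]) -> Callable[..., List]:
        @wraps(fn)
        def wrapper(*columns: Sequence) -> List:
            # Runtime check 1: every argument column has the same length.
            lengths = {len(col) for col in columns}
            if len(lengths) > 1:
                raise ValueError(f"columns differ in length: {sorted(lengths)}")
            # Runtime check 2: every element is a number (bool excluded).
            for position, col in enumerate(columns):
                for value in col:
                    if isinstance(value, bool) or not isinstance(value, (int, float)):
                        raise TypeError(
                            f"argument {position} holds non-numeric value {value!r}"
                        )
            result = list(fn(*columns))
            # Runtime check 3: the result matches the declared return type.
            for value in result:
                if not isinstance(value, return_type):
                    raise TypeError(f"expected {return_type.__name__}, got {value!r}")
            return result

        return wrapper

    return decorate


@udf(return_type=float)
def add(x, y):
    # Whole-column operation, like the Series-level UDF in this PR.
    return [a + b for a, b in zip(x, y)]
```

The trade-off is that type errors surface on the first call with bad data, where the full signature annotation can report them when the query is built.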

Polars offers another option: it splits UDF calls into apply (called per element) and map (called over whole arrays). Its documentation notes that the per-element operations are slow, which is expected.
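The difference between the two calling styles can be sketched without any library. The helper names below (`apply_elementwise`, `map_batch`) are hypothetical and are not the Polars API; the sketch only counts how often the user function is invoked in each style.

```python
from typing import Callable, List, Sequence


def apply_elementwise(
    fn: Callable[[float], float], column: Sequence[float]
) -> List[float]:
    """Per-element style: fn is invoked once for every value."""
    return [fn(value) for value in column]


def map_batch(
    fn: Callable[[Sequence[float]], Sequence[float]], column: Sequence[float]
) -> List[float]:
    """Whole-array style: fn is invoked once and receives the full column."""
    return list(fn(column))


# Count how many times each user function is called.
calls = {"elementwise": 0, "batch": 0}


def double_one(value: float) -> float:
    calls["elementwise"] += 1
    return value * 2


def double_all(values: Sequence[float]) -> List[float]:
    calls["batch"] += 1
    return [value * 2 for value in values]


column = [1.0, 2.0, 3.0, 4.0]
per_element = apply_elementwise(double_one, column)
per_batch = map_batch(double_all, column)
```

Both calls produce the same values, but the per-element version invokes the Python function once per row, while the batch version invokes it once per column. With a real array library the batch function can also use vectorized operations, which is where most of the speed difference comes from.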

Closes #698

@cla-bot added the cla-signed label Aug 23, 2023
@jordanrfrazier marked this pull request as ready for review August 25, 2023 03:15
@bjchambers changed the title from "draft: python udf implementation" to "feat: python udf implementation" Aug 25, 2023
@github-actions added the enhancement label Aug 25, 2023
@jordanrfrazier force-pushed the python-udf/python-trait-implementation-udf branch from d3a595d to 0683b00 on August 28, 2023 16:41
@jordanrfrazier added this pull request to the merge queue Aug 28, 2023
Merged via the queue into main with commit 4caa780 Aug 28, 2023
33 checks passed
@jordanrfrazier deleted the python-udf/python-trait-implementation-udf branch August 28, 2023 19:14
Labels: cla-signed (set when all authors of a PR have signed our CLA), enhancement (new feature or request), sparrow
Linked issue: Python UDF
2 participants