Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversion from pyarrow Expressions to QueryBuilder expressions #2202

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

IvoDD
Copy link
Collaborator

@IvoDD IvoDD commented Feb 25, 2025

Reference Issues/PRs

Monday ref: 8471524478

What does this implement or fix?

Introduces ExpressionNode.from_pyarrow_expression_str.

This is required for predicate pushdown integration with polars. E.g. when we do:

lf = polars.scan_arcticdb(arctic_identifier)
lf = lf.filter(pl.col("float_col") <= 40.1)
lf.collect()

When calling collect the filter will get pushed down to our (to be implemented) polars.scan_arcticdb via a callback. The filter is given as a string which can be evaluated to a pyarrow expression. For reference string construction happens here.

I've decided to keep this conversion code in core arcticdb instead of polars for a few reasons:

  • Gives us more flexibility if we decide to update our QueryBuilder expressions
  • We'll have less code to maintain in a repository which is not ours
  • Automated tests allow us to not accidentally break polars predicate pushdown to arcticdb

For additional reference see how pyiceberg handles predicate pushdown from polars here

And how this might fit in the prototype polars.scan_arcticdb implementation here

Change Type (Required)

  • Patch (Bug fix or non-breaking improvement)
  • Minor (New feature, but backward compatible)
  • Major (Breaking changes)
  • Cherry pick

Any other comments?

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

@github-actions github-actions bot added the patch Small change, should increase patch version label Feb 25, 2025
@IvoDD IvoDD force-pushed the parse-pyarrow-expression-strings branch 2 times, most recently from 8e51d61 to 9c7aa0f Compare February 26, 2025 15:46
Introduces `ExpressionNode.from_pyarrow_expression_str`.

This is required for predicate pushdown integration with polars. E.g.
when we do:
```
lf = polars.scan_arcticdb(arctic_identifier)
lf = lf.filter(pl.col("float_col") <= 40.1)
lf.collect()
```
When calling `collect` the filter will get pushed down to our (to be
implemented) `polars.scan_arcticdb` via a callback. The filter is given
as a string which can be evaluated to a pyarrow expression. For
reference string construction happens [here](https://github.com/pola-rs/polars/blob/4e286d8c83b0fcb56f0a7ea06d2eb731f179e01e/crates/polars-plan/src/plans/python/pyarrow.rs#L25).

I've decided to keep this conversion code in core arcticdb instead of
polars for a few reasons:
- Gives us more flexibility if we decide to update our `QueryBuilder`
  expressions
- We'll have less code to maintain in a repository which is not ours
- Automated tests allow us to not accidentally break polars predicate
  pushdown to arcticdb

For additional reference see how pyiceberg handles predicate pushdown
from polars [here](https://github.com/pola-rs/polars/blob/4e286d8c83b0fcb56f0a7ea06d2eb731f179e01e/py-polars/polars/io/iceberg.py#L197-L200)

And how this might fit in the prototype `polars.scan_arcticdb`
implementation [here](https://github.com/IvoDD/polars/pull/1/files)
@IvoDD IvoDD force-pushed the parse-pyarrow-expression-strings branch from 9c7aa0f to cfe2c11 Compare February 27, 2025 08:45
if isinstance(a.op, ast.USub):
# pyarrow expressions don't support unary subrtract, so this branch will not be reached.
# Leaving as future-proofing in case they ever introduce it.
return -operand
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abs operator?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
patch Small change, should increase patch version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants