deps: update to ibis-framework 9.x and newer sqlglot #827
Getting
for |
Re: Looks like the time scalar is losing microsecond precision.
Edit: Looks like an Ibis or maybe SQLGlot bug. Filed ibis-project/ibis#9609
Marking as |
…ecision_score etc ml tests
…es into b350749011-ibis-9
@@ -31,6 +31,17 @@
scalar_compiler = scalar_compilers.scalar_op_compiler


# TODO(swast): We can remove this if ibis adds general approx_quantile
# See: https://github.com/ibis-project/ibis/issues/9541
FYI (no change necessary for this PR): An approximate quantile node type has been merged into ibis. ibis-project/ibis#9881
Once we start vendoring the Ibis expressions in addition to the compiler, we can take advantage of that (if we want).
bigframes/ml/preprocessing.py
Outdated
column_min = X[column].min()
column_max = X[column].max()

# Use Python value rather than Numpy value for serialization.
This seems a bit fragile - should the sql generator be more strict in its formatting instead?
+1 This is a bit scary and may indicate a bigger problem. Would adding numpy literal support to BigQueryCompiler.visit_NonNullLiteral allow us to avoid this?
Moved the changes into the ML SQL generation code in ml/sql.py. Tests for the other core data APIs didn't raise the same complaints, so BigQueryCompiler.visit_NonNullLiteral remains unaffected.
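The conversion discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `to_python_scalar` and `format_sql_literals` are hypothetical names, and the sketch assumes only that NumPy scalars expose an `.item()` method returning the equivalent built-in Python value. The motivation is that NumPy 2 changed scalar `repr` output (for example `np.float64(1.5)` instead of `1.5`), which can leak into generated SQL wherever a value is formatted via `repr`.

```python
def to_python_scalar(value):
    """Return a built-in Python scalar for NumPy-style scalars.

    Assumption: NumPy scalars provide ``.item()``; anything without a
    callable ``item`` attribute is passed through unchanged.
    """
    item = getattr(value, "item", None)
    return item() if callable(item) else value


def format_sql_literals(values):
    """Hypothetical helper: render values as a comma-separated literal list."""
    return ", ".join(repr(to_python_scalar(v)) for v in values)
```

With NumPy installed, `to_python_scalar(np.float64(1.5))` returns the Python float `1.5`, so the rendered text no longer depends on the NumPy version's scalar `repr`.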
return (
    value.fill_null(ibis_types.literal("$NULL_SENTINEL$"))
    if hasattr(value, "fill_null")
    else value.fillna(ibis_types.literal("$NULL_SENTINEL$"))
)
What is going on here? This seems fragile - what types no longer have fillna?
fillna was renamed to fill_null in ibis-project/ibis#9300 (ibis 9.1.0)
Yes, fillna was renamed to fill_null in Ibis 9.1.0. Using fillna with this version or later will trigger a deprecation warning. However, we can't use fill_null because it's not available in Ibis 9.0.0. Since Ibis 9.1.0 and later dropped support for Python 3.9, we must use Ibis 9.0.0 to maintain compatibility with Python 3.9.
Hmm, I do hate to have hasattr checks in our code. Should we just use fillna and suppress warnings?
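The alternative suggested here (always call `fillna` and silence the rename warning) could look roughly like the sketch below. `fill_null_compat` is a hypothetical name, and the warning categories are an assumption: the thread does not say which category Ibis emits, so the sketch ignores both `DeprecationWarning` and `FutureWarning`. It also only works for as long as `fillna` still exists in the installed Ibis version.

```python
import warnings


def fill_null_compat(value, replacement):
    """Call the legacy ``fillna`` method while silencing rename warnings."""
    with warnings.catch_warnings():
        # Assumption: the rename warning is one of these two categories.
        warnings.simplefilter("ignore", category=DeprecationWarning)
        warnings.simplefilter("ignore", category=FutureWarning)
        return value.fillna(replacement)
```

The filters are scoped to the `with` block, so warnings raised elsewhere in the program are unaffected.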
left = x.cast(ibis_dtypes.str).fillna(ibis_types.literal("$NULL_SENTINEL$"))
right = y.cast(ibis_dtypes.str).fillna(ibis_types.literal("$NULL_SENTINEL$"))
literal = ibis_types.literal("$NULL_SENTINEL$")
if hasattr(x, "fill_null"):
What is causing the fillna/fill_null divide?
Ibis 9.0.0 -> Ibis 9.1.0. We need Ibis 9.0.0 for Python 3.9 support, at least until we vendor the ibis expr APIs too, which we'll want to do sometime between now and bigframes 2.0.
# In NumPy versions 2 and later, `np.floor` and `np.ceil` now produce integer
# outputs for the "int64_col" column.
if opname in ["floor", "ceil"] and isinstance(
    pd_result["int64_col"].dtypes, pd.Int64Dtype
):
Seems like a big behavior change?! I thought numpy 2 was backwards compatible?
There were behavior changes. See: https://numpy.org/doc/stable/numpy_2_0_migration_guide.html#numpy-2-migration-guide
I suspect it may relate to this change: https://numpy.org/neps/nep-0050-scalar-promotion.html#nep50
Type promotion is no longer value dependent, only dtype dependent.
Should we match the new behavior then? dtype-dependent is something we can actually emulate perfectly, unlike value-dependency
> Should we match the new behavior then? dtype-dependent is something we can actually emulate perfectly, unlike value-dependency

Yes. Maybe bigframes 2.0? Could you file an issue, @TrevorBergeron?
third_party/bigframes_vendored/ibis/backends/sql/compilers/bigquery/__init__.py
Outdated
Reading through some of the review here, I would really love for y'all to chime in more proactively on the Ibis issue tracker. I understand that contributing to Ibis isn't top priority, but it might help ease some of the pain here if we had some advance notice about any difficulty a particular change might create. I also realize that may not be knowable until you actually do the work to upgrade to a later version of Ibis. Generally speaking, if there's a decision that would cause lots of headaches, or that we expect might, we'd like to discuss and get feedback on it.
LGTM, thanks!
Thanks for your input, @cpcloud. We appreciate you taking the time to engage with us on this thread.
This change updates to ibis-framework 9.x and a newer version of SQLGlot. The Ibis upgrade also removes previous version restrictions on certain packages. Specifically, it expands the allowable versions of pyarrow (from 15.0.2 to 17.0.0) and numpy (from 1.26.4 to 2.1.1).

Fixes internal issue 350749011 🦕