deps: update to ibis-framework 9.x and newer sqlglot #827

tswast · 2024-07-08T16:28:17Z

This change updates to ibis-framework 9.x and a newer version of SQLGlot. The Ibis upgrade also removes previous version restrictions on certain packages. Specifically, it expands the allowable versions of pyarrow (from 15.0.2 to 17.0.0) and numpy (from 1.26.4 to 2.1.1).

Fixes internal issue 350749011
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)

tswast · 2024-07-09T15:48:03Z

Getting

pytest tests/system/small/test_dataframe.py::test_dataframe_agg_int_single_string
...
../../envs/bigframes/lib/python3.11/site-packages/sqlglot/dialects/bigquery.py:53: in <listcomp>
    exp.PropertyEQ(this=exp.to_identifier(name), expression=fld)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

name = Column(
  this=Identifier(this=col_18, quoted=True)), quoted = None, copy = True

    def to_identifier(name, quoted=None, copy=True):
        """Builds an identifier.
    
        Args:
            name: The name to turn into an identifier.
            quoted: Whether to force quote the identifier.
            copy: Whether to copy name if it's an Identifier.
    
        Returns:
            The identifier ast node.
        """
    
        if name is None:
            return None
    
        if isinstance(name, Identifier):
            identifier = maybe_copy(name, copy)
        elif isinstance(name, str):
            identifier = Identifier(
                this=name,
                quoted=not SAFE_IDENTIFIER_RE.match(name) if quoted is None else quoted,
            )
        else:
>           raise ValueError(f"Name needs to be a string or an Identifier, got: {name.__class__}")
E           ValueError: Name needs to be a string or an Identifier, got: <class 'sqlglot.expressions.Column'>

../../envs/bigframes/lib/python3.11/site-packages/sqlglot/expressions.py:6798: ValueError

for DataFrame.agg and related tests like describe. Might have something to do with how we're unnesting memtables. Will have to investigate further.

tswast · 2024-07-15T22:00:15Z

Re: FAILED tests/system/small/test_pandas.py::test_get_dummies_dataframe[kwargs2] - AssertionError: DataFrame.iloc[:, 11] (column name="time_col.11:14:34.701606") are different

Looks like the time scalar is losing microsecond precision.

  COALESCE(`t0`.`time_col` = time(11, 14, 34), FALSE) AS `col_15`,
  COALESCE(`t0`.`time_col` = time(11, 41, 43), FALSE) AS `col_17`,
  COALESCE(`t0`.`time_col` = time(12, 0, 0), FALSE) AS `col_19`,
  COALESCE(`t0`.`time_col` = time(15, 16, 17), FALSE) AS `col_21`,
  COALESCE(`t0`.`time_col` = time(23, 59, 59), FALSE) AS `col_23`,

Edit: looks like an ibis or maybe sqlglot bug. Filed ibis-project/ibis#9609
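To illustrate the precision loss, here is a minimal sketch using only the Python standard library. Both helpers are hypothetical (neither is an ibis or sqlglot function): `truncated_time_sql` mimics the generated SQL above, which keeps only hour, minute, and second, while `time_literal_sql` shows one way a literal could keep the fractional seconds.

```python
import datetime


def truncated_time_sql(value: datetime.time) -> str:
    """Mimic the generated SQL above: only hour, minute, and second survive."""
    return f"time({value.hour}, {value.minute}, {value.second})"


def time_literal_sql(value: datetime.time) -> str:
    """Hypothetical alternative: a TIME literal that keeps microseconds."""
    return f"TIME '{value.isoformat(timespec='microseconds')}'"


original = datetime.time(11, 14, 34, 701606)
truncated = datetime.time(original.hour, original.minute, original.second)

print(truncated_time_sql(original))  # time(11, 14, 34)
print(time_literal_sql(original))    # TIME '11:14:34.701606'
print(original == truncated)         # False
```

Because the truncated value no longer equals the original, an equality comparison against `time_col` built from it cannot match the row it was derived from, which is consistent with the failing `get_dummies` column.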

tswast · 2024-07-16T16:17:15Z

Marking as do not merge because, to fix Python 3.9 support, we'll need to vendor ibis 9.x into third_party.

tswast · 2024-09-12T18:15:36Z

bigframes/core/compile/aggregate_compiler.py

@@ -31,6 +31,17 @@
 scalar_compiler = scalar_compilers.scalar_op_compiler


+# TODO(swast): We can remove this if ibis adds general approx_quantile
+# See: https://github.com/ibis-project/ibis/issues/9541


FYI (no change necessary for this PR): An approximate quantile node type has been merged into ibis. ibis-project/ibis#9881

Once we start vendoring the Ibis expressions in addition to the compiler, we can take advantage of that (if we want).

tests/system/small/test_dataframe.py

TrevorBergeron · 2024-09-13T00:58:29Z

bigframes/ml/preprocessing.py

+                column_min = X[column].min()
+                column_max = X[column].max()
+
+                # Use Python value rather than Numpy value to serialization.


This seems a bit fragile - should the sql generator be more strict in its formatting instead?

+1 This is a bit scary and may indicate a bigger problem. Would adding numpy literal support to BigQueryCompiler.visit_NonNullLiteral allow us to avoid this?

Moved the changes into the ML SQL generation code in ml/sql.py. Tests for the other core data APIs didn't raise the same complaints, so BigQueryCompiler.visit_NonNullLiteral remains unaffected.
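For context, a minimal sketch of why converting to a Python value before formatting can matter. It assumes the problem is the scalar's text form: NumPy 2 scalar reprs include the type name (for example `np.float64(0.5)`), which is not valid inside a SQL expression. `FakeNumpyFloat` is a stand-in so the sketch runs without NumPy, and `to_python_scalar` and `sql_float_literal` are hypothetical helpers, not the code in ml/sql.py.

```python
import math


class FakeNumpyFloat:
    """Stand-in for a NumPy 2 scalar so the sketch runs without NumPy."""

    def __init__(self, value: float) -> None:
        self._value = value

    def __repr__(self) -> str:
        # NumPy 2 scalar reprs include the type name, e.g. np.float64(0.5).
        return f"np.float64({self._value!r})"

    def item(self) -> float:
        # Real NumPy scalars expose .item() to get the plain Python value.
        return self._value


def to_python_scalar(value):
    """Hypothetical helper: unwrap anything with .item() to a Python value."""
    item = getattr(value, "item", None)
    return item() if callable(item) else value


def sql_float_literal(value) -> str:
    """Hypothetical helper: render a finite float for a SQL expression."""
    python_value = float(to_python_scalar(value))
    if not math.isfinite(python_value):
        raise ValueError(f"cannot render non-finite value: {python_value!r}")
    return repr(python_value)


column_min = FakeNumpyFloat(0.5)
print(repr(column_min))               # np.float64(0.5)
print(sql_float_literal(column_min))  # 0.5
```

Doing the conversion in one place in the SQL generator, rather than at each call site, is the stricter-formatting option raised above.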

TrevorBergeron · 2024-09-13T00:59:25Z

bigframes/core/compile/single_column.py

+    return (
+        value.fill_null(ibis_types.literal("$NULL_SENTINEL$"))
+        if hasattr(value, "fill_null")
+        else value.fillna(ibis_types.literal("$NULL_SENTINEL$"))
+    )


What is going on here? This seems fragile - what types no longer have fillna?

fillna was renamed to fill_null in ibis-project/ibis#9300 (ibis 9.1.0)

Yes, fillna was renamed to fill_null in Ibis 9.1.0. Using fillna with this version or later will trigger a deprecation warning. However, we can't use fill_null because it's not available in Ibis 9.0.0. Since Ibis 9.1.0 and later dropped support for Python 3.9, we must use Ibis 9.0.0 to maintain compatibility with Python 3.9.

Hmm, I do hate to have hasattr checks in our code. Should we just use fillna and suppress warnings?
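A minimal sketch of the "use fillna and suppress warnings" option, using only the standard library. `OldStyleValue` and `NewStyleValue` are stand-ins for Ibis 9.0.0 and 9.1.0+ expressions, and `fill_with_sentinel` is a hypothetical helper. The warning category is an assumption; it would need to match whatever category Ibis actually emits.

```python
import warnings


class OldStyleValue:
    """Stand-in for an expression that only has fillna (as in Ibis 9.0.0)."""

    def fillna(self, fill):
        return ("filled", fill)


class NewStyleValue:
    """Stand-in for the newer API: fill_null exists and fillna warns."""

    def fill_null(self, fill):
        return ("filled", fill)

    def fillna(self, fill):
        # Assumed category; the real library may emit a different one.
        warnings.warn(
            "fillna is deprecated; use fill_null",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.fill_null(fill)


def fill_with_sentinel(value, sentinel="$NULL_SENTINEL$"):
    """Always call fillna, silencing only the deprecation warning."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=DeprecationWarning)
        return value.fillna(sentinel)


print(fill_with_sentinel(OldStyleValue()))  # ('filled', '$NULL_SENTINEL$')
print(fill_with_sentinel(NewStyleValue()))  # ('filled', '$NULL_SENTINEL$')
```

This keeps a single code path for both versions, at the cost of breaking outright if a future release removes fillna, whereas the hasattr check would keep working.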

TrevorBergeron · 2024-09-13T01:00:14Z

bigframes/core/compile/scalar_op_compiler.py

-    left = x.cast(ibis_dtypes.str).fillna(ibis_types.literal("$NULL_SENTINEL$"))
-    right = y.cast(ibis_dtypes.str).fillna(ibis_types.literal("$NULL_SENTINEL$"))
+    literal = ibis_types.literal("$NULL_SENTINEL$")
+    if hasattr(x, "fill_null"):


What is causing the fillna/fill_null divide?

Ibis 9.0.0 -> Ibis 9.1.0. We need Ibis 9.0.0 for Python 3.9 support, at least until we vendor the ibis expr APIs too, which we'll want to do sometime between now and bigframes 2.0.

TrevorBergeron · 2024-09-13T01:31:25Z

tests/system/small/test_numpy.py

+    # In NumPy versions 2 and later, `np.floor` and `np.ceil` now produce integer
+    # outputs for the "int64_col" column.
+    if opname in ["floor", "ceil"] and isinstance(
+        pd_result["int64_col"].dtypes, pd.Int64Dtype
+    ):


Seems like a big behavior change?! I thought numpy 2 was backwards compatible?

There were behavior changes. See: https://numpy.org/doc/stable/numpy_2_0_migration_guide.html#numpy-2-migration-guide

I suspect it may relate to this change: https://numpy.org/neps/nep-0050-scalar-promotion.html#nep50

Type promotion is no longer value dependent, only dtype dependent.

Should we match the new behavior then? dtype-dependent is something we can actually emulate perfectly, unlike value-dependency

Yes. Maybe bigframes 2.0? Could you file an issue @TrevorBergeron ?

third_party/bigframes_vendored/ibis/backends/sql/compilers/bigquery/__init__.py

tswast · 2024-09-13T16:12:09Z

bigframes/ml/preprocessing.py

+                column_min = X[column].min()
+                column_max = X[column].max()
+
+                # Use Python value rather than Numpy value to serialization.


+1 This is a bit scary and may indicate a bigger problem. Would adding numpy literal support to BigQueryCompiler.visit_NonNullLiteral allow us to avoid this?

tswast · 2024-09-13T16:18:14Z

tests/system/small/test_numpy.py

+    # In NumPy versions 2 and later, `np.floor` and `np.ceil` now produce integer
+    # outputs for the "int64_col" column.
+    if opname in ["floor", "ceil"] and isinstance(
+        pd_result["int64_col"].dtypes, pd.Int64Dtype
+    ):


There were behavior changes. See: https://numpy.org/doc/stable/numpy_2_0_migration_guide.html#numpy-2-migration-guide

I suspect it may relate to this change: https://numpy.org/neps/nep-0050-scalar-promotion.html#nep50

Type promotion is no longer value dependent, only dtype dependent.

cpcloud · 2024-09-13T18:28:01Z

Reading through some of the review here, I would really love for y'all to chime in more proactively on the Ibis issue tracker.

I understand that contributing to Ibis isn't top priority, but it might help ease some of the pain here if we had some advance notice about any difficulty a particular change might create. I also realize that may not be knowable until you actually do the work to upgrade to a later version of Ibis.

Generally speaking, if there's a decision that would cause lots of headaches, or that we expect might, we'd like to discuss and get feedback on it.

tswast

LGTM, thanks!

chelsea-lin · 2024-09-13T21:02:39Z

Thanks for your input, @cpcloud. We appreciate you taking the time to engage with us on this thread.
We generally aim to report issues that are relevant to the wider Ibis community directly in the issue tracker, so everyone can benefit from the discussion and resolution. However, we also recognize that some issues might be specific to our use case, and in those cases, we'll handle them internally.
We value your insights and want to ensure we're collaborating effectively. Are there any particular issues you've observed in our usage of Ibis that you'd like to discuss further? Your feedback helps us to better understand your priorities and identify areas for improvement.

deps: update to ibis-framework 9.x and newer sqlglot

b4e6a50

product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Jul 8, 2024

tswast added 6 commits July 8, 2024 21:00

update sqlglot and ibis

f1ce09d

Merge remote-tracking branch 'origin/main' into b350749011-ibis-9

3c0cae0

bump minimum pandas

d224a52

bump pyarrow

28b6a31

fix bfill and ffill

bb68b2b

Merge remote-tracking branch 'origin/main' into b350749011-ibis-9

05550ac

tswast added 4 commits July 9, 2024 15:49

nearly implement describe

d5622b6

remove remaining reference to vendored_ibis_ops.ApproximateMultiQuantile

3596edd

support ToJsonString

32c2ab6

partial support for quantile

5e0c1e7

product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Jul 9, 2024

tswast added 6 commits July 12, 2024 21:00

fix inmemorytable

d877261

Merge remote-tracking branch 'origin/main' into b350749011-ibis-9

3d63801

Merge remote-tracking branch 'origin/main' into b350749011-ibis-9

f8d2864

fixed Series.explode

847f459

nearly fix to_datetime

6a0bedc

remove tests I added

9d56ee7

tswast added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Jul 16, 2024

tswast added 5 commits July 16, 2024 16:20

patch for python 3.9 support

fc84cb8

Merge remote-tracking branch 'origin/main' into b350749011-ibis-9

bb2408f

fix unit tests

de32335

fix explode with time type

e7dd60f

Merge remote-tracking branch 'origin/main' into b350749011-ibis-9

8672495

chelsea-lin and others added 2 commits September 10, 2024 18:17

Merge branch 'b350749011-ibis-9' of github.com:googleapis/python-bigq…

e8e5e97

…uery-dataframes into b350749011-ibis-9

🦉 Updates from OwlBot post-processor

822180a

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

product-auto-label bot added size: l Pull request size is large. and removed size: xl Pull request size is extra large. labels Sep 10, 2024

chelsea-lin and others added 7 commits September 10, 2024 20:55

fix unit-test compile errors

6f054f4

remove unused ibis codes

c7167db

fix fillna deprecated warning

555453b

add _remove_null_ordering_from_unsupported_window back to fix test_pr…

0762299

…ecision_score etc ml tests

Merge branch 'main' of github.com:googleapis/python-bigquery-datafram…

8d36966

…es into b350749011-ibis-9

fix is_monotonic_decreasing test

4c90516

Merge branch 'main' into b350749011-ibis-9

ad122af

tswast mentioned this pull request Sep 11, 2024

feat: add bigframes.ml.compose.SQLScalarColumnTransformer to create custom SQL-based transformations #955

Merged

4 tasks

chelsea-lin and others added 5 commits September 12, 2024 04:44

Merge branch 'main' of github.com:googleapis/python-bigquery-datafram…

ab84436

…es into b350749011-ibis-9

fix explode after merge

d425aa9

fix numpy on remote function test

fa46553

Merge branch 'main' of github.com:googleapis/python-bigquery-datafram…

4b3fb0f

…es into b350749011-ibis-9

🦉 Updates from OwlBot post-processor

f175596

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

tswast commented Sep 12, 2024

TrevorBergeron reviewed Sep 13, 2024

tswast commented Sep 13, 2024

chelsea-lin added 2 commits September 13, 2024 20:38

ml numpy sql generations

f3a43b1

Merge branch 'main' of github.com:googleapis/python-bigquery-datafram…

8ef190f

…es into b350749011-ibis-9

tswast commented Sep 13, 2024

TrevorBergeron approved these changes Sep 13, 2024

chelsea-lin merged commit 89ea44f into main Sep 13, 2024
20 of 23 checks passed

chelsea-lin deleted the b350749011-ibis-9 branch September 13, 2024 22:22

release-please bot mentioned this pull request Sep 13, 2024

chore(main): release 1.18.0 #986

Merged
