Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fix read_gbq_function issue in dataframe apply method #1174

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

jialuoo
Copy link
Contributor

@jialuoo jialuoo commented Nov 25, 2024

See the description/example from b/379750234.

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@jialuoo jialuoo requested a review from shobsi November 25, 2024 23:52
@jialuoo jialuoo self-assigned this Nov 25, 2024
@jialuoo jialuoo requested review from a team as code owners November 25, 2024 23:52
@product-auto-label product-auto-label bot added size: xs Pull request size is extra small. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Nov 25, 2024
bigframes/dataframe.py Show resolved Hide resolved
bigframes/dataframe.py Show resolved Hide resolved
@product-auto-label product-auto-label bot added size: s Pull request size is small. and removed size: xs Pull request size is extra small. labels Dec 2, 2024
tests/system/small/test_dataframe.py Outdated Show resolved Hide resolved
tests/system/small/test_dataframe.py Outdated Show resolved Hide resolved
bigframes/dataframe.py Show resolved Hide resolved
@@ -3674,7 +3674,11 @@ def apply(self, func, *, axis=0, args: typing.Tuple = (), **kwargs):
return result_series

# Per-column apply
results = {name: func(col, *args, **kwargs) for name, col in self.items()}
if hasattr(func, "bigframes_remote_function"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TrevorBergeron could you please review if this element-wise proxy for series-wise application doesn't have any blind sight?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't dataframe.apply for column or series-wise application? Why do we need to support applying as an element-wise operator? isn't this what dataframe.map is for?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For scalar ops the end result is the same. I think we don't lose anything by being lax for user convenience. Note that this API has other non-pandas-like extended behavior

# This is a special case where we are providing not-pandas-like

@shobsi shobsi requested a review from TrevorBergeron December 5, 2024 20:53

func_ref = session.read_gbq_function("bqutil.fn.cw_lower_case_ascii_only")
bf_result = bdf.apply(func_ref).to_pandas()
# The str.lower() method has the same functionality as func_ref.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true for ASCII only, we should say that in the comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: s Pull request size is small.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants