feat: support array output in remote_function #1057

Open
wants to merge 48 commits into
base: main
from

Conversation

shobsi
Contributor

@shobsi shobsi commented Oct 7, 2024

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)
    • remote_function: screen/5rMtCZVaUYKdqxP
    • Series.apply: screen/9HkKMuWxMvbbPgf
    • DataFrame.apply: screen/BoXH9A7d4hGpETu

Fixes internal issue 298876217 🦕

This is a feature request to support use cases such as creating custom
feature vectors, embeddings, etc.
@shobsi shobsi requested review from a team as code owners October 7, 2024 17:28
@shobsi shobsi requested a review from GarrettWu October 7, 2024 17:28
@product-auto-label product-auto-label bot added the size: m Pull request size is medium. label Oct 7, 2024
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. label Oct 7, 2024
@shobsi shobsi marked this pull request as draft October 8, 2024 00:42
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Oct 10, 2024
@shobsi shobsi requested review from TrevorBergeron and removed request for GarrettWu October 24, 2024 20:20
@shobsi shobsi marked this pull request as ready for review October 24, 2024 20:20
@shobsi shobsi added the owlbot:run Add this label to trigger the Owlbot post processor. label Oct 29, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Oct 29, 2024
Comment on lines 1376 to 1381
output_type (type, default None):
Python type equivalent to the BigQuery return type. This is used
to coerce the return value of the BigQuery function to BigFrames
type. For example, if the BigQuery function returns a JSON
serialized array of integers such as "[1, 2, 3]", then user can
set `output_type=list[int]` to read it as array of integers [1, 2, 3].
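
To illustrate the coercion this docstring describes, here is a minimal standalone sketch in plain Python. It does not use the BigFrames API: `coerce_json_array` is a hypothetical helper, and the only assumption is that the function returns its array as a JSON-serialized string.

```python
import json
from typing import get_args, get_origin


def coerce_json_array(value: str, output_type: type) -> list:
    """Parse a JSON-serialized array and cast each element.

    Hypothetical helper for illustration only; not part of bigframes.
    """
    # Only parameterized list types such as list[int] are handled here.
    if get_origin(output_type) is not list:
        raise TypeError(f"expected a list[...] type, got {output_type!r}")
    (element_type,) = get_args(output_type)

    parsed = json.loads(value)
    if not isinstance(parsed, list):
        raise ValueError(f"expected a JSON array, got {type(parsed).__name__}")

    # Cast every element to the declared element type.
    return [element_type(item) for item in parsed]


result = coerce_json_array("[1, 2, 3]", list[int])
```

Here `result` is a Python list of integers rather than the original string.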
Contributor

I don't know if this makes sense for existing BQ functions. It's more helpful for local Python functions, where the output type is otherwise ambiguous. Users should call astype or similar methods to coerce the result explicitly.

Contributor Author

@shobsi shobsi Nov 15, 2024

This would be useful when reading back a remote function that was created with bigframes. When users called the function right after creating it, their code did not need to call astype, so we want that code to keep working after the function is read back. The test test_df_apply_axis_1_array_output captures this use case.

Contributor Author

Removed this param in the latest change, PTAL

bigframes/series.py (resolved)
ibis_output_type = ibis_signature.output_type
if output_type is not None:
    if not isinstance(ibis_signature.output_type, ibis.expr.datatypes.String):
        raise TypeError(
Contributor Author

@tswast mentioned offline: "I would find that to be pretty annoying as a user. Better to make sure it matches and maybe warn that it's not necessary"
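
One way to read that suggestion, sketched in plain Python: accept a redundant `output_type` when it matches the function's actual output type, warn that it is unnecessary, and raise only on a real mismatch. The name `resolve_output_type` and the use of built-in Python types instead of ibis datatypes are assumptions made for illustration; this is not the code in this PR.

```python
import warnings
from typing import Optional


def resolve_output_type(actual_type: type, requested_type: Optional[type]) -> type:
    """Pick the effective output type (hypothetical helper, not bigframes code)."""
    if requested_type is None:
        return actual_type

    if actual_type is str:
        # A string result may hold a serialized array, so honor the request.
        return requested_type

    if requested_type != actual_type:
        raise TypeError(
            f"output_type {requested_type!r} does not match "
            f"the function's output type {actual_type!r}"
        )

    # The types agree: the argument is redundant, so warn instead of failing.
    warnings.warn(
        "output_type matches the function's output type and is not necessary",
        UserWarning,
        stacklevel=2,
    )
    return actual_type
```

With this behavior, passing a matching type still works and produces only a warning, while a mismatched type fails early with a clear error.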

Contributor Author

Removed output_type from read_gbq_function, PTAL

Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.
3 participants