-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support array output in remote_function
#1057
base: main
Are you sure you want to change the base?
Conversation
This is feature request to support use cases like creating custom feature vectors, embeddings etc.
…tr array outputs
bigframes/session/__init__.py
Outdated
output_type (type, default None): | ||
Python type equivalent to the BigQuery return type. This is used | ||
to coerce the return value of the BigQuery function to BigFrames | ||
type. For example, if the BigQuery function returns a JSON | ||
serialized array of integers such as "[1, 2, 3]", then user can | ||
set `output_type=list[int]` to read it as array of integers [1, 2, 3]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this makes sense for existing bq functions. Its more helpful for local python functions where the output type is otherwise ambiguous. Users should call astype or similar methods to explicitly coerce the result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The place where it would be useful is reading back a remote function that was created using bigframes. When the user used it after creating, their code didn't have to call astype, so we want to give them a way to have that code continue to work after reading back. The test test_df_apply_axis_1_array_output
captures this usecase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this param in the latest change, PTAL
ibis_output_type = ibis_signature.output_type | ||
if output_type is not None: | ||
if not isinstance(ibis_signature.output_type, ibis.expr.datatypes.String): | ||
raise TypeError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tswast mentioned offline: "I would find that to be pretty annoying as a user. Better to make sure it matches and maybe warn that it's not necessary"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed output_type from read_gbq_function, PTAL
…tput_type in read_gbq_function
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
remote_function
: screen/5rMtCZVaUYKdqxPSeries.apply
: screen/9HkKMuWxMvbbPgfDataFrame.apply
: screen/BoXH9A7d4hGpETuFixes internal issue 298876217 🦕