chore: update comments and tests for label counts. #1182
base: main
Changes from all commits
@@ -348,7 +348,10 @@ def export_gcs(
             export_options=dict(export_options),
         )
         job_config = bigquery.QueryJobConfig()
-        bq_io.add_labels(job_config, api_name=f"dataframe-to_{format.lower()}")
+
+        # Note: Ensure no additional labels are added to job_config after this point,
+        # as `add_and_trim_labels` ensures the label count does not exceed 64.
+        bq_io.add_and_trim_labels(job_config, api_name=f"dataframe-to_{format.lower()}")
         export_job = self.bqclient.query(export_data_statement, job_config=job_config)
         self._wait_on_job(export_job)
         return query_job
@@ -358,7 +361,9 @@ def dry_run(
     ) -> bigquery.QueryJob:
         sql = self.to_sql(array_value, ordered=ordered)
         job_config = bigquery.QueryJobConfig(dry_run=True)
-        bq_io.add_labels(job_config)
+        # Note: Ensure no additional labels are added to job_config after this point,
+        # as `add_and_trim_labels` ensures the label count does not exceed 64.
+        bq_io.add_and_trim_labels(job_config)
         query_job = self.bqclient.query(sql, job_config=job_config)
         _ = query_job.result()
         return query_job

Inline review comment on the bq_io.add_and_trim_labels(job_config) line: I don't think we actually create a job resource in the backend when we do a dry run, so this might actually lose analytics. I think probably better not to call …

Inline review comment on the _ = query_job.result() line: I'm pretty sure …
@@ -486,8 +491,10 @@ def _run_execute_query(
         if not self.strictly_ordered:
             job_config.labels["bigframes-mode"] = "unordered"

-        # Note: add_labels is global scope which may have unexpected effects
-        bq_io.add_labels(job_config, api_name=api_name)
+        # Note: add_and_trim_labels is global scope which may have unexpected effects
+        # Ensure no additional labels are added to job_config after this point,
+        # as `add_and_trim_labels` ensures the label count does not exceed 64.
+        bq_io.add_and_trim_labels(job_config, api_name=api_name)
         try:
             query_job = self.bqclient.query(sql, job_config=job_config)
             return (
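For reference, the comments added in the diff describe the new helper's contract: once add_and_trim_labels has run, job_config carries at most 64 labels (BigQuery's per-job limit), so nothing should touch the labels afterwards. A minimal sketch of a helper with that behavior follows; the actual bigframes implementation, label keys, and trimming policy may differ.

from typing import Optional

from google.cloud import bigquery

MAX_LABELS_COUNT = 64  # BigQuery's per-job label limit, per the diff comments


def add_and_trim_labels(
    job_config: bigquery.QueryJobConfig, api_name: Optional[str] = None
) -> None:
    """Attach BigFrames telemetry labels, then drop any beyond the limit (sketch only)."""
    labels = dict(job_config.labels)
    if api_name is not None:
        labels["bigframes-api"] = api_name  # label key assumed for illustration
    # Trim to the limit so the query doesn't fail with "too many labels".
    if len(labels) > MAX_LABELS_COUNT:
        labels = dict(list(labels.items())[:MAX_LABELS_COUNT])
    job_config.labels = labels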
The repetition of this note at multiple places makes me think: can we not call start_query_with_client instead of bqclient.query from everywhere? That way any preprocessing (including labels) would be centralized and we would have only one note like this in our code.

The logic after bqclient.query in start_query_with_client doesn't seem to match the other functions, so this may not work. We could make a new function that adds the labels and executes the query, to replace bqclient.query; what do you think? This was my original thought, but I'm not sure it's necessary to add a new function just for two lines of code.
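A rough sketch of that idea, with hypothetical names (query_with_labels is not an existing bigframes function), could look like this:

from typing import Optional

from google.cloud import bigquery

import bigframes.session._io.bigquery as bq_io  # module path assumed from the diff's bq_io alias


def query_with_labels(
    bqclient: bigquery.Client,
    sql: str,
    job_config: Optional[bigquery.QueryJobConfig] = None,
    api_name: Optional[str] = None,
) -> bigquery.QueryJob:
    """Hypothetical wrapper: centralize label handling, then start the query."""
    job_config = job_config or bigquery.QueryJobConfig()
    # Same helper the PR introduces: attach BigFrames labels and keep the
    # count within the 64-label limit, so call sites never touch labels after this.
    bq_io.add_and_trim_labels(job_config, api_name=api_name)
    return bqclient.query(sql, job_config=job_config)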
Not a great refactoring but better than leaving repetitive notes IMHO.
Or we could make a hook in ClientsProvider._create_bigquery_client, something like the hook sketched below. @tswast, would this be too hacky? What would be the most pythonic way to ensure labels stay within the max limit?
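The snippet from this comment is not preserved in the thread; the following is an illustrative sketch of the kind of hook being described, not the original code, and every name except ClientsProvider._create_bigquery_client is an assumption.

import functools

from google.cloud import bigquery

MAX_LABELS_COUNT = 64  # BigQuery's per-job label limit, per the diff comments


def _create_bigquery_client(self) -> bigquery.Client:
    client = bigquery.Client()  # real construction (project, credentials) elided

    original_query = client.query

    @functools.wraps(original_query)
    def query_with_trimmed_labels(query, job_config=None, **kwargs):
        if job_config is not None and len(job_config.labels) > MAX_LABELS_COUNT:
            # Illustrative policy only: keep the first 64 labels, drop the rest.
            job_config.labels = dict(list(job_config.labels.items())[:MAX_LABELS_COUNT])
        return original_query(query, job_config=job_config, **kwargs)

    # The "hook": every query issued through this client gets its labels
    # trimmed, regardless of which code path built the job_config.
    client.query = query_with_trimmed_labels
    return client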
I would discourage us from monkey patching like this, though it should work in Python. I think your first instinct of using our central helper function makes the most sense to me.
I don't fully understand what you mean by … We should be able to put start_query_with_client in a try/except block. Also, it's probably a bug that we aren't showing that a query job is running here.
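A sketch of what that could look like for the dry-run path, assuming start_query_with_client's signature and return value (they are not shown in this thread), might be:

import google.api_core.exceptions

from google.cloud import bigquery

import bigframes.session._io.bigquery as bq_io  # module path assumed


def dry_run(self, array_value, ordered: bool = True) -> bigquery.QueryJob:
    sql = self.to_sql(array_value, ordered=ordered)
    job_config = bigquery.QueryJobConfig(dry_run=True)
    try:
        # The central helper would own label handling (and progress reporting),
        # so the per-call-site "no labels after this point" note is not needed here.
        _, query_job = bq_io.start_query_with_client(
            self.bqclient, sql, job_config=job_config
        )
    except google.api_core.exceptions.GoogleAPICallError:
        # Re-raise (or translate) however the surrounding code already handles
        # query failures; shown only to illustrate the try/except placement.
        raise
    return query_job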