Add retry factory to consolidate retry strategies across dbt-bigquery #1395

Merged on Nov 20, 2024 (43 commits).

Commits:
2b01804 fix imports (mikealfare, Oct 31, 2024)
8b45594 create a retry factory and move relevant objects from connections (mikealfare, Nov 1, 2024)
391099d add on_error method for deadline retries (mikealfare, Nov 1, 2024)
7872a58 remove dependency on retry_and_handle from cancel_open (mikealfare, Nov 1, 2024)
42a8869 remove dependencies on retry_and_handle (mikealfare, Nov 1, 2024)
900dcac remove timeout methods from connection manager (mikealfare, Nov 1, 2024)
81bfa0c add retry to get_bq_table (mikealfare, Nov 4, 2024)
3e32872 fix mocks in unit tests (mikealfare, Nov 4, 2024)
89e2a50 rebase on main (mikealfare, Nov 5, 2024)
3f79642 reorder this tuple to make the pr review easier to understand (mikealfare, Nov 5, 2024)
f300080 move client factory to credentials module so that on_error can be mov… (mikealfare, Nov 6, 2024)
c3065e5 move on_error factory to retry module (mikealfare, Nov 6, 2024)
ad74114 move client factories from python_submissions module to credentials m… (mikealfare, Nov 6, 2024)
9029c49 create a clients module (mikealfare, Nov 6, 2024)
bc0fbea retry all client factories by default (mikealfare, Nov 6, 2024)
9a9f87e move polling from manual check in python_submissions module into retr… (mikealfare, Nov 6, 2024)
136ea77 move load_dataframe logic from adapter to connection manager, use the… (mikealfare, Nov 6, 2024)
90d5308 move upload_file logic from adapter to connection manager, use the bu… (mikealfare, Nov 6, 2024)
9211e1c move the retry to polling for done instead of create (mikealfare, Nov 6, 2024)
e90c24d fix broken import in tests from code migration (mikealfare, Nov 6, 2024)
a2db35b align new retries with original methods, simplify retry factory (mikealfare, Nov 7, 2024)
b8408c2 fix seed load result (mikealfare, Nov 7, 2024)
5b896ee create a method for the dataproc endpoint (mikealfare, Nov 7, 2024)
43c10f1 add some readability updates (mikealfare, Nov 7, 2024)
4256682 add some readability updates (mikealfare, Nov 7, 2024)
5644509 add some readability updates, simplify submit methods (mikealfare, Nov 7, 2024)
df2971b make imports explicit, remove unused constant (mikealfare, Nov 7, 2024)
0beaac6 changelog (mikealfare, Nov 7, 2024)
6e2f4b4 add community member who contributed a solution and research to the c… (mikealfare, Nov 9, 2024)
b560554 Merge branch 'main' into add-retry-factory (mikealfare, Nov 18, 2024)
6354483 Merge branch 'main' into add-retry-factory (colin-rogers-dbt, Nov 19, 2024)
f72da43 update names in clients.py to follow the naming convention (mikealfare, Nov 19, 2024)
9fb25bc update names in connections.py to follow the naming convention (mikealfare, Nov 19, 2024)
e99d857 update names in credentials.py to follow the naming convention (mikealfare, Nov 19, 2024)
f8ad953 update names in python_submissions.py to follow the naming convention (mikealfare, Nov 19, 2024)
5f3a456 update names in retry.py to follow the naming convention (mikealfare, Nov 19, 2024)
7c4388f run linter and update unit test mocks (mikealfare, Nov 19, 2024)
5928098 update types on retry factory (mikealfare, Nov 20, 2024)
02385bb update inputs on retry factory (mikealfare, Nov 20, 2024)
51cc87f update predicate class name (mikealfare, Nov 20, 2024)
eaab976 add retry strategy back to copy table (mikealfare, Nov 20, 2024)
a81289f linting and fix unit test for new argument (mikealfare, Nov 20, 2024)
76d6979 fix whitespace (mikealfare, Nov 20, 2024)
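The commit history describes the core idea of this PR: one factory that builds every retry strategy (a default retry, an `on_error` hook for deadline retries, polling for job completion) in place of ad hoc helpers such as `retry_and_handle`. The block below is a minimal, stdlib-only sketch of that shape, not the dbt-bigquery implementation, which builds on `google.api_core.retry`. `RetryFactory`, `RetryStrategy`, `TransientError`, `default_strategy`, and `polling_strategy` are hypothetical names chosen for illustration, and the deadline here bounds total backoff time rather than wall-clock time, a simplification.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable error such as an HTTP 503."""


def is_transient(exc: Exception) -> bool:
    # Hypothetical predicate: retry only errors marked as transient.
    return isinstance(exc, TransientError)


@dataclass(frozen=True)
class RetryStrategy:
    initial: float = 1.0  # first backoff delay, in seconds
    multiplier: float = 2.0  # growth factor between delays
    maximum: float = 10.0  # cap on any single delay
    deadline: float = 60.0  # budget for total backoff time, in seconds
    predicate: Callable[[Exception], bool] = is_transient
    on_error: Optional[Callable[[Exception], None]] = None
    sleep: Callable[[float], None] = time.sleep

    def run(self, func: Callable[[], T]) -> T:
        waited, delay = 0.0, self.initial
        while True:
            try:
                return func()
            except Exception as exc:
                # Stop on non-retryable errors or when the next wait would exceed the budget.
                if not self.predicate(exc) or waited + delay > self.deadline:
                    raise
                if self.on_error is not None:
                    self.on_error(exc)  # hook, e.g. reopen a connection before retrying
                self.sleep(delay)
                waited += delay
                delay = min(delay * self.multiplier, self.maximum)


class RetryFactory:
    """Hypothetical factory: derive every strategy from one shared deadline."""

    def __init__(self, deadline: float, sleep: Callable[[float], None] = time.sleep) -> None:
        self._deadline = deadline
        self._sleep = sleep

    def default_strategy(
        self, on_error: Optional[Callable[[Exception], None]] = None
    ) -> RetryStrategy:
        return RetryStrategy(deadline=self._deadline, on_error=on_error, sleep=self._sleep)

    def polling_strategy(self, interval: float) -> RetryStrategy:
        # Polling is a retry with a constant interval: "not done yet" is the transient error.
        return RetryStrategy(
            initial=interval,
            multiplier=1.0,
            maximum=interval,
            deadline=self._deadline,
            sleep=self._sleep,
        )
```

Routing every call site through one factory keeps the deadline and the retry predicate in a single place, and it makes polling just another strategy (constant interval) instead of a hand-written loop.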
.changes/unreleased/Under the Hood-20241107-143856.yaml (6 additions, 0 deletions)
@@ -0,0 +1,6 @@
kind: Under the Hood
body: Create a retry factory to simplify retry strategies across dbt-bigquery
time: 2024-11-07T14:38:56.210445-05:00
custom:
  Author: mikealfare osalama
  Issue: "1395"
dbt/adapters/bigquery/clients.py (67 additions, 0 deletions)

Review comment from the PR author:
These client factories used to live in BigQueryConnectionManager and in the python_submissions module. Centralizing them here shrinks the interface of the credentials module and removes noise from those other classes, which makes troubleshooting easier.
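Every factory in this module is a free function of the credentials object, so callers such as the connection manager and the Python-model submission code only need credentials to obtain a client. The sketch below mimics two details of the module with stdlib stand-ins: the `bigquery_client` fallback (try, run a login step when default credentials are missing, try once more) and the regional Dataproc endpoint string built by `_dataproc_endpoint`. `Credentials`, `Client`, `MissingCredentialsError`, `_login`, `_make_client`, `make_client`, and `dataproc_endpoint` are hypothetical; the real module returns google-cloud clients and catches `google.auth.exceptions.DefaultCredentialsError`.

```python
from dataclasses import dataclass


class MissingCredentialsError(Exception):
    """Hypothetical stand-in for google.auth.exceptions.DefaultCredentialsError."""


@dataclass
class Credentials:
    # Hypothetical subset of the profile fields the factories read.
    execution_project: str
    dataproc_region: str
    logged_in: bool = False


@dataclass
class Client:
    # Hypothetical client; the real factories return google-cloud clients.
    project: str


def _login(credentials: Credentials) -> None:
    # Stand-in for an interactive step such as setup_default_credentials().
    credentials.logged_in = True


def _make_client(credentials: Credentials) -> Client:
    if not credentials.logged_in:
        raise MissingCredentialsError("no application-default credentials found")
    return Client(project=credentials.execution_project)


def make_client(credentials: Credentials) -> Client:
    # Same shape as bigquery_client: try, log in on missing credentials, try once more.
    try:
        return _make_client(credentials)
    except MissingCredentialsError:
        _login(credentials)
        return _make_client(credentials)


def dataproc_endpoint(credentials: Credentials) -> str:
    # Same format string as _dataproc_endpoint: a regional host on port 443.
    return f"{credentials.dataproc_region}-dataproc.googleapis.com:443"
```

Because nothing here depends on a connection manager instance, a unit test can exercise each factory with a plain credentials object.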

@@ -0,0 +1,67 @@
from google.api_core.client_info import ClientInfo
from google.api_core.client_options import ClientOptions
from google.api_core.retry import Retry
from google.auth.exceptions import DefaultCredentialsError
from google.cloud.bigquery import Client as BigQueryClient
from google.cloud.dataproc_v1 import BatchControllerClient, JobControllerClient
from google.cloud.storage import Client as StorageClient

from dbt.adapters.events.logging import AdapterLogger

import dbt.adapters.bigquery.__version__ as dbt_version
from dbt.adapters.bigquery.credentials import (
    BigQueryCredentials,
    google_credentials,
    setup_default_credentials,
)


_logger = AdapterLogger("BigQuery")


def bigquery_client(credentials: BigQueryCredentials) -> BigQueryClient:
    try:
        return _bigquery_client(credentials)
    except DefaultCredentialsError:
        _logger.info("Please log into GCP to continue")
        setup_default_credentials()
        return _bigquery_client(credentials)


@Retry()  # google decorator. retries on transient errors with exponential backoff
def storage_client(credentials: BigQueryCredentials) -> StorageClient:
    return StorageClient(
        project=credentials.execution_project,
        credentials=google_credentials(credentials),
    )


@Retry()  # google decorator. retries on transient errors with exponential backoff
def job_controller_client(credentials: BigQueryCredentials) -> JobControllerClient:
    return JobControllerClient(
        credentials=google_credentials(credentials),
        client_options=ClientOptions(api_endpoint=_dataproc_endpoint(credentials)),
    )


@Retry()  # google decorator. retries on transient errors with exponential backoff
def batch_controller_client(credentials: BigQueryCredentials) -> BatchControllerClient:
    return BatchControllerClient(
        credentials=google_credentials(credentials),
        client_options=ClientOptions(api_endpoint=_dataproc_endpoint(credentials)),
    )


@Retry()  # google decorator. retries on transient errors with exponential backoff
def _bigquery_client(credentials: BigQueryCredentials) -> BigQueryClient:
    return BigQueryClient(
        credentials.execution_project,
        google_credentials(credentials),
        location=getattr(credentials, "location", None),
        client_info=ClientInfo(user_agent=f"dbt-bigquery-{dbt_version.version}"),
        client_options=ClientOptions(quota_project_id=credentials.quota_project),
    )


def _dataproc_endpoint(credentials: BigQueryCredentials) -> str:
    return f"{credentials.dataproc_region}-dataproc.googleapis.com:443"
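`@Retry()` with no arguments wraps each factory so that constructing a client is retried when it fails with a transient error. As a rough, stdlib-only model of that behavior, the decorator below retries only when a predicate accepts the exception, grows the delay exponentially up to a cap, and gives up once the next wait would overrun an overall timeout. Its defaults (1 s initial delay, 2x multiplier, 60 s maximum delay, 120 s timeout) follow our reading of the documented `google.api_core.retry.Retry` defaults; the full-jitter formula, the `ServiceUnavailable` class, and the `retry` name are assumptions of this sketch, not the library's code.

```python
import functools
import random
import time
from typing import Any, Callable


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for a transient HTTP 503 from a Google API."""


def _is_transient(exc: Exception) -> bool:
    return isinstance(exc, ServiceUnavailable)


def retry(
    initial: float = 1.0,
    multiplier: float = 2.0,
    maximum: float = 60.0,
    timeout: float = 120.0,
    predicate: Callable[[Exception], bool] = _is_transient,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            deadline = time.monotonic() + timeout
            delay = initial
            while True:
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Give up on non-transient errors, or if the next wait would overrun the timeout.
                    if not predicate(exc) or time.monotonic() + delay > deadline:
                        raise
                    sleep(random.uniform(0.0, delay))  # full jitter (an assumption of this sketch)
                    delay = min(delay * multiplier, maximum)

        return wrapper

    return decorator
```

Decorating the factory, rather than each API call, means a transient failure while building the client (for example during credential refresh) is retried without every caller wrapping the constructor itself.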