Some thoughts on what to try or add following the retry refactoring:

- It looks like we might be hanging on the model upload to GCS, since some folks report they don't see the job getting created.
- We take the defaults here (60s timeout).
- We could be running into retention-policy issues or permissions issues.
- We do not catch any errors here.
- Folks reported issues specifically with serverless batch jobs hanging. We used to retry this with a custom polling method, but we now use the built-in `.result` to wait for the operation to finish while supplying the same timeout config from the user, so this might be resolved.
- We do not catch any errors for serverless batch jobs or cluster jobs.
- When we run SQL models, we use an error handler on the connections class that routes certain errors to dbt errors.
- When we run SQL models, we use retry strategies that retry particular errors we identified as transient; Google does warn against overriding the defaults for Dataproc here.
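The transient-error retry strategy described above can be sketched as a small helper that retries only exception types identified as transient and lets everything else propagate immediately. This is an illustrative sketch, not dbt-bigquery's actual implementation; `TRANSIENT_ERRORS` and `retry_transient` are hypothetical names, and the real transient-error list lives in the adapter's retry configuration.

```python
import time

# Illustrative set of exception types treated as transient; the real
# dbt-bigquery retry strategy defines its own list of retryable errors.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)

def retry_transient(func, retries=3, delay=0.1, backoff=2.0):
    """Call func(), retrying only transient errors with exponential backoff."""
    attempt = 0
    while True:
        try:
            return func()
        except TRANSIENT_ERRORS:
            attempt += 1
            if attempt > retries:
                raise  # retries exhausted: surface the transient error
            time.sleep(delay)
            delay *= backoff
        # any non-transient exception propagates to the caller immediately
```

The key property, and the reason Google warns against blindly overriding Dataproc's defaults, is that only a known-transient subset of errors is retried; a permissions or retention-policy error should fail fast rather than burn the timeout budget on hopeless retries.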
colin-rogers-dbt changed the title from "[Regression] Understand DataProc Batch Jobs Failure Scenarios" to "[Regression][SPIKE] Understand DataProc Batch Jobs Failure Scenarios" on Aug 28, 2024.
Current Behavior
Users have noted (see #1157) inconsistent performance in production with Python models on Dataproc.
We need to map out when and why jobs on Dataproc are failing, and how dbt-bigquery should handle those scenarios (e.g. retry, raise a warning, etc.).
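One way to make that mapping explicit is a routing table from failure categories to handling decisions. The sketch below is hypothetical — the categories, actions, and the `route_error` helper are illustrative and not dbt-bigquery's existing error handler — but it shows the shape such a mapping could take.

```python
# Hypothetical classification of Dataproc failure scenarios; the exception
# types and actions here are illustrative, not dbt-bigquery's actual policy.
RETRY, WARN, RAISE = "retry", "warn", "raise"

ERROR_ROUTES = {
    TimeoutError: RETRY,      # e.g. GCS upload exceeding the default timeout
    PermissionError: RAISE,   # e.g. missing bucket or Dataproc permissions
    ConnectionError: RETRY,   # transient network failure
}

def route_error(exc: Exception) -> str:
    """Decide how a caller should handle exc: retry, warn, or raise."""
    for exc_type, action in ERROR_ROUTES.items():
        if isinstance(exc, exc_type):
            return action
    return RAISE  # unknown errors fail loudly by default
```

Defaulting unknown errors to `raise` keeps new failure modes visible while the spike catalogs them, rather than silently retrying everything.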
Expected/Previous Behavior
Python model execution should be stable.
Environment
Additional Context