[Bug] Incorrect 'done' signal - leading to premature start of downstream models #1387

Open
2 tasks done
eschaubert opened this issue Oct 25, 2024 · 1 comment
Labels
bug Something isn't working

Comments

eschaubert commented Oct 25, 2024

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

It appears that dbt either received an incorrect 'done' signal or misinterpreted some BigQuery communication as one. It then continued with the next model in the DAG, which read outdated data because the upstream table had not finished building.

In the dbt log it looks like this:

 08:21:12  1 of 2 START table model gigante.upstream_model .................... [RUN]
 08:25:11  1 of 2 OK created table model gigante.upstream_model ............... [CREATE TABLE (7.8b rows, None processed) in 239.26s]
 08:25:11  2 of 2 START table model gigante.downstream_model .................. [RUN]
 08:25:22  2 of 2 OK created table model gigante.downstream_model ............. [CREATE TABLE (42.3m rows, 7.5 GB processed) in 11.09s]

According to the dbt log, the first model ran for only about 4 minutes (239.26s). However, the BigQuery job logs show that the upstream_model job finished at 08:53:22, which is what we'd expect: the query usually takes about 30 minutes to run. It also matches what we saw in the data: the downstream_model contained outdated data, while the upstream_model had been updated.
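To make the size of the discrepancy concrete, here is a minimal sketch of the arithmetic on the timestamps quoted above. It assumes the dbt log and the BigQuery job log use the same day and time zone, and that the BigQuery job started at roughly the time dbt logged START; neither is confirmed here. The helper name `elapsed_seconds` is made up for illustration.

```python
from datetime import datetime

FMT = "%H:%M:%S"

def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between two same-day HH:MM:SS timestamps (hypothetical helper)."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())

# Timestamps copied from the dbt log and the BigQuery job log in this report.
dbt_start = "08:21:12"         # dbt: START upstream_model
dbt_reported_end = "08:25:11"  # dbt: OK created upstream_model
bq_job_end = "08:53:22"        # BigQuery: job actually finished

dbt_duration = elapsed_seconds(dbt_start, dbt_reported_end)
bq_duration = elapsed_seconds(dbt_start, bq_job_end)  # assumes job start ~= dbt START
gap = elapsed_seconds(dbt_reported_end, bq_job_end)

print(f"dbt reported:      {dbt_duration} s")
print(f"BigQuery job took: {bq_duration} s (about {bq_duration // 60} min)")
print(f"dbt moved on {gap} s before the job finished")
```

Under those assumptions, the downstream model started roughly 28 minutes before the upstream table was actually complete.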

Notably, the second log line reports "None processed", which we have never seen before for this model.

Expected Behavior

We would expect the dbt log to be consistent with the BigQuery job log for the model, and we would expect the processed size to be a positive, human-readable figure rather than 'None'.
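As a rough illustration of the check we have in mind, the sketch below tests whether a job object looks finished before anything downstream relies on it. It only reads the `state`, `error_result` and `total_bytes_processed` attributes, which a `QueryJob` from the `google-cloud-bigquery` client exposes. The function name `job_looks_complete` is hypothetical, and treating missing byte statistics as a warning sign is our assumption, not confirmed behaviour of dbt-bigquery or BigQuery.

```python
from types import SimpleNamespace

def job_looks_complete(job) -> bool:
    """Return True only if the job reports DONE, has no error, and has byte stats.

    Hypothetical helper: `job` is any object with the attributes read below.
    """
    return (
        job.state == "DONE"
        and job.error_result is None
        and job.total_bytes_processed is not None  # assumption: None is suspicious
    )

# Stand-in objects so the sketch runs without credentials. With the real client,
# a job could be fetched with something like:
#   from google.cloud import bigquery
#   job = bigquery.Client().get_job(job_id, location=location)
finished = SimpleNamespace(state="DONE", error_result=None,
                           total_bytes_processed=7_500_000_000)
suspicious = SimpleNamespace(state="RUNNING", error_result=None,
                             total_bytes_processed=None)

print(job_looks_complete(finished))
print(job_looks_complete(suspicious))
```

This does not explain why dbt moved on early; it only shows the kind of consistency between dbt and BigQuery that we would expect.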

Steps To Reproduce

Not reproducible. Rerunning the model fixed the issue. We found this has happened before with other models, but it is rare. It seems to be a fluke.
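Since we cannot trigger this on demand, one way to find earlier occurrences is to scan old dbt logs for result lines that report "None processed". The following is a minimal sketch that assumes the log line layout shown in this report; the regular expressions and the function name `find_none_processed` are our own and not part of dbt.

```python
import re

# Strips ANSI colour codes such as ESC[32m and ESC[0m from log lines.
ANSI = re.compile(r"\x1b\[[0-9;]*m")

# Matches "HH:MM:SS  N of M OK created table model <name> ... (<rows> rows, <size> processed)".
RESULT = re.compile(
    r"^\s*(?P<time>\d{2}:\d{2}:\d{2})\s+\d+ of \d+ OK created table model "
    r"(?P<model>\S+) .*\((?P<rows>[^,]+) rows, (?P<processed>[^)]+) processed\)"
)

def find_none_processed(lines):
    """Return (time, model) pairs for result lines whose processed size is 'None'."""
    hits = []
    for line in lines:
        match = RESULT.search(ANSI.sub("", line))
        if match and match.group("processed").strip() == "None":
            hits.append((match.group("time"), match.group("model")))
    return hits

sample = [
    " 08:21:12  1 of 2 START table model gigante.upstream_model .................... [RUN]",
    " 08:25:11  1 of 2 OK created table model gigante.upstream_model ............... "
    "[\x1b[32mCREATE TABLE (7.8b rows, None processed)\x1b[0m in 239.26s]",
    " 08:25:22  2 of 2 OK created table model gigante.downstream_model ............. "
    "[\x1b[32mCREATE TABLE (42.3m rows, 7.5 GB processed)\x1b[0m in 11.09s]",
]

hits = find_none_processed(sample)
print(hits)
```

Running this over archived log files (for example by passing the lines of each file) would show how often the anomaly has appeared, assuming the "None processed" marker accompanies every premature completion, which we have not verified.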

Relevant log output

 08:21:12  1 of 2 START table model gigante.upstream_model .................... [RUN]
 08:25:11  1 of 2 OK created table model gigante.upstream_model ............... [CREATE TABLE (7.8b rows, None processed) in 239.26s]
 08:25:11  2 of 2 START table model gigante.downstream_model .................. [RUN]
 08:25:22  2 of 2 OK created table model gigante.downstream_model ............. [CREATE TABLE (42.3m rows, 7.5 GB processed) in 11.09s]

Environment

- OS: alpine:3.19
- Python: 3.9
- dbt-core: 1.1.1
- dbt-bigquery: 1.1.1

Additional Context

This is a simple table model that we have run every day for over a year without any issues. It runs in a container whose configuration has not changed in the last year. Simply re-running the dbt run command fixed the issue, so it's not something we can reproduce.

eschaubert added the bug and triage labels Oct 25, 2024
amychen1776 commented

@eschaubert could you try this on dbt-bigquery 1.8? You're on a version of dbt (1.1) that is no longer supported, and a lot has changed since then.
