Swap dataproc batch_id declaration to model config #804
Conversation
@dbeatty10 @colin-rogers-dbt - I realised that the implementation I merged in didn't actually address the desired functionality; this PR fixes it.
@nickozilla we're doing release candidates (RC) for dbt-core v1.6 and the associated adapters right now. So that dbt-bigquery is in a solid state for the 1.6 RC, and to avoid the issue described in #822, we're going to use #826 to revert the original PR in #727. Then going forward, this PR (#804) can be used as the new implementation of the …
@nickozilla could you add at least one test case that covers this new behavior? Ideally, it would include at least two different dbt Python models, each with a different custom … Let us know if you need any help finding an existing test as a template to follow.
Co-authored-by: Doug Beatty <[email protected]>
@nickozilla could you reply to dbt-labs/docs.getdbt.com#3718 with some code examples that we can include in the documentation?
Hi @dbeatty10, sorry for the delay getting back to you. I've set up the dev environment locally & added a test into

```yaml
target: dev
outputs:
  dev:
    dataset: dev
    job_execution_timeout_seconds: 2000
    job_retries: 1
    location: EU
    method: oauth
    priority: interactive
    project: "{{ env_var('WAREHOUSE_ANALYTICS_PROJECT_ID') }}"
    threads: 8
    type: bigquery
    dataproc_region: europe-west1
    gcs_bucket: "{{ env_var('PYTHON_DBT_MODELS_BUCKET') }}"
    dataproc_batch:
      environment_config:
        execution_config:
          service_account: "dbt-py@{{ env_var('WAREHOUSE_ANALYTICS_PROJECT_ID') }}.iam.gserviceaccount.com"
          subnetwork_uri: "projects/{{ env_var('NETWORK_PROJECT_ID') }}/regions/europe-west1/subnetworks/dataproc"
          network_tags: ["dataproc-ingress"]
          staging_bucket: "{{ env_var('PYTHON_DBT_STAGING_BUCKET') }}"
      pyspark_batch:
        jar_file_uris:
          ["gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.29.0.jar"]
      runtime_config:
        container_image: "{{ env_var('PYTHON_DBT_IMAGESTORE') }}:{{ env_var('PYTHON_DBT_TAG') }}"
```

AFAICT the test framework assumes python models are using a cluster on the default project network and gives no extra support for the parameters we're using in the …

I've also included a model yaml configuration example in the issue you've linked, and tested it locally, see below.
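For the documentation examples requested above, a model-level declaration could look something like the sketch below. This is an assumption based on this PR's intent, not confirmed syntax: the model names are hypothetical, and the key point is simply that each python model gets its own `batch_id` rather than sharing one from the profile.

```yaml
# Hypothetical models/schema.yml fragment: declaring batch_id per model.
# Model names and the exact config shape are illustrative assumptions.
models:
  - name: my_python_model
    config:
      submission_method: serverless
      batch_id: my-python-model-batch
  - name: another_python_model
    config:
      submission_method: serverless
      batch_id: another-python-model-batch
```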
@nickozilla no worries! If you commit your changes to …
@dbeatty10 I've added the tests as requested - though I'm not sure I'm using the …
resolves #671

Description

My initial implementation in #727 does allow users to set the `batch_id` for dataproc serverless models in the profile, and this does apply to a python model, but it wouldn't work for any project that has more than one python model (which kinda defeats the point of including it at all), since the `batch_id` parameter needs to be unique and that isn't achievable via a profile-level definition. I've created this new implementation for declaring `batch_id` at the model level, which would be preferred.

Checklist

- I have run `changie new` to create a changelog entry
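The uniqueness constraint driving this change can be illustrated with a short sketch. This is not code from the PR; it only shows one plausible way to derive a per-model id that fits Dataproc's batch ID rules (lowercase letters, digits, and hyphens, bounded length), and why a single static profile-level value would collide as soon as a project has two python models or runs the same model twice.

```python
import re
import uuid

def make_batch_id(model_name: str) -> str:
    """Derive a Dataproc-friendly batch id from a dbt model name.

    Batch ids must be unique per project/region, so one static value
    shared by every python model would collide on the second submission.
    This helper is a hypothetical illustration, not dbt-bigquery's API.
    """
    # Lowercase and replace anything outside [a-z0-9-] with a hyphen.
    base = re.sub(r"[^a-z0-9-]", "-", model_name.lower()).strip("-")
    # Append a random suffix so repeated runs of the same model also
    # produce distinct ids.
    return f"{base[:40]}-{uuid.uuid4().hex[:8]}"

print(make_batch_id("my_python_model"))  # e.g. my-python-model-<8 hex chars>
```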