Fixing issues with BQ to Vertex Service Account IAM member role (#252)
* predicting for only the users with traffic in the past 72h - purchase propensity

* running inference only for users events in the past 72h

* including 72h users for all models predictions

* considering null values in TabWorkflow models

* deleting unused pipfile

* upgrading lib versions

* implementing reporting preprocessing as a new pipeline

* adding more code documentation

* adding important information on the main README.md and DEVELOPMENT.md

* adding schedule run name and more code documentation

* implementing a new scheduler using the vertex ai sdk & adding user_id to procedures for consistency

* adding more code documentation

* adding code doc to the python custom component

* adding more code documentation

* fixing aggregated predictions query

* removing unnecessary resources from deployment

* Writing MDS guide

* adding the MDS developer and troubleshooting documentation

* fixing deployment for activation pipelines and gemini dataset

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* removing deprecated api

* fixing purchase propensity pipelines names

* adding extra condition for when there is not enough data for the window interval to be applied on backfill procedures

* adding more instructions for post deployment and fixing issues when GA4 export was configured for less than 10 days

* removing unnecessary comments

* adding the number of past days to process in the variables files

* adding comment about combining data from different ga4 export datasets to data store

* fixing small issues with feature engineering and ml pipelines

* fixing hyper parameter tuning for kmeans modeling

* fixing optuna parameters

* adding cloud shell image

* fixing the list of all possible users in the propensity training preparation tables

* additional guardrails for when there is not enough data

* adding more documentation

* adding more doc to feature store

* add feature store documentation

* adding ml pipelines docs

* adding ml pipelines docs

* adding more documentation

* adding user agent client info

* fixing scope of client info

* fix

* removing client_info from vertex components

* fixing versioning of tf submodules

* reconfiguring meta providers

* fixing issue 187

* chore(deps): upgrade terraform providers and modules version

* chore(deps): set the provider version

* chore: formatting

* fix: brand naming

* fix: typo

* fixing secrets issue

* implementing secrets region as tf variable

* implementing secrets region as tf variable

* last changes requested by lgrangeau

* documenting keys location better

* implementing vpc peering network

* Update README.md

* Rebase Main into Multi-property (#243)

* Update README.md

* ensure the build bucket is created in the specified region (#230)

* Update audience_segmentation_query_template.sqlx

* Update auto_audience_segmentation_query_template.sqlx

* Update churn_propensity_query_template.sqlx

* Update cltv_query_template.sqlx

* Update purchase_propensity_query_template.sqlx

* Restrict regions for GCP Cloud Build support (#241)

* Update README.md

* Move to uv (#242)

* add uv required project table segment in toml file

* switch to uv in terraform deployment

* switch to uv

* remove poetry usage from terraform

* format

* remove poetry

* Add files via upload

---------

Co-authored-by: Charlie Wang <[email protected]>
Co-authored-by: Mårten Lindblad <[email protected]>

* supporting property id in the resources

* fixing iam member roles issues

* fixing issue with service account iam resources

---------

Co-authored-by: Carlos Timoteo <[email protected]>
Co-authored-by: Laurent Grangeau <[email protected]>
Co-authored-by: Charlie Wang <[email protected]>
Co-authored-by: Mårten Lindblad <[email protected]>
5 people authored Nov 22, 2024
1 parent 74756a7 commit b9b127f
Showing 4 changed files with 87 additions and 42 deletions.
@@ -25,7 +25,7 @@ resource "google_workflows_workflow" "dataform-incremental-workflow" {
name = "dataform-${var.property_id}-incremental"
region = var.region
description = "Dataform incremental workflow for ${var.property_id} ga4 property"
service_account = google_service_account.workflow-dataform.email
service_account = module.workflow-dataform.email
# The source code includes the following steps:
# Init: This step initializes the workflow by assigning the value of the dataform_repository_id variable to the repository variable.
# Create Compilation Result: This step creates a compilation result for the Dataform repository. The compilation result includes the git commit hash and the code compilation configuration.
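The remainder of this workflow resource is collapsed in the hunk above; only the comments describing its steps are visible. As a rough, hypothetical sketch of the first two steps those comments describe; the step bodies, the plain HTTP call to the Dataform API, and the exact shape of the dataform_repository_id variable are assumptions, not the repository's actual source:

# Illustrative sketch only; names and step bodies are assumptions.
resource "google_workflows_workflow" "dataform_incremental_sketch" {
  name            = "dataform-${var.property_id}-incremental-sketch"
  region          = var.region
  service_account = module.workflow-dataform.email

  source_contents = <<-EOF
  main:
    steps:
      # Init: assign the Dataform repository id to the repository variable.
      - init:
          assign:
            - repository: "${var.dataform_repository_id}"
      # Create Compilation Result: compile the repository at a git commitish.
      - createCompilationResult:
          call: http.post
          args:
            url: $${"https://dataform.googleapis.com/v1beta1/" + repository + "/compilationResults"}
            auth:
              type: OAuth2
            body:
              gitCommitish: main
          result: compilationResult
  EOF
}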
@@ -35,7 +35,7 @@ resource "google_cloud_scheduler_job" "daily-dataform-increments" {
uri = "https://workflowexecutions.googleapis.com/v1/projects/${module.data_processing_project_services.project_id}/locations/${var.region}/workflows/${google_workflows_workflow.dataform-incremental-workflow.name}/executions"

oauth_token {
service_account_email = google_service_account.scheduler.email
service_account_email = module.scheduler.email
}
}
}
@@ -12,20 +12,36 @@
# See the License for the specific language governing permissions and
# limitations under the License.

resource "google_service_account" "scheduler" {
locals {
scheduler_sa = "workflow-scheduler-${var.property_id}@${module.data_processing_project_services.project_id}.iam.gserviceaccount.com"
workflows_sa = "workflow-dataform-${var.property_id}@${module.data_processing_project_services.project_id}.iam.gserviceaccount.com"
}

module "scheduler" {
source = "terraform-google-modules/service-accounts/google//modules/simple-sa"
version = "~> 4.0"

project_id = null_resource.check_cloudscheduler_api.id != "" ? module.data_processing_project_services.project_id : var.project_id
name = "workflow-scheduler-${var.property_id}"
project_roles = [
"roles/workflows.invoker"
]

depends_on = [
module.data_processing_project_services,
null_resource.check_cloudscheduler_api,
]

project = null_resource.check_cloudscheduler_api.id != "" ? module.data_processing_project_services.project_id : var.project_id
account_id = "workflow-scheduler-${var.property_id}"
display_name = "Service Account to schedule Dataform workflows in ${var.property_id}"
}

locals {
scheduler_sa = "workflow-scheduler-${var.property_id}@${module.data_processing_project_services.project_id}.iam.gserviceaccount.com"
workflows_sa = "workflow-dataform-${var.property_id}@${module.data_processing_project_services.project_id}.iam.gserviceaccount.com"
# Propagation time for change of access policy typically takes 2 minutes
# according to https://cloud.google.com/iam/docs/access-change-propagation
# this wait makes sure the policy changes are propagated before proceeding
# with the build
resource "time_sleep" "wait_for_scheduler_service_account_role_propagation" {
create_duration = "120s"
depends_on = [
module.scheduler
]
}

# Wait for the scheduler service account to be created
@@ -37,45 +37,52 @@ resource "null_resource" "wait_for_scheduler_sa_creation" {
MAX_TRIES=100
while ! gcloud iam service-accounts list --project=${module.data_processing_project_services.project_id} --filter="EMAIL:${local.scheduler_sa} AND DISABLED:False" --format="table(EMAIL, DISABLED)" && [ $COUNTER -lt $MAX_TRIES ]
do
sleep 3
sleep 10
printf "."
COUNTER=$((COUNTER + 1))
done
if [ $COUNTER -eq $MAX_TRIES ]; then
echo "scheduler service account was not created, terraform can not continue!"
exit 1
fi
sleep 20
sleep 120
EOT
}

depends_on = [
module.data_processing_project_services,
null_resource.check_dataform_api
time_sleep.wait_for_scheduler_service_account_role_propagation,
null_resource.check_dataform_api,
module.scheduler,
]
}

resource "google_project_iam_member" "scheduler-workflow-invoker" {
depends_on = [
module.data_processing_project_services,
null_resource.check_cloudscheduler_api,
null_resource.wait_for_scheduler_sa_creation
]
module "workflow-dataform" {
source = "terraform-google-modules/service-accounts/google//modules/simple-sa"
version = "~> 4.0"

project = null_resource.check_cloudscheduler_api.id != "" ? module.data_processing_project_services.project_id : var.project_id
member = "serviceAccount:${google_service_account.scheduler.email}"
role = "roles/workflows.invoker"
}
project_id = null_resource.check_workflows_api.id != "" ? module.data_processing_project_services.project_id : var.project_id
name = "workflow-dataform-${var.property_id}"
project_roles = [
"roles/dataform.editor"
]

resource "google_service_account" "workflow-dataform" {
depends_on = [
module.data_processing_project_services,
null_resource.check_workflows_api,
null_resource.check_dataform_api,
]
}

project = null_resource.check_workflows_api.id != "" ? module.data_processing_project_services.project_id : var.project_id
account_id = "workflow-dataform-${var.property_id}"
display_name = "Service Account to run Dataform workflows in ${var.property_id}"
# Propagation time for change of access policy typically takes 2 minutes
# according to https://cloud.google.com/iam/docs/access-change-propagation
# this wait makes sure the policy changes are propagated before proceeding
# with the build
resource "time_sleep" "wait_for_workflow_dataform_service_account_role_propagation" {
create_duration = "120s"
depends_on = [
module.workflow-dataform
]
}

# Wait for the workflows service account to be created
@@ -86,33 +109,22 @@ resource "null_resource" "wait_for_workflows_sa_creation" {
MAX_TRIES=100
while ! gcloud iam service-accounts list --project=${module.data_processing_project_services.project_id} --filter="EMAIL:${local.workflows_sa} AND DISABLED:False" --format="table(EMAIL, DISABLED)" && [ $COUNTER -lt $MAX_TRIES ]
do
sleep 3
sleep 10
printf "."
COUNTER=$((COUNTER + 1))
done
if [ $COUNTER -eq $MAX_TRIES ]; then
echo "workflows service account was not created, terraform can not continue!"
exit 1
fi
sleep 20
sleep 120
EOT
}

depends_on = [
module.data_processing_project_services,
null_resource.check_dataform_api
]
}


resource "google_project_iam_member" "worflow-dataform-dataform-editor" {
depends_on = [
module.data_processing_project_services,
null_resource.check_dataform_api,
null_resource.wait_for_workflows_sa_creation
module.workflow-dataform,
time_sleep.wait_for_workflow_dataform_service_account_role_propagation,
]

project = null_resource.check_workflows_api.id != "" ? module.data_processing_project_services.project_id : var.project_id
member = "serviceAccount:${google_service_account.workflow-dataform.email}"
role = "roles/dataform.editor"
}
33 changes: 33 additions & 0 deletions infrastructure/terraform/modules/feature-store/main.tf
@@ -139,5 +139,38 @@ resource "google_project_iam_member" "vertex_ai_connection_sa_roles" {
"roles/bigquery.connectionAdmin"
])
role = each.key

# The lifecycle block configures how Terraform manages these IAM member resources. The ignore_changes attribute is set to all, which means that Terraform will ignore
# any changes to the role bindings and will not attempt to update them. The prevent_destroy attribute is set to true, which means that Terraform will prevent the bindings from being destroyed.
lifecycle {
ignore_changes = all
prevent_destroy = true
}
}
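
The for_each and member arguments of this resource are collapsed in the hunk above. A sketch of the full shape, inferred from the visible role assignment, the lifecycle block, and the commented-out module below, might look like the following; the resource name suffix and the simplified project reference are assumptions, not the exact committed code.

# Sketch inferred from this hunk and the commented-out module below; not the exact committed code.
resource "google_project_iam_member" "vertex_ai_connection_sa_roles_sketch" {
  for_each = toset([
    "roles/bigquery.jobUser",
    "roles/bigquery.dataEditor",
    "roles/storage.admin",
    "roles/storage.objectViewer",
    "roles/aiplatform.user",
    "roles/bigquery.connectionUser",
    "roles/bigquery.connectionAdmin"
  ])

  # Each role is granted to the BigQuery-to-Vertex connection's service account.
  project = module.project_services.project_id
  member  = "serviceAccount:${google_bigquery_connection.vertex_ai_connection.cloud_resource[0].service_account_id}"
  role    = each.key

  lifecycle {
    ignore_changes  = all
    prevent_destroy = true
  }
}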


#module "vertex_ai_connection_sa_roles" {
# source = "terraform-google-modules/iam/google//modules/member_iam"
# version = "~> 8.0"
#
# service_account_address = google_bigquery_connection.vertex_ai_connection.cloud_resource[0].service_account_id
# project_id = null_resource.check_aiplatform_api.id != "" ? module.project_services.project_id : local.feature_store_project_id
# project_roles = [
# "roles/bigquery.jobUser",
# "roles/bigquery.dataEditor",
# "roles/storage.admin",
# "roles/storage.objectViewer",
# "roles/aiplatform.user",
# "roles/bigquery.connectionUser",
# "roles/bigquery.connectionAdmin"
# ]
# prefix = "serviceAccount"
#
# depends_on = [
# module.project_services,
# null_resource.check_aiplatform_api,
# google_bigquery_connection.vertex_ai_connection
# ]
#
#}
