Pre next (#81)
* removing aggregated features

* removing duplicates from audience segmentation

* implementing skipping windows

* fixing purchase propensity clv training feat eng

* fixing cltv pipelines

* implementing additional tables to agg vbb

* adding more documentation and fixing auto segmentation bug

* evolving the auto segmentation logic

* adding additional code doc

* adding more code documentation

* adding more code documentation

* cleaning code

* fixing issue with one class showing on training dataset

* rewriting non clear installation instructions

* adding extra logic to handle corner cases of rare sales

* adding extra logic to handle corner cases of rare sales

---------

Co-authored-by: Carlos Timoteo <[email protected]>
chmstimoteo and Carlos Timoteo authored Apr 22, 2024
1 parent fe90e90 commit 7c4e854
Showing 41 changed files with 5,197 additions and 3,596 deletions.
717 changes: 646 additions & 71 deletions config/config.yaml.tftpl

Large diffs are not rendered by default.

17 changes: 13 additions & 4 deletions infrastructure/terraform/README.md
@@ -192,16 +192,25 @@ Because a Cloud Shell session is ephemeral, your Cloud Shell session could terminate

**Note:** The prerequisites for running this script are that you have set all the variable values in the TF variables file `${TERRAFORM_RUN_DIR}/terraform.tfvars` and have applied the terraform configuration once before.

To configure a new terminal session, run the following commands:
Follow the authentication workflow, since your credentials expire daily:

```bash
# Authenticate your user to Google Cloud
gcloud auth login
# Authenticate your application default login to Google Cloud with the right scopes for Terraform to run
gcloud auth application-default login --quiet --scopes="openid,https://www.googleapis.com/auth/userinfo.email,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/sqlservice.login,https://www.googleapis.com/auth/analytics,https://www.googleapis.com/auth/analytics.edit,https://www.googleapis.com/auth/analytics.provision,https://www.googleapis.com/auth/analytics.readonly,https://www.googleapis.com/auth/accounts.reauth"
```

To resume working in a new terminal session, run the following commands:

```bash
# Change directory to the source code root directory
SOURCE_ROOT="${HOME}/marketing-analytics-jumpstart"
cd ${SOURCE_ROOT}
. scripts/session-resume.sh
# Resume the terminal session using values stored in Terraform outputs
./scripts/session-resume.sh
```

Follow the authentication workflow, if prompted.

## Resources created

At this time, the Terraform scripts in this folder perform the following tasks:
55 changes: 51 additions & 4 deletions infrastructure/terraform/modules/activation/main.tf
@@ -37,9 +37,13 @@ locals {
trigger_function_account_email = "${local.app_prefix}-${local.trigger_function_account_name}@${var.project_id}.iam.gserviceaccount.com"

activation_type_configuration_file = "${local.source_root_dir}/templates/activation_type_configuration_template.tpl"
# This computes a hash of the file content to track changes and trigger redeployment of resources
# when the file content changes.
activation_type_configuration_file_content_hash = filesha512(local.activation_type_configuration_file)

app_payload_template_file = "${local.source_root_dir}/templates/app_payload_template.jinja2"
# This computes a hash of the file content to track changes and trigger redeployment of resources
# when the file content changes.
app_payload_template_file_content_hash = filesha512(local.app_payload_template_file)

activation_application_dir = "${local.source_root_dir}/python/activation"
@@ -50,7 +54,14 @@ locals {
"${local.activation_application_dir}/requirements.txt",
"${local.activation_application_dir}/pipeline_test.py",
]
# This computes a hash over the listed file contents to track changes and trigger redeployment of resources
# when any of these files change.
activation_application_content_hash = sha512(join("", [for f in local.activation_application_fileset : fileexists(f) ? filesha512(f) : sha512("file-not-found")]))

audience_segmentation_activation_query_file = "${local.source_root_dir}/sql/query/audience_segmentation_query_template.sql"
# This computes a hash of the file content to track changes and trigger redeployment of resources
# when the file content changes.
audience_segmentation_activation_query_file_content_hash = filesha512(local.audience_segmentation_activation_query_file)
}
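The hash locals above all follow the same change-detection pattern. A minimal sketch of that pattern, using a hypothetical file name and a placeholder command that are not part of this module:

```terraform
resource "null_resource" "redeploy_on_config_change" {
  triggers = {
    # Any edit to the watched file changes this hash, which invalidates the
    # resource and re-runs the provisioner on the next terraform apply.
    config_hash = filesha512("${path.module}/templates/example_template.tpl")
  }

  provisioner "local-exec" {
    command = "echo 'watched file changed; redeploying'"
  }
}
```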

data "google_project" "activation_project" {
@@ -91,12 +102,14 @@ module "bigquery" {

dataset_id = local.app_prefix
dataset_name = local.app_prefix
description = "activation appliction logs"
description = "activation application logs"
project_id = var.project_id
location = var.data_location
delete_contents_on_destroy = false
}

# This resource executes gcloud commands to check whether the Artifact Registry API is enabled.
# Since enabling APIs can take a few seconds, we need to make the deployment wait until the API is enabled before resuming.
resource "null_resource" "check_artifactregistry_api" {
provisioner "local-exec" {
command = <<-EOT
@@ -117,12 +130,13 @@
}
}

# This resource calls a python command defined inside the module ga4_setup that is responsible for creating
# all required custom events in the Google Analytics 4 property.
# Check the python file ga4_setup/setup.py for more information.
resource "null_resource" "create_custom_events" {
triggers = {
services_enabled_project = module.project_services.project_id
source_contents_hash = local.activation_type_configuration_file_content_hash
#source_activation_type_configuration_hash = local.activation_type_configuration_file_content_hash
#source_activation_application_python_hash = local.activation_application_content_hash
}
provisioner "local-exec" {
command = <<-EOT
@@ -132,9 +146,13 @@
}
}

# This resource calls a python command defined inside the module ga4_setup that is responsible for creating
# all required custom dimensions in the Google Analytics 4 property.
# Check the python file ga4_setup/setup.py for more information.
resource "null_resource" "create_custom_dimensions" {
triggers = {
services_enabled_project = module.project_services.project_id
source_contents_hash = local.audience_segmentation_activation_query_file_content_hash
#source_activation_type_configuration_hash = local.activation_type_configuration_file_content_hash
#source_activation_application_python_hash = local.activation_application_content_hash
}
@@ -146,6 +164,7 @@
}
}

# This resource creates an Artifact Registry repository for the docker images used by the Activation Application.
resource "google_artifact_registry_repository" "activation_repository" {
project = null_resource.check_artifactregistry_api.id != "" ? module.project_services.project_id : ""
location = var.location
@@ -189,16 +208,23 @@ module "trigger_function_account" {
description = "Account used to run the activation trigger function"
}

# This is an external data source that retrieves information about the Google Analytics 4 property using
# a python command defined in the module ga4_setup.
# This information can then be used in other parts of the Terraform configuration.
data "external" "ga4_measurement_properties" {
program = ["bash", "-c", "${local.poetry_run_alias} ga4-setup --ga4_resource=measurement_properties --ga4_property_id=${var.ga4_property_id} --ga4_stream_id=${var.ga4_stream_id}"]
working_dir = local.source_root_dir
# The count attribute makes this data source conditional: it is created and executed only if either the
# var.ga4_measurement_id or the var.ga4_measurement_secret variable is not set.
count = (var.ga4_measurement_id == null || var.ga4_measurement_secret == null) ? 1 : 0

depends_on = [
module.project_services
]
}
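The external provider requires the program above to print a single flat JSON object of string values to stdout. A hedged sketch of consuming the result; the key names are assumptions for illustration, and, because of count, the data source is addressed by index:

```terraform
locals {
  # Fall back to the externally fetched values only when the input variables are unset.
  # The "measurement_id" and "measurement_secret" keys are assumed, not confirmed by this diff.
  derived_ga4_measurement_id     = var.ga4_measurement_id != null ? var.ga4_measurement_id : data.external.ga4_measurement_properties[0].result["measurement_id"]
  derived_ga4_measurement_secret = var.ga4_measurement_secret != null ? var.ga4_measurement_secret : data.external.ga4_measurement_properties[0].result["measurement_secret"]
}
```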

# This module stores the values ga4-measurement-id and ga4-measurement-secret in Google Cloud Secret Manager.
module "secret_manager" {
source = "GoogleCloudPlatform/secret-manager/google"
version = "~> 0.1"
@@ -217,6 +243,7 @@ module "secret_manager" {
]
}

# This module creates a Cloud Storage bucket to be used by the Activation Application
module "pipeline_bucket" {
source = "terraform-google-modules/cloud-storage/google//modules/simple_bucket"
version = "~> 3.4.1"
@@ -246,12 +273,14 @@ module "pipeline_bucket" {
]
}

# This resource creates a bucket object from the measurement_protocol_payload_template_file.
resource "google_storage_bucket_object" "measurement_protocol_payload_template_file" {
name = "${local.configuration_folder}/${local.measurement_protocol_payload_template_file}"
source = "${local.template_dir}/${local.measurement_protocol_payload_template_file}"
bucket = module.pipeline_bucket.name
}

# This data source renders the audience_segmentation_query_template_file template.
data "template_file" "audience_segmentation_query_template_file" {
template = file("${local.template_dir}/activation_query/${local.audience_segmentation_query_template_file}")

@@ -276,6 +305,7 @@ data "template_file" "auto_audience_segmentation_query_template_file" {
}
}

# This resource creates a bucket object from the rendered auto_audience_segmentation_query_template_file.
resource "google_storage_bucket_object" "auto_audience_segmentation_query_template_file" {
name = "${local.configuration_folder}/${local.auto_audience_segmentation_query_template_file}"
content = data.template_file.auto_audience_segmentation_query_template_file.rendered
@@ -291,6 +321,7 @@ data "template_file" "cltv_query_template_file" {
}
}

# This resource creates a bucket object from the rendered cltv_query_template_file.
resource "google_storage_bucket_object" "cltv_query_template_file" {
name = "${local.configuration_folder}/${local.cltv_query_template_file}"
content = data.template_file.cltv_query_template_file.rendered
@@ -306,12 +337,14 @@ data "template_file" "purchase_propensity_query_template_file" {
}
}

# This resource creates a bucket object from the rendered purchase_propensity_query_template_file.
resource "google_storage_bucket_object" "purchase_propensity_query_template_file" {
name = "${local.configuration_folder}/${local.purchase_propensity_query_template_file}"
content = data.template_file.purchase_propensity_query_template_file.rendered
bucket = module.pipeline_bucket.name
}

# This data source renders the activation type configuration template file and exposes the rendered content.
data "template_file" "activation_type_configuration" {
template = file("${local.template_dir}/activation_type_configuration_template.tpl")

@@ -324,13 +357,16 @@ data "template_file" "activation_type_configuration" {
}
}

# This resource creates a bucket object named activation_type_configuration.json from the rendered template.
resource "google_storage_bucket_object" "activation_type_configuration_file" {
name = "${local.configuration_folder}/activation_type_configuration.json"
content = data.template_file.activation_type_configuration.rendered
bucket = module.pipeline_bucket.name
# A change in either content hash changes this value, which redeploys the file to the GCS bucket.
detect_md5hash = base64encode("${local.activation_type_configuration_file_content_hash}${local.activation_application_content_hash}")
}
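detect_md5hash acts as a change sentinel: because the value concatenates both content hashes, editing either the configuration template or any activation application file re-uploads the object. A minimal sketch of the same idea with hypothetical inputs:

```terraform
locals {
  # Changing either watched file changes this sentinel string.
  example_redeploy_sentinel = base64encode(join("", [
    filesha512("${path.module}/templates/example_a.tpl"),
    filesha512("${path.module}/python/example_b.py"),
  ]))
}

resource "google_storage_bucket_object" "example_sentinel_file" {
  name           = "config/example.json"
  content        = "{}" # placeholder content
  bucket         = module.pipeline_bucket.name
  detect_md5hash = local.example_redeploy_sentinel
}
```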

# This module submits a Cloud Build job that builds the Docker container image used by the Activation Application.
module "activation_pipeline_container" {
source = "terraform-google-modules/gcloud/google"
version = "3.1.2"
@@ -345,6 +381,7 @@
}
}

# This module executes a gcloud command that builds a Dataflow Flex Template and uploads the template spec.
module "activation_pipeline_template" {
source = "terraform-google-modules/gcloud/google"
version = "3.1.2"
@@ -363,17 +400,20 @@
]
}

# This resource creates a Pub/Sub topic used to trigger the Activation Application.
resource "google_pubsub_topic" "activation_trigger" {
name = "activation-trigger"
project = var.project_id
}

# This data source generates a ZIP archive containing the contents of the specified source_dir directory.
data "archive_file" "activation_trigger_source" {
type = "zip"
output_path = "${local.trigger_function_dir}/${local.source_archive_file}"
source_dir = "${local.trigger_function_dir}/trigger_activation"
}

# This module creates a Cloud Storage bucket and sets the trigger_function_account_email as its admin.
module "function_bucket" {
source = "terraform-google-modules/cloud-storage/google//modules/simple_bucket"
version = "~> 3.4.1"
@@ -403,17 +443,20 @@ module "function_bucket" {
]
}

# This resource creates a bucket object from the activation_trigger_archive ZIP file.
resource "google_storage_bucket_object" "activation_trigger_archive" {
name = local.source_archive_file
source = data.archive_file.activation_trigger_source.output_path
bucket = module.function_bucket.name
}

# This resource creates a Cloud Functions (2nd gen) function with a Python 3.11 runtime, using the activation_trigger_archive ZIP file in the bucket as its source code.
resource "google_cloudfunctions2_function" "activation_trigger_cf" {
name = "activation-trigger"
project = module.project_services.project_id
location = var.trigger_function_location

# Build config preparing the code to run on Cloud Functions (2nd gen)
build_config {
runtime = "python311"
source {
@@ -426,13 +469,15 @@ resource "google_cloudfunctions2_function" "activation_trigger_cf" {
docker_repository = "projects/${module.project_services.project_id}/locations/${var.trigger_function_location}/repositories/gcf-artifacts"
}

# Sets the trigger for the Cloud Function using the Pub/Sub topic created above
event_trigger {
event_type = "google.cloud.pubsub.topic.v1.messagePublished"
pubsub_topic = google_pubsub_topic.activation_trigger.id
retry_policy = "RETRY_POLICY_DO_NOT_RETRY"
trigger_region = var.trigger_function_location
}

# Service endpoint configuration for the Cloud Function
service_config {
available_memory = "256M"
max_instance_count = 3
@@ -448,6 +493,7 @@ resource "google_cloudfunctions2_function" "activation_trigger_cf" {
LOG_DATA_SET = module.bigquery.bigquery_dataset.dataset_id
PIPELINE_WORKER_EMAIL = module.pipeline_service_account.email
}
# Sets the environment variables from the secrets stored in Secret Manager
secret_environment_variables {
project_id = var.project_id
key = "GA4_MEASUREMENT_ID"
@@ -461,12 +507,13 @@
version = split("/", module.secret_manager.secret_versions[1])[5]
}
}

# The lifecycle configuration ignores changes to the generation of the source ZIP file
lifecycle {
ignore_changes = [build_config[0].source[0].storage_source[0].generation]
}
}

# This module runs gcloud commands that add an invoker policy binding to the Cloud Function, allowing a specific service account to invoke it.
module "add_invoker_binding" {
source = "terraform-google-modules/gcloud/google"
version = "3.1.2"
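For reference, a native-Terraform sketch of the invoker binding that the module above applies via gcloud. Since 2nd gen functions are served by Cloud Run, the binding goes on the underlying Cloud Run service; this sketch assumes the service name matches the function name and that the trigger service account is the one being granted access, neither of which is confirmed by this diff:

```terraform
resource "google_cloud_run_service_iam_member" "trigger_function_invoker" {
  project  = module.project_services.project_id
  location = var.trigger_function_location
  service  = google_cloudfunctions2_function.activation_trigger_cf.name
  role     = "roles/run.invoker"
  member   = "serviceAccount:${local.trigger_function_account_email}"
}
```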
@@ -12,6 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

# This module enables the required API services in the Cloud project.
# https://registry.terraform.io/modules/terraform-google-modules/project-factory/google/latest/submodules/project_services
module "data-processing-project-services" {
source = "terraform-google-modules/project-factory/google//modules/project_services"
6 changes: 6 additions & 0 deletions infrastructure/terraform/modules/data-store/dataform.tf
@@ -12,6 +12,10 @@
# See the License for the specific language governing permissions and
# limitations under the License.

# This locals block declares two local variables: dataform_available_locations and dataform_derived_region.
# dataform_available_locations: a list of all the regions where Dataform is available. It is used to validate the value of the dataform_region variable.
# dataform_derived_region: the region used for the Dataform repository. It takes the value of the dataform_region variable if that is not empty,
# otherwise the value of the google_default_region variable.
locals {
dataform_available_locations = [
"asia-east1",
@@ -31,6 +35,8 @@
]
dataform_derived_region = var.dataform_region != "" ? var.dataform_region : var.google_default_region
}
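A sketch of how a list like dataform_available_locations can back an input validation. The variable block below is hypothetical (the real declaration lives in the module's variables file), and the inline list is abbreviated because validation conditions in the Terraform versions current at this commit cannot reference locals:

```terraform
variable "dataform_region" {
  description = "Region for the Dataform repository; leave empty to fall back to google_default_region."
  type        = string
  default     = ""

  validation {
    # Abbreviated list for illustration; mirror dataform_available_locations.
    condition     = var.dataform_region == "" || contains(["asia-east1", "europe-west1", "us-central1"], var.dataform_region)
    error_message = "dataform_region must be empty or a location where Dataform is available."
  }
}
```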

# This resource creates a Dataform repository.
resource "google_dataform_repository" "marketing-analytics" {
provider = google-beta
name = "marketing-analytics"
3 changes: 3 additions & 0 deletions infrastructure/terraform/modules/data-store/iam-binding.tf
@@ -28,6 +28,7 @@ locals {
dataform_sa = "service-${data.google_project.data_processing.number}@gcp-sa-dataform.iam.gserviceaccount.com"
}

# Wait for the Dataform service account to be created
resource "null_resource" "wait_for_dataform_sa_creation" {
provisioner "local-exec" {
command = <<-EOT
@@ -51,6 +52,8 @@
google_dataform_repository.marketing-analytics
]
}
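The command body of the wait resource above is collapsed in this diff. A typical polling loop for a service agent looks like the sketch below; the retry count and sleep interval are illustrative, gcloud is assumed to be installed and authenticated, and var.project_id is an assumed variable name:

```terraform
resource "null_resource" "example_wait_for_service_account" {
  provisioner "local-exec" {
    command = <<-EOT
      # Poll until the Dataform service agent exists, or give up after ~5 minutes.
      for i in $(seq 1 30); do
        if gcloud iam service-accounts describe "${local.dataform_sa}" \
          --project="${var.project_id}" >/dev/null 2>&1; then
          exit 0
        fi
        sleep 10
      done
      echo "timed out waiting for ${local.dataform_sa}" && exit 1
    EOT
  }
}
```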

# This resource grants the Dataform service account the required IAM roles
resource "google_project_iam_member" "dataform-serviceaccount" {
depends_on = [null_resource.wait_for_dataform_sa_creation]
for_each = toset([
