then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.
+In the next step, the Cloud Resource Manager API will be enabled, and your credentials will then work.
## Set environment variables
Run the set variable script and follow the steps to provide value for every variable:
@@ -36,16 +43,29 @@ Run the set variable script and follow the steps to provide value for every vari
```
## Create the Terraform variables file
-
```sh
envsubst < "${SOURCE_ROOT}/infrastructure/cloudshell/terraform-template.tfvars" > "${TERRAFORM_RUN_DIR}/terraform.tfvars"
```
Provide value for the `dataform_github_token` variable in the generated
terraform.tfvars file
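
A quick check can confirm that the token was filled in before you run Terraform. The sketch below is illustrative only: `check_github_token` is a hypothetical helper, and it assumes the generated file assigns the value on a single line in the form `dataform_github_token = "..."`. It only verifies that the quoted value is non-empty, not that the token is valid.

```sh
# Hypothetical helper (not part of the repository): succeeds only if the
# given tfvars file exists and assigns a non-empty quoted value to
# dataform_github_token.
check_github_token() {
  tfvars_file="$1"
  [ -f "$tfvars_file" ] &&
    grep -Eq '^[[:space:]]*dataform_github_token[[:space:]]*=[[:space:]]*"[^"]+"' "$tfvars_file"
}

# Falls back to the current directory if TERRAFORM_RUN_DIR is not set.
if check_github_token "${TERRAFORM_RUN_DIR:-.}/terraform.tfvars"; then
  echo "dataform_github_token is set"
else
  echo "dataform_github_token is missing or empty" >&2
fi
```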
-## Authenticate with additional OAuth 2.0 scopes
+
+## Review your Terraform version
+Make sure the installed Terraform version is 1.9.7. We recommend using [tfenv](https://github.com/tfutils/tfenv) to manage your Terraform version.
+`tfenv` is a Terraform version manager inspired by rbenv, a version manager for the Ruby programming language.
+Install `tfenv`, then run the following commands to select the recommended Terraform version:
+```sh
+# Install via Homebrew or via Arch User Repository (AUR)
+# Follow instructions on https://github.com/tfutils/tfenv
+# Now, install the recommended terraform version
+tfenv install 1.9.7
+tfenv use 1.9.7
+terraform --version
+```
+For instance, the output on macOS should look like this:
```sh
-. scripts/common.sh;set_application_default_credentials $(pwd);set +o nounset;set +o errexit
+Terraform v1.9.7
+on darwin_amd64
```
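
The version check can also be scripted. The following is a minimal sketch, not part of the repository: `is_required_terraform` and `REQUIRED_TF_VERSION` are hypothetical names, and the comparison assumes the first line of the `terraform --version` output has the form `Terraform v<version>`, as in the sample output above.

```sh
# Hypothetical helper: compares the first line of `terraform --version`
# output against the version recommended in this guide.
REQUIRED_TF_VERSION="1.9.7"

is_required_terraform() {
  # $1 is the text printed by `terraform --version`.
  first_line="$(printf '%s\n' "$1" | head -n 1)"
  [ "$first_line" = "Terraform v${REQUIRED_TF_VERSION}" ]
}

# Only run the check when terraform is on the PATH.
if command -v terraform >/dev/null 2>&1; then
  if is_required_terraform "$(terraform --version)"; then
    echo "Terraform ${REQUIRED_TF_VERSION} is active"
  else
    echo "Expected Terraform ${REQUIRED_TF_VERSION}; try: tfenv use ${REQUIRED_TF_VERSION}" >&2
  fi
fi
```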
## Create Terraform remote backend
@@ -61,16 +81,37 @@ terraform init:
terraform -chdir="${TERRAFORM_RUN_DIR}" init
```
-terraform apply:
+terraform plan:
```sh
-terraform -chdir="${TERRAFORM_RUN_DIR}" apply
+terraform -chdir="${TERRAFORM_RUN_DIR}" plan
```
-## Create Looker Studio Dashboard
-Extract the URL used to create the dashboard from the Terraform output value:
+terraform validate:
```sh
-echo "$(terraform -chdir=${TERRAFORM_RUN_DIR} output -raw lookerstudio_create_dashboard_url)"
+terraform -chdir="${TERRAFORM_RUN_DIR}" validate
```
-1. Click on the long URL from the command output. This will take you to the copy dashboard flow in Looker Studio.
-1. The copy may take a few moments to execute. If it does not, close the tab and try clicking the link again.
-1. Click on the button `Edit and share` to follow through and finish the copy process.
+If you run into errors, review and edit the `${TERRAFORM_RUN_DIR}/terraform.tfvars` configuration file. If configuration errors persist, open a new [GitHub issue](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/issues/).
+
+terraform apply:
+```sh
+terraform -chdir="${TERRAFORM_RUN_DIR}" apply
+```
+If some resources fail to deploy, re-run `terraform -chdir="${TERRAFORM_RUN_DIR}" apply` a few more times until everything deploys successfully. If resources still fail to deploy, open a new [GitHub issue](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/issues/).
+
+## Resources created
+
+At this time, the Terraform scripts in this folder perform the following tasks:
+
+- Enable the required APIs
+- Create the IAM bindings needed for the GCP services used
+- Create a secret in GCP Secret Manager for the private GitHub repo
+- Create a Dataform repository connected to the GitHub repo
+- Deploy the marketing data store (MDS), feature store, ML pipelines and activation application
+
+## Next Steps
+
+Follow the [post-installation guide](./POST-INSTALLATION.md) to start your daily operations.
+
+We recommend completing the post-installation guide before deploying the Looker Studio Dashboard, because the data and prediction tables must exist before your reports can consume insights.
+
+**The Looker Studio Dashboard deployment is a separate [step](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/python/lookerstudio/README.md).**
diff --git a/infrastructure/terraform/.terraform.lock.hcl b/infrastructure/terraform/.terraform.lock.hcl
index ece5c079..fb574eb5 100644
--- a/infrastructure/terraform/.terraform.lock.hcl
+++ b/infrastructure/terraform/.terraform.lock.hcl
@@ -2,145 +2,147 @@
# Manual edits may be lost in future updates.
provider "registry.terraform.io/hashicorp/archive" {
- version = "2.4.2"
+ version = "2.6.0"
hashes = [
- "h1:G4v6F6Lhqlo3EKGBKEK/kJRhNcQiRrhEdUiVpBHKHOA=",
- "h1:WfIjVbYA9s/uN2FwhGoiffT7CLFydy7MT1waFbt9YrY=",
- "zh:08faed7c9f42d82bc3d406d0d9d4971e2d1c2d34eae268ad211b8aca57b7f758",
- "zh:3564112ed2d097d7e0672378044a69b06642c326f6f1584d81c7cdd32ebf3a08",
- "zh:53cd9afd223c15828c1916e68cb728d2be1cbccb9545568d6c2b122d0bac5102",
- "zh:5ae4e41e3a1ce9d40b6458218a85bbde44f21723943982bca4a3b8bb7c103670",
- "zh:5b65499218b315b96e95c5d3463ea6d7c66245b59461217c99eaa1611891cd2c",
+ "h1:Ou6XKWvpo7IYgZnrFJs5MKzMqQMEYv8Z2iHSJ2mmnFw=",
+ "h1:rYAubRk7UHC/fzYqFV/VHc+7VIY01ugCxauyTYCNf9E=",
+ "zh:29273484f7423b7c5b3f5df34ccfc53e52bb5e3d7f46a81b65908e7a8fd69072",
+ "zh:3cba58ec3aea5f301caf2acc31e184c55d994cc648126cac39c63ae509a14179",
+ "zh:55170cd17dbfdea842852c6ae2416d057fec631ba49f3bb6466a7268cd39130e",
+ "zh:7197db402ba35631930c3a4814520f0ebe980ae3acb7f8b5a6f70ec90dc4a388",
"zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3",
- "zh:7f45b35a8330bebd184c2545a41782ff58240ed6ba947274d9881dd5da44b02e",
- "zh:87e67891033214e55cfead1391d68e6a3bf37993b7607753237e82aa3250bb71",
- "zh:de3590d14037ad81fc5cedf7cfa44614a92452d7b39676289b704a962050bc5e",
- "zh:e7e6f2ea567f2dbb3baa81c6203be69f9cd6aeeb01204fd93e3cf181e099b610",
- "zh:fd24d03c89a7702628c2e5a3c732c0dede56fa75a08da4a1efe17b5f881c88e2",
- "zh:febf4b7b5f3ff2adff0573ef6361f09b6638105111644bdebc0e4f575373935f",
+ "zh:8bf7fe0915d7fb152a3a6b9162614d2ec82749a06dba13fab3f98d33c020ec4f",
+ "zh:8ce811844fd53adb0dabc9a541f8cb43aacfa7d8e39324e4bd3592b3428f5bfb",
+ "zh:bca795bca815b8ac90e3054c0a9ab1ccfb16eedbb3418f8ad473fc5ad6bf0ef7",
+ "zh:d9355a18df5a36cf19580748b23249de2eb445c231c36a353709f8f40a6c8432",
+ "zh:dc32cc32cfd8abf8752d34f2a783de0d3f7200c573b885ecb64ece5acea173b4",
+ "zh:ef498e20391bf7a280d0fd6fd6675621c85fbe4e92f0f517ae4394747db89bde",
+ "zh:f2bc5226c765b0c8055a7b6207d0fe1eb9484e3ec8880649d158827ac6ed3b22",
]
}
provider "registry.terraform.io/hashicorp/external" {
- version = "2.3.3"
+ version = "2.3.4"
constraints = ">= 2.2.2"
hashes = [
- "h1:H+3QlVPs/7CDa3I4KU/a23wYeGeJxeBlgvR7bfK1t1w=",
- "h1:Qi72kOSrEYgEt5itloFhDfmiFZ7wnRy3+F74XsRuUOw=",
- "zh:03d81462f9578ec91ce8e26f887e34151eda0e100f57e9772dbea86363588239",
- "zh:37ec2a20f6a3ec3a0fd95d3f3de26da6cb9534b30488bc45723e118a0911c0d8",
- "zh:4eb5b119179539f2749ce9de0e1b9629d025990f062f4f4dddc161562bb89d37",
- "zh:5a31bb58414f41bee5e09b939012df5b88654120b0238a89dfd6691ba197619a",
- "zh:6221a05e52a6a2d4f520ffe7cbc741f4f6080e0855061b0ed54e8be4a84eb9b7",
+ "h1:U6W8rgrdmR2pZ2cicFoGOSQ4GXuIf/4EK7s0vTJN7is=",
+ "h1:XWkRZOLKMjci9/JAtE8X8fWOt7A4u+9mgXSUjc4Wuyo=",
+ "zh:037fd82cd86227359bc010672cd174235e2d337601d4686f526d0f53c87447cb",
+ "zh:0ea1db63d6173d01f2fa8eb8989f0809a55135a0d8d424b08ba5dabad73095fa",
+ "zh:17a4d0a306566f2e45778fbac48744b6fd9c958aaa359e79f144c6358cb93af0",
+ "zh:298e5408ab17fd2e90d2cd6d406c6d02344fe610de5b7dae943a58b958e76691",
+ "zh:38ecfd29ee0785fd93164812dcbe0664ebbe5417473f3b2658087ca5a0286ecb",
+ "zh:59f6a6f31acf66f4ea3667a555a70eba5d406c6e6d93c2c641b81d63261eeace",
"zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3",
- "zh:8bb068496b4679bef625e4710d9f3432e301c3a56602271f04e60eadf7f8a94c",
- "zh:94742aa5378bab626ce34f79bcef6a373e4f86ea7a8b762e9f71270a899e0d00",
- "zh:a485831b5a525cd8f40e8982fa37da40ff70b1ae092c8b755fcde123f0b1238d",
- "zh:a647ff16d071eabcabd87ea8183eb90a775a0294ddd735d742075d62fff09193",
- "zh:b74710c5954aaa3faf262c18d36a8c2407862d9f842c63e7fa92fa4de3d29df6",
- "zh:fa73d83edc92af2e551857594c2232ba6a9e3603ad34b0a5940865202c08d8d7",
+ "zh:ad0279dfd09d713db0c18469f585e58d04748ca72d9ada83883492e0dd13bd58",
+ "zh:c69f66fd21f5e2c8ecf7ca68d9091c40f19ad913aef21e3ce23836e91b8cbb5f",
+ "zh:d4a56f8c48aa86fc8e0c233d56850f5783f322d6336f3bf1916e293246b6b5d4",
+ "zh:f2b394ebd4af33f343835517e80fc876f79361f4688220833bc3c77655dd2202",
+ "zh:f31982f29f12834e5d21e010856eddd19d59cd8f449adf470655bfd19354377e",
]
}
provider "registry.terraform.io/hashicorp/google" {
- version = "4.85.0"
- constraints = ">= 3.43.0, >= 3.53.0, >= 3.63.0, >= 4.83.0, < 5.0.0, < 6.0.0"
+ version = "5.44.1"
+ constraints = ">= 3.43.0, >= 3.53.0, >= 4.83.0, >= 5.3.0, >= 5.22.0, 5.44.1, < 6.0.0, < 7.0.0"
hashes = [
- "h1:OVJ7KHmd+XnpxTIRwqwXKasUha9q1rxnq6m5iiETmTM=",
- "h1:aSRZcEKF2wOi/v24IA+k9J2Y7aKVV1cHi/R0V3EhxXQ=",
- "zh:17d60a6a6c1741cf1e09ac6731433a30950285eac88236e623ab4cbf23832ca3",
- "zh:1c70254c016439dbb75cab646b4beace6ceeff117c75d81f2cc27d41c312f752",
- "zh:35e2aa2cc7ac84ce55e05bb4de7b461b169d3582e56d3262e249ff09d64fe008",
- "zh:417afb08d7b2744429f6b76806f4134d62b0354acf98e8a6c00de3c24f2bb6ad",
- "zh:622165d09d21d9a922c86f1fc7177a400507f2a8c4a4513114407ae04da2dd29",
- "zh:7cdb8e39a8ea0939558d87d2cb6caceded9e21f21003d9e9f9ce648d5db0bc3a",
- "zh:851e737dc551d6004a860a8907fda65118fc2c7ede9fa828f7be704a2a39e68f",
- "zh:a331ad289a02a2c4473572a573dc389be0a604cdd9e03dd8dbc10297fb14f14d",
- "zh:b67fd531251380decd8dd1f849460d60f329f89df3d15f5815849a1dd001f430",
- "zh:be8785957acca4f97aa3e800b313b57d1fca07788761c8867c9bc701fbe0bdb5",
- "zh:cb6579a259fe020e1f88217d8f6937b2d5ace15b6406370977a1966eb31b1ca5",
+ "h1:/98txhnLWIG/xGcjMLPeRSJgqqlmCjBAvA/1BPgM+J8=",
+ "h1:CZ1vR4kLe+2PqoUwXhZROq2ufV60V92ObXAcl89HwZ8=",
+ "zh:25d4c2926f0e26f0736eb7913114d487212bdddbb636024ebf3ce74b1baad64b",
+ "zh:282da25af7332aba699f093ece72d6831dc96aba2c8ef1e1c461f47c03bd643b",
+ "zh:519e5939041003d935ca8b2cbd37962ebab0680eeaf19841160cda4bc2c8b860",
+ "zh:6a1b5f9d746d9c23a6bfcd50ed42a88612cbd17b60e8b298203cd87500b13fca",
+ "zh:8703797a82a700d3b739e1ae7d2bc541fe4aa55831d76a235c80dcdbf6e951b9",
+ "zh:9240c4b3e0946626f73ace992d8953c6e520c12b391c24d71b92cade55e3692d",
+ "zh:ab7e7f276efc2c20fbbe0cbb8c223420d3feea155d8e87a82cde0587f0ca97b3",
+ "zh:c2d88d26ddab980adafed9028fe3924613867a39fa7cf4325635f8ffffccf2dc",
+ "zh:cfb444355f8ae7844a3be616997181ec2fddebc28ff369557e1cb1d5239cb08a",
+ "zh:d70ed91db0b6850d7c9d5414512994d04a779de5a18bd51b1b5c09c9b64528cf",
+ "zh:ec302351a34341637eea1325aa6dd4ad09bd08084b2b2dabb481447b8b26967e",
"zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
]
}
provider "registry.terraform.io/hashicorp/google-beta" {
- version = "4.85.0"
- constraints = ">= 3.43.0, >= 4.83.0, < 5.0.0, < 6.0.0"
+ version = "5.44.1"
+ constraints = ">= 3.43.0, >= 4.83.0, 5.44.1, < 6.0.0, < 7.0.0"
hashes = [
- "h1:YkCDGkP0AUZoNobLoxRnM52Pi4alYE9EFXalEu8p8E8=",
- "h1:fYXttGgML+C0aEl3rkOiAyo0UizA6rz7VsvNLTw678U=",
- "zh:40e9c7ec46955b4d79065a14185043a4ad6af8d0246715853fc5c99208b66980",
- "zh:5950a9ba2f96420ea5335b543e315b1a47a705f9a9abfc53c6fec52d084eddcb",
- "zh:5dfa98d32246a5d97e018f2b91b0e921cc6f061bc8591884f3b144f0d62f1c20",
- "zh:628d0ca35c6d4c35077859bb0a5534c1de44f23a91e190f9c3f06f2358172e75",
- "zh:6e78d54fd4de4151968149b4c3521f563a8b5c55aad423dba5968a9114b65ae4",
- "zh:91c3bc443188638353285bd35b06d3a3b39b42b3b4cc0637599a430438fba2f7",
- "zh:9e91b03363ebf39eea5ec0fbe7675f6979883aa9ad9a36664357d8513a007cf3",
- "zh:db9a8d6bfe075fb38c260986ab557d40e8d18e5698c62956a6da8120fae01d59",
- "zh:e41169c49f3bb53217905509e2ba8bb4680c373e1f54db7fac1b7f72943a1004",
- "zh:f32f55a8af605afbc940814e17493ac83d9d66cd6da9bbc247e0a833a0aa37ec",
+ "h1:4pdEV8QZSbjR0rRdpq8aU7DR+DurLv6xF70eHzkugkA=",
+ "h1:vzVQ2iIYox+sjIBjR+lyaf102430U+MoBAayeC468Y4=",
+ "zh:206f93a069dc6b28010e6f8e2f617d5a00b6dadf96291adf1ec88a2cfaa91ca8",
+ "zh:296471c122824c8d6d3597ad40f2716afce11b37af53a4a26e66c3b2e0e26586",
+ "zh:399244ebe27a60fa2cd78acb3911238eba926be8105d1caf06e604ca28ecfa12",
+ "zh:3f673c225af9119d51876ea20a5ce0cba31bb879b4b27462fa9efb33103a53a0",
+ "zh:57dd1e3406054660894df9b36060853b371299af27c2d78ad6789f0ca32a431b",
+ "zh:75e06806c3c82adfd569ee202d92c3add3e0822680e0d33153f34bb88f333a96",
+ "zh:84c8aeda08578669f1499b032bafbe54cf37a9f5785b18adc0d01189e7568a4b",
+ "zh:852ba977c1c947e49ac1743d5a1dfab127466775f2b7998f6e1a40ff966edc12",
+ "zh:cc33f33aeb61340309968ee675247c1b4eb7476a7204abf23e6ce43c183f8a59",
+ "zh:e8608e1c26790e321b8981362a62614e1ac14943684d1b459d932cbf75b7f9e7",
"zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
- "zh:f6561a6badc3af842f9ad5bb926104954047f07cb90fadcca1357441cc67d91d",
+ "zh:f9dad1d0653d82aca6821064c6370452d3ba16e4cbfc6957e3d67b9266c89527",
]
}
provider "registry.terraform.io/hashicorp/local" {
- version = "2.5.1"
+ version = "2.5.2"
hashes = [
- "h1:8oTPe2VUL6E2d3OcrvqyjI4Nn/Y/UEQN26WLk5O/B0g=",
- "h1:tjcGlQAFA0kmQ4vKkIPPUC4it1UYxLbg4YvHOWRAJHA=",
- "zh:0af29ce2b7b5712319bf6424cb58d13b852bf9a777011a545fac99c7fdcdf561",
- "zh:126063ea0d79dad1f68fa4e4d556793c0108ce278034f101d1dbbb2463924561",
- "zh:196bfb49086f22fd4db46033e01655b0e5e036a5582d250412cc690fa7995de5",
- "zh:37c92ec084d059d37d6cffdb683ccf68e3a5f8d2eb69dd73c8e43ad003ef8d24",
- "zh:4269f01a98513651ad66763c16b268f4c2da76cc892ccfd54b401fff6cc11667",
- "zh:51904350b9c728f963eef0c28f1d43e73d010333133eb7f30999a8fb6a0cc3d8",
- "zh:73a66611359b83d0c3fcba2984610273f7954002febb8a57242bbb86d967b635",
+ "h1:JlMZD6nYqJ8sSrFfEAH0Vk/SL8WLZRmFaMUF9PJK5wM=",
+ "h1:p99F1AoV9z51aJ4EdItxz/vLwWIyhx/0Iw7L7sWSH1o=",
+ "zh:136299545178ce281c56f36965bf91c35407c11897f7082b3b983d86cb79b511",
+ "zh:3b4486858aa9cb8163378722b642c57c529b6c64bfbfc9461d940a84cd66ebea",
+ "zh:4855ee628ead847741aa4f4fc9bed50cfdbf197f2912775dd9fe7bc43fa077c0",
+ "zh:4b8cd2583d1edcac4011caafe8afb7a95e8110a607a1d5fb87d921178074a69b",
+ "zh:52084ddaff8c8cd3f9e7bcb7ce4dc1eab00602912c96da43c29b4762dc376038",
+ "zh:71562d330d3f92d79b2952ffdda0dad167e952e46200c767dd30c6af8d7c0ed3",
"zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3",
- "zh:7ae387993a92bcc379063229b3cce8af7eaf082dd9306598fcd42352994d2de0",
- "zh:9e0f365f807b088646db6e4a8d4b188129d9ebdbcf2568c8ab33bddd1b82c867",
- "zh:b5263acbd8ae51c9cbffa79743fbcadcb7908057c87eb22fd9048268056efbc4",
- "zh:dfcd88ac5f13c0d04e24be00b686d069b4879cc4add1b7b1a8ae545783d97520",
+ "zh:805f81ade06ff68fa8b908d31892eaed5c180ae031c77ad35f82cb7a74b97cf4",
+ "zh:8b6b3ebeaaa8e38dd04e56996abe80db9be6f4c1df75ac3cccc77642899bd464",
+ "zh:ad07750576b99248037b897de71113cc19b1a8d0bc235eb99173cc83d0de3b1b",
+ "zh:b9f1c3bfadb74068f5c205292badb0661e17ac05eb23bfe8bd809691e4583d0e",
+ "zh:cc4cbcd67414fefb111c1bf7ab0bc4beb8c0b553d01719ad17de9a047adff4d1",
]
}
provider "registry.terraform.io/hashicorp/null" {
- version = "3.2.2"
+ version = "3.2.3"
+ constraints = ">= 2.1.0"
hashes = [
- "h1:vWAsYRd7MjYr3adj8BVKRohVfHpWQdvkIwUQ2Jf5FVM=",
- "h1:zT1ZbegaAYHwQa+QwIFugArWikRJI9dqohj8xb0GY88=",
- "zh:3248aae6a2198f3ec8394218d05bd5e42be59f43a3a7c0b71c66ec0df08b69e7",
- "zh:32b1aaa1c3013d33c245493f4a65465eab9436b454d250102729321a44c8ab9a",
- "zh:38eff7e470acb48f66380a73a5c7cdd76cc9b9c9ba9a7249c7991488abe22fe3",
- "zh:4c2f1faee67af104f5f9e711c4574ff4d298afaa8a420680b0cb55d7bbc65606",
- "zh:544b33b757c0b954dbb87db83a5ad921edd61f02f1dc86c6186a5ea86465b546",
- "zh:696cf785090e1e8cf1587499516b0494f47413b43cb99877ad97f5d0de3dc539",
- "zh:6e301f34757b5d265ae44467d95306d61bef5e41930be1365f5a8dcf80f59452",
+ "h1:+AnORRgFbRO6qqcfaQyeX80W0eX3VmjadjnUFUJTiXo=",
+ "h1:nKUqWEza6Lcv3xRlzeiRQrHtqvzX1BhIzjaOVXRYQXQ=",
+ "zh:22d062e5278d872fe7aed834f5577ba0a5afe34a3bdac2b81f828d8d3e6706d2",
+ "zh:23dead00493ad863729495dc212fd6c29b8293e707b055ce5ba21ee453ce552d",
+ "zh:28299accf21763ca1ca144d8f660688d7c2ad0b105b7202554ca60b02a3856d3",
+ "zh:55c9e8a9ac25a7652df8c51a8a9a422bd67d784061b1de2dc9fe6c3cb4e77f2f",
+ "zh:756586535d11698a216291c06b9ed8a5cc6a4ec43eee1ee09ecd5c6a9e297ac1",
"zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3",
- "zh:913a929070c819e59e94bb37a2a253c228f83921136ff4a7aa1a178c7cce5422",
- "zh:aa9015926cd152425dbf86d1abdbc74bfe0e1ba3d26b3db35051d7b9ca9f72ae",
- "zh:bb04798b016e1e1d49bcc76d62c53b56c88c63d6f2dfe38821afef17c416a0e1",
- "zh:c23084e1b23577de22603cff752e59128d83cfecc2e6819edadd8cf7a10af11e",
+ "zh:9d5eea62fdb587eeb96a8c4d782459f4e6b73baeece4d04b4a40e44faaee9301",
+ "zh:a6355f596a3fb8fc85c2fb054ab14e722991533f87f928e7169a486462c74670",
+ "zh:b5a65a789cff4ada58a5baffc76cb9767dc26ec6b45c00d2ec8b1b027f6db4ed",
+ "zh:db5ab669cf11d0e9f81dc380a6fdfcac437aea3d69109c7aef1a5426639d2d65",
+ "zh:de655d251c470197bcbb5ac45d289595295acb8f829f6c781d4a75c8c8b7c7dd",
+ "zh:f5c68199f2e6076bce92a12230434782bf768103a427e9bb9abee99b116af7b5",
]
}
provider "registry.terraform.io/hashicorp/random" {
- version = "3.6.2"
+ version = "3.6.3"
+ constraints = ">= 2.1.0"
hashes = [
- "h1:R5qdQjKzOU16TziCN1vR3Exr/B+8WGK80glLTT4ZCPk=",
- "h1:wmG0QFjQ2OfyPy6BB7mQ57WtoZZGGV07uAPQeDmIrAE=",
- "zh:0ef01a4f81147b32c1bea3429974d4d104bbc4be2ba3cfa667031a8183ef88ec",
- "zh:1bcd2d8161e89e39886119965ef0f37fcce2da9c1aca34263dd3002ba05fcb53",
- "zh:37c75d15e9514556a5f4ed02e1548aaa95c0ecd6ff9af1119ac905144c70c114",
- "zh:4210550a767226976bc7e57d988b9ce48f4411fa8a60cd74a6b246baf7589dad",
- "zh:562007382520cd4baa7320f35e1370ffe84e46ed4e2071fdc7e4b1a9b1f8ae9b",
- "zh:5efb9da90f665e43f22c2e13e0ce48e86cae2d960aaf1abf721b497f32025916",
- "zh:6f71257a6b1218d02a573fc9bff0657410404fb2ef23bc66ae8cd968f98d5ff6",
+ "h1:Fnaec9vA8sZ8BXVlN3Xn9Jz3zghSETIKg7ch8oXhxno=",
+ "h1:f6jXn4MCv67kgcofx9D49qx1ZEBv8oyvwKDMPBr0A24=",
+ "zh:04ceb65210251339f07cd4611885d242cd4d0c7306e86dda9785396807c00451",
+ "zh:448f56199f3e99ff75d5c0afacae867ee795e4dfda6cb5f8e3b2a72ec3583dd8",
+ "zh:4b4c11ccfba7319e901df2dac836b1ae8f12185e37249e8d870ee10bb87a13fe",
+ "zh:4fa45c44c0de582c2edb8a2e054f55124520c16a39b2dfc0355929063b6395b1",
+ "zh:588508280501a06259e023b0695f6a18149a3816d259655c424d068982cbdd36",
+ "zh:737c4d99a87d2a4d1ac0a54a73d2cb62974ccb2edbd234f333abd079a32ebc9e",
"zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3",
- "zh:9647e18f221380a85f2f0ab387c68fdafd58af6193a932417299cdcae4710150",
- "zh:bb6297ce412c3c2fa9fec726114e5e0508dd2638cad6a0cb433194930c97a544",
- "zh:f83e925ed73ff8a5ef6e3608ad9225baa5376446349572c2449c0c0b3cf184b7",
- "zh:fbef0781cb64de76b1df1ca11078aecba7800d82fd4a956302734999cfd9a4af",
+ "zh:a357ab512e5ebc6d1fda1382503109766e21bbfdfaa9ccda43d313c122069b30",
+ "zh:c51bfb15e7d52cc1a2eaec2a903ac2aff15d162c172b1b4c17675190e8147615",
+ "zh:e0951ee6fa9df90433728b96381fb867e3db98f66f735e0c3e24f8f16903f0ad",
+ "zh:e3cdcb4e73740621dabd82ee6a37d6cfce7fee2a03d8074df65086760f5cf556",
+ "zh:eff58323099f1bd9a0bec7cb04f717e7f1b2774c7d612bf7581797e1622613a0",
]
}
@@ -161,3 +163,23 @@ provider "registry.terraform.io/hashicorp/template" {
"zh:c979425ddb256511137ecd093e23283234da0154b7fa8b21c2687182d9aea8b2",
]
}
+
+provider "registry.terraform.io/hashicorp/time" {
+ version = "0.12.1"
+ hashes = [
+ "h1:6BhxSYBJdBBKyuqatOGkuPKVenfx6UmLdiI13Pb3his=",
+ "h1:j+ED7j0ZFJ4EDx7sdna76wsiIf397toylDN0dFi6v0U=",
+ "zh:090023137df8effe8804e81c65f636dadf8f9d35b79c3afff282d39367ba44b2",
+ "zh:26f1e458358ba55f6558613f1427dcfa6ae2be5119b722d0b3adb27cd001efea",
+ "zh:272ccc73a03384b72b964918c7afeb22c2e6be22460d92b150aaf28f29a7d511",
+ "zh:438b8c74f5ed62fe921bd1078abe628a6675e44912933100ea4fa26863e340e9",
+ "zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3",
+ "zh:85c8bd8eefc4afc33445de2ee7fbf33a7807bc34eb3734b8eefa4e98e4cddf38",
+ "zh:98bbe309c9ff5b2352de6a047e0ec6c7e3764b4ed3dfd370839c4be2fbfff869",
+ "zh:9c7bf8c56da1b124e0e2f3210a1915e778bab2be924481af684695b52672891e",
+ "zh:d2200f7f6ab8ecb8373cda796b864ad4867f5c255cff9d3b032f666e4c78f625",
+ "zh:d8c7926feaddfdc08d5ebb41b03445166df8c125417b28d64712dccd9feef136",
+ "zh:e2412a192fc340c61b373d6c20c9d805d7d3dee6c720c34db23c2a8ff0abd71b",
+ "zh:e6ac6bba391afe728a099df344dbd6481425b06d61697522017b8f7a59957d44",
+ ]
+}
diff --git a/infrastructure/terraform/POST-INSTALLATION.md b/infrastructure/terraform/POST-INSTALLATION.md
new file mode 100644
index 00000000..a4445fb3
--- /dev/null
+++ b/infrastructure/terraform/POST-INSTALLATION.md
@@ -0,0 +1,196 @@
+
+## Post-Installation Guide
+
+Now that you have deployed all assets successfully in your Google Cloud project, plan how you will operate the solution so that it generates the predictions you need to create the audience segments for your Ads campaigns. To accomplish that, you need to plan a few things.
+
+First, choose the kind of insight you need to define your campaigns. Here are the insights provided by each of the use cases included in the solution:
+
+- **Aggregated Value Based Bidding ([value_based_bidding](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L514))**: Assigns a numerical value to high-value conversion events (user actions) in relation to a target conversion event (typically purchase) so that Google Ads can improve the bidding strategy for users who reached these conversion events, as of now.
+- **Demographic Audience Segmentation ([audience_segmentation](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L929))**: Assigns a cluster segment to a user using demographic data, including geographic location, device, traffic source and windowed user metrics looking XX days back.
+- **Interest based Audience Segmentation ([auto_audience_segmentation](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L1018))**: Assigns a cluster segment to a user using page navigation data looking XX days back, as of now.
+- **Purchase Propensity ([purchase_propensity](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L629))**: Predicts a purchase propensity decile and a propensity score (a likelihood between 0.0, or 0%, and 1.0, or 100%) for a user using demographic data, including geographic location, device, traffic source and windowed user metrics looking XX days back to predict XX days ahead, as of now.
+- **Customer Lifetime Value ([customer_ltv](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L1215))**: Predicts a lifetime value gain decile and a lifetime value revenue gain in USD (equal to or greater than 0.0) for a user using demographic data, including geographic location, device, traffic source and windowed user metrics looking XX-XXX days back to predict XX days ahead, as of now.
+- **Churn Propensity ([churn_propensity](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L779))**: Predicts a churn propensity decile and a propensity score (a likelihood between 0.0, or 0%, and 1.0, or 100%) for a user using demographic data, including geographic location, device, traffic source and windowed user metrics looking XX days back to predict XX days ahead, as of now.
+
+Second, measure how much data you will use to obtain the insights you need. Each of the use cases above requires data within the following ranges, measured by number of days and number of unique user events.
+
+- **Aggregated Value Based Bidding ([value_based_bidding](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L1734))**: Minimum 30 days and maximum 1 year. The number of unique user events is not a key limitation. Note that you need at least 1000 training examples for the model to be trained successfully. To accomplish that, we typically duplicate the rows until the training table has a minimum of 1000 rows in the "TRAIN" subset.
+- **Demographic Audience Segmentation ([audience_segmentation](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L1779))**: Minimum 30 days and maximum 1 year. Minimum of 1000 unique user events per day. Note that you don't need more than 1M training examples for the model to perform well. Make sure your training table doesn't contain more training examples than you need by applying exclusion clauses (e.g. `WHERE` or `LIMIT` clauses).
+- **Interest based Audience Segmentation ([auto_audience_segmentation](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L1817))**: Minimum 30 days and maximum 1 year. Minimum of 1000 unique user events per day. Note that you don't need more than 1M training examples for the model to perform well. Make sure your training table doesn't contain more training examples than you need by applying exclusion clauses (e.g. `WHERE` or `LIMIT` clauses).
+- **Purchase Propensity ([purchase_propensity](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L1739))**: Minimum 90 days and maximum 2 years. Minimum of 1000 unique user events per day, including at least 1 target event per week. Note that you don't need more than 1M training examples for the model to perform well. Make sure your training table doesn't contain more training examples than you need by applying exclusion clauses (e.g. `WHERE` or `LIMIT` clauses).
+- **Customer Lifetime Value ([customer_lifetime_value](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L1798))**: Minimum 180 days and maximum 5 years. Minimum of 1000 unique user events per day, including at least 1 event per week that increases the lifetime value for a user. Note that you don't need more than 1M training examples for the model to perform well. Make sure your training table doesn't contain more training examples than you need by applying exclusion clauses (e.g. `WHERE` or `LIMIT` clauses).
+- **Churn Propensity ([churn_propensity](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L1758))**: Minimum 30 days and maximum 2 years. Minimum of 1000 unique user events per day, including at least 1 target event per week. Note that you don't need more than 1M training examples for the model to perform well. Make sure your training table doesn't contain more training examples than you need by applying exclusion clauses (e.g. `WHERE` or `LIMIT` clauses).
+
+Third, the data must be processed by the Marketing Data Store; features must be prepared using the Feature Engineering procedure; and the training and inference pipelines must be triggered. To do that, open your `config.yaml.tftpl` configuration file and edit the `{pipeline-name}.execution.schedule` block to set the scheduled time for each pipeline your use case requires. The following table lists the pipeline configurations you need to enable for each use case.
+
+| Use Case | Pipeline Configuration |
+| -------- | ---------------------- |
+| **Aggregated Value Based Bidding** | [feature-creation-aggregated-value-based-bidding](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L473)<br>[value_based_bidding.training](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L515)<br>[value_based_bidding.explanation](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L591) |
+| **Demographic Audience Segmentation** | [feature-creation-audience-segmentation](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L248)<br>[segmentation.training](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L930)<br>[segmentation.prediction](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L973) |
+| **Interest based Audience Segmentation** | [feature-creation-auto-audience-segmentation](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L170)<br>[auto_segmentation.training](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L1019)<br>[auto_segmentation.prediction](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L1061) |
+| **Purchase Propensity** | [feature-creation-purchase-propensity](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L315)<br>[purchase_propensity.training](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L630)<br>[purchase_propensity.prediction](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L725) |
+| **Customer Lifetime Value** | [feature-creation-customer-ltv](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L419)<br>[propensity_clv.training](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L1110)<br>[clv.training](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L1221)<br>[clv.prediction](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L1309) |
+| **Churn Propensity** | [feature-creation-churn-propensity](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L370)<br>[churn_propensity.training](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L780)<br>[churn_propensity.prediction](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/config/config.yaml.tftpl#L875) |
+
+After you change these configurations, apply the changes to your deployed resources by re-running Terraform:
+
+```bash
+terraform -chdir="${TERRAFORM_RUN_DIR}" apply
+```
+
+You can trigger your Cloud Workflow to execute your Dataform workflow at any time, or you can wait until the next day, when the Cloud Workflow runs according to your schedule. Two components in this solution require data to install and function properly. One is the Looker Studio Dashboard: deploy the dashboard only after you have completed all the steps in this guide successfully. The other is the ML pipelines: pipeline compilation requires the views and tables to exist so that it can read their schemas and define the column transformations to run during pipeline execution.
+
+To manually start the data flow you must perform the following tasks:
+
+1. Run the Cloud Workflow
+
+ In the Google Cloud console, navigate to the Workflows page. Find the workflow named `dataform-prod-incremental`; under Actions, click the three dots and select `Execute`.
+
+ **Note:** If you have a considerable amount of data (>XXX GBs of data) in your exported GA4 and Ads BigQuery datasets, it can take several minutes or hours to process all the data. Make sure that the processing has completed successfully before you continue to the next step.
+
+1. Invoke the BigQuery stored procedures with the prefix `invoke_backfill_*` to backfill the feature store if the GA4 Export was enabled before you installed Marketing Analytics Jumpstart.
+
+ In the Google Cloud console, navigate to the BigQuery page. In the query composer, run the following queries to invoke the stored procedures.
+ ```sql
+ ## There is no need to backfill the aggregated value based bidding features since
+ ## no aggregations are performed before training. The transformation was applied in the
+ ## Marketing Data Store
+
+ ## Backfill customer ltv tables
+ CALL `feature_store.invoke_backfill_customer_lifetime_value_label`();
+ CALL `feature_store.invoke_backfill_user_lifetime_dimensions`();
+ CALL `feature_store.invoke_backfill_user_rolling_window_lifetime_metrics`();
+
+ ## Backfill purchase propensity tables
+ CALL `feature_store.invoke_backfill_user_dimensions`();
+ CALL `feature_store.invoke_backfill_user_rolling_window_metrics`();
+ CALL `feature_store.invoke_backfill_purchase_propensity_label`();
+
+ ## Backfill audience segmentation tables
+ CALL `feature_store.invoke_backfill_user_segmentation_dimensions`();
+ CALL `feature_store.invoke_backfill_user_lookback_metrics`();
+
+ ## There is no need to backfill the auto audience segmentation features since
+ ## they are dynamically prepared in the feature engineering pipeline using
+ ## python code
+
+ ## Backfill churn propensity tables
+ ## This use case reuses the user_dimensions and user_rolling_window_metrics,
+ ## make sure you invoke the backfill for these tables. CALLs are listed above
+ ## under backfill purchase propensity
+ CALL `feature_store.invoke_backfill_churn_propensity_label`();
+
+ ## Backfill for gemini insights
+ CALL `feature_store.invoke_backfill_user_scoped_metrics`();
+ CALL `gemini_insights.invoke_backfill_user_behaviour_revenue_insights`();
+ ```
+
+ **Note:** If you have a considerable amount of data (>XXX GBs of data) in your exported GA4 BigQuery datasets over the last six months, it can take several hours to backfill the feature data so that you can train your ML model. Make sure that the backfill procedures start without errors before you continue to the next step.
+
+1. Check whether the feature store tables you backfilled contain rows.
+
+ On the Google Cloud console, navigate to the BigQuery page. On the query composer, run the following queries to check that the tables contain rows.
+ ```sql
+ ## There are no tables used by the aggregated value based bidding use case
+ ## in the feature store.
+
+ ## Checking customer ltv tables are not empty
+ SELECT COUNT(user_pseudo_id) FROM `feature_store.customer_lifetime_value_label`;
+ SELECT COUNT(user_pseudo_id) FROM `feature_store.user_lifetime_dimensions`;
+ SELECT COUNT(user_pseudo_id) FROM `feature_store.user_rolling_window_lifetime_metrics`;
+
+ ## Checking purchase propensity tables are not empty
+ SELECT COUNT(user_pseudo_id) FROM `feature_store.user_dimensions`;
+ SELECT COUNT(user_pseudo_id) FROM `feature_store.user_rolling_window_metrics`;
+ SELECT COUNT(user_pseudo_id) FROM `feature_store.purchase_propensity_label`;
+
+ ## Checking audience segmentation tables are not empty
+ SELECT COUNT(user_pseudo_id) FROM `feature_store.user_segmentation_dimensions`;
+ SELECT COUNT(user_pseudo_id) FROM `feature_store.user_lookback_metrics`;
+
+ ## There are no tables used by the auto audience segmentation use case
+ ## in the feature store.
+
+ ## Checking churn propensity tables are not empty
+ ## This use case reuses the user_dimensions and user_rolling_window_metrics,
+ ## make sure you invoke the backfill for these tables. CALLs are listed above
+ ## under the instructions for backfill purchase propensity
+ SELECT COUNT(user_pseudo_id) FROM `feature_store.user_dimensions`;
+ SELECT COUNT(user_pseudo_id) FROM `feature_store.user_rolling_window_metrics`;
+ SELECT COUNT(user_pseudo_id) FROM `feature_store.churn_propensity_label`;
+
+ ## Checking gemini insights tables are not empty
+ SELECT COUNT(feature_date) FROM `feature_store.user_scoped_metrics`;
+ SELECT COUNT(feature_date) FROM `gemini_insights.user_behaviour_revenue_insights_daily`;
+ SELECT COUNT(feature_date) FROM `gemini_insights.user_behaviour_revenue_insights_weekly`;
+ SELECT COUNT(feature_date) FROM `gemini_insights.user_behaviour_revenue_insights_monthly`;
+ ```
+
+1. Redeploy the ML pipelines using Terraform.
+
+ In your code editor, change the variable `deploy_pipelines` from `true` to `false` in the TF variables file `${TERRAFORM_RUN_DIR}/terraform.tfvars`.
+ Next, undeploy the ML pipelines component by applying the Terraform configuration.
+
+ ```bash
+ terraform -chdir="${TERRAFORM_RUN_DIR}" apply
+ ```
+
+ Now, to deploy the ML pipelines component again, revert your changes in the TF variables file `${TERRAFORM_RUN_DIR}/terraform.tfvars` and apply the Terraform configuration by running the command above again.
+
+ **Note:** The training pipelines use schemas defined by a `custom_transformations` parameter in your `config.yaml` or by the training table/view schema itself.
+ During the first deployment (the first `terraform apply`), the views do not exist yet, so a fixed schema is assumed when no `custom_transformations` parameter is provided.
+ Now that all the tables and views exist, redeploy the pipelines so that they fetch the correct schema for the training pipelines.
+
+1. Once the feature store is populated and the pipelines are redeployed, manually invoke the BigQuery procedures for preparing the training datasets, which have the suffix `_training_preparation`.
+
+ On the Google Cloud console, navigate to the BigQuery page. On the query composer, run the following queries to invoke the stored procedures.
+ ```sql
+ ## Training preparation for Aggregated Value Based Bidding
+ CALL `aggregated_vbb.invoke_aggregated_value_based_bidding_training_preparation`();
+
+ ## Training preparation for Customer Lifetime Value
+ CALL `customer_lifetime_value.invoke_customer_lifetime_value_training_preparation`();
+
+ ## Training preparation for Purchase Propensity
+ CALL `purchase_propensity.invoke_purchase_propensity_training_preparation`();
+
+ ## Training preparation for Audience Segmentation
+ CALL `audience_segmentation.invoke_audience_segmentation_training_preparation`();
+
+ ## Training preparation for Auto Audience Segmentation
+ CALL `auto_audience_segmentation.invoke_auto_audience_segmentation_training_preparation`();
+
+ ## Training preparation for Churn Propensity
+ CALL `churn_propensity.invoke_churn_propensity_training_preparation`();
+
+ ## There is no need to prepare training data for the gemini insights use case.
+ ## Gemini insights only require the feature engineering and inference pipelines.
+ ## The gemini insights are saved in the gemini insights dataset, specified in the `config.yaml.tftpl` file.
+ ```
+
+1. Check whether the training preparation tables populated by the procedures above contain rows.
+
+ On the Google Cloud console, navigate to the BigQuery page. On the query composer, run the following queries to check that the tables contain rows.
+ ```sql
+ ## Checking aggregated value based bidding tables are not empty.
+ ## For training purposes, your dataset must always include at least 1,000 rows for tabular training data.
+ SELECT * FROM `aggregated_vbb.aggregated_value_based_bidding_training_full_dataset`;
+
+ ## Checking customer ltv tables are not empty
+ ## For training purposes, your dataset must always include at least 1,000 rows for tabular training data.
+ SELECT COUNT(user_pseudo_id) FROM `customer_lifetime_value.customer_lifetime_value_training_full_dataset`;
+
+ ## Checking purchase propensity tables are not empty
+ ## For training purposes, your dataset must always include at least 1,000 rows for tabular training data.
+ SELECT COUNT(user_pseudo_id) FROM `purchase_propensity.purchase_propensity_training_full_dataset`;
+
+ ## Checking audience segmentation tables are not empty
+ ## For training purposes, your dataset must always include at least 1,000 rows for tabular training data.
+ SELECT COUNT(user_pseudo_id) FROM `audience_segmentation.audience_segmentation_training_full_dataset`;
+
+ ## Checking churn propensity tables are not empty
+ ## For training purposes, your dataset must always include at least 1,000 rows for tabular training data.
+ SELECT COUNT(user_pseudo_id) FROM `churn_propensity.churn_propensity_training_full_dataset`;
+ ```
+
+Your Marketing Analytics Jumpstart solution is ready for daily operation. Plan the days on which you want your model(s) to be trained, then change the scheduler dates in the `config.yaml.tftpl` file or manually trigger training whenever you want. For more information, read the documentation in the [docs/ folder](../../docs/).
diff --git a/infrastructure/terraform/README.md b/infrastructure/terraform/README.md
index 793f9419..fcbb9246 100644
--- a/infrastructure/terraform/README.md
+++ b/infrastructure/terraform/README.md
@@ -1,11 +1,13 @@
-# Terraform Scripts
+# Manual Installation of Terraform Modules
-The Terraform scripts in this folder create the infrastructure to start data ingestion
-into BigQuery, create feature store, ML pipelines and Dataflow activation pipeline.
+The Terraform scripts in this folder and subfolders create the infrastructure to start data ingestion
+into BigQuery, create the feature store, and run the ML pipelines and the Dataflow activation application.
## Prerequisites
-Make sure the prerequisites listed in the [parent README](../README.md) are met. You can run the script
+Make sure the prerequisites listed in the [parent README](../README.md) are met.
+
+You can run the script
from [Cloud Shell](https://cloud.google.com/shell/docs/using-cloud-shelld.google.com/shell/docs/using-cloud-shell)
or a Linux machine or a Mac with `gcloud` command installed. The instructions provided are for the Cloud Shell
installation.
@@ -16,10 +18,11 @@ have plenty of disk space before continuing the installation.
If that is not your case, following the Cloud Shell documentation to [reset your Cloud Shell](https://cloud.google.com/shell/docs/resetting-cloud-shell).
-## Installation Guide
-Step by step installation guide with [![Open in Cloud Shell](https://gstatic.com/cloudssh/images/open-btn.svg)](https://shell.cloud.google.com/cloudshell/editor?cloudshell_git_repo=https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart.git&cloudshell_git_branch=main&cloudshell_workspace=&cloudshell_tutorial=infrastructure/cloudshell/tutorial.md)
+## Manual Installation Guide
+
+This section contains all the detailed steps required to manually install the Marketing Analytics Jumpstart solution. This process gives you greater flexibility and customization, allowing you to choose which components of the solution you want to use.
-**Note:** If you are working from a forked repository, be sure to update the `cloudshell_git_repo` parameter to the URL of your forked repository for the button link above.
+Also, this method allows you to extend this solution and develop it to satisfy your own needs.
### Initial Environment Setup
@@ -40,51 +43,18 @@ Step by step installation guide with [![Open in Cloud Shell](https://gstatic.com
gcloud config set project $PROJECT_ID
```
-1. Install or update Python3
- Install a compatible version of Python 3.8-3.10 and set the CLOUDSDK_PYTHON environment variable to point to it.
+1. Install or update uv for running Python scripts
+ Install [uv](https://docs.astral.sh/uv/), which manages the Python version and dependencies for the solution.
- ```bash
- sudo apt-get install python3.10
- CLOUDSDK_PYTHON=python3.10
+ ```sh
+ curl -LsSf https://astral.sh/uv/install.sh | sh
+ export PATH="$HOME/.local/bin:$PATH"
```
- If you are installing on a Mac:
- ```shell
- brew install python@3.10
- CLOUDSDK_PYTHON=python3.10
- ```
-1. Install Python's Poetry and set Poetry to use Python3.8-3.10 version
-
- [Poetry](https://python-poetry.org/docs/) is a Python's tool for dependency management and packaging.
-
- If you are installing on in Cloud Shell use the following commands:
- ```shell
- pipx install poetry
- ```
- If you don't have pipx installed - follow the [Pipx installation guide](https://pipx.pypa.io/stable/installation/)
- ```shell
- sudo apt update
- sudo apt install pipx
- pipx ensurepath
- ```
- Verify that `poetry` is on your $PATH variable:
- ```shell
- poetry --version
- ```
- If it fails - add it to your $PATH variable:
- ```shell
- export PATH="$HOME/.local/bin:$PATH"
- ```
- If you are installing on a Mac:
- ```shell
- brew install poetry
- ```
- Set poetry to use your latest python3
- ```shell
- SOURCE_ROOT=${HOME}/${REPO}
- cd ${SOURCE_ROOT}
- poetry env use python3
- ```
+ Check uv installation:
+ ```sh
+ uv --version
+ ```
1. Authenticate with additional OAuth 2.0 scopes needed to use the Google Analytics Admin API:
```shell
@@ -102,7 +72,7 @@ Step by step installation guide with [![Open in Cloud Shell](https://gstatic.com
1. Review your Terraform version
- Make sure you have installed terraform version is 1.5.7. We recommend you to use [tfenv](https://github.com/tfutils/tfenv) to manage your terraform version.
+ Make sure your installed Terraform version is 1.9.7. We recommend using [tfenv](https://github.com/tfutils/tfenv) to manage your Terraform version.
`Tfenv` is a version manager inspired by rbenv, a Ruby programming language version manager.
To install `tfenv`, run the following commands:
@@ -112,14 +82,22 @@ Step by step installation guide with [![Open in Cloud Shell](https://gstatic.com
# Follow instructions on https://github.com/tfutils/tfenv
# Now, install the recommended terraform version
- tfenv install 1.5.7
- tfenv use 1.5.7
+ tfenv install 1.9.7
+ tfenv use 1.9.7
terraform --version
```
+ **Note:** If you have an Apple Silicon MacBook, install Terraform by setting the `TFENV_ARCH` environment variable:
+ ```shell
+ TFENV_ARCH=amd64 tfenv install 1.9.7
+ tfenv use 1.9.7
+ terraform --version
+ ```
+ If the installed Terraform version does not match your architecture, `terraform .. init` will fail.
+
For instance, the output on MacOS should be like:
```shell
- Terraform v1.5.7
+ Terraform v1.9.7
on darwin_amd64
```
@@ -128,6 +106,7 @@ Step by step installation guide with [![Open in Cloud Shell](https://gstatic.com
Terraform stores state about managed infrastructure to map real-world resources to the configuration, keep track of metadata, and improve performance. Terraform stores this state in a local file by default, but you can also use a Terraform remote backend to store state remotely. [Remote state](https://developer.hashicorp.com/terraform/cdktf/concepts/remote-backends) makes it easier for teams to work together because all members have access to the latest state data in the remote store.
```bash
+ SOURCE_ROOT="${HOME}/${REPO}"
cd ${SOURCE_ROOT}
scripts/generate-tf-backend.sh
```
@@ -158,14 +137,23 @@ Step by step installation guide with [![Open in Cloud Shell](https://gstatic.com
or in multi-regions by assigning value such as
* `US` or `EU`
-1. Run Terraform to create resources:
+1. Run Terraform to initialize your environment, and validate if your configurations and variables are set as expected:
```bash
terraform -chdir="${TERRAFORM_RUN_DIR}" init
- terraform -chdir="${TERRAFORM_RUN_DIR}" apply
+ terraform -chdir="${TERRAFORM_RUN_DIR}" plan
+ terraform -chdir="${TERRAFORM_RUN_DIR}" validate
```
- If you don't have a successful execution from the beginning, re-run until all is deployed successfully.
+ If you run into errors, review and edit the `${TERRAFORM_RUN_DIR}/terraform.tfvars` file. If configuration errors persist, open a new [GitHub issue](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/issues/).
+
+1. Run Terraform to create resources:
+
+ ```bash
+ terraform -chdir="${TERRAFORM_RUN_DIR}" apply
+ ```
+
+ If some resources fail to deploy, re-run `terraform -chdir="${TERRAFORM_RUN_DIR}" apply` a few more times until everything is deployed successfully. If some resources are still not deployed, open a new [GitHub issue](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/issues/).
### Resume terminal session
@@ -175,10 +163,10 @@ Because a Cloud Shell session is ephemeral, your Cloud Shell session could termi
Reset your Google Cloud Project ID variables:
- ```shell
- export PROJECT_ID="[your Google Cloud project id]"
- gcloud config set project $PROJECT_ID
- ```
+ ```bash
+ export PROJECT_ID="[your Google Cloud project id]"
+ gcloud config set project $PROJECT_ID
+ ```
Follow the authentication workflow, since your credentials expires daily:
@@ -213,198 +201,12 @@ At this time, the Terraform scripts in this folder perform the following tasks:
- Dataform repository connected to the GitHub repo
- Deploys the marketing data store (MDS), feature store, ML pipelines and activation application
-The Looker Studio Dashboard deployment is a separate [step](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/python/lookerstudio/README.md).
+## Next Steps
-## Post-Installation Instructions
+Follow the [post-installation guide](./POST-INSTALLATION.md) to start your daily operations.
-Now that you have deployed all assets successfully in your Google Cloud Project, you may want to plan for operating the solution to be able to generate the predictions you need to create the audience segments you want for you Ads campaigns. To accomplish that, you gonna to plan a few things.
+It is recommended to follow the post-installation guide before deploying the Looker Studio Dashboard, because you need the data and predictions tables to exist before consuming insights in your reports.
-First, you need to choose what kind of insight you are looking for to define the campaigns. Here are a few insights provided by each one of the use cases already provided to you:
+**The Looker Studio Dashboard deployment is a separate [step](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/python/lookerstudio/README.md).**
-- **Aggregated Value Based Bidding**: Attributes a numerical value to high value conversion events (user action) in relation to a target conversion event (typically purchase) so that Google Ads can improve the bidding strategy for users that reached these conversion events, as of now.
-- **Demographic Audience Segmentation**: Attributes a cluster segment to an user using demographics data, including geographic location, device, traffic source and windowed user metrics looking XX days back.
-- **Interest based Audience Segmentation**: Attributes a cluster segment to an user using pages navigations data looking XX days back, as of now.
-- **Purchase Propensity**: Predicts a purchase propensity decile and a propensity score (likelihood between 0.0 - 0% and 1.0 - 100%) to an user using demographics data, including geographic location, device, traffic source and windowed user metrics looking XX days back to predict XX days ahead, as of now.
-- **Customer Lifetime Value**: Predicts a lifetime value gain decile and a lifetime value revenue gain in USD (equal of bigger than 0.0) to an user using demographics data, including geographic location, device, traffic source and windowed user metrics looking XX-XXX days back to predict XX days ahead, as of now.
-- **Churn Propensity**: Predicts a churn propensity decile and a propensity score (likelihood between 0.0 - 0% and 1.0 - 100%) to an user using demographics data, including geographic location, device, traffic source and windowed user metrics looking XX days back to predict XX days ahead, as of now.
-
-Second, you need to measure how much data you are going to use to obtain the insights you need. Each one of the use cases above requires data in the following intervals, using as key metrics number of days and unique user events.
-
-- **Aggregated Value Based Bidding**: Minimum 30 days and maximum 1 year. The number of unique user events is not a key limitation. Note that you need at least 1000 training examples for the model to be trained successfully, to accomplish that we typically duplicate the rows until we have a minimum of 1000 rows in the training table for the "TRAIN" subset.
-- **Demographic Audience Segmentation**: Minimum 30 days and maximum 1 year. Minimum of 1000 unique user events per day. Note that you don't need more than 1M training examples for the model to perform well, make sure your training table doesn't contain more training examples than you need by applying exclusion clauses (i.e. WHERE, LIMIT clauses).
-- **Interest based Audience Segmentation**: Minimum 30 days and maximum 1 year. Minimum of 1000 unique user events per day. Note that you don't need more than 1M training examples for the model to perform well, make sure your training table doesn't contain more training examples than you need by applying exclusion clauses (i.e. WHERE, LIMIT clauses).
-- **Purchase Propensity**: Minimum 90 days and maximum 2 years. Minimum of 1000 unique user events per day, of which a minimum of 1 target event per week. Note that you don't need more than 1M training examples for the model to perform well, make sure your training table doesn't contain more training examples than you need by applying exclusion clauses (i.e. WHERE, LIMIT clauses).
-- **Customer Lifetime Value**: Minimum 180 days and maximum 5 years. Minimum of 1000 unique user events per day, of which a minimum of 1 event per week that increases the lifetime value for an user. Note that you don't need more than 1M training examples for the model to perform well, make sure your training table doesn't contain more training examples than you need by applying exclusion clauses (i.e. WHERE, LIMIT clauses).
-- **Churn Propensity**: Minimum 30 days and maximum 2 years. Minimum of 1000 unique user events per day, of which a minimum of 1 target event per week. Note that you don't need more than 1M training examples for the model to perform well, make sure your training table doesn't contain more training examples than you need by applying exclusion clauses (i.e. WHERE, LIMIT clauses).
-
-Third, the data must be processed by the Marketing Data Store; features must be prepared using the Feature Engineering procedure; and the training and inference pipelines must be triggered. For that, open your `config.yaml.tftpl` configuration file and check the `{pipeline-name}.execution.schedule` block to modify the scheduled time for each pipeline you gonna need to orchestrate that enables your use case. Here is a list of pipelines you need for every use case.
-
-- **Aggregated Value Based Bidding**: `feature-creation-aggregated-value-based-bidding`, `value_based_bidding.training`, `value_based_bidding.explanation`
-- **Demographic Audience Segmentation**: `feature-creation-audience-segmentation`, `segmentation.training`, `segmentation.prediction`
-- **Interest based Audience Segmentation**: `feature-creation-auto-audience-segmentation`, `auto_segmentation.training`, `auto_segmentation.prediction`
-- **Purchase Propensity**: `feature-creation-purchase-propensity`, `purchase_propensity.training`, `purchase_propensity.prediction`
-- **Customer Lifetime Value**: `feature-creation-customer-ltv`, `propensity_clv.training`, `clv.training`, `clv.prediction`
-- **Churn Propensity**: `feature-creation-churn-propensity`, `churn_propensity.training`, `churn_propensity.prediction`
-
-After you change these configurations, make sure you apply these changes in your deployed resources by re-running terraform.
-
-```bash
-terraform -chdir="${TERRAFORM_RUN_DIR}" apply
-```
-
-You can trigger your Cloud Workflow to execute your Dataform workflow at any time, or you can wait until the next day when the Cloud Workflow is going to be executed according to your schedule. There are two components in this solution that requires data for proper installation and functioning. One is the Looker Studio Dashboard, you only deploy the dashboard after you have executed all the steps in this Guide successfully. Another is the ML pipeline, the pipelines compilation requires views and tables to be created so that it can read their schema and define the column transformations to run during the pipeline execution.
-
-To manually start the data flow you must perform the following tasks:
-
-1. Run the Cloud Workflow
-
- On the Google Cloud console, navigate to Workflows page. You will see a Workflow named `dataform-prod-incremental`, then under Actions, click on the three dots and `Execute` the Workflow.
-
- **Note:** If you have a considerable amount of data (>XXX GBs of data) in your exported GA4 and Ads BigQuery datasets, it can take several minutes or hours to process all the data. Make sure that the processing has completed successfully before you continue to the next step.
-
-1. Invoke the BigQuery stored procedures having the prefix `invoke_backfill_*` to backfill the feature store in case the GA4 Export has been enabled before installing Marketing Analytics Jumpstart.
-
- On the Google Cloud console, navigate to BigQuery page. On the query composer, run the following queries to invoke the stored procedures.
- ```sql
- ## There is no need to backfill the aggregated value based bidding features since there
- ## is no aggregations performed before training. The transformation was applied in the
- ## Marketing Data Store
-
- ## Backfill customer ltv tables
- CALL `feature_store.invoke_backfill_customer_lifetime_value_label`();
- CALL `feature_store.invoke_backfill_user_lifetime_dimensions`();
- CALL `feature_store.invoke_backfill_user_rolling_window_lifetime_metrics`();
-
- ## Backfill purchase propensity tables
- CALL `feature_store.invoke_backfill_user_dimensions`();
- CALL `feature_store.invoke_backfill_user_rolling_window_metrics`();
- CALL `feature_store.invoke_backfill_purchase_propensity_label`();
-
- ## Backfill audience segmentation tables
- CALL `feature_store.invoke_backfill_user_segmentation_dimensions`();
- CALL `feature_store.invoke_backfill_user_lookback_metrics`();
-
- ## There is no need to backfill the auto audience segmentation features since
- ## they are dynamically prepared in the feature engineering pipeline using
- ## python code
-
- ## Backfill churn propensity tables
- ## This use case reuses the user_dimensions and user_rolling_window_metrics,
- ## make sure you invoke the backfill for these tables. CALLs are listed above
- ## under backfill purchase propensity
- CALL `feature_store.invoke_backfill_churn_propensity_label`();
-
- ## Backfill for gemini insights
- CALL `feature_store.invoke_backfill_user_scoped_metrics`();
- CALL `gemini_insights.invoke_backfill_user_behaviour_revenue_insights`();
- ```
-
- **Note:** If you have a considerable amount of data (>XXX GBs of data) in your exported GA4 BigQuery datasets over the last six months, it can take several hours to backfill the feature data so that you can train your ML model. Make sure that the backfill procedures starts without errors before you continue to the next step.
-
-1. Check whether the feature store tables you have run backfill have rows in it.
-
- On the Google Cloud console, navigate to BigQuery page. On the query composer, run the following queries to invoke the stored procedures.
- ```sql
- ## There are no tables used by the aggregated value based bidding use case
- ## in the feature store.
-
- ## Checking customer ltv tables are not empty
- SELECT COUNT(user_pseudo_id) FROM `feature_store.customer_lifetime_value_label`;
- SELECT COUNT(user_pseudo_id) FROM `feature_store.user_lifetime_dimensions`;
- SELECT COUNT(user_pseudo_id) FROM `feature_store.user_rolling_window_lifetime_metrics`;
-
- ## Checking purchase propensity tables are not empty
- SELECT COUNT(user_pseudo_id) FROM `feature_store.user_dimensions`;
- SELECT COUNT(user_pseudo_id) FROM `feature_store.user_rolling_window_metrics`;
- SELECT COUNT(user_pseudo_id) FROM `feature_store.purchase_propensity_label`;
-
- ## Checking audience segmentation tables are not empty
- SELECT COUNT(user_pseudo_id) FROM `feature_store.user_segmentation_dimensions`;
- SELECT COUNT(user_pseudo_id) FROM `feature_store.user_lookback_metrics`;
-
- ## There are no tables used by the auto audience segmentation use case
- ## in the feature store.
-
- ## Checking churn propensity tables are not empty
- ## This use case reuses the user_dimensions and user_rolling_window_metrics,
- ## make sure you invoke the backfill for these tables. CALLs are listed above
- ## under the instructions for backfill purchase propensity
- SELECT COUNT(user_pseudo_id) FROM `feature_store.user_dimensions`;
- SELECT COUNT(user_pseudo_id) FROM `feature_store.user_rolling_window_metrics`;
- SELECT COUNT(user_pseudo_id) FROM `feature_store.churn_propensity_label`;
-
- ## Checking gemini insights tables are not empty
- SELECT COUNT(feature_date) FROM `feature_store.user_scoped_metrics`;
- SELECT COUNT(feature_date) FROM `gemini_insights.user_behaviour_revenue_insights_daily`;
- SELECT COUNT(feature_date) FROM `gemini_insights.user_behaviour_revenue_insights_weekly`;
- SELECT COUNT(feature_date) FROM `gemini_insights.user_behaviour_revenue_insights_monthly`;
- ```
-
-1. Redeploy the ML pipelines using Terraform.
-
- On your code editor, change the variable `deploy_pipelines` from `true` to `false`, on the TF variables file `${TERRAFORM_RUN_DIR}/terraform.tfvars`.
- Next, undeploy the ML pipelines component by applying the terraform configuration.
-
- ```bash
- terraform -chdir="${TERRAFORM_RUN_DIR}" apply
- ```
-
- Now, to deploy the ML pipelines component again, revert your changes on the TF variables file `${TERRAFORM_RUN_DIR}/terraform.tfvars` and apply the terraform configuration by running the commad above again.
-
- **Note:** The training pipelines use schemas defined by a `custom_transformations` parameter in your `config.yaml` or by the training table/view schema itself.
- So at first, during the first deployment i.e. `tf apply`, because the views are not created yet, we assume a fixed schema in case no `custom_transformations` parameter is provided.
- Then, you need to redeploy to make sure that since all the table views exist now, redeploy the pipelines to make sure you fetch the right schema to be provided to the training pipelines.
-
-1. Once the feature store is populated and the pipelines are redeployed, manually invoke the BigQuery procedures for preparing the training datasets, which have the suffix `_training_preparation`.
-
- On the Google Cloud console, navigate to BigQuery page. On the query composer, run the following queries to invoke the stored procedures.
- ```sql
- ## Training preparation for Aggregated Value Based Bidding
- CALL `aggregated_vbb.invoke_aggregated_value_based_bidding_training_preparation`();
-
- ## Training preparation for Customer Lifetime Value
- CALL `customer_lifetime_value.invoke_customer_lifetime_value_training_preparation`();
-
- ## Training preparation for Purchase Propensity
- CALL `purchase_propensity.invoke_purchase_propensity_training_preparation`();
-
- ## Training preparation for Audience Segmentation
- CALL `audience_segmentation.invoke_audience_segmentation_training_preparation`();
-
- ## Training preparation for Auto Audience Segmentation
- CALL `auto_audience_segmentation.invoke_auto_audience_segmentation_training_preparation`();
-
- ## Training preparation for Churn Propensity
- CALL `churn_propensity.invoke_churn_propensity_training_preparation`();
-
- ## There is no need to prepare training data for the gemini insights use case.
- ## Gemini insights only require feature engineering the inference pipelines.
- ## The gemini insights are saved in the gemini insights dataset, specified in the `config.yaml.tftpl` file.
- ```
-
-1. Check whether the training preparation tables you have run the procedures above have rows in it.
-
- On the Google Cloud console, navigate to BigQuery page. On the query composer, run the following queries to invoke the stored procedures.
- ```sql
- ## Checking aggregated value based bidding tables are not empty.
- ## For training purposes, your dataset must always include at least 1,000 rows for tabular training data.
- SELECT * FROM `aggregated_vbb.aggregated_value_based_bidding_training_full_dataset`;
-
- ## Checking customer ltv tables are not empty
- ## For training purposes, your dataset must always include at least 1,000 rows for tabular training data.
- SELECT COUNT(user_pseudo_id) FROM `customer_lifetime_value.customer_lifetime_value_training_full_dataset`;
-
- ## Checking purchase propensity tables are not empty
- ## For training purposes, your dataset must always include at least 1,000 rows for tabular training data.
- SELECT COUNT(user_pseudo_id) FROM `purchase_propensity.purchase_propensity_training_full_dataset`;
-
- ## Checking audience segmentation tables are not empty
- ## For training purposes, your dataset must always include at least 1,000 rows for tabular training data.
- SELECT COUNT(user_pseudo_id) FROM `audience_segmentation.audience_segmentation_training_full_dataset`;
-
- ## Checking churn propensity tables are not empty
- ## For training purposes, your dataset must always include at least 1,000 rows for tabular training data.
- SELECT COUNT(user_pseudo_id) FROM `churn_propensity.churn_propensity_training_full_dataset`;
- ```
-Your Marketing Analytics Jumpstart solution is ready for daily operation. Plan for the days you want your model(s) to be trained, change the scheduler dates in the `config.yaml.tftpl` file or manually trigger training whenever you want. For more information, read the documentations in the [docs/ folder](../../docs/).
diff --git a/infrastructure/terraform/main.tf b/infrastructure/terraform/main.tf
index e2186b06..3129d3ad 100644
--- a/infrastructure/terraform/main.tf
+++ b/infrastructure/terraform/main.tf
@@ -43,22 +43,22 @@ provider "google" {
}
data "google_project" "feature_store_project" {
- provider = google
+ provider = google
project_id = var.feature_store_project_id
}
data "google_project" "activation_project" {
- provider = google
+ provider = google
project_id = var.activation_project_id
}
data "google_project" "data_processing_project" {
- provider = google
+ provider = google
project_id = var.data_processing_project_id
}
data "google_project" "data_project" {
- provider = google
+ provider = google
project_id = var.data_project_id
}
@@ -66,30 +66,30 @@ data "google_project" "data_project" {
# The locals block is used to define variables that are used in the configuration.
locals {
# The source_root_dir is the root directory of the project.
- source_root_dir = "../.."
+ source_root_dir = "../.."
# The config_file_name is the name of the config file.
- config_file_name = "config"
- # The poetry_run_alias is the alias of the poetry command.
- poetry_run_alias = "${var.poetry_cmd} run"
+ config_file_name = "config"
+ # The uv_run_alias is the alias of the uv run command.
+ uv_run_alias = "${var.uv_cmd} run"
# The mds_dataset_suffix is the suffix of the marketing data store dataset.
- mds_dataset_suffix = var.create_staging_environment ? "staging" : var.create_dev_environment ? "dev" : "prod"
+ mds_dataset_suffix = var.property_id
# The project_toml_file_path is the path to the project.toml file.
- project_toml_file_path = "${local.source_root_dir}/pyproject.toml"
+ project_toml_file_path = "${local.source_root_dir}/pyproject.toml"
# The project_toml_content_hash is the hash of the project.toml file.
# This is used for the triggers of the local-exec provisioner.
project_toml_content_hash = filesha512(local.project_toml_file_path)
# The generated_sql_queries_directory_path is the path to the generated sql queries directory.
generated_sql_queries_directory_path = "${local.source_root_dir}/sql/query"
# The generated_sql_queries_fileset is the list of files in the generated sql queries directory.
- generated_sql_queries_fileset = [for f in fileset(local.generated_sql_queries_directory_path, "*.sqlx") : "${local.generated_sql_queries_directory_path}/${f}"]
+ generated_sql_queries_fileset = [for f in fileset(local.generated_sql_queries_directory_path, "*.sqlx") : "${local.generated_sql_queries_directory_path}/${f}"]
# The generated_sql_queries_content_hash is the sha512 hash of file sha512 hashes in the generated sql queries directory.
- generated_sql_queries_content_hash = sha512(join("", [for f in local.generated_sql_queries_fileset : fileexists(f) ? filesha512(f) : sha512("file-not-found")]))
+ generated_sql_queries_content_hash = sha512(join("", [for f in local.generated_sql_queries_fileset : fileexists(f) ? filesha512(f) : sha512("file-not-found")]))
# The generated_sql_procedures_directory_path is the path to the generated sql procedures directory.
generated_sql_procedures_directory_path = "${local.source_root_dir}/sql/procedure"
# The generated_sql_procedures_fileset is the list of files in the generated sql procedures directory.
- generated_sql_procedures_fileset = [for f in fileset(local.generated_sql_procedures_directory_path, "*.sqlx") : "${local.generated_sql_procedures_directory_path}/${f}"]
+ generated_sql_procedures_fileset = [for f in fileset(local.generated_sql_procedures_directory_path, "*.sqlx") : "${local.generated_sql_procedures_directory_path}/${f}"]
# The generated_sql_procedures_content_hash is the sha512 hash of file sha512 hashes in the generated sql procedures directory.
- generated_sql_procedures_content_hash = sha512(join("", [for f in local.generated_sql_procedures_fileset : fileexists(f) ? filesha512(f) : sha512("file-not-found")]))
+ generated_sql_procedures_content_hash = sha512(join("", [for f in local.generated_sql_procedures_fileset : fileexists(f) ? filesha512(f) : sha512("file-not-found")]))
}
@@ -122,43 +122,30 @@ resource "local_file" "feature_store_configuration" {
pipelines_github_owner = var.pipelines_github_owner
pipelines_github_repo = var.pipelines_github_repo
# TODO: this needs to be specific to environment.
- location = var.destination_data_location
+ location = var.destination_data_location
+ time_zone = var.time_zone
+ pipeline_configuration = var.pipeline_configuration
+ non_ecomm_events_list = var.non_ecomm_events_list
+ non_ecomm_target_event = var.non_ecomm_target_event
})
}
-# Runs the poetry command to install the dependencies.
-# The command is: poetry install
-resource "null_resource" "poetry_install" {
- triggers = {
- create_command = "${var.poetry_cmd} lock && ${var.poetry_cmd} install"
- source_contents_hash = local.project_toml_content_hash
- }
-
- # Only run the command when `terraform apply` executes and the resource doesn't exist.
- provisioner "local-exec" {
- when = create
- command = self.triggers.create_command
- working_dir = local.source_root_dir
- }
-}
-
data "external" "check_ga4_property_type" {
- program = ["bash", "-c", "${local.poetry_run_alias} ga4-setup --ga4_resource=check_property_type --ga4_property_id=${var.ga4_property_id} --ga4_stream_id=${var.ga4_stream_id}"]
+ program = ["bash", "-c", "${local.uv_run_alias} ga4-setup --ga4_resource=check_property_type --ga4_property_id=${var.ga4_property_id} --ga4_stream_id=${var.ga4_stream_id}"]
working_dir = local.source_root_dir
- depends_on = [null_resource.poetry_install]
}
-# Runs the poetry invoke command to generate the sql queries and procedures.
+# Runs the invoke (inv) tasks through `uv run` to generate the sql queries and procedures.
# This command is executed before the feature store is created.
resource "null_resource" "generate_sql_queries" {
triggers = {
# The create command generates the sql queries and procedures.
- # The command is: poetry inv [function_name] --env-name=${local.config_file_name}
+ # The command is: uv run inv [function_name] --env-name=${local.config_file_name}
# The --env-name argument is the name of the configuration file.
create_command = <<-EOT
- ${local.poetry_run_alias} inv apply-config-parameters-to-all-queries --env-name=${local.config_file_name}
- ${local.poetry_run_alias} inv apply-config-parameters-to-all-procedures --env-name=${local.config_file_name}
+ ${local.uv_run_alias} inv apply-config-parameters-to-all-queries --env-name=${local.config_file_name}
+ ${local.uv_run_alias} inv apply-config-parameters-to-all-procedures --env-name=${local.config_file_name}
EOT
# The destroy command removes the generated sql queries and procedures.
@@ -170,10 +157,6 @@ resource "null_resource" "generate_sql_queries" {
# The working directory is the root of the project.
working_dir = local.source_root_dir
- # The poetry_installed trigger is the ID of the null_resource.poetry_install resource.
- # This is used to ensure that the poetry command is run before the generate_sql_queries command.
- poetry_installed = null_resource.poetry_install.id
-
# The source_contents_hash trigger is the hash of the project.toml file.
# This is used to ensure that the generate_sql_queries command is run only if the project.toml file has changed.
# It also ensures that the generate_sql_queries command is run only if the sql queries and procedures have changed.
@@ -207,7 +190,7 @@ resource "null_resource" "generate_sql_queries" {
module "initial_project_services" {
source = "terraform-google-modules/project-factory/google//modules/project_services"
- version = "14.1.0"
+ version = "17.0.0"
disable_dependent_services = false
disable_services_on_destroy = false
@@ -304,8 +287,7 @@ resource "null_resource" "check_iam_api" {
# Create the data store module.
# The data store module creates the marketing data store in BigQuery, creates the ETL pipeline in Dataform
# for the marketing data from Google Ads and Google Analytics.
-# The data store is created only if the `create_prod_environment`, `create_staging_environment`
-# or `create_dev_environment` variable is set to true in the terraform.tfvars file.
+# The data store is created only if the `deploy_dataform` variable is set to true in the terraform.tfvars file.
# The data store is created in the `data_project_id` project.
module "data_store" {
# The source directory of the data store module.
@@ -315,14 +297,14 @@ module "data_store" {
google_default_region = var.google_default_region
# The dataform_region is set in the terraform.tfvars file. Its default value is "us-central1".
- dataform_region = var.dataform_region
+ dataform_region = var.dataform_region
# The source_ga4_export_project_id is set in the terraform.tfvars file.
# The source_ga4_export_dataset is set in the terraform.tfvars file.
# The source_ads_export_data is set in the terraform.tfvars file.
- source_ga4_export_project_id = var.source_ga4_export_project_id
- source_ga4_export_dataset = var.source_ga4_export_dataset
- source_ads_export_data = var.source_ads_export_data
+ source_ga4_export_project_id = var.source_ga4_export_project_id
+ source_ga4_export_dataset = var.source_ga4_export_dataset
+ source_ads_export_data = var.source_ads_export_data
ga4_incremental_processing_days_back = var.ga4_incremental_processing_days_back
# The data_processing_project_id is set in the terraform.tfvars file.
@@ -331,24 +313,16 @@ module "data_store" {
data_processing_project_id = var.data_processing_project_id
data_project_id = var.data_project_id
destination_data_location = var.destination_data_location
-
+
# The dataform_github_repo is set in the terraform.tfvars file.
# The dataform_github_token is set in the terraform.tfvars file.
dataform_github_repo = var.dataform_github_repo
dataform_github_token = var.dataform_github_token
- # The create_dev_environment is set in the terraform.tfvars file.
- # The create_dev_environment determines if the dev environment is created.
- # When the value is true, the dev environment is created.
- # The create_staging_environment is set in the terraform.tfvars file.
- # The create_staging_environment determines if the staging environment is created.
- # When the value is true, the staging environment is created.
- # The create_prod_environment is set in the terraform.tfvars file.
- # The create_prod_environment determines if the prod environment is created.
- # When the value is true, the prod environment is created.
- create_dev_environment = var.create_dev_environment
- create_staging_environment = var.create_staging_environment
- create_prod_environment = var.create_prod_environment
+ # The deploy_dataform variable determines whether Dataform is deployed.
+ # When the value is true, the Dataform resources are created.
+ deploy_dataform = var.deploy_dataform
+ property_id = var.property_id
# The dev_data_project_id is the project ID of where the dev datasets will created.
#If not provided, data_project_id will be used.
@@ -374,6 +348,9 @@ module "data_store" {
# The project_owner_email is set in the terraform.tfvars file.
# An example of a valid email address is "william.mckinley@my-own-personal-domain.com".
project_owner_email = var.project_owner_email
+
+ # Set the time zone for the scheduled jobs
+ time_zone = var.time_zone
}
@@ -391,14 +368,14 @@ module "feature_store" {
# If the count is 1, the feature store is created.
# If the count is 0, the feature store is not created.
# This is done to avoid creating the feature store if the `deploy_feature_store` variable is set to false in the terraform.tfvars file.
- count = var.deploy_feature_store ? 1 : 0
- project_id = var.feature_store_project_id
+ count = var.deploy_feature_store ? 1 : 0
+ project_id = var.feature_store_project_id
# The region is the region in which the feature store is created.
# This is set to the default region in the terraform.tfvars file.
- region = var.google_default_region
+ region = var.google_default_region
# The sql_dir_input is the path to the sql directory.
# This is set to the path to the sql directory in the feature store module.
- sql_dir_input = null_resource.generate_sql_queries.id != "" ? "${local.source_root_dir}/sql" : ""
+ sql_dir_input = null_resource.generate_sql_queries.id != "" ? "${local.source_root_dir}/sql" : ""
}
@@ -411,18 +388,15 @@ module "pipelines" {
# The source is the path to the pipelines module.
source = "./modules/pipelines"
config_file_path = local_file.feature_store_configuration.id != "" ? local_file.feature_store_configuration.filename : ""
- poetry_run_alias = local.poetry_run_alias
+ uv_run_alias = local.uv_run_alias
# The count determines if the pipelines are created or not.
# If the count is 1, the pipelines are created.
# If the count is 0, the pipelines are not created.
# This is done to avoid creating the pipelines if the `deploy_pipelines` variable is set to false in the terraform.tfvars file.
- count = var.deploy_pipelines ? 1 : 0
- # The poetry_installed trigger is the ID of the null_resource.poetry_install resource.
- # This is used to ensure that the poetry command is run before the pipelines module is created.
- poetry_installed = null_resource.poetry_install.id
+ count = var.deploy_pipelines ? 1 : 0
# The project_id is the project in which the data is stored.
# This is set to the data project ID in the terraform.tfvars file.
- mds_project_id = var.data_project_id
+ mds_project_id = var.data_project_id
}
@@ -433,53 +407,50 @@ module "pipelines" {
# The activation function is created in the `activation_project_id` project.
module "activation" {
# The source is the path to the activation module.
- source = "./modules/activation"
+ source = "./modules/activation"
# The project_id is the project in which the activation function is created.
# This is set to the activation project ID in the terraform.tfvars file.
- project_id = var.activation_project_id
+ project_id = var.activation_project_id
# The project number of where the activation function is created.
# This is retrieved from the activation project id using the google_project data source.
- project_number = data.google_project.activation_project.number
+ project_number = data.google_project.activation_project.number
# The location is the google_default_region variable.
# This is set to the default region in the terraform.tfvars file.
- location = var.google_default_region
+ location = var.google_default_region
# The data_location is the destination_data_location variable.
# This is set to the destination data location in the terraform.tfvars file.
- data_location = var.destination_data_location
+ data_location = var.destination_data_location
# The trigger_function_location is the location of the trigger function.
# The trigger function is used to trigger the activation function.
# The trigger function is created in the same region as the activation function.
trigger_function_location = var.google_default_region
- # The poetry_cmd is the poetry_cmd variable.
- # This can be set on the poetry_cmd in the terraform.tfvars file.
- poetry_cmd = var.poetry_cmd
+ # The uv_run_alias is the local uv_run_alias value, built from the uv_cmd variable followed by "run".
+ # The uv_cmd variable can be set in the terraform.tfvars file.
+ uv_run_alias = local.uv_run_alias
# The ga4_measurement_id is the ga4_measurement_id variable.
# This can be set on the ga4_measurement_id in the terraform.tfvars file.
- ga4_measurement_id = var.ga4_measurement_id
+ ga4_measurement_id = var.ga4_measurement_id
# The ga4_measurement_secret is the ga4_measurement_secret variable.
# This can be set on the ga4_measurement_secret in the terraform.tfvars file.
- ga4_measurement_secret = var.ga4_measurement_secret
+ ga4_measurement_secret = var.ga4_measurement_secret
# The ga4_property_id is the ga4_property_id variable.
# This is set on the ga4_property_id in the terraform.tfvars file.
# The ga4_property_id is the property ID of the GA4 data.
# You can find the property ID in the GA4 console.
- ga4_property_id = var.ga4_property_id
+ ga4_property_id = var.ga4_property_id
# The ga4_stream_id is the ga4_stream_id variable.
# This is set on the ga4_stream_id in the terraform.tfvars file.
# The ga4_stream_id is the stream ID of the GA4 data.
# You can find the stream ID in the GA4 console.
- ga4_stream_id = var.ga4_stream_id
+ ga4_stream_id = var.ga4_stream_id
# The count determines if the activation function is created or not.
# If the count is 1, the activation function is created.
# If the count is 0, the activation function is not created.
# This is done to avoid creating the activation function if the `deploy_activation` variable is set
# to false in the terraform.tfvars file.
- count = var.deploy_activation ? 1 : 0
- # The poetry_installed is the ID of the null_resource poetry_install
- # This is used to ensure that the poetry command is run before the activation module is created.
- poetry_installed = null_resource.poetry_install.id
- mds_project_id = var.data_project_id
- mds_dataset_suffix = local.mds_dataset_suffix
+ count = var.deploy_activation ? 1 : 0
+ mds_project_id = var.data_project_id
+ mds_dataset_suffix = local.mds_dataset_suffix
# The project_owner_email is set in the terraform.tfvars file.
# An example of a valid email address is "william.mckinley@my-own-personal-domain.com".
diff --git a/infrastructure/terraform/modules/activation/configuration-tables.tf b/infrastructure/terraform/modules/activation/configuration-tables.tf
new file mode 100644
index 00000000..0c2deecb
--- /dev/null
+++ b/infrastructure/terraform/modules/activation/configuration-tables.tf
@@ -0,0 +1,45 @@
+# Copyright 2024 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+locals {
+ vbb_activation_configuration_file = "vbb_activation_configuration.jsonl"
+}
+
+# JSON configuration file for smart bidding based activation
+resource "google_storage_bucket_object" "vbb_activation_configuration_file" {
+ name = "${local.configuration_folder}/${local.vbb_activation_configuration_file}"
+ source = "${local.template_dir}/${local.vbb_activation_configuration_file}"
+ bucket = module.pipeline_bucket.name
+}
+
+# This data source renders a template file and exposes the rendered content through its `rendered` attribute.
+data "template_file" "load_vbb_activation_configuration_proc" {
+ template = file("${local.template_dir}/load_vbb_activation_configuration.sql.tpl")
+ vars = {
+ project_id = module.project_services.project_id
+ dataset = module.bigquery.bigquery_dataset.dataset_id
+ config_file_uri = "gs://${module.pipeline_bucket.name}/${google_storage_bucket_object.vbb_activation_configuration_file.output_name}"
+ }
+}
+
+# Stored procedure that loads the JSON configuration file from GCS into a configuration table in BigQuery
+resource "google_bigquery_routine" "load_vbb_activation_configuration_proc" {
+ project = module.project_services.project_id
+ dataset_id = module.bigquery.bigquery_dataset.dataset_id
+ routine_id = "load_vbb_activation_configuration"
+ routine_type = "PROCEDURE"
+ language = "SQL"
+ definition_body = data.template_file.load_vbb_activation_configuration_proc.rendered
+ description = "Procedure for loading vbb activation configuration from GCS bucket"
+}
diff --git a/infrastructure/terraform/modules/activation/export-procedures.tf b/infrastructure/terraform/modules/activation/export-procedures.tf
index 55ab1001..86d942be 100644
--- a/infrastructure/terraform/modules/activation/export-procedures.tf
+++ b/infrastructure/terraform/modules/activation/export-procedures.tf
@@ -118,11 +118,34 @@ resource "google_bigquery_routine" "export_churn_propensity_procedure" {
routine_id = "export_churn_propensity_predictions"
routine_type = "PROCEDURE"
language = "SQL"
- definition_body = data.template_file.purchase_propensity_csv_export_query.rendered
- description = "Export purchase propensity predictions as CSV for GA4 User Data Import"
+ definition_body = data.template_file.churn_propensity_csv_export_query.rendered
+ description = "Export churn propensity predictions as CSV for GA4 User Data Import"
arguments {
name = "prediction_table_name"
mode = "IN"
data_type = jsonencode({ "typeKind" : "STRING" })
}
}
+
+data "template_file" "lead_score_propensity_csv_export_query" {
+ template = file("${local.source_root_dir}/templates/activation_user_import/lead_score_propensity_csv_export.sqlx")
+ vars = {
+ ga4_stream_id = var.ga4_stream_id
+ export_bucket = module.pipeline_bucket.name
+ }
+}
+
+resource "google_bigquery_routine" "export_lead_score_propensity_procedure" {
+ project = null_resource.check_bigquery_api.id != "" ? module.project_services.project_id : var.project_id
+ dataset_id = module.bigquery.bigquery_dataset.dataset_id
+ routine_id = "export_lead_score_propensity_predictions"
+ routine_type = "PROCEDURE"
+ language = "SQL"
+ definition_body = data.template_file.lead_score_propensity_csv_export_query.rendered
+ description = "Export lead score propensity predictions as CSV for GA4 User Data Import"
+ arguments {
+ name = "prediction_table_name"
+ mode = "IN"
+ data_type = jsonencode({ "typeKind" : "STRING" })
+ }
+}
\ No newline at end of file
diff --git a/infrastructure/terraform/modules/activation/main.tf b/infrastructure/terraform/modules/activation/main.tf
index 1123410c..ef58ce59 100644
--- a/infrastructure/terraform/modules/activation/main.tf
+++ b/infrastructure/terraform/modules/activation/main.tf
@@ -15,7 +15,6 @@
locals {
app_prefix = "activation"
source_root_dir = "../.."
- poetry_run_alias = "${var.poetry_cmd} run"
template_dir = "${local.source_root_dir}/templates"
pipeline_source_dir = "${local.source_root_dir}/python/activation"
trigger_function_dir = "${local.source_root_dir}/python/function"
@@ -24,12 +23,15 @@ locals {
auto_audience_segmentation_query_template_file = "auto_audience_segmentation_query_template.sqlx"
cltv_query_template_file = "cltv_query_template.sqlx"
purchase_propensity_query_template_file = "purchase_propensity_query_template.sqlx"
+ purchase_propensity_vbb_query_template_file = "purchase_propensity_vbb_query_template.sqlx"
+ lead_score_propensity_query_template_file = "lead_score_propensity_query_template.sqlx"
+ lead_score_propensity_vbb_query_template_file = "lead_score_propensity_vbb_query_template.sqlx"
churn_propensity_query_template_file = "churn_propensity_query_template.sqlx"
- measurement_protocol_payload_template_file = "app_payload_template.jinja2"
activation_container_image_id = "activation-pipeline"
docker_repo_prefix = "${var.location}-docker.pkg.dev/${var.project_id}"
activation_container_name = "dataflow/${local.activation_container_image_id}"
- source_archive_file = "activation_trigger_source.zip"
+ source_archive_file_prefix = "activation_trigger_source"
+ source_archive_file = "${local.source_archive_file_prefix}.zip"
pipeline_service_account_name = "dataflow-worker"
pipeline_service_account_email = "${local.app_prefix}-${local.pipeline_service_account_name}@${var.project_id}.iam.gserviceaccount.com"
@@ -37,19 +39,14 @@ locals {
trigger_function_account_name = "trigger-function"
trigger_function_account_email = "${local.app_prefix}-${local.trigger_function_account_name}@${var.project_id}.iam.gserviceaccount.com"
- builder_service_account_name = "build-job"
+ builder_service_account_name = "build-job"
builder_service_account_email = "${local.app_prefix}-${local.builder_service_account_name}@${var.project_id}.iam.gserviceaccount.com"
- activation_type_configuration_file = "${local.source_root_dir}/templates/activation_type_configuration_template.tpl"
+ activation_type_configuration_file = "${local.source_root_dir}/templates/activation_type_configuration_template.tpl"
# This is calculating a hash number on the file content to keep track of changes and trigger redeployment of resources
# in case the file content changes.
activation_type_configuration_file_content_hash = filesha512(local.activation_type_configuration_file)
- app_payload_template_file = "${local.source_root_dir}/templates/app_payload_template.jinja2"
- # This is calculating a hash number on the file content to keep track of changes and trigger redeployment of resources
- # in case the file content changes.
- app_payload_template_file_content_hash = filesha512(local.activation_type_configuration_file)
-
activation_application_dir = "${local.source_root_dir}/python/activation"
activation_application_fileset = [
"${local.activation_application_dir}/main.py",
@@ -61,6 +58,20 @@ locals {
# This is calculating a hash number on the files contents to keep track of changes and trigger redeployment of resources
# in case any of these files contents changes.
activation_application_content_hash = sha512(join("", [for f in local.activation_application_fileset : fileexists(f) ? filesha512(f) : sha512("file-not-found")]))
+
+ ga4_setup_source_file = "${local.source_root_dir}/python/ga4_setup/setup.py"
+ ga4_setup_source_file_content_hash = filesha512(local.ga4_setup_source_file)
+
+ # GCP Cloud Build is not available in all regions.
+ cloud_build_available_locations = [
+ "us-central1",
+ "us-west2",
+ "europe-west1",
+ "asia-east1",
+ "australia-southeast1",
+ "southamerica-east1"
+ ]
+
}
data "google_project" "activation_project" {
@@ -69,7 +80,7 @@ data "google_project" "activation_project" {
module "project_services" {
source = "terraform-google-modules/project-factory/google//modules/project_services"
- version = "14.1.0"
+ version = "17.0.0"
disable_dependent_services = false
disable_services_on_destroy = false
@@ -92,6 +103,7 @@ module "project_services" {
"analyticsadmin.googleapis.com",
"eventarc.googleapis.com",
"run.googleapis.com",
+ "cloudkms.googleapis.com"
]
}
@@ -301,11 +313,47 @@ resource "null_resource" "check_cloudbuild_api" {
depends_on = [
module.project_services
]
+
+ # The lifecycle block of this null_resource defines a precondition that
+ # checks if the specified region is included in the cloud_build_available_locations list.
+ # If the condition is not met, an error message is displayed and the Terraform run fails.
+ lifecycle {
+ precondition {
+ condition = contains(local.cloud_build_available_locations, var.location)
+ error_message = "Cloud Build is not available in your default region: ${var.location}.\nSet 'google_default_region' variable to a valid Cloud Build location, see Restricted Regions in https://cloud.google.com/build/docs/locations."
+ }
+ }
+}
+
+# This resource executes gcloud commands to check whether the Cloud KMS API is enabled.
+# Since enabling APIs can take a few seconds, we need to make the deployment wait until the API is enabled before resuming.
+resource "null_resource" "check_cloudkms_api" {
+ provisioner "local-exec" {
+ command = <<-EOT
+ COUNTER=0
+ MAX_TRIES=100
+ while ! gcloud services list --project=${module.project_services.project_id} | grep -i "cloudkms.googleapis.com" && [ $COUNTER -lt $MAX_TRIES ]
+ do
+ sleep 6
+ printf "."
+ COUNTER=$((COUNTER + 1))
+ done
+ if [ $COUNTER -eq $MAX_TRIES ]; then
+ echo "Cloud KMS API is not enabled, Terraform cannot continue!"
+ exit 1
+ fi
+ sleep 20
+ EOT
+ }
+
+ depends_on = [
+ module.project_services
+ ]
}
module "bigquery" {
source = "terraform-google-modules/bigquery/google"
- version = "~> 5.4"
+ version = "8.1.0"
dataset_id = local.app_prefix
dataset_name = local.app_prefix
@@ -322,10 +370,11 @@ resource "null_resource" "create_custom_events" {
triggers = {
services_enabled_project = null_resource.check_analyticsadmin_api.id != "" ? module.project_services.project_id : var.project_id
source_contents_hash = local.activation_type_configuration_file_content_hash
+ source_file_content_hash = local.ga4_setup_source_file_content_hash
}
provisioner "local-exec" {
command = <<-EOT
- ${local.poetry_run_alias} ga4-setup --ga4_resource=custom_events --ga4_property_id=${var.ga4_property_id} --ga4_stream_id=${var.ga4_stream_id}
+ ${var.uv_run_alias} ga4-setup --ga4_resource=custom_events --ga4_property_id=${var.ga4_property_id} --ga4_stream_id=${var.ga4_stream_id}
EOT
working_dir = local.source_root_dir
}
@@ -337,12 +386,13 @@ resource "null_resource" "create_custom_events" {
resource "null_resource" "create_custom_dimensions" {
triggers = {
services_enabled_project = null_resource.check_analyticsadmin_api.id != "" ? module.project_services.project_id : var.project_id
+ source_file_content_hash = local.ga4_setup_source_file_content_hash
#source_activation_type_configuration_hash = local.activation_type_configuration_file_content_hash
#source_activation_application_python_hash = local.activation_application_content_hash
}
provisioner "local-exec" {
command = <<-EOT
- ${local.poetry_run_alias} ga4-setup --ga4_resource=custom_dimensions --ga4_property_id=${var.ga4_property_id} --ga4_stream_id=${var.ga4_stream_id}
+ ${var.uv_run_alias} ga4-setup --ga4_resource=custom_dimensions --ga4_property_id=${var.ga4_property_id} --ga4_stream_id=${var.ga4_stream_id}
EOT
working_dir = local.source_root_dir
}
@@ -359,7 +409,7 @@ resource "google_artifact_registry_repository" "activation_repository" {
module "pipeline_service_account" {
source = "terraform-google-modules/service-accounts/google"
- version = "~> 3.0"
+ version = "4.4.0"
project_id = null_resource.check_dataflow_api.id != "" ? module.project_services.project_id : var.project_id
prefix = local.app_prefix
names = [local.pipeline_service_account_name]
@@ -368,7 +418,7 @@ module "pipeline_service_account" {
"${module.project_services.project_id}=>roles/dataflow.worker",
"${module.project_services.project_id}=>roles/bigquery.dataEditor",
"${module.project_services.project_id}=>roles/bigquery.jobUser",
- "${module.project_services.project_id}=>roles/artifactregistry.writer",
+ "${module.project_services.project_id}=>roles/artifactregistry.writer",
]
display_name = "Dataflow worker Service Account"
description = "Activation Dataflow worker Service Account"
@@ -376,7 +426,7 @@ module "pipeline_service_account" {
module "trigger_function_account" {
source = "terraform-google-modules/service-accounts/google"
- version = "~> 3.0"
+ version = "4.4.0"
project_id = null_resource.check_pubsub_api.id != "" ? module.project_services.project_id : var.project_id
prefix = local.app_prefix
names = [local.trigger_function_account_name]
@@ -398,48 +448,139 @@ module "trigger_function_account" {
# a python command defined in the module ga4_setup.
# This informatoin can then be used in other parts of the Terraform configuration to access the retrieved information.
data "external" "ga4_measurement_properties" {
- program = ["bash", "-c", "${local.poetry_run_alias} ga4-setup --ga4_resource=measurement_properties --ga4_property_id=${var.ga4_property_id} --ga4_stream_id=${var.ga4_stream_id}"]
+ program = ["bash", "-c", "${var.uv_run_alias} ga4-setup --ga4_resource=measurement_properties --ga4_property_id=${var.ga4_property_id} --ga4_stream_id=${var.ga4_stream_id}"]
working_dir = local.source_root_dir
# The count attribute specifies how many times the external data source should be executed.
# This means that the external data source will be executed only if either the
# var.ga4_measurement_id or var.ga4_measurement_secret variable is not set.
- count = (var.ga4_measurement_id == null || var.ga4_measurement_secret == null || var.ga4_measurement_id == "" || var.ga4_measurement_secret == "") ? 1 : 0
+ count = (var.ga4_measurement_id == null || var.ga4_measurement_secret == null || var.ga4_measurement_id == "" || var.ga4_measurement_secret == "") ? 1 : 0
depends_on = [
module.project_services
]
}
+# Generates a random suffix used to create unique names for resources like KMS key rings or crypto keys,
+# ensuring they don't clash with existing resources.
+resource "random_id" "random_suffix" {
+ byte_length = 2
+}
+
+# This ensures that Secret Manager has a service identity within your project.
+# This identity is crucial for securely managing secrets and allowing Secret Manager
+# to interact with other Google Cloud services on your behalf.
+resource "google_project_service_identity" "secretmanager_sa" {
+ provider = google-beta
+ project = null_resource.check_cloudkms_api.id != "" ? module.project_services.project_id : var.project_id
+ service = "secretmanager.googleapis.com"
+}
+# This Key Ring can then be used to store and manage encryption keys for various purposes,
+# such as encrypting data at rest or protecting secrets.
+resource "google_kms_key_ring" "key_ring_regional" {
+ name = "key_ring_regional-${random_id.random_suffix.hex}"
+ # If you want your replicas in other locations, change the location in the var.location variable passed as a parameter to this submodule.
+ # If you want your replicas stored globally, set location = "global".
+ location = var.location
+ project = null_resource.check_cloudkms_api.id != "" ? module.project_services.project_id : var.project_id
+}
+
+# This key can then be used for various encryption operations,
+# such as encrypting data before storing it in Google Cloud Storage
+# or protecting secrets within your application.
+resource "google_kms_crypto_key" "crypto_key_regional" {
+ name = "crypto-key-${random_id.random_suffix.hex}"
+ key_ring = google_kms_key_ring.key_ring_regional.id
+}
+
+# Defines an IAM policy that explicitly grants the Secret Manager service account
+# the ability to encrypt and decrypt data using a specific CryptoKey. This is a
+# common pattern for securely managing secrets, allowing Secret Manager to encrypt
+# or decrypt data without requiring direct access to the underlying encryption key material.
+data "google_iam_policy" "crypto_key_encrypter_decrypter" {
+ binding {
+ role = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
+
+ members = [
+ "serviceAccount:${google_project_service_identity.secretmanager_sa.email}"
+ ]
+ }
+
+ depends_on = [
+ google_project_service_identity.secretmanager_sa,
+ google_kms_key_ring.key_ring_regional,
+ google_kms_crypto_key.crypto_key_regional
+ ]
+}
+
+# It sets the IAM policy for a KMS CryptoKey, specifically granting permissions defined
+# in another data source.
+resource "google_kms_crypto_key_iam_policy" "crypto_key" {
+ crypto_key_id = google_kms_crypto_key.crypto_key_regional.id
+ policy_data = data.google_iam_policy.crypto_key_encrypter_decrypter.policy_data
+}
+
+# It sets the IAM policy for a KMS Key Ring, granting specific permissions defined
+# in a data source.
+resource "google_kms_key_ring_iam_policy" "key_ring" {
+ key_ring_id = google_kms_key_ring.key_ring_regional.id
+ policy_data = data.google_iam_policy.crypto_key_encrypter_decrypter.policy_data
+}
+
# This module stores the values ga4-measurement-id and ga4-measurement-secret in Google Cloud Secret Manager.
module "secret_manager" {
source = "GoogleCloudPlatform/secret-manager/google"
- version = "~> 0.1"
- project_id = null_resource.check_secretmanager_api.id != "" ? module.project_services.project_id : var.project_id
+ version = "0.4.0"
+ project_id = google_kms_crypto_key_iam_policy.crypto_key.etag != "" && google_kms_key_ring_iam_policy.key_ring.etag != "" ? module.project_services.project_id : var.project_id
secrets = [
{
name = "ga4-measurement-id"
secret_data = (var.ga4_measurement_id == null || var.ga4_measurement_secret == null) ? data.external.ga4_measurement_properties[0].result["measurement_id"] : var.ga4_measurement_id
- automatic_replication = true
+ automatic_replication = false
},
{
name = "ga4-measurement-secret"
secret_data = (var.ga4_measurement_id == null || var.ga4_measurement_secret == null) ? data.external.ga4_measurement_properties[0].result["measurement_secret"] : var.ga4_measurement_secret
- automatic_replication = true
+ automatic_replication = false
},
]
+ # By commenting out the user_managed_replication block, you will deploy replicas that may store the secret in different locations around the globe.
+ # This is usually not the desired behaviour; make sure you are aware of the consequences before doing it.
+ # By default, to respect resources location, we prevent resources from being deployed globally by deploying secrets in the same region of the compute resources.
+ user_managed_replication = {
+ ga4-measurement-id = [
+ # If you want your replicas in other locations, uncomment the following lines and add them here.
+ # Check this example, as reference: https://github.com/GoogleCloudPlatform/terraform-google-secret-manager/blob/main/examples/multiple/main.tf#L91
+ {
+ location = var.location
+ kms_key_name = google_kms_crypto_key.crypto_key_regional.id
+ }
+ ]
+ ga4-measurement-secret = [
+ {
+ location = var.location
+ kms_key_name = google_kms_crypto_key.crypto_key_regional.id
+ }
+ ]
+ }
+
depends_on = [
- data.external.ga4_measurement_properties
+ data.external.ga4_measurement_properties,
+ google_kms_crypto_key.crypto_key_regional,
+ google_kms_key_ring.key_ring_regional,
+ google_project_service_identity.secretmanager_sa,
+ google_kms_crypto_key_iam_policy.crypto_key,
+ google_kms_key_ring_iam_policy.key_ring
]
}
# This module creates a Cloud Storage bucket to be used by the Activation Application
module "pipeline_bucket" {
- source = "terraform-google-modules/cloud-storage/google//modules/simple_bucket"
- version = "~> 3.4.1"
- project_id = null_resource.check_dataflow_api.id != "" ? module.project_services.project_id : var.project_id
- name = "${local.app_prefix}-app-${module.project_services.project_id}"
- location = var.location
+ source = "terraform-google-modules/cloud-storage/google//modules/simple_bucket"
+ version = "6.1.0"
+ project_id = null_resource.check_dataflow_api.id != "" ? module.project_services.project_id : var.project_id
+ name = "${local.app_prefix}-app-${module.project_services.project_id}"
+ location = var.location
# When deleting a bucket, this boolean option will delete all contained objects.
# If false, Terraform will fail to delete buckets which contain objects.
force_destroy = true
@@ -471,8 +612,8 @@ resource "google_project_iam_member" "cloud_build_job_service_account" {
module.project_services,
null_resource.check_artifactregistry_api,
data.google_project.project,
- ]
-
+ ]
+
project = null_resource.check_artifactregistry_api.id != "" ? module.project_services.project_id : var.project_id
member = "serviceAccount:${var.project_number}-compute@developer.gserviceaccount.com"
@@ -516,16 +657,16 @@ resource "google_project_iam_member" "cloud_build_job_service_account" {
}
data "google_project" "project" {
- project_id = null_resource.check_cloudbuild_api != "" ? module.project_services.project_id : var.project_id
+ project_id = null_resource.check_cloudbuild_api != "" ? module.project_services.project_id : var.project_id
}
# This module creates a Cloud Storage bucket to store the Cloud Build logs
module "build_logs_bucket" {
- source = "terraform-google-modules/cloud-storage/google//modules/simple_bucket"
- version = "~> 3.4.1"
- project_id = null_resource.check_cloudbuild_api != "" ? module.project_services.project_id : var.project_id
- name = "${local.app_prefix}-logs-${module.project_services.project_id}"
- location = var.location
+ source = "terraform-google-modules/cloud-storage/google//modules/simple_bucket"
+ version = "6.1.0"
+ project_id = null_resource.check_cloudbuild_api != "" ? module.project_services.project_id : var.project_id
+ name = "${local.app_prefix}-logs-${module.project_services.project_id}"
+ location = var.location
# When deleting a bucket, this boolean option will delete all contained objects.
# If false, Terraform will fail to delete buckets which contain objects.
force_destroy = true
@@ -543,8 +684,8 @@ module "build_logs_bucket" {
iam_members = [
{
- role = "roles/storage.admin"
- member = "serviceAccount:${var.project_number}-compute@developer.gserviceaccount.com"
+ role = "roles/storage.admin"
+ member = "serviceAccount:${var.project_number}-compute@developer.gserviceaccount.com"
}
]
@@ -554,13 +695,6 @@ module "build_logs_bucket" {
]
}
-# This resource creates a bucket object using as content the measurement_protocol_payload_template_file file.
-resource "google_storage_bucket_object" "measurement_protocol_payload_template_file" {
- name = "${local.configuration_folder}/${local.measurement_protocol_payload_template_file}"
- source = "${local.template_dir}/${local.measurement_protocol_payload_template_file}"
- bucket = module.pipeline_bucket.name
-}
-
# This resource creates a bucket object using as content the audience_segmentation_query_template_file file.
data "template_file" "audience_segmentation_query_template_file" {
template = file("${local.template_dir}/activation_query/${local.audience_segmentation_query_template_file}")
@@ -618,7 +752,7 @@ data "template_file" "churn_propensity_query_template_file" {
}
}
-# This resource creates a bucket object using as content the purchase_propensity_query_template_file file.
+# This resource creates a bucket object using as content the churn_propensity_query_template_file file.
resource "google_storage_bucket_object" "churn_propensity_query_template_file" {
name = "${local.configuration_folder}/${local.churn_propensity_query_template_file}"
content = data.template_file.churn_propensity_query_template_file.rendered
@@ -641,6 +775,58 @@ resource "google_storage_bucket_object" "purchase_propensity_query_template_file
bucket = module.pipeline_bucket.name
}
+# This data source renders the purchase_propensity_vbb_query_template_file template, which is uploaded below as a bucket object.
+data "template_file" "purchase_propensity_vbb_query_template_file" {
+ template = file("${local.template_dir}/activation_query/${local.purchase_propensity_vbb_query_template_file}")
+
+ vars = {
+ mds_project_id = var.mds_project_id
+ mds_dataset_suffix = var.mds_dataset_suffix
+ activation_project_id = var.project_id
+ dataset = module.bigquery.bigquery_dataset.dataset_id
+ }
+}
+
+resource "google_storage_bucket_object" "purchase_propensity_vbb_query_template_file" {
+ name = "${local.configuration_folder}/${local.purchase_propensity_vbb_query_template_file}"
+ content = data.template_file.purchase_propensity_vbb_query_template_file.rendered
+ bucket = module.pipeline_bucket.name
+}
+
+data "template_file" "lead_score_propensity_query_template_file" {
+ template = file("${local.template_dir}/activation_query/${local.lead_score_propensity_query_template_file}")
+
+ vars = {
+ mds_project_id = var.mds_project_id
+ mds_dataset_suffix = var.mds_dataset_suffix
+ }
+}
+
+# This resource creates a bucket object using as content the lead_score_propensity_query_template_file file.
+resource "google_storage_bucket_object" "lead_score_propensity_query_template_file" {
+ name = "${local.configuration_folder}/${local.lead_score_propensity_query_template_file}"
+ content = data.template_file.lead_score_propensity_query_template_file.rendered
+ bucket = module.pipeline_bucket.name
+}
+
+# This data source renders the lead_score_propensity_vbb_query_template_file template, which is uploaded below as a bucket object.
+data "template_file" "lead_score_propensity_vbb_query_template_file" {
+ template = file("${local.template_dir}/activation_query/${local.lead_score_propensity_vbb_query_template_file}")
+
+ vars = {
+ mds_project_id = var.mds_project_id
+ mds_dataset_suffix = var.mds_dataset_suffix
+ activation_project_id = var.project_id
+ dataset = module.bigquery.bigquery_dataset.dataset_id
+ }
+}
+
+resource "google_storage_bucket_object" "lead_score_propensity_vbb_query_template_file" {
+ name = "${local.configuration_folder}/${local.lead_score_propensity_vbb_query_template_file}"
+ content = data.template_file.lead_score_propensity_vbb_query_template_file.rendered
+ bucket = module.pipeline_bucket.name
+}
+
# This data source renders a template file and exposes the rendered content through its rendered attribute.
data "template_file" "activation_type_configuration" {
template = file("${local.template_dir}/activation_type_configuration_template.tpl")
@@ -650,16 +836,18 @@ data "template_file" "activation_type_configuration" {
auto_audience_segmentation_query_template_gcs_path = "gs://${module.pipeline_bucket.name}/${google_storage_bucket_object.auto_audience_segmentation_query_template_file.output_name}"
cltv_query_template_gcs_path = "gs://${module.pipeline_bucket.name}/${google_storage_bucket_object.cltv_query_template_file.output_name}"
purchase_propensity_query_template_gcs_path = "gs://${module.pipeline_bucket.name}/${google_storage_bucket_object.purchase_propensity_query_template_file.output_name}"
+ purchase_propensity_vbb_query_template_gcs_path = "gs://${module.pipeline_bucket.name}/${google_storage_bucket_object.purchase_propensity_vbb_query_template_file.output_name}"
churn_propensity_query_template_gcs_path = "gs://${module.pipeline_bucket.name}/${google_storage_bucket_object.churn_propensity_query_template_file.output_name}"
- measurement_protocol_payload_template_gcs_path = "gs://${module.pipeline_bucket.name}/${google_storage_bucket_object.measurement_protocol_payload_template_file.output_name}"
+ lead_score_propensity_query_template_gcs_path = "gs://${module.pipeline_bucket.name}/${google_storage_bucket_object.lead_score_propensity_query_template_file.output_name}"
+ lead_score_propensity_vbb_query_template_gcs_path = "gs://${module.pipeline_bucket.name}/${google_storage_bucket_object.lead_score_propensity_vbb_query_template_file.output_name}"
}
}
# This resource creates a bucket object using as content the activation_type_configuration.json file.
resource "google_storage_bucket_object" "activation_type_configuration_file" {
- name = "${local.configuration_folder}/activation_type_configuration.json"
- content = data.template_file.activation_type_configuration.rendered
- bucket = module.pipeline_bucket.name
+ name = "${local.configuration_folder}/activation_type_configuration.json"
+ content = data.template_file.activation_type_configuration.rendered
+ bucket = module.pipeline_bucket.name
# Detects md5hash changes to redeploy this file to the GCS bucket.
detect_md5hash = base64encode("${local.activation_type_configuration_file_content_hash}${local.activation_application_content_hash}")
}
@@ -667,12 +855,19 @@ resource "google_storage_bucket_object" "activation_type_configuration_file" {
# This module submits a gcloud build to build a docker container image to be used by the Activation Application
module "activation_pipeline_container" {
source = "terraform-google-modules/gcloud/google"
- version = "3.1.2"
+ version = "3.5.0"
platform = "linux"
- #create_cmd_body = "builds submit --project=${module.project_services.project_id} --tag ${local.docker_repo_prefix}/${google_artifact_registry_repository.activation_repository.name}/${local.activation_container_name}:latest ${local.pipeline_source_dir}"
- create_cmd_body = "builds submit --project=${module.project_services.project_id} --tag ${local.docker_repo_prefix}/${google_artifact_registry_repository.activation_repository.name}/${local.activation_container_name}:latest --gcs-log-dir=gs://${module.build_logs_bucket.name} ${local.pipeline_source_dir}"
+ create_cmd_body = <<-EOT
+ builds submit \
+ --project=${module.project_services.project_id} \
+ --region ${var.location} \
+ --default-buckets-behavior=regional-user-owned-bucket \
+ --tag ${local.docker_repo_prefix}/${google_artifact_registry_repository.activation_repository.name}/${local.activation_container_name}:latest \
+ --gcs-log-dir=gs://${module.build_logs_bucket.name} \
+ ${local.pipeline_source_dir}
+ EOT
destroy_cmd_body = "artifacts docker images delete --project=${module.project_services.project_id} ${local.docker_repo_prefix}/${google_artifact_registry_repository.activation_repository.name}/${local.activation_container_name} --delete-tags"
create_cmd_triggers = {
@@ -686,9 +881,8 @@ module "activation_pipeline_container" {
# This module executes a gcloud command to build a dataflow flex template and uploads it to Dataflow
module "activation_pipeline_template" {
- source = "terraform-google-modules/gcloud/google"
- version = "3.1.2"
- additional_components = ["gsutil"]
+ source = "terraform-google-modules/gcloud/google"
+ version = "3.5.0"
platform = "linux"
create_cmd_body = "dataflow flex-template build --project=${module.project_services.project_id} \"gs://${module.pipeline_bucket.name}/dataflow/templates/${local.activation_container_image_id}.json\" --image \"${local.docker_repo_prefix}/${google_artifact_registry_repository.activation_repository.name}/${local.activation_container_name}:latest\" --sdk-language \"PYTHON\" --metadata-file \"${local.pipeline_source_dir}/metadata.json\""
@@ -718,11 +912,11 @@ data "archive_file" "activation_trigger_source" {
# This module creates a Cloud Storage bucket and sets the trigger_function_account_email as the admin.
module "function_bucket" {
- source = "terraform-google-modules/cloud-storage/google//modules/simple_bucket"
- version = "~> 3.4.1"
- project_id = null_resource.check_cloudfunctions_api.id != "" ? module.project_services.project_id : var.project_id
- name = "${local.app_prefix}-trigger-${module.project_services.project_id}"
- location = var.location
+ source = "terraform-google-modules/cloud-storage/google//modules/simple_bucket"
+ version = "6.1.0"
+ project_id = null_resource.check_cloudfunctions_api.id != "" ? module.project_services.project_id : var.project_id
+ name = "${local.app_prefix}-trigger-${module.project_services.project_id}"
+ location = var.location
# When deleting a bucket, this boolean option will delete all contained objects.
# If false, Terraform will fail to delete buckets which contain objects.
force_destroy = true
@@ -750,7 +944,7 @@ module "function_bucket" {
# This resource creates a bucket object using as content the activation_trigger_archive zip file.
resource "google_storage_bucket_object" "activation_trigger_archive" {
- name = local.source_archive_file
+ name = "${local.source_archive_file_prefix}_${data.archive_file.activation_trigger_source.output_sha256}.zip"
source = data.archive_file.activation_trigger_source.output_path
bucket = module.function_bucket.name
}
@@ -821,7 +1015,7 @@ resource "google_cloudfunctions2_function" "activation_trigger_cf" {
# This module runs gcloud commands that add an invoker policy binding to a Cloud Function, allowing a specific service account to invoke the function.
module "add_invoker_binding" {
source = "terraform-google-modules/gcloud/google"
- version = "3.1.2"
+ version = "3.5.0"
platform = "linux"
diff --git a/infrastructure/terraform/modules/activation/variables.tf b/infrastructure/terraform/modules/activation/variables.tf
index d3fb4759..5814361b 100644
--- a/infrastructure/terraform/modules/activation/variables.tf
+++ b/infrastructure/terraform/modules/activation/variables.tf
@@ -43,8 +43,8 @@ variable "trigger_function_location" {
type = string
}
-variable "poetry_cmd" {
- description = "alias for poetry command on the current system"
+variable "uv_run_alias" {
+ description = "alias for uv run command on the current system"
type = string
}
@@ -72,11 +72,6 @@ variable "ga4_stream_id" {
type = string
}
-variable "poetry_installed" {
- description = "Construct to specify dependency to poetry installed"
- type = string
-}
-
variable "mds_project_id" {
type = string
description = "MDS Project ID"
@@ -90,4 +85,4 @@ variable "mds_dataset_suffix" {
variable "project_owner_email" {
description = "Email address of the project owner."
type = string
-}
\ No newline at end of file
+}
diff --git a/infrastructure/terraform/modules/activation/versions.tf b/infrastructure/terraform/modules/activation/versions.tf
index 5a896e28..29fe3151 100644
--- a/infrastructure/terraform/modules/activation/versions.tf
+++ b/infrastructure/terraform/modules/activation/versions.tf
@@ -20,7 +20,12 @@ terraform {
required_providers {
google = {
source = "hashicorp/google"
- version = ">= 3.43.0, >= 3.53.0, >= 3.63.0, >= 4.83.0, < 5.0.0, < 6.0.0"
+ version = "5.44.1"
+ }
+
+ google-beta = {
+ source = "hashicorp/google-beta"
+ version = "5.44.1"
}
}
diff --git a/infrastructure/terraform/modules/data-store/data-processing-services.tf b/infrastructure/terraform/modules/data-store/data-processing-services.tf
index dd7e66b9..86f80a29 100644
--- a/infrastructure/terraform/modules/data-store/data-processing-services.tf
+++ b/infrastructure/terraform/modules/data-store/data-processing-services.tf
@@ -16,7 +16,7 @@
# https://registry.terraform.io/modules/terraform-google-modules/project-factory/google/latest/submodules/project_services
module "data_processing_project_services" {
source = "terraform-google-modules/project-factory/google//modules/project_services"
- version = "14.1.0"
+ version = "17.0.0"
disable_dependent_services = false
disable_services_on_destroy = false
@@ -116,4 +116,4 @@ resource "null_resource" "check_dataform_api" {
depends_on = [
module.data_processing_project_services
]
-}
\ No newline at end of file
+}
diff --git a/infrastructure/terraform/modules/data-store/dataform.tf b/infrastructure/terraform/modules/data-store/dataform.tf
index 2c4d600e..24944803 100644
--- a/infrastructure/terraform/modules/data-store/dataform.tf
+++ b/infrastructure/terraform/modules/data-store/dataform.tf
@@ -54,9 +54,9 @@ locals {
resource "google_dataform_repository" "marketing-analytics" {
provider = google-beta
# This is the name of the Dataform Repository created in your project
- name = "marketing-analytics"
- project = null_resource.check_dataform_api.id != "" ? module.data_processing_project_services.project_id : data.google_project.data_processing.project_id
- region = local.dataform_derived_region
+ name = "marketing-analytics"
+ project = null_resource.check_dataform_api.id != "" ? module.data_processing_project_services.project_id : data.google_project.data_processing.project_id
+ region = local.dataform_derived_region
lifecycle {
precondition {
@@ -74,4 +74,4 @@ resource "google_dataform_repository" "marketing-analytics" {
depends_on = [
module.data_processing_project_services
]
-}
\ No newline at end of file
+}
diff --git a/infrastructure/terraform/modules/data-store/iam-binding.tf b/infrastructure/terraform/modules/data-store/iam-binding.tf
index 4cac19f7..564efd16 100644
--- a/infrastructure/terraform/modules/data-store/iam-binding.tf
+++ b/infrastructure/terraform/modules/data-store/iam-binding.tf
@@ -12,18 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
-# TODO: we might not need to have this email role at all.
-resource "google_project_iam_member" "email-role" {
- for_each = toset([
- "roles/iam.serviceAccountUser", // TODO: is it really needed?
- "roles/dataform.admin",
- "roles/dataform.editor"
- ])
- role = each.key
- member = "user:${var.project_owner_email}"
- project = null_resource.check_dataform_api.id != "" ? module.data_processing_project_services.project_id : data.google_project.data_processing.project_id
-}
-
# Check the Dataform Service Account Access Requirements for more information
# https://cloud.google.com/dataform/docs/required-access
locals {
@@ -38,7 +26,7 @@ resource "null_resource" "wait_for_dataform_sa_creation" {
MAX_TRIES=100
while ! gcloud asset search-all-iam-policies --scope=projects/${module.data_processing_project_services.project_id} --flatten="policy.bindings[].members[]" --filter="policy.bindings.members~\"serviceAccount:\"" --format="value(policy.bindings.members.split(sep=\":\").slice(1))" | grep -i "${local.dataform_sa}" && [ $COUNTER -lt $MAX_TRIES ]
do
- sleep 3
+ sleep 10
printf "."
COUNTER=$((COUNTER + 1))
done
@@ -46,7 +34,7 @@ resource "null_resource" "wait_for_dataform_sa_creation" {
echo "dataform service account was not created, terraform can not continue!"
exit 1
fi
- sleep 20
+ sleep 120
EOT
}
@@ -56,61 +44,185 @@ resource "null_resource" "wait_for_dataform_sa_creation" {
]
}
+module "email-role" {
+ source = "terraform-google-modules/iam/google//modules/member_iam"
+ version = "~> 8.0"
+
+ service_account_address = var.project_owner_email
+ project_id = null_resource.check_dataform_api.id != "" ? module.data_processing_project_services.project_id : data.google_project.data_processing.project_id
+ project_roles = [
+ "roles/iam.serviceAccountUser", // TODO: is it really needed?
+ "roles/dataform.admin",
+ "roles/dataform.editor"
+ ]
+ prefix = "user"
+}
+#resource "google_project_iam_member" "email-role" {
+# for_each = toset([
+# "roles/iam.serviceAccountUser", // TODO: is it really needed?
+# "roles/dataform.admin",
+# "roles/dataform.editor"
+# ])
+# role = each.key
+# member = "user:${var.project_owner_email}"
+# project = null_resource.check_dataform_api.id != "" ? module.data_processing_project_services.project_id : data.google_project.data_processing.project_id
+#}
+
+# Access policy changes typically take about 2 minutes to propagate,
+# according to https://cloud.google.com/iam/docs/access-change-propagation.
+# This wait makes sure the policy changes are propagated before proceeding
+# with the build.
+resource "time_sleep" "wait_for_email_role_propagation" {
+ create_duration = "120s"
+ depends_on = [
+ module.email-role
+ ]
+}
+
# This resource sets the Dataform service account IAM member roles
-resource "google_project_iam_member" "dataform-serviceaccount" {
+module "dataform-serviceaccount" {
+ source = "terraform-google-modules/iam/google//modules/member_iam"
+ version = "~> 8.0"
depends_on = [
google_dataform_repository.marketing-analytics,
null_resource.check_dataform_api,
- null_resource.wait_for_dataform_sa_creation
- ]
- for_each = toset([
+ null_resource.wait_for_dataform_sa_creation,
+ time_sleep.wait_for_email_role_propagation
+ ]
+ service_account_address = local.dataform_sa
+ project_id = null_resource.check_dataform_api.id != "" ? module.data_processing_project_services.project_id : data.google_project.data_processing.project_id
+ project_roles = [
"roles/secretmanager.secretAccessor",
- "roles/bigquery.jobUser"
- ])
- role = each.key
- member = "serviceAccount:${local.dataform_sa}"
- project = null_resource.check_dataform_api.id != "" ? module.data_processing_project_services.project_id : data.google_project.data_processing.project_id
+ "roles/bigquery.jobUser",
+ "roles/bigquery.dataOwner",
+ ]
+ prefix = "serviceAccount"
}
+# This resource sets the Dataform service account IAM member roles
+#resource "google_project_iam_member" "dataform-serviceaccount" {
+# depends_on = [
+# google_dataform_repository.marketing-analytics,
+# null_resource.check_dataform_api,
+# null_resource.wait_for_dataform_sa_creation,
+# time_sleep.wait_for_email_role_propagation
+# ]
+# for_each = toset([
+# "roles/secretmanager.secretAccessor",
+# "roles/bigquery.jobUser",
+# "roles/bigquery.dataOwner",
+# ])
+# role = each.key
+# member = "serviceAccount:${local.dataform_sa}"
+# project = null_resource.check_dataform_api.id != "" ? module.data_processing_project_services.project_id : data.google_project.data_processing.project_id
+#}
-// Owner role to BigQuery in the destination data project the Dataform SA.
-// Multiple datasets will be created; it requires project-level permissions
-resource "google_project_iam_member" "dataform-bigquery-data-owner" {
+# Access policy changes typically take about 2 minutes to propagate,
+# according to https://cloud.google.com/iam/docs/access-change-propagation.
+# This wait makes sure the policy changes are propagated before proceeding
+# with the build.
+resource "time_sleep" "wait_for_dataform-serviceaccount_role_propagation" {
+ create_duration = "120s"
depends_on = [
- google_dataform_repository.marketing-analytics,
- null_resource.check_dataform_api,
- null_resource.wait_for_dataform_sa_creation
- ]
- for_each = toset([
- "roles/bigquery.dataOwner",
- ])
- role = each.key
- member = "serviceAccount:${local.dataform_sa}"
- project = null_resource.check_dataform_api.id != "" ? module.data_processing_project_services.project_id : data.google_project.data_processing.project_id
+ module.dataform-serviceaccount
+ ]
}
// Read access to the GA4 exports
-resource "google_bigquery_dataset_iam_member" "dataform-ga4-export-reader" {
+module "dataform-ga4-export-reader" {
+ source = "terraform-google-modules/iam/google//modules/bigquery_datasets_iam"
+ version = "~> 8.0"
depends_on = [
google_dataform_repository.marketing-analytics,
null_resource.check_dataform_api,
- null_resource.wait_for_dataform_sa_creation
+ null_resource.wait_for_dataform_sa_creation,
+ time_sleep.wait_for_dataform-serviceaccount_role_propagation
+ ]
+ project = var.source_ga4_export_project_id
+ bigquery_datasets = [
+ var.source_ga4_export_dataset,
+ ]
+ mode = "authoritative"
+
+ bindings = {
+ "roles/bigquery.dataViewer" = [
+ "serviceAccount:${local.dataform_sa}",
+ ]
+ "roles/bigquery.dataEditor" = [
+ "serviceAccount:${local.dataform_sa}",
]
- role = "roles/bigquery.dataViewer"
- member = "serviceAccount:${local.dataform_sa}"
- project = var.source_ga4_export_project_id
- dataset_id = var.source_ga4_export_dataset
+ }
+}
+#resource "google_bigquery_dataset_iam_member" "dataform-ga4-export-reader" {
+# depends_on = [
+# google_dataform_repository.marketing-analytics,
+# null_resource.check_dataform_api,
+# null_resource.wait_for_dataform_sa_creation,
+# time_sleep.wait_for_dataform-serviceaccount_role_propagation
+# ]
+# role = "roles/bigquery.dataViewer"
+# member = "serviceAccount:${local.dataform_sa}"
+# project = var.source_ga4_export_project_id
+# dataset_id = var.source_ga4_export_dataset
+#}
+
+# Access policy changes typically take about 2 minutes to propagate,
+# according to https://cloud.google.com/iam/docs/access-change-propagation.
+# This wait makes sure the policy changes are propagated before proceeding
+# with the build.
+resource "time_sleep" "wait_for_dataform-ga4-export-reader_role_propagation" {
+ create_duration = "120s"
+ depends_on = [
+ module.dataform-ga4-export-reader
+ ]
}
// Read access to the Ads datasets
-resource "google_bigquery_dataset_iam_member" "dataform-ads-export-reader" {
+module "dataform-ads-export-reader" {
+ source = "terraform-google-modules/iam/google//modules/bigquery_datasets_iam"
+ version = "~> 8.0"
depends_on = [
google_dataform_repository.marketing-analytics,
null_resource.check_dataform_api,
- null_resource.wait_for_dataform_sa_creation
+ null_resource.wait_for_dataform_sa_creation,
+ time_sleep.wait_for_dataform-ga4-export-reader_role_propagation
+ ]
+ count = length(var.source_ads_export_data)
+ project = var.source_ads_export_data[count.index].project
+ bigquery_datasets = [
+ var.source_ads_export_data[count.index].dataset,
+ ]
+ mode = "authoritative"
+
+ bindings = {
+ "roles/bigquery.dataViewer" = [
+ "serviceAccount:${local.dataform_sa}",
+ ]
+ "roles/bigquery.dataEditor" = [
+ "serviceAccount:${local.dataform_sa}",
]
- count = length(var.source_ads_export_data)
- role = "roles/bigquery.dataViewer"
- member = "serviceAccount:${local.dataform_sa}"
- project = var.source_ads_export_data[count.index].project
- dataset_id = var.source_ads_export_data[count.index].dataset
+ }
+}
+#resource "google_bigquery_dataset_iam_member" "dataform-ads-export-reader" {
+# depends_on = [
+# google_dataform_repository.marketing-analytics,
+# null_resource.check_dataform_api,
+# null_resource.wait_for_dataform_sa_creation,
+# time_sleep.wait_for_dataform-ga4-export-reader_role_propagation
+# ]
+# count = length(var.source_ads_export_data)
+# role = "roles/bigquery.dataViewer"
+# member = "serviceAccount:${local.dataform_sa}"
+# project = var.source_ads_export_data[count.index].project
+# dataset_id = var.source_ads_export_data[count.index].dataset
+#}
+
+# Access policy changes typically take about 2 minutes to propagate,
+# according to https://cloud.google.com/iam/docs/access-change-propagation.
+# This wait makes sure the policy changes are propagated before proceeding
+# with the build.
+resource "time_sleep" "wait_for_dataform-ads-export-reader_role_propagation" {
+ create_duration = "120s"
+ depends_on = [
+ module.dataform-ads-export-reader
+ ]
}
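Review note: with `mode = "authoritative"`, the `bigquery_datasets_iam` submodule manages the full member list for each role it binds on the dataset, so members that were granted those roles outside Terraform may be removed on apply. Below is a minimal sketch of the non-authoritative alternative, assuming the source datasets are shared with other consumers. The module label `dataform-ads-export-reader-additive` is hypothetical, only the viewer role is shown, and the remaining arguments mirror the block above.

```hcl
# Hypothetical variant of the module call above: "additive" only adds the
# listed members and leaves existing bindings on the dataset untouched.
module "dataform-ads-export-reader-additive" {
  source  = "terraform-google-modules/iam/google//modules/bigquery_datasets_iam"
  version = "~> 8.0"

  count             = length(var.source_ads_export_data)
  project           = var.source_ads_export_data[count.index].project
  bigquery_datasets = [var.source_ads_export_data[count.index].dataset]
  mode              = "additive"

  bindings = {
    "roles/bigquery.dataViewer" = [
      "serviceAccount:${local.dataform_sa}",
    ]
  }
}
```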
diff --git a/infrastructure/terraform/modules/data-store/main.tf b/infrastructure/terraform/modules/data-store/main.tf
index 214f170c..9704661b 100644
--- a/infrastructure/terraform/modules/data-store/main.tf
+++ b/infrastructure/terraform/modules/data-store/main.tf
@@ -21,7 +21,7 @@ data "google_project" "data_processing" {
}
data "google_secret_manager_secret" "github_secret_name" {
- secret_id = google_secret_manager_secret.github-secret.name
+ secret_id = google_secret_manager_secret.github-secret.secret_id
project = var.data_processing_project_id
}
@@ -29,88 +29,19 @@ provider "google" {
region = var.google_default_region
}
-# This module sets up a Dataform workflow environment for the "dev" environment.
-module "dataform-workflow-dev" {
- # The count argument specifies how many instances of the module should be created.
- # In this case, it's set to var.create_dev_environment ? 1 : 0, which means that
- # the module will be created only if the var.create_dev_environment variable is set to `true`.
- # Check the terraform.tfvars file for more information.
- count = var.create_dev_environment ? 1 : 0
- # the path to the Terraform module that will be used to create the Dataform workflow environment.
- source = "../dataform-workflow"
-
- project_id = null_resource.check_dataform_api.id != "" ? module.data_processing_project_services.project_id : data.google_project.data_processing.project_id
- # The name of the Dataform workflow environment.
- environment = "dev"
- region = var.google_default_region
- # The ID of the Dataform repository that will be used by the Dataform workflow environment.
- dataform_repository_id = google_dataform_repository.marketing-analytics.id
- # A list of tags that will be used to filter the Dataform files that are included in the Dataform workflow environment.
- includedTags = ["ga4"]
-
- source_ga4_export_project_id = var.source_ga4_export_project_id
- source_ga4_export_dataset = var.source_ga4_export_dataset
- ga4_incremental_processing_days_back = var.ga4_incremental_processing_days_back
- source_ads_export_data = var.source_ads_export_data
- destination_bigquery_project_id = length(var.dev_data_project_id) > 0 ? var.staging_data_project_id : var.data_project_id
- destination_bigquery_dataset_location = length(var.dev_destination_data_location) > 0 ? var.dev_destination_data_location : var.destination_data_location
-
- # The daily schedule for running the Dataform workflow.
- # Depending on the hour that your Google Analytics 4 BigQuery Export is set,
- # you may have to change this to execute at a later time of the day.
- # Observe that the GA4 BigQuery Export Schedule documentation
- # https://support.google.com/analytics/answer/9358801?hl=en#:~:text=A%20full%20export%20of%20data,(see%20Streaming%20export%20below).
- # Check https://crontab.guru/#0_5-23/4_*_*_* to see next execution times.
- daily_schedule = "0 5-23/4 * * *"
-}
-
-# This module sets up a Dataform workflow environment for the "staging" environment.
-module "dataform-workflow-staging" {
- # The count argument specifies how many instances of the module should be created.
- # In this case, it's set to var.create_staging_environment ? 1 : 0, which means that
- # the module will be created only if the var.create_staging_environment variable is set to `true`.
- # Check the terraform.tfvars file for more information.
- count = var.create_staging_environment ? 1 : 0
- # the path to the Terraform module that will be used to create the Dataform workflow environment.
- source = "../dataform-workflow"
-
- project_id = null_resource.check_dataform_api.id != "" ? module.data_processing_project_services.project_id : data.google_project.data_processing.project_id
- # The name of the Dataform workflow environment.
- environment = "staging"
- region = var.google_default_region
- # The ID of the Dataform repository that will be used by the Dataform workflow environment.
- dataform_repository_id = google_dataform_repository.marketing-analytics.id
- # A list of tags that will be used to filter the Dataform files that are included in the Dataform workflow environment.
- includedTags = ["ga4"]
-
- source_ga4_export_project_id = var.source_ga4_export_project_id
- source_ga4_export_dataset = var.source_ga4_export_dataset
- source_ads_export_data = var.source_ads_export_data
- destination_bigquery_project_id = length(var.staging_data_project_id) > 0 ? var.staging_data_project_id : var.data_project_id
- destination_bigquery_dataset_location = length(var.staging_destination_data_location) > 0 ? var.staging_destination_data_location : var.destination_data_location
-
- # The daily schedule for running the Dataform workflow.
- # Depending on the hour that your Google Analytics 4 BigQuery Export is set,
- # you may have to change this to execute at a later time of the day.
- # Observe that the GA4 BigQuery Export Schedule documentation
- # https://support.google.com/analytics/answer/9358801?hl=en#:~:text=A%20full%20export%20of%20data,(see%20Streaming%20export%20below).
- # Check https://crontab.guru/#0_5-23/4_*_*_* to see next execution times.
- daily_schedule = "0 5-23/4 * * *"
-}
-
# This module sets up a Dataform workflow environment for the "prod" environment.
module "dataform-workflow-prod" {
# The count argument specifies how many instances of the module should be created.
- # In this case, it's set to var.create_prod_environment ? 1 : 0, which means that
- # the module will be created only if the var.create_prod_environment variable is set to `true`.
+ # In this case, it's set to var.deploy_dataform ? 1 : 0, which means that
+ # the module will be created only if the var.deploy_dataform variable is set to `true`.
# Check the terraform.tfvars file for more information.
- count = var.create_prod_environment ? 1 : 0
+ count = var.deploy_dataform ? 1 : 0
# the path to the Terraform module that will be used to create the Dataform workflow environment.
source = "../dataform-workflow"
- project_id = null_resource.check_dataform_api.id != "" ? module.data_processing_project_services.project_id : data.google_project.data_processing.project_id
+ project_id = null_resource.check_dataform_api.id != "" ? module.data_processing_project_services.project_id : data.google_project.data_processing.project_id
# The name of the Dataform workflow environment.
- environment = "prod"
+ property_id = var.property_id
region = var.google_default_region
dataform_repository_id = google_dataform_repository.marketing-analytics.id
@@ -127,4 +58,5 @@ module "dataform-workflow-prod" {
# https://support.google.com/analytics/answer/9358801?hl=en#:~:text=A%20full%20export%20of%20data,(see%20Streaming%20export%20below).
# Check https://crontab.guru/#0_5-23/2_*_*_* to see next execution times.
daily_schedule = "0 5-23/2 * * *"
+ time_zone = var.time_zone
}
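Review note: the `secret_id` change in this file matters because `google_secret_manager_secret.name` is the full resource name (`projects/{{project}}/secrets/{{secret_id}}`), while the `google_secret_manager_secret` data source expects the short ID. A small sketch for inspecting both values, assuming it is placed in the same module; the output names are hypothetical.

```hcl
# Hypothetical outputs, for inspection only.
# Short ID, e.g. "Github_token": the value the data source expects.
output "github_secret_short_id" {
  value = google_secret_manager_secret.github-secret.secret_id
}

# Full resource name in the form projects/<project>/secrets/<secret_id>.
output "github_secret_resource_name" {
  value = google_secret_manager_secret.github-secret.name
}
```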
diff --git a/infrastructure/terraform/modules/data-store/secretmanager.tf b/infrastructure/terraform/modules/data-store/secretmanager.tf
index d89b86e5..2d4d6889 100644
--- a/infrastructure/terraform/modules/data-store/secretmanager.tf
+++ b/infrastructure/terraform/modules/data-store/secretmanager.tf
@@ -14,11 +14,26 @@
resource "google_secret_manager_secret" "github-secret" {
secret_id = "Github_token"
- project = null_resource.check_secretmanager_api.id != "" ? module.data_processing_project_services.project_id : data.google_project.data_processing.project_id
+ project = null_resource.check_secretmanager_api.id != "" ? module.data_processing_project_services.project_id : data.google_project.data_processing.project_id
+ # This replication strategy deploys replicas that may store the secret in different locations around the globe.
+ # This is not the desired behaviour by default; make sure you understand it before enabling it.
+ #replication {
+ # auto {}
+ #}
+
+ # By default, to respect resource location constraints, secrets are deployed in the same region as the compute resources rather than globally.
+ # If the replication strategy above is set to `auto {}`, comment out the following lines; otherwise Terraform will raise an error.
replication {
- #automatic = true
- auto {}
+ user_managed {
+ replicas {
+ location = var.google_default_region
+ }
+ # If you want your replicas in other locations, uncomment the following lines and add them here.
+ #replicas {
+ # location = "us-east1"
+ #}
+ }
}
depends_on = [
@@ -28,7 +43,7 @@ resource "google_secret_manager_secret" "github-secret" {
}
resource "google_secret_manager_secret_version" "secret-version-github" {
- secret = google_secret_manager_secret.github-secret.id
+ secret = google_secret_manager_secret.github-secret.id
secret_data = var.dataform_github_token
#deletion_policy = "DISABLE"
@@ -38,4 +53,4 @@ resource "google_secret_manager_secret_version" "secret-version-github" {
null_resource.check_dataform_api,
null_resource.check_secretmanager_api
]
-}
\ No newline at end of file
+}
diff --git a/infrastructure/terraform/modules/data-store/variables.tf b/infrastructure/terraform/modules/data-store/variables.tf
index 62bc24f1..bd11aab7 100644
--- a/infrastructure/terraform/modules/data-store/variables.tf
+++ b/infrastructure/terraform/modules/data-store/variables.tf
@@ -38,21 +38,15 @@ variable "project_owner_email" {
}
variable "dataform_github_repo" {
- description = "Private Github repo for Dataform."
+ description = "Private GitHub repo for Dataform."
type = string
}
variable "dataform_github_token" {
- description = "Github token for Dataform repo."
+ description = "GitHub token for Dataform repo."
type = string
}
-variable "create_dev_environment" {
- description = "Indicates that a development environment needs to be created"
- type = bool
- default = true
-}
-
variable "dev_data_project_id" {
description = "Project ID of where the dev datasets will created. If not provided, data_project_id will be used."
type = string
@@ -65,12 +59,6 @@ variable "dev_destination_data_location" {
default = ""
}
-variable "create_staging_environment" {
- description = "Indicates that a staging environment needs to be created"
- type = bool
- default = true
-}
-
variable "staging_data_project_id" {
description = "Project ID of where the staging datasets will created. If not provided, data_project_id will be used."
type = string
@@ -83,12 +71,18 @@ variable "staging_destination_data_location" {
default = ""
}
-variable "create_prod_environment" {
- description = "Indicates that a production environment needs to be created"
+variable "deploy_dataform" {
+ description = "Indicates that a dataform workspace needs to be created"
type = bool
default = true
}
+variable "property_id" {
+ description = "Google Analytics 4 property ID for which to create an MDS"
+ type = string
+ default = ""
+}
+
variable "prod_data_project_id" {
description = "Project ID of where the prod datasets will created. If not provided, data_project_id will be used."
type = string
@@ -112,7 +106,7 @@ variable "source_ga4_export_dataset" {
}
variable "ga4_incremental_processing_days_back" {
- type = string
+ type = string
default = "3"
}
@@ -128,4 +122,8 @@ variable "source_ads_export_data" {
variable "dataform_region" {
description = "Specify dataform region when dataform is not available in the default cloud region of choice"
type = string
-}
\ No newline at end of file
+}
+
+variable "time_zone" {
+ type = string
+}
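Review note: the new `time_zone` variable has no description or default, so every caller must now set it. A possible fuller declaration is sketched below, assuming the previously hardcoded `America/New_York` is an acceptable fallback. The description text and the validation rule are illustrative suggestions, not part of this change.

```hcl
variable "time_zone" {
  description = "Time zone (tz database name) passed through to the Cloud Scheduler job."
  type        = string
  # Assumed default: the value that scheduler.tf hardcoded before this change.
  default = "America/New_York"

  # Illustrative guard: rejects empty or whitespace-only values.
  validation {
    condition     = length(trimspace(var.time_zone)) > 0
    error_message = "The time_zone value must not be empty."
  }
}
```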
diff --git a/infrastructure/terraform/modules/data-store/versions.tf b/infrastructure/terraform/modules/data-store/versions.tf
index 8821ac39..54e0ceda 100644
--- a/infrastructure/terraform/modules/data-store/versions.tf
+++ b/infrastructure/terraform/modules/data-store/versions.tf
@@ -20,7 +20,12 @@ terraform {
required_providers {
google = {
source = "hashicorp/google"
- version = ">= 3.43.0, >= 3.53.0, >= 3.63.0, >= 4.83.0, < 5.0.0, < 6.0.0"
+ version = "5.44.1"
+ }
+
+ google-beta = {
+ source = "hashicorp/google-beta"
+ version = "5.44.1"
}
}
diff --git a/infrastructure/terraform/modules/dataform-workflow/README.md b/infrastructure/terraform/modules/dataform-workflow/README.md
index 8b2bdff5..9ff7546b 100644
--- a/infrastructure/terraform/modules/dataform-workflow/README.md
+++ b/infrastructure/terraform/modules/dataform-workflow/README.md
@@ -1 +1 @@
-# Dataform workflow module
\ No newline at end of file
+# Dataform workflow module
diff --git a/infrastructure/terraform/modules/dataform-workflow/dataform-workflow.tf b/infrastructure/terraform/modules/dataform-workflow/dataform-workflow.tf
index a0d7d153..99e3921c 100644
--- a/infrastructure/terraform/modules/dataform-workflow/dataform-workflow.tf
+++ b/infrastructure/terraform/modules/dataform-workflow/dataform-workflow.tf
@@ -22,10 +22,10 @@ locals {
# This resources creates a workflow that runs the Dataform incremental pipeline.
resource "google_workflows_workflow" "dataform-incremental-workflow" {
project = null_resource.check_workflows_api.id != "" ? module.data_processing_project_services.project_id : var.project_id
- name = "dataform-${var.environment}-incremental"
+ name = "dataform-${var.property_id}-incremental"
region = var.region
- description = "Dataform incremental workflow for ${var.environment} environment"
- service_account = google_service_account.workflow-dataform.email
+ description = "Dataform incremental workflow for ${var.property_id} ga4 property"
+ service_account = module.workflow-dataform.email
# The source code includes the following steps:
# Init: This step initializes the workflow by assigning the value of the dataform_repository_id variable to the repository variable.
# Create Compilation Result: This step creates a compilation result for the Dataform repository. The compilation result includes the git commit hash and the code compilation configuration.
@@ -49,7 +49,7 @@ main:
defaultDatabase: ${var.destination_bigquery_project_id}
defaultLocation: ${var.destination_bigquery_dataset_location}
vars:
- env: ${var.environment}
+ ga4_property_id: '${var.property_id}'
ga4_export_project: ${var.source_ga4_export_project_id}
ga4_export_dataset: ${var.source_ga4_export_dataset}
ga4_incremental_processing_days_back: '${var.ga4_incremental_processing_days_back}'
diff --git a/infrastructure/terraform/modules/dataform-workflow/scheduler.tf b/infrastructure/terraform/modules/dataform-workflow/scheduler.tf
index 3947b3b8..fed10fc8 100644
--- a/infrastructure/terraform/modules/dataform-workflow/scheduler.tf
+++ b/infrastructure/terraform/modules/dataform-workflow/scheduler.tf
@@ -14,12 +14,12 @@
# This creates a Cloud Scheduler job that triggers the Dataform incremental workflow on a daily schedule.
resource "google_cloud_scheduler_job" "daily-dataform-increments" {
- project = module.data_processing_project_services.project_id
- name = "daily-dataform-${var.environment}"
- description = "Daily Dataform ${var.environment} environment incremental update"
+ project = module.data_processing_project_services.project_id
+ name = "daily-dataform-${var.property_id}"
+ description = "Daily Dataform ${var.property_id} property export incremental update"
# The schedule attribute specifies the schedule for the job. In this case, the job is scheduled to run daily at the specified times.
- schedule = var.daily_schedule
- time_zone = "America/New_York"
+ schedule = var.daily_schedule
+ time_zone = var.time_zone
# The attempt_deadline attribute specifies the maximum amount of time that the job will attempt to run before failing.
# In this case, the job will attempt to run for a maximum of 5 minutes before failing.
attempt_deadline = "320s"
@@ -35,7 +35,7 @@ resource "google_cloud_scheduler_job" "daily-dataform-increments" {
uri = "https://workflowexecutions.googleapis.com/v1/projects/${module.data_processing_project_services.project_id}/locations/${var.region}/workflows/${google_workflows_workflow.dataform-incremental-workflow.name}/executions"
oauth_token {
- service_account_email = google_service_account.scheduler.email
+ service_account_email = module.scheduler.email
}
}
}
diff --git a/infrastructure/terraform/modules/dataform-workflow/service-account.tf b/infrastructure/terraform/modules/dataform-workflow/service-account.tf
index 39d31811..95e518e1 100644
--- a/infrastructure/terraform/modules/dataform-workflow/service-account.tf
+++ b/infrastructure/terraform/modules/dataform-workflow/service-account.tf
@@ -12,20 +12,36 @@
# See the License for the specific language governing permissions and
# limitations under the License.
-resource "google_service_account" "scheduler" {
+locals {
+ scheduler_sa = "workflow-scheduler-${var.property_id}@${module.data_processing_project_services.project_id}.iam.gserviceaccount.com"
+ workflows_sa = "workflow-dataform-${var.property_id}@${module.data_processing_project_services.project_id}.iam.gserviceaccount.com"
+}
+
+module "scheduler" {
+ source = "terraform-google-modules/service-accounts/google//modules/simple-sa"
+ version = "~> 4.0"
+
+ project_id = null_resource.check_cloudscheduler_api.id != "" ? module.data_processing_project_services.project_id : var.project_id
+ name = "workflow-scheduler-${var.property_id}"
+ project_roles = [
+ "roles/workflows.invoker"
+ ]
+
depends_on = [
module.data_processing_project_services,
null_resource.check_cloudscheduler_api,
- ]
-
- project = null_resource.check_cloudscheduler_api.id != "" ? module.data_processing_project_services.project_id : var.project_id
- account_id = "workflow-scheduler-${var.environment}"
- display_name = "Service Account to schedule Dataform workflows in ${var.environment}"
+ ]
}
-locals {
- scheduler_sa = "workflow-scheduler-${var.environment}@${module.data_processing_project_services.project_id}.iam.gserviceaccount.com"
- workflows_sa = "workflow-dataform-${var.environment}@${module.data_processing_project_services.project_id}.iam.gserviceaccount.com"
+# Access policy changes typically take about 2 minutes to propagate,
+# according to https://cloud.google.com/iam/docs/access-change-propagation.
+# This wait makes sure the policy changes are propagated before proceeding
+# with the build.
+resource "time_sleep" "wait_for_scheduler_service_account_role_propagation" {
+ create_duration = "120s"
+ depends_on = [
+ module.scheduler
+ ]
}
# Wait for the scheduler service account to be created
@@ -37,7 +53,7 @@ resource "null_resource" "wait_for_scheduler_sa_creation" {
MAX_TRIES=100
while ! gcloud iam service-accounts list --project=${module.data_processing_project_services.project_id} --filter="EMAIL:${local.scheduler_sa} AND DISABLED:False" --format="table(EMAIL, DISABLED)" && [ $COUNTER -lt $MAX_TRIES ]
do
- sleep 3
+ sleep 10
printf "."
COUNTER=$((COUNTER + 1))
done
@@ -45,37 +61,44 @@ resource "null_resource" "wait_for_scheduler_sa_creation" {
echo "scheduler service account was not created, terraform can not continue!"
exit 1
fi
- sleep 20
+ sleep 120
EOT
}
depends_on = [
module.data_processing_project_services,
- null_resource.check_dataform_api
+ time_sleep.wait_for_scheduler_service_account_role_propagation,
+ null_resource.check_dataform_api,
+ module.scheduler,
]
}
-resource "google_project_iam_member" "scheduler-workflow-invoker" {
- depends_on = [
- module.data_processing_project_services,
- null_resource.check_cloudscheduler_api,
- null_resource.wait_for_scheduler_sa_creation
- ]
+module "workflow-dataform" {
+ source = "terraform-google-modules/service-accounts/google//modules/simple-sa"
+ version = "~> 4.0"
- project = null_resource.check_cloudscheduler_api.id != "" ? module.data_processing_project_services.project_id : var.project_id
- member = "serviceAccount:${google_service_account.scheduler.email}"
- role = "roles/workflows.invoker"
-}
+ project_id = null_resource.check_workflows_api.id != "" ? module.data_processing_project_services.project_id : var.project_id
+ name = "workflow-dataform-${var.property_id}"
+ project_roles = [
+ "roles/dataform.editor"
+ ]
-resource "google_service_account" "workflow-dataform" {
depends_on = [
module.data_processing_project_services,
null_resource.check_workflows_api,
- ]
-
- project = null_resource.check_workflows_api.id != "" ? module.data_processing_project_services.project_id : var.project_id
- account_id = "workflow-dataform-${var.environment}"
- display_name = "Service Account to run Dataform workflows in ${var.environment}"
+ null_resource.check_dataform_api,
+ ]
+}
+
+# Access policy changes typically take about 2 minutes to propagate,
+# according to https://cloud.google.com/iam/docs/access-change-propagation.
+# This wait makes sure the policy changes are propagated before proceeding
+# with the build.
+resource "time_sleep" "wait_for_workflow_dataform_service_account_role_propagation" {
+ create_duration = "120s"
+ depends_on = [
+ module.workflow-dataform
+ ]
}
# Wait for the workflows service account to be created
@@ -86,7 +109,7 @@ resource "null_resource" "wait_for_workflows_sa_creation" {
MAX_TRIES=100
while ! gcloud iam service-accounts list --project=${module.data_processing_project_services.project_id} --filter="EMAIL:${local.workflows_sa} AND DISABLED:False" --format="table(EMAIL, DISABLED)" && [ $COUNTER -lt $MAX_TRIES ]
do
- sleep 3
+ sleep 10
printf "."
COUNTER=$((COUNTER + 1))
done
@@ -94,25 +117,14 @@ resource "null_resource" "wait_for_workflows_sa_creation" {
echo "workflows service account was not created, terraform can not continue!"
exit 1
fi
- sleep 20
+ sleep 120
EOT
}
depends_on = [
module.data_processing_project_services,
- null_resource.check_dataform_api
+ null_resource.check_dataform_api,
+ module.workflow-dataform,
+ time_sleep.wait_for_workflow_dataform_service_account_role_propagation,
]
}
-
-
-resource "google_project_iam_member" "worflow-dataform-dataform-editor" {
- depends_on = [
- module.data_processing_project_services,
- null_resource.check_dataform_api,
- null_resource.wait_for_workflows_sa_creation
- ]
-
- project = null_resource.check_workflows_api.id != "" ? module.data_processing_project_services.project_id : var.project_id
- member = "serviceAccount:${google_service_account.workflow-dataform.email}"
- role = "roles/dataform.editor"
-}
\ No newline at end of file
diff --git a/infrastructure/terraform/modules/dataform-workflow/services.tf b/infrastructure/terraform/modules/dataform-workflow/services.tf
index 0271f228..c78dc529 100644
--- a/infrastructure/terraform/modules/dataform-workflow/services.tf
+++ b/infrastructure/terraform/modules/dataform-workflow/services.tf
@@ -15,7 +15,7 @@
# https://registry.terraform.io/modules/terraform-google-modules/project-factory/google/latest/submodules/project_services
module "data_processing_project_services" {
source = "terraform-google-modules/project-factory/google//modules/project_services"
- version = "14.1.0"
+ version = "17.0.0"
disable_dependent_services = false
disable_services_on_destroy = false
@@ -142,4 +142,4 @@ resource "null_resource" "check_cloudscheduler_api" {
depends_on = [
module.data_processing_project_services
]
-}
\ No newline at end of file
+}
diff --git a/infrastructure/terraform/modules/dataform-workflow/variables.tf b/infrastructure/terraform/modules/dataform-workflow/variables.tf
index 60b014d8..97d5dc73 100644
--- a/infrastructure/terraform/modules/dataform-workflow/variables.tf
+++ b/infrastructure/terraform/modules/dataform-workflow/variables.tf
@@ -22,12 +22,12 @@ variable "region" {
type = string
}
-variable "environment" {
+variable "property_id" {
type = string
}
variable "daily_schedule" {
- type = string
+ type = string
# This schedule executes every days, each 2 hours between 5AM and 11PM.
default = "0 5-23/2 * * *" #"2 5 * * *"
}
@@ -45,7 +45,7 @@ variable "source_ga4_export_dataset" {
}
variable "ga4_incremental_processing_days_back" {
- type = string
+ type = string
default = "3"
}
@@ -74,4 +74,8 @@ variable "gitCommitish" {
variable "includedTags" {
type = list(string)
default = []
-}
\ No newline at end of file
+}
+
+variable "time_zone" {
+ type = string
+}
diff --git a/infrastructure/terraform/modules/dataform-workflow/versions.tf b/infrastructure/terraform/modules/dataform-workflow/versions.tf
index 8821ac39..54e0ceda 100644
--- a/infrastructure/terraform/modules/dataform-workflow/versions.tf
+++ b/infrastructure/terraform/modules/dataform-workflow/versions.tf
@@ -20,7 +20,12 @@ terraform {
required_providers {
google = {
source = "hashicorp/google"
- version = ">= 3.43.0, >= 3.53.0, >= 3.63.0, >= 4.83.0, < 5.0.0, < 6.0.0"
+ version = "5.44.1"
+ }
+
+ google-beta = {
+ source = "hashicorp/google-beta"
+ version = "5.44.1"
}
}
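Review note: this module now uses `time_sleep` resources, which come from the `hashicorp/time` provider, yet only `google` and `google-beta` are pinned in the hunk above. A sketch of declaring `time` explicitly alongside them follows. The version constraint is an assumption, and any other settings in the existing `terraform` block are omitted here.

```hcl
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "5.44.1"
    }

    google-beta = {
      source  = "hashicorp/google-beta"
      version = "5.44.1"
    }

    # Assumed addition: provider behind the time_sleep resources.
    time = {
      source  = "hashicorp/time"
      version = ">= 0.9.0" # assumed constraint; pin to the version you test with
    }
  }
}
```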
diff --git a/infrastructure/terraform/modules/feature-store/bigquery-datasets.tf b/infrastructure/terraform/modules/feature-store/bigquery-datasets.tf
index c51927c3..52816e44 100644
--- a/infrastructure/terraform/modules/feature-store/bigquery-datasets.tf
+++ b/infrastructure/terraform/modules/feature-store/bigquery-datasets.tf
@@ -14,14 +14,14 @@
# This resource creates a BigQuery dataset called `feature_store`.
resource "google_bigquery_dataset" "feature_store" {
- dataset_id = local.config_bigquery.dataset.feature_store.name
- friendly_name = local.config_bigquery.dataset.feature_store.friendly_name
- project = null_resource.check_bigquery_api.id != "" ? local.feature_store_project_id : var.project_id
- description = local.config_bigquery.dataset.feature_store.description
- location = local.config_bigquery.dataset.feature_store.location
+ dataset_id = local.config_bigquery.dataset.feature_store.name
+ friendly_name = local.config_bigquery.dataset.feature_store.friendly_name
+ project = null_resource.check_bigquery_api.id != "" ? local.feature_store_project_id : var.project_id
+ description = local.config_bigquery.dataset.feature_store.description
+ location = local.config_bigquery.dataset.feature_store.location
# The max_time_travel_hours attribute specifies the maximum number of hours that data in the dataset can be accessed using time travel queries.
# In this case, the maximum time travel hours is set to the value of the local file config.yaml section bigquery.dataset.feature_store.max_time_travel_hours configuration.
- max_time_travel_hours = local.config_bigquery.dataset.feature_store.max_time_travel_hours
+ max_time_travel_hours = local.config_bigquery.dataset.feature_store.max_time_travel_hours
# The delete_contents_on_destroy attribute specifies whether the contents of the dataset should be deleted when the dataset is destroyed.
# In this case, the delete_contents_on_destroy attribute is set to false, which means that the contents of the dataset will not be deleted when the dataset is destroyed.
delete_contents_on_destroy = false
@@ -40,14 +40,14 @@ resource "google_bigquery_dataset" "feature_store" {
# This resource creates a BigQuery dataset called `purchase_propensity`.
resource "google_bigquery_dataset" "purchase_propensity" {
- dataset_id = local.config_bigquery.dataset.purchase_propensity.name
- friendly_name = local.config_bigquery.dataset.purchase_propensity.friendly_name
- project = null_resource.check_bigquery_api.id != "" ? local.purchase_propensity_project_id : local.feature_store_project_id
- description = local.config_bigquery.dataset.purchase_propensity.description
- location = local.config_bigquery.dataset.purchase_propensity.location
+ dataset_id = local.config_bigquery.dataset.purchase_propensity.name
+ friendly_name = local.config_bigquery.dataset.purchase_propensity.friendly_name
+ project = null_resource.check_bigquery_api.id != "" ? local.purchase_propensity_project_id : local.feature_store_project_id
+ description = local.config_bigquery.dataset.purchase_propensity.description
+ location = local.config_bigquery.dataset.purchase_propensity.location
# The max_time_travel_hours attribute specifies the maximum number of hours that data in the dataset can be accessed using time travel queries.
# In this case, the maximum time travel hours is set to the value of the local file config.yaml section bigquery.dataset.feature_store.max_time_travel_hours configuration.
- max_time_travel_hours = local.config_bigquery.dataset.purchase_propensity.max_time_travel_hours
+ max_time_travel_hours = local.config_bigquery.dataset.purchase_propensity.max_time_travel_hours
# The delete_contents_on_destroy attribute specifies whether the contents of the dataset should be deleted when the dataset is destroyed.
# In this case, the delete_contents_on_destroy attribute is set to false, which means that the contents of the dataset will not be deleted when the dataset is destroyed.
delete_contents_on_destroy = false
@@ -66,14 +66,40 @@ resource "google_bigquery_dataset" "purchase_propensity" {
# This resource creates a BigQuery dataset called `churn_propensity`.
resource "google_bigquery_dataset" "churn_propensity" {
- dataset_id = local.config_bigquery.dataset.churn_propensity.name
- friendly_name = local.config_bigquery.dataset.churn_propensity.friendly_name
- project = null_resource.check_bigquery_api.id != "" ? local.churn_propensity_project_id : local.feature_store_project_id
- description = local.config_bigquery.dataset.churn_propensity.description
- location = local.config_bigquery.dataset.churn_propensity.location
+ dataset_id = local.config_bigquery.dataset.churn_propensity.name
+ friendly_name = local.config_bigquery.dataset.churn_propensity.friendly_name
+ project = null_resource.check_bigquery_api.id != "" ? local.churn_propensity_project_id : local.feature_store_project_id
+ description = local.config_bigquery.dataset.churn_propensity.description
+ location = local.config_bigquery.dataset.churn_propensity.location
# The max_time_travel_hours attribute specifies the maximum number of hours that data in the dataset can be accessed using time travel queries.
# In this case, the maximum time travel hours is set to the value of the local file config.yaml section bigquery.dataset.feature_store.max_time_travel_hours configuration.
- max_time_travel_hours = local.config_bigquery.dataset.churn_propensity.max_time_travel_hours
+ max_time_travel_hours = local.config_bigquery.dataset.churn_propensity.max_time_travel_hours
+ # The delete_contents_on_destroy attribute specifies whether the contents of the dataset should be deleted when the dataset is destroyed.
+ # In this case, the delete_contents_on_destroy attribute is set to false, which means that the contents of the dataset will not be deleted when the dataset is destroyed.
+ delete_contents_on_destroy = false
+
+ labels = {
+ version = "prod"
+ }
+
+ # The lifecycle block allows you to configure the lifecycle of the dataset.
+ # In this case, the ignore_changes attribute is set to all, which means that
+ # Terraform will ignore any changes to the dataset and will not attempt to update the dataset.
+ lifecycle {
+ ignore_changes = all
+ }
+}
+
+# This resource creates a BigQuery dataset called `lead_score_propensity`.
+resource "google_bigquery_dataset" "lead_score_propensity" {
+ dataset_id = local.config_bigquery.dataset.lead_score_propensity.name
+ friendly_name = local.config_bigquery.dataset.lead_score_propensity.friendly_name
+ project = null_resource.check_bigquery_api.id != "" ? local.lead_score_propensity_project_id : local.feature_store_project_id
+ description = local.config_bigquery.dataset.lead_score_propensity.description
+ location = local.config_bigquery.dataset.lead_score_propensity.location
+ # The max_time_travel_hours attribute specifies the maximum number of hours that data in the dataset can be accessed using time travel queries.
+ # In this case, the maximum time travel hours is set to the value of the local file config.yaml section bigquery.dataset.lead_score_propensity.max_time_travel_hours configuration.
+ max_time_travel_hours = local.config_bigquery.dataset.lead_score_propensity.max_time_travel_hours
# The delete_contents_on_destroy attribute specifies whether the contents of the dataset should be deleted when the dataset is destroyed.
# In this case, the delete_contents_on_destroy attribute is set to false, which means that the contents of the dataset will not be deleted when the dataset is destroyed.
delete_contents_on_destroy = false
@@ -92,14 +118,14 @@ resource "google_bigquery_dataset" "churn_propensity" {
# This resource creates a BigQuery dataset called `customer_lifetime_value`.
resource "google_bigquery_dataset" "customer_lifetime_value" {
- dataset_id = local.config_bigquery.dataset.customer_lifetime_value.name
- friendly_name = local.config_bigquery.dataset.customer_lifetime_value.friendly_name
- project = null_resource.check_bigquery_api.id != "" ? local.customer_lifetime_value_project_id : local.feature_store_project_id
- description = local.config_bigquery.dataset.customer_lifetime_value.description
- location = local.config_bigquery.dataset.customer_lifetime_value.location
+ dataset_id = local.config_bigquery.dataset.customer_lifetime_value.name
+ friendly_name = local.config_bigquery.dataset.customer_lifetime_value.friendly_name
+ project = null_resource.check_bigquery_api.id != "" ? local.customer_lifetime_value_project_id : local.feature_store_project_id
+ description = local.config_bigquery.dataset.customer_lifetime_value.description
+ location = local.config_bigquery.dataset.customer_lifetime_value.location
# The max_time_travel_hours attribute specifies the maximum number of hours that data in the dataset can be accessed using time travel queries.
# In this case, the maximum time travel hours is set to the value of the local file config.yaml section bigquery.dataset.customer_lifetime_value.max_time_travel_hours configuration.
- max_time_travel_hours = local.config_bigquery.dataset.customer_lifetime_value.max_time_travel_hours
+ max_time_travel_hours = local.config_bigquery.dataset.customer_lifetime_value.max_time_travel_hours
# The delete_contents_on_destroy attribute specifies whether the contents of the dataset should be deleted when the dataset is destroyed.
# In this case, the delete_contents_on_destroy attribute is set to false, which means that the contents of the dataset will not be deleted when the dataset is destroyed.
delete_contents_on_destroy = false
@@ -118,14 +144,14 @@ resource "google_bigquery_dataset" "customer_lifetime_value" {
# This resource creates a BigQuery dataset called `audience_segmentation`.
resource "google_bigquery_dataset" "audience_segmentation" {
- dataset_id = local.config_bigquery.dataset.audience_segmentation.name
- friendly_name = local.config_bigquery.dataset.audience_segmentation.friendly_name
- project = null_resource.check_bigquery_api.id != "" ? local.audience_segmentation_project_id : local.feature_store_project_id
- description = local.config_bigquery.dataset.audience_segmentation.description
- location = local.config_bigquery.dataset.audience_segmentation.location
+ dataset_id = local.config_bigquery.dataset.audience_segmentation.name
+ friendly_name = local.config_bigquery.dataset.audience_segmentation.friendly_name
+ project = null_resource.check_bigquery_api.id != "" ? local.audience_segmentation_project_id : local.feature_store_project_id
+ description = local.config_bigquery.dataset.audience_segmentation.description
+ location = local.config_bigquery.dataset.audience_segmentation.location
# The max_time_travel_hours attribute specifies the maximum number of hours that data in the dataset can be accessed using time travel queries.
# In this case, the maximum time travel hours is set to the value of the local file config.yaml section bigquery.dataset.audience_segmentation.max_time_travel_hours configuration.
- max_time_travel_hours = local.config_bigquery.dataset.audience_segmentation.max_time_travel_hours
+ max_time_travel_hours = local.config_bigquery.dataset.audience_segmentation.max_time_travel_hours
# The delete_contents_on_destroy attribute specifies whether the contents of the dataset should be deleted when the dataset is destroyed.
# In this case, the delete_contents_on_destroy attribute is set to false, which means that the contents of the dataset will not be deleted when the dataset is destroyed.
delete_contents_on_destroy = false
@@ -144,14 +170,14 @@ resource "google_bigquery_dataset" "audience_segmentation" {
# This resource creates a BigQuery dataset called `auto_audience_segmentation`.
resource "google_bigquery_dataset" "auto_audience_segmentation" {
- dataset_id = local.config_bigquery.dataset.auto_audience_segmentation.name
- friendly_name = local.config_bigquery.dataset.auto_audience_segmentation.friendly_name
- project = null_resource.check_bigquery_api.id != "" ? local.auto_audience_segmentation_project_id : local.feature_store_project_id
- description = local.config_bigquery.dataset.auto_audience_segmentation.description
- location = local.config_bigquery.dataset.auto_audience_segmentation.location
+ dataset_id = local.config_bigquery.dataset.auto_audience_segmentation.name
+ friendly_name = local.config_bigquery.dataset.auto_audience_segmentation.friendly_name
+ project = null_resource.check_bigquery_api.id != "" ? local.auto_audience_segmentation_project_id : local.feature_store_project_id
+ description = local.config_bigquery.dataset.auto_audience_segmentation.description
+ location = local.config_bigquery.dataset.auto_audience_segmentation.location
# The max_time_travel_hours attribute specifies the maximum number of hours that data in the dataset can be accessed using time travel queries.
# In this case, the maximum time travel hours is set to the value of the local file config.yaml section bigquery.dataset.auto_audience_segmentation.max_time_travel_hours configuration.
- max_time_travel_hours = local.config_bigquery.dataset.auto_audience_segmentation.max_time_travel_hours
+ max_time_travel_hours = local.config_bigquery.dataset.auto_audience_segmentation.max_time_travel_hours
# The delete_contents_on_destroy attribute specifies whether the contents of the dataset should be deleted when the dataset is destroyed.
# In this case, the delete_contents_on_destroy attribute is set to false, which means that the contents of the dataset will not be deleted when the dataset is destroyed.
delete_contents_on_destroy = false
@@ -200,7 +226,7 @@ locals {
module "aggregated_vbb" {
source = "terraform-google-modules/bigquery/google"
- version = "~> 5.4"
+ version = "8.1.0"
dataset_id = local.config_bigquery.dataset.aggregated_vbb.name
dataset_name = local.config_bigquery.dataset.aggregated_vbb.friendly_name
@@ -216,18 +242,18 @@ module "aggregated_vbb" {
}
tables = [for table_id in local.aggregated_vbb_tables :
- {
- table_id = table_id
- schema = file("../../sql/schema/table/${table_id}.json")
- # The max_time_travel_hours attribute specifies the maximum number of hours that data in the dataset can be accessed using time travel queries.
- # In this case, the maximum time travel hours is set to the value of the local file config.yaml section bigquery.dataset.auto_audience_segmentation.max_time_travel_hours configuration.
- max_time_travel_hours = local.config_bigquery.dataset.aggregated_vbb.max_time_travel_hours
- deletion_protection = false
- time_partitioning = null,
- range_partitioning = null,
- expiration_time = null,
- clustering = [],
- labels = {},
+ {
+ table_id = table_id
+ schema = file("../../sql/schema/table/${table_id}.json")
+ # The max_time_travel_hours attribute specifies the maximum number of hours that data in the dataset can be accessed using time travel queries.
+ # In this case, the maximum time travel hours is set to the value of the local file config.yaml section bigquery.dataset.aggregated_vbb.max_time_travel_hours configuration.
+ max_time_travel_hours = local.config_bigquery.dataset.aggregated_vbb.max_time_travel_hours
+ deletion_protection = false
+ time_partitioning = null,
+ range_partitioning = null,
+ expiration_time = null,
+ clustering = [],
+ labels = {},
}]
}
@@ -236,13 +262,13 @@ module "aggregated_vbb" {
# the aggregated predictions generated by the predictions pipelines.
module "aggregated_predictions" {
source = "terraform-google-modules/bigquery/google"
- version = "~> 5.4"
+ version = "8.1.0"
- dataset_id = local.config_bigquery.dataset.aggregated_predictions.name
- dataset_name = local.config_bigquery.dataset.aggregated_predictions.friendly_name
- description = local.config_bigquery.dataset.aggregated_predictions.description
- project_id = local.config_bigquery.dataset.aggregated_predictions.project_id
- location = local.config_bigquery.dataset.aggregated_predictions.location
+ dataset_id = local.config_bigquery.dataset.aggregated_predictions.name
+ dataset_name = local.config_bigquery.dataset.aggregated_predictions.friendly_name
+ description = local.config_bigquery.dataset.aggregated_predictions.description
+ project_id = local.config_bigquery.dataset.aggregated_predictions.project_id
+ location = local.config_bigquery.dataset.aggregated_predictions.location
# The delete_contents_on_destroy attribute specifies whether the contents of the dataset should be deleted when the dataset is destroyed.
# In this case, the delete_contents_on_destroy attribute is set to true, which means that the contents of the dataset will be deleted when the dataset is destroyed.
delete_contents_on_destroy = true
@@ -250,7 +276,7 @@ module "aggregated_predictions" {
# The tables attribute is used to configure the BigQuery table within the dataset
tables = [
{
- table_id = "latest"
+ table_id = "latest"
# The schema of the table, defined in a JSON file.
schema = file("../../sql/schema/table/aggregated_predictions_latest.json")
time_partitioning = null,
@@ -291,7 +317,7 @@ locals {
module "gemini_insights" {
source = "terraform-google-modules/bigquery/google"
- version = "~> 5.4"
+ version = "8.1.0"
dataset_id = local.config_bigquery.dataset.gemini_insights.name
dataset_name = local.config_bigquery.dataset.gemini_insights.friendly_name
@@ -300,26 +326,27 @@ module "gemini_insights" {
location = local.config_bigquery.dataset.gemini_insights.location
# The delete_contents_on_destroy attribute specifies whether the contents of the dataset should be deleted when the dataset is destroyed.
# In this case, the delete_contents_on_destroy attribute is set to false, which means that the contents of the dataset will not be deleted when the dataset is destroyed.
- delete_contents_on_destroy = true
+ delete_contents_on_destroy = false
+ deletion_protection = true
dataset_labels = {
- version = "prod",
- dataset_id = local.config_bigquery.dataset.gemini_insights.name
+ version = "prod",
+ dataset_id = local.config_bigquery.dataset.gemini_insights.name
}
tables = [for table_id in local.gemini_insights_tables :
- {
- table_id = table_id
- schema = file("../../sql/schema/table/${table_id}.json")
- # The max_time_travel_hours attribute specifies the maximum number of hours that data in the dataset can be accessed using time travel queries.
- # In this case, the maximum time travel hours is set to the value of the local file config.yaml section bigquery.dataset.gemini_insights.max_time_travel_hours configuration.
- max_time_travel_hours = local.config_bigquery.dataset.gemini_insights.max_time_travel_hours
- deletion_protection = false
- time_partitioning = null,
- range_partitioning = null,
- expiration_time = null,
- clustering = [],
- labels = {},
+ {
+ table_id = table_id
+ schema = file("../../sql/schema/table/${table_id}.json")
+ # The max_time_travel_hours attribute specifies the maximum number of hours that data in the dataset can be accessed using time travel queries.
+ # In this case, the maximum time travel hours is set to the value of the local file config.yaml section bigquery.dataset.gemini_insights.max_time_travel_hours configuration.
+ max_time_travel_hours = local.config_bigquery.dataset.gemini_insights.max_time_travel_hours
+ deletion_protection = true
+ time_partitioning = null,
+ range_partitioning = null,
+ expiration_time = null,
+ clustering = [],
+ labels = {},
}]
}
@@ -347,4 +374,4 @@ resource "null_resource" "check_gemini_insights_dataset_exists" {
depends_on = [
module.gemini_insights.google_bigquery_dataset
]
-}
\ No newline at end of file
+}
diff --git a/infrastructure/terraform/modules/feature-store/bigquery-procedures.tf b/infrastructure/terraform/modules/feature-store/bigquery-procedures.tf
index d34ee0e2..19f5a197 100644
--- a/infrastructure/terraform/modules/feature-store/bigquery-procedures.tf
+++ b/infrastructure/terraform/modules/feature-store/bigquery-procedures.tf
@@ -54,13 +54,13 @@ data "local_file" "aggregated_value_based_bidding_training_preparation_file" {
# The procedure is typically invoked before running the Aggregated Value Based Bidding model to ensure that the input data
# is in the correct format and contains the necessary features for training.
resource "google_bigquery_routine" "aggregated_value_based_bidding_training_preparation" {
- project = null_resource.check_bigquery_api.id != "" ? local.aggregated_vbb_project_id : local.feature_store_project_id
- dataset_id = module.aggregated_vbb.bigquery_dataset.dataset_id
- routine_id = "aggregated_value_based_bidding_training_preparation"
- routine_type = "PROCEDURE"
- language = "SQL"
+ project = null_resource.check_bigquery_api.id != "" ? local.aggregated_vbb_project_id : local.feature_store_project_id
+ dataset_id = module.aggregated_vbb.bigquery_dataset.dataset_id
+ routine_id = "aggregated_value_based_bidding_training_preparation"
+ routine_type = "PROCEDURE"
+ language = "SQL"
definition_body = data.local_file.aggregated_value_based_bidding_training_preparation_file.content
- description = "Procedure that prepares features for Aggregated VBB model training."
+ description = "Procedure that prepares features for Aggregated VBB model training."
}
@@ -78,13 +78,13 @@ data "local_file" "aggregated_value_based_bidding_explanation_preparation_file"
# The procedure is typically invoked before running the Aggregated Value Based Bidding model to ensure that the input data
# is in the correct format and contains the necessary features for explanation.
resource "google_bigquery_routine" "aggregated_value_based_bidding_explanation_preparation" {
- project = null_resource.check_bigquery_api.id != "" ? local.aggregated_vbb_project_id : local.feature_store_project_id
- dataset_id = module.aggregated_vbb.bigquery_dataset.dataset_id
- routine_id = "aggregated_value_based_bidding_explanation_preparation"
- routine_type = "PROCEDURE"
- language = "SQL"
+ project = null_resource.check_bigquery_api.id != "" ? local.aggregated_vbb_project_id : local.feature_store_project_id
+ dataset_id = module.aggregated_vbb.bigquery_dataset.dataset_id
+ routine_id = "aggregated_value_based_bidding_explanation_preparation"
+ routine_type = "PROCEDURE"
+ language = "SQL"
definition_body = data.local_file.aggregated_value_based_bidding_explanation_preparation_file.content
- description = "Procedure that prepares features for Aggregated VBB model explanation."
+ description = "Procedure that prepares features for Aggregated VBB model explanation."
}
# This resource reads the contents of a local SQL file named auto_audience_segmentation_inference_preparation.sql and
@@ -350,6 +350,32 @@ resource "google_bigquery_routine" "churn_propensity_inference_preparation" {
}
}
+# This resource reads the contents of a local SQL file named lead_score_propensity_inference_preparation.sql and
+# stores it in a variable named lead_score_propensity_inference_preparation_file.content.
+# The SQL file is expected to contain the definition of a BigQuery procedure named lead_score_propensity_inference_preparation.
+data "local_file" "lead_score_propensity_inference_preparation_file" {
+ filename = "${local.sql_dir}/procedure/lead_score_propensity_inference_preparation.sql"
+}
+
+# The lead_score_propensity_inference_preparation procedure is designed to prepare features for the Lead Score Propensity model.
+# ##
+# The procedure is typically invoked before running predictions with the Lead Score Propensity model to ensure that the features data
+# is in the correct format and contains the necessary features for prediction.
+resource "google_bigquery_routine" "lead_score_propensity_inference_preparation" {
+ project = null_resource.check_bigquery_api.id != "" ? local.lead_score_propensity_project_id : local.feature_store_project_id
+ dataset_id = google_bigquery_dataset.lead_score_propensity.dataset_id
+ routine_id = "lead_score_propensity_inference_preparation"
+ routine_type = "PROCEDURE"
+ language = "SQL"
+ definition_body = data.local_file.lead_score_propensity_inference_preparation_file.content
+ description = "Procedure that prepares features for Lead Score Propensity model inference. User-per-day granularity level features. Run this procedure every time before running Lead Score Propensity model predictions."
+ arguments {
+ name = "inference_date"
+ mode = "INOUT"
+ data_type = jsonencode({ "typeKind" : "DATE" })
+ }
+}
+
# This resource reads the contents of a local SQL file named purchase_propensity_label.sql and
# stores it in a variable named purchase_propensity_label_file.content.
# The SQL file is expected to contain the definition of a BigQuery procedure named purchase_propensity_label.
@@ -422,6 +448,42 @@ resource "google_bigquery_routine" "churn_propensity_label" {
}
}
+# This resource reads the contents of a local SQL file named lead_score_propensity_label.sql and
+# stores it in a variable named lead_score_propensity_label_file.content.
+# The SQL file is expected to contain the definition of a BigQuery procedure named lead_score_propensity_label.
+data "local_file" "lead_score_propensity_label_file" {
+ filename = "${local.sql_dir}/procedure/lead_score_propensity_label.sql"
+}
+
+# The lead_score_propensity_label procedure is designed to prepare labels for the Lead Score Propensity model.
+# ##
+# The procedure is typically invoked before training the Lead Score Propensity model to ensure that the labeled data
+# is in the correct format and ready for training.
+resource "google_bigquery_routine" "lead_score_propensity_label" {
+ project = null_resource.check_bigquery_api.id != "" ? local.feature_store_project_id : local.feature_store_project_id
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ routine_id = "lead_score_propensity_label"
+ routine_type = "PROCEDURE"
+ language = "SQL"
+ definition_body = data.local_file.lead_score_propensity_label_file.content
+ description = "User-per-day granularity level labels. Run this procedure daily."
+ arguments {
+ name = "input_date"
+ mode = "INOUT"
+ data_type = jsonencode({ "typeKind" : "DATE" })
+ }
+ arguments {
+ name = "end_date"
+ mode = "INOUT"
+ data_type = jsonencode({ "typeKind" : "DATE" })
+ }
+ arguments {
+ name = "rows_added"
+ mode = "OUT"
+ data_type = jsonencode({ "typeKind" : "INT64" })
+ }
+}
+
# This resource reads the contents of a local SQL file named purchase_propensity_training_preparation.sql and
# stores it in a variable named purchase_propensity_training_preparation_file.content.
# The SQL file is expected to contain the definition of a BigQuery procedure named purchase_propensity_training_preparation.
@@ -463,6 +525,46 @@ resource "google_bigquery_routine" "purchase_propensity_training_preparation" {
}
}
+# This resource reads the contents of a local SQL file named lead_score_propensity_training_preparation.sql and
+# stores it in a variable named lead_score_propensity_training_preparation_file.content.
+# The SQL file is expected to contain the definition of a BigQuery procedure named lead_score_propensity_training_preparation.
+data "local_file" "lead_score_propensity_training_preparation_file" {
+ filename = "${local.sql_dir}/procedure/lead_score_propensity_training_preparation.sql"
+}
+
+# The lead_score_propensity_training_preparation procedure is designed to prepare features for the Lead Score Propensity model.
+# ##
+# The procedure is typically invoked before training the Lead Score Propensity model to ensure that the features data
+# is in the correct format and contains the necessary features for training.
+resource "google_bigquery_routine" "lead_score_propensity_training_preparation" {
+ project = null_resource.check_bigquery_api.id != "" ? local.lead_score_propensity_project_id : local.feature_store_project_id
+ dataset_id = google_bigquery_dataset.lead_score_propensity.dataset_id
+ routine_id = "lead_score_propensity_training_preparation"
+ routine_type = "PROCEDURE"
+ language = "SQL"
+ definition_body = data.local_file.lead_score_propensity_training_preparation_file.content
+ description = "Procedure that prepares features for Lead Score Propensity model training. User-per-day granularity level features. Run this procedure every time before Lead Score Propensity model training."
+ arguments {
+ name = "start_date"
+ mode = "INOUT"
+ data_type = jsonencode({ "typeKind" : "DATE" })
+ }
+ arguments {
+ name = "end_date"
+ mode = "INOUT"
+ data_type = jsonencode({ "typeKind" : "DATE" })
+ }
+ arguments {
+ name = "train_split_end_number"
+ mode = "INOUT"
+ data_type = jsonencode({ "typeKind" : "INT64" })
+ }
+ arguments {
+ name = "validation_split_end_number"
+ mode = "INOUT"
+ data_type = jsonencode({ "typeKind" : "INT64" })
+ }
+}
# This resource reads the contents of a local SQL file named churn_propensity_training_preparation.sql and
# stores it in a variable named churn_propensity_training_preparation_file.content.
@@ -685,6 +787,42 @@ resource "google_bigquery_routine" "user_rolling_window_metrics" {
}
}
+# This resource reads the contents of a local SQL file named user_rolling_window_lead_metrics.sql and
+# stores it in a variable named user_rolling_window_lead_metrics_file.content.
+# The SQL file is expected to contain the definition of a BigQuery procedure named user_rolling_window_lead_metrics.
+data "local_file" "user_rolling_window_lead_metrics_file" {
+ filename = "${local.sql_dir}/procedure/user_rolling_window_lead_metrics.sql"
+}
+
+# The user_rolling_window_lead_metrics procedure is designed to prepare the features for the Lead Score Propensity model.
+# ##
+# The procedure is typically invoked before training the Lead Score Propensity model to ensure that the features data
+# is in the correct format and ready for training.
+resource "google_bigquery_routine" "user_rolling_window_lead_metrics" {
+ project = null_resource.check_bigquery_api.id != "" ? local.feature_store_project_id : local.feature_store_project_id
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ routine_id = "user_rolling_window_lead_metrics"
+ routine_type = "PROCEDURE"
+ language = "SQL"
+ definition_body = data.local_file.user_rolling_window_lead_metrics_file.content
+ description = "User-per-day granularity level metrics. Run this procedure daily. Metrics calculated using a rolling window operation."
+ arguments {
+ name = "input_date"
+ mode = "INOUT"
+ data_type = jsonencode({ "typeKind" : "DATE" })
+ }
+ arguments {
+ name = "end_date"
+ mode = "INOUT"
+ data_type = jsonencode({ "typeKind" : "DATE" })
+ }
+ arguments {
+ name = "rows_added"
+ mode = "OUT"
+ data_type = jsonencode({ "typeKind" : "INT64" })
+ }
+}
+
# This resource reads the contents of a local SQL file named user_scoped_lifetime_metrics.sql
data "local_file" "user_scoped_lifetime_metrics_file" {
filename = "${local.sql_dir}/procedure/user_scoped_lifetime_metrics.sql"
@@ -880,6 +1018,14 @@ resource "google_bigquery_routine" "user_behaviour_revenue_insights" {
depends_on = [
null_resource.check_gemini_model_exists
]
+
+ # The lifecycle block is used to configure the lifecycle of the routine. In this case, the ignore_changes attribute is set to all, which means that Terraform will ignore
+ # any changes to the routine and will not attempt to update it. The create_before_destroy attribute is set to true, which means that a replacement routine is created before the existing one is destroyed. The prevent_destroy attribute is commented out, so Terraform can still destroy the routine.
+ lifecycle {
+ ignore_changes = all
+ #prevent_destroy = true
+ create_before_destroy = true
+ }
}
/*
@@ -930,6 +1076,20 @@ resource "google_bigquery_routine" "invoke_backfill_churn_propensity_label" {
description = "Procedure that backfills the churn_propensity_label feature table. Run this procedure occasionally before training the models."
}
+data "local_file" "invoke_backfill_lead_score_propensity_label_file" {
+ filename = "${local.sql_dir}/query/invoke_backfill_lead_score_propensity_label.sql"
+}
+
+resource "google_bigquery_routine" "invoke_backfill_lead_score_propensity_label" {
+ project = null_resource.check_bigquery_api.id != "" ? local.feature_store_project_id : local.feature_store_project_id
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ routine_id = "invoke_backfill_lead_score_propensity_label"
+ routine_type = "PROCEDURE"
+ language = "SQL"
+ definition_body = data.local_file.invoke_backfill_lead_score_propensity_label_file.content
+ description = "Procedure that backfills the lead_score_propensity_label feature table. Run this procedure occasionally before training the models."
+}
+
data "local_file" "invoke_backfill_user_dimensions_file" {
filename = "${local.sql_dir}/query/invoke_backfill_user_dimensions.sql"
}
@@ -1003,6 +1163,20 @@ resource "google_bigquery_routine" "invoke_backfill_user_rolling_window_metrics"
description = "Procedure that backfills the user_rolling_window_metrics feature table. Run this procedure occasionally before training the models."
}
+data "local_file" "invoke_backfill_user_rolling_window_lead_metrics_file" {
+ filename = "${local.sql_dir}/query/invoke_backfill_user_rolling_window_lead_metrics.sql"
+}
+
+resource "google_bigquery_routine" "invoke_backfill_user_rolling_window_lead_metrics" {
+ project = null_resource.check_bigquery_api.id != "" ? local.feature_store_project_id : local.feature_store_project_id
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ routine_id = "invoke_backfill_user_rolling_window_lead_metrics"
+ routine_type = "PROCEDURE"
+ language = "SQL"
+ definition_body = data.local_file.invoke_backfill_user_rolling_window_lead_metrics_file.content
+ description = "Procedure that backfills the user_rolling_window_lead_metrics feature table. Run this procedure occasionally before training the models."
+}
+
data "local_file" "invoke_backfill_user_scoped_lifetime_metrics_file" {
filename = "${local.sql_dir}/query/invoke_backfill_user_scoped_lifetime_metrics.sql"
@@ -1091,6 +1265,14 @@ resource "google_bigquery_routine" "invoke_backfill_user_behaviour_revenue_insig
null_resource.check_gemini_model_exists,
null_resource.create_gemini_model
]
+
+ # The lifecycle block is used to configure the lifecycle of the routine. In this case, the ignore_changes attribute is set to all, which means that Terraform will ignore
+ # any changes to the routine and will not attempt to update it. The create_before_destroy attribute is set to true, which means that a replacement routine is created before the existing one is destroyed. The prevent_destroy attribute is commented out, so Terraform can still destroy the routine.
+ lifecycle {
+ ignore_changes = all
+ #prevent_destroy = true
+ create_before_destroy = true
+ }
}
/*
@@ -1139,6 +1321,19 @@ resource "google_bigquery_routine" "invoke_churn_propensity_inference_preparatio
definition_body = data.local_file.invoke_churn_propensity_inference_preparation_file.content
}
+data "local_file" "invoke_lead_score_propensity_inference_preparation_file" {
+ filename = "${local.sql_dir}/query/invoke_lead_score_propensity_inference_preparation.sql"
+}
+
+resource "google_bigquery_routine" "invoke_lead_score_propensity_inference_preparation" {
+ project = null_resource.check_bigquery_api.id != "" ? local.lead_score_propensity_project_id : local.feature_store_project_id
+ dataset_id = google_bigquery_dataset.lead_score_propensity.dataset_id
+ routine_id = "invoke_lead_score_propensity_inference_preparation"
+ routine_type = "PROCEDURE"
+ language = "SQL"
+ definition_body = data.local_file.invoke_lead_score_propensity_inference_preparation_file.content
+}
+
data "local_file" "invoke_audience_segmentation_inference_preparation_file" {
filename = "${local.sql_dir}/query/invoke_audience_segmentation_inference_preparation.sql"
@@ -1222,6 +1417,19 @@ resource "google_bigquery_routine" "invoke_churn_propensity_training_preparation
}
+data "local_file" "invoke_lead_score_propensity_training_preparation_file" {
+ filename = "${local.sql_dir}/query/invoke_lead_score_propensity_training_preparation.sql"
+}
+
+resource "google_bigquery_routine" "invoke_lead_score_propensity_training_preparation" {
+ project = null_resource.check_bigquery_api.id != "" ? local.lead_score_propensity_project_id : local.feature_store_project_id
+ dataset_id = google_bigquery_dataset.lead_score_propensity.dataset_id
+ routine_id = "invoke_lead_score_propensity_training_preparation"
+ routine_type = "PROCEDURE"
+ language = "SQL"
+ definition_body = data.local_file.invoke_lead_score_propensity_training_preparation_file.content
+}
+
data "local_file" "invoke_audience_segmentation_training_preparation_file" {
filename = "${local.sql_dir}/query/invoke_audience_segmentation_training_preparation.sql"
}
@@ -1242,11 +1450,11 @@ data "local_file" "invoke_aggregated_value_based_bidding_training_preparation_fi
# Terraform resource for invoking the bigquery stored procedure
resource "google_bigquery_routine" "invoke_aggregated_value_based_bidding_training_preparation" {
- project = null_resource.check_bigquery_api.id != "" ? local.aggregated_vbb_project_id : local.feature_store_project_id
- dataset_id = module.aggregated_vbb.bigquery_dataset.dataset_id
- routine_id = "invoke_aggregated_value_based_bidding_training_preparation"
- routine_type = "PROCEDURE"
- language = "SQL"
+ project = null_resource.check_bigquery_api.id != "" ? local.aggregated_vbb_project_id : local.feature_store_project_id
+ dataset_id = module.aggregated_vbb.bigquery_dataset.dataset_id
+ routine_id = "invoke_aggregated_value_based_bidding_training_preparation"
+ routine_type = "PROCEDURE"
+ language = "SQL"
definition_body = data.local_file.invoke_aggregated_value_based_bidding_training_preparation_file.content
}
@@ -1257,11 +1465,11 @@ data "local_file" "invoke_aggregated_value_based_bidding_explanation_preparation
# Terraform resource for invoking the bigquery stored procedure
resource "google_bigquery_routine" "invoke_aggregated_value_based_bidding_explanation_preparation" {
- project = null_resource.check_bigquery_api.id != "" ? local.aggregated_vbb_project_id : local.feature_store_project_id
- dataset_id = module.aggregated_vbb.bigquery_dataset.dataset_id
- routine_id = "invoke_aggregated_value_based_bidding_explanation_preparation"
- routine_type = "PROCEDURE"
- language = "SQL"
+ project = null_resource.check_bigquery_api.id != "" ? local.aggregated_vbb_project_id : local.feature_store_project_id
+ dataset_id = module.aggregated_vbb.bigquery_dataset.dataset_id
+ routine_id = "invoke_aggregated_value_based_bidding_explanation_preparation"
+ routine_type = "PROCEDURE"
+ language = "SQL"
definition_body = data.local_file.invoke_aggregated_value_based_bidding_explanation_preparation_file.content
}
@@ -1298,6 +1506,20 @@ resource "google_bigquery_routine" "invoke_purchase_propensity_label" {
}
+data "local_file" "invoke_lead_score_propensity_label_file" {
+ filename = "${local.sql_dir}/query/invoke_lead_score_propensity_label.sql"
+}
+
+resource "google_bigquery_routine" "invoke_lead_score_propensity_label" {
+ project = null_resource.check_bigquery_api.id != "" ? local.feature_store_project_id : local.feature_store_project_id
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ routine_id = "invoke_lead_score_propensity_label"
+ routine_type = "PROCEDURE"
+ language = "SQL"
+ definition_body = data.local_file.invoke_lead_score_propensity_label_file.content
+ description = "Procedure that invokes the lead_score_propensity_label table. Daily granularity level. Run this procedure daily before running prediction pipelines."
+}
+
data "local_file" "invoke_churn_propensity_label_file" {
filename = "${local.sql_dir}/query/invoke_churn_propensity_label.sql"
}
@@ -1387,6 +1609,20 @@ resource "google_bigquery_routine" "invoke_user_rolling_window_metrics" {
}
+data "local_file" "invoke_user_rolling_window_lead_metrics_file" {
+ filename = "${local.sql_dir}/query/invoke_user_rolling_window_lead_metrics.sql"
+}
+
+resource "google_bigquery_routine" "invoke_user_rolling_window_lead_metrics" {
+ project = null_resource.check_bigquery_api.id != "" ? local.feature_store_project_id : local.feature_store_project_id
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ routine_id = "invoke_user_rolling_window_lead_metrics"
+ routine_type = "PROCEDURE"
+ language = "SQL"
+ definition_body = data.local_file.invoke_user_rolling_window_lead_metrics_file.content
+ description = "Procedure that invokes the user_rolling_window_lead_metrics table. Daily granularity level. Run this procedure daily before running prediction pipelines."
+}
+
data "local_file" "invoke_user_scoped_lifetime_metrics_file" {
filename = "${local.sql_dir}/query/invoke_user_scoped_lifetime_metrics.sql"
}
@@ -1448,7 +1684,7 @@ data "local_file" "invoke_user_session_event_aggregated_metrics_file" {
}
resource "google_bigquery_routine" "invoke_user_session_event_aggregated_metrics" {
- project = null_resource.check_bigquery_api.id != "" ? local.purchase_propensity_project_id : local.feature_store_project_id
+ project = null_resource.check_bigquery_api.id != "" ? local.feature_store_project_id : local.feature_store_project_id
dataset_id = google_bigquery_dataset.feature_store.dataset_id
routine_id = "invoke_user_session_event_aggregated_metrics"
routine_type = "PROCEDURE"
@@ -1465,20 +1701,31 @@ data "local_file" "create_gemini_model_file" {
resource "null_resource" "create_gemini_model" {
triggers = {
vertex_ai_connection_exists = google_bigquery_connection.vertex_ai_connection.id,
- gemini_dataset_exists = module.gemini_insights.bigquery_dataset.id,
+ gemini_dataset_exists = module.gemini_insights.bigquery_dataset.id,
check_gemini_dataset_listed = null_resource.check_gemini_insights_dataset_exists.id
+ role_propagated = time_sleep.wait_for_vertex_ai_connection_sa_role_propagation.id
}
provisioner "local-exec" {
- command = <<-EOT
- ${local.poetry_run_alias} bq query --use_legacy_sql=false --max_rows=100 --maximum_bytes_billed=10000000 < ${data.local_file.create_gemini_model_file.filename}
+ command = <<-EOT
+ sleep 120
+ ${var.uv_run_alias} bq query --use_legacy_sql=false --max_rows=100 --maximum_bytes_billed=10000000 < ${data.local_file.create_gemini_model_file.filename}
EOT
}
+ # The lifecycle block configures how Terraform manages this null_resource. ignore_changes = all means that Terraform ignores changes to the resource's
+ # arguments (including triggers) after creation and will not propose updates for them. prevent_destroy is left disabled (commented out), and create_before_destroy = true means a replacement is created before the old instance is destroyed.
+ lifecycle {
+ ignore_changes = all
+ #prevent_destroy = true
+ create_before_destroy = true
+ }
+
depends_on = [
google_bigquery_connection.vertex_ai_connection,
module.gemini_insights.google_bigquery_dataset,
- null_resource.check_gemini_insights_dataset_exists
+ null_resource.check_gemini_insights_dataset_exists,
+ time_sleep.wait_for_vertex_ai_connection_sa_role_propagation,
]
}
@@ -1486,7 +1733,8 @@ resource "null_resource" "create_gemini_model" {
resource "null_resource" "check_gemini_model_exists" {
triggers = {
vertex_ai_connection_exists = google_bigquery_connection.vertex_ai_connection.id
- gemini_model_created = null_resource.create_gemini_model.id
+ gemini_model_created = null_resource.create_gemini_model.id
+ role_propagated = time_sleep.wait_for_vertex_ai_connection_sa_role_propagation.id
}
provisioner "local-exec" {
@@ -1498,18 +1746,19 @@ resource "null_resource" "check_gemini_model_exists" {
sleep 5
printf "."
COUNTER=$((COUNTER + 1))
+ if [ $COUNTER -eq $MAX_TRIES ]; then
+ echo "Gemini model was not created, terraform can not continue!"
+ exit 1
+ fi
done
- if [ $COUNTER -eq $MAX_TRIES ]; then
- echo "Gemini model was not created, terraform can not continue!"
- exit 1
- fi
- sleep 5
EOT
}
depends_on = [
google_bigquery_connection.vertex_ai_connection,
- null_resource.create_gemini_model
+ null_resource.create_gemini_model,
+ time_sleep.wait_for_vertex_ai_connection_sa_role_propagation,
+ google_project_iam_member.vertex_ai_connection_sa_roles
]
}
@@ -1530,4 +1779,4 @@ resource "google_bigquery_routine" "invoke_user_behaviour_revenue_insights" {
null_resource.check_gemini_model_exists,
null_resource.create_gemini_model
]
-}
\ No newline at end of file
+}
diff --git a/infrastructure/terraform/modules/feature-store/bigquery-tables.tf b/infrastructure/terraform/modules/feature-store/bigquery-tables.tf
index c968bda2..e74fb1ed 100644
--- a/infrastructure/terraform/modules/feature-store/bigquery-tables.tf
+++ b/infrastructure/terraform/modules/feature-store/bigquery-tables.tf
@@ -15,10 +15,10 @@
# This resource creates a BigQuery table named audience_segmentation_inference_preparation
# in the dataset specified by google_bigquery_dataset.audience_segmentation.dataset_id.
resource "google_bigquery_table" "audience_segmentation_inference_preparation" {
- project = google_bigquery_dataset.audience_segmentation.project
- dataset_id = google_bigquery_dataset.audience_segmentation.dataset_id
- table_id = local.config_bigquery.table.audience_segmentation_inference_preparation.table_name
- description = local.config_bigquery.table.audience_segmentation_inference_preparation.table_description
+ project = google_bigquery_dataset.audience_segmentation.project
+ dataset_id = google_bigquery_dataset.audience_segmentation.dataset_id
+ table_id = local.config_bigquery.table.audience_segmentation_inference_preparation.table_name
+ description = local.config_bigquery.table.audience_segmentation_inference_preparation.table_description
# The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
deletion_protection = false
@@ -34,10 +34,10 @@ resource "google_bigquery_table" "audience_segmentation_inference_preparation" {
# This resource creates a BigQuery table named customer_lifetime_value_inference_preparation
# in the dataset specified by google_bigquery_dataset.customer_lifetime_value.dataset_id.
resource "google_bigquery_table" "customer_lifetime_value_inference_preparation" {
- project = google_bigquery_dataset.customer_lifetime_value.project
- dataset_id = google_bigquery_dataset.customer_lifetime_value.dataset_id
- table_id = local.config_bigquery.table.customer_lifetime_value_inference_preparation.table_name
- description = local.config_bigquery.table.customer_lifetime_value_inference_preparation.table_description
+ project = google_bigquery_dataset.customer_lifetime_value.project
+ dataset_id = google_bigquery_dataset.customer_lifetime_value.dataset_id
+ table_id = local.config_bigquery.table.customer_lifetime_value_inference_preparation.table_name
+ description = local.config_bigquery.table.customer_lifetime_value_inference_preparation.table_description
# The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
deletion_protection = false
@@ -53,10 +53,10 @@ resource "google_bigquery_table" "customer_lifetime_value_inference_preparation"
# This resource creates a BigQuery table named customer_lifetime_value_label
# in the dataset specified by google_bigquery_dataset.feature_store.dataset_id.
resource "google_bigquery_table" "customer_lifetime_value_label" {
- project = google_bigquery_dataset.feature_store.project
- dataset_id = google_bigquery_dataset.feature_store.dataset_id
- table_id = local.config_bigquery.table.customer_lifetime_value_label.table_name
- description = local.config_bigquery.table.customer_lifetime_value_label.table_description
+ project = google_bigquery_dataset.feature_store.project
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ table_id = local.config_bigquery.table.customer_lifetime_value_label.table_name
+ description = local.config_bigquery.table.customer_lifetime_value_label.table_description
# The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
deletion_protection = false
@@ -79,10 +79,10 @@ resource "google_bigquery_table" "customer_lifetime_value_label" {
# This resource creates a BigQuery table named purchase_propensity_inference_preparation
# in the dataset specified by google_bigquery_dataset.purchase_propensity.dataset_id.
resource "google_bigquery_table" "purchase_propensity_inference_preparation" {
- project = google_bigquery_dataset.purchase_propensity.project
- dataset_id = google_bigquery_dataset.purchase_propensity.dataset_id
- table_id = local.config_bigquery.table.purchase_propensity_inference_preparation.table_name
- description = local.config_bigquery.table.purchase_propensity_inference_preparation.table_description
+ project = google_bigquery_dataset.purchase_propensity.project
+ dataset_id = google_bigquery_dataset.purchase_propensity.dataset_id
+ table_id = local.config_bigquery.table.purchase_propensity_inference_preparation.table_name
+ description = local.config_bigquery.table.purchase_propensity_inference_preparation.table_description
# The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
deletion_protection = false
@@ -97,10 +97,10 @@ resource "google_bigquery_table" "purchase_propensity_inference_preparation" {
# This resource creates a BigQuery table named churn_propensity_inference_preparation
# in the dataset specified by google_bigquery_dataset.churn_propensity.dataset_id.
resource "google_bigquery_table" "churn_propensity_inference_preparation" {
- project = google_bigquery_dataset.churn_propensity.project
- dataset_id = google_bigquery_dataset.churn_propensity.dataset_id
- table_id = local.config_bigquery.table.churn_propensity_inference_preparation.table_name
- description = local.config_bigquery.table.churn_propensity_inference_preparation.table_description
+ project = google_bigquery_dataset.churn_propensity.project
+ dataset_id = google_bigquery_dataset.churn_propensity.dataset_id
+ table_id = local.config_bigquery.table.churn_propensity_inference_preparation.table_name
+ description = local.config_bigquery.table.churn_propensity_inference_preparation.table_description
# The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
deletion_protection = false
@@ -112,13 +112,31 @@ resource "google_bigquery_table" "churn_propensity_inference_preparation" {
schema = file("${local.sql_dir}/schema/table/churn_propensity_inference_preparation.json")
}
+# This resource creates a BigQuery table named lead_score_propensity_inference_preparation
+# in the dataset specified by google_bigquery_dataset.lead_score_propensity.dataset_id.
+resource "google_bigquery_table" "lead_score_propensity_inference_preparation" {
+ project = google_bigquery_dataset.lead_score_propensity.project
+ dataset_id = google_bigquery_dataset.lead_score_propensity.dataset_id
+ table_id = local.config_bigquery.table.lead_score_propensity_inference_preparation.table_name
+ description = local.config_bigquery.table.lead_score_propensity_inference_preparation.table_description
+
+ # The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
+ deletion_protection = false
+ labels = {
+ version = "prod"
+ }
+
+ # The schema attribute specifies the schema of the table. In this case, the schema is defined in the JSON file.
+ schema = file("${local.sql_dir}/schema/table/lead_score_propensity_inference_preparation.json")
+}
+
# This resource creates a BigQuery table named purchase_propensity_label
# in the dataset specified by google_bigquery_dataset.feature_store.dataset_id.
resource "google_bigquery_table" "purchase_propensity_label" {
- project = google_bigquery_dataset.feature_store.project
- dataset_id = google_bigquery_dataset.feature_store.dataset_id
- table_id = local.config_bigquery.table.purchase_propensity_label.table_name
- description = local.config_bigquery.table.purchase_propensity_label.table_description
+ project = google_bigquery_dataset.feature_store.project
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ table_id = local.config_bigquery.table.purchase_propensity_label.table_name
+ description = local.config_bigquery.table.purchase_propensity_label.table_description
# The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
deletion_protection = false
@@ -140,10 +158,10 @@ resource "google_bigquery_table" "purchase_propensity_label" {
# This resource creates a BigQuery table named churn_propensity_label
# in the dataset specified by google_bigquery_dataset.feature_store.dataset_id.
resource "google_bigquery_table" "churn_propensity_label" {
- project = google_bigquery_dataset.feature_store.project
- dataset_id = google_bigquery_dataset.feature_store.dataset_id
- table_id = local.config_bigquery.table.churn_propensity_label.table_name
- description = local.config_bigquery.table.churn_propensity_label.table_description
+ project = google_bigquery_dataset.feature_store.project
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ table_id = local.config_bigquery.table.churn_propensity_label.table_name
+ description = local.config_bigquery.table.churn_propensity_label.table_description
# The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
deletion_protection = false
@@ -162,13 +180,38 @@ resource "google_bigquery_table" "churn_propensity_label" {
}
}
+# This resource creates a BigQuery table named lead_score_propensity_label
+# in the dataset specified by google_bigquery_dataset.feature_store.dataset_id.
+resource "google_bigquery_table" "lead_score_propensity_label" {
+ project = google_bigquery_dataset.feature_store.project
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ table_id = local.config_bigquery.table.lead_score_propensity_label.table_name
+ description = local.config_bigquery.table.lead_score_propensity_label.table_description
+
+ # The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
+ deletion_protection = false
+ labels = {
+ version = "prod"
+ }
+
+ # The schema attribute specifies the schema of the table. In this case, the schema is defined in the JSON file.
+ schema = file("${local.sql_dir}/schema/table/lead_score_propensity_label.json")
+
+ # The lifecycle block is used to configure the lifecycle of the table. In this case, the ignore_changes attribute is set to all, which means that Terraform will ignore
+ # any changes to the table and will not attempt to update the table. The prevent_destroy attribute is set to true, which means that Terraform will prevent the table from being destroyed.
+ lifecycle {
+ ignore_changes = all
+ prevent_destroy = true
+ }
+}
+
# This resource creates a BigQuery table named user_dimensions
# in the dataset specified by google_bigquery_dataset.feature_store.dataset_id.
resource "google_bigquery_table" "user_dimensions" {
- project = google_bigquery_dataset.feature_store.project
- dataset_id = google_bigquery_dataset.feature_store.dataset_id
- table_id = local.config_bigquery.table.user_dimensions.table_name
- description = local.config_bigquery.table.user_dimensions.table_description
+ project = google_bigquery_dataset.feature_store.project
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ table_id = local.config_bigquery.table.user_dimensions.table_name
+ description = local.config_bigquery.table.user_dimensions.table_description
# The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
deletion_protection = false
@@ -190,10 +233,10 @@ resource "google_bigquery_table" "user_dimensions" {
# This resource creates a BigQuery table named user_lifetime_dimensions
# in the dataset specified by google_bigquery_dataset.feature_store.dataset_id.
resource "google_bigquery_table" "user_lifetime_dimensions" {
- project = google_bigquery_dataset.feature_store.project
- dataset_id = google_bigquery_dataset.feature_store.dataset_id
- table_id = local.config_bigquery.table.user_lifetime_dimensions.table_name
- description = local.config_bigquery.table.user_lifetime_dimensions.table_description
+ project = google_bigquery_dataset.feature_store.project
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ table_id = local.config_bigquery.table.user_lifetime_dimensions.table_name
+ description = local.config_bigquery.table.user_lifetime_dimensions.table_description
# The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
deletion_protection = false
@@ -215,10 +258,10 @@ resource "google_bigquery_table" "user_lifetime_dimensions" {
# This resource creates a BigQuery table named user_lookback_metrics
# in the dataset specified by google_bigquery_dataset.feature_store.dataset_id.
resource "google_bigquery_table" "user_lookback_metrics" {
- project = google_bigquery_dataset.feature_store.project
- dataset_id = google_bigquery_dataset.feature_store.dataset_id
- table_id = local.config_bigquery.table.user_lookback_metrics.table_name
- description = local.config_bigquery.table.user_lookback_metrics.table_description
+ project = google_bigquery_dataset.feature_store.project
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ table_id = local.config_bigquery.table.user_lookback_metrics.table_name
+ description = local.config_bigquery.table.user_lookback_metrics.table_description
# The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
deletion_protection = false
@@ -240,10 +283,10 @@ resource "google_bigquery_table" "user_lookback_metrics" {
# This resource creates a BigQuery table named user_rolling_window_lifetime_metrics
# in the dataset specified by google_bigquery_dataset.feature_store.dataset_id.
resource "google_bigquery_table" "user_rolling_window_lifetime_metrics" {
- project = google_bigquery_dataset.feature_store.project
- dataset_id = google_bigquery_dataset.feature_store.dataset_id
- table_id = local.config_bigquery.table.user_rolling_window_lifetime_metrics.table_name
- description = local.config_bigquery.table.user_rolling_window_lifetime_metrics.table_description
+ project = google_bigquery_dataset.feature_store.project
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ table_id = local.config_bigquery.table.user_rolling_window_lifetime_metrics.table_name
+ description = local.config_bigquery.table.user_rolling_window_lifetime_metrics.table_description
# The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
deletion_protection = false
@@ -265,10 +308,10 @@ resource "google_bigquery_table" "user_rolling_window_lifetime_metrics" {
# This resource creates a BigQuery table named user_rolling_window_metrics
# in the dataset specified by google_bigquery_dataset.feature_store.dataset_id.
resource "google_bigquery_table" "user_rolling_window_metrics" {
- project = google_bigquery_dataset.feature_store.project
- dataset_id = google_bigquery_dataset.feature_store.dataset_id
- table_id = local.config_bigquery.table.user_rolling_window_metrics.table_name
- description = local.config_bigquery.table.user_rolling_window_metrics.table_description
+ project = google_bigquery_dataset.feature_store.project
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ table_id = local.config_bigquery.table.user_rolling_window_metrics.table_name
+ description = local.config_bigquery.table.user_rolling_window_metrics.table_description
# The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
deletion_protection = false
@@ -287,13 +330,38 @@ resource "google_bigquery_table" "user_rolling_window_metrics" {
}
}
+# This resource creates a BigQuery table named user_rolling_window_lead_metrics
+# in the dataset specified by google_bigquery_dataset.feature_store.dataset_id.
+resource "google_bigquery_table" "user_rolling_window_lead_metrics" {
+ project = google_bigquery_dataset.feature_store.project
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ table_id = local.config_bigquery.table.user_rolling_window_lead_metrics.table_name
+ description = local.config_bigquery.table.user_rolling_window_lead_metrics.table_description
+
+ # The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
+ deletion_protection = false
+ labels = {
+ version = "prod"
+ }
+
+ # The schema attribute specifies the schema of the table. In this case, the schema is defined in the JSON file.
+ schema = file("${local.sql_dir}/schema/table/user_rolling_window_lead_metrics.json")
+
+ # The lifecycle block is used to configure the lifecycle of the table. In this case, the ignore_changes attribute is set to all, which means that Terraform will ignore
+ # any changes to the table and will not attempt to update the table. The prevent_destroy attribute is set to true, which means that Terraform will prevent the table from being destroyed.
+ lifecycle {
+ ignore_changes = all
+ prevent_destroy = true
+ }
+}
+
# This resource creates a BigQuery table named user_scoped_lifetime_metrics
# in the dataset specified by google_bigquery_dataset.feature_store.dataset_id.
resource "google_bigquery_table" "user_scoped_lifetime_metrics" {
- project = google_bigquery_dataset.feature_store.project
- dataset_id = google_bigquery_dataset.feature_store.dataset_id
- table_id = local.config_bigquery.table.user_scoped_lifetime_metrics.table_name
- description = local.config_bigquery.table.user_scoped_lifetime_metrics.table_description
+ project = google_bigquery_dataset.feature_store.project
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ table_id = local.config_bigquery.table.user_scoped_lifetime_metrics.table_name
+ description = local.config_bigquery.table.user_scoped_lifetime_metrics.table_description
# The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
deletion_protection = false
@@ -315,10 +383,10 @@ resource "google_bigquery_table" "user_scoped_lifetime_metrics" {
# This resource creates a BigQuery table named user_scoped_metrics
# in the dataset specified by google_bigquery_dataset.feature_store.dataset_id.
resource "google_bigquery_table" "user_scoped_metrics" {
- project = google_bigquery_dataset.feature_store.project
- dataset_id = google_bigquery_dataset.feature_store.dataset_id
- table_id = local.config_bigquery.table.user_scoped_metrics.table_name
- description = local.config_bigquery.table.user_scoped_metrics.table_description
+ project = google_bigquery_dataset.feature_store.project
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ table_id = local.config_bigquery.table.user_scoped_metrics.table_name
+ description = local.config_bigquery.table.user_scoped_metrics.table_description
# The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
deletion_protection = false
@@ -340,10 +408,10 @@ resource "google_bigquery_table" "user_scoped_metrics" {
# This resource creates a BigQuery table named user_scoped_segmentation_metrics
# in the dataset specified by google_bigquery_dataset.feature_store.dataset_id.
resource "google_bigquery_table" "user_scoped_segmentation_metrics" {
- project = google_bigquery_dataset.feature_store.project
- dataset_id = google_bigquery_dataset.feature_store.dataset_id
- table_id = local.config_bigquery.table.user_scoped_segmentation_metrics.table_name
- description = local.config_bigquery.table.user_scoped_segmentation_metrics.table_description
+ project = google_bigquery_dataset.feature_store.project
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ table_id = local.config_bigquery.table.user_scoped_segmentation_metrics.table_name
+ description = local.config_bigquery.table.user_scoped_segmentation_metrics.table_description
# The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
deletion_protection = false
@@ -365,10 +433,10 @@ resource "google_bigquery_table" "user_scoped_segmentation_metrics" {
# This resource creates a BigQuery table named user_segmentation_dimensions
# in the dataset specified by google_bigquery_dataset.feature_store.dataset_id.
resource "google_bigquery_table" "user_segmentation_dimensions" {
- project = google_bigquery_dataset.feature_store.project
- dataset_id = google_bigquery_dataset.feature_store.dataset_id
- table_id = local.config_bigquery.table.user_segmentation_dimensions.table_name
- description = local.config_bigquery.table.user_segmentation_dimensions.table_description
+ project = google_bigquery_dataset.feature_store.project
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ table_id = local.config_bigquery.table.user_segmentation_dimensions.table_name
+ description = local.config_bigquery.table.user_segmentation_dimensions.table_description
# The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
deletion_protection = false
@@ -390,10 +458,10 @@ resource "google_bigquery_table" "user_segmentation_dimensions" {
# This resource creates a BigQuery table named user_session_event_aggregated_metrics
# in the dataset specified by google_bigquery_dataset.feature_store.dataset_id.
resource "google_bigquery_table" "user_session_event_aggregated_metrics" {
- project = google_bigquery_dataset.feature_store.project
- dataset_id = google_bigquery_dataset.feature_store.dataset_id
- table_id = local.config_bigquery.table.user_session_event_aggregated_metrics.table_name
- description = local.config_bigquery.table.user_session_event_aggregated_metrics.table_description
+ project = google_bigquery_dataset.feature_store.project
+ dataset_id = google_bigquery_dataset.feature_store.dataset_id
+ table_id = local.config_bigquery.table.user_session_event_aggregated_metrics.table_name
+ description = local.config_bigquery.table.user_session_event_aggregated_metrics.table_description
# The deletion_protection attribute specifies whether the table should be protected from deletion. In this case, it's set to false, which means that the table can be deleted.
deletion_protection = false
diff --git a/infrastructure/terraform/modules/feature-store/main.tf b/infrastructure/terraform/modules/feature-store/main.tf
index a17662bb..9f1a8d7b 100644
--- a/infrastructure/terraform/modules/feature-store/main.tf
+++ b/infrastructure/terraform/modules/feature-store/main.tf
@@ -21,10 +21,10 @@ locals {
config_bigquery = local.config_vars.bigquery
feature_store_project_id = local.config_vars.bigquery.dataset.feature_store.project_id
sql_dir = var.sql_dir_input
- poetry_run_alias = "${var.poetry_cmd} run"
builder_repository_id = "marketing-analytics-jumpstart-base-repo"
purchase_propensity_project_id = null_resource.check_bigquery_api.id != "" ? local.config_vars.bigquery.dataset.purchase_propensity.project_id : local.feature_store_project_id
churn_propensity_project_id = null_resource.check_bigquery_api.id != "" ? local.config_vars.bigquery.dataset.churn_propensity.project_id : local.feature_store_project_id
+ lead_score_propensity_project_id = null_resource.check_bigquery_api.id != "" ? local.config_vars.bigquery.dataset.lead_score_propensity.project_id : local.feature_store_project_id
audience_segmentation_project_id = null_resource.check_bigquery_api.id != "" ? local.config_vars.bigquery.dataset.audience_segmentation.project_id : local.feature_store_project_id
auto_audience_segmentation_project_id = null_resource.check_bigquery_api.id != "" ? local.config_vars.bigquery.dataset.auto_audience_segmentation.project_id : local.feature_store_project_id
aggregated_vbb_project_id = null_resource.check_bigquery_api.id != "" ? local.config_vars.bigquery.dataset.aggregated_vbb.project_id : local.feature_store_project_id
@@ -35,9 +35,9 @@ locals {
module "project_services" {
source = "terraform-google-modules/project-factory/google//modules/project_services"
- version = "14.1.0"
+ version = "17.0.0"
- disable_dependent_services = true
+ disable_dependent_services = false
disable_services_on_destroy = false
project_id = local.feature_store_project_id
@@ -113,10 +113,18 @@ resource "null_resource" "check_aiplatform_api" {
## Note: The cloud resource nested object has only one output only field - serviceAccountId.
resource "google_bigquery_connection" "vertex_ai_connection" {
connection_id = "vertex_ai"
- project = null_resource.check_aiplatform_api.id != "" ? module.project_services.project_id : local.feature_store_project_id
- location = local.config_bigquery.region
+ project = null_resource.check_aiplatform_api.id != "" ? module.project_services.project_id : local.feature_store_project_id
+ location = local.config_bigquery.region
cloud_resource {}
-}
+
+ # The lifecycle block configures how Terraform manages this BigQuery connection. ignore_changes = all means that Terraform ignores changes to the connection's
+ # arguments after creation and will not attempt to update it. prevent_destroy is left disabled (commented out), and create_before_destroy = true means a replacement is created before the old connection is destroyed.
+ lifecycle {
+ ignore_changes = all
+ #prevent_destroy = true
+ create_before_destroy = true
+ }
+}
# This resource binds the service account to the required roles
@@ -125,8 +133,8 @@ resource "google_project_iam_member" "vertex_ai_connection_sa_roles" {
module.project_services,
null_resource.check_aiplatform_api,
google_bigquery_connection.vertex_ai_connection
- ]
-
+ ]
+
project = null_resource.check_aiplatform_api.id != "" ? module.project_services.project_id : local.feature_store_project_id
member = "serviceAccount:${google_bigquery_connection.vertex_ai_connection.cloud_resource[0].service_account_id}"
@@ -140,5 +148,58 @@ resource "google_project_iam_member" "vertex_ai_connection_sa_roles" {
"roles/bigquery.connectionAdmin"
])
role = each.key
+
+ # The lifecycle block configures how Terraform manages these IAM bindings. ignore_changes = all makes Terraform ignore any later changes to a binding's attributes, so it never updates it in place.
+ # create_before_destroy = true creates a replacement before destroying the old object when a replacement is required. prevent_destroy is commented out, so the bindings can still be destroyed.
+ lifecycle {
+ ignore_changes = all
+ #prevent_destroy = true
+ create_before_destroy = true
+ }
}
+# Access policy changes typically take 2 minutes to propagate,
+# according to https://cloud.google.com/iam/docs/access-change-propagation.
+# This wait makes sure the policy changes have propagated before proceeding
+# with the build.
+resource "time_sleep" "wait_for_vertex_ai_connection_sa_role_propagation" {
+ create_duration = "120s"
+ depends_on = [
+ google_project_iam_member.vertex_ai_connection_sa_roles
+ ]
+
+ # The lifecycle block configures how Terraform manages this wait resource. ignore_changes = all makes Terraform ignore any later changes to its attributes, so it never updates it in place.
+ # create_before_destroy = true creates a replacement before destroying the old object when a replacement is required. prevent_destroy is commented out, so the resource can still be destroyed.
+ lifecycle {
+ ignore_changes = all
+ #prevent_destroy = true
+ create_before_destroy = true
+ }
+}
+
+
+#module "vertex_ai_connection_sa_roles" {
+# source = "terraform-google-modules/iam/google//modules/member_iam"
+# version = "~> 8.0"
+#
+# service_account_address = google_bigquery_connection.vertex_ai_connection.cloud_resource[0].service_account_id
+# project_id = null_resource.check_aiplatform_api.id != "" ? module.project_services.project_id : local.feature_store_project_id
+# project_roles = [
+# "roles/bigquery.jobUser",
+# "roles/bigquery.dataEditor",
+# "roles/storage.admin",
+# "roles/storage.objectViewer",
+# "roles/aiplatform.user",
+# "roles/bigquery.connectionUser",
+# "roles/bigquery.connectionAdmin"
+# ]
+# prefix = "serviceAccount"
+#
+# depends_on = [
+# module.project_services,
+# null_resource.check_aiplatform_api,
+# google_bigquery_connection.vertex_ai_connection
+# ]
+#
+#}
+
diff --git a/infrastructure/terraform/modules/feature-store/variables.tf b/infrastructure/terraform/modules/feature-store/variables.tf
index d20b92b7..a9bc07a5 100644
--- a/infrastructure/terraform/modules/feature-store/variables.tf
+++ b/infrastructure/terraform/modules/feature-store/variables.tf
@@ -37,8 +37,8 @@ variable "sql_dir_input" {
description = "SQL queries directory"
}
-variable "poetry_cmd" {
- description = "alias for poetry command on the current system"
+variable "uv_run_alias" {
+ description = "alias for uv run command on the current system"
type = string
- default = "poetry"
+ default = "uv run"
}
diff --git a/infrastructure/terraform/modules/feature-store/versions.tf b/infrastructure/terraform/modules/feature-store/versions.tf
index 5a896e28..29fe3151 100644
--- a/infrastructure/terraform/modules/feature-store/versions.tf
+++ b/infrastructure/terraform/modules/feature-store/versions.tf
@@ -20,7 +20,12 @@ terraform {
required_providers {
google = {
source = "hashicorp/google"
- version = ">= 3.43.0, >= 3.53.0, >= 3.63.0, >= 4.83.0, < 5.0.0, < 6.0.0"
+ version = "5.44.1"
+ }
+
+ google-beta = {
+ source = "hashicorp/google-beta"
+ version = "5.44.1"
}
}
diff --git a/infrastructure/terraform/modules/monitor/main.tf b/infrastructure/terraform/modules/monitor/main.tf
index 21252202..4129828b 100644
--- a/infrastructure/terraform/modules/monitor/main.tf
+++ b/infrastructure/terraform/modules/monitor/main.tf
@@ -28,11 +28,13 @@ locals {
activation_project_url = "${local.p_key}=${var.activation_project_id}"
mds_dataform_repo = "marketing-analytics"
+
+ purchase_propensity_dataset = "purchase_propensity"
}
module "project_services" {
source = "terraform-google-modules/project-factory/google//modules/project_services"
- version = "14.1.0"
+ version = "17.0.0"
disable_dependent_services = false
disable_services_on_destroy = false
@@ -72,7 +74,7 @@ resource "null_resource" "check_bigquery_api" {
module "dashboard_bigquery" {
source = "terraform-google-modules/bigquery/google"
- version = "~> 5.4"
+ version = "8.1.0"
dataset_id = local.dashboard_dataset_name
dataset_name = local.dashboard_dataset_name
@@ -95,7 +97,7 @@ module "dashboard_bigquery" {
module "load_bucket" {
source = "terraform-google-modules/cloud-storage/google//modules/simple_bucket"
- version = "~> 3.4.1"
+ version = "6.1.0"
project_id = module.project_services.project_id
name = "maj-monitor-${module.project_services.project_id}"
location = var.location
@@ -163,7 +165,7 @@ locals {
module "log_export_bigquery" {
source = "terraform-google-modules/bigquery/google"
- version = "~> 5.4"
+ version = "8.1.0"
dataset_id = local.log_dataset_name
dataset_name = local.log_dataset_name
@@ -259,3 +261,29 @@ data "template_file" "looker_studio_dashboard_url" {
dataflow_log_table_id = local.dataflow_log_table_id
}
}
+
+data "template_file" "purchase_propensity_prediction_stats_query" {
+ template = file("${local.source_root_dir}/templates/purchase_propensity_smart_bidding_view.sql.tpl")
+ vars = {
+ project_id = var.feature_store_project_id
+ purchase_propensity_dataset = local.purchase_propensity_dataset
+ activation_dataset = "activation"
+ }
+}
+
+data "google_bigquery_dataset" "purchase_propensity_dataset" {
+ dataset_id = local.purchase_propensity_dataset
+ project = var.feature_store_project_id
+}
+
+resource "google_bigquery_table" "purchase_propensity_prediction_stats" {
+ project = var.feature_store_project_id
+ dataset_id = data.google_bigquery_dataset.purchase_propensity_dataset.dataset_id
+ table_id = "purchase_propensity_prediction_stats"
+ deletion_protection = false
+
+ view {
+ query = data.template_file.purchase_propensity_prediction_stats_query.rendered
+ use_legacy_sql = false
+ }
+}
diff --git a/infrastructure/terraform/modules/monitor/versions.tf b/infrastructure/terraform/modules/monitor/versions.tf
index 5a896e28..29fe3151 100644
--- a/infrastructure/terraform/modules/monitor/versions.tf
+++ b/infrastructure/terraform/modules/monitor/versions.tf
@@ -20,7 +20,12 @@ terraform {
required_providers {
google = {
source = "hashicorp/google"
- version = ">= 3.43.0, >= 3.53.0, >= 3.63.0, >= 4.83.0, < 5.0.0, < 6.0.0"
+ version = "5.44.1"
+ }
+
+ google-beta = {
+ source = "hashicorp/google-beta"
+ version = "5.44.1"
}
}
diff --git a/infrastructure/terraform/modules/pipelines/main.tf b/infrastructure/terraform/modules/pipelines/main.tf
index 05d4969b..5de45c5d 100644
--- a/infrastructure/terraform/modules/pipelines/main.tf
+++ b/infrastructure/terraform/modules/pipelines/main.tf
@@ -35,9 +35,9 @@ locals {
module "project_services" {
source = "terraform-google-modules/project-factory/google//modules/project_services"
- version = "14.1.0"
+ version = "17.0.0"
- disable_dependent_services = true
+ disable_dependent_services = false
disable_services_on_destroy = false
project_id = local.pipeline_vars.project_id
@@ -52,7 +52,9 @@ module "project_services" {
"artifactregistry.googleapis.com",
"aiplatform.googleapis.com",
"dataflow.googleapis.com",
- "bigqueryconnection.googleapis.com"
+ "bigqueryconnection.googleapis.com",
+ "servicenetworking.googleapis.com",
+ "compute.googleapis.com"
]
}
@@ -159,4 +161,30 @@ resource "null_resource" "check_artifactregistry_api" {
depends_on = [
module.project_services
]
-}
\ No newline at end of file
+}
+
+# This resource executes gcloud commands to check whether the Service Networking API is enabled.
+# Since enabling APIs can take a few seconds, we need to make the deployment wait until the API is enabled before resuming.
+resource "null_resource" "check_servicenetworking_api" {
+ provisioner "local-exec" {
+ command = <<-EOT
+ COUNTER=0
+ MAX_TRIES=100
+ while ! gcloud services list --project=${module.project_services.project_id} | grep -i "servicenetworking.googleapis.com" && [ $COUNTER -lt $MAX_TRIES ]
+ do
+ sleep 6
+ printf "."
+ COUNTER=$((COUNTER + 1))
+ done
+ if [ $COUNTER -eq $MAX_TRIES ]; then
+ echo "Service Networking API is not enabled; Terraform cannot continue!"
+ exit 1
+ fi
+ sleep 20
+ EOT
+ }
+
+ depends_on = [
+ module.project_services
+ ]
+}
diff --git a/infrastructure/terraform/modules/pipelines/pipelines.tf b/infrastructure/terraform/modules/pipelines/pipelines.tf
index 1c36ef62..0074a536 100644
--- a/infrastructure/terraform/modules/pipelines/pipelines.tf
+++ b/infrastructure/terraform/modules/pipelines/pipelines.tf
@@ -18,6 +18,14 @@ resource "google_service_account" "service_account" {
account_id = local.pipeline_vars.service_account_id
display_name = local.pipeline_vars.service_account_id
description = "Service Account to run Vertex AI Pipelines"
+
+ # The lifecycle block configures how Terraform manages this service account. ignore_changes = all makes Terraform ignore any later changes to the account's attributes, so it never updates it in place.
+ # create_before_destroy = true creates a replacement before destroying the old object when a replacement is required. prevent_destroy is commented out, so the service account can still be destroyed.
+ lifecycle {
+ ignore_changes = all
+ #prevent_destroy = true
+ create_before_destroy = true
+ }
}
# Wait for the pipelines service account to be created
@@ -53,8 +61,8 @@ resource "google_project_iam_member" "pipelines_sa_roles" {
module.project_services,
null_resource.check_aiplatform_api,
null_resource.wait_for_vertex_pipelines_sa_creation
- ]
-
+ ]
+
project = null_resource.check_aiplatform_api.id != "" ? module.project_services.project_id : local.pipeline_vars.project_id
member = "serviceAccount:${google_service_account.service_account.email}"
@@ -68,9 +76,18 @@ resource "google_project_iam_member" "pipelines_sa_roles" {
"roles/artifactregistry.reader",
"roles/pubsub.publisher",
"roles/dataflow.developer",
- "roles/bigquery.connectionUser"
+ "roles/bigquery.connectionUser",
+ "roles/compute.networkUser"
])
role = each.key
+
+ # The lifecycle block configures how Terraform manages these IAM bindings. ignore_changes = all makes Terraform ignore any later changes to a binding's attributes, so it never updates it in place.
+ # create_before_destroy = true creates a replacement before destroying the old object when a replacement is required. prevent_destroy is commented out, so the bindings can still be destroyed.
+ lifecycle {
+ ignore_changes = all
+ #prevent_destroy = true
+ create_before_destroy = true
+ }
}
# This resource binds the service account to the required roles in the mds project
@@ -79,8 +96,8 @@ resource "google_project_iam_member" "pipelines_sa_mds_project_roles" {
module.project_services,
null_resource.check_aiplatform_api,
null_resource.wait_for_vertex_pipelines_sa_creation
- ]
-
+ ]
+
project = null_resource.check_bigquery_api.id != "" ? module.project_services.project_id : local.pipeline_vars.project_id
member = "serviceAccount:${google_service_account.service_account.email}"
@@ -88,6 +105,14 @@ resource "google_project_iam_member" "pipelines_sa_mds_project_roles" {
"roles/bigquery.dataViewer"
])
role = each.key
+
+ # The lifecycle block configures how Terraform manages these IAM bindings. ignore_changes = all makes Terraform ignore any later changes to a binding's attributes, so it never updates it in place.
+ # create_before_destroy = true creates a replacement before destroying the old object when a replacement is required. prevent_destroy is commented out, so the bindings can still be destroyed.
+ lifecycle {
+ ignore_changes = all
+ #prevent_destroy = true
+ create_before_destroy = true
+ }
}
# This resource creates a service account to run the dataflow jobs
@@ -96,6 +121,14 @@ resource "google_service_account" "dataflow_worker_service_account" {
account_id = local.dataflow_vars.worker_service_account_id
display_name = local.dataflow_vars.worker_service_account_id
description = "Service Account to run Dataflow jobs"
+
+ # The lifecycle block configures how Terraform manages this service account. ignore_changes = all makes Terraform ignore any later changes to the account's attributes, so it never updates it in place.
+ # create_before_destroy = true creates a replacement before destroying the old object when a replacement is required. prevent_destroy is commented out, so the service account can still be destroyed.
+ lifecycle {
+ ignore_changes = all
+ #prevent_destroy = true
+ create_before_destroy = true
+ }
}
# Wait for the dataflow worker service account to be created
@@ -130,8 +163,8 @@ resource "google_project_iam_member" "dataflow_worker_sa_roles" {
module.project_services,
null_resource.check_dataflow_api,
null_resource.wait_for_dataflow_worker_sa_creation
- ]
-
+ ]
+
project = null_resource.check_dataflow_api.id != "" ? module.project_services.project_id : local.pipeline_vars.project_id
member = "serviceAccount:${google_service_account.dataflow_worker_service_account.email}"
@@ -142,6 +175,14 @@ resource "google_project_iam_member" "dataflow_worker_sa_roles" {
"roles/storage.objectAdmin",
])
role = each.key
+
+ # The lifecycle block configures how Terraform manages these IAM bindings. ignore_changes = all makes Terraform ignore any later changes to a binding's attributes, so it never updates it in place.
+ # create_before_destroy = true creates a replacement before destroying the old object when a replacement is required. prevent_destroy is commented out, so the bindings can still be destroyed.
+ lifecycle {
+ ignore_changes = all
+ #prevent_destroy = true
+ create_before_destroy = true
+ }
}
# This resource binds the service account to the required roles
@@ -151,11 +192,19 @@ resource "google_service_account_iam_member" "dataflow_sa_iam" {
module.project_services,
null_resource.check_dataflow_api,
null_resource.wait_for_dataflow_worker_sa_creation
- ]
-
+ ]
+
service_account_id = "projects/${module.project_services.project_id}/serviceAccounts/${google_service_account.dataflow_worker_service_account.email}"
role = "roles/iam.serviceAccountUser"
member = "serviceAccount:${google_service_account.service_account.email}"
+
+ # The lifecycle block configures how Terraform manages this IAM binding. ignore_changes = all makes Terraform ignore any later changes to the binding's attributes, so it never updates it in place.
+ # create_before_destroy = true creates a replacement before destroying the old object when a replacement is required. prevent_destroy is commented out, so the binding can still be destroyed.
+ lifecycle {
+ ignore_changes = all
+ #prevent_destroy = true
+ create_before_destroy = true
+ }
}
# This resource creates a Cloud Storage Bucket for the pipeline artifacts
@@ -167,14 +216,14 @@ resource "google_storage_bucket" "pipelines_bucket" {
uniform_bucket_level_access = true
# The force_destroy attribute specifies whether the bucket should be forcibly destroyed
# even if it contains objects. In this case, it's set to false, which means that the bucket will not be destroyed if it contains objects.
- force_destroy = false
+ force_destroy = false
- # The lifecycle block allows you to configure the lifecycle of the bucket.
- # In this case, the ignore_changes attribute is set to all, which means that Terraform
- # will ignore any changes to the bucket's lifecycle configuration. The prevent_destroy attribute is set to false, which means that the bucket can be destroyed.
+ # The lifecycle block configures how Terraform manages this bucket. ignore_changes = all makes Terraform ignore any later changes to the bucket's attributes, so it never updates it in place.
+ # create_before_destroy = true creates a replacement before destroying the old object when a replacement is required. prevent_destroy is commented out, so the bucket can still be destroyed.
lifecycle {
ignore_changes = all
- prevent_destroy = false ##true
+ #prevent_destroy = true
+ create_before_destroy = true
}
}
@@ -187,14 +236,14 @@ resource "google_storage_bucket" "custom_model_bucket" {
uniform_bucket_level_access = true
# The force_destroy attribute specifies whether the bucket should be forcibly destroyed
# even if it contains objects. In this case, it's set to false, which means that the bucket will not be destroyed if it contains objects.
- force_destroy = false
+ force_destroy = false
- # The lifecycle block allows you to configure the lifecycle of the bucket.
- # In this case, the ignore_changes attribute is set to all, which means that Terraform
- # will ignore any changes to the bucket's lifecycle configuration. The prevent_destroy attribute is set to false, which means that the bucket can be destroyed.
+ # The lifecycle block configures how Terraform manages this bucket. ignore_changes = all makes Terraform ignore any later changes to the bucket's attributes, so it never updates it in place.
+ # create_before_destroy = true creates a replacement before destroying the old object when a replacement is required. prevent_destroy is commented out, so the bucket can still be destroyed.
lifecycle {
ignore_changes = all
- prevent_destroy = false ##true
+ #prevent_destroy = true
+ create_before_destroy = true
}
}
@@ -246,7 +295,7 @@ resource "google_artifact_registry_repository" "pipelines-repo" {
repository_id = local.artifact_registry_vars.pipelines_repo.name
description = "Pipelines Repository"
# The format is kubeflow pipelines YAML files.
- format = "KFP"
+ format = "KFP"
# The lifecycle block of the google_artifact_registry_repository resource defines a precondition that
# checks if the specified region is included in the vertex_pipelines_available_locations list.
@@ -266,7 +315,7 @@ resource "google_artifact_registry_repository" "pipelines_docker_repo" {
repository_id = local.artifact_registry_vars.pipelines_docker_repo.name
description = "Docker Images Repository"
# The format is Docker images.
- format = "DOCKER"
+ format = "DOCKER"
}
locals {
@@ -308,13 +357,12 @@ resource "null_resource" "build_push_pipelines_components_image" {
docker_repo_id = google_artifact_registry_repository.pipelines_docker_repo.id
docker_repo_create_time = google_artifact_registry_repository.pipelines_docker_repo.create_time
source_content_hash = local.component_image_content_hash
- poetry_installed = var.poetry_installed
}
# The provisioner block specifies the command that will be executed to build and push the base component image.
# This command will execute the build-push function in the base_component_image module, which will build and push the base component image to the specified Docker repository.
provisioner "local-exec" {
- command = "${var.poetry_run_alias} python -m base_component_image.build-push -c ${local.config_file_path_relative_python_run_dir}"
+ command = "${var.uv_run_alias} python -m base_component_image.build-push -c ${local.config_file_path_relative_python_run_dir}"
working_dir = self.triggers.working_dir
}
}
@@ -350,6 +398,31 @@ resource "null_resource" "check_pipeline_docker_image_pushed" {
## Feature Engineering Pipelines
#######
+# This resource is used to compile and upload the Vertex AI pipeline for feature engineering - lead score propensity use case
+resource "null_resource" "compile_feature_engineering_lead_score_propensity_pipeline" {
+ triggers = {
+ working_dir = "${local.source_root_dir}/python"
+ tag = local.compile_pipelines_tag
+ pipelines_repo_id = google_artifact_registry_repository.pipelines-repo.id
+ pipelines_repo_create_time = google_artifact_registry_repository.pipelines-repo.create_time
+ source_content_hash = local.pipelines_content_hash
+ upstream_resource_dependency = null_resource.check_pipeline_docker_image_pushed.id
+ }
+
+ # The provisioner block specifies the command that will be executed to compile and upload the pipeline.
+ # This command will execute the compiler function in the pipelines module, which will compile the pipeline YAML file, and the uploader function,
+ # which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
+ provisioner "local-exec" {
+ command = <<-EOT
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-lead-score-propensity.execution -o fe_lead_score_propensity.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f fe_lead_score_propensity.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-lead-score-propensity.execution -i fe_lead_score_propensity.yaml
+ EOT
+ working_dir = self.triggers.working_dir
+ }
+}
+
+
# This resource is used to compile and upload the Vertex AI pipeline for feature engineering - auto audience segmentation use case
resource "null_resource" "compile_feature_engineering_auto_audience_segmentation_pipeline" {
triggers = {
@@ -358,7 +431,7 @@ resource "null_resource" "compile_feature_engineering_auto_audience_segmentation
pipelines_repo_id = google_artifact_registry_repository.pipelines-repo.id
pipelines_repo_create_time = google_artifact_registry_repository.pipelines-repo.create_time
source_content_hash = local.pipelines_content_hash
- upstream_resource_dependency = null_resource.build_push_pipelines_components_image.id
+ upstream_resource_dependency = null_resource.compile_feature_engineering_lead_score_propensity_pipeline.id
}
# The provisioner block specifies the command that will be executed to compile and upload the pipeline.
@@ -366,9 +439,9 @@ resource "null_resource" "compile_feature_engineering_auto_audience_segmentation
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-auto-audience-segmentation.execution -o fe_auto_audience_segmentation.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f fe_auto_audience_segmentation.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-auto-audience-segmentation.execution -i fe_auto_audience_segmentation.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-auto-audience-segmentation.execution -o fe_auto_audience_segmentation.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f fe_auto_audience_segmentation.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-auto-audience-segmentation.execution -i fe_auto_audience_segmentation.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -390,9 +463,9 @@ resource "null_resource" "compile_feature_engineering_aggregated_value_based_bid
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-aggregated-value-based-bidding.execution -o fe_agg_vbb.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f fe_agg_vbb.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-aggregated-value-based-bidding.execution -i fe_agg_vbb.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-aggregated-value-based-bidding.execution -o fe_agg_vbb.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f fe_agg_vbb.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-aggregated-value-based-bidding.execution -i fe_agg_vbb.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -414,9 +487,9 @@ resource "null_resource" "compile_feature_engineering_audience_segmentation_pipe
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-audience-segmentation.execution -o fe_audience_segmentation.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f fe_audience_segmentation.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-audience-segmentation.execution -i fe_audience_segmentation.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-audience-segmentation.execution -o fe_audience_segmentation.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f fe_audience_segmentation.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-audience-segmentation.execution -i fe_audience_segmentation.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -438,9 +511,9 @@ resource "null_resource" "compile_feature_engineering_purchase_propensity_pipeli
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-purchase-propensity.execution -o fe_purchase_propensity.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f fe_purchase_propensity.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-purchase-propensity.execution -i fe_purchase_propensity.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-purchase-propensity.execution -o fe_purchase_propensity.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f fe_purchase_propensity.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-purchase-propensity.execution -i fe_purchase_propensity.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -462,9 +535,9 @@ resource "null_resource" "compile_feature_engineering_churn_propensity_pipeline"
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-churn-propensity.execution -o fe_churn_propensity.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f fe_churn_propensity.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-churn-propensity.execution -i fe_churn_propensity.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-churn-propensity.execution -o fe_churn_propensity.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f fe_churn_propensity.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-churn-propensity.execution -i fe_churn_propensity.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -486,9 +559,9 @@ resource "null_resource" "compile_feature_engineering_customer_lifetime_value_pi
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-customer-ltv.execution -o fe_customer_ltv.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f fe_customer_ltv.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-customer-ltv.execution -i fe_customer_ltv.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-customer-ltv.execution -o fe_customer_ltv.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f fe_customer_ltv.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.feature-creation-customer-ltv.execution -i fe_customer_ltv.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -498,12 +571,54 @@ resource "null_resource" "compile_feature_engineering_customer_lifetime_value_pi
## Training and Inference Pipelines
###
+# This resource is used to compile and upload the Vertex AI pipeline for training the propensity model - lead score propensity use case
+resource "null_resource" "compile_lead_score_propensity_training_pipelines" {
+ triggers = {
+ working_dir = "${local.source_root_dir}/python"
+ tag = local.compile_pipelines_tag
+ upstream_resource_dependency = null_resource.compile_feature_engineering_customer_lifetime_value_pipeline.id
+ }
+
+ # The provisioner block specifies the command that will be executed to compile and upload the pipeline.
+ # This command will execute the compiler function in the pipelines module, which will compile the pipeline YAML file, and the uploader function,
+ # which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
+ provisioner "local-exec" {
+ command = <<-EOT
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.lead_score_propensity.training -o lead_score_propensity_training.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f lead_score_propensity_training.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.lead_score_propensity.training -i lead_score_propensity_training.yaml
+ EOT
+ working_dir = self.triggers.working_dir
+ }
+}
+
+# This resource is used to compile and upload the Vertex AI pipeline for prediction using the propensity model - lead score propensity use case
+resource "null_resource" "compile_lead_score_propensity_prediction_pipelines" {
+ triggers = {
+ working_dir = "${local.source_root_dir}/python"
+ tag = local.compile_pipelines_tag
+ upstream_resource_dependency = null_resource.compile_lead_score_propensity_training_pipelines.id
+ }
+
+ # The provisioner block specifies the command that will be executed to compile and upload the pipeline.
+ # This command will execute the compiler function in the pipelines module, which will compile the pipeline YAML file, and the uploader function,
+ # which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
+ provisioner "local-exec" {
+ command = <<-EOT
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.lead_score_propensity.prediction -o lead_score_propensity_prediction.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f lead_score_propensity_prediction.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.lead_score_propensity.prediction -i lead_score_propensity_prediction.yaml
+ EOT
+ working_dir = self.triggers.working_dir
+ }
+}
+
# This resource is used to compile and upload the Vertex AI pipeline for training the propensity model - purchase propensity use case
resource "null_resource" "compile_purchase_propensity_training_pipelines" {
triggers = {
working_dir = "${local.source_root_dir}/python"
tag = local.compile_pipelines_tag
- upstream_resource_dependency = null_resource.compile_feature_engineering_customer_lifetime_value_pipeline.id
+ upstream_resource_dependency = null_resource.compile_lead_score_propensity_prediction_pipelines.id
}
# The provisioner block specifies the command that will be executed to compile and upload the pipeline.
@@ -511,9 +626,9 @@ resource "null_resource" "compile_purchase_propensity_training_pipelines" {
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.purchase_propensity.training -o purchase_propensity_training.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f purchase_propensity_training.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.purchase_propensity.training -i purchase_propensity_training.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.purchase_propensity.training -o purchase_propensity_training.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f purchase_propensity_training.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.purchase_propensity.training -i purchase_propensity_training.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -532,9 +647,9 @@ resource "null_resource" "compile_purchase_propensity_prediction_pipelines" {
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.purchase_propensity.prediction -o purchase_propensity_prediction.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f purchase_propensity_prediction.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.purchase_propensity.prediction -i purchase_propensity_prediction.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.purchase_propensity.prediction -o purchase_propensity_prediction.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f purchase_propensity_prediction.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.purchase_propensity.prediction -i purchase_propensity_prediction.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -553,9 +668,9 @@ resource "null_resource" "compile_propensity_clv_training_pipelines" {
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.propensity_clv.training -o propensity_clv_training.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f propensity_clv_training.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.propensity_clv.training -i propensity_clv_training.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.propensity_clv.training -o propensity_clv_training.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f propensity_clv_training.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.propensity_clv.training -i propensity_clv_training.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -574,9 +689,9 @@ resource "null_resource" "compile_clv_training_pipelines" {
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.clv.training -o clv_training.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f clv_training.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.clv.training -i clv_training.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.clv.training -o clv_training.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f clv_training.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.clv.training -i clv_training.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -595,9 +710,9 @@ resource "null_resource" "compile_clv_prediction_pipelines" {
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.clv.prediction -o clv_prediction.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f clv_prediction.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.clv.prediction -i clv_prediction.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.clv.prediction -o clv_prediction.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f clv_prediction.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.clv.prediction -i clv_prediction.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -616,9 +731,9 @@ resource "null_resource" "compile_segmentation_training_pipelines" {
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.segmentation.training -o segmentation_training.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f segmentation_training.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.segmentation.training -i segmentation_training.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.segmentation.training -o segmentation_training.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f segmentation_training.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.segmentation.training -i segmentation_training.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -637,9 +752,9 @@ resource "null_resource" "compile_segmentation_prediction_pipelines" {
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.segmentation.prediction -o segmentation_prediction.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f segmentation_prediction.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.segmentation.prediction -i segmentation_prediction.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.segmentation.prediction -o segmentation_prediction.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f segmentation_prediction.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.segmentation.prediction -i segmentation_prediction.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -658,9 +773,9 @@ resource "null_resource" "compile_auto_segmentation_training_pipelines" {
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.auto_segmentation.training -o auto_segmentation_training.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f auto_segmentation_training.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.auto_segmentation.training -i auto_segmentation_training.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.auto_segmentation.training -o auto_segmentation_training.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f auto_segmentation_training.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.auto_segmentation.training -i auto_segmentation_training.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -679,9 +794,9 @@ resource "null_resource" "compile_auto_segmentation_prediction_pipelines" {
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.auto_segmentation.prediction -o auto_segmentation_prediction.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f auto_segmentation_prediction.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.auto_segmentation.prediction -i auto_segmentation_prediction.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.auto_segmentation.prediction -o auto_segmentation_prediction.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f auto_segmentation_prediction.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.auto_segmentation.prediction -i auto_segmentation_prediction.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -700,9 +815,9 @@ resource "null_resource" "compile_value_based_bidding_training_pipelines" {
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.value_based_bidding.training -o vbb_training.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f vbb_training.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.value_based_bidding.training -i vbb_training.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.value_based_bidding.training -o vbb_training.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f vbb_training.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.value_based_bidding.training -i vbb_training.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -721,9 +836,9 @@ resource "null_resource" "compile_value_based_bidding_explanation_pipelines" {
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.value_based_bidding.explanation -o vbb_explanation.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f vbb_explanation.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.value_based_bidding.explanation -i vbb_explanation.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.value_based_bidding.explanation -o vbb_explanation.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f vbb_explanation.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.value_based_bidding.explanation -i vbb_explanation.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -742,9 +857,9 @@ resource "null_resource" "compile_churn_propensity_training_pipelines" {
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.churn_propensity.training -o churn_propensity_training.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f churn_propensity_training.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.churn_propensity.training -i churn_propensity_training.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.churn_propensity.training -o churn_propensity_training.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f churn_propensity_training.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.churn_propensity.training -i churn_propensity_training.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -763,9 +878,9 @@ resource "null_resource" "compile_churn_propensity_prediction_pipelines" {
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.churn_propensity.prediction -o churn_propensity_prediction.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f churn_propensity_prediction.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.churn_propensity.prediction -i churn_propensity_prediction.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.churn_propensity.prediction -o churn_propensity_prediction.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f churn_propensity_prediction.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.churn_propensity.prediction -i churn_propensity_prediction.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -784,9 +899,9 @@ resource "null_resource" "compile_reporting_preparation_aggregate_predictions_pi
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.reporting_preparation.execution -o reporting_preparation.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f reporting_preparation.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.reporting_preparation.execution -i reporting_preparation.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.reporting_preparation.execution -o reporting_preparation.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f reporting_preparation.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.reporting_preparation.execution -i reporting_preparation.yaml
EOT
working_dir = self.triggers.working_dir
}
@@ -805,10 +920,10 @@ resource "null_resource" "compile_gemini_insights_pipelines" {
# which will upload the pipeline YAML file to the specified Artifact Registry repository. The scheduler function will then schedule the pipeline to run on a regular basis.
provisioner "local-exec" {
command = <<-EOT
- ${var.poetry_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.gemini_insights.execution -o gemini_insights.yaml
- ${var.poetry_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f gemini_insights.yaml -t ${self.triggers.tag} -t latest
- ${var.poetry_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.gemini_insights.execution -i gemini_insights.yaml
+ ${var.uv_run_alias} python -m pipelines.compiler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.gemini_insights.execution -o gemini_insights.yaml
+ ${var.uv_run_alias} python -m pipelines.uploader -c ${local.config_file_path_relative_python_run_dir} -f gemini_insights.yaml -t ${self.triggers.tag} -t latest
+ ${var.uv_run_alias} python -m pipelines.scheduler -c ${local.config_file_path_relative_python_run_dir} -p vertex_ai.pipelines.gemini_insights.execution -i gemini_insights.yaml
EOT
working_dir = self.triggers.working_dir
}
-}
\ No newline at end of file
+}
diff --git a/infrastructure/terraform/modules/pipelines/variables.tf b/infrastructure/terraform/modules/pipelines/variables.tf
index 3afaaed3..3c618cac 100644
--- a/infrastructure/terraform/modules/pipelines/variables.tf
+++ b/infrastructure/terraform/modules/pipelines/variables.tf
@@ -17,13 +17,8 @@ variable "config_file_path" {
description = "pipelines config file"
}
-variable "poetry_run_alias" {
- description = "alias for poetry run command on the current system"
- type = string
-}
-
-variable "poetry_installed" {
- description = "Construct to specify dependency to poetry installed"
+variable "uv_run_alias" {
+ description = "alias for uv run command on the current system"
type = string
}
diff --git a/infrastructure/terraform/modules/pipelines/versions.tf b/infrastructure/terraform/modules/pipelines/versions.tf
index 5a896e28..29fe3151 100644
--- a/infrastructure/terraform/modules/pipelines/versions.tf
+++ b/infrastructure/terraform/modules/pipelines/versions.tf
@@ -20,7 +20,12 @@ terraform {
required_providers {
google = {
source = "hashicorp/google"
- version = ">= 3.43.0, >= 3.53.0, >= 3.63.0, >= 4.83.0, < 5.0.0, < 6.0.0"
+ version = "5.44.1"
+ }
+
+ google-beta = {
+ source = "hashicorp/google-beta"
+ version = "5.44.1"
}
}
diff --git a/infrastructure/terraform/terraform-sample.tfvars b/infrastructure/terraform/terraform-sample.tfvars
index d52b9d0f..0a32ba0c 100644
--- a/infrastructure/terraform/terraform-sample.tfvars
+++ b/infrastructure/terraform/terraform-sample.tfvars
@@ -16,10 +16,7 @@
tf_state_project_id = "Google Cloud project where the terraform state file is stored"
-create_dev_environment = false
-create_staging_environment = false
-create_prod_environment = true
-
+deploy_dataform = true
deploy_activation = true
deploy_feature_store = true
deploy_pipelines = true
@@ -28,6 +25,7 @@ deploy_monitoring = true
#################### DATA VARIABLES #################################
data_project_id = "Project id where the MDS datasets will be created"
+property_id = "Google Analytics 4 property id to identify a unique MDS deployment"
destination_data_location = "BigQuery location (either regional or multi-regional) for the MDS BigQuery datasets."
data_processing_project_id = "Project id where the Dataform will be installed and run"
source_ga4_export_project_id = "Project id which contains the GA4 export dataset"
@@ -40,6 +38,154 @@ source_ads_export_data = [
#################### FEATURE STORE VARIABLES #################################
feature_store_project_id = "Project ID where feature store resources will be created"
+# List of events used in the lead score feature engineering, e.g. ["scroll_50", "scroll_90", "view_search_results", ...]
+non_ecomm_events_list = ["scroll_50", "view_search_results"]
+non_ecomm_target_event = "target event used in the lead score propensity use case"
+
+################### PIPELINE CONFIGURATIONS ##################################
+
+pipeline_configuration = {
+ feature-creation-auto-audience-segmentation = {
+ execution = {
+ schedule = {
+ state = "ACTIVE"
+ }
+ }
+ }
+ feature-creation-audience-segmentation = {
+ execution = {
+ schedule = {
+ state = "ACTIVE"
+ }
+ }
+ }
+ feature-creation-purchase-propensity = {
+ execution = {
+ schedule = {
+ state = "ACTIVE"
+ }
+ }
+ }
+ feature-creation-churn-propensity = {
+ execution = {
+ schedule = {
+ state = "ACTIVE"
+ }
+ }
+ }
+ feature-creation-customer-ltv = {
+ execution = {
+ schedule = {
+ state = "ACTIVE"
+ }
+ }
+ }
+ feature-creation-aggregated-value-based-bidding = {
+ execution = {
+ schedule = {
+ state = "ACTIVE"
+ }
+ }
+ }
+ feature-creation-lead-score-propensity = {
+ execution = {
+ schedule = {
+ state = "ACTIVE"
+ }
+ }
+ }
+ value_based_bidding = {
+ training = {
+ schedule = {
+ state = "PAUSED"
+ }
+ }
+ explanation = {
+ schedule = {
+ state = "ACTIVE"
+ }
+ }
+ }
+ purchase_propensity = {
+ training = {
+ schedule = {
+ state = "PAUSED"
+ }
+ }
+ prediction = {
+ schedule = {
+ state = "ACTIVE"
+ }
+ }
+ }
+ churn_propensity = {
+ training = {
+ schedule = {
+ state = "PAUSED"
+ }
+ }
+ prediction = {
+ schedule = {
+ state = "ACTIVE"
+ }
+ }
+ }
+ segmentation = {
+ training = {
+ schedule = {
+ state = "PAUSED"
+ }
+ }
+ prediction = {
+ schedule = {
+ state = "ACTIVE"
+ }
+ }
+ }
+ auto_segmentation = {
+ training = {
+ schedule = {
+ state = "PAUSED"
+ }
+ }
+ prediction = {
+ schedule = {
+ state = "ACTIVE"
+ }
+ }
+ }
+ propensity_clv = {
+ training = {
+ schedule = {
+ state = "PAUSED"
+ }
+ }
+ }
+ clv = {
+ training = {
+ schedule = {
+ state = "ACTIVE"
+ }
+ }
+ prediction = {
+ schedule = {
+ state = "ACTIVE"
+ }
+ }
+ }
+ lead_score_propensity = {
+ training = {
+ schedule = {
+ state = "PAUSED"
+ }
+ }
+ prediction = {
+ schedule = {
+ state = "ACTIVE"
+ }
+ }
+ }
+ }
#################### ML MODEL VARIABLES #################################
@@ -47,19 +193,19 @@ website_url = "Customer Website URL" # i.e. "https://shop.googlemerchandisestore
#################### ACTIVATION VARIABLES #################################
-activation_project_id = "Project ID where activation resources will be created"
+activation_project_id = "Project ID where activation resources will be created"
#################### GA4 VARIABLES #################################
-ga4_property_id = "Google Analytics property id"
-ga4_stream_id = "Google Analytics data stream id"
-ga4_measurement_id = "Google Analytics measurement id"
-ga4_measurement_secret = "Google Analytics measurement secret"
+ga4_property_id = "Google Analytics property id"
+ga4_stream_id = "Google Analytics data stream id"
+ga4_measurement_id = "Google Analytics measurement id"
+ga4_measurement_secret = "Google Analytics measurement secret"
#################### GITHUB VARIABLES #################################
-project_owner_email = "Project owner email"
-dataform_github_repo = "URL of the GitHub or GitLab repo which contains the Dataform scripts. Should start with https://"
+project_owner_email = "Project owner email"
+dataform_github_repo = "URL of the GitHub or GitLab repo which contains the Dataform scripts. Should start with https://"
# Personal access tokens are intended to access GitHub resources on behalf of yourself.
# Generate a github developer token for the repo above following this link:
# https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#creating-a-personal-access-token-classic
diff --git a/infrastructure/terraform/variables.tf b/infrastructure/terraform/variables.tf
index 978b6480..f30554ce 100644
--- a/infrastructure/terraform/variables.tf
+++ b/infrastructure/terraform/variables.tf
@@ -56,7 +56,7 @@ variable "project_owner_email" {
}
variable "dataform_github_repo" {
- description = "Private Github repo for Dataform."
+ description = "Private GitHub repo for Dataform."
type = string
validation {
condition = substr(var.dataform_github_repo, 0, 8) == "https://"
@@ -65,7 +65,7 @@ variable "dataform_github_repo" {
}
variable "dataform_github_token" {
- description = "Github token for Dataform repo."
+ description = "GitHub token for Dataform repo."
type = string
}
@@ -81,12 +81,6 @@ variable "pipelines_github_owner" {
default = "temporarily unused"
}
-variable "create_dev_environment" {
- description = "Indicates that a development environment needs to be created"
- type = bool
- default = true
-}
-
variable "dev_data_project_id" {
description = "Project ID of where the dev datasets will created. If not provided, data_project_id will be used."
type = string
@@ -99,12 +93,6 @@ variable "dev_destination_data_location" {
default = ""
}
-variable "create_staging_environment" {
- description = "Indicates that a staging environment needs to be created"
- type = bool
- default = true
-}
-
variable "staging_data_project_id" {
description = "Project ID of where the staging datasets will be created. If not provided, data_project_id will be used."
type = string
@@ -117,10 +105,10 @@ variable "staging_destination_data_location" {
default = ""
}
-variable "create_prod_environment" {
- description = "Indicates that a production environment needs to be created"
- type = bool
- default = true
+variable "property_id" {
+ description = "Google Analytics 4 Property ID to install the MDS"
+ type = string
+ default = ""
}
variable "prod_data_project_id" {
@@ -147,8 +135,8 @@ variable "source_ga4_export_dataset" {
variable "ga4_incremental_processing_days_back" {
description = "Past number of days to process GA4 exported data"
- type = string
- default = "3"
+ type = string
+ default = "3"
}
variable "source_ads_export_data" {
@@ -189,6 +177,12 @@ variable "ga4_measurement_secret" {
sensitive = true
}
+variable "deploy_dataform" {
+ description = "Toggler for Dataform module"
+ type = bool
+ default = false
+}
+
variable "deploy_activation" {
description = "Toggler for activation module"
type = bool
@@ -225,10 +219,10 @@ variable "feature_store_config_env" {
default = "config"
}
-variable "poetry_cmd" {
- description = "alias for poetry run command on the current system"
+variable "uv_cmd" {
+ description = "alias for uv run command on the current system"
type = string
- default = "poetry"
+ default = "uv"
}
variable "feature_store_project_id" {
@@ -238,6 +232,132 @@ variable "feature_store_project_id" {
variable "website_url" {
description = "Website url to be provided to the auto segmentation model"
- type = string
- default = null
+ type = string
+ default = null
+}
+
+variable "time_zone" {
+ description = "Timezone for scheduled jobs"
+ type = string
+ default = "America/New_York"
+}
+
+variable "pipeline_configuration" {
+ description = "Pipeline configuration that will alter certain settings in the config.yaml.tftpl"
+ type = map(
+ map(
+ object({
+ schedule = object({
+ # The `state` defines the state of the pipeline.
+ # In case you don't want to schedule the pipeline, set the state to `PAUSED`.
+ state = string
+ })
+ })
+ )
+ )
+
+ default = {
+ feature-creation-auto-audience-segmentation = {
+ execution = {
+ schedule = {
+ state = "PAUSED"
+ }
+ }
+ }
+ feature-creation-audience-segmentation = {
+ execution = {
+ schedule = {
+ state = "PAUSED"
+ }
+ }
+ }
+ feature-creation-purchase-propensity = {
+ execution = {
+ schedule = {
+ state = "PAUSED"
+ }
+ }
+ }
+ feature-creation-churn-propensity = {
+ execution = {
+ schedule = {
+ state = "PAUSED"
+ }
+ }
+ }
+ feature-creation-customer-ltv = {
+ execution = {
+ schedule = {
+ state = "PAUSED"
+ }
+ }
+ }
+ feature-creation-aggregated-value-based-bidding = {
+ execution = {
+ schedule = {
+ state = "PAUSED"
+ }
+ }
+ }
+ value_based_bidding = {
+ training = {
+ schedule = {
+ state = "PAUSED"
+ }
+ }
+ explanation = {
+ schedule = {
+ state = "PAUSED"
+ }
+ }
+ }
+ purchase_propensity = {
+ training = {
+ schedule = {
+ state = "PAUSED"
+ }
+ }
+ prediction = {
+ schedule = {
+ state = "PAUSED"
+ }
+ }
+ }
+ churn_propensity = {
+ training = {
+ schedule = {
+ state = "PAUSED"
+ }
+ }
+ prediction = {
+ schedule = {
+ state = "PAUSED"
+ }
+ }
+ }
+ }
+ validation {
+ condition = alltrue([
+ for p in keys(var.pipeline_configuration) : alltrue([
+ for c in keys(var.pipeline_configuration[p]) : (
+ try(var.pipeline_configuration[p][c].schedule.state, "") == "ACTIVE" ||
+ try(var.pipeline_configuration[p][c].schedule.state, "") == "PAUSED"
+ )
+ ])
+ ])
+ error_message = "The 'state' field must be either 'PAUSED' or 'ACTIVE' for all pipeline configurations."
+ }
}
+
+
+variable "non_ecomm_events_list" {
+ description = "Short list of prioritized events that are correlated to the non ecommerce target event"
+ type = list(string)
+ default = []
+}
+
+variable "non_ecomm_target_event" {
+ description = "Non ecommerce target event for the lead score propensity feature transformation"
+ type = string
+ default = "login"
+}
\ No newline at end of file
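The `validation` block added to `pipeline_configuration` accepts only `ACTIVE` or `PAUSED` for every `schedule.state`. The same rule can be checked before running Terraform. The following is a minimal Python sketch of that check, assuming the override has already been loaded into a nested dictionary; the helper name `invalid_schedule_states` and the sample values are hypothetical and not part of this change.

```python
from typing import Any, Dict, List

# The two values the Terraform validation block accepts.
ALLOWED_STATES = {"ACTIVE", "PAUSED"}


def invalid_schedule_states(pipeline_configuration: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return one message per pipeline component whose schedule state is not allowed.

    Hypothetical helper: mirrors the nested `alltrue` loops of the Terraform
    validation, using "" as the fallback when the state is missing.
    """
    problems = []
    for pipeline, components in pipeline_configuration.items():
        for component, settings in components.items():
            state = settings.get("schedule", {}).get("state", "")
            if state not in ALLOWED_STATES:
                problems.append(f"{pipeline}.{component}: {state!r}")
    return problems


# Illustrative override: the lowercase "paused" is rejected, as it would be by Terraform.
config = {
    "purchase_propensity": {
        "training": {"schedule": {"state": "ACTIVE"}},
        "prediction": {"schedule": {"state": "PAUSED"}},
    },
    "churn_propensity": {
        "training": {"schedule": {"state": "paused"}},
    },
}
problems = invalid_schedule_states(config)
print(problems)
```

The comparison is case-sensitive, matching the string equality used in the Terraform condition.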
diff --git a/infrastructure/terraform/versions.tf b/infrastructure/terraform/versions.tf
index 5a896e28..29fe3151 100644
--- a/infrastructure/terraform/versions.tf
+++ b/infrastructure/terraform/versions.tf
@@ -20,7 +20,12 @@ terraform {
required_providers {
google = {
source = "hashicorp/google"
- version = ">= 3.43.0, >= 3.53.0, >= 3.63.0, >= 4.83.0, < 5.0.0, < 6.0.0"
+ version = "5.44.1"
+ }
+
+ google-beta = {
+ source = "hashicorp/google-beta"
+ version = "5.44.1"
}
}
diff --git a/notebooks/quick_installation.ipynb b/notebooks/quick_installation.ipynb
new file mode 100644
index 00000000..6f7e54db
--- /dev/null
+++ b/notebooks/quick_installation.ipynb
@@ -0,0 +1,564 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Marketing Analytics Jumpstart Quick Installation\n",
+ "\n",
+ "Run in Colab | Run in Colab Enterprise | View on GitHub | Open in Vertex AI Workbench"
+ ],
+ "metadata": {
+ "id": "AKtB_GVpt2QJ"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Follow this Colab notebook to quickly install the Marketing Analytics Jumpstart solution on a Google Cloud Project.\n",
+ "\n",
+ "> **Note:** You need access to the Google Analytics 4 Property, Google Ads Account and a Google Cloud project in which you will deploy Marketing Analytics Jumpstart, with the following permissions:\n",
+ ">> * Google Analytics Property Editor or Owner\n",
+ ">>\n",
+ ">> * Google Ads Reader\n",
+ ">>\n",
+ ">> * Project Owner for a Google Cloud Project\n",
+ ">>\n",
+ ">> * GitHub or GitLab account privileges for repo creation and access token. [Details](https://cloud.google.com/dataform/docs/connect-repository)\n",
+ "\n",
+ "\n",
+ "\n",
+ "Total Installation time is around **35-40 minutes**."
+ ],
+ "metadata": {
+ "id": "mj-8n9jIyTn-"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### 1. Authenticate to Google Cloud Platform\n",
+ "\n",
+ "Click the ( ▶ ) button to authenticate you to the Google Cloud Project.\n",
+ "\n",
+ "***Time: 30 seconds.***"
+ ],
+ "metadata": {
+ "id": "DDGHqJNhq5Oi"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "from google.colab import auth\n",
+ "auth.authenticate_user()\n",
+ "\n",
+ "print('Authenticated')"
+ ],
+ "metadata": {
+ "id": "9TyPgnleJGGZ",
+ "cellView": "form",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "0051c92e-932a-4067-941a-728c76623b28"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Authenticated\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### 2. Installation Configurations\n",
+ "\n",
+ "Fill out the form, then click the ( ▶ ) button.\n",
+ "\n",
+ "***Time: 10 minutes.***"
+ ],
+ "metadata": {
+ "id": "mq1yqwr8qcx1"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @markdown ---\n",
+ "# @markdown # Google Cloud Platform\n",
+ "# @markdown Copy the `Project ID` from the \"Project Info\" card in the console [Dashboard](https://console.cloud.google.com/home/dashboard).\n",
+ "GOOGLE_CLOUD_PROJECT_ID = \"your-project-id\" #@param {type:\"string\"}\n",
+ "GOOGLE_CLOUD_QUOTA_PROJECT = GOOGLE_CLOUD_PROJECT_ID\n",
+ "PROJECT_ID = GOOGLE_CLOUD_PROJECT_ID\n",
+ "MAJ_DEFAULT_PROJECT_ID = GOOGLE_CLOUD_PROJECT_ID\n",
+ "# @markdown ---\n",
+ "# @markdown # Google Analytics 4\n",
+ "# @markdown For a quick installation, copy the Google Analytics 4 property ID and stream ID. You will find them in your Google Analytics 4 console, under Admin settings.\n",
+ "GA4_PROPERTY_ID = \"1234567890\" #@param {type:\"string\"}\n",
+ "MAJ_GA4_PROPERTY_ID = GA4_PROPERTY_ID\n",
+ "GA4_STREAM_ID = \"1234567890\" #@param {type:\"string\"}\n",
+ "MAJ_GA4_STREAM_ID = GA4_STREAM_ID\n",
+ "# @markdown The website your Google Analytics 4 events are coming from.\n",
+ "WEBSITE_URL = \"https://shop.googlemerchandisestore.com\" #@param {type:\"string\", placeholder:\"Full web URL\"}\n",
+ "MAJ_WEBSITE_URL = WEBSITE_URL\n",
+ "# @markdown ---\n",
+ "# @markdown # Google Ads\n",
+ "# @markdown For a quick installation, copy the Google Ads Customer ID. You will find it in your Google Ads console. It must be in the following format: `\"CUSTOMERID\"` (without dashes).\n",
+ "GOOGLE_ADS_CUSTOMER_ID= \"1234567890\" #@param {type:\"string\", placeholder:\"GAds Account Number (e.g. 4717384083)\"}\n",
+ "MAJ_ADS_EXPORT_TABLE_SUFFIX = \"_\"+GOOGLE_ADS_CUSTOMER_ID\n",
+ "# @markdown ---\n",
+ "# @markdown # GitHub\n",
+ "# @markdown For a quick installation, use the email address of an account that is allowed to create a Dataform repository connected to a remote GitHub repository. More info [here](https://cloud.google.com/dataform/docs/connect-repository).\n",
+ "GITHUB_REPO_OWNER_EMAIL = \"user@company.com\" #@param {type:\"string\", placeholder:\"user@company.com\"}\n",
+ "MAJ_DATAFORM_REPO_OWNER_EMAIL = GITHUB_REPO_OWNER_EMAIL\n",
+ "MAJ_DATAFORM_GITHUB_REPO_URL = \"https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart-dataform.git\"\n",
+ "# @markdown For a quick installation, reuse or create your [GitHub personal access token](https://cloud.google.com/dataform/docs/connect-repository#connect-https)\n",
+ "GITHUB_PERSONAL_TOKEN = \"your_github_personal_access_token\" #@param {type:\"string\"}\n",
+ "MAJ_DATAFORM_GITHUB_TOKEN = GITHUB_PERSONAL_TOKEN\n",
+ "# @markdown ---\n",
+ "\n",
+ "import os\n",
+ "os.environ['GOOGLE_CLOUD_PROJECT_ID'] = GOOGLE_CLOUD_PROJECT_ID\n",
+ "os.environ['GOOGLE_CLOUD_QUOTA_PROJECT'] = GOOGLE_CLOUD_QUOTA_PROJECT\n",
+ "os.environ['PROJECT_ID'] = PROJECT_ID\n",
+ "os.environ['MAJ_DEFAULT_PROJECT_ID'] = MAJ_DEFAULT_PROJECT_ID\n",
+ "!export SOURCE_ROOT=$(pwd)\n",
+ "!export TERRAFORM_RUN_DIR={SOURCE_ROOT}/infrastructure/terraform\n",
+ "REPO=\"marketing-analytics-jumpstart\"\n",
+ "!if [ ! -d \"/content/{REPO}\" ]; then git clone https://github.com/GoogleCloudPlatform/{REPO}.git ; fi\n",
+ "SOURCE_ROOT=\"/content/\"+REPO\n",
+ "%cd {SOURCE_ROOT}\n",
+ "!echo \"Enabling APIs\"\n",
+ "!gcloud config set project {GOOGLE_CLOUD_PROJECT_ID}\n",
+ "!. ~/.bashrc\n",
+ "!gcloud projects add-iam-policy-binding {GOOGLE_CLOUD_PROJECT_ID} --member user:{MAJ_DATAFORM_REPO_OWNER_EMAIL} --role=roles/bigquery.admin\n",
+ "!source ./scripts/common.sh && enable_all_apis > /dev/null\n",
+ "!echo \"APIs enabled\"\n",
+ "\n",
+ "from google.cloud import bigquery\n",
+ "# Construct a BigQuery client object.\n",
+ "client = bigquery.Client(project=GOOGLE_CLOUD_PROJECT_ID)\n",
+ "# Replace with your desired dataset ID suffix\n",
+ "dataset_id_suffix = MAJ_GA4_PROPERTY_ID\n",
+ "location = ''\n",
+ "dataset_id = ''\n",
+ "# Iterate through datasets and find the one with the matching suffix\n",
+ "for dataset in client.list_datasets():\n",
+ " dataset_id = dataset.dataset_id\n",
+ " if dataset_id.endswith(dataset_id_suffix):\n",
+ " dataset_ref = client.get_dataset(dataset.reference)\n",
+ " location = dataset_ref.location\n",
+ " print(f\"GA4 Dataset ID: {dataset_id}, Location: {location}\")\n",
+ " break\n",
+ "else:\n",
+ " print(f\"No dataset found with ID suffix: {dataset_id_suffix}\")\n",
+ "MAJ_MDS_DATA_LOCATION = location\n",
+ "MAJ_GA4_EXPORT_PROJECT_ID = GOOGLE_CLOUD_PROJECT_ID\n",
+ "MAJ_GA4_EXPORT_DATASET = dataset_id\n",
+ "\n",
+ "if MAJ_MDS_DATA_LOCATION == 'US':\n",
+ " MAJ_DEFAULT_REGION = 'us-central1'\n",
+ "elif MAJ_MDS_DATA_LOCATION == 'EU':\n",
+ " MAJ_DEFAULT_REGION = 'europe-west1'\n",
+ "else:\n",
+ " MAJ_DEFAULT_REGION = MAJ_MDS_DATA_LOCATION\n",
+ "MAJ_MDS_PROJECT_ID=MAJ_DEFAULT_PROJECT_ID\n",
+ "MAJ_MDS_DATAFORM_PROJECT_ID=MAJ_DEFAULT_PROJECT_ID\n",
+ "MAJ_FEATURE_STORE_PROJECT_ID=MAJ_DEFAULT_PROJECT_ID\n",
+ "MAJ_ACTIVATION_PROJECT_ID=MAJ_DEFAULT_PROJECT_ID\n",
+ "MAJ_ADS_EXPORT_PROJECT_ID = GOOGLE_CLOUD_PROJECT_ID\n",
+ "project_id=MAJ_ADS_EXPORT_PROJECT_ID\n",
+ "location = MAJ_MDS_DATA_LOCATION\n",
+ "table_suffix = MAJ_ADS_EXPORT_TABLE_SUFFIX\n",
+ "# Query to find datasets that contain tables with the specified suffix.\n",
+ "query = f\"\"\"\n",
+ " SELECT table_schema as dataset_id\n",
+ " FROM `{project_id}.region-{location}.INFORMATION_SCHEMA.TABLES`\n",
+ " WHERE table_name LIKE '%{table_suffix}'\n",
+ " GROUP BY table_schema\n",
+ "\"\"\"\n",
+ "# Run the query and fetch the results.\n",
+ "query_job = client.query(query)\n",
+ "results = query_job.result()\n",
+ "# Print the dataset IDs that match the criteria.\n",
+ "ads_dataset_id = ''\n",
+ "for row in results:\n",
+ " ads_dataset_id = row.dataset_id\n",
+ " print(f\"GAds dataset: {row.dataset_id}, Location: {location}\")\n",
+ "MAJ_ADS_EXPORT_DATASET = ads_dataset_id\n",
+ "\n",
+ "os.environ['MAJ_DEFAULT_REGION'] = MAJ_DEFAULT_REGION\n",
+ "os.environ['MAJ_MDS_PROJECT_ID'] = MAJ_MDS_PROJECT_ID\n",
+ "os.environ['MAJ_MDS_DATAFORM_PROJECT_ID'] = MAJ_MDS_DATAFORM_PROJECT_ID\n",
+ "os.environ['MAJ_FEATURE_STORE_PROJECT_ID'] = MAJ_FEATURE_STORE_PROJECT_ID\n",
+ "os.environ['MAJ_ACTIVATION_PROJECT_ID'] = MAJ_ACTIVATION_PROJECT_ID\n",
+ "os.environ['MAJ_MDS_DATA_LOCATION'] = MAJ_MDS_DATA_LOCATION\n",
+ "os.environ['MAJ_GA4_EXPORT_PROJECT_ID'] = MAJ_GA4_EXPORT_PROJECT_ID\n",
+ "os.environ['MAJ_GA4_EXPORT_DATASET'] = MAJ_GA4_EXPORT_DATASET\n",
+ "os.environ['MAJ_ADS_EXPORT_PROJECT_ID'] = MAJ_ADS_EXPORT_PROJECT_ID\n",
+ "os.environ['MAJ_ADS_EXPORT_DATASET'] = MAJ_ADS_EXPORT_DATASET\n",
+ "os.environ['MAJ_ADS_EXPORT_TABLE_SUFFIX'] = MAJ_ADS_EXPORT_TABLE_SUFFIX\n",
+ "os.environ['MAJ_WEBSITE_URL'] = MAJ_WEBSITE_URL\n",
+ "os.environ['MAJ_GA4_PROPERTY_ID'] = MAJ_GA4_PROPERTY_ID\n",
+ "os.environ['MAJ_GA4_STREAM_ID'] = MAJ_GA4_STREAM_ID\n",
+ "os.environ['MAJ_DATAFORM_REPO_OWNER_EMAIL'] = MAJ_DATAFORM_REPO_OWNER_EMAIL\n",
+ "os.environ['MAJ_DATAFORM_GITHUB_REPO_URL'] = MAJ_DATAFORM_GITHUB_REPO_URL\n",
+ "os.environ['MAJ_DATAFORM_GITHUB_TOKEN'] = MAJ_DATAFORM_GITHUB_TOKEN\n",
+ "\n",
+ "!sudo apt-get -qq -o=Dpkg::Use-Pty=0 install gettext\n",
+ "!envsubst < \"{SOURCE_ROOT}/infrastructure/cloudshell/terraform-template.tfvars\" > \"{SOURCE_ROOT}/infrastructure/terraform/terraform.tfvars\"\n",
+ "\n",
+ "!gcloud config set disable_prompts true\n",
+ "!gcloud config set project {PROJECT_ID}\n",
+ "\n",
+ "from IPython.display import clear_output\n",
+ "clear_output(wait=True)\n",
+ "print(\"SUCCESS\")"
+ ],
+ "metadata": {
+ "id": "dMcepKg8IQWj",
+ "cellView": "form",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "6abb8686-27d9-4a4d-bd25-bafd7ebc1a1c"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "SUCCESS\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### 3. Authenticate to Google Cloud Platform using application default credentials\n",
+ "\n",
+ "Click the ( ▶ ) button to create the application default credentials that Terraform uses to access the Google Cloud Project.\n",
+ "\n",
+ "*To complete this step, you will be prompted to copy/paste a password from another window into the prompt below.*\n",
+ "\n",
+ "**Note:** *Click on the hidden input box after the colon, as shown below.*\n",
+ "\n",
+ "![image (14).png]()\n",
+ "\n",
+ "***Time: 2 minutes.***"
+ ],
+ "metadata": {
+ "id": "mOISt4ShqIbc"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "!gcloud config set disable_prompts false\n",
+ "!gcloud auth application-default login --quiet --scopes=\"openid,https://www.googleapis.com/auth/userinfo.email,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/sqlservice.login,https://www.googleapis.com/auth/analytics,https://www.googleapis.com/auth/analytics.edit,https://www.googleapis.com/auth/analytics.provision,https://www.googleapis.com/auth/analytics.readonly,https://www.googleapis.com/auth/accounts.reauth\"\n",
+ "!gcloud auth application-default set-quota-project {PROJECT_ID}\n",
+ "!export GOOGLE_APPLICATION_CREDENTIALS=/content/.config/application_default_credentials.json\n",
+ "\n",
+ "clear_output(wait=True)\n",
+ "print(\"SUCCESS\")"
+ ],
+ "metadata": {
+ "id": "3cAwp6CRLSVf",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "39d3a2a0-811c-4506-d197-af7a7e086d74",
+ "cellView": "form"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "SUCCESS\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### 4. Prepare environment for Installation\n",
+ "\n",
+ "Click the ( ▶ ) button to prepare the environment for an end-to-end installation.\n",
+ "\n",
+ "***Time: 5 minutes.***"
+ ],
+ "metadata": {
+ "id": "WYG5sjFEqX2X"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "%%capture\n",
+ "%%bash\n",
+ "# prompt: install packages\n",
+ "apt-get install -y python3.10\n",
+ "export CLOUDSDK_PYTHON=python3.10\n",
+ "\n",
+ "#prompt: install uv\n",
+ "curl -LsSf https://astral.sh/uv/install.sh | sh\n",
+ "\n",
+ "export PATH=\"/root/.local/bin:$PATH\"\n",
+ "uv --version\n",
+ "\n",
+ "git clone --depth=1 https://github.com/tfutils/tfenv.git ~/.tfenv\n",
+ "echo 'export PATH=\"~/.tfenv/bin:$PATH\"' >> ~/.bash_profile\n",
+ "echo 'export PATH=$PATH:~/.tfenv/bin' >> ~/.bashrc\n",
+ "export PATH=\"$PATH:~/.tfenv/bin\"\n",
+ "\n",
+ "mkdir -p ~/.local/bin/\n",
+ ". ~/.profile\n",
+ "ln -s ~/.tfenv/bin/* ~/.local/bin\n",
+ "which tfenv\n",
+ "tfenv --version\n",
+ "\n",
+ "tfenv install 1.9.7\n",
+ "tfenv use 1.9.7\n",
+ "terraform --version\n",
+ "\n",
+ "export PATH=\"$PATH:~/.tfenv/bin\"\n",
+ "export PROJECT_ID=$(gcloud config get project --format=json | tr -d '\"')\n",
+ "source ./scripts/generate-tf-backend.sh"
+ ],
+ "metadata": {
+ "id": "hmdklTTuQ_9d",
+ "cellView": "form"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### 5. Run Installation\n",
+ "\n",
+ "Click the ( ▶ ) button to run the installation end-to-end.\n",
+ "After clicking the button, expand this section to observe that all cells have successfully executed without issues.\n",
+ "\n",
+ "***Time: 25-30 minutes.***"
+ ],
+ "metadata": {
+ "id": "US36yJ8lmqnP"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "%%capture\n",
+ "%%bash\n",
+ "export PATH=\"$PATH:~/.tfenv/bin\"\n",
+ "export GOOGLE_APPLICATION_CREDENTIALS=/content/.config/application_default_credentials.json\n",
+ "TERRAFORM_RUN_DIR=$(pwd)/infrastructure/terraform\n",
+ "terraform -chdir=\"${TERRAFORM_RUN_DIR}\" init"
+ ],
+ "metadata": {
+ "id": "5UIbC_z9bgy4",
+ "cellView": "form"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "%%capture\n",
+ "%%bash\n",
+ "export PATH=\"$PATH:~/.tfenv/bin\"\n",
+ "export PATH=\"/root/.local/bin:$PATH\"\n",
+ "export PATH=\"$PATH:$(which gcloud)\"\n",
+ "export GOOGLE_APPLICATION_CREDENTIALS=/content/.config/application_default_credentials.json\n",
+ "TERRAFORM_RUN_DIR=$(pwd)/infrastructure/terraform\n",
+ "terraform -chdir=\"${TERRAFORM_RUN_DIR}\" apply -target=module.data_store -auto-approve"
+ ],
+ "metadata": {
+ "id": "BGteib5ebsA-",
+ "cellView": "form"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "%%capture\n",
+ "%%bash\n",
+ "export PATH=\"$PATH:~/.tfenv/bin\"\n",
+ "export PATH=\"/root/.local/bin:$PATH\"\n",
+ "export PATH=\"$PATH:$(which gcloud)\"\n",
+ "export GOOGLE_APPLICATION_CREDENTIALS=/content/.config/application_default_credentials.json\n",
+ "TERRAFORM_RUN_DIR=$(pwd)/infrastructure/terraform\n",
+ "terraform -chdir=\"${TERRAFORM_RUN_DIR}\" apply -target=module.feature_store -auto-approve"
+ ],
+ "metadata": {
+ "cellView": "form",
+ "id": "dwD5DRRM2Ryl"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "%%capture\n",
+ "%%bash\n",
+ "export PATH=\"$PATH:~/.tfenv/bin\"\n",
+ "export PATH=\"/root/.local/bin:$PATH\"\n",
+ "export PATH=\"$PATH:$(which gcloud)\"\n",
+ "export GOOGLE_APPLICATION_CREDENTIALS=/content/.config/application_default_credentials.json\n",
+ "TERRAFORM_RUN_DIR=$(pwd)/infrastructure/terraform\n",
+ "terraform -chdir=\"${TERRAFORM_RUN_DIR}\" apply -target=module.pipelines -auto-approve"
+ ],
+ "metadata": {
+ "cellView": "form",
+ "id": "KrEr1yXS1_oA"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "%%capture\n",
+ "%%bash\n",
+ "export PATH=\"$PATH:~/.tfenv/bin\"\n",
+ "export PATH=\"/root/.local/bin:$PATH\"\n",
+ "export PATH=\"$PATH:$(which gcloud)\"\n",
+ "export GOOGLE_APPLICATION_CREDENTIALS=/content/.config/application_default_credentials.json\n",
+ "TERRAFORM_RUN_DIR=$(pwd)/infrastructure/terraform\n",
+ "terraform -chdir=\"${TERRAFORM_RUN_DIR}\" apply -target=module.activation -auto-approve"
+ ],
+ "metadata": {
+ "collapsed": true,
+ "cellView": "form",
+ "id": "7-Qr46vR2bLl"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "%%capture\n",
+ "%%bash\n",
+ "export PATH=\"$PATH:~/.tfenv/bin\"\n",
+ "export PATH=\"/root/.local/bin:$PATH\"\n",
+ "export PATH=\"$PATH:$(which gcloud)\"\n",
+ "export GOOGLE_APPLICATION_CREDENTIALS=/content/.config/application_default_credentials.json\n",
+ "TERRAFORM_RUN_DIR=$(pwd)/infrastructure/terraform\n",
+ "terraform -chdir=\"${TERRAFORM_RUN_DIR}\" apply -target=module.monitoring -auto-approve"
+ ],
+ "metadata": {
+ "collapsed": true,
+ "cellView": "form",
+ "id": "ElOBpEV3Mtbc"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "%%capture\n",
+ "%%bash\n",
+ "export PATH=\"$PATH:~/.tfenv/bin\"\n",
+ "export PATH=\"/root/.local/bin:$PATH\"\n",
+ "export PATH=\"$PATH:$(which gcloud)\"\n",
+ "export GOOGLE_APPLICATION_CREDENTIALS=/content/.config/application_default_credentials.json\n",
+ "TERRAFORM_RUN_DIR=$(pwd)/infrastructure/terraform\n",
+ "terraform -chdir=\"${TERRAFORM_RUN_DIR}\" apply -auto-approve"
+ ],
+ "metadata": {
+ "cellView": "form",
+ "id": "eyZNdewu2zQI"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "print(\"SUCCESS!\")"
+ ],
+ "metadata": {
+ "id": "1h7k6jFYpLPO",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "266267e5-ac12-4621-ec7f-19e051027edb",
+ "cellView": "form"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "SUCCESS!\n"
+ ]
+ }
+ ]
+ }
+ ]
+}
\ No newline at end of file
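The configuration cell in the notebook derives the default region from the BigQuery dataset location (`US` maps to `us-central1`, `EU` to `europe-west1`, anything else is used as-is) and picks the GA4 export dataset by its ID suffix. One detail worth noting: in the cell as written, `dataset_id` is reassigned on every loop iteration, so when no dataset matches it still holds the last dataset listed. The sketch below restates both steps as small pure functions that return `None` when nothing matches. The function names are hypothetical, and a plain list of dataset IDs stands in for `client.list_datasets()`.

```python
from typing import Iterable, Optional


def default_region_for(location: str) -> str:
    """Map a BigQuery multi-region to a default compute region; pass regions through."""
    multi_region_defaults = {"US": "us-central1", "EU": "europe-west1"}
    return multi_region_defaults.get(location, location)


def find_dataset_by_suffix(dataset_ids: Iterable[str], suffix: str) -> Optional[str]:
    """Return the first dataset ID ending with `suffix`, or None if there is no match."""
    for dataset_id in dataset_ids:
        if dataset_id.endswith(suffix):
            return dataset_id
    return None


# Illustrative values only; real IDs come from the BigQuery client.
datasets = ["marketing_raw", "analytics_1234567890"]
print(find_dataset_by_suffix(datasets, "1234567890"))
print(default_region_for("US"))
```

Returning `None` lets the caller stop early instead of writing an unrelated dataset ID into `terraform.tfvars`.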
diff --git a/pyproject.toml b/pyproject.toml
index 3e0abe24..d806eea1 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -12,6 +12,15 @@
# See the License for the specific language governing permissions and
# limitations under the License.
+[project]
+name = "marketing-analytics-jumpstart"
+version = "1.0.0"
+description = "Marketing Analytics Jumpstart"
+authors = [{name = "Marketing Analytics Solutions Architects", email = "ma-se@google.com"}]
+license = "Apache 2.0"
+readme = "README.md"
+requires-python = ">=3.8,<3.11"
+
[tool.poetry]
name = "marketing-analytics-jumpstart"
version = "1.0.0"
@@ -23,7 +32,8 @@ packages = [{include = "python"}]
[tool.poetry.dependencies]
python = ">=3.8,<3.11"
-google-cloud-aiplatform = "1.52.0"
+#google-cloud-aiplatform = "1.52.0"
+google-cloud-aiplatform = "1.70.0"
shapely = "<2.0.0"
google-cloud = "^0.34.0"
jinja2 = ">=3.0.1,<4.0.0"
@@ -37,11 +47,13 @@ google-cloud-bigquery = "2.30.0"
google-cloud-pipeline-components = "2.6.0"
google-auth = "^2.14.1"
google-cloud-storage = "^2.6.0"
+kfp = "2.4.0"
## Fixing this error: https://stackoverflow.com/questions/76175487/sudden-importerror-cannot-import-name-appengine-from-requests-packages-urlli
-kfp = "2.0.0-rc.2"
+#kfp = "2.0.0-rc.2"
#kfp = {version = "2.0.0-b12", allow-prereleases = true}
#kfp = {version = "2.0.0-b16", allow-prereleases = true}
-kfp-server-api = "2.0.0-rc.1"
+kfp-server-api = "2.0.5"
+#kfp-server-api = "2.0.0-rc.1"
#kfp-server-api = "2.0.0.a6"
#kfp-server-api = "2.0.0b1"
urllib3 = "1.26.18"
@@ -62,9 +74,11 @@ pyarrow = "15.0.2"
google-auth-oauthlib = "^1.2.1"
oauth2client = "^4.1.3"
google-cloud-core = "^2.4.1"
+sympy="1.13.1"
+google-cloud-resource-manager="1.13.0"
[tool.poetry.group.component_vertex.dependencies]
-google-cloud-aiplatform = "1.52.0"
+google-cloud-aiplatform = "1.70.0"
shapely = "<2.0.0"
toml = "0.10.2"
diff --git a/python/activation/main.py b/python/activation/main.py
index 6bf3ff15..21c416bd 100644
--- a/python/activation/main.py
+++ b/python/activation/main.py
@@ -62,6 +62,7 @@ def _add_argparse_args(cls, parser):
- purchase-propensity-15-15
- purchase-propensity-15-7
- churn-propensity-30-15
+ - lead-score-propensity-5-1
activation_type_configuration: The GCS path to the configuration file for all activation types.
"""
@@ -110,6 +111,7 @@ def _add_argparse_args(cls, parser):
purchase-propensity-15-15
purchase-propensity-15-7
churn-propensity-30-15
+ lead-score-propensity-5-1
''',
required=True
)
@@ -330,7 +332,6 @@ class TransformToPayload(beam.DoFn):
The DoFn takes the following arguments:
- - template_str: The Jinja2 template string used to generate the Measurement Protocol payload.
- event_name: The name of the event to be sent to Google Analytics 4.
The DoFn yields the following output:
@@ -338,33 +339,28 @@ class TransformToPayload(beam.DoFn):
- A dictionary containing the Measurement Protocol payload.
The DoFn performs the following steps:
-
1. Removes bad shaping strings in the `client_id` field.
- 2. Renders the Jinja2 template string using the provided data and event name.
- 3. Converts the rendered template string into a JSON object.
+ 2. Builds the Measurement Protocol payload dictionary from the element fields.
4. Handles any JSON decoding errors.
The DoFn is used to ensure that the Measurement Protocol payload is formatted correctly before being sent to Google Analytics 4.
"""
- def __init__(self, template_str, event_name):
+ def __init__(self, event_name):
"""
Initializes the DoFn.
Args:
- template_str: The Jinja2 template string used to generate the Measurement Protocol payload.
event_name: The name of the event to be sent to Google Analytics 4.
"""
- self.template_str = template_str
self.date_format = "%Y-%m-%d"
self.date_time_format = "%Y-%m-%d %H:%M:%S.%f %Z"
self.event_name = event_name
-
-
- def setup(self):
- """
- Sets up the Jinja2 environment.
- """
- self.payload_template = Environment(loader=BaseLoader).from_string(self.template_str)
+ self.consent_obj = {
+ 'ad_user_data':'GRANTED',
+ 'ad_personalization':'GRANTED'
+ }
+ self.user_property_prefix = 'user_prop_'
+ self.event_parameter_prefix = 'event_param_'
def process(self, element):
@@ -384,21 +380,17 @@ def process(self, element):
_client_id = element['client_id'].replace(r'', '')
_client_id = element['client_id'].replace(r'q=">', '')
-
- payload_str = self.payload_template.render(
- client_id=_client_id,
- user_id=self.generate_user_id_key_value_pair(element),
- event_timestamp=self.date_to_micro(element["inference_date"]),
- event_name=self.event_name,
- session_id=element['session_id'],
- user_properties=self.generate_user_properties(element),
- )
+
result = {}
- try:
- result = json.loads(r'{}'.format(payload_str))
- except json.decoder.JSONDecodeError as e:
- logging.error(payload_str)
- logging.error(traceback.format_exc())
+ result['client_id'] = _client_id
+ if element['user_id']:
+ result['user_id'] = element['user_id']
+ result['timestamp_micros'] = self.date_to_micro(element["inference_date"])
+ result['nonPersonalizedAds'] = False
+ result['consent'] = self.consent_obj
+ result['user_properties'] = self.extract_user_properties(element)
+ result['events'] = [self.extract_event(element)]
+
yield result
@@ -419,62 +411,40 @@ def date_to_micro(self, date_str):
return int(datetime.datetime.strptime(date_str, self.date_format).timestamp() * 1E6)
- def generate_param_fields(self, element):
+ def extract_user_properties(self, element):
"""
- Generates a JSON string containing the parameter fields of the element.
+ Generates a dictionary containing the user properties of the element.
Args:
element: The element to be processed.
Returns:
- A JSON string containing the parameter fields of the element.
+ A dictionary containing the user properties of the element.
"""
- element_copy = element.copy()
- del element_copy['client_id']
- del element_copy['user_id']
- del element_copy['session_id']
- del element_copy['inference_date']
- element_copy = {k: v for k, v in element_copy.items() if v}
- return json.dumps(element_copy, cls=DecimalEncoder)
+ user_properties = {}
+ for k, v in element.items():
+ if k.startswith(self.user_property_prefix) and v:
+ user_properties[k[len(self.user_property_prefix):]] = {'value': str(v)}
+ return user_properties
-
- def generate_user_properties(self, element):
- """
- Generates a JSON string containing the user properties of the element.
-
- Args:
- element: The element to be processed.
-
- Returns:
- A JSON string containing the user properties of the element.
+ def extract_event(self, element):
"""
- element_copy = element.copy()
- del element_copy['client_id']
- del element_copy['user_id']
- del element_copy['session_id']
- del element_copy['inference_date']
- user_properties_obj = {}
- for k, v in element_copy.items():
- if v:
- user_properties_obj[k] = {'value': str(v)}
- return json.dumps(user_properties_obj, cls=DecimalEncoder)
-
+ Generates a dictionary containing the event parameters from the element.
- def generate_user_id_key_value_pair(self, element):
- """
- If the user_id field is not empty generate the key/value string with the user_id.
- else return empty string
Args:
element: The element to be processed.
Returns:
- A string containing the key and value with the user_id.
+ A dictionary containing the event parameters from the element.
"""
- user_id = element['user_id']
- if user_id:
- return f'"user_id": "{user_id}",'
- return ""
-
+ event = {
+ 'name': self.event_name,
+ 'params': {}
+ }
+ for k, v in element.items():
+ if k.startswith(self.event_parameter_prefix) and v:
+ event['params'][k[len(self.event_parameter_prefix):]] = v
+ return event
@@ -519,8 +489,7 @@ def load_activation_type_configuration(args):
# Create the activation type configuration dictionary.
configuration = {
'activation_event_name': activation_config['activation_event_name'],
- 'source_query_template': Environment(loader=BaseLoader).from_string(gcs_read_file(args.project, activation_config['source_query_template']).replace('\n', ' ')),
- 'measurement_protocol_payload_template': gcs_read_file(args.project, activation_config['measurement_protocol_payload_template'])
+ 'source_query_template': Environment(loader=BaseLoader).from_string(gcs_read_file(args.project, activation_config['source_query_template']).replace('\n', ' '))
}
return configuration
@@ -589,7 +558,7 @@ def run(argv=None):
query=load_from_source_query,
use_json_exports=True,
use_standard_sql=True)
- | 'Prepare Measurement Protocol API payload' >> beam.ParDo(TransformToPayload(activation_type_configuration['measurement_protocol_payload_template'], activation_type_configuration['activation_event_name']))
+ | 'Prepare Measurement Protocol API payload' >> beam.ParDo(TransformToPayload(activation_type_configuration['activation_event_name']))
| 'POST event to Measurement Protocol API' >> beam.ParDo(CallMeasurementProtocolAPI(activation_options.ga4_measurement_id, activation_options.ga4_api_secret, debug=activation_options.use_api_validation))
)
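Review note: the new `TransformToPayload` logic above builds the event dictionary directly from the row instead of rendering a payload template. A minimal standalone sketch of that filtering step follows. The function name `build_event`, the prefix value `event_param_`, and the sample row are assumptions made for illustration and are not taken from the repository.

```python
from typing import Any, Dict


def build_event(element: Dict[str, Any], event_name: str, event_parameter_prefix: str) -> Dict[str, Any]:
    """Collect the non-empty, prefixed columns of a row into event params."""
    event = {'name': event_name, 'params': {}}
    for key, value in element.items():
        # Keep only columns that carry the prefix and hold a truthy value,
        # and strip the prefix from the resulting parameter name.
        if key.startswith(event_parameter_prefix) and value:
            event['params'][key[len(event_parameter_prefix):]] = value
    return event


# Hypothetical row: the prefix and column names are illustrative only.
row = {'user_id': 'u1', 'event_param_decile': 3, 'event_param_segment': ''}
event = build_event(row, 'maj_example_event', 'event_param_')
```

Because the filter is a truthiness check, parameters whose value is `0` or `0.0` are dropped along with empty strings and `None`.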
diff --git a/python/base_component_image/pyproject.toml b/python/base_component_image/pyproject.toml
index 3ce3fc2f..49aabede 100644
--- a/python/base_component_image/pyproject.toml
+++ b/python/base_component_image/pyproject.toml
@@ -2,25 +2,29 @@
name = "ma-components"
version = "1.0.0"
description = "contains components used in marketing analytics project. the need is to package the components and containerise so that they can be used from the python function based component"
-authors = ["Christos Aniftos "]
+authors = ["Marketing Analytics Solutions Architects "]
+license = "Apache 2.0"
readme = "README.md"
packages = [{include = "ma_components"}]
[tool.poetry.dependencies]
python = ">=3.8,<3.11"
pip = "23.3"
+kfp = "2.4.0"
## Fixing this error: https://stackoverflow.com/questions/76175487/sudden-importerror-cannot-import-name-appengine-from-requests-packages-urlli
-kfp = "2.0.0-rc.2"
+#kfp = "2.0.0-rc.2"
#kfp = {version = "2.0.0-b12", allow-prereleases = true}
#kfp = {version = "2.0.0-b16", allow-prereleases = true}
-kfp-server-api = "2.0.0-rc.1"
+kfp-server-api = "2.0.5"
+#kfp-server-api = "2.0.0-rc.1"
#kfp-server-api = "2.0.0.a6"
#kfp-server-api = "2.0.0b1"
urllib3 = "1.26.18"
toml = "^0.10.2"
docker = "^6.0.1"
google-cloud-bigquery = "2.30.0"
-google-cloud-aiplatform = "1.52.0"
+#google-cloud-aiplatform = "1.52.0"
+google-cloud-aiplatform = "1.70.0"
shapely = "<2.0.0"
google-cloud-pubsub = "2.15.0"
#google-cloud-pipeline-components = "1.0.33"
@@ -35,6 +39,8 @@ pyarrow = "15.0.2"
google-auth-oauthlib = "^1.2.1"
oauth2client = "^4.1.3"
google-cloud-core = "^2.4.1"
+sympy="1.13.1"
+google-cloud-resource-manager="1.13.0"
[build-system]
requires = ["poetry-core>=1.0.0"]
diff --git a/python/ga4_setup/setup.py b/python/ga4_setup/setup.py
index 03204812..dd4c885d 100644
--- a/python/ga4_setup/setup.py
+++ b/python/ga4_setup/setup.py
@@ -276,6 +276,7 @@ def create_custom_dimensions(configuration: map):
create_custom_dimensions_for('CLTV', ['cltv_decile'], existing_dimensions, configuration)
create_custom_dimensions_for('Auto Audience Segmentation', ['a_a_s_prediction'], existing_dimensions, configuration)
create_custom_dimensions_for('Churn Propensity', ['c_p_prediction', 'c_p_decile'], existing_dimensions, configuration)
+ create_custom_dimensions_for('Lead Score Propensity', ['l_s_p_prediction', 'l_s_p_decile'], existing_dimensions, configuration)
@@ -513,9 +514,14 @@ def entry():
if args.ga4_resource == "check_property_type":
property = get_property(configuration)
- result = {
- 'supported': f"{property.property_type == property.property_type.PROPERTY_TYPE_ORDINARY}"
- }
+ is_property_supported = set((property.property_type.PROPERTY_TYPE_ORDINARY, property.property_type.PROPERTY_TYPE_SUBPROPERTY, property.property_type.PROPERTY_TYPE_ROLLUP))
+
+ result = {}
+ if property.property_type in is_property_supported:
+ result = {'supported': "True"}
+ else:
+ result = {'supported': "False"}
+
print(json.dumps(result))
# python setup.py --ga4_resource=custom_events
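Review note: the property-type check above can be read as a plain set-membership test. The sketch below reproduces it with a stand-in enum so that it runs without the Analytics Admin client. `PropertyType`, its numeric values, and `check_property_type` are hypothetical names used only for illustration.

```python
import json
from enum import Enum


class PropertyType(Enum):
    # Stand-in for the client library enum; the values are illustrative.
    PROPERTY_TYPE_UNSPECIFIED = 0
    PROPERTY_TYPE_ORDINARY = 1
    PROPERTY_TYPE_SUBPROPERTY = 2
    PROPERTY_TYPE_ROLLUP = 3


# Property types accepted by the check in the diff above.
SUPPORTED_PROPERTY_TYPES = frozenset({
    PropertyType.PROPERTY_TYPE_ORDINARY,
    PropertyType.PROPERTY_TYPE_SUBPROPERTY,
    PropertyType.PROPERTY_TYPE_ROLLUP,
})


def check_property_type(property_type: PropertyType) -> dict:
    """Return the same {'supported': "True"|"False"} shape as the script."""
    return {'supported': str(property_type in SUPPORTED_PROPERTY_TYPES)}


result_json = json.dumps(check_property_type(PropertyType.PROPERTY_TYPE_ROLLUP))
```

Keeping the value as the strings `"True"` and `"False"` matches the output format of the original code, so any caller that parses it keeps working.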
diff --git a/python/lookerstudio/README.md b/python/lookerstudio/README.md
index dc019f63..aa624b3f 100644
--- a/python/lookerstudio/README.md
+++ b/python/lookerstudio/README.md
@@ -1,5 +1,30 @@
# Marketing Analytics Jumpstart Looker Studio Dashboard
+## Prerequisites
+This Looker Studio dashboard relies on specific BigQuery tables that should be present in your project. These tables are created during the deployment of the Marketing Analytics Jumpstart and by the data processing pipelines of the solution.
+Before deploying the dashboard, make sure the prerequisite tables listed below exist. If any table is missing, check that the corresponding source process has run successfully.
+
+| Table | Dataset | Source Process | Troubleshooting Link |
+| -------- | ------- | ------- | --------- |
+| session_date | marketing_ga4_v1_* | Dataform Execution| [Workflow Execution Logs](https://console.cloud.google.com/bigquery/dataform/locations/us-central1/repositories/marketing-analytics/details/workflows) |
+| session_device_daily_metrics | marketing_ga4_v1_* | Dataform Execution| [Workflow Execution Logs](https://console.cloud.google.com/bigquery/dataform/locations/us-central1/repositories/marketing-analytics/details/workflows) |
+| latest | aggregated_predictions | feature-store terraform module and aggregated_predictions.aggregate_last_day_predictions stored procedure | [Aggregating stored procedure](https://console.cloud.google.com/bigquery?ws=!1m5!1m4!6m3!1s!2saggregated_predictions!3saggregate_last_day_predictions) |
+| resource_link | maj_dashboard | monitor terraform module | [Dashboard dataset](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1s!2smaj_dashboard) |
+| dataform_googleapis_com_workflow_invocation_completion | maj_logs | monitor terraform module | [maj_logs dataset](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1s!2smaj_logs) |
+| event | marketing_ga4_base_* | Dataform Execution | [Workflow Execution Logs](https://console.cloud.google.com/bigquery/dataform/locations/us-central1/repositories/marketing-analytics/details/workflows) |
+| session_location_daily_metrics | marketing_ga4_v1_* | Dataform Execution | [Workflow Execution Logs](https://console.cloud.google.com/bigquery/dataform/locations/us-central1/repositories/marketing-analytics/details/workflows) |
+| aggregated_value_based_bidding_volume_weekly | aggregated_vbb | feature-store terraform module and aggregated_vbb.invoke_aggregated_value_based_bidding_explanation_preparation stored procedure | [aggregated_value_based_bidding_explanation_preparation](https://console.cloud.google.com/bigquery?ws=!1m5!1m4!6m3!1s!2saggregated_vbb!3sinvoke_aggregated_value_based_bidding_explanation_preparation) |
+| event_page | marketing_ga4_v1_* | Dataform Execution| [Workflow Execution Logs](https://console.cloud.google.com/bigquery/dataform/locations/us-central1/repositories/marketing-analytics/details/workflows) |
+| unique_page_views | marketing_ga4_v1_* | Dataform Execution| [Workflow Execution Logs](https://console.cloud.google.com/bigquery/dataform/locations/us-central1/repositories/marketing-analytics/details/workflows) |
+| aggregated_value_based_bidding_correlation | aggregated_vbb | feature-store terraform module and aggregated_vbb.invoke_aggregated_value_based_bidding_explanation_preparation stored procedure | [aggregated_value_based_bidding_explanation_preparation](https://console.cloud.google.com/bigquery?ws=!1m5!1m4!6m3!1s!2saggregated_vbb!3sinvoke_aggregated_value_based_bidding_explanation_preparation) |
+| ad_performance_conversions | marketing_ads_v1_* | Dataform Execution | [Workflow Execution Logs](https://console.cloud.google.com/bigquery/dataform/locations/us-central1/repositories/marketing-analytics/details/workflows) |
+| user_behaviour_revenue_insights_daily | gemini_insights | feature-store terraform module and gemini_insights.user_behaviour_revenue_insights stored procedure | [User Behaviour Revenue Insights](https://console.cloud.google.com/bigquery?ws=!1m5!1m4!6m3!1s!2sgemini_insights!3suser_behaviour_revenue_insights) |
+| dataflow_googleapis_com_job_message | maj_logs | monitor terraform module | [maj_logs dataset](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1s!2smaj_logs) |
+| vbb_weights | aggregated_vbb | feature-store terraform module and VBB explanation pipeline | [VBB Explanation Pipeline](https://console.cloud.google.com/vertex-ai/pipelines/schedules) |
+| page_session_daily_metrics | marketing_ga4_v1_* | Dataform Execution| [Workflow Execution Logs](https://console.cloud.google.com/bigquery/dataform/locations/us-central1/repositories/marketing-analytics/details/workflows) |
+| aiplatform_googleapis_com_pipeline_job_events | maj_logs | monitor terraform module | [maj_logs dataset](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1s!2smaj_logs) |
+| aggregated_value_based_bidding_volume_daily | aggregated_vbb | feature-store terraform module and aggregated_vbb.invoke_aggregated_value_based_bidding_explanation_preparation stored procedure | [aggregated_value_based_bidding_explanation_preparation](https://console.cloud.google.com/bigquery?ws=!1m5!1m4!6m3!1s!2saggregated_vbb!3sinvoke_aggregated_value_based_bidding_explanation_preparation) |
+
## Extract Looker Studio dashboard URL
Extract the URL used to create the dashboard from the Terraform output value:
diff --git a/python/pipelines/automl_tabular_pl_v4.yaml b/python/pipelines/automl_tabular_pl_v4.yaml
index 4d20b803..6bdc8cfb 100644
--- a/python/pipelines/automl_tabular_pl_v4.yaml
+++ b/python/pipelines/automl_tabular_pl_v4.yaml
@@ -11151,21 +11151,21 @@ root:
isOptional: true
parameterType: BOOLEAN
distill_batch_predict_machine_type:
- defaultValue: n1-standard-16
+ defaultValue: n1-highmem-8
description: 'The prediction server machine type for
batch predict component in the model distillation.'
isOptional: true
parameterType: STRING
distill_batch_predict_max_replica_count:
- defaultValue: 25.0
+ defaultValue: 5.0
description: 'The max number of prediction server
for batch predict component in the model distillation.'
isOptional: true
parameterType: NUMBER_INTEGER
distill_batch_predict_starting_replica_count:
- defaultValue: 25.0
+ defaultValue: 5.0
description: 'The initial number of
prediction server for batch predict component in the model distillation.'
@@ -11201,14 +11201,14 @@ root:
isOptional: true
parameterType: STRING
evaluation_batch_explain_max_replica_count:
- defaultValue: 10.0
+ defaultValue: 5.0
description: 'The max number of prediction
server for batch explain components during evaluation.'
isOptional: true
parameterType: NUMBER_INTEGER
evaluation_batch_explain_starting_replica_count:
- defaultValue: 10.0
+ defaultValue: 5.0
description: 'The initial number of
prediction server for batch explain components during evaluation.'
@@ -11222,14 +11222,14 @@ root:
isOptional: true
parameterType: STRING
evaluation_batch_predict_max_replica_count:
- defaultValue: 20.0
+ defaultValue: 5.0
description: 'The max number of prediction
server for batch predict components during evaluation.'
isOptional: true
parameterType: NUMBER_INTEGER
evaluation_batch_predict_starting_replica_count:
- defaultValue: 20.0
+ defaultValue: 5.0
description: 'The initial number of
prediction server for batch predict components during evaluation.'
@@ -11279,7 +11279,7 @@ root:
description: The GCP region that runs the pipeline components.
parameterType: STRING
max_selected_features:
- defaultValue: 1000.0
+ defaultValue: 100.0
description: number of features to select for training.
isOptional: true
parameterType: NUMBER_INTEGER
@@ -11356,7 +11356,7 @@ root:
isOptional: true
parameterType: BOOLEAN
stage_1_num_parallel_trials:
- defaultValue: 35.0
+ defaultValue: 5.0
description: Number of parallel trials for stage 1.
isOptional: true
parameterType: NUMBER_INTEGER
@@ -11367,7 +11367,7 @@ root:
isOptional: true
parameterType: LIST
stage_2_num_parallel_trials:
- defaultValue: 35.0
+ defaultValue: 5.0
description: Number of parallel trials for stage 2.
isOptional: true
parameterType: NUMBER_INTEGER
diff --git a/python/pipelines/compiler.py b/python/pipelines/compiler.py
index 6b5224dd..97bbc62c 100644
--- a/python/pipelines/compiler.py
+++ b/python/pipelines/compiler.py
@@ -31,6 +31,7 @@
'vertex_ai.pipelines.feature-creation-purchase-propensity.execution': "pipelines.feature_engineering_pipelines.purchase_propensity_feature_engineering_pipeline",
'vertex_ai.pipelines.feature-creation-churn-propensity.execution': "pipelines.feature_engineering_pipelines.churn_propensity_feature_engineering_pipeline",
'vertex_ai.pipelines.feature-creation-customer-ltv.execution': "pipelines.feature_engineering_pipelines.customer_lifetime_value_feature_engineering_pipeline",
+ 'vertex_ai.pipelines.feature-creation-lead-score-propensity.execution': "pipelines.feature_engineering_pipelines.lead_score_propensity_feature_engineering_pipeline",
'vertex_ai.pipelines.auto_segmentation.training': "pipelines.auto_segmentation_pipelines.training_pl",
'vertex_ai.pipelines.auto_segmentation.prediction': "pipelines.auto_segmentation_pipelines.prediction_pl",
'vertex_ai.pipelines.segmentation.training': "pipelines.segmentation_pipelines.training_pl",
@@ -39,6 +40,8 @@
'vertex_ai.pipelines.purchase_propensity.prediction': "pipelines.tabular_pipelines.prediction_binary_classification_pl",
'vertex_ai.pipelines.churn_propensity.training': None, # tabular workflows pipelines is precompiled
'vertex_ai.pipelines.churn_propensity.prediction': "pipelines.tabular_pipelines.prediction_binary_classification_pl",
+ 'vertex_ai.pipelines.lead_score_propensity.training': None, # tabular workflows pipelines is precompiled
+ 'vertex_ai.pipelines.lead_score_propensity.prediction': "pipelines.tabular_pipelines.prediction_binary_classification_pl",
'vertex_ai.pipelines.propensity_clv.training': None, # tabular workflows pipelines is precompiled
'vertex_ai.pipelines.clv.training': None, # tabular workflows pipelines is precompiled
'vertex_ai.pipelines.clv.prediction': "pipelines.tabular_pipelines.prediction_binary_classification_regression_pl",
diff --git a/python/pipelines/components/bigquery/component.py b/python/pipelines/components/bigquery/component.py
index c4aa542f..e52a511e 100644
--- a/python/pipelines/components/bigquery/component.py
+++ b/python/pipelines/components/bigquery/component.py
@@ -879,7 +879,7 @@ def bq_dynamic_query_exec_output(
# Construct query template
template = jinja2.Template("""
CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.{{create_table}}` AS (
- SELECT
+ SELECT DISTINCT
feature,
ROUND(100 * SUM(users) OVER (ORDER BY users DESC) / SUM(users) OVER (), 2) as cumulative_traffic_percent,
@@ -892,7 +892,7 @@ def bq_dynamic_query_exec_output(
SELECT
user_pseudo_id,
user_id,
- page_location as page_path
+ LOWER(page_location) as page_path
FROM `{{mds_project_id}}.{{mds_dataset}}.event`
WHERE
event_name = 'page_view'
@@ -1423,4 +1423,4 @@ def execute_query_with_retries(query):
logging.error(f"Query failed after retries: {e}")
-
\ No newline at end of file
+
diff --git a/python/pipelines/feature_engineering_pipelines.py b/python/pipelines/feature_engineering_pipelines.py
index deb7b88b..a15ffa12 100644
--- a/python/pipelines/feature_engineering_pipelines.py
+++ b/python/pipelines/feature_engineering_pipelines.py
@@ -196,8 +196,73 @@ def audience_segmentation_feature_engineering_pipeline(
location=location,
query=query_audience_segmentation_inference_preparation,
timeout=timeout).set_display_name('audience_segmentation_inference_preparation').after(*phase_1)
-
-
+
+
+@dsl.pipeline()
+def lead_score_propensity_feature_engineering_pipeline(
+ project_id: str,
+ location: Optional[str],
+ query_lead_score_propensity_label: str,
+ query_user_dimensions: str,
+ query_user_rolling_window_metrics: str,
+ query_lead_score_propensity_inference_preparation: str,
+ query_lead_score_propensity_training_preparation: str,
+ timeout: Optional[float] = 3600.0
+):
+ """
+ This pipeline defines the steps for feature engineering for the lead score propensity model.
+
+ Args:
+ project_id: The Google Cloud project ID.
+ location: The Google Cloud region where the pipeline will be run.
+ query_lead_score_propensity_label: The SQL query that will be used to calculate the lead score propensity label.
+ query_user_dimensions: The SQL query that will be used to calculate the user dimensions.
+ query_user_rolling_window_metrics: The SQL query that will be used to calculate the user rolling window metrics.
+ query_lead_score_propensity_inference_preparation: The SQL query that will be used to prepare the inference data.
+ query_lead_score_propensity_training_preparation: The SQL query that will be used to prepare the training data.
+ timeout: The timeout for the pipeline in seconds.
+
+ Returns:
+ None
+ """
+
+ # Features Preparation
+ phase_1 = list()
+ phase_1.append(
+ sp(
+ project=project_id,
+ location=location,
+ query=query_lead_score_propensity_label,
+ timeout=timeout).set_display_name('lead_score_propensity_label')
+ )
+ phase_1.append(
+ sp(
+ project=project_id,
+ location=location,
+ query=query_user_dimensions,
+ timeout=timeout).set_display_name('user_dimensions')
+ )
+ phase_1.append(
+ sp(
+ project=project_id,
+ location=location,
+ query=query_user_rolling_window_metrics,
+ timeout=timeout).set_display_name('user_rolling_window_metrics')
+ )
+ # Training data preparation
+ lead_score_propensity_train_prep = sp(
+ project=project_id,
+ location=location,
+ query=query_lead_score_propensity_training_preparation,
+ timeout=timeout).set_display_name('lead_score_propensity_training_preparation').after(*phase_1)
+ # Inference data preparation
+ lead_score_propensity_inf_prep = sp(
+ project=project_id,
+ location=location,
+ query=query_lead_score_propensity_inference_preparation,
+ timeout=timeout).set_display_name('lead_score_propensity_inference_preparation').after(*phase_1)
+
+
@dsl.pipeline()
def purchase_propensity_feature_engineering_pipeline(
project_id: str,
diff --git a/python/pipelines/pipeline_ops.py b/python/pipelines/pipeline_ops.py
index a1b94675..2b152b29 100644
--- a/python/pipelines/pipeline_ops.py
+++ b/python/pipelines/pipeline_ops.py
@@ -17,6 +17,7 @@
from tracemalloc import start
import pip
+from sympy import preview
from kfp import compiler
from google.cloud.aiplatform.pipeline_jobs import PipelineJob, _set_enable_caching_value
from google.cloud.aiplatform import TabularDataset, Artifact
@@ -625,6 +626,30 @@ def get_gcp_bearer_token() -> str:
return bearer_token
+def _get_project_number(project_id) -> str:
+ """
+ Retrieves the project number from a project id
+
+ Returns:
+ A string containing the project number
+
+ Raises:
+ Exception: If an error occurs while retrieving the resource manager project object.
+ """
+ from google.cloud import resourcemanager_v3
+
+ # Create a resource manager client
+ client = resourcemanager_v3.ProjectsClient()
+
+ # Get the project number
+ project = client.get_project(name=f"projects/{project_id}").name
+ project_number = project.split('/')[-1]
+
+ logging.info(f"Project Number: {project_number}")
+
+ return project_number
+
+
# Function to schedule the pipeline.
def schedule_pipeline(
project_id: str,
@@ -637,6 +662,8 @@ def schedule_pipeline(
max_concurrent_run_count: str,
start_time: str,
end_time: str,
+ subnetwork: str = "default",
+ use_private_service_access: bool = False,
pipeline_parameters: Dict[str, Any] = None,
pipeline_parameters_substitutions: Optional[Dict[str, Any]] = None,
) -> dict:
@@ -654,6 +681,8 @@ def schedule_pipeline(
max_concurrent_run_count: The maximum number of concurrent pipeline runs.
start_time: The start time of the schedule.
end_time: The end time of the schedule.
+ subnetwork: The name of the VPC network used to build the peering network identifier when private service access is enabled.
+ use_private_service_access: A flag to define whether to use the VPC private service access or not.
Returns:
A dictionary containing information about the scheduled pipeline.
@@ -663,6 +692,9 @@ def schedule_pipeline(
"""
from google.cloud import aiplatform
+ from google.cloud.aiplatform.preview.pipelinejobschedule import (
+ pipeline_job_schedules as preview_pipeline_job_schedules,
+ )
# Substitute pipeline parameters with necessary substitutions
if pipeline_parameters_substitutions != None:
@@ -676,19 +708,55 @@ def schedule_pipeline(
pipeline_job = aiplatform.PipelineJob(
template_path=template_path,
pipeline_root=pipeline_root,
+ location=region,
display_name=f"{pipeline_name}",
)
- # Create the schedule with the pipeline job defined
- pipeline_job_schedule = pipeline_job.create_schedule(
+ # https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJobSchedule
+ # Create a schedule for the pipeline job
+ pipeline_job_schedule = preview_pipeline_job_schedules.PipelineJobSchedule(
display_name=f"{pipeline_name}",
- cron=cron,
- max_concurrent_run_count=max_concurrent_run_count,
- start_time=start_time,
- end_time=end_time,
- service_account=pipeline_sa,
+ pipeline_job=pipeline_job,
+ location=region
)
+ # Get the project number to use in the network identifier
+ project_number = _get_project_number(project_id)
+
+ # Create the schedule using the pipeline job schedule
+ # Using the VPC private service access or not, depending on the flag
+ if use_private_service_access:
+ pipeline_job_schedule.create(
+ cron_expression=cron,
+ max_concurrent_run_count=max_concurrent_run_count,
+ start_time=start_time,
+ end_time=end_time,
+ max_run_count=2,
+ service_account=pipeline_sa,
+ network=f"projects/{project_number}/global/networks/{subnetwork}",
+ create_request_timeout=None,
+ )
+ else:
+ pipeline_job_schedule.create(
+ cron_expression=cron,
+ max_concurrent_run_count=max_concurrent_run_count,
+ start_time=start_time,
+ end_time=end_time,
+ max_run_count=2,
+ service_account=pipeline_sa,
+ create_request_timeout=None,
+ )
+
+ # Old version - Create the schedule with the pipeline job defined
+ #pipeline_job_schedule = pipeline_job.create_schedule(
+ # display_name=f"{pipeline_name}",
+ # cron=cron,
+ # max_concurrent_run_count=max_concurrent_run_count,
+ # start_time=start_time,
+ # end_time=end_time,
+ # service_account=pipeline_sa,
+ #)
+
logging.info(f"Pipeline scheduled : {pipeline_name}")
return pipeline_job
@@ -903,4 +971,4 @@ def run_pipeline(
if (pl.has_failed):
raise RuntimeError("Pipeline execution failed")
return pl
-
\ No newline at end of file
+
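Review note: the two `pipeline_job_schedule.create(...)` calls in `schedule_pipeline` differ only in the `network` argument. One possible way to avoid the duplication is to assemble the keyword arguments once and add `network` conditionally. The sketch below is hedged: `build_schedule_create_kwargs` is a hypothetical helper, it only builds a dictionary and does not call the Vertex AI SDK, and the keyword names simply mirror those used in the diff above.

```python
from typing import Any, Dict, Optional


def build_schedule_create_kwargs(
    cron: str,
    max_concurrent_run_count: Any,
    start_time: Optional[str],
    end_time: Optional[str],
    service_account: str,
    project_number: str,
    network_name: str = "default",
    use_private_service_access: bool = False,
) -> Dict[str, Any]:
    """Assemble keyword arguments for the schedule creation call."""
    kwargs: Dict[str, Any] = {
        'cron_expression': cron,
        'max_concurrent_run_count': max_concurrent_run_count,
        'start_time': start_time,
        'end_time': end_time,
        'max_run_count': 2,  # same hard-coded value as in the diff
        'service_account': service_account,
        'create_request_timeout': None,
    }
    if use_private_service_access:
        # Same network identifier format as in the diff above.
        kwargs['network'] = f"projects/{project_number}/global/networks/{network_name}"
    return kwargs


# Hypothetical values, for illustration only.
private_kwargs = build_schedule_create_kwargs(
    cron="0 5 * * *",
    max_concurrent_run_count=1,
    start_time=None,
    end_time=None,
    service_account="pipeline-sa@example-project.iam.gserviceaccount.com",
    project_number="123456789012",
    use_private_service_access=True,
)
```

Under these assumptions the result could be passed on as `pipeline_job_schedule.create(**private_kwargs)`, leaving a single call site.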
diff --git a/python/pipelines/scheduler.py b/python/pipelines/scheduler.py
index fbdd9933..7e00dc8e 100644
--- a/python/pipelines/scheduler.py
+++ b/python/pipelines/scheduler.py
@@ -37,8 +37,11 @@ def check_extention(file_path: str, type: str = '.yaml'):
'vertex_ai.pipelines.feature-creation-purchase-propensity.execution': "pipelines.feature_engineering_pipelines.purchase_propensity_feature_engineering_pipeline",
'vertex_ai.pipelines.feature-creation-churn-propensity.execution': "pipelines.feature_engineering_pipelines.churn_propensity_feature_engineering_pipeline",
'vertex_ai.pipelines.feature-creation-customer-ltv.execution': "pipelines.feature_engineering_pipelines.customer_lifetime_value_feature_engineering_pipeline",
+ 'vertex_ai.pipelines.feature-creation-lead-score-propensity.execution': "pipelines.feature_engineering_pipelines.lead_score_propensity_feature_engineering_pipeline",
'vertex_ai.pipelines.purchase_propensity.training': None, # tabular workflows pipelines is precompiled
'vertex_ai.pipelines.purchase_propensity.prediction': "pipelines.tabular_pipelines.prediction_binary_classification_pl",
+ 'vertex_ai.pipelines.lead_score_propensity.training': None, # tabular workflows pipelines is precompiled
+ 'vertex_ai.pipelines.lead_score_propensity.prediction': "pipelines.tabular_pipelines.prediction_binary_classification_pl",
'vertex_ai.pipelines.churn_propensity.training': None, # tabular workflows pipelines is precompiled
'vertex_ai.pipelines.churn_propensity.prediction': "pipelines.tabular_pipelines.prediction_binary_classification_pl",
'vertex_ai.pipelines.segmentation.training': "pipelines.segmentation_pipelines.training_pl",
@@ -138,7 +141,9 @@ def check_extention(file_path: str, type: str = '.yaml'):
cron=my_pipeline_vars['schedule']['cron'],
max_concurrent_run_count=my_pipeline_vars['schedule']['max_concurrent_run_count'],
start_time=my_pipeline_vars['schedule']['start_time'],
- end_time=my_pipeline_vars['schedule']['end_time']
+ end_time=my_pipeline_vars['schedule']['end_time'],
+ subnetwork=my_pipeline_vars['schedule']['subnetwork'],
+ use_private_service_access=my_pipeline_vars['schedule']['use_private_service_access'],
)
if my_pipeline_vars['schedule']['state'] == 'PAUSED':
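Review note: the scheduler now indexes `my_pipeline_vars['schedule']['subnetwork']` and `['use_private_service_access']` directly, which raises `KeyError` for configuration files written before these keys existed. A minimal sketch of a tolerant lookup follows, falling back to the defaults declared in the `schedule_pipeline` signature (`"default"` and `False`). The helper name `schedule_network_options` is hypothetical.

```python
from typing import Any, Dict


def schedule_network_options(schedule: Dict[str, Any]) -> Dict[str, Any]:
    """Read the optional network settings from a schedule config block."""
    return {
        # Defaults mirror the schedule_pipeline signature in pipeline_ops.py.
        'subnetwork': schedule.get('subnetwork', 'default'),
        'use_private_service_access': schedule.get('use_private_service_access', False),
    }


# Hypothetical config blocks: one older (no network keys), one newer.
legacy_schedule = {'cron': '0 5 * * *', 'state': 'PAUSED'}
new_schedule = {'cron': '0 5 * * *', 'subnetwork': 'maj-vpc', 'use_private_service_access': True}

legacy_options = schedule_network_options(legacy_schedule)
new_options = schedule_network_options(new_schedule)
```

Whether a silent fallback or a hard failure is preferable depends on how strictly the configuration is validated elsewhere in the project.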
diff --git a/python/pipelines/transformations-lead-score-propensity.json b/python/pipelines/transformations-lead-score-propensity.json
new file mode 100644
index 00000000..28ca5e70
--- /dev/null
+++ b/python/pipelines/transformations-lead-score-propensity.json
@@ -0,0 +1,368 @@
+[
+ {
+ "numeric": {
+ "column_name": "user_ltv_revenue",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "device_category"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "device_mobile_brand_name"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "device_mobile_model_name"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "device_os"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "device_language"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "device_web_browser"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "geo_sub_continent"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "geo_country"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "geo_region"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "geo_city"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "geo_metro"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "last_traffic_source_medium"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "last_traffic_source_name"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "last_traffic_source_source"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "first_traffic_source_medium"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "first_traffic_source_name"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "first_traffic_source_source"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "has_signed_in_with_user_id"
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_50_past_1_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_50_past_2_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_50_past_3_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_50_past_4_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_50_past_5_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_90_past_1_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_90_past_2_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_90_past_3_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_90_past_4_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_90_past_5_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "view_search_results_past_1_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "view_search_results_past_2_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "view_search_results_past_3_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "view_search_results_past_4_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "view_search_results_past_5_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "file_download_past_1_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "file_download_past_2_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "file_download_past_3_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "file_download_past_4_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "file_download_past_5_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_list_past_1_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_list_past_2_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_list_past_3_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_list_past_4_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_list_past_5_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_print_past_1_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_print_past_2_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_print_past_3_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_print_past_4_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_print_past_5_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "sign_up_past_1_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "sign_up_past_2_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "sign_up_past_3_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "sign_up_past_4_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "sign_up_past_5_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_favorite_past_1_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_favorite_past_2_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_favorite_past_3_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_favorite_past_4_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_favorite_past_5_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_menu_past_1_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_menu_past_2_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_menu_past_3_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_menu_past_4_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_menu_past_5_day",
+ "invalid_values_allowed": true
+ }
+ }
+]
\ No newline at end of file
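Review note: the rolling-window part of `transformations-lead-score-propensity.json` follows a regular pattern, `<event>_past_<n>_day` for nine events over five days. A hedged sketch of generating those entries programmatically is shown below. The event names are copied from the file above, while the function name `rolling_window_transformations` is hypothetical and the repository may well maintain this file by hand.

```python
import json
from typing import Dict, List

# Event names as they appear in the JSON file above.
EVENTS = [
    'scroll_50', 'scroll_90', 'view_search_results', 'file_download',
    'recipe_add_to_list', 'recipe_print', 'sign_up', 'recipe_favorite',
    'recipe_add_to_menu',
]


def rolling_window_transformations(events: List[str], days: int = 5) -> List[Dict]:
    """Build one numeric transformation per event and look-back day."""
    return [
        {
            'numeric': {
                'column_name': f'{event}_past_{day}_day',
                'invalid_values_allowed': True,
            }
        }
        for event in events
        for day in range(1, days + 1)
    ]


transformations = rolling_window_transformations(EVENTS)
serialized = json.dumps(transformations, indent=2)
```

Generating the entries this way keeps the column list consistent if an event is added or the look-back window changes.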
diff --git a/scripts/common.sh b/scripts/common.sh
index 926eec7f..dbdbff0a 100644
--- a/scripts/common.sh
+++ b/scripts/common.sh
@@ -46,6 +46,8 @@ declare -a apis_array=("cloudresourcemanager.googleapis.com"
"bigquerymigration.googleapis.com"
"bigquerydatatransfer.googleapis.com"
"dataform.googleapis.com"
+ "cloudkms.googleapis.com"
+ "servicenetworking.googleapis.com"
)
get_project_id() {
diff --git a/scripts/generate-tf-backend.sh b/scripts/generate-tf-backend.sh
index 5a1178fc..2611dc0e 100755
--- a/scripts/generate-tf-backend.sh
+++ b/scripts/generate-tf-backend.sh
@@ -19,15 +19,15 @@ set -o nounset
. scripts/common.sh
-section_open "Check if the necessary dependencies are available: gcloud, gsutil, terraform, poetry"
+section_open "Check if the necessary dependencies are available: gcloud, gsutil, terraform, uv"
check_exec_dependency "gcloud"
check_exec_version "gcloud"
check_exec_dependency "gsutil"
check_exec_version "gsutil"
check_exec_dependency "terraform"
check_exec_version "terraform"
- check_exec_dependency "poetry"
- check_exec_version "poetry"
+ check_exec_dependency "uv"
+ check_exec_version "uv"
section_close
section_open "Check if the necessary variables are set: PROJECT_ID"
@@ -51,10 +51,6 @@ section_open "Enable all the required APIs"
enable_all_apis
section_close
-section_open "Install poetry libraries in the virtual environment for Terraform"
- poetry install
-section_close
-
section_open "Creating a new Google Cloud Storage bucket to store the Terraform state in ${TF_STATE_PROJECT} project, bucket: ${TF_STATE_BUCKET}"
if gsutil ls -b gs://"${TF_STATE_BUCKET}" >/dev/null 2>&1; then
printf "The ${TF_STATE_BUCKET} Google Cloud Storage bucket already exists. \n"
diff --git a/scripts/quick-install.sh b/scripts/quick-install.sh
new file mode 100755
index 00000000..57d0ed2c
--- /dev/null
+++ b/scripts/quick-install.sh
@@ -0,0 +1,137 @@
+#!/usr/bin/env sh
+
+# Copyright 2023 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+set -o errexit
+set -o nounset
+#set -x
+
+. scripts/common.sh
+
+section_open "Setting the gcloud project id"
+ # Ask user to input the project id
+ echo "Enter the Google Cloud project ID where you want to deploy Marketing Analytics Jumpstart:"
+ read TF_STATE_PROJECT_ID
+ # Export the project id so that later steps can read it
+ export TF_STATE_PROJECT_ID
+ # Use the same project as the default Google Cloud project
+ export GOOGLE_CLOUD_PROJECT=${TF_STATE_PROJECT_ID}
+ # Use the same project as the quota project
+ export GOOGLE_CLOUD_QUOTA_PROJECT=$GOOGLE_CLOUD_PROJECT
+ # Expose the project id as PROJECT_ID as well
+ export PROJECT_ID=$GOOGLE_CLOUD_PROJECT
+ # Disable prompts
+ gcloud config set disable_prompts true
+ # Set the project id to the gcloud configuration
+ gcloud config set project "${TF_STATE_PROJECT_ID}"
+section_close
+
+section_open "Enable all the required APIs"
+ enable_all_apis
+section_close
+
+section_open "Authenticate to Google Cloud Project"
+ gcloud auth login --project "${TF_STATE_PROJECT_ID}"
+ echo "Close the browser tab that was opened and press Enter to continue."
+ read moveon
+section_close
+
+section_open "Setting Google Application Default Credentials"
+ gcloud config set disable_prompts false
+ gcloud auth application-default login --quiet --scopes="openid,https://www.googleapis.com/auth/userinfo.email,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/sqlservice.login,https://www.googleapis.com/auth/analytics,https://www.googleapis.com/auth/analytics.edit,https://www.googleapis.com/auth/analytics.provision,https://www.googleapis.com/auth/analytics.readonly,https://www.googleapis.com/auth/accounts.reauth"
+ echo "Close the browser tab that was opened and press Enter to continue."
+ read moveon
+ CREDENTIAL_FILE=$(gcloud auth application-default set-quota-project "${PROJECT_ID}" 2>&1 | grep -e "Credentials saved to file:" | cut -d "[" -f2 | cut -d "]" -f1)
+ export GOOGLE_APPLICATION_CREDENTIALS="${CREDENTIAL_FILE}"
+section_close
+
+section_open "Detect the operating system"
+ unameOut="$(uname -s)"
+ case "${unameOut}" in
+ Linux*) machine=Linux;;
+ Darwin*) machine=Mac;;
+ CYGWIN*) machine=Cygwin;;
+ MINGW*) machine=MinGw;;
+ MSYS_NT*) machine=Git;;
+ *) machine="UNKNOWN:${unameOut}"
+ esac
+ echo ${machine}
+section_close
+
+section_open "Configuring environment"
+ SOURCE_ROOT=$(pwd)
+ cd ${SOURCE_ROOT}
+
+ # Install python3.10
+ sudo chown -R "$(whoami)" /usr/local/sbin
+ chmod u+w /usr/local/sbin
+ if [ "${machine}" = "Linux" ]; then
+ sudo DEBIAN_FRONTEND=noninteractive apt-get -qq -o=Dpkg::Use-Pty=0 install python3.10 --assume-yes
+ elif [ "${machine}" = "Mac" ]; then
+ brew install python@3.10
+ fi
+ export CLOUDSDK_PYTHON=python3.10
+
+ # Install pipx
+ if [ "${machine}" = "Linux" ]; then
+ sudo apt update
+ sudo apt install pipx
+ elif [ "${machine}" = "Mac" ]; then
+ brew install pipx
+ fi
+ pipx ensurepath
+
+ #pip3 install poetry
+ pipx install poetry
+ export PATH="$HOME/.local/bin:$PATH"
+ poetry env use python3.10
+ poetry --version
+
+ # Install tfenv
+ if [ ! -d ~/.tfenv ]; then
+ git clone --depth=1 https://github.com/tfutils/tfenv.git ~/.tfenv
+ echo 'export PATH="$HOME/.tfenv/bin:$PATH"' >> ~/.bash_profile
+ echo 'export PATH=$PATH:$HOME/.tfenv/bin' >> ~/.bashrc
+ fi
+ export PATH="$PATH:$HOME/.tfenv/bin"
+
+ # Install terraform version
+ tfenv install 1.5.7
+ tfenv use 1.5.7
+ terraform --version
+
+ # Generate TF backend
+ . scripts/generate-tf-backend.sh
+section_close
+
+section_open "Preparing Terraform Environment File"
+ TERRAFORM_RUN_DIR=${SOURCE_ROOT}/infrastructure/terraform
+ if [ ! -f $TERRAFORM_RUN_DIR/terraform.tfvars ]; then
+ . scripts/set-env.sh
+ sudo apt-get -qq -o=Dpkg::Use-Pty=0 install gettext
+ envsubst < "${SOURCE_ROOT}/infrastructure/cloudshell/terraform-template.tfvars" > "${TERRAFORM_RUN_DIR}/terraform.tfvars"
+ fi
+section_close
+
+section_open "Deploying Terraform Infrastructure Resources"
+ export PATH="$HOME/.local/bin:$PATH"
+ export PATH="$PATH:$HOME/.tfenv/bin"
+ terraform -chdir="${TERRAFORM_RUN_DIR}" init
+ terraform -chdir="${TERRAFORM_RUN_DIR}" apply
+section_close
+
+#set +x
+set +o nounset
+set +o errexit
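Note on the platform check in scripts/quick-install.sh: the `case` statement stores `Mac` for Darwin hosts, and POSIX `sh` compares strings with a single `=`. The following is a minimal sketch of that pattern, not the repository's implementation. `detect_machine` is a hypothetical helper name introduced here for illustration, and the `echo` lines stand in for the real install commands.

```sh
#!/usr/bin/env sh
# Hypothetical helper (not part of the repository): map a `uname -s` string
# to the short platform label used by the install steps.
detect_machine() {
  case "$1" in
    Linux*)   echo "Linux" ;;
    Darwin*)  echo "Mac" ;;
    CYGWIN*)  echo "Cygwin" ;;
    MINGW*)   echo "MinGw" ;;
    MSYS_NT*) echo "Git" ;;
    *)        echo "UNKNOWN:$1" ;;
  esac
}

machine="$(detect_machine "$(uname -s)")"

# POSIX `[` compares strings with a single `=`.
# Quoting the variable guards against empty or multi-word values.
if [ "${machine}" = "Linux" ]; then
  echo "would install packages with apt-get"
elif [ "${machine}" = "Mac" ]; then
  echo "would install packages with brew"
else
  echo "unsupported platform: ${machine}"
fi
```

Passing the `uname -s` string as an argument keeps the mapping testable without depending on the host the script runs on.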
diff --git a/sql/procedure/lead_score_propensity_inference_preparation.sqlx b/sql/procedure/lead_score_propensity_inference_preparation.sqlx
new file mode 100644
index 00000000..d9753b88
--- /dev/null
+++ b/sql/procedure/lead_score_propensity_inference_preparation.sqlx
@@ -0,0 +1,352 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+DECLARE lastest_processed_time_ud TIMESTAMP;
+DECLARE lastest_processed_time_useam TIMESTAMP;
+DECLARE lastest_processed_time_uwlm TIMESTAMP;
+DECLARE lastest_processed_time_um TIMESTAMP;
+
+-- Set the procedure to look back from the day before `inference_date`
+SET inference_date = DATE_SUB(inference_date, INTERVAL 1 DAY);
+
+SET lastest_processed_time_ud = (SELECT MAX(processed_timestamp) FROM `{{feature_store_project_id}}.{{feature_store_dataset}}.user_dimensions` WHERE feature_date = inference_date LIMIT 1);
+SET lastest_processed_time_useam = (SELECT MAX(processed_timestamp) FROM `{{feature_store_project_id}}.{{feature_store_dataset}}.user_session_event_aggregated_metrics` WHERE feature_date = inference_date LIMIT 1);
+SET lastest_processed_time_uwlm = (SELECT MAX(processed_timestamp) FROM `{{feature_store_project_id}}.{{feature_store_dataset}}.user_rolling_window_lead_metrics` WHERE feature_date = inference_date LIMIT 1);
+SET lastest_processed_time_um = (SELECT MAX(processed_timestamp) FROM `{{feature_store_project_id}}.{{feature_store_dataset}}.user_scoped_metrics` WHERE feature_date = inference_date LIMIT 1);
+
+CREATE OR REPLACE TEMP TABLE inference_preparation_ud as (
+ SELECT DISTINCT
+ -- The user pseudo id
+ UD.user_pseudo_id,
+ -- The user id
+ MAX(UD.user_id) OVER(user_dimensions_window) AS user_id,
+ -- The feature date
+ UD.feature_date,
+ -- The user lifetime value revenue
+ MAX(UD.user_ltv_revenue) OVER(user_dimensions_window) AS user_ltv_revenue,
+ -- The device category
+ MAX(UD.device_category) OVER(user_dimensions_window) AS device_category,
+ -- The device brand name
+ MAX(UD.device_mobile_brand_name) OVER(user_dimensions_window) AS device_mobile_brand_name,
+ -- The device model name
+ MAX(UD.device_mobile_model_name) OVER(user_dimensions_window) AS device_mobile_model_name,
+ -- The device operating system
+ MAX(UD.device_os) OVER(user_dimensions_window) AS device_os,
+ -- The device language
+ MAX(UD.device_language) OVER(user_dimensions_window) AS device_language,
+ -- The device web browser
+ MAX(UD.device_web_browser) OVER(user_dimensions_window) AS device_web_browser,
+ -- The user sub continent
+ MAX(UD.geo_sub_continent) OVER(user_dimensions_window) AS geo_sub_continent,
+ -- The user country
+ MAX(UD.geo_country) OVER(user_dimensions_window) AS geo_country,
+ -- The user region
+ MAX(UD.geo_region) OVER(user_dimensions_window) AS geo_region,
+ -- The user city
+ MAX(UD.geo_city) OVER(user_dimensions_window) AS geo_city,
+ -- The user metro
+ MAX(UD.geo_metro) OVER(user_dimensions_window) AS geo_metro,
+ -- The user last traffic source medium
+ MAX(UD.last_traffic_source_medium) OVER(user_dimensions_window) AS last_traffic_source_medium,
+ -- The user last traffic source name
+ MAX(UD.last_traffic_source_name) OVER(user_dimensions_window) AS last_traffic_source_name,
+ -- The user last traffic source source
+ MAX(UD.last_traffic_source_source) OVER(user_dimensions_window) AS last_traffic_source_source,
+ -- The user first traffic source medium
+ MAX(UD.first_traffic_source_medium) OVER(user_dimensions_window) AS first_traffic_source_medium,
+ -- The user first traffic source name
+ MAX(UD.first_traffic_source_name) OVER(user_dimensions_window) AS first_traffic_source_name,
+ -- The user first traffic source source
+ MAX(UD.first_traffic_source_source) OVER(user_dimensions_window) AS first_traffic_source_source,
+ -- Whether the user has signed in with user ID
+ MAX(UD.has_signed_in_with_user_id) OVER(user_dimensions_window) AS has_signed_in_with_user_id,
+FROM
+ `{{feature_store_project_id}}.{{feature_store_dataset}}.user_dimensions` UD
+INNER JOIN
+ `{{project_id}}.{{mds_dataset}}.latest_event_per_user_last_72_hours` LEU
+ON
+ UD.user_pseudo_id = LEU.user_pseudo_id
+WHERE
+ -- In the future consider `feature_date BETWEEN start_date AND end_date`, to process multiple days. Modify Partition BY
+ UD.feature_date = inference_date
+ AND UD.processed_timestamp = lastest_processed_time_ud
+WINDOW
+ user_dimensions_window AS (PARTITION BY UD.user_pseudo_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
+);
+
+
+CREATE OR REPLACE TEMP TABLE inference_preparation_uwlm as (
+ SELECT DISTINCT
+ -- User pseudo id
+ UWLM.user_pseudo_id,
+ -- Feature date
+ UWLM.feature_date{% for feature in short_list_features %},
+ -- Calculate the maximum value for each metric over the window
+ MAX(UWLM.{{feature.feature_name}}_past_1_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_1_day,
+ MAX(UWLM.{{feature.feature_name}}_past_2_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_2_day,
+ MAX(UWLM.{{feature.feature_name}}_past_3_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_3_day,
+ MAX(UWLM.{{feature.feature_name}}_past_4_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_4_day,
+ MAX(UWLM.{{feature.feature_name}}_past_5_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_5_day{% endfor %}
+FROM
+ `{{feature_store_project_id}}.{{feature_store_dataset}}.user_rolling_window_lead_metrics` UWLM
+INNER JOIN
+ `{{project_id}}.{{mds_dataset}}.latest_event_per_user_last_72_hours` LEU
+ON
+ UWLM.user_pseudo_id = LEU.user_pseudo_id
+WHERE
+ -- Filter for the features on the inference date
+ UWLM.feature_date = inference_date
+ AND UWLM.processed_timestamp = lastest_processed_time_uwlm
+WINDOW
+ user_rolling_lead_window AS (PARTITION BY UWLM.user_pseudo_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
+);
+
+-- This is a temp table consolidating all features for the inference date.
+CREATE OR REPLACE TEMP TABLE inference_preparation as (
+ SELECT DISTINCT
+ UD.user_pseudo_id,
+ UD.user_id,
+ UD.feature_date,
+ UD.user_ltv_revenue,
+ UD.device_category,
+ UD.device_mobile_brand_name,
+ UD.device_mobile_model_name,
+ UD.device_os,
+ UD.device_language,
+ UD.device_web_browser,
+ UD.geo_sub_continent,
+ UD.geo_country,
+ UD.geo_region,
+ UD.geo_city,
+ UD.geo_metro,
+ UD.last_traffic_source_medium,
+ UD.last_traffic_source_name,
+ UD.last_traffic_source_source,
+ UD.first_traffic_source_medium,
+ UD.first_traffic_source_name,
+ UD.first_traffic_source_source,
+ UD.has_signed_in_with_user_id{% for feature in short_list_features %},
+ UWLM.{{feature.feature_name}}_past_1_day,
+ UWLM.{{feature.feature_name}}_past_2_day,
+ UWLM.{{feature.feature_name}}_past_3_day,
+ UWLM.{{feature.feature_name}}_past_4_day,
+ UWLM.{{feature.feature_name}}_past_5_day{% endfor %}
+FROM
+ inference_preparation_ud UD
+INNER JOIN
+ inference_preparation_uwlm UWLM
+ON
+ UWLM.user_pseudo_id = UD.user_pseudo_id
+ AND UWLM.feature_date = UD.feature_date
+);
+
+DELETE FROM `{{project_id}}.{{dataset}}.{{insert_table}}` WHERE TRUE;
+
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+(
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id{% for feature in short_list_features %},
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day{% endfor %}
+)
+SELECT DISTINCT
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ MIN(user_ltv_revenue) OVER(PARTITION BY user_pseudo_id, feature_date) as user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id{% for feature in short_list_features %},
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day{% endfor %}
+FROM inference_preparation;
+
+
+CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.lead_score_propensity_inference_5_1` AS(
+ SELECT DISTINCT
+ CURRENT_TIMESTAMP() AS processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ LAST_VALUE(user_id) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS user_id,
+ LAST_VALUE(user_ltv_revenue) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS user_ltv_revenue,
+ LAST_VALUE(device_category) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_category,
+ LAST_VALUE(device_mobile_brand_name) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_mobile_brand_name,
+ LAST_VALUE(device_mobile_model_name) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_mobile_model_name,
+ LAST_VALUE(device_os) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_os,
+ LAST_VALUE(device_language) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_language,
+ LAST_VALUE(device_web_browser) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_web_browser,
+ LAST_VALUE(geo_sub_continent) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_sub_continent,
+ LAST_VALUE(geo_country) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_country,
+ LAST_VALUE(geo_region) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_region,
+ LAST_VALUE(geo_city) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_city,
+ LAST_VALUE(geo_metro) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_metro,
+ LAST_VALUE(last_traffic_source_medium) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS last_traffic_source_medium,
+ LAST_VALUE(last_traffic_source_name) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS last_traffic_source_name,
+ LAST_VALUE(last_traffic_source_source) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS last_traffic_source_source,
+ LAST_VALUE(first_traffic_source_medium) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS first_traffic_source_medium,
+ LAST_VALUE(first_traffic_source_name) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS first_traffic_source_name,
+ LAST_VALUE(first_traffic_source_source) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS first_traffic_source_source,
+ LAST_VALUE(has_signed_in_with_user_id) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS has_signed_in_with_user_id{% for feature in short_list_features %},
+ LAST_VALUE({{feature.feature_name}}_past_1_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_1_day,
+ LAST_VALUE({{feature.feature_name}}_past_2_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_2_day,
+ LAST_VALUE({{feature.feature_name}}_past_3_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_3_day,
+ LAST_VALUE({{feature.feature_name}}_past_4_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_4_day,
+ LAST_VALUE({{feature.feature_name}}_past_5_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_5_day{% endfor %}
+ FROM `{{project_id}}.{{dataset}}.{{insert_table}}`
+);
+
+
+CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_lead_score_propensity_inference_5_1`
+(processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id{% for feature in short_list_features %},
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day{% endfor %})
+OPTIONS(
+ --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {{expiration_duration_hours}} HOUR),
+ friendly_name="v_lead_score_propensity_inference_5_1",
+ description="View Lead Score Propensity Inference dataset using 5 days back to predict 1 day ahead. Should run daily.",
+ labels=[("org_unit", "development")]
+) AS
+SELECT DISTINCT
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id{% for feature in short_list_features %},
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day{% endfor %}
+FROM (
+SELECT DISTINCT
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ -- Row number partitioned by user pseudo id ordered by feature date descending
+ ROW_NUMBER() OVER (PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS user_row_order
+ FROM `{{project_id}}.{{dataset}}.lead_score_propensity_inference_5_1`
+)
+WHERE
+ -- Filter only for the most recent user example
+ user_row_order = 1;
+
diff --git a/sql/procedure/lead_score_propensity_label.sqlx b/sql/procedure/lead_score_propensity_label.sqlx
new file mode 100644
index 00000000..fc27c071
--- /dev/null
+++ b/sql/procedure/lead_score_propensity_label.sqlx
@@ -0,0 +1,102 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Run these window aggregations every day, for each date in the training and inference date ranges.
+-- Set the procedure to look back from the day before `input_date` until the day before `end_date`.
+SET input_date = DATE_SUB(input_date, INTERVAL 1 DAY);
+SET end_date = DATE_SUB(end_date, INTERVAL 1 DAY);
+
+-- Future User metrics: 1-day future {{target_event}}s per user
+CREATE OR REPLACE TEMP TABLE future_{{target_event}}s_per_user AS (
+ SELECT
+ -- User's unique identifier
+ user_pseudo_id,
+ -- The date for which future {{target_event}}s are being calculated
+ input_date as event_date,
+ -- Calculates the maximum count of distinct {{target_event}} event timestamps that occurred 1 day after `input_date`
+ MAX(COUNT(DISTINCT CASE DATE_DIFF(event_date, input_date, DAY) = 1 WHEN TRUE THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id) AS {{target_event}}_day_1
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event` as E
+ INNER JOIN `{{mds_project_id}}.{{mds_dataset}}.device` as D
+ ON E.device_type_id = D.device_type_id
+ -- Keep events within the date range defined by input_date and end_date
+ WHERE event_date BETWEEN input_date AND end_date
+ -- Filter event with event name {{target_event}}
+ AND LOWER(E.event_name) IN ('{{target_event}}')
+ AND E.ga_session_id IS NOT NULL
+ AND D.device_os IS NOT NULL
+ -- Grouping by user pseudo ids
+ GROUP BY user_pseudo_id
+);
+
+-- All users in the platform
+CREATE OR REPLACE TEMP TABLE all_users_possible_{{target_event}}s as (
+ SELECT DISTINCT
+ -- User's unique identifier
+ Users.user_pseudo_id,
+ -- The event date for which {{target_event}}s are being considered
+ Days.event_date as event_date,
+ -- Placeholder columns for {{target_event}} counts in future days
+ NULL as {{target_event}}_day_1
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event` Users
+ CROSS JOIN
+ -- Generates the dates from `input_date` to `end_date`; the WHERE clause below keeps only `input_date`
+ (SELECT event_date FROM UNNEST(GENERATE_DATE_ARRAY(input_date, end_date, INTERVAL 1 DAY)) AS event_date) Days
+ WHERE Days.event_date = input_date
+ -- Filter event with valid sessions
+ AND Users.ga_session_id IS NOT NULL
+);
+
+
+CREATE OR REPLACE TEMP TABLE DataForTargetTable AS
+SELECT DISTINCT
+ -- Timestamp when the data was processed
+ CURRENT_TIMESTAMP() AS processed_timestamp,
+ -- The date for which {{target_event}}s are being considered
+ A.event_date as feature_date,
+ -- User's unique identifier
+ A.user_pseudo_id,
+ -- Binary label: 1 if the user had at least one {{target_event}} on day 1, otherwise 0
+ LEAST(COALESCE(B.{{target_event}}_day_1, 0), 1) AS {{target_event}}_day_1
+FROM all_users_possible_{{target_event}}s AS A
+LEFT JOIN future_{{target_event}}s_per_user AS B
+ON B.user_pseudo_id = A.user_pseudo_id
+;
+
+-- Updates or inserts data into the target table
+MERGE `{{project_id}}.{{dataset}}.{{insert_table}}` I
+USING DataForTargetTable T
+ON I.feature_date = T.feature_date
+ AND I.user_pseudo_id = T.user_pseudo_id
+WHEN MATCHED THEN
+ -- Updates existing records
+ UPDATE SET
+ -- Updates the processed timestamp
+ I.processed_timestamp = T.processed_timestamp,
+ -- Updates {{target_event}} counts for each day
+ I.{{target_event}}_day_1 = T.{{target_event}}_day_1
+WHEN NOT MATCHED THEN
+ -- Inserts new records
+ INSERT
+ (processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ {{target_event}}_day_1)
+ VALUES
+ (T.processed_timestamp,
+ T.feature_date,
+ T.user_pseudo_id,
+ T.{{target_event}}_day_1)
+;
+
+SET rows_added = (SELECT COUNT(DISTINCT user_pseudo_id) FROM `{{project_id}}.{{dataset}}.{{insert_table}}`);
diff --git a/sql/procedure/lead_score_propensity_training_preparation.sqlx b/sql/procedure/lead_score_propensity_training_preparation.sqlx
new file mode 100644
index 00000000..5d0f61e3
--- /dev/null
+++ b/sql/procedure/lead_score_propensity_training_preparation.sqlx
@@ -0,0 +1,569 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+DECLARE custom_start_date DATE DEFAULT NULL;
+DECLARE custom_end_date DATE DEFAULT NULL;
+
+-- custom_start_date: The start date of the data to be used for training.
+-- custom_end_date: The end date of the data to be used for training.
+SET custom_start_date = PARSE_DATE("%Y-%m-%d", {{custom_start_date}});
+SET custom_end_date = PARSE_DATE("%Y-%m-%d", {{custom_end_date}});
+
+-- The procedure first checks whether the custom_start_date and custom_end_date parameters are valid.
+-- A valid custom date overrides the corresponding default; otherwise the procedure keeps the
+-- existing start_date or end_date (the minimum or maximum date of the available data).
+IF custom_start_date IS NOT NULL AND custom_start_date >= start_date AND custom_start_date <= end_date
+ AND custom_start_date < custom_end_date THEN
+ SET start_date = custom_start_date;
+END IF;
+
+IF custom_end_date IS NOT NULL AND custom_end_date <= end_date AND custom_end_date >= start_date
+ AND custom_end_date > custom_start_date THEN
+ SET end_date = custom_end_date;
+END IF;
+
+-- This is a temp table consolidating user_dimensions over the date interval.
+CREATE OR REPLACE TEMP TABLE training_preparation_ud as (
+ SELECT DISTINCT
+ -- The user pseudo id
+ UD.user_pseudo_id,
+ -- The user id
+ MAX(UD.user_id) OVER(user_dimensions_window) AS user_id,
+ -- The feature date
+ UD.feature_date,
+ -- The user lifetime value revenue
+ MAX(UD.user_ltv_revenue) OVER(user_dimensions_window) AS user_ltv_revenue,
+ -- The device category
+ MAX(UD.device_category) OVER(user_dimensions_window) AS device_category,
+ -- The device brand name
+ MAX(UD.device_mobile_brand_name) OVER(user_dimensions_window) AS device_mobile_brand_name,
+ -- The device model name
+ MAX(UD.device_mobile_model_name) OVER(user_dimensions_window) AS device_mobile_model_name,
+ -- The device operating system
+ MAX(UD.device_os) OVER(user_dimensions_window) AS device_os,
+ -- The device language
+ MAX(UD.device_language) OVER(user_dimensions_window) AS device_language,
+ -- The device web browser
+ MAX(UD.device_web_browser) OVER(user_dimensions_window) AS device_web_browser,
+ -- The user sub continent
+ MAX(UD.geo_sub_continent) OVER(user_dimensions_window) AS geo_sub_continent,
+ -- The user country
+ MAX(UD.geo_country) OVER(user_dimensions_window) AS geo_country,
+ -- The user region
+ MAX(UD.geo_region) OVER(user_dimensions_window) AS geo_region,
+ -- The user city
+ MAX(UD.geo_city) OVER(user_dimensions_window) AS geo_city,
+ -- The user metro
+ MAX(UD.geo_metro) OVER(user_dimensions_window) AS geo_metro,
+ -- The user last traffic source medium
+ MAX(UD.last_traffic_source_medium) OVER(user_dimensions_window) AS last_traffic_source_medium,
+ -- The user last traffic source name
+ MAX(UD.last_traffic_source_name) OVER(user_dimensions_window) AS last_traffic_source_name,
+ -- The user last traffic source source
+ MAX(UD.last_traffic_source_source) OVER(user_dimensions_window) AS last_traffic_source_source,
+ -- The user first traffic source medium
+ MAX(UD.first_traffic_source_medium) OVER(user_dimensions_window) AS first_traffic_source_medium,
+ -- The user first traffic source name
+ MAX(UD.first_traffic_source_name) OVER(user_dimensions_window) AS first_traffic_source_name,
+ -- The user first traffic source source
+ MAX(UD.first_traffic_source_source) OVER(user_dimensions_window) AS first_traffic_source_source,
+ -- Whether the user has signed in with user ID
+ MAX(UD.has_signed_in_with_user_id) OVER(user_dimensions_window) AS has_signed_in_with_user_id,
+FROM
+ `{{feature_store_project_id}}.{{feature_store_dataset}}.user_dimensions` UD
+WHERE
+ -- Filter feature dates according to the defined date interval
+ UD.feature_date BETWEEN start_date AND end_date
+WINDOW
+ user_dimensions_window AS (PARTITION BY UD.user_pseudo_id, UD.feature_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
+);
+
+-- This is a temp table consolidating user rolling metrics over the date interval.
+CREATE OR REPLACE TEMP TABLE training_preparation_uwlm as (
+ SELECT DISTINCT
+ -- User pseudo id
+ UWLM.user_pseudo_id,
+ -- Feature date
+ UWLM.feature_date{% for feature in short_list_features %},
+ -- Calculate the maximum value for each metric over the window
+ MAX(UWLM.{{feature.feature_name}}_past_1_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_1_day,
+ MAX(UWLM.{{feature.feature_name}}_past_2_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_2_day,
+ MAX(UWLM.{{feature.feature_name}}_past_3_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_3_day,
+ MAX(UWLM.{{feature.feature_name}}_past_4_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_4_day,
+ MAX(UWLM.{{feature.feature_name}}_past_5_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_5_day{% endfor %}
+FROM
+ `{{feature_store_project_id}}.{{feature_store_dataset}}.user_rolling_window_lead_metrics` UWLM
+WHERE
+ -- Filter feature dates according to the defined date interval
+ UWLM.feature_date BETWEEN start_date AND end_date
+WINDOW
+ user_rolling_lead_window AS (PARTITION BY UWLM.user_pseudo_id, UWLM.feature_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
+);
+
+-- This is a temp table consolidating user labels over the date interval.
+CREATE OR REPLACE TEMP TABLE training_preparation_label as (
+ SELECT DISTINCT
+ LABEL.user_pseudo_id, -- The unique identifier for the user.
+ LABEL.feature_date, -- The date for which the features are extracted.
+ MAX(LABEL.{{target_event}}_day_1) OVER(lead_score_propensity_label_window) AS {{target_event}}_day_1, -- Whether the user made a {{target_event}} on day 1.
+FROM
+ `{{feature_store_project_id}}.{{feature_store_dataset}}.lead_score_propensity_label` LABEL
+WHERE
+ -- Define the training subset interval
+ LABEL.feature_date BETWEEN start_date AND end_date
+WINDOW
+ lead_score_propensity_label_window AS (PARTITION BY LABEL.user_pseudo_id, LABEL.feature_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
+);
+
+-- This is a temp table consolidating all features and labels over the date interval.
+CREATE OR REPLACE TEMP TABLE training_preparation as (
+ SELECT DISTINCT
+ UD.user_pseudo_id,
+ UD.user_id,
+ UD.feature_date,
+ COALESCE(UD.user_ltv_revenue, 0.0) AS user_ltv_revenue,
+ UD.device_category,
+ UD.device_mobile_brand_name,
+ UD.device_mobile_model_name,
+ UD.device_os,
+ UD.device_language,
+ UD.device_web_browser,
+ UD.geo_sub_continent,
+ UD.geo_country,
+ UD.geo_region,
+ UD.geo_city,
+ UD.geo_metro,
+ UD.last_traffic_source_medium,
+ UD.last_traffic_source_name,
+ UD.last_traffic_source_source,
+ UD.first_traffic_source_medium,
+ UD.first_traffic_source_name,
+ UD.first_traffic_source_source,
+ UD.has_signed_in_with_user_id,{% for feature in short_list_features %}
+ UWLM.{{feature.feature_name}}_past_1_day,
+ UWLM.{{feature.feature_name}}_past_2_day,
+ UWLM.{{feature.feature_name}}_past_3_day,
+ UWLM.{{feature.feature_name}}_past_4_day,
+ UWLM.{{feature.feature_name}}_past_5_day,{% endfor %}
+ LABEL.{{target_event}}_day_1
+FROM
+ training_preparation_ud UD
+INNER JOIN
+ training_preparation_uwlm UWLM
+ON
+ UWLM.user_pseudo_id = UD.user_pseudo_id
+ AND UWLM.feature_date = UD.feature_date
+INNER JOIN
+ training_preparation_label LABEL
+ON
+ LABEL.user_pseudo_id = UD.user_pseudo_id
+ AND LABEL.feature_date = UD.feature_date
+);
+
+-- This temp table assigns each row to a data split (TRAIN, VALIDATE or TEST) based on a hash of user_pseudo_id.
+CREATE OR REPLACE TEMP TABLE DataForTargetTable AS(
+ SELECT DISTINCT
+ CASE
+ WHEN (ABS(MOD(FARM_FINGERPRINT(user_pseudo_id), 10)) BETWEEN 0 AND train_split_end_number) THEN "TRAIN"
+ WHEN (ABS(MOD(FARM_FINGERPRINT(user_pseudo_id), 10)) BETWEEN train_split_end_number AND validation_split_end_number) THEN "VALIDATE"
+ WHEN (ABS(MOD(FARM_FINGERPRINT(user_pseudo_id), 10)) BETWEEN validation_split_end_number AND 9) THEN "TEST"
+ END as data_split,
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ {{target_event}}_day_1
+ FROM training_preparation);
+
+CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.lead_score_propensity_training_full_dataset` AS
+SELECT DISTINCT * FROM DataForTargetTable
+WHERE data_split IS NOT NULL;
+
+
+-- This is a table preparing rows for lead score propensity modelling looking back 5 days and looking ahead 1 day.
+CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.lead_score_propensity_training_5_1` AS(
+ SELECT DISTINCT
+ CURRENT_TIMESTAMP() AS processed_timestamp,
+ data_split,
+ feature_date,
+ user_pseudo_id,
+ LAST_VALUE(user_id) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS user_id,
+ LAST_VALUE(user_ltv_revenue) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS user_ltv_revenue,
+ LAST_VALUE(device_category) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_category,
+ LAST_VALUE(device_mobile_brand_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_mobile_brand_name,
+ LAST_VALUE(device_mobile_model_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_mobile_model_name,
+ LAST_VALUE(device_os) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_os,
+ LAST_VALUE(device_language) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_language,
+ LAST_VALUE(device_web_browser) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_web_browser,
+ LAST_VALUE(geo_sub_continent) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_sub_continent,
+ LAST_VALUE(geo_country) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_country,
+ LAST_VALUE(geo_region) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_region,
+ LAST_VALUE(geo_city) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_city,
+ LAST_VALUE(geo_metro) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_metro,
+ LAST_VALUE(last_traffic_source_medium) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_medium,
+ LAST_VALUE(last_traffic_source_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_name,
+ LAST_VALUE(last_traffic_source_source) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_source,
+ LAST_VALUE(first_traffic_source_medium) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_medium,
+ LAST_VALUE(first_traffic_source_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_name,
+ LAST_VALUE(first_traffic_source_source) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_source,
+ LAST_VALUE(has_signed_in_with_user_id) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS has_signed_in_with_user_id,{% for feature in short_list_features %}
+ LAST_VALUE({{feature.feature_name}}_past_1_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_1_day,
+ LAST_VALUE({{feature.feature_name}}_past_2_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_2_day,
+ LAST_VALUE({{feature.feature_name}}_past_3_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_3_day,
+ LAST_VALUE({{feature.feature_name}}_past_4_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_4_day,
+ LAST_VALUE({{feature.feature_name}}_past_5_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_5_day,{% endfor %}
+ -- Calculate the will_{{target_event}} label.
+ -- Label for a lead score propensity model. It indicates whether a user triggered a {{target_event}} event on the day after the feature date (1 day ahead).
+ -- This label is then used to train a model that can predict the likelihood of future {{target_event}}s for other users.
+ LAST_VALUE(CASE WHEN ({{target_event}}_day_1) = 0 THEN 0 ELSE 1 END) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) as will_{{target_event}}
+ FROM `{{project_id}}.{{dataset}}.lead_score_propensity_training_full_dataset`
+);
+
+
+-- This is a view preparing rows for lead score propensity modelling looking back 5 days and looking ahead 1 day.
+CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_lead_score_propensity_training_5_1`
+(processed_timestamp,
+ data_split,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ will_{{target_event}})
+OPTIONS(
+ --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
+ friendly_name="v_lead_score_propensity_training_5_1",
+ description="View Lead Score Propensity Training dataset using 5 days back to predict 1 day ahead. View expires after 48h and should run daily.",
+ labels=[("org_unit", "development")]
+) AS
+SELECT DISTINCT
+ * EXCEPT(feature_date, row_order_peruser_persplit)
+FROM (
+SELECT DISTINCT
+ processed_timestamp,
+ user_pseudo_id,
+ data_split,
+ feature_date,
+ -- Number the rows per user, per split and per label in date order, so that one row out of every 5 can be kept below.
+ ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split, will_{{target_event}} ORDER BY feature_date ASC) AS row_order_peruser_persplit,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ will_{{target_event}}
+FROM(
+SELECT DISTINCT
+ processed_timestamp,
+ data_split,
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ will_{{target_event}},
+ -- Row number per user, per day, per split and per label. Only one row per user, per day, per split and per label is kept.
+ ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, feature_date, data_split, will_{{target_event}} ORDER BY feature_date DESC) AS row_order_peruser_perday_persplit
+ FROM `{{project_id}}.{{dataset}}.lead_score_propensity_training_5_1`
+)
+WHERE
+ row_order_peruser_perday_persplit = 1
+)
+WHERE
+ -- Keep one row out of every 5, since 5 days is the size of the past window.
+ MOD(row_order_peruser_persplit-1, 5) = 0;
+
+
+-- This is a view preparing rows for lead score propensity modelling looking back 5 days and looking ahead 1 day.
+-- This view keeps only the most recent row for each user, data split and label value.
+CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_lead_score_propensity_training_5_1_last_window`
+(processed_timestamp,
+ data_split,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ will_{{target_event}})
+OPTIONS(
+ --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
+ friendly_name="v_lead_score_propensity_training_5_1_last_window",
+ description="View Lead Score Propensity Training dataset using 5 days back to predict 1 day ahead.",
+ labels=[("org_unit", "development")]
+) AS
+SELECT DISTINCT
+ processed_timestamp,
+ data_split,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ will_{{target_event}}
+FROM(
+SELECT DISTINCT
+ processed_timestamp,
+ data_split,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ will_{{target_event}},
+ ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split, will_{{target_event}} ORDER BY feature_date DESC) AS user_row_order
+ --ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split ORDER BY feature_date DESC) AS user_row_order
+ FROM `{{project_id}}.{{dataset}}.lead_score_propensity_training_5_1`
+)
+WHERE
+ user_row_order = 1;
+
+
+-- This is a view preparing rows for lead score propensity modelling looking back 5 days and looking ahead 1 day.
+-- Use this view when no {{target_event}}s were registered recently and you would otherwise have no way to train the classification model.
+CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_lead_score_propensity_training_5_1_rare_{{target_event}}s`
+(processed_timestamp,
+ data_split,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ will_{{target_event}})
+OPTIONS(
+ --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
+ friendly_name="v_lead_score_propensity_training_5_1_rare_{{target_event}}s",
+ description="View Lead Score Propensity Training dataset using 5 days back to predict 1 day ahead.",
+ labels=[("org_unit", "development")]
+) AS
+SELECT DISTINCT
+ processed_timestamp,
+ data_split,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ will_{{target_event}}
+ FROM
+ (SELECT DISTINCT
+ *
+ FROM `{{project_id}}.{{dataset}}.v_lead_score_propensity_training_5_1_last_window`
+ )
+ UNION ALL
+ (
+ SELECT DISTINCT
+ * EXCEPT(user_row_order, feature_date)
+ FROM(
+ SELECT DISTINCT
+ *,
+ ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split ORDER BY feature_date DESC) AS user_row_order
+ FROM `{{project_id}}.{{dataset}}.lead_score_propensity_training_5_1`
+ WHERE will_{{target_event}} = 1
+ )
+ WHERE
+ user_row_order = 1
+ LIMIT 100
+ )
+;
\ No newline at end of file
diff --git a/sql/procedure/user_rolling_window_lead_metrics.sqlx b/sql/procedure/user_rolling_window_lead_metrics.sqlx
new file mode 100644
index 00000000..26e25155
--- /dev/null
+++ b/sql/procedure/user_rolling_window_lead_metrics.sqlx
@@ -0,0 +1,129 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Setting procedure to lookback from the day before `input_date` until the day before `end_date`
+-- Subtract one day from `input_date`
+SET input_date = DATE_SUB(input_date, INTERVAL 1 DAY);
+-- Subtract one day from `end_date`
+SET end_date = DATE_SUB(end_date, INTERVAL 1 DAY);
+
+{% for feature in short_list_features %}
+-- Past User metrics: 1-day {{feature.feature_name}} events per user, 2-5-day {{feature.feature_name}} events per user
+-- Create a temporary table `rolling_{{feature.feature_name}}_past_days` to store the rolling {{feature.feature_name}} events count for each user
+CREATE OR REPLACE TEMP TABLE rolling_{{feature.feature_name}}_past_days AS (
+SELECT
+ -- User's unique identifier
+ user_pseudo_id,
+ -- Count the distinct {{feature.feature_name}} events that occurred exactly 1 day before input_date
+ MAX(COUNT(DISTINCT CASE WHEN DATE_DIFF(input_date, event_date, DAY) = 1 THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id) AS {{feature.feature_name}}_past_1_day,
+ -- Count the distinct {{feature.feature_name}} events that occurred exactly 2 days before input_date
+ MAX(COUNT(DISTINCT CASE WHEN DATE_DIFF(input_date, event_date, DAY) = 2 THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id) AS {{feature.feature_name}}_past_2_day,
+ -- Count the distinct {{feature.feature_name}} events that occurred exactly 3 days before input_date
+ MAX(COUNT(DISTINCT CASE WHEN DATE_DIFF(input_date, event_date, DAY) = 3 THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id) AS {{feature.feature_name}}_past_3_day,
+ -- Count the distinct {{feature.feature_name}} events that occurred exactly 4 days before input_date
+ MAX(COUNT(DISTINCT CASE WHEN DATE_DIFF(input_date, event_date, DAY) = 4 THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id) AS {{feature.feature_name}}_past_4_day,
+ -- Count the distinct {{feature.feature_name}} events that occurred exactly 5 days before input_date
+ MAX(COUNT(DISTINCT CASE WHEN DATE_DIFF(input_date, event_date, DAY) = 5 THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id) AS {{feature.feature_name}}_past_5_day
+FROM `{{mds_project_id}}.{{mds_dataset}}.event` as E
+-- Filter events within the defined date range
+WHERE event_date BETWEEN end_date AND input_date
+-- Filter for {{feature.feature_name}} events
+AND event_name='{{feature.feature_name}}'
+-- Ensure valid session ID
+AND ga_session_id IS NOT NULL
+-- Group the results by user pseudo ID
+GROUP BY user_pseudo_id
+);
+
+{% endfor %}
+
+-- All users in the platform
+CREATE OR REPLACE TEMP TABLE events_users_days as (
+ SELECT DISTINCT
+ -- User pseudo ID
+ Users.user_pseudo_id,
+ -- distinct event date
+ Days.event_date as event_date
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event` Users
+ -- 'Days' is an alias for a subquery returning the distinct event dates
+ CROSS JOIN
+ (SELECT DISTINCT event_date FROM `{{mds_project_id}}.{{mds_dataset}}.event`) Days
+ INNER JOIN `{{mds_project_id}}.{{mds_dataset}}.device` as D
+ ON Users.device_type_id = D.device_type_id
+ -- Exclude events without a valid session ID
+ WHERE Users.ga_session_id IS NOT NULL
+ -- Exclude events without a valid device operating system
+ AND D.device_os IS NOT NULL
+ -- Filter events within the defined date range
+ AND Days.event_date BETWEEN end_date AND input_date)
+;
+
+-- Create a temporary table to store data for the target table
+CREATE OR REPLACE TEMP TABLE DataForTargetTable AS
+SELECT DISTINCT
+ -- Current timestamp
+ CURRENT_TIMESTAMP() AS processed_timestamp,
+ -- Feature date
+ input_date AS feature_date,
+ -- User pseudo ID
+ EUD.user_pseudo_id{% for feature in short_list_features %},
+ COALESCE({{feature.feature_name}}_past_1_day,0) AS {{feature.feature_name}}_past_1_day,
+ COALESCE({{feature.feature_name}}_past_2_day,0) AS {{feature.feature_name}}_past_2_day,
+ COALESCE({{feature.feature_name}}_past_3_day,0) AS {{feature.feature_name}}_past_3_day,
+ COALESCE({{feature.feature_name}}_past_4_day,0) AS {{feature.feature_name}}_past_4_day,
+ COALESCE({{feature.feature_name}}_past_5_day,0) AS {{feature.feature_name}}_past_5_day{% endfor %}
+ FROM events_users_days AS EUD{% for feature in short_list_features %}
+ FULL OUTER JOIN rolling_{{feature.feature_name}}_past_days AS {{feature.feature_name}}
+ ON EUD.user_pseudo_id = {{feature.feature_name}}.user_pseudo_id{% endfor %}
+ -- Exclude rows without a valid user pseudo ID
+ WHERE EUD.user_pseudo_id IS NOT NULL
+ ;
+
+-- Merge data into the target table
+MERGE `{{project_id}}.{{dataset}}.{{insert_table}}` I
+USING DataForTargetTable T
+ON I.feature_date = T.feature_date
+ AND I.user_pseudo_id = T.user_pseudo_id
+WHEN MATCHED THEN
+ UPDATE SET
+ -- Update the processed timestamp and rolling window features
+ I.processed_timestamp = T.processed_timestamp{% for feature in short_list_features %},
+ I.{{feature.feature_name}}_past_1_day = T.{{feature.feature_name}}_past_1_day,
+ I.{{feature.feature_name}}_past_2_day = T.{{feature.feature_name}}_past_2_day,
+ I.{{feature.feature_name}}_past_3_day = T.{{feature.feature_name}}_past_3_day,
+ I.{{feature.feature_name}}_past_4_day = T.{{feature.feature_name}}_past_4_day,
+ I.{{feature.feature_name}}_past_5_day = T.{{feature.feature_name}}_past_5_day{% endfor %}
+WHEN NOT MATCHED THEN
+ INSERT
+ (processed_timestamp,
+ feature_date,
+ user_pseudo_id{% for feature in short_list_features %},
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day{% endfor %})
+ VALUES
+ (T.processed_timestamp,
+ T.feature_date,
+ T.user_pseudo_id{% for feature in short_list_features %},
+ T.{{feature.feature_name}}_past_1_day,
+ T.{{feature.feature_name}}_past_2_day,
+ T.{{feature.feature_name}}_past_3_day,
+ T.{{feature.feature_name}}_past_4_day,
+ T.{{feature.feature_name}}_past_5_day{% endfor %})
+;
+
+-- Set rows_added to the number of distinct users in the target table (not only the rows merged by this run)
+SET rows_added = (SELECT COUNT(DISTINCT user_pseudo_id) FROM `{{project_id}}.{{dataset}}.{{insert_table}}`);
diff --git a/sql/query/create_gemini_model.sqlx b/sql/query/create_gemini_model.sqlx
index 84612d8f..4e365c4a 100644
--- a/sql/query/create_gemini_model.sqlx
+++ b/sql/query/create_gemini_model.sqlx
@@ -18,6 +18,6 @@
-- Your supervised tuning computations also occur in the europe-west4 region, because that's where TPU resources are located.
-- Create a {{endpoint_name}} model using a remote connection to {{region}}.{{connection_name}}
-CREATE OR REPLACE MODEL `{{project_id}}.{{dataset}}.{{model_name}}`
+CREATE MODEL IF NOT EXISTS `{{project_id}}.{{dataset}}.{{model_name}}`
REMOTE WITH CONNECTION `{{project_id}}.{{region}}.{{connection_name}}`
OPTIONS (ENDPOINT = '{{endpoint_name}}');
\ No newline at end of file
diff --git a/sql/query/invoke_backfill_churn_propensity_label.sqlx b/sql/query/invoke_backfill_churn_propensity_label.sqlx
index 4cbe77ac..9dd41da7 100644
--- a/sql/query/invoke_backfill_churn_propensity_label.sqlx
+++ b/sql/query/invoke_backfill_churn_propensity_label.sqlx
@@ -119,7 +119,13 @@ GROUP BY
);
-- Insert data into the target table, combining user information with churn and bounce status
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ churned,
+ bounced
+)
SELECT DISTINCT
-- Current timestamp as the processing timestamp
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_customer_lifetime_value_label.sqlx b/sql/query/invoke_backfill_customer_lifetime_value_label.sqlx
index 27ea59d0..569e5db5 100644
--- a/sql/query/invoke_backfill_customer_lifetime_value_label.sqlx
+++ b/sql/query/invoke_backfill_customer_lifetime_value_label.sqlx
@@ -109,7 +109,14 @@ CREATE OR REPLACE TEMP TABLE future_revenue_per_user AS (
);
-- Insert data into the target table
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ pltv_revenue_30_days,
+ pltv_revenue_90_days,
+ pltv_revenue_180_days
+)
SELECT DISTINCT
-- Current timestamp of the processing
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_lead_score_propensity_label.sqlx b/sql/query/invoke_backfill_lead_score_propensity_label.sqlx
new file mode 100644
index 00000000..eba85784
--- /dev/null
+++ b/sql/query/invoke_backfill_lead_score_propensity_label.sqlx
@@ -0,0 +1,116 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Declares a variable to store the maximum date for analysis
+DECLARE max_date DATE;
+-- Declares a variable to store the minimum date for analysis
+DECLARE min_date DATE;
+-- Sets the max_date variable to the latest event_date minus a specified number of days ({{interval_max_date}}) from the 'event' table
+SET max_date = (SELECT DATE_SUB(MAX(event_date), INTERVAL {{interval_max_date}} DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+-- Sets the min_date variable to the earliest event_date plus a specified number of days ({{interval_min_date}}) from the 'event' table
+SET min_date = (SELECT DATE_ADD(MIN(event_date), INTERVAL {{interval_min_date}} DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+
+-- If min_date >= the maximum event_date, OR max_date <= the minimum event_date, OR min_date >= max_date, then reset min_date to the minimum event_date and max_date to the maximum event_date
+IF min_date >= (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) OR max_date <= (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) OR min_date >= max_date THEN
+ SET min_date = (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+ SET max_date = (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+END IF;
+
+-- This code block acts as a safeguard to ensure that the min_date and max_date used for further analysis are always within the bounds of the actual data available in the table.
+-- It prevents situations where calculations might mistakenly consider dates beyond the real data range, which could lead to errors or misleading results.
+IF max_date > (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) OR min_date < (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) THEN
+ SET min_date = (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+ SET max_date = (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+END IF;
+
+-- Creates a temporary table called dates_interval to store distinct event dates and their corresponding end dates
+CREATE OR REPLACE TEMP TABLE dates_interval as (
+ SELECT DISTINCT
+ -- Selects the distinct event_date and assigns it to the column input_date
+ event_date as input_date,
+ -- Calculates the end date by adding a specified number of days ({{interval_end_date}}) to the input_date
+ DATE_ADD(event_date, INTERVAL {{interval_end_date}} DAY) as end_date
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event`
+ -- Filters the events to include only those within the defined date range (between min_date and max_date)
+ WHERE event_date BETWEEN min_date AND max_date
+ ORDER BY input_date DESC
+);
+
+-- All users in the platform
+-- Creates a temporary table called all_users_possible_{{target_event}}s to store user {{target_event}} data
+CREATE OR REPLACE TEMP TABLE all_users_possible_{{target_event}}s as (
+ SELECT DISTINCT
+ -- Selects the user_pseudo_id from the 'event' table and assigns it to the column user_pseudo_id
+ Users.user_pseudo_id,
+ -- Selects the event_date from the date array generated using GENERATE_DATE_ARRAY and assigns it to the column feature_date
+ DI.event_date as feature_date,
+ -- Creates the {{target_event}}_day_1 column and initializes it with NULL
+ -- The label value is filled in at insert time from the future_{{target_event}}s_per_user table
+ NULL as {{target_event}}_day_1
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event` Users
+ -- Performs a cross join with a subquery that generates a date array using GENERATE_DATE_ARRAY
+ -- The date array includes dates from min_date to max_date with a 1-day interval
+ CROSS JOIN (SELECT event_date FROM UNNEST(GENERATE_DATE_ARRAY(min_date, max_date, INTERVAL 1 DAY)) as event_date) as DI
+ -- Filters the data to include events where event_name is '{{target_event}}'
+ WHERE LOWER(Users.event_name) IN ('{{target_event}}')
+ AND Users.ga_session_id IS NOT NULL
+ );
+
+-- Creates a temporary table called future_{{target_event}}s_per_user to store user {{target_event}} data in the future
+-- Future User metrics: 1-day future {{target_event}}s per user
+CREATE OR REPLACE TEMP TABLE future_{{target_event}}s_per_user AS (
+ SELECT
+ -- Selects user_pseudo_id from the event table and assigns it to column user_pseudo_id
+ user_pseudo_id,
+ -- Selects input_date from the dates_interval table and assigns it to column feature_date
+ input_date as feature_date,
+ -- Counts the distinct transaction IDs of {{target_event}} events that occurred exactly 1 day after input_date
+ -- The calculation is performed over a window partitioned by user_pseudo_id and input_date
+ MAX(COUNT(DISTINCT CASE DATE_DIFF(event_date, input_date, DAY) = 1 WHEN TRUE THEN ecommerce.transaction_id END)) OVER(PARTITION BY user_pseudo_id, input_date) AS {{target_event}}_day_1
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event` as E
+ INNER JOIN `{{mds_project_id}}.{{mds_dataset}}.device` as D
+ ON E.device_type_id = D.device_type_id
+ CROSS JOIN dates_interval as DI
+ -- Filters events to be within the date range defined by input_date and end_date from dates_interval
+ WHERE E.event_date BETWEEN DI.input_date AND DI.end_date
+ AND LOWER(E.event_name) IN ('{{target_event}}')
+ AND E.ga_session_id IS NOT NULL
+ AND D.device_os IS NOT NULL
+ -- Groups the result by user_pseudo_id and feature_date
+ GROUP BY user_pseudo_id, feature_date
+);
+
+-- Inserts data into the target table
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ {{target_event}}_day_1
+)
+SELECT DISTINCT
+ -- Selects the current timestamp and assigns it to the column processed_timestamp
+ CURRENT_TIMESTAMP() AS processed_timestamp,
+ -- Selects the feature_date from the all_users_possible_{{target_event}}s table and assigns it to the column feature_date
+ A.feature_date,
+ -- Selects the user_pseudo_id from the all_users_possible_{{target_event}}s table and assigns it to the column user_pseudo_id
+ A.user_pseudo_id,
+ -- Uses the LEAST function to get the minimum value between the coalesced value of {{target_event}}_day_1 and 1
+ -- COALESCE is used to handle null values, replacing them with 0
+ -- The result is a binary label: 1 if the user has at least one {{target_event}} on day 1, otherwise 0
+ LEAST(COALESCE(B.{{target_event}}_day_1, 0), 1) AS {{target_event}}_day_1
+FROM all_users_possible_{{target_event}}s AS A
+-- Performs a left join with the future_{{target_event}}s_per_user table (aliased as B) using user_pseudo_id and feature_date
+LEFT JOIN future_{{target_event}}s_per_user AS B
+ON B.user_pseudo_id = A.user_pseudo_id AND B.feature_date = A.feature_date
+;
\ No newline at end of file
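Review note: the final SELECT in the file above turns a per-user count into a binary label with `LEAST(COALESCE(x, 0), 1)`. The following is a minimal Python sketch of that behavior, not code from the repository; the helper name `binarize_label` is hypothetical and exists only for illustration.

```python
from typing import Optional


def binarize_label(count: Optional[int]) -> int:
    """Mirror LEAST(COALESCE(count, 0), 1) for non-negative counts.

    None (SQL NULL, i.e. no matching row from the LEFT JOIN) becomes 0,
    and any positive count is capped at 1.
    """
    coalesced = 0 if count is None else count  # COALESCE(count, 0)
    return min(coalesced, 1)                   # LEAST(..., 1)


if __name__ == "__main__":
    for value in (None, 0, 1, 4):
        print(value, "->", binarize_label(value))
```

Users without any row in `future_{{target_event}}s_per_user` therefore still receive a label of 0 rather than NULL.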
diff --git a/sql/query/invoke_backfill_purchase_propensity_label.sqlx b/sql/query/invoke_backfill_purchase_propensity_label.sqlx
index a2c8bee0..b062dc58 100644
--- a/sql/query/invoke_backfill_purchase_propensity_label.sqlx
+++ b/sql/query/invoke_backfill_purchase_propensity_label.sqlx
@@ -125,7 +125,26 @@ CREATE OR REPLACE TEMP TABLE future_purchases_per_user AS (
);
-- Inserts data into the target table
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ purchase_day_1,
+ purchase_day_2,
+ purchase_day_3,
+ purchase_day_4,
+ purchase_day_5,
+ purchase_day_6,
+ purchase_day_7,
+ purchase_day_8,
+ purchase_day_9,
+ purchase_day_10,
+ purchase_day_11,
+ purchase_day_12,
+ purchase_day_13,
+ purchase_day_14,
+ purchase_day_15_30
+)
SELECT DISTINCT
-- Selects the current timestamp and assigns it to the column processed_timestamp
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_dimensions.sqlx b/sql/query/invoke_backfill_user_dimensions.sqlx
index c27dd299..6c81b412 100644
--- a/sql/query/invoke_backfill_user_dimensions.sqlx
+++ b/sql/query/invoke_backfill_user_dimensions.sqlx
@@ -122,7 +122,31 @@ CREATE OR REPLACE TEMP TABLE events_users as (
;
-- Inserting aggregated user data into the target table.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id
+)
SELECT DISTINCT
-- Timestamp of the data processing
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_lifetime_dimensions.sqlx b/sql/query/invoke_backfill_user_lifetime_dimensions.sqlx
index 4001878f..b05611e0 100644
--- a/sql/query/invoke_backfill_user_lifetime_dimensions.sqlx
+++ b/sql/query/invoke_backfill_user_lifetime_dimensions.sqlx
@@ -137,7 +137,31 @@ CREATE OR REPLACE TEMP TABLE events_users as (
-- This code block inserts data into the specified table, combining information from the "events_users" table
-- and the "user_dimensions_event_session_scoped" table.
-- It aggregates user-level features for each user and date.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id
+)
SELECT DISTINCT
-- The current timestamp.
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_lookback_metrics.sqlx b/sql/query/invoke_backfill_user_lookback_metrics.sqlx
index 25e3566b..37bd4563 100644
--- a/sql/query/invoke_backfill_user_lookback_metrics.sqlx
+++ b/sql/query/invoke_backfill_user_lookback_metrics.sqlx
@@ -230,7 +230,25 @@ AND D.device_os IS NOT NULL
-- This code is part of a larger process for building a machine learning model that predicts
-- user behavior based on their past activity. The features generated by this code can be used
-- as input to the model, helping it learn patterns and make predictions.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ active_users_past_1_7_day,
+ active_users_past_8_14_day,
+ purchases_past_1_7_day,
+ purchases_past_8_14_day,
+ visits_past_1_7_day,
+ visits_past_8_14_day,
+ view_items_past_1_7_day,
+ view_items_past_8_14_day,
+ add_to_carts_past_1_7_day,
+ add_to_carts_past_8_14_day,
+ checkouts_past_1_7_day,
+ checkouts_past_8_14_day,
+ ltv_revenue_past_1_7_day,
+ ltv_revenue_past_7_15_day
+)
SELECT DISTINCT
-- Timestamp indicating when the data was processed
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_rolling_window_lead_metrics.sqlx b/sql/query/invoke_backfill_user_rolling_window_lead_metrics.sqlx
new file mode 100644
index 00000000..b8b364cc
--- /dev/null
+++ b/sql/query/invoke_backfill_user_rolling_window_lead_metrics.sqlx
@@ -0,0 +1,127 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- This SQL code defines a series of temporary tables to calculate and store user engagement metrics based on
+-- rolling window aggregations. These tables are then used to populate a target table with daily user engagement features.
+
+DECLARE max_date DATE;
+DECLARE min_date DATE;
+-- Sets max_date to the latest event_date from the event table, minus an offset specified by the interval_max_date
+SET max_date = (SELECT DATE_SUB(MAX(event_date), INTERVAL {{interval_max_date}} DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+-- Sets min_date to the earliest event_date from the event table, plus an offset specified by the interval_min_date
+SET min_date = (SELECT DATE_ADD(MIN(event_date), INTERVAL {{interval_min_date}} DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+
+-- If min_date >= maximum event_date, OR max_date <= minimum event_date, OR min_date >= max_date, then reset min_date to the minimum event_date and max_date to the maximum event_date
+IF min_date >= (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) OR max_date <= (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) OR min_date >= max_date THEN
+ SET min_date = (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+ SET max_date = (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+END IF;
+
+-- This code block acts as a safeguard to ensure that the min_date and max_date used for further analysis are always within the bounds of the actual data available in the table.
+-- It prevents situations where calculations might mistakenly consider dates beyond the real data range, which could lead to errors or misleading results.
+IF max_date > (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) OR min_date < (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) THEN
+ SET min_date = (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+ SET max_date = (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+END IF;
+
+-- This section determines the date range for analysis and creates a temporary table dates_interval with distinct date intervals.
+CREATE OR REPLACE TEMP TABLE dates_interval as (
+ SELECT DISTINCT
+ -- Select each distinct event_date as 'input_date', representing the current date in the analysis
+ event_date as input_date,
+ -- Calculate the 'end_date' by subtracting a specified interval from the 'input_date'
+ DATE_SUB(event_date, INTERVAL {{interval_end_date}} DAY) as end_date
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event`
+ WHERE event_date BETWEEN min_date AND max_date
+ ORDER BY input_date DESC
+);
+
+{% for feature in short_list_features %}
+-- Run these window aggregations every day, for each date in the training and inference date ranges.
+-- All users metrics: 1-5-day {{feature.feature_name}} users
+CREATE OR REPLACE TEMP TABLE rolling_{{feature.feature_name}}_past_days AS (
+ SELECT
+ user_pseudo_id,
+ input_date as feature_date,
+ -- Number of times the user has {{feature.feature_name}} in the past 1st day
+ MAX(COUNT(DISTINCT CASE DATE_DIFF(input_date, event_date, DAY) = 1 WHEN TRUE THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id, input_date) AS {{feature.feature_name}}_past_1_day,
+ -- Number of times the user has {{feature.feature_name}} in the past 2nd day
+ MAX(COUNT(DISTINCT CASE DATE_DIFF(input_date, event_date, DAY) = 2 WHEN TRUE THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id, input_date) AS {{feature.feature_name}}_past_2_day,
+ -- Number of times the user has {{feature.feature_name}} in the past 3rd day
+ MAX(COUNT(DISTINCT CASE DATE_DIFF(input_date, event_date, DAY) = 3 WHEN TRUE THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id, input_date) AS {{feature.feature_name}}_past_3_day,
+ -- Number of times the user has {{feature.feature_name}} in the past 4th day
+ MAX(COUNT(DISTINCT CASE DATE_DIFF(input_date, event_date, DAY) = 4 WHEN TRUE THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id, input_date) AS {{feature.feature_name}}_past_4_day,
+ -- Number of times the user has {{feature.feature_name}} in the past 5th day
+ MAX(COUNT(DISTINCT CASE DATE_DIFF(input_date, event_date, DAY) = 5 WHEN TRUE THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id, input_date) AS {{feature.feature_name}}_past_5_day
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event` as E
+ CROSS JOIN dates_interval as DI
+ -- Filter events to be within the defined date range
+ WHERE E.event_date BETWEEN DI.end_date AND DI.input_date
+ -- Filter for {{feature.feature_name}} events
+ AND event_name='{{feature.feature_name}}'
+ -- Ensure valid session ID
+ AND ga_session_id IS NOT NULL
+ -- Group the results by user pseudo ID and feature date
+ GROUP BY user_pseudo_id, feature_date
+);
+
+{% endfor %}
+
+-- All users in the platform
+-- This code creates a temporary table that contains a distinct list of user pseudo IDs
+-- and their corresponding feature dates, filtering for events with valid session IDs,
+-- device operating systems, and falling within the specified date range.
+CREATE OR REPLACE TEMP TABLE events_users as (
+ SELECT DISTINCT
+ Users.user_pseudo_id,
+ DI.input_date as feature_date
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event` Users
+ INNER JOIN `{{mds_project_id}}.{{mds_dataset}}.device` as D
+ ON Users.device_type_id = D.device_type_id
+ CROSS JOIN dates_interval as DI
+ WHERE Users.ga_session_id IS NOT NULL
+ AND Users.event_date BETWEEN DI.end_date AND DI.input_date
+ AND D.device_os IS NOT NULL
+);
+
+-- This code block inserts data into a table, combining information from the events_users
+-- table and several temporary tables containing rolling window features. The resulting data
+-- represents user-level features for each user and date, capturing their past activity within
+-- different time windows.
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id{% for feature in short_list_features %},
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day{% endfor %}
+)
+ SELECT DISTINCT
+ -- This selects the current timestamp and assigns it to the column processed_timestamp.
+ CURRENT_TIMESTAMP() AS processed_timestamp,
+ EUD.feature_date,
+ EUD.user_pseudo_id{% for feature in short_list_features %},
+ COALESCE({{feature.feature_name}}_past_1_day,0) AS {{feature.feature_name}}_past_1_day,
+ COALESCE({{feature.feature_name}}_past_2_day,0) AS {{feature.feature_name}}_past_2_day,
+ COALESCE({{feature.feature_name}}_past_3_day,0) AS {{feature.feature_name}}_past_3_day,
+ COALESCE({{feature.feature_name}}_past_4_day,0) AS {{feature.feature_name}}_past_4_day,
+ COALESCE({{feature.feature_name}}_past_5_day,0) AS {{feature.feature_name}}_past_5_day{% endfor %}
+ FROM events_users AS EUD{% for feature in short_list_features %}
+ FULL OUTER JOIN rolling_{{feature.feature_name}}_past_days AS {{feature.feature_name}}
+ ON EUD.user_pseudo_id = {{feature.feature_name}}.user_pseudo_id AND EUD.feature_date = {{feature.feature_name}}.feature_date{% endfor %}
+ -- This filters the results to include only rows where the user_pseudo_id is not null.
+ WHERE EUD.user_pseudo_id IS NOT NULL
+ ;
\ No newline at end of file
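Review note: the two `IF` guards at the top of the file above decide which date range the backfill covers. The sketch below models that logic in Python under the assumption that the offsets are plain day counts; the function name `resolve_backfill_window` and its parameters are hypothetical and do not exist in the repository.

```python
from datetime import date, timedelta
from typing import Tuple


def resolve_backfill_window(
    first_event: date,
    last_event: date,
    interval_min_date: int,
    interval_max_date: int,
) -> Tuple[date, date]:
    """Return (min_date, max_date) following the two guards in the script."""
    min_date = first_event + timedelta(days=interval_min_date)
    max_date = last_event - timedelta(days=interval_max_date)

    # Guard 1: the offsets produce an empty or inverted window.
    if min_date >= last_event or max_date <= first_event or min_date >= max_date:
        return first_event, last_event

    # Guard 2: the window extends beyond the available data.
    if max_date > last_event or min_date < first_event:
        return first_event, last_event

    return min_date, max_date


if __name__ == "__main__":
    print(resolve_backfill_window(date(2024, 1, 1), date(2024, 1, 31), 5, 5))
```

In both fallback cases the script uses the full range of available events, so the backfill always has a non-empty window to iterate over.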
diff --git a/sql/query/invoke_backfill_user_rolling_window_lifetime_metrics.sqlx b/sql/query/invoke_backfill_user_rolling_window_lifetime_metrics.sqlx
index 2ee219f1..b4a0a415 100644
--- a/sql/query/invoke_backfill_user_rolling_window_lifetime_metrics.sqlx
+++ b/sql/query/invoke_backfill_user_rolling_window_lifetime_metrics.sqlx
@@ -283,7 +283,50 @@ AND D.device_os IS NOT NULL
-- This code is part of a larger process for building a machine learning model that predicts
-- user behavior based on their past activity. The features generated by this code can be used
-- as input to the model, helping it learn patterns and make predictions.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ active_users_past_1_30_day,
+ active_users_past_30_60_day,
+ active_users_past_60_90_day,
+ active_users_past_90_120_day,
+ active_users_past_120_150_day,
+ active_users_past_150_180_day,
+ purchases_past_1_30_day,
+ purchases_past_30_60_day,
+ purchases_past_60_90_day,
+ purchases_past_90_120_day,
+ purchases_past_120_150_day,
+ purchases_past_150_180_day,
+ visits_past_1_30_day,
+ visits_past_30_60_day,
+ visits_past_60_90_day,
+ visits_past_90_120_day,
+ visits_past_120_150_day,
+ visits_past_150_180_day,
+ view_items_past_1_30_day,
+ view_items_past_30_60_day,
+ view_items_past_60_90_day,
+ view_items_past_90_120_day,
+ view_items_past_120_150_day,
+ view_items_past_150_180_day,
+ add_to_carts_past_1_30_day,
+ add_to_carts_past_30_60_day,
+ add_to_carts_past_60_90_day,
+ add_to_carts_past_90_120_day,
+ add_to_carts_past_120_150_day,
+ add_to_carts_past_150_180_day,
+ checkouts_past_1_30_day,
+ checkouts_past_30_60_day,
+ checkouts_past_60_90_day,
+ checkouts_past_90_120_day,
+ checkouts_past_120_150_day,
+ checkouts_past_150_180_day,
+ ltv_revenue_past_1_30_day,
+ ltv_revenue_past_30_90_day,
+ ltv_revenue_past_90_180_day
+)
SELECT DISTINCT
-- This selects the current timestamp and assigns it to the column processed_timestamp.
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_rolling_window_metrics.sqlx b/sql/query/invoke_backfill_user_rolling_window_metrics.sqlx
index 9317225a..be0a0860 100644
--- a/sql/query/invoke_backfill_user_rolling_window_metrics.sqlx
+++ b/sql/query/invoke_backfill_user_rolling_window_metrics.sqlx
@@ -272,7 +272,65 @@ CREATE OR REPLACE TEMP TABLE events_users as (
-- table and several temporary tables containing rolling window features. The resulting data
-- represents user-level features for each user and date, capturing their past activity within
-- different time windows.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ active_users_past_1_day,
+ active_users_past_2_day,
+ active_users_past_3_day,
+ active_users_past_4_day,
+ active_users_past_5_day,
+ active_users_past_6_day,
+ active_users_past_7_day,
+ active_users_past_8_14_day,
+ active_users_past_15_30_day,
+ purchases_past_1_day,
+ purchases_past_2_day,
+ purchases_past_3_day,
+ purchases_past_4_day,
+ purchases_past_5_day,
+ purchases_past_6_day,
+ purchases_past_7_day,
+ purchases_past_8_14_day,
+ purchases_past_15_30_day,
+ visits_past_1_day,
+ visits_past_2_day,
+ visits_past_3_day,
+ visits_past_4_day,
+ visits_past_5_day,
+ visits_past_6_day,
+ visits_past_7_day,
+ visits_past_8_14_day,
+ visits_past_15_30_day,
+ view_items_past_1_day,
+ view_items_past_2_day,
+ view_items_past_3_day,
+ view_items_past_4_day,
+ view_items_past_5_day,
+ view_items_past_6_day,
+ view_items_past_7_day,
+ view_items_past_8_14_day,
+ view_items_past_15_30_day,
+ add_to_carts_past_1_day,
+ add_to_carts_past_2_day,
+ add_to_carts_past_3_day,
+ add_to_carts_past_4_day,
+ add_to_carts_past_5_day,
+ add_to_carts_past_6_day,
+ add_to_carts_past_7_day,
+ add_to_carts_past_8_14_day,
+ add_to_carts_past_15_30_day,
+ checkouts_past_1_day,
+ checkouts_past_2_day,
+ checkouts_past_3_day,
+ checkouts_past_4_day,
+ checkouts_past_5_day,
+ checkouts_past_6_day,
+ checkouts_past_7_day,
+ checkouts_past_8_14_day,
+ checkouts_past_15_30_day
+)
SELECT DISTINCT
-- This selects the current timestamp and assigns it to the column processed_timestamp.
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_scoped_lifetime_metrics.sqlx b/sql/query/invoke_backfill_user_scoped_lifetime_metrics.sqlx
index bfb93869..ed4bf30e 100644
--- a/sql/query/invoke_backfill_user_scoped_lifetime_metrics.sqlx
+++ b/sql/query/invoke_backfill_user_scoped_lifetime_metrics.sqlx
@@ -163,7 +163,35 @@ CREATE OR REPLACE TEMP TABLE first_purchasers as (
);
-- This SQL code calculates various user engagement and revenue metrics at a daily level and inserts the results into a target table. It leverages several temporary tables created earlier in the script to aggregate data efficiently.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ lifetime_purchasers_users,
+ lifetime_average_daily_purchasers,
+ lifetime_active_users,
+ lifetime_DAU,
+ lifetime_MAU,
+ lifetime_WAU,
+ lifetime_dau_per_mau,
+ lifetime_dau_per_wau,
+ lifetime_wau_per_mau,
+ lifetime_users_engagement_duration_seconds,
+ lifetime_average_engagement_time,
+ lifetime_average_engagement_time_per_session,
+ lifetime_average_sessions_per_user,
+ lifetime_ARPPU,
+ lifetime_ARPU,
+ lifetime_average_daily_revenue,
+ lifetime_max_daily_revenue,
+ lifetime_min_daily_revenue,
+ lifetime_new_users,
+ lifetime_returning_users,
+ lifetime_first_time_purchasers,
+ lifetime_first_time_purchaser_conversion,
+ lifetime_first_time_purchasers_per_new_user,
+ lifetime_avg_user_conversion_rate,
+ lifetime_avg_session_conversion_rate
+)
SELECT
-- Records the current timestamp when the query is executed.
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_scoped_metrics.sqlx b/sql/query/invoke_backfill_user_scoped_metrics.sqlx
index 3cc45b49..c5252519 100644
--- a/sql/query/invoke_backfill_user_scoped_metrics.sqlx
+++ b/sql/query/invoke_backfill_user_scoped_metrics.sqlx
@@ -183,7 +183,35 @@ CREATE OR REPLACE TEMP TABLE new_users_ as (
);
-- Insert data into the target table after calculating various user engagement and revenue metrics.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ purchasers_users,
+ average_daily_purchasers,
+ active_users,
+ DAU,
+ MAU,
+ WAU,
+ dau_per_mau,
+ dau_per_wau,
+ wau_per_mau,
+ users_engagement_duration_seconds,
+ average_engagement_time,
+ average_engagement_time_per_session,
+ average_sessions_per_user,
+ ARPPU,
+ ARPU,
+ average_daily_revenue,
+ max_daily_revenue,
+ min_daily_revenue,
+ new_users,
+ returning_users,
+ first_time_purchasers,
+ first_time_purchaser_conversion,
+ first_time_purchasers_per_new_user,
+ avg_user_conversion_rate,
+ avg_session_conversion_rate
+)
SELECT DISTINCT
-- Record the current timestamp when the query is executed.
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_scoped_segmentation_metrics.sqlx b/sql/query/invoke_backfill_user_scoped_segmentation_metrics.sqlx
index c6f03aaa..251dfead 100644
--- a/sql/query/invoke_backfill_user_scoped_segmentation_metrics.sqlx
+++ b/sql/query/invoke_backfill_user_scoped_segmentation_metrics.sqlx
@@ -136,7 +136,35 @@ GROUP BY feature_date
);
-- This SQL code calculates various user engagement and revenue metrics at a daily level and inserts the results into a target table. It leverages several temporary tables created earlier in the script to aggregate data efficiently.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ purchasers_users,
+ average_daily_purchasers,
+ active_users,
+ DAU,
+ MAU,
+ WAU,
+ dau_per_mau,
+ dau_per_wau,
+ wau_per_mau,
+ users_engagement_duration_seconds,
+ average_engagement_time,
+ average_engagement_time_per_session,
+ average_sessions_per_user,
+ ARPPU,
+ ARPU,
+ average_daily_revenue,
+ max_daily_revenue,
+ min_daily_revenue,
+ new_users,
+ returning_users,
+ first_time_purchasers,
+ first_time_purchaser_conversion,
+ first_time_purchasers_per_new_user,
+ avg_user_conversion_rate,
+ avg_session_conversion_rate
+)
SELECT
-- Records the current timestamp when the query is executed.
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_segmentation_dimensions.sqlx b/sql/query/invoke_backfill_user_segmentation_dimensions.sqlx
index be402415..cf2dc7ff 100644
--- a/sql/query/invoke_backfill_user_segmentation_dimensions.sqlx
+++ b/sql/query/invoke_backfill_user_segmentation_dimensions.sqlx
@@ -95,7 +95,31 @@ CREATE OR REPLACE TEMP TABLE events_users as (
-- This code snippet performs a complex aggregation and insertion operation. It combines data from two temporary tables,
-- calculates various user-level dimensions, and inserts the aggregated results into a target table. The use of window functions,
-- approximate aggregation, and careful joining ensures that the query is efficient and produces meaningful insights from the data.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id
+)
-- The DISTINCT keyword ensures that only unique rows are inserted, eliminating any potential duplicates.
SELECT DISTINCT
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_session_event_aggregated_metrics.sqlx b/sql/query/invoke_backfill_user_session_event_aggregated_metrics.sqlx
index 7ba0e2f7..4c6f3373 100644
--- a/sql/query/invoke_backfill_user_session_event_aggregated_metrics.sqlx
+++ b/sql/query/invoke_backfill_user_session_event_aggregated_metrics.sqlx
@@ -354,7 +354,45 @@ CREATE OR REPLACE TEMP TABLE events_users_days as (
-- user_events_per_day_event_scoped (UEPDES): Contains user-level event metrics aggregated on a daily basis. Metrics include add_to_carts, cart_to_view_rate, checkouts, ecommerce_purchases, etc.
-- repeated_purchase (R): Stores information about whether a user has made previous purchases, indicated by the how_many_purchased_before column.
-- cart_to_purchase (CP): Contains a flag (has_abandoned_cart) indicating whether a user abandoned their cart on a given day.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ engagement_rate,
+ engaged_sessions_per_user,
+ session_conversion_rate,
+ bounces,
+ bounce_rate_per_user,
+ sessions_per_user,
+ avg_views_per_session,
+ sum_engagement_time_seconds,
+ avg_engagement_time_seconds,
+ new_visits,
+ returning_visits,
+ add_to_carts,
+ cart_to_view_rate,
+ checkouts,
+ ecommerce_purchases,
+ ecommerce_quantity,
+ ecommerce_revenue,
+ item_revenue,
+ item_quantity,
+ item_refund_amount,
+ item_view_events,
+ items_clicked_in_promotion,
+ items_clicked_in_list,
+ items_checked_out,
+ items_added_to_cart,
+ item_list_click_events,
+ item_list_view_events,
+ purchase_revenue,
+ purchase_to_view_rate,
+ refunds,
+ transactions_per_purchaser,
+ user_conversion_rate,
+ how_many_purchased_before,
+ has_abandoned_cart
+)
SELECT
CURRENT_TIMESTAMP() AS processed_timestamp,
EUD.feature_date,
diff --git a/sql/query/invoke_churn_propensity_training_preparation.sqlx b/sql/query/invoke_churn_propensity_training_preparation.sqlx
index 632fb03b..10a48ef4 100644
--- a/sql/query/invoke_churn_propensity_training_preparation.sqlx
+++ b/sql/query/invoke_churn_propensity_training_preparation.sqlx
@@ -57,14 +57,14 @@ SET churners = (SELECT COUNT(DISTINCT user_pseudo_id)
);
-- Setting Training Dates
--- If there are churners in the training set, then keep the user-defined dates, or else set
--- the start and end dates instead.
+-- If there are churners in the training set, then keep the calculated dates (min_date and max_date), or else set
+-- the start and end dates to a fixed interval (3 years ago through 5 days ago), preventing `train_start_date` and `train_end_date` from being NULL.
IF churners > 0 THEN
- SET train_start_date = GREATEST(train_start_date, min_date);
- SET train_end_date = LEAST(train_end_date, max_date);
-ELSE
SET train_start_date = min_date;
SET train_end_date = max_date;
+ELSE
+ SET train_start_date = DATE_SUB(CURRENT_DATE(), INTERVAL 3 YEAR);
+ SET train_end_date = DATE_SUB(CURRENT_DATE(), INTERVAL 5 DAY);
END IF;
-- Finally, the script calls a stored procedure, passing the adjusted training dates and split numbers as arguments.
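Review note: the reworked `IF churners > 0` branch above keeps the calculated dates when churners exist and otherwise falls back to a fixed window. The following Python sketch models that selection; `select_training_dates` and `subtract_years` are hypothetical helper names, and the Feb 29 handling assumes that BigQuery's `DATE_SUB` with a `YEAR` part clamps to the last day of the month.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def subtract_years(day: date, years: int) -> date:
    """Approximate DATE_SUB(day, INTERVAL years YEAR); Feb 29 falls back to Feb 28."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # Only raised for Feb 29 when the target year is not a leap year.
        return day.replace(year=day.year - years, day=28)


def select_training_dates(
    churners: int,
    min_date: Optional[date],
    max_date: Optional[date],
    today: date,
) -> Tuple[Optional[date], Optional[date]]:
    """Keep the calculated dates when churners exist, else use a fixed window."""
    if churners > 0:
        return min_date, max_date
    # Fallback: 3 years ago through 5 days ago, so neither date is NULL.
    return subtract_years(today, 3), today - timedelta(days=5)


if __name__ == "__main__":
    print(select_training_dates(0, None, None, date(2024, 2, 29)))
```

The same branch structure appears in the customer lifetime value preparation script, with `purchasers` in place of `churners`.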
diff --git a/sql/query/invoke_customer_lifetime_value_training_preparation.sqlx b/sql/query/invoke_customer_lifetime_value_training_preparation.sqlx
index 597dcac8..cfbad806 100644
--- a/sql/query/invoke_customer_lifetime_value_training_preparation.sqlx
+++ b/sql/query/invoke_customer_lifetime_value_training_preparation.sqlx
@@ -54,17 +54,18 @@ SET validation_split_end_number = {{validation_split_end_number}};
-- IF there are no users in the time interval selected, then set "train_start_date" and "train_end_date" as "max_date" and "min_date".
SET purchasers = (SELECT COUNT(DISTINCT user_pseudo_id)
FROM `{{mds_project_id}}.{{mds_dataset}}.event`
- WHERE event_date BETWEEN train_start_date AND train_end_date
+ WHERE event_date BETWEEN min_date AND max_date
);
--- If there are purchasers no changes to the train_start_date and train_end_date
--- Else, expand the interval, hopefully a purchaser will be in the interval
+-- Setting Training Dates
+-- If there are purchasers in the training set, then keep the calculated dates (min_date and max_date), or else set
+-- the start and end dates to a fixed interval (3 years ago through 5 days ago), preventing `train_start_date` and `train_end_date` from being NULL.
IF purchasers > 0 THEN
- SET train_start_date = train_start_date;
- SET train_end_date = train_end_date;
-ELSE
SET train_start_date = min_date;
SET train_end_date = max_date;
+ELSE
+ SET train_start_date = DATE_SUB(CURRENT_DATE(), INTERVAL 3 YEAR);
+ SET train_end_date = DATE_SUB(CURRENT_DATE(), INTERVAL 5 DAY);
END IF;
-- Finally, the script calls a stored procedure, passing the adjusted training dates and split numbers as arguments. This stored procedure likely handles the actual data preparation for the model.
diff --git a/sql/query/invoke_lead_score_propensity_inference_preparation.sqlx b/sql/query/invoke_lead_score_propensity_inference_preparation.sqlx
new file mode 100644
index 00000000..54e937d7
--- /dev/null
+++ b/sql/query/invoke_lead_score_propensity_inference_preparation.sqlx
@@ -0,0 +1,23 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- This script determines the current date and then passes it as an argument to a
+-- stored procedure in your BigQuery project. This pattern is commonly used when
+-- you want a stored procedure to perform operations or calculations that are
+-- relevant to the current date, such as data processing, analysis, or reporting tasks.
+
+DECLARE inference_date DATE DEFAULT NULL;
+SET inference_date = CURRENT_DATE();
+
+CALL `{{project_id}}.{{dataset}}.{{stored_procedure}}`(inference_date);
diff --git a/sql/query/invoke_lead_score_propensity_label.sqlx b/sql/query/invoke_lead_score_propensity_label.sqlx
new file mode 100644
index 00000000..f4288278
--- /dev/null
+++ b/sql/query/invoke_lead_score_propensity_label.sqlx
@@ -0,0 +1,39 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- This script sets up a date range, calls a stored procedure with this range and a variable to
+-- store a result, and then returns the result of the stored procedure. This pattern is common
+-- for orchestrating data processing tasks within BigQuery using stored procedures.
+
+DECLARE input_date DATE;
+DECLARE end_date DATE;
+DECLARE users_added INT64 DEFAULT NULL;
+
+SET end_date= CURRENT_DATE();
+SET input_date= (SELECT DATE_SUB(end_date, INTERVAL {{interval_input_date}} DAY));
+
+-- This code block ensures that the end_date used in subsequent operations is not later than one day after the latest available data in
+-- the specified events table. This prevents potential attempts to process data for a date range that extends beyond the actual data availability.
+IF (SELECT DATE_SUB(end_date, INTERVAL 1 DAY)) > (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) THEN
+ SET end_date = (SELECT DATE_ADD(MAX(event_date), INTERVAL 1 DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+END IF;
+
+-- This code block ensures that the input_date used in subsequent operations is not before the earliest available data in the
+-- specified events table. This prevents potential errors or unexpected behavior that might occur when trying to process data
+-- for a date range that precedes the actual data availability.
+IF input_date < (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) THEN
+ SET input_date = (SELECT DATE_ADD(MIN(event_date), INTERVAL 1 DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+END IF;
+
+CALL `{{project_id}}.{{dataset}}.{{stored_procedure}}`(input_date, end_date, users_added);
\ No newline at end of file
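
Reviewer note: the two `IF` blocks in `invoke_lead_score_propensity_label.sqlx` can be easier to reason about outside BigQuery. Below is a minimal Python sketch of the same clamping logic, not the project's implementation. The function name `clamp_label_window` and its parameters are hypothetical; `min_event_date` and `max_event_date` stand in for `MIN(event_date)` and `MAX(event_date)` from the `event` table.

```python
from datetime import date, timedelta


def clamp_label_window(today, interval_days, min_event_date, max_event_date):
    """Return (input_date, end_date) following the order of operations in the SQL script."""
    end_date = today
    # input_date is derived from the unclamped end_date, as in the script.
    input_date = end_date - timedelta(days=interval_days)

    # Cap end_date at one day after the latest available event date.
    if end_date - timedelta(days=1) > max_event_date:
        end_date = max_event_date + timedelta(days=1)

    # Raise input_date when it precedes the earliest event date.
    # The script moves it to MIN(event_date) + 1 day, not MIN(event_date).
    if input_date < min_event_date:
        input_date = min_event_date + timedelta(days=1)

    return input_date, end_date
```

With events from 2024-01-01 to 2024-06-20, a run on 2024-06-30 and a 15 day interval, the sketch leaves `input_date` at 2024-06-15 and caps `end_date` at 2024-06-21.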
diff --git a/sql/query/invoke_lead_score_propensity_training_preparation.sqlx b/sql/query/invoke_lead_score_propensity_training_preparation.sqlx
new file mode 100644
index 00000000..3d515348
--- /dev/null
+++ b/sql/query/invoke_lead_score_propensity_training_preparation.sqlx
@@ -0,0 +1,73 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- This script determines the date range for training a lead score propensity model
+-- by considering user-defined parameters and the availability of {{target_event}}
+-- events within the dataset. It ensures that the training data includes {{target_event}} events if
+-- they exist within the specified bounds.
+
+-- Intended start and end dates for training data
+-- Initializing Training Dates
+DECLARE train_start_date DATE DEFAULT NULL;
+DECLARE train_end_date DATE DEFAULT NULL;
+
+-- Control data splitting for training and validation (likely used in a subsequent process).
+DECLARE train_split_end_number INT64 DEFAULT NULL;
+DECLARE validation_split_end_number INT64 DEFAULT NULL;
+
+-- Will store the count of distinct users who made a {{target_event}} within a given period.
+DECLARE {{target_event}}_users INT64 DEFAULT NULL;
+
+-- Used to store the maximum and minimum event dates from the source data.
+DECLARE max_date DATE;
+DECLARE min_date DATE;
+
+-- Determining Maximum and Minimum Dates
+SET max_date = (SELECT DATE_SUB(MAX(event_date), INTERVAL {{interval_max_date}} DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+SET min_date = (SELECT DATE_ADD(MIN(event_date), INTERVAL {{interval_min_date}} DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+
+-- If min_date >= maximum event_date, max_date <= minimum event_date, or min_date >= max_date, then reset min_date to the minimum event_date and max_date to the maximum event_date
+IF min_date >= (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) OR max_date <= (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) OR min_date >= max_date THEN
+ SET min_date = (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+ SET max_date = (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+END IF;
+
+-- Setting Split Numbers
+-- Sets the train_split_end_number to a user-defined value. This value likely determines the proportion of data used for training.
+SET train_split_end_number = {{train_split_end_number}}; -- If you want 60% for training use number 5. If you want 80% use number 7.
+-- Sets the validation_split_end_number to a user-defined value, controlling the proportion of data used for validation.
+SET validation_split_end_number = {{validation_split_end_number}};
+
+-- This step counts distinct users who have an event named '{{target_event}}' between min_date and max_date.
+-- If there are no users with a {{target_event}} event in that interval, "train_start_date" and "train_end_date" fall back to a fixed interval relative to CURRENT_DATE().
+SET {{target_event}}_users = (SELECT COUNT(DISTINCT user_pseudo_id)
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event`
+ WHERE event_name = '{{target_event}}' AND
+ event_date BETWEEN min_date AND max_date
+ );
+
+-- Setting Training Dates
+-- If there are {{target_event}}_users in the training set, then keep the calculated dates, or else set
+-- the start and end dates to a fixed interval preventing `train_start_date` and `train_end_date` from being NULL.
+IF {{target_event}}_users > 0 THEN
+ SET train_start_date = min_date;
+ SET train_end_date = max_date;
+ELSE
+ SET train_start_date = DATE_SUB(CURRENT_DATE(), INTERVAL 3 YEAR);
+ SET train_end_date = DATE_SUB(CURRENT_DATE(), INTERVAL 5 DAY);
+END IF;
+
+-- Finally, the script calls a stored procedure, passing the adjusted training dates and split numbers as arguments. This stored procedure
+-- handles the actual data preparation for the lead score propensity model.
+CALL `{{project_id}}.{{dataset}}.{{stored_procedure}}`(train_start_date, train_end_date, train_split_end_number, validation_split_end_number);
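
Reviewer note: the date selection in `invoke_lead_score_propensity_training_preparation.sqlx` can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's code. The names `select_training_dates`, `minus_three_years` and `count_target_users` are hypothetical; `count_target_users` stands in for the `COUNT(DISTINCT user_pseudo_id)` query. The year subtraction assumes BigQuery maps February 29 to February 28 when the target year is not a leap year.

```python
from datetime import date, timedelta


def minus_three_years(day):
    """Approximate DATE_SUB(day, INTERVAL 3 YEAR)."""
    try:
        return day.replace(year=day.year - 3)
    except ValueError:
        # February 29 has no counterpart three years earlier; use February 28.
        return day.replace(year=day.year - 3, day=28)


def select_training_dates(today, first_event, last_event,
                          interval_min_date, interval_max_date,
                          count_target_users):
    """Return (train_start_date, train_end_date) following the script's branches."""
    max_date = last_event - timedelta(days=interval_max_date)
    min_date = first_event + timedelta(days=interval_min_date)

    # Reset to the full event range when the trimmed range is empty or inverted.
    if min_date >= last_event or max_date <= first_event or min_date >= max_date:
        min_date, max_date = first_event, last_event

    # Keep the calculated range only if it contains at least one target user.
    if count_target_users(min_date, max_date) > 0:
        return min_date, max_date

    # Otherwise fall back to a fixed interval so the dates are never NULL.
    return minus_three_years(today), today - timedelta(days=5)
```

Passing the counter as a callable keeps the sketch independent of any BigQuery client, so each branch can be exercised with a lambda.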
diff --git a/sql/query/invoke_purchase_propensity_training_preparation.sqlx b/sql/query/invoke_purchase_propensity_training_preparation.sqlx
index 4d2eab86..b8738465 100644
--- a/sql/query/invoke_purchase_propensity_training_preparation.sqlx
+++ b/sql/query/invoke_purchase_propensity_training_preparation.sqlx
@@ -54,17 +54,18 @@ SET validation_split_end_number = {{validation_split_end_number}};
SET purchasers = (SELECT COUNT(DISTINCT user_pseudo_id)
FROM `{{mds_project_id}}.{{mds_dataset}}.event`
WHERE event_name = 'purchase' AND
- event_date BETWEEN train_start_date AND train_end_date
+ event_date BETWEEN min_date AND max_date
);
--- If there are purchasers no changes to the train_start_date and train_end_date
--- Else, expand the interval, hopefully a purchaser will be in the interval
+-- Setting Training Dates
+-- If there are purchasers in the training set, then keep the calculated dates, or else set
+-- the start and end dates to a fixed interval preventing `train_start_date` and `train_end_date` from being NULL.
IF purchasers > 0 THEN
- SET train_start_date = GREATEST(train_start_date, min_date);
- SET train_end_date = LEAST(train_end_date, max_date);
-ELSE
SET train_start_date = min_date;
SET train_end_date = max_date;
+ELSE
+ SET train_start_date = DATE_SUB(CURRENT_DATE(), INTERVAL 3 YEAR);
+ SET train_end_date = DATE_SUB(CURRENT_DATE(), INTERVAL 5 DAY);
END IF;
-- Finally, the script calls a stored procedure, passing the adjusted training dates and split numbers as arguments. This stored procedure
diff --git a/sql/query/invoke_user_rolling_window_lead_metrics.sqlx b/sql/query/invoke_user_rolling_window_lead_metrics.sqlx
new file mode 100644
index 00000000..e469a2d7
--- /dev/null
+++ b/sql/query/invoke_user_rolling_window_lead_metrics.sqlx
@@ -0,0 +1,28 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- This script sets up a date range, calls a stored procedure with this range and a variable to
+-- store a result, and then returns the result of the stored procedure. This pattern is common
+-- for orchestrating data processing tasks within BigQuery using stored procedures.
+
+DECLARE input_date DATE;
+DECLARE end_date DATE;
+DECLARE users_added INT64 DEFAULT NULL;
+
+SET input_date= CURRENT_DATE();
+SET end_date= (SELECT DATE_SUB(input_date, INTERVAL {{interval_end_date}} DAY));
+
+CALL `{{project_id}}.{{dataset}}.{{stored_procedure}}`(input_date, end_date, users_added);
+
+SELECT users_added;
\ No newline at end of file
diff --git a/sql/schema/table/lead_score_propensity_inference_preparation.json b/sql/schema/table/lead_score_propensity_inference_preparation.json
new file mode 100644
index 00000000..5fc9e6ec
--- /dev/null
+++ b/sql/schema/table/lead_score_propensity_inference_preparation.json
@@ -0,0 +1,337 @@
+[
+ {
+ "name": "user_pseudo_id",
+ "type": "STRING",
+ "description": "The user pseudo identifier"
+ },
+ {
+ "name": "user_id",
+ "type": "STRING",
+ "description": "The user identifier when the user is logged in"
+ },
+ {
+ "name": "feature_date",
+ "type": "DATE",
+ "description": "Date that serves as the basis for the calculation of the features"
+ },
+ {
+ "name": "user_ltv_revenue",
+ "type": "FLOAT",
+ "description": "The current customer lifetime value revenue of the user"
+ },
+ {
+ "name": "device_category",
+ "type": "STRING",
+ "description": "The device category the user last accessed"
+ },
+ {
+ "name": "device_mobile_brand_name",
+ "type": "STRING",
+ "description": "The device mobile brand name the user last accessed"
+ },
+ {
+ "name": "device_mobile_model_name",
+ "type": "STRING",
+ "description": "The device mobile model name the user last accessed"
+ },
+ {
+ "name": "device_os",
+ "type": "STRING",
+ "description": "The device operating system the user last accessed"
+ },
+ {
+ "name": "device_language",
+ "type": "STRING",
+ "description": "The device language the user last accessed"
+ },
+ {
+ "name": "device_web_browser",
+ "type": "STRING",
+ "description": "The device web browser the user last accessed"
+ },
+ {
+ "name": "geo_sub_continent",
+ "type": "STRING",
+ "description": "The geographic subcontinent the user last accessed from"
+ },
+ {
+ "name": "geo_country",
+ "type": "STRING",
+ "description": "The geographic country the user last accessed from"
+ },
+ {
+ "name": "geo_region",
+ "type": "STRING",
+ "description": "The geographic region the user last accessed from"
+ },
+ {
+ "name": "geo_city",
+ "type": "STRING",
+ "description": "The geographic city the user last accessed from"
+ },
+ {
+ "name": "geo_metro",
+ "type": "STRING",
+ "description": "The geographic metropolitan area the user last accessed from"
+ },
+ {
+ "name": "last_traffic_source_medium",
+ "type": "STRING",
+ "description": "The last traffic source medium from where the user was acquired"
+ },
+ {
+ "name": "last_traffic_source_name",
+ "type": "STRING",
+ "description": "The last traffic source name from where the user was acquired"
+ },
+ {
+ "name": "last_traffic_source_source",
+ "type": "STRING",
+ "description": "The last traffic source source from where the user was acquired"
+ },
+ {
+ "name": "first_traffic_source_medium",
+ "type": "STRING",
+ "description": "The first traffic source medium from where the user was acquired"
+ },
+ {
+ "name": "first_traffic_source_name",
+ "type": "STRING",
+ "description": "The first traffic source name from where the user was acquired"
+ },
+ {
+ "name": "first_traffic_source_source",
+ "type": "STRING",
+ "description": "The first traffic source source from where the user was acquired"
+ },
+ {
+ "name": "has_signed_in_with_user_id",
+ "type": "BOOLEAN",
+ "description": "A boolean indicating whether the user has signed in with a user id"
+ },
+ {
+ "name": "scroll_50_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50p pages in the past day"
+ },
+ {
+ "name": "scroll_50_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50p pages in the past 2nd day"
+ },
+ {
+ "name": "scroll_50_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50p pages in the past 3rd day"
+ },
+ {
+ "name": "scroll_50_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50p pages in the past 4th day"
+ },
+ {
+ "name": "scroll_50_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50p pages in the past 5th day"
+ },
+ {
+ "name": "scroll_90_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90p pages in the past day"
+ },
+ {
+ "name": "scroll_90_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90p pages in the past 2nd day"
+ },
+ {
+ "name": "scroll_90_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90p pages in the past 3rd day"
+ },
+ {
+ "name": "scroll_90_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90p pages in the past 4th day"
+ },
+ {
+ "name": "scroll_90_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90p pages in the past 5th day"
+ },
+ {
+ "name": "view_search_results_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past day"
+ },
+ {
+ "name": "view_search_results_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 2nd day"
+ },
+ {
+ "name": "view_search_results_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 3rd day"
+ },
+ {
+ "name": "view_search_results_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 4th day"
+ },
+ {
+ "name": "view_search_results_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 5th day"
+ },
+ {
+ "name": "file_download_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded files in the past day"
+ },
+ {
+ "name": "file_download_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded files in the past 2nd day"
+ },
+ {
+ "name": "file_download_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded files in the past 3rd day"
+ },
+ {
+ "name": "file_download_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded files in the past 4th day"
+ },
+ {
+ "name": "file_download_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded files in the past 5th day"
+ },
+ {
+ "name": "recipe_add_to_list_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a list in the past day"
+ },
+ {
+ "name": "recipe_add_to_list_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a list in the past 2nd day"
+ },
+ {
+ "name": "recipe_add_to_list_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a list in the past 3rd day"
+ },
+ {
+ "name": "recipe_add_to_list_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a list in the past 4th day"
+ },
+ {
+ "name": "recipe_add_to_list_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a list in the past 5th day"
+ },
+ {
+ "name": "recipe_print_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed recipes in the past day"
+ },
+ {
+ "name": "recipe_print_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed recipes in the past 2nd day"
+ },
+ {
+ "name": "recipe_print_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed recipes in the past 3rd day"
+ },
+ {
+ "name": "recipe_print_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed recipes in the past 4th day"
+ },
+ {
+ "name": "recipe_print_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed recipes in the past 5th day"
+ },
+ {
+ "name": "sign_up_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past day"
+ },
+ {
+ "name": "sign_up_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 2nd day"
+ },
+ {
+ "name": "sign_up_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 3rd day"
+ },
+ {
+ "name": "sign_up_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 4th day"
+ },
+ {
+ "name": "sign_up_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 5th day"
+ },
+ {
+ "name": "recipe_favorite_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited recipes in the past day"
+ },
+ {
+ "name": "recipe_favorite_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited recipes in the past 2nd day"
+ },
+ {
+ "name": "recipe_favorite_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited recipes in the past 3rd day"
+ },
+ {
+ "name": "recipe_favorite_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited recipes in the past 4th day"
+ },
+ {
+ "name": "recipe_favorite_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited recipes in the past 5th day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a menu in the past day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a menu in the past 2nd day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a menu in the past 3rd day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a menu in the past 4th day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a menu in the past 5th day"
+ }
+]
\ No newline at end of file
diff --git a/sql/schema/table/lead_score_propensity_label.json b/sql/schema/table/lead_score_propensity_label.json
new file mode 100644
index 00000000..8b63bc6f
--- /dev/null
+++ b/sql/schema/table/lead_score_propensity_label.json
@@ -0,0 +1,22 @@
+[
+ {
+ "name": "processed_timestamp",
+ "type": "TIMESTAMP",
+ "description": "Timestamp of when the data was processed"
+ },
+ {
+ "name": "feature_date",
+ "type": "DATE",
+ "description": "The date serving as basis for the features calculation"
+ },
+ {
+ "name": "user_pseudo_id",
+ "type": "STRING",
+ "description": "The user pseudo identifier"
+ },
+ {
+ "name": "login_day_1",
+ "type": "INTEGER",
+ "description": "Predicted number of logins by the user in the next 1st day from the feature date"
+ }
+]
\ No newline at end of file
diff --git a/sql/schema/table/lead_score_propensity_training_preparation.json b/sql/schema/table/lead_score_propensity_training_preparation.json
new file mode 100644
index 00000000..f5647417
--- /dev/null
+++ b/sql/schema/table/lead_score_propensity_training_preparation.json
@@ -0,0 +1,352 @@
+[
+ {
+ "name": "processed_timestamp",
+ "type": "TIMESTAMP",
+ "description": "Timestamp of when the data was processed"
+ },
+ {
+ "name": "data_split",
+ "type": "STRING",
+ "description": "The indication of whether the row should be used for TRAINING, VALIDATION or TESTING"
+ },
+ {
+ "name": "feature_date",
+ "type": "DATE",
+ "description": "The date serving as basis for the features calculation"
+ },
+ {
+ "name": "user_pseudo_id",
+ "type": "STRING",
+ "description": "The user pseudo identifier"
+ },
+ {
+ "name": "user_id",
+ "type": "STRING",
+ "description": "The user identifier when the user is logged in"
+ },
+ {
+ "name": "user_ltv_revenue",
+ "type": "FLOAT",
+ "description": "The current user lifetime value"
+ },
+ {
+ "name": "device_category",
+ "type": "STRING",
+ "description": "The device category last used by the user"
+ },
+ {
+ "name": "device_mobile_brand_name",
+ "type": "STRING",
+ "description": "The device mobile brand name last used by the user"
+ },
+ {
+ "name": "device_mobile_model_name",
+ "type": "STRING",
+ "description": "The device mobile model name last used by the user"
+ },
+ {
+ "name": "device_os",
+ "type": "STRING",
+ "description": "The device operating system last used by the user"
+ },
+ {
+ "name": "device_language",
+ "type": "STRING",
+ "description": "The device language last used by the user"
+ },
+ {
+ "name": "device_web_browser",
+ "type": "STRING",
+ "description": "The device web browser last used by the user"
+ },
+ {
+ "name": "geo_sub_continent",
+ "type": "STRING",
+ "description": "The geographic subcontinent of the user's last access"
+ },
+ {
+ "name": "geo_country",
+ "type": "STRING",
+ "description": "The geographic country of the user's last access"
+ },
+ {
+ "name": "geo_region",
+ "type": "STRING",
+ "description": "The geographic region of the user's last access"
+ },
+ {
+ "name": "geo_city",
+ "type": "STRING",
+ "description": "The geographic city of the user's last access"
+ },
+ {
+ "name": "geo_metro",
+ "type": "STRING",
+ "description": "The geographic metropolitan area of the user's last access"
+ },
+ {
+ "name": "last_traffic_source_medium",
+ "type": "STRING",
+ "description": "The last traffic source medium from where the user was acquired"
+ },
+ {
+ "name": "last_traffic_source_name",
+ "type": "STRING",
+ "description": "The last traffic source name from where the user was acquired"
+ },
+ {
+ "name": "last_traffic_source_source",
+ "type": "STRING",
+ "description": "The last traffic source source from where the user was acquired"
+ },
+ {
+ "name": "first_traffic_source_medium",
+ "type": "STRING",
+ "description": "The first traffic source medium from where the user was acquired"
+ },
+ {
+ "name": "first_traffic_source_name",
+ "type": "STRING",
+ "description": "The first traffic source name from where the user was acquired"
+ },
+ {
+ "name": "first_traffic_source_source",
+ "type": "STRING",
+ "description": "The first traffic source source from where the user was acquired"
+ },
+ {
+ "name": "has_signed_in_with_user_id",
+ "type": "BOOLEAN",
+ "description": "A boolean indicating whether the user has signed in with the user id"
+ },
+ {
+ "name": "scroll_50_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50p pages in the past day"
+ },
+ {
+ "name": "scroll_50_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50p pages in the past 2nd day"
+ },
+ {
+ "name": "scroll_50_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50p pages in the past 3rd day"
+ },
+ {
+ "name": "scroll_50_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50p pages in the past 4th day"
+ },
+ {
+ "name": "scroll_50_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50p pages in the past 5th day"
+ },
+ {
+ "name": "scroll_90_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90p pages in the past day"
+ },
+ {
+ "name": "scroll_90_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90p pages in the past 2nd day"
+ },
+ {
+ "name": "scroll_90_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90p pages in the past 3rd day"
+ },
+ {
+ "name": "scroll_90_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90p pages in the past 4th day"
+ },
+ {
+ "name": "scroll_90_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90p pages in the past 5th day"
+ },
+ {
+ "name": "view_search_results_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past day"
+ },
+ {
+ "name": "view_search_results_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 2nd day"
+ },
+ {
+ "name": "view_search_results_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 3rd day"
+ },
+ {
+ "name": "view_search_results_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 4th day"
+ },
+ {
+ "name": "view_search_results_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 5th day"
+ },
+ {
+ "name": "file_download_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded files in the past day"
+ },
+ {
+ "name": "file_download_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded files in the past 2nd day"
+ },
+ {
+ "name": "file_download_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded files in the past 3rd day"
+ },
+ {
+ "name": "file_download_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded files in the past 4th day"
+ },
+ {
+ "name": "file_download_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded files in the past 5th day"
+ },
+ {
+ "name": "recipe_add_to_list_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a list in the past day"
+ },
+ {
+ "name": "recipe_add_to_list_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a list in the past 2nd day"
+ },
+ {
+ "name": "recipe_add_to_list_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a list in the past 3rd day"
+ },
+ {
+ "name": "recipe_add_to_list_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a list in the past 4th day"
+ },
+ {
+ "name": "recipe_add_to_list_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a list in the past 5th day"
+ },
+ {
+ "name": "recipe_print_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed recipes in the past day"
+ },
+ {
+ "name": "recipe_print_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed recipes in the past 2nd day"
+ },
+ {
+ "name": "recipe_print_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed recipes in the past 3rd day"
+ },
+ {
+ "name": "recipe_print_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed recipes in the past 4th day"
+ },
+ {
+ "name": "recipe_print_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed recipes in the past 5th day"
+ },
+ {
+ "name": "sign_up_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past day"
+ },
+ {
+ "name": "sign_up_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 2nd day"
+ },
+ {
+ "name": "sign_up_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 3rd day"
+ },
+ {
+ "name": "sign_up_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 4th day"
+ },
+ {
+ "name": "sign_up_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 5th day"
+ },
+ {
+ "name": "recipe_favorite_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited recipes in the past day"
+ },
+ {
+ "name": "recipe_favorite_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited recipes in the past 2nd day"
+ },
+ {
+ "name": "recipe_favorite_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited recipes in the past 3rd day"
+ },
+ {
+ "name": "recipe_favorite_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited recipes in the past 4th day"
+ },
+ {
+ "name": "recipe_favorite_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited recipes in the past 5th day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a menu in the past day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a menu in the past 2nd day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a menu in the past 3rd day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a menu in the past 4th day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added recipes to a menu in the past 5th day"
+ },
+ {
+ "name": "will_login",
+ "type": "INTEGER",
+ "description": "A boolean indicating whether the user will login in the next period"
+ }
+]
\ No newline at end of file
diff --git a/sql/schema/table/purchase_propensity_inference_preparation.json b/sql/schema/table/purchase_propensity_inference_preparation.json
index 0fe328b2..2f8b1256 100644
--- a/sql/schema/table/purchase_propensity_inference_preparation.json
+++ b/sql/schema/table/purchase_propensity_inference_preparation.json
@@ -109,161 +109,6 @@
"type": "BOOLEAN",
"description": "A boolean indicating whether the user has signed in with an user id"
},
- {
- "name": "engagement_rate",
- "type": "FLOAT",
- "description": "The percentage of sessions that were engaged sessions. Engagement rate = engaged sessions / total sessions Engagement rate is the inverse of bounce rate"
- },
- {
- "name": "engaged_sessions_per_user",
- "type": "INTEGER",
- "description": "The number of engaged sessions per user"
- },
- {
- "name": "session_conversion_rate",
- "type": "FLOAT",
- "description": "The session conversion rate is calculated by dividing the number of sessions with a conversion event by the total number of sessions"
- },
- {
- "name": "bounces",
- "type": "INTEGER",
- "description": "The number of not engaged sessions"
- },
- {
- "name": "bounce_rate_per_user",
- "type": "FLOAT",
- "description": "The percentage of sessions that were not engaged sessions per user. Bounce rate = not engaged sessions / total sessions Bounce rate is the inverse of engagement rate"
- },
- {
- "name": "sessions_per_user",
- "type": "INTEGER",
- "description": "The number of sessions per user"
- },
- {
- "name": "avg_views_per_session",
- "type": "FLOAT",
- "description": "The average number of views per sessions"
- },
- {
- "name": "sum_engagement_time_seconds",
- "type": "FLOAT",
- "description": "The sum of time that your website was in focus in a user's browser or an app was in the foreground of a user's device in seconds per user"
- },
- {
- "name": "avg_engagement_time_seconds",
- "type": "FLOAT",
- "description": "The average time that your website was in focus in a user's browser or an app was in the foreground of a user's device. Average engagement time = total user engagement durations / number of active users"
- },
- {
- "name": "new_visits",
- "type": "INTEGER",
- "description": "The number of times your users opened your website for the first time"
- },
- {
- "name": "returning_visits",
- "type": "INTEGER",
- "description": "The number of users who have initiated at least one previous session, regardless of whether or not the previous sessions were engaged sessions"
- },
- {
- "name": "add_to_carts",
- "type": "INTEGER",
- "description": "The number of times users added items to their shopping carts"
- },
- {
- "name": "cart_to_view_rate",
- "type": "FLOAT",
- "description": "The number of times users added items to their shopping carts divided by the the number of mobile app screens or web pages your users saw. Repeated views of a single screen or page are counted"
- },
- {
- "name": "checkouts",
- "type": "INTEGER",
- "description": "The number of times users started the checkout process"
- },
- {
- "name": "ecommerce_purchases",
- "type": "INTEGER",
- "description": "The number of purchases on your website or app"
- },
- {
- "name": "ecommerce_quantity",
- "type": "INTEGER",
- "description": "The number of units for an ecommerce event"
- },
- {
- "name": "ecommerce_revenue",
- "type": "FLOAT",
- "description": "The sum of revenue from purchases made on your website or app, minus any refunds given. Purchase revenue = purchases + in-app purchases + subscriptions - refund"
- },
- {
- "name": "item_revenue",
- "type": "FLOAT",
- "description": "The total revenue from items only minus refunds, excluding tax and shipping"
- },
- {
- "name": "item_quantity",
- "type": "INTEGER",
- "description": "The number of units for a single item included in ecommerce events"
- },
- {
- "name": "item_view_events",
- "type": "INTEGER",
- "description": "The number of times an item was viewed"
- },
- {
- "name": "items_clicked_in_promotion",
- "type": "INTEGER",
- "description": "The number of items that the customer clicked in a promotion"
- },
- {
- "name": "items_clicked_in_list",
- "type": "INTEGER",
- "description": "The number of items that the customer clicked in a list of items"
- },
- {
- "name": "items_checked_out",
- "type": "INTEGER",
- "description": "The number of times the user has checked out"
- },
- {
- "name": "items_added_to_cart",
- "type": "INTEGER",
- "description": "The number of times the user has added items to cart"
- },
- {
- "name": "item_list_view_events",
- "type": "INTEGER",
- "description": "The number of times the user has viewed items in list"
- },
- {
- "name": "purchase_revenue",
- "type": "FLOAT",
- "description": "The total revenue from purchases, in-app purchases, subscriptions, and ad revenue. Total revenue = purchases + in-app purchases + subscriptions + ad revenue - refunds"
- },
- {
- "name": "purchase_to_view_rate",
- "type": "FLOAT",
- "description": "The number of purchases on your website or app divided by the number of mobile app screens or web pages your users saw"
- },
- {
- "name": "transactions_per_purchaser",
- "type": "FLOAT",
- "description": "The average number of purchases per buyer for the selected time frame"
- },
- {
- "name": "user_conversion_rate",
- "type": "FLOAT",
- "description": "The number of users who performed a conversion action divided by the total number of users"
- },
- {
- "name": "how_many_purchased_before",
- "type": "INTEGER",
- "description": "The number of times the user have purchased before"
- },
- {
- "name": "has_abandoned_cart",
- "type": "BOOLEAN",
- "description": "a boolean indicating whether the user has abandoned a cart in the past"
- },
{
"name": "active_users_past_1_day",
"type": "INTEGER",
diff --git a/sql/schema/table/purchase_propensity_training_preparation.json b/sql/schema/table/purchase_propensity_training_preparation.json
index e5d284d5..f984f42e 100644
--- a/sql/schema/table/purchase_propensity_training_preparation.json
+++ b/sql/schema/table/purchase_propensity_training_preparation.json
@@ -119,161 +119,6 @@
"type": "BOOLEAN",
"description": "A boolean indicating whether the user has signed in with the user id"
},
- {
- "name": "engagement_rate",
- "type": "FLOAT",
- "description": "The percentage of sessions that were engaged sessions. Engagement rate = engaged sessions / total sessions Engagement rate is the inverse of bounce rate"
- },
- {
- "name": "engaged_sessions_per_user",
- "type": "INTEGER",
- "description": "The number of engaged sessions per user"
- },
- {
- "name": "session_conversion_rate",
- "type": "FLOAT",
- "description": "The session conversion rate is calculated by dividing the number of sessions with a conversion event by the total number of sessions"
- },
- {
- "name": "bounces",
- "type": "INTEGER",
- "description": "The number of not engaged sessions"
- },
- {
- "name": "bounce_rate_per_user",
- "type": "FLOAT",
- "description": "The percentage of sessions that were not engaged sessions per user. Bounce rate = not engaged sessions / total sessions Bounce rate is the inverse of engagement rate"
- },
- {
- "name": "sessions_per_user",
- "type": "INTEGER",
- "description": "The number of sessions per user"
- },
- {
- "name": "avg_views_per_session",
- "type": "FLOAT",
- "description": "The average number of views per sessions"
- },
- {
- "name": "sum_engagement_time_seconds",
- "type": "FLOAT",
- "description": "The sum of time that your website was in focus in a user's browser or an app was in the foreground of a user's device in seconds per user"
- },
- {
- "name": "avg_engagement_time_seconds",
- "type": "FLOAT",
- "description": "The average time that your website was in focus in a user's browser or an app was in the foreground of a user's device. Average engagement time = total user engagement durations / number of active users"
- },
- {
- "name": "new_visits",
- "type": "INTEGER",
- "description": "The number of times your users opened your website for the first time"
- },
- {
- "name": "returning_visits",
- "type": "INTEGER",
- "description": "The number of users who have initiated at least one previous session, regardless of whether or not the previous sessions were engaged sessions"
- },
- {
- "name": "add_to_carts",
- "type": "INTEGER",
- "description": "The number of times users added items to their shopping carts"
- },
- {
- "name": "cart_to_view_rate",
- "type": "FLOAT",
- "description": "The number of times users added items to their shopping carts divided by the the number of mobile app screens or web pages your users saw. Repeated views of a single screen or page are counted"
- },
- {
- "name": "checkouts",
- "type": "INTEGER",
- "description": "The number of times users started the checkout process"
- },
- {
- "name": "ecommerce_purchases",
- "type": "INTEGER",
- "description": "The number of purchases on your website or app"
- },
- {
- "name": "ecommerce_quantity",
- "type": "INTEGER",
- "description": "The number of units for an ecommerce event"
- },
- {
- "name": "ecommerce_revenue",
- "type": "FLOAT",
- "description": "The sum of revenue from purchases made on your website or app, minus any refunds given. Purchase revenue = purchases + in-app purchases + subscriptions - refund"
- },
- {
- "name": "item_revenue",
- "type": "FLOAT",
- "description": "The total revenue from items only minus refunds, excluding tax and shipping"
- },
- {
- "name": "item_quantity",
- "type": "INTEGER",
- "description": "The number of units for a single item included in ecommerce events"
- },
- {
- "name": "item_view_events",
- "type": "INTEGER",
- "description": "The number of times an item was viewed"
- },
- {
- "name": "items_clicked_in_promotion",
- "type": "INTEGER",
- "description": "The number of items that the customer clicked in a promotion"
- },
- {
- "name": "items_clicked_in_list",
- "type": "INTEGER",
- "description": "The number of items that the customer clicked in a list of items"
- },
- {
- "name": "items_checked_out",
- "type": "INTEGER",
- "description": "The number of times the user has checked out"
- },
- {
- "name": "items_added_to_cart",
- "type": "INTEGER",
- "description": "The number of times the user has added items to cart"
- },
- {
- "name": "item_list_view_events",
- "type": "INTEGER",
- "description": "The number of times the user has viewed items in list"
- },
- {
- "name": "purchase_revenue",
- "type": "FLOAT",
- "description": "The total revenue from purchases, in-app purchases, subscriptions, and ad revenue. Total revenue = purchases + in-app purchases + subscriptions + ad revenue - refunds"
- },
- {
- "name": "purchase_to_view_rate",
- "type": "FLOAT",
- "description": "The number of purchases on your website or app divided by the number of mobile app screens or web pages your users saw"
- },
- {
- "name": "transactions_per_purchaser",
- "type": "FLOAT",
- "description": "The average number of purchases per buyer for the selected time frame"
- },
- {
- "name": "user_conversion_rate",
- "type": "FLOAT",
- "description": "The number of users who performed a conversion action divided by the total number of users"
- },
- {
- "name": "how_many_purchased_before",
- "type": "INTEGER",
- "description": "The number of times the user have purchased before"
- },
- {
- "name": "has_abandoned_cart",
- "type": "BOOLEAN",
- "description": "a boolean indicating whether the user has abandoned a cart in the past"
- },
{
"name": "active_users_past_1_day",
"type": "INTEGER",
@@ -544,131 +389,6 @@
"type": "INTEGER",
"description": "The number of times the user has checked out in the past 15 to 30 days"
},
- {
- "name": "purchasers_users",
- "type": "INTEGER",
- "description": "The number of distinct users who have purchases in the past"
- },
- {
- "name": "average_daily_purchasers",
- "type": "FLOAT",
- "description": "The average number of purchasers across all the days in the selected time frame"
- },
- {
- "name": "active_users",
- "type": "INTEGER",
- "description": "The number of distinct users who visited your website or application. An active user is any user who has an engaged session or when Analytics collects: the first_visit event or engagement_time_msec parameter from a website the first_open event or engagement_time_msec parameter from an Android app the first_open or user_engagement event from an iOS app"
- },
- {
- "name": "DAU",
- "type": "FLOAT",
- "description": "The number of users who engaged for the calendar day"
- },
- {
- "name": "MAU",
- "type": "FLOAT",
- "description": "The number of users who engaged in the last 30 days"
- },
- {
- "name": "WAU",
- "type": "FLOAT",
- "description": "The number of users who engaged in the last week"
- },
- {
- "name": "dau_per_mau",
- "type": "FLOAT",
- "description": "Daily Active Users (DAU) / Monthly Active Users (MAU) shows the percentage of users who engaged for the calendar day out of the users who engaged in the last 30 days. A higher ratio suggests good engagement and user retention"
- },
- {
- "name": "dau_per_wau",
- "type": "FLOAT",
- "description": "Daily Active Users (DAU) / Weekly Active Users (WAU) shows the percentage of users who engaged in the last 24 hours out of the users who engaged in the last 7 days. A higher ratio suggests good engagement and user retention"
- },
- {
- "name": "wau_per_mau",
- "type": "FLOAT",
- "description": "Weekly Active Users (DAU) / Monthly Active Users (MAU) shows the percentage of users who engaged in the last 7 days out of the users who engaged in the last 30 days. A higher ratio suggests good engagement and user retention"
- },
- {
- "name": "users_engagement_duration_seconds",
- "type": "FLOAT",
- "description": "The length of time that your app screen was in the foreground or your web page was in focus in seconds"
- },
- {
- "name": "average_engagement_time",
- "type": "FLOAT",
- "description": "The average time that your website was in focus in a user's browser or an app was in the foreground of a user's device. Average engagement time = total user engagement durations / number of active users"
- },
- {
- "name": "average_engagement_time_per_session",
- "type": "FLOAT",
- "description": "The average engagement time per session"
- },
- {
- "name": "average_sessions_per_user",
- "type": "FLOAT",
- "description": "The average number of sessions per user"
- },
- {
- "name": "ARPPU",
- "type": "FLOAT",
- "description": "Average revenue per paying user (ARPPU) is the total purchase revenue per active user who made a purchase"
- },
- {
- "name": "ARPU",
- "type": "FLOAT",
- "description": "Average revenue per active user (ARPU) is the total revenue generated on average from each active user, whether they made a purchase or not. ARPU = (Total ad revenue + purchase revenue + in-app purchase revenue + subscriptions) / Active users"
- },
- {
- "name": "average_daily_revenue",
- "type": "FLOAT",
- "description": "Average daily revenue The average total revenue for a day over the selected time frame"
- },
- {
- "name": "max_daily_revenue",
- "type": "FLOAT",
- "description": "The maximum total revenue for a day over the selected time frame"
- },
- {
- "name": "min_daily_revenue",
- "type": "FLOAT",
- "description": "The minimum total revenue for a day over the selected time frame"
- },
- {
- "name": "new_users",
- "type": "INTEGER",
- "description": "The number of new unique user IDs that logged the first_open or first_visit event. The metric allows you to measure the number of users who interacted with your site or launched your app for the first time"
- },
- {
- "name": "returning_users",
- "type": "INTEGER",
- "description": "The number of users who have initiated at least one previous session, regardless of whether or not the previous sessions were engaged sessions"
- },
- {
- "name": "first_time_purchasers",
- "type": "INTEGER",
- "description": "The number of users who made their first purchase in the selected time frame."
- },
- {
- "name": "first_time_purchaser_conversion",
- "type": "FLOAT",
- "description": "The percentage of active users who made their first purchase. This metric is returned as a fraction; for example, 0.092 means 9.2% of active users were first-time purchasers"
- },
- {
- "name": "first_time_purchasers_per_new_user",
- "type": "FLOAT",
- "description": "The average number of first-time purchasers per new user"
- },
- {
- "name": "avg_user_conversion_rate",
- "type": "FLOAT",
- "description": "The average number of converting user per total users"
- },
- {
- "name": "avg_session_conversion_rate",
- "type": "FLOAT",
- "description": "The average number of converting session per total sessions"
- },
{
"name": "will_purchase",
"type": "INTEGER",
diff --git a/sql/schema/table/user_rolling_window_lead_metrics.json b/sql/schema/table/user_rolling_window_lead_metrics.json
new file mode 100644
index 00000000..e22d0ceb
--- /dev/null
+++ b/sql/schema/table/user_rolling_window_lead_metrics.json
@@ -0,0 +1,242 @@
+[
+ {
+ "name": "processed_timestamp",
+ "type": "TIMESTAMP",
+ "description": "Timestamp of when the data was processed"
+ },
+ {
+ "name": "feature_date",
+ "type": "DATE",
+ "description": "The date serving as basis for the features calculation"
+ },
+ {
+ "name": "user_pseudo_id",
+ "type": "STRING",
+ "description": "The user pseudo identifier"
+ },
+ {
+ "name": "scroll_50_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has triggered the scroll_50 event in the past day"
+ },
+ {
+ "name": "scroll_50_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has triggered the scroll_50 event in the past 2nd day"
+ },
+ {
+ "name": "scroll_50_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has triggered the scroll_50 event in the past 3rd day"
+ },
+ {
+ "name": "scroll_50_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has triggered the scroll_50 event in the past 4th day"
+ },
+ {
+ "name": "scroll_50_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has triggered the scroll_50 event in the past 5th day"
+ },
+ {
+ "name": "scroll_90_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has triggered the scroll_90 event in the past day"
+ },
+ {
+ "name": "scroll_90_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has triggered the scroll_90 event in the past 2nd day"
+ },
+ {
+ "name": "scroll_90_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has triggered the scroll_90 event in the past 3rd day"
+ },
+ {
+ "name": "scroll_90_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has triggered the scroll_90 event in the past 4th day"
+ },
+ {
+ "name": "scroll_90_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has triggered the scroll_90 event in the past 5th day"
+ },
+ {
+ "name": "view_search_results_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past day"
+ },
+ {
+ "name": "view_search_results_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 2nd day"
+ },
+ {
+ "name": "view_search_results_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 3rd day"
+ },
+ {
+ "name": "view_search_results_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 4th day"
+ },
+ {
+ "name": "view_search_results_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 5th day"
+ },
+ {
+ "name": "file_download_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file in the past day"
+ },
+ {
+ "name": "file_download_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file in the past 2nd day"
+ },
+ {
+ "name": "file_download_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file in the past 3rd day"
+ },
+ {
+ "name": "file_download_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file in the past 4th day"
+ },
+ {
+ "name": "file_download_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file in the past 5th day"
+ },
+ {
+ "name": "recipe_add_to_list_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list in the past day"
+ },
+ {
+ "name": "recipe_add_to_list_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list in the past 2nd day"
+ },
+ {
+ "name": "recipe_add_to_list_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list in the past 3rd day"
+ },
+ {
+ "name": "recipe_add_to_list_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list in the past 4th day"
+ },
+ {
+ "name": "recipe_add_to_list_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list in the past 5th day"
+ },
+ {
+ "name": "recipe_print_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe in the past day"
+ },
+ {
+ "name": "recipe_print_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe in the past 2nd day"
+ },
+ {
+ "name": "recipe_print_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe in the past 3rd day"
+ },
+ {
+ "name": "recipe_print_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe in the past 4th day"
+ },
+ {
+ "name": "recipe_print_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe in the past 5th day"
+ },
+ {
+ "name": "sign_up_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past day"
+ },
+ {
+ "name": "sign_up_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 2nd day"
+ },
+ {
+ "name": "sign_up_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 3rd day"
+ },
+ {
+ "name": "sign_up_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 4th day"
+ },
+ {
+ "name": "sign_up_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 5th day"
+ },
+ {
+ "name": "recipe_favorite_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe in the past day"
+ },
+ {
+ "name": "recipe_favorite_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe in the past 2nd day"
+ },
+ {
+ "name": "recipe_favorite_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe in the past 3rd day"
+ },
+ {
+ "name": "recipe_favorite_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe in the past 4th day"
+ },
+ {
+ "name": "recipe_favorite_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe in the past 5th day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu in the past day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu in the past 2nd day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu in the past 3rd day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu in the past 4th day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu in the past 5th day"
+ }
+]
\ No newline at end of file
diff --git a/templates/activation_query/audience_segmentation_query_template.sqlx b/templates/activation_query/audience_segmentation_query_template.sqlx
index 40c8c9a5..89eec5e0 100644
--- a/templates/activation_query/audience_segmentation_query_template.sqlx
+++ b/templates/activation_query/audience_segmentation_query_template.sqlx
@@ -1,8 +1,9 @@
SELECT
- a.prediction AS a_s_prediction,
+ a.prediction AS user_prop_a_s_prediction,
b.user_pseudo_id AS client_id,
b.user_id AS user_id,
- b.ga_session_id AS session_id,
+ b.ga_session_id AS event_param_session_id,
+ '100' AS event_param_engagement_time_msec,
CASE WHEN EXTRACT(MICROSECOND FROM b.event_timestamp) = 1 THEN b.event_timestamp ELSE TIMESTAMP_SUB(b.event_timestamp, INTERVAL 1 MICROSECOND) END AS inference_date
FROM
`${mds_project_id}.marketing_ga4_v1_${mds_dataset_suffix}.latest_event_per_user_last_72_hours` b,
diff --git a/templates/activation_query/auto_audience_segmentation_query_template.sqlx b/templates/activation_query/auto_audience_segmentation_query_template.sqlx
index 5b6c0eef..d2e960d9 100644
--- a/templates/activation_query/auto_audience_segmentation_query_template.sqlx
+++ b/templates/activation_query/auto_audience_segmentation_query_template.sqlx
@@ -1,8 +1,9 @@
SELECT
- a.prediction AS a_a_s_prediction,
+ a.prediction AS user_prop_a_a_s_prediction,
b.user_pseudo_id AS client_id,
b.user_id AS user_id,
- b.ga_session_id AS session_id,
+ b.ga_session_id AS event_param_session_id,
+ '100' AS event_param_engagement_time_msec,
CASE WHEN EXTRACT(MICROSECOND FROM b.event_timestamp) = 1 THEN b.event_timestamp ELSE TIMESTAMP_SUB(b.event_timestamp, INTERVAL 1 MICROSECOND) END AS inference_date
FROM
`${mds_project_id}.marketing_ga4_v1_${mds_dataset_suffix}.latest_event_per_user_last_72_hours` b,
diff --git a/templates/activation_query/churn_propensity_query_template.sqlx b/templates/activation_query/churn_propensity_query_template.sqlx
index 5ab39212..ae42604a 100644
--- a/templates/activation_query/churn_propensity_query_template.sqlx
+++ b/templates/activation_query/churn_propensity_query_template.sqlx
@@ -1,9 +1,10 @@
SELECT
- a.prediction AS c_p_prediction,
- NTILE(10) OVER (ORDER BY a.prediction_prob DESC) AS c_p_decile,
+ a.prediction AS user_prop_c_p_prediction,
+ NTILE(10) OVER (ORDER BY a.prediction_prob DESC) AS user_prop_c_p_decile,
b.user_pseudo_id AS client_id,
b.user_id AS user_id,
- b.ga_session_id AS session_id,
+ b.ga_session_id AS event_param_session_id,
+ '100' AS event_param_engagement_time_msec,
CASE WHEN EXTRACT(MICROSECOND FROM b.event_timestamp) = 1 THEN b.event_timestamp ELSE TIMESTAMP_SUB(b.event_timestamp, INTERVAL 1 MICROSECOND) END AS inference_date
FROM
`${mds_project_id}.marketing_ga4_v1_${mds_dataset_suffix}.latest_event_per_user_last_72_hours` b,
diff --git a/templates/activation_query/cltv_query_template.sqlx b/templates/activation_query/cltv_query_template.sqlx
index 3a94982d..bdd4bffd 100644
--- a/templates/activation_query/cltv_query_template.sqlx
+++ b/templates/activation_query/cltv_query_template.sqlx
@@ -1,8 +1,9 @@
SELECT
- NTILE(10) OVER (ORDER BY a.prediction DESC) AS cltv_decile,
+ NTILE(10) OVER (ORDER BY a.prediction DESC) AS user_prop_cltv_decile,
b.user_pseudo_id AS client_id,
b.user_id AS user_id,
- b.ga_session_id AS session_id,
+ b.ga_session_id AS event_param_session_id,
+ '100' AS event_param_engagement_time_msec,
CASE WHEN EXTRACT(MICROSECOND FROM b.event_timestamp) = 1 THEN b.event_timestamp ELSE TIMESTAMP_SUB(b.event_timestamp, INTERVAL 1 MICROSECOND) END AS inference_date
FROM
`${mds_project_id}.marketing_ga4_v1_${mds_dataset_suffix}.latest_event_per_user_last_72_hours` b,
diff --git a/templates/activation_query/lead_score_propensity_query_template.sqlx b/templates/activation_query/lead_score_propensity_query_template.sqlx
new file mode 100644
index 00000000..5ad0b874
--- /dev/null
+++ b/templates/activation_query/lead_score_propensity_query_template.sqlx
@@ -0,0 +1,14 @@
+SELECT
+ a.prediction AS user_prop_l_s_p_prediction,
+ NTILE(10) OVER (ORDER BY a.prediction_prob DESC) AS user_prop_l_s_p_decile,
+ b.user_pseudo_id AS client_id,
+ b.user_id AS user_id,
+ b.ga_session_id AS event_param_session_id,
+ '100' AS event_param_engagement_time_msec,
+ CASE WHEN EXTRACT(MICROSECOND FROM b.event_timestamp) = 1 THEN b.event_timestamp ELSE TIMESTAMP_SUB(b.event_timestamp, INTERVAL 1 MICROSECOND) END AS inference_date
+FROM
+ `${mds_project_id}.marketing_ga4_v1_${mds_dataset_suffix}.latest_event_per_user_last_72_hours` b,
+ `{{source_table}}` a
+WHERE
+ COALESCE(a.user_id, "") = COALESCE(b.user_id, "")
+ AND a.user_pseudo_id = b.user_pseudo_id
diff --git a/templates/activation_query/lead_score_propensity_vbb_query_template.sqlx b/templates/activation_query/lead_score_propensity_vbb_query_template.sqlx
new file mode 100644
index 00000000..9be0e0a9
--- /dev/null
+++ b/templates/activation_query/lead_score_propensity_vbb_query_template.sqlx
@@ -0,0 +1,35 @@
+WITH user_prediction_decile AS (
+ SELECT
+ a.prediction AS l_s_p_prediction,
+ NTILE(10) OVER (ORDER BY a.prediction_prob DESC) AS l_s_p_decile,
+ b.user_pseudo_id AS client_id,
+ b.user_id AS user_id,
+ b.ga_session_id AS session_id,
+ CASE
+ WHEN EXTRACT(MICROSECOND FROM b.event_timestamp) = 1 THEN b.event_timestamp
+ ELSE TIMESTAMP_SUB(b.event_timestamp, INTERVAL 1 MICROSECOND)
+ END AS inference_date
+ FROM
+ `${mds_project_id}.marketing_ga4_v1_${mds_dataset_suffix}.latest_event_per_user_last_24_hours` b,
+ `{{source_table}}` a
+ WHERE
+ COALESCE(a.user_id, "") = COALESCE(b.user_id, "")
+ AND a.user_pseudo_id = b.user_pseudo_id)
+SELECT
+ a.l_s_p_prediction AS user_prop_l_s_p_prediction,
+ a.l_s_p_decile AS user_prop_l_s_p_decile,
+ b.value AS event_param_value,
+ 'USD' AS event_param_currency,
+ a.client_id,
+ a.user_id,
+ a.session_id AS event_param_session_id,
+ a.inference_date
+FROM
+ user_prediction_decile AS a
+INNER JOIN
+ `${activation_project_id}.${dataset}.vbb_activation_configuration` AS b
+ON
+ a.l_s_p_decile = b.decile
+WHERE
+ b.activation_type = 'lead-score-propensity'
+AND b.value > 0
\ No newline at end of file
diff --git a/templates/activation_query/purchase_propensity_query_template.sqlx b/templates/activation_query/purchase_propensity_query_template.sqlx
index 40fe5c40..985edf03 100644
--- a/templates/activation_query/purchase_propensity_query_template.sqlx
+++ b/templates/activation_query/purchase_propensity_query_template.sqlx
@@ -1,9 +1,10 @@
SELECT
- a.prediction AS p_p_prediction,
- NTILE(10) OVER (ORDER BY a.prediction_prob DESC) AS p_p_decile,
+ a.prediction AS user_prop_p_p_prediction,
+ NTILE(10) OVER (ORDER BY a.prediction_prob DESC) AS user_prop_p_p_decile,
b.user_pseudo_id AS client_id,
b.user_id AS user_id,
- b.ga_session_id AS session_id,
+ b.ga_session_id AS event_param_session_id,
+ '100' AS event_param_engagement_time_msec,
CASE WHEN EXTRACT(MICROSECOND FROM b.event_timestamp) = 1 THEN b.event_timestamp ELSE TIMESTAMP_SUB(b.event_timestamp, INTERVAL 1 MICROSECOND) END AS inference_date
FROM
`${mds_project_id}.marketing_ga4_v1_${mds_dataset_suffix}.latest_event_per_user_last_72_hours` b,
diff --git a/templates/activation_query/purchase_propensity_vbb_query_template.sqlx b/templates/activation_query/purchase_propensity_vbb_query_template.sqlx
new file mode 100644
index 00000000..f81fce9f
--- /dev/null
+++ b/templates/activation_query/purchase_propensity_vbb_query_template.sqlx
@@ -0,0 +1,35 @@
+WITH user_prediction_decile AS (
+ SELECT
+ a.prediction AS p_p_prediction,
+ NTILE(10) OVER (ORDER BY a.prediction_prob DESC) AS p_p_decile,
+ b.user_pseudo_id AS client_id,
+ b.user_id AS user_id,
+ b.ga_session_id AS session_id,
+ CASE
+ WHEN EXTRACT(MICROSECOND FROM b.event_timestamp) = 1 THEN b.event_timestamp
+ ELSE TIMESTAMP_SUB(b.event_timestamp, INTERVAL 1 MICROSECOND)
+ END AS inference_date
+ FROM
+ `${mds_project_id}.marketing_ga4_v1_${mds_dataset_suffix}.latest_event_per_user_last_24_hours` b,
+ `{{source_table}}` a
+ WHERE
+ COALESCE(a.user_id, "") = COALESCE(b.user_id, "")
+ AND a.user_pseudo_id = b.user_pseudo_id)
+SELECT
+ a.p_p_prediction AS user_prop_p_p_prediction,
+ a.p_p_decile AS user_prop_p_p_decile,
+ b.value AS event_param_value,
+ 'USD' AS event_param_currency,
+ a.client_id,
+ a.user_id,
+ a.session_id AS event_param_session_id,
+ a.inference_date
+FROM
+ user_prediction_decile AS a
+INNER JOIN
+ `${activation_project_id}.${dataset}.vbb_activation_configuration` AS b
+ON
+ a.p_p_decile = b.decile
+WHERE
+ b.activation_type = 'purchase-propensity'
+AND b.value > 0
\ No newline at end of file
diff --git a/templates/activation_type_configuration_template.tpl b/templates/activation_type_configuration_template.tpl
index 22afddc3..913b70a2 100644
--- a/templates/activation_type_configuration_template.tpl
+++ b/templates/activation_type_configuration_template.tpl
@@ -1,47 +1,54 @@
{
"audience-segmentation-15": {
"activation_event_name": "maj_audience_segmentation_15",
- "source_query_template": "${audience_segmentation_query_template_gcs_path}",
- "measurement_protocol_payload_template": "${measurement_protocol_payload_template_gcs_path}"
+ "source_query_template": "${audience_segmentation_query_template_gcs_path}"
},
"auto-audience-segmentation-15": {
"activation_event_name": "maj_auto_audience_segmentation_15",
- "source_query_template": "${auto_audience_segmentation_query_template_gcs_path}",
- "measurement_protocol_payload_template": "${measurement_protocol_payload_template_gcs_path}"
+ "source_query_template": "${auto_audience_segmentation_query_template_gcs_path}"
},
"cltv-180-180": {
"activation_event_name": "maj_cltv_180_180",
- "source_query_template": "${cltv_query_template_gcs_path}",
- "measurement_protocol_payload_template": "${measurement_protocol_payload_template_gcs_path}"
+ "source_query_template": "${cltv_query_template_gcs_path}"
},
"cltv-180-90": {
"activation_event_name": "maj_cltv_180_90",
- "source_query_template": "${cltv_query_template_gcs_path}",
- "measurement_protocol_payload_template": "${measurement_protocol_payload_template_gcs_path}"
+ "source_query_template": "${cltv_query_template_gcs_path}"
},
"cltv-180-30": {
"activation_event_name": "maj_cltv_180_30",
- "source_query_template": "${cltv_query_template_gcs_path}",
- "measurement_protocol_payload_template": "${measurement_protocol_payload_template_gcs_path}"
+ "source_query_template": "${cltv_query_template_gcs_path}"
},
"purchase-propensity-30-15": {
"activation_event_name": "maj_purchase_propensity_30_15",
- "source_query_template": "${purchase_propensity_query_template_gcs_path}",
- "measurement_protocol_payload_template": "${measurement_protocol_payload_template_gcs_path}"
+ "source_query_template": "${purchase_propensity_query_template_gcs_path}"
+ },
+ "purchase-propensity-vbb-30-15": {
+ "activation_event_name": "maj_purchase_propensity_vbb_30_15",
+ "source_query_template": "${purchase_propensity_vbb_query_template_gcs_path}"
},
"purchase-propensity-15-15": {
"activation_event_name": "maj_purchase_propensity_15_15",
- "source_query_template": "${purchase_propensity_query_template_gcs_path}",
- "measurement_protocol_payload_template": "${measurement_protocol_payload_template_gcs_path}"
+ "source_query_template": "${purchase_propensity_query_template_gcs_path}"
},
"purchase-propensity-15-7": {
"activation_event_name": "maj_purchase_propensity_15_7",
- "source_query_template": "${purchase_propensity_query_template_gcs_path}",
- "measurement_protocol_payload_template": "${measurement_protocol_payload_template_gcs_path}"
+ "source_query_template": "${purchase_propensity_query_template_gcs_path}"
},
"churn-propensity-30-15": {
"activation_event_name": "maj_churn_propensity_30_15",
- "source_query_template": "${churn_propensity_query_template_gcs_path}",
- "measurement_protocol_payload_template": "${measurement_protocol_payload_template_gcs_path}"
+ "source_query_template": "${churn_propensity_query_template_gcs_path}"
+ },
+ "churn-propensity-15-15": {
+ "activation_event_name": "maj_churn_propensity_15_15",
+ "source_query_template": "${churn_propensity_query_template_gcs_path}"
+ },
+ "churn-propensity-15-7": {
+ "activation_event_name": "maj_churn_propensity_15_7",
+ "source_query_template": "${churn_propensity_query_template_gcs_path}"
+ },
+ "lead-score-propensity-30-15": {
+ "activation_event_name": "maj_lead_score_propensity_30_15",
+ "source_query_template": "${lead_score_propensity_query_template_gcs_path}"
}
-}
\ No newline at end of file
+}
diff --git a/templates/activation_user_import/lead_score_propensity_csv_export.sqlx b/templates/activation_user_import/lead_score_propensity_csv_export.sqlx
new file mode 100644
index 00000000..376cea56
--- /dev/null
+++ b/templates/activation_user_import/lead_score_propensity_csv_export.sqlx
@@ -0,0 +1,27 @@
+DECLARE
+ select_query STRING;
+SET
+ select_query = FORMAT("""
+ CREATE TEMPORARY TABLE tmp_selection AS
+ SELECT
+ user_pseudo_id AS client_id,
+ '${ga4_stream_id}' AS stream_id,
+ prediction AS l_s_p_prediction,
+ NTILE(10) OVER (ORDER BY prediction_prob DESC) AS l_s_p_decile
+ FROM `%s`
+ """, prediction_table_name);
+EXECUTE IMMEDIATE
+ select_query;
+EXPORT DATA
+ OPTIONS ( uri = 'gs://${export_bucket}/csv-export/lead_score_propensity-*.csv',
+ format = 'CSV',
+ OVERWRITE = TRUE,
+ header = TRUE,
+ field_delimiter = ',' ) AS (
+ SELECT
+ client_id,
+ stream_id,
+ l_s_p_prediction,
+ l_s_p_decile
+ FROM
+ tmp_selection );
diff --git a/templates/app_payload_template.jinja2 b/templates/app_payload_template.jinja2
deleted file mode 100644
index 33179784..00000000
--- a/templates/app_payload_template.jinja2
+++ /dev/null
@@ -1,20 +0,0 @@
-{
- "client_id": "{{client_id}}",
- {{user_id}}
- "timestamp_micros": "{{event_timestamp}}",
- "nonPersonalizedAds": false,
- "consent": {
- "ad_user_data": "GRANTED",
- "ad_personalization": "GRANTED"
- },
- "user_properties":
- {{user_properties}},
- "events": [
- {
- "name": "{{event_name}}",
- "params": {
- "session_id": "{{session_id}}"
- }
- }
- ]
-}
diff --git a/templates/load_vbb_activation_configuration.sql.tpl b/templates/load_vbb_activation_configuration.sql.tpl
new file mode 100644
index 00000000..b256e9ca
--- /dev/null
+++ b/templates/load_vbb_activation_configuration.sql.tpl
@@ -0,0 +1,33 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Step 1: Load JSON data from GCS into the temporary table
+LOAD DATA OVERWRITE `${project_id}.${dataset}.temp_json_data`
+FROM FILES (
+ format = 'JSON',
+ uris = ['${config_file_uri}']
+);
+
+-- Step 2: Transform and load into the final table
+CREATE OR REPLACE TABLE `${project_id}.${dataset}.vbb_activation_configuration` AS
+ SELECT
+ t.activation_type AS activation_type,
+ dm.decile,
+ (t.value_norm * dm.multiplier) AS value
+ FROM
+ `${project_id}.${dataset}.temp_json_data` AS t,
+ UNNEST(t.decile_multiplier) AS dm;
+
+-- Step 3: Clean up the temporary table
+DROP TABLE `${project_id}.${dataset}.temp_json_data`;
diff --git a/templates/purchase_propensity_smart_bidding_view.sql.tpl b/templates/purchase_propensity_smart_bidding_view.sql.tpl
new file mode 100644
index 00000000..f3891c15
--- /dev/null
+++ b/templates/purchase_propensity_smart_bidding_view.sql.tpl
@@ -0,0 +1,41 @@
+-- Copyright 2024 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+SELECT
+ p_stat.inference_date,
+ p_stat.p_p_decile,
+ p_stat.number_of_users,
+ conf.value*p_stat.number_of_users AS predicted_purchase_value
+FROM (
+ SELECT
+ inference_date,
+ p_p_decile,
+ COUNT(p_p_decile) AS number_of_users
+ FROM (
+ SELECT
+ PARSE_DATE('%Y_%m_%d', SUBSTR(_TABLE_SUFFIX, 1,10)) AS inference_date,
+ NTILE(10) OVER (PARTITION BY _TABLE_SUFFIX ORDER BY b.prediction_prob DESC) AS p_p_decile,
+ FROM
+ `${project_id}.${purchase_propensity_dataset}.predictions_*` b
+ WHERE
+ ENDS_WITH(_TABLE_SUFFIX, '_view') )
+ GROUP BY
+ inference_date,
+ p_p_decile ) AS p_stat
+JOIN
+ `${project_id}.${activation_dataset}.vbb_activation_configuration` conf
+ON
+ p_stat.p_p_decile = conf.decile
+WHERE
+ conf.activation_type = 'purchase-propensity'
\ No newline at end of file
diff --git a/templates/vbb_activation_configuration.jsonl b/templates/vbb_activation_configuration.jsonl
new file mode 100644
index 00000000..57b200e0
--- /dev/null
+++ b/templates/vbb_activation_configuration.jsonl
@@ -0,0 +1,3 @@
+{"activation_type":"purchase-propensity","value_norm":150,"decile_multiplier":[{"decile":1,"multiplier":5.5},{"decile":2,"multiplier":3},{"decile":3,"multiplier":2},{"decile":4,"multiplier":1},{"decile":5,"multiplier":0},{"decile":6,"multiplier":0},{"decile":7,"multiplier":0},{"decile":8,"multiplier":0},{"decile":9,"multiplier":0},{"decile":10,"multiplier":0}]}
+{"activation_type":"cltv","value_norm":500,"decile_multiplier":[{"decile":1,"multiplier":5.5},{"decile":2,"multiplier":3},{"decile":3,"multiplier":2},{"decile":4,"multiplier":1},{"decile":5,"multiplier":0},{"decile":6,"multiplier":0},{"decile":7,"multiplier":0},{"decile":8,"multiplier":0},{"decile":9,"multiplier":0},{"decile":10,"multiplier":0}]}
+{"activation_type":"lead-score-propensity","value_norm":150,"decile_multiplier":[{"decile":1,"multiplier":5.5},{"decile":2,"multiplier":3},{"decile":3,"multiplier":2},{"decile":4,"multiplier":1},{"decile":5,"multiplier":0},{"decile":6,"multiplier":0},{"decile":7,"multiplier":0},{"decile":8,"multiplier":0},{"decile":9,"multiplier":0},{"decile":10,"multiplier":0}]}
\ No newline at end of file