Commit

Fix data leakage (#142)
* predicting for only the users with traffic in the past 72h - purchase propensity

* running inference only for users with events in the past 72h

* including 72h users for all models predictions

* considering null values in TabWorkflow models

* deleting unused pipfile

* upgrading lib versions

* implementing reporting preprocessing as a new pipeline

* adding more code documentation

* adding important information on the main README.md and DEVELOPMENT.md

* adding schedule run name and more code documentation

* implementing a new scheduler using the vertex ai sdk & adding user_id to procedures for consistency

* adding more code documentation

* adding code doc to the python custom component

* adding more code documentation

* fixing aggregated predictions query

* removing unnecessary resources from deployment

* Writing MDS guide

* adding the MDS developer and troubleshooting documentation

* fixing one deployment issue and removing bad predictor features

* removing leakage feature device_os_version which is not a good predictor

---------

Co-authored-by: Carlos Timoteo <[email protected]>
chmstimoteo and Carlos Timoteo authored Jun 6, 2024
1 parent 79ee516 commit b76ff34
Showing 10 changed files with 7 additions and 22 deletions.
4 changes: 4 additions & 0 deletions config/config.yaml.tftpl
@@ -662,6 +662,8 @@ vertex_ai:
#- feature_date
- user_pseudo_id
- user_id
+ - device_web_browser_version
+ - device_os_version
- will_purchase
pipeline_parameters_substitutions: null
prediction:
@@ -962,6 +964,8 @@ vertex_ai:
#- feature_date
- user_pseudo_id
- user_id
+ - device_web_browser_version
+ - device_os_version
- will_purchase
pipeline_parameters_substitutions: null

4 changes: 2 additions & 2 deletions infrastructure/terraform/modules/activation/main.tf
@@ -416,7 +416,7 @@ module "pipeline_bucket" {
condition = {
age = 365
with_state = "ANY"
- matches_prefix = module.project_services.project_id
+ matches_prefix = var.project_id
}
}]

@@ -575,7 +575,7 @@ module "function_bucket" {
source = "terraform-google-modules/cloud-storage/google//modules/simple_bucket"
version = "~> 3.4.1"
project_id = null_resource.check_cloudfunctions_api.id != "" ? module.project_services.project_id : var.project_id
- name = "activation-trigger-${module.project_services.project_id}"
+ name = "${local.app_prefix}-trigger-${module.project_services.project_id}"
location = var.location
# When deleting a bucket, this boolean option will delete all contained objects.
# If false, Terraform will fail to delete buckets which contain objects.
1 change: 1 addition & 0 deletions infrastructure/terraform/modules/data-store/main.tf
@@ -22,6 +22,7 @@ data "google_project" "data_processing" {

data "google_secret_manager_secret" "github_secret_name" {
secret_id = google_secret_manager_secret.github-secret.name
+ project = var.data_processing_project_id
}

provider "google" {
10 changes: 0 additions & 10 deletions python/pipelines/transformations-purchase-propensity-cltv.json
@@ -25,11 +25,6 @@
"column_name": "device_os"
}
},
- {
- "categorical": {
- "column_name": "device_os_version"
- }
- },
{
"categorical": {
"column_name": "device_language"
@@ -40,11 +35,6 @@
"column_name": "device_web_browser"
}
},
- {
- "categorical": {
- "column_name": "device_web_browser_version"
- }
- },
{
"categorical": {
"column_name": "geo_sub_continent"
10 changes: 0 additions & 10 deletions python/pipelines/transformations-purchase-propensity.json
@@ -25,11 +25,6 @@
"column_name": "device_os"
}
},
- {
- "categorical": {
- "column_name": "device_os_version"
- }
- },
{
"categorical": {
"column_name": "device_language"
@@ -40,11 +35,6 @@
"column_name": "device_web_browser"
}
},
- {
- "categorical": {
- "column_name": "device_web_browser_version"
- }
- },
{
"categorical": {
"column_name": "geo_sub_continent"
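
Reviewer note: the two JSON hunks and the `config.yaml.tftpl` hunks have to stay in sync, so a small consistency check may be useful. The sketch below is not part of this commit. It assumes that the YAML list touched above is a list of columns excluded from training (the key name is not visible in the hunk) and that each transformations file is a top-level JSON list of single-key objects such as `{"categorical": {"column_name": ...}}`. The function name `find_leaked_columns` and the inline sample data are hypothetical.

```python
import json


def find_leaked_columns(transformations, excluded_columns):
    """Return excluded column names that still appear in a transformations list."""
    excluded = set(excluded_columns)
    leaked = []
    for entry in transformations:
        # Assumed entry shape: {"<transformation type>": {"column_name": "<name>"}}
        for spec in entry.values():
            if not isinstance(spec, dict):
                continue
            name = spec.get("column_name")
            if name in excluded and name not in leaked:
                leaked.append(name)
    return leaked


# Hypothetical exclusion list mirroring the entries visible in the YAML hunks.
EXCLUDED = [
    "user_pseudo_id",
    "user_id",
    "device_web_browser_version",
    "device_os_version",
    "will_purchase",
]

# Inline sample mirroring the JSON entries visible in the hunks (pre-commit state).
BEFORE = json.loads("""
[
  {"categorical": {"column_name": "device_os"}},
  {"categorical": {"column_name": "device_os_version"}},
  {"categorical": {"column_name": "device_language"}},
  {"categorical": {"column_name": "device_web_browser"}},
  {"categorical": {"column_name": "device_web_browser_version"}},
  {"categorical": {"column_name": "geo_sub_continent"}}
]
""")

# Post-commit state: the two version columns are dropped from the transformations.
REMOVED = {"device_os_version", "device_web_browser_version"}
AFTER = [e for e in BEFORE if e["categorical"]["column_name"] not in REMOVED]

print(find_leaked_columns(BEFORE, EXCLUDED))
print(find_leaked_columns(AFTER, EXCLUDED))
```

Run against the real files, a non-empty result would flag a column that is excluded in the config but still listed in a transformations file.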