Rebase Multi property with main #247

Merged: 13 commits, Nov 15, 2024
Changes from all commits
2 changes: 1 addition & 1 deletion DEVELOPMENT.md
@@ -3,7 +3,7 @@ Marketing Analytics Jumpstart consists of an easy, extensible and automated impl

## Developer pre-requisites
Use Visual Studio Code to develop the solution. Install the Gemini Code Assist, Docker, GitHub, HashiCorp Terraform, and Jinja extensions.
You should have Python 3, Poetry, Terraform, Git and Docker installed in your developer terminal environment.
You should have Python 3, uv, Terraform, Git and Docker installed in your developer terminal environment.
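
A quick way to confirm these prerequisites are in place (a sketch; it only checks that each tool is on your PATH):

```sh
python3 --version
uv --version
terraform -version
git --version
docker --version
```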

## Preparing development environment

11 changes: 9 additions & 2 deletions README.md
@@ -1,8 +1,15 @@
# Marketing Analytics Jumpstart
Marketing Analytics Jumpstart is a terraform automated, quick-to-deploy, customizable end-to-end marketing solution on Google Cloud Platform (GCP). This solution aims at helping customer better understand and better use their digital advertising budget.
Marketing Analytics Jumpstart (MAJ) is a Terraform-automated, quick-to-deploy, customizable, end-to-end marketing solution on Google Cloud Platform (GCP). This solution aims to help customers better understand and better use their digital advertising budget.

Customers are looking to drive revenue and increase media efficiency by identifying, predicting and targeting valuable users through the use of machine learning. However, marketers first have to solve the challenge of having a number of disparate data sources that prevent them from having a holistic view of customers. Marketers also often don't have the expertise and/or resources in their marketing departments to train, run, and activate ML models on paid channels. Without this solution that enables innovation through predictive analytics, marketers are missing opportunities to advance their marketing program and accelerate key goals and objectives (e.g. acquiring new customers, improving customer retention, etc.).

## Version Variants

| Version Name | Branch | Purpose |
| ------------ | ------ | ------- |
| Multi Stream Activation | [multi-stream-activation](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/tree/multi-stream-activation) | Activation to multiple Google Analytics 4 data streams (websites and applications). |
| Multi Property | [multi-property](https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart/tree/multi-property) | Deployment of multiple sets of MAJ resources, one per Google Analytics 4 property, in the same Google Cloud project. |
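
To try one of these variants, check out its branch before deploying (a sketch; the repository URL and branch names are taken from the table above):

```sh
# Clone the repository and switch to the desired variant branch
git clone https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart.git
cd marketing-analytics-jumpstart
git checkout multi-property   # or: git checkout multi-stream-activation
```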

## Quick Installation ⏰

Want to quickly install and use it? Run this [installation notebook 📔](https://colab.sandbox.google.com/github/GoogleCloudPlatform/marketing-analytics-jumpstart/blob/main/notebooks/quick_installation.ipynb) on Google Colaboratory and leverage Marketing Analytics Jumpstart in under 30 minutes.
@@ -112,7 +119,7 @@ This high-level architecture demonstrates how Marketing Analytics Jumpstart inte
- [ ] [Backfill](https://cloud.google.com/bigquery/docs/google-ads-transfer) BigQuery Data Transfer service for Google Ads
- [ ] Have existing Google Analytics 4 Property with [Measurement ID](https://support.google.com/analytics/answer/12270356?hl=en)

**Note:** Google Ads Customer Matching currently only works with Google Analytics 4 **Properties** linked to Google Ads Accounts, it won't work for subproperties or Rollup properties.
**Note:** Google Ads Customer Matching currently works only with Google Analytics 4 **Properties** and **Subproperties** linked to Google Ads accounts; it won't work for Rollup properties.

## Installation Permissions and Privileges
- [ ] Google Analytics Property Editor or Owner
7 changes: 3 additions & 4 deletions docs/data_store.md
@@ -107,12 +107,11 @@ To deploy the Marketing Data Store, follow the pre-requisites and instructions i
Next, after creating the Terraform variables file from the template, set the Terraform variables needed to deploy Dataform.

```bash
create_dev_environment = false
create_staging_environment = false
create_prod_environment = true
deploy_dataform = true
property_id = "PROPERTY_ID"
```

When the `create_dev_environment` variable is set to `true`, a development environment will be created. When the `create_staging_environment` variable is set to `true`, a staging environment will be created. When the `create_prod_environment` variable is set to `true`, a production environment will be created.
When the `deploy_dataform` variable is set to `true`, a Dataform workspace will be created.
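
With those variables set, the Data Store deploys through the usual Terraform workflow; a sketch, assuming you run from the directory that holds the solution's Terraform configuration and your populated terraform.tfvars:

```bash
terraform init
terraform plan -var-file=terraform.tfvars
terraform apply -var-file=terraform.tfvars
```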

![Dataform Repository](images/data_store_dataform_github_repository.png)
After deploying the Marketing Data Store, the repository called `marketing_analytics` is created in Dataform.
6 changes: 2 additions & 4 deletions infrastructure/cloudshell/terraform-template.tfvars
@@ -17,10 +17,7 @@
tf_state_project_id = "${MAJ_DEFAULT_PROJECT_ID}"
google_default_region = "${MAJ_DEFAULT_REGION}"

create_dev_environment = false
create_staging_environment = false
create_prod_environment = true

deploy_dataform = true
deploy_activation = true
deploy_feature_store = true
deploy_pipelines = true
@@ -30,6 +27,7 @@ deploy_monitoring = true

data_project_id = "${MAJ_MDS_PROJECT_ID}"
destination_data_location = "${MAJ_MDS_DATA_LOCATION}"
property_id = "${MAJ_GA4_PROPERTY_ID}"
data_processing_project_id = "${MAJ_MDS_DATAFORM_PROJECT_ID}"
source_ga4_export_project_id = "${MAJ_GA4_EXPORT_PROJECT_ID}"
source_ga4_export_dataset = "${MAJ_GA4_EXPORT_DATASET}"
40 changes: 6 additions & 34 deletions infrastructure/cloudshell/tutorial.md
@@ -12,45 +12,17 @@ export PROJECT_ID="<walkthrough-project-id/>"
gcloud config set project $PROJECT_ID
```

## Install or update Python3
Install a compatible version of Python 3.8-3.10 and set the CLOUDSDK_PYTHON environment variable to point to it.
```sh
sudo apt-get install python3.10
CLOUDSDK_PYTHON=python3.10
```
## Install or update uv for running Python scripts
Install [uv](https://docs.astral.sh/uv/), which manages the Python version and dependencies for the solution.

## Install Python's Poetry and set Poetry to use Python 3.10 version
[Poetry](https://python-poetry.org/docs/) is a Python's tool for dependency management and packaging.
If you are installing on in Cloud Shell use the following commands:
```sh
pipx install poetry
```
If you don't have pipx installed - follow the [Pipx installation guide](https://pipx.pypa.io/stable/installation/)
```sh
sudo apt update
sudo apt install pipx
pipx ensurepath
pipx install poetry
```
Verify that `poetry` is on your $PATH variable:
```sh
poetry --version
```
If it fails - add it to your $PATH variable:
```sh
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"
```
Verify poetry is properly installed, run:
```sh
poetry --version
```
Set poetry to use your latest python3
```sh
poetry env use python3
```
Install python dependencies, run:

Check uv installation:
```sh
poetry install
uv --version
```
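
If you also want to pre-install the project's Python dependencies (the uv counterpart of the removed `poetry install` step), the following sketch assumes a `pyproject.toml` and lockfile at the repository root:

```sh
# Resolve and install the project's dependencies into a managed virtual environment
uv sync
```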

## Authenticate with additional OAuth 2.0 scopes
52 changes: 9 additions & 43 deletions infrastructure/terraform/README.md
@@ -43,52 +43,18 @@ Also, this method allows you to extend this solution and develop it to satisfy y
gcloud config set project $PROJECT_ID
```

1. Install or update Python3
Install a compatible version of Python 3.8-3.10 and set the CLOUDSDK_PYTHON environment variable to point to it.
1. Install or update uv for running Python scripts
Install [uv](https://docs.astral.sh/uv/), which manages the Python version and dependencies for the solution.

```bash
sudo apt-get install python3.10
CLOUDSDK_PYTHON=python3.10
```sh
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"
```
If you are installing on a Mac:
```shell
brew install [email protected]
CLOUDSDK_PYTHON=python3.10
```

1. Install Python's Poetry and set Poetry to use Python 3.10 version

[Poetry](https://python-poetry.org/docs/) is a Python's tool for dependency management and packaging.

If you are installing on in Cloud Shell use the following commands:
```shell
pipx install poetry
```
If you don't have pipx installed - follow the [Pipx installation guide](https://pipx.pypa.io/stable/installation/)
```shell
sudo apt update
sudo apt install pipx
pipx ensurepath
pipx install poetry
```
Verify that `poetry` is on your $PATH variable:
```shell
poetry --version
```
If it fails - add it to your $PATH variable:
```shell
export PATH="$HOME/.local/bin:$PATH"
```
If you are installing on a Mac:
```shell
brew install poetry
```
Set poetry to use your latest python3
```shell
SOURCE_ROOT=${HOME}/${REPO}
cd ${SOURCE_ROOT}
poetry env use python3
```
Check uv installation:
```sh
uv --version
```
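
Since the Terraform modules invoke the project's entry points through `uv run` (see infrastructure/terraform/main.tf), you can sanity-check the toolchain locally; a sketch, assuming the repository defines invoke tasks:

```sh
# List the invoke tasks that the Terraform provisioners call via `uv run inv ...`
uv run inv --list
```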

1. Authenticate with additional OAuth 2.0 scopes needed to use the Google Analytics Admin API:
```shell
70 changes: 17 additions & 53 deletions infrastructure/terraform/main.tf
@@ -69,10 +69,10 @@ locals {
source_root_dir = "../.."
# The config_file_name is the name of the config file.
config_file_name = "config"
# The poetry_run_alias is the alias of the poetry command.
poetry_run_alias = "${var.poetry_cmd} run"
# The uv_run_alias is the alias of the uv run command.
uv_run_alias = "${var.uv_cmd} run"
# The mds_dataset_suffix is the suffix of the marketing data store dataset.
mds_dataset_suffix = var.create_staging_environment ? "staging" : var.create_dev_environment ? "dev" : "prod"
mds_dataset_suffix = var.property_id
# The project_toml_file_path is the path to the project.toml file.
project_toml_file_path = "${local.source_root_dir}/pyproject.toml"
# The project_toml_content_hash is the hash of the project.toml file.
@@ -127,39 +127,22 @@ resource "local_file" "feature_store_configuration" {
})
}

# Runs the poetry command to install the dependencies.
# The command is: poetry install
resource "null_resource" "poetry_install" {
triggers = {
create_command = "${var.poetry_cmd} lock && ${var.poetry_cmd} install"
source_contents_hash = local.project_toml_content_hash
}

# Only run the command when `terraform apply` executes and the resource doesn't exist.
provisioner "local-exec" {
when = create
command = self.triggers.create_command
working_dir = local.source_root_dir
}
}

data "external" "check_ga4_property_type" {
program = ["bash", "-c", "${local.poetry_run_alias} ga4-setup --ga4_resource=check_property_type --ga4_property_id=${var.ga4_property_id} --ga4_stream_id=${var.ga4_stream_id}"]
program = ["bash", "-c", "${local.uv_run_alias} ga4-setup --ga4_resource=check_property_type --ga4_property_id=${var.ga4_property_id} --ga4_stream_id=${var.ga4_stream_id}"]
working_dir = local.source_root_dir
depends_on = [null_resource.poetry_install]
}

# Runs the poetry invoke command to generate the sql queries and procedures.
# Runs the uv run invoke command to generate the sql queries and procedures.
# This command is executed before the feature store is created.
resource "null_resource" "generate_sql_queries" {

triggers = {
# The create command generates the sql queries and procedures.
# The command is: poetry inv [function_name] --env-name=${local.config_file_name}
# The command is: uv run inv [function_name] --env-name=${local.config_file_name}
# The --env-name argument is the name of the configuration file.
create_command = <<-EOT
${local.poetry_run_alias} inv apply-config-parameters-to-all-queries --env-name=${local.config_file_name}
${local.poetry_run_alias} inv apply-config-parameters-to-all-procedures --env-name=${local.config_file_name}
${local.uv_run_alias} inv apply-config-parameters-to-all-queries --env-name=${local.config_file_name}
${local.uv_run_alias} inv apply-config-parameters-to-all-procedures --env-name=${local.config_file_name}
EOT

# The destroy command removes the generated sql queries and procedures.
@@ -171,10 +154,6 @@ resource "null_resource" "generate_sql_queries" {
# The working directory is the root of the project.
working_dir = local.source_root_dir

# The poetry_installed trigger is the ID of the null_resource.poetry_install resource.
# This is used to ensure that the poetry command is run before the generate_sql_queries command.
poetry_installed = null_resource.poetry_install.id

# The source_contents_hash trigger is the hash of the project.toml file.
# This is used to ensure that the generate_sql_queries command is run only if the project.toml file has changed.
# It also ensures that the generate_sql_queries command is run only if the sql queries and procedures have changed.
@@ -305,8 +284,7 @@ resource "null_resource" "check_iam_api" {
# Create the data store module.
# The data store module creates the marketing data store in BigQuery, creates the ETL pipeline in Dataform
# for the marketing data from Google Ads and Google Analytics.
# The data store is created only if the `create_prod_environment`, `create_staging_environment`
# or `create_dev_environment` variable is set to true in the terraform.tfvars file.
# The data store is created only if the `deploy_dataform` variable is set to true in the terraform.tfvars file.
# The data store is created in the `data_project_id` project.
module "data_store" {
# The source directory of the data store module.
@@ -338,18 +316,10 @@ module "data_store" {
dataform_github_repo = var.dataform_github_repo
dataform_github_token = var.dataform_github_token

# The create_dev_environment is set in the terraform.tfvars file.
# The create_dev_environment determines if the dev environment is created.
# When the value is true, the dev environment is created.
# The create_staging_environment is set in the terraform.tfvars file.
# The create_staging_environment determines if the staging environment is created.
# When the value is true, the staging environment is created.
# The create_prod_environment is set in the terraform.tfvars file.
# The create_prod_environment determines if the prod environment is created.
# When the value is true, the prod environment is created.
create_dev_environment = var.create_dev_environment
create_staging_environment = var.create_staging_environment
create_prod_environment = var.create_prod_environment
# The deploy_dataform variable determines if Dataform is deployed.
# When the value is true, the Dataform environment is created.
deploy_dataform = var.deploy_dataform
property_id = var.property_id

# The dev_data_project_id is the project ID where the dev datasets will be created.
# If not provided, data_project_id will be used.
@@ -415,15 +385,12 @@ module "pipelines" {
# The source is the path to the pipelines module.
source = "./modules/pipelines"
config_file_path = local_file.feature_store_configuration.id != "" ? local_file.feature_store_configuration.filename : ""
poetry_run_alias = local.poetry_run_alias
uv_run_alias = local.uv_run_alias
# The count determines if the pipelines are created or not.
# If the count is 1, the pipelines are created.
# If the count is 0, the pipelines are not created.
# This is done to avoid creating the pipelines if the `deploy_pipelines` variable is set to false in the terraform.tfvars file.
count = var.deploy_pipelines ? 1 : 0
# The poetry_installed trigger is the ID of the null_resource.poetry_install resource.
# This is used to ensure that the poetry command is run before the pipelines module is created.
poetry_installed = null_resource.poetry_install.id
# The project_id is the project in which the data is stored.
# This is set to the data project ID in the terraform.tfvars file.
mds_project_id = var.data_project_id
@@ -454,9 +421,9 @@ module "activation" {
# The trigger function is used to trigger the activation function.
# The trigger function is created in the same region as the activation function.
trigger_function_location = var.google_default_region
# The poetry_cmd is the poetry_cmd variable.
# This can be set on the poetry_cmd in the terraform.tfvars file.
poetry_cmd = var.poetry_cmd
# The uv_run_alias is derived from the uv_cmd variable.
# The uv_cmd can be set in the terraform.tfvars file.
uv_run_alias = local.uv_run_alias
# The ga4_measurement_id is the ga4_measurement_id variable.
# This can be set on the ga4_measurement_id in the terraform.tfvars file.
ga4_measurement_id = var.ga4_measurement_id
@@ -479,9 +446,6 @@
# This is done to avoid creating the activation function if the `deploy_activation` variable is set
# to false in the terraform.tfvars file.
count = var.deploy_activation ? 1 : 0
# The poetry_installed is the ID of the null_resource poetry_install
# This is used to ensure that the poetry command is run before the activation module is created.
poetry_installed = null_resource.poetry_install.id
mds_project_id = var.data_project_id
mds_dataset_suffix = local.mds_dataset_suffix
