"
+ ],
+ "text/markdown": "## Propensity Model Event Selection:\n\nThis analysis aims to recommend events from your GA4 data in BigQuery for building a propensity model to predict **purchase** likelihood. \n\n### Understanding the Results:\n\n* **Target Variable:** `purchase_past_7_days` signifies whether a user made a purchase in the past 7 days. Values range from 0 to 27, indicating multiple purchases are possible.\n* **Features:** Each other variable represents a user action (event) aggregated over a 7-day window. \n* **Correlation:** Measures the linear relationship between an event and the target. Values closer to 1 or -1 indicate stronger relationships.\n\n### Event Selection Rationale:\n\nWe'll prioritize events with the strongest positive correlation to `purchase_past_7_days`, as they are the most indicative of purchase intent. However, highly correlated features might carry redundant information, so we'll aim for a balance of predictability and model simplicity.\n\n1. **High Correlation Events:** \n * `add_payment_info`: **Strongest correlation (0.955)**, logically connected to purchase completion.\n * `add_shipping_info`: **Very high correlation (0.914),** another strong indicator of purchase intent.\n * **Note:** The extremely high correlation between these two might indicate they capture very similar user behavior. Consider whether one sufficiently represents the step before purchase or if both are necessary. \n\n2. **Moderate Correlation Events:**\n * `view_cart`: **Good correlation (0.628).** Viewing a cart suggests product consideration.\n * `add_to_cart`: **Solid correlation (0.541).** Adding to cart is a stronger purchase intent signal than just viewing.\n\n3. **Events for Potential Further Analysis (with adjustments):**\n * `scroll`: While the correlation is low (0.138), engagement metrics like scroll depth *might* become more predictive if we modify the aggregation window (e.g., 1-day instead of 7-day) or segment users differently. \n\n4. **Events to Exclude:**\n * `select_promotion`, `click`, `add_to_wishlist` show very weak or slightly negative correlations. They're unlikely to improve model accuracy.\n\n### Suggested Events:\n\n* **Target Event:** `purchase` \n* **Analyzed Events:** `view_cart`, `select_promotion`, `scroll`, `click`, `add_to_wishlist`, `add_to_cart`, `add_shipping_info`, `add_payment_info`\n* **Suggested Events:** `add_payment_info`, `add_shipping_info`, `add_to_cart`, `view_cart`\n\n### Copy the list below and paste into the `terraform.tfvars` file when installing the Marketing Analytics Jumpstart:\n\n```\n[\"add_payment_info\", \"add_shipping_info\", \"add_to_cart\" , \"view_cart\"]\n``` \n"
+ },
+ "metadata": {}
+ }
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/notebooks/quick_installation.ipynb b/notebooks/quick_installation.ipynb
new file mode 100644
index 00000000..56f14d1b
--- /dev/null
+++ b/notebooks/quick_installation.ipynb
@@ -0,0 +1,563 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Marketing Analytics Jumpstart Quick Installation\n",
+ "\n",
+ "\n",
+ "\n",
+ " \n",
+ " \n",
+ "  Run in Colab\n",
+ " \n",
+ " | \n",
+ " \n",
+ " \n",
+ "  Run in Colab Enterprise\n",
+ " \n",
+ " | \n",
+ " \n",
+ " \n",
+ "  View on GitHub\n",
+ " \n",
+ " | \n",
+ " \n",
+ " \n",
+ "  Open in Vertex AI Workbench\n",
+ " \n",
+ " | \n",
+ "
"
+ ],
+ "metadata": {
+ "id": "AKtB_GVpt2QJ"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Follow this Colab notebook to quick install the Marketing Analytics Jumpstart solution on a Google Cloud Project.\n",
+ "\n",
+ "> **Note:** You need access to the Google Analytics 4 Property, Google Ads Account and a Google Cloud project in which you will deploy Marketing Analytics Jumpstart, with the following permissions:\n",
+ ">> * Google Analytics Property Editor or Owner\n",
+ ">>\n",
+ ">> * Google Ads Reader\n",
+ ">>\n",
+ ">> * Project Owner for a Google Cloud Project\n",
+ ">>\n",
+ ">> * GitHub or GitLab account priviledges for repo creation and access token. [Details](https://cloud.google.com/dataform/docs/connect-repository)\n",
+ "\n",
+ "\n",
+ "\n",
+ "Total Installation time is around **35-40 minutes**."
+ ],
+ "metadata": {
+ "id": "mj-8n9jIyTn-"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### 1. Authenticate to Google Cloud Platform\n",
+ "\n",
+ "Click the ( βΆ ) button to authenticate you to the Google Cloud Project.\n",
+ "\n",
+ "***Time: 30 seconds.***"
+ ],
+ "metadata": {
+ "id": "DDGHqJNhq5Oi"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "from google.colab import auth\n",
+ "auth.authenticate_user()\n",
+ "\n",
+ "print('Authenticated')"
+ ],
+ "metadata": {
+ "id": "9TyPgnleJGGZ",
+ "cellView": "form",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "921e7c52-1913-402b-a880-8760861d0358"
+ },
+ "execution_count": 1,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Authenticated\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### 2. Installation Configurations\n",
+ "\n",
+ "Fill-out the form, and Click the ( βΆ ) button.\n",
+ "\n",
+ "***Time: 10 minutes.***"
+ ],
+ "metadata": {
+ "id": "mq1yqwr8qcx1"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @markdown ---\n",
+ "# @markdown # Google Cloud Platform\n",
+ "# @markdown Copy the `Project ID` from the \"Project Info\" card in the console [Dashboard](https://console.cloud.google.com/home/dashboard).\n",
+ "GOOGLE_CLOUD_PROJECT_ID = \"your-project-id\" #@param {type:\"string\"}\n",
+ "GOOGLE_CLOUD_QUOTA_PROJECT = GOOGLE_CLOUD_PROJECT_ID\n",
+ "PROJECT_ID = GOOGLE_CLOUD_PROJECT_ID\n",
+ "MAJ_DEFAULT_PROJECT_ID = GOOGLE_CLOUD_PROJECT_ID\n",
+ "# @markdown ---\n",
+ "# @markdown # Google Analytics 4\n",
+ "# @markdown For a quick installation, copy the Google Analytics 4 property ID and stream ID. You will find it in your Google Analytics 4 console, under Admin settings.\n",
+ "GA4_PROPERTY_ID = \"1234567890\" #@param {type:\"string\"}\n",
+ "MAJ_GA4_PROPERTY_ID = GA4_PROPERTY_ID\n",
+ "GA4_STREAM_ID = \"1234567890\" #@param {type:\"string\"}\n",
+ "MAJ_GA4_STREAM_ID = GA4_STREAM_ID\n",
+ "# @markdown The website your Google Analytics 4 events are coming from.\n",
+ "WEBSITE_URL = \"https://shop.googlemerchandisestore.com\" #@param {type:\"string\", placeholder:\"Full web URL\"}\n",
+ "MAJ_WEBSITE_URL = WEBSITE_URL\n",
+ "# @markdown ---\n",
+ "# @markdown # Google Ads\n",
+ "# @markdown For a quick installation, copy the Google Ads Customer ID. You will find it in your Google Ads console. It must be in the following format: `\"CUSTOMERID\"` (without dashes).\n",
+ "GOOGLE_ADS_CUSTOMER_ID= \"1234567890\" #@param {type:\"string\", placeholder:\"GAds Account Number (e.g. 4717384083)\"}\n",
+ "MAJ_ADS_EXPORT_TABLE_SUFFIX = \"_\"+GOOGLE_ADS_CUSTOMER_ID\n",
+ "# @markdown ---\n",
+ "# @markdown # Github\n",
+ "# @markdown For a quick installation, use your email credentials that allows you to create a dataform repository connected to a remote Github repository, more info [here](https://cloud.google.com/dataform/docs/connect-repository).\n",
+ "GITHUB_REPO_OWNER_EMAIL = \"user@company.com\" #@param {type:\"string\", placeholder:\"user@company.com\"}\n",
+ "MAJ_DATAFORM_REPO_OWNER_EMAIL = GITHUB_REPO_OWNER_EMAIL\n",
+ "MAJ_DATAFORM_GITHUB_REPO_URL = \"https://github.com/GoogleCloudPlatform/marketing-analytics-jumpstart-dataform.git\"\n",
+ "# @markdown For a quick installation, reuse or create your [GitHub personal access token](https://cloud.google.com/dataform/docs/connect-repository#connect-https)\n",
+ "GITHUB_PERSONAL_TOKEN = \"your_github_personal_access_token\" #@param {type:\"string\"}\n",
+ "MAJ_DATAFORM_GITHUB_TOKEN = GITHUB_PERSONAL_TOKEN\n",
+ "# @markdown ---\n",
+ "\n",
+ "import os\n",
+ "os.environ['GOOGLE_CLOUD_PROJECT_ID'] = GOOGLE_CLOUD_PROJECT_ID\n",
+ "os.environ['GOOGLE_CLOUD_QUOTA_PROJECT'] = GOOGLE_CLOUD_QUOTA_PROJECT\n",
+ "os.environ['PROJECT_ID'] = PROJECT_ID\n",
+ "os.environ['MAJ_DEFAULT_PROJECT_ID'] = MAJ_DEFAULT_PROJECT_ID\n",
+ "!export SOURCE_ROOT=$(pwd)\n",
+ "!export TERRAFORM_RUN_DIR={SOURCE_ROOT}/infrastructure/terraform\n",
+ "REPO=\"marketing-analytics-jumpstart\"\n",
+ "!if [ ! -d \"/content/{REPO}\" ]; then git clone https://github.com/GoogleCloudPlatform/{REPO}.git ; fi\n",
+ "SOURCE_ROOT=\"/content/\"+REPO\n",
+ "%cd {SOURCE_ROOT}\n",
+ "!echo \"Enabling APIs\"\n",
+ "!gcloud config set project {GOOGLE_CLOUD_PROJECT_ID}\n",
+ "!. ~/.bashrc\n",
+ "!gcloud projects add-iam-policy-binding {GOOGLE_CLOUD_PROJECT_ID} --member user:{MAJ_DATAFORM_REPO_OWNER_EMAIL} --role=roles/bigquery.admin\n",
+ "!source ./scripts/common.sh && enable_all_apis > /dev/null\n",
+ "!echo \"APIs enabled\"\n",
+ "\n",
+ "from google.cloud import bigquery\n",
+ "# Construct a BigQuery client object.\n",
+ "client = bigquery.Client(project=GOOGLE_CLOUD_PROJECT_ID)\n",
+ "# Replace with your desired dataset ID suffix\n",
+ "dataset_id_suffix = MAJ_GA4_PROPERTY_ID\n",
+ "location = ''\n",
+ "dataset_id = ''\n",
+ "# Iterate through datasets and find the one with the matching suffix\n",
+ "for dataset in client.list_datasets():\n",
+ " dataset_id = dataset.dataset_id\n",
+ " if dataset_id.endswith(dataset_id_suffix):\n",
+ " dataset_ref = client.get_dataset(dataset.reference)\n",
+ " location = dataset_ref.location\n",
+ " print(f\"GA4 Dataset ID: {dataset_id}, Location: {location}\")\n",
+ " break\n",
+ "else:\n",
+ " print(f\"No dataset found with ID suffix: {dataset_id_suffix}\")\n",
+ "MAJ_MDS_DATA_LOCATION = location\n",
+ "MAJ_GA4_EXPORT_PROJECT_ID = GOOGLE_CLOUD_PROJECT_ID\n",
+ "MAJ_GA4_EXPORT_DATASET = dataset_id\n",
+ "\n",
+ "if MAJ_MDS_DATA_LOCATION == 'US':\n",
+ " MAJ_DEFAULT_REGION = 'us-central1'\n",
+ "elif MAJ_MDS_DATA_LOCATION == 'EU':\n",
+ " MAJ_DEFAULT_REGION = 'europe-west1'\n",
+ "else:\n",
+ " MAJ_DEFAULT_REGION = MAJ_MDS_DATA_LOCATION\n",
+ "MAJ_MDS_PROJECT_ID=MAJ_DEFAULT_PROJECT_ID\n",
+ "MAJ_MDS_DATAFORM_PROJECT_ID=MAJ_DEFAULT_PROJECT_ID\n",
+ "MAJ_FEATURE_STORE_PROJECT_ID=MAJ_DEFAULT_PROJECT_ID\n",
+ "MAJ_ACTIVATION_PROJECT_ID=MAJ_DEFAULT_PROJECT_ID\n",
+ "MAJ_ADS_EXPORT_PROJECT_ID = GOOGLE_CLOUD_PROJECT_ID\n",
+ "project_id=MAJ_ADS_EXPORT_PROJECT_ID\n",
+ "location = MAJ_MDS_DATA_LOCATION\n",
+ "table_suffix = MAJ_ADS_EXPORT_TABLE_SUFFIX\n",
+ "# Query to find datasets that contain tables with the specified suffix.\n",
+ "query = f\"\"\"\n",
+ " SELECT table_schema as dataset_id\n",
+ " FROM `{project_id}.region-{location}.INFORMATION_SCHEMA.TABLES`\n",
+ " WHERE table_name LIKE '%{table_suffix}'\n",
+ " GROUP BY table_schema\n",
+ "\"\"\"\n",
+ "# Run the query and fetch the results.\n",
+ "query_job = client.query(query)\n",
+ "results = query_job.result()\n",
+ "# Print the dataset IDs that match the criteria.\n",
+ "ads_dataset_id = ''\n",
+ "for row in results:\n",
+ " ads_dataset_id = row.dataset_id\n",
+ " print(f\"GAds dataset: {row.dataset_id}, Location: {location}\")\n",
+ "MAJ_ADS_EXPORT_DATASET = ads_dataset_id\n",
+ "\n",
+ "os.environ['MAJ_DEFAULT_REGION'] = MAJ_DEFAULT_REGION\n",
+ "os.environ['MAJ_MDS_PROJECT_ID'] = MAJ_MDS_PROJECT_ID\n",
+ "os.environ['MAJ_MDS_DATAFORM_PROJECT_ID'] = MAJ_MDS_DATAFORM_PROJECT_ID\n",
+ "os.environ['MAJ_FEATURE_STORE_PROJECT_ID'] = MAJ_FEATURE_STORE_PROJECT_ID\n",
+ "os.environ['MAJ_ACTIVATION_PROJECT_ID'] = MAJ_ACTIVATION_PROJECT_ID\n",
+ "os.environ['MAJ_MDS_DATA_LOCATION'] = MAJ_MDS_DATA_LOCATION\n",
+ "os.environ['MAJ_GA4_EXPORT_PROJECT_ID'] = MAJ_GA4_EXPORT_PROJECT_ID\n",
+ "os.environ['MAJ_GA4_EXPORT_DATASET'] = MAJ_GA4_EXPORT_DATASET\n",
+ "os.environ['MAJ_ADS_EXPORT_PROJECT_ID'] = MAJ_ADS_EXPORT_PROJECT_ID\n",
+ "os.environ['MAJ_ADS_EXPORT_DATASET'] = MAJ_ADS_EXPORT_DATASET\n",
+ "os.environ['MAJ_ADS_EXPORT_TABLE_SUFFIX'] = MAJ_ADS_EXPORT_TABLE_SUFFIX\n",
+ "os.environ['MAJ_WEBSITE_URL'] = MAJ_WEBSITE_URL\n",
+ "os.environ['MAJ_GA4_PROPERTY_ID'] = MAJ_GA4_PROPERTY_ID\n",
+ "os.environ['MAJ_GA4_STREAM_ID'] = MAJ_GA4_STREAM_ID\n",
+ "os.environ['MAJ_DATAFORM_REPO_OWNER_EMAIL'] = MAJ_DATAFORM_REPO_OWNER_EMAIL\n",
+ "os.environ['MAJ_DATAFORM_GITHUB_REPO_URL'] = MAJ_DATAFORM_GITHUB_REPO_URL\n",
+ "os.environ['MAJ_DATAFORM_GITHUB_TOKEN'] = MAJ_DATAFORM_GITHUB_TOKEN\n",
+ "\n",
+ "!sudo apt-get -qq -o=Dpkg::Use-Pty=0 install gettext\n",
+ "!envsubst < \"{SOURCE_ROOT}/infrastructure/cloudshell/terraform-template.tfvars\" > \"{SOURCE_ROOT}/infrastructure/terraform/terraform.tfvars\"\n",
+ "\n",
+ "!gcloud config set disable_prompts true\n",
+ "!gcloud config set project {PROJECT_ID}\n",
+ "\n",
+ "from IPython.display import clear_output\n",
+ "clear_output(wait=True)\n",
+ "print(\"SUCCESS\")"
+ ],
+ "metadata": {
+ "id": "dMcepKg8IQWj",
+ "cellView": "form",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "77ae4ebb-ffd2-462e-eb6f-d7235e62a4e0"
+ },
+ "execution_count": 2,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "SUCCESS\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### 3. Authenticate using application default credentials Google Cloud Platform\n",
+ "\n",
+ "Click the ( βΆ ) button to create your Terraform application default credentials to the Google Cloud Project.\n",
+ "\n",
+ "*To complete this step, you will be prompted to copy/paste a password from another window into the prompt below.*\n",
+ "\n",
+ "**Note:** *Click on the hidden input box after the colon, as shown below.*\n",
+ "\n",
+ "\n",
+ "\n",
+ "***Time: 2 minute.***"
+ ],
+ "metadata": {
+ "id": "mOISt4ShqIbc"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "!gcloud config set disable_prompts false\n",
+ "!gcloud auth application-default login --quiet --scopes=\"openid,https://www.googleapis.com/auth/userinfo.email,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/sqlservice.login,https://www.googleapis.com/auth/analytics,https://www.googleapis.com/auth/analytics.edit,https://www.googleapis.com/auth/analytics.provision,https://www.googleapis.com/auth/analytics.readonly,https://www.googleapis.com/auth/accounts.reauth\"\n",
+ "!gcloud auth application-default set-quota-project {PROJECT_ID}\n",
+ "!export GOOGLE_APPLICATION_CREDENTIALS=/content/.config/application_default_credentials.json\n",
+ "\n",
+ "clear_output(wait=True)\n",
+ "print(\"SUCCESS\")"
+ ],
+ "metadata": {
+ "id": "3cAwp6CRLSVf",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "052b9063-ac72-4eb5-ba91-98662b5dbd0c",
+ "cellView": "form"
+ },
+ "execution_count": 3,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "SUCCESS\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### 4. Prepare environment for Installation\n",
+ "\n",
+ "Click the ( βΆ ) button to prepare the environment for an end-to-end installation.\n",
+ "\n",
+ "***Time: 5 minutes.***"
+ ],
+ "metadata": {
+ "id": "WYG5sjFEqX2X"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "%%capture\n",
+ "%%bash\n",
+ "# prompt: install packages\n",
+ "apt-get install python3.11\n",
+ "CLOUDSDK_PYTHON=python3.11\n",
+ "\n",
+ "#prompt: install uv\n",
+ "curl -LsSf https://astral.sh/uv/install.sh | sh\n",
+ "\n",
+ "export PATH=\"/root/.local/bin:$PATH\"\n",
+ "uv --version\n",
+ "\n",
+ "git clone --depth=1 https://github.com/tfutils/tfenv.git ~/.tfenv\n",
+ "echo 'export PATH=\"~/.tfenv/bin:$PATH\"' >> ~/.bash_profile\n",
+ "echo 'export PATH=$PATH:~/.tfenv/bin' >> ~/.bashrc\n",
+ "export PATH=\"$PATH:~/.tfenv/bin\"\n",
+ "\n",
+ "mkdir -p ~/.local/bin/\n",
+ ". ~/.profile\n",
+ "ln -s ~/.tfenv/bin/* ~/.local/bin\n",
+ "which tfenv\n",
+ "tfenv --version\n",
+ "\n",
+ "tfenv install 1.9.7\n",
+ "tfenv use 1.9.7\n",
+ "terraform --version\n",
+ "\n",
+ "export PATH=\"$PATH:~/.tfenv/bin\"\n",
+ "export PROJECT_ID=$(gcloud config get project --format=json | tr -d '\"')\n",
+ "source ./scripts/generate-tf-backend.sh"
+ ],
+ "metadata": {
+ "id": "hmdklTTuQ_9d",
+ "cellView": "form"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### 5. Run Installation\n",
+ "\n",
+ "Click the ( βΆ ) button to run the installation end-to-end.\n",
+ "After clicking the button, expand this section to observe that all cells have successfully executed without issues.\n",
+ "\n",
+ "***Time: 25-30 minutes.***"
+ ],
+ "metadata": {
+ "id": "US36yJ8lmqnP"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "%%capture\n",
+ "%%bash\n",
+ "export PATH=\"$PATH:~/.tfenv/bin\"\n",
+ "export GOOGLE_APPLICATION_CREDENTIALS=/content/.config/application_default_credentials.json\n",
+ "TERRAFORM_RUN_DIR=$(pwd)/infrastructure/terraform\n",
+ "terraform -chdir=\"${TERRAFORM_RUN_DIR}\" init"
+ ],
+ "metadata": {
+ "id": "5UIbC_z9bgy4",
+ "cellView": "form"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "%%capture\n",
+ "%%bash\n",
+ "export PATH=\"$PATH:~/.tfenv/bin\"\n",
+ "export PATH=\"/root/.local/bin:$PATH\"\n",
+ "export PATH=\"$PATH:$(which gcloud)\"\n",
+ "export GOOGLE_APPLICATION_CREDENTIALS=/content/.config/application_default_credentials.json\n",
+ "TERRAFORM_RUN_DIR=$(pwd)/infrastructure/terraform\n",
+ "terraform -chdir=\"${TERRAFORM_RUN_DIR}\" apply -target=module.data_store -auto-approve"
+ ],
+ "metadata": {
+ "id": "BGteib5ebsA-",
+ "cellView": "form"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "#%%capture\n",
+ "%%bash\n",
+ "export PATH=\"$PATH:~/.tfenv/bin\"\n",
+ "export PATH=\"/root/.local/bin:$PATH\"\n",
+ "export PATH=\"$PATH:$(which gcloud)\"\n",
+ "export GOOGLE_APPLICATION_CREDENTIALS=/content/.config/application_default_credentials.json\n",
+ "TERRAFORM_RUN_DIR=$(pwd)/infrastructure/terraform\n",
+ "terraform -chdir=\"${TERRAFORM_RUN_DIR}\" apply -target=module.feature_store -auto-approve"
+ ],
+ "metadata": {
+ "cellView": "form",
+ "id": "dwD5DRRM2Ryl"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "%%capture\n",
+ "%%bash\n",
+ "export PATH=\"$PATH:~/.tfenv/bin\"\n",
+ "export PATH=\"/root/.local/bin:$PATH\"\n",
+ "export PATH=\"$PATH:$(which gcloud)\"\n",
+ "export GOOGLE_APPLICATION_CREDENTIALS=/content/.config/application_default_credentials.json\n",
+ "TERRAFORM_RUN_DIR=$(pwd)/infrastructure/terraform\n",
+ "terraform -chdir=\"${TERRAFORM_RUN_DIR}\" apply -target=module.pipelines -auto-approve"
+ ],
+ "metadata": {
+ "cellView": "form",
+ "id": "KrEr1yXS1_oA"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "%%capture\n",
+ "%%bash\n",
+ "export PATH=\"$PATH:~/.tfenv/bin\"\n",
+ "export PATH=\"/root/.local/bin:$PATH\"\n",
+ "export PATH=\"$PATH:$(which gcloud)\"\n",
+ "export GOOGLE_APPLICATION_CREDENTIALS=/content/.config/application_default_credentials.json\n",
+ "TERRAFORM_RUN_DIR=$(pwd)/infrastructure/terraform\n",
+ "terraform -chdir=\"${TERRAFORM_RUN_DIR}\" apply -target=module.activation -auto-approve"
+ ],
+ "metadata": {
+ "collapsed": true,
+ "cellView": "form",
+ "id": "7-Qr46vR2bLl"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "%%capture\n",
+ "%%bash\n",
+ "export PATH=\"$PATH:~/.tfenv/bin\"\n",
+ "export PATH=\"/root/.local/bin:$PATH\"\n",
+ "export PATH=\"$PATH:$(which gcloud)\"\n",
+ "export GOOGLE_APPLICATION_CREDENTIALS=/content/.config/application_default_credentials.json\n",
+ "TERRAFORM_RUN_DIR=$(pwd)/infrastructure/terraform\n",
+ "terraform -chdir=\"${TERRAFORM_RUN_DIR}\" apply -target=module.monitoring -auto-approve"
+ ],
+ "metadata": {
+ "collapsed": true,
+ "cellView": "form",
+ "id": "ElOBpEV3Mtbc"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "%%capture\n",
+ "%%bash\n",
+ "export PATH=\"$PATH:~/.tfenv/bin\"\n",
+ "export PATH=\"/root/.local/bin:$PATH\"\n",
+ "export PATH=\"$PATH:$(which gcloud)\"\n",
+ "export GOOGLE_APPLICATION_CREDENTIALS=/content/.config/application_default_credentials.json\n",
+ "TERRAFORM_RUN_DIR=$(pwd)/infrastructure/terraform\n",
+ "terraform -chdir=\"${TERRAFORM_RUN_DIR}\" apply -auto-approve"
+ ],
+ "metadata": {
+ "cellView": "form",
+ "id": "eyZNdewu2zQI"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# @title\n",
+ "print(\"SUCCESS!\")"
+ ],
+ "metadata": {
+ "id": "1h7k6jFYpLPO",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "266267e5-ac12-4621-ec7f-19e051027edb"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "SUCCESS!\n"
+ ]
+ }
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/pyproject.toml b/pyproject.toml
index 3e0abe24..01eaf8aa 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -12,6 +12,15 @@
# See the License for the specific language governing permissions and
# limitations under the License.
+[project]
+name = "marketing-analytics-jumpstart"
+version = "1.0.0"
+description = "Marketing Analytics Jumpstart"
+authors = [{name = "Marketing Analytics Solutions Architects", email = "ma-se@google.com"}]
+license = "Apache 2.0"
+readme = "README.md"
+requires-python = ">=3.9,<3.12.0"
+
[tool.poetry]
name = "marketing-analytics-jumpstart"
version = "1.0.0"
@@ -22,49 +31,55 @@ readme = "README.md"
packages = [{include = "python"}]
[tool.poetry.dependencies]
-python = ">=3.8,<3.11"
-google-cloud-aiplatform = "1.52.0"
+python = ">=3.9,<3.12.0"
+#google-cloud-aiplatform = "1.52.0"
+google-cloud-aiplatform = "1.77.0"
shapely = "<2.0.0"
google-cloud = "^0.34.0"
jinja2 = ">=3.0.1,<4.0.0"
-pip = "23.3"
+pip = "23.3.2"
invoke = "2.2.0"
## pyinvoke = "1.0.4"
pre-commit = ">=2.14.1,<3.0.0"
-pandas = "1.3.5"
-google-cloud-bigquery = "2.30.0"
+pandas = "1.5.3"
+google-cloud-bigquery = "3.21.0"
+google-cloud-bigquery-connection = "1.17.0"
#google-cloud-pipeline-components = "1.0.33"
google-cloud-pipeline-components = "2.6.0"
google-auth = "^2.14.1"
google-cloud-storage = "^2.6.0"
+kfp = "2.4.0"
## Fixing this error: https://stackoverflow.com/questions/76175487/sudden-importerror-cannot-import-name-appengine-from-requests-packages-urlli
-kfp = "2.0.0-rc.2"
+#kfp = "2.0.0-rc.2"
#kfp = {version = "2.0.0-b12", allow-prereleases = true}
#kfp = {version = "2.0.0-b16", allow-prereleases = true}
-kfp-server-api = "2.0.0-rc.1"
+kfp-server-api = "2.0.5"
+#kfp-server-api = "2.0.0-rc.1"
#kfp-server-api = "2.0.0.a6"
#kfp-server-api = "2.0.0b1"
-urllib3 = "1.26.18"
+urllib3 = "1.26.20"
toml = "0.10.2"
docker = "^6.0.1"
-db-dtypes = "1.2.0"
-optuna = "3.2.0"
-scikit-learn = "1.2.2"
+db-dtypes = "1.3.1"
+optuna = "3.6.1"
+scikit-learn = "1.5.0"
#plotly = "5.16.0"
#matplotlib= "3.7.2"
#seaborn = "0.12.2"
ma-components = {path = "python/base_component_image/", develop = true}
-google-cloud-pubsub = "2.15.0"
+google-cloud-pubsub = "2.27.2"
#google-analytics-admin = "0.17.0"
-google-analytics-admin = "0.22.7"
+google-analytics-admin = "0.23.3"
google-analytics-data = "^0.18.0"
pyarrow = "15.0.2"
google-auth-oauthlib = "^1.2.1"
oauth2client = "^4.1.3"
google-cloud-core = "^2.4.1"
+sympy="1.13.3"
+google-cloud-resource-manager="1.14.0"
[tool.poetry.group.component_vertex.dependencies]
-google-cloud-aiplatform = "1.52.0"
+google-cloud-aiplatform = "1.77.0"
shapely = "<2.0.0"
toml = "0.10.2"
@@ -72,16 +87,16 @@ toml = "0.10.2"
ga4-setup = "python.ga4_setup.setup:entry"
[tool.poetry.group.test.dependencies]
-pytest = "7.0.0"
-pytest-env = "0.6.2"
-pytest-mock = "3.7.0"
+pytest = "7.4.4"
+pytest-env = "0.8.2"
+pytest-mock = "3.14.0"
pytest-variables = {extras = ["yaml"], version = "^2.0.0"}
coverage = {extras = ["toml"], version = "^6.5.0"}
pytest-cov = "^4.0.0"
pytest-xdist = "^3.0.2"
[tool.poetry.group.dev.dependencies]
-pip = "23.3"
+pip = "23.3.2"
invoke = "2.2.0"
pre-commit = ">=2.14.1,<3.0.0"
black = "22.12.0"
@@ -118,4 +133,7 @@ parallel = true
[tool.coverage.report]
fail_under = 70
show_missing = true
-skip_empty= true
\ No newline at end of file
+skip_empty= true
+
+[tool.uv.workspace]
+members = ["python/lookerstudio"]
diff --git a/python/activation/main.py b/python/activation/main.py
index 6bf3ff15..69c33785 100644
--- a/python/activation/main.py
+++ b/python/activation/main.py
@@ -62,6 +62,7 @@ def _add_argparse_args(cls, parser):
- purchase-propensity-15-15
- purchase-propensity-15-7
- churn-propensity-30-15
+ - lead-score-propensity-5-1
activation_type_configuration: The GCS path to the configuration file for all activation types.
"""
@@ -110,6 +111,7 @@ def _add_argparse_args(cls, parser):
purchase-propensity-15-15
purchase-propensity-15-7
churn-propensity-30-15
+ lead-score-propensity-5-1
''',
required=True
)
@@ -330,7 +332,6 @@ class TransformToPayload(beam.DoFn):
The DoFn takes the following arguments:
- - template_str: The Jinja2 template string used to generate the Measurement Protocol payload.
- event_name: The name of the event to be sent to Google Analytics 4.
The DoFn yields the following output:
@@ -338,33 +339,28 @@ class TransformToPayload(beam.DoFn):
- A dictionary containing the Measurement Protocol payload.
The DoFn performs the following steps:
-
1. Removes bad shaping strings in the `client_id` field.
- 2. Renders the Jinja2 template string using the provided data and event name.
- 3. Converts the rendered template string into a JSON object.
+    2. Builds the Measurement Protocol payload dictionary directly from the element fields.
-    4. Handles any JSON decoding errors.
The DoFn is used to ensure that the Measurement Protocol payload is formatted correctly before being sent to Google Analytics 4.
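+
+    A minimal sketch of the payload this DoFn yields (values are illustrative;
+    user property and event parameter names depend on the source query columns):
+
+        {
+            'client_id': '123.456',
+            'user_id': 'u-1',
+            'timestamp_micros': 1704067200000000,
+            'non_personalized_ads': False,
+            'consent': {'ad_user_data': 'GRANTED', 'ad_personalization': 'GRANTED'},
+            'user_properties': {'p_p_prediction': {'value': 'true'}},
+            'events': [{'name': '<the configured event name>', 'params': {'session_id': 's-1'}}]
+        }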
"""
- def __init__(self, template_str, event_name):
+ def __init__(self, event_name):
"""
Initializes the DoFn.
Args:
- template_str: The Jinja2 template string used to generate the Measurement Protocol payload.
event_name: The name of the event to be sent to Google Analytics 4.
"""
- self.template_str = template_str
self.date_format = "%Y-%m-%d"
self.date_time_format = "%Y-%m-%d %H:%M:%S.%f %Z"
self.event_name = event_name
-
-
- def setup(self):
- """
- Sets up the Jinja2 environment.
- """
- self.payload_template = Environment(loader=BaseLoader).from_string(self.template_str)
+ self.consent_obj = {
+ 'ad_user_data':'GRANTED',
+ 'ad_personalization':'GRANTED'
+ }
+ self.user_property_prefix = 'user_prop_'
+ self.event_parameter_prefix = 'event_param_'
def process(self, element):
@@ -384,21 +380,17 @@ def process(self, element):
_client_id = element['client_id'].replace(r'
', '')
_client_id = element['client_id'].replace(r'q=">', '')
-
- payload_str = self.payload_template.render(
- client_id=_client_id,
- user_id=self.generate_user_id_key_value_pair(element),
- event_timestamp=self.date_to_micro(element["inference_date"]),
- event_name=self.event_name,
- session_id=element['session_id'],
- user_properties=self.generate_user_properties(element),
- )
+
result = {}
- try:
- result = json.loads(r'{}'.format(payload_str))
- except json.decoder.JSONDecodeError as e:
- logging.error(payload_str)
- logging.error(traceback.format_exc())
+ result['client_id'] = _client_id
+ if element['user_id']:
+ result['user_id'] = element['user_id']
+ result['timestamp_micros'] = self.date_to_micro(element["inference_date"])
+ result['non_personalized_ads'] = False
+ result['consent'] = self.consent_obj
+ result['user_properties'] = self.extract_user_properties(element)
+ result['events'] = [self.extract_event(element)]
+
yield result
@@ -419,62 +411,40 @@ def date_to_micro(self, date_str):
return int(datetime.datetime.strptime(date_str, self.date_format).timestamp() * 1E6)
- def generate_param_fields(self, element):
+ def extract_user_properties(self, element):
"""
- Generates a JSON string containing the parameter fields of the element.
+ Generates a dictionary containing the user properties of the element.
Args:
element: The element to be processed.
Returns:
- A JSON string containing the parameter fields of the element.
+ A dictionary containing the user properties of the element.
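+
+        For example (illustrative column name), an element containing
+        {'user_prop_p_p_decile': 7} yields {'p_p_decile': {'value': '7'}};
+        keys without the 'user_prop_' prefix are skipped.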
"""
- element_copy = element.copy()
- del element_copy['client_id']
- del element_copy['user_id']
- del element_copy['session_id']
- del element_copy['inference_date']
- element_copy = {k: v for k, v in element_copy.items() if v}
- return json.dumps(element_copy, cls=DecimalEncoder)
+ user_properties = {}
+ for k, v in element.items():
+ if k.startswith(self.user_property_prefix) and v:
+ user_properties[k[len(self.user_property_prefix):]] = {'value': str(v)}
+ return user_properties
-
- def generate_user_properties(self, element):
- """
- Generates a JSON string containing the user properties of the element.
-
- Args:
- element: The element to be processed.
-
- Returns:
- A JSON string containing the user properties of the element.
+ def extract_event(self, element):
"""
- element_copy = element.copy()
- del element_copy['client_id']
- del element_copy['user_id']
- del element_copy['session_id']
- del element_copy['inference_date']
- user_properties_obj = {}
- for k, v in element_copy.items():
- if v:
- user_properties_obj[k] = {'value': str(v)}
- return json.dumps(user_properties_obj, cls=DecimalEncoder)
-
+ Generates a dictionary containing the event parameters from the element.
- def generate_user_id_key_value_pair(self, element):
- """
- If the user_id field is not empty generate the key/value string with the user_id.
- else return empty string
Args:
element: The element to be processed.
Returns:
- A string containing the key and value with the user_id.
+ A dictionary containing the event parameters from the element.
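+
+        For example (illustrative column name), an element containing
+        {'event_param_session_id': 's-1'} yields
+        {'name': <the configured event name>, 'params': {'session_id': 's-1'}}.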
"""
- user_id = element['user_id']
- if user_id:
- return f'"user_id": "{user_id}",'
- return ""
-
+ event = {
+ 'name': self.event_name,
+ 'params': {}
+ }
+ for k, v in element.items():
+ if k.startswith(self.event_parameter_prefix) and v:
+ event['params'][k[len(self.event_parameter_prefix):]] = v
+ return event
@@ -519,8 +489,7 @@ def load_activation_type_configuration(args):
# Create the activation type configuration dictionary.
configuration = {
'activation_event_name': activation_config['activation_event_name'],
- 'source_query_template': Environment(loader=BaseLoader).from_string(gcs_read_file(args.project, activation_config['source_query_template']).replace('\n', ' ')),
- 'measurement_protocol_payload_template': gcs_read_file(args.project, activation_config['measurement_protocol_payload_template'])
+ 'source_query_template': Environment(loader=BaseLoader).from_string(gcs_read_file(args.project, activation_config['source_query_template']).replace('\n', ' '))
}
return configuration
@@ -589,7 +558,7 @@ def run(argv=None):
query=load_from_source_query,
use_json_exports=True,
use_standard_sql=True)
- | 'Prepare Measurement Protocol API payload' >> beam.ParDo(TransformToPayload(activation_type_configuration['measurement_protocol_payload_template'], activation_type_configuration['activation_event_name']))
+ | 'Prepare Measurement Protocol API payload' >> beam.ParDo(TransformToPayload(activation_type_configuration['activation_event_name']))
| 'POST event to Measurement Protocol API' >> beam.ParDo(CallMeasurementProtocolAPI(activation_options.ga4_measurement_id, activation_options.ga4_api_secret, debug=activation_options.use_api_validation))
)
diff --git a/python/activation/requirements.txt b/python/activation/requirements.txt
index 996c59a7..1809103f 100644
--- a/python/activation/requirements.txt
+++ b/python/activation/requirements.txt
@@ -1 +1 @@
-jinja2==3.1.4
+jinja2==3.1.5
diff --git a/python/base_component_image/Dockerfile b/python/base_component_image/Dockerfile
index 1e555421..4ee0156b 100644
--- a/python/base_component_image/Dockerfile
+++ b/python/base_component_image/Dockerfile
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
-FROM python:3.8.18-slim
+FROM python:3.10.16-slim
RUN pip install --upgrade pip
RUN pip install poetry
@@ -22,5 +22,5 @@ ENV PYTHONPATH=${PYTHONPATH}:${PWD}
COPY ./pyproject.toml ./README.md /app/
COPY ./ma_components /app/ma_components
-RUN poetry config virtualenvs.create false # so that installations is on global python3.7 and not in venv
+RUN poetry config virtualenvs.create false # so that installation uses the global python3.10 and not a venv
RUN poetry install
\ No newline at end of file
diff --git a/python/base_component_image/pyproject.toml b/python/base_component_image/pyproject.toml
index 3ce3fc2f..61352f50 100644
--- a/python/base_component_image/pyproject.toml
+++ b/python/base_component_image/pyproject.toml
@@ -2,32 +2,37 @@
name = "ma-components"
version = "1.0.0"
description = "contains components used in marketing analytics project. the need is to package the components and containerise so that they can be used from the python function based component"
-authors = ["Christos Aniftos "]
+authors = ["Marketing Analytics Solutions Architects "]
+license = "Apache 2.0"
readme = "README.md"
packages = [{include = "ma_components"}]
[tool.poetry.dependencies]
-python = ">=3.8,<3.11"
-pip = "23.3"
+python = ">=3.9,<3.12.0"
+pip = "23.3.2"
+kfp = "2.4.0"
## Fixing this error: https://stackoverflow.com/questions/76175487/sudden-importerror-cannot-import-name-appengine-from-requests-packages-urlli
-kfp = "2.0.0-rc.2"
+#kfp = "2.0.0-rc.2"
#kfp = {version = "2.0.0-b12", allow-prereleases = true}
#kfp = {version = "2.0.0-b16", allow-prereleases = true}
-kfp-server-api = "2.0.0-rc.1"
+kfp-server-api = "2.0.5"
+#kfp-server-api = "2.0.0-rc.1"
#kfp-server-api = "2.0.0.a6"
#kfp-server-api = "2.0.0b1"
-urllib3 = "1.26.18"
+urllib3 = "1.26.20"
toml = "^0.10.2"
docker = "^6.0.1"
-google-cloud-bigquery = "2.30.0"
-google-cloud-aiplatform = "1.52.0"
+google-cloud-bigquery = "3.21.0"
+google-cloud-bigquery-connection = "1.17.0"
+#google-cloud-aiplatform = "1.52.0"
+google-cloud-aiplatform = "1.77.0"
shapely = "<2.0.0"
-google-cloud-pubsub = "2.15.0"
+google-cloud-pubsub = "2.27.2"
#google-cloud-pipeline-components = "1.0.33"
google-cloud-pipeline-components = "2.6.0"
-db-dtypes = "1.2.0"
-optuna = "3.2.0"
-scikit-learn = "1.2.2"
+db-dtypes = "1.3.1"
+optuna = "3.6.1"
+scikit-learn = "1.5.0"
#plotly = "5.16.0"
#matplotlib= "3.7.2"
#seaborn = "0.12.2"
@@ -35,6 +40,8 @@ pyarrow = "15.0.2"
google-auth-oauthlib = "^1.2.1"
oauth2client = "^4.1.3"
google-cloud-core = "^2.4.1"
+sympy="1.13.3"
+google-cloud-resource-manager="1.14.0"
[build-system]
requires = ["poetry-core>=1.0.0"]
diff --git a/python/function/trigger_activation/requirements.txt b/python/function/trigger_activation/requirements.txt
index 1b6d3ebf..2c76e274 100644
--- a/python/function/trigger_activation/requirements.txt
+++ b/python/function/trigger_activation/requirements.txt
@@ -1,2 +1,2 @@
-functions-framework==3.7.0
-google-cloud-dataflow-client==0.8.10
\ No newline at end of file
+functions-framework==3.8.2
+google-cloud-dataflow-client==0.8.15
\ No newline at end of file
diff --git a/python/ga4_setup/setup.py b/python/ga4_setup/setup.py
index 03204812..dd4c885d 100644
--- a/python/ga4_setup/setup.py
+++ b/python/ga4_setup/setup.py
@@ -276,6 +276,7 @@ def create_custom_dimensions(configuration: map):
create_custom_dimensions_for('CLTV', ['cltv_decile'], existing_dimensions, configuration)
create_custom_dimensions_for('Auto Audience Segmentation', ['a_a_s_prediction'], existing_dimensions, configuration)
create_custom_dimensions_for('Churn Propensity', ['c_p_prediction', 'c_p_decile'], existing_dimensions, configuration)
+ create_custom_dimensions_for('Lead Score Propensity', ['l_s_p_prediction', 'l_s_p_decile'], existing_dimensions, configuration)
@@ -513,9 +514,14 @@ def entry():
if args.ga4_resource == "check_property_type":
property = get_property(configuration)
- result = {
- 'supported': f"{property.property_type == property.property_type.PROPERTY_TYPE_ORDINARY}"
- }
+    supported_property_types = set((property.property_type.PROPERTY_TYPE_ORDINARY, property.property_type.PROPERTY_TYPE_SUBPROPERTY, property.property_type.PROPERTY_TYPE_ROLLUP))
+
+    result = {
+        'supported': f"{property.property_type in supported_property_types}"
+    }
+
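+    # Prints e.g. {"supported": "True"}; ordinary, subproperty and rollup properties are all supported.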
print(json.dumps(result))
# python setup.py --ga4_resource=custom_events
diff --git a/python/lookerstudio/README.md b/python/lookerstudio/README.md
index dc019f63..aa624b3f 100644
--- a/python/lookerstudio/README.md
+++ b/python/lookerstudio/README.md
@@ -1,5 +1,30 @@
# Marketing Analytics Jumpstart Looker Studio Dashboard
+## Prerequisites
+This Looker Studio dashboard relies on specific BigQuery tables that should be present in your project. These tables are created during the deployment of the Marketing Analytics Jumpstart and by the data processing pipelines of the solution.
+Before deploying the dashboard, make sure the prerequisite tables exist (an example check query follows the table below). If tables are missing, ensure the corresponding pipelines have run successfully.
+
+| Table | Dataset | Source Process | Troubleshooting Link |
+| -------- | ------- | ------- | --------- |
+| session_date | marketing_ga4_v1_* | Dataform Execution| [Workflow Execution Logs](https://console.cloud.google.com/bigquery/dataform/locations/us-central1/repositories/marketing-analytics/details/workflows) |
+| session_device_daily_metrics | marketing_ga4_v1_* | Dataform Execution| [Workflow Execution Logs](https://console.cloud.google.com/bigquery/dataform/locations/us-central1/repositories/marketing-analytics/details/workflows) |
+| latest | aggregated_predictions | feature-store terraform module and aggregated_predictions.aggregate_last_day_predictions stored procedure | [Aggregating stored procedure](https://console.cloud.google.com/bigquery?ws=!1m5!1m4!6m3!1s!2saggregated_predictions!3saggregate_last_day_predictions) |
+| resource_link | maj_dashboard | monitor terraform module | [Dashboard dataset](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1s!2smaj_dashboard) |
+| dataform_googleapis_com_workflow_invocation_completion | maj_logs | monitor terraform module | [maj_logs dataset](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1s!2smaj_logs) |
+| event | marketing_ga4_base_* | Dataform Execution | [Workflow Execution Logs](https://console.cloud.google.com/bigquery/dataform/locations/us-central1/repositories/marketing-analytics/details/workflows) |
+| session_location_daily_metrics | marketing_ga4_v1_* | Dataform Execution | [Workflow Execution Logs](https://console.cloud.google.com/bigquery/dataform/locations/us-central1/repositories/marketing-analytics/details/workflows) |
+| aggregated_value_based_bidding_volume_weekly | aggregated_vbb | feature-store terraform module and aggregated_vbb.invoke_aggregated_value_based_bidding_explanation_preparation stored procedure | [aggregated_value_based_bidding_explanation_preparation](https://console.cloud.google.com/bigquery?ws=!1m5!1m4!6m3!1s!2saggregated_vbb!3sinvoke_aggregated_value_based_bidding_explanation_preparation) |
+| event_page | marketing_ga4_v1_* | Dataform Execution| [Workflow Execution Logs](https://console.cloud.google.com/bigquery/dataform/locations/us-central1/repositories/marketing-analytics/details/workflows) |
+| unique_page_views | marketing_ga4_v1_* | Dataform Execution| [Workflow Execution Logs](https://console.cloud.google.com/bigquery/dataform/locations/us-central1/repositories/marketing-analytics/details/workflows) |
+| aggregated_value_based_bidding_correlation | aggregated_vbb | feature-store terraform module and aggregated_vbb.invoke_aggregated_value_based_bidding_explanation_preparation stored procedure | [aggregated_value_based_bidding_explanation_preparation](https://console.cloud.google.com/bigquery?ws=!1m5!1m4!6m3!1s!2saggregated_vbb!3sinvoke_aggregated_value_based_bidding_explanation_preparation) |
+| ad_performance_conversions | marketing_ads_v1_* | Dataform Execution | [Workflow Execution Logs](https://console.cloud.google.com/bigquery/dataform/locations/us-central1/repositories/marketing-analytics/details/workflows) |
+| user_behaviour_revenue_insights_daily | gemini_insights | feature-store terraform module and gemini_insights.user_behaviour_revenue_insights stored procedure | [User Behaviour Revenue Insights](https://console.cloud.google.com/bigquery?ws=!1m5!1m4!6m3!1s!2sgemini_insights!3suser_behaviour_revenue_insights) |
+| dataflow_googleapis_com_job_message | maj_logs | monitor terraform module | [maj_logs dataset](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1s!2smaj_logs) |
+| vbb_weights | aggregated_vbb | feature-store terraform module and VBB explanation pipeline | [VBB Explanation Pipeline](https://console.cloud.google.com/vertex-ai/pipelines/schedules) |
+| page_session_daily_metrics | marketing_ga4_v1_* | Dataform Execution| [Workflow Execution Logs](https://console.cloud.google.com/bigquery/dataform/locations/us-central1/repositories/marketing-analytics/details/workflows) |
+| aiplatform_googleapis_com_pipeline_job_events | maj_logs | monitor terraform module | [maj_logs dataset](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1s!2smaj_logs) |
+| aggregated_value_based_bidding_volume_daily | aggregated_vbb | feature-store terraform module and aggregated_vbb.invoke_aggregated_value_based_bidding_explanation_preparation stored procedure | [aggregated_value_based_bidding_explanation_preparation](https://console.cloud.google.com/bigquery?ws=!1m5!1m4!6m3!1s!2saggregated_vbb!3sinvoke_aggregated_value_based_bidding_explanation_preparation) |
+
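+You can confirm that a prerequisite table exists with a quick query before deploying (illustrative: replace the project ID, and note that the Dataform-managed datasets carry an environment suffix such as `prod`):
+
+```sql
+SELECT COUNT(*) AS row_count
+FROM `your-project-id.maj_dashboard.resource_link`;
+```
+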
## Extract Looker Studio dashboard URL
Extract the URL used to create the dashboard from the Terraform output value:
diff --git a/python/lookerstudio/config.ini b/python/lookerstudio/config.ini
index ed0c91d1..3c9d365b 100644
--- a/python/lookerstudio/config.ini
+++ b/python/lookerstudio/config.ini
@@ -33,15 +33,17 @@
[COMMON]
# TODO: Replace the values in this section with your own
-project = project_id
+project = project_id
ga4_dataset = marketing_ga4_v1_prod
ga4_base_dataset = marketing_ga4_base_prod
ads_dataset = marketing_ads_v1_prod
+ads_base_dataset = marketing_ads_base_prod
dashboard_dataset = maj_dashboard
logs_dataset = maj_logs
aggregated_vbb_dataset = aggregated_vbb
aggregated_predictions_dataset = aggregated_predictions
gemini_insights_dataset = gemini_insights
+purchase_propensity_dataset = purchase_propensity
# The below sections can be used as is unless you've used a custom dataset & view naming convention
@@ -188,3 +190,19 @@ type = TABLE
tableId = user_behaviour_revenue_insights_daily
datasetId = ${COMMON:gemini_insights_dataset}
projectId = ${COMMON:project}
+
+[Bid Strategy ROAS VBB]
+ds_alias = Bid_strategy_roas_vbb
+connector = bigQuery
+type = TABLE
+tableId = bid_strategy_roas
+datasetId = ${COMMON:ads_base_dataset}
+projectId = ${COMMON:project}
+
+[Prediction Stats]
+ds_alias = Prediction_stats
+connector = bigQuery
+type = TABLE
+tableId = prediction_stats
+datasetId = ${COMMON:purchase_propensity_dataset}
+projectId = ${COMMON:project}
diff --git a/python/lookerstudio/lookerstudio_deployment.py b/python/lookerstudio/lookerstudio_deployment.py
index 3e0c497c..59139fd1 100644
--- a/python/lookerstudio/lookerstudio_deployment.py
+++ b/python/lookerstudio/lookerstudio_deployment.py
@@ -28,7 +28,7 @@
# Constants
-CONFIG_FILE = "config.ini"
+CONFIG_FILE = "python/lookerstudio/config.ini"
BASE_URL = "https://lookerstudio.google.com/reporting/create?"
REPORT_ID = "f61f65fe-4991-45fc-bcdc-80593966f28c"
REPORT_NAME = "Marketing%20Analytics%20Sample"
diff --git a/python/lookerstudio/pyproject.toml b/python/lookerstudio/pyproject.toml
index 12dbe21b..4bb61293 100644
--- a/python/lookerstudio/pyproject.toml
+++ b/python/lookerstudio/pyproject.toml
@@ -1,3 +1,11 @@
+[project]
+name = "lookerstudio"
+version = "0.1.0"
+description = "Deployment process for the Marketing Analytics Jumpstart Looker Studio dashboard."
+readme = "README.md"
+requires-python = ">=3.7.1"
+dependencies = []
+
[tool.poetry]
name = "looker studio deployment"
version = "0.1.0"
@@ -7,7 +15,7 @@ license = "Apache 2.0"
readme = "README.md"
[tool.poetry.dependencies]
-python = ">=3.7.1"
+python = ">=3.9"
google-cloud-bigquery = "^3.10.0"
google-auth = "^2.17.3"
google-api-core = "^2.11.0"
diff --git a/python/pipelines/automl_tabular_pl_v4.yaml b/python/pipelines/automl_tabular_pl_v4.yaml
index 4d20b803..6bdc8cfb 100644
--- a/python/pipelines/automl_tabular_pl_v4.yaml
+++ b/python/pipelines/automl_tabular_pl_v4.yaml
@@ -11151,21 +11151,21 @@ root:
isOptional: true
parameterType: BOOLEAN
distill_batch_predict_machine_type:
- defaultValue: n1-standard-16
+ defaultValue: n1-highmem-8
description: 'The prediction server machine type for
batch predict component in the model distillation.'
isOptional: true
parameterType: STRING
distill_batch_predict_max_replica_count:
- defaultValue: 25.0
+ defaultValue: 5.0
description: 'The max number of prediction server
for batch predict component in the model distillation.'
isOptional: true
parameterType: NUMBER_INTEGER
distill_batch_predict_starting_replica_count:
- defaultValue: 25.0
+ defaultValue: 5.0
description: 'The initial number of
prediction server for batch predict component in the model distillation.'
@@ -11201,14 +11201,14 @@ root:
isOptional: true
parameterType: STRING
evaluation_batch_explain_max_replica_count:
- defaultValue: 10.0
+ defaultValue: 5.0
description: 'The max number of prediction
server for batch explain components during evaluation.'
isOptional: true
parameterType: NUMBER_INTEGER
evaluation_batch_explain_starting_replica_count:
- defaultValue: 10.0
+ defaultValue: 5.0
description: 'The initial number of
prediction server for batch explain components during evaluation.'
@@ -11222,14 +11222,14 @@ root:
isOptional: true
parameterType: STRING
evaluation_batch_predict_max_replica_count:
- defaultValue: 20.0
+ defaultValue: 5.0
description: 'The max number of prediction
server for batch predict components during evaluation.'
isOptional: true
parameterType: NUMBER_INTEGER
evaluation_batch_predict_starting_replica_count:
- defaultValue: 20.0
+ defaultValue: 5.0
description: 'The initial number of
prediction server for batch predict components during evaluation.'
@@ -11279,7 +11279,7 @@ root:
description: The GCP region that runs the pipeline components.
parameterType: STRING
max_selected_features:
- defaultValue: 1000.0
+ defaultValue: 100.0
description: number of features to select for training.
isOptional: true
parameterType: NUMBER_INTEGER
@@ -11356,7 +11356,7 @@ root:
isOptional: true
parameterType: BOOLEAN
stage_1_num_parallel_trials:
- defaultValue: 35.0
+ defaultValue: 5.0
description: Number of parallel trails for stage 1.
isOptional: true
parameterType: NUMBER_INTEGER
@@ -11367,7 +11367,7 @@ root:
isOptional: true
parameterType: LIST
stage_2_num_parallel_trials:
- defaultValue: 35.0
+ defaultValue: 5.0
description: Number of parallel trails for stage 2.
isOptional: true
parameterType: NUMBER_INTEGER
diff --git a/python/pipelines/compiler.py b/python/pipelines/compiler.py
index 6b5224dd..97bbc62c 100644
--- a/python/pipelines/compiler.py
+++ b/python/pipelines/compiler.py
@@ -31,6 +31,7 @@
'vertex_ai.pipelines.feature-creation-purchase-propensity.execution': "pipelines.feature_engineering_pipelines.purchase_propensity_feature_engineering_pipeline",
'vertex_ai.pipelines.feature-creation-churn-propensity.execution': "pipelines.feature_engineering_pipelines.churn_propensity_feature_engineering_pipeline",
'vertex_ai.pipelines.feature-creation-customer-ltv.execution': "pipelines.feature_engineering_pipelines.customer_lifetime_value_feature_engineering_pipeline",
+ 'vertex_ai.pipelines.feature-creation-lead-score-propensity.execution': "pipelines.feature_engineering_pipelines.lead_score_propensity_feature_engineering_pipeline",
'vertex_ai.pipelines.auto_segmentation.training': "pipelines.auto_segmentation_pipelines.training_pl",
'vertex_ai.pipelines.auto_segmentation.prediction': "pipelines.auto_segmentation_pipelines.prediction_pl",
'vertex_ai.pipelines.segmentation.training': "pipelines.segmentation_pipelines.training_pl",
@@ -39,6 +40,8 @@
'vertex_ai.pipelines.purchase_propensity.prediction': "pipelines.tabular_pipelines.prediction_binary_classification_pl",
'vertex_ai.pipelines.churn_propensity.training': None, # tabular workflows pipelines is precompiled
'vertex_ai.pipelines.churn_propensity.prediction': "pipelines.tabular_pipelines.prediction_binary_classification_pl",
+ 'vertex_ai.pipelines.lead_score_propensity.training': None, # tabular workflows pipelines is precompiled
+ 'vertex_ai.pipelines.lead_score_propensity.prediction': "pipelines.tabular_pipelines.prediction_binary_classification_pl",
'vertex_ai.pipelines.propensity_clv.training': None, # tabular workflows pipelines is precompiled
'vertex_ai.pipelines.clv.training': None, # tabular workflows pipelines is precompiled
'vertex_ai.pipelines.clv.prediction': "pipelines.tabular_pipelines.prediction_binary_classification_regression_pl",
diff --git a/python/pipelines/components/bigquery/component.py b/python/pipelines/components/bigquery/component.py
index c4aa542f..e52a511e 100644
--- a/python/pipelines/components/bigquery/component.py
+++ b/python/pipelines/components/bigquery/component.py
@@ -879,7 +879,7 @@ def bq_dynamic_query_exec_output(
# Construct query template
template = jinja2.Template("""
CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.{{create_table}}` AS (
- SELECT
+ SELECT DISTINCT
feature,
ROUND(100 * SUM(users) OVER (ORDER BY users DESC) / SUM(users) OVER (), 2) as cumulative_traffic_percent,
@@ -892,7 +892,7 @@ def bq_dynamic_query_exec_output(
SELECT
user_pseudo_id,
user_id,
- page_location as page_path
+ LOWER(page_location) as page_path
FROM `{{mds_project_id}}.{{mds_dataset}}.event`
WHERE
event_name = 'page_view'
@@ -1423,4 +1423,4 @@ def execute_query_with_retries(query):
logging.error(f"Query failed after retries: {e}")
-
\ No newline at end of file
+
diff --git a/python/pipelines/feature_engineering_pipelines.py b/python/pipelines/feature_engineering_pipelines.py
index deb7b88b..a15ffa12 100644
--- a/python/pipelines/feature_engineering_pipelines.py
+++ b/python/pipelines/feature_engineering_pipelines.py
@@ -196,8 +196,73 @@ def audience_segmentation_feature_engineering_pipeline(
location=location,
query=query_audience_segmentation_inference_preparation,
timeout=timeout).set_display_name('audience_segmentation_inference_preparation').after(*phase_1)
-
-
+
+
+@dsl.pipeline()
+def lead_score_propensity_feature_engineering_pipeline(
+ project_id: str,
+ location: Optional[str],
+ query_lead_score_propensity_label: str,
+ query_user_dimensions: str,
+ query_user_rolling_window_metrics: str,
+ query_lead_score_propensity_inference_preparation: str,
+ query_lead_score_propensity_training_preparation: str,
+ timeout: Optional[float] = 3600.0
+):
+ """
+ This pipeline defines the steps for feature engineering for the lead score propensity model.
+
+ Args:
+ project_id: The Google Cloud project ID.
+ location: The Google Cloud region where the pipeline will be run.
+        query_lead_score_propensity_label: The SQL query that will be used to calculate the lead score propensity label.
+ query_user_dimensions: The SQL query that will be used to calculate the user dimensions.
+ query_user_rolling_window_metrics: The SQL query that will be used to calculate the user rolling window metrics.
+ query_lead_score_propensity_inference_preparation: The SQL query that will be used to prepare the inference data.
+ query_lead_score_propensity_training_preparation: The SQL query that will be used to prepare the training data.
+ timeout: The timeout for the pipeline in seconds.
+
+ Returns:
+ None
+ """
+
+ # Features Preparation
+ phase_1 = list()
+ phase_1.append(
+ sp(
+ project=project_id,
+ location=location,
+ query=query_lead_score_propensity_label,
+ timeout=timeout).set_display_name('lead_score_propensity_label')
+ )
+ phase_1.append(
+ sp(
+ project=project_id,
+ location=location,
+ query=query_user_dimensions,
+ timeout=timeout).set_display_name('user_dimensions')
+ )
+ phase_1.append(
+ sp(
+ project=project_id,
+ location=location,
+ query=query_user_rolling_window_metrics,
+ timeout=timeout).set_display_name('user_rolling_window_metrics')
+ )
+ # Training data preparation
+    lead_score_propensity_train_prep = sp(
+ project=project_id,
+ location=location,
+ query=query_lead_score_propensity_training_preparation,
+ timeout=timeout).set_display_name('lead_score_propensity_training_preparation').after(*phase_1)
+ # Inference data preparation
+    lead_score_propensity_inf_prep = sp(
+ project=project_id,
+ location=location,
+ query=query_lead_score_propensity_inference_preparation,
+ timeout=timeout).set_display_name('lead_score_propensity_inference_preparation').after(*phase_1)
+
+
@dsl.pipeline()
def purchase_propensity_feature_engineering_pipeline(
project_id: str,
diff --git a/python/pipelines/pipeline_ops.py b/python/pipelines/pipeline_ops.py
index a1b94675..abb15659 100644
--- a/python/pipelines/pipeline_ops.py
+++ b/python/pipelines/pipeline_ops.py
@@ -625,6 +626,30 @@ def get_gcp_bearer_token() -> str:
return bearer_token
+def _get_project_number(project_id) -> str:
+ """
+ Retrieves the project number from a project id
+
+ Returns:
+ A string containing the project number
+
+ Raises:
+ Exception: If an error occurs while retrieving the resource manager project object.
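+
+    For example (illustrative), get_project returns a project resource whose
+    name is "projects/123456789"; splitting on "/" yields "123456789".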
+ """
+ from google.cloud import resourcemanager_v3
+
+ # Create a resource manager client
+ client = resourcemanager_v3.ProjectsClient()
+
+ # Get the project number
+ project = client.get_project(name=f"projects/{project_id}").name
+ project_number = project.split('/')[-1]
+
+ logging.info(f"Project Number: {project_number}")
+
+ return project_number
+
+
# Function to schedule the pipeline.
def schedule_pipeline(
project_id: str,
@@ -636,7 +661,9 @@ def schedule_pipeline(
cron: str,
max_concurrent_run_count: str,
start_time: str,
- end_time: str,
+    end_time: Optional[str] = None,
+ subnetwork: str = "default",
+ use_private_service_access: bool = False,
pipeline_parameters: Dict[str, Any] = None,
pipeline_parameters_substitutions: Optional[Dict[str, Any]] = None,
) -> dict:
@@ -654,6 +681,8 @@ def schedule_pipeline(
max_concurrent_run_count: The maximum number of concurrent pipeline runs.
start_time: The start time of the schedule.
end_time: The end time of the schedule.
+ subnetwork: The VPC subnetwork name to be used in VPC peering.
+ use_private_service_access: A flag to define whether to use the VPC private service access or not.
Returns:
A dictionary containing information about the scheduled pipeline.
@@ -676,19 +705,53 @@ def schedule_pipeline(
pipeline_job = aiplatform.PipelineJob(
template_path=template_path,
pipeline_root=pipeline_root,
+ location=region,
display_name=f"{pipeline_name}",
)
- # Create the schedule with the pipeline job defined
- pipeline_job_schedule = pipeline_job.create_schedule(
+ # https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJobSchedule
+ # Create a schedule for the pipeline job
+ pipeline_job_schedule = aiplatform.PipelineJobSchedule(
display_name=f"{pipeline_name}",
- cron=cron,
- max_concurrent_run_count=max_concurrent_run_count,
- start_time=start_time,
- end_time=end_time,
- service_account=pipeline_sa,
+ pipeline_job=pipeline_job,
+ location=region
)
+ # Get the project number to use in the network identifier
+ project_number = _get_project_number(project_id)
+
+ # Create the schedule using the pipeline job schedule
+ # Using the VPC private service access or not, depending on the flag
+ if use_private_service_access:
+ pipeline_job_schedule.create(
+ cron=cron,
+ max_concurrent_run_count=max_concurrent_run_count,
+ start_time=start_time,
+ end_time=end_time,
+ service_account=pipeline_sa,
+ network=f"projects/{project_number}/global/networks/{subnetwork}",
+ create_request_timeout=None,
+ )
+ else:
+ pipeline_job_schedule.create(
+ cron=cron,
+ max_concurrent_run_count=max_concurrent_run_count,
+ start_time=start_time,
+ end_time=end_time,
+ service_account=pipeline_sa,
+ create_request_timeout=None,
+ )
+
logging.info(f"Pipeline scheduled : {pipeline_name}")
return pipeline_job
@@ -903,4 +966,4 @@ def run_pipeline(
if (pl.has_failed):
raise RuntimeError("Pipeline execution failed")
return pl
-
\ No newline at end of file
+
diff --git a/python/pipelines/scheduler.py b/python/pipelines/scheduler.py
index fbdd9933..7e00dc8e 100644
--- a/python/pipelines/scheduler.py
+++ b/python/pipelines/scheduler.py
@@ -37,8 +37,11 @@ def check_extention(file_path: str, type: str = '.yaml'):
'vertex_ai.pipelines.feature-creation-purchase-propensity.execution': "pipelines.feature_engineering_pipelines.purchase_propensity_feature_engineering_pipeline",
'vertex_ai.pipelines.feature-creation-churn-propensity.execution': "pipelines.feature_engineering_pipelines.churn_propensity_feature_engineering_pipeline",
'vertex_ai.pipelines.feature-creation-customer-ltv.execution': "pipelines.feature_engineering_pipelines.customer_lifetime_value_feature_engineering_pipeline",
+ 'vertex_ai.pipelines.feature-creation-lead-score-propensity.execution': "pipelines.feature_engineering_pipelines.lead_score_propensity_feature_engineering_pipeline",
'vertex_ai.pipelines.purchase_propensity.training': None, # tabular workflows pipelines is precompiled
'vertex_ai.pipelines.purchase_propensity.prediction': "pipelines.tabular_pipelines.prediction_binary_classification_pl",
+ 'vertex_ai.pipelines.lead_score_propensity.training': None, # tabular workflows pipelines is precompiled
+ 'vertex_ai.pipelines.lead_score_propensity.prediction': "pipelines.tabular_pipelines.prediction_binary_classification_pl",
'vertex_ai.pipelines.churn_propensity.training': None, # tabular workflows pipelines is precompiled
'vertex_ai.pipelines.churn_propensity.prediction': "pipelines.tabular_pipelines.prediction_binary_classification_pl",
'vertex_ai.pipelines.segmentation.training': "pipelines.segmentation_pipelines.training_pl",
@@ -138,7 +141,9 @@ def check_extention(file_path: str, type: str = '.yaml'):
cron=my_pipeline_vars['schedule']['cron'],
max_concurrent_run_count=my_pipeline_vars['schedule']['max_concurrent_run_count'],
start_time=my_pipeline_vars['schedule']['start_time'],
- end_time=my_pipeline_vars['schedule']['end_time']
+ end_time=my_pipeline_vars['schedule']['end_time'],
+ subnetwork=my_pipeline_vars['schedule']['subnetwork'],
+ use_private_service_access=my_pipeline_vars['schedule']['use_private_service_access'],
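+        # NOTE: `subnetwork` and `use_private_service_access` must now be present
+        # under `schedule` in each pipeline's configuration file.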
)
if my_pipeline_vars['schedule']['state'] == 'PAUSED':
diff --git a/python/pipelines/transformations-lead-score-propensity.json b/python/pipelines/transformations-lead-score-propensity.json
new file mode 100644
index 00000000..28ca5e70
--- /dev/null
+++ b/python/pipelines/transformations-lead-score-propensity.json
@@ -0,0 +1,368 @@
+[
+ {
+ "numeric": {
+ "column_name": "user_ltv_revenue",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "device_category"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "device_mobile_brand_name"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "device_mobile_model_name"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "device_os"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "device_language"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "device_web_browser"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "geo_sub_continent"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "geo_country"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "geo_region"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "geo_city"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "geo_metro"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "last_traffic_source_medium"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "last_traffic_source_name"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "last_traffic_source_source"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "first_traffic_source_medium"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "first_traffic_source_name"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "first_traffic_source_source"
+ }
+ },
+ {
+ "categorical": {
+ "column_name": "has_signed_in_with_user_id"
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_50_past_1_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_50_past_2_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_50_past_3_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_50_past_4_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_50_past_5_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_90_past_1_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_90_past_2_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_90_past_3_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_90_past_4_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "scroll_90_past_5_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "view_search_results_past_1_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "view_search_results_past_2_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "view_search_results_past_3_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "view_search_results_past_4_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "view_search_results_past_5_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "file_download_past_1_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "file_download_past_2_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "file_download_past_3_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "file_download_past_4_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "file_download_past_5_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_list_past_1_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_list_past_2_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_list_past_3_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_list_past_4_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_list_past_5_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_print_past_1_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_print_past_2_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_print_past_3_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_print_past_4_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_print_past_5_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "sign_up_past_1_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "sign_up_past_2_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "sign_up_past_3_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "sign_up_past_4_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "sign_up_past_5_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_favorite_past_1_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_favorite_past_2_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_favorite_past_3_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_favorite_past_4_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_favorite_past_5_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_menu_past_1_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_menu_past_2_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_menu_past_3_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_menu_past_4_day",
+ "invalid_values_allowed": true
+ }
+ },
+ {
+ "numeric": {
+ "column_name": "recipe_add_to_menu_past_5_day",
+ "invalid_values_allowed": true
+ }
+ }
+]
\ No newline at end of file
diff --git a/scripts/common.sh b/scripts/common.sh
index 926eec7f..142fbbc1 100644
--- a/scripts/common.sh
+++ b/scripts/common.sh
@@ -46,8 +46,43 @@ declare -a apis_array=("cloudresourcemanager.googleapis.com"
"bigquerymigration.googleapis.com"
"bigquerydatatransfer.googleapis.com"
"dataform.googleapis.com"
+ "cloudkms.googleapis.com"
+ "servicenetworking.googleapis.com"
+ "artifactregistry.googleapis.com"
+ "cloudbuild.googleapis.com"
+ "aiplatform.googleapis.com"
+ "storage-api.googleapis.com"
+ "bigqueryconnection.googleapis.com"
)
+create_bigquery_connection() {
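+  # Creates a BigQuery CLOUD_RESOURCE connection and grants its service
+  # account the IAM roles it needs.
+  # Usage: create_bigquery_connection <PROJECT_ID> <LOCATION> <CONNECTION_NAME>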
+ _PROJECT_ID=$1
+ _LOCATION=$2
+ _CONNECTION_TYPE='CLOUD_RESOURCE'
+ _CONNECTION_NAME=$3
+
+ CONNECTION_EXISTS=$(bq ls --connection --location=$_LOCATION --project_id=$_PROJECT_ID)
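+  # NOTE: "bq ls --connection" lists every connection in the location, so a
+  # new connection is only created when none exist there yet.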
+ if [ "$CONNECTION_EXISTS" = "No connections found." ]; then
+ bq mk --connection --location=$_LOCATION --project_id=$_PROJECT_ID --connection_type=$_CONNECTION_TYPE $_CONNECTION_NAME
+
+ SERVICE_ACCT_EMAIL=$(bq show --format=prettyjson --connection $_LOCATION.$_CONNECTION_NAME | grep "serviceAccountId" | cut -d '"' -f 4 | cut -d '?' -f 1)
+    echo "$SERVICE_ACCT_EMAIL"
+
+    gcloud projects add-iam-policy-binding $_PROJECT_ID --condition=None --no-user-output-enabled --member="serviceAccount:$SERVICE_ACCT_EMAIL" --role="roles/serviceusage.serviceUsageConsumer"
+    gcloud projects add-iam-policy-binding $_PROJECT_ID --condition=None --no-user-output-enabled --member="serviceAccount:$SERVICE_ACCT_EMAIL" --role="roles/bigquery.connectionUser"
+    gcloud projects add-iam-policy-binding $_PROJECT_ID --condition=None --no-user-output-enabled --member="serviceAccount:$SERVICE_ACCT_EMAIL" --role="roles/bigquery.connectionAdmin"
+    gcloud projects add-iam-policy-binding $_PROJECT_ID --condition=None --no-user-output-enabled --member="serviceAccount:$SERVICE_ACCT_EMAIL" --role="roles/aiplatform.user"
+    gcloud projects add-iam-policy-binding $_PROJECT_ID --condition=None --no-user-output-enabled --member="serviceAccount:$SERVICE_ACCT_EMAIL" --role="roles/bigquery.jobUser"
+    gcloud projects add-iam-policy-binding $_PROJECT_ID --condition=None --no-user-output-enabled --member="serviceAccount:$SERVICE_ACCT_EMAIL" --role="roles/bigquery.dataEditor"
+    gcloud projects add-iam-policy-binding $_PROJECT_ID --condition=None --no-user-output-enabled --member="serviceAccount:$SERVICE_ACCT_EMAIL" --role="roles/storage.admin"
+    gcloud projects add-iam-policy-binding $_PROJECT_ID --condition=None --no-user-output-enabled --member="serviceAccount:$SERVICE_ACCT_EMAIL" --role="roles/storage.objectViewer"
+ return 0
+ else
+ echo "BQ Connection already exists: $CONNECTION_EXISTS"
+ return 0
+ fi
+}
+
get_project_id() {
local __resultvar=$1
VALUE=$(gcloud config get-value project | xargs)
diff --git a/scripts/generate-tf-backend.sh b/scripts/generate-tf-backend.sh
index 5a1178fc..481a0885 100755
--- a/scripts/generate-tf-backend.sh
+++ b/scripts/generate-tf-backend.sh
@@ -19,15 +19,15 @@ set -o nounset
. scripts/common.sh
-section_open "Check if the necessary dependencies are available: gcloud, gsutil, terraform, poetry"
+section_open "Check if the necessary dependencies are available: gcloud, gsutil, terraform, uv"
check_exec_dependency "gcloud"
check_exec_version "gcloud"
check_exec_dependency "gsutil"
check_exec_version "gsutil"
check_exec_dependency "terraform"
check_exec_version "terraform"
- check_exec_dependency "poetry"
- check_exec_version "poetry"
+ check_exec_dependency "uv"
+ check_exec_version "uv"
section_close
section_open "Check if the necessary variables are set: PROJECT_ID"
@@ -51,10 +51,6 @@ section_open "Enable all the required APIs"
enable_all_apis
section_close
-section_open "Install poetry libraries in the virtual environment for Terraform"
- poetry install
-section_close
-
section_open "Creating a new Google Cloud Storage bucket to store the Terraform state in ${TF_STATE_PROJECT} project, bucket: ${TF_STATE_BUCKET}"
if gsutil ls -b gs://"${TF_STATE_BUCKET}" >/dev/null 2>&1; then
printf "The ${TF_STATE_BUCKET} Google Cloud Storage bucket already exists. \n"
@@ -69,6 +65,11 @@ section_open "Creating terraform backend.tf configuration file"
create_terraform_backend_config_file "${TERRAFORM_RUN_DIR}" "${TF_STATE_BUCKET}"
section_close
+section_open "Creating BigQuery and Vertex AI connection"
+ create_bigquery_connection "${PROJECT_ID}" "${LOCATION}" "vertex_ai_conn"
+ create_bigquery_connection "${PROJECT_ID}" "US" "vertex_ai_conn"
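+  # The second connection targets the US multi-region (assumption: some
+  # Marketing Analytics Jumpstart datasets are created in US).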
+section_close
+
printf "$DIVIDER"
printf "You got the end the of your generate-tf-backend script with everything working. \n"
printf "$DIVIDER"
diff --git a/scripts/quick-install.sh b/scripts/quick-install.sh
new file mode 100755
index 00000000..57d0ed2c
--- /dev/null
+++ b/scripts/quick-install.sh
@@ -0,0 +1,137 @@
+#!/usr/bin/env sh
+
+# Copyright 2023 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+set -o errexit
+set -o nounset
+#set -x
+
+. scripts/common.sh
+
+section_open "Setting the gcloud project id"
+ # Ask user to input the project id
+ echo "Input the GCP Project Id where you want to deploy Marketing Analytics Jumpstart:"
+ read TF_STATE_PROJECT_ID
+  # Export the project id for the Terraform state scripts
+  export TF_STATE_PROJECT_ID
+  # Point the Google Cloud client libraries at the same project
+  export GOOGLE_CLOUD_PROJECT=${TF_STATE_PROJECT_ID}
+  # Attribute quota and billing to the same project
+  export GOOGLE_CLOUD_QUOTA_PROJECT=$GOOGLE_CLOUD_PROJECT
+  # Generic project id variable used by the helper scripts
+  export PROJECT_ID=$GOOGLE_CLOUD_PROJECT
+ # Disable prompts
+ gcloud config set disable_prompts true
+ # Set the project id to the gcloud configuration
+ gcloud config set project "${TF_STATE_PROJECT_ID}"
+section_close
+
+section_open "Enable all the required APIs"
+ enable_all_apis
+section_close
+
+section_open "Authenticate to Google Cloud Project"
+ gcloud auth login --project "${TF_STATE_PROJECT_ID}"
+  echo "Close the browser tab that was opened and press Enter to continue..."
+ read moveon
+section_close
+
+section_open "Setting Google Application Default Credentials"
+ gcloud config set disable_prompts false
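+  # The extra Analytics scopes let Application Default Credentials call the
+  # Google Analytics Admin APIs in addition to the Cloud Platform APIs.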
+ gcloud auth application-default login --quiet --scopes="openid,https://www.googleapis.com/auth/userinfo.email,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/sqlservice.login,https://www.googleapis.com/auth/analytics,https://www.googleapis.com/auth/analytics.edit,https://www.googleapis.com/auth/analytics.provision,https://www.googleapis.com/auth/analytics.readonly,https://www.googleapis.com/auth/accounts.reauth"
+  echo "Close the browser tab that was opened and press Enter to continue..."
+ read moveon
+ CREDENTIAL_FILE=`gcloud auth application-default set-quota-project "${PROJECT_ID}" 2>&1 | grep -e "Credentials saved to file:" | cut -d "[" -f2 | cut -d "]" -f1`
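+  # Parse the credentials file path from the gcloud output line
+  # "Credentials saved to file: [<path>]".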
+ export GOOGLE_APPLICATION_CREDENTIALS=${CREDENTIAL_FILE}
+section_close
+
+section_open "Check OS system"
+ unameOut="$(uname -s)"
+ case "${unameOut}" in
+ Linux*) machine=Linux;;
+ Darwin*) machine=Mac;;
+ CYGWIN*) machine=Cygwin;;
+ MINGW*) machine=MinGw;;
+ MSYS_NT*) machine=Git;;
+ *) machine="UNKNOWN:${unameOut}"
+ esac
+ echo ${machine}
+section_close
+
+section_open "Configuring environment"
+ SOURCE_ROOT=$(pwd)
+ cd ${SOURCE_ROOT}
+
+ # Install python3.10
+  sudo chown -R "$(whoami)" /usr/local/sbin
+  chmod u+w /usr/local/sbin
+  if [ "$machine" = "Linux" ]; then
+    sudo DEBIAN_FRONTEND=noninteractive apt-get -qq -o=Dpkg::Use-Pty=0 install python3.10 --assume-yes
+  elif [ "$machine" = "Mac" ]; then
+    brew install python@3.10
+  fi
+  export CLOUDSDK_PYTHON=python3.10
+
+ # Install pipx
+  if [ "$machine" = "Linux" ]; then
+    sudo apt update
+    sudo apt install -y pipx
+  elif [ "$machine" = "Mac" ]; then
+    brew install pipx
+  fi
+ pipx ensurepath
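+  # pipx ensurepath only updates future shells; PATH is exported below for this one.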
+
+  pipx install poetry
+ export PATH="$HOME/.local/bin:$PATH"
+ poetry env use python3.10
+ poetry --version
+
+ # Install tfenv
+ if [ ! -d ~/.tfenv ]; then
+ git clone --depth=1 https://github.com/tfutils/tfenv.git ~/.tfenv
+ echo 'export PATH="$HOME/.tfenv/bin:$PATH"' >> ~/.bash_profile
+ echo 'export PATH=$PATH:$HOME/.tfenv/bin' >> ~/.bashrc
+ fi
+ export PATH="$PATH:$HOME/.tfenv/bin"
+
+ # Install terraform version
+ tfenv install 1.5.7
+ tfenv use 1.5.7
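+  # Terraform is pinned to 1.5.7 (assumption: the version the repo's modules are tested with).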
+ terraform --version
+
+ # Generate TF backend
+ . scripts/generate-tf-backend.sh
+section_close
+
+section_open "Preparing Terraform Environment File"
+ TERRAFORM_RUN_DIR=${SOURCE_ROOT}/infrastructure/terraform
+ if [ ! -f $TERRAFORM_RUN_DIR/terraform.tfvars ]; then
+ . scripts/set-env.sh
+    if [ "$machine" = "Linux" ]; then
+      sudo apt-get -qq -o=Dpkg::Use-Pty=0 install gettext
+    elif [ "$machine" = "Mac" ]; then
+      brew install gettext
+    fi
+ envsubst < "${SOURCE_ROOT}/infrastructure/cloudshell/terraform-template.tfvars" > "${TERRAFORM_RUN_DIR}/terraform.tfvars"
+ fi
+section_close
+
+section_open "Deploying Terraform Infrastructure Resources"
+ export PATH="$HOME/.local/bin:$PATH"
+ export PATH="$PATH:$HOME/.tfenv/bin"
+ terraform -chdir="${TERRAFORM_RUN_DIR}" init
+ terraform -chdir="${TERRAFORM_RUN_DIR}" apply
+section_close
+
+#set +x
+set +o nounset
+set +o errexit
diff --git a/sql/procedure/churn_propensity_training_preparation.sqlx b/sql/procedure/churn_propensity_training_preparation.sqlx
index 32056b62..36a8f657 100644
--- a/sql/procedure/churn_propensity_training_preparation.sqlx
+++ b/sql/procedure/churn_propensity_training_preparation.sqlx
@@ -849,189 +849,6 @@ WHERE
MOD(row_order_peruser_persplit-1, 30) = 0;
- CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_churn_propensity_training_30_30_balanced`
-(processed_timestamp,
- data_split,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- active_users_past_15_30_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- purchases_past_15_30_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- visits_past_15_30_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- view_items_past_15_30_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- add_to_carts_past_15_30_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day,
- checkouts_past_15_30_day,
- churned)
-OPTIONS(
- --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
- friendly_name="v_churn_propensity_training_30_30_balanced",
- description="View Churn Propensity Training dataset using 30 days back to predict 15 days ahead. View expires after 48h and should run daily.",
- labels=[("org_unit", "development")]
-) AS
- SELECT DISTINCT
- processed_timestamp, -- The timestamp the row was processed.
- data_split, -- The data split (train, validation, test) for the user.
- user_pseudo_id, -- The unique identifier for the user.
- user_id, -- The user ID.
- user_ltv_revenue, -- The lifetime value revenue for the user.
- device_category, -- The category of the device used by the user.
- device_mobile_brand_name, -- The brand name of the mobile device used by the user.
- device_mobile_model_name, -- The model name of the mobile device used by the user.
- device_os, -- The operating system of the device used by the user.
- device_language, -- The language used by the user.
- device_web_browser, -- The web browser used by the user.
- geo_sub_continent, -- The sub-continent of the user's location.
- geo_country, -- The country of the user's location.
- geo_region, -- The region of the user's location.
- geo_city, -- The city of the user's location.
- geo_metro, -- The metropolitan area of the user's location.
- last_traffic_source_medium, -- The medium used to reach the user's last session.
- last_traffic_source_name, -- The name of the traffic source used to reach the user's last session.
- last_traffic_source_source, -- The source of the last traffic source used by the user.
- first_traffic_source_medium, -- The medium of the first traffic source used by the user.
- first_traffic_source_name, -- The name of the first traffic source used by the user.
- first_traffic_source_source, -- The source of the first traffic source used by the user.
- has_signed_in_with_user_id, -- Whether the user has signed in with a user ID.
- active_users_past_1_day, -- The number of active users in the past 1 day for each user.
- active_users_past_2_day, -- The number of active users in the past 2 days for each user.
- active_users_past_3_day, -- The number of active users in the past 3 days for each user.
- active_users_past_4_day, -- The number of active users in the past 4 days for each user.
- active_users_past_5_day, -- The number of active users in the past 5 days for each user.
- active_users_past_6_day, -- The number of active users in the past 6 days for each user.
- active_users_past_7_day, -- The number of active users in the past 7 days for each user.
- active_users_past_8_14_day, -- The number of active users in the past 8-14 days for each user.
- active_users_past_15_30_day, -- The number of active users in the past 15-30 days for each user.
- purchases_past_1_day, -- The number of purchases in the past 1 day for each user.
- purchases_past_2_day, -- The number of purchases in the past 2 days for each user.
- purchases_past_3_day, -- The number of purchases in the past 3 days for each user.
- purchases_past_4_day, -- The number of purchases in the past 4 days for each user.
- purchases_past_5_day, -- The number of purchases in the past 5 days for each user.
- purchases_past_6_day, -- The number of purchases in the past 6 days for each user.
- purchases_past_7_day, -- The number of purchases in the past 7 days for each user.
- purchases_past_8_14_day, -- The number of purchases in the past 8-14 days for each user.
- purchases_past_15_30_day, -- The number of purchases in the past 15-30 days for each user.
- visits_past_1_day, -- The number of visits in the past 1 day for each user.
- visits_past_2_day, -- The number of visits in the past 2 days for each user.
- visits_past_3_day, -- The number of visits in the past 3 days for each user.
- visits_past_4_day, -- The number of visits in the past 4 days for each user.
- visits_past_5_day, -- The number of visits in the past 5 days for each user.
- visits_past_6_day, -- The number of visits in the past 6 days for each user.
- visits_past_7_day, -- The number of visits in the past 7 days for each user.
- visits_past_8_14_day, -- The number of visits in the past 8-14 days for each user.
- visits_past_15_30_day, -- The number of visits in the past 15-30 days for each user.
- view_items_past_1_day, -- The number of items viewed in the past 1 day for each user.
- view_items_past_2_day, -- The number of items viewed in the past 2 days for each user.
- view_items_past_3_day, -- The number of items viewed in the past 3 days for each user.
- view_items_past_4_day, -- The number of items viewed in the past 4 days for each user.
- view_items_past_5_day, -- The number of items viewed in the past 5 days for each user.
- view_items_past_6_day, -- The number of items viewed in the past 6 days for each user.
- view_items_past_7_day, -- The number of items viewed in the past 7 days for each user.
- view_items_past_8_14_day, -- The number of items viewed in the past 8-14 days for each user.
- view_items_past_15_30_day, -- The number of items viewed in the past 15-30 days for each user.
- add_to_carts_past_1_day, -- The number of items added to carts in the past 1 day for each user.
- add_to_carts_past_2_day, -- The number of items added to carts in the past 2 days for each user.
- add_to_carts_past_3_day, -- The number of items added to carts in the past 3 days for each user.
- add_to_carts_past_4_day, -- The number of items added to carts in the past 4 days for each user.
- add_to_carts_past_5_day, -- The number of items added to carts in the past 5 days for each user.
- add_to_carts_past_6_day, -- The number of items added to carts in the past 6 days for each user.
- add_to_carts_past_7_day, -- The number of items added to carts in the past 7 days for each user.
- add_to_carts_past_8_14_day, -- The number of items added to carts in the past 8-14 days for each user.
- add_to_carts_past_15_30_day, -- The number of items added to carts in the past 15-30 days for each user.
- checkouts_past_1_day, -- The number of checkouts in the past 1 day for each user.
- checkouts_past_2_day, -- The number of checkouts in the past 2 days for each user.
- checkouts_past_3_day, -- The number of checkouts in the past 3 days for each user.
- checkouts_past_4_day, -- The number of checkouts in the past 4 days for each user.
- checkouts_past_5_day, -- The number of checkouts in the past 5 days for each user.
- checkouts_past_6_day, -- The number of checkouts in the past 6 days for each user.
- checkouts_past_7_day, -- The number of checkouts in the past 7 days for each user.
- checkouts_past_8_14_day, -- The number of checkouts in the past 8-14 days for each user.
- checkouts_past_15_30_day, -- The number of checkouts in the past 15-30 days for each user.
- churned -- Whether the user churned.
- FROM (
- SELECT
- DISTINCT *,
- -- Adding a random number to the rows to shuffle them.
- ROW_NUMBER() OVER (PARTITION BY churned ORDER BY RAND()) AS rn
- FROM
- `{{project_id}}.{{dataset}}.v_churn_propensity_training_30_30` )
- WHERE
- rn <= (
- SELECT
- -- Counting the number of churned users.
- COUNT(churned)
- FROM
- `{{project_id}}.{{dataset}}.v_churn_propensity_training_30_30`
- WHERE
- churned = 1)
-;
-
-
CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_churn_propensity_training_30_30_last_window`
(processed_timestamp,
data_split,
diff --git a/sql/procedure/customer_lifetime_value_inference_preparation.sqlx b/sql/procedure/customer_lifetime_value_inference_preparation.sqlx
index 497f4a7c..c635ed3c 100644
--- a/sql/procedure/customer_lifetime_value_inference_preparation.sqlx
+++ b/sql/procedure/customer_lifetime_value_inference_preparation.sqlx
@@ -485,248 +485,6 @@ CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.customer_lifetime_value_infe
FROM `{{project_id}}.{{dataset}}.{{insert_table}}`
);
-CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.customer_lifetime_value_inference_180_90` AS(
- SELECT DISTINCT
- -- Adding processed_timestamp column with current timestamp
- CURRENT_TIMESTAMP() AS processed_timestamp,
- -- Adding feature_date column as it is
- feature_date,
- -- Adding user_pseudo_id column as it is
- user_pseudo_id,
- -- Selecting the last value of user_id for each user_pseudo_id and feature_date
- LAST_VALUE(user_id) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS user_id,
- -- Selecting the last value of device_category for each user_pseudo_id and feature_date
- LAST_VALUE(device_category) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_category,
- -- Selecting the last value of device_mobile_brand_name for each user_pseudo_id and feature_date
- LAST_VALUE(device_mobile_brand_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_mobile_brand_name,
- -- Selecting the last value of device_mobile_model_name for each user_pseudo_id and feature_date
- LAST_VALUE(device_mobile_model_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_mobile_model_name,
- -- Selecting the last value of device_os for each user_pseudo_id and feature_date
- LAST_VALUE(device_os) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_os,
- -- Selecting the last value of device_language for each user_pseudo_id and feature_date
- LAST_VALUE(device_language) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_language,
- -- Selecting the last value of device_web_browser for each user_pseudo_id and feature_date
- LAST_VALUE(device_web_browser) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_web_browser,
- -- Selecting the last value of geo_sub_continent for each user_pseudo_id and feature_date
- LAST_VALUE(geo_sub_continent) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_sub_continent,
- -- Selecting the last value of geo_country for each user_pseudo_id and feature_date
- LAST_VALUE(geo_country) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_country,
- -- Selecting the last value of geo_region for each user_pseudo_id and feature_date
- LAST_VALUE(geo_region) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_region,
- -- Selecting the last value of geo_city for each user_pseudo_id and feature_date
- LAST_VALUE(geo_city) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_city,
- -- Selecting the last value of geo_metro for each user_pseudo_id and feature_date
- LAST_VALUE(geo_metro) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_metro,
- -- Selecting the last value of last_traffic_source_medium for each user_pseudo_id and feature_date
- LAST_VALUE(last_traffic_source_medium) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_medium,
- -- Selecting the last value of last_traffic_source_name for each user_pseudo_id and feature_date
- LAST_VALUE(last_traffic_source_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_name,
- -- Selecting the last value of last_traffic_source_source for each user_pseudo_id and feature_date
- LAST_VALUE(last_traffic_source_source) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_source,
- -- Selecting the last value of first_traffic_source_medium for each user_pseudo_id and feature_date
- LAST_VALUE(first_traffic_source_medium) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_medium,
- -- Selecting the last value of first_traffic_source_name for each user_pseudo_id and feature_date
- LAST_VALUE(first_traffic_source_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_name,
- -- Selecting the last value of first_traffic_source_source for each user_pseudo_id and feature_date
- LAST_VALUE(first_traffic_source_source) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_source,
- -- Selecting the last value of has_signed_in_with_user_id for each user_pseudo_id and feature_date
- LAST_VALUE(has_signed_in_with_user_id) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS has_signed_in_with_user_id,
- -- Selecting the last value of active_users_past_1_30_day for each user_pseudo_id and feature_date
- LAST_VALUE(active_users_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_1_30_day,
- -- Selecting the last value of active_users_past_30_60_day for each user_pseudo_id and feature_date
- LAST_VALUE(active_users_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_30_60_day,
- -- Selecting the last value of active_users_past_60_90_day for each user_pseudo_id and feature_date
- LAST_VALUE(active_users_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_60_90_day,
- -- Selecting the last value of active_users_past_90_120_day for each user_pseudo_id and feature_date
- LAST_VALUE(active_users_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_90_120_day,
- -- Selecting the last value of active_users_past_120_150_day for each user_pseudo_id and feature_date
- LAST_VALUE(active_users_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_120_150_day,
- -- Selecting the last value of active_users_past_150_180_day for each user_pseudo_id and feature_date
- LAST_VALUE(active_users_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_150_180_day,
- -- Selecting the last value of purchases_past_1_30_day for each user_pseudo_id and feature_date
- LAST_VALUE(purchases_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_1_30_day,
- -- Selecting the last value of purchases_past_30_60_day for each user_pseudo_id and feature_date
- LAST_VALUE(purchases_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_30_60_day,
- -- Selecting the last value of purchases_past_60_90_day for each user_pseudo_id and feature_date
- LAST_VALUE(purchases_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_60_90_day,
- -- Selecting the last value of purchases_past_90_120_day for each user_pseudo_id and feature_date
- LAST_VALUE(purchases_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_90_120_day,
- -- Selecting the last value of purchases_past_120_150_day for each user_pseudo_id and feature_date
- LAST_VALUE(purchases_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_120_150_day,
- -- Selecting the last value of purchases_past_150_180_day for each user_pseudo_id and feature_date
- LAST_VALUE(purchases_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_150_180_day,
- -- Selecting the last value of visits_past_1_30_day for each user_pseudo_id and feature_date
- LAST_VALUE(visits_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_1_30_day,
- -- Selecting the last value of visits_past_30_60_day for each user_pseudo_id and feature_date
- LAST_VALUE(visits_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_30_60_day,
- -- Selecting the last value of visits_past_60_90_day for each user_pseudo_id and feature_date
- LAST_VALUE(visits_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_60_90_day,
- -- Selecting the last value of visits_past_90_120_day for each user_pseudo_id and feature_date
- LAST_VALUE(visits_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_90_120_day,
- -- Selecting the last value of visits_past_120_150_day for each user_pseudo_id and feature_date
- LAST_VALUE(visits_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_120_150_day,
- -- Selecting the last value of visits_past_150_180_day for each user_pseudo_id and feature_date
- LAST_VALUE(visits_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_150_180_day,
- -- Selecting the last value of view_items_past_1_30_day for each user_pseudo_id and feature_date
- LAST_VALUE(view_items_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_1_30_day,
- -- Selecting the last value of view_items_past_30_60_day for each user_pseudo_id and feature_date
- LAST_VALUE(view_items_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_30_60_day,
- -- Selecting the last value of view_items_past_60_90_day for each user_pseudo_id and feature_date
- LAST_VALUE(view_items_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_60_90_day,
- -- Selecting the last value of view_items_past_90_120_day for each user_pseudo_id and feature_date
- LAST_VALUE(view_items_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_90_120_day,
- -- Selecting the last value of view_items_past_120_150_day for each user_pseudo_id and feature_date
- LAST_VALUE(view_items_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_120_150_day,
- -- Selecting the last value of view_items_past_150_180_day for each user_pseudo_id and feature_date
- LAST_VALUE(view_items_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_150_180_day,
- -- Selecting the last value of add_to_carts_past_1_30_day for each user_pseudo_id and feature_date
- LAST_VALUE(add_to_carts_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_1_30_day,
- -- Selecting the last value of add_to_carts_past_30_60_day for each user_pseudo_id and feature_date
- LAST_VALUE(add_to_carts_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_30_60_day,
- -- Selecting the last value of add_to_carts_past_60_90_day for each user_pseudo_id and feature_date
- LAST_VALUE(add_to_carts_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_60_90_day,
- -- Selecting the last value of add_to_carts_past_90_120_day for each user_pseudo_id and feature_date
- LAST_VALUE(add_to_carts_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_90_120_day,
- -- Selecting the last value of add_to_carts_past_120_150_day for each user_pseudo_id and feature_date
- LAST_VALUE(add_to_carts_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_120_150_day,
- -- Selecting the last value of add_to_carts_past_150_180_day for each user_pseudo_id and feature_date
- LAST_VALUE(add_to_carts_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_150_180_day,
- -- Selecting the last value of checkouts_past_1_30_day for each user_pseudo_id and feature_date
- LAST_VALUE(checkouts_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_1_30_day,
- -- Selecting the last value of checkouts_past_30_60_day for each user_pseudo_id and feature_date
- LAST_VALUE(checkouts_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_30_60_day,
- -- Selecting the last value of checkouts_past_60_90_day for each user_pseudo_id and feature_date
- LAST_VALUE(checkouts_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_60_90_day,
- -- Selecting the last value of checkouts_past_90_120_day for each user_pseudo_id and feature_date
- LAST_VALUE(checkouts_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_90_120_day,
- -- Selecting the last value of checkouts_past_120_150_day for each user_pseudo_id and feature_date
- LAST_VALUE(checkouts_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_120_150_day,
- -- Selecting the last value of checkouts_past_150_180_day for each user_pseudo_id and feature_date
- LAST_VALUE(checkouts_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_150_180_day
- FROM `{{project_id}}.{{dataset}}.{{insert_table}}`
-);
-
-CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.customer_lifetime_value_inference_180_180` AS(
- SELECT DISTINCT
- -- Retrieves the current timestamp when the query is executed.
- CURRENT_TIMESTAMP() AS processed_timestamp,
- -- The date for which the features are extracted.
- feature_date,
- -- The unique identifier for the user.
- user_pseudo_id,
- -- Extracts the last user ID for each user_pseudo_id and feature_date.
- LAST_VALUE(user_id) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS user_id,
- -- Extracts the last device category for each user_pseudo_id and feature_date.
- LAST_VALUE(device_category) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_category,
- -- Extracts the last device mobile brand name for each user_pseudo_id and feature_date.
- LAST_VALUE(device_mobile_brand_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_mobile_brand_name,
- -- Extracts the last device mobile model name for each user_pseudo_id and feature_date.
- LAST_VALUE(device_mobile_model_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_mobile_model_name,
- -- Extracts the last device operating system for each user_pseudo_id and feature_date.
- LAST_VALUE(device_os) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_os,
- -- Extracts the last device language for each user_pseudo_id and feature_date.
- LAST_VALUE(device_language) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_language,
- -- Extracts the last device web browser for each user_pseudo_id and feature_date.
- LAST_VALUE(device_web_browser) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_web_browser,
- -- Extracts the last geo subcontinent for each user_pseudo_id and feature_date.
- LAST_VALUE(geo_sub_continent) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_sub_continent,
- -- Extracts the last geo country for each user_pseudo_id and feature_date.
- LAST_VALUE(geo_country) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_country,
- -- Extracts the last geo region for each user_pseudo_id and feature_date.
- LAST_VALUE(geo_region) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_region,
- -- Extracts the last geo city for each user_pseudo_id and feature_date.
- LAST_VALUE(geo_city) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_city,
- -- Extracts the last geo metro for each user_pseudo_id and feature_date.
- LAST_VALUE(geo_metro) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_metro,
- -- Extracts the last traffic source medium for each user_pseudo_id and feature_date.
- LAST_VALUE(last_traffic_source_medium) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_medium,
- -- Extracts the last traffic source name for each user_pseudo_id and feature_date.
- LAST_VALUE(last_traffic_source_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_name,
- -- Extracts the last traffic source source for each user_pseudo_id and feature_date.
- LAST_VALUE(last_traffic_source_source) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_source,
- -- Extracts the last first traffic source medium for each user_pseudo_id and feature_date.
- LAST_VALUE(first_traffic_source_medium) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_medium,
- -- Extracts the last first traffic source name for each user_pseudo_id and feature_date.
- LAST_VALUE(first_traffic_source_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_name,
- -- Extracts the last first traffic source source for each user_pseudo_id and feature_date.
- LAST_VALUE(first_traffic_source_source) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_source,
- -- Extracts the last has_signed_in_with_user_id for each user_pseudo_id and feature_date.
- LAST_VALUE(has_signed_in_with_user_id) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS has_signed_in_with_user_id,
- -- Extracts the last active_users_past_1_30_day for each user_pseudo_id and feature_date.
- LAST_VALUE(active_users_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_1_30_day,
- -- Extracts the last active_users_past_30_60_day for each user_pseudo_id and feature_date.
- LAST_VALUE(active_users_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_30_60_day,
- -- Extracts the last active_users_past_60_90_day for each user_pseudo_id and feature_date.
- LAST_VALUE(active_users_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_60_90_day,
- -- Extracts the last active_users_past_90_120_day for each user_pseudo_id and feature_date.
- LAST_VALUE(active_users_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_90_120_day,
- -- Extracts the last active_users_past_120_150_day for each user_pseudo_id and feature_date.
- LAST_VALUE(active_users_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_120_150_day,
- -- Extracts the last active_users_past_150_180_day for each user_pseudo_id and feature_date.
- LAST_VALUE(active_users_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_150_180_day,
- -- Extracts the last purchases_past_1_30_day for each user_pseudo_id and feature_date.
- LAST_VALUE(purchases_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_1_30_day,
- -- Extracts the last purchases_past_30_60_day for each user_pseudo_id and feature_date.
- LAST_VALUE(purchases_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_30_60_day,
- -- Extracts the last purchases_past_60_90_day for each user_pseudo_id and feature_date.
- LAST_VALUE(purchases_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_60_90_day,
- -- Extracts the last purchases_past_90_120_day for each user_pseudo_id and feature_date.
- LAST_VALUE(purchases_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_90_120_day,
- -- Extracts the last purchases_past_120_150_day for each user_pseudo_id and feature_date.
- LAST_VALUE(purchases_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_120_150_day,
- -- Extracts the last purchases_past_150_180_day for each user_pseudo_id and feature_date.
- LAST_VALUE(purchases_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_150_180_day,
- -- Extracts the last visits_past_1_30_day for each user_pseudo_id and feature_date.
- LAST_VALUE(visits_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_1_30_day,
- -- Extracts the last visits_past_30_60_day for each user_pseudo_id and feature_date.
- LAST_VALUE(visits_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_30_60_day,
- -- Extracts the last visits_past_60_90_day for each user_pseudo_id and feature_date.
- LAST_VALUE(visits_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_60_90_day,
- -- Extracts the last visits_past_90_120_day for each user_pseudo_id and feature_date.
- LAST_VALUE(visits_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_90_120_day,
- -- Extracts the last visits_past_120_150_day for each user_pseudo_id and feature_date.
- LAST_VALUE(visits_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_120_150_day,
- -- Extracts the last visits_past_150_180_day for each user_pseudo_id and feature_date.
- LAST_VALUE(visits_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_150_180_day,
- -- Extracts the last view_items_past_1_30_day for each user_pseudo_id and feature_date.
- LAST_VALUE(view_items_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_1_30_day,
- -- Extracts the last view_items_past_30_60_day for each user_pseudo_id and feature_date.
- LAST_VALUE(view_items_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_30_60_day,
- -- Extracts the last view_items_past_60_90_day for each user_pseudo_id and feature_date.
- LAST_VALUE(view_items_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_60_90_day,
- -- Extracts the last view_items_past_90_120_day for each user_pseudo_id and feature_date.
- LAST_VALUE(view_items_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_90_120_day,
- -- Extracts the last view_items_past_120_150_day for each user_pseudo_id and feature_date.
- LAST_VALUE(view_items_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_120_150_day,
- -- Extracts the last view_items_past_150_180_day for each user_pseudo_id and feature_date.
- LAST_VALUE(view_items_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_150_180_day,
- -- Extracts the last add_to_carts_past_1_30_day for each user_pseudo_id and feature_date.
- LAST_VALUE(add_to_carts_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_1_30_day,
- -- Extracts the last add_to_carts_past_30_60_day for each user_pseudo_id and feature_date.
- LAST_VALUE(add_to_carts_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_30_60_day,
- -- Extracts the last add_to_carts_past_60_90_day for each user_pseudo_id and feature_date.
- LAST_VALUE(add_to_carts_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_60_90_day,
- -- Extracts the last add_to_carts_past_90_120_day for each user_pseudo_id and feature_date.
- LAST_VALUE(add_to_carts_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_90_120_day,
- -- Extracts the last add_to_carts_past_120_150_day for each user_pseudo_id and feature_date.
- LAST_VALUE(add_to_carts_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_120_150_day,
- -- Extracts the last add_to_carts_past_150_180_day for each user_pseudo_id and feature_date.
- LAST_VALUE(add_to_carts_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_150_180_day,
- -- Extracts the last checkouts_past_1_30_day for each user_pseudo_id and feature_date.
- LAST_VALUE(checkouts_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_1_30_day,
- -- Extracts the last checkouts_past_30_60_day for each user_pseudo_id and feature_date.
- LAST_VALUE(checkouts_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_30_60_day,
- -- Extracts the last checkouts_past_60_90_day for each user_pseudo_id and feature_date.
- LAST_VALUE(checkouts_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_60_90_day,
- -- Extracts the last checkouts_past_90_120_day for each user_pseudo_id and feature_date.
- LAST_VALUE(checkouts_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_90_120_day,
- -- Extracts the last checkouts_past_120_150_day for each user_pseudo_id and feature_date.
- LAST_VALUE(checkouts_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_120_150_day,
- -- Extracts the last checkouts_past_150_180_day for each user_pseudo_id and feature_date.
- LAST_VALUE(checkouts_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_150_180_day
- FROM `{{project_id}}.{{dataset}}.{{insert_table}}`
-);
-
CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_customer_lifetime_value_inference_180_30`
(processed_timestamp,
@@ -920,392 +678,3 @@ SELECT DISTINCT
WHERE
-- Filter only for one row for each user_pseudo_id
user_row_order = 1;
-
-
-
-CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_customer_lifetime_value_inference_180_90`
-(processed_timestamp,
- feature_date,
- user_pseudo_id,
- user_id,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_30_day,
- active_users_past_30_60_day,
- active_users_past_60_90_day,
- active_users_past_90_120_day,
- active_users_past_120_150_day,
- active_users_past_150_180_day,
- purchases_past_1_30_day,
- purchases_past_30_60_day,
- purchases_past_60_90_day,
- purchases_past_90_120_day,
- purchases_past_120_150_day,
- purchases_past_150_180_day,
- visits_past_1_30_day,
- visits_past_30_60_day,
- visits_past_60_90_day,
- visits_past_90_120_day,
- visits_past_120_150_day,
- visits_past_150_180_day,
- view_items_past_1_30_day,
- view_items_past_30_60_day,
- view_items_past_60_90_day,
- view_items_past_90_120_day,
- view_items_past_120_150_day,
- view_items_past_150_180_day,
- add_to_carts_past_1_30_day,
- add_to_carts_past_30_60_day,
- add_to_carts_past_60_90_day,
- add_to_carts_past_90_120_day,
- add_to_carts_past_120_150_day,
- add_to_carts_past_150_180_day,
- checkouts_past_1_30_day,
- checkouts_past_30_60_day,
- checkouts_past_60_90_day,
- checkouts_past_90_120_day,
- checkouts_past_120_150_day,
- checkouts_past_150_180_day
- )
-OPTIONS(
- --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {{expiration_duration_hours}} HOUR),
- friendly_name="v_customer_lifetime_value_inference_180_90",
- description="View Customer Lifetime Value Inference dataset using 180 days back to predict 90 days ahead. View expires after 48h and should run daily.",
- labels=[("org_unit", "development")]
-) AS
-SELECT DISTINCT
- processed_timestamp,
- feature_date,
- user_pseudo_id,
- user_id,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_30_day,
- active_users_past_30_60_day,
- active_users_past_60_90_day,
- active_users_past_90_120_day,
- active_users_past_120_150_day,
- active_users_past_150_180_day,
- purchases_past_1_30_day,
- purchases_past_30_60_day,
- purchases_past_60_90_day,
- purchases_past_90_120_day,
- purchases_past_120_150_day,
- purchases_past_150_180_day,
- visits_past_1_30_day,
- visits_past_30_60_day,
- visits_past_60_90_day,
- visits_past_90_120_day,
- visits_past_120_150_day,
- visits_past_150_180_day,
- view_items_past_1_30_day,
- view_items_past_30_60_day,
- view_items_past_60_90_day,
- view_items_past_90_120_day,
- view_items_past_120_150_day,
- view_items_past_150_180_day,
- add_to_carts_past_1_30_day,
- add_to_carts_past_30_60_day,
- add_to_carts_past_60_90_day,
- add_to_carts_past_90_120_day,
- add_to_carts_past_120_150_day,
- add_to_carts_past_150_180_day,
- checkouts_past_1_30_day,
- checkouts_past_30_60_day,
- checkouts_past_60_90_day,
- checkouts_past_90_120_day,
- checkouts_past_120_150_day,
- checkouts_past_150_180_day
-FROM(
-SELECT DISTINCT
- processed_timestamp,
- feature_date,
- user_pseudo_id,
- user_id,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_30_day,
- active_users_past_30_60_day,
- active_users_past_60_90_day,
- active_users_past_90_120_day,
- active_users_past_120_150_day,
- active_users_past_150_180_day,
- purchases_past_1_30_day,
- purchases_past_30_60_day,
- purchases_past_60_90_day,
- purchases_past_90_120_day,
- purchases_past_120_150_day,
- purchases_past_150_180_day,
- visits_past_1_30_day,
- visits_past_30_60_day,
- visits_past_60_90_day,
- visits_past_90_120_day,
- visits_past_120_150_day,
- visits_past_150_180_day,
- view_items_past_1_30_day,
- view_items_past_30_60_day,
- view_items_past_60_90_day,
- view_items_past_90_120_day,
- view_items_past_120_150_day,
- view_items_past_150_180_day,
- add_to_carts_past_1_30_day,
- add_to_carts_past_30_60_day,
- add_to_carts_past_60_90_day,
- add_to_carts_past_90_120_day,
- add_to_carts_past_120_150_day,
- add_to_carts_past_150_180_day,
- checkouts_past_1_30_day,
- checkouts_past_30_60_day,
- checkouts_past_60_90_day,
- checkouts_past_90_120_day,
- checkouts_past_120_150_day,
- checkouts_past_150_180_day,
- -- Gets a row number for each user_pseudo_id ordered by feature_date descending
- ROW_NUMBER() OVER (PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS user_row_order
- FROM `{{project_id}}.{{dataset}}.customer_lifetime_value_inference_180_90`
-)
-WHERE
- -- Filters only for one row for each user_pseudo_id
- user_row_order = 1;
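-
--- Editor's note: an illustrative reduction, not part of the original procedure, of the
--- pattern the inference views above share: rank each user's rows by feature_date
--- descending with ROW_NUMBER, then keep only the most recent row. The table name is a
--- placeholder for any table with one row per user per feature_date.
-SELECT * EXCEPT(user_row_order)
-FROM (
-  SELECT
-    *,
-    -- The most recent feature_date per user receives user_row_order = 1.
-    ROW_NUMBER() OVER (PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS user_row_order
-  FROM `my_project.my_dataset.sample_user_features`
-)
-WHERE user_row_order = 1;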
-
-
- CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_customer_lifetime_value_inference_180_180`
-(processed_timestamp,
- feature_date,
- user_pseudo_id,
- user_id,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_30_day,
- active_users_past_30_60_day,
- active_users_past_60_90_day,
- active_users_past_90_120_day,
- active_users_past_120_150_day,
- active_users_past_150_180_day,
- purchases_past_1_30_day,
- purchases_past_30_60_day,
- purchases_past_60_90_day,
- purchases_past_90_120_day,
- purchases_past_120_150_day,
- purchases_past_150_180_day,
- visits_past_1_30_day,
- visits_past_30_60_day,
- visits_past_60_90_day,
- visits_past_90_120_day,
- visits_past_120_150_day,
- visits_past_150_180_day,
- view_items_past_1_30_day,
- view_items_past_30_60_day,
- view_items_past_60_90_day,
- view_items_past_90_120_day,
- view_items_past_120_150_day,
- view_items_past_150_180_day,
- add_to_carts_past_1_30_day,
- add_to_carts_past_30_60_day,
- add_to_carts_past_60_90_day,
- add_to_carts_past_90_120_day,
- add_to_carts_past_120_150_day,
- add_to_carts_past_150_180_day,
- checkouts_past_1_30_day,
- checkouts_past_30_60_day,
- checkouts_past_60_90_day,
- checkouts_past_90_120_day,
- checkouts_past_120_150_day,
- checkouts_past_150_180_day
- )
-OPTIONS(
- --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {{expiration_duration_hours}} HOUR),
- friendly_name="v_customer_lifetime_value_inference_180_180",
- description="View Customer Lifetime Value Inference dataset using 180 days back to predict 180 days ahead. View expires after 48h and should run daily.",
- labels=[("org_unit", "development")]
-) AS
-SELECT DISTINCT
- processed_timestamp,
- feature_date,
- user_pseudo_id,
- user_id,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_30_day,
- active_users_past_30_60_day,
- active_users_past_60_90_day,
- active_users_past_90_120_day,
- active_users_past_120_150_day,
- active_users_past_150_180_day,
- purchases_past_1_30_day,
- purchases_past_30_60_day,
- purchases_past_60_90_day,
- purchases_past_90_120_day,
- purchases_past_120_150_day,
- purchases_past_150_180_day,
- visits_past_1_30_day,
- visits_past_30_60_day,
- visits_past_60_90_day,
- visits_past_90_120_day,
- visits_past_120_150_day,
- visits_past_150_180_day,
- view_items_past_1_30_day,
- view_items_past_30_60_day,
- view_items_past_60_90_day,
- view_items_past_90_120_day,
- view_items_past_120_150_day,
- view_items_past_150_180_day,
- add_to_carts_past_1_30_day,
- add_to_carts_past_30_60_day,
- add_to_carts_past_60_90_day,
- add_to_carts_past_90_120_day,
- add_to_carts_past_120_150_day,
- add_to_carts_past_150_180_day,
- checkouts_past_1_30_day,
- checkouts_past_30_60_day,
- checkouts_past_60_90_day,
- checkouts_past_90_120_day,
- checkouts_past_120_150_day,
- checkouts_past_150_180_day
-FROM (
-SELECT DISTINCT
- processed_timestamp,
- feature_date,
- user_pseudo_id,
- user_id,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_30_day,
- active_users_past_30_60_day,
- active_users_past_60_90_day,
- active_users_past_90_120_day,
- active_users_past_120_150_day,
- active_users_past_150_180_day,
- purchases_past_1_30_day,
- purchases_past_30_60_day,
- purchases_past_60_90_day,
- purchases_past_90_120_day,
- purchases_past_120_150_day,
- purchases_past_150_180_day,
- visits_past_1_30_day,
- visits_past_30_60_day,
- visits_past_60_90_day,
- visits_past_90_120_day,
- visits_past_120_150_day,
- visits_past_150_180_day,
- view_items_past_1_30_day,
- view_items_past_30_60_day,
- view_items_past_60_90_day,
- view_items_past_90_120_day,
- view_items_past_120_150_day,
- view_items_past_150_180_day,
- add_to_carts_past_1_30_day,
- add_to_carts_past_30_60_day,
- add_to_carts_past_60_90_day,
- add_to_carts_past_90_120_day,
- add_to_carts_past_120_150_day,
- add_to_carts_past_150_180_day,
- checkouts_past_1_30_day,
- checkouts_past_30_60_day,
- checkouts_past_60_90_day,
- checkouts_past_90_120_day,
- checkouts_past_120_150_day,
- checkouts_past_150_180_day,
- -- Gets a row number for each user_pseudo_id ordered by feature_date descending
- ROW_NUMBER() OVER (PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS user_row_order
- FROM `{{project_id}}.{{dataset}}.customer_lifetime_value_inference_180_180`
-)
-WHERE
- -- Filter only for one row for each user_pseudo_id
- user_row_order = 1;
diff --git a/sql/procedure/customer_lifetime_value_training_preparation.sqlx b/sql/procedure/customer_lifetime_value_training_preparation.sqlx
index 9780547e..88d9c29f 100644
--- a/sql/procedure/customer_lifetime_value_training_preparation.sqlx
+++ b/sql/procedure/customer_lifetime_value_training_preparation.sqlx
@@ -560,681 +560,10 @@ CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.customer_lifetime_value_trai
WHERE pltv_revenue_30_days > 0.0
);
--- Prepares the non-duplicated features and labels for the CLTV model looking back 180 days to predict 90 days.
-CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.customer_lifetime_value_training_180_90` AS(
- SELECT DISTINCT
- -- Current timestamp for processing time
- CURRENT_TIMESTAMP() AS processed_timestamp,
- -- Data split for training and testing
- data_split,
- -- Feature date for the features
- feature_date,
- -- User pseudo id for identifying users
- user_pseudo_id,
- -- Get the latest user id for each user and feature date
- LAST_VALUE(user_id) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS user_id,
- -- Get the latest device category for each user and feature date
- LAST_VALUE(device_category) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_category,
- -- Get the latest device mobile brand name for each user and feature date
- LAST_VALUE(device_mobile_brand_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_mobile_brand_name,
- -- Get the latest device mobile model name for each user and feature date
- LAST_VALUE(device_mobile_model_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_mobile_model_name,
- -- Get the latest device os for each user and feature date
- LAST_VALUE(device_os) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_os,
- -- Get the latest device language for each user and feature date
- LAST_VALUE(device_language) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_language,
- -- Get the latest device web browser for each user and feature date
- LAST_VALUE(device_web_browser) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_web_browser,
- -- Get the latest geo sub continent for each user and feature date
- LAST_VALUE(geo_sub_continent) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_sub_continent,
- -- Get the latest geo country for each user and feature date
- LAST_VALUE(geo_country) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_country,
- -- Get the latest geo region for each user and feature date
- LAST_VALUE(geo_region) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_region,
- -- Get the latest geo city for each user and feature date
- LAST_VALUE(geo_city) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_city,
- -- Get the latest geo metro for each user and feature date
- LAST_VALUE(geo_metro) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_metro,
- -- Get the latest last traffic source medium for each user and feature date
- LAST_VALUE(last_traffic_source_medium) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_medium,
- -- Get the latest last traffic source name for each user and feature date
- LAST_VALUE(last_traffic_source_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_name,
- -- Get the latest last traffic source source for each user and feature date
- LAST_VALUE(last_traffic_source_source) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_source,
- -- Get the latest first traffic source medium for each user and feature date
- LAST_VALUE(first_traffic_source_medium) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_medium,
- -- Get the latest first traffic source name for each user and feature date
- LAST_VALUE(first_traffic_source_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_name,
- -- Get the latest first traffic source source for each user and feature date
- LAST_VALUE(first_traffic_source_source) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_source,
- -- Get the latest has signed in with user id for each user and feature date
- LAST_VALUE(has_signed_in_with_user_id) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS has_signed_in_with_user_id,
- -- Get the latest active users past 1-30 day for each user and feature date
- LAST_VALUE(active_users_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_1_30_day,
- -- Get the latest active users past 30-60 day for each user and feature date
- LAST_VALUE(active_users_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_30_60_day,
- -- Get the latest active users past 60-90 day for each user and feature date
- LAST_VALUE(active_users_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_60_90_day,
- -- Get the latest active users past 90-120 day for each user and feature date
- LAST_VALUE(active_users_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_90_120_day,
- -- Get the latest active users past 120-150 day for each user and feature date
- LAST_VALUE(active_users_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_120_150_day,
- -- Get the latest active users past 150-180 day for each user and feature date
- LAST_VALUE(active_users_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_150_180_day,
- -- Get the latest purchases past 1-30 day for each user and feature date
- LAST_VALUE(purchases_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_1_30_day,
- -- Get the latest purchases past 30-60 day for each user and feature date
- LAST_VALUE(purchases_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_30_60_day,
- -- Get the latest purchases past 60-90 day for each user and feature date
- LAST_VALUE(purchases_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_60_90_day,
- -- Get the latest purchases past 90-120 day for each user and feature date
- LAST_VALUE(purchases_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_90_120_day,
- -- Get the latest purchases past 120-150 day for each user and feature date
- LAST_VALUE(purchases_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_120_150_day,
- -- Get the latest purchases past 150-180 day for each user and feature date
- LAST_VALUE(purchases_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_150_180_day,
- -- Get the latest visits past 1-30 day for each user and feature date
- LAST_VALUE(visits_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_1_30_day,
- -- Get the latest visits past 30-60 day for each user and feature date
- LAST_VALUE(visits_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_30_60_day,
- -- Get the latest visits past 60-90 day for each user and feature date
- LAST_VALUE(visits_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_60_90_day,
- -- Get the latest visits past 90-120 day for each user and feature date
- LAST_VALUE(visits_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_90_120_day,
- -- Get the latest visits past 120-150 day for each user and feature date
- LAST_VALUE(visits_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_120_150_day,
- -- Get the latest visits past 150-180 day for each user and feature date
- LAST_VALUE(visits_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_150_180_day,
- -- Get the latest view items past 1-30 day for each user and feature date
- LAST_VALUE(view_items_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_1_30_day,
- -- Get the latest view items past 30-60 day for each user and feature date
- LAST_VALUE(view_items_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_30_60_day,
- -- Get the latest view items past 60-90 day for each user and feature date
- LAST_VALUE(view_items_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_60_90_day,
- -- Get the latest view items past 90-120 day for each user and feature date
- LAST_VALUE(view_items_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_90_120_day,
- -- Get the latest view items past 120-150 day for each user and feature date
- LAST_VALUE(view_items_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_120_150_day,
- -- Get the latest view items past 150-180 day for each user and feature date
- LAST_VALUE(view_items_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_150_180_day,
- -- Get the latest add to carts past 1-30 day for each user and feature date
- LAST_VALUE(add_to_carts_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_1_30_day,
- -- Get the latest add to carts past 30-60 day for each user and feature date
- LAST_VALUE(add_to_carts_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_30_60_day,
- -- Get the latest add to carts past 60-90 day for each user and feature date
- LAST_VALUE(add_to_carts_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_60_90_day,
- -- Get the latest add to carts past 90-120 day for each user and feature date
- LAST_VALUE(add_to_carts_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_90_120_day,
- -- Get the latest add to carts past 120-150 day for each user and feature date
- LAST_VALUE(add_to_carts_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_120_150_day,
- -- Get the latest add to carts past 150-180 day for each user and feature date
- LAST_VALUE(add_to_carts_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_150_180_day,
- -- Get the latest checkouts past 1-30 day for each user and feature date
- LAST_VALUE(checkouts_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_1_30_day,
- -- Get the latest checkouts past 30-60 day for each user and feature date
- LAST_VALUE(checkouts_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_30_60_day,
- -- Get the latest checkouts past 60-90 day for each user and feature date
- LAST_VALUE(checkouts_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_60_90_day,
- -- Get the latest checkouts past 90-120 day for each user and feature date
- LAST_VALUE(checkouts_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_90_120_day,
- -- Get the latest checkouts past 120-150 day for each user and feature date
- LAST_VALUE(checkouts_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_120_150_day,
- -- Get the latest checkouts past 150-180 day for each user and feature date
- LAST_VALUE(checkouts_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_150_180_day,
- -- Get the latest pltv revenue 90 days for each user and feature date
- LAST_VALUE(pltv_revenue_90_days) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS pltv_revenue_90_days
- FROM `{{project_id}}.{{dataset}}.{{insert_table}}`
- -- Filter for users with pltv revenue 90 days greater than 0
- WHERE pltv_revenue_90_days > 0.0
-);
-
--- Prepares the non-duplicated features and labels for the CLTV model looking back 180 days to predict 180 days.
-CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.customer_lifetime_value_training_180_180` AS(
- SELECT DISTINCT
- -- Captures the current timestamp when the query is executed.
- CURRENT_TIMESTAMP() AS processed_timestamp,
- -- Represents the data split (e.g., train, validation, test).
- data_split,
- -- Represents the date for which features are extracted.
- feature_date,
- -- Represents the unique identifier for a user.
- user_pseudo_id,
- -- Extracts the latest user ID for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(user_id) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS user_id,
- -- Extracts the latest device category for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(device_category) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_category,
- -- Extracts the latest device brand name for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(device_mobile_brand_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_mobile_brand_name,
- -- Extracts the latest device model name for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(device_mobile_model_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_mobile_model_name,
- -- Extracts the latest device operating system for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(device_os) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_os,
- -- Extracts the latest device language for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(device_language) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_language,
- -- Extracts the latest device web browser for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(device_web_browser) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_web_browser,
- -- Extracts the latest geographic sub-continent for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(geo_sub_continent) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_sub_continent,
- -- Extracts the latest geographic country for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(geo_country) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_country,
- -- Extracts the latest geographic region for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(geo_region) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_region,
- -- Extracts the latest geographic city for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(geo_city) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_city,
- -- Extracts the latest geographic metropolitan area for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(geo_metro) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_metro,
- -- Extracts the latest last traffic source medium for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(last_traffic_source_medium) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_medium,
- -- Extracts the latest last traffic source name for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(last_traffic_source_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_name,
- -- Extracts the latest last traffic source for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(last_traffic_source_source) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_source,
- -- Extracts the latest first traffic source medium for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(first_traffic_source_medium) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_medium,
- -- Extracts the latest first traffic source name for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(first_traffic_source_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_name,
- -- Extracts the latest first traffic source for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(first_traffic_source_source) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_source,
- -- Extracts the latest user's sign-in status for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(has_signed_in_with_user_id) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS has_signed_in_with_user_id,
- -- Extracts the latest number of active users in the past 1-30 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(active_users_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_1_30_day,
- -- Extracts the latest number of active users in the past 30-60 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(active_users_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_30_60_day,
- -- Extracts the latest number of active users in the past 60-90 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(active_users_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_60_90_day,
- -- Extracts the latest number of active users in the past 90-120 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(active_users_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_90_120_day,
- -- Extracts the latest number of active users in the past 120-150 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(active_users_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_120_150_day,
- -- Extracts the latest number of active users in the past 150-180 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(active_users_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_150_180_day,
- -- Extracts the latest number of purchases in the past 1-30 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(purchases_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_1_30_day,
- -- Extracts the latest number of purchases in the past 30-60 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(purchases_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_30_60_day,
- -- Extracts the latest number of purchases in the past 60-90 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(purchases_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_60_90_day,
- -- Extracts the latest number of purchases in the past 90-120 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(purchases_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_90_120_day,
- -- Extracts the latest number of purchases in the past 120-150 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(purchases_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_120_150_day,
- -- Extracts the latest number of purchases in the past 150-180 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(purchases_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_150_180_day,
- -- Extracts the latest number of visits in the past 1-30 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(visits_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_1_30_day,
- -- Extracts the latest number of visits in the past 30-60 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(visits_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_30_60_day,
- -- Extracts the latest number of visits in the past 60-90 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(visits_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_60_90_day,
- -- Extracts the latest number of visits in the past 90-120 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(visits_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_90_120_day,
- -- Extracts the latest number of visits in the past 120-150 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(visits_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_120_150_day,
- -- Extracts the latest number of visits in the past 150-180 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(visits_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_150_180_day,
- -- Extracts the latest number of viewed items in the past 1-30 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(view_items_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_1_30_day,
- -- Extracts the latest number of viewed items in the past 30-60 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(view_items_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_30_60_day,
- -- Extracts the latest number of viewed items in the past 60-90 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(view_items_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_60_90_day,
- -- Extracts the latest number of viewed items in the past 90-120 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(view_items_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_90_120_day,
- -- Extracts the latest number of viewed items in the past 120-150 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(view_items_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_120_150_day,
- -- Extracts the latest number of viewed items in the past 150-180 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(view_items_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_150_180_day,
- -- Extracts the latest number of items added to carts in the past 1-30 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(add_to_carts_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_1_30_day,
- -- Extracts the latest number of items added to carts in the past 30-60 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(add_to_carts_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_30_60_day,
- -- Extracts the latest number of items added to carts in the past 60-90 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(add_to_carts_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_60_90_day,
- -- Extracts the latest number of items added to carts in the past 90-120 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(add_to_carts_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_90_120_day,
- -- Extracts the latest number of items added to carts in the past 120-150 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(add_to_carts_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_120_150_day,
- -- Extracts the latest number of items added to carts in the past 150-180 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(add_to_carts_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_150_180_day,
- -- Extracts the latest number of checkouts in the past 1-30 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(checkouts_past_1_30_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_1_30_day,
- -- Extracts the latest number of checkouts in the past 30-60 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(checkouts_past_30_60_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_30_60_day,
- -- Extracts the latest number of checkouts in the past 60-90 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(checkouts_past_60_90_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_60_90_day,
- -- Extracts the latest number of checkouts in the past 90-120 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(checkouts_past_90_120_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_90_120_day,
- -- Extracts the latest number of checkouts in the past 120-150 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(checkouts_past_120_150_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_120_150_day,
- -- Extracts the latest number of checkouts in the past 150-180 days for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(checkouts_past_150_180_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_150_180_day,
- -- Extracts the latest projected lifetime value (PLTV) revenue for each user, based on the feature_date, using the LAST_VALUE window function.
- LAST_VALUE(pltv_revenue_180_days) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS pltv_revenue_180_days
- FROM `{{project_id}}.{{dataset}}.{{insert_table}}`
- -- Filters for users with a PLTV revenue greater than zero.
- WHERE pltv_revenue_180_days > 0.0
-);
-
--- Creates the final view containing input data for the CLTV model looking back 180 days to predict 30 days.
-CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_customer_lifetime_value_training_180_30`
-(processed_timestamp,
- data_split,
- user_pseudo_id,
- user_id,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_30_day,
- active_users_past_30_60_day,
- active_users_past_60_90_day,
- active_users_past_90_120_day,
- active_users_past_120_150_day,
- active_users_past_150_180_day,
- purchases_past_1_30_day,
- purchases_past_30_60_day,
- purchases_past_60_90_day,
- purchases_past_90_120_day,
- purchases_past_120_150_day,
- purchases_past_150_180_day,
- visits_past_1_30_day,
- visits_past_30_60_day,
- visits_past_60_90_day,
- visits_past_90_120_day,
- visits_past_120_150_day,
- visits_past_150_180_day,
- view_items_past_1_30_day,
- view_items_past_30_60_day,
- view_items_past_60_90_day,
- view_items_past_90_120_day,
- view_items_past_120_150_day,
- view_items_past_150_180_day,
- add_to_carts_past_1_30_day,
- add_to_carts_past_30_60_day,
- add_to_carts_past_60_90_day,
- add_to_carts_past_90_120_day,
- add_to_carts_past_120_150_day,
- add_to_carts_past_150_180_day,
- checkouts_past_1_30_day,
- checkouts_past_30_60_day,
- checkouts_past_60_90_day,
- checkouts_past_90_120_day,
- checkouts_past_120_150_day,
- checkouts_past_150_180_day,
- pltv_revenue_30_days)
-OPTIONS(
- --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {{expiration_duration_hours}} HOUR),
- friendly_name="v_customer_lifetime_value_training_180_30",
- description="View Customer Lifetime Value Training dataset using 180 days back to predict 30 days ahead. View expires after 48h and should run daily.",
- labels=[("org_unit", "development")]
-) AS
-SELECT DISTINCT
- * EXCEPT(feature_date, row_order_peruser_persplit)
-FROM (
-SELECT DISTINCT
- processed_timestamp,
- data_split,
- feature_date,
- user_pseudo_id,
- user_id,
- -- Row number per user, per split; used below to keep one row every 30 days.
- ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split ORDER BY feature_date ASC) AS row_order_peruser_persplit,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_30_day,
- active_users_past_30_60_day,
- active_users_past_60_90_day,
- active_users_past_90_120_day,
- active_users_past_120_150_day,
- active_users_past_150_180_day,
- purchases_past_1_30_day,
- purchases_past_30_60_day,
- purchases_past_60_90_day,
- purchases_past_90_120_day,
- purchases_past_120_150_day,
- purchases_past_150_180_day,
- visits_past_1_30_day,
- visits_past_30_60_day,
- visits_past_60_90_day,
- visits_past_90_120_day,
- visits_past_120_150_day,
- visits_past_150_180_day,
- view_items_past_1_30_day,
- view_items_past_30_60_day,
- view_items_past_60_90_day,
- view_items_past_90_120_day,
- view_items_past_120_150_day,
- view_items_past_150_180_day,
- add_to_carts_past_1_30_day,
- add_to_carts_past_30_60_day,
- add_to_carts_past_60_90_day,
- add_to_carts_past_90_120_day,
- add_to_carts_past_120_150_day,
- add_to_carts_past_150_180_day,
- checkouts_past_1_30_day,
- checkouts_past_30_60_day,
- checkouts_past_60_90_day,
- checkouts_past_90_120_day,
- checkouts_past_120_150_day,
- checkouts_past_150_180_day,
- pltv_revenue_30_days
-FROM(
-SELECT DISTINCT
- processed_timestamp,
- data_split,
- feature_date,
- user_pseudo_id,
- user_id,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_30_day,
- active_users_past_30_60_day,
- active_users_past_60_90_day,
- active_users_past_90_120_day,
- active_users_past_120_150_day,
- active_users_past_150_180_day,
- purchases_past_1_30_day,
- purchases_past_30_60_day,
- purchases_past_60_90_day,
- purchases_past_90_120_day,
- purchases_past_120_150_day,
- purchases_past_150_180_day,
- visits_past_1_30_day,
- visits_past_30_60_day,
- visits_past_60_90_day,
- visits_past_90_120_day,
- visits_past_120_150_day,
- visits_past_150_180_day,
- view_items_past_1_30_day,
- view_items_past_30_60_day,
- view_items_past_60_90_day,
- view_items_past_90_120_day,
- view_items_past_120_150_day,
- view_items_past_150_180_day,
- add_to_carts_past_1_30_day,
- add_to_carts_past_30_60_day,
- add_to_carts_past_60_90_day,
- add_to_carts_past_90_120_day,
- add_to_carts_past_120_150_day,
- add_to_carts_past_150_180_day,
- checkouts_past_1_30_day,
- checkouts_past_30_60_day,
- checkouts_past_60_90_day,
- checkouts_past_90_120_day,
- checkouts_past_120_150_day,
- checkouts_past_150_180_day,
- pltv_revenue_30_days,
- -- Row number for each user pseudo id, feature date and data split combination ordered by feature date descending.
- ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, feature_date, data_split ORDER BY feature_date DESC) AS row_order_peruser_perday_persplit
- FROM `{{project_id}}.{{dataset}}.customer_lifetime_value_training_180_30`
-)
-WHERE
- -- Keep only one row per user, per day, per split.
- row_order_peruser_perday_persplit = 1
-)
-WHERE
- -- Skipping windows every 30 days, which is the future window size.
- MOD(row_order_peruser_persplit-1, 30) = 0;
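-
--- Editor's note: an illustrative sketch, not part of the original procedure, of the
--- MOD-based thinning used above. Assuming at most one row per user per feature_date
--- (enforced by the inner dedup query) and a row for every day, numbering each user's
--- rows by date and keeping rows 1, 31, 61, ... spaces examples 30 days apart, so the
--- 30-day label windows of consecutive examples do not overlap. Table name is a placeholder.
-SELECT * EXCEPT(row_order_peruser_persplit)
-FROM (
-  SELECT
-    *,
-    ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split ORDER BY feature_date ASC) AS row_order_peruser_persplit
-  FROM `my_project.my_dataset.sample_training_rows`
-)
--- Keeps rows 1, 31, 61, ... for each user and data split.
-WHERE MOD(row_order_peruser_persplit - 1, 30) = 0;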
-
-
--- Creates the final view containing input data for the CLTV model looking back 180 days to predict 90 days.
-CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_customer_lifetime_value_training_180_90`
-(processed_timestamp,
- data_split,
- user_pseudo_id,
- user_id,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_30_day,
- active_users_past_30_60_day,
- active_users_past_60_90_day,
- active_users_past_90_120_day,
- active_users_past_120_150_day,
- active_users_past_150_180_day,
- purchases_past_1_30_day,
- purchases_past_30_60_day,
- purchases_past_60_90_day,
- purchases_past_90_120_day,
- purchases_past_120_150_day,
- purchases_past_150_180_day,
- visits_past_1_30_day,
- visits_past_30_60_day,
- visits_past_60_90_day,
- visits_past_90_120_day,
- visits_past_120_150_day,
- visits_past_150_180_day,
- view_items_past_1_30_day,
- view_items_past_30_60_day,
- view_items_past_60_90_day,
- view_items_past_90_120_day,
- view_items_past_120_150_day,
- view_items_past_150_180_day,
- add_to_carts_past_1_30_day,
- add_to_carts_past_30_60_day,
- add_to_carts_past_60_90_day,
- add_to_carts_past_90_120_day,
- add_to_carts_past_120_150_day,
- add_to_carts_past_150_180_day,
- checkouts_past_1_30_day,
- checkouts_past_30_60_day,
- checkouts_past_60_90_day,
- checkouts_past_90_120_day,
- checkouts_past_120_150_day,
- checkouts_past_150_180_day,
- pltv_revenue_90_days)
-OPTIONS(
- --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {{expiration_duration_hours}} HOUR),
- friendly_name="v_customer_lifetime_value_training_180_90",
- description="View Customer Lifetime Value Training dataset using 180 days back to predict 90 days ahead. View expires after 48h and should run daily.",
- labels=[("org_unit", "development")]
-) AS
-SELECT DISTINCT
- * EXCEPT(feature_date, row_order_peruser_persplit)
-FROM (
-SELECT DISTINCT
- processed_timestamp,
- data_split,
- feature_date,
- user_pseudo_id,
- user_id,
- -- Row number per user, per split; used below to keep one row every 90 days.
- ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split ORDER BY feature_date ASC) AS row_order_peruser_persplit,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_30_day,
- active_users_past_30_60_day,
- active_users_past_60_90_day,
- active_users_past_90_120_day,
- active_users_past_120_150_day,
- active_users_past_150_180_day,
- purchases_past_1_30_day,
- purchases_past_30_60_day,
- purchases_past_60_90_day,
- purchases_past_90_120_day,
- purchases_past_120_150_day,
- purchases_past_150_180_day,
- visits_past_1_30_day,
- visits_past_30_60_day,
- visits_past_60_90_day,
- visits_past_90_120_day,
- visits_past_120_150_day,
- visits_past_150_180_day,
- view_items_past_1_30_day,
- view_items_past_30_60_day,
- view_items_past_60_90_day,
- view_items_past_90_120_day,
- view_items_past_120_150_day,
- view_items_past_150_180_day,
- add_to_carts_past_1_30_day,
- add_to_carts_past_30_60_day,
- add_to_carts_past_60_90_day,
- add_to_carts_past_90_120_day,
- add_to_carts_past_120_150_day,
- add_to_carts_past_150_180_day,
- checkouts_past_1_30_day,
- checkouts_past_30_60_day,
- checkouts_past_60_90_day,
- checkouts_past_90_120_day,
- checkouts_past_120_150_day,
- checkouts_past_150_180_day,
- pltv_revenue_90_days
-FROM(
-SELECT DISTINCT
- processed_timestamp,
- data_split,
- feature_date,
- user_pseudo_id,
- user_id,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_30_day,
- active_users_past_30_60_day,
- active_users_past_60_90_day,
- active_users_past_90_120_day,
- active_users_past_120_150_day,
- active_users_past_150_180_day,
- purchases_past_1_30_day,
- purchases_past_30_60_day,
- purchases_past_60_90_day,
- purchases_past_90_120_day,
- purchases_past_120_150_day,
- purchases_past_150_180_day,
- visits_past_1_30_day,
- visits_past_30_60_day,
- visits_past_60_90_day,
- visits_past_90_120_day,
- visits_past_120_150_day,
- visits_past_150_180_day,
- view_items_past_1_30_day,
- view_items_past_30_60_day,
- view_items_past_60_90_day,
- view_items_past_90_120_day,
- view_items_past_120_150_day,
- view_items_past_150_180_day,
- add_to_carts_past_1_30_day,
- add_to_carts_past_30_60_day,
- add_to_carts_past_60_90_day,
- add_to_carts_past_90_120_day,
- add_to_carts_past_120_150_day,
- add_to_carts_past_150_180_day,
- checkouts_past_1_30_day,
- checkouts_past_30_60_day,
- checkouts_past_60_90_day,
- checkouts_past_90_120_day,
- checkouts_past_120_150_day,
- checkouts_past_150_180_day,
- pltv_revenue_90_days,
- -- Number of rows per user, per day, per split. Only one row per user, per day, per split.
- ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, feature_date, data_split ORDER BY feature_date DESC) AS row_order_peruser_perday_persplit
- FROM `{{project_id}}.{{dataset}}.customer_lifetime_value_training_180_90`
-)
-WHERE
- -- Keep only one row per user, per day, per split.
- row_order_peruser_perday_persplit = 1
-)
-WHERE
- -- Skipping windows every 90 days, which is the future window size.
- MOD(row_order_peruser_persplit-1, 90) = 0;
-
--- Creates the final view containing input data for the CLTV model looking back 180 days to predict 180 days.
-CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_customer_lifetime_value_training_180_180`
-(processed_timestamp,
+-- Creates the final view containing input data for the CLTV model looking back 180 days to predict 30 days.
+CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_customer_lifetime_value_training_180_30`
+(processed_timestamp,
data_split,
user_pseudo_id,
user_id,
@@ -1292,23 +621,23 @@ CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_customer_lifetime_value_tra
checkouts_past_90_120_day,
checkouts_past_120_150_day,
checkouts_past_150_180_day,
- pltv_revenue_180_days)
+ pltv_revenue_30_days)
OPTIONS(
--expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {{expiration_duration_hours}} HOUR),
- friendly_name="v_customer_lifetime_value_training_180_180",
- description="View Purchase Propensity Training dataset using 30 days back to predict 15 days ahead. View expires after 48h and should run daily.",
+ friendly_name="v_customer_lifetime_value_training_180_30",
+ description="View Customer Lifetime Value Training dataset using 180 days back to predict 30 days ahead. View expires after 48h and should run daily.",
labels=[("org_unit", "development")]
-) AS
+) AS
SELECT DISTINCT
* EXCEPT(feature_date, row_order_peruser_persplit)
-FROM (
+FROM (
SELECT DISTINCT
processed_timestamp,
data_split,
feature_date,
user_pseudo_id,
user_id,
- -- Row number per user, per split; used below to keep one row every 15 days.
+ -- Row number per user, per split; used below to keep one row every 30 days.
ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split ORDER BY feature_date ASC) AS row_order_peruser_persplit,
device_category,
device_mobile_brand_name,
@@ -1364,7 +693,7 @@ SELECT DISTINCT
checkouts_past_90_120_day,
checkouts_past_120_150_day,
checkouts_past_150_180_day,
- pltv_revenue_180_days
+ pltv_revenue_30_days
FROM(
SELECT DISTINCT
processed_timestamp,
@@ -1426,20 +755,19 @@ SELECT DISTINCT
checkouts_past_90_120_day,
checkouts_past_120_150_day,
checkouts_past_150_180_day,
- pltv_revenue_180_days,
- -- Number of rows per user, per day, per split. Only one row per user, per day, per split.
+ pltv_revenue_30_days,
+ -- Row number for each user pseudo id, feature date and data split combination ordered by feature date descending.
ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, feature_date, data_split ORDER BY feature_date DESC) AS row_order_peruser_perday_persplit
- FROM `{{project_id}}.{{dataset}}.customer_lifetime_value_training_180_180`
+ FROM `{{project_id}}.{{dataset}}.customer_lifetime_value_training_180_30`
)
WHERE
-- Keep only one row per user, per day, per split.
row_order_peruser_perday_persplit = 1
)
WHERE
- -- Skipping windows of 30 days, which is the future window size.
+ -- Skipping windows every 30 days, which is the future window size.
MOD(row_order_peruser_persplit-1, 30) = 0;
-
-- Creates the final view containing input data for the CLTV model looking back 180 days to predict 30 days.
-- Consider only the last windows per user. So that we can use only the most recent interactions of users.
CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_customer_lifetime_value_training_180_30_last_window`
@@ -1638,165 +966,3 @@ SELECT DISTINCT
WHERE
-- Keep only the most recent row per user.
user_row_order = 1;
-
--- Creates the final view containing balanced input data for the CLTV model looking back 180 days to predict 30 days.
-CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_customer_lifetime_value_training_180_30_balanced`
-(data_split,
- user_pseudo_id,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_30_day,
- active_users_past_30_60_day,
- active_users_past_60_90_day,
- active_users_past_90_120_day,
- active_users_past_120_150_day,
- active_users_past_150_180_day,
- purchases_past_1_30_day,
- purchases_past_30_60_day,
- purchases_past_60_90_day,
- purchases_past_90_120_day,
- purchases_past_120_150_day,
- purchases_past_150_180_day,
- visits_past_1_30_day,
- visits_past_30_60_day,
- visits_past_60_90_day,
- visits_past_90_120_day,
- visits_past_120_150_day,
- visits_past_150_180_day,
- view_items_past_1_30_day,
- view_items_past_30_60_day,
- view_items_past_60_90_day,
- view_items_past_90_120_day,
- view_items_past_120_150_day,
- view_items_past_150_180_day,
- add_to_carts_past_1_30_day,
- add_to_carts_past_30_60_day,
- add_to_carts_past_60_90_day,
- add_to_carts_past_90_120_day,
- add_to_carts_past_120_150_day,
- add_to_carts_past_150_180_day,
- checkouts_past_1_30_day,
- checkouts_past_30_60_day,
- checkouts_past_60_90_day,
- checkouts_past_90_120_day,
- checkouts_past_120_150_day,
- checkouts_past_150_180_day,
- pltv_revenue_30_days)
-OPTIONS(
- --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {{expiration_duration_hours}} HOUR),
- friendly_name="v_customer_lifetime_value_training_180_30_balanced",
- description="View Purchase Propensity Training dataset using 15 days back to predict 15 days ahead. View expires after 48h and should run daily.",
- labels=[("org_unit", "development")]
-) AS
--- This query performs a stratified random sampling of users from the v_customer_lifetime_value_training_180_30 table,
--- ensuring that the final dataset has a balanced representation of users across different PLTV revenue ranges.
--- This balanced dataset is then used for training and validating a CLTV model.
-SELECT DISTINCT
- data_split,
- user_pseudo_id,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_30_day,
- active_users_past_30_60_day,
- active_users_past_60_90_day,
- active_users_past_90_120_day,
- active_users_past_120_150_day,
- active_users_past_150_180_day,
- purchases_past_1_30_day,
- purchases_past_30_60_day,
- purchases_past_60_90_day,
- purchases_past_90_120_day,
- purchases_past_120_150_day,
- purchases_past_150_180_day,
- visits_past_1_30_day,
- visits_past_30_60_day,
- visits_past_60_90_day,
- visits_past_90_120_day,
- visits_past_120_150_day,
- visits_past_150_180_day,
- view_items_past_1_30_day,
- view_items_past_30_60_day,
- view_items_past_60_90_day,
- view_items_past_90_120_day,
- view_items_past_120_150_day,
- view_items_past_150_180_day,
- add_to_carts_past_1_30_day,
- add_to_carts_past_30_60_day,
- add_to_carts_past_60_90_day,
- add_to_carts_past_90_120_day,
- add_to_carts_past_120_150_day,
- add_to_carts_past_150_180_day,
- checkouts_past_1_30_day,
- checkouts_past_30_60_day,
- checkouts_past_60_90_day,
- checkouts_past_90_120_day,
- checkouts_past_120_150_day,
- checkouts_past_150_180_day,
- pltv_revenue_30_days
-FROM
-(
-SELECT
-* EXCEPT(rn) FROM (
-SELECT
- *,
- -- Calculates a unique row number (rn) for each row within each bucket.
- -- The PARTITION BY bucket ensures that the numbering is independent for each bucket.
- -- The ORDER BY RAND() randomizes the order of rows within each bucket.
- ROW_NUMBER() OVER (PARTITION BY bucket ORDER BY RAND()) AS rn
-FROM (
- SELECT
- *,
- -- Creates a new column called bucket by categorizing users into 10 buckets based on
- -- their pltv_revenue_30_days (predicted lifetime value revenue over the next 30 days).
- CASE
- WHEN pltv_revenue_30_days < 50 THEN "bucket1"
- WHEN pltv_revenue_30_days BETWEEN 50 AND 100 THEN "bucket2"
- WHEN pltv_revenue_30_days BETWEEN 100 AND 200 THEN "bucket3"
- WHEN pltv_revenue_30_days BETWEEN 200 AND 300 THEN "bucket4"
- WHEN pltv_revenue_30_days BETWEEN 300 AND 400 THEN "bucket5"
- WHEN pltv_revenue_30_days BETWEEN 400 AND 500 THEN "bucket6"
- WHEN pltv_revenue_30_days BETWEEN 500 AND 750 THEN "bucket7"
- WHEN pltv_revenue_30_days BETWEEN 750 AND 1000 THEN "bucket8"
- WHEN pltv_revenue_30_days BETWEEN 1000 AND 2000 THEN "bucket9"
- WHEN pltv_revenue_30_days > 2000 THEN "bucket10" END as bucket
- FROM
- `{{project_id}}.{{dataset}}.v_customer_lifetime_value_training_180_30`)
-)
-WHERE
- -- This filter selects only the first 1000 rows from each bucket.
- -- This is a form of stratified sampling, ensuring that the final dataset has a
- -- balanced representation of users across different PLTV revenue ranges
- rn <= 1000)
-;
diff --git a/sql/procedure/lead_score_propensity_inference_preparation.sqlx b/sql/procedure/lead_score_propensity_inference_preparation.sqlx
new file mode 100644
index 00000000..d9753b88
--- /dev/null
+++ b/sql/procedure/lead_score_propensity_inference_preparation.sqlx
@@ -0,0 +1,352 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+DECLARE latest_processed_time_ud TIMESTAMP;
+DECLARE latest_processed_time_useam TIMESTAMP;
+DECLARE latest_processed_time_uwlm TIMESTAMP;
+DECLARE latest_processed_time_um TIMESTAMP;
+
+-- Setting the procedure to look back from the day before `inference_date`
+SET inference_date = DATE_SUB(inference_date, INTERVAL 1 DAY);
+
+SET latest_processed_time_ud = (SELECT MAX(processed_timestamp) FROM `{{feature_store_project_id}}.{{feature_store_dataset}}.user_dimensions` WHERE feature_date = inference_date LIMIT 1);
+SET latest_processed_time_useam = (SELECT MAX(processed_timestamp) FROM `{{feature_store_project_id}}.{{feature_store_dataset}}.user_session_event_aggregated_metrics` WHERE feature_date = inference_date LIMIT 1);
+SET latest_processed_time_uwlm = (SELECT MAX(processed_timestamp) FROM `{{feature_store_project_id}}.{{feature_store_dataset}}.user_rolling_window_lead_metrics` WHERE feature_date = inference_date LIMIT 1);
+SET latest_processed_time_um = (SELECT MAX(processed_timestamp) FROM `{{feature_store_project_id}}.{{feature_store_dataset}}.user_scoped_metrics` WHERE feature_date = inference_date LIMIT 1);
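+
+-- Illustrative note: each SET above captures the most recent processed_timestamp for the
+-- inference date, so the queries below only read rows from the latest feature-engineering run.
+-- For example, assuming user_dimensions was processed twice for the same feature_date, a filter
+-- such as the following keeps only rows from the second (latest) run:
+--   WHERE feature_date = inference_date AND processed_timestamp = latest_processed_time_ud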
+
+CREATE OR REPLACE TEMP TABLE inference_preparation_ud as (
+ SELECT DISTINCT
+ -- The user pseudo id
+ UD.user_pseudo_id,
+ -- The user id
+ MAX(UD.user_id) OVER(user_dimensions_window) AS user_id,
+ -- The feature date
+ UD.feature_date,
+ -- The user lifetime value revenue
+ MAX(UD.user_ltv_revenue) OVER(user_dimensions_window) AS user_ltv_revenue,
+ -- The device category
+ MAX(UD.device_category) OVER(user_dimensions_window) AS device_category,
+ -- The device brand name
+ MAX(UD.device_mobile_brand_name) OVER(user_dimensions_window) AS device_mobile_brand_name,
+ -- The device model name
+ MAX(UD.device_mobile_model_name) OVER(user_dimensions_window) AS device_mobile_model_name,
+ -- The device operating system
+ MAX(UD.device_os) OVER(user_dimensions_window) AS device_os,
+ -- The device language
+ MAX(UD.device_language) OVER(user_dimensions_window) AS device_language,
+ -- The device web browser
+ MAX(UD.device_web_browser) OVER(user_dimensions_window) AS device_web_browser,
+ -- The user sub continent
+ MAX(UD.geo_sub_continent) OVER(user_dimensions_window) AS geo_sub_continent,
+ -- The user country
+ MAX(UD.geo_country) OVER(user_dimensions_window) AS geo_country,
+ -- The user region
+ MAX(UD.geo_region) OVER(user_dimensions_window) AS geo_region,
+ -- The user city
+ MAX(UD.geo_city) OVER(user_dimensions_window) AS geo_city,
+ -- The user metro
+ MAX(UD.geo_metro) OVER(user_dimensions_window) AS geo_metro,
+ -- The user last traffic source medium
+ MAX(UD.last_traffic_source_medium) OVER(user_dimensions_window) AS last_traffic_source_medium,
+ -- The user last traffic source name
+ MAX(UD.last_traffic_source_name) OVER(user_dimensions_window) AS last_traffic_source_name,
+ -- The user last traffic source source
+ MAX(UD.last_traffic_source_source) OVER(user_dimensions_window) AS last_traffic_source_source,
+ -- The user first traffic source medium
+ MAX(UD.first_traffic_source_medium) OVER(user_dimensions_window) AS first_traffic_source_medium,
+ -- The user first traffic source name
+ MAX(UD.first_traffic_source_name) OVER(user_dimensions_window) AS first_traffic_source_name,
+ -- The user first traffic source source
+ MAX(UD.first_traffic_source_source) OVER(user_dimensions_window) AS first_traffic_source_source,
+ -- Whether the user has signed in with user ID
+ MAX(UD.has_signed_in_with_user_id) OVER(user_dimensions_window) AS has_signed_in_with_user_id,
+FROM
+ `{{feature_store_project_id}}.{{feature_store_dataset}}.user_dimensions` UD
+INNER JOIN
+ `{{project_id}}.{{mds_dataset}}.latest_event_per_user_last_72_hours` LEU
+ON
+ UD.user_pseudo_id = LEU.user_pseudo_id
+WHERE
+ -- In the future, consider `feature_date BETWEEN start_date AND end_date` to process multiple days; modify the PARTITION BY accordingly.
+ UD.feature_date = inference_date
+ AND UD.processed_timestamp = latest_processed_time_ud
+WINDOW
+ user_dimensions_window AS (PARTITION BY UD.user_pseudo_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
+);
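+
+-- Illustrative note: MAX(...) OVER(user_dimensions_window) combined with SELECT DISTINCT
+-- collapses multiple rows per user into one. For example, assuming a user has two rows:
+--   user_pseudo_id | device_os
+--   u1             | NULL
+--   u1             | 'Android'
+-- MAX(device_os) ignores NULLs and returns 'Android' on both rows, so DISTINCT keeps one row.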
+
+
+CREATE OR REPLACE TEMP TABLE inference_preparation_uwlm as (
+ SELECT DISTINCT
+ -- User pseudo id
+ UWLM.user_pseudo_id,
+ -- Feature date
+ UWLM.feature_date{% for feature in short_list_features %},
+ -- Calculate the maximum value for each metric over the window
+ MAX(UWLM.{{feature.feature_name}}_past_1_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_1_day,
+ MAX(UWLM.{{feature.feature_name}}_past_2_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_2_day,
+ MAX(UWLM.{{feature.feature_name}}_past_3_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_3_day,
+ MAX(UWLM.{{feature.feature_name}}_past_4_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_4_day,
+ MAX(UWLM.{{feature.feature_name}}_past_5_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_5_day{% endfor %}
+FROM
+ `{{feature_store_project_id}}.{{feature_store_dataset}}.user_rolling_window_lead_metrics` UWLM
+INNER JOIN
+ `{{project_id}}.{{mds_dataset}}.latest_event_per_user_last_72_hours` LEU
+ON
+ UWLM.user_pseudo_id = LEU.user_pseudo_id
+WHERE
+ -- Filter for the features on the inference date
+ UWLM.feature_date = inference_date
+ AND UWLM.processed_timestamp = latest_processed_time_uwlm
+WINDOW
+ user_rolling_lead_window AS (PARTITION BY UWLM.user_pseudo_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
+);
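+
+-- Illustrative note: the short_list_features loop above is expanded by the template engine
+-- before this script runs. Assuming, purely for illustration, a single feature named `login`,
+-- the rendered column list would read:
+--   MAX(UWLM.login_past_1_day) OVER(user_rolling_lead_window) AS login_past_1_day,
+--   ...
+--   MAX(UWLM.login_past_5_day) OVER(user_rolling_lead_window) AS login_past_5_day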
+
+-- This is a temp table consolidating all features over the date interval.
+CREATE OR REPLACE TEMP TABLE inference_preparation as (
+ SELECT DISTINCT
+ UD.user_pseudo_id,
+ UD.user_id,
+ UD.feature_date,
+ UD.user_ltv_revenue,
+ UD.device_category,
+ UD.device_mobile_brand_name,
+ UD.device_mobile_model_name,
+ UD.device_os,
+ UD.device_language,
+ UD.device_web_browser,
+ UD.geo_sub_continent,
+ UD.geo_country,
+ UD.geo_region,
+ UD.geo_city,
+ UD.geo_metro,
+ UD.last_traffic_source_medium,
+ UD.last_traffic_source_name,
+ UD.last_traffic_source_source,
+ UD.first_traffic_source_medium,
+ UD.first_traffic_source_name,
+ UD.first_traffic_source_source,
+ UD.has_signed_in_with_user_id{% for feature in short_list_features %},
+ UWLM.{{feature.feature_name}}_past_1_day,
+ UWLM.{{feature.feature_name}}_past_2_day,
+ UWLM.{{feature.feature_name}}_past_3_day,
+ UWLM.{{feature.feature_name}}_past_4_day,
+ UWLM.{{feature.feature_name}}_past_5_day{% endfor %}
+FROM
+ inference_preparation_ud UD
+INNER JOIN
+ inference_preparation_uwlm UWLM
+ON
+ UWLM.user_pseudo_id = UD.user_pseudo_id
+ AND UWLM.feature_date = UD.feature_date
+);
+
+DELETE FROM `{{project_id}}.{{dataset}}.{{insert_table}}` WHERE TRUE;
+
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+(
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id{% for feature in short_list_features %},
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day{% endfor %}
+)
+SELECT DISTINCT
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ MIN(user_ltv_revenue) OVER(PARTITION BY user_pseudo_id, feature_date) as user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id{% for feature in short_list_features %},
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day{% endfor %}
+FROM inference_preparation;
+
+
+CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.lead_score_propensity_inference_5_1` AS(
+ SELECT DISTINCT
+ CURRENT_TIMESTAMP() AS processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ LAST_VALUE(user_id) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS user_id,
+ LAST_VALUE(user_ltv_revenue) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS user_ltv_revenue,
+ LAST_VALUE(device_category) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_category,
+ LAST_VALUE(device_mobile_brand_name) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_mobile_brand_name,
+ LAST_VALUE(device_mobile_model_name) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_mobile_model_name,
+ LAST_VALUE(device_os) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_os,
+ LAST_VALUE(device_language) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_language,
+ LAST_VALUE(device_web_browser) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_web_browser,
+ LAST_VALUE(geo_sub_continent) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_sub_continent,
+ LAST_VALUE(geo_country) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_country,
+ LAST_VALUE(geo_region) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_region,
+ LAST_VALUE(geo_city) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_city,
+ LAST_VALUE(geo_metro) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_metro,
+ LAST_VALUE(last_traffic_source_medium) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS last_traffic_source_medium,
+ LAST_VALUE(last_traffic_source_name) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS last_traffic_source_name,
+ LAST_VALUE(last_traffic_source_source) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS last_traffic_source_source,
+ LAST_VALUE(first_traffic_source_medium) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS first_traffic_source_medium,
+ LAST_VALUE(first_traffic_source_name) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS first_traffic_source_name,
+ LAST_VALUE(first_traffic_source_source) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS first_traffic_source_source,
+ LAST_VALUE(has_signed_in_with_user_id) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS has_signed_in_with_user_id{% for feature in short_list_features %},
+ LAST_VALUE({{feature.feature_name}}_past_1_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_1_day,
+ LAST_VALUE({{feature.feature_name}}_past_2_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_2_day,
+ LAST_VALUE({{feature.feature_name}}_past_3_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_3_day,
+ LAST_VALUE({{feature.feature_name}}_past_4_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_4_day,
+ LAST_VALUE({{feature.feature_name}}_past_5_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_5_day{% endfor %}
+ FROM `{{project_id}}.{{dataset}}.{{insert_table}}`
+);
+
+
+CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_lead_score_propensity_inference_5_1`
+(processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id{% for feature in short_list_features %},
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day{% endfor %})
+OPTIONS(
+ --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {{expiration_duration_hours}} HOUR),
+ friendly_name="v_lead_score_propensity_inference_5_1",
+ description="View Lead Score Propensity Inference dataset using 5 days back to predict 1 day ahead. View expires after 48h and should run daily.",
+ labels=[("org_unit", "development")]
+) AS
+SELECT DISTINCT
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id{% for feature in short_list_features %},
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day{% endfor %}
+FROM (
+SELECT DISTINCT
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ -- Row number partitioned by user pseudo id ordered by feature date descending
+ ROW_NUMBER() OVER (PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS user_row_order
+ FROM `{{project_id}}.{{dataset}}.lead_score_propensity_inference_5_1`
+)
+WHERE
+ -- Keep only the most recent example per user
+ user_row_order = 1;
+
diff --git a/sql/procedure/lead_score_propensity_label.sqlx b/sql/procedure/lead_score_propensity_label.sqlx
new file mode 100644
index 00000000..fc27c071
--- /dev/null
+++ b/sql/procedure/lead_score_propensity_label.sqlx
@@ -0,0 +1,102 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Run these window aggregations every day, for each date in the training and inference date ranges.
+-- Setting the procedure to look back from the day before `input_date` until the day before `end_date`
+SET input_date = DATE_SUB(input_date, INTERVAL 1 DAY);
+SET end_date = DATE_SUB(end_date, INTERVAL 1 DAY);
+
+-- Future User metrics: 1-day future {{target_event}}s per user
+CREATE OR REPLACE TEMP TABLE future_{{target_event}}s_per_user AS (
+ SELECT
+ -- User's unique identifier
+ user_pseudo_id,
+ -- The date for which future {{target_event}}s are being calculated
+ input_date as event_date,
+ -- Calculates the maximum count of distinct events for users who performed a {{target_event}} 1 day after `input_date`
+ MAX(COUNT(DISTINCT CASE DATE_DIFF(event_date, input_date, DAY) = 1 WHEN TRUE THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id) AS {{target_event}}_day_1
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event` as E
+ INNER JOIN `{{mds_project_id}}.{{mds_dataset}}.device` as D
+ ON E.device_type_id = D.device_type_id
+ -- Filters events to be within the date range defined by input_date and end_date
+ WHERE event_date BETWEEN input_date AND end_date
+ -- Filter events with event name {{target_event}}
+ AND LOWER(E.event_name) IN ('{{target_event}}')
+ AND E.ga_session_id IS NOT NULL
+ AND D.device_os IS NOT NULL
+ -- Grouping by user pseudo ids
+ GROUP BY user_pseudo_id
+);
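+
+-- Illustrative note: the CASE inside COUNT(DISTINCT ...) only passes through event_timestamp
+-- values from events that happened exactly 1 day after input_date, so the count measures
+-- next-day {{target_event}} activity. For example, assuming input_date = '2023-01-10' and a user
+-- with {{target_event}} events on '2023-01-11' (2 events) and '2023-01-12' (1 event), only the two
+-- '2023-01-11' timestamps are counted, giving {{target_event}}_day_1 = 2.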
+
+-- All users in the platform
+CREATE OR REPLACE TEMP TABLE all_users_possible_{{target_event}}s as (
+ SELECT DISTINCT
+ -- User's unique identifier
+ Users.user_pseudo_id,
+ -- The event date for which {{target_event}}s are being considered
+ Days.event_date as event_date,
+ -- Placeholder column for the {{target_event}} count on day 1
+ NULL as {{target_event}}_day_1
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event` Users
+ CROSS JOIN
+ -- Generates the list of dates between `input_date` and `end_date`; filtered below to `input_date` only
+ (SELECT event_date FROM UNNEST(GENERATE_DATE_ARRAY(input_date, end_date, INTERVAL 1 DAY)) AS event_date) Days
+ WHERE Days.event_date = input_date
+ -- Filter events with valid sessions
+ AND Users.ga_session_id IS NOT NULL
+);
+
+
+CREATE OR REPLACE TEMP TABLE DataForTargetTable AS
+SELECT DISTINCT
+ -- Timestamp when the data was processed
+ CURRENT_TIMESTAMP() AS processed_timestamp,
+ -- The date for which {{target_event}}s are being considered
+ A.event_date as feature_date,
+ -- User's unique identifier
+ A.user_pseudo_id,
+ -- Binary label: 1 if the user performed at least one {{target_event}} on day 1, 0 otherwise
+ LEAST(COALESCE(B.{{target_event}}_day_1, 0), 1) AS {{target_event}}_day_1
+FROM all_users_possible_{{target_event}}s AS A
+LEFT JOIN future_{{target_event}}s_per_user AS B
+ON B.user_pseudo_id = A.user_pseudo_id
+;
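+
+-- Illustrative note: LEAST(COALESCE(count, 0), 1) above binarizes the label:
+--   NULL -> 0, 0 -> 0, 3 -> 1
+-- so {{target_event}}_day_1 records whether at least one {{target_event}} happened on day 1,
+-- not how many.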
+
+-- Updates or inserts data into the target table
+MERGE `{{project_id}}.{{dataset}}.{{insert_table}}` I
+USING DataForTargetTable T
+ON I.feature_date = T.feature_date
+ AND I.user_pseudo_id = T.user_pseudo_id
+WHEN MATCHED THEN
+ -- Updates existing records
+ UPDATE SET
+ -- Updates the processed timestamp
+ I.processed_timestamp = T.processed_timestamp,
+ -- Updates {{target_event}} counts for each day
+ I.{{target_event}}_day_1 = T.{{target_event}}_day_1
+WHEN NOT MATCHED THEN
+ -- Inserts new records
+ INSERT
+ (processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ {{target_event}}_day_1)
+ VALUES
+ (T.processed_timestamp,
+ T.feature_date,
+ T.user_pseudo_id,
+ T.{{target_event}}_day_1)
+;
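+
+-- Illustrative note: merging on (feature_date, user_pseudo_id) makes this procedure idempotent.
+-- For example, re-running it for the same input_date updates the existing rows (refreshing
+-- processed_timestamp and the label) rather than inserting duplicate user/date rows.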
+
+SET rows_added = (SELECT COUNT(DISTINCT user_pseudo_id) FROM `{{project_id}}.{{dataset}}.{{insert_table}}`);
diff --git a/sql/procedure/lead_score_propensity_training_preparation.sqlx b/sql/procedure/lead_score_propensity_training_preparation.sqlx
new file mode 100644
index 00000000..5d0f61e3
--- /dev/null
+++ b/sql/procedure/lead_score_propensity_training_preparation.sqlx
@@ -0,0 +1,569 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+DECLARE custom_start_date DATE DEFAULT NULL;
+DECLARE custom_end_date DATE DEFAULT NULL;
+
+-- custom_start_date: The start date of the data to be used for training.
+-- custom_end_date: The end date of the data to be used for training.
+SET custom_start_date = PARSE_DATE("%Y-%m-%d", {{custom_start_date}});
+SET custom_end_date = PARSE_DATE("%Y-%m-%d", {{custom_end_date}});
+
+-- The procedure first checks whether the custom_start_date and custom_end_date parameters are valid.
+-- If a custom date is valid, it overrides the corresponding default start_date or end_date;
+-- otherwise the default derived from the available data is kept.
+IF custom_start_date IS NOT NULL AND custom_start_date >= start_date AND custom_start_date <= end_date
+ AND custom_start_date < custom_end_date THEN
+ SET start_date = custom_start_date;
+END IF;
+
+IF custom_end_date IS NOT NULL AND custom_end_date <= end_date AND custom_end_date >= start_date
+ AND custom_end_date > custom_start_date THEN
+ SET end_date = custom_end_date;
+END IF;
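+
+-- Illustrative note: assuming the defaults are start_date = '2023-01-01' and end_date = '2023-12-31',
+-- a valid pair custom_start_date = '2023-03-01' and custom_end_date = '2023-06-30' narrows the
+-- training window to March through June; an out-of-range or reversed pair fails the checks above
+-- and the defaults are kept.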
+
+-- This is a temp table consolidating user_dimensions over the date interval.
+CREATE OR REPLACE TEMP TABLE training_preparation_ud as (
+ SELECT DISTINCT
+ -- The user pseudo id
+ UD.user_pseudo_id,
+ -- The user id
+ MAX(UD.user_id) OVER(user_dimensions_window) AS user_id,
+ -- The feature date
+ UD.feature_date,
+ -- The user lifetime value revenue
+ MAX(UD.user_ltv_revenue) OVER(user_dimensions_window) AS user_ltv_revenue,
+ -- The device category
+ MAX(UD.device_category) OVER(user_dimensions_window) AS device_category,
+ -- The device brand name
+ MAX(UD.device_mobile_brand_name) OVER(user_dimensions_window) AS device_mobile_brand_name,
+ -- The device model name
+ MAX(UD.device_mobile_model_name) OVER(user_dimensions_window) AS device_mobile_model_name,
+ -- The device operating system
+ MAX(UD.device_os) OVER(user_dimensions_window) AS device_os,
+ -- The device language
+ MAX(UD.device_language) OVER(user_dimensions_window) AS device_language,
+ -- The device web browser
+ MAX(UD.device_web_browser) OVER(user_dimensions_window) AS device_web_browser,
+ -- The user sub continent
+ MAX(UD.geo_sub_continent) OVER(user_dimensions_window) AS geo_sub_continent,
+ -- The user country
+ MAX(UD.geo_country) OVER(user_dimensions_window) AS geo_country,
+ -- The user region
+ MAX(UD.geo_region) OVER(user_dimensions_window) AS geo_region,
+ -- The user city
+ MAX(UD.geo_city) OVER(user_dimensions_window) AS geo_city,
+ -- The user metro
+ MAX(UD.geo_metro) OVER(user_dimensions_window) AS geo_metro,
+ -- The user last traffic source medium
+ MAX(UD.last_traffic_source_medium) OVER(user_dimensions_window) AS last_traffic_source_medium,
+ -- The user last traffic source name
+ MAX(UD.last_traffic_source_name) OVER(user_dimensions_window) AS last_traffic_source_name,
+ -- The user last traffic source source
+ MAX(UD.last_traffic_source_source) OVER(user_dimensions_window) AS last_traffic_source_source,
+ -- The user first traffic source medium
+ MAX(UD.first_traffic_source_medium) OVER(user_dimensions_window) AS first_traffic_source_medium,
+ -- The user first traffic source name
+ MAX(UD.first_traffic_source_name) OVER(user_dimensions_window) AS first_traffic_source_name,
+ -- The user first traffic source source
+ MAX(UD.first_traffic_source_source) OVER(user_dimensions_window) AS first_traffic_source_source,
+ -- Whether the user has signed in with user ID
+ MAX(UD.has_signed_in_with_user_id) OVER(user_dimensions_window) AS has_signed_in_with_user_id,
+FROM
+ `{{feature_store_project_id}}.{{feature_store_dataset}}.user_dimensions` UD
+WHERE
+ -- Filter feature dates according to the defined date interval
+ UD.feature_date BETWEEN start_date AND end_date
+WINDOW
+ user_dimensions_window AS (PARTITION BY UD.user_pseudo_id, UD.feature_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
+);
+
+-- This is a temp table consolidating user rolling metrics over the date interval.
+CREATE OR REPLACE TEMP TABLE training_preparation_uwlm as (
+ SELECT DISTINCT
+ -- User pseudo id
+ UWLM.user_pseudo_id,
+ -- Feature date
+ UWLM.feature_date{% for feature in short_list_features %},
+ -- Calculate the maximum value for each metric over the window
+ MAX(UWLM.{{feature.feature_name}}_past_1_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_1_day,
+ MAX(UWLM.{{feature.feature_name}}_past_2_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_2_day,
+ MAX(UWLM.{{feature.feature_name}}_past_3_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_3_day,
+ MAX(UWLM.{{feature.feature_name}}_past_4_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_4_day,
+ MAX(UWLM.{{feature.feature_name}}_past_5_day) OVER(user_rolling_lead_window) AS {{feature.feature_name}}_past_5_day{% endfor %}
+FROM
+ `{{feature_store_project_id}}.{{feature_store_dataset}}.user_rolling_window_lead_metrics` UWLM
+WHERE
+ -- Filter feature dates according to the defined date interval
+ UWLM.feature_date BETWEEN start_date AND end_date
+WINDOW
+ user_rolling_lead_window AS (PARTITION BY UWLM.user_pseudo_id, UWLM.feature_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
+);
+
+-- This is a temp table consolidating user labels over the date interval.
+CREATE OR REPLACE TEMP TABLE training_preparation_label as (
+ SELECT DISTINCT
+ LABEL.user_pseudo_id, -- The unique identifier for the user.
+ LABEL.feature_date, -- The date for which the features are extracted.
+ MAX(LABEL.{{target_event}}_day_1) OVER(lead_score_propensity_label_window) AS {{target_event}}_day_1, -- Whether the user made a {{target_event}} on day 1.
+FROM
+ `{{feature_store_project_id}}.{{feature_store_dataset}}.lead_score_propensity_label` LABEL
+WHERE
+ -- Define the training subset interval
+ LABEL.feature_date BETWEEN start_date AND end_date
+WINDOW
+ lead_score_propensity_label_window AS (PARTITION BY LABEL.user_pseudo_id, LABEL.feature_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
+);
+
+-- This is a temp table consolidating all features and labels over the date interval.
+CREATE OR REPLACE TEMP TABLE training_preparation as (
+ SELECT DISTINCT
+ UD.user_pseudo_id,
+ UD.user_id,
+ UD.feature_date,
+ COALESCE(UD.user_ltv_revenue, 0.0) AS user_ltv_revenue,
+ UD.device_category,
+ UD.device_mobile_brand_name,
+ UD.device_mobile_model_name,
+ UD.device_os,
+ UD.device_language,
+ UD.device_web_browser,
+ UD.geo_sub_continent,
+ UD.geo_country,
+ UD.geo_region,
+ UD.geo_city,
+ UD.geo_metro,
+ UD.last_traffic_source_medium,
+ UD.last_traffic_source_name,
+ UD.last_traffic_source_source,
+ UD.first_traffic_source_medium,
+ UD.first_traffic_source_name,
+ UD.first_traffic_source_source,
+ UD.has_signed_in_with_user_id,{% for feature in short_list_features %}
+ UWLM.{{feature.feature_name}}_past_1_day,
+ UWLM.{{feature.feature_name}}_past_2_day,
+ UWLM.{{feature.feature_name}}_past_3_day,
+ UWLM.{{feature.feature_name}}_past_4_day,
+ UWLM.{{feature.feature_name}}_past_5_day,{% endfor %}
+ LABEL.{{target_event}}_day_1
+FROM
+ training_preparation_ud UD
+INNER JOIN
+ training_preparation_uwlm UWLM
+ON
+ UWLM.user_pseudo_id = UD.user_pseudo_id
+ AND UWLM.feature_date = UD.feature_date
+INNER JOIN
+ training_preparation_label LABEL
+ON
+ LABEL.user_pseudo_id = UD.user_pseudo_id
+ AND LABEL.feature_date = UD.feature_date
+);
+
+-- This is a temp table that splits the rows into the different data_split groups (TRAIN, VALIDATE, TEST)
+CREATE OR REPLACE TEMP TABLE DataForTargetTable AS(
+ SELECT DISTINCT
+ CASE
+ WHEN (ABS(MOD(FARM_FINGERPRINT(user_pseudo_id), 10)) BETWEEN 0 AND train_split_end_number) THEN "TRAIN"
+ WHEN (ABS(MOD(FARM_FINGERPRINT(user_pseudo_id), 10)) BETWEEN train_split_end_number AND validation_split_end_number) THEN "VALIDATE"
+ WHEN (ABS(MOD(FARM_FINGERPRINT(user_pseudo_id), 10)) BETWEEN validation_split_end_number AND 9) THEN "TEST"
+ END as data_split,
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ {{target_event}}_day_1
+ FROM training_preparation);
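+
+-- Illustrative note: ABS(MOD(FARM_FINGERPRINT(user_pseudo_id), 10)) maps every user
+-- deterministically to a bucket from 0 to 9, so a user always lands in the same split across runs.
+-- For example, assuming train_split_end_number = 5 and validation_split_end_number = 7, buckets
+-- 0-5 go to TRAIN, 6-7 to VALIDATE and 8-9 to TEST (a boundary bucket is claimed by the first
+-- matching CASE branch).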
+
+CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.lead_score_propensity_training_full_dataset` AS
+SELECT DISTINCT * FROM DataForTargetTable
+WHERE data_split IS NOT NULL;
+
+
+-- This is a table preparing rows for lead score propensity modelling looking back 5 days and looking ahead 1 day.
+CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.lead_score_propensity_training_5_1` AS(
+ SELECT DISTINCT
+ CURRENT_TIMESTAMP() AS processed_timestamp,
+ data_split,
+ feature_date,
+ user_pseudo_id,
+ LAST_VALUE(user_id) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS user_id,
+ LAST_VALUE(user_ltv_revenue) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS user_ltv_revenue,
+ LAST_VALUE(device_category) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_category,
+ LAST_VALUE(device_mobile_brand_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_mobile_brand_name,
+ LAST_VALUE(device_mobile_model_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_mobile_model_name,
+ LAST_VALUE(device_os) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_os,
+ LAST_VALUE(device_language) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_language,
+ LAST_VALUE(device_web_browser) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_web_browser,
+ LAST_VALUE(geo_sub_continent) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_sub_continent,
+ LAST_VALUE(geo_country) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_country,
+ LAST_VALUE(geo_region) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_region,
+ LAST_VALUE(geo_city) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_city,
+ LAST_VALUE(geo_metro) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_metro,
+ LAST_VALUE(last_traffic_source_medium) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_medium,
+ LAST_VALUE(last_traffic_source_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_name,
+ LAST_VALUE(last_traffic_source_source) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_source,
+ LAST_VALUE(first_traffic_source_medium) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_medium,
+ LAST_VALUE(first_traffic_source_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_name,
+ LAST_VALUE(first_traffic_source_source) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_source,
+ LAST_VALUE(has_signed_in_with_user_id) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS has_signed_in_with_user_id,{% for feature in short_list_features %}
+ LAST_VALUE({{feature.feature_name}}_past_1_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_1_day,
+ LAST_VALUE({{feature.feature_name}}_past_2_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_2_day,
+ LAST_VALUE({{feature.feature_name}}_past_3_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_3_day,
+ LAST_VALUE({{feature.feature_name}}_past_4_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_4_day,
+ LAST_VALUE({{feature.feature_name}}_past_5_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS {{feature.feature_name}}_past_5_day,{% endfor %}
+ -- Calculate the will_{{target_event}} label.
+ -- Label for a lead score propensity model. It indicates whether the user performed a {{target_event}} within the next 1 day, based on their event history.
+ -- This label is then used to train a model that can predict the likelihood of future {{target_event}}s for other users.
+ LAST_VALUE(CASE WHEN ({{target_event}}_day_1) = 0 THEN 0 ELSE 1 END) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) as will_{{target_event}}
+ FROM `{{project_id}}.{{dataset}}.lead_score_propensity_training_full_dataset`
+);
+
+
+-- This is a view preparing rows for lead score propensity modelling looking back 5 days and looking ahead 1 day.
+CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_lead_score_propensity_training_5_1`
+(processed_timestamp,
+ data_split,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ will_{{target_event}})
+OPTIONS(
+ --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
+ friendly_name="v_lead_score_propensity_training_5_1",
+ description="View Lead Score Propensity Training dataset using 5 days back to predict 1 day ahead. View expires after 48h and should run daily.",
+ labels=[("org_unit", "development")]
+) AS
+SELECT DISTINCT
+ * EXCEPT(feature_date, row_order_peruser_persplit)
+FROM (
+SELECT DISTINCT
+ processed_timestamp,
+ user_pseudo_id,
+ data_split,
+ feature_date,
+ -- Row number per user, per split and per label, used below to sample one row every 5 days.
+ ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split, will_{{target_event}} ORDER BY feature_date ASC) AS row_order_peruser_persplit,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ will_{{target_event}}
+FROM(
+SELECT DISTINCT
+ processed_timestamp,
+ data_split,
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ will_{{target_event}},
+ -- Number of rows per user, per day, per split. Only one row per user, per day, per split is kept.
+ ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, feature_date, data_split, will_{{target_event}} ORDER BY feature_date DESC) AS row_order_peruser_perday_persplit
+ FROM `{{project_id}}.{{dataset}}.lead_score_propensity_training_5_1`
+)
+WHERE
+ row_order_peruser_perday_persplit = 1
+)
+WHERE
+ -- Skipping windows of 5 days, which is the past window size.
+ MOD(row_order_peruser_persplit-1, 5) = 0;
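+
+-- Illustrative note: MOD(row_order_peruser_persplit - 1, 5) = 0 keeps rows numbered 1, 6, 11, ...
+-- per user, split and label. Assuming one snapshot per day, consecutive training examples for the
+-- same user are spaced 5 days apart, so their 5-day lookback windows do not overlap.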
+
+
+-- This is a view preparing rows for lead score propensity modelling looking back 5 days and looking ahead 1 day.
+-- This specifically filters the most recent row for each user.
+CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_lead_score_propensity_training_5_1_last_window`
+(processed_timestamp,
+ data_split,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ will_{{target_event}})
+OPTIONS(
+ --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
+ friendly_name="v_lead_score_propensity_training_5_1_last_window",
+ description="View Lead Score Propensity Training dataset using 5 days back to predict 1 day ahead.",
+ labels=[("org_unit", "development")]
+) AS
+SELECT DISTINCT
+ processed_timestamp,
+ data_split,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ will_{{target_event}}
+FROM(
+SELECT DISTINCT
+ processed_timestamp,
+ data_split,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ will_{{target_event}},
+ ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split, will_{{target_event}} ORDER BY feature_date DESC) AS user_row_order
+ --ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split ORDER BY feature_date DESC) AS user_row_order
+ FROM `{{project_id}}.{{dataset}}.lead_score_propensity_training_5_1`
+)
+WHERE
+ user_row_order = 1;
+
+
+-- This is a view preparing rows for lead score propensity modelling looking back 5 days and looking ahead 1 day.
+-- This is to be used in case no {{target_event}}s have been registered recently, leaving too few positive examples to train the classification model.
+CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_lead_score_propensity_training_5_1_rare_{{target_event}}s`
+(processed_timestamp,
+ data_split,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ will_{{target_event}})
+OPTIONS(
+ --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
+ friendly_name="v_lead_score_propensity_training_5_1_rare_{{target_event}}s",
+ description="View Lead Score Propensity Training dataset using 5 days back to predict 1 day ahead.",
+ labels=[("org_unit", "development")]
+) AS
+SELECT DISTINCT
+ processed_timestamp,
+ data_split,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id,{% for feature in short_list_features %}
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day,{% endfor %}
+ will_{{target_event}}
+ FROM
+ (SELECT DISTINCT
+ *
+ FROM `{{project_id}}.{{dataset}}.v_lead_score_propensity_training_5_1_last_window`
+ )
+ UNION ALL
+ (
+ SELECT DISTINCT
+ * EXCEPT(user_row_order, feature_date)
+ FROM(
+ SELECT DISTINCT
+ *,
+ ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split ORDER BY feature_date DESC) AS user_row_order
+ FROM `{{project_id}}.{{dataset}}.lead_score_propensity_training_5_1`
+ WHERE will_{{target_event}} = 1
+ )
+ WHERE
+ user_row_order = 1
+ LIMIT 100
+ )
+;
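+
+-- Illustrative note: the UNION ALL above pads the last-window view with extra positive examples
+-- (will_{{target_event}} = 1): one most recent qualifying row per user and split, capped at 100 rows
+-- overall by the LIMIT. This is a fallback for properties where {{target_event}}s are rare and the
+-- last-window view alone would contain too few positives to train the classifier.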
\ No newline at end of file
diff --git a/sql/procedure/purchase_propensity_inference_preparation.sqlx b/sql/procedure/purchase_propensity_inference_preparation.sqlx
index c848da8d..63194a95 100644
--- a/sql/procedure/purchase_propensity_inference_preparation.sqlx
+++ b/sql/procedure/purchase_propensity_inference_preparation.sqlx
@@ -627,626 +627,6 @@ CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.purchase_propensity_inferenc
FROM `{{project_id}}.{{dataset}}.{{insert_table}}`
);
-CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.purchase_propensity_inference_15_15` AS(
- SELECT DISTINCT
- CURRENT_TIMESTAMP() AS processed_timestamp,
- feature_date,
- user_pseudo_id,
- LAST_VALUE(user_id) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS user_id,
- LAST_VALUE(user_ltv_revenue) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS user_ltv_revenue,
- LAST_VALUE(device_category) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_category,
- LAST_VALUE(device_mobile_brand_name) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_mobile_brand_name,
- LAST_VALUE(device_mobile_model_name) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_mobile_model_name,
- LAST_VALUE(device_os) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_os,
- LAST_VALUE(device_language) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_language,
- LAST_VALUE(device_web_browser) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_web_browser,
- LAST_VALUE(geo_sub_continent) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_sub_continent,
- LAST_VALUE(geo_country) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_country,
- LAST_VALUE(geo_region) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_region,
- LAST_VALUE(geo_city) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_city,
- LAST_VALUE(geo_metro) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_metro,
- LAST_VALUE(last_traffic_source_medium) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS last_traffic_source_medium,
- LAST_VALUE(last_traffic_source_name) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS last_traffic_source_name,
- LAST_VALUE(last_traffic_source_source) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS last_traffic_source_source,
- LAST_VALUE(first_traffic_source_medium) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS first_traffic_source_medium,
- LAST_VALUE(first_traffic_source_name) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS first_traffic_source_name,
- LAST_VALUE(first_traffic_source_source) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS first_traffic_source_source,
- LAST_VALUE(has_signed_in_with_user_id) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS has_signed_in_with_user_id,
- LAST_VALUE(active_users_past_1_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS active_users_past_1_day,
- LAST_VALUE(active_users_past_2_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS active_users_past_2_day,
- LAST_VALUE(active_users_past_3_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS active_users_past_3_day,
- LAST_VALUE(active_users_past_4_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS active_users_past_4_day,
- LAST_VALUE(active_users_past_5_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS active_users_past_5_day,
- LAST_VALUE(active_users_past_6_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS active_users_past_6_day,
- LAST_VALUE(active_users_past_7_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS active_users_past_7_day,
- LAST_VALUE(active_users_past_8_14_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS active_users_past_8_14_day,
- LAST_VALUE(purchases_past_1_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS purchases_past_1_day,
- LAST_VALUE(purchases_past_2_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS purchases_past_2_day,
- LAST_VALUE(purchases_past_3_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS purchases_past_3_day,
- LAST_VALUE(purchases_past_4_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS purchases_past_4_day,
- LAST_VALUE(purchases_past_5_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS purchases_past_5_day,
- LAST_VALUE(purchases_past_6_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS purchases_past_6_day,
- LAST_VALUE(purchases_past_7_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS purchases_past_7_day,
- LAST_VALUE(purchases_past_8_14_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS purchases_past_8_14_day,
- LAST_VALUE(visits_past_1_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS visits_past_1_day,
- LAST_VALUE(visits_past_2_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS visits_past_2_day,
- LAST_VALUE(visits_past_3_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS visits_past_3_day,
- LAST_VALUE(visits_past_4_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS visits_past_4_day,
- LAST_VALUE(visits_past_5_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS visits_past_5_day,
- LAST_VALUE(visits_past_6_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS visits_past_6_day,
- LAST_VALUE(visits_past_7_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS visits_past_7_day,
- LAST_VALUE(visits_past_8_14_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS visits_past_8_14_day,
- LAST_VALUE(view_items_past_1_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS view_items_past_1_day,
- LAST_VALUE(view_items_past_2_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS view_items_past_2_day,
- LAST_VALUE(view_items_past_3_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS view_items_past_3_day,
- LAST_VALUE(view_items_past_4_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS view_items_past_4_day,
- LAST_VALUE(view_items_past_5_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS view_items_past_5_day,
- LAST_VALUE(view_items_past_6_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS view_items_past_6_day,
- LAST_VALUE(view_items_past_7_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS view_items_past_7_day,
- LAST_VALUE(view_items_past_8_14_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS view_items_past_8_14_day,
- LAST_VALUE(add_to_carts_past_1_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS add_to_carts_past_1_day,
- LAST_VALUE(add_to_carts_past_2_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS add_to_carts_past_2_day,
- LAST_VALUE(add_to_carts_past_3_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS add_to_carts_past_3_day,
- LAST_VALUE(add_to_carts_past_4_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS add_to_carts_past_4_day,
- LAST_VALUE(add_to_carts_past_5_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS add_to_carts_past_5_day,
- LAST_VALUE(add_to_carts_past_6_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS add_to_carts_past_6_day,
- LAST_VALUE(add_to_carts_past_7_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS add_to_carts_past_7_day,
- LAST_VALUE(add_to_carts_past_8_14_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS add_to_carts_past_8_14_day,
- LAST_VALUE(checkouts_past_1_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS checkouts_past_1_day,
- LAST_VALUE(checkouts_past_2_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS checkouts_past_2_day,
- LAST_VALUE(checkouts_past_3_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS checkouts_past_3_day,
- LAST_VALUE(checkouts_past_4_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS checkouts_past_4_day,
- LAST_VALUE(checkouts_past_5_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS checkouts_past_5_day,
- LAST_VALUE(checkouts_past_6_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS checkouts_past_6_day,
- LAST_VALUE(checkouts_past_7_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS checkouts_past_7_day,
- LAST_VALUE(checkouts_past_8_14_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS checkouts_past_8_14_day,
- FROM `{{project_id}}.{{dataset}}.{{insert_table}}`
-);
-
-CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.purchase_propensity_inference_15_7` AS(
- SELECT DISTINCT
- CURRENT_TIMESTAMP() AS processed_timestamp,
- feature_date,
- user_pseudo_id,
- LAST_VALUE(user_id) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS user_id,
- LAST_VALUE(user_ltv_revenue) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS user_ltv_revenue,
- LAST_VALUE(device_category) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_category,
- LAST_VALUE(device_mobile_brand_name) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_mobile_brand_name,
- LAST_VALUE(device_mobile_model_name) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_mobile_model_name,
- LAST_VALUE(device_os) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_os,
- LAST_VALUE(device_language) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_language,
- LAST_VALUE(device_web_browser) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS device_web_browser,
- LAST_VALUE(geo_sub_continent) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_sub_continent,
- LAST_VALUE(geo_country) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_country,
- LAST_VALUE(geo_region) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_region,
- LAST_VALUE(geo_city) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_city,
- LAST_VALUE(geo_metro) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS geo_metro,
- LAST_VALUE(last_traffic_source_medium) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS last_traffic_source_medium,
- LAST_VALUE(last_traffic_source_name) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS last_traffic_source_name,
- LAST_VALUE(last_traffic_source_source) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS last_traffic_source_source,
- LAST_VALUE(first_traffic_source_medium) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS first_traffic_source_medium,
- LAST_VALUE(first_traffic_source_name) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS first_traffic_source_name,
- LAST_VALUE(first_traffic_source_source) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS first_traffic_source_source,
- LAST_VALUE(has_signed_in_with_user_id) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS has_signed_in_with_user_id,
- LAST_VALUE(active_users_past_1_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS active_users_past_1_day,
- LAST_VALUE(active_users_past_2_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS active_users_past_2_day,
- LAST_VALUE(active_users_past_3_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS active_users_past_3_day,
- LAST_VALUE(active_users_past_4_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS active_users_past_4_day,
- LAST_VALUE(active_users_past_5_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS active_users_past_5_day,
- LAST_VALUE(active_users_past_6_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS active_users_past_6_day,
- LAST_VALUE(active_users_past_7_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS active_users_past_7_day,
- LAST_VALUE(active_users_past_8_14_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS active_users_past_8_14_day,
- LAST_VALUE(purchases_past_1_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS purchases_past_1_day,
- LAST_VALUE(purchases_past_2_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS purchases_past_2_day,
- LAST_VALUE(purchases_past_3_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS purchases_past_3_day,
- LAST_VALUE(purchases_past_4_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS purchases_past_4_day,
- LAST_VALUE(purchases_past_5_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS purchases_past_5_day,
- LAST_VALUE(purchases_past_6_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS purchases_past_6_day,
- LAST_VALUE(purchases_past_7_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS purchases_past_7_day,
- LAST_VALUE(purchases_past_8_14_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS purchases_past_8_14_day,
- LAST_VALUE(visits_past_1_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS visits_past_1_day,
- LAST_VALUE(visits_past_2_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS visits_past_2_day,
- LAST_VALUE(visits_past_3_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS visits_past_3_day,
- LAST_VALUE(visits_past_4_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS visits_past_4_day,
- LAST_VALUE(visits_past_5_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS visits_past_5_day,
- LAST_VALUE(visits_past_6_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS visits_past_6_day,
- LAST_VALUE(visits_past_7_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS visits_past_7_day,
- LAST_VALUE(visits_past_8_14_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS visits_past_8_14_day,
- LAST_VALUE(view_items_past_1_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS view_items_past_1_day,
- LAST_VALUE(view_items_past_2_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS view_items_past_2_day,
- LAST_VALUE(view_items_past_3_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS view_items_past_3_day,
- LAST_VALUE(view_items_past_4_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS view_items_past_4_day,
- LAST_VALUE(view_items_past_5_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS view_items_past_5_day,
- LAST_VALUE(view_items_past_6_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS view_items_past_6_day,
- LAST_VALUE(view_items_past_7_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS view_items_past_7_day,
- LAST_VALUE(view_items_past_8_14_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS view_items_past_8_14_day,
- LAST_VALUE(add_to_carts_past_1_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS add_to_carts_past_1_day,
- LAST_VALUE(add_to_carts_past_2_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS add_to_carts_past_2_day,
- LAST_VALUE(add_to_carts_past_3_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS add_to_carts_past_3_day,
- LAST_VALUE(add_to_carts_past_4_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS add_to_carts_past_4_day,
- LAST_VALUE(add_to_carts_past_5_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS add_to_carts_past_5_day,
- LAST_VALUE(add_to_carts_past_6_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS add_to_carts_past_6_day,
- LAST_VALUE(add_to_carts_past_7_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS add_to_carts_past_7_day,
- LAST_VALUE(add_to_carts_past_8_14_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS add_to_carts_past_8_14_day,
- LAST_VALUE(checkouts_past_1_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS checkouts_past_1_day,
- LAST_VALUE(checkouts_past_2_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS checkouts_past_2_day,
- LAST_VALUE(checkouts_past_3_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS checkouts_past_3_day,
- LAST_VALUE(checkouts_past_4_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS checkouts_past_4_day,
- LAST_VALUE(checkouts_past_5_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS checkouts_past_5_day,
- LAST_VALUE(checkouts_past_6_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS checkouts_past_6_day,
- LAST_VALUE(checkouts_past_7_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS checkouts_past_7_day,
- LAST_VALUE(checkouts_past_8_14_day) OVER(PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS checkouts_past_8_14_day
- FROM `{{project_id}}.{{dataset}}.{{insert_table}}`
-);
-
-CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_inference_15_15`
-(processed_timestamp,
- feature_date,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day
- )
-OPTIONS(
- --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {{expiration_duration_hours}} HOUR),
- friendly_name="v_purchase_propensity_inference_15_15",
- description="View Purchase Propensity Inference dataset using 15 days back to predict 15 days ahead. View expires after 48h and should run daily.",
- labels=[("org_unit", "development")]
-) AS
-SELECT DISTINCT
- processed_timestamp,
- feature_date,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day
-FROM (
- SELECT DISTINCT
- processed_timestamp,
- feature_date,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day,
- -- Row number partitioned by user_pseudo_id and ordered by feature_date descending
- ROW_NUMBER() OVER (PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS user_row_order
- FROM `{{project_id}}.{{dataset}}.purchase_propensity_inference_15_15`
-)
-WHERE
- -- Keep only the most recent example per user
- user_row_order = 1;
-
-
-
-CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_inference_15_7`
-(
- processed_timestamp,
- feature_date,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day
- )
-OPTIONS(
- --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {{expiration_duration_hours}} HOUR),
- friendly_name="v_purchase_propensity_inference_15_7",
- description="View Purchase Propensity Inference dataset using 15 days back to predict 7 days ahead. View expires after 48h and should run daily.",
- labels=[("org_unit", "development")]
-) AS
-SELECT DISTINCT
- processed_timestamp,
- feature_date,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day
-FROM (
-SELECT DISTINCT
- processed_timestamp,
- feature_date,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day,
- -- Row number partitioned by user_pseudo_id and ordered by feature_date descending
- ROW_NUMBER() OVER (PARTITION BY user_pseudo_id ORDER BY feature_date DESC) AS user_row_order
- FROM `{{project_id}}.{{dataset}}.purchase_propensity_inference_15_7`
-)
-WHERE
- -- Keep only the most recent example per user
- user_row_order = 1;
-
-
CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_inference_30_15`
(processed_timestamp,
feature_date,
diff --git a/sql/procedure/purchase_propensity_training_preparation.sqlx b/sql/procedure/purchase_propensity_training_preparation.sqlx
index e12ef6f8..a4e8f017 100644
--- a/sql/procedure/purchase_propensity_training_preparation.sqlx
+++ b/sql/procedure/purchase_propensity_training_preparation.sqlx
@@ -662,929 +662,8 @@ CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.purchase_propensity_training
FROM `{{project_id}}.{{dataset}}.purchase_propensity_training_full_dataset`
);
-
--- This is a table preparing rows for purchase propensity modelling looking back 15 days and looking ahead 15 days.
-CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.purchase_propensity_training_15_15` AS(
- SELECT DISTINCT
- CURRENT_TIMESTAMP() AS processed_timestamp,
- data_split,
- feature_date,
- user_pseudo_id,
- LAST_VALUE(user_id) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS user_id,
- LAST_VALUE(user_ltv_revenue) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS user_ltv_revenue,
- LAST_VALUE(device_category) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_category,
- LAST_VALUE(device_mobile_brand_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_mobile_brand_name,
- LAST_VALUE(device_mobile_model_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_mobile_model_name,
- LAST_VALUE(device_os) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_os,
- LAST_VALUE(device_language) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_language,
- LAST_VALUE(device_web_browser) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_web_browser,
- LAST_VALUE(geo_sub_continent) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_sub_continent,
- LAST_VALUE(geo_country) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_country,
- LAST_VALUE(geo_region) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_region,
- LAST_VALUE(geo_city) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_city,
- LAST_VALUE(geo_metro) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_metro,
- LAST_VALUE(last_traffic_source_medium) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_medium,
- LAST_VALUE(last_traffic_source_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_name,
- LAST_VALUE(last_traffic_source_source) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_source,
- LAST_VALUE(first_traffic_source_medium) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_medium,
- LAST_VALUE(first_traffic_source_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_name,
- LAST_VALUE(first_traffic_source_source) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_source,
- LAST_VALUE(has_signed_in_with_user_id) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS has_signed_in_with_user_id,
- LAST_VALUE(active_users_past_1_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_1_day,
- LAST_VALUE(active_users_past_2_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_2_day,
- LAST_VALUE(active_users_past_3_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_3_day,
- LAST_VALUE(active_users_past_4_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_4_day,
- LAST_VALUE(active_users_past_5_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_5_day,
- LAST_VALUE(active_users_past_6_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_6_day,
- LAST_VALUE(active_users_past_7_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_7_day,
- LAST_VALUE(active_users_past_8_14_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_8_14_day,
- LAST_VALUE(purchases_past_1_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_1_day,
- LAST_VALUE(purchases_past_2_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_2_day,
- LAST_VALUE(purchases_past_3_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_3_day,
- LAST_VALUE(purchases_past_4_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_4_day,
- LAST_VALUE(purchases_past_5_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_5_day,
- LAST_VALUE(purchases_past_6_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_6_day,
- LAST_VALUE(purchases_past_7_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_7_day,
- LAST_VALUE(purchases_past_8_14_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_8_14_day,
- LAST_VALUE(visits_past_1_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_1_day,
- LAST_VALUE(visits_past_2_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_2_day,
- LAST_VALUE(visits_past_3_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_3_day,
- LAST_VALUE(visits_past_4_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_4_day,
- LAST_VALUE(visits_past_5_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_5_day,
- LAST_VALUE(visits_past_6_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_6_day,
- LAST_VALUE(visits_past_7_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_7_day,
- LAST_VALUE(visits_past_8_14_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_8_14_day,
- LAST_VALUE(view_items_past_1_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_1_day,
- LAST_VALUE(view_items_past_2_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_2_day,
- LAST_VALUE(view_items_past_3_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_3_day,
- LAST_VALUE(view_items_past_4_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_4_day,
- LAST_VALUE(view_items_past_5_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_5_day,
- LAST_VALUE(view_items_past_6_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_6_day,
- LAST_VALUE(view_items_past_7_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_7_day,
- LAST_VALUE(view_items_past_8_14_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_8_14_day,
- LAST_VALUE(add_to_carts_past_1_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_1_day,
- LAST_VALUE(add_to_carts_past_2_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_2_day,
- LAST_VALUE(add_to_carts_past_3_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_3_day,
- LAST_VALUE(add_to_carts_past_4_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_4_day,
- LAST_VALUE(add_to_carts_past_5_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_5_day,
- LAST_VALUE(add_to_carts_past_6_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_6_day,
- LAST_VALUE(add_to_carts_past_7_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_7_day,
- LAST_VALUE(add_to_carts_past_8_14_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_8_14_day,
- LAST_VALUE(checkouts_past_1_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_1_day,
- LAST_VALUE(checkouts_past_2_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_2_day,
- LAST_VALUE(checkouts_past_3_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_3_day,
- LAST_VALUE(checkouts_past_4_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_4_day,
- LAST_VALUE(checkouts_past_5_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_5_day,
- LAST_VALUE(checkouts_past_6_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_6_day,
- LAST_VALUE(checkouts_past_7_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_7_day,
- LAST_VALUE(checkouts_past_8_14_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_8_14_day,
- -- Calculate the will_purchase label.
- -- Label for the purchase propensity model: it indicates whether a user made a purchase within the next 15 days, based on their purchase history.
- -- This label is then used to train a model that predicts the likelihood of future purchases for other users.
- LAST_VALUE(CASE WHEN (
- purchase_day_1+
- purchase_day_2+
- purchase_day_3+
- purchase_day_4+
- purchase_day_5+
- purchase_day_6+
- purchase_day_7+
- purchase_day_8+
- purchase_day_9+
- purchase_day_10+
- purchase_day_11+
- purchase_day_12+
- purchase_day_13+
- purchase_day_14) = 0 THEN 0 ELSE 1 END) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS will_purchase
- FROM `{{project_id}}.{{dataset}}.purchase_propensity_training_full_dataset`
-);
-
--- This is a table preparing rows for purchase propensity modelling looking back 15 days and looking ahead 7 days.
-CREATE OR REPLACE TABLE `{{project_id}}.{{dataset}}.purchase_propensity_training_15_7` AS(
- SELECT DISTINCT
- CURRENT_TIMESTAMP() AS processed_timestamp,
- data_split,
- feature_date,
- user_pseudo_id,
- LAST_VALUE(user_id) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS user_id,
- LAST_VALUE(user_ltv_revenue) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS user_ltv_revenue,
- LAST_VALUE(device_category) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_category,
- LAST_VALUE(device_mobile_brand_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_mobile_brand_name,
- LAST_VALUE(device_mobile_model_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_mobile_model_name,
- LAST_VALUE(device_os) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_os,
- LAST_VALUE(device_language) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_language,
- LAST_VALUE(device_web_browser) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS device_web_browser,
- LAST_VALUE(geo_sub_continent) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_sub_continent,
- LAST_VALUE(geo_country) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_country,
- LAST_VALUE(geo_region) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_region,
- LAST_VALUE(geo_city) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_city,
- LAST_VALUE(geo_metro) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS geo_metro,
- LAST_VALUE(last_traffic_source_medium) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_medium,
- LAST_VALUE(last_traffic_source_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_name,
- LAST_VALUE(last_traffic_source_source) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS last_traffic_source_source,
- LAST_VALUE(first_traffic_source_medium) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_medium,
- LAST_VALUE(first_traffic_source_name) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_name,
- LAST_VALUE(first_traffic_source_source) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS first_traffic_source_source,
- LAST_VALUE(has_signed_in_with_user_id) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS has_signed_in_with_user_id,
- LAST_VALUE(active_users_past_1_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_1_day,
- LAST_VALUE(active_users_past_2_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_2_day,
- LAST_VALUE(active_users_past_3_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_3_day,
- LAST_VALUE(active_users_past_4_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_4_day,
- LAST_VALUE(active_users_past_5_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_5_day,
- LAST_VALUE(active_users_past_6_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_6_day,
- LAST_VALUE(active_users_past_7_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_7_day,
- LAST_VALUE(active_users_past_8_14_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS active_users_past_8_14_day,
- LAST_VALUE(purchases_past_1_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_1_day,
- LAST_VALUE(purchases_past_2_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_2_day,
- LAST_VALUE(purchases_past_3_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_3_day,
- LAST_VALUE(purchases_past_4_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_4_day,
- LAST_VALUE(purchases_past_5_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_5_day,
- LAST_VALUE(purchases_past_6_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_6_day,
- LAST_VALUE(purchases_past_7_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_7_day,
- LAST_VALUE(purchases_past_8_14_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS purchases_past_8_14_day,
- LAST_VALUE(visits_past_1_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_1_day,
- LAST_VALUE(visits_past_2_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_2_day,
- LAST_VALUE(visits_past_3_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_3_day,
- LAST_VALUE(visits_past_4_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_4_day,
- LAST_VALUE(visits_past_5_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_5_day,
- LAST_VALUE(visits_past_6_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_6_day,
- LAST_VALUE(visits_past_7_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_7_day,
- LAST_VALUE(visits_past_8_14_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS visits_past_8_14_day,
- LAST_VALUE(view_items_past_1_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_1_day,
- LAST_VALUE(view_items_past_2_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_2_day,
- LAST_VALUE(view_items_past_3_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_3_day,
- LAST_VALUE(view_items_past_4_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_4_day,
- LAST_VALUE(view_items_past_5_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_5_day,
- LAST_VALUE(view_items_past_6_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_6_day,
- LAST_VALUE(view_items_past_7_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_7_day,
- LAST_VALUE(view_items_past_8_14_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS view_items_past_8_14_day,
- LAST_VALUE(add_to_carts_past_1_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_1_day,
- LAST_VALUE(add_to_carts_past_2_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_2_day,
- LAST_VALUE(add_to_carts_past_3_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_3_day,
- LAST_VALUE(add_to_carts_past_4_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_4_day,
- LAST_VALUE(add_to_carts_past_5_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_5_day,
- LAST_VALUE(add_to_carts_past_6_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_6_day,
- LAST_VALUE(add_to_carts_past_7_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_7_day,
- LAST_VALUE(add_to_carts_past_8_14_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS add_to_carts_past_8_14_day,
- LAST_VALUE(checkouts_past_1_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_1_day,
- LAST_VALUE(checkouts_past_2_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_2_day,
- LAST_VALUE(checkouts_past_3_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_3_day,
- LAST_VALUE(checkouts_past_4_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_4_day,
- LAST_VALUE(checkouts_past_5_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_5_day,
- LAST_VALUE(checkouts_past_6_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_6_day,
- LAST_VALUE(checkouts_past_7_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_7_day,
- LAST_VALUE(checkouts_past_8_14_day) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS checkouts_past_8_14_day,
- -- Calculate the will_purchase label.
- -- Label for the purchase propensity model: it indicates whether a user made a purchase within the next 7 days, based on their purchase history.
- -- This label is then used to train a model that predicts the likelihood of future purchases for other users.
- LAST_VALUE(CASE WHEN (
- purchase_day_1+
- purchase_day_2+
- purchase_day_3+
- purchase_day_4+
- purchase_day_5+
- purchase_day_6+
- purchase_day_7) = 0 THEN 0 ELSE 1 END) OVER(PARTITION BY user_pseudo_id, feature_date ORDER BY feature_date DESC) AS will_purchase
- FROM `{{project_id}}.{{dataset}}.purchase_propensity_training_full_dataset`
-);
-
--- This is a view preparing rows for purchase propensity modelling looking back 15 days and looking ahead 15 days.
-CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_training_15_15`
-(processed_timestamp,
- data_split,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day,
- will_purchase)
-OPTIONS(
- --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
- friendly_name="v_purchase_propensity_training_15_15",
- description="View Purchase Propensity Training dataset using 15 days back to predict 15 days ahead. View expires after 48h and should run daily.",
- labels=[("org_unit", "development")]
-) AS
-SELECT DISTINCT
- processed_timestamp,
- data_split,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day,
- will_purchase
-FROM(
-SELECT DISTINCT
- processed_timestamp,
- data_split,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day,
- will_purchase,
- ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, feature_date, data_split, will_purchase ORDER BY feature_date DESC) AS user_row_order
- FROM `{{project_id}}.{{dataset}}.purchase_propensity_training_15_15`
-)
-WHERE
- user_row_order = 1;
-
-
--- This is a view preparing rows for purchase propensity modelling looking back 15 days and looking ahead 7 days.
-CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_training_15_7`
-(
- processed_timestamp,
- data_split,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day,
- will_purchase
- )
-OPTIONS(
- --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
- friendly_name="v_purchase_propensity_training_15_7",
- description="View Purchase Propensity Training dataset using 15 days back to predict 7 days ahead. View expires after 48h and should run daily.",
- labels=[("org_unit", "development")]
-) AS
-SELECT DISTINCT
- processed_timestamp,
- data_split,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day,
- will_purchase
-FROM(
-SELECT DISTINCT
- processed_timestamp,
- data_split,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day,
- will_purchase,
- ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, feature_date, data_split, will_purchase ORDER BY feature_date DESC) AS user_row_order
- FROM `{{project_id}}.{{dataset}}.purchase_propensity_training_15_7`
-)
-WHERE
- user_row_order = 1;
-
-
--- This is a view preparing rows for purchase propensity modelling looking back 30 days and looking ahead 15 days.
-CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_training_30_15`
-(processed_timestamp,
- data_split,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- active_users_past_15_30_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- purchases_past_15_30_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- visits_past_15_30_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- view_items_past_15_30_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- add_to_carts_past_15_30_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day,
- checkouts_past_15_30_day,
- will_purchase)
-OPTIONS(
- --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
- friendly_name="v_purchase_propensity_training_30_15",
- description="View Purchase Propensity Training dataset using 30 days back to predict 15 days ahead. View expires after 48h and should run daily.",
- labels=[("org_unit", "development")]
-) AS
-SELECT DISTINCT
- * EXCEPT(feature_date, row_order_peruser_persplit)
-FROM (
-SELECT DISTINCT
- processed_timestamp,
- user_pseudo_id,
- data_split,
- feature_date,
- -- Skip rows per user, per split, keeping only one example every 15 days.
- ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split, will_purchase ORDER BY feature_date ASC) AS row_order_peruser_persplit,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- active_users_past_15_30_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- purchases_past_15_30_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- visits_past_15_30_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- view_items_past_15_30_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- add_to_carts_past_15_30_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day,
- checkouts_past_15_30_day,
- will_purchase
-FROM(
-SELECT DISTINCT
- processed_timestamp,
- data_split,
- feature_date,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- active_users_past_15_30_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- purchases_past_15_30_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- visits_past_15_30_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- view_items_past_15_30_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- add_to_carts_past_15_30_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day,
- checkouts_past_15_30_day,
- will_purchase,
- -- Number of rows per user, per day, per split. Only one row per user, per day, per slip.
- ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, feature_date, data_split, will_purchase ORDER BY feature_date DESC) AS row_order_peruser_perday_persplit
- FROM `{{project_id}}.{{dataset}}.purchase_propensity_training_30_15`
-)
-WHERE
- row_order_peruser_perday_persplit = 1
-)
-WHERE
- --Skipping windows of 15 days, which is the future window size.
- MOD(row_order_peruser_persplit-1, 15) = 0;
-
-
-- This is a view preparing rows for purchase propensity modelling looking back 30 days and looking ahead 15 days.
--- This specifically filter rows which are most recent for each user.
-CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_training_30_15_last_window`
+CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_training_30_15`
(processed_timestamp,
data_split,
user_pseudo_id,
@@ -1665,14 +744,20 @@ CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_trainin
will_purchase)
OPTIONS(
--expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
- friendly_name="v_purchase_propensity_training_30_15_last_window",
+ friendly_name="v_purchase_propensity_training_30_15",
description="View Purchase Propensity Training dataset using 30 days back to predict 15 days ahead. View expires after 48h and should run daily.",
labels=[("org_unit", "development")]
) AS
+SELECT DISTINCT
+ * EXCEPT(feature_date, row_order_peruser_persplit)
+FROM (
SELECT DISTINCT
processed_timestamp,
- data_split,
user_pseudo_id,
+ data_split,
+ feature_date,
+  -- Rank each user's rows (per split and label) by feature_date so the outer filter can keep one row every 15 days.
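+  -- Illustrative example: a user active on 45 consecutive days gets row_order 1..45, and the
+  -- MOD filter below keeps rows 1, 16 and 31, spacing training examples roughly 15 active days apart.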
+ ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split, will_purchase ORDER BY feature_date ASC) AS row_order_peruser_persplit,
user_id,
user_ltv_revenue,
device_category,
@@ -1752,6 +837,7 @@ FROM(
SELECT DISTINCT
processed_timestamp,
data_split,
+ feature_date,
user_pseudo_id,
user_id,
user_ltv_revenue,
@@ -1828,15 +914,21 @@ SELECT DISTINCT
checkouts_past_8_14_day,
checkouts_past_15_30_day,
will_purchase,
- ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split, will_purchase ORDER BY feature_date DESC) AS user_row_order
- --ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split ORDER BY feature_date DESC) AS user_row_order
+  -- Number of rows per user, per day, per split. Only one row per user, per day, per split is kept.
+ ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, feature_date, data_split, will_purchase ORDER BY feature_date DESC) AS row_order_peruser_perday_persplit
FROM `{{project_id}}.{{dataset}}.purchase_propensity_training_30_15`
)
WHERE
- user_row_order = 1;
+ row_order_peruser_perday_persplit = 1
+)
+WHERE
+ --Skipping windows of 15 days, which is the future window size.
+ MOD(row_order_peruser_persplit-1, 15) = 0;
+
+
-- This is a view preparing rows for purchase propensity modelling looking back 30 days and looking ahead 15 days.
--- This is to be used in case recently no purchases are registered, and you don't have a way to train the classification model.
-CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_training_30_15_rare_sales`
+-- This view specifically keeps only the most recent row for each user.
+CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_training_30_15_last_window`
(processed_timestamp,
data_split,
user_pseudo_id,
@@ -1917,115 +1009,12 @@ CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_trainin
will_purchase)
OPTIONS(
--expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
- friendly_name="v_purchase_propensity_training_30_15_rare_sales",
+ friendly_name="v_purchase_propensity_training_30_15_last_window",
description="View Purchase Propensity Training dataset using 30 days back to predict 15 days ahead. View expires after 48h and should run daily.",
labels=[("org_unit", "development")]
) AS
SELECT DISTINCT
- processed_timestamp,
- data_split,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- active_users_past_15_30_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- purchases_past_15_30_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- visits_past_15_30_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- view_items_past_15_30_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- add_to_carts_past_15_30_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day,
- checkouts_past_15_30_day,
- will_purchase
- FROM
- (SELECT DISTINCT
- *
- FROM `{{project_id}}.{{dataset}}.v_purchase_propensity_training_30_15_last_window`
- )
- UNION ALL
- (
- SELECT DISTINCT
- * EXCEPT(user_row_order, feature_date)
- FROM(
- SELECT DISTINCT
- *,
- ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split ORDER BY feature_date DESC) AS user_row_order
- FROM `{{project_id}}.{{dataset}}.purchase_propensity_training_30_15`
- WHERE will_purchase = 1
- )
- WHERE
- user_row_order = 1
- LIMIT 100
- )
-;
-
--- This is a view preparing rows for purchase propensity modelling looking back 30 days and looking ahead 15 days.
--- This balances out the dataset, in case you need the purchase propensity model to split customer in two classes: future purchases and non-purchases.
-CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_training_30_15_balanced`
-(processed_timestamp,
+ processed_timestamp,
data_split,
user_pseudo_id,
user_id,
@@ -2102,111 +1091,10 @@ CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_trainin
checkouts_past_7_day,
checkouts_past_8_14_day,
checkouts_past_15_30_day,
- will_purchase)
-OPTIONS(
- --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
- friendly_name="v_purchase_propensity_training_30_15_balanced",
- description="View Purchase Propensity Training dataset using 30 days back to predict 15 days ahead. View expires after 48h and should run daily.",
- labels=[("org_unit", "development")]
-) AS
- SELECT DISTINCT
- processed_timestamp,
- data_split,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- active_users_past_15_30_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- purchases_past_15_30_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- visits_past_15_30_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- view_items_past_15_30_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- add_to_carts_past_15_30_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day,
- checkouts_past_15_30_day,
- will_purchase
- FROM (
- SELECT
- DISTINCT *,
- ROW_NUMBER() OVER (PARTITION BY will_purchase ORDER BY RAND()) AS rn
- FROM
- `{{project_id}}.{{dataset}}.v_purchase_propensity_training_30_15` )
- WHERE
- rn <= (
- SELECT
- COUNT(will_purchase)
- FROM
- `{{project_id}}.{{dataset}}.v_purchase_propensity_training_30_15`
- WHERE
- will_purchase = 1)
-;
-
-
-CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_training_30_30`
-(processed_timestamp,
+ will_purchase
+FROM(
+SELECT DISTINCT
+ processed_timestamp,
data_split,
user_pseudo_id,
user_id,
@@ -2283,23 +1171,20 @@ CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_trainin
checkouts_past_7_day,
checkouts_past_8_14_day,
checkouts_past_15_30_day,
- will_purchase)
-OPTIONS(
- --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
- friendly_name="v_purchase_propensity_training_30_30",
- description="View Purchase Propensity Training dataset using 30 days back to predict 15 days ahead. View expires after 48h and should run daily.",
- labels=[("org_unit", "development")]
-) AS
-SELECT DISTINCT
- * EXCEPT(feature_date, row_order_peruser_persplit)
-FROM (
-SELECT DISTINCT
- processed_timestamp,
+ will_purchase,
+ ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split, will_purchase ORDER BY feature_date DESC) AS user_row_order
+ --ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split ORDER BY feature_date DESC) AS user_row_order
+ FROM `{{project_id}}.{{dataset}}.purchase_propensity_training_30_15`
+ TABLESAMPLE SYSTEM (1 PERCENT)
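+  -- Note: TABLESAMPLE SYSTEM samples ~1 percent of storage blocks rather than exact rows, so
+  -- the sampled user set (and therefore this view's output) can vary between runs.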
+)
+WHERE
+ user_row_order = 1;
+
+
+CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_training_30_30`
+(processed_timestamp,
data_split,
user_pseudo_id,
- feature_date,
- --Now, I want to skip rows per user, per split every 15 days.
- ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split, will_purchase ORDER BY feature_date ASC) AS row_order_peruser_persplit,
user_id,
user_ltv_revenue,
device_category,
@@ -2374,13 +1259,23 @@ SELECT DISTINCT
checkouts_past_7_day,
checkouts_past_8_14_day,
checkouts_past_15_30_day,
- will_purchase
-FROM(
+ will_purchase)
+OPTIONS(
+ --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
+ friendly_name="v_purchase_propensity_training_30_30",
+ description="View Purchase Propensity Training dataset using 30 days back to predict 15 days ahead. View expires after 48h and should run daily.",
+ labels=[("org_unit", "development")]
+) AS
+SELECT DISTINCT
+ * EXCEPT(feature_date, row_order_peruser_persplit)
+FROM (
SELECT DISTINCT
processed_timestamp,
data_split,
- feature_date,
user_pseudo_id,
+ feature_date,
+  -- Rank each user's rows (per split and label) by feature_date so the outer filter can keep one row every 30 days.
+ ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split, will_purchase ORDER BY feature_date ASC) AS row_order_peruser_persplit,
user_id,
user_ltv_revenue,
device_category,
@@ -2455,22 +1350,12 @@ SELECT DISTINCT
checkouts_past_7_day,
checkouts_past_8_14_day,
checkouts_past_15_30_day,
- will_purchase,
- -- Number of rows per user, per day, per split. Only one row per user, per day, per slip.
- ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, feature_date, data_split, will_purchase ORDER BY feature_date DESC) AS row_order_peruser_perday_persplit
- FROM `{{project_id}}.{{dataset}}.purchase_propensity_training_30_30`
-)
-WHERE
- row_order_peruser_perday_persplit = 1
-)
-WHERE
- -- Skipping windows of 30 days, which is the future window size.
- MOD(row_order_peruser_persplit-1, 30) = 0;
-
-
-CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_training_30_30_balanced`
-(processed_timestamp,
+ will_purchase
+FROM(
+SELECT DISTINCT
+ processed_timestamp,
data_split,
+ feature_date,
user_pseudo_id,
user_id,
user_ltv_revenue,
@@ -2546,107 +1431,17 @@ CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_trainin
checkouts_past_7_day,
checkouts_past_8_14_day,
checkouts_past_15_30_day,
- will_purchase)
-OPTIONS(
- --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
- friendly_name="v_purchase_propensity_training_30_30_balanced",
- description="View Purchase Propensity Training dataset using 30 days back to predict 15 days ahead. View expires after 48h and should run daily.",
- labels=[("org_unit", "development")]
-) AS
- SELECT DISTINCT
- processed_timestamp,
- data_split,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- active_users_past_15_30_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- purchases_past_15_30_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- visits_past_15_30_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- view_items_past_15_30_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- add_to_carts_past_15_30_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day,
- checkouts_past_15_30_day,
- will_purchase
- FROM (
- SELECT
- DISTINCT *,
- ROW_NUMBER() OVER (PARTITION BY will_purchase ORDER BY RAND()) AS rn
- FROM
- `{{project_id}}.{{dataset}}.v_purchase_propensity_training_30_30` )
- WHERE
- rn <= (
- SELECT
- COUNT(will_purchase)
- FROM
- `{{project_id}}.{{dataset}}.v_purchase_propensity_training_30_30`
- WHERE
- will_purchase = 1)
-;
+ will_purchase,
+  -- Number of rows per user, per day, per split. Only one row per user, per day, per split is kept.
+ ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, feature_date, data_split, will_purchase ORDER BY feature_date DESC) AS row_order_peruser_perday_persplit
+ FROM `{{project_id}}.{{dataset}}.purchase_propensity_training_30_30`
+)
+WHERE
+ row_order_peruser_perday_persplit = 1
+)
+WHERE
+ -- Skipping windows of 30 days, which is the future window size.
+ MOD(row_order_peruser_persplit-1, 30) = 0;
CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_training_30_30_last_window`
@@ -2896,193 +1691,7 @@ SELECT DISTINCT
ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split, will_purchase ORDER BY feature_date DESC) AS user_row_order
--ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split ORDER BY feature_date DESC) AS user_row_order
FROM `{{project_id}}.{{dataset}}.purchase_propensity_training_30_30`
+ TABLESAMPLE SYSTEM (1 PERCENT)
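+  -- Same ~1 percent block-sampling caveat as the 30_15 last-window view above.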
)
WHERE
user_row_order = 1;
-
-
-CREATE OR REPLACE VIEW `{{project_id}}.{{dataset}}.v_purchase_propensity_training_30_30_rare_sales`
-(processed_timestamp,
- data_split,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- active_users_past_15_30_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- purchases_past_15_30_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- visits_past_15_30_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- view_items_past_15_30_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- add_to_carts_past_15_30_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day,
- checkouts_past_15_30_day,
- will_purchase)
-OPTIONS(
- --expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
- friendly_name="v_purchase_propensity_training_30_30_rare_sales",
- description="View Purchase Propensity Training dataset using 30 days back to predict 15 days ahead. View expires after 48h and should run daily.",
- labels=[("org_unit", "development")]
-) AS
-SELECT DISTINCT
- processed_timestamp,
- data_split,
- user_pseudo_id,
- user_id,
- user_ltv_revenue,
- device_category,
- device_mobile_brand_name,
- device_mobile_model_name,
- device_os,
- device_language,
- device_web_browser,
- geo_sub_continent,
- geo_country,
- geo_region,
- geo_city,
- geo_metro,
- last_traffic_source_medium,
- last_traffic_source_name,
- last_traffic_source_source,
- first_traffic_source_medium,
- first_traffic_source_name,
- first_traffic_source_source,
- has_signed_in_with_user_id,
- active_users_past_1_day,
- active_users_past_2_day,
- active_users_past_3_day,
- active_users_past_4_day,
- active_users_past_5_day,
- active_users_past_6_day,
- active_users_past_7_day,
- active_users_past_8_14_day,
- active_users_past_15_30_day,
- purchases_past_1_day,
- purchases_past_2_day,
- purchases_past_3_day,
- purchases_past_4_day,
- purchases_past_5_day,
- purchases_past_6_day,
- purchases_past_7_day,
- purchases_past_8_14_day,
- purchases_past_15_30_day,
- visits_past_1_day,
- visits_past_2_day,
- visits_past_3_day,
- visits_past_4_day,
- visits_past_5_day,
- visits_past_6_day,
- visits_past_7_day,
- visits_past_8_14_day,
- visits_past_15_30_day,
- view_items_past_1_day,
- view_items_past_2_day,
- view_items_past_3_day,
- view_items_past_4_day,
- view_items_past_5_day,
- view_items_past_6_day,
- view_items_past_7_day,
- view_items_past_8_14_day,
- view_items_past_15_30_day,
- add_to_carts_past_1_day,
- add_to_carts_past_2_day,
- add_to_carts_past_3_day,
- add_to_carts_past_4_day,
- add_to_carts_past_5_day,
- add_to_carts_past_6_day,
- add_to_carts_past_7_day,
- add_to_carts_past_8_14_day,
- add_to_carts_past_15_30_day,
- checkouts_past_1_day,
- checkouts_past_2_day,
- checkouts_past_3_day,
- checkouts_past_4_day,
- checkouts_past_5_day,
- checkouts_past_6_day,
- checkouts_past_7_day,
- checkouts_past_8_14_day,
- checkouts_past_15_30_day,
- will_purchase
- FROM
- (SELECT DISTINCT
- *
- FROM `{{project_id}}.{{dataset}}.v_purchase_propensity_training_30_30_last_window`
- )
- UNION ALL
- (
- SELECT DISTINCT
- * EXCEPT(user_row_order, feature_date)
- FROM(
- SELECT DISTINCT
- *,
- ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, data_split ORDER BY feature_date DESC) AS user_row_order
- FROM `{{project_id}}.{{dataset}}.purchase_propensity_training_30_30`
- WHERE will_purchase = 1
- )
- WHERE
- user_row_order = 1
- LIMIT 100
- )
-;
diff --git a/sql/procedure/user_rolling_window_lead_metrics.sqlx b/sql/procedure/user_rolling_window_lead_metrics.sqlx
new file mode 100644
index 00000000..26e25155
--- /dev/null
+++ b/sql/procedure/user_rolling_window_lead_metrics.sqlx
@@ -0,0 +1,129 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Set the procedure to look back from the day before `input_date` until the day before `end_date`
+-- Subtract one day from `input_date`
+SET input_date = DATE_SUB(input_date, INTERVAL 1 DAY);
+-- Subtract one day from `end_date`
+SET end_date = DATE_SUB(end_date, INTERVAL 1 DAY);
+
+{% for feature in short_list_features %}
+-- Past User metrics: 1-day {{feature.feature_name}} events per user, 2-5-day {{feature.feature_name}} events per user
+-- Create a temporary table `rolling_{{feature.feature_name}}_past_days` to store the rolling {{feature.feature_name}} events count for each user
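+-- Illustrative rendering: with feature_name = 'scroll_50', this loop emits a temp table
+-- rolling_scroll_50_past_days with columns scroll_50_past_1_day through scroll_50_past_5_day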
+CREATE OR REPLACE TEMP TABLE rolling_{{feature.feature_name}}_past_days AS (
+SELECT
+ -- User's unique identifier
+ user_pseudo_id,
+  -- Count {{feature.feature_name}} events that occurred exactly 1 day before input_date
+  MAX(COUNT(DISTINCT CASE WHEN DATE_DIFF(input_date, event_date, DAY) = 1 THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id) AS {{feature.feature_name}}_past_1_day,
+  -- Count {{feature.feature_name}} events that occurred exactly 2 days before input_date
+  MAX(COUNT(DISTINCT CASE WHEN DATE_DIFF(input_date, event_date, DAY) = 2 THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id) AS {{feature.feature_name}}_past_2_day,
+  -- Count {{feature.feature_name}} events that occurred exactly 3 days before input_date
+  MAX(COUNT(DISTINCT CASE WHEN DATE_DIFF(input_date, event_date, DAY) = 3 THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id) AS {{feature.feature_name}}_past_3_day,
+  -- Count {{feature.feature_name}} events that occurred exactly 4 days before input_date
+  MAX(COUNT(DISTINCT CASE WHEN DATE_DIFF(input_date, event_date, DAY) = 4 THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id) AS {{feature.feature_name}}_past_4_day,
+  -- Count {{feature.feature_name}} events that occurred exactly 5 days before input_date
+  MAX(COUNT(DISTINCT CASE WHEN DATE_DIFF(input_date, event_date, DAY) = 5 THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id) AS {{feature.feature_name}}_past_5_day
+FROM `{{mds_project_id}}.{{mds_dataset}}.event` as E
+-- Filter events within the defined date range
+WHERE event_date BETWEEN end_date AND input_date
+-- Filter for {{feature.feature_name}} events
+AND event_name='{{feature.feature_name}}'
+-- Ensure valid session ID
+AND ga_session_id IS NOT NULL
+-- Group the results by user pseudo ID
+GROUP BY user_pseudo_id
+);
+
+{% endfor %}
+
+-- All users in the platform
+CREATE OR REPLACE TEMP TABLE events_users_days as (
+ SELECT DISTINCT
+ -- User pseudo ID
+ Users.user_pseudo_id,
+ -- distinct event date
+ Days.event_date as event_date
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event` Users
+  -- 'Days' is an alias for a subquery containing the distinct event dates
+ CROSS JOIN
+ (SELECT DISTINCT event_date FROM `{{mds_project_id}}.{{mds_dataset}}.event`) Days
+ INNER JOIN `{{mds_project_id}}.{{mds_dataset}}.device` as D
+ ON Users.device_type_id = D.device_type_id
+ -- Exclude events without a valid session ID
+ WHERE Users.ga_session_id IS NOT NULL
+ -- Exclude events without a valid device operating system
+ AND D.device_os IS NOT NULL
+ -- Filter events within the defined date range
+ AND Days.event_date BETWEEN end_date AND input_date)
+;
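+-- The cross join above enumerates every (user, date) pair so that users with no tracked
+-- events on a given day still receive zero-filled feature rows downstream.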
+
+-- Create a temporary table to store data for the target table
+CREATE OR REPLACE TEMP TABLE DataForTargetTable AS
+SELECT DISTINCT
+ -- Current timestamp
+ CURRENT_TIMESTAMP() AS processed_timestamp,
+ -- Feature date
+ input_date AS feature_date,
+ -- User pseudo ID
+ EUD.user_pseudo_id{% for feature in short_list_features %},
+ COALESCE({{feature.feature_name}}_past_1_day,0) AS {{feature.feature_name}}_past_1_day,
+ COALESCE({{feature.feature_name}}_past_2_day,0) AS {{feature.feature_name}}_past_2_day,
+ COALESCE({{feature.feature_name}}_past_3_day,0) AS {{feature.feature_name}}_past_3_day,
+ COALESCE({{feature.feature_name}}_past_4_day,0) AS {{feature.feature_name}}_past_4_day,
+ COALESCE({{feature.feature_name}}_past_5_day,0) AS {{feature.feature_name}}_past_5_day{% endfor %}
+ FROM events_users_days AS EUD{% for feature in short_list_features %}
+ FULL OUTER JOIN rolling_{{feature.feature_name}}_past_days AS {{feature.feature_name}}
+ ON EUD.user_pseudo_id = {{feature.feature_name}}.user_pseudo_id{% endfor %}
+ -- Exclude rows without a valid user pseudo ID
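+  -- (with this filter in place, the FULL OUTER JOINs effectively behave as LEFT JOINs from events_users_days)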
+ WHERE EUD.user_pseudo_id IS NOT NULL
+ ;
+
+-- Merge data into the target table
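+-- Matching on (feature_date, user_pseudo_id) makes the merge idempotent: re-running the
+-- procedure for the same day refreshes existing rows instead of duplicating them.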
+MERGE `{{project_id}}.{{dataset}}.{{insert_table}}` I
+USING DataForTargetTable T
+ON I.feature_date = T.feature_date
+ AND I.user_pseudo_id = T.user_pseudo_id
+WHEN MATCHED THEN
+ UPDATE SET
+ -- Update the processed timestamp and rolling window features
+ I.processed_timestamp = T.processed_timestamp{% for feature in short_list_features %},
+ I.{{feature.feature_name}}_past_1_day = T.{{feature.feature_name}}_past_1_day,
+ I.{{feature.feature_name}}_past_2_day = T.{{feature.feature_name}}_past_2_day,
+ I.{{feature.feature_name}}_past_3_day = T.{{feature.feature_name}}_past_3_day,
+ I.{{feature.feature_name}}_past_4_day = T.{{feature.feature_name}}_past_4_day,
+ I.{{feature.feature_name}}_past_5_day = T.{{feature.feature_name}}_past_5_day{% endfor %}
+WHEN NOT MATCHED THEN
+ INSERT
+ (processed_timestamp,
+ feature_date,
+ user_pseudo_id{% for feature in short_list_features %},
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day{% endfor %})
+ VALUES
+ (T.processed_timestamp,
+ T.feature_date,
+ T.user_pseudo_id{% for feature in short_list_features %},
+ T.{{feature.feature_name}}_past_1_day,
+ T.{{feature.feature_name}}_past_2_day,
+ T.{{feature.feature_name}}_past_3_day,
+ T.{{feature.feature_name}}_past_4_day,
+ T.{{feature.feature_name}}_past_5_day{% endfor %})
+;
+
+-- Set the rows_added tracking variable (it counts the distinct users now present in the target table)
+SET rows_added = (SELECT COUNT(DISTINCT user_pseudo_id) FROM `{{project_id}}.{{dataset}}.{{insert_table}}`);
diff --git a/sql/query/create_gemini_model.sqlx b/sql/query/create_gemini_model.sqlx
index 84612d8f..4e365c4a 100644
--- a/sql/query/create_gemini_model.sqlx
+++ b/sql/query/create_gemini_model.sqlx
@@ -18,6 +18,6 @@
-- Your supervised tuning computations also occur in the europe-west4 region, because that's where TPU resources are located.
-- Create a {{endpoint_name}} model using a remote connection to {{region}}.{{connection_name}}
-CREATE OR REPLACE MODEL `{{project_id}}.{{dataset}}.{{model_name}}`
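+-- IF NOT EXISTS preserves an existing model (for example, one that has already been tuned)
+-- instead of recreating it on every run, unlike the previous CREATE OR REPLACE behavior.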
+CREATE MODEL IF NOT EXISTS `{{project_id}}.{{dataset}}.{{model_name}}`
REMOTE WITH CONNECTION `{{project_id}}.{{region}}.{{connection_name}}`
OPTIONS (ENDPOINT = '{{endpoint_name}}');
\ No newline at end of file
diff --git a/sql/query/invoke_backfill_churn_propensity_label.sqlx b/sql/query/invoke_backfill_churn_propensity_label.sqlx
index 4cbe77ac..9dd41da7 100644
--- a/sql/query/invoke_backfill_churn_propensity_label.sqlx
+++ b/sql/query/invoke_backfill_churn_propensity_label.sqlx
@@ -119,7 +119,13 @@ GROUP BY
);
-- Insert data into the target table, combining user information with churn and bounce status
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ churned,
+ bounced
+)
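+-- Listing the target columns explicitly keeps the INSERT valid if the table later gains columns;
+-- without a column list, BigQuery maps the SELECT output to the table's columns by position.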
SELECT DISTINCT
-- Current timestamp as the processing timestamp
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_customer_lifetime_value_label.sqlx b/sql/query/invoke_backfill_customer_lifetime_value_label.sqlx
index 27ea59d0..569e5db5 100644
--- a/sql/query/invoke_backfill_customer_lifetime_value_label.sqlx
+++ b/sql/query/invoke_backfill_customer_lifetime_value_label.sqlx
@@ -109,7 +109,14 @@ CREATE OR REPLACE TEMP TABLE future_revenue_per_user AS (
);
-- Insert data into the target table
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ pltv_revenue_30_days,
+ pltv_revenue_90_days,
+ pltv_revenue_180_days
+)
SELECT DISTINCT
-- Current timestamp of the processing
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_lead_score_propensity_label.sqlx b/sql/query/invoke_backfill_lead_score_propensity_label.sqlx
new file mode 100644
index 00000000..eba85784
--- /dev/null
+++ b/sql/query/invoke_backfill_lead_score_propensity_label.sqlx
@@ -0,0 +1,116 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Declares a variable to store the maximum date for analysis
+DECLARE max_date DATE;
+-- Declares a variable to store the minimum date for analysis
+DECLARE min_date DATE;
+-- Sets the max_date variable to the latest event_date minus a specified number of days ({{interval_max_date}}) from the 'event' table
+SET max_date = (SELECT DATE_SUB(MAX(event_date), INTERVAL {{interval_max_date}} DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+-- Sets the min_date variable to the earliest event_date plus a specified number of days ({{interval_min_date}}) from the 'event' table
+SET min_date = (SELECT DATE_ADD(MIN(event_date), INTERVAL {{interval_min_date}} DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+
+-- If min_date falls on or after the latest event_date, max_date falls on or before the earliest event_date, or min_date >= max_date, fall back to the full available date range
+IF min_date >= (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) OR max_date <= (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) OR min_date >= max_date THEN
+ SET min_date = (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+ SET max_date = (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+END IF;
+
+-- This code block acts as a safeguard to ensure that the min_date and max_date used for further analysis are always within the bounds of the actual data available in the table.
+-- It prevents situations where calculations might mistakenly consider dates beyond the real data range, which could lead to errors or misleading results.
+IF max_date > (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) OR min_date < (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) THEN
+ SET min_date = (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+ SET max_date = (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+END IF;
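+-- Together, the two guards above clamp [min_date, max_date] to the actual extent of the event
+-- data; for example, if the configured interval offsets exceed the available history, the full
+-- data range is used instead.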
+
+-- Creates a temporary table called dates_interval to store distinct event dates and their corresponding end dates
+CREATE OR REPLACE TEMP TABLE dates_interval as (
+ SELECT DISTINCT
+ -- Selects the distinct event_date and assigns it to the column input_date
+ event_date as input_date,
+ -- Calculates the end date by adding a specified number of days ({{interval_end_date}}) to the input_date
+ DATE_ADD(event_date, INTERVAL {{interval_end_date}} DAY) as end_date
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event`
+ -- Filters the events to include only those within the defined date range (between min_date and max_date)
+ WHERE event_date BETWEEN min_date AND max_date
+ ORDER BY input_date DESC
+);
+
+-- Candidate label universe: every user who ever triggered {{target_event}}, crossed with every date in range
+-- Creates a temporary table called all_users_possible_{{target_event}}s to store user {{target_event}} data
+CREATE OR REPLACE TEMP TABLE all_users_possible_{{target_event}}s as (
+ SELECT DISTINCT
+ -- Selects the user_pseudo_id from the 'event' table and assigns it to the column user_pseudo_id
+ Users.user_pseudo_id,
+ -- Selects the event_date from the date array generated using GENERATE_DATE_ARRAY and assigns it to the column feature_date
+ DI.event_date as feature_date,
+    -- Initializes the {{target_event}}_day_1 column with NULL values
+    -- This column is populated later with next-day {{target_event}} data
+ NULL as {{target_event}}_day_1
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event` Users
+ -- Performs a cross join with a subquery that generates a date array using GENERATE_DATE_ARRAY
+ -- The date array includes dates from min_date to max_date with a 1-day interval
+ CROSS JOIN (SELECT event_date FROM UNNEST(GENERATE_DATE_ARRAY(min_date, max_date, INTERVAL 1 DAY)) as event_date) as DI
+ -- Filters the data to include events where event_name is '{{target_event}}'
+ WHERE LOWER(Users.event_name) IN ('{{target_event}}')
+ AND Users.ga_session_id IS NOT NULL
+ );
+
+-- Creates a temporary table called future_{{target_event}}s_per_user to store user {{target_event}} data in the future
+-- Future user metrics: 1-day future {{target_event}}s per user
+CREATE OR REPLACE TEMP TABLE future_{{target_event}}s_per_user AS (
+ SELECT
+ -- Selects user_pseudo_id from the event table and assigns it to column user_pseudo_id
+ user_pseudo_id,
+ -- Selects input_date from the dates_interval table and assigns it to column feature_date
+ input_date as feature_date,
+    -- Counts the distinct {{target_event}} events that occur exactly 1 day after input_date,
+    -- aggregated over a window partitioned by user_pseudo_id and input_date.
+    -- Note: counting event_timestamp (rather than ecommerce.transaction_id) so the logic also works for non-purchase target events
+    MAX(COUNT(DISTINCT CASE DATE_DIFF(event_date, input_date, DAY) = 1 WHEN TRUE THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id, input_date) AS {{target_event}}_day_1
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event` as E
+ INNER JOIN `{{mds_project_id}}.{{mds_dataset}}.device` as D
+ ON E.device_type_id = D.device_type_id
+ CROSS JOIN dates_interval as DI
+ -- Filters events to be within the date range defined by input_date and end_date from dates_interval
+ WHERE E.event_date BETWEEN DI.input_date AND DI.end_date
+ AND LOWER(E.event_name) IN ('{{target_event}}')
+ AND E.ga_session_id IS NOT NULL
+ AND D.device_os IS NOT NULL
+ -- Groups the result by user_pseudo_id and feature_date
+ GROUP BY user_pseudo_id, feature_date
+);
+
+-- Inserts data into the target table
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ {{target_event}}_day_1
+)
+SELECT DISTINCT
+ -- Selects the current timestamp and assigns it to the column processed_timestamp
+ CURRENT_TIMESTAMP() AS processed_timestamp,
+ -- Selects the feature_date from the all_users_possible_{{target_event}}s table and assigns it to the column feature_date
+ A.feature_date,
+ -- Selects the user_pseudo_id from the all_users_possible_{{target_event}}s table and assigns it to the column user_pseudo_id
+ A.user_pseudo_id,
+  -- COALESCE replaces a missing next-day count with 0, and LEAST caps the result at 1
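+  -- Net effect: {{target_event}}_day_1 is a binary label, 1 if the user triggered the target
+  -- event on the day after feature_date and 0 otherwise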
+ LEAST(COALESCE(B.{{target_event}}_day_1, 0), 1) AS {{target_event}}_day_1
+FROM all_users_possible_{{target_event}}s AS A
+-- Performs a left join with the future_{{target_event}}s_per_user table (aliased as B) using user_pseudo_id and feature_date
+LEFT JOIN future_{{target_event}}s_per_user AS B
+ON B.user_pseudo_id = A.user_pseudo_id AND B.feature_date = A.feature_date
+;
\ No newline at end of file
diff --git a/sql/query/invoke_backfill_purchase_propensity_label.sqlx b/sql/query/invoke_backfill_purchase_propensity_label.sqlx
index a2c8bee0..b062dc58 100644
--- a/sql/query/invoke_backfill_purchase_propensity_label.sqlx
+++ b/sql/query/invoke_backfill_purchase_propensity_label.sqlx
@@ -125,7 +125,26 @@ CREATE OR REPLACE TEMP TABLE future_purchases_per_user AS (
);
-- Inserts data into the target table
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ purchase_day_1,
+ purchase_day_2,
+ purchase_day_3,
+ purchase_day_4,
+ purchase_day_5,
+ purchase_day_6,
+ purchase_day_7,
+ purchase_day_8,
+ purchase_day_9,
+ purchase_day_10,
+ purchase_day_11,
+ purchase_day_12,
+ purchase_day_13,
+ purchase_day_14,
+ purchase_day_15_30
+)
SELECT DISTINCT
-- Selects the current timestamp and assigns it to the column processed_timestamp
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_dimensions.sqlx b/sql/query/invoke_backfill_user_dimensions.sqlx
index c27dd299..6c81b412 100644
--- a/sql/query/invoke_backfill_user_dimensions.sqlx
+++ b/sql/query/invoke_backfill_user_dimensions.sqlx
@@ -122,7 +122,31 @@ CREATE OR REPLACE TEMP TABLE events_users as (
;
-- Inserting aggregated user data into the target table.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id
+)
SELECT DISTINCT
-- Timestamp of the data processing
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_lifetime_dimensions.sqlx b/sql/query/invoke_backfill_user_lifetime_dimensions.sqlx
index 4001878f..b05611e0 100644
--- a/sql/query/invoke_backfill_user_lifetime_dimensions.sqlx
+++ b/sql/query/invoke_backfill_user_lifetime_dimensions.sqlx
@@ -137,7 +137,31 @@ CREATE OR REPLACE TEMP TABLE events_users as (
-- This code block inserts data into the specified table, combining information from the "events_users" table
-- and the "user_dimensions_event_session_scoped" table.
-- It aggregates user-level features for each user and date.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id
+)
SELECT DISTINCT
-- The current timestamp.
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_lookback_metrics.sqlx b/sql/query/invoke_backfill_user_lookback_metrics.sqlx
index 25e3566b..37bd4563 100644
--- a/sql/query/invoke_backfill_user_lookback_metrics.sqlx
+++ b/sql/query/invoke_backfill_user_lookback_metrics.sqlx
@@ -230,7 +230,25 @@ AND D.device_os IS NOT NULL
-- This code is part of a larger process for building a machine learning model that predicts
-- user behavior based on their past activity. The features generated by this code can be used
-- as input to the model, helping it learn patterns and make predictions.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ active_users_past_1_7_day,
+ active_users_past_8_14_day,
+ purchases_past_1_7_day,
+ purchases_past_8_14_day,
+ visits_past_1_7_day,
+ visits_past_8_14_day,
+ view_items_past_1_7_day,
+ view_items_past_8_14_day,
+ add_to_carts_past_1_7_day,
+ add_to_carts_past_8_14_day,
+ checkouts_past_1_7_day,
+ checkouts_past_8_14_day,
+ ltv_revenue_past_1_7_day,
+ ltv_revenue_past_7_15_day
+)
SELECT DISTINCT
-- Timestamp indicating when the data was processed
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_rolling_window_lead_metrics.sqlx b/sql/query/invoke_backfill_user_rolling_window_lead_metrics.sqlx
new file mode 100644
index 00000000..b8b364cc
--- /dev/null
+++ b/sql/query/invoke_backfill_user_rolling_window_lead_metrics.sqlx
@@ -0,0 +1,127 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- This SQL code defines a series of temporary tables to calculate and store user engagement metrics based on
+-- rolling window aggregations. These tables are then used to populate a target table with daily user engagement features.
+
+DECLARE max_date DATE;
+DECLARE min_date DATE;
+-- Sets max_date to the latest event_date from the event table, minus an offset specified by the interval_max_date
+SET max_date = (SELECT DATE_SUB(MAX(event_date), INTERVAL {{interval_max_date}} DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+-- Sets min_date to the earliest event_date from the event table, plus an offset specified by the interval_min_date
+SET min_date = (SELECT DATE_ADD(MIN(event_date), INTERVAL {{interval_min_date}} DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+
+-- If min_date falls on or after the latest event_date, max_date falls on or before the earliest event_date, or min_date >= max_date, fall back to the full available date range
+IF min_date >= (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) OR max_date <= (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) OR min_date >= max_date THEN
+ SET min_date = (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+ SET max_date = (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+END IF;
+
+-- This code block acts as a safeguard to ensure that the min_date and max_date used for further analysis are always within the bounds of the actual data available in the table.
+-- It prevents situations where calculations might mistakenly consider dates beyond the real data range, which could lead to errors or misleading results.
+IF max_date > (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) OR min_date < (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) THEN
+ SET min_date = (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+ SET max_date = (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+END IF;
+
+-- This section determines the date range for analysis and creates a temporary table dates_interval with distinct date intervals.
+CREATE OR REPLACE TEMP TABLE dates_interval as (
+ SELECT DISTINCT
+ -- Select each distinct event_date as 'input_date', representing the current date in the analysis
+ event_date as input_date,
+ -- Calculate the 'end_date' by subtracting a specified interval from the 'input_date'
+ DATE_SUB(event_date, INTERVAL {{interval_end_date}} DAY) as end_date
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event`
+ WHERE event_date BETWEEN min_date AND max_date
+ ORDER BY input_date DESC
+);
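+-- Each input_date is paired with an end_date {{interval_end_date}} days earlier; the rolling
+-- features below only count events that fall inside that lookback window.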
+
+{% for feature in short_list_features %}
+-- Run these window aggregations every day, for each date in the training and inference date ranges.
+-- Per-user metrics: 1-5-day {{feature.feature_name}} counts
+CREATE OR REPLACE TEMP TABLE rolling_{{feature.feature_name}}_past_days AS (
+ SELECT
+ user_pseudo_id,
+ input_date as feature_date,
+    -- Number of {{feature.feature_name}} events on the 1st day before input_date
+    MAX(COUNT(DISTINCT CASE DATE_DIFF(input_date, event_date, DAY) = 1 WHEN TRUE THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id, input_date) AS {{feature.feature_name}}_past_1_day,
+    -- Number of {{feature.feature_name}} events on the 2nd day before input_date
+    MAX(COUNT(DISTINCT CASE DATE_DIFF(input_date, event_date, DAY) = 2 WHEN TRUE THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id, input_date) AS {{feature.feature_name}}_past_2_day,
+    -- Number of {{feature.feature_name}} events on the 3rd day before input_date
+    MAX(COUNT(DISTINCT CASE DATE_DIFF(input_date, event_date, DAY) = 3 WHEN TRUE THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id, input_date) AS {{feature.feature_name}}_past_3_day,
+    -- Number of {{feature.feature_name}} events on the 4th day before input_date
+    MAX(COUNT(DISTINCT CASE DATE_DIFF(input_date, event_date, DAY) = 4 WHEN TRUE THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id, input_date) AS {{feature.feature_name}}_past_4_day,
+    -- Number of {{feature.feature_name}} events on the 5th day before input_date
+    MAX(COUNT(DISTINCT CASE DATE_DIFF(input_date, event_date, DAY) = 5 WHEN TRUE THEN event_timestamp END)) OVER(PARTITION BY user_pseudo_id, input_date) AS {{feature.feature_name}}_past_5_day
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event` as E
+ CROSS JOIN dates_interval as DI
+ -- Filter events to be within the defined date range
+ WHERE E.event_date BETWEEN DI.end_date AND DI.input_date
+ -- Filter for {{feature.feature_name}} events
+ AND event_name='{{feature.feature_name}}'
+ -- Ensure valid session ID
+ AND ga_session_id IS NOT NULL
+ -- Group the results by user pseudo ID and feature date
+ GROUP BY user_pseudo_id, feature_date
+);
+
+{% endfor %}
+
+-- All users in the platform
+-- This code creates a temporary table that contains a distinct list of user pseudo IDs
+-- and their corresponding feature dates, filtering for events with valid session IDs,
+-- device operating systems, and falling within the specified date range.
+CREATE OR REPLACE TEMP TABLE events_users as (
+ SELECT DISTINCT
+ Users.user_pseudo_id,
+ DI.input_date as feature_date
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event` Users
+ INNER JOIN `{{mds_project_id}}.{{mds_dataset}}.device` as D
+ ON Users.device_type_id = D.device_type_id
+ CROSS JOIN dates_interval as DI
+ WHERE Users.ga_session_id IS NOT NULL
+ AND Users.event_date BETWEEN DI.end_date AND DI.input_date
+ AND D.device_os IS NOT NULL
+);
+
+-- This code block inserts data into a table, combining information from the events_users
+-- table and several temporary tables containing rolling window features. The resulting data
+-- represents user-level features for each user and date, capturing their past activity within
+-- different time windows.
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id{% for feature in short_list_features %},
+ {{feature.feature_name}}_past_1_day,
+ {{feature.feature_name}}_past_2_day,
+ {{feature.feature_name}}_past_3_day,
+ {{feature.feature_name}}_past_4_day,
+ {{feature.feature_name}}_past_5_day{% endfor %}
+)
+ SELECT DISTINCT
+ -- This selects the current timestamp and assigns it to the column processed_timestamp.
+ CURRENT_TIMESTAMP() AS processed_timestamp,
+ EUD.feature_date,
+ EUD.user_pseudo_id{% for feature in short_list_features %},
+ COALESCE({{feature.feature_name}}_past_1_day,0) AS {{feature.feature_name}}_past_1_day,
+ COALESCE({{feature.feature_name}}_past_2_day,0) AS {{feature.feature_name}}_past_2_day,
+ COALESCE({{feature.feature_name}}_past_3_day,0) AS {{feature.feature_name}}_past_3_day,
+ COALESCE({{feature.feature_name}}_past_4_day,0) AS {{feature.feature_name}}_past_4_day,
+ COALESCE({{feature.feature_name}}_past_5_day,0) AS {{feature.feature_name}}_past_5_day{% endfor %}
+ FROM events_users AS EUD{% for feature in short_list_features %}
+  FULL OUTER JOIN rolling_{{feature.feature_name}}_past_days AS {{feature.feature_name}}
+  ON EUD.user_pseudo_id = {{feature.feature_name}}.user_pseudo_id
+  AND EUD.feature_date = {{feature.feature_name}}.feature_date{% endfor %}
+ -- This filters the results to include only rows where the user_pseudo_id is not null.
+ WHERE EUD.user_pseudo_id IS NOT NULL
+ ;
\ No newline at end of file
diff --git a/sql/query/invoke_backfill_user_rolling_window_lifetime_metrics.sqlx b/sql/query/invoke_backfill_user_rolling_window_lifetime_metrics.sqlx
index 2ee219f1..b4a0a415 100644
--- a/sql/query/invoke_backfill_user_rolling_window_lifetime_metrics.sqlx
+++ b/sql/query/invoke_backfill_user_rolling_window_lifetime_metrics.sqlx
@@ -283,7 +283,50 @@ AND D.device_os IS NOT NULL
-- This code is part of a larger process for building a machine learning model that predicts
-- user behavior based on their past activity. The features generated by this code can be used
-- as input to the model, helping it learn patterns and make predictions.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ active_users_past_1_30_day,
+ active_users_past_30_60_day,
+ active_users_past_60_90_day,
+ active_users_past_90_120_day,
+ active_users_past_120_150_day,
+ active_users_past_150_180_day,
+ purchases_past_1_30_day,
+ purchases_past_30_60_day,
+ purchases_past_60_90_day,
+ purchases_past_90_120_day,
+ purchases_past_120_150_day,
+ purchases_past_150_180_day,
+ visits_past_1_30_day,
+ visits_past_30_60_day,
+ visits_past_60_90_day,
+ visits_past_90_120_day,
+ visits_past_120_150_day,
+ visits_past_150_180_day,
+ view_items_past_1_30_day,
+ view_items_past_30_60_day,
+ view_items_past_60_90_day,
+ view_items_past_90_120_day,
+ view_items_past_120_150_day,
+ view_items_past_150_180_day,
+ add_to_carts_past_1_30_day,
+ add_to_carts_past_30_60_day,
+ add_to_carts_past_60_90_day,
+ add_to_carts_past_90_120_day,
+ add_to_carts_past_120_150_day,
+ add_to_carts_past_150_180_day,
+ checkouts_past_1_30_day,
+ checkouts_past_30_60_day,
+ checkouts_past_60_90_day,
+ checkouts_past_90_120_day,
+ checkouts_past_120_150_day,
+ checkouts_past_150_180_day,
+ ltv_revenue_past_1_30_day,
+ ltv_revenue_past_30_90_day,
+ ltv_revenue_past_90_180_day
+)
SELECT DISTINCT
-- This selects the current timestamp and assigns it to the column processed_timestamp.
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_rolling_window_metrics.sqlx b/sql/query/invoke_backfill_user_rolling_window_metrics.sqlx
index 9317225a..be0a0860 100644
--- a/sql/query/invoke_backfill_user_rolling_window_metrics.sqlx
+++ b/sql/query/invoke_backfill_user_rolling_window_metrics.sqlx
@@ -272,7 +272,65 @@ CREATE OR REPLACE TEMP TABLE events_users as (
-- table and several temporary tables containing rolling window features. The resulting data
-- represents user-level features for each user and date, capturing their past activity within
-- different time windows.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ active_users_past_1_day,
+ active_users_past_2_day,
+ active_users_past_3_day,
+ active_users_past_4_day,
+ active_users_past_5_day,
+ active_users_past_6_day,
+ active_users_past_7_day,
+ active_users_past_8_14_day,
+ active_users_past_15_30_day,
+ purchases_past_1_day,
+ purchases_past_2_day,
+ purchases_past_3_day,
+ purchases_past_4_day,
+ purchases_past_5_day,
+ purchases_past_6_day,
+ purchases_past_7_day,
+ purchases_past_8_14_day,
+ purchases_past_15_30_day,
+ visits_past_1_day,
+ visits_past_2_day,
+ visits_past_3_day,
+ visits_past_4_day,
+ visits_past_5_day,
+ visits_past_6_day,
+ visits_past_7_day,
+ visits_past_8_14_day,
+ visits_past_15_30_day,
+ view_items_past_1_day,
+ view_items_past_2_day,
+ view_items_past_3_day,
+ view_items_past_4_day,
+ view_items_past_5_day,
+ view_items_past_6_day,
+ view_items_past_7_day,
+ view_items_past_8_14_day,
+ view_items_past_15_30_day,
+ add_to_carts_past_1_day,
+ add_to_carts_past_2_day,
+ add_to_carts_past_3_day,
+ add_to_carts_past_4_day,
+ add_to_carts_past_5_day,
+ add_to_carts_past_6_day,
+ add_to_carts_past_7_day,
+ add_to_carts_past_8_14_day,
+ add_to_carts_past_15_30_day,
+ checkouts_past_1_day,
+ checkouts_past_2_day,
+ checkouts_past_3_day,
+ checkouts_past_4_day,
+ checkouts_past_5_day,
+ checkouts_past_6_day,
+ checkouts_past_7_day,
+ checkouts_past_8_14_day,
+ checkouts_past_15_30_day
+)
SELECT DISTINCT
-- This selects the current timestamp and assigns it to the column processed_timestamp.
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_scoped_lifetime_metrics.sqlx b/sql/query/invoke_backfill_user_scoped_lifetime_metrics.sqlx
index bfb93869..ed4bf30e 100644
--- a/sql/query/invoke_backfill_user_scoped_lifetime_metrics.sqlx
+++ b/sql/query/invoke_backfill_user_scoped_lifetime_metrics.sqlx
@@ -163,7 +163,35 @@ CREATE OR REPLACE TEMP TABLE first_purchasers as (
);
-- This SQL code calculates various user engagement and revenue metrics at a daily level and inserts the results into a target table. It leverages several temporary tables created earlier in the script to aggregate data efficiently.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ lifetime_purchasers_users,
+ lifetime_average_daily_purchasers,
+ lifetime_active_users,
+ lifetime_DAU,
+ lifetime_MAU,
+ lifetime_WAU,
+ lifetime_dau_per_mau,
+ lifetime_dau_per_wau,
+ lifetime_wau_per_mau,
+ lifetime_users_engagement_duration_seconds,
+ lifetime_average_engagement_time,
+ lifetime_average_engagement_time_per_session,
+ lifetime_average_sessions_per_user,
+ lifetime_ARPPU,
+ lifetime_ARPU,
+ lifetime_average_daily_revenue,
+ lifetime_max_daily_revenue,
+ lifetime_min_daily_revenue,
+ lifetime_new_users,
+ lifetime_returning_users,
+ lifetime_first_time_purchasers,
+ lifetime_first_time_purchaser_conversion,
+ lifetime_first_time_purchasers_per_new_user,
+ lifetime_avg_user_conversion_rate,
+ lifetime_avg_session_conversion_rate
+)
SELECT
-- Records the current timestamp when the query is executed.
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_scoped_metrics.sqlx b/sql/query/invoke_backfill_user_scoped_metrics.sqlx
index 3cc45b49..c5252519 100644
--- a/sql/query/invoke_backfill_user_scoped_metrics.sqlx
+++ b/sql/query/invoke_backfill_user_scoped_metrics.sqlx
@@ -183,7 +183,35 @@ CREATE OR REPLACE TEMP TABLE new_users_ as (
);
-- Insert data into the target table after calculating various user engagement and revenue metrics.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ purchasers_users,
+ average_daily_purchasers,
+ active_users,
+ DAU,
+ MAU,
+ WAU,
+ dau_per_mau,
+ dau_per_wau,
+ wau_per_mau,
+ users_engagement_duration_seconds,
+ average_engagement_time,
+ average_engagement_time_per_session,
+ average_sessions_per_user,
+ ARPPU,
+ ARPU,
+ average_daily_revenue,
+ max_daily_revenue,
+ min_daily_revenue,
+ new_users,
+ returning_users,
+ first_time_purchasers,
+ first_time_purchaser_conversion,
+ first_time_purchasers_per_new_user,
+ avg_user_conversion_rate,
+ avg_session_conversion_rate
+)
SELECT DISTINCT
-- Record the current timestamp when the query is executed.
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_scoped_segmentation_metrics.sqlx b/sql/query/invoke_backfill_user_scoped_segmentation_metrics.sqlx
index c6f03aaa..251dfead 100644
--- a/sql/query/invoke_backfill_user_scoped_segmentation_metrics.sqlx
+++ b/sql/query/invoke_backfill_user_scoped_segmentation_metrics.sqlx
@@ -136,7 +136,35 @@ GROUP BY feature_date
);
-- This SQL code calculates various user engagement and revenue metrics at a daily level and inserts the results into a target table. It leverages several temporary tables created earlier in the script to aggregate data efficiently.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ purchasers_users,
+ average_daily_purchasers,
+ active_users,
+ DAU,
+ MAU,
+ WAU,
+ dau_per_mau,
+ dau_per_wau,
+ wau_per_mau,
+ users_engagement_duration_seconds,
+ average_engagement_time,
+ average_engagement_time_per_session,
+ average_sessions_per_user,
+ ARPPU,
+ ARPU,
+ average_daily_revenue,
+ max_daily_revenue,
+ min_daily_revenue,
+ new_users,
+ returning_users,
+ first_time_purchasers,
+ first_time_purchaser_conversion,
+ first_time_purchasers_per_new_user,
+ avg_user_conversion_rate,
+ avg_session_conversion_rate
+)
SELECT
-- Records the current timestamp when the query is executed.
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_segmentation_dimensions.sqlx b/sql/query/invoke_backfill_user_segmentation_dimensions.sqlx
index be402415..cf2dc7ff 100644
--- a/sql/query/invoke_backfill_user_segmentation_dimensions.sqlx
+++ b/sql/query/invoke_backfill_user_segmentation_dimensions.sqlx
@@ -95,7 +95,31 @@ CREATE OR REPLACE TEMP TABLE events_users as (
-- This code snippet performs a complex aggregation and insertion operation. It combines data from two temporary tables,
-- calculates various user-level dimensions, and inserts the aggregated results into a target table. The use of window functions,
-- approximate aggregation, and careful joining ensures that the query is efficient and produces meaningful insights from the data.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ user_id,
+ user_ltv_revenue,
+ device_category,
+ device_mobile_brand_name,
+ device_mobile_model_name,
+ device_os,
+ device_language,
+ device_web_browser,
+ geo_sub_continent,
+ geo_country,
+ geo_region,
+ geo_city,
+ geo_metro,
+ last_traffic_source_medium,
+ last_traffic_source_name,
+ last_traffic_source_source,
+ first_traffic_source_medium,
+ first_traffic_source_name,
+ first_traffic_source_source,
+ has_signed_in_with_user_id
+)
-- The DISTINCT keyword ensures that only unique rows are inserted, eliminating any potential duplicates.
SELECT DISTINCT
CURRENT_TIMESTAMP() AS processed_timestamp,
diff --git a/sql/query/invoke_backfill_user_session_event_aggregated_metrics.sqlx b/sql/query/invoke_backfill_user_session_event_aggregated_metrics.sqlx
index 7ba0e2f7..4c6f3373 100644
--- a/sql/query/invoke_backfill_user_session_event_aggregated_metrics.sqlx
+++ b/sql/query/invoke_backfill_user_session_event_aggregated_metrics.sqlx
@@ -354,7 +354,45 @@ CREATE OR REPLACE TEMP TABLE events_users_days as (
-- user_events_per_day_event_scoped (UEPDES): Contains user-level event metrics aggregated on a daily basis. Metrics include add_to_carts, cart_to_view_rate, checkouts, ecommerce_purchases, etc.
-- repeated_purchase (R): Stores information about whether a user has made previous purchases, indicated by the how_many_purchased_before column.
-- cart_to_purchase (CP): Contains a flag (has_abandoned_cart) indicating whether a user abandoned their cart on a given day.
-INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}`
+INSERT INTO `{{project_id}}.{{dataset}}.{{insert_table}}` (
+ processed_timestamp,
+ feature_date,
+ user_pseudo_id,
+ engagement_rate,
+ engaged_sessions_per_user,
+ session_conversion_rate,
+ bounces,
+ bounce_rate_per_user,
+ sessions_per_user,
+ avg_views_per_session,
+ sum_engagement_time_seconds,
+ avg_engagement_time_seconds,
+ new_visits,
+ returning_visits,
+ add_to_carts,
+ cart_to_view_rate,
+ checkouts,
+ ecommerce_purchases,
+ ecommerce_quantity,
+ ecommerce_revenue,
+ item_revenue,
+ item_quantity,
+ item_refund_amount,
+ item_view_events,
+ items_clicked_in_promotion,
+ items_clicked_in_list,
+ items_checked_out,
+ items_added_to_cart,
+ item_list_click_events,
+ item_list_view_events,
+ purchase_revenue,
+ purchase_to_view_rate,
+ refunds,
+ transactions_per_purchaser,
+ user_conversion_rate,
+ how_many_purchased_before,
+ has_abandoned_cart
+)
SELECT
CURRENT_TIMESTAMP() AS processed_timestamp,
EUD.feature_date,
diff --git a/sql/query/invoke_churn_propensity_training_preparation.sqlx b/sql/query/invoke_churn_propensity_training_preparation.sqlx
index 632fb03b..10a48ef4 100644
--- a/sql/query/invoke_churn_propensity_training_preparation.sqlx
+++ b/sql/query/invoke_churn_propensity_training_preparation.sqlx
@@ -57,14 +57,14 @@ SET churners = (SELECT COUNT(DISTINCT user_pseudo_id)
);
-- Setting Training Dates
--- If there are churners in the training set, then keep the user-defined dates, or else set
--- the start and end dates instead.
+-- If there are churners in the training set, then keep the calculated dates, or else set
+-- the start and end dates to a fixed interval preventing `train_start_date` and `train_end_date` from being NULL.
IF churners > 0 THEN
- SET train_start_date = GREATEST(train_start_date, min_date);
- SET train_end_date = LEAST(train_end_date, max_date);
-ELSE
SET train_start_date = min_date;
SET train_end_date = max_date;
+ELSE
+ SET train_start_date = DATE_SUB(CURRENT_DATE(), INTERVAL 3 YEAR);
+ SET train_end_date = DATE_SUB(CURRENT_DATE(), INTERVAL 5 DAY);
END IF;
-- Finally, the script calls a stored procedure, passing the adjusted training dates and split numbers as arguments.
diff --git a/sql/query/invoke_customer_lifetime_value_training_preparation.sqlx b/sql/query/invoke_customer_lifetime_value_training_preparation.sqlx
index 597dcac8..cfbad806 100644
--- a/sql/query/invoke_customer_lifetime_value_training_preparation.sqlx
+++ b/sql/query/invoke_customer_lifetime_value_training_preparation.sqlx
@@ -54,17 +54,18 @@ SET validation_split_end_number = {{validation_split_end_number}};
-- IF there are no users in the time interval selected, then set "train_start_date" and "train_end_date" as "max_date" and "min_date".
SET purchasers = (SELECT COUNT(DISTINCT user_pseudo_id)
FROM `{{mds_project_id}}.{{mds_dataset}}.event`
- WHERE event_date BETWEEN train_start_date AND train_end_date
+ WHERE event_date BETWEEN min_date AND max_date
);
--- If there are purchasers no changes to the train_start_date and train_end_date
--- Else, expand the interval, hopefully a purchaser will be in the interval
+-- Setting Training Dates
+-- If there are purchasers in the training set, then keep the calculated dates, or else set
+-- the start and end dates to a fixed interval preventing `train_start_date` and `train_end_date` from being NULL.
IF purchasers > 0 THEN
- SET train_start_date = train_start_date;
- SET train_end_date = train_end_date;
-ELSE
SET train_start_date = min_date;
SET train_end_date = max_date;
+ELSE
+ SET train_start_date = DATE_SUB(CURRENT_DATE(), INTERVAL 3 YEAR);
+ SET train_end_date = DATE_SUB(CURRENT_DATE(), INTERVAL 5 DAY);
END IF;
-- Finally, the script calls a stored procedure, passing the adjusted training dates and split numbers as arguments. This stored procedure likely handles the actual data preparation for the model.
diff --git a/sql/query/invoke_lead_score_propensity_inference_preparation.sqlx b/sql/query/invoke_lead_score_propensity_inference_preparation.sqlx
new file mode 100644
index 00000000..54e937d7
--- /dev/null
+++ b/sql/query/invoke_lead_score_propensity_inference_preparation.sqlx
@@ -0,0 +1,23 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- This script determines the current date and then passes it as an argument to a
+-- stored procedure in your BigQuery project. This pattern is commonly used when
+-- you want a stored procedure to perform operations or calculations that are
+-- relevant to the current date, such as data processing, analysis, or reporting tasks.
+
+DECLARE inference_date DATE DEFAULT NULL;
+SET inference_date = CURRENT_DATE();
+
+CALL `{{project_id}}.{{dataset}}.{{stored_procedure}}`(inference_date);
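+
+-- For ad-hoc backfills, the same pattern can pin an explicit date instead of
+-- CURRENT_DATE(); a minimal sketch (the date below is illustrative):
+--
+--   DECLARE inference_date DATE DEFAULT DATE '2024-01-31';
+--   CALL `{{project_id}}.{{dataset}}.{{stored_procedure}}`(inference_date);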
diff --git a/sql/query/invoke_lead_score_propensity_label.sqlx b/sql/query/invoke_lead_score_propensity_label.sqlx
new file mode 100644
index 00000000..f4288278
--- /dev/null
+++ b/sql/query/invoke_lead_score_propensity_label.sqlx
@@ -0,0 +1,39 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- This script sets up a date range, calls a stored procedure with this range and a variable to
+-- store a result, and then returns the result of the stored procedure. This pattern is common
+-- for orchestrating data processing tasks within BigQuery using stored procedures.
+
+DECLARE input_date DATE;
+DECLARE end_date DATE;
+DECLARE users_added INT64 DEFAULT NULL;
+
+SET end_date= CURRENT_DATE();
+SET input_date= (SELECT DATE_SUB(end_date, INTERVAL {{interval_input_date}} DAY));
+
+-- This code block ensures that the end_date used in subsequent operations is not later than one day after the latest available data in
+-- the specified events table. This prevents potential attempts to process data for a date range that extends beyond the actual data availability.
+IF (SELECT DATE_SUB(end_date, INTERVAL 1 DAY)) > (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) THEN
+ SET end_date = (SELECT DATE_ADD(MAX(event_date), INTERVAL 1 DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+END IF;
+
+-- This code block ensures that the input_date used in subsequent operations is not before the earliest available data in the
+-- specified events table. This prevents potential errors or unexpected behavior that might occur when trying to process data
+-- for a date range that precedes the actual data availability.
+IF input_date < (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) THEN
+ SET input_date = (SELECT DATE_ADD(MIN(event_date), INTERVAL 1 DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+END IF;
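+
+-- Worked example of the two clamps above, under assumed dates: with
+-- CURRENT_DATE() = 2024-06-15, {{interval_input_date}} = 30, and events spanning
+-- 2024-01-01 through 2024-06-10, end_date is pulled back to 2024-06-11
+-- (max event date + 1 day) while input_date stays at 2024-05-16.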
+
+CALL `{{project_id}}.{{dataset}}.{{stored_procedure}}`(input_date, end_date, users_added);
\ No newline at end of file
diff --git a/sql/query/invoke_lead_score_propensity_training_preparation.sqlx b/sql/query/invoke_lead_score_propensity_training_preparation.sqlx
new file mode 100644
index 00000000..3d515348
--- /dev/null
+++ b/sql/query/invoke_lead_score_propensity_training_preparation.sqlx
@@ -0,0 +1,73 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- This script intelligently determines the optimal date range for training a lead
+-- score propensity model by considering user-defined parameters and the availability
+-- of {{target_event}} events within the dataset. It ensures that the training data
+-- includes {{target_event}} events if they exist within the specified bounds.
+
+-- Intended start and end dates for training data
+-- Initializing Training Dates
+DECLARE train_start_date DATE DEFAULT NULL;
+DECLARE train_end_date DATE DEFAULT NULL;
+
+-- Control data splitting for training and validation (likely used in a subsequent process).
+DECLARE train_split_end_number INT64 DEFAULT NULL;
+DECLARE validation_split_end_number INT64 DEFAULT NULL;
+
+-- Will store the count of distinct users who made a {{target_event}} within a given period.
+DECLARE {{target_event}}_users INT64 DEFAULT NULL;
+
+-- Used to store the maximum and minimum event dates from the source data.
+DECLARE max_date DATE;
+DECLARE min_date DATE;
+
+-- Determining Maximum and Minimum Dates
+SET max_date = (SELECT DATE_SUB(MAX(event_date), INTERVAL {{interval_max_date}} DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+SET min_date = (SELECT DATE_ADD(MIN(event_date), INTERVAL {{interval_min_date}} DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+
+-- If min_date is on or after the maximum event_date, max_date is on or before the minimum event_date, or min_date is not before max_date, then reset min_date and max_date to the minimum and maximum event_date
+IF min_date >= (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) OR max_date <= (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) OR min_date >= max_date THEN
+ SET min_date = (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+ SET max_date = (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
+END IF;
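+
+-- For example, with events from 2024-01-01 to 2024-06-30, {{interval_max_date}} = 30
+-- and {{interval_min_date}} = 30 (assumed values), max_date becomes 2024-05-31 and
+-- min_date becomes 2024-01-31; the IF above only intervenes when such trimming
+-- would leave an empty or inverted window.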
+
+-- Setting Split Numbers
+-- Sets the train_split_end_number to a user-defined value. This value likely determines the proportion of data used for training.
+SET train_split_end_number = {{train_split_end_number}}; -- If you want 60% for training use number 5. If you want 80% use number 7.
+-- Sets the validation_split_end_number to a user-defined value, controlling the proportion of data used for validation.
+SET validation_split_end_number = {{validation_split_end_number}};
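+
+-- A minimal sketch of how such split numbers are typically consumed downstream
+-- (assumed bucketing, not necessarily the exact procedure logic): each user is
+-- hashed into one of 10 buckets; buckets at or below train_split_end_number go
+-- to TRAIN, buckets up to validation_split_end_number go to VALIDATION, and the
+-- rest to TEST.
+--
+--   SELECT user_pseudo_id,
+--          MOD(ABS(FARM_FINGERPRINT(user_pseudo_id)), 10) AS split_bucket
+--   FROM `{{mds_project_id}}.{{mds_dataset}}.event`;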
+
+-- This crucial step counts distinct users who have an event named '{{target_event}}' within the initially set training date range.
+-- If there are no users with the {{target_event}} event in the selected time interval, the training dates fall back to a fixed interval in the IF block below.
+SET {{target_event}}_users = (SELECT COUNT(DISTINCT user_pseudo_id)
+ FROM `{{mds_project_id}}.{{mds_dataset}}.event`
+ WHERE event_name = '{{target_event}}' AND
+ event_date BETWEEN min_date AND max_date
+ );
+
+-- Setting Training Dates
+-- If there are {{target_event}}_users in the training set, then keep the calculated dates, or else set
+-- the start and end dates to a fixed interval preventing `train_start_date` and `train_end_date` from being NULL.
+IF {{target_event}}_users > 0 THEN
+ SET train_start_date = min_date;
+ SET train_end_date = max_date;
+ELSE
+ SET train_start_date = DATE_SUB(CURRENT_DATE(), INTERVAL 3 YEAR);
+ SET train_end_date = DATE_SUB(CURRENT_DATE(), INTERVAL 5 DAY);
+END IF;
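+
+-- Illustration, assuming today is 2024-06-15: the ELSE branch yields
+-- train_start_date = 2021-06-15 and train_end_date = 2024-06-10, a wide fixed
+-- window that guarantees non-NULL dates even when no {{target_event}} users exist.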
+
+-- Finally, the script calls a stored procedure, passing the adjusted training dates and split numbers as arguments. This stored procedure
+-- handles the actual data preparation for the lead score propensity model.
+CALL `{{project_id}}.{{dataset}}.{{stored_procedure}}`(train_start_date, train_end_date, train_split_end_number, validation_split_end_number);
diff --git a/sql/query/invoke_purchase_propensity_training_preparation.sqlx b/sql/query/invoke_purchase_propensity_training_preparation.sqlx
index 4d2eab86..b8738465 100644
--- a/sql/query/invoke_purchase_propensity_training_preparation.sqlx
+++ b/sql/query/invoke_purchase_propensity_training_preparation.sqlx
@@ -54,17 +54,18 @@ SET validation_split_end_number = {{validation_split_end_number}};
SET purchasers = (SELECT COUNT(DISTINCT user_pseudo_id)
FROM `{{mds_project_id}}.{{mds_dataset}}.event`
WHERE event_name = 'purchase' AND
- event_date BETWEEN train_start_date AND train_end_date
+ event_date BETWEEN min_date AND max_date
);
--- If there are purchasers no changes to the train_start_date and train_end_date
--- Else, expand the interval, hopefully a purchaser will be in the interval
+-- Setting Training Dates
+-- If there are purchasers in the training set, then keep the calculated dates, or else set
+-- the start and end dates to a fixed interval preventing `train_start_date` and `train_end_date` from being NULL.
IF purchasers > 0 THEN
- SET train_start_date = GREATEST(train_start_date, min_date);
- SET train_end_date = LEAST(train_end_date, max_date);
-ELSE
SET train_start_date = min_date;
SET train_end_date = max_date;
+ELSE
+ SET train_start_date = DATE_SUB(CURRENT_DATE(), INTERVAL 3 YEAR);
+ SET train_end_date = DATE_SUB(CURRENT_DATE(), INTERVAL 5 DAY);
END IF;
-- Finally, the script calls a stored procedure, passing the adjusted training dates and split numbers as arguments. This stored procedure
diff --git a/sql/query/invoke_user_rolling_window_lead_metrics.sqlx b/sql/query/invoke_user_rolling_window_lead_metrics.sqlx
new file mode 100644
index 00000000..e469a2d7
--- /dev/null
+++ b/sql/query/invoke_user_rolling_window_lead_metrics.sqlx
@@ -0,0 +1,28 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- This script sets up a date range, calls a stored procedure with this range and a variable to
+-- store a result, and then returns the result of the stored procedure. This pattern is common
+-- for orchestrating data processing tasks within BigQuery using stored procedures.
+
+DECLARE input_date DATE;
+DECLARE end_date DATE;
+DECLARE users_added INT64 DEFAULT NULL;
+
+SET input_date= CURRENT_DATE();
+SET end_date= (SELECT DATE_SUB(input_date, INTERVAL {{interval_end_date}} DAY));
+
+CALL `{{project_id}}.{{dataset}}.{{stored_procedure}}`(input_date, end_date, users_added);
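+
+-- For context, a procedure compatible with this CALL would declare users_added
+-- as an OUT parameter, populated inside the procedure and returned by the
+-- SELECT below; an illustrative sketch (not the actual procedure body):
+--
+--   CREATE OR REPLACE PROCEDURE `{{project_id}}.{{dataset}}.{{stored_procedure}}`(
+--     input_date DATE, end_date DATE, OUT users_added INT64)
+--   BEGIN
+--     -- compute and insert rolling-window lead features ...
+--     SET users_added = (SELECT COUNT(DISTINCT user_pseudo_id) FROM changed_users);
+--   END;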
+
+SELECT users_added;
\ No newline at end of file
diff --git a/sql/schema/table/lead_score_propensity_inference_preparation.json b/sql/schema/table/lead_score_propensity_inference_preparation.json
new file mode 100644
index 00000000..5fc9e6ec
--- /dev/null
+++ b/sql/schema/table/lead_score_propensity_inference_preparation.json
@@ -0,0 +1,337 @@
+[
+ {
+ "name": "user_pseudo_id",
+ "type": "STRING",
+ "description": "The user pseudo identifier"
+ },
+ {
+ "name": "user_id",
+ "type": "STRING",
+ "description": "The user identifier when the user is logged in"
+ },
+ {
+ "name": "feature_date",
+ "type": "DATE",
+ "description": "Date that serves as the basis for the calculation of the features"
+ },
+ {
+ "name": "user_ltv_revenue",
+ "type": "FLOAT",
+ "description": "The current customer lifetime value revenue of the user"
+ },
+ {
+ "name": "device_category",
+ "type": "STRING",
+ "description": "The device category the user last accessed"
+ },
+ {
+ "name": "device_mobile_brand_name",
+ "type": "STRING",
+ "description": "The device mobile brand name the user last accessed"
+ },
+ {
+ "name": "device_mobile_model_name",
+ "type": "STRING",
+ "description": "The device mobile model name the user last accessed"
+ },
+ {
+ "name": "device_os",
+ "type": "STRING",
+ "description": "The device operating system the user last accessed"
+ },
+ {
+ "name": "device_language",
+ "type": "STRING",
+ "description": "The device language the user last accessed"
+ },
+ {
+ "name": "device_web_browser",
+ "type": "STRING",
+ "description": "The device web browser the user last accessed"
+ },
+ {
+ "name": "geo_sub_continent",
+ "type": "STRING",
+ "description": "The geographic subcontinent the user last accessed from"
+ },
+ {
+ "name": "geo_country",
+ "type": "STRING",
+ "description": "The geographic country the user last accessed from"
+ },
+ {
+ "name": "geo_region",
+ "type": "STRING",
+ "description": "The geographic region the user last accessed from"
+ },
+ {
+ "name": "geo_city",
+ "type": "STRING",
+ "description": "The geographic city the user last accessed from"
+ },
+ {
+ "name": "geo_metro",
+ "type": "STRING",
+ "description": "The geographic metropolitan area the user last accessed from"
+ },
+ {
+ "name": "last_traffic_source_medium",
+ "type": "STRING",
+ "description": "The last traffic source medium from which the user was acquired"
+ },
+ {
+ "name": "last_traffic_source_name",
+ "type": "STRING",
+ "description": "The last traffic source name from which the user was acquired"
+ },
+ {
+ "name": "last_traffic_source_source",
+ "type": "STRING",
+ "description": "The last traffic source source from which the user was acquired"
+ },
+ {
+ "name": "first_traffic_source_medium",
+ "type": "STRING",
+ "description": "The first traffic source medium from which the user was acquired"
+ },
+ {
+ "name": "first_traffic_source_name",
+ "type": "STRING",
+ "description": "The first traffic source name from which the user was acquired"
+ },
+ {
+ "name": "first_traffic_source_source",
+ "type": "STRING",
+ "description": "The first traffic source source from which the user was acquired"
+ },
+ {
+ "name": "has_signed_in_with_user_id",
+ "type": "BOOLEAN",
+ "description": "A boolean indicating whether the user has signed in with an user id"
+ },
+ {
+ "name": "scroll_50_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50% of a page in the past 1 day"
+ },
+ {
+ "name": "scroll_50_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50% of a page in the past 2nd day"
+ },
+ {
+ "name": "scroll_50_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50% of a page in the past 3rd day"
+ },
+ {
+ "name": "scroll_50_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50% of a page in the past 4th day"
+ },
+ {
+ "name": "scroll_50_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50% of a page in the past 5th day"
+ },
+ {
+ "name": "scroll_90_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90% of a page in the past day"
+ },
+ {
+ "name": "scroll_90_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90% of a page in the past 2nd day"
+ },
+ {
+ "name": "scroll_90_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90% of a page in the past 3rd day"
+ },
+ {
+ "name": "scroll_90_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90% of a page in the past 4th day"
+ },
+ {
+ "name": "scroll_90_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90% of a page in the past 5th day"
+ },
+ {
+ "name": "view_search_results_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past day"
+ },
+ {
+ "name": "view_search_results_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 2nd day"
+ },
+ {
+ "name": "view_search_results_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 3rd day"
+ },
+ {
+ "name": "view_search_results_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 4th day"
+ },
+ {
+ "name": "view_search_results_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 5th day"
+ },
+ {
+ "name": "file_download_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file in the past day"
+ },
+ {
+ "name": "file_download_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file in the past 2nd day"
+ },
+ {
+ "name": "file_download_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file in the past 3rd day"
+ },
+ {
+ "name": "file_download_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file in the past 4th day"
+ },
+ {
+ "name": "file_download_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file in the past 5th day"
+ },
+ {
+ "name": "recipe_add_to_list_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list in the past day"
+ },
+ {
+ "name": "recipe_add_to_list_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list in the past 2nd day"
+ },
+ {
+ "name": "recipe_add_to_list_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list in the past 3rd day"
+ },
+ {
+ "name": "recipe_add_to_list_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list in the past 4th day"
+ },
+ {
+ "name": "recipe_add_to_list_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list in the past 5th day"
+ },
+ {
+ "name": "recipe_print_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe in the past day"
+ },
+ {
+ "name": "recipe_print_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe in the past 2nd day"
+ },
+ {
+ "name": "recipe_print_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe in the past 3rd day"
+ },
+ {
+ "name": "recipe_print_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe in the past 4th day"
+ },
+ {
+ "name": "recipe_print_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe in the past 5th day"
+ },
+ {
+ "name": "sign_up_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past day"
+ },
+ {
+ "name": "sign_up_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 2nd day"
+ },
+ {
+ "name": "sign_up_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 3rd day"
+ },
+ {
+ "name": "sign_up_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 4th day"
+ },
+ {
+ "name": "sign_up_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 5th day"
+ },
+ {
+ "name": "recipe_favorite_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe in the past day"
+ },
+ {
+ "name": "recipe_favorite_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe in the past 2nd day"
+ },
+ {
+ "name": "recipe_favorite_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe in the past 3rd day"
+ },
+ {
+ "name": "recipe_favorite_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe in the past 4th day"
+ },
+ {
+ "name": "recipe_favorite_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe in the past 5th day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu in the past day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu in the past 2nd day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu in the past 3rd day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu in the past 4th day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu in the past 5th day"
+ }
+]
\ No newline at end of file
diff --git a/sql/schema/table/lead_score_propensity_label.json b/sql/schema/table/lead_score_propensity_label.json
new file mode 100644
index 00000000..8b63bc6f
--- /dev/null
+++ b/sql/schema/table/lead_score_propensity_label.json
@@ -0,0 +1,22 @@
+[
+ {
+ "name": "processed_timestamp",
+ "type": "TIMESTAMP",
+ "description": "Timestamp of when the data was processed"
+ },
+ {
+ "name": "feature_date",
+ "type": "DATE",
+ "description": "The date serving as basis for the features calculation"
+ },
+ {
+ "name": "user_pseudo_id",
+ "type": "STRING",
+ "description": "The user pseudo identifier"
+ },
+ {
+ "name": "login_day_1",
+ "type": "INTEGER",
+ "description": "Predicted number of logins by the user in the next 1st day from the feature date"
+ }
+]
\ No newline at end of file
diff --git a/sql/schema/table/lead_score_propensity_training_preparation.json b/sql/schema/table/lead_score_propensity_training_preparation.json
new file mode 100644
index 00000000..f5647417
--- /dev/null
+++ b/sql/schema/table/lead_score_propensity_training_preparation.json
@@ -0,0 +1,352 @@
+[
+ {
+ "name": "processed_timestamp",
+ "type": "TIMESTAMP",
+ "description": "Timestamp of when the data was processed"
+ },
+ {
+ "name": "data_split",
+ "type": "STRING",
+ "description": "The indication of whether the row should be used for TRAINING, VALIDATION or TESTING"
+ },
+ {
+ "name": "feature_date",
+ "type": "DATE",
+ "description": "The date serving as basis for the features calculation"
+ },
+ {
+ "name": "user_pseudo_id",
+ "type": "STRING",
+ "description": "The user pseudo identifier"
+ },
+ {
+ "name": "user_id",
+ "type": "STRING",
+ "description": "The user identifier of when the user has logged in"
+ },
+ {
+ "name": "user_ltv_revenue",
+ "type": "FLOAT",
+ "description": "The current user lifetime value"
+ },
+ {
+ "name": "device_category",
+ "type": "STRING",
+ "description": "The device category of the user last used to access"
+ },
+ {
+ "name": "device_mobile_brand_name",
+ "type": "STRING",
+ "description": "The device mobile brand name last used by the user"
+ },
+ {
+ "name": "device_mobile_model_name",
+ "type": "STRING",
+ "description": "The device mobile model name last used by the user"
+ },
+ {
+ "name": "device_os",
+ "type": "STRING",
+ "description": "The device operating system last used by the user"
+ },
+ {
+ "name": "device_language",
+ "type": "STRING",
+ "description": "The device language last used by the user"
+ },
+ {
+ "name": "device_web_browser",
+ "type": "STRING",
+ "description": "The device web browser last used by the user"
+ },
+ {
+ "name": "geo_sub_continent",
+ "type": "STRING",
+ "description": "The geographic subcontinent from the user's last access"
+ },
+ {
+ "name": "geo_country",
+ "type": "STRING",
+ "description": "The geographic country from the user's last access"
+ },
+ {
+ "name": "geo_region",
+ "type": "STRING",
+ "description": "The geographic region from the user's last access"
+ },
+ {
+ "name": "geo_city",
+ "type": "STRING",
+ "description": "The geographic city from the user's last access"
+ },
+ {
+ "name": "geo_metro",
+ "type": "STRING",
+ "description": "The geographic metropolitan area from the user's last access"
+ },
+ {
+ "name": "last_traffic_source_medium",
+ "type": "STRING",
+ "description": "The last traffic source medium from where the user was acquired"
+ },
+ {
+ "name": "last_traffic_source_name",
+ "type": "STRING",
+ "description": "The last traffic source name from where the user was acquired"
+ },
+ {
+ "name": "last_traffic_source_source",
+ "type": "STRING",
+ "description": "The last traffic source soure from where the user was acquired"
+ },
+ {
+ "name": "first_traffic_source_medium",
+ "type": "STRING",
+ "description": "The first traffic source medium from where the user was acquired"
+ },
+ {
+ "name": "first_traffic_source_name",
+ "type": "STRING",
+ "description": "The first traffic source name from where the user was acquired"
+ },
+ {
+ "name": "first_traffic_source_source",
+ "type": "STRING",
+ "description": "The first traffic source source from where the user was acquired"
+ },
+ {
+ "name": "has_signed_in_with_user_id",
+ "type": "BOOLEAN",
+ "description": "A boolean indicating whether the user has signed in with the user id"
+ },
+ {
+ "name": "scroll_50_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50% of a page in the past 1 day"
+ },
+ {
+ "name": "scroll_50_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50% of a page in the past 2nd day"
+ },
+ {
+ "name": "scroll_50_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50% of a page in the past 3rd day"
+ },
+ {
+ "name": "scroll_50_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50% of a page in the past 4th day"
+ },
+ {
+ "name": "scroll_50_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 50% of a page in the past 5th day"
+ },
+ {
+ "name": "scroll_90_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90% of a page in the past day"
+ },
+ {
+ "name": "scroll_90_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90% of a page in the past 2nd day"
+ },
+ {
+ "name": "scroll_90_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90% of a page in the past 3rd day"
+ },
+ {
+ "name": "scroll_90_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90% of a page in the past 4th day"
+ },
+ {
+ "name": "scroll_90_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled 90% of a page in the past 5th day"
+ },
+ {
+ "name": "view_search_results_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past day"
+ },
+ {
+ "name": "view_search_results_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 2nd day"
+ },
+ {
+ "name": "view_search_results_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 3rd day"
+ },
+ {
+ "name": "view_search_results_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 4th day"
+ },
+ {
+ "name": "view_search_results_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results in the past 5th day"
+ },
+ {
+ "name": "file_download_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file in the past day"
+ },
+ {
+ "name": "file_download_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file in the past 2nd day"
+ },
+ {
+ "name": "file_download_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file in the past 3rd day"
+ },
+ {
+ "name": "file_download_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file in the past 4th day"
+ },
+ {
+ "name": "file_download_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file in the past 5th day"
+ },
+ {
+ "name": "recipe_add_to_list_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list in the past day"
+ },
+ {
+ "name": "recipe_add_to_list_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list in the past 2nd day"
+ },
+ {
+ "name": "recipe_add_to_list_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list in the past 3rd day"
+ },
+ {
+ "name": "recipe_add_to_list_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list in the past 4th day"
+ },
+ {
+ "name": "recipe_add_to_list_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list in the past 5th day"
+ },
+ {
+ "name": "recipe_print_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe in the past day"
+ },
+ {
+ "name": "recipe_print_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe in the past 2nd day"
+ },
+ {
+ "name": "recipe_print_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe in the past 3rd day"
+ },
+ {
+ "name": "recipe_print_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe in the past 4th day"
+ },
+ {
+ "name": "recipe_print_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe in the past 5th day"
+ },
+ {
+ "name": "sign_up_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past day"
+ },
+ {
+ "name": "sign_up_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 2nd day"
+ },
+ {
+ "name": "sign_up_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 3rd day"
+ },
+ {
+ "name": "sign_up_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 4th day"
+ },
+ {
+ "name": "sign_up_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up in the past 5th day"
+ },
+ {
+ "name": "recipe_favorite_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe in the past day"
+ },
+ {
+ "name": "recipe_favorite_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe in the past 2nd day"
+ },
+ {
+ "name": "recipe_favorite_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe in the past 3rd day"
+ },
+ {
+ "name": "recipe_favorite_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe in the past 4th day"
+ },
+ {
+ "name": "recipe_favorite_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe in the past 5th day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu in the past day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu in the past 2nd day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu in the past 3rd day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu in the past 4th day"
+ },
+ {
+ "name": "recipe_add_to_menu_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu in the past 5th day"
+ },
+ {
+ "name": "will_login",
+ "type": "INTEGER",
+ "description": "A binary flag (1 or 0) indicating whether the user will log in during the next period"
+ }
+]
\ No newline at end of file
diff --git a/sql/schema/table/purchase_propensity_inference_preparation.json b/sql/schema/table/purchase_propensity_inference_preparation.json
index 0fe328b2..2f8b1256 100644
--- a/sql/schema/table/purchase_propensity_inference_preparation.json
+++ b/sql/schema/table/purchase_propensity_inference_preparation.json
@@ -109,161 +109,6 @@
"type": "BOOLEAN",
"description": "A boolean indicating whether the user has signed in with an user id"
},
- {
- "name": "engagement_rate",
- "type": "FLOAT",
- "description": "The percentage of sessions that were engaged sessions. Engagement rate = engaged sessions / total sessions Engagement rate is the inverse of bounce rate"
- },
- {
- "name": "engaged_sessions_per_user",
- "type": "INTEGER",
- "description": "The number of engaged sessions per user"
- },
- {
- "name": "session_conversion_rate",
- "type": "FLOAT",
- "description": "The session conversion rate is calculated by dividing the number of sessions with a conversion event by the total number of sessions"
- },
- {
- "name": "bounces",
- "type": "INTEGER",
- "description": "The number of not engaged sessions"
- },
- {
- "name": "bounce_rate_per_user",
- "type": "FLOAT",
- "description": "The percentage of sessions that were not engaged sessions per user. Bounce rate = not engaged sessions / total sessions Bounce rate is the inverse of engagement rate"
- },
- {
- "name": "sessions_per_user",
- "type": "INTEGER",
- "description": "The number of sessions per user"
- },
- {
- "name": "avg_views_per_session",
- "type": "FLOAT",
- "description": "The average number of views per sessions"
- },
- {
- "name": "sum_engagement_time_seconds",
- "type": "FLOAT",
- "description": "The sum of time that your website was in focus in a user's browser or an app was in the foreground of a user's device in seconds per user"
- },
- {
- "name": "avg_engagement_time_seconds",
- "type": "FLOAT",
- "description": "The average time that your website was in focus in a user's browser or an app was in the foreground of a user's device. Average engagement time = total user engagement durations / number of active users"
- },
- {
- "name": "new_visits",
- "type": "INTEGER",
- "description": "The number of times your users opened your website for the first time"
- },
- {
- "name": "returning_visits",
- "type": "INTEGER",
- "description": "The number of users who have initiated at least one previous session, regardless of whether or not the previous sessions were engaged sessions"
- },
- {
- "name": "add_to_carts",
- "type": "INTEGER",
- "description": "The number of times users added items to their shopping carts"
- },
- {
- "name": "cart_to_view_rate",
- "type": "FLOAT",
- "description": "The number of times users added items to their shopping carts divided by the the number of mobile app screens or web pages your users saw. Repeated views of a single screen or page are counted"
- },
- {
- "name": "checkouts",
- "type": "INTEGER",
- "description": "The number of times users started the checkout process"
- },
- {
- "name": "ecommerce_purchases",
- "type": "INTEGER",
- "description": "The number of purchases on your website or app"
- },
- {
- "name": "ecommerce_quantity",
- "type": "INTEGER",
- "description": "The number of units for an ecommerce event"
- },
- {
- "name": "ecommerce_revenue",
- "type": "FLOAT",
- "description": "The sum of revenue from purchases made on your website or app, minus any refunds given. Purchase revenue = purchases + in-app purchases + subscriptions - refund"
- },
- {
- "name": "item_revenue",
- "type": "FLOAT",
- "description": "The total revenue from items only minus refunds, excluding tax and shipping"
- },
- {
- "name": "item_quantity",
- "type": "INTEGER",
- "description": "The number of units for a single item included in ecommerce events"
- },
- {
- "name": "item_view_events",
- "type": "INTEGER",
- "description": "The number of times an item was viewed"
- },
- {
- "name": "items_clicked_in_promotion",
- "type": "INTEGER",
- "description": "The number of items that the customer clicked in a promotion"
- },
- {
- "name": "items_clicked_in_list",
- "type": "INTEGER",
- "description": "The number of items that the customer clicked in a list of items"
- },
- {
- "name": "items_checked_out",
- "type": "INTEGER",
- "description": "The number of times the user has checked out"
- },
- {
- "name": "items_added_to_cart",
- "type": "INTEGER",
- "description": "The number of times the user has added items to cart"
- },
- {
- "name": "item_list_view_events",
- "type": "INTEGER",
- "description": "The number of times the user has viewed items in list"
- },
- {
- "name": "purchase_revenue",
- "type": "FLOAT",
- "description": "The total revenue from purchases, in-app purchases, subscriptions, and ad revenue. Total revenue = purchases + in-app purchases + subscriptions + ad revenue - refunds"
- },
- {
- "name": "purchase_to_view_rate",
- "type": "FLOAT",
- "description": "The number of purchases on your website or app divided by the number of mobile app screens or web pages your users saw"
- },
- {
- "name": "transactions_per_purchaser",
- "type": "FLOAT",
- "description": "The average number of purchases per buyer for the selected time frame"
- },
- {
- "name": "user_conversion_rate",
- "type": "FLOAT",
- "description": "The number of users who performed a conversion action divided by the total number of users"
- },
- {
- "name": "how_many_purchased_before",
- "type": "INTEGER",
- "description": "The number of times the user have purchased before"
- },
- {
- "name": "has_abandoned_cart",
- "type": "BOOLEAN",
- "description": "a boolean indicating whether the user has abandoned a cart in the past"
- },
{
"name": "active_users_past_1_day",
"type": "INTEGER",
diff --git a/sql/schema/table/purchase_propensity_predictions_placeholder.json b/sql/schema/table/purchase_propensity_predictions_placeholder.json
new file mode 100644
index 00000000..39651f90
--- /dev/null
+++ b/sql/schema/table/purchase_propensity_predictions_placeholder.json
@@ -0,0 +1,26 @@
+[
+ {
+ "name": "prediction",
+ "type": "STRING"
+ },
+ {
+ "name": "prediction_prob",
+ "type": "FLOAT"
+ },
+ {
+ "name": "processed_timestamp",
+ "type": "TIMESTAMP"
+ },
+ {
+ "name": "feature_date",
+ "type": "DATE"
+ },
+ {
+ "name": "user_pseudo_id",
+ "type": "STRING"
+ },
+ {
+ "name": "user_id",
+ "type": "STRING"
+ }
+]
\ No newline at end of file
diff --git a/sql/schema/table/purchase_propensity_training_preparation.json b/sql/schema/table/purchase_propensity_training_preparation.json
index e5d284d5..f984f42e 100644
--- a/sql/schema/table/purchase_propensity_training_preparation.json
+++ b/sql/schema/table/purchase_propensity_training_preparation.json
@@ -119,161 +119,6 @@
"type": "BOOLEAN",
"description": "A boolean indicating whether the user has signed in with the user id"
},
- {
- "name": "engagement_rate",
- "type": "FLOAT",
- "description": "The percentage of sessions that were engaged sessions. Engagement rate = engaged sessions / total sessions Engagement rate is the inverse of bounce rate"
- },
- {
- "name": "engaged_sessions_per_user",
- "type": "INTEGER",
- "description": "The number of engaged sessions per user"
- },
- {
- "name": "session_conversion_rate",
- "type": "FLOAT",
- "description": "The session conversion rate is calculated by dividing the number of sessions with a conversion event by the total number of sessions"
- },
- {
- "name": "bounces",
- "type": "INTEGER",
- "description": "The number of not engaged sessions"
- },
- {
- "name": "bounce_rate_per_user",
- "type": "FLOAT",
- "description": "The percentage of sessions that were not engaged sessions per user. Bounce rate = not engaged sessions / total sessions Bounce rate is the inverse of engagement rate"
- },
- {
- "name": "sessions_per_user",
- "type": "INTEGER",
- "description": "The number of sessions per user"
- },
- {
- "name": "avg_views_per_session",
- "type": "FLOAT",
- "description": "The average number of views per sessions"
- },
- {
- "name": "sum_engagement_time_seconds",
- "type": "FLOAT",
- "description": "The sum of time that your website was in focus in a user's browser or an app was in the foreground of a user's device in seconds per user"
- },
- {
- "name": "avg_engagement_time_seconds",
- "type": "FLOAT",
- "description": "The average time that your website was in focus in a user's browser or an app was in the foreground of a user's device. Average engagement time = total user engagement durations / number of active users"
- },
- {
- "name": "new_visits",
- "type": "INTEGER",
- "description": "The number of times your users opened your website for the first time"
- },
- {
- "name": "returning_visits",
- "type": "INTEGER",
- "description": "The number of users who have initiated at least one previous session, regardless of whether or not the previous sessions were engaged sessions"
- },
- {
- "name": "add_to_carts",
- "type": "INTEGER",
- "description": "The number of times users added items to their shopping carts"
- },
- {
- "name": "cart_to_view_rate",
- "type": "FLOAT",
- "description": "The number of times users added items to their shopping carts divided by the the number of mobile app screens or web pages your users saw. Repeated views of a single screen or page are counted"
- },
- {
- "name": "checkouts",
- "type": "INTEGER",
- "description": "The number of times users started the checkout process"
- },
- {
- "name": "ecommerce_purchases",
- "type": "INTEGER",
- "description": "The number of purchases on your website or app"
- },
- {
- "name": "ecommerce_quantity",
- "type": "INTEGER",
- "description": "The number of units for an ecommerce event"
- },
- {
- "name": "ecommerce_revenue",
- "type": "FLOAT",
- "description": "The sum of revenue from purchases made on your website or app, minus any refunds given. Purchase revenue = purchases + in-app purchases + subscriptions - refund"
- },
- {
- "name": "item_revenue",
- "type": "FLOAT",
- "description": "The total revenue from items only minus refunds, excluding tax and shipping"
- },
- {
- "name": "item_quantity",
- "type": "INTEGER",
- "description": "The number of units for a single item included in ecommerce events"
- },
- {
- "name": "item_view_events",
- "type": "INTEGER",
- "description": "The number of times an item was viewed"
- },
- {
- "name": "items_clicked_in_promotion",
- "type": "INTEGER",
- "description": "The number of items that the customer clicked in a promotion"
- },
- {
- "name": "items_clicked_in_list",
- "type": "INTEGER",
- "description": "The number of items that the customer clicked in a list of items"
- },
- {
- "name": "items_checked_out",
- "type": "INTEGER",
- "description": "The number of times the user has checked out"
- },
- {
- "name": "items_added_to_cart",
- "type": "INTEGER",
- "description": "The number of times the user has added items to cart"
- },
- {
- "name": "item_list_view_events",
- "type": "INTEGER",
- "description": "The number of times the user has viewed items in list"
- },
- {
- "name": "purchase_revenue",
- "type": "FLOAT",
- "description": "The total revenue from purchases, in-app purchases, subscriptions, and ad revenue. Total revenue = purchases + in-app purchases + subscriptions + ad revenue - refunds"
- },
- {
- "name": "purchase_to_view_rate",
- "type": "FLOAT",
- "description": "The number of purchases on your website or app divided by the number of mobile app screens or web pages your users saw"
- },
- {
- "name": "transactions_per_purchaser",
- "type": "FLOAT",
- "description": "The average number of purchases per buyer for the selected time frame"
- },
- {
- "name": "user_conversion_rate",
- "type": "FLOAT",
- "description": "The number of users who performed a conversion action divided by the total number of users"
- },
- {
- "name": "how_many_purchased_before",
- "type": "INTEGER",
- "description": "The number of times the user have purchased before"
- },
- {
- "name": "has_abandoned_cart",
- "type": "BOOLEAN",
- "description": "a boolean indicating whether the user has abandoned a cart in the past"
- },
{
"name": "active_users_past_1_day",
"type": "INTEGER",
@@ -544,131 +389,6 @@
"type": "INTEGER",
"description": "The number of times the user has checked out in the past 15 to 30 days"
},
- {
- "name": "purchasers_users",
- "type": "INTEGER",
- "description": "The number of distinct users who have purchases in the past"
- },
- {
- "name": "average_daily_purchasers",
- "type": "FLOAT",
- "description": "The average number of purchasers across all the days in the selected time frame"
- },
- {
- "name": "active_users",
- "type": "INTEGER",
- "description": "The number of distinct users who visited your website or application. An active user is any user who has an engaged session or when Analytics collects: the first_visit event or engagement_time_msec parameter from a website the first_open event or engagement_time_msec parameter from an Android app the first_open or user_engagement event from an iOS app"
- },
- {
- "name": "DAU",
- "type": "FLOAT",
- "description": "The number of users who engaged for the calendar day"
- },
- {
- "name": "MAU",
- "type": "FLOAT",
- "description": "The number of users who engaged in the last 30 days"
- },
- {
- "name": "WAU",
- "type": "FLOAT",
- "description": "The number of users who engaged in the last week"
- },
- {
- "name": "dau_per_mau",
- "type": "FLOAT",
- "description": "Daily Active Users (DAU) / Monthly Active Users (MAU) shows the percentage of users who engaged for the calendar day out of the users who engaged in the last 30 days. A higher ratio suggests good engagement and user retention"
- },
- {
- "name": "dau_per_wau",
- "type": "FLOAT",
- "description": "Daily Active Users (DAU) / Weekly Active Users (WAU) shows the percentage of users who engaged in the last 24 hours out of the users who engaged in the last 7 days. A higher ratio suggests good engagement and user retention"
- },
- {
- "name": "wau_per_mau",
- "type": "FLOAT",
- "description": "Weekly Active Users (DAU) / Monthly Active Users (MAU) shows the percentage of users who engaged in the last 7 days out of the users who engaged in the last 30 days. A higher ratio suggests good engagement and user retention"
- },
- {
- "name": "users_engagement_duration_seconds",
- "type": "FLOAT",
- "description": "The length of time that your app screen was in the foreground or your web page was in focus in seconds"
- },
- {
- "name": "average_engagement_time",
- "type": "FLOAT",
- "description": "The average time that your website was in focus in a user's browser or an app was in the foreground of a user's device. Average engagement time = total user engagement durations / number of active users"
- },
- {
- "name": "average_engagement_time_per_session",
- "type": "FLOAT",
- "description": "The average engagement time per session"
- },
- {
- "name": "average_sessions_per_user",
- "type": "FLOAT",
- "description": "The average number of sessions per user"
- },
- {
- "name": "ARPPU",
- "type": "FLOAT",
- "description": "Average revenue per paying user (ARPPU) is the total purchase revenue per active user who made a purchase"
- },
- {
- "name": "ARPU",
- "type": "FLOAT",
- "description": "Average revenue per active user (ARPU) is the total revenue generated on average from each active user, whether they made a purchase or not. ARPU = (Total ad revenue + purchase revenue + in-app purchase revenue + subscriptions) / Active users"
- },
- {
- "name": "average_daily_revenue",
- "type": "FLOAT",
- "description": "Average daily revenue The average total revenue for a day over the selected time frame"
- },
- {
- "name": "max_daily_revenue",
- "type": "FLOAT",
- "description": "The maximum total revenue for a day over the selected time frame"
- },
- {
- "name": "min_daily_revenue",
- "type": "FLOAT",
- "description": "The minimum total revenue for a day over the selected time frame"
- },
- {
- "name": "new_users",
- "type": "INTEGER",
- "description": "The number of new unique user IDs that logged the first_open or first_visit event. The metric allows you to measure the number of users who interacted with your site or launched your app for the first time"
- },
- {
- "name": "returning_users",
- "type": "INTEGER",
- "description": "The number of users who have initiated at least one previous session, regardless of whether or not the previous sessions were engaged sessions"
- },
- {
- "name": "first_time_purchasers",
- "type": "INTEGER",
- "description": "The number of users who made their first purchase in the selected time frame."
- },
- {
- "name": "first_time_purchaser_conversion",
- "type": "FLOAT",
- "description": "The percentage of active users who made their first purchase. This metric is returned as a fraction; for example, 0.092 means 9.2% of active users were first-time purchasers"
- },
- {
- "name": "first_time_purchasers_per_new_user",
- "type": "FLOAT",
- "description": "The average number of first-time purchasers per new user"
- },
- {
- "name": "avg_user_conversion_rate",
- "type": "FLOAT",
- "description": "The average number of converting user per total users"
- },
- {
- "name": "avg_session_conversion_rate",
- "type": "FLOAT",
- "description": "The average number of converting session per total sessions"
- },
{
"name": "will_purchase",
"type": "INTEGER",
diff --git a/sql/schema/table/user_rolling_window_lead_metrics.json b/sql/schema/table/user_rolling_window_lead_metrics.json
new file mode 100644
index 00000000..e22d0ceb
--- /dev/null
+++ b/sql/schema/table/user_rolling_window_lead_metrics.json
@@ -0,0 +1,242 @@
+[
+ {
+ "name": "processed_timestamp",
+ "type": "TIMESTAMP",
+ "description": "Timestamp of when the data was processed"
+ },
+ {
+ "name": "feature_date",
+ "type": "DATE",
+ "description": "The date serving as basis for the features calculation"
+ },
+ {
+ "name": "user_pseudo_id",
+ "type": "STRING",
+ "description": "The user pseudo identifier"
+ },
+ {
+ "name": "scroll_50_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled through 50% of a page on the day before the feature date"
+ },
+ {
+ "name": "scroll_50_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled through 50% of a page on the 2nd day before the feature date"
+ },
+ {
+ "name": "scroll_50_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled through 50% of a page on the 3rd day before the feature date"
+ },
+ {
+ "name": "scroll_50_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled through 50% of a page on the 4th day before the feature date"
+ },
+ {
+ "name": "scroll_50_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled through 50% of a page on the 5th day before the feature date"
+ },
+ {
+ "name": "scroll_90_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled through 90% of a page on the day before the feature date"
+ },
+ {
+ "name": "scroll_90_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled through 90% of a page on the 2nd day before the feature date"
+ },
+ {
+ "name": "scroll_90_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled through 90% of a page on the 3rd day before the feature date"
+ },
+ {
+ "name": "scroll_90_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled through 90% of a page on the 4th day before the feature date"
+ },
+ {
+ "name": "scroll_90_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has scrolled through 90% of a page on the 5th day before the feature date"
+ },
+ {
+ "name": "view_search_results_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results on the day before the feature date"
+ },
+ {
+ "name": "view_search_results_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results on the 2nd day before the feature date"
+ },
+ {
+ "name": "view_search_results_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results on the 3rd day before the feature date"
+ },
+ {
+ "name": "view_search_results_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results on the 4th day before the feature date"
+ },
+ {
+ "name": "view_search_results_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has viewed search results on the 5th day before the feature date"
+ },
+ {
+ "name": "file_download_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file on the day before the feature date"
+ },
+ {
+ "name": "file_download_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file on the 2nd day before the feature date"
+ },
+ {
+ "name": "file_download_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file on the 3rd day before the feature date"
+ },
+ {
+ "name": "file_download_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file on the 4th day before the feature date"
+ },
+ {
+ "name": "file_download_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has downloaded a file on the 5th day before the feature date"
+ },
+ {
+ "name": "recipe_add_to_list_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list on the day before the feature date"
+ },
+ {
+ "name": "recipe_add_to_list_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list on the 2nd day before the feature date"
+ },
+ {
+ "name": "recipe_add_to_list_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list on the 3rd day before the feature date"
+ },
+ {
+ "name": "recipe_add_to_list_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list on the 4th day before the feature date"
+ },
+ {
+ "name": "recipe_add_to_list_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a list on the 5th day before the feature date"
+ },
+ {
+ "name": "recipe_print_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe on the day before the feature date"
+ },
+ {
+ "name": "recipe_print_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe on the 2nd day before the feature date"
+ },
+ {
+ "name": "recipe_print_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe on the 3rd day before the feature date"
+ },
+ {
+ "name": "recipe_print_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe on the 4th day before the feature date"
+ },
+ {
+ "name": "recipe_print_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has printed a recipe on the 5th day before the feature date"
+ },
+ {
+ "name": "sign_up_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up on the day before the feature date"
+ },
+ {
+ "name": "sign_up_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up on the 2nd day before the feature date"
+ },
+ {
+ "name": "sign_up_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up on the 3rd day before the feature date"
+ },
+ {
+ "name": "sign_up_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up on the 4th day before the feature date"
+ },
+ {
+ "name": "sign_up_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has signed up on the 5th day before the feature date"
+ },
+ {
+ "name": "recipe_favorite_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe on the day before the feature date"
+ },
+ {
+ "name": "recipe_favorite_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe on the 2nd day before the feature date"
+ },
+ {
+ "name": "recipe_favorite_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe on the 3rd day before the feature date"
+ },
+ {
+ "name": "recipe_favorite_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe on the 4th day before the feature date"
+ },
+ {
+ "name": "recipe_favorite_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has favorited a recipe on the 5th day before the feature date"
+ },
+ {
+ "name": "recipe_add_to_menu_past_1_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu on the day before the feature date"
+ },
+ {
+ "name": "recipe_add_to_menu_past_2_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu on the 2nd day before the feature date"
+ },
+ {
+ "name": "recipe_add_to_menu_past_3_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu on the 3rd day before the feature date"
+ },
+ {
+ "name": "recipe_add_to_menu_past_4_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu on the 4th day before the feature date"
+ },
+ {
+ "name": "recipe_add_to_menu_past_5_day",
+ "type": "INTEGER",
+ "description": "The number of times the user has added a recipe to a menu on the 5th day before the feature date"
+ }
+]
\ No newline at end of file
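The `*_past_N_day` columns in this schema read as daily buckets rather than cumulative windows: `scroll_50_past_2_day`, for example, counts events that fired exactly two days before `feature_date`. A minimal BigQuery sketch of that bucketing, assuming a flattened events table `project.dataset.event` with a DATE-typed `event_date` column (names illustrative; this is not the pipeline's actual feature SQL):

```sql
-- Illustrative daily-bucket feature counts for one event type.
DECLARE feature_date DATE DEFAULT DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);

SELECT
  user_pseudo_id,
  COUNTIF(event_date = DATE_SUB(feature_date, INTERVAL 1 DAY)) AS scroll_50_past_1_day,
  COUNTIF(event_date = DATE_SUB(feature_date, INTERVAL 2 DAY)) AS scroll_50_past_2_day,
  COUNTIF(event_date = DATE_SUB(feature_date, INTERVAL 3 DAY)) AS scroll_50_past_3_day
FROM `project.dataset.event`  -- hypothetical flattened GA4 events table
WHERE event_name = 'scroll_50'
  AND event_date BETWEEN DATE_SUB(feature_date, INTERVAL 5 DAY)
                     AND DATE_SUB(feature_date, INTERVAL 1 DAY)
GROUP BY user_pseudo_id;
```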
diff --git a/sql/schema/table/vbb_activation_configuration.json b/sql/schema/table/vbb_activation_configuration.json
new file mode 100644
index 00000000..f0cbf242
--- /dev/null
+++ b/sql/schema/table/vbb_activation_configuration.json
@@ -0,0 +1,17 @@
+[
+ {
+ "name": "activation_type",
+ "type": "STRING",
+ "description": "Specifies the type of activation, e.g., purchase-propensity"
+ },
+ {
+ "name": "decile",
+ "type": "INTEGER",
+ "description": "Represents the decile number (1-10) for the prediction"
+ },
+ {
+ "name": "value",
+ "type": "FLOAT",
+ "description": "The monetary value multiplier for the given decile, relative to the average transaction value"
+ }
+]
diff --git a/templates/activation_query/audience_segmentation_query_template.sqlx b/templates/activation_query/audience_segmentation_query_template.sqlx
index 40c8c9a5..89eec5e0 100644
--- a/templates/activation_query/audience_segmentation_query_template.sqlx
+++ b/templates/activation_query/audience_segmentation_query_template.sqlx
@@ -1,8 +1,9 @@
SELECT
- a.prediction AS a_s_prediction,
+ a.prediction AS user_prop_a_s_prediction,
b.user_pseudo_id AS client_id,
b.user_id AS user_id,
- b.ga_session_id AS session_id,
+ b.ga_session_id AS event_param_session_id,
+ '100' AS event_param_engagement_time_msec,
CASE WHEN EXTRACT(MICROSECOND FROM b.event_timestamp) = 1 THEN b.event_timestamp ELSE TIMESTAMP_SUB(b.event_timestamp, INTERVAL 1 MICROSECOND) END AS inference_date
FROM
`${mds_project_id}.marketing_ga4_v1_${mds_dataset_suffix}.latest_event_per_user_last_72_hours` b,
diff --git a/templates/activation_query/auto_audience_segmentation_query_template.sqlx b/templates/activation_query/auto_audience_segmentation_query_template.sqlx
index 5b6c0eef..d4f0c02a 100644
--- a/templates/activation_query/auto_audience_segmentation_query_template.sqlx
+++ b/templates/activation_query/auto_audience_segmentation_query_template.sqlx
@@ -1,12 +1,13 @@
SELECT
- a.prediction AS a_a_s_prediction,
+ a.prediction AS user_prop_a_a_s_prediction,
b.user_pseudo_id AS client_id,
b.user_id AS user_id,
- b.ga_session_id AS session_id,
+ b.ga_session_id AS event_param_session_id,
+ '100' AS event_param_engagement_time_msec,
CASE WHEN EXTRACT(MICROSECOND FROM b.event_timestamp) = 1 THEN b.event_timestamp ELSE TIMESTAMP_SUB(b.event_timestamp, INTERVAL 1 MICROSECOND) END AS inference_date
FROM
`${mds_project_id}.marketing_ga4_v1_${mds_dataset_suffix}.latest_event_per_user_last_72_hours` b,
`{{source_table}}` a
WHERE
- a.user_id = b.user_pseudo_id
+ a.user_pseudo_id = b.user_pseudo_id
AND a.prediction IS NOT NULL
diff --git a/templates/activation_query/churn_propensity_query_template.sqlx b/templates/activation_query/churn_propensity_query_template.sqlx
index 5ab39212..ae42604a 100644
--- a/templates/activation_query/churn_propensity_query_template.sqlx
+++ b/templates/activation_query/churn_propensity_query_template.sqlx
@@ -1,9 +1,10 @@
SELECT
- a.prediction AS c_p_prediction,
- NTILE(10) OVER (ORDER BY a.prediction_prob DESC) AS c_p_decile,
+ a.prediction AS user_prop_c_p_prediction,
+ NTILE(10) OVER (ORDER BY a.prediction_prob DESC) AS user_prop_c_p_decile,
b.user_pseudo_id AS client_id,
b.user_id AS user_id,
- b.ga_session_id AS session_id,
+ b.ga_session_id AS event_param_session_id,
+ '100' AS event_param_engagement_time_msec,
CASE WHEN EXTRACT(MICROSECOND FROM b.event_timestamp) = 1 THEN b.event_timestamp ELSE TIMESTAMP_SUB(b.event_timestamp, INTERVAL 1 MICROSECOND) END AS inference_date
FROM
`${mds_project_id}.marketing_ga4_v1_${mds_dataset_suffix}.latest_event_per_user_last_72_hours` b,
diff --git a/templates/activation_query/cltv_query_template.sqlx b/templates/activation_query/cltv_query_template.sqlx
index 3a94982d..bdd4bffd 100644
--- a/templates/activation_query/cltv_query_template.sqlx
+++ b/templates/activation_query/cltv_query_template.sqlx
@@ -1,8 +1,9 @@
SELECT
- NTILE(10) OVER (ORDER BY a.prediction DESC) AS cltv_decile,
+ NTILE(10) OVER (ORDER BY a.prediction DESC) AS user_prop_cltv_decile,
b.user_pseudo_id AS client_id,
b.user_id AS user_id,
- b.ga_session_id AS session_id,
+ b.ga_session_id AS event_param_session_id,
+ '100' AS event_param_engagement_time_msec,
CASE WHEN EXTRACT(MICROSECOND FROM b.event_timestamp) = 1 THEN b.event_timestamp ELSE TIMESTAMP_SUB(b.event_timestamp, INTERVAL 1 MICROSECOND) END AS inference_date
FROM
`${mds_project_id}.marketing_ga4_v1_${mds_dataset_suffix}.latest_event_per_user_last_72_hours` b,
diff --git a/templates/activation_query/lead_score_propensity_query_template.sqlx b/templates/activation_query/lead_score_propensity_query_template.sqlx
new file mode 100644
index 00000000..5ad0b874
--- /dev/null
+++ b/templates/activation_query/lead_score_propensity_query_template.sqlx
@@ -0,0 +1,14 @@
+SELECT
+ a.prediction AS user_prop_l_s_p_prediction,
+ NTILE(10) OVER (ORDER BY a.prediction_prob DESC) AS user_prop_l_s_p_decile,
+ b.user_pseudo_id AS client_id,
+ b.user_id AS user_id,
+ b.ga_session_id AS event_param_session_id,
+ '100' AS event_param_engagement_time_msec,
+ CASE WHEN EXTRACT(MICROSECOND FROM b.event_timestamp) = 1 THEN b.event_timestamp ELSE TIMESTAMP_SUB(b.event_timestamp, INTERVAL 1 MICROSECOND) END AS inference_date
+FROM
+ `${mds_project_id}.marketing_ga4_v1_${mds_dataset_suffix}.latest_event_per_user_last_72_hours` b,
+ `{{source_table}}` a
+WHERE
+ COALESCE(a.user_id, "") = COALESCE(b.user_id, "")
+ AND a.user_pseudo_id = b.user_pseudo_id
diff --git a/templates/activation_query/lead_score_propensity_vbb_query_template.sqlx b/templates/activation_query/lead_score_propensity_vbb_query_template.sqlx
new file mode 100644
index 00000000..9be0e0a9
--- /dev/null
+++ b/templates/activation_query/lead_score_propensity_vbb_query_template.sqlx
@@ -0,0 +1,35 @@
+WITH user_prediction_decile AS (
+ SELECT
+ a.prediction AS l_s_p_prediction,
+ NTILE(10) OVER (ORDER BY a.prediction_prob DESC) AS l_s_p_decile,
+ b.user_pseudo_id AS client_id,
+ b.user_id AS user_id,
+ b.ga_session_id AS session_id,
+ CASE
+ WHEN EXTRACT(MICROSECOND FROM b.event_timestamp) = 1 THEN b.event_timestamp
+ ELSE TIMESTAMP_SUB(b.event_timestamp, INTERVAL 1 MICROSECOND)
+ END AS inference_date
+ FROM
+ `${mds_project_id}.marketing_ga4_v1_${mds_dataset_suffix}.latest_event_per_user_last_24_hours` b,
+ `{{source_table}}` a
+ WHERE
+ COALESCE(a.user_id, "") = COALESCE(b.user_id, "")
+ AND a.user_pseudo_id = b.user_pseudo_id)
+SELECT
+ a.l_s_p_prediction AS user_prop_l_s_p_prediction,
+ a.l_s_p_decile AS user_prop_l_s_p_decile,
+ b.value AS event_param_value,
+ 'USD' AS event_param_currency,
+ a.client_id,
+ a.user_id,
+ a.session_id AS event_param_session_id,
+ a.inference_date
+FROM
+ user_prediction_decile AS a
+LEFT JOIN
+ `${activation_project_id}.${dataset}.vbb_activation_configuration` AS b
+ON
+ a.l_s_p_decile = b.decile
+WHERE
+ b.activation_type = 'lead-score-propensity'
+AND b.value > 0
\ No newline at end of file
diff --git a/templates/activation_query/purchase_propensity_query_template.sqlx b/templates/activation_query/purchase_propensity_query_template.sqlx
index 40fe5c40..985edf03 100644
--- a/templates/activation_query/purchase_propensity_query_template.sqlx
+++ b/templates/activation_query/purchase_propensity_query_template.sqlx
@@ -1,9 +1,10 @@
SELECT
- a.prediction AS p_p_prediction,
- NTILE(10) OVER (ORDER BY a.prediction_prob DESC) AS p_p_decile,
+ a.prediction AS user_prop_p_p_prediction,
+ NTILE(10) OVER (ORDER BY a.prediction_prob DESC) AS user_prop_p_p_decile,
b.user_pseudo_id AS client_id,
b.user_id AS user_id,
- b.ga_session_id AS session_id,
+ b.ga_session_id AS event_param_session_id,
+ '100' AS event_param_engagement_time_msec,
CASE WHEN EXTRACT(MICROSECOND FROM b.event_timestamp) = 1 THEN b.event_timestamp ELSE TIMESTAMP_SUB(b.event_timestamp, INTERVAL 1 MICROSECOND) END AS inference_date
FROM
`${mds_project_id}.marketing_ga4_v1_${mds_dataset_suffix}.latest_event_per_user_last_72_hours` b,
diff --git a/templates/activation_query/purchase_propensity_vbb_query_template.sqlx b/templates/activation_query/purchase_propensity_vbb_query_template.sqlx
new file mode 100644
index 00000000..f81fce9f
--- /dev/null
+++ b/templates/activation_query/purchase_propensity_vbb_query_template.sqlx
@@ -0,0 +1,35 @@
+WITH user_prediction_decile AS (
+ SELECT
+ a.prediction AS p_p_prediction,
+ NTILE(10) OVER (ORDER BY a.prediction_prob DESC) AS p_p_decile,
+ b.user_pseudo_id AS client_id,
+ b.user_id AS user_id,
+ b.ga_session_id AS session_id,
+ CASE
+ WHEN EXTRACT(MICROSECOND FROM b.event_timestamp) = 1 THEN b.event_timestamp
+ ELSE TIMESTAMP_SUB(b.event_timestamp, INTERVAL 1 MICROSECOND)
+ END AS inference_date
+ FROM
+ `${mds_project_id}.marketing_ga4_v1_${mds_dataset_suffix}.latest_event_per_user_last_24_hours` b,
+ `{{source_table}}` a
+ WHERE
+ COALESCE(a.user_id, "") = COALESCE(b.user_id, "")
+ AND a.user_pseudo_id = b.user_pseudo_id)
+SELECT
+ a.p_p_prediction AS user_prop_p_p_prediction,
+ a.p_p_decile AS user_prop_p_p_decile,
+ b.value AS event_param_value,
+ 'USD' AS event_param_currency,
+ a.client_id,
+ a.user_id,
+ a.session_id AS event_param_session_id,
+ a.inference_date
+FROM
+ user_prediction_decile AS a
+LEFT JOIN
+ `${activation_project_id}.${dataset}.vbb_activation_configuration` AS b
+ON
+ a.p_p_decile = b.decile
+WHERE
+ b.activation_type = 'purchase-propensity'
+AND b.value > 0
\ No newline at end of file
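Both VBB templates above share one pattern: rank users into deciles with `NTILE(10)` over the predicted probability, then join the decile against `vbb_activation_configuration` to attach a monetary `event_param_value`, dropping deciles whose configured value is zero. A self-contained sketch with made-up rows (the 825 and 450 values correspond to the shipped `value_norm` of 150 times the 5.5 and 3 multipliers):

```sql
-- Illustrative only: decile ranking joined to a value configuration.
WITH predictions AS (
  SELECT user_pseudo_id, prediction_prob,
         NTILE(10) OVER (ORDER BY prediction_prob DESC) AS decile
  FROM UNNEST([
    STRUCT('u1' AS user_pseudo_id, 0.92 AS prediction_prob),
    ('u2', 0.35), ('u3', 0.88), ('u4', 0.10)])
),
config AS (
  SELECT 1 AS decile, 825.0 AS value UNION ALL  -- 150 * 5.5
  SELECT 2, 450.0                               -- 150 * 3
)
SELECT p.user_pseudo_id, p.decile, c.value AS event_param_value
FROM predictions AS p
JOIN config AS c USING (decile)
WHERE c.value > 0;
```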
diff --git a/templates/activation_type_configuration_template.tpl b/templates/activation_type_configuration_template.tpl
index 22afddc3..913b70a2 100644
--- a/templates/activation_type_configuration_template.tpl
+++ b/templates/activation_type_configuration_template.tpl
@@ -1,47 +1,54 @@
{
"audience-segmentation-15": {
"activation_event_name": "maj_audience_segmentation_15",
- "source_query_template": "${audience_segmentation_query_template_gcs_path}",
- "measurement_protocol_payload_template": "${measurement_protocol_payload_template_gcs_path}"
+ "source_query_template": "${audience_segmentation_query_template_gcs_path}"
},
"auto-audience-segmentation-15": {
"activation_event_name": "maj_auto_audience_segmentation_15",
- "source_query_template": "${auto_audience_segmentation_query_template_gcs_path}",
- "measurement_protocol_payload_template": "${measurement_protocol_payload_template_gcs_path}"
+ "source_query_template": "${auto_audience_segmentation_query_template_gcs_path}"
},
"cltv-180-180": {
"activation_event_name": "maj_cltv_180_180",
- "source_query_template": "${cltv_query_template_gcs_path}",
- "measurement_protocol_payload_template": "${measurement_protocol_payload_template_gcs_path}"
+ "source_query_template": "${cltv_query_template_gcs_path}"
},
"cltv-180-90": {
"activation_event_name": "maj_cltv_180_90",
- "source_query_template": "${cltv_query_template_gcs_path}",
- "measurement_protocol_payload_template": "${measurement_protocol_payload_template_gcs_path}"
+ "source_query_template": "${cltv_query_template_gcs_path}"
},
"cltv-180-30": {
"activation_event_name": "maj_cltv_180_30",
- "source_query_template": "${cltv_query_template_gcs_path}",
- "measurement_protocol_payload_template": "${measurement_protocol_payload_template_gcs_path}"
+ "source_query_template": "${cltv_query_template_gcs_path}"
},
"purchase-propensity-30-15": {
"activation_event_name": "maj_purchase_propensity_30_15",
- "source_query_template": "${purchase_propensity_query_template_gcs_path}",
- "measurement_protocol_payload_template": "${measurement_protocol_payload_template_gcs_path}"
+ "source_query_template": "${purchase_propensity_query_template_gcs_path}"
+ },
+ "purchase-propensity-vbb-30-15": {
+ "activation_event_name": "maj_purchase_propensity_vbb_30_15",
+ "source_query_template": "${purchase_propensity_vbb_query_template_gcs_path}"
},
"purchase-propensity-15-15": {
"activation_event_name": "maj_purchase_propensity_15_15",
- "source_query_template": "${purchase_propensity_query_template_gcs_path}",
- "measurement_protocol_payload_template": "${measurement_protocol_payload_template_gcs_path}"
+ "source_query_template": "${purchase_propensity_query_template_gcs_path}"
},
"purchase-propensity-15-7": {
"activation_event_name": "maj_purchase_propensity_15_7",
- "source_query_template": "${purchase_propensity_query_template_gcs_path}",
- "measurement_protocol_payload_template": "${measurement_protocol_payload_template_gcs_path}"
+ "source_query_template": "${purchase_propensity_query_template_gcs_path}"
},
"churn-propensity-30-15": {
"activation_event_name": "maj_churn_propensity_30_15",
- "source_query_template": "${churn_propensity_query_template_gcs_path}",
- "measurement_protocol_payload_template": "${measurement_protocol_payload_template_gcs_path}"
+ "source_query_template": "${churn_propensity_query_template_gcs_path}"
+ },
+ "churn-propensity-15-15": {
+ "activation_event_name": "maj_churn_propensity_15_15",
+ "source_query_template": "${churn_propensity_query_template_gcs_path}"
+ },
+ "churn-propensity-15-7": {
+ "activation_event_name": "maj_churn_propensity_15_7",
+ "source_query_template": "${churn_propensity_query_template_gcs_path}"
+ },
+ "lead-score-propensity-30-15": {
+ "activation_event_name": "maj_lead_score_propensity_30_15",
+ "source_query_template": "${lead_score_propensity_query_template_gcs_path}"
}
-}
\ No newline at end of file
+}
diff --git a/templates/activation_user_import/lead_score_propensity_csv_export.sqlx b/templates/activation_user_import/lead_score_propensity_csv_export.sqlx
new file mode 100644
index 00000000..376cea56
--- /dev/null
+++ b/templates/activation_user_import/lead_score_propensity_csv_export.sqlx
@@ -0,0 +1,27 @@
+DECLARE
+ select_query STRING;
+SET
+ select_query = FORMAT("""
+ CREATE TEMPORARY TABLE tmp_selection AS
+ SELECT
+ user_pseudo_id AS client_id,
+ '${ga4_stream_id}' AS stream_id,
+ prediction AS l_s_p_prediction,
+ NTILE(10) OVER (ORDER BY prediction_prob DESC) AS l_s_p_decile
+ FROM `%s`
+ """, prediction_table_name);
+EXECUTE IMMEDIATE
+ select_query;
+EXPORT DATA
+ OPTIONS ( uri = 'gs://${export_bucket}/csv-export/lead_score_propensity-*.csv',
+ format = 'CSV',
+ OVERWRITE = TRUE,
+ header = TRUE,
+ field_delimiter = ',' ) AS (
+ SELECT
+ client_id,
+ stream_id,
+ l_s_p_prediction,
+ l_s_p_decile
+ FROM
+ tmp_selection );
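Because the prediction table name is only known at run time, the script assembles the `CREATE TEMPORARY TABLE` statement with `FORMAT` and runs it through `EXECUTE IMMEDIATE`; `EXPORT DATA` then writes sharded CSVs under the `csv-export/` prefix. For reference, this is roughly what the dynamic statement expands to once concrete values are substituted (table name and stream ID illustrative):

```sql
CREATE TEMPORARY TABLE tmp_selection AS
SELECT
  user_pseudo_id AS client_id,
  '1234567890' AS stream_id,  -- substituted value of ${ga4_stream_id}
  prediction AS l_s_p_prediction,
  NTILE(10) OVER (ORDER BY prediction_prob DESC) AS l_s_p_decile
FROM `project.dataset.predictions_2024_01_01_view`;  -- hypothetical prediction table
```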
diff --git a/templates/app_payload_template.jinja2 b/templates/app_payload_template.jinja2
deleted file mode 100644
index 33179784..00000000
--- a/templates/app_payload_template.jinja2
+++ /dev/null
@@ -1,20 +0,0 @@
-{
- "client_id": "{{client_id}}",
- {{user_id}}
- "timestamp_micros": "{{event_timestamp}}",
- "nonPersonalizedAds": false,
- "consent": {
- "ad_user_data": "GRANTED",
- "ad_personalization": "GRANTED"
- },
- "user_properties":
- {{user_properties}},
- "events": [
- {
- "name": "{{event_name}}",
- "params": {
- "session_id": "{{session_id}}"
- }
- }
- ]
-}
diff --git a/templates/load_vbb_activation_configuration.sql.tpl b/templates/load_vbb_activation_configuration.sql.tpl
new file mode 100644
index 00000000..b256e9ca
--- /dev/null
+++ b/templates/load_vbb_activation_configuration.sql.tpl
@@ -0,0 +1,33 @@
+-- Copyright 2023 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+-- Step 1: Load JSON data from GCS into the temporary table
+LOAD DATA OVERWRITE `${project_id}.${dataset}.temp_json_data`
+FROM FILES (
+ format = 'JSON',
+ uris = ['${config_file_uri}']
+);
+
+-- Step 2: Transform and load into the final table
+CREATE OR REPLACE TABLE `${project_id}.${dataset}.vbb_activation_configuration` AS
+ SELECT
+ t.activation_type AS activation_type,
+ dm.decile,
+ (t.value_norm * dm.multiplier) AS value
+ FROM
+ `${project_id}.${dataset}.temp_json_data` AS t,
+ UNNEST(t.decile_multiplier) AS dm;
+
+-- Step 3: Clean up the temporary table
+DROP TABLE `${project_id}.${dataset}.temp_json_data`;
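The load is a two-step flatten: each JSONL row carries a `decile_multiplier` array, and the `UNNEST` in step 2 turns it into one row per decile with `value = value_norm * multiplier`. A hedged way to verify the result after the load completes (substitute your project and dataset):

```sql
-- Illustrative verification query for the flattened configuration.
SELECT activation_type, decile, value
FROM `project.dataset.vbb_activation_configuration`
WHERE activation_type = 'purchase-propensity'
ORDER BY decile;
-- Expected from the shipped JSONL: decile 1 -> 150 * 5.5 = 825.0,
-- decile 2 -> 450.0, decile 3 -> 300.0, decile 4 -> 150.0, deciles 5-10 -> 0.0.
```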
diff --git a/templates/looker_studio_create_dashboard_url_template.txt b/templates/looker_studio_create_dashboard_url_template.txt
index 81eea9cb..5407bd49 100644
--- a/templates/looker_studio_create_dashboard_url_template.txt
+++ b/templates/looker_studio_create_dashboard_url_template.txt
@@ -1 +1 @@
-https://lookerstudio.google.com/reporting/create?c.reportId=${report_id}&c.explain=true&r.reportName=Marketing%20Analytics%20Sample&ds.GA4_sessions.connector=bigQuery&ds.GA4_sessions.type=TABLE&ds.GA4_sessions.tableId=session_date&ds.GA4_sessions.datasetId=${mds_ga4_product_dataset}&ds.GA4_sessions.projectId=${mds_project}&ds.GA4_sessions.datasourceName=MDS%20GA4%20Sessions&ds.GA4_session_device.connector=bigQuery&ds.GA4_session_device.type=TABLE&ds.GA4_session_device.tableId=session_device_daily_metrics&ds.GA4_session_device.datasetId=${mds_ga4_product_dataset}&ds.GA4_session_device.projectId=${mds_project}&ds.GA4_session_device.datasourceName=MDS%20GA4%20Session%20Device&ds.GA4_session_location.connector=bigQuery&ds.GA4_session_location.type=TABLE&ds.GA4_session_location.tableId=session_location_daily_metrics&ds.GA4_session_location.datasetId=${mds_ga4_product_dataset}&ds.GA4_session_location.projectId=${mds_project}&ds.GA4_session_location.datasourceName=MDS%20GA4%20Session%20Location&ds.GA4_event_page.connector=bigQuery&ds.GA4_event_page.type=TABLE&ds.GA4_event_page.tableId=event_page&ds.GA4_event_page.datasetId=${mds_ga4_product_dataset}&ds.GA4_event_page.projectId=${mds_project}&ds.GA4_event_page.datasourceName=MDS%20GA4%20Event%20Page&ds.GA4_unique_page_views.connector=bigQuery&ds.GA4_unique_page_views.type=TABLE&ds.GA4_unique_page_views.tableId=unique_page_views&ds.GA4_unique_page_views.datasetId=${mds_ga4_product_dataset}&ds.GA4_unique_page_views.projectId=${mds_project}&ds.GA4_unique_page_views.datasourceName=MDS%20GA4%20Unique%20Page%20Views&ds.GA4_page_session.connector=bigQuery&ds.GA4_page_session.type=TABLE&ds.GA4_page_session.tableId=page_session_daily_metrics&ds.GA4_page_session.datasetId=${mds_ga4_product_dataset}&ds.GA4_page_session.projectId=${mds_project}&ds.GA4_page_session.datasourceName=MDS%20GA4%20Page%20Session&ds.Ads_perf_conversions.connector=bigQuery&ds.Ads_perf_conversions.type=TABLE&ds.Ads_perf_conversions.tableId=ad_performance_conversions&ds.Ads_perf_conversions.datasetId=${mds_ads_product_dataset}&ds.Ads_perf_conversions.projectId=${mds_project}&ds.Ads_perf_conversions.datasourceName=MDS%20Ads%20Ad%20Performance%20x%20Conversions&ds.MAJ_resource_link.connector=bigQuery&ds.MAJ_resource_link.type=TABLE&ds.MAJ_resource_link.tableId=resource_link&ds.MAJ_resource_link.datasetId=maj_dashboard&ds.MAJ_resource_link.projectId=${monitor_project}&ds.MAJ_resource_link.datasourceName=MAJ%20Resource%20Link&ds.GA4_base_event.connector=bigQuery&ds.GA4_base_event.type=TABLE&ds.GA4_base_event.tableId=event&ds.GA4_base_event.datasetId=${mds_ga4_base_dataset}&ds.GA4_base_event.projectId=${mds_project}&ds.GA4_base_event.datasourceName=MDS%20GA4%20Base%20Event&ds.MDS_execution_log.connector=bigQuery&ds.MDS_execution_log.type=TABLE&ds.MDS_execution_log.tableId=${dataform_log_table_id}&ds.MDS_execution_log.datasetId=${logs_dataset}&ds.MDS_execution_log.projectId=${monitor_project}&ds.MDS_execution_log.datasourceName=MDS%20Execution%20Log&ds.Activation_log.connector=bigQuery&ds.Activation_log.type=TABLE&ds.Activation_log.tableId=${dataflow_log_table_id}&ds.Activation_log.datasetId=${logs_dataset}&ds.Activation_log.projectId=${monitor_project}&ds.Activation_log.datasourceName=Activation%20Execution%20Log&ds.Vertex_log.connector=bigQuery&ds.Vertex_log.type=TABLE&ds.Vertex_log.tableId=${vertex_pipelines_log_table_id}&ds.Vertex_log.datasetId=${logs_dataset}&ds.Vertex_log.projectId=${monitor_project}&ds.Vertex_log.datasourceName=Vertex%20AI%20Pipelines%20Log&ds.Aggregated_vbb_volume_dai
ly.connector=bigQuery&ds.Aggregated_vbb_volume_daily.type=TABLE&ds.Aggregated_vbb_volume_daily.tableId=aggregated_value_based_bidding_volume_daily&ds.Aggregated_vbb_volume_daily.datasetId=${aggregated_vbb_dataset}&ds.Aggregated_vbb_volume_daily.projectId=${feature_store_project}&ds.Aggregated_vbb_volume_daily.datasourceName=Aggregated%20VBB%20Volume%20Daily&ds.Aggregated_vbb_volume_weekly.connector=bigQuery&ds.Aggregated_vbb_volume_weekly.type=TABLE&ds.Aggregated_vbb_volume_weekly.tableId=aggregated_value_based_bidding_volume_weekly&ds.Aggregated_vbb_volume_weekly.datasetId=${aggregated_vbb_dataset}&ds.Aggregated_vbb_volume_weekly.projectId=${feature_store_project}&ds.Aggregated_vbb_volume_weekly.datasourceName=Aggregated%20VBB%20Volume%20Weekly&ds.Aggregated_vbb_correlation.connector=bigQuery&ds.Aggregated_vbb_correlation.type=TABLE&ds.Aggregated_vbb_correlation.tableId=aggregated_value_based_bidding_correlation&ds.Aggregated_vbb_correlation.datasetId=${aggregated_vbb_dataset}&ds.Aggregated_vbb_correlation.projectId=${feature_store_project}&ds.Aggregated_vbb_correlation.datasourceName=Aggregated%20VBB%20Correlation&ds.Aggregated_vbb_weights.connector=bigQuery&ds.Aggregated_vbb_weights.type=TABLE&ds.Aggregated_vbb_weights.tableId=vbb_weights&ds.Aggregated_vbb_weights.datasetId=${aggregated_vbb_dataset}&ds.Aggregated_vbb_weights.projectId=${feature_store_project}&ds.Aggregated_vbb_weights.datasourceName=Aggregated%20VBB%20Weights&ds.Aggregated_predictions.connector=bigQuery&ds.Aggregated_predictions.type=TABLE&ds.Aggregated_predictions.tableId=latest&ds.Aggregated_predictions.datasetId=${aggregated_predictions_dataset}&ds.Aggregated_predictions.projectId=${feature_store_project}&ds.Aggregated_predictions.datasourceName=Aggregated%20Predictions&ds.User_behaviour_revenue_insights_daily.connector=bigQuery&ds.User_behaviour_revenue_insights_daily.type=TABLE&ds.User_behaviour_revenue_insights_daily.tableId=user_behaviour_revenue_insights_daily&ds.User_behaviour_revenue_insights_daily.datasetId=${gemini_insights_dataset}&ds.User_behaviour_revenue_insights_daily.projectId=${feature_store_project}&ds.User_behaviour_revenue_insights_daily.datasourceName=User%20Behaviour%20Revenue%20Insights%20Daily
\ No newline at end of file
+https://lookerstudio.google.com/reporting/create?c.reportId=${report_id}&c.explain=true&r.reportName=Marketing%20Analytics%20Sample&ds.GA4_sessions.connector=bigQuery&ds.GA4_sessions.type=TABLE&ds.GA4_sessions.tableId=session_date&ds.GA4_sessions.datasetId=${mds_ga4_product_dataset}&ds.GA4_sessions.projectId=${mds_project}&ds.GA4_sessions.datasourceName=MDS%20GA4%20Sessions&ds.GA4_session_device.connector=bigQuery&ds.GA4_session_device.type=TABLE&ds.GA4_session_device.tableId=session_device_daily_metrics&ds.GA4_session_device.datasetId=${mds_ga4_product_dataset}&ds.GA4_session_device.projectId=${mds_project}&ds.GA4_session_device.datasourceName=MDS%20GA4%20Session%20Device&ds.GA4_session_location.connector=bigQuery&ds.GA4_session_location.type=TABLE&ds.GA4_session_location.tableId=session_location_daily_metrics&ds.GA4_session_location.datasetId=${mds_ga4_product_dataset}&ds.GA4_session_location.projectId=${mds_project}&ds.GA4_session_location.datasourceName=MDS%20GA4%20Session%20Location&ds.GA4_event_page.connector=bigQuery&ds.GA4_event_page.type=TABLE&ds.GA4_event_page.tableId=event_page&ds.GA4_event_page.datasetId=${mds_ga4_product_dataset}&ds.GA4_event_page.projectId=${mds_project}&ds.GA4_event_page.datasourceName=MDS%20GA4%20Event%20Page&ds.GA4_unique_page_views.connector=bigQuery&ds.GA4_unique_page_views.type=TABLE&ds.GA4_unique_page_views.tableId=unique_page_views&ds.GA4_unique_page_views.datasetId=${mds_ga4_product_dataset}&ds.GA4_unique_page_views.projectId=${mds_project}&ds.GA4_unique_page_views.datasourceName=MDS%20GA4%20Unique%20Page%20Views&ds.GA4_page_session.connector=bigQuery&ds.GA4_page_session.type=TABLE&ds.GA4_page_session.tableId=page_session_daily_metrics&ds.GA4_page_session.datasetId=${mds_ga4_product_dataset}&ds.GA4_page_session.projectId=${mds_project}&ds.GA4_page_session.datasourceName=MDS%20GA4%20Page%20Session&ds.Ads_perf_conversions.connector=bigQuery&ds.Ads_perf_conversions.type=TABLE&ds.Ads_perf_conversions.tableId=ad_performance_conversions&ds.Ads_perf_conversions.datasetId=${mds_ads_product_dataset}&ds.Ads_perf_conversions.projectId=${mds_project}&ds.Ads_perf_conversions.datasourceName=MDS%20Ads%20Ad%20Performance%20x%20Conversions&ds.MAJ_resource_link.connector=bigQuery&ds.MAJ_resource_link.type=TABLE&ds.MAJ_resource_link.tableId=resource_link&ds.MAJ_resource_link.datasetId=maj_dashboard&ds.MAJ_resource_link.projectId=${monitor_project}&ds.MAJ_resource_link.datasourceName=MAJ%20Resource%20Link&ds.GA4_base_event.connector=bigQuery&ds.GA4_base_event.type=TABLE&ds.GA4_base_event.tableId=event&ds.GA4_base_event.datasetId=${mds_ga4_base_dataset}&ds.GA4_base_event.projectId=${mds_project}&ds.GA4_base_event.datasourceName=MDS%20GA4%20Base%20Event&ds.MDS_execution_log.connector=bigQuery&ds.MDS_execution_log.type=TABLE&ds.MDS_execution_log.tableId=${dataform_log_table_id}&ds.MDS_execution_log.datasetId=${logs_dataset}&ds.MDS_execution_log.projectId=${monitor_project}&ds.MDS_execution_log.datasourceName=MDS%20Execution%20Log&ds.Activation_log.connector=bigQuery&ds.Activation_log.type=TABLE&ds.Activation_log.tableId=${dataflow_log_table_id}&ds.Activation_log.datasetId=${logs_dataset}&ds.Activation_log.projectId=${monitor_project}&ds.Activation_log.datasourceName=Activation%20Execution%20Log&ds.Vertex_log.connector=bigQuery&ds.Vertex_log.type=TABLE&ds.Vertex_log.tableId=${vertex_pipelines_log_table_id}&ds.Vertex_log.datasetId=${logs_dataset}&ds.Vertex_log.projectId=${monitor_project}&ds.Vertex_log.datasourceName=Vertex%20AI%20Pipelines%20Log&ds.Aggregated_vbb_volume_dai
ly.connector=bigQuery&ds.Aggregated_vbb_volume_daily.type=TABLE&ds.Aggregated_vbb_volume_daily.tableId=aggregated_value_based_bidding_volume_daily&ds.Aggregated_vbb_volume_daily.datasetId=${aggregated_vbb_dataset}&ds.Aggregated_vbb_volume_daily.projectId=${feature_store_project}&ds.Aggregated_vbb_volume_daily.datasourceName=Aggregated%20VBB%20Volume%20Daily&ds.Aggregated_vbb_volume_weekly.connector=bigQuery&ds.Aggregated_vbb_volume_weekly.type=TABLE&ds.Aggregated_vbb_volume_weekly.tableId=aggregated_value_based_bidding_volume_weekly&ds.Aggregated_vbb_volume_weekly.datasetId=${aggregated_vbb_dataset}&ds.Aggregated_vbb_volume_weekly.projectId=${feature_store_project}&ds.Aggregated_vbb_volume_weekly.datasourceName=Aggregated%20VBB%20Volume%20Weekly&ds.Aggregated_vbb_correlation.connector=bigQuery&ds.Aggregated_vbb_correlation.type=TABLE&ds.Aggregated_vbb_correlation.tableId=aggregated_value_based_bidding_correlation&ds.Aggregated_vbb_correlation.datasetId=${aggregated_vbb_dataset}&ds.Aggregated_vbb_correlation.projectId=${feature_store_project}&ds.Aggregated_vbb_correlation.datasourceName=Aggregated%20VBB%20Correlation&ds.Aggregated_vbb_weights.connector=bigQuery&ds.Aggregated_vbb_weights.type=TABLE&ds.Aggregated_vbb_weights.tableId=vbb_weights&ds.Aggregated_vbb_weights.datasetId=${aggregated_vbb_dataset}&ds.Aggregated_vbb_weights.projectId=${feature_store_project}&ds.Aggregated_vbb_weights.datasourceName=Aggregated%20VBB%20Weights&ds.Aggregated_predictions.connector=bigQuery&ds.Aggregated_predictions.type=TABLE&ds.Aggregated_predictions.tableId=latest&ds.Aggregated_predictions.datasetId=${aggregated_predictions_dataset}&ds.Aggregated_predictions.projectId=${feature_store_project}&ds.Aggregated_predictions.datasourceName=Aggregated%20Predictions&ds.User_behaviour_revenue_insights_daily.connector=bigQuery&ds.User_behaviour_revenue_insights_daily.type=TABLE&ds.User_behaviour_revenue_insights_daily.tableId=user_behaviour_revenue_insights_daily&ds.User_behaviour_revenue_insights_daily.datasetId=${gemini_insights_dataset}&ds.User_behaviour_revenue_insights_daily.projectId=${feature_store_project}&ds.User_behaviour_revenue_insights_daily.datasourceName=User%20Behaviour%20Revenue%20Insights%20Daily&ds.Bid_strategy_roas_vbb.connector=bigQuery&ds.Bid_strategy_roas_vbb.type=TABLE&ds.Bid_strategy_roas_vbb.tableId=bid_strategy_roas&ds.Bid_strategy_roas_vbb.datasetId=${mds_ads_base_dataset}&ds.Bid_strategy_roas_vbb.projectId=${mds_project}&ds.Bid_strategy_roas_vbb.datasourceName=Bid%20Strategy%20ROAS%20VBB&ds.Prediction_stats.connector=bigQuery&ds.Prediction_stats.type=TABLE&ds.Prediction_stats.tableId=prediction_stats&ds.Prediction_stats.datasetId=${purchase_propensity_dataset}&ds.Prediction_stats.projectId=${feature_store_project}&ds.Prediction_stats.datasourceName=Prediction%20Stats
\ No newline at end of file
diff --git a/templates/purchase_propensity_smart_bidding_view.sql.tpl b/templates/purchase_propensity_smart_bidding_view.sql.tpl
new file mode 100644
index 00000000..493a5e9b
--- /dev/null
+++ b/templates/purchase_propensity_smart_bidding_view.sql.tpl
@@ -0,0 +1,41 @@
+-- Copyright 2024 Google LLC
+--
+-- Licensed under the Apache License, Version 2.0 (the "License");
+-- you may not use this file except in compliance with the License.
+-- You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+
+SELECT
+ p_stat.inference_date,
+ p_stat.p_p_decile,
+ p_stat.number_of_users,
+ conf.value*p_stat.number_of_users AS predicted_purchase_value
+FROM (
+ SELECT
+ inference_date,
+ p_p_decile,
+ COUNT(p_p_decile) AS number_of_users
+ FROM (
+ SELECT
+ PARSE_DATE('%Y_%m_%d', SUBSTR(_TABLE_SUFFIX, 1,10)) AS inference_date,
+ NTILE(10) OVER (PARTITION BY _TABLE_SUFFIX ORDER BY b.prediction_prob DESC) AS p_p_decile,
+ FROM
+ `${project_id}.${purchase_propensity_dataset}.predictions_*` b
+ WHERE
+ ENDS_WITH(_TABLE_SUFFIX, '_view') )
+ GROUP BY
+ inference_date,
+ p_p_decile ) AS p_stat
+JOIN
+ `${project_id}.${activation_dataset}.${smart_bidding_configuration_table}` conf
+ON
+ p_stat.p_p_decile = conf.decile
+WHERE
+ conf.activation_type = 'purchase-propensity'
\ No newline at end of file
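The view aggregates the sharded `predictions_*` tables into per-day, per-decile user counts, then multiplies each count by the configured decile value. For example, 40 decile-1 users on a given day with a configured value of 825 yield a `predicted_purchase_value` of 33,000. A tiny literal-only sketch of that arithmetic:

```sql
-- Illustrative row: 40 users in decile 1 at a configured value of 825.
SELECT DATE '2024-01-01' AS inference_date,
       1 AS p_p_decile,
       40 AS number_of_users,
       40 * 825.0 AS predicted_purchase_value;  -- = 33000.0
```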
diff --git a/templates/vbb_activation_configuration.jsonl b/templates/vbb_activation_configuration.jsonl
new file mode 100644
index 00000000..57b200e0
--- /dev/null
+++ b/templates/vbb_activation_configuration.jsonl
@@ -0,0 +1,3 @@
+{"activation_type":"purchase-propensity","value_norm":150,"decile_multiplier":[{"decile":1,"multiplier":5.5},{"decile":2,"multiplier":3},{"decile":3,"multiplier":2},{"decile":4,"multiplier":1},{"decile":5,"multiplier":0},{"decile":6,"multiplier":0},{"decile":7,"multiplier":0},{"decile":8,"multiplier":0},{"decile":9,"multiplier":0},{"decile":10,"multiplier":0}]}
+{"activation_type":"cltv","value_norm":500,"decile_multiplier":[{"decile":1,"multiplier":5.5},{"decile":2,"multiplier":3},{"decile":3,"multiplier":2},{"decile":4,"multiplier":1},{"decile":5,"multiplier":0},{"decile":6,"multiplier":0},{"decile":7,"multiplier":0},{"decile":8,"multiplier":0},{"decile":9,"multiplier":0},{"decile":10,"multiplier":0}]}
+{"activation_type":"lead-score-propensity","value_norm":150,"decile_multiplier":[{"decile":1,"multiplier":5.5},{"decile":2,"multiplier":3},{"decile":3,"multiplier":2},{"decile":4,"multiplier":1},{"decile":5,"multiplier":0},{"decile":6,"multiplier":0},{"decile":7,"multiplier":0},{"decile":8,"multiplier":0},{"decile":9,"multiplier":0},{"decile":10,"multiplier":0}]}
\ No newline at end of file