Commit b31ea28 (1 parent: ca3a14c)
Carlos Timoteo committed Dec 17, 2024
Showing 8 changed files with 416 additions and 138 deletions.
376 changes: 242 additions & 134 deletions
sql/query/invoke_backfill_user_rolling_window_lead_metrics.sqlx
Large diffs are not rendered by default.
23 changes: 23 additions & 0 deletions
sql/query/invoke_lead_score_propensity_inference_preparation.sqlx
@@ -0,0 +1,23 @@
-- Copyright 2023 Google LLC
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
--     http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.

-- This script determines the current date and then passes it as an argument to a
-- stored procedure in your BigQuery project. This pattern is commonly used when
-- you want a stored procedure to perform operations or calculations that are
-- relevant to the current date, such as data processing, analysis, or reporting tasks.

DECLARE inference_date DATE DEFAULT NULL;
SET inference_date = CURRENT_DATE();

CALL `{{project_id}}.{{dataset}}.{{stored_procedure}}`(inference_date);
@@ -0,0 +1,39 @@
-- Copyright 2023 Google LLC
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
--     http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.

-- This script sets up a date range, clamps it to the dates available in the events table,
-- and calls a stored procedure with this range and an output variable (users_added).
-- This pattern is common for orchestrating data processing tasks within BigQuery using stored procedures.

DECLARE input_date DATE;
DECLARE end_date DATE;
DECLARE users_added INT64 DEFAULT NULL;

SET end_date = CURRENT_DATE();
SET input_date = (SELECT DATE_SUB(end_date, INTERVAL {{interval_input_date}} DAY));

-- This code block ensures that the end_date used in subsequent operations is not later than one day after the latest available data in
-- the specified events table. This prevents potential attempts to process data for a date range that extends beyond the actual data availability.
IF (SELECT DATE_SUB(end_date, INTERVAL 1 DAY)) > (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) THEN
  SET end_date = (SELECT DATE_ADD(MAX(event_date), INTERVAL 1 DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
END IF;

-- This code block ensures that the input_date used in subsequent operations is not before the earliest available data in the
-- specified events table. When it is, input_date is moved to one day after the earliest event_date. This prevents potential errors
-- or unexpected behavior that might occur when trying to process data for a date range that precedes the actual data availability.
IF input_date < (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) THEN
  SET input_date = (SELECT DATE_ADD(MIN(event_date), INTERVAL 1 DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
END IF;

CALL `{{project_id}}.{{dataset}}.{{stored_procedure}}`(input_date, end_date, users_added);
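Review note: the two clamping blocks above can be tried in isolation with a self-contained sketch such as the one below. It is only an illustration under stated assumptions: `sample_event` is a hypothetical temporary table standing in for `{{mds_project_id}}.{{mds_dataset}}.event`, the fixed date replaces `CURRENT_DATE()`, and the 90-day interval is an assumed value for `{{interval_input_date}}`.

```sql
-- Sketch only: fixed dates and a temp table replace the templated values.
DECLARE end_date DATE;
DECLARE input_date DATE;
DECLARE min_event_date DATE;
DECLARE max_event_date DATE;

-- Hypothetical stand-in for the event table: one row per day in January 2024.
CREATE TEMP TABLE sample_event AS
SELECT event_date
FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2024-01-01', DATE '2024-01-31')) AS event_date;

SET end_date = DATE '2024-03-15';                        -- assumed "today"
SET input_date = DATE_SUB(end_date, INTERVAL 90 DAY);    -- assumed interval

-- Read both bounds once instead of querying the table in every condition.
SET (min_event_date, max_event_date) = (
  SELECT AS STRUCT MIN(event_date), MAX(event_date) FROM sample_event
);

-- Same rule as the script: end_date may be at most one day after the latest event.
IF DATE_SUB(end_date, INTERVAL 1 DAY) > max_event_date THEN
  SET end_date = DATE_ADD(max_event_date, INTERVAL 1 DAY);
END IF;

-- Same rule as the script: an input_date before the earliest event moves to one day after it.
IF input_date < min_event_date THEN
  SET input_date = DATE_ADD(min_event_date, INTERVAL 1 DAY);
END IF;

-- With the sample data: input_date = 2024-01-02, end_date = 2024-02-01.
SELECT input_date, end_date;
```

Reading `MIN` and `MAX` into variables once is a design choice of this sketch, not something the commit does; it avoids scanning the event table once per condition.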
73 changes: 73 additions & 0 deletions
sql/query/invoke_lead_score_propensity_training_preparation.sqlx
@@ -0,0 +1,73 @@
-- Copyright 2023 Google LLC
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
--     http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.

-- This script determines the date range for training a lead score propensity model
-- from user-defined parameters and the availability of 'login' events in the dataset.
-- If login events exist within the computed bounds, those bounds are used;
-- otherwise the script falls back to a fixed interval.

-- Intended start and end dates for the training data.
DECLARE train_start_date DATE DEFAULT NULL;
DECLARE train_end_date DATE DEFAULT NULL;

-- Control data splitting for training and validation (likely used in a subsequent process).
DECLARE train_split_end_number INT64 DEFAULT NULL;
DECLARE validation_split_end_number INT64 DEFAULT NULL;

-- Stores the count of distinct users who logged in within a given period.
DECLARE logged_users INT64 DEFAULT NULL;

-- Used to store the maximum and minimum event dates from the source data.
DECLARE max_date DATE;
DECLARE min_date DATE;

-- Determining Maximum and Minimum Dates
SET max_date = (SELECT DATE_SUB(MAX(event_date), INTERVAL {{interval_max_date}} DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
SET min_date = (SELECT DATE_ADD(MIN(event_date), INTERVAL {{interval_min_date}} DAY) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);

-- If min_date >= maximum event_date, max_date <= minimum event_date, or min_date >= max_date,
-- reset min_date to the minimum event_date and max_date to the maximum event_date.
IF min_date >= (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) OR max_date <= (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`) OR min_date >= max_date THEN
  SET min_date = (SELECT MIN(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
  SET max_date = (SELECT MAX(event_date) FROM `{{mds_project_id}}.{{mds_dataset}}.event`);
END IF;

-- Setting Split Numbers
-- Sets the train_split_end_number to a user-defined value. This value likely determines the proportion of data used for training.
SET train_split_end_number = {{train_split_end_number}}; -- If you want 60% for training use number 5. If you want 80% use number 7.
-- Sets the validation_split_end_number to a user-defined value, controlling the proportion of data used for validation.
SET validation_split_end_number = {{validation_split_end_number}};

-- Counts distinct users who have an event named 'login' between min_date and max_date.
-- If there are none, the block below replaces the training dates with a fixed interval.
SET logged_users = (SELECT COUNT(DISTINCT user_pseudo_id)
  FROM `{{mds_project_id}}.{{mds_dataset}}.event`
  WHERE event_name = 'login' AND
  event_date BETWEEN min_date AND max_date
);

-- Setting Training Dates
-- If there are logged_users in the training set, then keep the calculated dates, or else set
-- the start and end dates to a fixed interval preventing `train_start_date` and `train_end_date` from being NULL.
IF logged_users > 0 THEN
  SET train_start_date = min_date;
  SET train_end_date = max_date;
ELSE
  SET train_start_date = DATE_SUB(CURRENT_DATE(), INTERVAL 3 YEAR);
  SET train_end_date = DATE_SUB(CURRENT_DATE(), INTERVAL 5 DAY);
END IF;

-- Finally, the script calls a stored procedure, passing the adjusted training dates and split numbers as arguments. This stored procedure
-- handles the actual data preparation for the lead score propensity model.
CALL `{{project_id}}.{{dataset}}.{{stored_procedure}}`(train_start_date, train_end_date, train_split_end_number, validation_split_end_number);
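Review note: the diff does not show how the stored procedure consumes `train_split_end_number` and `validation_split_end_number`. The inline hint ("60% use number 5, 80% use number 7") would be consistent with hashing each user into ten buckets numbered 0 to 9 and treating the split numbers as inclusive upper bounds. The sketch below illustrates that reading only; the bucketing expression, the generated user ids, the `TRAIN`/`VALIDATE`/`TEST` labels, and the value 8 for the validation bound are all assumptions, not code from this commit.

```sql
-- Sketch only: assumed interpretation of the split numbers as inclusive bucket bounds.
DECLARE train_split_end_number INT64 DEFAULT 5;       -- buckets 0..5, roughly 60% of users
DECLARE validation_split_end_number INT64 DEFAULT 8;  -- buckets 6..8, roughly 30% of users

WITH users AS (
  -- Hypothetical user ids; the real script would read user_pseudo_id from the event table.
  SELECT CONCAT('user_', CAST(n AS STRING)) AS user_pseudo_id
  FROM UNNEST(GENERATE_ARRAY(1, 1000)) AS n
),
bucketed AS (
  -- Deterministic bucket in the range 0..9 derived from a hash of the user id.
  SELECT
    user_pseudo_id,
    MOD(ABS(FARM_FINGERPRINT(user_pseudo_id)), 10) AS bucket
  FROM users
)
SELECT
  CASE
    WHEN bucket <= train_split_end_number THEN 'TRAIN'
    WHEN bucket <= validation_split_end_number THEN 'VALIDATE'
    ELSE 'TEST'
  END AS data_split,
  COUNT(*) AS user_count
FROM bucketed
GROUP BY data_split
ORDER BY data_split;
```

Under this reading a user always lands in the same split across runs, because the bucket depends only on the hashed id. If the stored procedure uses a different scheme, the percentages in the inline comment should be checked against it.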
@@ -0,0 +1,28 @@
-- Copyright 2023 Google LLC
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
--     http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.

-- This script sets up a date range, calls a stored procedure with this range and a variable to
-- store a result, and then returns the result of the stored procedure. This pattern is common
-- for orchestrating data processing tasks within BigQuery using stored procedures.

DECLARE input_date DATE;
DECLARE end_date DATE;
DECLARE users_added INT64 DEFAULT NULL;

SET input_date = CURRENT_DATE();
SET end_date = (SELECT DATE_SUB(input_date, INTERVAL {{interval_end_date}} DAY));

CALL `{{project_id}}.{{dataset}}.{{stored_procedure}}`(input_date, end_date, users_added);

SELECT users_added;
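Review note: the script above relies on the called procedure writing its result into `users_added`, which in BigQuery requires that argument to be declared as an `OUT` (or `INOUT`) parameter. A minimal sketch of that contract follows. The dataset `my_dataset`, the procedure name `sample_procedure`, the fixed date, the 7-day interval, and the placeholder body are hypothetical and do not come from this commit.

```sql
-- Sketch only: hypothetical procedure showing how an OUT parameter reaches the caller.
DECLARE input_date DATE;
DECLARE end_date DATE;
DECLARE users_added INT64 DEFAULT NULL;

-- Assumes a dataset named my_dataset already exists in the default project.
CREATE OR REPLACE PROCEDURE my_dataset.sample_procedure(
  input_date DATE,
  end_date DATE,
  OUT users_added INT64
)
BEGIN
  -- Placeholder logic: the real procedure would insert rows and report how many users it added.
  SET users_added = DATE_DIFF(input_date, end_date, DAY);
END;

SET input_date = DATE '2024-03-15';                    -- stands in for CURRENT_DATE()
SET end_date = DATE_SUB(input_date, INTERVAL 7 DAY);   -- assumed value for the interval

CALL my_dataset.sample_procedure(input_date, end_date, users_added);

-- The caller's variable now holds the value assigned inside the procedure (7 here).
SELECT users_added;
```

Without the `OUT` keyword the third argument would be input-only, and the final `SELECT` would return the `NULL` the variable was declared with.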