Commit 123552c

version queries and add clients_last_seen and mau28_by_dimensions (#1)
1 parent 9954669 commit 123552c

5 files changed: +110 -3 lines changed

README.md

Lines changed: 36 additions & 0 deletions
@@ -2,3 +2,39 @@ BigQuery ETL
 ===
 
 Bigquery UDFs and SQL queries for building derived datasets.
+
+Recommended practices
+===
+
+- Should name sql files like `sql/destination_table_with_version.sql` e.g.
+  `sql/clients_daily_v6.sql`
+- Should not specify a project or dataset in table names to simplify testing
+- Should use incremental queries
+- Should filter input tables on partition and clustering columns
+- Should use UDF language `SQL` over `js` for performance
+- Should use UDFs for reusability
+- Should use query parameters over jinja templating
+  - Temporary issue: Airflow 1.10+ is required in order to use query parameters
+
+Incremental Queries
+===
+
+Incremental queries have these benefits:
+
+- BigQuery billing discounts for destination table partitions not modified in
+  the last 90 days
+- Requires less airflow configuration
+- Will have tooling to automate backfilling
+- Will have tooling to replace partitions atomically to prevent duplicate data
+- Will have tooling to generate an optimized "destination plus" view that
+  calculates the most recent partition
+
+Incremental queries have these properties:
+
+- Must accept a date via `@submission_date` query parameter
+- Must output a column named `submission_date` matching the query parameter
+- Must produce similar results when run multiple times
+- Should produce identical results when run multiple times
+- May depend on the previous partition
+  - If using previous partition, must include a `.init.sql` query to init the
+    first partition
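
The incremental-query properties listed in the README can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `run_incremental`, `example_query`, and the dict-based `table` are hypothetical stand-ins, not part of this commit or of any BigQuery or Airflow API. It shows why overwriting a whole partition keyed by `submission_date` makes reruns safe.

```python
from datetime import date


def run_incremental(query, destination, submission_date):
    """Run `query` for one date and overwrite that date's partition.

    `destination` is a plain dict mapping a date to its list of rows,
    standing in for a date-partitioned table (an assumption of this sketch).
    """
    rows = query(submission_date)
    # Property: every output row carries the submission_date parameter.
    assert all(r["submission_date"] == submission_date for r in rows)
    # Replacing the partition instead of appending prevents duplicate data
    # when the same date is processed more than once.
    destination[submission_date] = rows
    return destination


def example_query(submission_date):
    """Hypothetical query: select the source rows for a single day."""
    source = [
        {"client_id": "a", "day": date(2019, 2, 1)},
        {"client_id": "b", "day": date(2019, 2, 1)},
        {"client_id": "a", "day": date(2019, 2, 2)},
    ]
    return [
        {"submission_date": submission_date, "client_id": r["client_id"]}
        for r in source
        if r["day"] == submission_date
    ]


table = {}
run_incremental(example_query, table, date(2019, 2, 1))
first = list(table[date(2019, 2, 1)])
run_incremental(example_query, table, date(2019, 2, 1))  # rerun the same date
```

After the rerun the partition for 2019-02-01 still holds the same two rows, which is the "identical results when run multiple times" property in miniature.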

sql/clients_last_seen_v1.init.sql

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+SELECT
+  @submission_date AS submission_date,
+  CURRENT_DATETIME() AS generated_time,
+  MAX(submission_date_s3) AS last_seen_date,
+  -- approximate LAST_VALUE(input).*
+  ARRAY_AGG(input
+    ORDER BY submission_date_s3
+    DESC LIMIT 1
+  )[OFFSET(0)].* EXCEPT (submission_date_s3)
+FROM
+  clients_daily_v6 AS input
+WHERE
+  submission_date_s3 <= @submission_date
+  AND
+  submission_date_s3 > DATE_SUB(@submission_date, INTERVAL 28 DAY)
+GROUP BY
+  input.client_id
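
As a reading aid, the init query above can be modelled in plain Python: for each client, keep the most recent row inside the 28-day window and record its date as `last_seen_date`. This is a sketch only. The function name `init_last_seen`, the sample rows, and the `country` field are assumptions for illustration, and `generated_time` is left out to keep the output deterministic.

```python
from datetime import date, timedelta


def init_last_seen(clients_daily, submission_date, window_days=28):
    """Build the first clients_last_seen partition from daily rows."""
    start = submission_date - timedelta(days=window_days)  # exclusive bound
    latest = {}
    for row in clients_daily:
        day = row["submission_date_s3"]
        # Mirrors the WHERE clause: start < submission_date_s3 <= submission_date
        if not (start < day <= submission_date):
            continue
        best = latest.get(row["client_id"])
        # Mirrors ARRAY_AGG(... ORDER BY submission_date_s3 DESC LIMIT 1)
        if best is None or day > best["submission_date_s3"]:
            latest[row["client_id"]] = row
    result = []
    for row in latest.values():
        # Mirrors `.* EXCEPT (submission_date_s3)` plus the added columns
        out = {k: v for k, v in row.items() if k != "submission_date_s3"}
        out["submission_date"] = submission_date
        out["last_seen_date"] = row["submission_date_s3"]
        result.append(out)
    return result


clients_daily = [
    {"client_id": "a", "submission_date_s3": date(2019, 1, 1), "country": "US"},
    {"client_id": "a", "submission_date_s3": date(2019, 1, 20), "country": "CA"},
    {"client_id": "b", "submission_date_s3": date(2018, 12, 31), "country": "DE"},
]
rows = init_last_seen(clients_daily, date(2019, 1, 28))
```

Client `b` was last seen exactly 28 days before the submission date, so it falls outside the window and is dropped. Client `a` keeps the fields from its 2019-01-20 row.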

sql/clients_last_seen_v1.sql

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+WITH current_sample AS (
+  SELECT
+    submission_date_s3 AS last_seen_date,
+    * EXCEPT (submission_date_s3)
+  FROM
+    clients_daily_v6
+  WHERE
+    submission_date_s3 = @submission_date
+), previous AS (
+  SELECT
+    * EXCEPT (submission_date,
+      generated_time)
+  FROM
+    analysis.clients_last_seen_v1
+  WHERE
+    submission_date = DATE_SUB(@submission_date, INTERVAL 1 DAY)
+    AND last_seen_date > DATE_SUB(@submission_date, INTERVAL 28 DAY)
+)
+SELECT
+  @submission_date AS submission_date,
+  CURRENT_DATETIME() AS generated_time,
+  IF(current_sample.client_id IS NOT NULL,
+    current_sample,
+    previous).*
+FROM
+  current_sample
+FULL JOIN
+  previous
+USING
+  (client_id)
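
The incremental step above combines today's rows with yesterday's partition: a client seen today takes today's row, and any other client is carried forward only while its `last_seen_date` is still inside the 28-day window. The Python below is a minimal model of that FULL JOIN, not the query itself. `update_last_seen` and the sample data are hypothetical, and `generated_time` is omitted.

```python
from datetime import date, timedelta


def update_last_seen(current_sample, previous, submission_date, window_days=28):
    """Merge today's clients with the previous clients_last_seen partition."""
    cutoff = submission_date - timedelta(days=window_days)
    # Carry forward previous rows that are still inside the window.
    merged = {
        row["client_id"]: row
        for row in previous
        if row["last_seen_date"] > cutoff
    }
    # A client present today replaces its carried-forward row, as in
    # IF(current_sample.client_id IS NOT NULL, current_sample, previous).
    for row in current_sample:
        merged[row["client_id"]] = row
    # Stamp every output row with the partition date being written.
    return [dict(row, submission_date=submission_date) for row in merged.values()]


today = date(2019, 1, 28)
previous = [
    {"client_id": "a", "last_seen_date": date(2019, 1, 1)},
    {"client_id": "b", "last_seen_date": date(2019, 1, 27)},
    {"client_id": "c", "last_seen_date": date(2018, 12, 31)},
]
current_sample = [{"client_id": "b", "last_seen_date": today}]

result = update_last_seen(current_sample, previous, today)
last_seen = {row["client_id"]: row["last_seen_date"] for row in result}
```

In this example `a` is carried forward unchanged, `b` moves to today's date, and `c` ages out because it was last seen exactly 28 days ago.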
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+SELECT
+  submission_date,
+  CURRENT_DATETIME() AS generated_time,
+  COUNT(*) AS mau,
+  COUNTIF(last_seen_date = submission_date) AS dau,
+  -- requested fields from bug 1525689
+  source,
+  medium,
+  campaign,
+  content,
+  country,
+  distribution_id
+FROM
+  clients_last_seen_v1
+WHERE
+  submission_date = @submission_date
+GROUP BY
+  submission_date,
+  source,
+  medium,
+  campaign,
+  content,
+  country,
+  distribution_id
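
Because the `clients_last_seen_v1` queries in this commit emit at most one row per client per partition, this query can use `COUNT(*)` for MAU and count the rows whose `last_seen_date` equals the partition date for DAU. A rough Python equivalent of the grouping follows. `mau_dau_by_dimensions` and the sample rows are illustrative assumptions, and only two of the six dimensions are used to keep the example short.

```python
from collections import defaultdict
from datetime import date


def mau_dau_by_dimensions(last_seen_rows, submission_date, dimensions):
    """Count MAU and DAU per combination of dimension values."""
    counts = defaultdict(lambda: {"mau": 0, "dau": 0})
    for row in last_seen_rows:
        if row["submission_date"] != submission_date:
            continue  # mirrors WHERE submission_date = @submission_date
        key = tuple(row[d] for d in dimensions)  # mirrors GROUP BY
        counts[key]["mau"] += 1  # COUNT(*): one row per client in the window
        if row["last_seen_date"] == submission_date:
            counts[key]["dau"] += 1  # COUNTIF(last_seen_date = submission_date)
    return dict(counts)


day = date(2019, 1, 28)
rows = [
    {"submission_date": day, "last_seen_date": day, "country": "US", "source": "x"},
    {"submission_date": day, "last_seen_date": date(2019, 1, 5), "country": "US", "source": "x"},
    {"submission_date": day, "last_seen_date": day, "country": "DE", "source": "y"},
]
stats = mau_dau_by_dimensions(rows, day, ["country", "source"])
```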

sql/firefox_desktop_exact_mau28.sql renamed to sql/firefox_desktop_exact_mau28_v1.sql

Lines changed: 3 additions & 3 deletions
@@ -2,9 +2,9 @@ SELECT
   @submission_date AS submission_date,
   CURRENT_DATETIME() AS generated_time,
   COUNT(DISTINCT client_id) AS mau,
-  SUM(CAST(submission_date_s3 = @submission_date AS INT64)) as dau
+  COUNTIF(submission_date_s3 = @submission_date) AS dau
 FROM
-  telemetry.clients_daily_v6
+  clients_daily_v6
 WHERE
   submission_date_s3 <= @submission_date
-  AND submission_date_s3 > DATE_ADD(@submission_date, INTERVAL -28 DAY)
+  AND submission_date_s3 > DATE_SUB(@submission_date, INTERVAL 28 DAY)
