version queries and add clients_last_seen and mau28_by_dimensions #1
Conversation
cc @mreid-moz @jklukas requesting your review because you said in a meeting recently you would be working on BigQuery ETL related stuff
I'm looking forward to seeing this come together and the particular jobs in this PR are directly relevant to the growth dashboard work I'm doing. It looks like clients_last_seen is exactly what I need to provide efficient dashboarding by dimension.
- Should name sql files like `sql/destination_table_with_version.sql` e.g.
  `sql/clients_daily_v6.sql`
- Should not specify a project or dataset in table names to simplify testing
Do we know at this point what the hierarchy of projects, datasets, and tables is going to look like? Will these derived tables live in the same project and dataset as the source data?
With GCP ingestion so far, we're splitting tables to different datasets based on document namespace. We would need to change that practice to meet this requirement.
There are implications for permissions, testing, etc. that I haven't fully thought through yet.
i don't know, and dataset per document namespace seems good to me. this has lots of implications, but if we can avoid depending on a static dataset name, then we only need unique datasets per test, instead of unique projects, in order to run tests in parallel.
i think this is fine for queries that only read one input table (hence "should", not "must"), because the output dataset can be specified separately from the default dataset. for queries that need to read multiple tables from multiple datasets, i think for now we can just assume they're either run in series or require multiple projects. the first time we need that, we can consider solutions like templating dataset names for testing and adding a recommendation here to follow the chosen solution.
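As a hedged illustration of the convention discussed above (the file and table names come from the diff; the query body is hypothetical), a query file following the guideline might look like:

```sql
-- sql/clients_daily_v6.sql: file name matches destination table plus version.
-- The table reference is deliberately bare (no project or dataset prefix),
-- so the dataset can be supplied at run time as the job's default dataset,
-- letting tests point the same SQL at a per-test dataset.
SELECT
  client_id,
  submission_date_s3
FROM
  clients_daily_v6
```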
ARRAY_AGG(input
  ORDER BY submission_date_s3
  DESC LIMIT 1
)[OFFSET(0)].* EXCEPT (submission_date_s3)
This is fascinating. I like this better than having to use a ROW_NUMBER window function and then select n = 1.
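For comparison, the ROW_NUMBER approach mentioned here might look like the following sketch (the table name `clients_daily_v6` is an assumption; the `client_id` and `submission_date_s3` columns are taken from the diff above):

```sql
-- Sketch of the window-function alternative: number each client's rows by
-- recency, then keep only the newest row per client.
SELECT
  * EXCEPT (rn, submission_date_s3)
FROM (
  SELECT
    input.*,
    ROW_NUMBER() OVER (
      PARTITION BY client_id
      ORDER BY submission_date_s3 DESC
    ) AS rn
  FROM
    clients_daily_v6 AS input
)
WHERE
  rn = 1
```

The ARRAY_AGG form avoids the nested subquery and the extra `rn` column, which is why it reads more cleanly for this use case.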
we could alternately use ANY_VALUE(LAST_VALUE(input) OVER (PARTITION BY client_id ORDER BY submission_date_s3)), but i don't know the performance implications of that
> but i don't know the performance implications of that
I decided to check and it's not as simple as above, but using a window is so much faster it hurts (runs in ~1/6th of the time and uses ~1/8th of the compute)
sql/clients_last_seen_v1.sql (Outdated)
* EXCEPT (submission_date,
  generated_time)
FROM
  analysis.last_seen_v1
Is the dataset prefix here intended?
whoops no
i take that back, this one is needed because it won't match the dataset on line 6 above. i will figure out how to make this better as i test it.
Remove bigquery-etl resolution
No description provided.