caugner
Contributor

@caugner caugner commented Oct 9, 2025

Description

Updates the mdn.legacy_action and mdn.legacy_page views to keep only the last occurrence of each UTM key (later one wins).

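Resolves the "Scalar subquery produced more than one element" error raised by this query when it reads from mdn.legacy_page instead of mdn_yari.page. A URL that repeats a UTM key previously produced several page_utm entries with the same key, so a scalar subquery looking up a single key could return more than one row.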
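
To illustrate the "later one wins" rule outside BigQuery, here is a minimal Python sketch that mirrors the regular expressions used in the views. The function names `extract_utm_pairs` and `keep_last_occurrence` and the example URL are hypothetical. Unlike the SQL, the sketch skips fragments without an `=` rather than emitting a NULL key, so treat it as an approximation of the view logic, not its implementation.

```python
import re


def extract_utm_pairs(url):
    """Return (key, value) pairs for every utm_* parameter, in URL order."""
    pairs = []
    # Same pattern as REGEXP_EXTRACT_ALL(..., r'[?&](utm_[^&]+)') in the views.
    for kv in re.findall(r"[?&](utm_[^&]+)", url):
        key_match = re.search(r"^utm_(.*?)=", kv)
        value_match = re.search(r"=(.*)$", kv)
        # Simplification (assumption): drop fragments that have no "=".
        if key_match and value_match:
            pairs.append((key_match.group(1), value_match.group(1)))
    return pairs


def keep_last_occurrence(pairs):
    """Keep one value per key; a later occurrence overwrites an earlier one."""
    last = {}
    for key, value in pairs:
        last[key] = value
    return list(last.items())


# Hypothetical URL that repeats utm_source.
url = "https://example.com/docs?utm_source=a&utm_medium=b&utm_source=c"
pairs = extract_utm_pairs(url)
deduped = keep_last_occurrence(pairs)

# Before deduplication, looking up "source" matches two entries, which is the
# situation a scalar subquery cannot handle.
source_before = [value for key, value in pairs if key == "source"]
source_after = [value for key, value in deduped if key == "source"]
```

In the views themselves the same effect comes from `WITH OFFSET` plus `QUALIFY ROW_NUMBER() OVER (PARTITION BY key ORDER BY off DESC) = 1`, which keeps the row with the highest offset for each key.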

Related Tickets & Documents

  • (none)

Reviewer, please follow this checklist

Keep only the last occurrence of each UTM key (later one wins).
@caugner caugner requested a review from LeoMcA October 9, 2025 14:08
@dataops-ci-bot

Integration report for "fix(mdn): handle duplicate UTM parameters"

sql.diff

diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mdn/legacy_action/view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mdn/legacy_action/view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mdn/legacy_action/view.sql	2025-10-09 14:10:07.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mdn/legacy_action/view.sql	2025-10-09 14:06:47.000000000 +0000
@@ -101,10 +101,20 @@
         STRUCT(
           ARRAY(
             SELECT AS STRUCT
+              key,
+              value
+            FROM
+              (
+                SELECT
               REGEXP_EXTRACT(kv, r'^utm_(.*?)=') AS key,
-              REGEXP_EXTRACT(kv, r'=(.*)$') AS value
+                  REGEXP_EXTRACT(kv, r'=(.*)$') AS value,
+                  off
             FROM
               UNNEST(REGEXP_EXTRACT_ALL(JSON_VALUE(event_extra.url), r'[?&](utm_[^&]+)')) AS kv
+                  WITH OFFSET off
+              )
+            QUALIFY
+              ROW_NUMBER() OVER (PARTITION BY key ORDER BY off DESC) = 1
           ) AS page_utm
         ) AS labeled_string,
         STRUCT(CAST(NULL AS ARRAY<STRING>) AS navigator_user_languages) AS string_list,
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mdn/legacy_page/view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mdn/legacy_page/view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mdn/legacy_page/view.sql	2025-10-09 14:10:07.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mdn/legacy_page/view.sql	2025-10-09 14:06:47.000000000 +0000
@@ -76,10 +76,20 @@
         STRUCT(
           ARRAY(
             SELECT AS STRUCT
+              key,
+              value
+            FROM
+              (
+                SELECT
               REGEXP_EXTRACT(kv, r'^utm_(.*?)=') AS key,
-              REGEXP_EXTRACT(kv, r'=(.*)$') AS value
+                  REGEXP_EXTRACT(kv, r'=(.*)$') AS value,
+                  off
             FROM
               UNNEST(REGEXP_EXTRACT_ALL(JSON_VALUE(event_extra.url), r'[?&](utm_[^&]+)')) AS kv
+                  WITH OFFSET off
+              )
+            QUALIFY
+              ROW_NUMBER() OVER (PARTITION BY key ORDER BY off DESC) = 1
           ) AS page_utm
         ) AS labeled_string,
         STRUCT(CAST(NULL AS ARRAY<STRING>) AS navigator_user_languages) AS string_list,
