Transfer to v3 #1024

Open
wants to merge 13 commits into
base: main
5 changes: 4 additions & 1 deletion package.json
@@ -106,7 +106,10 @@
"src/queries/rum-targets.sql",
"src/queries/dash/auth-all-domains.sql",
"src/queries/dash/domain-list.sql",
-"src/queries/dash/update-domain-info.sql"
+"src/queries/dash/update-domain-info.sql",
+"src/queries/dash/pageviews.sql",
+"src/queries/dash/github-commits.sql"

]
},
"nodemonConfig": {
40 changes: 40 additions & 0 deletions src/queries/dash/github-commits.sql
@@ -0,0 +1,40 @@
--- description: Get Daily Commits For a Site or Repo
--- Authorization: none
--- Access-Control-Allow-Origin: *
--- limit: 30
--- interval: 30
--- offset: 0
--- startdate: 2023-02-01
--- enddate: 2023-05-28
--- timezone: UTC
--- timeunit: day
--- exactmatch: false
--- url: -
--- device: all
--- domainkey: secret
WITH current_data AS (
SELECT *

Check failure on line 16 in src/queries/dash/github-commits.sql (GitHub Actions / SQLFluff Lint): AM04: Query produces an unknown number of result columns.
FROM
`HELIX-225321.HELIX_EXTERNAL_DATA.DAILY_COMMITS`(
@url,
@offset,
@interval,
@startdate,
@enddate,
@domainkey
)
)

SELECT * FROM current_data WHERE
NOT user = 'GitHub'
AND NOT user = 'GitHub Action'
AND NOT user = 'GitHub Enterprise'
AND NOT user = 'CircleCi Build'
AND NOT user = 'Helix Bot'
AND NOT user = 'adobe-alloy-bot'
AND NOT user = 'github-actions'
AND NOT user = 'github-actions[bot]'
AND NOT user = 'helix-bot[bot]'
AND NOT user = 'renovate[bot]'
AND NOT user = 'semantic-release-bot'
ORDER BY owner_repo ASC, commit_date ASC
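
The AM04 failure flagged above comes from SELECT *. Below is a minimal sketch of one way to address it. It assumes the three columns the query already references (owner_repo, commit_date, user) are the only ones needed downstream; the full column list returned by DAILY_COMMITS is not shown in this diff, so that list is an assumption. The chained NOT user = ... conditions are folded into a single NOT IN, which filters the same rows (including rows where user is NULL).

```sql
-- Sketch only: the column list is an assumption based on the columns
-- referenced in the query under review.
WITH current_data AS (
  SELECT
    owner_repo,
    commit_date,
    user
  FROM
    `HELIX-225321.HELIX_EXTERNAL_DATA.DAILY_COMMITS`(
      @url,
      @offset,
      @interval,
      @startdate,
      @enddate,
      @domainkey
    )
)

SELECT
  owner_repo,
  commit_date,
  user
FROM current_data
WHERE
  -- Exclude bot and automation accounts (same names as in the original query).
  user NOT IN (
    'GitHub',
    'GitHub Action',
    'GitHub Enterprise',
    'CircleCi Build',
    'Helix Bot',
    'adobe-alloy-bot',
    'github-actions',
    'github-actions[bot]',
    'helix-bot[bot]',
    'renovate[bot]',
    'semantic-release-bot'
  )
ORDER BY owner_repo ASC, commit_date ASC
```

Listing the columns explicitly gives the endpoint a fixed result shape, which is what AM04 asks for; if more columns from DAILY_COMMITS are needed, they would have to be added to both SELECT lists.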
30 changes: 30 additions & 0 deletions src/queries/dash/pageviews.sql
@@ -0,0 +1,30 @@
--- description: Get Helix RUM data for a given domain or owner/repo combination
--- Authorization: none
--- Access-Control-Allow-Origin: *
--- limit: 10
--- interval: 30
--- offset: 0
--- startdate: 2022-02-01
--- enddate: 2022-05-28
--- timezone: UTC
--- url: -
--- device: all
--- domainkey: secret
WITH pageviews_by_id AS (
SELECT
hostname,
id,
MAX(weight) AS pageviews
FROM
`helix-225321.helix_rum.EVENTS_V4`(
net.host(@url), @offset, @interval, '-', '-', 'UTC', 'all', @domainkey
Comment from a contributor:

  • @url should have the same semantics as the @url parameter in the other queries: a URL prefix without https://, with an optional path and $ or ? delimiters.
  • If you list startdate and enddate in the supported parameters, you should also pass them through here.
  • The same goes for timezone.
  • The same goes for device.
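
A rough sketch of what the points above could look like in the query. The argument order (url, offset, interval, startdate, enddate, timezone, device, domainkey) is inferred from the position of the literals in the call under review and is an assumption, not a confirmed signature of EVENTS_V4. Passing @url directly instead of net.host(@url) is likewise one reading of the first point.

```sql
-- Sketch only: assumes EVENTS_V4 takes its arguments in the order
-- url, offset, interval, startdate, enddate, timezone, device, domainkey.
WITH pageviews_by_id AS (
  SELECT
    hostname,
    id,
    MAX(weight) AS pageviews
  FROM
    `helix-225321.helix_rum.EVENTS_V4`(
      @url,        -- assumption: passed through unchanged, like other queries
      @offset,
      @interval,
      @startdate,  -- was the literal '-'
      @enddate,    -- was the literal '-'
      @timezone,   -- was the literal 'UTC'
      @device,     -- was the literal 'all'
      @domainkey
    )
  GROUP BY id, hostname
)

SELECT
  hostname,
  SUM(pageviews) AS pageviews
FROM pageviews_by_id
GROUP BY hostname
ORDER BY pageviews DESC
```

With the defaults declared in the query header (startdate, enddate, timezone: UTC, device: all), callers that omit these parameters would still get a defined value for each argument.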

Comment from a contributor:

Also, the same data should come from PAGEVIEWS_V4 – why does this query exist when rum-pageviews is there already?

)
GROUP BY id, hostname
)

SELECT
hostname,
SUM(pageviews) AS pageviews
FROM pageviews_by_id
GROUP BY hostname
ORDER BY pageviews DESC
2 changes: 1 addition & 1 deletion src/queries/rum-pageviews.sql
@@ -8,7 +8,7 @@
--- timezone: UTC
--- domainkey: secret
DECLARE results NUMERIC;
CREATE OR REPLACE PROCEDURE helix_rum.UPDATE_PAGEVIEWS(

Check failure on line 11 in src/queries/rum-pageviews.sql (GitHub Actions / SQLFluff Lint): LT01: Expected single whitespace between procedure name identifier and start bracket '('.
ingranularity INT64,
inlimit INT64,
inoffset INT64,
@@ -18,11 +18,11 @@
OUT results NUMERIC
)
BEGIN
CREATE TEMP TABLE temp_pageviews(

Check failure on line 21 in src/queries/rum-pageviews.sql (GitHub Actions / SQLFluff Lint): LT01: Expected single whitespace between naked identifier and start bracket '('.
year INT64,
month INT64,
day INT64,
time STRING,

Check failure on line 25 in src/queries/rum-pageviews.sql (GitHub Actions / SQLFluff Lint): RF04: Keywords should not be used as identifiers.
url INT64,
pageviews NUMERIC
)
@@ -37,7 +37,7 @@
WHEN 365 THEN TIMESTAMP_TRUNC(time, YEAR)
ELSE TIMESTAMP_TRUNC(time, DAY)
END AS date
-FROM helix_rum.PAGEVIEWS_V3(
+FROM helix_rum.PAGEVIEWS_V4(
Comment from a contributor:

I did not create PAGEVIEWS_V4 and I object strongly to using it in any of the default queries as it is hiding data.

inurl, # url
(inoffset * ingranularity) - 1, # offset
inlimit * ingranularity, # days to fetch
@@ -64,7 +64,7 @@
EXTRACT(YEAR FROM date) AS year,
EXTRACT(MONTH FROM date) AS month,
EXTRACT(DAY FROM date) AS day,
STRING(date) AS time,

Check failure on line 67 in src/queries/rum-pageviews.sql (GitHub Actions / SQLFluff Lint): RF04: Keywords should not be used as identifiers.
COUNT(url) AS urls,
SUM(weight) AS pageviews
FROM pageviews_by_id
@@ -91,7 +91,7 @@
WHEN 90 THEN TIMESTAMP_TRUNC(alldates, QUARTER)
WHEN 365 THEN TIMESTAMP_TRUNC(alldates, YEAR)
ELSE TIMESTAMP_TRUNC(alldates, DAY)
END AS alldates FROM basicdates

Check failure on line 94 in src/queries/rum-pageviews.sql (GitHub Actions / SQLFluff Lint): LT02: Expected indent of 4 spaces.
GROUP BY alldates
),

@@ -100,7 +100,7 @@
EXTRACT(YEAR FROM dates.alldates) AS year,
EXTRACT(MONTH FROM dates.alldates) AS month,
EXTRACT(DAY FROM dates.alldates) AS day,
STRING(dates.alldates) AS time,

Check failure on line 103 in src/queries/rum-pageviews.sql (GitHub Actions / SQLFluff Lint): RF04: Keywords should not be used as identifiers.
COALESCE(dailydata.urls, 0) AS distinct_urls,
COALESCE(dailydata.pageviews, 0) AS pageviews
FROM dates
@@ -112,7 +112,7 @@
SELECT * FROM finaldata ORDER BY time DESC;
SET results = (SELECT SUM(pageviews) FROM (SELECT * FROM temp_pageviews));
END;
IF (CAST(@granularity AS STRING) = "auto") THEN

Check failure on line 115 in src/queries/rum-pageviews.sql (GitHub Actions / SQLFluff Lint): PRS: Line 115, Position 1: Found unparsable section: 'IF (CAST(@granularity AS STRING) = "auto...'
CALL helix_rum.UPDATE_PAGEVIEWS(1, CAST(@interval AS INT64), CAST(@offset AS INT64), @url, @timezone, @domainkey, results);
IF (results > (CAST(@interval AS INT64) * 200)) THEN
# we have enough results, use the daily granularity