-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transfer to v3 #1024
base: main
Are you sure you want to change the base?
Transfer to v3 #1024
Conversation
This PR will trigger a minor release when merged. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1024 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 6 6
Lines 710 710
=========================================
Hits 710 710 ☔ View full report in Codecov by Sentry. |
I believe that not every PR merged into |
Please run SQLFluff. |
Given the move to |
Sounds good, separate PR makes sense, no reason to combine here. My overarching goal is to ensure queries used by data.aem.live and queries used to feed spacecat and eventually DaaS -- especially those related to page view estimates -- are based on common logic so that we maintain consistency. I agree with you this is a separate topic. |
0effbdc
to
c39dc76
Compare
source, | ||
COUNT(id) AS ids, | ||
COUNT(DISTINCT url) AS pages, | ||
APPROX_TOP_COUNT(url, 1)[OFFSET(0)].value AS topurl, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see a slack thread started by @shsteimer in #aem-data-desk
on Jan 3. I believe that grouping by source and taking the top url is hiding data. Might make more sense to group by source+url combination, or group by url and take the top source (referer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't add _V4 in any of the built-in queries, make sure sqlfluff runs cleanly.
@@ -37,7 +37,7 @@ BEGIN | |||
WHEN 365 THEN TIMESTAMP_TRUNC(time, YEAR) | |||
ELSE TIMESTAMP_TRUNC(time, DAY) | |||
END AS date | |||
FROM helix_rum.PAGEVIEWS_V3( | |||
FROM helix_rum.PAGEVIEWS_V4( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not create PAGEVIEWS_V4
and I object strongly to using it in any of the default queries as it is hiding data.
MAX(weight) AS pageviews | ||
FROM | ||
`helix-225321.helix_rum.EVENTS_V4`( | ||
net.host(@url), @offset, @interval, '-', '-', 'UTC', 'all', @domainkey |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@url
should have the same semantics as the rest of the@url
parameters, so URL prefix withouthttps://
, optional path and$
or?
delimiters.- if you list
startdate
andenddate
in the supported parameters, you should also pass them through here - same for
timezone
- same for
device
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the same data should come from PAGEVIEWS_V4
– why does this query exist when rum-pageviews
is there already?
A minimal set of data endpoints necessary to run https://data.aem.live/rum-dashboard
Also I updated rum-pageviews to use Events_V4; because the version that uses Events_V3 cannot resolve www.petplace.com to petplace.com
When we use Events_V4 which prepends www and checks for results for a url, www.petplace.com and petplace.com can appear.
@trieloff
@langswei
@dbrrt