Change default session identifier and add passthroughs
georgewoodhead committed Nov 22, 2023
1 parent 1ca6e6f commit 3c0e407
Showing 19 changed files with 423 additions and 147 deletions.
10 changes: 9 additions & 1 deletion CHANGELOG
@@ -1,11 +1,19 @@
snowplow-media-player 0.7.0 (2023-xx-xx)
---------------------------------------
## Summary
This release adds a more robust unique media identifier. This fixes an issue where duplicate `media_id` values could occur in the media stats table as a result of incorrect tracking implementation (e.g. sharing the same media label across different media types). This release also fixes the incremental materialization of the media_ad_views table by adding a unique primary key.
This version adds new features powered by a complete refactor of the package's core processing, which now uses the new `base` macro functionality provided in `snowplow_utils`. Users can now specify custom fields for sessionization and user identification, add custom entity/SDE fields to the base events table on Redshift/Postgres, and add passthrough fields to the derived tables, making it easier to add your own fields to our tables.

In addition, this release adds a more robust unique media identifier. This fixes an issue where duplicate `media_id` values could occur in the media stats table as a result of incorrect tracking implementation (e.g. sharing the same media label across different media types). This release also fixes the incremental materialization of the media_ad_views table by adding a unique primary key.

## Features
- Migrate base models to the new `base` macros for flexibility and consistency
- Add ability to pass fields through to derived media base and ad views tables
- Add new field `domain_sessionid_array` to derived tables (where applicable)

## Fixes
- Add unique media identifier (close #59)
- Add missing primary key to media_ad_views
- Fix field names in custom session stats model yaml (close #63)

## Under the hood

4 changes: 3 additions & 1 deletion README.md
@@ -4,7 +4,7 @@

# dbt-snowplow-media-player

A fully incremental model that transforms media player event data into derived tables for easier querying generated by the Snowplow [JavaScript tracker][javascript-tracker] in combination with media tracking specific plugins such as the [Media Tracking plugin][media-tracking] or the [YouTube Tracking plugin][youtube-tracking].
A fully incremental model that transforms media player event data generated by the Snowplow [JavaScript tracker][javascript-tracker], in combination with media tracking specific plugins such as the [Media Tracking plugin][media-tracking] or the [YouTube Tracking plugin][youtube-tracking], into derived tables for easier querying. The package also supports media events generated by the Snowplow [iOS and Android trackers][mobile-media-tracker-docs].

Please refer to the [doc site][snowplow-media-player-docs] for a full breakdown of the package.

@@ -122,3 +122,5 @@ limitations under the License.

[snowplow-media-player-docs-dbt]: https://snowplow.github.io/dbt-snowplow-media-player/#!/overview/snowplow_media_player
[snowplow-media-player-docs]: https://docs.snowplow.io/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-models/dbt-media-player-data-model/

[mobile-media-tracker-docs]: https://docs.snowplow.io/docs/collecting-data/collecting-from-own-applications/mobile-trackers/tracking-events/media-tracking/
3 changes: 3 additions & 0 deletions dbt_project.yml
@@ -72,6 +72,9 @@ vars:
snowplow__enable_web_events: true
snowplow__enable_mobile_events: false
snowplow__enable_ad_quartile_event: false
# add extra custom fields:
snowplow__base_passthroughs: []
snowplow__ad_views_passthroughs: []

# Variables - Warehouse Specific
snowplow__media_player_event_context: 'com_snowplowanalytics_snowplow_media_player_event_1'
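
The two passthrough variables added in this hunk default to empty lists. As a minimal sketch of how a consuming project might populate them, the fragment below reuses the values from this commit's `integration_tests/dbt_project.yml`: each entry is either a column name or a mapping with `sql` and `alias` keys. Scoping the variables under `snowplow_media_player` is an assumption about the consuming project's layout, and the SQL expression must be valid on your warehouse.

```yaml
# dbt_project.yml of the consuming project (illustrative sketch, not package defaults)
vars:
  snowplow_media_player:   # assumed package-scoped vars block
    snowplow__base_passthroughs:
      # a plain string is taken to be an existing column name
      - v_collector
      # a mapping supplies a SQL expression plus the alias for the resulting column
      - sql: 'v_tracker || app_id'
        alias: tracker_app_id
    snowplow__ad_views_passthroughs:
      - v_collector
```

Leaving either list empty keeps the derived tables at their default set of columns.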
24 changes: 16 additions & 8 deletions docs/markdown/snowplow_media_player_common_cols.md
@@ -19,7 +19,15 @@ A UUID for each page view e.g. `c6ef3124-b53a-4b13-a233-0088f79dcbcb`.
{% enddocs %}

{% docs col_session_identifier %}
A visit / session UUID e.g. `c6ef3124-b53a-4b13-a233-0088f79dcbcb`.
The session identifier as defined in your project variables. Defaults to `media_session_id`, or to `page_view_id` if the media session entity is not enabled.
{% enddocs %}

{% docs col_domain_sessionid_array %}
All domain_sessionids seen for a play_id.
{% enddocs %}

{% docs col_user_identifier %}
The user identifier as defined in your project variables. Defaults to `domain_userid`.
{% enddocs %}
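
The column descriptions above refer to identifiers "defined in your project variables", but this hunk does not show those variables. As a hedged sketch only: the variable names `snowplow__session_identifiers` and `snowplow__user_identifiers`, the `schema`/`field` keys, and the `atomic` keyword are assumptions based on the `snowplow_utils` base macro conventions rather than anything in this diff, so check the package documentation before relying on them. The example sessionizes on the atomic `domain_sessionid` column instead of the new default.

```yaml
# Illustrative sketch; variable names and keys are assumptions, not taken from this commit
vars:
  snowplow_media_player:
    snowplow__session_identifiers:
      - schema: atomic            # assumed to mean "a column on the events table"
        field: domain_sessionid
    snowplow__user_identifiers:
      - schema: atomic
        field: domain_userid      # matches the documented default user identifier
```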

{% docs col_domain_userid %}
@@ -1005,31 +1013,31 @@ Datetime of the last event.
{% enddocs %}

{% docs col_views_unique %}
Number of users that viewed the ad (identified by their domain_userid).
Number of users that viewed the ad (identified by their user_identifier).
{% enddocs %}

{% docs col_clicked_unique %}
Number of users that clicked on the ad (identified by their domain_userid).
Number of users that clicked on the ad (identified by their user_identifier).
{% enddocs %}

{% docs col_skipped_unique %}
Number of users that skipped the ad (identified by their domain_userid).
Number of users that skipped the ad (identified by their user_identifier).
{% enddocs %}

{% docs col_percent_reached_25_unique %}
Number of users that watched 25% of the ad (identified by their domain_userid).
Number of users that watched 25% of the ad (identified by their user_identifier).
{% enddocs %}

{% docs col_percent_reached_50_unique %}
Number of users that watched 50% of the ad (identified by their domain_userid).
Number of users that watched 50% of the ad (identified by their user_identifier).
{% enddocs %}

{% docs col_percent_reached_75_unique %}
Number of users that watched 75% of the ad (identified by their domain_userid).
Number of users that watched 75% of the ad (identified by their user_identifier).
{% enddocs %}

{% docs col_percent_reached_100_unique %}
Number of users that watched 100% of the ad (identified by their domain_userid).
Number of users that watched 100% of the ad (identified by their user_identifier).
{% enddocs %}

{% docs col_media_session_id %}
10 changes: 3 additions & 7 deletions docs/markdown/snowplow_media_player_model_docs.md
@@ -1,7 +1,3 @@
{% docs table_interactions_this_run %}
This staging table shows all media player events within the current incremental run and calculates play_time. It could be used in custom models for more in-depth time based calculations.
{% enddocs %}

{% docs table_base_this_run %}
This staging table aggregates media player interactions within the current run to a pageview level that is considered a base level for media plays.
{% enddocs %}
@@ -15,11 +11,11 @@ This view removes impressions from the derived snowplow_media_player_base table
{% enddocs %}

{% docs table_session_stats %}
This table aggregates the pageview level interactions to show session level media stats.
This table aggregates the base level plays to show session level media stats.
{% enddocs %}

{% docs table_user_stats %}
This table aggregates the pageview level interactions to show user level media stats.
This table aggregates the session level stats to show user level media stats.
{% enddocs %}

{% docs table_media_stats %}
@@ -37,5 +33,5 @@ This derived table aggregates information about ad views. Each ad view (a user v
{% enddocs %}

{% docs table_media_ads %}
This derived table aggregates information about ads. Each row represents one ad played within a certain media on a certain platform. Stats about the number of ad clicks, progress reached and more are calculated as total values but also as counts of unique users (identified using `domain_userid`).
This derived table aggregates information about ads. Each row represents one ad played within a certain media on a certain platform. Stats about the number of ad clicks, progress reached and more are calculated as total values but also as counts of unique users (identified using `user_identifier`).
{% enddocs %}
36 changes: 14 additions & 22 deletions docs/markdown/snowplow_media_player_overview.md
@@ -4,31 +4,23 @@

# Snowplow Media Player Package

Welcome to the documentation site for the Snowplow Media Player dbt package. The package is built as an extension of the [dbt-snowplow-web package][dbt-snowplow-web] that transforms raw media player event data into derived tables for easier querying generated by the Snowplow [JavaScript tracker][javascript-tracker] in combination with media tracking specific plugins such as the [Media Tracking plugin][media-tracking] or the [YouTube Tracking plugin][youtube-tracking].
Welcome to the documentation site for the Snowplow Media Player dbt package. The package transforms raw media player event data generated by the Snowplow [JavaScript tracker][javascript-tracker], in combination with media tracking specific plugins such as the [Media Tracking plugin][media-tracking] or the [YouTube Tracking plugin][youtube-tracking], into derived tables for easier querying. The package also supports media events generated by the Snowplow [iOS and Android trackers][mobile-media-tracker-docs].

**For more information, including the dependency on the Snowplow Web package as well as a QuickStart guide, operation and configuration, and implementing your own custom modules on top of this please visit the [Snowplow Docs](https://docs.snowplow.io/docs/modeling-your-data/modeling-your-data-with-dbt/) for more detailed information.**
**For more information, including a QuickStart guide, operation and configuration, and implementing your own custom modules on top of this package, please visit the [Snowplow Docs](https://docs.snowplow.io/docs/modeling-your-data/modeling-your-data-with-dbt/).**

*Note this model design doc site is linked to latest release of the package. If you are not using the latest release, [generate and serve](https://docs.getdbt.com/reference/commands/cmd-docs#dbt-docs-serve) the doc site locally for accurate documentation.*

## Overview

This package consists of a series of dbt models with the goal to produce the following main aggregated models from the raw media player events and relevant contexts:
This package consists of a series of incremental dbt models with the goal of producing the following main aggregated models from the raw media player events and relevant contexts:

- `snowplow_media_player_base`: This derived table summarizes the key media player events and metrics of each media element on a media_identifier and pageview level which is considered as a base aggregation level for media interactions.

- `snowplow_media_player_plays_by_pageview`: This view removes impressions from the '_base' table to summarize media plays on a page_view by media_identifier level.

- `snowplow_media_player_media_stats`: This derived table aggregates the '_base' table to individual media_identifier level, calculating the main KPIs and overall video/audio metrics.

The package is built on top of the [dbt-snowplow-web package][dbt-snowplow-web] taking that as a basis to carry out the incremental update. It is designed to be run together with the web model in a similar manner to how a custom module would run:

The `_interactions_this_run` table takes the `snowplow_web_base_events_this_run` table generated by the web package as an input then adds the various contexts to enrich the base table with the additional media related fields. It could be used for custom models for more in-depth event level derived tables and further analysis.

The `_base_this_run` table then aggregates the `_interactions_this_run` table to media_identifier and pageview level and serves as a basis for the incrementalized derived table `_media_base`.

The main `_media_stats` derived table will also be updated incrementally based on the `_media_base` derived table, however not through the snowplow_incremental materialization, but using the native dbt incremental materialization on a pageview basis after a set time window passed. This is to prevent complex and expensive queries due to metrics which need to take the whole page_view events into calculation. This way the metrics will only be calculated once per pageview / media, after no new events are expected.

The additional `_pivot_base` table is there to calculate the percent_progress boundaries and weights that are used to calculate the total play_time and other related media fields.
| Model | Description |
|------------------------------------------|------------------------------------------------------------------------------------------------------------------|
| snowplow_media_player_base | A table summarizing media player events by media and pageview including impressions. |
| snowplow_media_player_plays_by_pageview | A view summarizing media plays by media on a pageview level. |
| snowplow_media_player_media_stats | An aggregated table of media metrics on a media_identifier level. |
| snowplow_media_player_media_ad_views | A view summarizing each ad viewed within a media playback (only for v2 schemas, see above). |
| snowplow_media_player_media_ads | An aggregated table of ad metrics for each ad played within each media content (only for v2 schemas, see above). |

## Installation

@@ -46,7 +38,7 @@ If you find a bug, please report an issue on GitHub.

The snowplow-media-player package is Copyright 2022 Snowplow Analytics Ltd.

Licensed under the [Apache License, Version 2.0][license] (the "License");
Licensed under the [Snowplow Personal and Academic License][license] (the "License");
you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software
@@ -55,8 +47,8 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

[license]: http://www.apache.org/licenses/LICENSE-2.0
[license-image]: http://img.shields.io/badge/license-Apache--2-blue.svg?style=flat
[license]: https://docs.snowplow.io/personal-and-academic-license-1.0/
[license-image]: http://img.shields.io/badge/license-Snowplow--Personal--and--Academic--1-blue.svg?style=flat
[tracker-classificiation]: https://docs.snowplow.io/docs/collecting-data/collecting-from-own-applications/tracker-maintenance-classification/
[early-release]: https://img.shields.io/static/v1?style=flat&label=Snowplow&message=Early%20Release&color=014477&labelColor=9ba0aa&logo=

@@ -88,7 +80,7 @@ limitations under the License.

[dbt-snowplow-web]: https://snowplow.github.io/dbt-snowplow-web/#!/overview/snowplow_web

[flutter-tracker]: https://docs.snowplow.io/docs/collecting-data/collecting-from-own-applications/flutter-tracker/
[mobile-media-tracker-docs]: https://docs.snowplow.io/docs/collecting-data/collecting-from-own-applications/mobile-trackers/tracking-events/media-tracking/


{% endraw %}
19 changes: 8 additions & 11 deletions integration_tests/dbt_project.yml
@@ -37,6 +37,12 @@ models:
+enabled: '{{ target.type in ["redshift", "postgres"] | as_bool() }}'
snowflake:
+enabled: '{{ target.type == "snowflake" | as_bool() }}'
snowplow_media_player:
+persist_docs:
relation: true
columns: true
custom:
+enabled: true

vars:
snowplow__enable_media_ad: true
@@ -63,17 +69,8 @@ vars:
snowplow__enable_media_ad_break: true
snowplow__enable_ad_quartile_event: true
snowplow__enable_mobile_events: true

# Variables - Warehouse Specific
snowplow__page_view_context: 'snowplow_web_page_view_context'
snowplow__context_mobile_session: |
{% if target.type in ['postgres', 'redshift'] -%}
com_snowplowanalytics_snowplow_client_session_1
{%- elif target.type in ['bigquery'] -%}
contexts_com_snowplowanalytics_snowplow_client_session_1_0_2
{%- else -%}
contexts_com_snowplowanalytics_snowplow_client_session_1
{%- endif %}
snowplow__base_passthroughs: ['v_collector', {'sql': 'v_tracker || app_id', 'alias': 'tracker_app_id'}]
snowplow__ad_views_passthroughs: ['v_collector', {'sql': 'v_tracker || app_id', 'alias': 'tracker_app_id'}]

seeds:
quote_columns: false