Add unique media identifier (close #59)
georgewoodhead committed Nov 13, 2023
1 parent dfb1e8b commit 2f41fde
Showing 38 changed files with 454 additions and 600 deletions.
13 changes: 13 additions & 0 deletions CHANGELOG
@@ -1,3 +1,16 @@
snowplow-media-player 0.7.0 (2023-xx-xx)
---------------------------------------
## Summary
This release adds a more robust unique media identifier. This fixes an issue where duplicate `media_id` values could occur in the media stats table as a result of incorrect tracking implementation (e.g. sharing the same media label across different media types).

## Features
Add unique media identifier (close #59)

## Under the hood

## 🚨 Breaking Changes 🚨
This version requires a full refresh run if you have been using any previous versions. You will not be able to upgrade and have the package work without doing a full refresh. Check out the [migration guide](https://docs.snowplow.io/docs/modeling-your-data/modeling-your-data-with-dbt/migration-guides/media-player/) for more information when you upgrade.

snowplow-media-player 0.6.1 (2023-10-04)
---------------------------------------
## Summary
2 changes: 1 addition & 1 deletion README.md
@@ -57,7 +57,7 @@ The package contains multiple staging models however the mart models are as foll
|------------------------------------------|------------------------------------------------------------------------------------------------------------------|
| snowplow_media_player_base | A table summarizing media player events by media and pageview including impressions. |
| snowplow_media_player_plays_by_pageview | A view summarizing media plays by media on a pageview level. |
| snowplow_media_player_media_stats | An aggregated table of media metrics on a media_id level. |
| snowplow_media_player_media_stats | An aggregated table of media metrics on a media_identifier level. |
| snowplow_media_player_media_ad_views | A view summarizing each ad viewed within a media playback (only for v2 schemas, see above). |
| snowplow_media_player_media_ads | An aggregated table of ad metrics for each ad played within each media content (only for v2 schemas, see above). |

2 changes: 1 addition & 1 deletion dbt_project.yml
@@ -1,5 +1,5 @@
name: 'snowplow_media_player'
version: '0.6.1'
version: '0.7.0'
config-version: 2

require-dbt-version: ['>=1.4.0', '<2.0.0']
14 changes: 9 additions & 5 deletions docs/markdown/snowplow_media_player_common_cols.md
@@ -2,12 +2,16 @@
A UUID for each event e.g. `c6ef3124-b53a-4b13-a233-0088f79dcbcb`.
{% enddocs %}

{% docs col_media_id %}
The unique identifier of a specific media element. It is the `player_id` in case of YouTube and `html_id` in case of HTML5.
{% docs col_media_identifier %}
The surrogate key generated from `media_id`, `media_label`, `media_type` and `media_player_type` to create a unique media element identifier.
{% enddocs %}

{% docs col_player_id %}
The HTML id attribute of the media content. It is the `player_id` in case of YouTube and `html_id` in case of HTML5.
{% enddocs %}

{% docs col_play_id %}
The surrogate key generated from `page_view_id` and `media_id `to create a unique play event identifier.
The surrogate key generated from `page_view_id`, `media_id`, `media_label`, `media_type` and `media_player_type` to create a unique play event identifier.
{% enddocs %}
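
The surrogate keys described above can be illustrated with a minimal Python sketch. It assumes an MD5 hash over the four fields joined with a separator; the helper names, the separator, and the null placeholder are hypothetical, and the package's actual hashing and null handling may differ.

```python
import hashlib


def surrogate_key(*fields, sep="-", null_placeholder="_null_"):
    """Hash the given fields into one key (hypothetical helper, MD5 assumed)."""
    parts = [null_placeholder if f is None else str(f) for f in fields]
    return hashlib.md5(sep.join(parts).encode("utf-8")).hexdigest()


def media_identifier(event):
    # Combines the four fields named in the col_media_identifier docs.
    return surrogate_key(
        event["media_id"],
        event["media_label"],
        event["media_type"],
        event["media_player_type"],
    )


def play_id(page_view_id, event):
    # Same four fields plus page_view_id, as in the col_play_id docs.
    return surrogate_key(
        page_view_id,
        event["media_id"],
        event["media_label"],
        event["media_type"],
        event["media_player_type"],
    )


# Two media elements that share media_id and media_label but differ in type
# (sample values are made up for illustration).
video = {"media_id": "player-1", "media_label": "Intro",
         "media_type": "video", "media_player_type": "html5"}
audio = {"media_id": "player-1", "media_label": "Intro",
         "media_type": "audio", "media_player_type": "html5"}
```

Under these assumptions, `media_identifier(video)` and `media_identifier(audio)` differ even though both share the same `media_id`, which is the duplicate case the changelog describes.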

{% docs col_page_view_id %}
Expand Down Expand Up @@ -199,7 +203,7 @@ Average playback rate (1 is normal speed).
{% enddocs %}

{% docs col_play_rate %}
Total plays divided by impressions. Please note that as the base for media plays is pageview / media_id, in case the same video is played multiple times within the same pageview, it will still count as one play.
Total plays divided by impressions. Please note that as the base for media plays is pageview / media_identifier, in case the same video is played multiple times within the same pageview, it will still count as one play.
{% enddocs %}
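
The play rate definition above can be sketched in Python as follows. The event tuples and the `"ready"` / `"play"` event names are assumptions made for illustration, not the package's schema; the point is only that repeated plays within one pageview collapse into a single play per pageview / media_identifier pair.

```python
from collections import defaultdict


def play_rate(events):
    """Return plays / impressions per media_identifier.

    `events` is an iterable of (page_view_id, media_identifier, event_type)
    tuples. Any event counts the pageview as an impression; a pageview counts
    as a play at most once, however many "play" events it contains.
    """
    impressions = defaultdict(set)  # media_identifier -> page_view_ids seen
    plays = defaultdict(set)        # media_identifier -> page_view_ids played
    for page_view_id, media_identifier, event_type in events:
        impressions[media_identifier].add(page_view_id)
        if event_type == "play":
            plays[media_identifier].add(page_view_id)
    return {
        media: len(plays[media]) / len(page_views)
        for media, page_views in impressions.items()
    }


# Hypothetical sample: pv1 plays the same media twice, pv2 never plays it.
sample_events = [
    ("pv1", "m1", "ready"),
    ("pv1", "m1", "play"),
    ("pv1", "m1", "play"),
    ("pv2", "m1", "ready"),
]
rates = play_rate(sample_events)
```

Here `m1` has two impressions (pv1 and pv2) and one counted play (pv1), so its play rate is 0.5 despite the two play events.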

{% docs col_complete_plays %}
@@ -905,7 +909,7 @@ The index of the event in the corresponding session.
{% enddocs %}

{% docs col_media_ad_id %}
Generated identifier that identifies an ad (identified using the ad_id) played with a specific media (identified using the media_id) and on a specific platform (based on the platform property).
Generated identifier that identifies an ad (identified using the ad_id) played with a specific media (identified using the media_identifier) and on a specific platform (based on the platform property).
{% enddocs %}

{% docs col_ad_id %}
6 changes: 3 additions & 3 deletions docs/markdown/snowplow_media_player_macro_docs.md
@@ -124,13 +124,13 @@ select
{% endraw %}
{% enddocs %}

{% docs macro_media_id_field %}
{% docs macro_player_id_field %}
{% raw %}
This macro produces the value media_id column in the snowplow_media_player_base_events_this_run table based on the values of the youtube_player_id and media_player_id columns.
This macro produces the value player_id column in the snowplow_media_player_base_events_this_run table based on the values of the youtube_player_id and media_player_id columns.

#### Returns

The query for the media_id column.
The query for the player_id column.

#### Usage

8 changes: 4 additions & 4 deletions docs/markdown/snowplow_media_player_overview.md
@@ -14,17 +14,17 @@ Welcome to the documentation site for the Snowplow Media Player dbt package. The

This package consists of a series of dbt models with the goal to produce the following main aggregated models from the raw media player events and relevant contexts:

- `snowplow_media_player_base`: This derived table summarises the key media player events and metrics of each media element on a media_id and pageview level which is considered as a base aggregation level for media interactions.
- `snowplow_media_player_base`: This derived table summarizes the key media player events and metrics of each media element on a media_identifier and pageview level which is considered as a base aggregation level for media interactions.

- `snowplow_media_player_plays_by_pageview`: This view removes impressions from the '_base' table to summarise media plays on a page_view by media_id level.
- `snowplow_media_player_plays_by_pageview`: This view removes impressions from the '_base' table to summarize media plays on a page_view by media_identifier level.

- `snowplow_media_player_media_stats`: This derived table aggregates the '_base' table to individual media_id level, calculating the main KPIs and overall video/audio metrics.
- `snowplow_media_player_media_stats`: This derived table aggregates the '_base' table to individual media_identifier level, calculating the main KPIs and overall video/audio metrics.

The package is built on top of the [dbt-snowplow-web package][dbt-snowplow-web] taking that as a basis to carry out the incremental update. It is designed to be run together with the web model in a similar manner to how a custom module would run:

The `_interactions_this_run` table takes the `snowplow_web_base_events_this_run` table generated by the web package as an input then adds the various contexts to enrich the base table with the additional media related fields. It could be used for custom models for more in-depth event level derived tables and further analysis.

The `_base_this_run` table then aggregates the `_interactions_this_run` table to media_id and pageview level and serves as a basis for the incrementalized derived table `_media_base`.
The `_base_this_run` table then aggregates the `_interactions_this_run` table to media_identifier and pageview level and serves as a basis for the incrementalized derived table `_media_base`.

The main `_media_stats` derived table will also be updated incrementally based on the `_media_base` derived table, however not through the snowplow_incremental materialization, but using the native dbt incremental materialization on a pageview basis after a set time window passed. This is to prevent complex and expensive queries due to metrics which need to take the whole page_view events into calculation. This way the metrics will only be calculated once per pageview / media, after no new events are expected.
