Skip to content
This repository has been archived by the owner on Aug 4, 2023. It is now read-only.

Add Rawpixel to image popularity recalculation logic #897

Merged
merged 2 commits into from
Dec 2, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion docker/local_postgres/0007_openledger_audio_view.sql
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,8 @@ INSERT INTO public.audio_popularity_metrics (
provider, metric, percentile
) VALUES
('wikimedia_audio', 'global_usage_count', 0.85),
('jamendo', 'listens', 0.85);
('jamendo', 'listens', 0.85),
('freesound', 'num_downloads', 0.85);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Perhapsfreesound should be included in the title too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just the SQL DDL, which is only used locally. Since Freesound was present in logic file, it was added (either manually or automatically) to the appropriate table quite a while ago. Confirmed in production:

deploy@localhost:openledger> select * from audio_popularity_metrics;
+-----------------+--------------------+------------+
| provider        | metric             | percentile |
|-----------------+--------------------+------------|
| jamendo         | listens            | 0.85       |
| wikimedia_audio | global_usage_count | 0.85       |
| freesound       | num_downloads      | 0.85       |
+-----------------+--------------------+------------+
SELECT 3
Time: 0.098s

So this is more a bookkeeping change than anything 😄



CREATE FUNCTION audio_popularity_percentile(
Expand Down
1 change: 1 addition & 0 deletions justfile
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,7 @@ _deps:
docker-compose {{ DOCKER_FILES }} run \
-e AIRFLOW_VAR_INGESTION_LIMIT=1000000 \
-v {{ justfile_directory() }}/tests:/usr/local/airflow/tests/ \
-v {{ justfile_directory() }}/docker:/usr/local/airflow/docker/ \
-v {{ justfile_directory() }}/pytest.ini:/usr/local/airflow/pytest.ini \
--rm \
{{ SERVICE }} \
Expand Down
1 change: 1 addition & 0 deletions openverse_catalog/dags/common/popularity/sql.py
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@
"flickr": {"metric": "views"},
"wikimedia": {"metric": "global_usage_count"},
"stocksnap": {"metric": "downloads_raw"},
"rawpixel": {"metric": "download_count"},
}

AUDIO_POPULARITY_METRICS = {
Expand Down
25 changes: 25 additions & 0 deletions tests/dags/common/popularity/test_sql.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
import os
import re
from collections import namedtuple
from pathlib import Path
from textwrap import dedent
from typing import NamedTuple

Expand All @@ -8,6 +10,7 @@
from common.popularity import sql


DDL_DEFINITIONS_PATH = Path(__file__).parents[4] / "docker" / "local_postgres"
POSTGRES_CONN_ID = os.getenv("TEST_CONN_ID")
POSTGRES_TEST_URI = os.getenv("AIRFLOW_CONN_POSTGRES_OPENLEDGER_TESTING")

Expand Down Expand Up @@ -500,3 +503,25 @@ def test_image_view_calculates_std_pop(postgres_with_image_table, table_info):
rd["fid_d"] == 0.75,
]
)


@pytest.mark.parametrize(
"ddl_filename, metrics",
[
("0004_openledger_image_view.sql", sql.IMAGE_POPULARITY_METRICS),
("0007_openledger_audio_view.sql", sql.AUDIO_POPULARITY_METRICS),
],
)
def test_ddl_matches_definitions(ddl_filename, metrics):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So cool!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I always think of adding it to the SQL file but I didn't even remember the logic file existed 😅

ddl = (DDL_DEFINITIONS_PATH / ddl_filename).read_text()
if not (
match := re.search(
r"INSERT INTO public.\w+_popularity_metrics.*?;",
ddl,
re.MULTILINE | re.DOTALL,
)
):
raise ValueError(f"Could not find insert statement in ddl file {ddl_filename}")

for provider in metrics:
assert provider in match.group(0)