Create out_zenodo_logs to track changes in downloads over time #209

Open
wants to merge 8 commits into
base: main

Conversation

e-belfer
Member

@e-belfer e-belfer commented Oct 25, 2024

Overview

Closes #215.

What problem does this address?
Makes it simpler to track changes in Zenodo downloads by version over time. Plotting diffs directly in Superset leaves the legend illegible and cannot account for a version's first day of downloads.

What did you change in this PR?
Add out_zenodo_logs, which calculates the daily difference in downloads and views for each version. It also backfills values to 0 for dates on which a version does not yet exist, which makes new records simpler to handle.

  • Update the mapping of the core_zenodo_logs input so that Saturday data is no longer dropped, by making it a non-partitioned asset.
  • Fix the dashboards, which sum dataset views rather than version views and thus duplicate data.
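
The daily-difference and zero-backfill steps described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the column names `metrics_date` and `downloads`, the sample data, and the helper `daily_new_downloads` are all assumptions; only `version_id` and the `new_` prefix appear in the PR.

```python
import pandas as pd

# Hypothetical cumulative totals, one row per version per day.
# Version 2 does not exist yet on 2024-10-01.
logs = pd.DataFrame(
    {
        "metrics_date": pd.to_datetime(
            ["2024-10-01", "2024-10-02", "2024-10-03", "2024-10-02", "2024-10-03"]
        ),
        "version_id": [1, 1, 1, 2, 2],
        "downloads": [10, 15, 15, 4, 9],
    }
)


def daily_new_downloads(df: pd.DataFrame) -> pd.DataFrame:
    """Turn cumulative per-version totals into daily differences."""
    # Build a date x version grid; dates before a version exists become NaN.
    wide = df.pivot(index="metrics_date", columns="version_id", values="downloads")
    # Backfill not-yet-existing versions with 0 so a new version's first day
    # shows up as a positive difference rather than a missing value.
    wide = wide.fillna(0)
    # Day-over-day difference. The earliest date has no predecessor, so its
    # totals are treated as that day's new downloads (an assumption).
    diffs = wide.diff().fillna(wide)
    # Return to long format, one row per date and version.
    long = diffs.reset_index().melt(
        id_vars="metrics_date", var_name="version_id", value_name="new_downloads"
    )
    return long.astype({"new_downloads": "int64"})


new_logs = daily_new_downloads(logs)
```

With the backfill in place, version 2's first recorded day contributes its full total as new downloads, which is the case the PR description says is hard to handle when plotting diffs in Superset.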

Testing

How did you make sure this worked? How can a reviewer verify this?
Generate and then inspect out_zenodo_logs.
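
One way a reviewer might go beyond eyeballing the table is to check that the daily differences for each version add up to that version's latest cumulative total. The sketch below assumes hypothetical column names (`metrics_date`, `downloads`, `new_downloads`) and a hypothetical helper `diffs_match_totals`; it is not part of the PR.

```python
import pandas as pd


def diffs_match_totals(
    core: pd.DataFrame, out: pd.DataFrame, metric: str = "downloads"
) -> bool:
    """Check that summed daily differences recover the latest totals per version."""
    # Latest cumulative value per version in the core-style table.
    latest_totals = (
        core.sort_values("metrics_date").groupby("version_id")[metric].last()
    )
    # Sum of daily differences per version in the out-style table.
    summed_diffs = out.groupby("version_id")["new_" + metric].sum()
    return latest_totals.to_dict() == summed_diffs.to_dict()


# Hypothetical sample data for illustration only.
core = pd.DataFrame(
    {
        "metrics_date": pd.to_datetime(["2024-10-01", "2024-10-02", "2024-10-02"]),
        "version_id": [1, 1, 2],
        "downloads": [10, 15, 9],
    }
)
out = pd.DataFrame(
    {
        "metrics_date": pd.to_datetime(
            ["2024-10-01", "2024-10-02", "2024-10-01", "2024-10-02"]
        ),
        "version_id": [1, 1, 2, 2],
        "new_downloads": [10, 5, 0, 9],
    }
)

consistent = diffs_match_totals(core, out)
```

A mismatch would point to dropped days or double-counted rows, the two problems this PR sets out to fix.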


@e-belfer e-belfer self-assigned this Oct 25, 2024
@e-belfer e-belfer added the zenodo Relating to Zenodo usage metrics label Oct 25, 2024

@e-belfer e-belfer requested a review from bendnorman December 6, 2024 21:58
Member

@bendnorman bendnorman left a comment


I left just a couple of questions and suggestions.

Column(
"version_id",
Integer,
primary_key=True,

This is a nit, but could you please move this up to the top of the table definition so it's easy to understand the full primary key of the table?

Just so I understand, the version number represents a release of a dataset? I've been out of the Zenodo world for a while.


new_df = df.stack()
# Rename the diff columns
new_df = new_df.rename({col: "new_" + col for col in metrics_cols}, axis=1)

Why did you decide to rename these columns?

@@ -466,6 +466,144 @@
Column("partition_key", String),
)

out_zenodo_logs = Table(
"out_zenodo_logs",

I think it would be helpful to add table descriptions, either as docstrings or via Table.comment, so we can easily understand the differences between the core and out Zenodo tables.
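
For illustration, a table-level description can be attached through SQLAlchemy's `comment` argument to `Table`. The definition below is a trimmed-down, hypothetical stand-in: only the table name and `version_id` come from the PR, while the other columns, the composite key, and the comment wording are assumptions. It also lists the primary key columns first, as suggested earlier in this review.

```python
from sqlalchemy import Column, Date, Integer, MetaData, Table

metadata = MetaData()

# Hypothetical sketch, not the PR's actual schema.
out_zenodo_logs_example = Table(
    "out_zenodo_logs",
    metadata,
    # Primary key columns first, so the full key is visible at a glance.
    Column("metrics_date", Date, primary_key=True),
    Column("version_id", Integer, primary_key=True),
    Column("new_downloads", Integer),
    # Table-level description; stored on the Table object and emitted in DDL
    # on backends that support table comments.
    comment=(
        "Daily change in downloads and views for each Zenodo version, "
        "with 0 backfilled for dates before a version exists."
    ),
)

primary_key_columns = [col.name for col in out_zenodo_logs_example.primary_key.columns]
```

The description is then available as `out_zenodo_logs_example.comment`, so the distinction between the core and out tables lives next to the schema itself.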

Labels
zenodo Relating to Zenodo usage metrics
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

Zenodo dashboard inconsistencies
3 participants