Create out_zenodo_logs to track changes in downloads over time #209
base: main
05d984a to d35771d
I left just a couple of questions and suggestions.
Column(
    "version_id",
    Integer,
    primary_key=True,
This is a nit, but could you move this up to the top of this table definition so it's easy to understand the full primary key of the table, please?
Just so I understand, the version number represents a release of a dataset? I've been out of the zenodo world for a while.
new_df = df.stack()
# Rename the diff columns
new_df = new_df.rename({col: "new_" + col for col in metrics_cols}, axis=1)
Why did you decide to rename these columns?
@@ -466,6 +466,144 @@
    Column("partition_key", String),
)

out_zenodo_logs = Table(
    "out_zenodo_logs",
I think it would be helpful to add table descriptions as docstrings or via Table.comment so we can easily understand the differences between the core and out Zenodo tables.
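For illustration, here is a minimal sketch of what the two review suggestions (primary-key columns first, plus a table description) could look like with SQLAlchemy's `comment` argument. The table name `example_zenodo_logs`, the `metrics_date` column, and the comment text are assumptions made up for this sketch, not the PR's actual schema.

```python
from sqlalchemy import Column, Date, Integer, MetaData, Table

metadata = MetaData()

# Hypothetical table: only "version_id" and the "new_" prefix come from the PR.
example_zenodo_logs = Table(
    "example_zenodo_logs",
    metadata,
    # Primary-key columns are listed first so the full key is visible at a glance.
    Column("metrics_date", Date, primary_key=True),
    Column("version_id", Integer, primary_key=True),
    Column("new_downloads", Integer),
    Column("new_views", Integer),
    # Table-level description, stored on the Table object as .comment.
    comment="Daily change in downloads and views per Zenodo version.",
)

# The composite primary key, in declaration order.
primary_key_names = [col.name for col in example_zenodo_logs.primary_key.columns]
```

Whether the description lives in `comment` or in a nearby docstring is a style choice; `comment` has the advantage of travelling with the table metadata.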
Overview
Closes #215.
What problem does this address?
Makes it simpler to track changes in Zenodo downloads by version over time. Plotting diffs in Superset makes it impossible to make the legend legible, or to account for the first day of downloads.
What did you change in this PR?
Add out_zenodo_logs, which calculates the daily difference in downloads and views for each version. Also backfills to 0 where the version does not yet exist to make it simpler to handle new records.
Change the core_zenodo_logs input to not drop Saturday data by making it a non-partitioned asset.
Testing
How did you make sure this worked? How can a reviewer verify this?
Generate and then inspect out_zenodo_logs.
To-do list
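The daily-difference and backfill behavior described under "What did you change in this PR?" can be sketched in plain Python. This is not the PR's implementation (which uses pandas); it assumes the input holds cumulative totals keyed by day and version, and the function name `daily_diffs` and the `metrics_date` key are hypothetical. The `new_` prefix mirrors the column rename shown in the review.

```python
from datetime import date


def daily_diffs(cumulative, metrics=("downloads", "views")):
    """Turn cumulative totals into per-day differences for every version.

    `cumulative` maps (day, version_id) -> {metric: cumulative total}.
    Days on which a version has no record yet are treated as totals of 0,
    so the first recorded day reports its full total as the difference.
    """
    days = sorted({day for day, _ in cumulative})
    versions = sorted({version_id for _, version_id in cumulative})
    rows = []
    for version_id in versions:
        # Start from zero so missing early days are backfilled with 0.
        previous = dict.fromkeys(metrics, 0)
        for day in days:
            # A missing record repeats the last known totals (difference of 0).
            current = cumulative.get((day, version_id), previous)
            row = {"metrics_date": day, "version_id": version_id}
            for metric in metrics:
                row["new_" + metric] = current[metric] - previous[metric]
            rows.append(row)
            previous = current
    return rows


example = {
    (date(2024, 1, 1), 1): {"downloads": 10, "views": 20},
    (date(2024, 1, 2), 1): {"downloads": 15, "views": 26},
    (date(2024, 1, 2), 2): {"downloads": 3, "views": 4},
}
example_rows = daily_diffs(example)
```

In this example, version 2 does not exist on the first day, so it gets a row of zeros there, and its first real day reports the full totals as new downloads and views.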