DEV-1086 - monthly reports #125
- dedupes output - omits anything that is in the target dataset at time of report
Note that the missed coverage is, I think, only on a branch that actually sends email, which we don't really want to do (or try to do), and which has previously been working in production.
Also note that |
APPROVE (as mwarin would say)
I even did some mean things to the CLI and it reacted in entirely unsurprising ways.
The only thing I don't like is the e-mail text in a heredoc. There would be a complexity cost associated with doing it otherwise (in a file, as a const) so set it aside as a matter of aesthetics.
I moved the part of While |
I will go ahead and deploy this; the daily |
Overview
Data sets created for the HathiTrust Research Center consist of a directory tree of zip files containing only the OCR text files for each volume. There are four subsets with various conditions for inclusion based around the rights attributes of the item. Each night, items are added to and removed from the subsets based on newly ingested items and updated rights.
Currently, researchers who use these data sets get an email each day listing the items removed from the data sets. This is rather noisy, especially since in certain scenarios items can change rights frequently -- note this is also undesired behavior, but even if it were fixed, it is not necessary to send nightly deletion logs. Instead, we want to report on items deleted from the data sets once a month.
Implementation
- `DedupeDeleteLog` reads these deletion logs, compares against what is in the data set at the end of the month, and outputs only those that remain deleted.
- `bin/notify.rb` is moved to `lib/datasets/notify.rb` and integrated with the rest of the CLI
Testing
- Check out the `DEV-1086-monthly-reports` branch
- Run `docker compose build` and `docker compose run test bundle install`
- Copy `ictc-ht-datasets-000:/htprep/datasets/logs/delete_notifications_sent` to a convenient location under this repository (let's say "deletelogs")
- Run `docker compose run processor bin/datasets.rb notify --dry-run deletelogs/*`
- `bundle install`, and run `docker compose run test` to run those tests yourself

You should see the text of the emails output.
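For reviewers who want a feel for the dedupe step described under Implementation, here is a rough sketch of the idea. It is not the code in this PR: `remaining_deletions` is a hypothetical helper, the log format (one volume identifier per line) is an assumption, and the identifiers are made up.

```ruby
require "set"

# Hypothetical helper, NOT the DedupeDeleteLog class from this PR.
# Assumption: each deletion-log line holds a single volume identifier.
#
# log_lines   - lines read from the month's nightly deletion logs
# present_ids - identifiers found in the data set at report time
#
# Returns the identifiers that were logged as deleted and are still
# absent from the data set, with duplicates removed and log order kept.
def remaining_deletions(log_lines, present_ids)
  present = present_ids.to_set
  log_lines
    .map(&:strip)                          # drop newlines and padding
    .reject(&:empty?)                      # skip blank lines
    .uniq                                  # an item deleted twice is reported once
    .reject { |id| present.include?(id) }  # omit items that came back
end

# Made-up identifiers, for illustration only.
logs = ["test.001\n", "test.002\n", "test.001\n", "test.003\n"]
in_dataset = ["test.002"]

# Prints test.001 and test.003, one per line.
puts remaining_deletions(logs, in_dataset)
```

An item that was deleted and later re-added during the month (here `test.002`) drops out of the report, which is the point of comparing against the data set at report time rather than simply concatenating the nightly logs.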
Review questions
I think in general I'm pretty happy with this; I don't think there are any specific areas I'm looking for feedback on.