Update historical MD5 values for readme files #2113

Open
14 tasks
rija opened this issue Dec 2, 2024 · 0 comments

Make sure #2116 is completed before this ticket.

User story

As a curator
I want to bulk update the checksums stored in the database for the README files, following the bulk replacement of their locations
So that they match the actual README files

Acceptance criteria

Given I have a README file named readme_DOI.txt in Wasabi whose checksum metadata in the DB is outdated
When I run the update process
Then the checksum in the database will be updated with the actual checksum of that README file

Given I have a README file named readme_DOI.txt in Wasabi but its file table entry is missing
When I run the update process
Then the checksum in the database will be updated with the actual checksum of that README file
And a new entry is added to the file table for that README file
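
The two scenarios above amount to an "update if present, otherwise insert" step. The following is a minimal sketch of that logic only, not the project's implementation: it uses an in-memory SQLite database as a stand-in, and the table and column names (`file`, `file_attributes`, `dataset_id`, `name`, `location`, `size`, `value`) as well as the helpers `upsert_readme` and `make_demo_db` are assumptions made for illustration.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE file (
            id INTEGER PRIMARY KEY, dataset_id INTEGER,
            name TEXT, location TEXT, size INTEGER
        );
        CREATE TABLE file_attributes (
            id INTEGER PRIMARY KEY, file_id INTEGER, name TEXT, value TEXT
        );
        """
    )
    return conn


def upsert_readme(conn, dataset_id, name, location, size, md5):
    """Update the README's size and MD5, creating the file row if it is missing.

    Returns (file_id, created), where created is True when a new row was added.
    """
    row = conn.execute(
        "SELECT id FROM file WHERE dataset_id = ? AND name = ?",
        (dataset_id, name),
    ).fetchone()
    if row is None:
        # Second scenario: no file table entry yet, so add one.
        cur = conn.execute(
            "INSERT INTO file (dataset_id, name, location, size) VALUES (?, ?, ?, ?)",
            (dataset_id, name, location, size),
        )
        file_id, created = cur.lastrowid, True
    else:
        # First scenario: the entry exists, refresh its size.
        file_id, created = row[0], False
        conn.execute("UPDATE file SET size = ? WHERE id = ?", (size, file_id))
    # Replace any outdated checksum attribute with the current value.
    conn.execute(
        "DELETE FROM file_attributes WHERE file_id = ? AND name = 'MD5'", (file_id,)
    )
    conn.execute(
        "INSERT INTO file_attributes (file_id, name, value) VALUES (?, 'MD5', ?)",
        (file_id, md5),
    )
    conn.commit()
    return file_id, created
```

Running the function twice for the same README exercises both scenarios: the first call creates the row, the second only refreshes the size and checksum.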

Additional Info

Implementation

Before:
create temporary directories

For each DOI:

  • Gather (with rclone) the checksum and file size of every file for the DOI from the Wasabi API, and write them to DOI.md5 and DOI.filesizes in a temporary location on the filesystem
  • If readme_DOI.txt is found in the DB, update its MD5 checksum in the file attributes and its size in the file table
  • If readme_DOI.txt is not found in the DB, add a new row to the file table, then store its MD5 checksum in the file attributes
  • Upload DOI.md5 and DOI.filesizes to Wasabi (overwriting existing ones if they exist)

After:
delete temporary directories
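
The before/for-each/after steps above could be sketched as follows. This is an illustration under assumptions rather than the agreed implementation: the remote path `wasabi:bucket/100001` is a placeholder, `rclone ls` is assumed to be an acceptable source for the sizes file (the ticket does not say which command produces DOI.filesizes), and the function name `gather_for_doi` and its injectable `run` parameter are hypothetical. The database step is left as a comment.

```python
import subprocess
import tempfile
from pathlib import Path


def gather_for_doi(doi, remote_path, run=subprocess.run):
    """Write DOI.md5 and DOI.filesizes to a temporary directory, then upload them.

    `run` defaults to subprocess.run and can be swapped for a fake in tests.
    Returns the names of the files that were generated.
    """
    # Before: the temporary directory is created on entering the context manager.
    with tempfile.TemporaryDirectory(prefix=f"readme-{doi}-") as tmp:
        workdir = Path(tmp)
        # `rclone md5sum` prints checksums, `rclone ls` prints sizes and paths.
        for subcommand, suffix in (("md5sum", "md5"), ("ls", "filesizes")):
            result = run(
                ["rclone", subcommand, remote_path],
                capture_output=True, text=True, check=True,
            )
            (workdir / f"{doi}.{suffix}").write_text(result.stdout)

        # The database update for readme_DOI.txt would happen here.

        # Copy the generated files back to the remote; changed files are overwritten.
        run(
            ["rclone", "copy", str(workdir), remote_path],
            capture_output=True, text=True, check=True,
        )
        return sorted(p.name for p in workdir.iterdir())
    # After: the temporary directory is deleted when the `with` block exits.
```

Using one temporary directory per DOI keeps the upload step simple, since everything in the directory belongs to that DOI, and the context manager guarantees cleanup even if an rclone call fails.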

rclone checksum docs: https://rclone.org/commands/rclone_md5sum/
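
According to the rclone documentation, `rclone md5sum` writes its output in the same format as the standard md5sum tool: one line per file, with the hash, two spaces, then the path. A minimal sketch of reading that output and picking out the README checksum is shown below; the function names `parse_md5sum` and `readme_checksum` and the sample paths are hypothetical.

```python
def parse_md5sum(text):
    """Parse md5sum-style output into a {path: digest} dictionary."""
    checksums = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each line is "<digest>  <path>"; split on the first double space only,
        # so paths containing spaces are preserved.
        digest, _, path = line.partition("  ")
        checksums[path] = digest
    return checksums


def readme_checksum(text, doi):
    """Return the digest of readme_<doi>.txt, or None if it is not listed."""
    target = f"readme_{doi}.txt"
    for path, digest in parse_md5sum(text).items():
        # Compare on the final path component so nested listings also match.
        if path.rsplit("/", 1)[-1] == target:
            return digest
    return None


sample = (
    "d41d8cd98f00b204e9800998ecf8427e  readme_100001.txt\n"
    "9e107d9d372bb6826bd81d3542a419d6  data/sample file.csv\n"
)
print(readme_checksum(sample, "100001"))
```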

Product Backlog Item Ready Checklist

  • Business value is clearly articulated
  • Item is understood enough by the IT team so it can make an informed decision as to whether it can complete this item
  • Dependencies are identified and no external dependencies would block this item from being completed
  • At the time of the scheduled sprint, the IT team has the appropriate composition to complete this item
  • This item is estimated and small enough to comfortably be completed in one sprint
  • Acceptance criteria are clear and testable
  • Performance criteria, if any, are defined and testable
  • The Scrum team understands how to demonstrate this item at the sprint review

Product Backlog Item Done Checklist

  • Item(s) in increment pass all Acceptance Criteria
  • Code is refactored to best practices and coding standards
  • Documentation is updated as needed
  • Data security has not been compromised (with particular reference to the personal information we hold in GigaDB)
  • No deviation from the team technology stack and software architecture has been introduced
  • The product is in a releasable state (i.e. the increment has not broken anything)