Archive vcerare data #438

Merged
merged 21 commits into from
Oct 22, 2024

Conversation

e-belfer
Member

@e-belfer e-belfer commented Oct 4, 2024

Overview

Closes #434.

What problem does this address?
Adds vcerare data to Zenodo. Because this data was provided to us and isn't available online, we upload our copy to a GCS bucket so we can archive from a stable source, as well as capture any file changes over time as new data is added.

What did you change in this PR?

  • Archive vcerare files from the sources.catalyst.coop GCS bucket
  • Add .pdf and .md filetypes to frictionless
  • Add vcerare to run-archiver.yml and add GCS configuration

Note: Zenodo does not allow you to set which files are previewed from the API, and it automatically previews the "first file", which it determines by alphabetical sorting - this will always be datapackage.json for this repository. We'll need to manually check "preview" next to the README file before publishing when we update this dataset.

Note 2: This PR contemplated adding contributor types to the Zenodo payload. We were setting them in zenodo_role but not passing them to the Zenodo API. I configured this to specify type for creator, as is possible in the GUI, but this is only possible for collaborators in the API. So, unless we reconfigure the way we handle creators, this isn't possible. For now, we'll just make any adjustments manually in the GUI, since we rarely refresh metadata.
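As a rough illustration of the limitation in Note 2, the sketch below builds the two kinds of entries. It assumes that what the note calls collaborators corresponds to the `contributors` list in Zenodo's deposit metadata, and that `type` is accepted there but not under `creators`. The `Person` class, the helper names, and the example role value are hypothetical and are not taken from this repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    """Hypothetical stand-in for a contributor record in the archiver."""

    name: str
    affiliation: str
    zenodo_role: Optional[str] = None  # set locally, as described in Note 2


def to_creator(person: Person) -> dict:
    """Build a creator entry; the role is dropped because no type is sent for creators."""
    return {"name": person.name, "affiliation": person.affiliation}


def to_contributor(person: Person) -> dict:
    """Build a contributor entry, attaching the role as "type" when one is set."""
    entry = to_creator(person)
    if person.zenodo_role is not None:
        entry["type"] = person.zenodo_role
    return entry


# Example role value is an assumption about Zenodo's accepted contributor types.
curator = Person("Jane Doe", "Example Org", zenodo_role="DataCurator")
creator_entry = to_creator(curator)
contributor_entry = to_contributor(curator)
```

Under these assumptions, passing a role through the API would mean listing people as contributors rather than creators, which is the reconfiguration the note defers.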

Testing

How did you make sure this worked? How can a reviewer verify this?
Run pudl_archiver --datasets vcerare --sandbox --initialize. See also: https://zenodo.org/records/13919960

To-do list

Tasks

@e-belfer e-belfer added the zenodo and vcerare (VCE Resource Adequacy Renewable Energy (RARE) data) labels Oct 4, 2024
@e-belfer e-belfer self-assigned this Oct 4, 2024
@e-belfer e-belfer changed the title from "Archive vceregen data and actually pass contributor type to Zenodo metadata" to "Archive vceregen data" Oct 11, 2024
@e-belfer e-belfer changed the title from "Archive vceregen data" to "Archive vcerare data" Oct 15, 2024
Member

@zschira zschira left a comment

Looks good! I had one note for a potential future improvement that I don't think we really need to worry about at this point, and one minor non-blocking style suggestion.

bucket = storage.Client().get_bucket(self.bucket_name)
blobs = bucket.list_blobs(prefix=f"{self.name}/") # Get all blobs in folder

for blob in blobs:
Member

Not blocking: At some point it might be cool to use universal-pathlib, which lets you work with GCS objects like normal Python Path objects. I definitely don't think it's worth the effort right now though.

# If data source was manually archived by us, specify that the
# data_source.path is a documentation link, rather than where we archived
# the data from.
if data_source_id in ["gridpathratoolkit", "vceregen"]:
Member

Might be slightly cleaner to just set the title/description in the if statement and have a single return at the end since the rest of the fields are all the same.
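A minimal sketch of what that suggestion could look like, assuming only the title and description differ between the two branches. The function name, its parameters, the returned dictionary, and the wording of the strings are hypothetical and do not reflect the actual code in this PR.

```python
# Data sources we archived manually from our own copy (IDs taken from the diff above).
MANUALLY_ARCHIVED = {"gridpathratoolkit", "vceregen"}


def build_source_entry(
    data_source_id: str, title: str, description: str, path: str
) -> dict:
    """Hypothetical helper: vary title/description, then return once."""
    if data_source_id in MANUALLY_ARCHIVED:
        # For manually archived data, the path is a documentation link,
        # not the location the data was archived from.
        entry_title = f"{title} documentation"
        entry_description = f"Documentation link for {title}."
    else:
        entry_title = title
        entry_description = description

    # Single return: every other field is shared by both branches.
    return {
        "title": entry_title,
        "description": entry_description,
        "path": path,
    }


manual = build_source_entry(
    "vceregen", "VCE RARE", "Raw data", "https://example.com/docs"
)
regular = build_source_entry(
    "eia860", "EIA 860", "Raw data", "https://example.com/eia860"
)
```

Keeping the shared fields in one place means a new field only has to be added once.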

@e-belfer e-belfer merged commit 809c37c into main Oct 22, 2024
3 checks passed
@e-belfer e-belfer deleted the vceregen branch October 22, 2024 14:12
Labels
vcerare VCE Resource Adequacy Renewable Energy (RARE) data zenodo
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Archive the raw vceregen data on Zenodo
3 participants