datalad/git-annex metadata to file restriction/embargo #303

Open
bpinsard opened this issue Apr 5, 2024 · 2 comments

Comments

@bpinsard

bpinsard commented Apr 5, 2024

Hi data-lads!

With Dataverse becoming a norm for data preservation (and sharing), I am experimenting with pushing existing datasets to Dataverse (Borealis) and am considering adding this deployment to the workflow for new datasets (if the platform can support the volume we generate).

I was wondering whether, for datasets with hybrid sensitivity levels (e.g. a subset of files marked with distribution-restrictions=sensitive) or differing subject consent levels, restrictions could be assigned automatically when uploading files to Dataverse. I am not familiar with the Dataverse API or pyDataverse, but I would expect that to be possible.

Thanks for developing that extension. I've been following the distribits stream and there were very interesting bits.


welcome bot commented Apr 5, 2024

Hi! 👋 We are happy that you opened your first issue here! 😄 If you haven't done so already, please make sure you check out our Code of Conduct.

@bpoldrack
Member

bpoldrack commented Apr 6, 2024

Hi @bpinsard!

Something like that is possible with different approaches.

  1. One thing that comes with git-annex natively is preferred content configuration per remote. So, you could configure your dataset to know that particular content simply shouldn't be pushed to a Dataverse remote and should only be available from another location with restricted access. Someone cloning the dataset from Dataverse would then still get the information about that location and could possibly access it (given credentials). That's one way of addressing this.

  2. Another one would be to flag such files on Dataverse using the "restricted files" feature (see https://guides.dataverse.org/en/latest/user/dataset-management.html#restricted-files and https://guides.dataverse.org/en/latest/api/native-api.html#add-a-file-to-a-dataset).
    There's no support for that in datalad-dataverse currently, but that's certainly doable. The problem would be to define a generalizable way of declaring which files should be flagged like that. I'm not sure whether requiring a very specific type of (annex-)metadata is the nicest way to do that. Do you have any ideas on that?

  3. One could "simply" use git-annex to store such content encrypted.
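
To make approaches 1 and 2 a bit more concrete, here is a minimal Python sketch. It is not something datalad-dataverse does today. The remote name `dataverse`, the helper names, and the use of the `distribution-restrictions` field as the trigger are all assumptions for illustration. The first helper builds a `git annex wanted` call with a `metadata=` preferred content term. The second builds the `jsonData` form field that the native API's add-file call accepts, setting `restrict` based on a file's annex metadata (the dict is assumed to mirror the `fields` object printed by `git annex metadata --json`).

```python
from __future__ import annotations

import json

# Assumption: sensitive files carry this git-annex metadata field,
# as in the example from the original question.
RESTRICTION_FIELD = "distribution-restrictions"


def wanted_command(remote: str, field: str = RESTRICTION_FIELD) -> list[str]:
    """Approach 1: command that keeps files with `field` set off `remote`.

    `metadata=FIELD=GLOB` is a git-annex preferred content term;
    prefixing it with `not` excludes every file that has the field set.
    """
    return ["git", "annex", "wanted", remote, f"not metadata={field}=*"]


def dataverse_json_data(annex_metadata: dict[str, list[str]],
                        directory_label: str = "") -> str:
    """Approach 2: `jsonData` value for the native API add-file request.

    Hypothetical mapping: any value in the restriction field marks the
    file as restricted on upload.
    """
    restricted = bool(annex_metadata.get(RESTRICTION_FIELD))
    payload = {"restrict": "true" if restricted else "false"}
    if directory_label:
        payload["directoryLabel"] = directory_label
    return json.dumps(payload)


# Possible use inside a dataset (not executed here):
#   subprocess.run(wanted_command("dataverse"), check=True)
#   form = {"jsonData": dataverse_json_data({"distribution-restrictions": ["sensitive"]})}
```

Keeping the rule in one place (the metadata field name) means the same declaration could drive either approach, though whether annex metadata is the right place for that declaration is exactly the open question above.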
