
Design document for the Zenodo like DOI per dandiset #2012



Open
wants to merge 24 commits into
base: master
from


@yarikoptic (Member) commented Aug 22, 2024

A design doc composed with @djarecka to avoid dummy DOIs for dandisets

refs:

TODOs

  • complete initial pass
  • seek review
  • make explicit that DOI generation is optional overall, since LINC does not even need it (ref)
  • potentially add a sequence diagram of interactions between user, archive, datacite across different stages of embargoed -> public dandisets

But it could already be checked out by @dandi/archive-maintainers folks, since the overall idea is already formulated and some early concerns/questions could already be asked and answered.

@djarecka (Member) commented Jan 19, 2025

I created some tests to simulate the workflow in dandi/dandi-schema#275.

- We might want a dedicated 404 page for deleted dandisets, or at least a message that the dandiset was deleted, and ideally describe the reason why it was deleted ("Upon request of maintainer", "Due to violation of terms of service", etc.)
- Then we adjust the DOI record to point to that page.

- Should we do anything at the dandischema level?
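A minimal sketch of the two deletion steps above, assuming the DataCite REST API update body (`PUT /dois/{doi}` with a JSON:API `data.attributes` object) and a hypothetical tombstone URL of the form `https://dandiarchive.org/dandiset/<identifier>/deleted`. `DELETION_REASONS`, `tombstone_message` and `tombstone_update_payload` are illustrative names, not existing dandi-archive code; only the target `url` changes, so the DOI keeps resolving.

```python
# Reasons quoted from the design discussion; the final set is not decided.
DELETION_REASONS = (
    "Upon request of maintainer",
    "Due to violation of terms of service",
)

# Hypothetical URL pattern for a "this dandiset was deleted" page.
TOMBSTONE_URL = "https://dandiarchive.org/dandiset/{identifier}/deleted"


def tombstone_message(identifier: str, reason: str) -> str:
    """Text the hypothetical tombstone page would show."""
    if reason not in DELETION_REASONS:
        raise ValueError(f"unknown deletion reason: {reason!r}")
    return f"Dandiset {identifier} was deleted. Reason: {reason}."


def tombstone_update_payload(identifier: str) -> dict:
    """DataCite update body that re-targets the DOI to the tombstone page.

    Only `url` is sent; the metadata already registered on DataCite is left as is.
    """
    return {
        "data": {
            "type": "dois",
            "attributes": {"url": TOMBSTONE_URL.format(identifier=identifier)},
        }
    }
```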

Member Author

@djarecka I think we need to answer those before we can call this "done" ;-)

- Upon changes to a non-embargoed, draft dandiset metadata record:
- If it has a `Draft DOI`, attempt to "promote" it to `Findable`.
- If validation fails, keep the `Draft DOI` (which gets only very limited validation) and attempt to update the DataCite metadata record while keeping the same target URL.
- **Question to clear up**: what happens to a `Draft DOI` if the metadata record is invalid? DataCite seems to create one with no metadata, but does it update only the fields it knows about?
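The update rule above can be sketched as a pure decision function. This is a hedged illustration: `DoiState`, `on_draft_metadata_change` and the action labels are hypothetical names, and the handling of already-`Findable` DOIs is left as a no-op because the steps above do not cover that case.

```python
from enum import Enum
from typing import Optional, Tuple


class DoiState(Enum):
    DRAFT = "draft"        # DataCite Draft DOI: minimal validation, not resolvable
    FINDABLE = "findable"  # DataCite Findable DOI: full validation, publicly indexed


def on_draft_metadata_change(
    embargoed: bool, state: Optional[DoiState], metadata_valid: bool
) -> Tuple[Optional[DoiState], str]:
    """Decide what to do with the Dandiset DOI after the draft metadata changed.

    Returns (new_state, action); the action strings are illustrative labels.
    """
    if embargoed:
        # No DOIs are minted or updated while a dandiset is embargoed.
        return state, "skip"
    if state is DoiState.DRAFT:
        if metadata_valid:
            # Promote the Draft DOI to Findable.
            return DoiState.FINDABLE, "publish"
        # Stay a Draft DOI, push the metadata we have, keep the same target URL.
        return DoiState.DRAFT, "update-metadata-keep-url"
    # Not covered by the steps above; leave the DOI untouched here.
    return state, "noop"
```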

Member Author

@djarecka I feel like you clarified this, but we did not put it "in writing" here. What do you remember about this aspect?

Member Author

ping @djarecka

Member

Yes, sorry, I just gave a "thumbs up" since you were right: a Draft DOI doesn't have to have metadata, it can have only a URL, and if it has more, that metadata will be added.

@asmacdo mentioned this pull request Apr 22, 2025

A django-admin script should be created and executed to create a `Dandiset DOI` for all existing dandisets.

**Question to address**: Will adding a `Dandiset DOI` in addition to `Version DOI` require a db migration?
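The backfill could look roughly like this in plain Python; in dandi-archive it would live in a Django management command operating on ORM objects. `DandisetRecord`, `backfill_dandiset_dois` and the injected `mint_dandiset_doi` callable are hypothetical stand-ins for the model and the DataCite client. Skipping embargoed dandisets and dandisets that already have a DOI keeps the script safe to re-run.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class DandisetRecord:
    # Hypothetical stand-in for a Dandiset model row.
    identifier: str          # e.g. "000004"
    embargoed: bool
    doi: Optional[str] = None


def backfill_dandiset_dois(
    dandisets: Iterable[DandisetRecord],
    mint_dandiset_doi: Callable[[DandisetRecord], str],
) -> List[str]:
    """Mint a Dandiset DOI for every eligible dandiset; return the processed identifiers."""
    minted = []
    for ds in dandisets:
        if ds.embargoed or ds.doi:
            # Embargoed dandisets get no DOI; existing DOIs are left alone (idempotent).
            continue
        ds.doi = mint_dandiset_doi(ds)
        minted.append(ds.identifier)
    return minted
```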

Member

In the POC I've added a `doi` field to the `Dandiset` model, which does require a migration.

Member Author

Isn't there also a "draft" Version with a DOI (and that's where, I guess, we inject a fake one)? I.e., could we avoid changing the DB model?

Member

I have to admit that I "just don't like that", since the Dandiset DOI semantically belongs to the Dandiset. I can't think of a reason this wouldn't work, but it just feels messy.

Retrieving the Dandiset DOI via Django as `someversion.dandiset.draft_version.doi` IMO violates the "principle of least surprise".

Prior to publication, the Dandiset DOI will point to the draft version (via the DLP), but after publication it will point to the latest publication, so storing it on the draft version would also be surprising.
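One way to soften the surprise, sketched with plain dataclasses rather than Django models (all names hypothetical): wherever the value ends up being stored, expose it through a Dandiset-level accessor so callers never have to spell out `someversion.dandiset.draft_version.doi`. This only illustrates the trade-off discussed here; it is not the chosen design.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Version:
    version: str                 # "draft" or a published version string
    doi: Optional[str] = None


@dataclass
class Dandiset:
    identifier: str
    versions: List[Version] = field(default_factory=list)

    @property
    def draft_version(self) -> Version:
        # Raises StopIteration if there is no draft; every dandiset is assumed to have one.
        return next(v for v in self.versions if v.version == "draft")

    @property
    def dandiset_doi(self) -> Optional[str]:
        # The storage location (the draft Version's doi field) is hidden behind this accessor.
        return self.draft_version.doi
```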

- **Follow-up concern**: after the dandiset and its DOI are published, the metadata of the draft version of the dandiset could still be changed, potentially making the changed record "invalid" again. This should be OK-ish.
- The DataCite test site gave a different validation result than the primary one.

Member

Do we have any more information about this?

Member Author

It might have been a faulty memory. @djarecka, do you have any information on this, or should we just remove it?

Suggested change
- The DataCite test site gave a different validation result than the primary one.

@djarecka (Member), Apr 25, 2025

Let's remove it. It was based on my memory of some old issue, but I wasn't able to reproduce it or find old records.

@asmacdo (Member) commented Apr 24, 2025

For dandi-schema, we may also want to pull the validation out of `to_datacite`. This validation does not use the DataCite API; rather, it runs against the schema, which has been pre-fetched and committed into the dandi-schema repo.

Currently `to_datacite` accepts an optional `validate` argument, which defaults to `False`, and the only use of `to_datacite` in dandi-archive DOES NOT enable validation! I'm curious what problems this could cause and whether those problems have actually occurred. Maybe there are more Dandisets with DOIs stored in the data model for which no DOI was minted? Or maybe DOIs were created but left as Draft due to validation errors? We would have logged exceptions if the DataCite API did not accept a new DOI.

From this design doc alone, I gather that we believe Draft DOIs have less stringent validation than Findable ones, but I have not found any upstream documentation that confirms this. If that is the case, I suggest we pull the validation out of the `to_datacite` function, which would then be responsible only for constructing the API payload. Then, in dandi-archive, we perform the validation: if the metadata is valid, we publish a Findable DOI; if it fails validation, we fall back to a Draft DOI.

Either way, we need to consider using some kind of validation. @djarecka @yarikoptic I suggest this as a topic for our discussion tomorrow.
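A minimal sketch of the proposed split, assuming the DataCite REST API convention that a DOI created without an `event` attribute is a Draft, while `"event": "publish"` requests a Findable DOI. `build_datacite_payload` stands in for a `to_datacite` that only constructs the payload, and `validate` is a hypothetical callable (for example, a JSON Schema check against the schema committed in dandi-schema) returning a list of error messages.

```python
from typing import Callable, List, Tuple


def build_datacite_payload(doi: str, url: str, attributes: dict) -> dict:
    """Construct the JSON:API request body only; no validation happens here."""
    return {
        "data": {
            "type": "dois",
            "attributes": {**attributes, "doi": doi, "url": url},
        }
    }


def payload_with_fallback(
    doi: str,
    url: str,
    attributes: dict,
    validate: Callable[[dict], List[str]],
) -> Tuple[dict, List[str]]:
    """Return a Findable payload if validation passes, otherwise a Draft payload."""
    payload = build_datacite_payload(doi, url, attributes)
    errors = validate(payload)
    if not errors:
        # Ask DataCite to make the DOI Findable.
        payload["data"]["attributes"]["event"] = "publish"
    # Without an "event" attribute, a newly created DOI stays a Draft.
    return payload, errors
```

Keeping `validate` injectable lets dandi-archive log the returned errors before falling back, instead of relying on DataCite rejecting the request.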

@djarecka (Member)

> For dandi-schema, we may also want to pull the validation out of to_datacite. This validation does not use the datacite API, rather validation occurs against the schema which has been pre-fetched and committed into the dandi-schema repo.

Correct!

> Currently to_datacite accepts an optional arg validate, which defaults to False-- the only use of to_datacite in dandi-archive DOES NOT enable validation!. I'm curious what problems this could cause, and if those problems have actually occurred-- maybe there are more Dandisets with DOIs stored in the data model without a DOI minted? Or maybe they were created but set to Draft due to validation errors? We would have logged exceptions if the datacite API did not accept a new DOI.

That's probably not good if validation is not used before publishing. I was not aware.

> From this design doc only, I gather that we believe Draft DOIs have less stringent validation than Findable, but I have not found any upstream documentation that confirms this. If it is the case, I suggest we pull the validation out of the to_datacite function, which would only be responsible for constructing the API payload. Then, in dandi-archive, we perform a validation. If valid we publish a Findable DOI, and if invalid we fallback to Draft DOI if it fails validation.

Yes, there is no documentation, but that's the case...

## Concerns to keep in mind/address

- **Question to clear up**: what happens to a `Draft DOI` if the metadata record is invalid?
  - DataCite seems to create one with no metadata, but does it update only the fields it knows about?

Member Author

@djarecka, did you check this in your experiments?

@yarikoptic changed the title from "Initial draft of a design document for the Zenodo like DOI per dandiset" to "Design document for the Zenodo like DOI per dandiset" May 14, 2025
asmacdo added a commit to asmacdo/dandi-archive that referenced this pull request May 21, 2025
- Dandiset DOI will redirect to the DLP
- Example: 10.80507/dandi.000004
- Dandiset DOI is stored in the doi field of the draft version
- Dandiset DOI metadata (on Datacite) will match the draft version until
  first publication
- Once a Dandiset is published, the Dandiset DOI metadata will match the
  latest publication

See the design document for more details: dandi#2012
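The rules in this commit message can be sketched as two small helpers. The DOI format follows the `10.80507/dandi.000004` example above; the function names are hypothetical, and `published_versions` is assumed to be sorted from oldest to newest.

```python
from typing import List


def dandiset_doi(prefix: str, identifier: str) -> str:
    """Format a Dandiset DOI following the example 10.80507/dandi.000004."""
    return f"{prefix}/dandi.{identifier}"


def dandiset_doi_metadata_source(published_versions: List[str]) -> str:
    """Pick which version's metadata the Dandiset DOI should mirror on DataCite.

    Until the first publication it mirrors the draft; afterwards, the latest publication.
    """
    return published_versions[-1] if published_versions else "draft"
```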
yarikoptic and others added 24 commits June 2, 2025 10:44
…previous failures

Clarify no DOIs for dandisets while embargoed
Co-authored-by: Yaroslav Halchenko <[email protected]>
fallback to Dandiset pydantic model would require changes to that model.

4 participants