
Redesign of bioimage.io collection #659

Open
FynnBe opened this issue Oct 20, 2023 · 2 comments
Labels
help wanted Extra attention is needed

Comments

@FynnBe
Member

FynnBe commented Oct 20, 2023

Here are some notes that will hopefully, through some more discussion, turn into an overview of upcoming changes.
Special thanks to @oeway and @k-dominik for discussions so far.

Shortcomings of the current system

  1. Unclear source of truth: descriptions on Zenodo are patched in this GitHub repo.
  2. Users may suffer from rate limits that are out of our control (set by Zenodo or GitHub).
  3. Lengthy loop of (1) proposing new descriptions (currently by uploading to Zenodo), (2) testing them on our side, and (3) updating the proposal (back to step 1), which results in unusable versions on Zenodo.

Potential ways to address the shortcomings (currently ruled out)

  1. Don't patch
    • con: published Zenodo records often need patching
  2. Cache to an S3 storage under our control
    • con: with multiple sources of truth and descriptions contributed by partners directly via GitHub, keeping the cache valid is challenging and itself relies on access to GitHub/Zenodo
  3. Use the Zenodo sandbox for description proposals
    • con: proposals may disappear if they proceed too slowly; storage is not under our control

Currently the most promising way to address the shortcomings

  1. "S3 first" approach:
    • Proposals get a bioimage.io internal ID right away. Once they are accepted, we publish them on Zenodo and add the concept DOI and version DOI [1]. (Maybe we make the version field mandatory from now on, so that semantic versions can be mapped to DOIs?)
    • Description updates get a bioimage.io internal ID right away (maybe 'update-' + their ID?). Once an update is accepted, we publish it on Zenodo and get a new version DOI.
  2. The "S3 first" approach makes sure that we are in control of any rate limits.
  3. The "S3 first" approach allows for immediate evaluation of user uploads.
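The proposal lifecycle of the "S3 first" approach can be sketched roughly as follows. This is a minimal Python sketch under assumptions: `Resource`, `propose`, `accept`, and the `publish` callback are hypothetical names, nothing here talks to S3 or Zenodo, and `fake_publish` only returns placeholder strings. It illustrates internal IDs being assigned immediately, DOIs attached only once a proposal is accepted, and (mandatory) semantic versions mapped to version DOIs.

```python
import itertools
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Hypothetical publisher: publishes on Zenodo and returns (concept_doi, version_doi);
# a concept_doi of None means "create a new record".
PublishFn = Callable[[str, Optional[str]], Tuple[str, str]]


@dataclass
class Resource:
    internal_id: str                       # assigned as soon as the upload lands on S3
    concept_doi: Optional[str] = None      # known only after the first accepted version
    versions: Dict[str, str] = field(default_factory=dict)  # semantic version -> version DOI
    pending: Dict[str, str] = field(default_factory=dict)   # proposal ID -> semantic version


def propose(resource: Optional[Resource], version: str) -> Tuple[Resource, str]:
    """Register a new resource or an update proposal; nothing is published yet."""
    if resource is None:
        resource = Resource(internal_id=uuid.uuid4().hex[:8])
        proposal_id = resource.internal_id
    else:
        # 'update-' + ID as floated in the issue; the version suffix is an
        # assumption that keeps concurrent update proposals apart.
        proposal_id = f"update-{resource.internal_id}-{version}"
    if version in resource.versions or version in resource.pending.values():
        raise ValueError(f"version {version} already exists")
    resource.pending[proposal_id] = version
    return resource, proposal_id


def accept(resource: Resource, proposal_id: str, publish: PublishFn) -> str:
    """Publish an accepted proposal and record its DOIs."""
    version = resource.pending.pop(proposal_id)
    concept_doi, version_doi = publish(resource.internal_id, resource.concept_doi)
    resource.concept_doi = concept_doi
    resource.versions[version] = version_doi
    return version_doi


_fake_counter = itertools.count(1)


def fake_publish(internal_id: str, concept_doi: Optional[str]) -> Tuple[str, str]:
    """Stand-in for Zenodo that returns obviously fake identifiers."""
    n = next(_fake_counter)
    return (concept_doi or f"fake-concept-{internal_id}"), f"fake-version-{n}"
```

Keeping the version-to-DOI mapping on our side means a user can ask for "version 0.2.0" of a resource without us querying Zenodo at all.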

Cons of "S3 first"

Still unclear (to me) about "S3 first"

  • Replacement/update of the current resource description review process, including the generated PR that serves as a space for discussion between contributors and bioimage.io maintainers.
    • Maybe we can use Gitter (https://gitter.im/)? Apparently it has a matrix.org-based API, so we could create a channel for each resource description.
    • Looking into Gitter brings me to our AI4Life Matrix ...
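If the Matrix route is taken, one room per resource description could be created through the Matrix client-server API endpoint `POST /_matrix/client/v3/createRoom`. The sketch below only builds the request with the Python standard library and does not send it; the homeserver URL, access token, the `bioimageio-<id>` alias scheme, and the function name are hypothetical, and whether a Gitter-hosted setup allows this is untested.

```python
import json
import urllib.request


def build_create_room_request(homeserver: str, access_token: str,
                              resource_id: str, contributor: str) -> urllib.request.Request:
    """Build (but do not send) a Matrix createRoom request for one resource description."""
    body = {
        # Local part of the room alias, i.e. #bioimageio-<id>:<server> (hypothetical scheme)
        "room_alias_name": f"bioimageio-{resource_id}",
        "name": f"Review of {resource_id}",
        "topic": f"Discussion of the bioimage.io resource description {resource_id}",
        "preset": "private_chat",
        "invite": [contributor],  # Matrix user ID of the contributor, e.g. "@alice:example.org"
    }
    return urllib.request.Request(
        url=f"{homeserver.rstrip('/')}/_matrix/client/v3/createRoom",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` would return a JSON body containing the new `room_id`, which could be stored alongside the resource's internal ID.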

Details in need of further discussion/thought

  • Use of S3 object redirects to realize the concept of a resource pointing to its latest version (the alternative is, of course, simple duplication)
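For the "latest" pointer, S3 offers a per-object website redirect (the `WebsiteRedirectLocation` parameter of boto3's `put_object`). Note that S3 only honors this redirect when the bucket is accessed through its static website endpoint, not through the regular REST endpoint, which may weigh on the choice. The sketch below returns the boto3 client method name and keyword arguments for both options without calling AWS; the key layout `<resource_id>/<version>/rdf.yaml` and the function name are hypothetical.

```python
from typing import Any, Dict, Tuple


def latest_pointer_op(bucket: str, resource_id: str, version: str,
                      use_redirect: bool = True) -> Tuple[str, Dict[str, Any]]:
    """Return (boto3 S3 client method name, kwargs) making the 'latest' key point at a version."""
    # Hypothetical key layout: <resource_id>/<version>/rdf.yaml
    version_key = f"{resource_id}/{version}/rdf.yaml"
    latest_key = f"{resource_id}/latest/rdf.yaml"
    if use_redirect:
        # Empty placeholder object; the S3 website endpoint redirects requests for it
        # to the versioned key (same-bucket targets must start with "/").
        return "put_object", {
            "Bucket": bucket,
            "Key": latest_key,
            "Body": b"",
            "WebsiteRedirectLocation": f"/{version_key}",
        }
    # Alternative: server-side copy, i.e. plain duplication of the versioned object.
    return "copy_object", {
        "Bucket": bucket,
        "Key": latest_key,
        "CopySource": {"Bucket": bucket, "Key": version_key},
    }
```

With boto3 installed, the result would be applied as `method, kwargs = latest_pointer_op(...)` followed by `getattr(boto3.client("s3"), method)(**kwargs)`. Either way the `latest` key has to be rewritten for every accepted version; the redirect avoids a second copy but only works for clients going through the website endpoint.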

Footnotes

  1. Note that one can reserve a DOI and then, e.g., include it in files in that record; see "Can I know the DOI of my record before publishing, so that I can include it in the paper or dataset?"
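Zenodo's REST API for depositions supports this via a `prereserve_doi` flag in the deposition metadata; the reserved DOI is returned in the response metadata and is only registered with DataCite once the record is published. The standard-library sketch below builds such a request without sending it; the token is a placeholder, the function names are hypothetical, and the request shape should be checked against Zenodo's current developer documentation before relying on it.

```python
import json
import urllib.request
from typing import Any, Dict


def build_draft_request(token: str,
                        base_url: str = "https://zenodo.org") -> urllib.request.Request:
    """Build (but do not send) a request creating a draft deposition with a reserved DOI."""
    body = {"metadata": {"prereserve_doi": True}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/deposit/depositions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def reserved_doi(response: Dict[str, Any]) -> str:
    """Extract the reserved DOI from the parsed JSON response of the draft creation."""
    return response["metadata"]["prereserve_doi"]["doi"]
```

The reserved DOI could then be written into the resource description before the files are uploaded to the draft.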

FynnBe added the "help wanted" (Extra attention is needed) label on Oct 24, 2023
@FynnBe
Member Author

FynnBe commented Oct 30, 2023

Thanks for the discussion, @jmetz!

Our idea:

Use GitLab:

  • Resource (model) contributors have an account on our GitLab server
  • An upload to S3 creates a repository (under their own user account)
  • Merge request to the general collection
  • Testing etc. with CI (possibly on GPUs)
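The merge request step could be automated through GitLab's REST API (`POST /api/v4/projects/:id/merge_requests`), opening an MR from the contributor's repository into the collection project via `target_project_id`. GitLab only allows such cross-project MRs when the source project is a fork of the target, so the per-contributor repositories would presumably need to be created as forks. The sketch only builds the request; the instance URL, project path and ID, branch names, token, and function name are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_merge_request(gitlab_url: str, token: str, source_project: str,
                        target_project_id: int, resource_id: str,
                        branch: str = "main") -> urllib.request.Request:
    """Build (but do not send) an MR from a contributor's repo into the collection project."""
    # Project paths must be URL-encoded, e.g. "alice/my-model" -> "alice%2Fmy-model"
    project = urllib.parse.quote(source_project, safe="")
    body = {
        "source_branch": branch,
        "target_branch": "main",              # assumed default branch of the collection
        "target_project_id": target_project_id,
        "title": f"Add resource {resource_id}",
    }
    return urllib.request.Request(
        url=f"{gitlab_url.rstrip('/')}/api/v4/projects/{project}/merge_requests",
        data=json.dumps(body).encode("utf-8"),
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="POST",
    )
```

CI pipelines configured on the collection project would then run the per-resource tests for each such MR.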

We need

  • GitLab instance
    • Test CI (should be rather simple, as it runs per resource)
    • Figure out a bunch of details!

@jmetz

jmetz commented Oct 30, 2023

Also, as GitLab can be configured to use S3 for all of its storage, this might simplify things even further.
