Redesign of bioimage.io collection #659

FynnBe · 2023-10-20T21:21:00Z

Here are some notes that, hopefully, through some more discussion, will turn into an overview of upcoming changes.
Special thanks to @oeway and @k-dominik for discussions so far.

Shortcomings of current system

Unclear source of truth: descriptions on Zenodo are patched in this GH repo
Users may suffer from rate limits set outside of our control (by Zenodo or GitHub)
Lengthy loop of (1) proposing new descriptions (currently by uploading to Zenodo), (2) testing them on our side, and (3) updating the proposal, which means going back to (1) and leaves unusable versions behind.

Currently ruled out, potential ways to address shortcomings

don't patch

con: published Zenodo records are often in need of patching

cache to an S3 storage under our control

con: with multiple sources of truth and descriptions contributed by partners directly via GH, keeping a valid cache is challenging and itself relies on access to GH/Zenodo

Use Zenodo sandbox for description proposals

con: may disappear if proposal proceeds too slowly, storage not under our control

The currently most promising way to address shortcomings

S3 first approach:

Proposals get a bioimageio internal ID right away; once they are accepted we publish them on Zenodo and add the concept DOI and version DOI¹ (maybe we make the version field mandatory from now on, so semantic versions can be mapped to DOIs?).
Description updates get a bioimageio internal ID right away (maybe 'update-' + their ID?); once the update is accepted we publish it on Zenodo and get a new version DOI.

The S3 first approach makes sure that we are in control of any rate limits
S3 first approach allows for immediate evaluation of user uploads
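
The ID and DOI bookkeeping described above could look roughly like the following sketch. All names (`ResourceDraft`, `new_proposal`, `new_update`, `accept`) are hypothetical and not part of any existing bioimageio tooling; the sketch only illustrates that an internal ID exists from upload time, while DOIs are attached after acceptance and mapped to semantic versions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
import uuid


@dataclass
class ResourceDraft:
    """Hypothetical record for a proposal stored on S3 before publication."""

    internal_id: str
    version: str  # semantic version, assumed to be mandatory here
    concept_doi: Optional[str] = None
    # Maps semantic version -> Zenodo version DOI.
    version_dois: Dict[str, str] = field(default_factory=dict)

    @property
    def published(self) -> bool:
        # A draft counts as published once its version has a DOI.
        return self.version in self.version_dois


def new_proposal(version: str) -> ResourceDraft:
    # The internal ID is assigned immediately, independent of Zenodo.
    return ResourceDraft(internal_id=uuid.uuid4().hex, version=version)


def new_update(base: ResourceDraft, version: str) -> ResourceDraft:
    # Updates reuse the base ID with an 'update-' prefix, as floated above.
    return ResourceDraft(
        internal_id="update-" + base.internal_id,
        version=version,
        concept_doi=base.concept_doi,
        version_dois=dict(base.version_dois),
    )


def accept(draft: ResourceDraft, concept_doi: str, version_doi: str) -> ResourceDraft:
    # Called after the record was published on Zenodo; both DOIs come from Zenodo.
    draft.concept_doi = concept_doi
    draft.version_dois[draft.version] = version_doi
    return draft
```

Making the version field mandatory is what allows `version_dois` to be keyed by semantic version in this sketch.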

Cons of "S3 first"

not free
renders Zenodo's download statistics for bioimageio descriptions meaningless (we need our own solution; @oeway proposed a light-weight proxy service that can keep track of accesses. Alternatively, maybe there are even some "built-in" mechanisms for this, something in the direction of https://docs.aws.amazon.com/AmazonS3/latest/userguide/aws-usage-report.html)
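
To make the proxy idea a bit more concrete, here is a minimal sketch of the counting logic only. `AccessTracker` and its in-memory counter are assumptions for illustration; a real service would sit behind an HTTP server and persist its counts somewhere.

```python
from collections import Counter
from typing import Tuple


class AccessTracker:
    """Hypothetical in-memory stand-in for a light-weight download proxy."""

    def __init__(self, base_url: str) -> None:
        # base_url is the public location of the bucket (an assumption).
        self.base_url = base_url.rstrip("/")
        self.counts: Counter = Counter()

    def handle(self, key: str) -> Tuple[int, str]:
        # Record the access, then answer with a redirect to the stored object.
        self.counts[key] += 1
        return 302, f"{self.base_url}/{key.lstrip('/')}"
```

Because the proxy only redirects, the actual download traffic would still be served from the storage backend.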

Still unclear (to me) about "S3 first"

replacement/update of the current resource description review process including the generated PR that serves as a space to have a chat between contributor and bioimageio maintainers.
- Maybe we can use https://gitter.im/ ? Apparently there is a matrix.org based API, so we could create a channel for each resource description.
- looking into gitter brings me to our AI4Life matrix ...

Details in need of further discussion/thought

Use of S3 object redirects to realize the concept of a resource pointing to the latest version (the alternative is, of course, simple duplication)
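
A possible shape of such a redirect, as a sketch: S3 lets an object carry the `x-amz-website-redirect-location` metadata, which boto3 exposes as the `WebsiteRedirectLocation` argument of `put_object`. The helper name and the key layout (`<id>/latest/rdf.yaml` pointing to `<id>/<version>/rdf.yaml`) are assumptions, not an agreed design.

```python
from typing import Any, Dict


def latest_redirect_kwargs(bucket: str, resource_id: str, version: str) -> Dict[str, Any]:
    """Build arguments for an S3 PutObject call that creates a redirect object.

    The key layout used here is hypothetical.
    """
    return {
        "Bucket": bucket,
        "Key": f"{resource_id}/latest/rdf.yaml",
        "Body": b"",  # the redirect object itself carries no content
        # Sets the x-amz-website-redirect-location metadata; the value must
        # start with '/', 'http://' or 'https://'.
        "WebsiteRedirectLocation": f"/{resource_id}/{version}/rdf.yaml",
    }


# With boto3 this would be used roughly as (not executed here):
#   boto3.client("s3").put_object(**latest_redirect_kwargs("bucket", "some-id", "0.1.0"))
```

One caveat to keep in mind: S3 only honors this redirect for requests that go through the bucket's static website endpoint, so plain API or REST endpoint access would still need duplication or client-side resolution.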

¹ Note that one can reserve a DOI and then, e.g., include it in files in that record; see "Can I know the DOI of my record before publishing, so that I can include it in the paper or dataset?"

FynnBe · 2023-10-30T13:53:20Z

Thanks for the discussion, @jmetz

Our idea:

use GitLab

resource (model) contributors have account on our GitLab server
upload to S3 creates repo (under their own user account)
merge request to the general collection
testing etc with CI (and possibly on GPUs)

We need

GitLab instance
- Test CI (rather simple as per resource)
- figure out a bunch of details!

jmetz · 2023-10-30T15:35:22Z

Also, as GitLab can be configured to use S3 for all of its storage, this might even simplify things further.

FynnBe added the "help wanted" label on Oct 24, 2023
