CAREamics as a community partner #100

Open
melisande-c opened this issue Oct 17, 2024 · 15 comments

Comments

@melisande-c
Contributor

melisande-c commented Oct 17, 2024

We would like to add CAREamics as a community partner to the bioimage model zoo!

About

  • CAREamics is a PyTorch library aimed at simplifying the use of Noise2Void and its many variants and cousins (CARE, Noise2Noise, N2V2, P(P)N2V, HDN, muSplit etc.).

Resources

  • CAREamics has functions to export to and load from bioimage model zoo archive files, which lets users easily package models for upload to the zoo.

Maintenance

  • CAREamics has a permanent engineering team and we will be committed to staying compatible with the bioimage model zoo.

Links

CAREamics organisation: https://github.com/CAREamics
CAREamics source code: https://github.com/CAREamics/careamics

@FynnBe
Member

FynnBe commented Nov 8, 2024

Let's get CAREamics onboard!

Here you can find details on the technical steps of how to become a community partner (I will help you complete these steps):
https://github.com/bioimage-io/collection?tab=readme-ov-file#add-community-partner

  • add your info to the bioimageio_collection_config.json, analogous to this example
  • compatibility check script
  • compatibility check workflow

@melisande-c
Contributor Author

Hi @FynnBe, thanks for getting back to this, I will make the PR adding the CAREamics info to the json file shortly.

For the compatibility script, is there a way to test it is working as expected before we make a PR? I gather we just need to save a compatibility report file using the CompatibilityReport class, is there anything else we need to do?

@FynnBe
Member

FynnBe commented Nov 12, 2024

No, that's pretty much it. You can also use the CompatibilityReport(TypedDict) as a typed dict in your code if you have issues with the dependencies of the collection_backoffice.
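
For reference, the typed-dict route could look roughly like the sketch below, which builds a report as a plain typed dict and saves it as JSON. The field names (`status`, `error`, `details`), the `ReportDict` class, and the `save_report` helper are assumptions for illustration and are not confirmed against the collection's `CompatibilityReport` definition, so check the actual class before relying on them.

```python
import json
from pathlib import Path
from typing import Any, Optional, TypedDict


class ReportDict(TypedDict):
    """Hypothetical stand-in for the collection's CompatibilityReport typed dict."""

    status: str  # assumed values: "passed", "failed", "not-applicable"
    error: Optional[str]
    details: Any


def save_report(report: ReportDict, output_folder: Path, name: str) -> Path:
    """Write one report as JSON and return the path of the written file."""
    output_folder.mkdir(parents=True, exist_ok=True)
    path = output_folder / f"{name}.json"
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return path
```

Using a plain typed dict keeps the script free of the collection_backoffice dependencies while still giving a type checker something to verify.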

@FynnBe
Member

FynnBe commented Nov 12, 2024

Small update:
I have refactored the ilastik example to provide script_utils.check_tool_compatibility:

def check_tool_compatibility(
    tool_name: str,
    tool_version: str,
    *,
    all_version_path: Path,
    output_folder: Path,
    check_tool_compatibility_impl: Callable[
        [str, str], Union[CompatibilityReportDict, "CompatibilityReport"]
    ],
    applicable_types: Set[str],
):
    """helper to implement tool compatibility checks

    Args:
        tool_name: name of the tool (without version), e.g. "ilastik"
        tool_version: version of the tool, e.g. "1.4"
        all_versions_path: Path to the `all_versions.json` file.
        output_folder: Folder to write compatibility reports to.
        check_tool_compatibility_impl:
            Function accepting two positional arguments:
            URL to an rdf.yaml, SHA-256 of that rdf.yaml.
            And returning a compatibility report.
        applicable_types: Set of resource types
            **check_tool_compatibility_impl** is applicable to.
    """

This simplifies the compatibility script needed from a partner (see e.g. the ilastik example); now little more than an analog to

def check_compatibility_ilastik_impl(
    rdf_url: str,
    sha256: str,
) -> CompatibilityReportDict:
    """Create a `CompatibilityReport` for a resource description.

    Args:
        rdf_url: URL to the rdf.yaml file
        sha256: SHA-256 value of **rdf_url** content
    """

is needed to implement the compatibility check.

Hope this makes things easier now and more maintainable in the future!
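
A CAREamics version of that two-argument callable might be structured roughly as in the sketch below. It is only an illustration: `load_and_check` is a hypothetical helper standing in for the real CAREamics checks, `make_careamics_impl` is an invented name, and the report keys (`status`, `error`, `details`) are assumed rather than taken from the collection's report class.

```python
from typing import Any, Callable, Dict


def make_careamics_impl(
    load_and_check: Callable[[str, str], str],
) -> Callable[[str, str], Dict[str, Any]]:
    """Wrap a checking function so that any exception becomes a failed report.

    `load_and_check` is a hypothetical helper: it takes (rdf_url, sha256), raises
    on incompatibility, and returns a short human-readable summary otherwise.
    """

    def check_compatibility_careamics_impl(rdf_url: str, sha256: str) -> Dict[str, Any]:
        try:
            summary = load_and_check(rdf_url, sha256)
        except Exception as e:  # report the failure instead of crashing the CI job
            return {"status": "failed", "error": str(e), "details": None}
        return {"status": "passed", "error": None, "details": summary}

    return check_compatibility_careamics_impl
```

Passing the checking logic in as an argument keeps the report-building part testable without downloading any model.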

@melisande-c
Contributor Author

melisande-c commented Nov 12, 2024

CAREamics will need to check that a CAREamics config.yaml is also included and able to instantiate our pydantic classes; to save me looking through source code, what is the best way to retrieve the url for this file?

Additional question: should we also check the model is loadable (i.e. has compatible architecture), or will downloading model weights be too costly/time consuming?

@FynnBe
Member

FynnBe commented Nov 12, 2024

I suppose your models add this config.yaml using the attachments field then?
And there is some additional information in the rdf.yaml under config.careamics to indicate its presence?
Either way you probably want to go through a ModelDescr object (returned by bioimageio.spec.load_model_description(rdf_url)).

You should use bioimageio.spec to download the models (e.g. by simply loading them with model = bioimageio.spec.load_model_description(rdf_url)). This way all files will be cached.

If you want to only deal with v0_5.ModelDescr you can simply check the model.format_version attribute.
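
A minimal sketch of those two steps follows. `bioimageio.spec.load_model_description` and the `format_version` attribute are the ones named above; the helper names and the prefix comparison on the version string are assumptions made for illustration.

```python
from typing import Any


def load_description(rdf_url: str) -> Any:
    """Load a model description through bioimageio.spec, as suggested above."""
    import bioimageio.spec  # imported lazily so the helper below works without it

    return bioimageio.spec.load_model_description(rdf_url)


def is_format_0_5(model: Any) -> bool:
    """Return True if the description reports a 0.5.x format version."""
    # Assumption: format_version is a dotted version string such as "0.5.3".
    return str(model.format_version).startswith("0.5.")
```

A check script could then report older descriptions as not applicable instead of failing them.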

@melisande-c
Contributor Author

I suppose your models add this config.yaml using the attachments field then?
And there is some additional information in the rdf.yaml under config.careamics to indicate its presence?

Yep it is added in attachments field, but there is no additional info in the rdf file.

You should use bioimageio.spec to download the models (e.g. by simply loading them with model = bioimageio.spec.load_model_description(rdf_url)). This way all files will be cached.

So this means, with regard to my previous question, that I will have access to the model weights and so I might as well check that the model architecture is compatible?

@FynnBe
Member

FynnBe commented Nov 12, 2024

Yep it is added in attachments field, but there is no additional info in the rdf file.

hmm.. config.yaml is not a very unique name. This might lead to confusion. Maybe you could consider renaming this file (in the context of model descriptions) and/or referencing the careamics_config.yaml under config.careamics to know what file to look for (then you could name it arbitrarily).
If these files are not hundreds of lines long you could also just insert it into the rdf.yaml at config.careamics.

So this means, with regard to my previous question, that I will have access to the model weights and so I might as well check that the model architecture is compatible?

yes, you should in fact. Ideally even run one training iteration (not epoch) and an inference test. CI only has CPU, but the time limit is pretty generous and we could ensure not to test everything at once if this becomes a bottleneck.

@melisande-c
Contributor Author

melisande-c commented Nov 12, 2024

Our config files can be ~60 lines long. We already have 3 models uploaded with a separate configuration file, and I would rather not check for both cases, so if we change how we export to bmz I would like to update these existing models.

In the case we do not insert the CAREamics config into the rdf.yaml file, what extra info needs to be added under config.careamics? the file name is already included in the attachments section.

@melisande-c
Contributor Author

For developing the script, I would like to test locally, how can I get access to an example rdf_url? (from one of the uploaded CAREamics models).

@FynnBe
Member

FynnBe commented Nov 12, 2024

Our config files can be ~60 lines long. We already have 3 models uploaded with a separate configuration file, and I would rather not check for both cases, so if we change how we export to bmz I would like to update these existing models.

yeah, updating 3 models isn't a big deal 👍

In the case we do not insert the CAREamics config into the rdf.yaml file, what extra info needs to be added under config.careamics? the file name is already included in the attachments section.

short answer: nothing.
long answer: nothing, if you make the file name careamics specific. If you do not, then you rely on no other tool ever attaching a config.yaml. In addition, I find the config.yaml name confusing as we already have the config field inside the rdf.yaml... Therefore I suggest going with some version of careamics_config.yaml. Then there is no need to specify under config.careamics that the ubiquitous config.yaml is a careamics config file.
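
With a tool-specific file name, locating the config among the attachments reduces to a name match. The sketch below assumes the attachment sources are available as plain strings (paths or URLs); `find_careamics_config` is a hypothetical helper, and `careamics_config.yaml` is the name suggested above, not a confirmed convention.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional


def find_careamics_config(
    attachment_sources: Iterable[str],
    expected_name: str = "careamics_config.yaml",
) -> Optional[str]:
    """Return the first attachment whose file name equals `expected_name`, else None."""
    for source in attachment_sources:
        # Drop any query string, then compare only the final path component.
        file_name = PurePosixPath(source.split("?", 1)[0]).name
        if file_name == expected_name:
            return source
    return None
```

A generic `config.yaml` attached by another tool would not match, which is the point of the rename.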

@FynnBe
Member

FynnBe commented Nov 12, 2024

For developing the script, I would like to test locally, how can I get access to an example rdf_url? (from one of the uploaded CAREamics models).

hmm.. there are a few options. first to mind: search for the model id in https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/all_versions.json
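
One way to do that search without knowing the exact layout of `all_versions.json` is to walk the parsed JSON and collect every string that mentions the model id and ends with `rdf.yaml`. This is a sketch under assumptions: it assumes the rdf.yaml URLs contain the model id, the helper names are invented, and the model id is a placeholder the caller must supply.

```python
import json
from typing import Any, Iterator, List
from urllib.request import urlopen

ALL_VERSIONS_URL = (
    "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/all_versions.json"
)


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value found anywhere in a parsed JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def find_rdf_urls(all_versions: Any, model_id: str) -> List[str]:
    """Collect string values that mention `model_id` and end with 'rdf.yaml'."""
    return [
        s for s in iter_strings(all_versions) if model_id in s and s.endswith("rdf.yaml")
    ]


def fetch_all_versions(url: str = ALL_VERSIONS_URL) -> Any:
    """Download and parse the JSON file (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)
```

Calling `find_rdf_urls(fetch_all_versions(), model_id)` with the id of an uploaded model would then list candidate URLs to test against locally.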

@melisande-c
Contributor Author

Hi @FynnBe, now that I have the compatibility script written, could you give me a few pointers on the CI workflow config? I've had a look at the ilastik one, but it would be good to have an overview. So I obviously need to set up the environment, installing CAREamics and dependencies, then I need to generate the reports using my check_careamics_compatibility.py script and finally the reports should be uploaded using scripts/upload_reports.py and I can copy and paste the S3 environment variables?

@FynnBe
Member

FynnBe commented Nov 23, 2024

I can copy and paste the S3 environment variables

I don't know exactly what you mean here, but you don't need to do anything with the S3 env variables. They are already set up in this repo, so your workflow can just use them.

Essentially the careamics workflow can look pretty much exactly like the ilastik one. You can give it a go if you like, otherwise I can make a draft end of next week or so and have you fill in the details there.

@melisande-c
Contributor Author

Hi @FynnBe,

All I meant was that I should copy how the upload-reports job is set up in the ilastik compatibility check workflow. But if you could get me started with a template, that would be much appreciated!

Just to let you know we are working on a couple of things before we make a new release and update the models already in the bmz; the models are not going to pass the checks I wrote until this happens. So no rush 😄. (We are reviewing and updating the information in our generated README, and adding cover generation).
