-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Review and update BMZ export #278
Comments
…onfig file following new `spec` and `core` release (#279) ### Description - **What**: Relaxing the model output validation kwargs, both absolute and relative tolerance, from the default, `1e-4`, to `1e-2`. - **Why**: The defaults are pretty strict and some of our uploaded models are stuck in pending because of slightly mismatching input and outputs. - e.g. (Actually maybe absolute tolerance should be put to 0, otherwise it still might not pass after this PR) ```console Output and expected output disagree: Not equal to tolerance rtol=0.0001, atol=0.00015 Mismatched elements: 40202 / 1048576 (3.83%) Max absolute difference: 0.1965332 Max relative difference: 0.0003221 ``` - **How**: In the model description config param, added the new test kwargs. Additionally, updated `bmz_export` so that the test kwargs in the model description are used during model testing at export time. ### Changes Made - **Modified**: Describe existing features or files modified. - `create_model_description`: added test_kwargs to config param - `export_to_bmz`: use test_kwargs in model description for model testing at export time. ### Related Issues - Resolves - last checkbox in #278 EDIT: This PR also fixes loading from BMZ following an incompatible release of `bioimageio/core` (`0.7.0`) and `bioimageio/spec` (`0.5.3.5`). The problem was `load_model_description` no longer unzips the archive file but only streams the `rdf.yaml` file data. This means we have to now extract the weights and careamics config from the zip to load them, which can be done using `bioimageio.spec._internal.io.resolve_and_extract` --- **Please ensure your PR meets the following requirements:** - [x] Code builds and passes tests locally, including doctests - [ ] New tests have been added (for bug fixes/features) - [x] Pre-commit passes - [ ] PR to the documentation exists (for bug fixes / features) --------- Co-authored-by: Joran Deschamps <[email protected]>
Additional point not to forget:
|
Just to share my experience, even though I've put a |
In the documentation you shared in #276 it says
I assume this means it can be |
So half meticulous. :) |
Something we should also think about is the model versioning, it seems we hardcode the model version to be 0.1.0, but if a model needs to be updated I am not sure how the bmz handles this (I think different model versions are saved in different folders in the s3 bucket, but if the version is the same maybe it gets overwritten). Below is where we hardcode the version:
|
This is the issue Caterina asked community partners to add some documentation: |
#4) * Update LVAE DataType list (CAREamics#251) ### Description Update the list of possible data types for LVAE datasets according to planned reproducibility experiments * 3D model (CAREamics#240) ### Description Adding 3D/2.5D to LVAE model - **What**: Merge current implementation in the original repo with refactoring done previously. ### Changes Made - Relevant pydantic config - Added encoder_conv_strides and decoder_conv_strides params. They control the strides in conv layers, and can have len 2 or 3. They're meant to make a choice between 2D and 3D convs and control the shape of encoder/decoder (subject to change) - Input shape is now a tuple containing shapes of 2/3 dimensions - Stochastic layer are in the separate module (subject to change) - NonStochastic is removed alongside with relevant parameters - Some docs update - Basic tests ### Notes/Problems - Dosctings are a mess, should be fixed in a separate PR later - Lot's of mypy etc issues - Some tests don't pass because we need to clarify multiscale count param --- **Please ensure your PR meets the following requirements:** - [x ] Code builds and passes tests locally, including doctests - [x ] New tests have been added (for bug fixes/features) - [ ] Pre-commit passes - [ ] PR to the documentation exists (for bug fixes / features) --------- Co-authored-by: Joran Deschamps <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: melisande-c <[email protected]> Co-authored-by: federico-carrara <[email protected]> Co-authored-by: federico-carrara <[email protected]> * ci(pre-commit.ci): autoupdate (CAREamics#250) <!--pre-commit.ci start--> updates: - [github.com/abravalheri/validate-pyproject: v0.19 → v0.20.2](abravalheri/validate-pyproject@v0.19...v0.20.2) - [github.com/astral-sh/ruff-pre-commit: v0.6.3 → v0.6.9](astral-sh/ruff-pre-commit@v0.6.3...v0.6.9) <!--pre-commit.ci end--> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Joran Deschamps <[email protected]> * Add image.sc badge * Update README.md * Feature: Predict to disk (outerloop implementation) (CAREamics#253) ### Description - **What**: Add a `predict_to_disk` function to the `CAREamist` class. - **Why**: So users can save predictions without having to write the saving process themselves. - **How**: This implementation loops through the files, predicts the result and saves each one in turn. N.b. this will eventually be replaced with the `PredictionWriterCallback` version. ### Changes Made - **Added**: - `predict_to_disk` function to `CAREamist` class. - tests - **Modified**: - Some type hints. ### Additional Notes and Examples Currently the results are saved to a directory called "predictions" but maybe it should be configurable by the user? This function can only be called on source data from a path because the prediction files are saved with the same name as the source files. Not the neatest as I expect this to be replaced soon. --- **Please ensure your PR meets the following requirements:** - [x] Code builds and passes tests locally, including doctests - [x] New tests have been added (for bug fixes/features) - [x] Pre-commit passes - [ ] PR to the documentation exists (for bug fixes / features) * refac: modularized + cleaned the code for LVAE losses (CAREamics#255) ### Description Please provide a brief description of the changes in this PR. Include any relevant context or background information. - **What**: Modularized loss functions for LVAE based models. Now it is possible to create custom losses for different algorithms. In addition, superfluous code has been removed. - **Why**: it allows to implement new losses for new algorithm in a more clean and modular way using the existing building blocks. Moreover, the code is now a bit simpler to read. - **How**: Created general function to aggregate KL loss according to different approaches. Simplified the way reconstruction loss is computed. Changed all the tests accordingly. NOTE: the API to call loss functions for existing algorithms (e.g., muSplit and denoiSplit) has been kept untouched. --- **Please ensure your PR meets the following requirements:** - [x] Code builds and passes tests locally, including doctests - [x] New tests have been added (for bug fixes/features) - [x] Pre-commit passes - [ ] PR to the documentation exists (for bug fixes / features) --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * refac: reorganization of pydantic configs in LVAE model (CAREamics#256) ### Description - **What**: Following the polishing of LVAE losses, we reorganized also the pydantic models that handle losses and likelihoods for better readability, usability, and overall organization. - **Why**: Make clean more readable, usable, and organized for further developments. - **How**: Implemented `LVAELossConfig` to replace `LVAELossParameters`. Removed unnecessary attributes from other pydantic models. ### Breaking changes Refactoring of pydantic models will certainly break the examples in `micrSplit_reproducibility` repo. ### Note This PR is based on CAREamics#255. --- **Please ensure your PR meets the following requirements:** - [x] Code builds and passes tests locally, including doctests - [ ] New tests have been added (for bug fixes/features) - [x] Pre-commit passes - [ ] PR to the documentation exists (for bug fixes / features) --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix(BMZ export): torchvision version; model validation (CAREamics#257) ### Description - **What**: Two fixes to the bmz export: The torchvision version in the environment.yaml file was incorrectly set to be the torch version; and bmz `ModelDescription` configuration had the incorrect parameter `decimals` (it was meant to be `decimal` without the 's'). - **Why**: The incorrect env file meant the CI couldn't environment could be created and the incorrect parameter meant the model validation could not be run in the CI. - **How**: Set the correct torchvision version in the env file. Removed the configuration from the `ModelDesc`, following bioimage-io/core-bioimage-io-python#418 the decimal parameter is deprecated. ### Changes Made - **Modified**: - func `create_env_text` in `src/careamics/model_io/bioimage/bioimage_utils.py` - src/careamics/model_io/bmz_io.py - func `create_model_description` in `src/careamics/model_io/bioimage/model_description.py` --- **Please ensure your PR meets the following requirements:** - [x] Code builds and passes tests locally, including doctests - [x] New tests have been added (for bug fixes/features) - [x] Pre-commit passes - [ ] PR to the documentation exists (for bug fixes / features) * Fix: Enforce dataloader params to have shuffle=True (CAREamics#259) ### Description - **What**: It seems that in the `CAREamics` `TrainDataModule` the dataloader does not have shuffle set to `True`. - **Why**: Not shuffling the data during training can result in worse training, e.g. overfitting. - **How**: Allow users to explicitly pass shuffle=False with a warning, otherwise `{"shuffle": True}` is added to the param dictionary, if the dataset is not a subclass of `IterableDataset`.` ### Changes Made - **Modified**: `TrainDataModule.train_dataloader` ### Additional Notes and Examples See the discussion in CAREamics#258 for details. --- **Please ensure your PR meets the following requirements:** - [x] Code builds and passes tests locally, including doctests - [x] New tests have been added (for bug fixes/features) - [x] Pre-commit passes - [ ] PR to the documentation exists (for bug fixes / features) --------- Co-authored-by: Joran Deschamps <[email protected]> * Performance test induced fixes (CAREamics#260) Different changes happened during performance testing ### Changes Made Pydantic configs Losses NM/Likelihood refac(from CAREamics#256 ) Tests TODOs for later refactoring --- **Please ensure your PR meets the following requirements:** - [ x] Code builds and passes tests locally, including doctests - [ ] New tests have been added (for bug fixes/features) - [x ] Pre-commit passes - [ ] PR to the documentation exists (for bug fixes / features) --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * ci(pre-commit.ci): autoupdate (CAREamics#262) <!--pre-commit.ci start--> updates: - [github.com/abravalheri/validate-pyproject: v0.20.2 → v0.22](abravalheri/validate-pyproject@v0.20.2...v0.22) - [github.com/astral-sh/ruff-pre-commit: v0.6.9 → v0.7.2](astral-sh/ruff-pre-commit@v0.6.9...v0.7.2) - [github.com/psf/black: 24.8.0 → 24.10.0](psf/black@24.8.0...24.10.0) - [github.com/pre-commit/mirrors-mypy: v1.11.2 → v1.13.0](pre-commit/mirrors-mypy@v1.11.2...v1.13.0) - [github.com/kynan/nbstripout: 0.7.1 → 0.8.0](kynan/nbstripout@0.7.1...0.8.0) <!--pre-commit.ci end--> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix: Allow singleton channel in convenience functions (CAREamics#265) ### Description Following CAREamics#159, this PR allows creating a configuration with a singleton channel via the configuration convenience functions. - **What**: Allow singleton channel in convenience functions. - **Why**: In some rare cases, a singleton channel might be present in the data. - **How**: Change the `if` statements and error raising conditions of the convenience functions. ### Changes Made - **Modified**: `configuration_factory.py`. ### Related Issues - Fixes [Allow singleton channel in convenience functions](CAREamics#159) --- **Please ensure your PR meets the following requirements:** - [x] Code builds and passes tests locally, including doctests - [x] New tests have been added (for bug fixes/features) - [x] Pre-commit passes - [x] PR to the documentation exists (for bug fixes / features) * feat: Add convenience function to read loss from CSVLogger (CAREamics#267) ### Description Following CAREamics#252, this PR enforces that the `CSVLogger` is always used (even if `WandB` is requested for instance), and add an API entry point in `CAREamist` to return a dictionary of the losses. This allows users to simply plot the loss in a notebook after training for instance. While they will be better off using `WandB` or `TensorBoard`, this is enough for most users. - **What**: Enforce `CSVLogger` and add functions to read the losses from `metrics.csv`. - **Why**: So that users have an easy way to plot the loss curves. - **How**: Add a new `lightning_utils.py` file with the read csv function, and call this method from `CAREamist`. ### Changes Made - **Added**: `lightning_utils.py`. - **Modified**: `CAREamist`. ### Related Issues Link to any related issues or discussions. Use keywords like "Fixes", "Resolves", or "Closes" to link to issues automatically. - Resolves CAREamics#252 ### Additional Notes and Examples An alternative path would have been to add a `Callback` and do the logging ourselves. I decided for the solution that uses the `csv` file that is anyway created by default (when there is no WandB or TB loggers), to minimize the code that needs to be maintained. One potential issue is the particular csv file read is chosen following the experiment name recorded by `CAREamist` and the last `version_*`. This may not be true if the paths have changed, but in most cases it should be valid if called right after training. Here is what it looks like in the notebooks: ``` python import matplotlib.pyplot as plt losses = careamist.get_losses() plt.plot(losses["train_epoch"], losses["train_loss"], label="Train Loss") plt.plot(losses["val_epoch"], losses["val_loss"], label="Val Loss") ``` --- **Please ensure your PR meets the following requirements:** - [x] Code builds and passes tests locally, including doctests - [x] New tests have been added (for bug fixes/features) - [x] Pre-commit passes - [ ] PR to the documentation exists (for bug fixes / features) * Refac: Rename config file to careamics.yaml in BMZ export (CAREamics#271) ### Description - **What**: When exporting to bmz the config file is now called `careamics.yaml`. Searching for the config file during loading has also been made more restrictive: previously the function searched for any `.yml` file in the attachments and now it searches specifically for `careamics.yaml`. - **Why**: Renaming the file makes it clearer to users it relates to CAREamics' functionality and should prevent any future name clashes with other tools. The config file loading was made more restrictive because it was not very robust to possible cases where additional attachments are used, and using the `export_to_bmz` function doesn't allow any choice in the name of the config file. - **How**: Modified config path in `export_to_bmz` and modified config path search in `extract_model_path`. ### Changes Made - **Modified**: - `export_to_bmz` - `extract_model_path` - `save_configuration` docs ### Related Issues - Resolves CAREamics#269 --- **Please ensure your PR meets the following requirements:** - [x] Code builds and passes tests locally, including doctests - [ ] New tests have been added (for bug fixes/features) - [x] Pre-commit passes - [ ] PR to the documentation exists (for bug fixes / features) * Feature: Load model from BMZ using URL (CAREamics#273) ### Description - **What**: Now it is possible to pass a URL to the `load_from_bmz` function to download and load BMZ files. - **Why**: Not many users will have access to the model resource URLs, but this functionality is useful for developing the CAREamics BMZ compatibility script. - **How**: - Type hint `path` as also `pydantic.HttpUrl` in `load_from_bmz` (as in `bioimage.core`); - Remove `path` validation checks from `load_from_bmz` and allow it to be handled in `load_model_description` - Call `download` on the file resources to download and get the correct path. ### Changes Made - **Modified**: - `load_from_bmz` - `extract_model_path` ### Additional Notes and Examples This will have merge conflicts with CAREamics#271. There are currently no official tests (it does work), we can discuss using the URL of one of the existing CAREamics models uploaded to the BMZ or create a Mock. --- **Please ensure your PR meets the following requirements:** - [x] Code builds and passes tests locally, including doctests - [ ] New tests have been added (for bug fixes/features) - [x] Pre-commit passes - [ ] PR to the documentation exists (for bug fixes / features) * feat: Enable prediction step during training (CAREamics#266) ### Description Following CAREamics#148, I have been exploring how to predict during training. This PR would allow adding `Callback` that use `predict_step` during Training. - **What**: Allow callbacks to call `predict_step` during training. - **Why**: Some applications might require predicting consistently on full images to assess training performances throughout training. - **How**: Modified `FCNModule.predict_step` to make it compatible with a `TrainDataModule` (all calls to `trainer.datamodule` were written with the expectation that it returns a `PredictDataModule`. ### Changes Made - **Modified**: `lightning_module.py`, `test_lightning_module.py` ### Related Issues - Resolves CAREamics#148 ### Additional Notes and Examples ```python import numpy as np from pytorch_lightning import Callback, Trainer from careamics import CAREamist, Configuration from careamics.lightning import PredictDataModule, create_predict_datamodule from careamics.prediction_utils import convert_outputs config = Configuration(**minimum_configuration) class CustomPredictAfterValidationCallback(Callback): def __init__(self, pred_datamodule: PredictDataModule): self.pred_datamodule = pred_datamodule # prepare data and setup self.pred_datamodule.prepare_data() self.pred_datamodule.setup() self.pred_dataloader = pred_datamodule.predict_dataloader() def on_validation_epoch_end(self, trainer: Trainer, pl_module): if trainer.sanity_checking: # optional skip return # update statistics in the prediction dataset for coherence # (they can computed on-line by the training dataset) self.pred_datamodule.predict_dataset.image_means = ( trainer.datamodule.train_dataset.image_stats.means ) self.pred_datamodule.predict_dataset.image_stds = ( trainer.datamodule.train_dataset.image_stats.stds ) # predict on the dataset outputs = [] for idx, batch in enumerate(self.pred_dataloader): batch = pl_module.transfer_batch_to_device(batch, pl_module.device, 0) outputs.append(pl_module.predict_step(batch, batch_idx=idx)) data = convert_outputs(outputs, self.pred_datamodule.tiled) # can save data here array = np.arange(32 * 32).reshape((32, 32)) pred_datamodule = create_predict_datamodule( pred_data=array, data_type=config.data_config.data_type, axes=config.data_config.axes, image_means=[11.8], # random placeholder image_stds=[3.14], # can do tiling here ) predict_after_val_callback = CustomPredictAfterValidationCallback( pred_datamodule=pred_datamodule ) engine = CAREamist(config, callbacks=[predict_after_val_callback]) engine.train(train_source=array) ``` Currently, this current implementation is not fully satisfactory and here are a few important points: - For this PR to work we need to discriminate between `TrainDataModule` and `PredictDataModule` in `predict_step`, which is a bit of a hack as it currently check `hasattr(..., "tiled")`. The reason is to avoid a circular import of `PredictDataModule`. We should revisit that. - `TrainDataModule` and `PredictDataModule` have incompatible members: `PredictDataModule` has `.tiled`, and the two have different naming conventions for the statistics (`PredictDataModule` has `image_means` and `image_stds`, while `TrainDataModule` has them wrapped in a `stats` dataclass). These statistics are retrieved either through `_trainer.datamodule.predict_dataset` or `_trainer.datamodule.train_dataset`. - We do not provide the `Callable` that would allow to use such feature. We might want to some heavy lifting here as well (see example). - Probably the most serious issue, normalization is done in the datasets but denormalization is performed in the `predict_step`. In our case, that means that normalization could be applied by a `PredictDataModule` (in the `Callback` and the denormalization by the `TrainDataModule` (in `predict_step`). That is incoherent and due to the way we wrote CAREamics. All in all, this draft exemplifies two problems with CAREamics: - `TrainDataModule` and `PredictDataModule` have different members - Normalization is done by the `DataModule` but denormalization by `LightningModule` --- **Please ensure your PR meets the following requirements:** - [x] Code builds and passes tests locally, including doctests - [x] New tests have been added (for bug fixes/features) - [x] Pre-commit passes - [x] PR to the documentation exists (for bug fixes / features) * feat: added functions to load neuron&astrocytes dataset * refac: added possibility to pick `kl_restricted` loss & other loss-related refactoring (CAREamics#272) ### Description In some LVAE training examples (see `microSplit_reproducibility` repo) there is the need to consider the *restricted KL loss*, instead of the simple sample-wise one. This PR allows the user to pick that one. - **What**: The KL loss type is no longer hardcoded in the loss functions. Now it is possible to pick also the `restricted_kl` KL loss type. - **Why**: It is needed for some experiments/examples. - **How**: added an input parameter to the KL loss functions. ### Breaking changes The `kl_type` parameter is added in the loss functions, so we need to be careful of correctly specifying it in the examples in the `microSplit_reproducibility` repo. --- **Please ensure your PR meets the following requirements:** - [x] Code builds and passes tests locally, including doctests - [x] New tests have been added (for bug fixes/features) - [x] Pre-commit passes - [ ] PR to the documentation exists (for bug fixes / features) * tmp: function to load 3D data, extract 2D slices and save them separately * feat: updates to handle 2D slices * refac: removed main + improved output of laoding function * A new enum for a new splitting task. (CAREamics#270) ### Description Adding a new enum type for a splitting task which I had missed communicating earlier. I am putting the relevant things in microsplit-reproducibility repo after a brief chat with @veegalinova. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Joran Deschamps <[email protected]> * feat: loading supported also for unmixed images * refac: made `get_fnames` public + changed way GroupType is obtained * fix: bug * updated content of examples folder * Fix(BMZ): Relax model output validation kwargs; extract weights and config file following new `spec` and `core` release (CAREamics#279) ### Description - **What**: Relaxing the model output validation kwargs, both absolute and relative tolerance, from the default, `1e-4`, to `1e-2`. - **Why**: The defaults are pretty strict and some of our uploaded models are stuck in pending because of slightly mismatching input and outputs. - e.g. (Actually maybe absolute tolerance should be put to 0, otherwise it still might not pass after this PR) ```console Output and expected output disagree: Not equal to tolerance rtol=0.0001, atol=0.00015 Mismatched elements: 40202 / 1048576 (3.83%) Max absolute difference: 0.1965332 Max relative difference: 0.0003221 ``` - **How**: In the model description config param, added the new test kwargs. Additionally, updated `bmz_export` so that the test kwargs in the model description are used during model testing at export time. ### Changes Made - **Modified**: Describe existing features or files modified. - `create_model_description`: added test_kwargs to config param - `export_to_bmz`: use test_kwargs in model description for model testing at export time. ### Related Issues - Resolves - last checkbox in CAREamics#278 EDIT: This PR also fixes loading from BMZ following an incompatible release of `bioimageio/core` (`0.7.0`) and `bioimageio/spec` (`0.5.3.5`). The problem was `load_model_description` no longer unzips the archive file but only streams the `rdf.yaml` file data. This means we have to now extract the weights and careamics config from the zip to load them, which can be done using `bioimageio.spec._internal.io.resolve_and_extract` --- **Please ensure your PR meets the following requirements:** - [x] Code builds and passes tests locally, including doctests - [ ] New tests have been added (for bug fixes/features) - [x] Pre-commit passes - [ ] PR to the documentation exists (for bug fixes / features) --------- Co-authored-by: Joran Deschamps <[email protected]> * refac: renamed and moved file to read CZI * style: added progress bar to data loading function * fix: fixed FP info fetching for dyes * updated training for astro_neuron dset * added script to train λSplit on astrocytes data * updated training examples * Fix(dependencies): Set bioimageio-core version greater than 0.7.0 (CAREamics#280) ### Description - **What**: Set bioimageio-core version greater than 0.7.0 - **Why**: Following the new `bioimage-core` release (0.7.0), we needed to make some fixes (part of PR CAREamics#279). The most convenient function to solve this problem, `resolve_and_extract` only exists since 0.7.0. - **How**: In pyproject.toml ### Changes Made - **Modified**: pyproject.toml --- **Please ensure your PR meets the following requirements:** - [x] Code builds and passes tests locally, including doctests - [ ] New tests have been added (for bug fixes/features) - [x] Pre-commit passes - [ ] PR to the documentation exists (for bug fixes / features) * added lambda parameters to saved configs * added todos * wip: eval examples for astro neuron dset * fix: fixed a bug in KL loss aggregation (LVAE) (CAREamics#277) ### Description Found a bug in the KL loss aggregation happening in the `LadderVAE` model `training_step()`. Specifically, the application of free-bits (`free_bits_kl()`, basically clamping the values of KL entries to a certain lower threshold) was happening after KL entries were rescaled. In this way, when free-bits threshold was set to 1, all the KL entries were clamped to 1, as normally way smaller than this. - **What**: See above. - **Why**: Clear bug in the code. - **How**: Inverted the order of calls in the `get_kl_divergence_loss()` function & adjusted some parts of the code to reflect the changes. --- **Please ensure your PR meets the following requirements:** - [x] Code builds and passes tests locally, including doctests - [ ] New tests have been added (for bug fixes/features) - [x] Pre-commit passes - [ ] PR to the documentation exists (for bug fixes / features) --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Joran Deschamps <[email protected]> * feat: added command to sort FPs by wavelength at peak intensity before creating FP ref matrix * feat: function to split fnames in train and test * fix: ONNX exportability compatibity test and fix (CAREamics#275) fix: Fix for ONNX export of Maxblurpool layer and performance optimization by registering kernel as a buffer so that it doesn't need to be copied to the GPU over and over again. ### Description - **What**: Converting the pretrained models to ONNX format gives error in the Maxpool layer used in the N2V2 architecture.This is mainly because the convolution kernel is dynamically expanded to a size matching the number of channels in the input in the Maxblurpool layer. But the number of channels should be constant within the model. - **Why**: Users can convert the pytorch models to ONNX for inference in thier platforms - **How**: -- instead of using the symbolic variable x.size(1), explicitly cast it to an integer and make it a constant. -- make the kernel as a buffer to avoid the copying to GPU overhead. -- add tests for ONNX exportability ### Changes Made - **Added**: -- onnx as a test dependency in pyproject.toml -- 'test_lightning_module_onnx_exportability.py' - **Modified**: Maxblurpool module in 'layers.py' **Please ensure your PR meets the following requirements:** - [x] Code builds and passes tests locally, including doctests - [x] New tests have been added (for bug fixes/features) - [x] Pre-commit passes - [ ] PR to the documentation exists (for bug fixes / features) Co-authored-by: Joran Deschamps <[email protected]> * refac: changed the way train and test samples are taken * feat: CL arg to avoid logging (for debugging runs) * rm: removed outdated modules * fix: set `None` as default for `custom_logger` when logging is disables * fix: fixed bugs resulting from previous merge * fix: fixed more bugs related to previous merge + renamed `algorithm_type` into more coherent `training_mode` * fix: made a few changes to mirror updates in the model code * refac: modified function to read CZI to make it compatible with CAREamics dsets * refac: updated loading pipeline for 3D images (added padding to have same Z-dim) * fix: changed some parts of the code to allow 3D training * updated training script for 3D case * fix: deleting pre-loaded arrays of data after init dsets * feat: added fn to sort fnames by exp ID and img ID * refac: adjusted training script for 2D and loading all data in memory * refac: added function to load 3D imgs callable in InMemoryDataset class * feat: allowing passing kwargs to `read_source_fn` * example: implemented more efficient training pipeline for 3D data * style: cleaned outputs * refac: renamed func `get_train_test_fnames` into `split_train_test_fnames` * fix: solving issue of read_source_kwargs==None in patching * fix: removed `dataloader_params` from serializable attributes in `DataConfig` * refac: added dataloader_params to `InferenceConfig` to match organization of `DataConfig` * rm: removed visualization funcs for λSplit, will be moved in apposite experiment repo * rm: example scripts and notebooks --------- Co-authored-by: Vera Galinova <[email protected]> Co-authored-by: Igor Zubarev <[email protected]> Co-authored-by: Joran Deschamps <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: melisande-c <[email protected]> Co-authored-by: federico-carrara <[email protected]> Co-authored-by: Melisande Croft <[email protected]> Co-authored-by: ashesh <[email protected]> Co-authored-by: nimiiit <[email protected]>
### Description Following #278, this PR reorganizes the exported BMZ README: - Put all configuration to the end - Fix title level of algorithm - Put data description first - Remove training section - Add a validation section - Fix link to documentation In addition, the API for the BMZ export has changed: - `data_description` is now mandatory - `model_version` is an optional parameter allowing to version models - `cover` is an optional parameter Users can now provide a path to a cover, no validation is done of that path. If no cover is provided, then we do something that is probably not optimized for multichannels: - For 2D images, the images (input and output) are split horizontally, or vertically if they have different height. If they have different width as well, then an error is raised. - For Z stack, the middle slice is selected. - For images with 2 channels, the channels are used as `blue` and `green` - For images with 3 channels, they are interpreted as RGB - For images with 4 or more channels, the first 4 channels are used. The pixel value are normalised and multiplied with the RGB vectors of 4 pre-defined colors, and all channels are summed into an RGB image. Note that that this is not a great way to represent scientific data, we should apply LUT to the grey scale channels and recompose overlays. Look up tables have been worked out for scientific figures and we could use those. If that is acceptable for now, it will be easy to replace it. ### Changes Made - **Added**: - `cover_factory.py` - helper scripts to inspect BMZ README and covers. - **Modified**: all BMZ related modules. ### Related Issues #278 #176 ### Breaking changes Any call to `careamist.export_bmz`. ### Additional Notes and Examples The results README looks like this: ```markdown # Noise2Void - CAREamics ## Data description Mydata ## Algorithm description: Noise2Void is a UNet-based self-supervised algorithm that uses blind-spot training to denoise images. In short, in every patches during training, random pixels are selected and their value replaced by a neighboring pixel value. The network is then trained to predict the original pixel value. The algorithm relies on the continuity of the signal (neighboring pixels have similar values) and the pixel-wise independence of the noise (the noise in a pixel is not correlated with the noise in neighboring pixels). ## Configuration Noise2Void was trained using CAREamics (version 0.1.0) using the following configuration: algorithm_config: algorithm: n2v loss: n2v lr_scheduler: name: ReduceLROnPlateau parameters: {} model: architecture: UNet conv_dims: 2 depth: 2 final_activation: None in_channels: 1 independent_channels: true n2v2: false num_channels_init: 32 num_classes: 1 optimizer: name: Adam parameters: lr: 0.0001 data_config: axes: YX batch_size: 2 data_type: array patch_size: - 64 - 64 transforms: - flip_x: true flip_y: true name: XYFlip p: 0.5 - name: XYRandomRotate90 p: 0.5 - masked_pixel_percentage: 0.2 name: N2VManipulate roi_size: 11 strategy: uniform struct_mask_axis: none struct_mask_span: 5 experiment_name: export_bmz_readme training_config: accumulate_grad_batches: 1 check_val_every_n_epoch: 1 checkpoint_callback: auto_insert_metric_name: false mode: min monitor: val_loss save_last: true save_top_k: 3 save_weights_only: false verbose: false enable_progress_bar: true gradient_clip_algorithm: norm max_steps: -1 num_epochs: 10 precision: '32' version: 0.1.0 ## Validation In order to validate the model, we encourage users to acquire a test dataset with ground-truth data. Additionally, inspecting the residual image (difference between input and predicted image) can be helpful to identify whether real signal is removed from the input image. ## References Krull, A., Buchholz, T.O. and Jug, F., 2019. "Noise2Void - Learning denoising from single noisy images". In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2129-2137). doi: 10.1109/cvpr.2019.00223 ## Links - [CAREamics repository](https://github.com/CAREamics/careamics) - [CAREamics documentation](https://careamics.github.io/) ``` --- **Please ensure your PR meets the following requirements:** - [x] Code builds and passes tests locally, including doctests - [x] New tests have been added (for bug fixes/features) - [x] Pre-commit passes - [ ] PR to the documentation exists (for bug fixes / features)
List of TODOs for exporting to BMZ. Once these are completed we can create a new CAREamics release re-run our notebooks and re-upload the new versions of the models.
/latest
doesn't exist)config.bioimageio.test_kwargs
. This is where they are extracted during the CI tests: https://github.com/bioimage-io/collection/blob/efb7f06797574544385222de62b08f13ae99854d/bioimageio_collection_backoffice/run_dynamic_tests.py#L109-L115.The text was updated successfully, but these errors were encountered: