Feature/export onnx models (#40)
* build: Add optional onnx dependencies

* feat: Add onnx export function

* refactor: Start logger refactoring

* feat: Add onnx export capabilities for classification

* feat: Add onnx export capability

* feat: Add evaluation model wrappers

* feat!: Refactor export_config parameter to become its own config to avoid code replication

* refactor: Refactor export function to avoid excessive code replication

* feat: Add inference configuration

* build: Upgrade anomalib version

* tests: Refactor anomaly tests to perform training and inference with onnx and torchscript

* tests: Improve tests for classification related tasks

* tests: Update segmentation tests with onnx export

* tests: Update ssl tests to integrate onnx export

* tests: Add tests to validate the outputs of exported models

* build: Add pytest lazy fixtures package to test requirements

* tests: Add tests checking the equality of exported models outputs

* tests: Add guards to skip tests when onnx is not installed, add onnx installation to GitHub tests

* style: Fix wrong parentheses

* fix: Fix wrong usage of pytest skipif

* fix: Fix missing parameter pop in export onnx function

* tests: Remove onnx export from fastflow test

* fix: Allow model wrapper to retrieve input shapes if instance is a torchscript model

* build: Bump version 1.1.3 -> 1.1.4

* docs: Update changelog

* refactor: Tiny improvements to model export

* refactor: Add dictionary mapping export types and paths to model export function return values

* fix: Fix defaults order

* refactor: Move get_export_extension function

* feat: Use iobinding to handle torch inputs for onnx

* feat: Add cpu method to evaluation models

* fix: Fix wrong configuration parameter

* fix: Fix segmentation analysis not working due to missing parameter

* feat: add gpu unit tests

* docs: Add documentation for model import and export

* docs: Add export information in documentation

* docs: Update changelog

* refactor: Remove references to save_backbone parameter

* docs: Update changelog

* docs: Fix wrong typing

* fix: Avoid exporting ModelSignatureWrapper, fix wrong onnx export with multiple inputs

* fix: Fix multiple inputs not handled properly in onnx evaluation forward

* feat: Add automatic export with strict=False if normal torchscript fails

* fix: Fix dynamic axes not generated properly when fixed_batch_size isn't passed to configuration

---------

Approved By: @AlessandroPolidori 

Co-authored-by: rcmalli <[email protected]>
lorenzomammana and rcmalli authored Sep 8, 2023
1 parent cea6f86 commit cc81f05
Showing 61 changed files with 1,798 additions and 704 deletions.
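Among the changes above, "Use iobinding to handle torch inputs for onnx" refers to ONNX Runtime's io-binding mechanism, which lets a session read GPU-resident torch tensors in place instead of copying them through numpy on the host. A minimal sketch of that pattern is shown below; the model path, input shape and single-input/single-output assumption are illustrative only and do not reproduce the project's actual implementation.

```python
import numpy as np
import onnxruntime as ort
import torch

# Load an exported model (the path is an assumption used for illustration).
session = ort.InferenceSession(
    "deployment_model/model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# A torch tensor already living on the GPU; io-binding lets onnxruntime
# consume it directly instead of round-tripping through host memory.
x = torch.randn(1, 3, 224, 224, device="cuda").contiguous()

binding = session.io_binding()
binding.bind_input(
    name=session.get_inputs()[0].name,
    device_type="cuda",
    device_id=0,
    element_type=np.float32,
    shape=tuple(x.shape),
    buffer_ptr=x.data_ptr(),
)
# Let onnxruntime allocate the output, then copy it back to the host.
binding.bind_output(session.get_outputs()[0].name)
session.run_with_iobinding(binding)
outputs = binding.copy_outputs_to_cpu()
```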
3 changes: 1 addition & 2 deletions .github/workflows/tests.yml
@@ -37,9 +37,8 @@ jobs:
- name: Install Package
run: |
python -m pip install -U pip
python -m pip install -e ".[test]" --no-cache-dir
python -m pip install -e ".[test,onnx]" --no-cache-dir
- name: Run Tests
run: |
python -m pytest -v --disable-pytest-warnings --strict-markers --color=yes -m "not slow"
1 change: 1 addition & 0 deletions .gitignore
@@ -68,3 +68,4 @@ docs/javascripts/images/*
test-output.xml
external/
site/
local/
15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,21 @@ All notable changes to this project will be documented in this file.
#### Added

- Add plot_raw_outputs feature to class VisualizerCallback in anomaly detection, to save the raw images of the segmentation and heatmap output.
- Add support for onnx exportation of trained models.
- Add support for onnx model import in all evaluation tasks.
- Add `export` configuration group to regulate exportation parameters.
- Add `inference` configuration group to regulate inference parameters.

#### Changed

- Move `export_types` parameter from `task` configuration group to `export` configuration group under `types` parameter.
- Refactor the export model function to be more generic and available from the base task class.
- Remove `save_backbone` parameter for scikit-learn based tasks.

#### Fixed

- Fix failures when trying to override `hydra` configuration groups due to wrong override order.

### [1.1.4]

#### Fixed
20 changes: 14 additions & 6 deletions Makefile
@@ -1,13 +1,16 @@
# Makefile
SHELL := /bin/bash
DEVICE ?= cpu

.PHONY: help
help:
@echo "Commands:"
@echo "clean : cleans all unnecessary files."
@echo "docs-serve : serves the documentation."
@echo "docs-build : builds the documentation."
@echo "style : runs pre-commit."
@echo "clean : cleans all unnecessary files."
@echo "docs-serve : serves the documentation."
@echo "docs-build : builds the documentation."
@echo "style : runs pre-commit."
@echo "units-tests: : runs unit tests."
@echo "integration-tests: : runs integration tests."

# Cleaning
.PHONY: clean
@@ -27,8 +30,13 @@ style:
pre-commit run --all --verbose
.PHONY: docs-build
docs-build:
mkdocs build -d ./site
mkdocs build -d ./site

.PHONY: docs-serve
docs-serve:
mkdocs serve
mkdocs serve

.PHONY: units-tests
units-tests:
@python -m pytest -v --disable-pytest-warnings --strict-markers --color=yes --device $(DEVICE)

7 changes: 3 additions & 4 deletions docs/tutorials/configurations.md
@@ -160,6 +160,9 @@ defaults:
- override /scheduler: rop
- override /transforms: default_resize
export:
types: [torchscript]
datamodule:
num_workers: 8
batch_size: 32
@@ -178,10 +181,6 @@ task:
report: True
output:
example: True
export_config:
types: [torchscript]
input_shapes: # Redefine the input shape if not automatically inferred
core:
tag: "run"
4 changes: 0 additions & 4 deletions docs/tutorials/devices_setup.md
@@ -38,13 +38,9 @@ _target_: quadra.tasks.SklearnClassification
device: "cuda:0"
output:
folder: "classification_experiment"
save_backbone: false
report: true
example: true
test_full_data: true
export_config:
types: [torchscript]
input_shapes: # Redefine the input shape if not automatically inferred
```

You can change the device to `cpu` or a different cuda device depending on your needs.
9 changes: 6 additions & 3 deletions docs/tutorials/examples/anomaly_detection.md
@@ -184,14 +184,17 @@ As already mentioned anomaly detection requires just good images for training, t

### Experiment

Suppose that we want to run the experiment on the given dataset using the PADIM technique. We can define take the generic padim config for mnist as an example found under `experiment/generic/mnist/anomaly/padim.yaml`.
Suppose that we want to run the experiment on the given dataset using the PADIM technique. We can take the generic padim config for mnist as an example found under `experiment/generic/mnist/anomaly/padim.yaml`.

```yaml
# @package _global_
defaults:
- base/anomaly/padim
- override /datamodule: generic/mnist/anomaly/base
export:
types: [torchscript]
model:
model:
input_size: [224, 224]
@@ -226,7 +229,7 @@ trainer:
check_val_every_n_epoch: ${trainer.max_epochs}
```

We start from the base configuration for PADIM, then we override the datamodule to use the generic mnist datamodule. Using this configuration we specify that we want to use PADIM, extracting features using the resnet18 backbone with image size 224x224, the dataset is `mnist`, we specify that the task is taken from the anomalib configuration which specify it to be segmentation. One very important thing to watch out is the `check_val_every_n_epoch` parameter. This parameter should match the number of epochs for `PADIM` and `Patchcore`, the reason is that in the validation phase the model will be fitted and we want the fit to be done only once and on all the data, increasing the max_epoch is useful when we apply data augmentation, otherwise it doesn't make a lot of sense as we would fit the model on the same, replicated data.
We start from the base configuration for PADIM, then override the datamodule to use the generic mnist datamodule. With this configuration we specify that we want to use PADIM, extracting features with a resnet18 backbone at an image size of 224x224 on the `mnist` dataset; the task is taken from the anomalib configuration, which defines it as segmentation. One very important parameter to watch out for is `check_val_every_n_epoch`: it should match the number of epochs for `PADIM` and `Patchcore`, because the model is fitted during the validation phase and we want that fit to happen only once and on all the data. Increasing `max_epochs` is useful when we apply data augmentation; otherwise it doesn't make much sense, as we would fit the model on the same, replicated data. The model is exported at the end of the training phase; since we set the `export.types` parameter to `torchscript`, the model is exported only in torchscript format.

### Run

@@ -289,7 +292,7 @@ task:

By default, the inference will recompute the threshold based on test data to maximize the F1-score, if you want to use the threshold from the training phase you can set the `use_training_threshold` parameter to true.

The model path is the path to an exported model, at the moment only `torchscript` models are supported (exported automatically after a training experiment). Right now only the `CFLOW` model is not supported for inference as it's not compatible with torchscript.
The model path is the path to an exported model; at the moment `torchscript` and `onnx` models are supported (exported automatically after a training experiment). Currently only the `CFLOW` model is not supported for inference, as it's not compatible with either torchscript or onnx.

An inference configuration using the mnist dataset is found under `configs/experiment/generic/mnist/anomaly/inference.yaml`.
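As a rough reference, an exported torchscript model can also be loaded directly with plain PyTorch outside of the config-driven evaluation flow. The file name, input size and output structure below are assumptions and depend on the specific model and export settings.

```python
import torch

# File name and input size are assumptions; they depend on the export configuration.
model = torch.jit.load("deployment_model/model.pt", map_location="cpu")
model.eval()

image = torch.rand(1, 3, 224, 224)  # a preprocessed input batch
with torch.inference_mode():
    outputs = model(image)  # typically an anomaly map and/or score, depending on the model
```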

11 changes: 4 additions & 7 deletions docs/tutorials/examples/classification.md
@@ -114,9 +114,6 @@ task:
report: True
output:
example: True
export_config:
types: [pytorch, torchscript]
input_shapes: # Redefine the input shape if not automatically inferred
core:
@@ -172,6 +169,9 @@ defaults:
- override /backbone: vit16_tiny
- _self_
export:
types: [onnx, torchscript]
datamodule:
num_workers: 12
batch_size: 32
@@ -187,9 +187,6 @@ task:
report: True
output:
example: True # Generate an example of concordants and discordants predictions for each class
export_config:
types: [pytorch, torchscript]
input_shapes: # Redefine the input shape if not automatically inferred
model:
@@ -236,7 +233,7 @@ checkpoints config_tree.txt deployment_model test
config_resolved.yaml data main.log
```

Where `checkpoints` contains the pytorch lightning checkpoints of the model, `data` contains the joblib dump of the datamodule with its parameters and dataset split, `deployment_model` contains the model in exported format (default is torchscript), `test` contains the test artifacts.
Where `checkpoints` contains the pytorch lightning checkpoints of the model, `data` contains the joblib dump of the datamodule with its parameters and dataset split, `deployment_model` contains the model in the exported formats (in this case onnx and torchscript, while by default only torchscript is produced), and `test` contains the test artifacts.
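As a rough illustration, the exported onnx model can be loaded with ONNX Runtime for a quick sanity check; the file name, input shape and single-output assumption below are hypothetical and should be verified against the actual contents of `deployment_model`.

```python
import numpy as np
import onnxruntime as ort

# File name and input shape are assumptions; check the run's deployment_model folder.
session = ort.InferenceSession("deployment_model/model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

batch = np.random.rand(1, 3, 224, 224).astype(np.float32)  # a preprocessed image batch
(logits,) = session.run(None, {input_name: batch})
predicted_class = int(np.argmax(logits, axis=1)[0])
```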

## Evaluation

3 changes: 0 additions & 3 deletions docs/tutorials/examples/multilabel_classification.md
@@ -139,9 +139,6 @@ task:
report: False
output:
example: False
export_config:
types: [torchscript]
input_shapes: # Redefine the input shape if not automatically inferred
logger:
9 changes: 4 additions & 5 deletions docs/tutorials/examples/segmentation.md
@@ -162,6 +162,9 @@ defaults:
- base/segmentation/smp_multiclass # use smp file as default
- _self_ # use this file as final config
export:
types: [onnx, torchscript]
backbone:
model:
classes: 4 # The total number of classes (background + foreground)
@@ -171,10 +174,6 @@ task:
report: false # allows to generate reports
evaluate: # custom evaluation toggles
analysis: false # Perform in depth analysis
export_config:
types: [torchscript]
input_shapes: # Redefine the input shape if not automatically inferred
datamodule:
data_path: /path/to/the/dataset # change the path to the dataset
@@ -200,7 +199,7 @@ core:
When defining the `idx_to_class` dictionary, the keys should be the class index and the values should be the class name. The class index starts from 1.


In the final configuration experiment we have specified the path to the dataset, batch size, split files, GPU device, experiment name and toggled some evaluation options.
In the final configuration experiment we have specified the path to the dataset, batch size, split files, GPU device and experiment name, and toggled some evaluation options; we have also specified that the model should be exported to the `onnx` and `torchscript` formats.

By default data will be logged to `Mlflow`. If `Mlflow` is not available it's possible to configure a simple csv logger by adding an override to the file above:

24 changes: 10 additions & 14 deletions docs/tutorials/examples/sklearn_classification.md
@@ -95,17 +95,14 @@ defaults:
- override /trainer: sklearn_classification
- override /datamodule: base/sklearn_classification
export:
types: [pytorch, torchscript]
backbone:
model:
pretrained: true
freeze: true
task:
export_config:
types: [pytorch, torchscript]
input_shapes: # Redefine the input shape if not automatically inferred
core:
tag: "run"
name: "sklearn-classification"
@@ -121,6 +118,7 @@ By default the experiment will use dino_vitb8 as backbone, resizing the images t
It will also export the model in two formats, "torchscript" and "pytorch".

An actual configuration file based on the above could be this one (suppose it's saved under `configs/experiment/custom_experiment/sklearn_classification.yaml`):

```yaml
# @package _global_
@@ -132,6 +130,9 @@ defaults:
core:
name: experiment-name
export:
types: [pytorch, torchscript]
datamodule:
data_path: path_to_dataset
batch_size: 64
@@ -148,18 +149,13 @@ task:
device: cuda:0
output:
folder: classification_experiment
save_backbone: true
report: true
example: true
test_full_data: true
export_config:
types: [pytorch, torchscript]
input_shapes: # Redefine the input shape if not automatically inferred
```

This will train a logistic regression classifier using a resnet18 backbone, resizing the images to 224x224 and using a 5-fold cross validation. The `class_to_idx` parameter is used to map the class names to indexes, the indexes will be used to train the classifier. The `output` parameter is used to specify the output folder and the type of output to save. The `export_config.types` parameter can be used to export the model in different formats, at the moment `torchscript` and `pytorch` are supported.
Since `save_backbone` is set to true, the backbone (in torchscript format) will be saved along with the classifier. `test_full_data` is used to specify if a final test should be performed on all the data (after training on the training and validation datasets).
This will train a logistic regression classifier using a resnet18 backbone, resizing the images to 224x224 and using a 5-fold cross validation. The `class_to_idx` parameter maps the class names to indexes; the indexes are used to train the classifier. The `output` parameter specifies the output folder and the type of output to save. The `export.types` parameter can be used to export the model in different formats; at the moment `torchscript`, `onnx` and `pytorch` are supported.
The backbone (in torchscript and pytorch format) will be saved along with the classifier. `test_full_data` specifies whether a final test should be performed on all the data (after training on the training and validation datasets).

### Run

@@ -181,7 +177,7 @@ classification_experiment_2 config_tree.txt test

Each `classification_experiment_X` folder contains the metrics for the corresponding fold while the `classification_experiment` folder contains the metrics computed aggregating the results of all the folds.

The `data` folder contains a joblib version of the datamodule containing parameters and splits for reproducibility. The `deployment_model` folder contains the backbone exported in torchscript format if `save_backbone` to true alongside the joblib version of trained classifier. The `test` folder contains the metrics for the final test on all the data after the model has been trained on both train and validation.
The `data` folder contains a joblib version of the datamodule containing parameters and splits for reproducibility. The `deployment_model` folder contains the backbone exported in torchscript and pytorch format alongside the joblib version of the trained classifier. The `test` folder contains the metrics for the final test on all the data after the model has been trained on both the training and validation data.
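A rough sketch of how the exported artifacts could be combined for inference is shown below; the file names and the feature shape are assumptions and may differ from what the task actually writes.

```python
import joblib
import torch

# File names are assumptions; check the deployment_model folder of the actual run.
backbone = torch.jit.load("deployment_model/model.pt", map_location="cpu")
backbone.eval()
classifier = joblib.load("deployment_model/classifier.joblib")

image = torch.rand(1, 3, 224, 224)  # a preprocessed image batch
with torch.inference_mode():
    features = backbone(image)  # expected shape (batch, feature_dim)

prediction = classifier.predict(features.numpy())
```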

## Evaluation
The same datamodule specified before can be used for inference by setting the `phase` parameter to `test`.
10 changes: 4 additions & 6 deletions docs/tutorials/examples/sklearn_patch_classification.md
@@ -205,6 +205,9 @@ defaults:
- override /backbone: resnet18
- _self_
export:
types: [torchscript]
core:
name: experiment-name
@@ -222,14 +225,9 @@ task:
device: cuda:2
output:
folder: classification_patch_experiment
save_backbone: false
report: true
example: true
reconstruction_method: major_voting
export_config:
types: [torchscript]
input_shapes: # Redefine the input shape if not automatically inferred
```

This will train a resnet18 model on the given dataset, using 256 as batch size and skipping the background class during training.
@@ -253,7 +251,7 @@ config_tree.txt main.log

Inside the `classification_patch_experiment` folder you should find some report utilities computed over the validation dataset, like the confusion matrix. The `reconstruction_results.json` file contains the reconstruction metrics computed over the validation dataset in terms of covered defects, it will also contain the coordinates of the polygons extracted over predicted areas of the image with the same label.

The `data` folder contains a joblib version of the datamodule containing parameters and splits for reproducibility. The `deployment_model` folder contains the backbone exported in torchscript format if `save_backbone` to true alongside the joblib version of trained classifier.
The `data` folder contains a joblib version of the datamodule containing parameters and splits for reproducibility. The `deployment_model` folder contains the backbone exported in torchscript format alongside the joblib version of the trained classifier.

## Evaluation
The same datamodule specified before can be used for inference.
2 changes: 1 addition & 1 deletion docs/tutorials/examples/ssl.md
@@ -160,7 +160,7 @@ The output folder should contain the following entries:
checkpoints config_resolved.yaml config_tree.txt data deployment_model main.log
```

The `checkpoints` folder contains the saved `pytorch` lightning checkpoints. The `data` folder contains a joblib version of the datamodule containing all parameters and dataset spits. The `deployment_model` folder contains the model ready for production in the format specified in the task `export_config.types` parameter (default `torchscript`).
The `checkpoints` folder contains the saved `pytorch` lightning checkpoints. The `data` folder contains a joblib version of the datamodule containing all parameters and dataset splits. The `deployment_model` folder contains the model ready for production in the format specified in the `export.types` parameter (default `torchscript`).

### Run (Advanced) - Changing transformations
