build: Release v1.3.0
* feat: Add automatic batch size and safe testing (#68)

* feat: Add automatic batch size lightning callback, disabled by default

* feat: Automatically skip callbacks if there's a disable parameter set to true

* feat: Implement automatic batch computation for sklearn based tasks

* feat: Add safe test execution with batch scaling for inference tasks

* fix: Fix wrong datamodule initialization

* build: Bump version 1.2.6 -> 1.3.0

* feat: Add automatic batch size callback in anomalib callbacks config

* docs: Update changelog

* docs: Add information about automatic batch size for training

* feat: Add improvement over lightning callback to fix training stage issue and allow selecting application stages

Approved By: @rcmalli

* test: Improve test speed, fix tasks crashing when no checkpoint is provided

Reduce test time (#69)

* test: Start setting up training mocking

* build: Add pytest-mock to requirements

* build: Add --mock-training flag to test pipeline

* fix: Fix export crash when no checkpoint is provided

* fix: Fix breaking task when no checkpoint is available

* test: Fix train mock for patchcore and efficient_ad, reduce test dataset dimension

* test: Add mock training possibility for segmentation tests, reduce default test datasets

* fix: Fix wrong datamodule initialization

* test: Add mock training fixture for classification training, remove test with run_test flag set to false

* test: Reduce the number of patches for patch training

* test: Add mock training fixture to multilabel classification

* feat: Add safe test execution with batch scaling for inference tasks

* fix: Fix number of threads for torch not set properly, set onnx threads for export

* docs: Update changelog

* test: Mark csflow test as slow

* test: Mark draem test as slow

* build: Allow installation using python 3.10, deprecate 3.8

* build: Upgrade minimum requirement to python 3.9, fix packages for 3.10 installation

* refactor: Fix wrong indentation

* docs: Add python 3.10 information in readme

* test: Run automations using python 3.10

* test: Run automatic tests on both python 3.9 and 3.10

Approved By: @rcmalli

* fix: Fix missing string marks

* feat: Use multiclass datamodule for segmentation generic example (#73)

Update oxford pet segmentation example to multiclass segmentation task (#73)

* feat: update oxford segmentation example

* fix: update parameter name for the model

* feat: update analysis logs

Approved-By: @lorenzomammana

---------

Co-authored-by: Refik Can Malli <[email protected]>
lorenzomammana and rcmalli authored Oct 9, 2023
2 parents d473e25 + 5677cb3 commit 418373f
Showing 35 changed files with 526 additions and 122 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/docs.yaml
@@ -17,7 +17,7 @@ jobs:
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: 3.9
python-version: "3.10"

- name: Install Dependencies
run: |
4 changes: 2 additions & 2 deletions .github/workflows/tests.yml
@@ -21,7 +21,7 @@ jobs:
fail-fast: false
matrix:
os: [ubuntu-latest]
python-version: ["3.9"]
python-version: ["3.9", "3.10"]
timeout-minutes: 60
steps:
- uses: actions/checkout@v3
@@ -41,4 +41,4 @@ jobs:
- name: Run Tests
run: |
python -m pytest -v --disable-pytest-warnings --strict-markers --color=yes -m "not slow"
python -m pytest -v --disable-pytest-warnings --strict-markers --mock-training --color=yes -m "not slow"
22 changes: 22 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,28 @@
# Changelog
All notable changes to this project will be documented in this file.

### [1.3.0]

#### Added

- Add batch_size_finder callback for lightning-based models (disabled by default).
- Add automatic_batch_size parameter to sklearn-based training tasks (disabled by default).
- Add automatic_batch_size decorator to automatically adjust the batch size of test functions for evaluation tasks if an out-of-memory error occurs.
- Add --mock-training flag for tests to skip running the actual training and just run the test.

#### Fixed

- Fix lightning based tasks not working properly when no checkpoint was provided.
- Fix list and dict config not handled properly as input_shapes parameter.

#### Updated

- Greatly reduce the dimension of test datasets to improve testing speed.
- Make `disable` a quadra reserved keyword for all callbacks; to disable a callback, set `disable: true` in its configuration file entry.

### [1.2.7]

#### Fixed
4 changes: 2 additions & 2 deletions README.md
@@ -46,7 +46,7 @@ ______________________________________________________________________

## Quick Start Guide

Currently we support installing from source since the library is not yet available on `PyPI` and currently supported Python version is `3.9`.
Currently we support installing from source since the library is not yet available on `PyPI`; the supported Python versions are `3.9` and `3.10`.

```shell
pip install git+https://github.com/orobix/quadra.git
@@ -59,7 +59,7 @@ If you don't have virtual environment ready, Let's set up our environment for us
Create and activate a new `Conda` environment.

```shell
conda create -n myenv python=3.9
conda create -n myenv python=3.10
conda activate myenv
```

12 changes: 12 additions & 0 deletions docs/tutorials/examples/anomaly_detection.md
@@ -126,8 +126,20 @@ callbacks:
disable: true
plot_only_wrong: false
plot_raw_outputs: false
batch_size_finder:
_target_: quadra.callbacks.lightning.BatchSizeFinder
mode: power
steps_per_trial: 3
init_val: 2
max_trials: 5 # Max 64
batch_arg_name: train_batch_size
disable: true
```

!!! warning

By default the lightning batch_size_finder callback is disabled. This callback will automatically try to infer the maximum batch size that can be used for training without running out of memory. We've experienced runtime errors with this callback on some machines due to a PyTorch/cuDNN incompatibility, so be careful when using it.
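To enable it for a specific experiment you can flip the reserved `disable` key in your configuration, for example (a minimal sketch keeping the other parameters at the defaults shown above):

```yaml
callbacks:
  batch_size_finder:
    disable: false
    batch_arg_name: train_batch_size # the datamodule attribute that holds the training batch size
```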

The min_max_normalization callback is used to normalize the anomaly maps to the range [0, 1] such that the threshold will become 0.5.

The threshold_type can be either "image" or "pixel" and indicates which threshold is used to normalize the pixel-level threshold. If no masks are available for segmentation this should always be "image"; otherwise the normalization will use the threshold computed without masks, which would result in wrong segmentations.
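For reference, the two options might be set together as in the sketch below; the exact callback entry and its `_target_` are omitted and may differ from the shipped anomalib callbacks config, so treat the keys as illustrative:

```yaml
min_max_normalization:
  threshold_type: image # use "pixel" only when ground-truth masks are available
```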
2 changes: 1 addition & 1 deletion docs/tutorials/examples/segmentation.md
@@ -167,7 +167,7 @@ export:
backbone:
model:
classes: 4 # The total number of classes (background + foreground)
num_classes: 4 # The total number of classes (background + foreground)
task:
run_test: true # run test after training is completed
5 changes: 5 additions & 0 deletions docs/tutorials/examples/sklearn_classification.md
@@ -147,6 +147,9 @@ datamodule:
task:
device: cuda:0
automatic_batch_size:
starting_batch_size: 1024
disable: true
output:
folder: classification_experiment
report: true
@@ -157,6 +160,8 @@ task:
This will train a logistic regression classifier using a resnet18 backbone, resizing the images to 224x224 and using a 5-fold cross validation. The `class_to_idx` parameter is used to map the class names to indexes; the indexes will be used to train the classifier. The `output` parameter is used to specify the output folder and the type of output to save. The `export.types` parameter can be used to export the model in different formats; at the moment `torchscript`, `onnx` and `pytorch` are supported.
The backbone (in torchscript and pytorch format) will be saved along with the classifier. `test_full_data` is used to specify whether a final test should be performed on all the data (after training on the training and validation datasets).

Optionally, it's possible to enable the automatic batch size finder by setting `automatic_batch_size.disable` to `false`. This will try to find the maximum batch size that can be used on the given device without running out of memory. The `starting_batch_size` parameter specifies the batch size at which the search begins; the algorithm starts from this value and halves it until inference no longer runs out of memory.
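Conceptually the search behaves like the sketch below. This is not quadra's actual implementation; `run_trial` is a hypothetical callable standing in for the feature-extraction/inference pass:

```python
import torch


def find_safe_batch_size(run_trial, starting_batch_size: int = 1024) -> int:
    """Halve the batch size until the trial function stops raising CUDA out-of-memory errors."""
    batch_size = starting_batch_size
    while batch_size >= 1:
        try:
            run_trial(batch_size)  # hypothetical trial pass on the target device
            return batch_size
        except RuntimeError as error:
            if "out of memory" not in str(error).lower():
                raise
            torch.cuda.empty_cache()  # release partially allocated memory before retrying
            batch_size //= 2
    raise RuntimeError("No batch size fits in the available memory")
```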

### Run

Assuming that you have created a virtual environment and installed the `quadra` library, you can run the experiment by running the following command:
3 changes: 3 additions & 0 deletions docs/tutorials/examples/sklearn_patch_classification.md
@@ -223,6 +223,9 @@ datamodule:
task:
device: cuda:2
automatic_batch_size:
starting_batch_size: 1024
disable: true
output:
folder: classification_patch_experiment
report: true
14 changes: 8 additions & 6 deletions pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "quadra"
version = "1.2.7"
version = "1.3.0"
description = "Deep Learning experiment orchestration library"
authors = [
{ name = "Alessandro Polidori", email = "[email protected]" },
@@ -16,7 +16,7 @@ authors = [
keywords = ["deep learning", "experiment", "lightning", "hydra-core"]
license = { file = "LICENSE" }
readme = { file = "README.md", content-type = "text/markdown" }
requires-python = ">=3.8,<3.10"
requires-python = ">=3.9,<3.11"
classifiers = [
"Programming Language :: Python :: 3",
"Intended Audience :: Developers",
@@ -52,6 +52,7 @@ dependencies = [
"python-dotenv==0.21.*",
"rich==13.2.*",
"scikit-learn==1.2.*",
"pydantic==1.10.10",
"grad-cam==1.4.6",
"matplotlib==3.6.*",
"seaborn==0.12.*",
@@ -62,13 +63,13 @@ dependencies = [
"tripy==1.0.*",
"h5py==3.8.*",
"timm==0.6.12", # required by smp
"segmentation-models-pytorch==0.3.*",
"anomalib@git+https://github.com/orobix/[email protected]+obx.1.2.1",
"segmentation-models-pytorch==0.3.2",
"anomalib@git+https://github.com/orobix/[email protected]+obx.1.2.3",
"xxhash==3.2.*",
]

[project.optional-dependencies]
test = ["pytest==7.2.*", "pytest-cov==4.0.*", "pytest-lazy-fixture==0.6.*"]
test = ["pytest==7.2.*", "pytest-cov==4.0.*", "pytest-lazy-fixture==0.6.*", "pytest-mock==3.11.*"]

dev = [
"interrogate==1.5.*",
@@ -118,7 +119,7 @@ repository = "https://github.com/orobix/quadra"

# Adapted from https://realpython.com/pypi-publish-python-package/#version-your-package
[tool.bumpver]
current_version = "1.2.7"
current_version = "1.3.0"
version_pattern = "MAJOR.MINOR.PATCH"
commit_message = "build: Bump version {old_version} -> {new_version}"
commit = true
@@ -193,6 +194,7 @@ ignore_regex = [
".*on_train.*",
".*on_validation.*",
".*on_test.*",
".*on_predict.*",
".*forward.*",
".*backward.*",
".*training_step.*",
2 changes: 1 addition & 1 deletion quadra/__init__.py
@@ -1,4 +1,4 @@
__version__ = "1.2.7"
__version__ = "1.3.0"


def get_version():
76 changes: 76 additions & 0 deletions quadra/callbacks/lightning.py
@@ -1,6 +1,8 @@
import pytorch_lightning as pl
from pytorch_lightning.callbacks import Callback
from pytorch_lightning.callbacks.batch_size_finder import BatchSizeFinder as LightningBatchSizeFinder
from pytorch_lightning.utilities import rank_zero_only
from torch import nn

from quadra.utils.utils import get_logger

@@ -35,3 +37,77 @@ def on_fit_start(self, trainer: pl.Trainer, pl_module: pl.LightningModule) -> No
self.log_every_n_steps,
len_train_dataloader,
)


class BatchSizeFinder(LightningBatchSizeFinder):
"""Batch size finder setting the proper model training status as the current one from lightning seems bugged.
It also allows to skip some batch size finding steps.
Args:
find_train_batch_size: Whether to find the training batch size.
find_validation_batch_size: Whether to find the validation batch size.
find_test_batch_size: Whether to find the test batch size.
find_predict_batch_size: Whether to find the predict batch size.
mode: The mode to use for batch size finding. See `pytorch_lightning.callbacks.BatchSizeFinder` for more
details.
steps_per_trial: The number of steps per trial. See `pytorch_lightning.callbacks.BatchSizeFinder` for more
details.
init_val: The initial value for batch size. See `pytorch_lightning.callbacks.BatchSizeFinder` for more details.
max_trials: The maximum number of trials. See `pytorch_lightning.callbacks.BatchSizeFinder` for more details.
batch_arg_name: The name of the batch size argument. See `pytorch_lightning.callbacks.BatchSizeFinder` for more
details.
"""

def __init__(
self,
find_train_batch_size: bool = True,
find_validation_batch_size: bool = False,
find_test_batch_size: bool = False,
find_predict_batch_size: bool = False,
mode: str = "power",
steps_per_trial: int = 3,
init_val: int = 2,
max_trials: int = 25,
batch_arg_name: str = "batch_size",
) -> None:
super().__init__(mode, steps_per_trial, init_val, max_trials, batch_arg_name)
self.find_train_batch_size = find_train_batch_size
self.find_validation_batch_size = find_validation_batch_size
self.find_test_batch_size = find_test_batch_size
self.find_predict_batch_size = find_predict_batch_size

def on_train_start(self, trainer: pl.Trainer, pl_module: pl.LightningModule) -> None:
if not self.find_train_batch_size:
return None

if not isinstance(pl_module.model, nn.Module):
raise ValueError("The model must be a nn.Module")
pl_module.model.train()
return super().on_train_epoch_start(trainer, pl_module)

def on_validation_start(self, trainer: pl.Trainer, pl_module: pl.LightningModule) -> None:
if not self.find_validation_batch_size:
return None

if not isinstance(pl_module.model, nn.Module):
raise ValueError("The model must be a nn.Module")
pl_module.model.eval()
return super().on_validation_epoch_start(trainer, pl_module)

def on_test_start(self, trainer: pl.Trainer, pl_module: pl.LightningModule) -> None:
if not self.find_test_batch_size:
return None

if not isinstance(pl_module.model, nn.Module):
raise ValueError("The model must be a nn.Module")
pl_module.model.eval()
return super().on_test_epoch_start(trainer, pl_module)

def on_predict_start(self, trainer: pl.Trainer, pl_module: pl.LightningModule) -> None:
if not self.find_predict_batch_size:
return None

if not isinstance(pl_module.model, nn.Module):
raise ValueError("The model must be a nn.Module")
pl_module.model.eval()
return super().on_predict_epoch_start(trainer, pl_module)
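As a usage sketch, the callback plugs into a standard Lightning `Trainer` like any other callback; the model and datamodule are placeholders here, since quadra normally builds them from the Hydra configuration:

```python
import pytorch_lightning as pl

from quadra.callbacks.lightning import BatchSizeFinder

# Tune only the training batch size; validation/test/predict dataloaders keep their configured sizes.
batch_size_finder = BatchSizeFinder(
    find_train_batch_size=True,
    mode="power",  # double the batch size at each trial
    steps_per_trial=3,
    init_val=2,
    max_trials=5,  # with init_val=2 this caps the search at a batch size of 64
    batch_arg_name="batch_size",
)

trainer = pl.Trainer(callbacks=[batch_size_finder])
# trainer.fit(model, datamodule=datamodule)  # placeholders: supply your own LightningModule/DataModule
```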
13 changes: 13 additions & 0 deletions quadra/configs/callbacks/default.yaml
@@ -20,5 +20,18 @@ progress_bar:
lightning_trainer_setup:
_target_: quadra.callbacks.lightning.LightningTrainerBaseSetup
log_every_n_steps: 1

batch_size_finder:
_target_: quadra.callbacks.lightning.BatchSizeFinder
mode: power
steps_per_trial: 3
init_val: 2
max_trials: 5 # Max 64
batch_arg_name: batch_size
disable: true
find_train_batch_size: true
find_validation_batch_size: false
find_test_batch_size: false
find_predict_batch_size: false
#gpu_stats: TODO: This is not working with the current PL version
# _target_: nvitop.callbacks.lightning.GpuStatsLogger
8 changes: 8 additions & 0 deletions quadra/configs/callbacks/default_anomalib.yaml
@@ -55,5 +55,13 @@ progress_bar:
lightning_trainer_setup:
_target_: quadra.callbacks.lightning.LightningTrainerBaseSetup
log_every_n_steps: 1
batch_size_finder:
_target_: quadra.callbacks.lightning.BatchSizeFinder
mode: power
steps_per_trial: 3
init_val: 2
max_trials: 5 # Max 64
batch_arg_name: train_batch_size
disable: true
#gpu_stats: TODO: This is not working with the current PL version
# _target_: nvitop.callbacks.lightning.GpuStatsLogger
@@ -1,4 +1,6 @@
_target_: quadra.datamodules.generic.oxford_pet.OxfordPetSegmentationDataModule
idx_to_class:
1: cat_or_dog
data_path: ${oc.env:HOME}/.quadra/datasets/oxford-pet
test_size: 0.2
val_size: 0.2
@@ -2,12 +2,18 @@
defaults:
- base/segmentation/smp # use smp file as default
- override /datamodule: generic/oxford_pet/segmentation/base # update datamodule
- override /loss: smp_dice_multiclass
- override /model: smp_multiclass
- _self_ # use this file as final config

trainer:
devices: [0]
max_epochs: 10

backbone:
model:
num_classes: 2 # The total number of classes (background + foreground)

task:
report: true
evaluate:
7 changes: 5 additions & 2 deletions quadra/configs/task/sklearn_classification.yaml
@@ -1,7 +1,10 @@
_target_: quadra.tasks.SklearnClassification
device: "cuda:0"
device: cuda:0
automatic_batch_size:
starting_batch_size: 1024
disable: true
output:
folder: "classification_experiment"
folder: classification_experiment
report: true
example: true
test_full_data: true
5 changes: 4 additions & 1 deletion quadra/configs/task/sklearn_classification_patch.yaml
@@ -1,5 +1,8 @@
_target_: quadra.tasks.PatchSklearnClassification
device: cuda:2
device: cuda:0
automatic_batch_size:
starting_batch_size: 1024
disable: true
output:
folder: classification_patch_experiment
report: true
