Merge pull request #305 from ellisdg/torch
Switch just about everything to Torch/MONAI
ellisdg authored Aug 2, 2023
2 parents 0ab7de6 + 08ea1cb commit 198da24
Showing 123 changed files with 6,228 additions and 41,052 deletions.
9 changes: 9 additions & 0 deletions Dockerfile
@@ -0,0 +1,9 @@
FROM projectmonai/monai

RUN pip install nilearn

COPY ./ /opt/3DUnetCNN
ENV PYTHONPATH=/opt/3DUnetCNN:$PYTHONPATH
ENV PATH=/opt/3DUnetCNN/unet3d/scripts:$PATH
RUN chmod +x /opt/3DUnetCNN/unet3d/scripts/*.py

88 changes: 75 additions & 13 deletions README.md
@@ -1,24 +1,86 @@
# 3D U-Net Convolution Neural Network
## [Brain Tumor Segmentation (BraTS) Tutorial](examples/brats2020)
[![Tumor Segmentation Example](legacy/doc/tumor_segmentation_illusatration.gif)](examples/brats2020)
## [Automatic Cranial Implant Design (AutoImplant)](examples/autoimplant2020)
[![Segmentation Example](doc/AutoImplant-Viz.png)](examples/autoimplant2020)
## [Anatomical Barriers to Cancer Spread (ABCS)](examples/abcs2020)
## Background

[[Update August 2023 - data loading is now 10x faster!](doc/Changes.md)]

* [Tutorials](#tutorials)
* [Introduction](#introduction)
* [Quick Start Guide](#quickstart)
* [Installation](#installation)
* [Configuration](#configuration)
* [Training](#training)
* [Inference](#inference)
* [Evaluation](#evaluation)
* [Documentation](#documentation)
* [Citation](#citation)


## Tutorials <a name="tutorials"></a>
### [Brain Tumor Segmentation (BraTS 2022)](examples/brats2020)
[![Tumor Segmentation Example](doc/viz/tumor_segmentation_illusatration.gif)](examples/brats2020)

## Introduction <a name="introduction"></a>
We designed 3DUnetCNN to make it easy to control the training and application of various deep learning models to medical imaging data.
The links above provide examples and tutorials on how to use this project with data from various MICCAI challenges.

## Getting started
Install [PyTorch](https://pytorch.org/get-started/locally/) and
[nilearn](https://nilearn.github.io/introduction.html#installing-nilearn).

## [Pretrained Models](https://zenodo.org/record/4289225)
## Quick Start Guide <a name="quickstart"></a>
How to train a UNet on your own data.

### Installation <a name="installation"></a>
1. Clone the repository:<br />
```git clone https://github.com/ellisdg/3DUnetCNN.git``` <br /><br />

2. Install the required dependencies<sup>*</sup>:<br />
```pip install -r 3DUnetCNN/requirements.txt```

<sup>*</sup>It is highly recommended to use an Anaconda environment or a virtual environment to
manage dependencies and avoid conflicts with existing packages.

### Setup the configuration file <a name="configuration"></a>
1. Copy the default configuration file: <br />
```cp examples/default_config.json <your_config>.json```<br /><br />
2. Add the ```training_filenames``` and ```validation_filenames``` for your dataset to the configuration file.
<br /><br />
Example:<br />
```"training_filenames": [[["sub01/t1w.nii.gz", "sub01/t2w.nii.gz"], "sub01/labelmap.nii.gz"], ...]``` <br />
* ```["sub01/t1w.nii.gz", "sub01/t2w.nii.gz"]``` is the set of input filenames for a single subject.
* ```"sub01/labelmap.nii.gz"``` is the labelmap filename for that same subject.
* This should be repeated for all the subjects in the dataset.
(It is probably easiest to add these filenames using a Python script; see the sketch after this list.)
3. (optional) Change model and training configuration settings as desired. (see [Configuration File Guide](doc/Configuration.md))
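For reference, here is a minimal sketch of such a script. The ```data/sub*``` directory layout and the ```my_config.json``` output filename are hypothetical, so adapt the paths to your dataset.
```python
# Sketch: build training_filenames from a hypothetical data/sub* layout
# and write the result to a new configuration file.
import glob
import json
import os

with open("examples/default_config.json") as f:
    config = json.load(f)

config["training_filenames"] = []
for subject_dir in sorted(glob.glob("data/sub*")):
    inputs = [os.path.join(subject_dir, "t1w.nii.gz"),
              os.path.join(subject_dir, "t2w.nii.gz")]
    labelmap = os.path.join(subject_dir, "labelmap.nii.gz")
    config["training_filenames"].append([inputs, labelmap])

with open("my_config.json", "w") as f:
    json.dump(config, f, indent=2)
```
Populate ```validation_filenames``` the same way using your held-out subjects.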

### Train the model <a name="training"></a>
Run the model training:<br />
```3DUnetCNN/scripts/train.py --configuration_filename <your_config>.json --model_filename <your_model>.pth --training_log_filename <your_training>.txt``` <br />
* Change ```<your_config>.json``` to the configuration file you created above.
* Change ```<your_model>.pth``` to the filename where you want the training to save the model file.
* Change ```<your_training>.txt``` to the filename where you want the training to save the training and validation losses for each epoch.
* (optional) Set ```--ngpus``` and ```--nthreads``` to the number of GPUs and threads available (defaults are ```--ngpus 1``` and ```--nthreads 8```).

### Predict Validation Cases <a name="inference"></a>
Run model inference on the ```validation_filenames```:<br />
```3DUnetCNN/scripts/predict.py --configuration_filename <your_config>.json --model_filename <your_model>.pth --output_directory <your_prediction_folder>```
* Change ```<your_config>.json``` to the same configuration used in training.
* Change ```<your_model>.pth``` to the model filename specified during training.
* Change ```<your_prediction_folder>``` to the folder where you want the predictions to be saved.
* (optional) Setting ```--group test``` will tell the script to look for ```test_filenames``` in the configuration file
instead of ```validation_filenames```.
This is helpful for predicting cases that are in a separate test set.

### Evaluate Results <a name="evaluation"></a>
```3DUnetCNN/scripts/evaluate.py --directory <your_prediction_folder> --config_filename <your_config>.json --output_filename <your_results>.csv```
* Change ```<your_prediction_folder>``` to the folder path from the prediction phase.
* Change ```<your_config>.json``` to your configuration filename.
* Change ```<your_results>.csv``` to the location where you want to save the evaluation scores as a CSV file.

## Got Questions?
See [FAQ](doc/FAQ.md), raise an issue on GitHub, or email me at [email protected].
## Documentation <a name="documentation"></a>
* [Configuration Guide](doc/Configuration.md)
* [Frequently Asked Questions](doc/FAQ.md)

### Still have questions? <a name="questions"></a>
Once you have reviewed the documentation, feel free to raise an issue on GitHub, or email me at [email protected].

## Citation
## Citation <a name="citation"></a>
Ellis D.G., Aizenberg M.R. (2021) Trialing U-Net Training Modifications for Segmenting Gliomas Using Open Source Deep Learning Framework. In: Crimi A., Bakas S. (eds) Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2020. Lecture Notes in Computer Science, vol 12659. Springer, Cham. https://doi.org/10.1007/978-3-030-72087-2_4

### Additional Citations
16 changes: 16 additions & 0 deletions doc/Changes.md
@@ -0,0 +1,16 @@
## August 2023 (version 2.0.0)
* Utilizes MONAI instead of Nilearn as the image loading and processing core.
* This makes data loading and augmentation roughly 10-20x faster in my experiments (your results may vary).
* Simplifies configuration options.
* Removes support for generating filenames as this caused more headaches than it was worth.
* Due to the above changes, old configuration files are not likely to work anymore.
* Images are now read as MONAI MetaTensor objects.
* Preprocessing is now done almost entirely in Torch.
* Adds requirements.txt
* Adds Quick Start Guide
* Adds more FAQs
* Adds Normalization documentation
* Allows for using MONAI loss classes
* Removes old examples
* Using --fit_gpu_mem is no longer supported
* Removes old sequences/datasets (we will try adding some back in the future)
4 changes: 0 additions & 4 deletions doc/CommonErrors.md

This file was deleted.

124 changes: 119 additions & 5 deletions doc/Configuration.md
@@ -1,7 +1,121 @@
#### Notes on configuration
The ```train.py``` script will automatically set the input image size and batch size based on the amount of GPU memory and number of GPUs.
If you do not want these settings automatically set, you can adjust them yourself by making changes to the config file instead of using the
```--fit_gpu_mem``` flag.
# Configuration File Guide

* [Introduction](#introduction)
* [Configuration Example](#example)
* [GPU Memory Constraints and Input Size](#gpu)
* [Using "--fit_gpu_mem"](#fitgpumem)
* [Normalization](#norm)
* [Machine Configuration File](#machine)

## Introduction <a name="introduction"></a>
The configuration file determines the model architecture and how it will be trained.
This is helpful for running multiple experiments as it provides documentation for
each configuration you have experimented with. A configuration file should produce
similar results each time it is used for training.


## Configuration Example <a name="example"></a>
Example Python code to set up the configuration file for BraTS 2020 data.
```python
import glob
import os

config = dict()
model_config = dict()
model_config["name"] = "DynUNet" # network model name from MONAI
# set the network hyper-parameters
model_config["in_channels"] = 4 # 4 input images for the BraTS challenge
model_config["out_channels"] = 3 # whole tumor, tumor core, enhancing tumor
model_config["spatial_dims"] = 3 # 3D input images
model_config["deep_supervision"] = False # do not check outputs of lower layers
model_config["strides"] = [[1, 1, 1], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]][:-1] # number of downsampling convolutions
model_config["filters"] = [64, 96, 128, 192, 256, 384, 512, 768, 1024][:len(model_config["strides"])] # number of filters per layer
model_config["kernel_size"] = [[3, 3, 3]] * len(model_config["strides"]) # size of the convolution kernels per layer
model_config["upsample_kernel_size"] = model_config["strides"][1:] # should be the same as the strides
# put the model config in the main config
config["model"] = model_config
config["optimizer"] = {'name': 'Adam',
'lr': 0.001} # initial learning rate
# define the loss
config["loss"] = {'name': 'GeneralizedDiceLoss', # from Monai
'include_background': False, # we do not have a label for the background, so this should be false
'sigmoid': True} # transform the model logits to activations
# set the cross validation parameters
config["cross_validation"] = {'folds': 5, # number of cross validation folds
'seed': 25} # seed to make the generation of cross validation folds consistent across different trials
# set the scheduler parameters
config["scheduler"] = {'name': 'ReduceLROnPlateau',
'patience': 10, # wait 10 epochs with no improvement before reducing the learning rate
'factor': 0.5, # multiply the learning rate by 0.5
'min_lr': 1e-08} # stop reducing the learning rate once it gets to 1e-8
# set the dataset parameters
config["dataset"] = {'name': 'SegmentationDatasetPersistent', # 'Persistent' means that it will save the preprocessed outputs generated during the first epoch
# However, using 'Persistent' also increases the duration of the first epoch compared to later epochs, which will run faster
'desired_shape': [128, 128, 128], # resize the images to this shape, increase this to get higher resolution images (increases computation time and memory usage)
'labels': [1, 3, 2], # 1: edema; 2: enhancing tumor; 3: necrotic center
'setup_label_hierarchy': True, # changes the labels to whole tumor (1, 3, 2), tumor core (3, 2), and enhancing tumor (2) to be consistent with the challenge
'normalization': 'NormalizeIntensityD', # z score normalize the input images to zero mean unit standard deviation
'normalization_kwargs': {'channel_wise': True, "nonzero": False}, # perform the normalization channel wise and include the background
'resample': True, # resample the images when resizing them, otherwise the resize could crop out regions of interest
'crop_foreground': True, # crop the foreground of the images
}
config["training"] = {'batch_size': 1, # number of image/label pairs to read at a time during training
'validation_batch_size': 1, # number of image/label pairs to read at a time during validation
'amp': False, # don't set this to true unless the model you are using is set up to use automatic mixed precision (AMP)
'early_stopping_patience': None, # stop the model early if the validation loss stops improving
'n_epochs': 250, # number of training epochs, reduce this if you don't want training to run as long
'save_every_n_epochs': None, # save the model every n epochs (otherwise only the latest model will be saved)
'save_last_n_models': None, # save the last n models
'save_best': True} # save the model that has the best validation loss
# get the training filenames
config["training_filenames"] = list()
# if your BraTS data is stored somewhere else, change this code to fetch that data
for subject_folder in sorted(glob.glob("BraTS2020_TrainingData/MICCAI_BraTS2020_TrainingData/*")):
if not os.path.isdir(subject_folder):
continue
image_filenames = sorted(glob.glob(os.path.join(subject_folder, "*.nii")))
for i in range(len(image_filenames)):
if "seg" in image_filenames[i].lower():
label = image_filenames.pop(i)
break
assert len(image_filenames) == 4
config["training_filenames"].append({"image": image_filenames, "label": label})
config["bratsvalidation_filenames"] = list() # "validation_filenames" is reserved for the cross-validation, so we will call this bratsvalidation_filenames
for subject_folder in sorted(glob.glob("BraTS2020_ValidationData/MICCAI_BraTS2020_ValidationData/*")):
if not os.path.isdir(subject_folder):
continue
image_filenames = sorted(glob.glob(os.path.join(subject_folder, "*.nii")))
assert len(image_filenames) == 4
config["bratsvalidation_filenames"].append({"image": image_filenames})
```
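Note that the example above only builds a Python ```config``` dict. To use it with ```train.py```, the dict still needs to be written out as JSON; a minimal sketch (the output filename is hypothetical):
```python
# Write the configuration dict to disk so it can be passed to train.py
# via --configuration_filename ("brats_config.json" is a hypothetical name).
import json

with open("brats_config.json", "w") as f:
    json.dump(config, f, indent=2)
```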
## GPU Memory Constraints and Input Size <a name="gpu"></a>
I find that an input size of 176x224x144 works well for 32GB V100 GPUs.
If you are getting out of memory errors, try decreasing the input/window size in increments of 16
(i.e. the next increment down would be 160x208x128).
Note that each input dimension must be divisible by 2 raised to the power of the number of downsampling layers.
The example configuration shown above has 4 downsampling layers (5 encoding layers total), and therefore each
dimension must be divisible by 2^4 = 16.
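As a quick sanity check, the divisibility rule can be expressed in a few lines (a sketch; set ```n_downsampling``` to match your model configuration):
```python
# Check that every input dimension is divisible by 2 ** n_downsampling.
# With 4 downsampling layers the required factor is 2 ** 4 = 16.
def valid_input_shape(shape, n_downsampling=4):
    factor = 2 ** n_downsampling
    return all(dim % factor == 0 for dim in shape)

print(valid_input_shape((176, 224, 144)))  # True: all dimensions divisible by 16
print(valid_input_shape((170, 224, 144)))  # False: 170 % 16 != 0
```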

## Normalization <a name="norm"></a>
Normalization can utilize any function in the [normalize.py](../unet3d/utils/normalize.py) file.
To use multiple normalization functions in order, you may specify a list of normalization functions.
You may also specify ```normalization_kwargs``` to further refine the normalization techniques.
If you provide a list of normalization techniques, then any ```normalization_kwargs``` must be
nested under the name of the respective normalization function.
See [Normalization documentation](Normalization.md) for more details.
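For illustration, chaining two transforms might look like the sketch below. The nesting of ```normalization_kwargs``` follows the description above, and ```ScaleIntensityD``` is just one example of a MONAI transform; treat the exact layout as an assumption and see the Normalization documentation for specifics.
```python
# Sketch: apply NormalizeIntensityD, then ScaleIntensityD, in order.
# The kwargs for each transform are nested under its name (assumed layout).
config["dataset"]["normalization"] = ["NormalizeIntensityD", "ScaleIntensityD"]
config["dataset"]["normalization_kwargs"] = {
    "NormalizeIntensityD": {"channel_wise": True, "nonzero": False},
    "ScaleIntensityD": {"minv": 0.0, "maxv": 1.0},
}
```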

## Machine Configuration File <a name="machine"></a>
Rather than specifying the number of GPUs and threads on the command line, you can also make a configuration file for the machine you are using
and pass this using the ```--machine_config_filename``` flag.
Click [here](../machine_configs/v100_2gpu_32gb_config.json) to see an example machine configuration JSON file.
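A machine configuration might look like the following sketch; the key names here are guesses based on the command-line flags, so check the linked example file for the actual format.
```python
# Sketch: write a machine configuration file. The key names below are
# assumptions modeled on the --ngpus/--nthreads flags, not the real schema.
import json

machine_config = {"n_gpus": 2, "n_workers": 16}
with open("my_machine_config.json", "w") as f:
    json.dump(machine_config, f, indent=2)
```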



29 changes: 27 additions & 2 deletions doc/FAQ.md
@@ -1,2 +1,27 @@
# How do I fix "ValueError: num_samples should be a positive integer value, but got num_samples=0"?
This error comes up when the script can't find the data to train on. It can usually be fixed by modifying the "generate_filenames_kwargs" part of the config file. Otherwise, it is possible that you haven't downloaded the data yet.



## How can I make gif visualizations like those shown in the README?
You can use the [make_gif.py](../unet3d/scripts/make_gif.py) script to make your own GIF visualizations.

## How can I speed up model training?
If you are only using one thread (i.e., ```--nthreads 1```), then training will
likely be very slow. Determine how many threads the machine you are using has and use as
many as possible.
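For example, a quick way to check the available thread count (a sketch):
```python
# Print the number of CPU threads on this machine; pass this value
# (or slightly fewer) to train.py via --nthreads.
import os

print(os.cpu_count())
```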

## How do I fix "ValueError: num_samples should be a positive integer value, but got num_samples=0"?
This error comes up when the script can't find the data to train on.
Check that the ```training_filenames``` and ```validation_filenames``` in the configuration file are valid.
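One quick way to spot bad paths is to loop over the configured filenames; a sketch assuming the ```[[input_images], labelmap]``` entry format from the Quick Start Guide and a hypothetical ```my_config.json```:
```python
# List any files referenced by the configuration that do not exist on disk.
import json
import os

with open("my_config.json") as f:
    config = json.load(f)

for images, label in config.get("training_filenames", []):
    for path in list(images) + [label]:
        if not os.path.exists(path):
            print("missing:", path)
```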

## How much GPU memory do I need?
It is recommended to have at least 11GB of GPU memory.
See the GPU Memory Constraints and Input Size section of the [Configuration File Guide](./Configuration.md) for instructions on how to
adjust the input image size to use less memory.

## Do I need to use a GPU?
You can run model inference without a GPU, but model training will take far too long without a GPU.

8 changes: 5 additions & 3 deletions doc/Normalization.md
@@ -1,8 +1,10 @@
Normalization options:
* zero_mean_normalize_image_data
Currently any MONAI normalization transform is supported.

Old options that no longer work but may be implemented in the future:
* zero_mean
z-score normalization, where the mean is subtracted and the result is divided by the standard deviation.
* foreground_zero_mean_normalize_image_data
Same as zero_mean_normalize_image_data except the foreground is masked and normalized while the background remains
Same as zero_mean except the foreground is masked and normalized while the background remains
the same.
* zero_floor_normalize_image_data
* zero_one_window
Binary file added doc/viz/ATLAS.gif
File renamed without changes.
File renamed without changes
2 changes: 0 additions & 2 deletions examples/abcs2020/README.md

This file was deleted.

