Version 2.3.0 #344

Merged 110 commits into master on Dec 21, 2023

Conversation

jamesdolezal (Collaborator)

Highlights

The highlight of Slideflow 2.3 is the introduction of whole-slide tissue segmentation. Both binary and multiclass tissue segmentation models can be trained from labeled ROIs and deployed for slide QC or used to generate ROIs. This release also adds CycleGAN-based stain normalization, as well as several smaller features and optimizations.

Table of Contents

  1. Tissue Segmentation
    a. Training segmentation models
    b. Using models for QC
    c. Generating ROIs
    d. Deploying in Studio
  2. CycleGAN Stain Normalization
  3. Other New Features
  4. Dependencies
  5. Known Issues

Tissue Segmentation

[Video demo: tissue_seg.mp4]

Slideflow now supports training and deploying tissue segmentation models, both via the programmatic interface and in Slideflow Studio. Tissue segmentation models can be trained in binary, multiclass, or multilabel mode using labeled ROIs. Segmentation operates at the whole-slide level: models are trained on randomly cropped sections of the slide thumbnail at a specified resolution.

Training segmentation models

Segmentation models are configured using SegmentConfig, which determines the segmentation architecture (U-Net, FPN, DeepLabV3, etc.), the image resolution for segmentation in microns-per-pixel (MPP), and other training parameters.

from slideflow import segment

# Create a config object
config = segment.SegmentConfig(mpp=20, mode='binary', arch='Unet')

Models can be trained with slideflow.segment.train(). The trained model is saved in the given destination directory as model.pth, alongside an auto-generated segment_config.json file describing the architecture and training parameters.

...

# Load a dataset
project = sf.Project(...)
dataset = project.dataset(...)

# Train the model
segment.train(config, dataset, dest='path/to/output')

Once trained, tissue segmentation models can either be used for slide-level QC or to generate ROIs.

Using models for QC

The new slideflow.slide.qc.Segment class provides an easy interface for generating QC masks from a segmentation model (e.g., a model trained to identify tumor regions, pen marks, etc.). This class takes a path to a trained segmentation model as an argument and can otherwise be used for QC as outlined in the documentation.

import slideflow as sf
from slideflow.slide import qc

# Load the slide
wsi = sf.WSI('/path/to/slide', ...)

# Create the QC algorithm
segmenter = qc.Segment('/path/to/model.pth')

# Apply QC
applied_mask = wsi.qc(segmenter)

For multiclass segmentation models, qc.Segment provides additional arguments to customize how the model should be used for QC.
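
For illustration only, a hedged sketch of what this might look like. The class_idx argument below is a hypothetical placeholder, not a confirmed parameter name; consult the qc.Segment documentation for the actual multiclass options.

...

# Hypothetical: restrict QC to a single class of a multiclass model.
# 'class_idx' is an assumed argument name, used here for illustration.
segmenter = qc.Segment('/path/to/multiclass_model.pth', class_idx=1)

# Apply QC as before
applied_mask = wsi.qc(segmenter)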

Generating ROIs

The same qc.Segment class can also be used to generate regions of interest (ROIs). Use Segment.generate_rois() to generate and apply ROIs to a single slide:

...

# Create the segmentation-based QC algorithm
segmenter = qc.Segment('/path/to/model.pth')

# Generate and apply ROIs to a slide
roi_outlines = segmenter.generate_rois(wsi)

Or use Dataset.generate_rois() to create ROIs for an entire dataset:

import slideflow as sf

# Load a project and dataset.
project = sf.load_project('path/to/project')
dataset = project.dataset()

# Generate ROIs for all slides in the dataset.
dataset.generate_rois('path/to/model.pth')

Deploying in Studio

The slide widget in Studio now has a "Segment" section. A trained segmentation model can be loaded and used for either QC or to generate ROIs. Further details regarding use are available in the documentation.

CycleGAN Stain Normalization

Slideflow now includes a CycleGAN-based stain normalizer, 'cyclegan'. Our implementation is based on the work by Zingman et al. The stain normalization algorithm is a two-step process using two separate GANs: the H&E image to be transformed is first converted by GAN-1 into Masson's Trichrome (MT), then converted back to H&E by GAN-2. By default, pretrained weights provided by Zingman will be used, although custom weights can also be provided.

At present, CycleGAN stain normalization requires PyTorch. If you would like us to port GAN normalizers to the Tensorflow backend, please head to our ongoing Discussion and let us know!

This method can be used like any other stain normalizer:

# Configure training parameters
# to use CycleGAN stain normalization
params = sf.ModelParams(..., normalizer='cyclegan')
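
Outside of model training, the normalizer can also be applied to images directly. A brief sketch, assuming the standard stain normalizer interface shown elsewhere in these notes (sf.norm.autoselect and .transform()); img is a uint8 RGB tile image loaded elsewhere:

import slideflow as sf

# Fetch the CycleGAN normalizer (requires the PyTorch backend)
normalizer = sf.norm.autoselect('cyclegan')

# Normalize a uint8 RGB image
normalized = normalizer.transform(img)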

Other New Features

  • Stain normalizers can now augment an image without also normalizing, using the new .augment() method.

    import slideflow as sf

    # Get a Macenko normalizer
    macenko = sf.norm.autoselect('macenko')

    # Perform stain augmentation on a uint8 RGB image
    img = macenko.augment(img)
  • Expanded support for tile aggregation methods, which reduce tile-level predictions to slide- or patient-level predictions. The reduce_method argument to Project.train() and .evaluate() now supports 'median', 'sum', 'min', and 'max' (in addition to the previously supported 'average' and 'proportion'), as well as arbitrary callable functions. For example, to define slide-level predictions as the 75th percentile of tile-level predictions:

    import numpy as np

    Project.train(
        ...
        reduce_method=lambda x: np.percentile(x, 75)
    )
  • New utility function Dataset.get_unique_roi_labels() for getting a list of all unique ROI labels in a dataset.

  • Improved inference speed of PyTorch feature extractors when called on uint8 images.

  • Much faster generation of tile-level predictions for MIL models.

  • Added sf.mil.get_mil_tile_predictions(), which works the same as sf.mil.save_mil_tile_predictions() but returns a pandas DataFrame.

  • Added the ability to calculate tile-level uncertainty for MIL models trained with UQ, by passing uq=True to sf.mil.get_mil_tile_predictions() (see the sketch after this list).
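
As referenced above, a minimal sketch of retrieving tile-level MIL predictions with uncertainty. Only the function name and the uq=True argument are confirmed by these notes; the remaining arguments (model path, dataset) are assumptions modeled on other sf.mil functions, so consult the API reference for the actual signature.

import slideflow as sf

# Load a project and dataset
project = sf.load_project('path/to/project')
dataset = project.dataset()

# Hypothetical call: positional arguments are assumed for illustration.
# Returns a pandas DataFrame; uq=True adds tile-level uncertainty
# columns for MIL models trained with UQ.
df = sf.mil.get_mil_tile_predictions(
    'path/to/mil_model',
    dataset,
    uq=True
)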

Dependencies

Dependencies are largely unchanged. Updates include:

  • Tissue segmentation requires the segmentation-models-pytorch package (install with pip install segmentation-models-pytorch).

Known Issues

  • Tissue segmentation is performed at the whole-slide level (based on cropped thumbnails) and performs best at lower magnifications (microns-per-pixel of 10 or greater). Attempting to train or deploy a tissue segmentation model at higher magnification may significantly increase memory requirements; optimization work is ongoing to reduce these requirements.

jamesdolezal and others added 30 commits on November 1, 2023
- Save a slide alignment with `WSI.alignment.save(path)`
- Re-apply a saved alignment with `WSI.load_alignment(path)` or `WSI.apply_alignment(Alignment.load(path))`
- If a PyTorch, GPU-enabled normalizer has a device set, use this device when calculating DatasetFeatures
- Move normalizers to a device when setting the `.device` attribute
- Change the preferred device for Reinhard PyTorch normalizer from 'gpu' to 'cuda'
- Not dropping this batch can lead to shape errors, issues with batch normalization, and other problems.
- Disable "use slide bounding boxes" option in Studio with cucim backend
- Alignment is now performed with reference to base slide dimensions (untransformed, without bounds or flipping/rotating). This is groundwork to allow alignments to be portable across tile sizes and slides (but potentially consistent between slide scanners)
- Reduce memory usage when generating images with CuCIM by using float32 instead of float64 during image conversion
- This initial support needs minor refactoring for efficiency and to ensure broader compatibility
- Add multiclass support to segmentation models
- Various bug fixes with segmentation training
- Multilabel segmentation support for qc.Segment
- Error was raised when a dataset has old/outdated index files that need to be regenerated. This fix circumvents the problem by regenerating index files before calculating/exporting features.
- Add segmentation documentation
- Switch from `loss_mode` to `mode`; add `lr` parameter
- Auto-detect `out_classes` from segmentation labels
- Fix bug with GPU stain augmentation in PyTorch (ValueError: Stain augmentation (n) requires a stain normalizer, which was not provided)
- Fix "AssertionError: Input tensor must be float" for some PyTorch models deployed in Studio
- Fix edge case where there is 1 tile in a slide
- Fix bug in Studio in instances where there are no tiles in a slide (e.g. a JPEG image smaller than the tile size)
- New refresh button for loading in user-trained cellpose models
- Fix error raised with whole-slide cell segmentation in Studio
- Fix inconsistent transparency issues with cell mask viewing in Studio
- Small fix: flip mask in Otsu QC if `roi_method == 'outside'`
- Reducing tile-level predictions into slide- and patient-level predictions can now be done using arbitrary callable functions, by passing a callable function (e.g. lambda) to the argument `reduce_method`. Additional supported functions now also include 'median', 'sum', 'min', and 'max'.
jamesdolezal merged commit b117406 into master on Dec 21, 2023