Merge branch 'main' into 209-add-multirun-metric-report-recorder
ioangatop authored Mar 11, 2024
2 parents 35f53dc + 3d9a36c commit 9558f68
Showing 4 changed files with 151 additions and 11 deletions.
23 changes: 13 additions & 10 deletions docs/index.md
If you have your own labelled dataset, all that is needed is to implement a data…

We evaluated the following seven FMs with ***eva*** on the four supported WSI-patch-level image classification tasks:

| FM-backbone                 | pretraining | PCam - val*     | PCam - test*    | BACH - val**    | CRC - val**     | MHIST - val*    |
|-----------------------------|-------------|-----------------|-----------------|-----------------|-----------------|-----------------|
| DINO ViT-S16 random weights | N/A         | 0.765 (±0.0036) | 0.726 (±0.0024) | 0.416 (±0.014)  | 0.643 (±0.0046) | 0.551 (±0.017)  |
| DINO ViT-S16 imagenet       | ImageNet    | 0.871 (±0.0039) | 0.856 (±0.0044) | 0.673 (±0.0041) | 0.936 (±0.0009) | 0.823 (±0.0051) |
| DINO ViT-B8 imagenet        | ImageNet    | 0.872 (±0.0013) | 0.854 (±0.0015) | 0.704 (±0.008)  | 0.942 (±0.0005) | 0.813 (±0.0026) |
| Lunit - ViT-S16             | TCGA        | 0.89 (±0.0009)  | 0.897 (±0.0029) | 0.765 (±0.0108) | 0.936 (±0.001)  | 0.762 (±0.0032) |
| Owkin - iBOT ViT-B16        | TCGA        | 0.914 (±0.0012) | 0.919 (±0.0082) | 0.717 (±0.0031) | 0.938 (±0.0005) | 0.799 (±0.0021) |
| kaiko.ai - DINO ViT-S16     | TCGA        | 0.911 (±0.0017) | 0.899 (±0.002)  | 0.773 (±0.0069) | 0.954 (±0.0012) | 0.829 (±0.0035) |
| kaiko.ai - DINO ViT-B8      | TCGA        | 0.902 (±0.0013) | 0.887 (±0.0031) | 0.798 (±0.0063) | 0.949 (±0.0001) | 0.803 (±0.0038) |

The reported performance metrics are *balanced binary accuracy* (\*) and *balanced multiclass accuracy* (\*\*).

The runs used the default setup described in the section below. The table shows the average performance & standard deviation over 5 runs.

***eva*** trains the decoder on the "train" split and uses the "validation" split for monitoring, early stopping and checkpoint selection. Evaluation results are reported on the "validation" split and, if available, on the "test" split.

For more details on the FM-backbones and instructions to replicate those results with ***eva***, refer to the [Replicate results section](user-guide/replicate_evaluations.md) in the [User Guide](user-guide/index.md).

## Evaluation setup

For WSI-patch-level/microscopy image classification tasks, FMs that produce image embeddings are evaluated with a single-layer MLP (a linear probe) that takes the embeddings as input and outputs label predictions.
3 changes: 2 additions & 1 deletion docs/user-guide/index.md

- [Getting started](getting_started.md)
- [How to use eva](how_to_use.md)
- [Tutorials](tutorials.md)
- [Replicate evaluations](replicate_evaluations.md)
135 changes: 135 additions & 0 deletions docs/user-guide/replicate_evaluations.md
# Replicate evaluations

To produce the evaluation results presented [here](../index.md), you can run ***eva*** with the settings below.

Make sure to replace `<task>` in the commands below with `bach`, `crc`, `mhist` or `patch_camelyon`.

## DINO ViT-S16 (random weights)

Evaluating the backbone with randomly initialized weights provides a baseline against which to compare the pretrained FMs: it shows the performance achievable with embeddings produced without any prior learning on image tasks. To evaluate, run:

```
# set environment variables:
export PRETRAINED=false
export EMBEDDINGS_DIR="./data/embeddings/dino_vits16_random/<task>"
export DINO_BACKBONE=dino_vits16
export CHECKPOINT_PATH=null
export NORMALIZE_MEAN=[0.485,0.456,0.406]
export NORMALIZE_STD=[0.229,0.224,0.225]
# run eva:
python -m eva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml
```
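Since the same command applies to every supported task, the four evaluations can be scripted in one loop. A minimal sketch, shown as a dry run that only prints each command (remove the `echo` to actually launch the runs, after exporting the remaining variables from the block above):

```sh
# Dry-run sketch: iterate over the four supported tasks and print the
# eva command each one would use. Remove `echo` to launch the runs.
for task in bach crc mhist patch_camelyon; do
  export EMBEDDINGS_DIR="./data/embeddings/dino_vits16_random/${task}"
  echo python -m eva predict_fit --config "configs/vision/dino_vit/offline/${task}.yaml"
done
```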

## DINO ViT-S16 (ImageNet)

The next baseline uses a ViT-S16 backbone pretrained on ImageNet. To evaluate, run:

```
# set environment variables:
export PRETRAINED=true
export EMBEDDINGS_DIR="./data/embeddings/dino_vits16_imagenet/<task>"
export DINO_BACKBONE=dino_vits16
export CHECKPOINT_PATH=null
export NORMALIZE_MEAN=[0.485,0.456,0.406]
export NORMALIZE_STD=[0.229,0.224,0.225]
# run eva:
python -m eva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml
```

## DINO ViT-B8 (ImageNet)

To evaluate the larger ViT-B8 backbone pretrained on ImageNet, run:
```
# set environment variables:
export PRETRAINED=true
export EMBEDDINGS_DIR="./data/embeddings/dino_vitb8_imagenet/<task>"
export DINO_BACKBONE=dino_vitb8
export CHECKPOINT_PATH=null
export NORMALIZE_MEAN=[0.485,0.456,0.406]
export NORMALIZE_STD=[0.229,0.224,0.225]
# run eva:
python -m eva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml
```

## Lunit - DINO ViT-S16 (TCGA)

[Lunit](https://www.lunit.io/en) released the weights for a DINO ViT-S16 backbone, pretrained on TCGA data, on [GitHub](https://github.com/lunit-io/benchmark-ssl-pathology/releases/). To evaluate, run:

```
# set environment variables:
export PRETRAINED=false
export EMBEDDINGS_DIR="./data/embeddings/dino_vits16_lunit/<task>"
export DINO_BACKBONE=dino_vits16
export CHECKPOINT_PATH="https://github.com/lunit-io/benchmark-ssl-pathology/releases/download/pretrained-weights/dino_vit_small_patch16_ep200.torch"
export NORMALIZE_MEAN=[0.70322989,0.53606487,0.66096631]
export NORMALIZE_STD=[0.21716536,0.26081574,0.20723464]
# run eva:
python -m eva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml
```

## Owkin - iBOT ViT-B16 (TCGA)

[Owkin](https://www.owkin.com/) released the weights for "Phikon", an FM trained with iBOT on TCGA data, via
[HuggingFace](https://huggingface.co/owkin/phikon). To evaluate, run:

```
# set environment variables:
export EMBEDDINGS_DIR="./data/embeddings/dino_vitb16_owkin/<task>"
# run eva:
python -m eva predict_fit --config configs/vision/owkin/phikon/offline/<task>.yaml
```

Note: since ***eva*** provides the config files for the Phikon FM in
"configs/vision/owkin/phikon/offline", the environment variables used in the
runs above do not need to be set.
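Because only the embeddings directory varies between tasks for the Phikon configs, all four evaluations can likewise be scripted in one loop. A minimal sketch, again as a dry run that only prints each command (remove the `echo` to launch the runs):

```sh
# Dry-run sketch for the Phikon configs: only EMBEDDINGS_DIR changes
# per task. Remove `echo` to actually launch each evaluation.
for task in bach crc mhist patch_camelyon; do
  export EMBEDDINGS_DIR="./data/embeddings/dino_vitb16_owkin/${task}"
  echo python -m eva predict_fit --config "configs/vision/owkin/phikon/offline/${task}.yaml"
done
```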

## kaiko.ai - DINO ViT-S16 (TCGA)

To evaluate [kaiko.ai's](https://www.kaiko.ai/) FM with a DINO ViT-S16 backbone, pretrained on TCGA data, run:

```
# set environment variables:
export PRETRAINED=false
export EMBEDDINGS_DIR="./data/embeddings/dino_vits16_kaiko/<task>"
export DINO_BACKBONE=dino_vits16
export CHECKPOINT_PATH=[TBD*]
export NORMALIZE_MEAN=[0.5,0.5,0.5]
export NORMALIZE_STD=[0.5,0.5,0.5]
# run eva:
python -m eva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml
```

\* The path to a public checkpoint will be added when available; currently the checkpoint is stored on Azure blob storage:
"kaiko/ml-outputs/experiments/pathology_fm/tcga/20240209/dino_vitb16/version_0/checkpoints/teacher.backbone/last.pth"

## kaiko.ai - DINO ViT-B8 (TCGA)

To evaluate [kaiko.ai's](https://www.kaiko.ai/) FM with the larger DINO ViT-B8 backbone, pretrained on TCGA data, run:

```
# set environment variables:
export PRETRAINED=false
export EMBEDDINGS_DIR="./data/embeddings/dino_vitb8_kaiko/<task>"
export DINO_BACKBONE=dino_vitb8
export CHECKPOINT_PATH=[TBD*]
export NORMALIZE_MEAN=[0.5,0.5,0.5]
export NORMALIZE_STD=[0.5,0.5,0.5]
# run eva:
python -m eva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml
```

\* The path to a public checkpoint will be added when available; currently the checkpoint is stored on Azure blob storage:
"kaiko/ml-outputs/experiments/pathology_fm/tcga/20240209/dino_vitb8/version_1/checkpoints/teacher.backbone/last.pth"
1 change: 1 addition & 0 deletions mkdocs.yml
nav:
- Getting started: user-guide/getting_started.md
- How to use eva: user-guide/how_to_use.md
- Tutorials: user-guide/tutorials.md
- Replicate evaluations: user-guide/replicate_evaluations.md
- Reference API:
- reference/index.md
- Interface: reference/interface.md
