Merge branch 'main' into 209-add-multirun-metric-report-recorder
ioangatop authored Mar 11, 2024
2 parents 35f53dc + 3d9a36c commit 9558f68
Showing 4 changed files with 151 additions and 11 deletions.
23 changes: 13 additions & 10 deletions docs/index.md
If you have your own labelled dataset, all that is needed is to implement a data…

We evaluated the following seven FMs with ***eva*** on the four supported WSI-patch-level image classification tasks:

| FM-backbone                 | pretraining | PCam - val*     | PCam - test*    | BACH - val**    | CRC - val**     | MHIST - val*    |
|-----------------------------|-------------|-----------------|-----------------|-----------------|-----------------|-----------------|
| DINO ViT-S16 random weights | N/A         | 0.765 (±0.0036) | 0.726 (±0.0024) | 0.416 (±0.014)  | 0.643 (±0.0046) | 0.551 (±0.017)  |
| DINO ViT-S16 imagenet       | ImageNet    | 0.871 (±0.0039) | 0.856 (±0.0044) | 0.673 (±0.0041) | 0.936 (±0.0009) | 0.823 (±0.0051) |
| DINO ViT-B8 imagenet        | ImageNet    | 0.872 (±0.0013) | 0.854 (±0.0015) | 0.704 (±0.008)  | 0.942 (±0.0005) | 0.813 (±0.0026) |
| Lunit - ViT-S16             | TCGA        | 0.89 (±0.0009)  | 0.897 (±0.0029) | 0.765 (±0.0108) | 0.936 (±0.001)  | 0.762 (±0.0032) |
| Owkin - iBOT ViT-B16        | TCGA        | 0.914 (±0.0012) | 0.919 (±0.0082) | 0.717 (±0.0031) | 0.938 (±0.0005) | 0.799 (±0.0021) |
| kaiko.ai - DINO ViT-S16     | TCGA        | 0.911 (±0.0017) | 0.899 (±0.002)  | 0.773 (±0.0069) | 0.954 (±0.0012) | 0.829 (±0.0035) |
| kaiko.ai - DINO ViT-B8      | TCGA        | 0.902 (±0.0013) | 0.887 (±0.0031) | 0.798 (±0.0063) | 0.949 (±0.0001) | 0.803 (±0.0038) |

The reported performance metrics are *balanced binary accuracy* (\*) and *balanced multiclass accuracy* (\*\*).

The runs used the default setup described in the section below. The table shows the average performance & standard deviation over 5 runs.

***eva*** trains the decoder on the "train" split and uses the "validation" split for monitoring, early stopping and checkpoint selection. Evaluation results are reported on the "validation" split and, if available, on the "test" split.

For more details on the FM-backbones and instructions to replicate those results with ***eva***, refer to the [Replicate results section](user-guide/replicate_evaluations.md) in the [User Guide](user-guide/index.md).

## Evaluation setup

For WSI-patch-level/microscopy image classification tasks, FMs that produce image embeddings are evaluated with a single-layer MLP (a linear probe) that takes the embeddings as input and outputs label predictions.
3 changes: 2 additions & 1 deletion docs/user-guide/index.md

- [Getting started](getting_started.md)
- [How to use eva](how_to_use.md)
- [Tutorials](tutorials.md)
- [Replicate evaluations](replicate_evaluations.md)
135 changes: 135 additions & 0 deletions docs/user-guide/replicate_evaluations.md
# Replicate evaluations

To produce the evaluation results presented [here](../index.md), you can run ***eva*** with the settings below.

Make sure to replace `<task>` in the commands below with `bach`, `crc`, `mhist` or `patch_camelyon`.

## DINO ViT-S16 (random weights)

Evaluating the backbone with randomly initialized weights provides a baseline against which to compare the pretrained FMs: it shows the performance achievable with embeddings produced without any prior learning on image tasks. To evaluate, run:

```
# set environment variables:
export PRETRAINED=false
export EMBEDDINGS_DIR="./data/embeddings/dino_vits16_random/<task>"
export DINO_BACKBONE=dino_vits16
export CHECKPOINT_PATH=null
export NORMALIZE_MEAN=[0.485,0.456,0.406]
export NORMALIZE_STD=[0.229,0.224,0.225]
# run eva:
python -m eva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml
```
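Since the same command applies to every supported task, the four evaluations can be scripted in one loop. A minimal sketch, shown as a dry run that only prints each command (remove the `echo` to actually launch the runs, after exporting the remaining variables from the block above):

```sh
# Dry-run sketch: iterate over the four supported tasks and print the
# eva command each one would use. Remove `echo` to launch the runs.
for task in bach crc mhist patch_camelyon; do
  export EMBEDDINGS_DIR="./data/embeddings/dino_vits16_random/${task}"
  echo python -m eva predict_fit --config "configs/vision/dino_vit/offline/${task}.yaml"
done
```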

## DINO ViT-S16 (ImageNet)

The next baseline uses a ViT-S16 backbone pretrained on ImageNet. To evaluate, run:

```
# set environment variables:
export PRETRAINED=true
export EMBEDDINGS_DIR="./data/embeddings/dino_vits16_imagenet/<task>"
export DINO_BACKBONE=dino_vits16
export CHECKPOINT_PATH=null
export NORMALIZE_MEAN=[0.485,0.456,0.406]
export NORMALIZE_STD=[0.229,0.224,0.225]
# run eva:
python -m eva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml
```

## DINO ViT-B8 (ImageNet)

To evaluate the larger ViT-B8 backbone pretrained on ImageNet, run:
```
# set environment variables:
export PRETRAINED=true
export EMBEDDINGS_DIR="./data/embeddings/dino_vitb8_imagenet/<task>"
export DINO_BACKBONE=dino_vitb8
export CHECKPOINT_PATH=null
export NORMALIZE_MEAN=[0.485,0.456,0.406]
export NORMALIZE_STD=[0.229,0.224,0.225]
# run eva:
python -m eva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml
```

## Lunit - DINO ViT-S16 (TCGA)

[Lunit](https://www.lunit.io/en) released the weights for a DINO ViT-S16 backbone, pretrained on TCGA data, on [GitHub](https://github.com/lunit-io/benchmark-ssl-pathology/releases/). To evaluate, run:

```
# set environment variables:
export PRETRAINED=false
export EMBEDDINGS_DIR="./data/embeddings/dino_vits16_lunit/<task>"
export DINO_BACKBONE=dino_vits16
export CHECKPOINT_PATH="https://github.com/lunit-io/benchmark-ssl-pathology/releases/download/pretrained-weights/dino_vit_small_patch16_ep200.torch"
export NORMALIZE_MEAN=[0.70322989,0.53606487,0.66096631]
export NORMALIZE_STD=[0.21716536,0.26081574,0.20723464]
# run eva:
python -m eva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml
```

## Owkin - iBOT ViT-B16 (TCGA)

[Owkin](https://www.owkin.com/) released the weights for "Phikon", an FM trained with iBOT on TCGA data, via
[HuggingFace](https://huggingface.co/owkin/phikon). To evaluate, run:

```
# set environment variables:
export EMBEDDINGS_DIR="./data/embeddings/dino_vitb16_owkin/<task>"
# run eva:
python -m eva predict_fit --config configs/vision/owkin/phikon/offline/<task>.yaml
```

Note: since ***eva*** provides the config files for the Phikon FM in
"configs/vision/owkin/phikon/offline", the environment variables used in the
runs above do not need to be set.
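Because only the embeddings directory varies between tasks for the Phikon configs, all four evaluations can likewise be scripted in one loop. A minimal sketch, again as a dry run that only prints each command (remove the `echo` to launch the runs):

```sh
# Dry-run sketch for the Phikon configs: only EMBEDDINGS_DIR changes
# per task. Remove `echo` to actually launch each evaluation.
for task in bach crc mhist patch_camelyon; do
  export EMBEDDINGS_DIR="./data/embeddings/dino_vitb16_owkin/${task}"
  echo python -m eva predict_fit --config "configs/vision/owkin/phikon/offline/${task}.yaml"
done
```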

## kaiko.ai - DINO ViT-S16 (TCGA)

To evaluate [kaiko.ai's](https://www.kaiko.ai/) FM with a DINO ViT-S16 backbone, pretrained on TCGA data, run:

```
# set environment variables:
export PRETRAINED=false
export EMBEDDINGS_DIR="./data/embeddings/dino_vits16_kaiko/<task>"
export DINO_BACKBONE=dino_vits16
export CHECKPOINT_PATH=[TBD*]
export NORMALIZE_MEAN=[0.5,0.5,0.5]
export NORMALIZE_STD=[0.5,0.5,0.5]
# run eva:
python -m eva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml
```

\* The path to a public checkpoint will be added when available; currently the checkpoint is stored on Azure blob storage:
"kaiko/ml-outputs/experiments/pathology_fm/tcga/20240209/dino_vitb16/version_0/checkpoints/teacher.backbone/last.pth"

## kaiko.ai - DINO ViT-B8 (TCGA)

To evaluate [kaiko.ai's](https://www.kaiko.ai/) FM with the larger DINO ViT-B8 backbone, pretrained on TCGA data, run:

```
# set environment variables:
export PRETRAINED=false
export EMBEDDINGS_DIR="./data/embeddings/dino_vitb8_kaiko/<task>"
export DINO_BACKBONE=dino_vitb8
export CHECKPOINT_PATH=[TBD*]
export NORMALIZE_MEAN=[0.5,0.5,0.5]
export NORMALIZE_STD=[0.5,0.5,0.5]
# run eva:
python -m eva predict_fit --config configs/vision/dino_vit/offline/<task>.yaml
```

\* The path to a public checkpoint will be added when available; currently the checkpoint is stored on Azure blob storage:
"kaiko/ml-outputs/experiments/pathology_fm/tcga/20240209/dino_vitb8/version_1/checkpoints/teacher.backbone/last.pth"
1 change: 1 addition & 0 deletions mkdocs.yml
nav:
- Getting started: user-guide/getting_started.md
- How to use eva: user-guide/how_to_use.md
- Tutorials: user-guide/tutorials.md
- Replicate evaluations: user-guide/replicate_evaluations.md
- Reference API:
- reference/index.md
- Interface: reference/interface.md
