Dissecting Self-Supervised Learning Methods for Surgical Computer Vision

Sanat Ramesh, Vinkle Srivastav, Deepak Alapatt, Tong Yu, Aditya Murali, Luca Sestini, Chinedu Innocent Nwoye, Idris Hamoud, Saurav Sharma, Antoine Fleurentin, Georgios Exarchakis, Alexandros Karargyris, Nicolas Padoy, 2022

News

[ 05/06/2023 ]: Added training and evaluation scripts for surgical triplet recognition. See readme_triplet for instructions.

Introduction

The field of surgical computer vision has undergone considerable breakthroughs in recent years with the rising popularity of deep neural network-based methods. However, standard fully-supervised approaches for training such models require vast amounts of annotated data, imposing a prohibitively high cost, especially in the clinical domain. Self-Supervised Learning (SSL) methods, which have begun to gain traction in the general computer vision community, represent a potential solution to these annotation costs, allowing useful representations to be learned from unlabeled data alone. Still, the effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored. In this work, we address this critical need by investigating four state-of-the-art SSL methods (MoCo v2, SimCLR, DINO, SwAV) in the context of surgical computer vision. We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding: phase recognition and tool presence detection. We examine their parameterization, then their behavior with respect to training data quantities in semi-supervised settings. Correct transfer of these methods to surgery, as described and conducted in this work, leads to substantial performance gains over generic uses of SSL - up to 7.4% on phase recognition and 20% on tool presence detection - and over state-of-the-art semi-supervised phase recognition approaches by up to 14%. Further results obtained on a highly diverse selection of surgical datasets exhibit strong generalization properties.

Main takeaways from the paper

[1] Benchmarking of four state-of-the-art SSL methods (MoCo v2, SimCLR, SwAV, and DINO) in the surgical domain.

[2] Thorough experimentation (∼200 experiments, 7000 GPU hours) and analysis of different design settings - data augmentations, batch size, training duration, frame rate, and initialization - highlighting a need for and intuitions towards designing principled approaches for domain transfer of SSL methods.

[3] In-depth analysis of the adaptation of these methods, originally developed using other datasets and tasks, to the surgical domain with a comprehensive set of evaluation protocols, spanning 10 surgical vision tasks in total performed on 6 datasets: Cholec80, CholecT50, HeiChole, Endoscapes, CATARACTS, and CaDIS.

[4] Extensive evaluation (∼280 experiments, 2000 GPU hours) of the scalability of these methods to various amounts of labeled and unlabeled data through an exploration of both fully and semi-supervised settings.

In this repo we provide:

Self-supervised weights trained on the Cholec80 dataset using four state-of-the-art SSL methods (MoCo v2, SimCLR, SwAV, and DINO).
Self-supervised pre-training scripts.
Downstream fine-tuning scripts for surgical phase recognition (linear fine-tuning and TCN fine-tuning).
Downstream fine-tuning scripts for surgical tool recognition (linear fine-tuning).
Downstream fine-tuning scripts for surgical triplet recognition (linear fine-tuning).

Get Started

Datasets and imagenet checkpoints

Follow these steps to prepare the Cholec80 dataset and set up the ImageNet checkpoints:

# 1. Cholec80 phase and tool labels for different splits
> git clone https://github.com/CAMMA-public/SelfSupSurg
> SelfSupSurg=$(pwd)/SelfSupSurg
> cd $SelfSupSurg/datasets/cholec80
> wget https://s3.unistra.fr/camma_public/github/selfsupsurg/ch80_labels.zip
> unzip -q ch80_labels.zip && rm ch80_labels.zip
# 2. Cholec80 frames:  
# a) Download cholec80 dataset: 
#      - Fill this google form: https://docs.google.com/forms/d/1GwZFM3-GhEduBs1d5QzbfFksKmS1OqXZAz8keYi-wKI  
#       (the link is also available on the CAMMA website: http://camma.u-strasbg.fr/datasets)
# b) Copy the videos into datasets/cholec80/videos 
# c) Extract frames using the following script (requires OpenCV and NumPy)
> cd $SelfSupSurg
> python utils/extract_frames_ch80.py
# 3. Download Imagenet fully supervised and self-supervised weights
> cd $SelfSupSurg/checkpoints/defaults/resnet_50
> wget https://s3.unistra.fr/camma_public/github/selfsupsurg/imagenet_ckpts.zip
> unzip -q imagenet_ckpts.zip && rm imagenet_ckpts.zip

The directory structure should look as follows:

$SelfSupSurg/
└── datasets/cholec80/
    ├── frames/
        ├── train/
            └── video01/
            └── video02/
            ...
        ├── val/
            └── video41/
            └── video42/
            ...
        ├── test/
            └── video49/
            └── video50/
            ...
    ├── labels/
        ├── train/
            └── 1fps_12p5_0.pickle
            └── 1fps_12p5_1.pickle
            ...
        ├── val/
            └── 1fps.pickle
            └── 3fps.pickle
            ...
        ├── test/
            └── 1fps.pickle
            └── 3fps.pickle
            ...        
    └── classweights/
        ├── train/
            └── 1fps_12p5_0.pickle
            └── 1fps_12p5_1.pickle
                ...
    ...
    └── checkpoints/defaults/resnet_50/
        └── resnet50-19c8e357.pth
        └── moco_v2_800ep_pretrain.pth.tar
        └── simclr_rn50_800ep_simclr_8node_resnet_16_07_20.7e8feed1.torch
        └── swav_in1k_rn50_800ep_swav_8node_resnet_27_07_20.a0a6b676.torch
        └── dino_resnet50_pretrain.pth

Installation

You need Anaconda3 installed for the setup. We developed the code on Ubuntu 20.04 with Python 3.8, PyTorch 1.7.1, and CUDA 10.2, using V100 GPUs.

> cd $SelfSupSurg
> conda create -n selfsupsurg python=3.8 && conda activate selfsupsurg
# install dependencies 
(selfsupsurg)>conda install -y pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.2 -c pytorch 
(selfsupsurg)>pip install opencv-python
(selfsupsurg)>pip install openpyxl==3.0.7
(selfsupsurg)>pip install pandas==1.3.2
(selfsupsurg)>pip install scikit-learn
(selfsupsurg)>pip install easydict
(selfsupsurg)>pip install apex -f https://dl.fbaipublicfiles.com/vissl/packaging/apexwheels/py38_cu102_pyt171/download.html
(selfsupsurg)>cd $SelfSupSurg/ext_libs
(selfsupsurg)>git clone https://github.com/facebookresearch/ClassyVision.git && cd ClassyVision
(selfsupsurg)>git checkout 659d7f788c941a8c0d08dd74e198b66bd8afa7f5 && pip install -e .
(selfsupsurg)>cd ../ && git clone --recursive https://github.com/facebookresearch/vissl.git && cd ./vissl/
(selfsupsurg)>git checkout 65f2c8d0efdd675c68a0dfb110aef87b7bb27a2b
(selfsupsurg)>pip install --progress-bar off -r requirements.txt
(selfsupsurg)>pip install -e .[dev] && cd $SelfSupSurg
(selfsupsurg)>cp -r ./vissl/vissl/* $SelfSupSurg/ext_libs/vissl/vissl/

Modify `$SelfSupSurg/ext_libs/vissl/configs/config/dataset_catalog.json` by appending the following key/value pair to the end of the dictionary:

"surgery_datasets": {
    "train": ["<img_path>", "<lbl_path>"],
    "val": ["<img_path>", "<lbl_path>"],
    "test": ["<img_path>", "<lbl_path>"]
}
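
If you prefer not to edit the catalog by hand, the same change can be scripted. The sketch below is an illustration rather than repository code: `register_surgery_datasets` is a hypothetical helper, and it assumes the catalog file is a single plain JSON object. The image and label paths are supplied by the caller; the `<img_path>` and `<lbl_path>` placeholders above are not filled in here.

```python
import json
from pathlib import Path


def register_surgery_datasets(catalog_path, splits):
    """Add a "surgery_datasets" entry to a JSON dataset catalog.

    splits maps a split name ("train", "val", "test") to an
    (img_path, lbl_path) pair provided by the caller.
    """
    catalog_path = Path(catalog_path)
    catalog = json.loads(catalog_path.read_text())
    # Python dicts keep insertion order, so a new key lands at the end.
    catalog["surgery_datasets"] = {
        split: [img_path, lbl_path]
        for split, (img_path, lbl_path) in splits.items()
    }
    catalog_path.write_text(json.dumps(catalog, indent=4) + "\n")
    return catalog
```

The function rewrites the whole file with 4-space indentation, so the formatting of existing entries may change even though their content does not.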

Pre-training

Run the following code to pre-train the MoCo v2, SimCLR, SwAV, and DINO methods on the Cholec80 dataset with 4 GPUs.

# MoCo v2
(selfsupsurg)>cfg=hparams/cholec80/pre_training/cholec_to_cholec/series_01/h001.yaml
(selfsupsurg)>python main.py -hp $cfg -m self_supervised
# SimCLR
(selfsupsurg)>cfg=hparams/cholec80/pre_training/cholec_to_cholec/series_01/h002.yaml
(selfsupsurg)>python main.py -hp $cfg -m self_supervised
# SwAV
(selfsupsurg)>cfg=hparams/cholec80/pre_training/cholec_to_cholec/series_01/h003.yaml
(selfsupsurg)>python main.py -hp $cfg -m self_supervised
# DINO 
(selfsupsurg)>cfg=hparams/cholec80/pre_training/cholec_to_cholec/series_01/h004.yaml
(selfsupsurg)>python main.py -hp $cfg -m self_supervised

Model Weights for the pre-training experiments

Model	Model Weights
MoCo V2	download
SimCLR	download
SwAV	download
DINO	download

Downstream finetuning

First, either perform pre-training using the above scripts, or download the pre-trained weights and copy them into the appropriate directories, as shown below.

# MoCo v2
(selfsupsurg)>mkdir -p runs/cholec80/pre_training/cholec_to_cholec/series_01/run_001/ \
               && cp model_final_checkpoint_moco_v2_surg.torch runs/cholec80/pre_training/cholec_to_cholec/series_01/run_001/
# SimCLR
(selfsupsurg)>mkdir -p runs/cholec80/pre_training/cholec_to_cholec/series_01/run_002/ \
               && cp model_final_checkpoint_simclr_surg.torch runs/cholec80/pre_training/cholec_to_cholec/series_01/run_002/
# SwAV
(selfsupsurg)>mkdir -p runs/cholec80/pre_training/cholec_to_cholec/series_01/run_003/ \
               && cp model_final_checkpoint_swav_surg.torch runs/cholec80/pre_training/cholec_to_cholec/series_01/run_003/
# DINO 
(selfsupsurg)>mkdir -p runs/cholec80/pre_training/cholec_to_cholec/series_01/run_004/ \
               && cp model_final_checkpoint_dino_surg.torch runs/cholec80/pre_training/cholec_to_cholec/series_01/run_004/
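
The four copy commands above follow one pattern, which can be sketched in Python as below. This is an illustrative helper, not a script shipped with the repository: the name `place_weights` is hypothetical, and the file names and run directories are taken from the commands above. Weights that are not present in the download folder are skipped.

```python
import shutil
from pathlib import Path

# Run directory for each downloaded weight file, as in the commands above.
RUNS_SUBDIR = "runs/cholec80/pre_training/cholec_to_cholec/series_01"
WEIGHT_TO_RUN = {
    "model_final_checkpoint_moco_v2_surg.torch": "run_001",
    "model_final_checkpoint_simclr_surg.torch": "run_002",
    "model_final_checkpoint_swav_surg.torch": "run_003",
    "model_final_checkpoint_dino_surg.torch": "run_004",
}


def place_weights(download_dir, repo_root="."):
    """Copy downloaded weights into their run directories; return the new paths."""
    placed = []
    for filename, run in WEIGHT_TO_RUN.items():
        src = Path(download_dir) / filename
        if not src.is_file():
            continue  # this weight file was not downloaded; skip it
        dst_dir = Path(repo_root) / RUNS_SUBDIR / run
        dst_dir.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
        shutil.copy2(src, dst_dir / filename)
        placed.append(dst_dir / filename)
    return placed
```

For example, `place_weights("/path/to/downloads", repo_root=".")` run from `$SelfSupSurg` reproduces the effect of the shell commands for whichever weights are available.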

1. Surgical phase recognition (Linear Finetuning)

The config files for the surgical phase recognition linear finetuning experiments are in cholec80 pre-training init and imagenet init. The config files are organized as follows:

config_files

# config files for the proposed pre-training init from cholec80 are organized as follows:
├── cholec_to_cholec/series_01/test/phase
    ├── 100 #(100 % of cholec 80)
    │   └── 0 #(split 0)
    │       ├── h001.yaml # MoCo V2 Surg
    │       ├── h002.yaml # SimCLR Surg
    │       ├── h003.yaml # SwAV Surg
    │       └── h004.yaml # DINO Surg
    ├── 12.5 #(12.5 % of cholec 80 dataset)
    │   ├── 0 #(split 0)
    │   │   ├── h001.yaml # MoCo V2 Surg
    │   │   ├── h002.yaml # SimCLR Surg
    │   │   ├── h003.yaml # SwAV Surg
    │   │   └── h004.yaml # DINO Surg
    │   ├── 1 #(split 1)
    │   │   ├── h001.yaml # MoCo V2 Surg
    │   │   ├── h002.yaml # SimCLR Surg
    │   │   ├── h003.yaml # SwAV Surg
    │   │   └── h004.yaml # DINO Surg
    │   ├── 2 #(split 2)
    │   │   ├── h001.yaml # MoCo V2 Surg
    │   │   ├── h002.yaml # SimCLR Surg
    │   │   ├── h003.yaml # SwAV Surg
    │   │   └── h004.yaml # DINO Surg
    └── 25 #(25 % of cholec 80 dataset)
        ├── 0 #(split 0)
        │   ├── h001.yaml # MoCo V2 Surg
        │   ├── h002.yaml # SimCLR Surg
        │   ├── h003.yaml # SwAV Surg
        │   └── h004.yaml # DINO Surg
        ├── 1 #(split 1)
        │   ├── h001.yaml # MoCo V2 Surg
        │   ├── h002.yaml # SimCLR Surg
        │   ├── h003.yaml # SwAV Surg
        │   └── h004.yaml # DINO Surg
        ├── 2 #(split 2)
        │   ├── h001.yaml # MoCo V2 Surg
        │   ├── h002.yaml # SimCLR Surg
        │   ├── h003.yaml # SwAV Surg
        │   └── h004.yaml # DINO Surg
# config files for the baselines imagenet to cholec80 are organized as follows:
├── imagenet_to_cholec/series_01/test/phase
    ├── 100 #(100 % of cholec 80)
    │   └── 0 #(split 0)
    │       ├── h001.yaml # Fully-supervised imagenet
    │       ├── h002.yaml # MoCo V2 imagenet
    │       ├── h003.yaml # SimCLR imagenet
    │       ├── h004.yaml # SwAV imagenet
    │       └── h005.yaml # DINO imagenet
    ├── 12.5 #(12.5 % of cholec 80 dataset)
    │   ├── 0 #(split 0)
    │   │   ├── h001.yaml # Fully-supervised imagenet
    │   │   ├── h002.yaml # MoCo V2 imagenet
    │   │   ├── h003.yaml # SimCLR  imagenet
    │   │   ├── h004.yaml # SwAV  imagenet
    │   │   └── h005.yaml # DINO imagenet
    │   ├── 1 #(split 1)
    │   │   ├── h001.yaml # Fully-supervised imagenet 
    │   │   ├── h002.yaml # MoCo V2 imagenet
    │   │   ├── h003.yaml # SimCLR imagenet
    │   │   ├── h004.yaml # SwAV imagenet
    │   │   └── h005.yaml # DINO imagenet
    │   ├── 2 #(split 2)
    │   │   ├── h001.yaml # Fully-supervised imagenet 
    │   │   ├── h002.yaml # MoCo V2 imagenet
    │   │   ├── h003.yaml # SimCLR imagenet
    │   │   ├── h004.yaml # SwAV imagenet
    │   │   └── h005.yaml # DINO imagenet
    └── 25 #(25 % of cholec 80 dataset)
        ├── 0 #(split 0)
        │   ├── h001.yaml # Fully-supervised imagenet
        │   ├── h002.yaml # MoCo V2 imagenet
        │   ├── h003.yaml # SimCLR imagenet
        │   ├── h004.yaml # SwAV imagenet
        │   └── h005.yaml # DINO imagenet
        ├── 1 #(split 1)
        │   ├── h001.yaml # Fully-supervised imagenet
        │   ├── h002.yaml # MoCo V2 imagenet
        │   ├── h003.yaml # SimCLR imagenet
        │   ├── h004.yaml # SwAV imagenet
        │   └── h005.yaml # DINO imagenet
        ├── 2 #(split 2)
        │   ├── h001.yaml # Fully-supervised imagenet
        │   ├── h002.yaml # MoCo V2 imagenet
        │   ├── h003.yaml # SimCLR imagenet
        │   ├── h004.yaml # SwAV imagenet
        │   └── h005.yaml # DINO imagenet

Example commands for surgical phase linear fine-tuning, which uses 4 GPUs for training:

# Example 1, run the following command for linear fine-tuning, initialized with MoCo v2 weights 
# on 25% of cholec80 data (split 0).
(selfsupsurg)>cfg=hparams/cholec80/finetuning/cholec_to_cholec/series_01/test/phase/25/0/h001.yaml
(selfsupsurg)>python main.py -hp $cfg -m supervised

# Example 2, run the following command for linear fine-tuning, initialized with SimCLR weights 
# on 12.5% of cholec80 data (split 1).
(selfsupsurg)>cfg=hparams/cholec80/finetuning/cholec_to_cholec/series_01/test/phase/12.5/1/h002.yaml
(selfsupsurg)>python main.py -hp $cfg -m supervised

# Example 3, run the following command for linear fine-tuning, initialized with 
# imagenet MoCo v2 weights on 12.5% of cholec80 data (split 2).
(selfsupsurg)>cfg=hparams/cholec80/finetuning/imagenet_to_cholec/series_01/test/phase/12.5/2/h002.yaml
(selfsupsurg)>python main.py -hp $cfg -m supervised
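
The config paths in these examples follow directly from the directory tree above: initialization, task, data percentage, split, then a per-method file. A minimal sketch of that mapping is given below. The helper `finetuning_config` and the short method keys (`"moco_v2"`, `"supervised"`, and so on) are hypothetical names chosen for illustration; the `h00x` numbering and the available splits are copied from the tree, and the `phase_tcn` and `tools` task folders are the ones used by the later sections of this README.

```python
HPARAMS_ROOT = "hparams/cholec80/finetuning"

# Config file per method, as listed in the trees above.
CONFIG_IDS = {
    "cholec_to_cholec": {
        "moco_v2": "h001", "simclr": "h002", "swav": "h003", "dino": "h004",
    },
    "imagenet_to_cholec": {
        "supervised": "h001", "moco_v2": "h002", "simclr": "h003",
        "swav": "h004", "dino": "h005",
    },
}

# Splits available for each percentage of labeled Cholec80 data.
SPLITS = {"100": (0,), "12.5": (0, 1, 2), "25": (0, 1, 2)}
TASKS = ("phase", "phase_tcn", "tools")


def finetuning_config(init, task, percent, split, method):
    """Build the path of a fine-tuning config file from its coordinates."""
    if init not in CONFIG_IDS:
        raise ValueError(f"unknown initialization: {init}")
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    if method not in CONFIG_IDS[init]:
        raise ValueError(f"{method} is not available for {init}")
    if split not in SPLITS.get(percent, ()):
        raise ValueError(f"no split {split} for {percent}% of the data")
    config_id = CONFIG_IDS[init][method]
    return f"{HPARAMS_ROOT}/{init}/series_01/test/{task}/{percent}/{split}/{config_id}.yaml"
```

With this helper, `finetuning_config("cholec_to_cholec", "phase", "25", 0, "moco_v2")` yields the `cfg` value used in Example 1, which can then be passed to `python main.py -hp $cfg -m supervised`.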

2. Surgical phase recognition (TCN Finetuning)

The config files for the surgical phase recognition TCN finetuning experiments are in cholec80 pre-training init and imagenet init. The config files are organized as follows:

config_files

# config files for the proposed pre-training init from cholec80 are organized as follows:
├── cholec_to_cholec/series_01/test/phase_tcn
    ├── 100 #(100 % of cholec 80)
    │   └── 0 #(split 0)
    │       ├── h001.yaml # MoCo V2 Surg
    │       ├── h002.yaml # SimCLR Surg
    │       ├── h003.yaml # SwAV Surg
    │       └── h004.yaml # DINO Surg
    ├── 12.5 #(12.5 % of cholec 80 dataset)
    │   ├── 0 #(split 0)
    │   │   ├── h001.yaml # MoCo V2 Surg
    │   │   ├── h002.yaml # SimCLR Surg
    │   │   ├── h003.yaml # SwAV Surg
    │   │   └── h004.yaml # DINO Surg
    │   ├── 1 #(split 1)
    │   │   ├── h001.yaml # MoCo V2 Surg
    │   │   ├── h002.yaml # SimCLR Surg
    │   │   ├── h003.yaml # SwAV Surg
    │   │   └── h004.yaml # DINO Surg
    │   ├── 2 #(split 2)
    │   │   ├── h001.yaml # MoCo V2 Surg
    │   │   ├── h002.yaml # SimCLR Surg
    │   │   ├── h003.yaml # SwAV Surg
    │   │   └── h004.yaml # DINO Surg
    └── 25 #(25 % of cholec 80 dataset)
        ├── 0 #(split 0)
        │   ├── h001.yaml # MoCo V2 Surg
        │   ├── h002.yaml # SimCLR Surg
        │   ├── h003.yaml # SwAV Surg
        │   └── h004.yaml # DINO Surg
        ├── 1 #(split 1)
        │   ├── h001.yaml # MoCo V2 Surg
        │   ├── h002.yaml # SimCLR Surg
        │   ├── h003.yaml # SwAV Surg
        │   └── h004.yaml # DINO Surg
        ├── 2 #(split 2)
        │   ├── h001.yaml # MoCo V2 Surg
        │   ├── h002.yaml # SimCLR Surg
        │   ├── h003.yaml # SwAV Surg
        │   └── h004.yaml # DINO Surg
# config files for the baselines imagenet to cholec80 are organized as follows:
├── imagenet_to_cholec/series_01/test/phase_tcn
    ├── 100 #(100 % of cholec 80)
    │   └── 0 #(split 0)
    │       ├── h001.yaml # Fully-supervised imagenet
    │       ├── h002.yaml # MoCo V2 imagenet
    │       ├── h003.yaml # SimCLR imagenet
    │       ├── h004.yaml # SwAV imagenet
    │       └── h005.yaml # DINO imagenet
    ├── 12.5 #(12.5 % of cholec 80 dataset)
    │   ├── 0 #(split 0)
    │   │   ├── h001.yaml # Fully-supervised imagenet
    │   │   ├── h002.yaml # MoCo V2 imagenet
    │   │   ├── h003.yaml # SimCLR  imagenet
    │   │   ├── h004.yaml # SwAV  imagenet
    │   │   └── h005.yaml # DINO imagenet
    │   ├── 1 #(split 1)
    │   │   ├── h001.yaml # Fully-supervised imagenet 
    │   │   ├── h002.yaml # MoCo V2 imagenet
    │   │   ├── h003.yaml # SimCLR imagenet
    │   │   ├── h004.yaml # SwAV imagenet
    │   │   └── h005.yaml # DINO imagenet
    │   ├── 2 #(split 2)
    │   │   ├── h001.yaml # Fully-supervised imagenet 
    │   │   ├── h002.yaml # MoCo V2 imagenet
    │   │   ├── h003.yaml # SimCLR imagenet
    │   │   ├── h004.yaml # SwAV imagenet
    │   │   └── h005.yaml # DINO imagenet
    └── 25 #(25 % of cholec 80 dataset)
        ├── 0 #(split 0)
        │   ├── h001.yaml # Fully-supervised imagenet
        │   ├── h002.yaml # MoCo V2 imagenet
        │   ├── h003.yaml # SimCLR imagenet
        │   ├── h004.yaml # SwAV imagenet
        │   └── h005.yaml # DINO imagenet
        ├── 1 #(split 1)
        │   ├── h001.yaml # Fully-supervised imagenet
        │   ├── h002.yaml # MoCo V2 imagenet
        │   ├── h003.yaml # SimCLR imagenet
        │   ├── h004.yaml # SwAV imagenet
        │   └── h005.yaml # DINO imagenet
        ├── 2 #(split 2)
        │   ├── h001.yaml # Fully-supervised imagenet
        │   ├── h002.yaml # MoCo V2 imagenet
        │   ├── h003.yaml # SimCLR imagenet
        │   ├── h004.yaml # SwAV imagenet
        │   └── h005.yaml # DINO imagenet

Example commands for TCN fine-tuning. We first extract the features for the train, val, and test sets, and then perform the TCN fine-tuning.

# Example 1, run the following command for TCN fine-tuning, initialized with MoCo v2 weights 
# on 25% of cholec80 data (split 0).
# 1) feature extraction for the train, val and test set
(selfsupsurg)>cfg=hparams/cholec80/finetuning/cholec_to_cholec/series_01/test/phase/25/0/h001.yaml
(selfsupsurg)>python main.py -hp $cfg -m  feature_extraction -s train -f Trunk
(selfsupsurg)>python main.py -hp $cfg -m  feature_extraction -s val -f Trunk
(selfsupsurg)>python main.py -hp $cfg -m  feature_extraction -s test -f Trunk                            
# 2) TCN fine-tuning        
(selfsupsurg)>cfg=hparams/cholec80/finetuning/cholec_to_cholec/series_01/test/phase_tcn/25/0/h001.yaml
(selfsupsurg)>python main_ft_phase_tcn.py -hp $cfg -t test

# Example 2, run the following command for TCN fine-tuning, initialized with SimCLR weights 
# on 12.5% of cholec80 data (split 1).
# 1) feature extraction for the train, val and test set
(selfsupsurg)>cfg=hparams/cholec80/finetuning/cholec_to_cholec/series_01/test/phase/12.5/1/h002.yaml
(selfsupsurg)>python main.py -hp $cfg -m  feature_extraction -s train -f Trunk
(selfsupsurg)>python main.py -hp $cfg -m  feature_extraction -s val -f Trunk
(selfsupsurg)>python main.py -hp $cfg -m  feature_extraction -s test -f Trunk                            
# 2) TCN fine-tuning        
(selfsupsurg)>cfg=hparams/cholec80/finetuning/cholec_to_cholec/series_01/test/phase_tcn/12.5/1/h002.yaml
(selfsupsurg)>python main_ft_phase_tcn.py -hp $cfg -t test

# Example 3, run the following command for TCN fine-tuning, initialized with imagenet MoCo v2 weights 
# on 12.5% of cholec80 data (split 2).
# 1) feature extraction for the train, val and test set
(selfsupsurg)>cfg=hparams/cholec80/finetuning/imagenet_to_cholec/series_01/test/phase/12.5/2/h002.yaml
(selfsupsurg)>python main.py -hp $cfg -m  feature_extraction -s train -f Trunk
(selfsupsurg)>python main.py -hp $cfg -m  feature_extraction -s val -f Trunk
(selfsupsurg)>python main.py -hp $cfg -m  feature_extraction -s test -f Trunk                            
# 2) TCN fine-tuning        
(selfsupsurg)>cfg=hparams/cholec80/finetuning/imagenet_to_cholec/series_01/test/phase_tcn/12.5/2/h002.yaml
(selfsupsurg)>python main_ft_phase_tcn.py -hp $cfg -t test
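
Each TCN example above runs the same four commands: three feature-extraction passes with the linear `phase` config, followed by one TCN fine-tuning run with the matching `phase_tcn` config. The sketch below assembles that sequence for a given setting. The helper name `tcn_commands` is hypothetical; the scripts, flags, and paths are exactly those shown in the examples, and the function only builds the command lists without executing anything.

```python
import shlex


def tcn_commands(init, percent, split, config_file):
    """Return the four commands for one TCN fine-tuning experiment.

    init is "cholec_to_cholec" or "imagenet_to_cholec"; config_file is
    a name such as "h001.yaml" from the trees above.
    """
    base = f"hparams/cholec80/finetuning/{init}/series_01/test"
    feature_cfg = f"{base}/phase/{percent}/{split}/{config_file}"
    tcn_cfg = f"{base}/phase_tcn/{percent}/{split}/{config_file}"
    # 1) feature extraction for the train, val and test sets
    commands = [
        ["python", "main.py", "-hp", feature_cfg,
         "-m", "feature_extraction", "-s", subset, "-f", "Trunk"]
        for subset in ("train", "val", "test")
    ]
    # 2) TCN fine-tuning on the extracted features
    commands.append(["python", "main_ft_phase_tcn.py", "-hp", tcn_cfg, "-t", "test"])
    return commands


if __name__ == "__main__":
    # Print the commands for Example 1 (MoCo v2, 25% of the data, split 0).
    for command in tcn_commands("cholec_to_cholec", "25", 0, "h001.yaml"):
        print(shlex.join(command))
```

Returning argument lists rather than one shell string keeps the order explicit and lets each list be passed to `subprocess.run(command, check=True)` so that a failed extraction step stops the pipeline before the TCN run.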

3. Surgical tool recognition

The config files for the surgical tool recognition experiments are in cholec80 pre-training init and imagenet init. The config files are organized as follows:

config_files

# config files for the proposed pre-training init from cholec80 are organized as follows:
├── cholec_to_cholec/series_01/test/tools
    ├── 100 #(100 % of cholec 80)
    │   └── 0 #(split 0)
    │       ├── h001.yaml # MoCo V2 Surg
    │       ├── h002.yaml # SimCLR Surg
    │       ├── h003.yaml # SwAV Surg
    │       └── h004.yaml # DINO Surg
    ├── 12.5 #(12.5 % of cholec 80 dataset)
    │   ├── 0 #(split 0)
    │   │   ├── h001.yaml # MoCo V2 Surg
    │   │   ├── h002.yaml # SimCLR Surg
    │   │   ├── h003.yaml # SwAV Surg
    │   │   └── h004.yaml # DINO Surg
    │   ├── 1 #(split 1)
    │   │   ├── h001.yaml # MoCo V2 Surg
    │   │   ├── h002.yaml # SimCLR Surg
    │   │   ├── h003.yaml # SwAV Surg
    │   │   └── h004.yaml # DINO Surg
    │   ├── 2 #(split 2)
    │   │   ├── h001.yaml # MoCo V2 Surg
    │   │   ├── h002.yaml # SimCLR Surg
    │   │   ├── h003.yaml # SwAV Surg
    │   │   └── h004.yaml # DINO Surg
    └── 25 #(25 % of cholec 80 dataset)
        ├── 0 #(split 0)
        │   ├── h001.yaml # MoCo V2 Surg
        │   ├── h002.yaml # SimCLR Surg
        │   ├── h003.yaml # SwAV Surg
        │   └── h004.yaml # DINO Surg
        ├── 1 #(split 1)
        │   ├── h001.yaml # MoCo V2 Surg
        │   ├── h002.yaml # SimCLR Surg
        │   ├── h003.yaml # SwAV Surg
        │   └── h004.yaml # DINO Surg
        ├── 2 #(split 2)
        │   ├── h001.yaml # MoCo V2 Surg
        │   ├── h002.yaml # SimCLR Surg
        │   ├── h003.yaml # SwAV Surg
        │   └── h004.yaml # DINO Surg
# config files for the baselines imagenet to cholec80 are organized as follows:
├── imagenet_to_cholec/series_01/test/tools
    ├── 100 #(100 % of cholec 80)
    │   └── 0 #(split 0)
    │       ├── h001.yaml # Fully-supervised imagenet
    │       ├── h002.yaml # MoCo V2 imagenet
    │       ├── h003.yaml # SimCLR imagenet
    │       ├── h004.yaml # SwAV imagenet
    │       └── h005.yaml # DINO imagenet
    ├── 12.5 #(12.5 % of cholec 80 dataset)
    │   ├── 0 #(split 0)
    │   │   ├── h001.yaml # Fully-supervised imagenet
    │   │   ├── h002.yaml # MoCo V2 imagenet
    │   │   ├── h003.yaml # SimCLR  imagenet
    │   │   ├── h004.yaml # SwAV  imagenet
    │   │   └── h005.yaml # DINO imagenet
    │   ├── 1 #(split 1)
    │   │   ├── h001.yaml # Fully-supervised imagenet 
    │   │   ├── h002.yaml # MoCo V2 imagenet
    │   │   ├── h003.yaml # SimCLR imagenet
    │   │   ├── h004.yaml # SwAV imagenet
    │   │   └── h005.yaml # DINO imagenet
    │   ├── 2 #(split 2)
    │   │   ├── h001.yaml # Fully-supervised imagenet 
    │   │   ├── h002.yaml # MoCo V2 imagenet
    │   │   ├── h003.yaml # SimCLR imagenet
    │   │   ├── h004.yaml # SwAV imagenet
    │   │   └── h005.yaml # DINO imagenet
    └── 25 #(25 % of cholec 80 dataset)
        ├── 0 #(split 0)
        │   ├── h001.yaml # Fully-supervised imagenet
        │   ├── h002.yaml # MoCo V2 imagenet
        │   ├── h003.yaml # SimCLR imagenet
        │   ├── h004.yaml # SwAV imagenet
        │   └── h005.yaml # DINO imagenet
        ├── 1 #(split 1)
        │   ├── h001.yaml # Fully-supervised imagenet
        │   ├── h002.yaml # MoCo V2 imagenet
        │   ├── h003.yaml # SimCLR imagenet
        │   ├── h004.yaml # SwAV imagenet
        │   └── h005.yaml # DINO imagenet
        ├── 2 #(split 2)
        │   ├── h001.yaml # Fully-supervised imagenet
        │   ├── h002.yaml # MoCo V2 imagenet
        │   ├── h003.yaml # SimCLR imagenet
        │   ├── h004.yaml # SwAV imagenet
        │   └── h005.yaml # DINO imagenet

Example commands for surgical tool recognition linear fine-tuning, which uses 4 GPUs for training:

# Example 1, run the following command for linear fine-tuning, initialized with MoCo v2 weights 
# on 25% of cholec80 data (split 0).
(selfsupsurg)>cfg=hparams/cholec80/finetuning/cholec_to_cholec/series_01/test/tools/25/0/h001.yaml
(selfsupsurg)>python main.py -hp $cfg -m supervised

# Example 2, run the following command for linear fine-tuning, initialized with SimCLR weights 
# on 12.5% of cholec80 data (split 1).
(selfsupsurg)>cfg=hparams/cholec80/finetuning/cholec_to_cholec/series_01/test/tools/12.5/1/h002.yaml
(selfsupsurg)>python main.py -hp $cfg -m supervised

# Example 3, run the following command for linear fine-tuning, initialized with 
# imagenet MoCo v2 weights on 12.5% of cholec80 data (split 2).
(selfsupsurg)>cfg=hparams/cholec80/finetuning/imagenet_to_cholec/series_01/test/tools/12.5/2/h002.yaml
(selfsupsurg)>python main.py -hp $cfg -m supervised

4. Evaluation

Example command to evaluate all the experiments and collect the results

# computes evaluation metrics for all the experiments and saves the results in runs/metrics_<phase/tool>.csv
(selfsupsurg)>python utils/generate_test_results.py

Citation

@article{ramesh2023dissecting,
  title={Dissecting self-supervised learning methods for surgical computer vision},
  author={Ramesh, Sanat and Srivastav, Vinkle and Alapatt, Deepak and Yu, Tong and Murali, Aditya and Sestini, Luca and Nwoye, Chinedu Innocent and Hamoud, Idris and Sharma, Saurav and Fleurentin, Antoine and others},
  journal={Medical Image Analysis},
  pages={102844},
  year={2023},
  publisher={Elsevier}
}

References

The project uses VISSL. We thank the authors of VISSL for releasing the library. If you use VISSL, consider citing it using the following BibTeX entry.

@misc{goyal2021vissl,
  author =       {Priya Goyal and Quentin Duval and Jeremy Reizenstein and Matthew Leavitt and Min Xu and
                  Benjamin Lefaudeux and Mannat Singh and Vinicius Reis and Mathilde Caron and Piotr Bojanowski and
                  Armand Joulin and Ishan Misra},
  title =        {VISSL},
  howpublished = {\url{https://github.com/facebookresearch/vissl}},
  year =         {2021}
}

The project also leverages the following research work. We thank the authors for releasing their code.

TeCNO

License

The code, models, and datasets are available for non-commercial scientific research purposes as defined in the CC BY-NC-SA 4.0 license. By downloading and using this code, you agree to the terms in the LICENSE. Third-party code is subject to its respective licenses.
