VISSL Model Zoo and Benchmarks

VISSL provides reference implementations of a large number of self-supervised approaches, along with a suite of benchmark tasks for quickly evaluating the representation quality of models trained with these approaches under a standard evaluation setup. This document lists the available self-supervised models and benchmarks them on a standard task: evaluating a linear classifier on ImageNet-1K. All the models can be downloaded from the provided links.

Torchvision and VISSL

VISSL is 100% compatible with TorchVision ResNet models. It's easy to use torchvision models in VISSL and to use VISSL models in torchvision.

Converting VISSL to Torchvision

All the ResNe(X)t models in VISSL can be converted to Torchvision weights. This involves simply removing the _feature_blocks. prefix from all the weights. VISSL provides a convenience script for this:

python extra_scripts/convert_vissl_to_torchvision.py \
    --model_url_or_file <input_model>.pth  \
    --output_dir /path/to/output/dir/ \
    --output_name <my_converted_model>.torch
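
To illustrate the renaming that the conversion relies on, here is a minimal sketch and not the script's actual implementation. It assumes a plain dictionary that maps weight names to values, and it writes the prefix as `_feature_blocks.` to match the checkpoint notes further down in this document. The function name and the toy keys are hypothetical.

```python
def strip_feature_blocks_prefix(state_dict, prefix="_feature_blocks."):
    """Return a copy of state_dict with `prefix` removed from the start of each key."""
    converted = {}
    for name, value in state_dict.items():
        # Keys without the prefix are copied through unchanged.
        new_name = name[len(prefix):] if name.startswith(prefix) else name
        converted[new_name] = value
    return converted


# Toy stand-in for trunk weights; real values would be tensors.
vissl_trunk = {
    "_feature_blocks.conv1.weight": "w0",
    "_feature_blocks.layer1.0.conv1.weight": "w1",
}
torchvision_style = strip_feature_blocks_prefix(vissl_trunk)
print(sorted(torchvision_style))
```

For real checkpoints, the convenience script above remains the supported route; the sketch only shows the key renaming idea.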

Converting Torchvision to VISSL

All the ResNe(X)t models in Torchvision can be loaded directly in VISSL. This involves simply setting the REMOVE_PREFIX and APPEND_PREFIX options in the config file, following the instructions here. Also, see the example below for how torchvision models are loaded.
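
The effect of these two options on weight names can be pictured with a small sketch. It is an illustration under stated assumptions, not VISSL's loading code: it assumes REMOVE_PREFIX strips a leading string from each key and APPEND_PREFIX is then added in front of the result. The helper names and toy keys are hypothetical, and the prefix values are borrowed from the checkpoint notes elsewhere in this document; which values a given checkpoint needs depends on that checkpoint.

```python
def rewrite_key(name, remove_prefix="", append_prefix=""):
    """Drop `remove_prefix` from the start of `name` if present, then prepend `append_prefix`."""
    if remove_prefix and name.startswith(remove_prefix):
        name = name[len(remove_prefix):]
    return append_prefix + name


def rewrite_state_dict(state_dict, remove_prefix="", append_prefix=""):
    """Apply rewrite_key to every key of a name -> value mapping."""
    return {
        rewrite_key(name, remove_prefix, append_prefix): value
        for name, value in state_dict.items()
    }


# Toy stand-in for externally trained weights; real values would be tensors.
external = {"module.conv1.weight": "w0", "module.layer1.0.conv1.weight": "w1"}
renamed = rewrite_state_dict(
    external,
    remove_prefix="module.",  # assumed example value
    append_prefix="trunk.base_model._feature_blocks.",  # assumed example value
)
print(sorted(renamed))
```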

Models

Because VISSL is 100% compatible with TorchVision ResNet models, you can benchmark these models using VISSL's benchmark suite. See the docs for how to run various benchmarks.

Supervised

To reproduce the numbers below, the experiment configuration is provided in json format for each model here.

| Method | Model | PreTrain dataset | ImageNet top-1 acc. (%) | URL |
|---|---|---|---|---|
| Supervised | RN50 - Torchvision | ImageNet | 76.1 | model |
| Supervised | RN101 - Torchvision | ImageNet | 77.21 | model |
| Supervised | RN50 - Caffe2 | ImageNet | 75.88 | model |
| Supervised | RN50 - Caffe2 | Places205 | 58.49 | model |
| Supervised | Alexnet BVLC - Caffe2 | ImageNet | 49.54 | model |
| Supervised | RN50 - VISSL - 105 epochs | ImageNet | 75.45 | model |
| Supervised | ViT/B16 - 90 epochs (*) | ImageNet-22K | 83.38 | model |
| Supervised | RegNetY-64Gf - BGR input | ImageNet | 80.55 | model |
| Supervised | RegNetY-128Gf - BGR input | ImageNet | 80.57 | model |

(*) To be loaded by VISSL, this specific ViT/B16 checkpoint requires the following options to be added on the command line:

    config.MODEL.WEIGHTS_INIT.APPEND_PREFIX=trunk.base_model.
    config.MODEL.WEIGHTS_INIT.STATE_DICT_KEY_NAME=classy_state_dict
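
The checkpoint notes in this document set STATE_DICT_KEY_NAME either to classy_state_dict or to an empty string. The sketch below shows one plausible reading of that option, and it is an assumption rather than VISSL's code: a non-empty name selects the top-level checkpoint entry that holds the weights, while an empty string means the checkpoint itself is the weight mapping. The function name and toy checkpoints are hypothetical.

```python
def select_state_dict(checkpoint, state_dict_key_name=""):
    """Pick the weight mapping out of a loaded checkpoint dictionary."""
    if state_dict_key_name:
        if state_dict_key_name not in checkpoint:
            raise KeyError(f"'{state_dict_key_name}' not found in checkpoint")
        return checkpoint[state_dict_key_name]
    # Empty name: assume the checkpoint is already the weight mapping.
    return checkpoint


# Toy stand-ins; real values would be tensors.
weights = {"conv1.weight": "w0"}
wrapped_checkpoint = {"classy_state_dict": weights}

from_wrapped = select_state_dict(wrapped_checkpoint, "classy_state_dict")
from_bare = select_state_dict(weights, "")
print(from_wrapped == from_bare)
```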

Semi-weakly and Semi-supervised

To reproduce the numbers below, the experiment configuration is provided in json format for each model here.

| Method | Model | PreTrain dataset | ImageNet top-1 acc. (%) | URL |
|---|---|---|---|---|
| Semi-supervised | RN50 | YFCC100M - ImageNet | 79.2 | model |
| Semi-weakly supervised | RN50 | Public Instagram Images - ImageNet | 81.06 | model |

Jigsaw

To reproduce the numbers below, the experiment configuration is provided in json format for each model here.

| Method | Model | PreTrain dataset | ImageNet top-1 acc. (%) | URL |
|---|---|---|---|---|
| Jigsaw | RN50 - 100 permutations | ImageNet-1K | 48.57 | model |
| Jigsaw | RN50 - 2K permutations | ImageNet-1K | 46.73 | model |
| Jigsaw | RN50 - 10K permutations | ImageNet-1K | 48.11 | model |
| Jigsaw | RN50 - 2K permutations | ImageNet-22K | 44.84 | model |
| Jigsaw | RN50 - Goyal'19 | ImageNet-1K | 46.58 | model |
| Jigsaw | RN50 - Goyal'19 | ImageNet-22K | 53.09 | model |
| Jigsaw | RN50 - Goyal'19 | YFCC100M | 51.37 | model |
| Jigsaw | AlexNet - Goyal'19 | ImageNet-1K | 34.82 | model |
| Jigsaw | AlexNet - Goyal'19 | ImageNet-22K | 37.5 | model |
| Jigsaw | AlexNet - Goyal'19 | YFCC100M | 37.01 | model |

Colorization

To reproduce the numbers below, the experiment configuration is provided in json format for each model here.

| Method | Model | PreTrain dataset | ImageNet top-1 acc. (%) | URL |
|---|---|---|---|---|
| Colorization | RN50 - Goyal'19 | ImageNet-1K | 40.11 | model |
| Colorization | RN50 - Goyal'19 | ImageNet-22K | 49.24 | model |
| Colorization | RN50 - Goyal'19 | YFCC100M | 47.46 | model |
| Colorization | AlexNet - Goyal'19 | ImageNet-1K | 30.39 | model |
| Colorization | AlexNet - Goyal'19 | ImageNet-22K | 36.83 | model |
| Colorization | AlexNet - Goyal'19 | YFCC100M | 34.19 | model |

RotNet

To reproduce the numbers below, the experiment configuration is provided in json format for each model here.

| Method | Model | PreTrain dataset | ImageNet top-1 acc. (%) | URL |
|---|---|---|---|---|
| RotNet | AlexNet official | ImageNet-1K | 39.51 | model |
| RotNet | RN50 - 105 epochs | ImageNet-1K | 48.2 | model |
| RotNet | RN50 - 105 epochs | ImageNet-22K | 54.89 | model |

DeepCluster

To reproduce the numbers below, the experiment configuration is provided in json format for each model here.

| Method | Model | PreTrain dataset | ImageNet top-1 acc. (%) | URL |
|---|---|---|---|---|
| DeepCluster | AlexNet official | ImageNet-1K | 37.88 | model |

ClusterFit

To reproduce the numbers below, the experiment configuration is provided in json format for each model here.

| Method | Model | PreTrain dataset | ImageNet top-1 acc. (%) | URL |
|---|---|---|---|---|
| ClusterFit | RN50 - 105 epochs - 16K clusters from RotNet | ImageNet-1K | 53.63 | model |

NPID

To reproduce the numbers below, the experiment configuration is provided in json format for each model here.

| Method | Model | PreTrain dataset | ImageNet top-1 acc. (%) | URL |
|---|---|---|---|---|
| NPID | RN50 official oldies | ImageNet-1K | 54.99 | model |
| NPID | RN50 - 4k negatives - 200 epochs - VISSL | ImageNet-1K | 52.73 | model |

NPID++

To reproduce the numbers below, the experiment configuration is provided in json format for each model here.

| Method | Model | PreTrain dataset | ImageNet top-1 acc. (%) | URL |
|---|---|---|---|---|
| NPID++ | RN50 - 32k negatives - 800 epochs - cosine LR | ImageNet-1K | 56.68 | model |
| NPID++ | RN50-w2 - 32k negatives - 800 epochs - cosine LR | ImageNet-1K | 62.73 | model |

PIRL

To reproduce the numbers below, the experiment configuration is provided in json format for each model here.

| Method | Model | PreTrain dataset | ImageNet top-1 acc. (%) | URL |
|---|---|---|---|---|
| PIRL | RN50 - 200 epochs | ImageNet-1K | 62.55 | model |
| PIRL | RN50 - 800 epochs | ImageNet-1K | 64.29 | model |

NOTE: Please see projects/PIRL/README.md for more PIRL models provided by authors.

SimCLR

To reproduce the numbers below, the experiment configuration is provided in json format for each model here.

| Method | Model | PreTrain dataset | ImageNet top-1 acc. (%) | URL |
|---|---|---|---|---|
| SimCLR | RN50 - 100 epochs | ImageNet-1K | 64.4 | model |
| SimCLR | RN50 - 200 epochs | ImageNet-1K | 66.61 | model |
| SimCLR | RN50 - 400 epochs | ImageNet-1K | 67.71 | model |
| SimCLR | RN50 - 800 epochs | ImageNet-1K | 69.68 | model |
| SimCLR | RN50 - 1000 epochs | ImageNet-1K | 68.8 | model |
| SimCLR | RN50-w2 - 100 epochs | ImageNet-1K | 69.82 | model |
| SimCLR | RN50-w2 - 1000 epochs | ImageNet-1K | 73.84 | model |
| SimCLR | RN50-w4 - 1000 epochs | ImageNet-1K | 71.61 | model |
| SimCLR | RN101 - 100 epochs | ImageNet-1K | 62.76 | model |
| SimCLR | RN101 - 1000 epochs | ImageNet-1K | 71.56 | model |

SimCLRv2

The following models are converted from the TensorFlow format of the official repository to VISSL compatible format.

| Method | Model | PreTrain dataset | ImageNet top-1 acc. (%) | URL |
|---|---|---|---|---|
| SimCLRv2 | RN152-w3-sk SimCLRv2 repository | ImageNet-1K | 80.0 | model |

BYOL

The following models are converted from the TensorFlow format of the official repository to VISSL compatible format.

| Method | Model | PreTrain dataset | ImageNet top-1 acc. (%) | URL |
|---|---|---|---|---|
| BYOL | RN200-w2 BYOL repository (*) | ImageNet-1K | 78.34 | model |

(*) To be loaded correctly by VISSL, this specific checkpoint requires the following command line options:

    config.MODEL.WEIGHTS_INIT.APPEND_PREFIX=trunk.base_model._feature_blocks.
    config.MODEL.WEIGHTS_INIT.STATE_DICT_KEY_NAME=''

DeepClusterV2

To reproduce the numbers below, the experiment configuration is provided in json format for each model here.

| Method | Model | PreTrain dataset | ImageNet top-1 acc. (%) | URL |
|---|---|---|---|---|
| DeepClusterV2 | RN50 - 400 epochs - 2x224 | ImageNet-1K | 70.01 | model |
| DeepClusterV2 | RN50 - 400 epochs - 2x160+4x96 | ImageNet-1K | 74.32 | model |
| DeepClusterV2 | RN50 - 800 epochs - 2x224+6x96 | ImageNet-1K | 75.18 | model |

SwAV

To reproduce the numbers below, the experiment configuration is provided in json format for each model here.

Linear evaluation results vary somewhat when the same evaluation is run several times and when a SwAV model is pre-trained several times. The numbers reported below are from a single run.

| Method | Model | PreTrain dataset | ImageNet top-1 linear acc. (%) | URL |
|---|---|---|---|---|
| SwAV | RN50 - 100 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 71.99 | model |
| SwAV | RN50 - 200 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 73.85 | model |
| SwAV | RN50 - 400 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 74.81 | model |
| SwAV | RN50 - 800 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 74.92 | model |
| SwAV | RN50 - 200 epochs - 2x224+6x96 - 256 batch-size | ImageNet-1K | 73.07 | model |
| SwAV | RN50 - 400 epochs - 2x224+6x96 - 256 batch-size | ImageNet-1K | 74.3 | model |
| SwAV | RN50 - 400 epochs - 2x224 - 4096 batch-size | ImageNet-1K | 69.53 | model |
| SwAV | RN50-w2 - 400 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 77.01 | model |
| SwAV | RN50-w4 - 400 epochs - 2x224+6x96 - 2560 batch-size | ImageNet-1K | 77.03 | model |
| SwAV | RN50-w5 - 300 epochs - 2x224+6x96 - 2560 batch-size (*) | ImageNet-1K | 78.5 | model |
| SwAV | RegNetY-16Gf - 800 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 76.15 | model |
| SwAV | RegNetY-128Gf - 400 epochs - 2x224+6x96 - 4096 batch-size | ImageNet-1K | 78.36 | model |

NOTE: Please see projects/SwAV/README.md for more SwAV models provided by authors.

(*) To be loaded by VISSL, this specific RN50-w5 checkpoint requires the following options to be added:

    config.MODEL.WEIGHTS_INIT.APPEND_PREFIX=trunk.base_model._feature_blocks.
    config.MODEL.WEIGHTS_INIT.STATE_DICT_KEY_NAME=''
    config.MODEL.WEIGHTS_INIT.REMOVE_PREFIX=module.

SEER

| Method | Model | PreTrain dataset | ImageNet top-1 linear acc. (%) | ImageNet top-1 fine-tuned acc. (%) | URL |
|---|---|---|---|---|---|
| SEER | RegNetY-32Gf | IG-1B public images, non EU | 74.03 (res5) | 83.4 | model |
| SEER | RegNetY-64Gf | IG-1B public images, non EU | 75.25 (res5avg) | 84.0 | model |
| SEER | RegNetY-128Gf | IG-1B public images, non EU | 75.96 (res5avg) | 84.5 | model |
| SEER | RegNetY-256Gf | IG-1B public images, non EU | 77.51 (res5avg) | 85.2 | model |
| SEER | RegNet10B | IG-1B public images, non EU | 79.8 (res4) | 85.8 | model |

NOTE: Please see projects/SEER/README.md for more SEER models provided by authors.

MoCoV2

| Method | Model | PreTrain dataset | ImageNet top-1 acc. (%) | URL |
|---|---|---|---|---|
| MoCo-v2 | RN50 - 200 epochs - 256 batch-size | ImageNet-1K | 66.4 | model |

MoCoV3

| Method | Model | PreTrain dataset | ImageNet top-1 acc. (%) | URL |
|---|---|---|---|---|
| MoCo-v3 | ViT-B/16 - 300 epochs | ImageNet-1K | 75.79 | model |

BarlowTwins

| Method | Model | PreTrain dataset | ImageNet top-1 acc. (%) | URL |
|---|---|---|---|---|
| Barlow Twins | RN50 - 300 epochs - 2048 batch-size | ImageNet-1K | 70.75 | model |
| Barlow Twins | RN50 - 1000 epochs - 2048 batch-size | ImageNet-1K | 71.80 | model |

DINO

The ViT-small model is obtained with this config.

| Method | Model | PreTrain dataset | ImageNet k-NN acc. (%) | URL |
|---|---|---|---|---|
| DINO | ViT-S/16 - 300 epochs - 1024 batch-size | ImageNet-1K | 73.4 | model |
| DINO | XCiT-S/16 - 300 epochs - 1024 batch-size | ImageNet-1K | 74.8 | model |