University of Amsterdam
Valentinos Pariza*, Mohammadreza Salehi*, Gertjan Burghouts, Francesco Locatello, Yuki M. Asano
[Paper]
NeCo introduces a new self-supervised learning technique for enhancing spatial representations in vision transformers. By leveraging Patch Neighbor Consistency, NeCo captures fine-grained details and structural information that are crucial for various downstream tasks, such as semantic segmentation.
Key features of NeCo include:
- Patch-based neighborhood consistency (sketched below)
- Improved dense prediction capabilities
- Efficient training requiring only 19 GPU hours
- Compatibility with existing vision transformer backbones
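To give an intuition for the patch-based consistency idea, here is a simplified, hypothetical sketch: it pulls a student's soft patch-neighbor distribution over a bank of reference patches toward the teacher's via a KL divergence. NeCo's actual objective enforces neighbor-ordering consistency with differentiable sorting, so treat this as intuition only, not the paper's loss:

```python
import torch
import torch.nn.functional as F

def neighbor_consistency_loss(student_patches, teacher_patches,
                              reference_patches, temperature=0.1):
    """Match the student's soft patch-neighbor distribution to the teacher's."""
    s = F.normalize(student_patches, dim=-1)    # (N, D) student patch embeddings
    t = F.normalize(teacher_patches, dim=-1)    # (N, D) teacher patch embeddings
    r = F.normalize(reference_patches, dim=-1)  # (M, D) reference patch bank
    # Cosine similarity of every patch to the shared reference patches.
    sim_s = s @ r.T / temperature  # (N, M)
    sim_t = t @ r.T / temperature  # (N, M)
    # KL divergence pulls the student's soft neighbor distribution toward the teacher's.
    return F.kl_div(F.log_softmax(sim_s, dim=-1), F.softmax(sim_t, dim=-1),
                    reduction="batchmean")

# Example with random embeddings:
loss = neighbor_consistency_loss(torch.randn(16, 384), torch.randn(16, 384),
                                 torch.randn(64, 384))
```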
Below is a table with some of our results on Pascal VOC 2012 with the DINOv2 backbone.
| backbone | arch | params | Overclustering k=500 | Dense NN Retrieval | linear | download | |
|---|---|---|---|---|---|---|---|
| DINOv2 | ViT-S/14 | 21M | 72.6 | 81.3 | 78.9 | student | teacher |
| DINOv2 | ViT-B/14 | 85M | 71.8 | 83.3 | 81.4 | student | teacher |
| DINO | ViT-S/16 | 22M | 47.9 | 61.3 | 65.8 | student | teacher |
| TimeT | ViT-S/16 | 22M | 53.1 | 66.5 | 68.5 | student | teacher |
| Leopart | ViT-S/16 | 22M | 55.3 | 66.2 | 68.3 | student | teacher |
In the following sections, we will delve into the training process, evaluation metrics, and provide instructions for using NeCo in your own projects.
Training with our model, NeCo, does not require a large GPU budget: the full training process runs on a single NVIDIA A100 GPU.
We use conda for dependency management. Please use `environment.yaml` to install the environment necessary to run everything from our work:

```bash
conda env create -f environment.yaml
```
Then add the repository to your `PYTHONPATH` (run this from the repository's parent directory):

```bash
export PYTHONPATH="${PYTHONPATH}:PATH_TO_REPO"
```
We use Neptune for logging experiments. Get your Neptune API token and insert it in the corresponding run files, and make sure to adapt the project name when setting up the logger.
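For reference, a minimal sketch of such a logger setup, assuming PyTorch Lightning's `NeptuneLogger` (the project name below is a placeholder; the run files may configure this differently):

```python
import os
from pytorch_lightning.loggers import NeptuneLogger

neptune_logger = NeptuneLogger(
    api_key=os.environ.get("NEPTUNE_API_TOKEN"),  # your Neptune API token
    project="your-workspace/neco",                # placeholder: adapt to your project
)
```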
To use NeCo embeddings on downstream dense prediction tasks, you just need to install `timm` and `torch` and run:

```python
import torch

# Load the DINOv2 backbone from torch hub and overwrite its weights
# with a downloaded NeCo checkpoint.
path_to_checkpoint = "<your path to downloaded ckpt>"
model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
state_dict = torch.load(path_to_checkpoint, map_location='cpu')
model.load_state_dict(state_dict, strict=False)
```
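As a quick sanity check (not part of the repo, and assuming the DINOv2 hub model's `forward_features` interface), the loaded `model` can then produce dense patch features; input sides must be multiples of the 14-pixel patch size:

```python
# Extract dense patch features from the backbone loaded above.
model.eval()
with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224)        # dummy image, 224 = 16 * 14
    out = model.forward_features(dummy)
    patch_tokens = out["x_norm_patchtokens"]   # (1, 256, 384) for ViT-S/14
print(patch_tokens.shape)
```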
- `src/`: Model, method, and transform definitions
- `experiments/`: Scripts for setting up and running experiments
- `data/`: Data modules for ImageNet, COCO, Pascal VOC, and ADE20k
- Use configs in `experiments/configs/` to reproduce our experiments
- Modify paths in the config files to match your dataset and checkpoint directories
- For new datasets:
  - Change the data path in the config
  - Add a new data module (see the sketch below)
  - Initialize the new data module in `experiments/train_with_neco.py`
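As a starting point, here is a minimal, hypothetical skeleton of a new data module in the PyTorch Lightning style used under `data/`; the class name and dummy tensors are placeholders, not part of the repo:

```python
import pytorch_lightning as pl
import torch
from torch.utils.data import DataLoader, TensorDataset

class MyDataModule(pl.LightningDataModule):
    """Skeleton data module; replace the dummy tensors with your dataset."""

    def __init__(self, data_dir: str, batch_size: int = 32):
        super().__init__()
        self.data_dir = data_dir
        self.batch_size = batch_size

    def setup(self, stage=None):
        # Replace with real image loading and transforms based on self.data_dir.
        images = torch.randn(8, 3, 224, 224)
        self.train_set = TensorDataset(images)
        self.val_set = TensorDataset(images)

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=self.batch_size, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.val_set, batch_size=self.batch_size)
```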
For instance, to start training on COCO:

```bash
python experiments/train_with_neco.py --config_path experiments/configs/neco_224x224.yml
```
We provide several evaluation scripts for different tasks. For detailed instructions and examples, please refer to the Evaluation README. Here's a summary of the evaluation methods:
- Linear Segmentation:
  - Use `linear_finetune.py` for fine-tuning.
  - Use `eval_linear.py` for evaluating on the validation dataset.
- Overclustering:
  - Use `eval_overcluster.py` to evaluate overclustering performance (a simplified sketch of this metric follows below).
- Cluster Based Foreground Extraction + Community Detection (CBFE+CD):
  - Requires downloading the noisy attention train and val masks.
  - Provides examples for both ViT-Small and ViT-Base models.
Each evaluation method has specific configuration files and command-line arguments. The Evaluation README provides detailed examples and instructions for running these evaluations on different datasets and model architectures.
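For intuition about the "Overclustering k=500" numbers reported above, here is a simplified, hypothetical sketch of such a metric: cluster dense features with k-means, assign each cluster to its majority ground-truth class, and average per-class IoU. The authoritative protocol is `eval_overcluster.py`, which may differ in details (e.g., the cluster-to-class matching):

```python
import numpy as np
from sklearn.cluster import KMeans

def overcluster_miou(features, labels, k=500, num_classes=21):
    """Cluster patch features, map clusters to classes by majority vote, score mIoU."""
    cluster_ids = KMeans(n_clusters=k, n_init=10).fit_predict(features)
    pred = np.empty_like(labels)
    for c in range(k):
        mask = cluster_ids == c
        if mask.any():
            # Assign the whole cluster to its most frequent ground-truth class.
            pred[mask] = np.bincount(labels[mask], minlength=num_classes).argmax()
    ious = []
    for cls in range(num_classes):
        inter = np.logical_and(pred == cls, labels == cls).sum()
        union = np.logical_or(pred == cls, labels == cls).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```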
We use PyTorch Lightning data modules for our datasets. Supported datasets include ImageNet100k, COCO, Pascal VOC, and ADE20k. Each dataset requires a specific folder structure for proper functioning.
Data modules are located in the `data/` directory and handle loading, preprocessing, and augmentation. When using these datasets, ensure you update the paths in your configuration files to match your local setup.
For detailed information on dataset preparation, download instructions, and specific folder structures, please refer to the Dataset README.
We provide visualizations to help understand the performance of our method. Below is an example of Cluster-Based Foreground Extraction (CBFE) results on the Pascal VOC dataset.
This visualization shows NeCo's ability to extract objects without relying on any supervision: different objects are represented by distinct colors, and the method captures tight, precise object boundaries.
If you find this repository useful, please consider giving a star ⭐ and citation 📣:
```bibtex
@article{pariza2024neco,
  title={NeCo: Improving DINOv2's spatial representations in 19 GPU hours with Patch Neighbor Consistency},
  author={Pariza, Valentinos and Salehi, Mohammadreza and Burghouts, Gertjan and Locatello, Francesco and Asano, Yuki M},
  journal={arXiv preprint arXiv:2408.11054},
  year={2024}
}
```