Paper | OpenReview | Blog | Project Page
This repository contains the code for the paper Asynchronous Perception Machine For Efficient Test-Time-Training by Rajat Modi and Yogesh Singh Rawat.
Our proposed Asynchronous Perception Machine (APM) represents a new way to do machine perception: asynchronous perception. It processes patches of an image one at a time, in any order, while still encoding semantic awareness in the network. This moves us towards architectures that consume fewer FLOPs, occupy less on-device memory, and predict almost the same features that a transformer predicts. It also allows us to achieve strong performance on test-time-training benchmarks.
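To make the idea concrete, here is a toy sketch of patch-wise asynchronous querying: a decoder predicts one location's feature at a time from a positional encoding and a single image-level latent. All names, dimensions, and the architecture below are illustrative assumptions, not the actual APM implementation.

import torch
import torch.nn as nn

class ToyAsyncDecoder(nn.Module):
    """Toy stand-in: predicts one patch's feature from its position."""
    def __init__(self, pos_dim=64, latent_dim=256, feat_dim=768):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(pos_dim + latent_dim, 1024), nn.GELU(),
            nn.Linear(1024, feat_dim),
        )

    def forward(self, pos_enc, image_latent):
        # pos_enc: (N, pos_dim) encodings of N patch locations (any order)
        # image_latent: (latent_dim,) single vector summarizing the image
        latent = image_latent.expand(pos_enc.shape[0], -1)
        return self.mlp(torch.cat([pos_enc, latent], dim=-1))

decoder = ToyAsyncDecoder()
latent = torch.randn(256)
for pos in torch.randn(3, 1, 64):      # query 3 locations, one at a time
    feat = decoder(pos, latent)        # (1, 768): one patch's feature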
This is the official public release of our model and COCO checkpoints, and we urge people across the world to try more of GLOM's ideas. We will add more code here as we make progress.
- Install conda
- Install PyTorch. We used version 1.13.0 and an A6000 GPU on Ubuntu 22.04. However, our codebase is pretty simple and has minimal dependencies, so it should remain robust to future library changes.
- Run the download script to download the checkpoints and the COCO dataset (validation set):
bash download.sh
- Visualize semantic clusterings on the COCO val set. Note that the model was trained on the COCO train set:
python visualize_coco.py
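For intuition, semantic clusterings of this kind are often produced by clustering per-patch features; the sketch below uses k-means as an illustration (visualize_coco.py may implement this differently, and the shapes here are made up).

import numpy as np
from sklearn.cluster import KMeans

feats = np.random.randn(16 * 16, 384)        # (H*W, d) per-patch features
labels = KMeans(n_clusters=6, n_init=10).fit_predict(feats)
cluster_map = labels.reshape(16, 16)          # one cluster id per patch
print(cluster_map)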
- Visualize islands of agreement on any image in the wild:
python predict_test_image.py
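One simple way to see an "island" is to color every patch by its cosine similarity to a chosen anchor patch, since patches on the same island carry near-identical vectors. The sketch below is illustrative only; predict_test_image.py may differ.

import torch
import torch.nn.functional as F

feats = torch.randn(16 * 16, 384)             # per-patch embeddings
anchor = feats[120].unsqueeze(0)              # pick any patch as the anchor
sim = F.cosine_similarity(feats, anchor, dim=-1)
island_map = sim.reshape(16, 16)              # high values = same island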
- Interpolate between any two images in the wild. A similar result was shown in the original GAN paper, and in diffusion models too; we can now do such interpolation in an MLP:
python interpolate.py
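Conceptually, such morphs come from linearly blending two latent codes and decoding each blend. The helper below is a minimal sketch under that assumption, with hypothetical shapes.

import torch

def interpolate_latents(z_a, z_b, steps=8):
    """Linearly blend two latent codes; decode each blend to get a morph."""
    alphas = torch.linspace(0.0, 1.0, steps)
    return [(1 - a) * z_a + a * z_b for a in alphas]

z_a, z_b = torch.randn(256), torch.randn(256)
blends = interpolate_latents(z_a, z_b)        # feed each blend to the decoder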
- One-sample learning, which is used in test-time training. This illustrates the APM's ability to learn from a single CLS token distilled from a teacher, e.g., CLIP. In practice, we observed that a teacher with more parameters leads to higher performance:
cd single_token_segmentation
python train_tta.py
Please follow the installation instructions of the original CLIP repo to run this particular part of the code. You can find those instructions here.
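The core loop amounts to fitting the student on a single distilled token. Below is a minimal sketch with toy stand-ins for the APM student and a frozen CLIP-like teacher; all names, dimensions, and hyperparameters are assumptions, not the values used in train_tta.py.

import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Linear(512, 512)                 # toy stand-in for the APM
teacher = nn.Linear(512, 512).eval()          # toy stand-in for frozen CLIP
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

x = torch.randn(1, 512)                       # one test sample's encoding
with torch.no_grad():
    target = teacher(x)                       # the single distilled CLS token
for _ in range(50):                           # fit on just this one token
    loss = F.mse_loss(student(x), target)
    opt.zero_grad(); loss.backward(); opt.step()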
- Computational analysis:
cd flop_analysis
python count_flops.py
python count_memory.py
python count_parameters.py
This should yield the same numbers as the computational analysis table, i.e., Table 4 in the APM paper.
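As a sanity check, the parameter count can be reproduced generically (count_parameters.py may compute it differently):

import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(count_parameters(nn.Linear(768, 768)))  # 590592 for this toy layer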
- Scaling-up experiments on the COCO dataset:
cd misc_scripts
python resize_coco_images.py
python 1_extract_coco_features.py
python train.py
Here we share the training code for the COCO dataset. We first dump features from a DINOv2 backbone on the COCO train set, as sketched below. You may need to download the COCO train set and save it in the data/ directory for training. Alternatively, you can fine-tune from the checkpoints we have shared.
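A minimal sketch of the feature-dumping step, assuming the public torch.hub entrypoint for DINOv2; the actual scripts may use a different model size, feature keys, or output paths.

import torch

model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14').eval()
img = torch.randn(1, 3, 224, 224)             # side must be a multiple of 14
with torch.no_grad():
    out = model.forward_features(img)
patch_feats = out['x_norm_patchtokens']       # (1, 256, 384) patch features
torch.save(patch_feats, 'example_feats.pt')   # hypothetical output path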
We illustrate that the idea of islands of agreement from the GLOM paper actually works. The video below is shared with permission from Geoffrey Hinton.
To plot similar islands for any image in the wild, please follow the steps here.
When using this code, please cite our paper:
@article{modi2024apm,
  title={Asynchronous Perception Machine For Efficient Test-Time-Training},
  author={Modi, Rajat and Rawat, Yogesh},
  journal={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2024}
}
For questions and suggestions, feel free to open an issue on GitHub or send an email to [email protected]. I will get to it as soon as possible.
This achievement reflects the collective effort of many brilliant minds, and we are deeply grateful for their contributions.