Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer

Our classification code is developed on top of PVT.

For details please see Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer.

If you use this code for a paper please cite:

@inproceedings{zeng2022not,
  title={Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer},
  author={Zeng, Wang and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11101--11111},
  year={2022}
}

Usage

First, clone the repository locally:

git clone https://github.com/zengwang430521/TCFormer.git

Then install PyTorch 1.6.0+, torchvision 0.7.0+, and pytorch-image-models (timm) 0.3.2:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2

Then install mmcv:

pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.6.0/index.html
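
After installation, it can help to confirm that the installed packages match the versions listed above. The following is a minimal sketch, not part of this repository: the helper names `parse_version` and `check_environment` are hypothetical, and it assumes Python 3.8+ (for `importlib.metadata`) and that the packages are registered under the distribution names `torch`, `torchvision`, and `timm`.

```python
import re
from importlib import metadata

# Minimum versions taken from the install instructions above.
MINIMUMS = {"torch": (1, 6, 0), "torchvision": (0, 7, 0)}
# timm is pinned to an exact version in the install instructions.
PINNED = {"timm": (0, 3, 2)}


def parse_version(text):
    """Turn a string such as '1.6.0+cu101' into the tuple (1, 6, 0)."""
    core = text.split("+")[0]  # drop local build tags such as '+cu101'
    parts = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))  # pad short versions such as '0.7'
    return tuple(parts)


def check_environment():
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, wanted in {**MINIMUMS, **PINNED}.items():
        try:
            found = parse_version(metadata.version(name))
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if name in PINNED and found != wanted:
            problems.append(f"{name} is {found}, expected exactly {wanted}")
        elif name in MINIMUMS and found < wanted:
            problems.append(f"{name} is {found}, expected at least {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the README"]:
        print(line)
```

The check does not cover mmcv, because the matching mmcv-full build depends on the CUDA and PyTorch versions in use.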

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure follows the standard layout for torchvision's datasets.ImageFolder, with the training data in the train/ folder and the validation data in the val/ folder:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
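
Before starting a long run, the layout above can be checked with a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not code from this repository: the function name `summarize_layout` and the set of accepted file extensions are hypothetical choices for illustration.

```python
from pathlib import Path

# Assumption: extensions counted as images; extend the set if needed.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def summarize_layout(root):
    """Count class folders and image files under root/train and root/val."""
    summary = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        # Each sub-folder of a split is treated as one class.
        classes = sorted(p for p in split_dir.iterdir() if p.is_dir())
        images = sum(
            1
            for class_dir in classes
            for item in class_dir.iterdir()
            if item.suffix.lower() in IMAGE_SUFFIXES
        )
        summary[split] = {"classes": len(classes), "images": images}
    return summary


if __name__ == "__main__":
    print(summarize_layout("/path/to/imagenet"))
```

For a complete ImageNet-1K copy, both splits should report 1000 classes, and val should report 50000 images.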

Model Zoo

  • TCFormer on ImageNet-1K
| Method | Size | Acc@1 | #Params (M) | Config | Checkpoint | Log |
| --- | --- | --- | --- | --- | --- | --- |
| TCFormer-light | 224 | 79.4 | 14.2 | config | 57M [Google] | [Google] |
| TCFormer | 224 | 82.3 | 25.6 | config | 103M [Google] | [Google] |
| TCFormer-large | 224 | 83.6 | 62.8 | config | 103M [Google] | [Google] |

Evaluation

To evaluate a pre-trained TCFormer on the ImageNet val set with a single GPU, run:

sh dist_train.sh configs/tcformer/tcformer.py 1 --data-path /path/to/imagenet --resume /path/to/checkpoint_file --eval

This should give

* Acc@1 82.346 Acc@5 95.982 loss 0.798
Accuracy of the network on the 50000 test images: 82.3%
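
For readers unfamiliar with the Acc@1 and Acc@5 figures, the sketch below shows how top-k accuracy is commonly defined: a sample counts as correct when its true label is among the k highest-scoring classes. It is a plain-Python illustration, not the evaluation code used by this repository, and the function name `topk_accuracy` is hypothetical.

```python
def topk_accuracy(scores, targets, ks=(1, 5)):
    """Return {k: percentage of samples whose target is in the top-k scores}."""
    hits = {k: 0 for k in ks}
    for row, target in zip(scores, targets):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        for k in ks:
            if target in ranked[:k]:
                hits[k] += 1
    return {k: 100.0 * hits[k] / len(targets) for k in ks}


if __name__ == "__main__":
    # Three samples, three classes; only the first is right at top-1.
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    targets = [1, 1, 0]
    print(topk_accuracy(scores, targets, ks=(1, 2)))
```

Under this definition, an Acc@1 of 82.346 on 50000 images corresponds to 41173 correctly classified images, and an Acc@5 of 95.982 corresponds to 47991.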

Training

To train TCFormer on ImageNet on a single node with 8 GPUs for 300 epochs, run:

sh dist_train.sh configs/tcformer/tcformer.py 8 --data-path /path/to/imagenet

To train on a cluster managed with Slurm, use the script slurm_train.sh:

./slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}

Here is an example that uses 16 GPUs to train TCFormer on the Test partition of a Slurm cluster. (GPUS_PER_NODE=8 specifies that each Slurm node provides 8 GPUs, and CPUS_PER_TASK=2 assigns 2 CPUs to each task. The example assumes that Test is a valid ${PARTITION} name.)

GPUS=16 GPUS_PER_NODE=8 CPUS_PER_TASK=2 ./slurm_train.sh Test tcformer configs/tcformer/tcformer.py work_dirs/tcformer 
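
The resource request implied by these variables can be worked out with simple arithmetic. The sketch below assumes one Slurm task per GPU, which is a common convention for this style of launch script but is not confirmed by this README; the function name `slurm_resources` is hypothetical.

```python
import math


def slurm_resources(gpus, gpus_per_node, cpus_per_task):
    """Estimate nodes, tasks, and CPUs for a job, assuming one task per GPU."""
    nodes = math.ceil(gpus / gpus_per_node)  # round up to whole nodes
    tasks = gpus                             # assumption: one task per GPU
    cpus = tasks * cpus_per_task
    return {"nodes": nodes, "tasks": tasks, "cpus": cpus}


if __name__ == "__main__":
    # Values from the example command above.
    print(slurm_resources(gpus=16, gpus_per_node=8, cpus_per_task=2))
```

With the values from the example command, this gives 2 nodes, 16 tasks, and 32 CPUs in total.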

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.