Official PyTorch implementation of ConvNeXt, from the following paper:
A ConvNet for the 2020s. CVPR 2022.
Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell and Saining Xie
Facebook AI Research, UC Berkeley
[arXiv] [video]
We propose ConvNeXt, a pure ConvNet model constructed entirely from standard ConvNet modules. ConvNeXt is accurate, efficient, scalable and very simple in design.
- ImageNet-1K Training Code
- ImageNet-22K Pre-training Code
- ImageNet-1K Fine-tuning Code
- Downstream Transfer (Detection, Segmentation) Code
- Image Classification [Colab] and Web Demo
- Fine-tune on CIFAR with Weights & Biases logging [Colab]
ImageNet-1K trained models

name | resolution | acc@1 | #params | FLOPs | model |
---|---|---|---|---|---|
ConvNeXt-T | 224x224 | 82.1 | 28M | 4.5G | model |
ConvNeXt-S | 224x224 | 83.1 | 50M | 8.7G | model |
ConvNeXt-B | 224x224 | 83.8 | 89M | 15.4G | model |
ConvNeXt-B | 384x384 | 85.1 | 89M | 45.0G | model |
ConvNeXt-L | 224x224 | 84.3 | 198M | 34.4G | model |
ConvNeXt-L | 384x384 | 85.5 | 198M | 101.0G | model |
ImageNet-22K trained models

name | resolution | acc@1 | #params | FLOPs | 22k model | 1k model |
---|---|---|---|---|---|---|
ConvNeXt-T | 224x224 | 82.9 | 29M | 4.5G | model | model |
ConvNeXt-T | 384x384 | 84.1 | 29M | 13.1G | - | model |
ConvNeXt-S | 224x224 | 84.6 | 50M | 8.7G | model | model |
ConvNeXt-S | 384x384 | 85.8 | 50M | 25.5G | - | model |
ConvNeXt-B | 224x224 | 85.8 | 89M | 15.4G | model | model |
ConvNeXt-B | 384x384 | 86.8 | 89M | 47.0G | - | model |
ConvNeXt-L | 224x224 | 86.6 | 198M | 34.4G | model | model |
ConvNeXt-L | 384x384 | 87.5 | 198M | 101.0G | - | model |
ConvNeXt-XL | 224x224 | 87.0 | 350M | 60.9G | model | model |
ConvNeXt-XL | 384x384 | 87.8 | 350M | 179.0G | - | model |
ImageNet-1K trained models (isotropic)

name | resolution | acc@1 | #params | FLOPs | model |
---|---|---|---|---|---|
ConvNeXt-S | 224x224 | 78.7 | 22M | 4.3G | model |
ConvNeXt-B | 224x224 | 82.0 | 87M | 16.9G | model |
ConvNeXt-L | 224x224 | 82.6 | 306M | 59.7G | model |
Please check INSTALL.md for installation instructions.
We give an example evaluation command for an ImageNet-22K pre-trained, then ImageNet-1K fine-tuned ConvNeXt-B:
Single-GPU

```
python main.py --model convnext_base --eval true \
--resume https://dl.fbaipublicfiles.com/convnext/convnext_base_22k_1k_224.pth \
--input_size 224 --drop_path 0.2 \
--data_path /path/to/imagenet-1k
```
Multi-GPU

```
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model convnext_base --eval true \
--resume https://dl.fbaipublicfiles.com/convnext/convnext_base_22k_1k_224.pth \
--input_size 224 --drop_path 0.2 \
--data_path /path/to/imagenet-1k
```
This should give

```
* Acc@1 85.820 Acc@5 97.868 loss 0.563
```

- For evaluating other model variants, change `--model`, `--resume`, and `--input_size` accordingly. You can get the URL to pre-trained models from the tables above.
- Setting a model-specific `--drop_path` is not strictly required in evaluation, as the `DropPath` module in timm behaves the same during evaluation; but it is required in training. See TRAINING.md or our paper for the values used for different models.
See TRAINING.md for training and fine-tuning instructions.
This repository is built using the timm library and the DeiT and BEiT repositories.
This project is released under the MIT license. Please see the LICENSE file for more information.
If you find this repository helpful, please consider citing:
@Article{liu2022convnet,
author = {Zhuang Liu and Hanzi Mao and Chao-Yuan Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie},
title = {A ConvNet for the 2020s},
journal = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2022},
}
- Runs the entire pipeline
- Modify the parameters in get_args_parser()
  - `--data_path`: training data location, passed as a list
  - `--eval_data_path`: validation data location, passed as a single string
  - `--nb_classes`: number of classes; set to 4 when the amb class is included
  - `--use_softlabel`: set to True to use soft labels
  - `--soft_label_ratio`: target ratio for soft labels (amb_pos : amb_neg)
  - `--label_ratio`: pos/neg target ratio
  - Image crop & padding: options related to loading the dataset images
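The options above can be sketched with a minimal `argparse` setup. This is a hypothetical reconstruction, not the repo's actual parser: the flag names mirror the list above, but the defaults, help text, and the `str2bool` helper are assumptions.

```python
import argparse

def str2bool(v):
    # argparse passes flag values as strings; map "true"/"false" to bool
    return str(v).lower() in ("yes", "true", "t", "1")

def get_args_parser():
    # Hypothetical sketch of the custom options described above;
    # defaults are assumptions, not values from the repo.
    parser = argparse.ArgumentParser("fine-tuning args", add_help=False)
    parser.add_argument("--data_path", nargs="+", type=str,
                        help="training data locations (passed as a list)")
    parser.add_argument("--eval_data_path", type=str, default=None,
                        help="validation data location (a single string)")
    parser.add_argument("--nb_classes", type=int, default=2,
                        help="number of classes; 4 when the amb class is included")
    parser.add_argument("--use_softlabel", type=str2bool, default=False,
                        help="use soft labels")
    parser.add_argument("--soft_label_ratio", type=float, default=0.8,
                        help="soft-label target ratio (amb_pos : amb_neg)")
    parser.add_argument("--label_ratio", type=float, default=1.0,
                        help="pos/neg target ratio")
    return parser

args = get_args_parser().parse_args(
    ["--data_path", "train_a", "train_b", "--nb_classes", "4", "--use_softlabel", "true"])
```

Note that `--data_path` uses `nargs="+"` so multiple training directories can be passed, while `--eval_data_path` stays a single string, matching the distinction drawn above.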
- Code for training, validation, and prediction (checking results)
- prediction (evaluation) module (newly added)
  - The prediction module loads the training results and runs evaluation
  - Enters prediction when args.pred is set to True
  - Separates the result images and generates graphs of the results
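The control flow described above can be sketched as a small dispatcher. This is a hypothetical illustration of the `args.pred` branch, not the repo's actual code; `train_one` and `predict` stand in for the real training and evaluation routines.

```python
from types import SimpleNamespace

def dispatch(args, train_one, predict):
    """Hypothetical sketch of the flow described above: when args.pred
    is True, skip training and run prediction (which would load the
    saved checkpoint, evaluate, and generate result graphs)."""
    if getattr(args, "pred", False):
        # prediction mode: evaluate from a training checkpoint
        return predict(args.resume)
    return train_one(args)

# Usage: with pred=True the training callable is never invoked
result = dispatch(SimpleNamespace(pred=True, resume="ckpt.pth"),
                  train_one=lambda a: "train",
                  predict=lambda ckpt: f"predict:{ckpt}")
```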
preprocess_data.py & datasets.py
- Code for dataset creation, image cropping, and preprocessing
- Uses the functions and classes in these two files
- split_data
  - Module that splits the dataset into train, val, and test
  - If a split file already exists, it does not rebuild the lists
  - Generates the file names according to the test and val ratios
- make_list
  - Generates the list of raw data to be loaded through the dataloader
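The `split_data` behaviour described above can be sketched as follows. This is an assumed reconstruction: the ratio-based file naming, the reuse-if-present check, and the function signature are illustrative, not taken from the repo.

```python
import os
import random

def split_data(image_dir, split_dir, test_ratio=0.1, val_ratio=0.1, seed=0):
    """Hypothetical sketch of split_data: reuse existing split files if
    present, otherwise write new train/val/test lists whose file names
    encode the test and val ratios (naming scheme is an assumption)."""
    tag = f"test{test_ratio}_val{val_ratio}"
    paths = {s: os.path.join(split_dir, f"{s}_{tag}.txt")
             for s in ("train", "val", "test")}
    # If split files already exist, do not rebuild the lists
    if all(os.path.exists(p) for p in paths.values()):
        return paths
    files = sorted(os.listdir(image_dir))
    random.Random(seed).shuffle(files)  # deterministic shuffle
    n = len(files)
    n_test, n_val = int(n * test_ratio), int(n * val_ratio)
    splits = {"test": files[:n_test],
              "val": files[n_test:n_test + n_val],
              "train": files[n_test + n_val:]}
    os.makedirs(split_dir, exist_ok=True)
    for name, items in splits.items():
        with open(paths[name], "w") as f:
            f.write("\n".join(items))
    return paths
```

A `make_list` counterpart would then read one of these split files and resolve each entry to a full path for the dataloader.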
- Code for applying various losses
- Modifies the target when args.nb_classes is 2 or 4
- When args.use_softlabel is True, converts the 4 classes into 2 classes
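The 4-class to 2-class soft-label conversion might look like the sketch below. The class ordering (neg, amb_neg, amb_pos, pos) and the way `soft_label_ratio` softens the ambiguous targets are assumptions for illustration, not the repo's actual mapping.

```python
def to_soft_labels(targets, soft_label_ratio=0.8):
    """Hypothetical sketch: collapse 4 classes (0=neg, 1=amb_neg,
    2=amb_pos, 3=pos) into 2-class soft targets [p_neg, p_pos].
    Ambiguous classes get a softened probability controlled by
    soft_label_ratio; hard classes stay one-hot."""
    mapping = {
        0: [1.0, 0.0],                                # neg: hard negative
        1: [soft_label_ratio, 1 - soft_label_ratio],  # amb_neg: leans negative
        2: [1 - soft_label_ratio, soft_label_ratio],  # amb_pos: leans positive
        3: [0.0, 1.0],                                # pos: hard positive
    }
    return [mapping[int(t)] for t in targets]
```

Targets in this form can be fed to a soft-target cross-entropy loss instead of integer class indices.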
- Parses the tensorboard files saved in the log directory
- Saves the results as a csv
- Edit the name of the folder where the log files are stored in path, then run
- If additional fields are needed, modify the col and val parts
- Creates a csv file in the csv folder
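A parsing flow like the one described above can be sketched with tensorboard's `EventAccumulator` plus a csv writer. The use of `EventAccumulator` here is an assumption about this repo's approach (the API itself is real), and the column layout is illustrative; extra columns would be added via the `cols` argument.

```python
import csv

def parse_event_file(path):
    """Load all scalar series from one tensorboard event file into rows.
    Requires tensorboard to be installed; assumed approach, not the
    repo's verified code."""
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
    acc = EventAccumulator(path)
    acc.Reload()  # read events from disk
    rows = []
    for tag in acc.Tags()["scalars"]:
        for ev in acc.Scalars(tag):
            rows.append({"tag": tag, "step": ev.step, "value": ev.value})
    return rows

def rows_to_csv(rows, out_path, cols=("tag", "step", "value")):
    # Adjust `cols` when additional fields are needed
    with open(out_path, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=list(cols))
        w.writeheader()
        w.writerows({c: r.get(c, "") for c in cols} for r in rows)
```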
- Code for running routine (repeated) experiments
- base: fixed parameters that do not change between runs
- Parameters that vary are given as lists and executed in a for loop
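A runner of that shape can be sketched as below. The flag names in `sweep` (`--lr`, `--drop_path`) and the values are placeholders for illustration; the base command mirrors the evaluation examples earlier in this README.

```python
import itertools
import subprocess

# Fixed parameters that do not change between runs
base = ["python", "main.py", "--model", "convnext_base", "--input_size", "224"]
# Varying parameters, given as lists and swept with a for loop
sweep = {"--lr": ["1e-4", "5e-5"], "--drop_path": ["0.1", "0.2"]}

commands = []
keys = list(sweep)
for values in itertools.product(*sweep.values()):
    cmd = list(base)
    for flag, value in zip(keys, values):
        cmd += [flag, value]
    commands.append(cmd)
    # subprocess.run(cmd, check=True)  # uncomment to actually launch each run
```

`itertools.product` enumerates every combination of the varying parameters, so two flags with two values each yield four runs.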