Table of Contents
- EVA-02: MIM Pre-training and Image Classification
model name | #params | MIM pt dataset | MIM pt epochs | log | weight |
---|---|---|---|---|---|
eva02_Ti_pt_in21k_p14 | 6M | IN-21K | 240 | link | 🤗 HF link |
eva02_S_pt_in21k_p14 | 22M | IN-21K | 240 | link | 🤗 HF link |
eva02_B_pt_in21k_p14 | 86M | IN-21K | 150 | link | 🤗 HF link |
eva02_B_pt_in21k_p14to16 | 86M | IN-21K | 150 | - | 🤗 HF link |
eva02_L_pt_in21k_p14 | 304M | IN-21K | 150 | link | 🤗 HF link |
eva02_L_pt_in21k_p14to16 | 304M | IN-21K | 150 | - | 🤗 HF link |
eva02_L_pt_m38m_p14 | 304M | Merged-38M | 56 | link | 🤗 HF link |
eva02_L_pt_m38m_p14to16 | 304M | Merged-38M | 56 | - | 🤗 HF link |
- The input size / patch size of MIM pre-trained EVA-02 is `224x224` / `14x14`. `eva02_psz14to16` models interpolate the kernel size of `patch_embed` from `14x14` to `16x16`, and interpolate the `pos_embed` from `16x16` to `14x14`. This is useful for object detection, instance segmentation & semantic segmentation tasks.
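For reference, a minimal sketch of how such a psz14to16 conversion could be done offline, assuming the checkpoint stores its weights under a `model` key, with `patch_embed.proj.weight` shaped `[dim, 3, 14, 14]` and `pos_embed` shaped `[1, 1 + 16*16, dim]` (a cls token followed by a 16x16 grid); the exact key names and layouts in the released checkpoints may differ:

```python
import torch
import torch.nn.functional as F

# Illustrative sketch only: key names and shapes are assumptions.
ckpt = torch.load("eva02_L_pt_m38m_p14.pt", map_location="cpu")
state = ckpt.get("model", ckpt)

# 1) Interpolate the patch_embed kernel from 14x14 to 16x16.
w = state["patch_embed.proj.weight"]                      # [dim, 3, 14, 14]
state["patch_embed.proj.weight"] = F.interpolate(
    w.float(), size=(16, 16), mode="bicubic", align_corners=False).to(w.dtype)

# 2) Interpolate the pos_embed grid from 16x16 (224/14) to 14x14 (224/16).
pos = state["pos_embed"]                                  # [1, 1 + 16*16, dim]
cls_tok, grid = pos[:, :1], pos[:, 1:]
dim = grid.shape[-1]
grid = grid.reshape(1, 16, 16, dim).permute(0, 3, 1, 2)   # [1, dim, 16, 16]
grid = F.interpolate(grid.float(), size=(14, 14), mode="bicubic",
                     align_corners=False)
grid = grid.permute(0, 2, 3, 1).reshape(1, 14 * 14, dim).to(pos.dtype)
state["pos_embed"] = torch.cat([cls_tok, grid], dim=1)

torch.save({"model": state}, "eva02_L_pt_m38m_p14to16.pt")
```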
model name | init. ckpt | IN-21K ft epochs | log | weight |
---|---|---|---|---|
eva02_B_pt_in21k_medft_in21k_p14 | eva02_B_pt_in21k_p14 | 40 | link | 🤗 HF link |
eva02_L_pt_in21k_medft_in21k_p14 | eva02_L_pt_in21k_p14 | 20 | link | 🤗 HF link |
eva02_L_pt_m38m_medft_in21k_p14 | eva02_L_pt_m38m_p14 | 30 | link | 🤗 HF link |
- The input size / patch size of IN-21K intermediate fine-tuned EVA-02 is `448x448` / `14x14`.
model name | init. ckpt | IN-1K ft epochs | ft image size | ema? | top-1 | log | weight |
---|---|---|---|---|---|---|---|
eva02_Ti_pt_in21k_ft_in1k_p14 | eva02_Ti_pt_in21k_p14 | 100 | 336x336 | x | 80.7 | link | 🤗 HF link |
eva02_S_pt_in21k_ft_in1k_p14 | eva02_S_pt_in21k_p14 | 100 | 336x336 | x | 85.8 | link | 🤗 HF link |
eva02_B_pt_in21k_ft_in1k_p14 | eva02_B_pt_in21k_p14 | 30 | 448x448 | x | 88.3 | link | 🤗 HF link |
eva02_L_pt_in21k_ft_in1k_p14 | eva02_L_pt_in21k_p14 | 30 | 448x448 | o | 89.6 | link | 🤗 HF link |
eva02_L_pt_m38m_ft_in1k_p14 | eva02_L_pt_m38m_p14 | 30 | 448x448 | o | 89.6 | link | 🤗 HF link |
`o`: using the EMA-updated model weights for evaluation achieves similar or slightly improved performance.
model name | init. ckpt | IN-1K ft epochs | ft image size | ema? | top-1 | log | weight |
---|---|---|---|---|---|---|---|
eva02_B_pt_in21k_medft_in21k_ft_in1k_p14 | eva02_B_pt_in21k_medft_in21k_p14 | 15 | 448x448 | o | 88.6 | link | 🤗 HF link |
eva02_L_pt_in21k_medft_in21k_ft_in1k_p14 | eva02_L_pt_in21k_medft_in21k_p14 | 20 | 448x448 | o | 89.9 | link | 🤗 HF link |
eva02_L_pt_m38m_medft_in21k_ft_in1k_p14 | eva02_L_pt_m38m_medft_in21k_p14 | 20 | 448x448 | o | 90.0 | link | 🤗 HF link |
`o`: using the EMA-updated model weights for evaluation achieves similar or slightly improved performance.
model name | IN-1K | IN-V2 | IN-ReaL | IN-Adv. | IN-Ren. | IN-Ske. | ObjectNet |
---|---|---|---|---|---|---|---|
eva02_B_pt_in21k_medft_in21k_ft_in1k_p14 | 88.6 | 79.8 | 90.8 | 78.1 | 76.8 | 57.7 | 55.3 |
eva02_L_pt_m38m_medft_in21k_ft_in1k_p14 | 90.0 | 82.4 | 91.1 | 87.7 | 89.9 | 70.1 | 62.8 |
For reference, timm collects image classification results of some open-sourced state-of-the-art models here (IN-1K, IN-V2, IN-ReaL, IN-Adv., IN-Ren., IN-Ske.).
First, clone the repo and install required packages:
conda create --name asuka python=3.8 -y
conda activate asuka
git clone [email protected]:baaivision/EVA.git
cd EVA-02/asuka
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip install -r requirements.txt
Then, install Apex and xFormer following the official instructions.
Core packages:
- PyTorch version 1.12.1
- torchvision version 0.13.1
- timm version 0.5.4
- DeepSpeed version 0.6.5 (`fp16` training and ZeRO optimizer); fine-tuning with `bfloat16` requires version 0.8.1
- Apex (fused layer norm)
- xFormer (fast and memory efficient MHSA)
We use the standard IN-1K dataset (1.2M images). Download it from http://image-net.org. Then, move and extract the training and validation images to labeled subfolders, using the shell script.
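The `--data_set image_folder` flag used in the commands below expects this standard one-subfolder-per-class layout. A quick sanity check with torchvision (paths are placeholders) might look like:

```python
from torchvision.datasets import ImageFolder

# Placeholder paths; the expected layout is:
#   /path/to/IN-1K/train/<wnid>/<image>.JPEG
#   /path/to/IN-1K/val/<wnid>/<image>.JPEG
train = ImageFolder("/path/to/IN-1K/train")
val = ImageFolder("/path/to/IN-1K/val")
print(len(train.classes), len(train))  # expect 1000 classes, ~1.28M training images
print(len(val.classes), len(val))      # expect 1000 classes, 50,000 validation images
```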
Evaluate the fine-tuned eva02_Ti_pt_in21k_ft_in1k_p14
on IN-1K val using a single node with 4 gpus (click to expand).
MODEL_NAME=eva02_tiny_patch14_xattn_fusedLN_SwiGLU_preln_RoPE
sz=336
batch_size=64
crop_pct=1.0
EVAL_CKPT=/path/to/eva02_Ti_pt_in21k_ft_in1k_p14.pt
DATA_PATH=/path/to/IN-1K/
# using model w/o ema for evaluation (w/o --use_ema_ckpt_eval)
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH}/train \
--eval_data_path ${DATA_PATH}/val \
--nb_classes 1000 \
--data_set image_folder \
--model ${MODEL_NAME} \
--finetune ${EVAL_CKPT} \
--input_size ${sz} \
--batch_size ${batch_size} \
--crop_pct ${crop_pct} \
--no_auto_resume \
--dist_eval \
--eval \
--enable_deepspeed
Expected results:
* Acc@1 80.714 Acc@5 95.536 loss 0.807
Evaluate the fine-tuned eva02_S_pt_in21k_ft_in1k_p14
on IN-1K val using a single node with 4 gpus (click to expand).
MODEL_NAME=eva02_small_patch14_xattn_fusedLN_SwiGLU_preln_RoPE
sz=336
batch_size=64
crop_pct=1.0
EVAL_CKPT=/path/to/eva02_S_pt_in21k_ft_in1k_p14.pt
DATA_PATH=/path/to/IN-1K/
# using model w/o ema for evaluation (w/o --use_ema_ckpt_eval)
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH}/train \
--eval_data_path ${DATA_PATH}/val \
--nb_classes 1000 \
--data_set image_folder \
--model ${MODEL_NAME} \
--finetune ${EVAL_CKPT} \
--input_size ${sz} \
--batch_size ${batch_size} \
--crop_pct ${crop_pct} \
--no_auto_resume \
--dist_eval \
--eval \
--enable_deepspeed
Expected results:
* Acc@1 85.780 Acc@5 97.598 loss 0.612
Evaluate the fine-tuned eva02_B_pt_in21k_ft_in1k_p14
on IN-1K val using a single node with 4 gpus (click to expand).
MODEL_NAME=eva02_base_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
sz=448
batch_size=64
crop_pct=1.0
EVAL_CKPT=/path/to/eva02_B_pt_in21k_ft_in1k_p14.pt
DATA_PATH=/path/to/IN-1K/
# using model w/o ema for evaluation (w/o --use_ema_ckpt_eval)
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH}/train \
--eval_data_path ${DATA_PATH}/val \
--nb_classes 1000 \
--data_set image_folder \
--model ${MODEL_NAME} \
--finetune ${EVAL_CKPT} \
--input_size ${sz} \
--batch_size ${batch_size} \
--crop_pct ${crop_pct} \
--no_auto_resume \
--dist_eval \
--eval \
--enable_deepspeed
Expected results:
* Acc@1 88.282 Acc@5 98.528 loss 0.507
Evaluate the fine-tuned eva02_L_pt_in21k_ft_in1k_p14
on IN-1K val using a single node with 4 gpus (click to expand).
MODEL_NAME=eva02_large_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
sz=448
batch_size=64
crop_pct=1.0
EVAL_CKPT=/path/to/eva02_L_pt_in21k_ft_in1k_p14.pt
DATA_PATH=/path/to/IN-1K/
# using model w/ ema for evaluation (w/ --use_ema_ckpt_eval)
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH}/train \
--eval_data_path ${DATA_PATH}/val \
--nb_classes 1000 \
--data_set image_folder \
--model ${MODEL_NAME} \
--finetune ${EVAL_CKPT} \
--input_size ${sz} \
--batch_size ${batch_size} \
--crop_pct ${crop_pct} \
--no_auto_resume \
--dist_eval \
--eval \
--use_ema_ckpt_eval \
--enable_deepspeed
Expected results:
* Acc@1 89.626 Acc@5 98.954 loss 0.599
Evaluate the fine-tuned eva02_L_pt_m38m_ft_in1k_p14
on IN-1K val using a single node with 4 gpus (click to expand).
MODEL_NAME=eva02_large_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
sz=448
batch_size=64
crop_pct=1.0
EVAL_CKPT=/path/to/eva02_L_pt_m38m_ft_in1k_p14.pt
DATA_PATH=/path/to/IN-1K/
# using model w/ ema for evaluation (w/ --use_ema_ckpt_eval)
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH}/train \
--eval_data_path ${DATA_PATH}/val \
--nb_classes 1000 \
--data_set image_folder \
--model ${MODEL_NAME} \
--finetune ${EVAL_CKPT} \
--input_size ${sz} \
--batch_size ${batch_size} \
--crop_pct ${crop_pct} \
--no_auto_resume \
--dist_eval \
--eval \
--use_ema_ckpt_eval \
--enable_deepspeed
Expected results:
* Acc@1 89.570 Acc@5 98.924 loss 0.612
Evaluate the fine-tuned eva02_B_pt_in21k_medft_in21k_ft_in1k_p14
on IN-1K val using a single node with 4 gpus (click to expand).
MODEL_NAME=eva02_base_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
sz=448
batch_size=64
crop_pct=1.0
EVAL_CKPT=/path/to/eva02_B_pt_in21k_medft_in21k_ft_in1k_p14.pt
DATA_PATH=/path/to/IN-1K/
# using model w/ ema for evaluation (w/ --use_ema_ckpt_eval)
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH}/train \
--eval_data_path ${DATA_PATH}/val \
--nb_classes 1000 \
--data_set image_folder \
--model ${MODEL_NAME} \
--finetune ${EVAL_CKPT} \
--input_size ${sz} \
--batch_size ${batch_size} \
--crop_pct ${crop_pct} \
--no_auto_resume \
--dist_eval \
--eval \
--use_ema_ckpt_eval \
--enable_deepspeed
Expected results:
* Acc@1 88.570 Acc@5 98.650 loss 0.686
Evaluate the fine-tuned eva02_L_pt_in21k_medft_in21k_ft_in1k_p14
on IN-1K val using a single node with 4 gpus (click to expand).
MODEL_NAME=eva02_large_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
sz=448
batch_size=64
crop_pct=1.0
EVAL_CKPT=/path/to/eva02_L_pt_in21k_medft_in21k_ft_in1k_p14.pt
DATA_PATH=/path/to/IN-1K/
# using model w/ ema for evaluation (w/ --use_ema_ckpt_eval)
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH}/train \
--eval_data_path ${DATA_PATH}/val \
--nb_classes 1000 \
--data_set image_folder \
--model ${MODEL_NAME} \
--finetune ${EVAL_CKPT} \
--input_size ${sz} \
--batch_size ${batch_size} \
--crop_pct ${crop_pct} \
--no_auto_resume \
--dist_eval \
--eval \
--use_ema_ckpt_eval \
--enable_deepspeed
Expected results:
* Acc@1 89.904 Acc@5 98.974 loss 0.647
Evaluate the fine-tuned eva02_L_pt_m38m_medft_in21k_ft_in1k_p14
on IN-1K val using a single node with 4 gpus (click to expand).
MODEL_NAME=eva02_large_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
sz=448
batch_size=64
crop_pct=1.0
EVAL_CKPT=/path/to/eva02_L_pt_m38m_medft_in21k_ft_in1k_p14.pt
DATA_PATH=/path/to/IN-1K/
# using model w/ ema for evaluation (w/ --use_ema_ckpt_eval)
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH}/train \
--eval_data_path ${DATA_PATH}/val \
--nb_classes 1000 \
--data_set image_folder \
--model ${MODEL_NAME} \
--finetune ${EVAL_CKPT} \
--input_size ${sz} \
--batch_size ${batch_size} \
--crop_pct ${crop_pct} \
--no_auto_resume \
--dist_eval \
--eval \
--use_ema_ckpt_eval \
--enable_deepspeed
Expected results:
* Acc@1 89.974 Acc@5 99.022 loss 0.700
We provide the evaluation instructions for eva02_L_pt_m38m_medft_in21k_ft_in1k_p14. Evaluation of the other EVA-02 models is similar.
Please download / prepare the IN-1K variant datasets from their official releases first.
Evaluate the fine-tuned eva02_L_pt_m38m_medft_in21k_ft_in1k_p14
on ImageNet-V2 (IN-V2) using a single node with 4 gpus (click to expand).
MODEL_NAME=eva02_large_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
sz=448
batch_size=64
crop_pct=1.0
EVAL_CKPT=/path/to/eva02_L_pt_m38m_medft_in21k_ft_in1k_p14.pt
DATA_PATH=/path/to/IN-V2/ImageNetV2-matched-frequency
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--robust_test 'imagenet_v2' \
--data_path ${DATA_PATH} \
--eval_data_path ${DATA_PATH} \
--nb_classes 1000 \
--data_set image_folder \
--model ${MODEL_NAME} \
--finetune ${EVAL_CKPT} \
--input_size ${sz} \
--batch_size ${batch_size} \
--crop_pct ${crop_pct} \
--no_auto_resume \
--dist_eval \
--eval \
--use_ema_ckpt_eval \
--enable_deepspeed
Expected results:
* Acc@1 82.430 Acc@5 96.360 loss 1.027
Evaluate the fine-tuned eva02_L_pt_m38m_medft_in21k_ft_in1k_p14
on ImageNet-ReaL (IN-ReaL) using a single node with 1 gpu (click to expand).
MODEL_NAME=eva02_large_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
sz=448
batch_size=64
crop_pct=1.0
EVAL_CKPT=/path/to/eva02_L_pt_m38m_medft_in21k_ft_in1k_p14.pt
DATA_PATH=/path/to/IN-1K
python -m torch.distributed.launch --nproc_per_node=1 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--real_labels real.json \
--data_path ${DATA_PATH}/train \
--eval_data_path ${DATA_PATH}/val \
--nb_classes 1000 \
--data_set image_folder \
--model ${MODEL_NAME} \
--finetune ${EVAL_CKPT} \
--input_size ${sz} \
--batch_size ${batch_size} \
--crop_pct ${crop_pct} \
--no_auto_resume \
--dist_eval \
--eval \
--use_ema_ckpt_eval \
--enable_deepspeed
Expected results:
* ReaL Acc@1 91.075 Acc@5 98.689 loss 0.699
Evaluate the fine-tuned eva02_L_pt_m38m_medft_in21k_ft_in1k_p14
on ImageNet-Adversarial (IN-Adv.) using a single node with 4 gpus (click to expand).
MODEL_NAME=eva02_large_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
sz=448
batch_size=64
crop_pct=1.0
EVAL_CKPT=/path/to/eva02_L_pt_m38m_medft_in21k_ft_in1k_p14.pt
DATA_PATH=/path/to/IN-Adv
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--robust_test 'imagenet_a' \
--data_path ${DATA_PATH} \
--eval_data_path ${DATA_PATH} \
--nb_classes 200 \
--data_set image_folder \
--model ${MODEL_NAME} \
--finetune ${EVAL_CKPT} \
--input_size ${sz} \
--batch_size ${batch_size} \
--crop_pct ${crop_pct} \
--no_auto_resume \
--dist_eval \
--eval \
--use_ema_ckpt_eval \
--enable_deepspeed
Expected results:
* Acc@1 87.720 Acc@5 96.893 loss 0.829
Evaluate the fine-tuned eva02_L_pt_m38m_medft_in21k_ft_in1k_p14
on ImageNet-Rendition (IN-Ren.) using a single node with 4 gpus (click to expand).
MODEL_NAME=eva02_large_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
sz=448
batch_size=64
crop_pct=1.0
EVAL_CKPT=/path/to/eva02_L_pt_m38m_medft_in21k_ft_in1k_p14.pt
DATA_PATH=/path/to/IN-Ren
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--robust_test 'imagenet_r' \
--data_path ${DATA_PATH} \
--eval_data_path ${DATA_PATH} \
--nb_classes 200 \
--data_set image_folder \
--model ${MODEL_NAME} \
--finetune ${EVAL_CKPT} \
--input_size ${sz} \
--batch_size ${batch_size} \
--crop_pct ${crop_pct} \
--no_auto_resume \
--dist_eval \
--eval \
--use_ema_ckpt_eval \
--enable_deepspeed
Expected results:
* Acc@1 89.907 Acc@5 96.957 loss 0.802
Evaluate the fine-tuned eva02_L_pt_m38m_medft_in21k_ft_in1k_p14
on ImageNet-Sketch (IN-Ske.) using a single node with 4 gpus (click to expand).
MODEL_NAME=eva02_large_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
sz=448
batch_size=64
crop_pct=1.0
EVAL_CKPT=/path/to/eva02_L_pt_m38m_medft_in21k_ft_in1k_p14.pt
DATA_PATH=/path/to/IN-Ske
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH} \
--eval_data_path ${DATA_PATH} \
--nb_classes 1000 \
--data_set image_folder \
--model ${MODEL_NAME} \
--finetune ${EVAL_CKPT} \
--input_size ${sz} \
--batch_size ${batch_size} \
--crop_pct ${crop_pct} \
--no_auto_resume \
--dist_eval \
--eval \
--use_ema_ckpt_eval \
--enable_deepspeed
Expected results:
* Acc@1 70.131 Acc@5 89.617 loss 1.647
Evaluate the fine-tuned eva02_L_pt_m38m_medft_in21k_ft_in1k_p14
on ObjectNet (ObjNet) using a single node with 4 gpus (click to expand).
MODEL_NAME=eva02_large_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
sz=448
batch_size=64
crop_pct=1.0
EVAL_CKPT=/path/to/eva02_L_pt_m38m_medft_in21k_ft_in1k_p14.pt
DUMMY_DATA_PATH=/path/to/IN-1K
DATA_PATH=/path/to/ObjNet/objectnet-1.0/images
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--robust_test 'objectnet' \
--data_path ${DUMMY_DATA_PATH}/train \
--eval_data_path ${DATA_PATH} \
--nb_classes 1000 \
--data_set image_folder \
--model ${MODEL_NAME} \
--finetune ${EVAL_CKPT} \
--input_size ${sz} \
--batch_size ${batch_size} \
--crop_pct ${crop_pct} \
--no_auto_resume \
--dist_eval \
--eval \
--use_ema_ckpt_eval \
--enable_deepspeed
Expected results:
* Acc@1 62.801 Acc@5 84.636 loss 2.002
We provide instructions for pre-training EVA-02 on the IN-21K dataset (14.2M images) and the Merged-38M dataset.
Please prepare the IN-21K dataset, the Merged-38M dataset, and EVA-CLIP (eva_clip_psz14.pt, download link) first.
Pre-train eva02_Ti_pt_in21k_p14
on IN-21K using 5 nodes x 8 gpus per node (click to expand).
MODEL=eva02_tiny_patch14_xattn_fusedLN_SwiGLU_preln_RoPE_xavier_normal_init
DATA_PATH=/path/to/IN-21K
VAL_DATA_PATH=/path/to/IN-1K # monitoring val loss
input_size=224
num_mask_patches=105 ### 224*224/14/14 * 0.4
batch_size=100 # 100(bsz_per_gpu)*8(#gpus_per_node)*5(#nodes)*1(update_freq)=4000(total_bsz)
update_freq=1
lr=3e-3
b2=0.98
eps=1e-6
dpr=0.0
ls=0.0
epochs=240
wmep=1
save_ckpt_freq=10
mixup=0.0
cj=0.0
zero_stage=0
teacher_type=evaclip
clip_model=EVA_CLIP_g_14_X
cache_dir=/path/to/eva_clip_psz14.pt
OUTPUT_DIR=/path/to/output/${MODEL}
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_eva02_pretraining.py \
--data_path ${DATA_PATH} \
--val_data_path ${VAL_DATA_PATH} \
--output_dir ${OUTPUT_DIR} \
--log_dir ${OUTPUT_DIR}/tb_log \
--model ${MODEL} \
--teacher_type ${teacher_type} \
--clip_model ${clip_model} \
--cache_dir ${cache_dir} \
--input_size ${input_size} --second_input_size ${input_size} \
--num_mask_patches ${num_mask_patches} \
--layer_scale_init_value ${ls} \
--batch_size ${batch_size} \
--lr ${lr} \
--opt_betas 0.9 ${b2} \
--opt_eps ${eps} \
--drop_path ${dpr} \
--epochs ${epochs} \
--mixup ${mixup} \
--color_jitter ${cj} \
--warmup_epochs ${wmep} \
--update_freq ${update_freq} \
--weight_decay 0.05 \
--zero_stage ${zero_stage} \
--save_ckpt_freq ${save_ckpt_freq} \
--stop_grad_conv1 \
--enable_deepspeed
Pre-train eva02_S_pt_in21k_p14
on IN-21K using 5 nodes x 8 gpus per node (click to expand).
MODEL=eva02_small_patch14_xattn_fusedLN_SwiGLU_preln_RoPE_xavier_normal_init
DATA_PATH=/path/to/IN-21K
VAL_DATA_PATH=/path/to/IN-1K # monitoring val loss
input_size=224
num_mask_patches=105 ### 224*224/14/14 * 0.4
batch_size=100 # 100(bsz_per_gpu)*8(#gpus_per_node)*5(#nodes)*1(update_freq)=4000(total_bsz)
update_freq=1
lr=3e-3
b2=0.98
eps=1e-6
dpr=0.0
ls=0.0
epochs=240
wmep=1
save_ckpt_freq=10
mixup=0.0
cj=0.0
zero_stage=0
teacher_type=evaclip
clip_model=EVA_CLIP_g_14_X
cache_dir=/path/to/eva_clip_psz14.pt
OUTPUT_DIR=/path/to/output/${MODEL}
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_eva02_pretraining.py \
--data_path ${DATA_PATH} \
--val_data_path ${VAL_DATA_PATH} \
--output_dir ${OUTPUT_DIR} \
--log_dir ${OUTPUT_DIR}/tb_log \
--model ${MODEL} \
--teacher_type ${teacher_type} \
--clip_model ${clip_model} \
--cache_dir ${cache_dir} \
--input_size ${input_size} --second_input_size ${input_size} \
--num_mask_patches ${num_mask_patches} \
--layer_scale_init_value ${ls} \
--batch_size ${batch_size} \
--lr ${lr} \
--opt_betas 0.9 ${b2} \
--opt_eps ${eps} \
--drop_path ${dpr} \
--epochs ${epochs} \
--mixup ${mixup} \
--color_jitter ${cj} \
--warmup_epochs ${wmep} \
--update_freq ${update_freq} \
--weight_decay 0.05 \
--zero_stage ${zero_stage} \
--save_ckpt_freq ${save_ckpt_freq} \
--stop_grad_conv1 \
--enable_deepspeed
Pre-train eva02_B_pt_in21k_p14
on IN-21K using 4 nodes x 8 gpus per node (click to expand).
MODEL=eva02_base_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE_xavier_normal_init
DATA_PATH=/path/to/IN-21K
VAL_DATA_PATH=/path/to/IN-1K # monitoring val loss
input_size=224
num_mask_patches=105 ### 224*224/14/14 * 0.4
batch_size=64 # 64(bsz_per_gpu)*8(#gpus_per_node)*4(#nodes)*1(update_freq)=2048(total_bsz)
update_freq=1
lr=1.5e-3
b2=0.98
eps=1e-6
dpr=0.0
ls=0.0
epochs=150
wmep=1
save_ckpt_freq=10
mixup=0.0
cj=0.0
zero_stage=0
teacher_type=evaclip
clip_model=EVA_CLIP_g_14_X
cache_dir=/path/to/eva_clip_psz14.pt
OUTPUT_DIR=/path/to/output/${MODEL}
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_eva02_pretraining.py \
--data_path ${DATA_PATH} \
--val_data_path ${VAL_DATA_PATH} \
--output_dir ${OUTPUT_DIR} \
--log_dir ${OUTPUT_DIR}/tb_log \
--model ${MODEL} \
--teacher_type ${teacher_type} \
--clip_model ${clip_model} \
--cache_dir ${cache_dir} \
--input_size ${input_size} --second_input_size ${input_size} \
--num_mask_patches ${num_mask_patches} \
--layer_scale_init_value ${ls} \
--batch_size ${batch_size} \
--lr ${lr} \
--opt_betas 0.9 ${b2} \
--opt_eps ${eps} \
--drop_path ${dpr} \
--epochs ${epochs} \
--mixup ${mixup} \
--color_jitter ${cj} \
--warmup_epochs ${wmep} \
--update_freq ${update_freq} \
--weight_decay 0.05 \
--zero_stage ${zero_stage} \
--save_ckpt_freq ${save_ckpt_freq} \
--stop_grad_conv1 \
--enable_deepspeed
Pre-train eva02_L_pt_in21k_p14
on IN-21K using 8 nodes x 8 gpus per node (click to expand).
MODEL=eva02_large_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE_xavier_normal_init
DATA_PATH=/path/to/IN-21K
VAL_DATA_PATH=/path/to/IN-1K # monitoring val loss
input_size=224
num_mask_patches=105 ### 224*224/14/14 * 0.4
batch_size=32 # 32(bsz_per_gpu)*8(#gpus_per_node)*8(#nodes)*1(update_freq)=2048(total_bsz)
update_freq=1
lr=1.5e-3
b2=0.98
eps=1e-6
dpr=0.1
ls=0.0
epochs=150
wmep=1
save_ckpt_freq=10
mixup=0.0
cj=0.0
zero_stage=1
teacher_type=evaclip
clip_model=EVA_CLIP_g_14_X
cache_dir=/path/to/eva_clip_psz14.pt
OUTPUT_DIR=/path/to/output/${MODEL}
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_eva02_pretraining.py \
--data_path ${DATA_PATH} \
--val_data_path ${VAL_DATA_PATH} \
--output_dir ${OUTPUT_DIR} \
--log_dir ${OUTPUT_DIR}/tb_log \
--model ${MODEL} \
--teacher_type ${teacher_type} \
--clip_model ${clip_model} \
--cache_dir ${cache_dir} \
--input_size ${input_size} --second_input_size ${input_size} \
--num_mask_patches ${num_mask_patches} \
--layer_scale_init_value ${ls} \
--batch_size ${batch_size} \
--lr ${lr} \
--opt_betas 0.9 ${b2} \
--opt_eps ${eps} \
--drop_path ${dpr} \
--epochs ${epochs} \
--mixup ${mixup} \
--color_jitter ${cj} \
--warmup_epochs ${wmep} \
--update_freq ${update_freq} \
--weight_decay 0.05 \
--zero_stage ${zero_stage} \
--save_ckpt_freq ${save_ckpt_freq} \
--stop_grad_conv1 \
--enable_deepspeed
Pre-train eva02_L_pt_m38m_p14
on Merged-38M using 8 nodes x 8 gpus per node (click to expand).
Prepare Merged-38M unlabeled image dataset:
Merged-38M
├── IN-21K
│ └── IN-21K -> /path/to/IN-21K
├── ADE20K
│ └── training -> /path/to/ADEChallengeData2016/images/training
├── CC12M
│ └── train_image -> /path/to/CC12M/train_image
├── CC3M
│ └── train_image -> /path/to/CC3M/train_image
├── COCO
│ └── train2017 -> /path/to/COCO/train2017
├── Object365
│ └── images -> /path/to/Objects365/images
└── OpenImages
└── OpenImages_v6 -> /path/to/openimages_v6
Pre-training on Merged-38M unlabeled image dataset:
MODEL=eva02_large_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE_xavier_normal_init
DATA_PATH=/path/to/Merged-38M
VAL_DATA_PATH=/path/to/IN-1K # monitoring val loss
input_size=224
num_mask_patches=105 ### 224*224/14/14 * 0.4
batch_size=32 # 32(bsz_per_gpu)*8(#gpus_per_node)*8(#nodes)*1(update_freq)=2048(total_bsz)
update_freq=1
lr=1.5e-3
b2=0.98
eps=1e-6
dpr=0.1
ls=0.0
epochs=56
wmep=1
save_ckpt_freq=10
mixup=0.0
cj=0.0
zero_stage=1
teacher_type=evaclip
clip_model=EVA_CLIP_g_14_X
cache_dir=/path/to/eva_clip_psz14.pt
OUTPUT_DIR=/path/to/output/${MODEL}
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_eva02_pretraining.py \
--data_path ${DATA_PATH} \
--val_data_path ${VAL_DATA_PATH} \
--output_dir ${OUTPUT_DIR} \
--log_dir ${OUTPUT_DIR}/tb_log \
--model ${MODEL} \
--teacher_type ${teacher_type} \
--clip_model ${clip_model} \
--cache_dir ${cache_dir} \
--input_size ${input_size} --second_input_size ${input_size} \
--num_mask_patches ${num_mask_patches} \
--layer_scale_init_value ${ls} \
--batch_size ${batch_size} \
--lr ${lr} \
--opt_betas 0.9 ${b2} \
--opt_eps ${eps} \
--drop_path ${dpr} \
--epochs ${epochs} \
--mixup ${mixup} \
--color_jitter ${cj} \
--warmup_epochs ${wmep} \
--update_freq ${update_freq} \
--weight_decay 0.05 \
--zero_stage ${zero_stage} \
--save_ckpt_freq ${save_ckpt_freq} \
--stop_grad_conv1 \
--enable_deepspeed
- By default, we fine-tune EVA-02 with `deepspeed==0.6.5` & `fp16`. Fine-tuning with `bfloat16` requires `deepspeed==0.8.1`.
- If you receive complaints about a size mismatch of RoPE when loading some pre-trained EVA-02 checkpoints, just ignore them. Previously we used a naive implementation, `VisionRotaryEmbedding`, for pre-training, and later changed to a slightly faster & neater one, `VisionRotaryEmbeddingFast`. The only difference is that they come with different RoPE shapes; functionally they are the same. Also see baaivision#56 (and the loading sketch below) if you have trouble loading EVA-02 MIM pre-trained weights.
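If you prefer to silence these warnings entirely, one option is to drop the shape-mismatched entries before loading. A minimal sketch, assuming the model has already been built and the checkpoint stores its weights under a `model` key (adapt key names to your checkpoint; this is not the repository's loading code):

```python
import torch

def load_mim_checkpoint(model, ckpt_path):
    """Load a MIM pre-trained checkpoint, skipping entries whose shapes
    do not match the current model (e.g. old-style RoPE buffers)."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt.get("model", ckpt)
    own = model.state_dict()
    filtered = {k: v for k, v in state.items()
                if k in own and v.shape == own[k].shape}
    dropped = sorted(set(state) - set(filtered))
    if dropped:
        print(f"skipping {len(dropped)} mismatched/unused keys, e.g. {dropped[:3]}")
    missing, unexpected = model.load_state_dict(filtered, strict=False)
    return missing, unexpected
```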
Fine-tune MIM pre-trained eva02_Ti_pt_in21k_p14
on IN-1K using 1 node x 8 gpus (click to expand).
MODEL_NAME=eva02_tiny_patch14_xattn_fusedLN_SwiGLU_preln_RoPE
PRETRAIN_CKPT=/path/to/eva02_Ti_pt_in21k_p14.pt
OUTPUT_DIR=/path/to/output/${MODEL_NAME}
DATA_PATH=/path/to/IN-1K
sz=336
batch_size=128 # 128(bsz_per_gpu)*8(#gpus_per_node)*1(#nodes)*1(update_freq)=1024(total_bsz)
update_freq=1
lr=2e-4
lrd=0.9
warmup_lr=0.0
min_lr=0.0
weight_decay=0.05
partial_freeze=0
ep=100
wmep=5
dpr=0.1
reprob=0.0
mixup=0.0
cutmix=0.0
smoothing=0.1
zero_stage=0
scale_low=0.08
crop_pct=1.0
aa=rand-m9-mstd0.5-inc1
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH}/train \
--eval_data_path ${DATA_PATH}/val \
--nb_classes 1000 \
--data_set image_folder \
--output_dir ${OUTPUT_DIR} \
--log_dir ${OUTPUT_DIR}/tb_log \
--model ${MODEL_NAME} \
--finetune ${PRETRAIN_CKPT} \
--input_size ${sz} \
--scale ${scale_low} 1.0 \
--lr ${lr} \
--warmup_lr ${warmup_lr} \
--min_lr ${min_lr} \
--layer_decay ${lrd} \
--epochs ${ep} \
--warmup_epochs ${wmep} \
--drop_path ${dpr} \
--reprob ${reprob} \
--mixup ${mixup} \
--cutmix ${cutmix} \
--batch_size ${batch_size} \
--update_freq ${update_freq} \
--crop_pct ${crop_pct} \
--zero_stage ${zero_stage} \
--partial_freeze ${partial_freeze} \
--smoothing ${smoothing} \
--weight_decay ${weight_decay} \
--aa ${aa} \
--dist_eval \
--model_ema \
--model_ema_eval \
--enable_deepspeed
Fine-tune MIM pre-trained eva02_S_pt_in21k_p14
on IN-1K using 1 node x 8 gpus (click to expand).
MODEL_NAME=eva02_small_patch14_xattn_fusedLN_SwiGLU_preln_RoPE
PRETRAIN_CKPT=/path/to/eva02_S_pt_in21k_p14.pt
OUTPUT_DIR=/path/to/output/${MODEL_NAME}
DATA_PATH=/path/to/IN-1K
sz=336
batch_size=128 # 128(bsz_per_gpu)*8(#gpus_per_node)*1(#nodes)*1(update_freq)=1024(total_bsz)
update_freq=1
lr=1e-4
lrd=0.8
warmup_lr=0.0
min_lr=0.0
weight_decay=0.05
partial_freeze=0
ep=100
wmep=5
dpr=0.1
reprob=0.0
mixup=0.0
cutmix=0.0
smoothing=0.1
zero_stage=0
scale_low=0.08
crop_pct=1.0
aa=rand-m9-mstd0.5-inc1
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH}/train \
--eval_data_path ${DATA_PATH}/val \
--nb_classes 1000 \
--data_set image_folder \
--output_dir ${OUTPUT_DIR} \
--log_dir ${OUTPUT_DIR}/tb_log \
--model ${MODEL_NAME} \
--finetune ${PRETRAIN_CKPT} \
--input_size ${sz} \
--scale ${scale_low} 1.0 \
--lr ${lr} \
--warmup_lr ${warmup_lr} \
--min_lr ${min_lr} \
--layer_decay ${lrd} \
--epochs ${ep} \
--warmup_epochs ${wmep} \
--drop_path ${dpr} \
--reprob ${reprob} \
--mixup ${mixup} \
--cutmix ${cutmix} \
--batch_size ${batch_size} \
--update_freq ${update_freq} \
--crop_pct ${crop_pct} \
--zero_stage ${zero_stage} \
--partial_freeze ${partial_freeze} \
--smoothing ${smoothing} \
--weight_decay ${weight_decay} \
--aa ${aa} \
--dist_eval \
--model_ema \
--model_ema_eval \
--enable_deepspeed
Fine-tune MIM pre-trained eva02_B_pt_in21k_p14
on IN-1K using 4 nodes x 8 gpus per node (click to expand).
MODEL_NAME=eva02_base_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
PRETRAIN_CKPT=/path/to/eva02_B_pt_in21k_p14.pt
OUTPUT_DIR=/path/to/output/${MODEL_NAME}
DATA_PATH=/path/to/IN-1K
sz=448
batch_size=32 # 32(bsz_per_gpu)*8(#gpus_per_node)*4(#nodes)*1(update_freq)=1024(total_bsz)
update_freq=1
lr=1e-4
lrd=0.7
warmup_lr=0.0
min_lr=0.0
weight_decay=0.05
partial_freeze=0
ep=30
wmep=3
dpr=0.1
reprob=0.0
mixup=0.0
cutmix=0.0
smoothing=0.1
zero_stage=0
scale_low=0.08
crop_pct=1.0
aa=rand-m9-mstd0.5-inc1
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH}/train \
--eval_data_path ${DATA_PATH}/val \
--nb_classes 1000 \
--data_set image_folder \
--output_dir ${OUTPUT_DIR} \
--log_dir ${OUTPUT_DIR}/tb_log \
--model ${MODEL_NAME} \
--finetune ${PRETRAIN_CKPT} \
--input_size ${sz} \
--scale ${scale_low} 1.0 \
--lr ${lr} \
--warmup_lr ${warmup_lr} \
--min_lr ${min_lr} \
--layer_decay ${lrd} \
--epochs ${ep} \
--warmup_epochs ${wmep} \
--drop_path ${dpr} \
--reprob ${reprob} \
--mixup ${mixup} \
--cutmix ${cutmix} \
--batch_size ${batch_size} \
--update_freq ${update_freq} \
--crop_pct ${crop_pct} \
--zero_stage ${zero_stage} \
--partial_freeze ${partial_freeze} \
--smoothing ${smoothing} \
--weight_decay ${weight_decay} \
--aa ${aa} \
--dist_eval \
--model_ema \
--model_ema_eval \
--enable_deepspeed
Fine-tune MIM pre-trained eva02_L_pt_in21k_p14
on IN-1K using 4 nodes x 8 gpus per node (click to expand).
MODEL_NAME=eva02_large_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
PRETRAIN_CKPT=/path/to/eva02_L_pt_in21k_p14.pt
OUTPUT_DIR=/path/to/output/${MODEL_NAME}
DATA_PATH=/path/to/IN-1K
sz=448
batch_size=16 # 16(bsz_per_gpu)*8(#gpus_per_node)*4(#nodes)*2(update_freq)=1024(total_bsz)
update_freq=2
lr=5e-5
lrd=0.8
warmup_lr=0.0
min_lr=0.0
weight_decay=0.05
partial_freeze=0
ep=30
wmep=3
dpr=0.15
reprob=0.0
mixup=0.0
cutmix=0.0
smoothing=0.2
zero_stage=1
scale_low=0.08
crop_pct=1.0
aa=rand-m9-mstd0.5-inc1
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH}/train \
--eval_data_path ${DATA_PATH}/val \
--nb_classes 1000 \
--data_set image_folder \
--output_dir ${OUTPUT_DIR} \
--log_dir ${OUTPUT_DIR}/tb_log \
--model ${MODEL_NAME} \
--finetune ${PRETRAIN_CKPT} \
--input_size ${sz} \
--scale ${scale_low} 1.0 \
--lr ${lr} \
--warmup_lr ${warmup_lr} \
--min_lr ${min_lr} \
--layer_decay ${lrd} \
--epochs ${ep} \
--warmup_epochs ${wmep} \
--drop_path ${dpr} \
--reprob ${reprob} \
--mixup ${mixup} \
--cutmix ${cutmix} \
--batch_size ${batch_size} \
--update_freq ${update_freq} \
--crop_pct ${crop_pct} \
--zero_stage ${zero_stage} \
--partial_freeze ${partial_freeze} \
--smoothing ${smoothing} \
--weight_decay ${weight_decay} \
--aa ${aa} \
--dist_eval \
--model_ema \
--model_ema_eval \
--enable_deepspeed
Fine-tune MIM pre-trained eva02_L_pt_m38m_p14
on IN-1K using 4 nodes x 8 gpus per node (click to expand).
MODEL_NAME=eva02_large_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
PRETRAIN_CKPT=/path/to/eva02_L_pt_m38m_p14.pt
OUTPUT_DIR=/path/to/output/${MODEL_NAME}
DATA_PATH=/path/to/IN-1K
sz=448
batch_size=16 # 16(bsz_per_gpu)*8(#gpus_per_node)*4(#nodes)*2(update_freq)=1024(total_bsz)
update_freq=2
lr=7e-5
lrd=0.8
warmup_lr=0.0
min_lr=0.0
weight_decay=0.05
partial_freeze=0
ep=30
wmep=3
dpr=0.15
reprob=0.0
mixup=0.0
cutmix=0.0
smoothing=0.2
zero_stage=1
scale_low=0.08
crop_pct=1.0
aa=rand-m9-mstd0.5-inc1
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH}/train \
--eval_data_path ${DATA_PATH}/val \
--nb_classes 1000 \
--data_set image_folder \
--output_dir ${OUTPUT_DIR} \
--log_dir ${OUTPUT_DIR}/tb_log \
--model ${MODEL_NAME} \
--finetune ${PRETRAIN_CKPT} \
--input_size ${sz} \
--scale ${scale_low} 1.0 \
--lr ${lr} \
--warmup_lr ${warmup_lr} \
--min_lr ${min_lr} \
--layer_decay ${lrd} \
--epochs ${ep} \
--warmup_epochs ${wmep} \
--drop_path ${dpr} \
--reprob ${reprob} \
--mixup ${mixup} \
--cutmix ${cutmix} \
--batch_size ${batch_size} \
--update_freq ${update_freq} \
--crop_pct ${crop_pct} \
--zero_stage ${zero_stage} \
--partial_freeze ${partial_freeze} \
--smoothing ${smoothing} \
--weight_decay ${weight_decay} \
--aa ${aa} \
--dist_eval \
--model_ema \
--model_ema_eval \
--enable_deepspeed
Fine-tune MIM pre-trained eva02_B_pt_in21k_p14
on IN-21K using 4 nodes x 8 gpus per node (click to expand).
MODEL_NAME=eva02_base_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
PRETRAIN_CKPT=/path/to/eva02_B_pt_in21k_p14.pt
OUTPUT_DIR=/path/to/output/${MODEL_NAME}
DATA_PATH=/path/to/IN-21K
sz=448
batch_size=64 # 64(bsz_per_gpu)*8(#gpus_per_node)*4(#nodes)*1(update_freq)=2048(total_bsz)
update_freq=1
lr=3e-4
lrd=0.7
warmup_lr=0.0
min_lr=0.0
weight_decay=0.05
partial_freeze=0
ep=40
wmep=1
dpr=0.1
reprob=0.0
mixup=0.0
cutmix=0.0
smoothing=0.1
zero_stage=1
scale_low=0.2
crop_pct=1.0
aa=rand-m9-mstd0.5-inc1
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=23333 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH} \
--disable_eval_during_finetuning \
--nb_classes 21841 \
--data_set image_folder \
--output_dir ${OUTPUT_DIR} \
--log_dir ${OUTPUT_DIR}/tb_log \
--model ${MODEL_NAME} \
--finetune ${PRETRAIN_CKPT} \
--input_size ${sz} \
--lr ${lr} \
--warmup_lr ${warmup_lr} \
--min_lr ${min_lr} \
--layer_decay ${lrd} \
--epochs ${ep} \
--warmup_epochs ${wmep} \
--drop_path ${dpr} \
--reprob ${reprob} \
--mixup ${mixup} \
--cutmix ${cutmix} \
--batch_size ${batch_size} \
--update_freq ${update_freq} \
--crop_pct ${crop_pct} \
--zero_stage ${zero_stage} \
--partial_freeze ${partial_freeze} \
--smoothing ${smoothing} \
--weight_decay ${weight_decay} \
--scale ${scale_low} 1.0 \
--aa ${aa} \
--enable_deepspeed
Fine-tune MIM pre-trained eva02_L_pt_in21k_p14
on IN-21K using 8 nodes x 8 gpus per node (click to expand).
MODEL_NAME=eva02_large_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
PRETRAIN_CKPT=/path/to/eva02_L_pt_in21k_p14.pt
OUTPUT_DIR=/path/to/output/${MODEL_NAME}
DATA_PATH=/path/to/IN-21K
sz=448
batch_size=16 # 16(bsz_per_gpu)*8(#gpus_per_node)*8(#nodes)*1(update_freq)=1024(total_bsz)
update_freq=1
lr=2e-4
lrd=0.75
warmup_lr=0.0
min_lr=0.0
weight_decay=0.05
partial_freeze=0
ep=20
wmep=1
dpr=0.15
reprob=0.0
mixup=0.0
cutmix=0.0
smoothing=0.1
zero_stage=1
scale_low=0.2
crop_pct=1.0
aa=rand-m9-mstd0.5-inc1
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=23333 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH} \
--disable_eval_during_finetuning \
--nb_classes 21841 \
--data_set image_folder \
--output_dir ${OUTPUT_DIR} \
--log_dir ${OUTPUT_DIR}/tb_log \
--model ${MODEL_NAME} \
--finetune ${PRETRAIN_CKPT} \
--input_size ${sz} \
--lr ${lr} \
--warmup_lr ${warmup_lr} \
--min_lr ${min_lr} \
--layer_decay ${lrd} \
--epochs ${ep} \
--warmup_epochs ${wmep} \
--drop_path ${dpr} \
--reprob ${reprob} \
--mixup ${mixup} \
--cutmix ${cutmix} \
--batch_size ${batch_size} \
--update_freq ${update_freq} \
--crop_pct ${crop_pct} \
--zero_stage ${zero_stage} \
--partial_freeze ${partial_freeze} \
--smoothing ${smoothing} \
--weight_decay ${weight_decay} \
--scale ${scale_low} 1.0 \
--aa ${aa} \
--enable_deepspeed
Fine-tune MIM pre-trained eva02_L_pt_m38m_p14
on IN-21K using 8 nodes x 8 gpus per node (click to expand).
MODEL_NAME=eva02_large_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
PRETRAIN_CKPT=/path/to/eva02_L_pt_m38m_p14.pt
OUTPUT_DIR=/path/to/output/${MODEL_NAME}
DATA_PATH=/path/to/IN-21K
sz=448
batch_size=16 # 16(bsz_per_gpu)*8(#gpus_per_node)*8(#nodes)*2(update_freq)=2048(total_bsz)
update_freq=2
lr=3e-4
lrd=0.75
warmup_lr=0.0
min_lr=0.0
weight_decay=0.05
partial_freeze=0
ep=30
wmep=1
dpr=0.15
reprob=0.0
mixup=0.0
cutmix=0.0
smoothing=0.1
zero_stage=1
scale_low=0.2
crop_pct=1.0
aa=rand-m9-mstd0.5-inc1
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=23333 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH} \
--disable_eval_during_finetuning \
--nb_classes 21841 \
--data_set image_folder \
--output_dir ${OUTPUT_DIR} \
--log_dir ${OUTPUT_DIR}/tb_log \
--model ${MODEL_NAME} \
--finetune ${PRETRAIN_CKPT} \
--input_size ${sz} \
--lr ${lr} \
--warmup_lr ${warmup_lr} \
--min_lr ${min_lr} \
--layer_decay ${lrd} \
--epochs ${ep} \
--warmup_epochs ${wmep} \
--drop_path ${dpr} \
--reprob ${reprob} \
--mixup ${mixup} \
--cutmix ${cutmix} \
--batch_size ${batch_size} \
--update_freq ${update_freq} \
--crop_pct ${crop_pct} \
--zero_stage ${zero_stage} \
--partial_freeze ${partial_freeze} \
--smoothing ${smoothing} \
--weight_decay ${weight_decay} \
--scale ${scale_low} 1.0 \
--aa ${aa} \
--enable_deepspeed
Fine-tune IN-21K-tuned eva02_B_pt_in21k_medft_in21k_p14
on IN-1K using 1 node x 8 gpus (click to expand).
MODEL_NAME=eva02_base_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
PRETRAIN_CKPT=/path/to/eva02_B_pt_in21k_medft_in21k_p14.pt
OUTPUT_DIR=/path/to/output/${MODEL_NAME}
DATA_PATH=/path/to/IN-1K
sz=448
batch_size=64 # 64(bsz_per_gpu)*8(#gpus_per_node)*1(#nodes)*1(update_freq)=512(total_bsz)
update_freq=1
lr=5e-5
lrd=0.8
warmup_lr=0.0
min_lr=0.0
weight_decay=0.05
partial_freeze=0
ep=15
wmep=2
dpr=0.15
reprob=0.0
mixup=0.0
cutmix=0.0
smoothing=0.2
zero_stage=1
scale_low=0.08
crop_pct=1.0
aa=rand-m9-mstd0.5-inc1
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH}/train \
--eval_data_path ${DATA_PATH}/val \
--nb_classes 1000 \
--data_set image_folder \
--output_dir ${OUTPUT_DIR} \
--log_dir ${OUTPUT_DIR}/tb_log \
--model ${MODEL_NAME} \
--finetune ${PRETRAIN_CKPT} \
--input_size ${sz} \
--scale ${scale_low} 1.0 \
--lr ${lr} \
--warmup_lr ${warmup_lr} \
--min_lr ${min_lr} \
--layer_decay ${lrd} \
--epochs ${ep} \
--warmup_epochs ${wmep} \
--drop_path ${dpr} \
--reprob ${reprob} \
--mixup ${mixup} \
--cutmix ${cutmix} \
--batch_size ${batch_size} \
--update_freq ${update_freq} \
--crop_pct ${crop_pct} \
--zero_stage ${zero_stage} \
--partial_freeze ${partial_freeze} \
--smoothing ${smoothing} \
--weight_decay ${weight_decay} \
--aa ${aa} \
--dist_eval \
--model_ema \
--model_ema_eval \
--enable_deepspeed
Fine-tune IN-21K-tuned eva02_L_pt_in21k_medft_in21k_p14
on IN-1K using 4 nodes x 8 gpus per node (click to expand).
MODEL_NAME=eva02_large_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
PRETRAIN_CKPT=/path/to/eva02_L_pt_in21k_medft_in21k_p14.pt
OUTPUT_DIR=/path/to/output/${MODEL_NAME}
DATA_PATH=/path/to/IN-1K
sz=448
batch_size=16 # 16(bsz_per_gpu)*8(#gpus_per_node)*4(#nodes)*1(update_freq)=512(total_bsz)
update_freq=1
lr=2e-5
lrd=0.85
warmup_lr=0.0
min_lr=0.0
weight_decay=0.05
partial_freeze=0
ep=20
wmep=2
dpr=0.15
reprob=0.0
mixup=0.0
cutmix=0.0
smoothing=0.2
zero_stage=1
scale_low=0.08
crop_pct=1.0
aa=rand-m9-mstd0.5-inc1
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH}/train \
--eval_data_path ${DATA_PATH}/val \
--nb_classes 1000 \
--data_set image_folder \
--output_dir ${OUTPUT_DIR} \
--log_dir ${OUTPUT_DIR}/tb_log \
--model ${MODEL_NAME} \
--finetune ${PRETRAIN_CKPT} \
--input_size ${sz} \
--scale ${scale_low} 1.0 \
--lr ${lr} \
--warmup_lr ${warmup_lr} \
--min_lr ${min_lr} \
--layer_decay ${lrd} \
--epochs ${ep} \
--warmup_epochs ${wmep} \
--drop_path ${dpr} \
--reprob ${reprob} \
--mixup ${mixup} \
--cutmix ${cutmix} \
--batch_size ${batch_size} \
--update_freq ${update_freq} \
--crop_pct ${crop_pct} \
--zero_stage ${zero_stage} \
--partial_freeze ${partial_freeze} \
--smoothing ${smoothing} \
--weight_decay ${weight_decay} \
--aa ${aa} \
--dist_eval \
--model_ema \
--model_ema_eval \
--enable_deepspeed
Fine-tune IN-21K-tuned eva02_L_pt_m38m_medft_in21k_p14
on IN-1K using 4 nodes x 8 gpus per node (click to expand).
MODEL_NAME=eva02_large_patch14_xattn_fusedLN_NaiveSwiGLU_subln_RoPE
PRETRAIN_CKPT=/path/to/eva02_L_pt_m38m_medft_in21k_p14.pt
OUTPUT_DIR=/path/to/output/${MODEL_NAME}
DATA_PATH=/path/to/IN-1K
sz=448
batch_size=16 # 16(bsz_per_gpu)*8(#gpus_per_node)*4(#nodes)*1(update_freq)=512(total_bsz)
update_freq=1
lr=2e-5
lrd=0.85
warmup_lr=0.0
min_lr=0.0
weight_decay=0.05
partial_freeze=0
ep=20
wmep=2
dpr=0.15
reprob=0.0
mixup=0.0
cutmix=0.0
smoothing=0.2
zero_stage=1
scale_low=0.08
crop_pct=1.0
aa=rand-m9-mstd0.5-inc1
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${WORLD_SIZE} --node_rank=${RANK} \
--master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_class_finetuning.py \
--data_path ${DATA_PATH}/train \
--eval_data_path ${DATA_PATH}/val \
--nb_classes 1000 \
--data_set image_folder \
--output_dir ${OUTPUT_DIR} \
--log_dir ${OUTPUT_DIR}/tb_log \
--model ${MODEL_NAME} \
--finetune ${PRETRAIN_CKPT} \
--input_size ${sz} \
--scale ${scale_low} 1.0 \
--lr ${lr} \
--warmup_lr ${warmup_lr} \
--min_lr ${min_lr} \
--layer_decay ${lrd} \
--epochs ${ep} \
--warmup_epochs ${wmep} \
--drop_path ${dpr} \
--reprob ${reprob} \
--mixup ${mixup} \
--cutmix ${cutmix} \
--batch_size ${batch_size} \
--update_freq ${update_freq} \
--crop_pct ${crop_pct} \
--zero_stage ${zero_stage} \
--partial_freeze ${partial_freeze} \
--smoothing ${smoothing} \
--weight_decay ${weight_decay} \
--aa ${aa} \
--dist_eval \
--model_ema \
--model_ema_eval \
--enable_deepspeed
EVA-02 is built using the awesome EVA-01, BEiT, BEiTv2, CLIP, MAE, timm, DeepSpeed, Apex, xFormer, and rotary-embedding-torch.