forked from open-mmlab/mmpretrain
[Feature]: Add MFF (open-mmlab#1725)
* [Feature]: Add MFF
* [Feature]: Add mff linear prob
* [Feature]: Add ft
* [Fix]: Update docstring
* [Feature]: Update out_indices
* [Feature]: Add prefix to ft
* [Feature]: Add README
* [Feature]: Update readme
* [Feature]: Update README
* [Feature]: Add metafile
* [Feature]: Update README
* [Fix]: Fix lint
* [Feature]: Add UT
* [Feature]: Update paper link
1 parent 2fb52ee, commit fa53174. Showing 16 changed files with 721 additions and 5 deletions.
configs/mff/README.md (60 additions, 0 deletions)

# MFF

> [Improving Pixel-based MIM by Reducing Wasted Modeling Capability](https://arxiv.org/abs/2308.00261)

<!-- [ALGORITHM] -->

## Abstract

There has been significant progress in Masked Image Modeling (MIM). Existing MIM methods can be broadly categorized into two groups based on the reconstruction target: pixel-based and tokenizer-based approaches. The former offers a simpler pipeline and lower computational cost, but it is known to be biased toward high-frequency details. In this paper, we provide a set of empirical studies to confirm this limitation of pixel-based MIM and propose a new method that explicitly utilizes low-level features from shallow layers to aid pixel reconstruction. By incorporating this design into our base method, MAE, we reduce the wasted modeling capability of pixel-based MIM, improving its convergence and achieving non-trivial improvements across various downstream tasks. To the best of our knowledge, we are the first to systematically investigate multi-level feature fusion for isotropic architectures like the standard Vision Transformer (ViT). Notably, when applied to a smaller model (e.g., ViT-S), our method yields significant performance gains, such as 1.2% on fine-tuning, 2.8% on linear probing, and 2.6% on semantic segmentation.
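
The fusion mechanism can be pictured as a learnable weighted average over features tapped from several encoder blocks. The snippet below is a minimal sketch of that idea in plain PyTorch, written under our own simplifying assumptions (class name, projection choice, and weight initialization are illustrative); the actual implementation lives in the `MFF`/`MFFViT` classes added by this commit.

```python
# Sketch only: learnable fusion of multi-level ViT features
# (an illustration, not the repository implementation).
import torch
import torch.nn as nn


class MultiLevelFusion(nn.Module):

    def __init__(self, embed_dim: int = 768, num_levels: int = 6):
        super().__init__()
        # One linear projection per tapped encoder block.
        self.projections = nn.ModuleList(
            nn.Linear(embed_dim, embed_dim) for _ in range(num_levels))
        # Learnable fusion weights, softmax-normalized in forward().
        self.weights = nn.Parameter(torch.zeros(num_levels))

    def forward(self, feats: list) -> torch.Tensor:
        # feats: num_levels tensors of shape (B, N, C), e.g. taken from
        # blocks out_indices=[0, 2, 4, 6, 8, 11] of a 12-block ViT-B.
        projected = torch.stack(
            [proj(f) for proj, f in zip(self.projections, feats)])
        w = torch.softmax(self.weights, dim=0).view(-1, 1, 1, 1)
        # Weighted average over levels -> a fused (B, N, C) feature map
        # from which pixels are reconstructed.
        return (w * projected).sum(dim=0)
```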

<div align=center>
<img src="https://user-images.githubusercontent.com/30762564/257412932-5f36b11b-ee64-4ce7-b7d1-a31000302bd8.png" width="80%"/>
</div>

**Train/Test Command**

Prepare your dataset according to the [docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset).

Train:

```shell
python tools/train.py configs/mff/mff_vit-base-p16_8xb512-amp-coslr-300e_in1k.py
```
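
The `8xb512` tag in the config name denotes 8 GPUs with 512 samples per GPU under mmpretrain's naming convention. Assuming the standard `tools/dist_train.sh` helper shipped with mmpretrain, a matching multi-GPU launch is:

```shell
# Launch pretraining on 8 GPUs via the distributed-training wrapper.
bash tools/dist_train.sh configs/mff/mff_vit-base-p16_8xb512-amp-coslr-300e_in1k.py 8
```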

Test:

```shell
python tools/test.py configs/mff/benchmarks/vit-base-p16_8xb128-coslr-100e_in1k.py None
```

(`None` is a checkpoint-path placeholder; substitute the path to a trained checkpoint to run a real evaluation.)

<!-- [TABS-END] -->

## Models and results

### Pretrained models

| Model | Params (M) | Flops (G) | Config | Download |
| :-------------------------------------------- | :--------: | :-------: | :------------------------------------------------------: | :------------------------------------------------------------------------------: |
| `mff_vit-base-p16_8xb512-amp-coslr-300e_in1k` | - | - | [config](mff_vit-base-p16_8xb512-amp-coslr-300e_in1k.py) | [model](https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-300e_in1k/mff_vit-base-p16_8xb512-amp-coslr-300e_in1k_20230801-3c1bcce4.pth) \| [log](https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-300e_in1k/mff_vit-base-p16_8xb512-amp-coslr-300e_in1k_20230801-3c1bcce4.json) |
| `mff_vit-base-p16_8xb512-amp-coslr-800e_in1k` | - | - | [config](mff_vit-base-p16_8xb512-amp-coslr-800e_in1k.py) | [model](https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-800e_in1k/mff_vit-base-p16_8xb512-amp-coslr-800e_in1k_20230801-3af7cd9d.pth) \| [log](https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-800e_in1k/mff_vit-base-p16_8xb512-amp-coslr-800e_in1k_20230801-3af7cd9d.json) |
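
The benchmark configs initialize their backbone from these checkpoints via `init_cfg=dict(type='Pretrained', ..., prefix='backbone.')`. A quick way to sanity-check a downloaded checkpoint with plain PyTorch (file name as in the table above):

```python
# Minimal checkpoint inspection sketch: confirm the file loads and that
# the backbone weights carry the 'backbone.' prefix expected by init_cfg.
import torch

# Newer torch versions may require weights_only=False for full checkpoints.
ckpt = torch.load(
    'mff_vit-base-p16_8xb512-amp-coslr-300e_in1k_20230801-3c1bcce4.pth',
    map_location='cpu')
state = ckpt.get('state_dict', ckpt)
backbone_keys = [k for k in state if k.startswith('backbone.')]
print(f'{len(backbone_keys)} backbone tensors, e.g. {backbone_keys[:3]}')
```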

### Image Classification on ImageNet-1k

| Model | Pretrain | Params (M) | Flops (G) | Top-1 (%) | Config | Download |
| :---------------------------------------- | :------------------------------------------: | :--------: | :-------: | :-------: | :----------------------------------------: | :-------------------------------------------: |
| `vit-base-p16_mff-300e-pre_8xb128-coslr-100e_in1k` | [MFF 300-Epochs](https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-300e_in1k/mff_vit-base-p16_8xb512-amp-coslr-300e_in1k_20230801-3c1bcce4.pth) | 86.57 | 17.58 | 83.00 | [config](benchmarks/vit-base-p16_8xb128-coslr-100e_in1k.py) | [model](https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-300e_in1k/vit-base-p16_8xb128-coslr-100e_in1k/vit-base-p16_8xb128-coslr-100e_in1k_20230802-d746fdb7.pth) \| [log](https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-300e_in1k/vit-base-p16_8xb128-coslr-100e_in1k/vit-base-p16_8xb128-coslr-100e_in1k_20230802-d746fdb7.json) |
| `vit-base-p16_mff-800e-pre_8xb128-coslr-100e_in1k` | [MFF 800-Epochs](https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-800e_in1k/mff_vit-base-p16_8xb512-amp-coslr-800e_in1k_20230801-3af7cd9d.pth) | 86.57 | 17.58 | 83.70 | [config](benchmarks/vit-base-p16_8xb128-coslr-100e_in1k.py) | [model](https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-800e_in1k/vit-base-p16_8xb128-coslr-100e/vit-base-p16_8xb128-coslr-100e_20230802-6780e47d.pth) \| [log](https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-800e_in1k/vit-base-p16_8xb128-coslr-100e/vit-base-p16_8xb128-coslr-100e_20230802-6780e47d.json) |
| `vit-base-p16_mff-300e-pre_8xb2048-linear-coslr-90e_in1k` | [MFF 300-Epochs](https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-300e_in1k/mff_vit-base-p16_8xb512-amp-coslr-300e_in1k_20230801-3c1bcce4.pth) | 86.57 | 17.58 | 64.20 | [config](benchmarks/vit-base-p16_8xb2048-linear-coslr-90e_in1k.py) | [log](https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-300e_in1k/vit-base-p16_8xb2048-linear-coslr-90e_in1k/vit-base-p16_8xb2048-linear-coslr-90e_in1k.json) |
| `vit-base-p16_mff-800e-pre_8xb2048-linear-coslr-90e_in1k` | [MFF 800-Epochs](https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-800e_in1k/mff_vit-base-p16_8xb512-amp-coslr-800e_in1k_20230801-3af7cd9d.pth) | 86.57 | 17.58 | 68.30 | [config](benchmarks/vit-base-p16_8xb2048-linear-coslr-90e_in1k.py) | [model](https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-800e_in1k/vit-base-p16_8xb2048-linear-coslr-90e/vit-base-p16_8xb2048-linear-coslr-90e_20230802-6b1f7bc8.pth) \| [log](https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-800e_in1k/vit-base-p16_8xb2048-linear-coslr-90e/vit-base-p16_8xb2048-linear-coslr-90e_20230802-6b1f7bc8.json) |

## Citation

```bibtex
@article{MFF,
  title={Improving Pixel-based MIM by Reducing Wasted Modeling Capability},
  author={Yuan Liu and Songyang Zhang and Jiacheng Chen and Zhaohui Yu and Kai Chen and Dahua Lin},
  journal={arXiv preprint arXiv:2308.00261},
  year={2023}
}
```

configs/mff/benchmarks/vit-base-p16_8xb128-coslr-100e_in1k.py (114 additions, 0 deletions)

_base_ = [
    '../../_base_/datasets/imagenet_bs64_swin_224.py',
    '../../_base_/schedules/imagenet_bs1024_adamw_swin.py',
    '../../_base_/default_runtime.py'
]

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='RandomResizedCrop',
        scale=224,
        backend='pillow',
        interpolation='bicubic'),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(
        type='RandAugment',
        policies='timm_increasing',
        num_policies=2,
        total_level=10,
        magnitude_level=9,
        magnitude_std=0.5,
        hparams=dict(pad_val=[104, 116, 124], interpolation='bicubic')),
    dict(
        type='RandomErasing',
        erase_prob=0.25,
        mode='rand',
        min_area_ratio=0.02,
        max_area_ratio=0.3333333333333333,
        fill_color=[103.53, 116.28, 123.675],
        fill_std=[57.375, 57.12, 58.395]),
    dict(type='PackInputs')
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='ResizeEdge',
        scale=256,
        edge='short',
        backend='pillow',
        interpolation='bicubic'),
    dict(type='CenterCrop', crop_size=224),
    dict(type='PackInputs')
]

train_dataloader = dict(batch_size=128, dataset=dict(pipeline=train_pipeline))
val_dataloader = dict(batch_size=128, dataset=dict(pipeline=test_pipeline))
test_dataloader = val_dataloader

# model settings
model = dict(
    type='ImageClassifier',
    backbone=dict(
        type='VisionTransformer',
        arch='base',
        img_size=224,
        patch_size=16,
        drop_path_rate=0.1,
        out_type='avg_featmap',
        final_norm=False,
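        # checkpoint='' below is a placeholder: supply the pretrained MFF
        # weights at launch, e.g. via
        # --cfg-options model.backbone.init_cfg.checkpoint=<path/to/ckpt>.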
        init_cfg=dict(type='Pretrained', checkpoint='', prefix='backbone.')),
    neck=None,
    head=dict(
        type='LinearClsHead',
        num_classes=1000,
        in_channels=768,
        loss=dict(
            type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
        init_cfg=[dict(type='TruncNormal', layer='Linear', std=2e-5)]),
    train_cfg=dict(augments=[
        dict(type='Mixup', alpha=0.8),
        dict(type='CutMix', alpha=1.0)
    ]))

# optimizer wrapper
optim_wrapper = dict(
    optimizer=dict(
        type='AdamW', lr=2e-3, weight_decay=0.05, betas=(0.9, 0.999)),
    constructor='LearningRateDecayOptimWrapperConstructor',
    paramwise_cfg=dict(
        layer_decay_rate=0.65,
        custom_keys={
            '.ln': dict(decay_mult=0.0),
            '.bias': dict(decay_mult=0.0),
            '.cls_token': dict(decay_mult=0.0),
            '.pos_embed': dict(decay_mult=0.0)
        }))
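
# LearningRateDecayOptimWrapperConstructor applies layer-wise lr decay: with
# layer_decay_rate=0.65, a parameter k layers below the head receives roughly
# lr * 0.65**k, so early ViT blocks are updated far more gently than the
# classifier. Norm layers, biases, cls_token and pos_embed are additionally
# excluded from weight decay via custom_keys.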

# learning rate scheduler
param_scheduler = [
    dict(
        type='LinearLR',
        start_factor=1e-4,
        by_epoch=True,
        begin=0,
        end=5,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=95,
        by_epoch=True,
        begin=5,
        end=100,
        eta_min=1e-6,
        convert_to_iter_based=True)
]
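
# 5 linear warmup epochs followed by 95 cosine-annealing epochs cover the
# full 100-epoch schedule; convert_to_iter_based steps the lr every iteration
# rather than once per epoch.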

# runtime settings
default_hooks = dict(
    # save checkpoint per epoch.
    checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=3))

train_cfg = dict(by_epoch=True, max_epochs=100)

randomness = dict(seed=0, diff_rank_seed=True)

configs/mff/benchmarks/vit-base-p16_8xb2048-linear-coslr-90e_in1k.py (74 additions, 0 deletions)

_base_ = [
    '../../_base_/datasets/imagenet_bs32_pil_resize.py',
    '../../_base_/schedules/imagenet_bs1024_adamw_swin.py',
    '../../_base_/default_runtime.py'
]

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='ToPIL', to_rgb=True),
    dict(type='MAERandomResizedCrop', size=224, interpolation=3),
    dict(type='torchvision/RandomHorizontalFlip', p=0.5),
    dict(type='ToNumpy', to_bgr=True),
    dict(type='PackInputs'),
]

# dataset settings
train_dataloader = dict(
    batch_size=2048, drop_last=True, dataset=dict(pipeline=train_pipeline))
val_dataloader = dict(drop_last=False)
test_dataloader = dict(drop_last=False)

# model settings
model = dict(
    type='ImageClassifier',
    backbone=dict(
        type='VisionTransformer',
        arch='base',
        img_size=224,
        patch_size=16,
        frozen_stages=12,
        out_type='cls_token',
        final_norm=True,
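        # No checkpoint path here either; pass the pretrained MFF weights at
        # launch, e.g. --cfg-options model.backbone.init_cfg.checkpoint=<path>.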
        init_cfg=dict(type='Pretrained', prefix='backbone.')),
    neck=dict(type='ClsBatchNormNeck', input_features=768),
    head=dict(
        type='VisionTransformerClsHead',
        num_classes=1000,
        in_channels=768,
        loss=dict(type='CrossEntropyLoss'),
        init_cfg=[dict(type='TruncNormal', layer='Linear', std=0.01)]))

# optimizer
optim_wrapper = dict(
    _delete_=True,
    type='AmpOptimWrapper',
    optimizer=dict(type='LARS', lr=6.4, weight_decay=0.0, momentum=0.9))
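
# lr=6.4 is consistent with the linear scaling rule commonly used for LARS
# linear probing: base lr 0.1 * (8 GPUs * 2048 per-GPU batch) / 256 = 6.4.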

# learning rate scheduler
param_scheduler = [
    dict(
        type='LinearLR',
        start_factor=1e-4,
        by_epoch=True,
        begin=0,
        end=10,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=80,
        by_epoch=True,
        begin=10,
        end=90,
        eta_min=0.0,
        convert_to_iter_based=True)
]

# runtime settings
train_cfg = dict(by_epoch=True, max_epochs=90)

default_hooks = dict(
    checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=1),
    logger=dict(type='LoggerHook', interval=10))

randomness = dict(seed=0, diff_rank_seed=True)

configs/mff/metafile.yml (103 additions, 0 deletions)

Collections:
  - Name: MFF
    Metadata:
      Training Data: ImageNet-1k
      Training Techniques:
        - AdamW
      Training Resources: 8x A100-80G GPUs
      Architecture:
        - ViT
    Paper:
      Title: Improving Pixel-based MIM by Reducing Wasted Modeling Capability
      URL: https://arxiv.org/pdf/2308.00261.pdf
    README: configs/mff/README.md

Models:
  - Name: mff_vit-base-p16_8xb512-amp-coslr-300e_in1k
    Metadata:
      Epochs: 300
      Batch Size: 2048
      FLOPs: 17581972224
      Parameters: 85882692
      Training Data: ImageNet-1k
    In Collection: MFF
    Results: null
    Weights: https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-300e_in1k/mff_vit-base-p16_8xb512-amp-coslr-300e_in1k_20230801-3c1bcce4.pth
    Config: configs/mff/mff_vit-base-p16_8xb512-amp-coslr-300e_in1k.py
    Downstream:
      - vit-base-p16_mff-300e-pre_8xb128-coslr-100e_in1k
      - vit-base-p16_mff-300e-pre_8xb2048-linear-coslr-90e_in1k
  - Name: mff_vit-base-p16_8xb512-amp-coslr-800e_in1k
    Metadata:
      Epochs: 800
      Batch Size: 2048
      FLOPs: 17581972224
      Parameters: 85882692
      Training Data: ImageNet-1k
    In Collection: MFF
    Results: null
    Weights: https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-800e_in1k/mff_vit-base-p16_8xb512-amp-coslr-800e_in1k_20230801-3af7cd9d.pth
    Config: configs/mff/mff_vit-base-p16_8xb512-amp-coslr-800e_in1k.py
    Downstream:
      - vit-base-p16_mff-800e-pre_8xb128-coslr-100e_in1k
      - vit-base-p16_mff-800e-pre_8xb2048-linear-coslr-90e_in1k
  - Name: vit-base-p16_mff-300e-pre_8xb128-coslr-100e_in1k
    Metadata:
      Epochs: 100
      Batch Size: 1024
      FLOPs: 17581215744
      Parameters: 86566120
      Training Data: ImageNet-1k
    In Collection: MFF
    Results:
      - Task: Image Classification
        Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 83.0
    Weights: https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-300e_in1k/vit-base-p16_8xb128-coslr-100e_in1k/vit-base-p16_8xb128-coslr-100e_in1k_20230802-d746fdb7.pth
    Config: configs/mff/benchmarks/vit-base-p16_8xb128-coslr-100e_in1k.py
  - Name: vit-base-p16_mff-800e-pre_8xb128-coslr-100e_in1k
    Metadata:
      Epochs: 100
      Batch Size: 1024
      FLOPs: 17581215744
      Parameters: 86566120
      Training Data: ImageNet-1k
    In Collection: MFF
    Results:
      - Task: Image Classification
        Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 83.7
    Weights: https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-800e_in1k/vit-base-p16_8xb128-coslr-100e/vit-base-p16_8xb128-coslr-100e_20230802-6780e47d.pth
    Config: configs/mff/benchmarks/vit-base-p16_8xb128-coslr-100e_in1k.py
  - Name: vit-base-p16_mff-300e-pre_8xb2048-linear-coslr-90e_in1k
    Metadata:
      Epochs: 90
      Batch Size: 16384
      FLOPs: 17581215744
      Parameters: 86566120
      Training Data: ImageNet-1k
    In Collection: MFF
    Results:
      - Task: Image Classification
        Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 64.2
    Weights:
    Config: configs/mff/benchmarks/vit-base-p16_8xb2048-linear-coslr-90e_in1k.py
  - Name: vit-base-p16_mff-800e-pre_8xb2048-linear-coslr-90e_in1k
    Metadata:
      Epochs: 90
      Batch Size: 16384
      FLOPs: 17581215744
      Parameters: 86566120
      Training Data: ImageNet-1k
    In Collection: MFF
    Results:
      - Task: Image Classification
        Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 68.3
    Weights: https://download.openmmlab.com/mmpretrain/v1.0/mff/mff_vit-base-p16_8xb512-amp-coslr-800e_in1k/vit-base-p16_8xb2048-linear-coslr-90e/vit-base-p16_8xb2048-linear-coslr-90e_20230802-6b1f7bc8.pth
    Config: configs/mff/benchmarks/vit-base-p16_8xb2048-linear-coslr-90e_in1k.py

configs/mff/mff_vit-base-p16_8xb512-amp-coslr-300e_in1k.py (24 additions, 0 deletions)

_base_ = '../mae/mae_vit-base-p16_8xb512-amp-coslr-300e_in1k.py'

randomness = dict(seed=2, diff_rank_seed=True)

# dataset config
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='ToPIL', to_rgb=True),
    dict(type='torchvision/Resize', size=224),
    dict(
        type='torchvision/RandomCrop',
        size=224,
        padding=4,
        padding_mode='reflect'),
    dict(type='torchvision/RandomHorizontalFlip', p=0.5),
    dict(type='ToNumpy', to_bgr=True),
    dict(type='PackInputs')
]
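
# Note: this overrides the crop augmentation inherited from the MAE base
# config with a fixed Resize plus a reflect-padded RandomCrop.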

train_dataloader = dict(dataset=dict(pipeline=train_pipeline))

# model config
model = dict(
    type='MFF', backbone=dict(type='MFFViT', out_indices=[0, 2, 4, 6, 8, 11]))
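
# out_indices taps six of ViT-B's 12 blocks (0, 2, 4, 6, 8 and the final
# block 11); MFF fuses these multi-level features before pixel
# reconstruction, implementing the paper's multi-level feature fusion design.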