[Feature] Support multiple multi-modal algorithms and inferencers. (open-mmlab#1561)

* [Feat] Migrate blip caption to mmpretrain. (open-mmlab#50)

* Migrate blip caption to mmpretrain

* minor fix

* support train

* [Feature] Support OFA caption task. (open-mmlab#51)

* [Feature] Support OFA caption task.

* Remove duplicated files.

* [Feature] Support OFA vqa task. (open-mmlab#58)

* [Feature] Support OFA vqa task.

* Fix lint.

* [Feat] Add BLIP retrieval to mmpretrain. (open-mmlab#55)

* init

* minor fix for train

* fix according to comments

* refactor

* Update Blip retrieval. (open-mmlab#62)

* [Feature] Support OFA visual grounding task. (open-mmlab#59)

* [Feature] Support OFA visual grounding task.

* minor add TODO

---------

Co-authored-by: yingfhu <[email protected]>

* [Feat] Add flamingos coco caption and vqa. (open-mmlab#60)

* first init

* init flamingo coco

* add vqa

* minor fix

* remove unnecessary modules

* Update config

* Use `ApplyToList`.

---------

Co-authored-by: mzr1996 <[email protected]>

* [Feature]: BLIP2 coco retrieval  (open-mmlab#53)

* [Feature]: Add blip2 retriever

* [Feature]: Add blip2 all modules

* [Feature]: Refine model

* [Feature]: x1

* [Feature]: Runnable coco ret

* [Feature]: Runnable version

* [Feature]: Fix lint

* [Fix]: Fix lint

* [Feature]: Use 364 img size

* [Feature]: Refactor blip2

* [Fix]: Fix lint

* refactor files

* minor fix

* minor fix

---------

Co-authored-by: yingfhu <[email protected]>

* Remove

* fix blip caption inputs (open-mmlab#68)

* [Feat] Add BLIP NLVR support. (open-mmlab#67)

* first init

* init flamingo coco

* add vqa

* add nlvr

* refactor nlvr

* minor fix

* minor fix

* Update dataset

---------

Co-authored-by: mzr1996 <[email protected]>

* [Feature]: BLIP2 Caption (open-mmlab#70)

* [Feature]: Add language model

* [Feature]: blip2 caption forward

* [Feature]: Reproduce the results

* [Feature]: Refactor caption

* refine config

---------

Co-authored-by: yingfhu <[email protected]>

* [Feat] Migrate BLIP VQA to mmpretrain (open-mmlab#69)

* reformat

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* refactor code

---------

Co-authored-by: yingfhu <[email protected]>

* Update RefCOCO dataset

* [Fix] fix lint

* [Feature] Implement inference APIs for multi-modal tasks. (open-mmlab#65)

* [Feature] Implement inference APIs for multi-modal tasks.

* [Project] Add gradio demo.

* [Improve] Update requirements

* Update flamingo

* Update blip

* Add NLVR inferencer

* Update flamingo

* Update hugging face model register

* Update ofa vqa

* Update BLIP-vqa (open-mmlab#71)

* Update blip-vqa docstring (open-mmlab#72)

* Refine flamingo docstring (open-mmlab#73)

* [Feature]: BLIP2 VQA (open-mmlab#61)

* [Feature]: VQA forward

* [Feature]: Reproduce accuracy

* [Fix]: Fix lint

* [Fix]: Add blank line

* minor fix

---------

Co-authored-by: yingfhu <[email protected]>

* [Feature]: BLIP2 docstring (open-mmlab#74)

* [Feature]: Add caption docstring

* [Feature]: Add docstring to blip2 vqa

* [Feature]: Add docstring to retrieval

* Update BLIP-2 metafile and README (open-mmlab#75)

* [Feature]: Add readme and docstring

* Update blip2 results

---------

Co-authored-by: mzr1996 <[email protected]>

* [Feature] BLIP Visual Grounding on MMPretrain Branch (open-mmlab#66)

* blip grounding merge with mmpretrain

* remove commit

* blip grounding test and inference api

* refcoco dataset

* refcoco dataset refine config

* rebasing

* gitignore

* rebasing

* minor edit

* minor edit

* Update blip-vqa docstring (open-mmlab#72)

* rebasing

* Revert "minor edit"

This reverts commit 639cec757c215e654625ed0979319e60f0be9044.

* blip grounding final

* precommit

* refine config

* refine config

* Update blip visual grounding

---------

Co-authored-by: Yiqin Wang 王逸钦 <[email protected]>
Co-authored-by: mzr1996 <[email protected]>

* Update visual grounding metric

* Update OFA docstring, README and metafiles. (open-mmlab#76)

* [Docs] Update installation docs and gradio demo docs. (open-mmlab#77)

* Update OFA name

* Update Visual Grounding Visualizer

* Integrate accelerate support

* Fix imports.

* Fix timm backbone

* Update imports

* Update README

* Update circle ci

* Update flamingo config

* Add gradio demo README

* [Feature]: Add scienceqa (open-mmlab#1571)

* [Feature]: Add scienceqa

* [Feature]: Change param name

* Update docs

* Update video

---------

Co-authored-by: Hubert <[email protected]>
Co-authored-by: yingfhu <[email protected]>
Co-authored-by: Yuan Liu <[email protected]>
Co-authored-by: Yiqin Wang 王逸钦 <[email protected]>
Co-authored-by: Rongjie Li <[email protected]>
6 people authored May 19, 2023
1 parent 770eb8e commit 6847d20
Showing 142 changed files with 17,961 additions and 414 deletions.
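
The headline change in this commit is the set of multi-modal inference entry points. A rough usage sketch of the high-level helper they expose (the model name and image path are placeholders; the exact registered names should come from the mmpretrain model zoo, not from here):

```python
# Usage sketch only: the high-level multi-modal inference helper added in
# this commit. The model name and image path are placeholders.
from mmpretrain.apis import inference_model

result = inference_model(
    'blip-base_3rdparty_caption',  # placeholder: any registered caption model
    'demo/cat-dog.png',            # placeholder image path
)
print(result)
```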
2 changes: 2 additions & 0 deletions .circleci/test.yml
@@ -136,6 +136,8 @@ jobs:
machine:
image: ubuntu-2004-cuda-11.4:202110-01
resource_class: gpu.nvidia.small
environment:
MKL_SERVICE_FORCE_INTEL: 1
parameters:
torch:
type: string
125 changes: 68 additions & 57 deletions .dev_scripts/fill_metafile.py
@@ -20,8 +20,10 @@

MMCLS_ROOT = Path(__file__).absolute().parents[1].resolve().absolute()
console = Console()
dataset_completer = FuzzyWordCompleter(
['ImageNet-1k', 'ImageNet-21k', 'CIFAR-10', 'CIFAR-100'])
dataset_completer = FuzzyWordCompleter([
'ImageNet-1k', 'ImageNet-21k', 'CIFAR-10', 'CIFAR-100', 'RefCOCO', 'VQAv2',
'COCO', 'OpenImages', 'Object365', 'CC3M', 'CC12M', 'YFCC100M', 'VG'
])


def prompt(message,
@@ -83,53 +85,57 @@ def parse_args():
return args


def get_flops(config_path):
def get_flops_params(config_path):
import numpy as np
import torch
from fvcore.nn import FlopCountAnalysis, parameter_count
from mmengine.config import Config
from mmengine.analysis import FlopAnalyzer, parameter_count
from mmengine.dataset import Compose
from mmengine.model.utils import revert_sync_batchnorm
from mmengine.registry import DefaultScope

import mmpretrain.datasets # noqa: F401
from mmpretrain.apis import init_model

cfg = Config.fromfile(config_path)

if 'test_dataloader' in cfg:
# build the data pipeline
test_dataset = cfg.test_dataloader.dataset
if test_dataset.pipeline[0]['type'] == 'LoadImageFromFile':
test_dataset.pipeline.pop(0)
if test_dataset.type in ['CIFAR10', 'CIFAR100']:
# The image shape of CIFAR is (32, 32, 3)
test_dataset.pipeline.insert(1, dict(type='Resize', scale=32))

with DefaultScope.overwrite_default_scope('mmpretrain'):
data = Compose(test_dataset.pipeline)({
'img':
np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
})
resolution = tuple(data['inputs'].shape[-2:])
else:
# For configs only for get model.
resolution = (224, 224)
from mmpretrain.apis import get_model
from mmpretrain.models.utils import no_load_hf_pretrained_model

model = init_model(cfg, device='cpu')
with no_load_hf_pretrained_model():
model = get_model(config_path, device='cpu')
model = revert_sync_batchnorm(model)
model.eval()

with torch.no_grad():
model.forward = model.extract_feat
model.to('cpu')
inputs = (torch.randn((1, 3, *resolution)), )
analyzer = FlopCountAnalysis(model, inputs)
analyzer.unsupported_ops_warnings(False)
analyzer.uncalled_modules_warnings(False)
flops = analyzer.total()
params = parameter_count(model)['']
return int(flops), int(params)
params = int(parameter_count(model)[''])

# get flops
try:
if 'test_dataloader' in model._config:
# build the data pipeline
test_dataset = model._config.test_dataloader.dataset
if test_dataset.pipeline[0]['type'] == 'LoadImageFromFile':
test_dataset.pipeline.pop(0)
if test_dataset.type in ['CIFAR10', 'CIFAR100']:
# The image shape of CIFAR is (32, 32, 3)
test_dataset.pipeline.insert(1, dict(type='Resize', scale=32))

with DefaultScope.overwrite_default_scope('mmpretrain'):
data = Compose(test_dataset.pipeline)({
'img':
np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
})
resolution = tuple(data['inputs'].shape[-2:])
else:
# For configs only for get model.
resolution = (224, 224)

with torch.no_grad():
# Skip flops if the model doesn't have `extract_feat` method.
model.forward = model.extract_feat
model.to('cpu')
inputs = (torch.randn((1, 3, *resolution)), )
analyzer = FlopAnalyzer(model, inputs)
analyzer.unsupported_ops_warnings(False)
analyzer.uncalled_modules_warnings(False)
flops = int(analyzer.total())
except Exception:
print('Unable to calculate flops.')
flops = None
return flops, params


def fill_collection(collection: dict):
@@ -202,12 +208,9 @@ def fill_model_by_prompt(model: dict, defaults: dict):
params = model.get('Metadata', {}).get('Parameters')
if model.get('Config') is not None and (
MMCLS_ROOT / model['Config']).exists() and (flops is None
or params is None):
try:
print('Automatically compute FLOPs and Parameters from config.')
flops, params = get_flops(str(MMCLS_ROOT / model['Config']))
except Exception:
print('Failed to compute FLOPs and Parameters.')
and params is None):
print('Automatically compute FLOPs and Parameters from config.')
flops, params = get_flops_params(str(MMCLS_ROOT / model['Config']))

if flops is None:
flops = prompt('Please specify the [red]FLOPs[/]: ')
@@ -222,7 +225,8 @@ def fill_model_by_prompt(model: dict, defaults: dict):
model['Metadata'].setdefault('FLOPs', flops)
model['Metadata'].setdefault('Parameters', params)

if model.get('Metadata', {}).get('Training Data') is None:
if 'Training Data' not in model.get('Metadata', {}) and \
'Training Data' not in defaults.get('Metadata', {}):
training_data = prompt(
'Please input all [red]training dataset[/], '
'include pre-training (input empty to finish): ',
@@ -259,12 +263,11 @@ def fill_model_by_prompt(model: dict, defaults: dict):
for metric in metrics_list:
k, v = metric.split('=')[:2]
metrics[k] = round(float(v), 2)
if len(metrics) > 0:
results = [{
'Dataset': test_dataset,
'Metrics': metrics,
'Task': task
}]
results = [{
'Task': task,
'Dataset': test_dataset,
'Metrics': metrics or None,
}]
model['Results'] = results

weights = model.get('Weights')
@@ -274,7 +277,7 @@ def fill_model_by_prompt(model: dict, defaults: dict):

if model.get('Converted From') is None and model.get(
'Weights') is not None:
if Confirm.ask(
if '3rdparty' in model['Name'] or Confirm.ask(
'Is the checkpoint is converted '
'from [red]other repository[/]?',
default=False):
@@ -317,9 +320,9 @@ def update_model_by_dict(model: dict, update_dict: dict, defaults: dict):
# Metadata.Flops, Metadata.Parameters
flops = model.get('Metadata', {}).get('FLOPs')
params = model.get('Metadata', {}).get('Parameters')
if config_updated or (flops is None or params is None):
if config_updated or (flops is None and params is None):
print(f'Automatically compute FLOPs and Parameters of {model["Name"]}')
flops, params = get_flops(str(MMCLS_ROOT / model['Config']))
flops, params = get_flops_params(str(MMCLS_ROOT / model['Config']))

model.setdefault('Metadata', {})
model['Metadata']['FLOPs'] = flops
@@ -409,10 +412,15 @@ def format_model(model: dict):

def order_models(model):
order = []
# Pre-trained model
order.append(int('Downstream' not in model))
# non-3rdparty model
order.append(int('3rdparty' in model['Name']))
# smaller model
order.append(model.get('Metadata', {}).get('Parameters', 0))
# faster model
order.append(model.get('Metadata', {}).get('FLOPs', 0))
# name order
order.append(len(model['Name']))

return tuple(order)
@@ -442,7 +450,10 @@ def main():
collection = fill_collection(collection)
if ori_collection != collection:
console.print(format_collection(collection))
model_defaults = {'In Collection': collection['Name']}
model_defaults = {
'In Collection': collection['Name'],
'Metadata': collection.get('Metadata', {}),
}

models = content.get('Models', [])
updated_models = []
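
The `get_flops_params` rewrite above delegates FLOPs and parameter counting to `mmengine.analysis`. A self-contained sketch of those calls on a toy torch model, assuming only that mmengine provides the `FlopAnalyzer` and `parameter_count` imports used in the diff:

```python
# Standalone sketch of the FLOPs/parameter counting used in get_flops_params,
# applied to a toy torch model instead of one built from an mmpretrain config.
import torch
from mmengine.analysis import FlopAnalyzer, parameter_count

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(16, 10),
).eval()

with torch.no_grad():
    inputs = (torch.randn(1, 3, 224, 224), )
    analyzer = FlopAnalyzer(model, inputs)
    analyzer.unsupported_ops_warnings(False)
    analyzer.uncalled_modules_warnings(False)
    flops = int(analyzer.total())
params = int(parameter_count(model)[''])
print(flops, params)
```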
6 changes: 6 additions & 0 deletions .dev_scripts/generate_readme.py
@@ -331,6 +331,12 @@ def add_models(metafile):
'Image Classification',
'Image Retrieval',
'Multi-Label Classification',
'Image Caption',
'Visual Grounding',
'Visual Question Answering',
'Image-To-Text Retrieval',
'Text-To-Image Retrieval',
'NLVR',
]

for task in tasks:
16 changes: 15 additions & 1 deletion README.md
@@ -70,11 +70,19 @@ The `main` branch works with **PyTorch 1.8+**.
### Major features

- Various backbones and pretrained models
- Rich training strategies(supervised learning, self-supervised learning, etc.)
- Rich training strategies (supervised learning, self-supervised learning, multi-modality learning etc.)
- Bag of training tricks
- Large-scale training configs
- High efficiency and extensibility
- Powerful toolkits for model analysis and experiments
- Various out-of-box inference tasks.
- Image Classification
- Image Caption
- Visual Question Answering
- Visual Grounding
- Retrieval (Image-To-Image, Text-To-Image, Image-To-Text)

https://github.com/open-mmlab/mmpretrain/assets/26739999/e4dcd3a2-f895-4d1b-a351-fbc74a04e904

## What's new

@@ -117,6 +125,12 @@ mim install -e .

Please refer to [installation documentation](https://mmpretrain.readthedocs.io/en/latest/get_started.html) for more detailed installation and dataset preparation.

For multi-modality models support, please install the extra dependencies by:

```shell
mim install -e ".[multimodal]"
```

## User Guides

We provided a series of tutorials about the basic usage of MMPreTrain for new users:
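
Once the `multimodal` extra from the README hunk above is installed, the task-specific inferencers added by this commit can be used directly. A hedged sketch for visual question answering (the class name comes from the new inference API, while the model name, image path, question, and exact call signature are assumptions for illustration):

```python
# Sketch only: visual question answering with a task-specific inferencer
# added by this commit. Model name, image path, question, and the call
# signature are assumptions, not taken from the diff.
from mmpretrain.apis import VisualQuestionAnsweringInferencer

inferencer = VisualQuestionAnsweringInferencer(
    model='ofa-base_3rdparty-finetuned_vqa')  # placeholder model name
result = inferencer('demo/cat-dog.png', 'What animals are in the picture?')
print(result)
```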
16 changes: 15 additions & 1 deletion README_zh-CN.md
@@ -68,11 +68,19 @@ MMPreTrain 是一款基于 PyTorch 的开源深度学习预训练工具箱,是
### 主要特性

- 支持多样的主干网络与预训练模型
- 支持多种训练策略(有监督学习,无监督学习等
- 支持多种训练策略(有监督学习,无监督学习,多模态学习等
- 提供多种训练技巧
- 大量的训练配置文件
- 高效率和高可扩展性
- 功能强大的工具箱,有助于模型分析和实验
- 支持多种开箱即用的推理任务
- 图像分类
- 图像描述(Image Caption)
- 视觉问答(Visual Question Answering)
- 视觉定位(Visual Grounding)
- 检索(图搜图,图搜文,文搜图)

https://github.com/open-mmlab/mmpretrain/assets/26739999/e4dcd3a2-f895-4d1b-a351-fbc74a04e904

## 更新日志

@@ -114,6 +122,12 @@ mim install -e .

更详细的步骤请参考 [安装指南](https://mmpretrain.readthedocs.io/zh_CN/latest/get_started.html) 进行安装。

如果需要多模态模型,请使用如下方式安装额外的依赖:

```shell
mim install -e ".[multimodal]"
```

## 基础教程

我们为新用户提供了一系列基础教程:
69 changes: 69 additions & 0 deletions configs/_base_/datasets/coco_caption.py
@@ -0,0 +1,69 @@
# data settings

data_preprocessor = dict(
type='MultiModalDataPreprocessor',
mean=[122.770938, 116.7460125, 104.09373615],
std=[68.5005327, 66.6321579, 70.32316305],
to_rgb=True,
)

train_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='RandomResizedCrop',
scale=384,
interpolation='bicubic',
backend='pillow'),
dict(type='RandomFlip', prob=0.5, direction='horizontal'),
dict(type='CleanCaption', keys='gt_caption'),
dict(
type='PackInputs',
algorithm_keys=['gt_caption'],
meta_keys=['image_id'],
),
]

test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='Resize',
scale=(384, 384),
interpolation='bicubic',
backend='pillow'),
dict(type='PackInputs', meta_keys=['image_id']),
]

train_dataloader = dict(
batch_size=32,
num_workers=5,
dataset=dict(
type='COCOCaption',
data_root='data/coco',
ann_file='annotations/coco_karpathy_train.json',
pipeline=train_pipeline),
sampler=dict(type='DefaultSampler', shuffle=True),
persistent_workers=True,
drop_last=True,
)

val_dataloader = dict(
batch_size=16,
num_workers=5,
dataset=dict(
type='COCOCaption',
data_root='data/coco',
ann_file='annotations/coco_karpathy_val.json',
pipeline=test_pipeline,
),
sampler=dict(type='DefaultSampler', shuffle=False),
persistent_workers=True,
)

val_evaluator = dict(
type='COCOCaption',
ann_file='data/coco/annotations/coco_karpathy_val_gt.json',
)

# # If you want standard test, please manually configure the test dataset
test_dataloader = val_dataloader
test_evaluator = val_evaluator
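
The new dataset config is plain mmengine-style dicts, so its transforms can be exercised in isolation. A small sketch that mirrors the probing trick used in `.dev_scripts/fill_metafile.py`: drop the file-loading step and feed a random HWC array straight into the composed test pipeline (the image size and keys here are made up for the example):

```python
# Sketch: run the test-time transforms from coco_caption.py on a fake image.
# The LoadImageFromFile step is dropped and a random HWC array is fed
# directly, the same trick used in .dev_scripts/fill_metafile.py.
import numpy as np
from mmengine.dataset import Compose
from mmengine.registry import DefaultScope

import mmpretrain.datasets  # noqa: F401, registers the custom transforms

pipeline = [
    dict(type='Resize', scale=(384, 384), interpolation='bicubic',
         backend='pillow'),
    dict(type='PackInputs', meta_keys=['image_id']),
]

with DefaultScope.overwrite_default_scope('mmpretrain'):
    data = Compose(pipeline)({
        'img': np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8),
        'image_id': 0,
    })
print(data['inputs'].shape)  # expected: (3, 384, 384)
```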