[Feature] Support multiple multi-modal algorithms and inferencers. (open-mmlab#1561)

* [Feat] Migrate blip caption to mmpretrain. (open-mmlab#50)

* Migrate blip caption to mmpretrain

* minor fix

* support train

* [Feature] Support OFA caption task. (open-mmlab#51)

* [Feature] Support OFA caption task.

* Remove duplicated files.

* [Feature] Support OFA vqa task. (open-mmlab#58)

* [Feature] Support OFA vqa task.

* Fix lint.

* [Feat] Add BLIP retrieval to mmpretrain. (open-mmlab#55)

* init

* minor fix for train

* fix according to comments

* refactor

* Update Blip retrieval. (open-mmlab#62)

* [Feature] Support OFA visual grounding task. (open-mmlab#59)

* [Feature] Support OFA visual grounding task.

* minor add TODO

---------

Co-authored-by: yingfhu <[email protected]>

* [Feat] Add flamingos coco caption and vqa. (open-mmlab#60)

* first init

* init flamingo coco

* add vqa

* minor fix

* remove unnecessary modules

* Update config

* Use `ApplyToList`.

---------

Co-authored-by: mzr1996 <[email protected]>

* [Feature]: BLIP2 coco retrieval  (open-mmlab#53)

* [Feature]: Add blip2 retriever

* [Feature]: Add blip2 all modules

* [Feature]: Refine model

* [Feature]: x1

* [Feature]: Runnable coco ret

* [Feature]: Runnable version

* [Feature]: Fix lint

* [Fix]: Fix lint

* [Feature]: Use 364 img size

* [Feature]: Refactor blip2

* [Fix]: Fix lint

* refactor files

* minor fix

* minor fix

---------

Co-authored-by: yingfhu <[email protected]>

* Remove

* fix blip caption inputs (open-mmlab#68)

* [Feat] Add BLIP NLVR support. (open-mmlab#67)

* first init

* init flamingo coco

* add vqa

* add nlvr

* refactor nlvr

* minor fix

* minor fix

* Update dataset

---------

Co-authored-by: mzr1996 <[email protected]>

* [Feature]: BLIP2 Caption (open-mmlab#70)

* [Feature]: Add language model

* [Feature]: blip2 caption forward

* [Feature]: Reproduce the results

* [Feature]: Refactor caption

* refine config

---------

Co-authored-by: yingfhu <[email protected]>

* [Feat] Migrate BLIP VQA to mmpretrain (open-mmlab#69)

* reformat

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* change

* refactor code

---------

Co-authored-by: yingfhu <[email protected]>

* Update RefCOCO dataset

* [Fix] fix lint

* [Feature] Implement inference APIs for multi-modal tasks. (open-mmlab#65)

* [Feature] Implement inference APIs for multi-modal tasks.

* [Project] Add gradio demo.

* [Improve] Update requirements

* Update flamingo

* Update blip

* Add NLVR inferencer

* Update flamingo

* Update hugging face model register

* Update ofa vqa

* Update BLIP-vqa (open-mmlab#71)

* Update blip-vqa docstring (open-mmlab#72)

* Refine flamingo docstring (open-mmlab#73)

* [Feature]: BLIP2 VQA (open-mmlab#61)

* [Feature]: VQA forward

* [Feature]: Reproduce accuracy

* [Fix]: Fix lint

* [Fix]: Add blank line

* minor fix

---------

Co-authored-by: yingfhu <[email protected]>

* [Feature]: BLIP2 docstring (open-mmlab#74)

* [Feature]: Add caption docstring

* [Feature]: Add docstring to blip2 vqa

* [Feature]: Add docstring to retrieval

* Update BLIP-2 metafile and README (open-mmlab#75)

* [Feature]: Add readme and docstring

* Update blip2 results

---------

Co-authored-by: mzr1996 <[email protected]>

* [Feature] BLIP Visual Grounding on MMPretrain Branch (open-mmlab#66)

* blip grounding merge with mmpretrain

* remove commit

* blip grounding test and inference api

* refcoco dataset

* refcoco dataset refine config

* rebasing

* gitignore

* rebasing

* minor edit

* minor edit

* Update blip-vqa docstring (open-mmlab#72)

* rebasing

* Revert "minor edit"

This reverts commit 639cec757c215e654625ed0979319e60f0be9044.

* blip grounding final

* precommit

* refine config

* refine config

* Update blip visual grounding

---------

Co-authored-by: Yiqin Wang 王逸钦 <[email protected]>
Co-authored-by: mzr1996 <[email protected]>

* Update visual grounding metric

* Update OFA docstring, README and metafiles. (open-mmlab#76)

* [Docs] Update installation docs and gradio demo docs. (open-mmlab#77)

* Update OFA name

* Update Visual Grounding Visualizer

* Integrate accelerate support

* Fix imports.

* Fix timm backbone

* Update imports

* Update README

* Update circle ci

* Update flamingo config

* Add gradio demo README

* [Feature]: Add scienceqa (open-mmlab#1571)

* [Feature]: Add scienceqa

* [Feature]: Change param name

* Update docs

* Update video

---------

Co-authored-by: Hubert <[email protected]>
Co-authored-by: yingfhu <[email protected]>
Co-authored-by: Yuan Liu <[email protected]>
Co-authored-by: Yiqin Wang 王逸钦 <[email protected]>
Co-authored-by: Rongjie Li <[email protected]>
6 people authored May 19, 2023
1 parent 770eb8e commit 6847d20
Showing 142 changed files with 17,961 additions and 414 deletions.
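
The headline change in this commit is the set of multi-modal inference entry points. A rough usage sketch of the high-level helper they expose (the model name and image path are placeholders; the exact registered names should come from the mmpretrain model zoo, not from here):

```python
# Usage sketch only: the high-level multi-modal inference helper added in
# this commit. The model name and image path are placeholders.
from mmpretrain.apis import inference_model

result = inference_model(
    'blip-base_3rdparty_caption',  # placeholder: any registered caption model
    'demo/cat-dog.png',            # placeholder image path
)
print(result)
```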
2 changes: 2 additions & 0 deletions .circleci/test.yml
@@ -136,6 +136,8 @@ jobs:
machine:
image: ubuntu-2004-cuda-11.4:202110-01
resource_class: gpu.nvidia.small
environment:
MKL_SERVICE_FORCE_INTEL: 1
parameters:
torch:
type: string
125 changes: 68 additions & 57 deletions .dev_scripts/fill_metafile.py
@@ -20,8 +20,10 @@

MMCLS_ROOT = Path(__file__).absolute().parents[1].resolve().absolute()
console = Console()
dataset_completer = FuzzyWordCompleter(
['ImageNet-1k', 'ImageNet-21k', 'CIFAR-10', 'CIFAR-100'])
dataset_completer = FuzzyWordCompleter([
'ImageNet-1k', 'ImageNet-21k', 'CIFAR-10', 'CIFAR-100', 'RefCOCO', 'VQAv2',
'COCO', 'OpenImages', 'Object365', 'CC3M', 'CC12M', 'YFCC100M', 'VG'
])


def prompt(message,
@@ -83,53 +85,57 @@ def parse_args():
return args


def get_flops(config_path):
def get_flops_params(config_path):
import numpy as np
import torch
from fvcore.nn import FlopCountAnalysis, parameter_count
from mmengine.config import Config
from mmengine.analysis import FlopAnalyzer, parameter_count
from mmengine.dataset import Compose
from mmengine.model.utils import revert_sync_batchnorm
from mmengine.registry import DefaultScope

import mmpretrain.datasets # noqa: F401
from mmpretrain.apis import init_model

cfg = Config.fromfile(config_path)

if 'test_dataloader' in cfg:
# build the data pipeline
test_dataset = cfg.test_dataloader.dataset
if test_dataset.pipeline[0]['type'] == 'LoadImageFromFile':
test_dataset.pipeline.pop(0)
if test_dataset.type in ['CIFAR10', 'CIFAR100']:
# The image shape of CIFAR is (32, 32, 3)
test_dataset.pipeline.insert(1, dict(type='Resize', scale=32))

with DefaultScope.overwrite_default_scope('mmpretrain'):
data = Compose(test_dataset.pipeline)({
'img':
np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
})
resolution = tuple(data['inputs'].shape[-2:])
else:
# For configs only for get model.
resolution = (224, 224)
from mmpretrain.apis import get_model
from mmpretrain.models.utils import no_load_hf_pretrained_model

model = init_model(cfg, device='cpu')
with no_load_hf_pretrained_model():
model = get_model(config_path, device='cpu')
model = revert_sync_batchnorm(model)
model.eval()

with torch.no_grad():
model.forward = model.extract_feat
model.to('cpu')
inputs = (torch.randn((1, 3, *resolution)), )
analyzer = FlopCountAnalysis(model, inputs)
analyzer.unsupported_ops_warnings(False)
analyzer.uncalled_modules_warnings(False)
flops = analyzer.total()
params = parameter_count(model)['']
return int(flops), int(params)
params = int(parameter_count(model)[''])

# get flops
try:
if 'test_dataloader' in model._config:
# build the data pipeline
test_dataset = model._config.test_dataloader.dataset
if test_dataset.pipeline[0]['type'] == 'LoadImageFromFile':
test_dataset.pipeline.pop(0)
if test_dataset.type in ['CIFAR10', 'CIFAR100']:
# The image shape of CIFAR is (32, 32, 3)
test_dataset.pipeline.insert(1, dict(type='Resize', scale=32))

with DefaultScope.overwrite_default_scope('mmpretrain'):
data = Compose(test_dataset.pipeline)({
'img':
np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
})
resolution = tuple(data['inputs'].shape[-2:])
else:
# For configs only for get model.
resolution = (224, 224)

with torch.no_grad():
# Skip flops if the model doesn't have `extract_feat` method.
model.forward = model.extract_feat
model.to('cpu')
inputs = (torch.randn((1, 3, *resolution)), )
analyzer = FlopAnalyzer(model, inputs)
analyzer.unsupported_ops_warnings(False)
analyzer.uncalled_modules_warnings(False)
flops = int(analyzer.total())
except Exception:
print('Unable to calculate flops.')
flops = None
return flops, params


def fill_collection(collection: dict):
@@ -202,12 +208,9 @@ def fill_model_by_prompt(model: dict, defaults: dict):
params = model.get('Metadata', {}).get('Parameters')
if model.get('Config') is not None and (
MMCLS_ROOT / model['Config']).exists() and (flops is None
or params is None):
try:
print('Automatically compute FLOPs and Parameters from config.')
flops, params = get_flops(str(MMCLS_ROOT / model['Config']))
except Exception:
print('Failed to compute FLOPs and Parameters.')
and params is None):
print('Automatically compute FLOPs and Parameters from config.')
flops, params = get_flops_params(str(MMCLS_ROOT / model['Config']))

if flops is None:
flops = prompt('Please specify the [red]FLOPs[/]: ')
@@ -222,7 +225,8 @@ def fill_model_by_prompt(model: dict, defaults: dict):
model['Metadata'].setdefault('FLOPs', flops)
model['Metadata'].setdefault('Parameters', params)

if model.get('Metadata', {}).get('Training Data') is None:
if 'Training Data' not in model.get('Metadata', {}) and \
'Training Data' not in defaults.get('Metadata', {}):
training_data = prompt(
'Please input all [red]training dataset[/], '
'include pre-training (input empty to finish): ',
@@ -259,12 +263,11 @@ def fill_model_by_prompt(model: dict, defaults: dict):
for metric in metrics_list:
k, v = metric.split('=')[:2]
metrics[k] = round(float(v), 2)
if len(metrics) > 0:
results = [{
'Dataset': test_dataset,
'Metrics': metrics,
'Task': task
}]
results = [{
'Task': task,
'Dataset': test_dataset,
'Metrics': metrics or None,
}]
model['Results'] = results

weights = model.get('Weights')
@@ -274,7 +277,7 @@ def fill_model_by_prompt(model: dict, defaults: dict):

if model.get('Converted From') is None and model.get(
'Weights') is not None:
if Confirm.ask(
if '3rdparty' in model['Name'] or Confirm.ask(
'Is the checkpoint is converted '
'from [red]other repository[/]?',
default=False):
@@ -317,9 +320,9 @@ def update_model_by_dict(model: dict, update_dict: dict, defaults: dict):
# Metadata.Flops, Metadata.Parameters
flops = model.get('Metadata', {}).get('FLOPs')
params = model.get('Metadata', {}).get('Parameters')
if config_updated or (flops is None or params is None):
if config_updated or (flops is None and params is None):
print(f'Automatically compute FLOPs and Parameters of {model["Name"]}')
flops, params = get_flops(str(MMCLS_ROOT / model['Config']))
flops, params = get_flops_params(str(MMCLS_ROOT / model['Config']))

model.setdefault('Metadata', {})
model['Metadata']['FLOPs'] = flops
@@ -409,10 +412,15 @@ def format_model(model: dict):

def order_models(model):
order = []
# Pre-trained model
order.append(int('Downstream' not in model))
# non-3rdparty model
order.append(int('3rdparty' in model['Name']))
# smaller model
order.append(model.get('Metadata', {}).get('Parameters', 0))
# faster model
order.append(model.get('Metadata', {}).get('FLOPs', 0))
# name order
order.append(len(model['Name']))

return tuple(order)
@@ -442,7 +450,10 @@ def main():
collection = fill_collection(collection)
if ori_collection != collection:
console.print(format_collection(collection))
model_defaults = {'In Collection': collection['Name']}
model_defaults = {
'In Collection': collection['Name'],
'Metadata': collection.get('Metadata', {}),
}

models = content.get('Models', [])
updated_models = []
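
The `get_flops_params` rewrite above delegates FLOPs and parameter counting to `mmengine.analysis`. A self-contained sketch of those calls on a toy torch model, assuming only that mmengine provides the `FlopAnalyzer` and `parameter_count` imports used in the diff:

```python
# Standalone sketch of the FLOPs/parameter counting used in get_flops_params,
# applied to a toy torch model instead of one built from an mmpretrain config.
import torch
from mmengine.analysis import FlopAnalyzer, parameter_count

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(16, 10),
).eval()

with torch.no_grad():
    inputs = (torch.randn(1, 3, 224, 224), )
    analyzer = FlopAnalyzer(model, inputs)
    analyzer.unsupported_ops_warnings(False)
    analyzer.uncalled_modules_warnings(False)
    flops = int(analyzer.total())
params = int(parameter_count(model)[''])
print(flops, params)
```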
6 changes: 6 additions & 0 deletions .dev_scripts/generate_readme.py
@@ -331,6 +331,12 @@ def add_models(metafile):
'Image Classification',
'Image Retrieval',
'Multi-Label Classification',
'Image Caption',
'Visual Grounding',
'Visual Question Answering',
'Image-To-Text Retrieval',
'Text-To-Image Retrieval',
'NLVR',
]

for task in tasks:
16 changes: 15 additions & 1 deletion README.md
@@ -70,11 +70,19 @@ The `main` branch works with **PyTorch 1.8+**.
### Major features

- Various backbones and pretrained models
- Rich training strategies(supervised learning, self-supervised learning, etc.)
- Rich training strategies (supervised learning, self-supervised learning, multi-modality learning etc.)
- Bag of training tricks
- Large-scale training configs
- High efficiency and extensibility
- Powerful toolkits for model analysis and experiments
- Various out-of-box inference tasks.
- Image Classification
- Image Caption
- Visual Question Answering
- Visual Grounding
- Retrieval (Image-To-Image, Text-To-Image, Image-To-Text)

https://github.com/open-mmlab/mmpretrain/assets/26739999/e4dcd3a2-f895-4d1b-a351-fbc74a04e904

## What's new

@@ -117,6 +125,12 @@ mim install -e .

Please refer to [installation documentation](https://mmpretrain.readthedocs.io/en/latest/get_started.html) for more detailed installation and dataset preparation.

For multi-modality models support, please install the extra dependencies by:

```shell
mim install -e ".[multimodal]"
```

## User Guides

We provided a series of tutorials about the basic usage of MMPreTrain for new users:
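
Once the `multimodal` extra from the README hunk above is installed, the task-specific inferencers added by this commit can be used directly. A hedged sketch for visual question answering (the class name comes from the new inference API, while the model name, image path, question, and exact call signature are assumptions for illustration):

```python
# Sketch only: visual question answering with a task-specific inferencer
# added by this commit. Model name, image path, question, and the call
# signature are assumptions, not taken from the diff.
from mmpretrain.apis import VisualQuestionAnsweringInferencer

inferencer = VisualQuestionAnsweringInferencer(
    model='ofa-base_3rdparty-finetuned_vqa')  # placeholder model name
result = inferencer('demo/cat-dog.png', 'What animals are in the picture?')
print(result)
```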
16 changes: 15 additions & 1 deletion README_zh-CN.md
@@ -68,11 +68,19 @@ MMPreTrain 是一款基于 PyTorch 的开源深度学习预训练工具箱,是
### 主要特性

- 支持多样的主干网络与预训练模型
- 支持多种训练策略(有监督学习,无监督学习等
- 支持多种训练策略(有监督学习,无监督学习,多模态学习等
- 提供多种训练技巧
- 大量的训练配置文件
- 高效率和高可扩展性
- 功能强大的工具箱,有助于模型分析和实验
- 支持多种开箱即用的推理任务
- 图像分类
- 图像描述(Image Caption)
- 视觉问答(Visual Question Answering)
- 视觉定位(Visual Grounding)
- 检索(图搜图,图搜文,文搜图)

https://github.com/open-mmlab/mmpretrain/assets/26739999/e4dcd3a2-f895-4d1b-a351-fbc74a04e904

## 更新日志

@@ -114,6 +122,12 @@ mim install -e .

更详细的步骤请参考 [安装指南](https://mmpretrain.readthedocs.io/zh_CN/latest/get_started.html) 进行安装。

如果需要多模态模型,请使用如下方式安装额外的依赖:

```shell
mim install -e ".[multimodal]"
```

## 基础教程

我们为新用户提供了一系列基础教程:
69 changes: 69 additions & 0 deletions configs/_base_/datasets/coco_caption.py
@@ -0,0 +1,69 @@
# data settings

data_preprocessor = dict(
type='MultiModalDataPreprocessor',
mean=[122.770938, 116.7460125, 104.09373615],
std=[68.5005327, 66.6321579, 70.32316305],
to_rgb=True,
)

train_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='RandomResizedCrop',
scale=384,
interpolation='bicubic',
backend='pillow'),
dict(type='RandomFlip', prob=0.5, direction='horizontal'),
dict(type='CleanCaption', keys='gt_caption'),
dict(
type='PackInputs',
algorithm_keys=['gt_caption'],
meta_keys=['image_id'],
),
]

test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='Resize',
scale=(384, 384),
interpolation='bicubic',
backend='pillow'),
dict(type='PackInputs', meta_keys=['image_id']),
]

train_dataloader = dict(
batch_size=32,
num_workers=5,
dataset=dict(
type='COCOCaption',
data_root='data/coco',
ann_file='annotations/coco_karpathy_train.json',
pipeline=train_pipeline),
sampler=dict(type='DefaultSampler', shuffle=True),
persistent_workers=True,
drop_last=True,
)

val_dataloader = dict(
batch_size=16,
num_workers=5,
dataset=dict(
type='COCOCaption',
data_root='data/coco',
ann_file='annotations/coco_karpathy_val.json',
pipeline=test_pipeline,
),
sampler=dict(type='DefaultSampler', shuffle=False),
persistent_workers=True,
)

val_evaluator = dict(
type='COCOCaption',
ann_file='data/coco/annotations/coco_karpathy_val_gt.json',
)

# # If you want standard test, please manually configure the test dataset
test_dataloader = val_dataloader
test_evaluator = val_evaluator
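
The new dataset config is plain mmengine-style dicts, so its transforms can be exercised in isolation. A small sketch that mirrors the probing trick used in `.dev_scripts/fill_metafile.py`: drop the file-loading step and feed a random HWC array straight into the composed test pipeline (the image size and keys here are made up for the example):

```python
# Sketch: run the test-time transforms from coco_caption.py on a fake image.
# The LoadImageFromFile step is dropped and a random HWC array is fed
# directly, the same trick used in .dev_scripts/fill_metafile.py.
import numpy as np
from mmengine.dataset import Compose
from mmengine.registry import DefaultScope

import mmpretrain.datasets  # noqa: F401, registers the custom transforms

pipeline = [
    dict(type='Resize', scale=(384, 384), interpolation='bicubic',
         backend='pillow'),
    dict(type='PackInputs', meta_keys=['image_id']),
]

with DefaultScope.overwrite_default_scope('mmpretrain'):
    data = Compose(pipeline)({
        'img': np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8),
        'image_id': 0,
    })
print(data['inputs'].shape)  # expected: (3, 384, 384)
```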