diff --git a/configs/llava/README.md b/configs/llava/README.md
index 31275f7748c..7aaf57d7d13 100644
--- a/configs/llava/README.md
+++ b/configs/llava/README.md
@@ -21,7 +21,7 @@ Instruction tuning large language models (LLMs) using machine-generated instruct
 
-According to the license of LLaMA, we cannot provide the merged checkpoint directly. Please use the below script to download and get the merged the checkpoint.
+According to the license of LLaMA, we cannot provide the merged checkpoint directly. Please use the script below to download the weights and get the merged checkpoint.
 
-```baseh
+```shell
 python tools/model_converters/llava-delta2mmpre.py huggyllama/llama-7b liuhaotian/LLaVA-Lightning-7B-delta-v1-1 ./LLaVA-Lightning-7B-delta-v1-1.pth
 ```
 
diff --git a/docs/en/notes/changelog.md b/docs/en/notes/changelog.md
index 219797b4f8f..f25923d4f62 100644
--- a/docs/en/notes/changelog.md
+++ b/docs/en/notes/changelog.md
@@ -7,7 +7,7 @@
 - Support inference of more **multi-modal** algorithms, such as **LLaVA**, **MiniGPT-4**, **Otter**, etc.
 - Support around **10 multi-modal datasets**!
 - Add **iTPN**, **SparK** self-supervised learning algorithms.
-- Provide examples of [New Config](./mmpretrain/configs/) and [DeepSpeed/FSDP](./configs/mae/benchmarks/).
+- Provide examples of [New Config](https://github.com/open-mmlab/mmpretrain/tree/main/mmpretrain/configs/) and [DeepSpeed/FSDP](https://github.com/open-mmlab/mmpretrain/tree/main/configs/mae/benchmarks/).
 
 ### New Features
 
diff --git a/docs/en/useful_tools/shape_bias.md b/docs/en/useful_tools/shape_bias.md
index ea4f96c46be..907bde61ee7 100644
--- a/docs/en/useful_tools/shape_bias.md
+++ b/docs/en/useful_tools/shape_bias.md
@@ -1,10 +1,10 @@
-## Shape Bias Tool Usage
+# Shape Bias Tool Usage
 
-Shape bias measures how a model relies the shapes, compared to texture, to sense the semantics in images. For more details, we recommend interested readers to this [paper](https://arxiv.org/abs/2106.07411). MMPretrain provide an off-the-shelf toolbox to obtain the shape bias of a classification model. You can following these steps below:
+Shape bias measures how much a model relies on shape, compared to texture, to recognize the semantics in images. For more details, we refer interested readers to this [paper](https://arxiv.org/abs/2106.07411). MMPretrain provides an off-the-shelf toolbox to obtain the shape bias of a classification model. You can follow the steps below:
 
-### Prepare the dataset
+## Prepare the dataset
 
-First you should download the [cue-conflict](https://github.com/bethgelab/model-vs-human/releases/download/v0.1/cue-conflict.tar.gz) to `data` folder, and then unzip this dataset. After that, you `data` folder should have the following structure:
+First you should download the [cue-conflict](https://github.com/bethgelab/model-vs-human/releases/download/v0.1/cue-conflict.tar.gz) dataset to the `data` folder and then unzip it. After that, your `data` folder should have the following structure:
 
@@ -18,7 +18,7 @@ data
 |   |── truck
 ```
 
-### Modify the config for classification
+## Modify the config for classification
 
-We run the shape-bias tool on a ViT-base model with masked autoencoder pretraining. Its config file is `configs/mae/benchmarks/vit-base-p16_8xb128-coslr-100e_in1k.py`, and its checkpoint is downloaded from [this link](https://download.openmmlab.com/mmselfsup/1.x/mae/mae_vit-base-p16_8xb512-fp16-coslr-1600e_in1k/vit-base-p16_ft-8xb128-coslr-100e_in1k/vit-base-p16_ft-8xb128-coslr-100e_in1k_20220825-cf70aa21.pth). Replace the original test_pipeline, test_dataloader and test_evaluation with the following configurations:
+We run the shape-bias tool on a ViT-base model with masked autoencoder pretraining. Its config file is `configs/mae/benchmarks/vit-base-p16_8xb128-coslr-100e_in1k.py`, and its checkpoint is downloaded from [this link](https://download.openmmlab.com/mmselfsup/1.x/mae/mae_vit-base-p16_8xb512-fp16-coslr-1600e_in1k/vit-base-p16_ft-8xb128-coslr-100e_in1k/vit-base-p16_ft-8xb128-coslr-100e_in1k_20220825-cf70aa21.pth). Replace the original test_pipeline, test_dataloader and test_evaluator with the following configurations:
 
@@ -55,7 +55,7 @@ test_evaluator = dict(
-Please note you should make custom modifications to the `csv_dir` and `model_name` above. I renamed my modified sample config file as `vit-base-p16_8xb128-coslr-100e_in1k_shape-bias.py` in the folder `configs/mae/benchmarks/`.
+Please note that you should make custom modifications to the `csv_dir` and `model_name` above. The modified sample config file is saved as `vit-base-p16_8xb128-coslr-100e_in1k_shape-bias.py` in the folder `configs/mae/benchmarks/`.
 
-### Inference your model with above modified config file
+## Inference your model with above modified config file
 
-Then you should inferece your model on the `cue-conflict` dataset with the your modified config file.
+Then you should run inference with your model on the `cue-conflict` dataset using your modified config file.
@@ -77,7 +77,7 @@ bash tools/dist_test.sh configs/mae/benchmarks/vit-base-p16_8xb128-coslr-100e_in
 
-After that, you should obtain a csv file in `csv_dir` folder, named `cue-conflict_model-name_session-1.csv`. Besides this file, you should also download these [csv files](https://github.com/bethgelab/model-vs-human/tree/master/raw-data/cue-conflict) to the `csv_dir`.
+After that, you should obtain a csv file in the `csv_dir` folder, named `cue-conflict_model-name_session-1.csv`. Besides this file, you should also download these [csv files](https://github.com/bethgelab/model-vs-human/tree/master/raw-data/cue-conflict) to the `csv_dir` folder.
 
-### Plot shape bias
+## Plot shape bias
 
 Then we can start to plot the shape bias:
 
diff --git a/mmpretrain/models/multimodal/flamingo/flamingo.py b/mmpretrain/models/multimodal/flamingo/flamingo.py
index abdd03328f4..1c19875b8b4 100644
--- a/mmpretrain/models/multimodal/flamingo/flamingo.py
+++ b/mmpretrain/models/multimodal/flamingo/flamingo.py
@@ -23,7 +23,7 @@ class Flamingo(BaseModel):
         zeroshot_prompt (str): Prompt used for zero-shot inference.
             Defaults to 'Output:'.
         shot_prompt_tmpl (str): Prompt used for few-shot inference.
-            Defaults to 'Output:{caption}<|endofchunk|>'.
+            Defaults to ``Output:{caption}<|endofchunk|>``.
         final_prompt_tmpl (str): Final part of prompt used for inference.
             Defaults to 'Output:'.
         generation_cfg (dict): The extra generation config, accept the keyword
diff --git a/mmpretrain/models/multimodal/minigpt4/minigpt4.py b/mmpretrain/models/multimodal/minigpt4/minigpt4.py
index d23203603ec..4616c2e2597 100644
--- a/mmpretrain/models/multimodal/minigpt4/minigpt4.py
+++ b/mmpretrain/models/multimodal/minigpt4/minigpt4.py
@@ -36,7 +36,7 @@ class MiniGPT4(BaseModel):
         raw_prompts (list): Prompts for training. Defaults to None.
         max_txt_len (int): Max token length while doing tokenization. Defaults
             to 32.
-        end_sym (str): Ended symbol of the sequence. Defaults to '\n'.
+        end_sym (str): End symbol of the sequence. Defaults to '\\n'.
         generation_cfg (dict): The config of text generation.
             Defaults to dict().
         data_preprocessor (:obj:`BaseDataPreprocessor`): Used for
diff --git a/mmpretrain/models/multimodal/otter/otter.py b/mmpretrain/models/multimodal/otter/otter.py
index 2fed1a4d27c..f1eb61baead 100644
--- a/mmpretrain/models/multimodal/otter/otter.py
+++ b/mmpretrain/models/multimodal/otter/otter.py
@@ -20,8 +20,8 @@ class Otter(Flamingo):
         zeroshot_prompt (str): Prompt used for zero-shot inference.
             Defaults to an.
         shot_prompt_tmpl (str): Prompt used for few-shot inference.
-            Defaults to 'User:Please describe the image.
-            GPT:{caption}<|endofchunk|>'.
+            Defaults to ``User:Please describe the image.
+            GPT:{caption}<|endofchunk|>``.
         final_prompt_tmpl (str): Final part of prompt used for inference.
             Defaults to 'User:Please describe the image. GPT:'.
         generation_cfg (dict): The extra generation config, accept the keyword
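
For reference, the shape-bias workflow touched by this patch is driven by the standard test launcher shown in the docs above. A minimal invocation sketch is below, assuming the modified config name and the checkpoint link quoted in the docs; the GPU count of 8 is only an illustrative choice, and the checkpoint can also be downloaded locally first.

```shell
# Sketch: run the shape-bias evaluation on the cue-conflict dataset with 8 GPUs.
# The config is the modified sample file mentioned in the docs; the checkpoint
# URL is the MAE fine-tuned ViT-B/16 linked above.
bash tools/dist_test.sh \
    configs/mae/benchmarks/vit-base-p16_8xb128-coslr-100e_in1k_shape-bias.py \
    https://download.openmmlab.com/mmselfsup/1.x/mae/mae_vit-base-p16_8xb512-fp16-coslr-1600e_in1k/vit-base-p16_ft-8xb128-coslr-100e_in1k/vit-base-p16_ft-8xb128-coslr-100e_in1k_20220825-cf70aa21.pth \
    8
```

After the run finishes, the resulting `cue-conflict_model-name_session-1.csv` file in `csv_dir` is what the plotting step above consumes.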