diff --git a/.gitignore b/.gitignore index 8c2a1c3c..66b0628f 100644 --- a/.gitignore +++ b/.gitignore @@ -67,7 +67,7 @@ instance/ # Scrapy stuff: .scrapy - +.history # Sphinx documentation docs/_build/ diff --git a/README.md b/README.md index b6b0b1b0..d3721a12 100644 --- a/README.md +++ b/README.md @@ -70,11 +70,10 @@ - [x] [Qwen2.5-Coder-7B-Instruct Lora 微调 SwanLab 可视化记录版](./models/Qwen2.5-Coder/05-Qwen2.5-Coder-7B-Instruct%20Lora%20微调%20SwanLab%20可视化记录版.md) @杨卓 - [Qwen2-vl](https://github.com/QwenLM/Qwen2-VL) - - [x] [Qwen2-vl-2B FastApi 部署调用](./models/Qwen2-VL/01-Qwen2-VL-2B-Instruct%20FastApi%20部署调用.md) @姜舒凡 - - [x] [Qwen2-vl-2B WebDemo 部署](./models/Qwen2-VL/02-Qwen2-VL-2B-Instruct%20Web%20Demo部署.md) @赵伟 - - [ ] [Qwen2-vl-2B vLLM 部署]() @荞麦 - - [ ] [Qwen2-vl-2B Lora 微调]() @李柯辰 - - [x] [Qwen2-vl-2B Lora 微调 SwanLab 可视化记录版](./models/Qwen2-VL/05-Qwen2-VL-2B-Instruct%20Lora%20微调%20SwanLab%20可视化记录版.md) @林泽毅 + - [ ] [Qwen2-vl-2B FastApi 部署调用]() + - [ ] [Qwen2-vl-2B WebDemo 部署]() + - [ ] [Qwen2-vl-2B vLLM 部署]() + - [ ] [Qwen2-vl-2B Lora 微调]() - [Qwen2.5](https://github.com/QwenLM/Qwen2.5) - [x] [Qwen2.5-7B-Instruct FastApi 部署调用](./models/Qwen2.5/01-Qwen2.5-7B-Instruct%20FastApi%20部署调用.md) @娄天奥 diff --git "a/models/Qwen2-VL/04-Qwen2-VL-2B Lora \345\276\256\350\260\203.ipynb" "b/models/Qwen2-VL/04-Qwen2-VL-2B Lora \345\276\256\350\260\203.ipynb" new file mode 100644 index 00000000..06860d92 --- /dev/null +++ "b/models/Qwen2-VL/04-Qwen2-VL-2B Lora \345\276\256\350\260\203.ipynb" @@ -0,0 +1,492 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Qwen2-VL-2B-Instruct Lora 微调\n", + "\n", + "本节我们将简要介绍如何基于 `transformers` 和 `peft` 等框架,使用 Qwen2-VL-2B-Instruct 模型在 **COCO2014图像描述** 任务上进行 Lora 微调训练。Lora 是一种高效的微调方法,若需深入了解 Lora 的工作原理,可参考博客:[知乎|深入浅出 Lora](https://zhuanlan.zhihu.com/p/650197598)。\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 🌍 环境配置\n", + "\n", + "考虑到部分同学在配置环境时可能会遇到一些问题,我们在 AutoDL 平台上提供了预装了 Qwen2-VL 环境的镜像。点击下方链接并直接创建 Autodl 示例即可快速开始:[AutoDL-Qwen2-VL-self-llm](https://www.codewithgpu.com/i/datawhalechina/self-llm/Qwen2-VL-self-llm)。\n", + "\n", + "\n", + "## 📚 准备数据集\n", + "\n", + "本节使用的是 [COCO 2014 Caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary) 数据集,该数据集主要用于多模态(Image-to-Text)任务。\n", + "\n", + "> 数据集介绍:COCO 2014 Caption数据集是Microsoft Common Objects in Context (COCO)数据集的一部分,主要用于图像描述任务。该数据集包含了大约40万张图像,每张图像都有至少1个人工生成的英文描述语句。这些描述语句旨在帮助计算机理解图像内容,并为图像自动生成描述提供训练数据。\n", + "\n", + "![05-2](./images/05-2.jpg)\n", + "\n", + "在本节的任务中,我们主要使用其中的前500张图像,并对它们进行处理和格式调整,目标是组合成如下格式的JSON文件:\n", + "\n", + "**数据集下载与处理方式**\n", + "\n", + "1. **我们需要做四件事情:**\n", + " - 通过Modelscope下载COCO 2014 Caption数据集\n", + " - 加载数据集,将图像保存到本地\n", + " - 将图像路径和描述文本转换为一个CSV文件\n", + " - 将CSV文件转换为JSON文件\n", + "\n", + "2. 
**使用下面的代码完成从数据下载到生成CSV的过程:**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# 导入所需的库\n",
+    "from modelscope.msdatasets import MsDataset\n",
+    "import os\n",
+    "import pandas as pd\n",
+    "\n",
+    "MAX_DATA_NUMBER = 500\n",
+    "\n",
+    "# 检查目录是否已存在\n",
+    "if not os.path.exists('coco_2014_caption'):\n",
+    "    # 从modelscope下载COCO 2014图像描述数据集\n",
+    "    ds = MsDataset.load('modelscope/coco_2014_caption', subset_name='coco_2014_caption', split='train')\n",
+    "    print(len(ds))\n",
+    "    # 设置处理的图片数量上限\n",
+    "    total = min(MAX_DATA_NUMBER, len(ds))\n",
+    "\n",
+    "    # 创建保存图片的目录\n",
+    "    os.makedirs('coco_2014_caption', exist_ok=True)\n",
+    "\n",
+    "    # 初始化存储图片路径和描述的列表\n",
+    "    image_paths = []\n",
+    "    captions = []\n",
+    "\n",
+    "    for i in range(total):\n",
+    "        # 获取每个样本的信息\n",
+    "        item = ds[i]\n",
+    "        image_id = item['image_id']\n",
+    "        caption = item['caption']\n",
+    "        image = item['image']\n",
+    "        \n",
+    "        # 保存图片并记录路径\n",
+    "        image_path = os.path.abspath(f'coco_2014_caption/{image_id}.jpg')\n",
+    "        image.save(image_path)\n",
+    "        \n",
+    "        # 将路径和描述添加到列表中\n",
+    "        image_paths.append(image_path)\n",
+    "        captions.append(caption)\n",
+    "        \n",
+    "        # 每处理50张图片打印一次进度\n",
+    "        if (i + 1) % 50 == 0:\n",
+    "            print(f'Processing {i+1}/{total} images ({(i+1)/total*100:.1f}%)')\n",
+    "\n",
+    "    # 将图片路径和描述保存为CSV文件\n",
+    "    df = pd.DataFrame({\n",
+    "        'image_path': image_paths,\n",
+    "        'caption': captions\n",
+    "    })\n",
+    "    \n",
+    "    # 将数据保存为CSV文件\n",
+    "    df.to_csv('./coco-2024-dataset.csv', index=False)\n",
+    "    \n",
+    "    print(f'数据处理完成,共处理了{total}张图片')\n",
+    "\n",
+    "else:\n",
+    "    print('coco_2014_caption目录已存在,跳过数据处理步骤')\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "注意:后文推理代码中的`\"测试图像路径\"`需要替换为你自己希望测试的图像路径。"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "3. 
**在同一目录下,用以下代码,将csv文件转换为json文件:**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import json\n", + "\n", + "# 载入CSV文件\n", + "df = pd.read_csv('./coco-2024-dataset.csv')\n", + "conversations = []\n", + "\n", + "# 添加对话数据\n", + "for i in range(len(df)):\n", + " conversations.append({\n", + " \"id\": f\"identity_{i+1}\",\n", + " \"conversations\": [\n", + " {\n", + " \"from\": \"user\",\n", + " \"value\": f\"COCO Yes: <|vision_start|>{df.iloc[i]['image_path']}<|vision_end|>\"\n", + " },\n", + " {\n", + " \"from\": \"assistant\", \n", + " \"value\": df.iloc[i]['caption']\n", + " }\n", + " ]\n", + " })\n", + "\n", + "# 保存为json\n", + "with open('data_vl.json', 'w', encoding='utf-8') as f:\n", + " json.dump(conversations, f, ensure_ascii=False, indent=2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "此时目录下会多出两个文件:\n", + "- coco-2024-dataset.csv\n", + "- data_vl.json\n", + "\n", + "至此,我们完成了数据集的准备。" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 🤖 模型下载与加载\n", + "\n", + "\n", + "这里使用 `modelscope` 提供的 `snapshot_download` 函数进行下载,该方法对国内的用户十分友好。然后把它加载到Transformers中进行训练:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from modelscope import snapshot_download, AutoTokenizer\n", + "from transformers import TrainingArguments, Trainer, DataCollatorForSeq2Seq, Qwen2VLForConditionalGeneration, AutoProcessor\n", + "import torch\n", + "\n", + "# 在modelscope上下载Qwen2-VL模型到本地目录下\n", + "model_dir = snapshot_download(\"Qwen/Qwen2-VL-2B-Instruct\", cache_dir=\"./\", revision=\"master\")\n", + "\n", + "# 使用Transformers加载模型权重\n", + "tokenizer = AutoTokenizer.from_pretrained(\"./Qwen/Qwen2-VL-2B-Instruct/\", use_fast=False, trust_remote_code=True)\n", + "# 特别的,Qwen2-VL-2B-Instruct模型需要使用Qwen2VLForConditionalGeneration来加载\n", + "model = Qwen2VLForConditionalGeneration.from_pretrained(\"./Qwen/Qwen2-VL-2B-Instruct/\", device_map=\"auto\", torch_dtype=torch.bfloat16, trust_remote_code=True,)\n", + "model.enable_input_require_grads() # 开启梯度检查点时,要执行该方法" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "模型大小约 4.5GB,下载模型大概需要 5 - 10 分钟。" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "bat" + } + }, + "source": [ + "## 🚀 开始微调\n", + "\n", + "**本节代码做了以下几件事:**\n", + "1. 下载并加载 `Qwen2-VL-2B-Instruct` 模型\n", + "2. 加载数据集,取前496条数据参与训练,4条数据进行主观评测\n", + "3. 配置Lora,参数为r=64, lora_alpha=16, lora_dropout=0.05\n", + "4. 
训练2个epoch" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "import torch\n", + "from datasets import Dataset\n", + "from modelscope import snapshot_download, AutoTokenizer\n", + "from qwen_vl_utils import process_vision_info\n", + "from peft import LoraConfig, TaskType, get_peft_model, PeftModel\n", + "from transformers import (\n", + " TrainingArguments,\n", + " Trainer,\n", + " DataCollatorForSeq2Seq,\n", + " Qwen2VLForConditionalGeneration,\n", + " AutoProcessor,\n", + ")\n", + "import json\n", + "\n", + "def process_func(example):\n", + " \"\"\"\n", + " 将数据集进行预处理\n", + " \"\"\"\n", + " MAX_LENGTH = 8192\n", + " input_ids, attention_mask, labels = [], [], []\n", + " conversation = example[\"conversations\"]\n", + " input_content = conversation[0][\"value\"]\n", + " output_content = conversation[1][\"value\"]\n", + " \n", + " instruction = tokenizer(\n", + " f\"<|im_start|>system\\nYou are a helpful assistant.<|im_end|>\\n<|im_start|>user\\n{input_content}<|im_end|>\\n<|im_start|>assistant\\n\",\n", + " add_special_tokens=False,\n", + " )\n", + " response = tokenizer(f\"{output_content}\", add_special_tokens=False)\n", + " input_ids = (\n", + " instruction[\"input_ids\"] + response[\"input_ids\"] + [tokenizer.pad_token_id]\n", + " )\n", + " attention_mask = instruction[\"attention_mask\"] + response[\"attention_mask\"] + [1]\n", + " labels = (\n", + " [-100] * len(instruction[\"input_ids\"])\n", + " + response[\"input_ids\"]\n", + " + [tokenizer.pad_token_id]\n", + " )\n", + " \n", + " if len(input_ids) > MAX_LENGTH: # 做一个截断\n", + " input_ids = input_ids[:MAX_LENGTH]\n", + " attention_mask = attention_mask[:MAX_LENGTH]\n", + " labels = labels[:MAX_LENGTH]\n", + " \n", + " return {\"input_ids\": input_ids, \"attention_mask\": attention_mask, \"labels\": labels}\n", + "\n", + "def predict(messages, model):\n", + " # 准备推理\n", + " text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)\n", + " image_inputs, video_inputs = process_vision_info(messages)\n", + " inputs = processor(\n", + " text=[text],\n", + " images=image_inputs,\n", + " videos=video_inputs,\n", + " padding=True,\n", + " return_tensors=\"pt\",\n", + " )\n", + " inputs = inputs.to(\"cuda\")\n", + "\n", + " # 生成输出\n", + " generated_ids = model.generate(**inputs, max_new_tokens=128)\n", + " generated_ids_trimmed = [\n", + " out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)\n", + " ]\n", + " output_text = processor.batch_decode(\n", + " generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False\n", + " )\n", + " \n", + " return output_text[0]\n", + "\n", + "# 使用Transformers加载模型权重\n", + "tokenizer = AutoTokenizer.from_pretrained(\"./Qwen/Qwen2-VL-2B-Instruct/\", use_fast=False, trust_remote_code=True)\n", + "processor = AutoProcessor.from_pretrained(\"./Qwen/Qwen2-VL-2B-Instruct\")\n", + "\n", + "model = Qwen2VLForConditionalGeneration.from_pretrained(\"./Qwen/Qwen2-VL-2B-Instruct/\", device_map=\"auto\", torch_dtype=torch.bfloat16, trust_remote_code=True,)\n", + "model.enable_input_require_grads() # 开启梯度检查点时,要执行该方法\n", + "\n", + "# 处理数据集:读取json文件\n", + "# 拆分成训练集和测试集,保存为data_vl_train.json和data_vl_test.json\n", + "train_json_path = \"data_vl.json\"\n", + "with open(train_json_path, 'r') as f:\n", + " data = json.load(f)\n", + " train_data = data[:-4]\n", + " test_data = data[-4:]\n", + "\n", + "with open(\"data_vl_train.json\", \"w\") as f:\n", + " 
json.dump(train_data, f)\n", + "\n", + "with open(\"data_vl_test.json\", \"w\") as f:\n", + " json.dump(test_data, f)\n", + "\n", + "train_ds = Dataset.from_json(\"data_vl_train.json\")\n", + "train_dataset = train_ds.map(process_func)\n", + "\n", + "# 配置LoRA\n", + "config = LoraConfig(\n", + " task_type=TaskType.CAUSAL_LM,\n", + " target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\", \"gate_proj\", \"up_proj\", \"down_proj\"],\n", + " inference_mode=False, # 训练模式\n", + " r=64, # Lora 秩\n", + " lora_alpha=16, # Lora alaph,具体作用参见 Lora 原理\n", + " lora_dropout=0.05, # Dropout 比例\n", + " bias=\"none\",\n", + ")\n", + "\n", + "# 获取LoRA模型\n", + "peft_model = get_peft_model(model, config)\n", + "\n", + "# 配置训练参数\n", + "args = TrainingArguments(\n", + " output_dir=\"./output/Qwen2-VL-2B\",\n", + " per_device_train_batch_size=2,\n", + " gradient_accumulation_steps=2,\n", + " logging_steps=10,\n", + " num_train_epochs=2,\n", + " save_steps=100,\n", + " learning_rate=1e-4,\n", + " save_on_each_node=True,\n", + " gradient_checkpointing=True,\n", + " report_to=\"none\",\n", + ")\n", + " \n", + "# 配置Trainer\n", + "trainer = Trainer(\n", + " model=peft_model,\n", + " args=args,\n", + " train_dataset=train_dataset,\n", + " data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),\n", + ")\n", + "# 开启模型训练\n", + "trainer.train()\n", + "\n", + "# ===测试模式===\n", + "# 配置测试参数\n", + "val_config = LoraConfig(\n", + " task_type=TaskType.CAUSAL_LM,\n", + " target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\", \"gate_proj\", \"up_proj\", \"down_proj\"],\n", + " inference_mode=True, # 训练模式\n", + " r=64, # Lora 秩\n", + " lora_alpha=16, # Lora alaph,具体作用参见 Lora 原理\n", + " lora_dropout=0.05, # Dropout 比例\n", + " bias=\"none\",\n", + ")\n", + "\n", + "# 获取测试模型\n", + "val_peft_model = PeftModel.from_pretrained(model, model_id=\"./output/Qwen2-VL-2B/checkpoint-100\", config=val_config)\n", + "\n", + "# 读取测试数据\n", + "with open(\"data_vl_test.json\", \"r\") as f:\n", + " test_dataset = json.load(f)\n", + "\n", + "test_image_list = []\n", + "for item in test_dataset:\n", + " input_image_prompt = item[\"conversations\"][0][\"value\"]\n", + " # 去掉前后的<|vision_start|>和<|vision_end|>\n", + " origin_image_path = input_image_prompt.split(\"<|vision_start|>\")[1].split(\"<|vision_end|>\")[0]\n", + " \n", + " messages = [{\n", + " \"role\": \"user\", \n", + " \"content\": [\n", + " {\n", + " \"type\": \"image\", \n", + " \"image\": origin_image_path\n", + " },\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": \"COCO Yes:\"\n", + " }\n", + " ]}]\n", + " \n", + " response = predict(messages, val_peft_model)\n", + " messages.append({\"role\": \"assistant\", \"content\": f\"{response}\"})\n", + " print(messages[-1])\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "看到下面的进度条即代表训练开始:\n", + "\n", + "![alt text](./images/04-1.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 🧐 推理LoRA微调后的模型\n", + "\n", + "加载LoRA微调后的模型,并进行推理。\n", + "\n", + "**完整代码如下:**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import Qwen2VLForConditionalGeneration, AutoProcessor\n", + "from qwen_vl_utils import process_vision_info\n", + "from peft import PeftModel, LoraConfig, TaskType\n", + "\n", + "config = LoraConfig(\n", + " task_type=TaskType.CAUSAL_LM,\n", + " target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\", \"gate_proj\", \"up_proj\", \"down_proj\"],\n", + " 
inference_mode=True,\n", + " r=64, # Lora 秩\n", + " lora_alpha=16, # Lora alaph,具体作用参见 Lora 原理\n", + " lora_dropout=0.05, # Dropout 比例\n", + " bias=\"none\",\n", + ")\n", + "\n", + "# default: Load the model on the available device(s)\n", + "model = Qwen2VLForConditionalGeneration.from_pretrained(\n", + " \"./Qwen/Qwen2-VL-2B-Instruct\", torch_dtype=\"auto\", device_map=\"auto\"\n", + ")\n", + "model = PeftModel.from_pretrained(model, model_id=\"./output/Qwen2-VL-2B/checkpoint-100\", config=config)\n", + "processor = AutoProcessor.from_pretrained(\"./Qwen/Qwen2-VL-2B-Instruct\")\n", + "\n", + "messages = [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"image\",\n", + " \"image\": \"测试图像路径\",\n", + " },\n", + " {\"type\": \"text\", \"text\": \"COCO Yes:\"},\n", + " ],\n", + " }\n", + "]\n", + "\n", + "# Preparation for inference\n", + "text = processor.apply_chat_template(\n", + " messages, tokenize=False, add_generation_prompt=True\n", + ")\n", + "image_inputs, video_inputs = process_vision_info(messages)\n", + "inputs = processor(\n", + " text=[text],\n", + " images=image_inputs,\n", + " videos=video_inputs,\n", + " padding=True,\n", + " return_tensors=\"pt\",\n", + ")\n", + "inputs = inputs.to(\"cuda\")\n", + "\n", + "# Inference: Generation of the output\n", + "generated_ids = model.generate(**inputs, max_new_tokens=128)\n", + "generated_ids_trimmed = [\n", + " out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)\n", + "]\n", + "output_text = processor.batch_decode(\n", + " generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False\n", + ")\n", + "print(output_text)" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git "a/models/Qwen2-VL/04-Qwen2-VL-2B Lora \345\276\256\350\260\203.md" "b/models/Qwen2-VL/04-Qwen2-VL-2B Lora \345\276\256\350\260\203.md" new file mode 100644 index 00000000..21e1af21 --- /dev/null +++ "b/models/Qwen2-VL/04-Qwen2-VL-2B Lora \345\276\256\350\260\203.md" @@ -0,0 +1,402 @@ +# Qwen2-VL-2B-Instruct Lora 微调 + +本节我们将简要介绍如何基于 `transformers` 和 `peft` 等框架,使用 Qwen2-VL-2B-Instruct 模型在 **COCO2014图像描述** 任务上进行 Lora 微调训练。Lora 是一种高效的微调方法,若需深入了解 Lora 的工作原理,可参考博客:[知乎|深入浅出 Lora](https://zhuanlan.zhihu.com/p/650197598)。 + +## 🌍 环境配置 + +考虑到部分同学在配置环境时可能会遇到一些问题,我们在 AutoDL 平台上提供了预装了 Qwen2-VL 环境的镜像。点击下方链接并直接创建 Autodl 示例即可快速开始:[AutoDL-Qwen2-VL-self-llm](https://www.codewithgpu.com/i/datawhalechina/self-llm/Qwen2-VL-self-llm)。 + + +## 📚 准备数据集 + +本节使用的是 [COCO 2014 Caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary) 数据集,该数据集主要用于多模态(Image-to-Text)任务。 + +> 数据集介绍:COCO 2014 Caption数据集是Microsoft Common Objects in Context (COCO)数据集的一部分,主要用于图像描述任务。该数据集包含了大约40万张图像,每张图像都有至少1个人工生成的英文描述语句。这些描述语句旨在帮助计算机理解图像内容,并为图像自动生成描述提供训练数据。 + +![05-2](./images/05-2.jpg) + +在本节的任务中,我们主要使用其中的前500张图像,并对它们进行处理和格式调整,目标是组合成如下格式的JSON文件: + +**数据集下载与处理方式** + +1. **我们需要做四件事情:** + - 通过Modelscope下载COCO 2014 Caption数据集 + - 加载数据集,将图像保存到本地 + - 将图像路径和描述文本转换为一个CSV文件 + - 将CSV文件转换为JSON文件 + +2. 
**使用下面的代码完成从数据下载到生成CSV的过程:** +```python +# 导入所需的库 +from modelscope.msdatasets import MsDataset +import os +import pandas as pd + +MAX_DATA_NUMBER = 500 + +# 检查目录是否已存在 +if not os.path.exists('coco_2014_caption'): + # 从modelscope下载COCO 2014图像描述数据集 + ds = MsDataset.load('modelscope/coco_2014_caption', subset_name='coco_2014_caption', split='train') + print(len(ds)) + # 设置处理的图片数量上限 + total = min(MAX_DATA_NUMBER, len(ds)) + + # 创建保存图片的目录 + os.makedirs('coco_2014_caption', exist_ok=True) + + # 初始化存储图片路径和描述的列表 + image_paths = [] + captions = [] + + for i in range(total): + # 获取每个样本的信息 + item = ds[i] + image_id = item['image_id'] + caption = item['caption'] + image = item['image'] + + # 保存图片并记录路径 + image_path = os.path.abspath(f'coco_2014_caption/{image_id}.jpg') + image.save(image_path) + + # 将路径和描述添加到列表中 + image_paths.append(image_path) + captions.append(caption) + + # 每处理50张图片打印一次进度 + if (i + 1) % 50 == 0: + print(f'Processing {i+1}/{total} images ({(i+1)/total*100:.1f}%)') + + # 将图片路径和描述保存为CSV文件 + df = pd.DataFrame({ + 'image_path': image_paths, + 'caption': captions + }) + + # 将数据保存为CSV文件 + df.to_csv('./coco-2024-dataset.csv', index=False) + + print(f'数据处理完成,共处理了{total}张图片') + +else: + print('coco_2014_caption目录已存在,跳过数据处理步骤') +``` + +3. **在同一目录下,用以下代码,将csv文件转换为json文件:** + +```python +import pandas as pd +import json + +# 载入CSV文件 +df = pd.read_csv('./coco-2024-dataset.csv') +conversations = [] + +# 添加对话数据 +for i in range(len(df)): + conversations.append({ + "id": f"identity_{i+1}", + "conversations": [ + { + "from": "user", + "value": f"COCO Yes: <|vision_start|>{df.iloc[i]['image_path']}<|vision_end|>" + }, + { + "from": "assistant", + "value": df.iloc[i]['caption'] + } + ] + }) + +# 保存为json +with open('data_vl.json', 'w', encoding='utf-8') as f: + json.dump(conversations, f, ensure_ascii=False, indent=2) +``` + +此时目录下会多出两个文件: +- coco-2024-dataset.csv +- data_vl.json + +至此,我们完成了数据集的准备。 + + +## 🤖 模型下载与加载 + + +这里使用 `modelscope` 提供的 `snapshot_download` 函数进行下载,该方法对国内的用户十分友好。然后把它加载到Transformers中进行训练: +```python +from modelscope import snapshot_download, AutoTokenizer +from transformers import TrainingArguments, Trainer, DataCollatorForSeq2Seq, Qwen2VLForConditionalGeneration, AutoProcessor +import torch + +# 在modelscope上下载Qwen2-VL模型到本地目录下 +model_dir = snapshot_download("Qwen/Qwen2-VL-2B-Instruct", cache_dir="./", revision="master") + +# 使用Transformers加载模型权重 +tokenizer = AutoTokenizer.from_pretrained("./Qwen/Qwen2-VL-2B-Instruct/", use_fast=False, trust_remote_code=True) +# 特别的,Qwen2-VL-2B-Instruct模型需要使用Qwen2VLForConditionalGeneration来加载 +model = Qwen2VLForConditionalGeneration.from_pretrained("./Qwen/Qwen2-VL-2B-Instruct/", device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True,) +model.enable_input_require_grads() # 开启梯度检查点时,要执行该方法 +``` + +模型大小约 4.5GB,下载模型大概需要 5 - 10 分钟。 + + + + +## 🚀 开始微调 + +**本节代码做了以下几件事:** +1. 下载并加载 `Qwen2-VL-2B-Instruct` 模型 +2. 加载数据集,取前496条数据参与训练,4条数据进行主观评测 +3. 配置Lora,参数为r=64, lora_alpha=16, lora_dropout=0.05 +4. 
训练2个epoch + + +**完整代码如下:** +```python +import torch +from datasets import Dataset +from modelscope import snapshot_download, AutoTokenizer +from qwen_vl_utils import process_vision_info +from peft import LoraConfig, TaskType, get_peft_model, PeftModel +from transformers import ( + TrainingArguments, + Trainer, + DataCollatorForSeq2Seq, + Qwen2VLForConditionalGeneration, + AutoProcessor, +) +import json + +def process_func(example): + """ + 将数据集进行预处理 + """ + MAX_LENGTH = 8192 + input_ids, attention_mask, labels = [], [], [] + conversation = example["conversations"] + input_content = conversation[0]["value"] + output_content = conversation[1]["value"] + + instruction = tokenizer( + f"<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n{input_content}<|im_end|>\n<|im_start|>assistant\n", + add_special_tokens=False, + ) + response = tokenizer(f"{output_content}", add_special_tokens=False) + input_ids = ( + instruction["input_ids"] + response["input_ids"] + [tokenizer.pad_token_id] + ) + attention_mask = instruction["attention_mask"] + response["attention_mask"] + [1] + labels = ( + [-100] * len(instruction["input_ids"]) + + response["input_ids"] + + [tokenizer.pad_token_id] + ) + + if len(input_ids) > MAX_LENGTH: # 做一个截断 + input_ids = input_ids[:MAX_LENGTH] + attention_mask = attention_mask[:MAX_LENGTH] + labels = labels[:MAX_LENGTH] + + return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels} + +def predict(messages, model): + # 准备推理 + text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) + image_inputs, video_inputs = process_vision_info(messages) + inputs = processor( + text=[text], + images=image_inputs, + videos=video_inputs, + padding=True, + return_tensors="pt", + ) + inputs = inputs.to("cuda") + + # 生成输出 + generated_ids = model.generate(**inputs, max_new_tokens=128) + generated_ids_trimmed = [ + out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids) + ] + output_text = processor.batch_decode( + generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False + ) + + return output_text[0] + +# 使用Transformers加载模型权重 +tokenizer = AutoTokenizer.from_pretrained("./Qwen/Qwen2-VL-2B-Instruct/", use_fast=False, trust_remote_code=True) +processor = AutoProcessor.from_pretrained("./Qwen/Qwen2-VL-2B-Instruct") + +model = Qwen2VLForConditionalGeneration.from_pretrained("./Qwen/Qwen2-VL-2B-Instruct/", device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True,) +model.enable_input_require_grads() # 开启梯度检查点时,要执行该方法 + +# 处理数据集:读取json文件 +# 拆分成训练集和测试集,保存为data_vl_train.json和data_vl_test.json +train_json_path = "data_vl.json" +with open(train_json_path, 'r') as f: + data = json.load(f) + train_data = data[:-4] + test_data = data[-4:] + +with open("data_vl_train.json", "w") as f: + json.dump(train_data, f) + +with open("data_vl_test.json", "w") as f: + json.dump(test_data, f) + +train_ds = Dataset.from_json("data_vl_train.json") +train_dataset = train_ds.map(process_func) + +# 配置LoRA +config = LoraConfig( + task_type=TaskType.CAUSAL_LM, + target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"], + inference_mode=False, # 训练模式 + r=64, # Lora 秩 + lora_alpha=16, # Lora alaph,具体作用参见 Lora 原理 + lora_dropout=0.05, # Dropout 比例 + bias="none", +) + +# 获取LoRA模型 +peft_model = get_peft_model(model, config) + +# 配置训练参数 +args = TrainingArguments( + output_dir="./output/Qwen2-VL-2B", + per_device_train_batch_size=2, + 
gradient_accumulation_steps=2,
+    logging_steps=10,
+    num_train_epochs=2,
+    save_steps=100,
+    learning_rate=1e-4,
+    save_on_each_node=True,
+    gradient_checkpointing=True,
+    report_to="none",
+)
+
+# 配置Trainer
+trainer = Trainer(
+    model=peft_model,
+    args=args,
+    train_dataset=train_dataset,
+    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
+)
+# 开启模型训练
+trainer.train()
+
+# ===测试模式===
+# 配置测试参数
+val_config = LoraConfig(
+    task_type=TaskType.CAUSAL_LM,
+    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
+    inference_mode=True, # 推理模式
+    r=64, # Lora 秩
+    lora_alpha=16, # Lora alpha,具体作用参见 Lora 原理
+    lora_dropout=0.05, # Dropout 比例
+    bias="none",
+)
+
+# 获取测试模型
+val_peft_model = PeftModel.from_pretrained(model, model_id="./output/Qwen2-VL-2B/checkpoint-100", config=val_config)
+
+# 读取测试数据
+with open("data_vl_test.json", "r") as f:
+    test_dataset = json.load(f)
+
+test_image_list = []
+for item in test_dataset:
+    input_image_prompt = item["conversations"][0]["value"]
+    # 去掉前后的<|vision_start|>和<|vision_end|>
+    origin_image_path = input_image_prompt.split("<|vision_start|>")[1].split("<|vision_end|>")[0]
+
+    messages = [{
+        "role": "user",
+        "content": [
+            {
+            "type": "image",
+            "image": origin_image_path
+            },
+            {
+            "type": "text",
+            "text": "COCO Yes:"
+            }
+        ]}]
+
+    response = predict(messages, val_peft_model)
+    messages.append({"role": "assistant", "content": f"{response}"})
+    print(messages[-1])
+```
+看到下面的进度条即代表训练开始:
+
+![训练进度条](./images/04-1.png)
+
+
+## 🧐 推理LoRA微调后的模型
+
+加载LoRA微调后的模型,并进行推理。
+
+**完整代码如下:**
+```python
+from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
+from qwen_vl_utils import process_vision_info
+from peft import PeftModel, LoraConfig, TaskType
+
+config = LoraConfig(
+    task_type=TaskType.CAUSAL_LM,
+    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
+    inference_mode=True,
+    r=64, # Lora 秩
+    lora_alpha=16, # Lora alpha,具体作用参见 Lora 原理
+    lora_dropout=0.05, # Dropout 比例
+    bias="none",
+)
+
+# default: Load the model on the available device(s)
+model = Qwen2VLForConditionalGeneration.from_pretrained(
+    "./Qwen/Qwen2-VL-2B-Instruct", torch_dtype="auto", device_map="auto"
+)
+model = PeftModel.from_pretrained(model, model_id="./output/Qwen2-VL-2B/checkpoint-100", config=config)
+processor = AutoProcessor.from_pretrained("./Qwen/Qwen2-VL-2B-Instruct")
+
+messages = [
+    {
+        "role": "user",
+        "content": [
+            {
+                "type": "image",
+                "image": "测试图像路径",
+            },
+            {"type": "text", "text": "COCO Yes:"},
+        ],
+    }
+]
+
+# Preparation for inference
+text = processor.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True
+)
+image_inputs, video_inputs = process_vision_info(messages)
+inputs = processor(
+    text=[text],
+    images=image_inputs,
+    videos=video_inputs,
+    padding=True,
+    return_tensors="pt",
+)
+inputs = inputs.to("cuda")
+
+# Inference: Generation of the output
+generated_ids = model.generate(**inputs, max_new_tokens=128)
+generated_ids_trimmed = [
+    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+]
+output_text = processor.batch_decode(
+    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
+)
+print(output_text)
+```
+注意:将代码中的`"测试图像路径"`替换为你自己希望测试的图像路径。
\ No newline at end of file
diff --git a/models/Qwen2-VL/images/04-1.png b/models/Qwen2-VL/images/04-1.png
new file mode 100644
index 00000000..28dac22a
Binary files /dev/null and b/models/Qwen2-VL/images/04-1.png 
differ