Feature/support qwenvl glm4-v phi3-v(conflict resolving) #4377

Closed
wants to merge 41 commits
Changes from 28 commits
Commits (41)
fbf19f8
Basic support for webui.
marko1616 Jun 19, 2024
95b8a1d
Basic support for GLM4V
marko1616 Jun 19, 2024
61a0880
Merge branch 'hiyouga:main' into feature/Support-Qwenvl
marko1616 Jun 19, 2024
8044804
Pass ruff check.
marko1616 Jun 19, 2024
c58be83
Half of sft support and bug fix.
marko1616 Jun 20, 2024
4b01584
GLM4v lora sft support
marko1616 Jun 21, 2024
c233520
Little fix
marko1616 Jun 22, 2024
078c85d
Merge branch 'main' into feature/Support-Qwenvl
hiyouga Jun 24, 2024
67542a0
Fix requirements.txt
marko1616 Jun 25, 2024
e6aa967
fix conflict
BUAADreamer Jun 28, 2024
f698b43
QwenVL sft & webui train bugfix.
marko1616 Jun 29, 2024
3fa3a0b
phi3v infer support & rename.
marko1616 Jun 30, 2024
06823f4
Add rm,pt,ppo,kto,dpo support for glm4v(Not tested).
marko1616 Jun 30, 2024
40e817c
Merge branch 'hiyouga:main' into feature/Support-Qwenvl
marko1616 Jun 30, 2024
4e4f959
little fix
marko1616 Jun 30, 2024
4f564a1
Pass ruff
marko1616 Jun 30, 2024
5065e87
Merge branch 'main' into feature/Support-Qwenvl
marko1616 Jun 30, 2024
c37465e
Style check & fix requirements.txt
marko1616 Jul 1, 2024
9e7bb3f
Bugfix
marko1616 Jul 2, 2024
17e5d7d
Merge branch 'main' into feature/Support-Qwenvl
marko1616 Jul 2, 2024
5fe2862
Change implementation.
marko1616 Jul 2, 2024
e871b03
Merge remote
marko1616 Jul 2, 2024
b8cf95a
Update README, fix template constant, and add download source for phi3v.
Jul 2, 2024
c4ac67a
Merge pull request #1 from Radeon-grapchis/feature/Support-Qwenvl
marko1616 Jul 2, 2024
e6099f5
Name style fix.
marko1616 Jul 2, 2024
eb38fe2
modify glm_4v 9B desc
BUAADreamer Jul 3, 2024
51931b9
add torchvision to pass test
BUAADreamer Jul 3, 2024
a0ad0b5
modify dict in common
BUAADreamer Jul 3, 2024
3acefbc
Support latest glm4v.
marko1616 Jul 3, 2024
4146242
Phi3v lora sft fix.
marko1616 Jul 3, 2024
70ac8ea
fix get_template.
marko1616 Jul 3, 2024
ea60231
Update for unsupervised dataset.
marko1616 Jul 4, 2024
b932bc0
Phi3v dataset processor fix.
marko1616 Jul 6, 2024
36932dd
Merge branch 'main' into feature/Support-Qwenvl
marko1616 Jul 18, 2024
3c2ecba
Conflict fix
marko1616 Jul 18, 2024
3f9ccb3
RLHF support.
marko1616 Jul 19, 2024
9c6587e
glm4v pairwise dataset support
marko1616 Jul 19, 2024
cfe0652
Merge branch 'main' into feature/Support-Qwenvl
marko1616 Jul 31, 2024
19a4cf7
Merge branch 'main' into feature/Support-Qwenvl
marko1616 Aug 20, 2024
e9d902b
Name fix.
marko1616 Aug 22, 2024
65b64be
ruff pass.
marko1616 Aug 27, 2024
5 changes: 4 additions & 1 deletion README.md
@@ -152,7 +152,7 @@ Compared to ChatGLM's [P-Tuning](https://github.com/THUDM/ChatGLM2-6B/tree/main/
## Supported Models

| Model | Model size | Template |
| ------------------------------------------------------------ | -------------------------------- | --------- |
| ------------------------------------------------------------ |----------------------------------| --------- |
| [Baichuan 2](https://huggingface.co/baichuan-inc) | 7B/13B | baichuan2 |
| [BLOOM/BLOOMZ](https://huggingface.co/bigscience) | 560M/1.1B/1.7B/3B/7.1B/176B | - |
| [ChatGLM3](https://huggingface.co/THUDM) | 6B | chatglm3 |
@@ -161,6 +161,7 @@ Compared to ChatGLM's [P-Tuning](https://github.com/THUDM/ChatGLM2-6B/tree/main/
| [Falcon](https://huggingface.co/tiiuae) | 7B/11B/40B/180B | falcon |
| [Gemma/Gemma 2/CodeGemma](https://huggingface.co/google) | 2B/7B/9B/27B | gemma |
| [GLM-4](https://huggingface.co/THUDM) | 9B | glm4 |
| [GLM-4V](https://huggingface.co/THUDM) | 9B | glm4_v |
| [InternLM2](https://huggingface.co/internlm) | 7B/20B | intern2 |
| [Llama](https://github.com/facebookresearch/llama) | 7B/13B/33B/65B | - |
| [Llama 2](https://huggingface.co/meta-llama) | 7B/13B/70B | llama2 |
@@ -171,7 +172,9 @@ Compared to ChatGLM's [P-Tuning](https://github.com/THUDM/ChatGLM2-6B/tree/main/
| [PaliGemma](https://huggingface.co/google) | 3B | gemma |
| [Phi-1.5/Phi-2](https://huggingface.co/microsoft) | 1.3B/2.7B | - |
| [Phi-3](https://huggingface.co/microsoft) | 4B/7B/14B | phi |
| [Phi-3-vision](https://huggingface.co/microsoft) | 4B | phi_v |
| [Qwen/Qwen1.5/Qwen2 (Code/MoE)](https://huggingface.co/Qwen) | 0.5B/1.5B/4B/7B/14B/32B/72B/110B | qwen |
| [Qwen-VL](https://huggingface.co/Qwen) | 9B | qwen_vl |
| [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
| [XVERSE](https://huggingface.co/xverse) | 7B/13B/65B | xverse |
| [Yi/Yi-1.5](https://huggingface.co/01-ai) | 6B/9B/34B | yi |
3 changes: 3 additions & 0 deletions README_zh.md
@@ -161,6 +161,7 @@ https://github.com/hiyouga/LLaMA-Factory/assets/16256802/ec36a9dd-37f4-4f72-81bd
| [Falcon](https://huggingface.co/tiiuae) | 7B/11B/40B/180B | falcon |
| [Gemma/Gemma 2/CodeGemma](https://huggingface.co/google) | 2B/7B/9B/27B | gemma |
| [GLM-4](https://huggingface.co/THUDM) | 9B | glm4 |
| [GLM-4V](https://huggingface.co/THUDM) | 9B | glm4_v |
| [InternLM2](https://huggingface.co/internlm) | 7B/20B | intern2 |
| [Llama](https://github.com/facebookresearch/llama) | 7B/13B/33B/65B | - |
| [Llama 2](https://huggingface.co/meta-llama) | 7B/13B/70B | llama2 |
@@ -171,7 +172,9 @@ https://github.com/hiyouga/LLaMA-Factory/assets/16256802/ec36a9dd-37f4-4f72-81bd
| [PaliGemma](https://huggingface.co/google) | 3B | gemma |
| [Phi-1.5/Phi-2](https://huggingface.co/microsoft) | 1.3B/2.7B | - |
| [Phi-3](https://huggingface.co/microsoft) | 4B/7B/14B | phi |
| [Phi-3-vision](https://huggingface.co/microsoft) | 4B | phi_v |
| [Qwen/Qwen1.5/Qwen2 (Code/MoE)](https://huggingface.co/Qwen) | 0.5B/1.5B/4B/7B/14B/32B/72B/110B | qwen |
| [Qwen-VL](https://huggingface.co/Qwen) | 9B | qwen_vl |
| [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
| [XVERSE](https://huggingface.co/xverse) | 7B/13B/65B | xverse |
| [Yi/Yi-1.5](https://huggingface.co/01-ai) | 6B/9B/34B | yi |
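The three new rows register templates named glm4_v, phi_v, and qwen_vl for the corresponding multimodal checkpoints. A minimal, hypothetical sketch of selecting one of them through the high-level chat wrapper is shown below; the argument names mirror the hparams touched elsewhere in this PR (model_name_or_path, template, visual_inputs), so the exact accepted keys may differ.

```python
from llamafactory.chat import ChatModel

# Hypothetical usage sketch, not taken from this PR: the argument names are
# assumptions based on the hparams referenced in the diffs below.
chat_model = ChatModel(dict(
    model_name_or_path="THUDM/glm-4v-9b",  # any of the newly listed checkpoints
    template="glm4_v",                     # new template registered by this PR
    visual_inputs=True,                    # enables the multimodal code paths
))
```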
1 change: 1 addition & 0 deletions requirements.txt
@@ -19,3 +19,4 @@ fire
packaging
pyyaml
numpy<2.0.0
torchvision
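torchvision becomes a hard dependency because the GLM-4V inference path in hf_engine.py (below) builds its image tensor with a torchvision transform instead of a Hugging Face image processor. A standalone sketch of that preprocessing, reusing the resize target and normalization constants from the diff (the input file name is a placeholder):

```python
import torchvision
from PIL import Image

# Same pipeline as the glm4v_like branch below: 1120x1120 bicubic resize,
# tensor conversion, CLIP-style channel normalization.
transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize(
        (1120, 1120), interpolation=torchvision.transforms.InterpolationMode.BICUBIC
    ),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(
        (0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)
    ),
])

pixel_values = transform(Image.open("example.png").convert("RGB")).unsqueeze(0)
print(pixel_values.shape)  # torch.Size([1, 3, 1120, 1120])
```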
114 changes: 103 additions & 11 deletions src/llamafactory/chat/hf_engine.py
@@ -15,16 +15,21 @@
import asyncio
import concurrent.futures
import os
import pathlib
from threading import Thread
from typing import TYPE_CHECKING, Any, AsyncGenerator, Callable, Dict, List, Optional, Sequence, Tuple, Union
from uuid import uuid4

import torch
import torchvision
from PIL import Image
from transformers import GenerationConfig, TextIteratorStreamer

from ..data import get_template_and_fix_tokenizer
from ..extras.logging import get_logger
from ..extras.misc import get_logits_processor
from ..model import load_model, load_tokenizer
from ..webui.common import DEFAULT_CACHE_DIR
from .base_engine import BaseEngine, Response


@@ -58,6 +63,7 @@ def __init__(
self.model = load_model(
self.tokenizer, model_args, finetuning_args, is_trainable=False, add_valuehead=(not self.can_generate)
) # must after fixing tokenizer to resize vocab
self.model_args = model_args
self.generating_args = generating_args.to_dict()
try:
asyncio.get_event_loop()
@@ -75,33 +81,60 @@ def _process_args(
processor: Optional["ProcessorMixin"],
template: "Template",
generating_args: Dict[str, Any],
model_args: "ModelArguments",
messages: Sequence[Dict[str, str]],
system: Optional[str] = None,
tools: Optional[str] = None,
image: Optional["NDArray"] = None,
input_kwargs: Optional[Dict[str, Any]] = {},
) -> Tuple[Dict[str, Any], int]:
) -> Tuple[Dict[str, Any], int, Optional[pathlib.Path]]:
image_path = None
if (
processor is not None
and image is not None
and not hasattr(processor, "image_seq_length")
and template.image_token not in messages[0]["content"]
): # llava-like models
and model_args.visual_inputs_type in ["vision_tower", "phi3v_like"]
):
# llava-like models
messages[0]["content"] = template.image_token + messages[0]["content"]
elif image is not None and model_args.visual_inputs_type == "qwen_vl_like":
# Add image pathlike token as vision input
image_path = pathlib.Path(DEFAULT_CACHE_DIR) / f"{str(uuid4())}.png"
Image.fromarray(image).convert("RGB").save(image_path)
messages[-1]["content"] = (
template.format_image.apply(content=os.fspath(image_path))[0] + messages[-1]["content"]
)
elif image is not None and model_args.visual_inputs_type == "glm4v_like":
messages[-1]["content"] = template.format_image.apply()[0] + messages[-1]["content"]

paired_messages = messages + [{"role": "assistant", "content": ""}]
system = system or generating_args["default_system"]
pixel_values = None
image_sizes = None
prompt_ids, _ = template.encode_oneturn(
tokenizer=tokenizer, messages=paired_messages, system=system, tools=tools
)
if processor is not None and image is not None: # add image features
# add image features for vision tower
if processor is not None and image is not None and template.format_image is None:
image_processor: "BaseImageProcessor" = getattr(processor, "image_processor")
batch_feature = image_processor(image, return_tensors="pt")
batch_feature = (
image_processor(image, return_tensors="pt")
if model_args.visual_inputs_type == "vision_tower"
else image_processor(Image.fromarray(image), return_tensors="pt")
)
pixel_values = batch_feature.to(model.device)["pixel_values"] # shape (B, C, H, W)
if hasattr(processor, "image_seq_length"): # paligemma models
image_token_id = tokenizer.convert_tokens_to_ids(template.image_token)
prompt_ids = [image_token_id] * getattr(processor, "image_seq_length") + prompt_ids
if model_args.visual_inputs_type == "phi3v_like":
image_sizes = batch_feature["image_sizes"]
index_image = prompt_ids.index(tokenizer.vocab["<|image|>"])
prompt_ids = (
prompt_ids[:index_image]
+ [-1] * batch_feature["num_img_tokens"].item()
+ prompt_ids[index_image + 1 :]
)

prompt_length = len(prompt_ids)
inputs = torch.tensor([prompt_ids], device=model.device)
@@ -163,11 +196,42 @@ def _process_args(
generation_config=GenerationConfig(**generating_args),
logits_processor=get_logits_processor(),
)
if image is not None and model_args.visual_inputs_type == "glm4v_like":
transform = torchvision.transforms.Compose(
[
torchvision.transforms.Resize(
(1120, 1120), interpolation=torchvision.transforms.InterpolationMode.BICUBIC
),
torchvision.transforms.ToTensor(),
torchvision.transforms.Normalize(
(0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)
),
]
)
gen_kwargs["images"] = (
transform(Image.fromarray(image)).unsqueeze(0).to(model.device).to(model_args.compute_dtype)
)
elif model_args.visual_inputs_type == "glm4v_like":
gen_kwargs["images"] = None

if pixel_values is not None:
gen_kwargs["pixel_values"] = pixel_values

return gen_kwargs, prompt_length
if image_sizes is not None and model_args.visual_inputs_type == "phi3v_like":
gen_kwargs["image_sizes"] = image_sizes

return gen_kwargs, prompt_length, image_path

@staticmethod
def image_clean_wrapper(func, temporary_image):
# clean up for qwen_vl.
def wrapped_function(**kwargs):
result = func(**kwargs)
if temporary_image:
os.remove(temporary_image)
return result

return wrapped_function

@staticmethod
@torch.inference_mode()
@@ -177,16 +241,27 @@ def _chat(
processor: Optional["ProcessorMixin"],
template: "Template",
generating_args: Dict[str, Any],
model_args: "ModelArguments",
messages: Sequence[Dict[str, str]],
system: Optional[str] = None,
tools: Optional[str] = None,
image: Optional["NDArray"] = None,
input_kwargs: Optional[Dict[str, Any]] = {},
) -> List["Response"]:
gen_kwargs, prompt_length = HuggingfaceEngine._process_args(
model, tokenizer, processor, template, generating_args, messages, system, tools, image, input_kwargs
gen_kwargs, prompt_length, temporary_image = HuggingfaceEngine._process_args(
model,
tokenizer,
processor,
template,
generating_args,
model_args,
messages,
system,
tools,
image,
input_kwargs,
)
generate_output = model.generate(**gen_kwargs)
generate_output = HuggingfaceEngine.image_clean_wrapper(model.generate, temporary_image)(**gen_kwargs)
response_ids = generate_output[:, prompt_length:]
response = tokenizer.batch_decode(response_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True)
results = []
@@ -212,18 +287,33 @@ def _stream_chat(
processor: Optional["ProcessorMixin"],
template: "Template",
generating_args: Dict[str, Any],
model_args: "ModelArguments",
messages: Sequence[Dict[str, str]],
system: Optional[str] = None,
tools: Optional[str] = None,
image: Optional["NDArray"] = None,
input_kwargs: Optional[Dict[str, Any]] = {},
) -> Callable[[], str]:
gen_kwargs, _ = HuggingfaceEngine._process_args(
model, tokenizer, processor, template, generating_args, messages, system, tools, image, input_kwargs
gen_kwargs, _, temporary_image = HuggingfaceEngine._process_args(
model,
tokenizer,
processor,
template,
generating_args,
model_args,
messages,
system,
tools,
image,
input_kwargs,
)
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
gen_kwargs["streamer"] = streamer
thread = Thread(target=model.generate, kwargs=gen_kwargs, daemon=True)
thread = Thread(
target=HuggingfaceEngine.image_clean_wrapper(model.generate, temporary_image),
kwargs=gen_kwargs,
daemon=True,
)
thread.start()

def stream():
@@ -285,6 +375,7 @@ async def chat(
self.processor,
self.template,
self.generating_args,
self.model_args,
messages,
system,
tools,
@@ -313,6 +404,7 @@ async def stream_chat(
self.processor,
self.template,
self.generating_args,
self.model_args,
messages,
system,
tools,
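For qwen_vl_like models the engine round-trips the image through disk: _process_args saves the numpy array under DEFAULT_CACHE_DIR with a uuid4 file name and splices the path into the prompt, and image_clean_wrapper deletes the file once generation returns. A minimal standalone sketch of that lifecycle (the cache directory and the fake generate function are placeholders):

```python
import os
import pathlib
from uuid import uuid4

import numpy as np
from PIL import Image

CACHE_DIR = pathlib.Path("cache")  # stands in for DEFAULT_CACHE_DIR
CACHE_DIR.mkdir(exist_ok=True)


def save_temp_image(image: np.ndarray) -> pathlib.Path:
    # Mirrors _process_args: persist the array so the prompt can reference a path.
    path = CACHE_DIR / f"{uuid4()}.png"
    Image.fromarray(image).convert("RGB").save(path)
    return path


def with_cleanup(func, temporary_image):
    # Mirrors image_clean_wrapper: remove the temp file after func returns.
    def wrapped(**kwargs):
        result = func(**kwargs)
        if temporary_image:
            os.remove(temporary_image)
        return result
    return wrapped


image_path = save_temp_image(np.zeros((224, 224, 3), dtype=np.uint8))
fake_generate = with_cleanup(lambda **kwargs: "ok", image_path)
print(fake_generate(), image_path.exists())  # ok False
```

One caveat worth noting: because the cleanup runs only after func returns, an exception inside model.generate would leave the temporary file behind; a try/finally in the wrapper would make the cleanup unconditional.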
2 changes: 1 addition & 1 deletion src/llamafactory/chat/vllm_engine.py
@@ -77,7 +77,7 @@ def __init__(
"max_lora_rank": model_args.vllm_max_lora_rank,
}

if model_args.visual_inputs:
if model_args.visual_inputs and model_args.visual_inputs_type == "vision_tower":
image_size = config.vision_config.image_size
patch_size = config.vision_config.patch_size
self.image_feature_size = (image_size // patch_size) ** 2
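The added condition restricts the vLLM multimodal branch to plain vision-tower models, where the number of image tokens is simply the patch grid squared. Illustrative arithmetic only (the 336-pixel image size and 14-pixel patches are typical of a LLaVA-style CLIP tower, not values from this PR; the real numbers come from config.vision_config at runtime):

```python
# Hypothetical numbers for a CLIP ViT-L/14-336 vision tower.
image_size, patch_size = 336, 14
image_feature_size = (image_size // patch_size) ** 2
print(image_feature_size)  # 576 image tokens per image
```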
32 changes: 25 additions & 7 deletions src/llamafactory/data/aligner.py
@@ -19,6 +19,7 @@
from datasets import Features

from ..extras.logging import get_logger
from ..hparams import ModelArguments
from .data_utils import Role


@@ -49,7 +50,10 @@ def _convert_images(images: List[Any], dataset_attr: "DatasetAttr", data_args: "


def convert_alpaca(
examples: Dict[str, List[Any]], dataset_attr: "DatasetAttr", data_args: "DataArguments"
examples: Dict[str, List[Any]],
dataset_attr: "DatasetAttr",
data_args: "DataArguments",
model_args: "ModelArguments",
) -> Dict[str, List[Any]]:
r"""
Converts alpaca format dataset to the standard format.
@@ -96,13 +100,19 @@ def convert_alpaca(
outputs["response"].append(response)
outputs["system"].append(examples[dataset_attr.system][i] if dataset_attr.system else "")
outputs["tools"].append(examples[dataset_attr.tools][i] if dataset_attr.tools else "")
outputs["images"].append(convert_images(examples[dataset_attr.images][i]) if dataset_attr.images else [])
if model_args.visual_inputs_type == "qwen_vl_like":
outputs["images"].append(examples[dataset_attr.images][i] if dataset_attr.images else [])
else:
outputs["images"].append(convert_images(examples[dataset_attr.images][i]) if dataset_attr.images else [])

return outputs


def convert_sharegpt(
examples: Dict[str, List[Any]], dataset_attr: "DatasetAttr", data_args: "DataArguments"
examples: Dict[str, List[Any]],
dataset_attr: "DatasetAttr",
data_args: "DataArguments",
model_args: "ModelArguments",
) -> Dict[str, List[Any]]:
r"""
Converts sharegpt format dataset to the standard format.
@@ -184,7 +194,10 @@ def convert_sharegpt(
outputs["response"].append(response)
outputs["system"].append(system)
outputs["tools"].append(examples[dataset_attr.tools][i] if dataset_attr.tools else "")
outputs["images"].append(convert_images(examples[dataset_attr.images][i]) if dataset_attr.images else [])
if model_args.visual_inputs_type == "qwen_vl_like":
outputs["images"].append(examples[dataset_attr.images][i] if dataset_attr.images else [])
else:
outputs["images"].append(convert_images(examples[dataset_attr.images][i]) if dataset_attr.images else [])

return outputs

@@ -194,6 +207,7 @@ def align_dataset(
dataset_attr: "DatasetAttr",
data_args: "DataArguments",
training_args: "Seq2SeqTrainingArguments",
model_args: "ModelArguments",
) -> Union["Dataset", "IterableDataset"]:
r"""
Aligned dataset:
@@ -204,9 +218,9 @@
images: [],
"""
if dataset_attr.formatting == "alpaca":
convert_func = partial(convert_alpaca, dataset_attr=dataset_attr, data_args=data_args)
convert_func = partial(convert_alpaca, dataset_attr=dataset_attr, data_args=data_args, model_args=model_args)
else:
convert_func = partial(convert_sharegpt, dataset_attr=dataset_attr, data_args=data_args)
convert_func = partial(convert_sharegpt, dataset_attr=dataset_attr, data_args=data_args, model_args=model_args)

column_names = list(next(iter(dataset)).keys())
features = Features.from_dict(
@@ -219,7 +233,11 @@
],
"system": {"dtype": "string", "_type": "Value"},
"tools": {"dtype": "string", "_type": "Value"},
"images": [{"_type": "Image"}],
"images": [
{"dtype": "string", "_type": "Value"}
if model_args.visual_inputs_type == "qwen_vl_like"
else {"_type": "Image"}
],
}
)
kwargs = {}
Expand Down
7 changes: 5 additions & 2 deletions src/llamafactory/data/loader.py
@@ -137,7 +137,7 @@ def load_single_dataset(
max_samples = min(data_args.max_samples, len(dataset))
dataset = dataset.select(range(max_samples))

return align_dataset(dataset, dataset_attr, data_args, training_args)
return align_dataset(dataset, dataset_attr, data_args, training_args, model_args)


def get_dataset(
@@ -177,7 +177,7 @@ def get_dataset(

with training_args.main_process_first(desc="pre-process dataset"):
preprocess_func, print_function = get_preprocess_and_print_func(
data_args, training_args, stage, template, tokenizer, processor
data_args, training_args, model_args, stage, template, tokenizer, processor
)
column_names = list(next(iter(dataset)).keys())
kwargs = {}
@@ -190,6 +190,9 @@

dataset = dataset.map(preprocess_func, batched=True, remove_columns=column_names, **kwargs)

if model_args.visual_inputs_type == "glm4v_like":
dataset = dataset.rename_column("image_inputs", "images")

if data_args.tokenized_path is not None:
if training_args.should_save:
dataset.save_to_disk(data_args.tokenized_path)
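Because the GLM-4V preprocessing emits its tensors under an image_inputs column, the loader renames it back to images so downstream collators see a uniform column name. rename_column is the standard datasets API; a trivial sketch with made-up data:

```python
from datasets import Dataset

# Minimal illustration of the glm4v_like post-processing step.
dataset = Dataset.from_dict({"input_ids": [[1, 2, 3]], "image_inputs": [[0.0]]})
dataset = dataset.rename_column("image_inputs", "images")
print(dataset.column_names)  # ['input_ids', 'images']
```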