[Model] Adding support for MiniCPM-V #4087
There's an incompatible pip dependency error. The details are as follows:
|
We can't do anything until timm has the same dependency as vLLM. Alternatively, you can try removing the timm dependency. |
Sorry, we were confused by this situation. |
Sorry for the delayed review, and thanks for your contribution! This looks good to me, and I left some minor comments. However, there is so much custom code that it is hard to review carefully. IMO, it would be better to encapsulate this in your own package and import it into vLLM for easier maintenance.
@HwwwwwwwH Thanks for your excellent work, may I ask what is preventing the progress of this PR? |
Very sorry for the delay! We've been working on the new VLM, MiniCPM-V-2.5, for the last few days. I've pushed a new commit addressing the review comments. I also see some new VLM features in vLLM. Are there any requirements for adapting to them? Really sorry~ |
ping @ywang96 |
It should have been fixed last night. Please update to the latest main branch. |
@HwwwwwwwH I see Qwen2Model in init_llm. Are there any plans to release something like minicpmv3-Qwen2 in the future? * v * |
Could you show the input prompt for each case? |
|
English prompt:
question = "please describe the image in detail"
messages = [{
    'role': 'user',
    'content': f'(<image>./</image>)\n{question}'
}]
prompt = tokenizer.apply_chat_template(messages,
                                       tokenize=False,
                                       add_generation_prompt=True)
Chinese prompt:
question = "详细描述图片内容"  # "Describe the image content in detail"
messages = [{
    'role': 'user',
    'content': f'(<image>./</image>)\n{question}'
}]
prompt = tokenizer.apply_chat_template(messages,
                                       tokenize=False,
                                       add_generation_prompt=True)
|
thanks, add |
A question: I installed vLLM with pip install vllm and got version 0.5.3.post1, so why can't I run python -m vllm.entrypoints.openai.api_server \ --model /home/nlp/xc/NLP/LLM/openLLM/MiniCPM-Llama3-V-2_5 \? It tells me this model is not supported. |
This model was added after the release of |
@DarkLight1337 I'm updating this PR description to link to this comment from you given how many times we had to answer the same question :P |
With offline vLLM inference on minicpmv2-6, the output keeps repeating the same passage of text.
Could you please take a look? Thanks~ @ywang96 |
Could you share a sample input/output with the repetitive generation? |
from vllm import SamplingParams

# `tokenizer` is the model's tokenizer, loaded elsewhere.
stop_tokens = ['<|im_end|>', '<|endoftext|>']
stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]
sampling_params = SamplingParams(
    stop_token_ids=stop_token_ids,
)
|
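The stop-token lookup above can be wrapped in a small helper so that tokens missing from a given tokenizer are skipped rather than passed on as bogus ids. This is only a sketch: `resolve_stop_token_ids` and `StubTokenizer` are hypothetical names, the ids in the stub are invented, and the only real tokenizer call assumed is `convert_tokens_to_ids`, which the comment above already uses.

```python
from typing import List, Sequence


def resolve_stop_token_ids(tokenizer, stop_tokens: Sequence[str]) -> List[int]:
    """Map stop-token strings to ids, skipping tokens the tokenizer does not know."""
    unk_id = getattr(tokenizer, "unk_token_id", None)
    ids: List[int] = []
    for token in stop_tokens:
        token_id = tokenizer.convert_tokens_to_ids(token)
        # Unknown tokens come back as None or as the unknown-token id.
        if token_id is None or token_id == unk_id:
            continue
        if token_id not in ids:
            ids.append(token_id)
    return ids


class StubTokenizer:
    """Hypothetical stand-in for a real tokenizer; the ids are invented."""

    unk_token_id = 0
    _vocab = {"<|im_end|>": 101, "<|endoftext|>": 102}

    def convert_tokens_to_ids(self, token: str) -> int:
        return self._vocab.get(token, self.unk_token_id)


# "</s>" is not in the stub vocabulary, so it is dropped.
stop_token_ids = resolve_stop_token_ids(
    StubTokenizer(), ["<|im_end|>", "<|endoftext|>", "</s>"]
)
print(stop_token_ids)
```

With a real tokenizer, the resulting list would be passed as `stop_token_ids` to `SamplingParams`, as in the comment above.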
The prompt is in Chinese; it roughly asks for some classic advertising copy. |
By the way, how can I use minicpmv2-6's few-shot feature with vLLM? |
Here is the best practice for minicpmv2-6 inference with vLLM:
|
Thank you very much. I think the problem is that each version has different stop token ids. This code should work, I think. |
Here is the official few-shot usage with transformers:
|
When deploying minicpm-v-2.6 through the OpenAI API in vLLM version 5.4.0, I hit this error. Please take a look:
|
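In version 0.5.4, inference with MiniCPM-V runs out of memory when deployed in the OpenAI-compatible format. |
I tested on a single A100-80G GPU and found that when loading with vLLM, memory first rises to 16GB (reading the model). At some moment after loading finishes, memory peaks at 29GB and then drops back to 19GB. The cause is unknown. |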
How can I load the vision model on a separate GPU to avoid OOM? |
@sfyumi I have a solution. vLLM's max-num-seqs defaults to 256, which is too large for the 3090. Lower max-num-seqs to 32 and raise gpu-memory-utilization to 1. |
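As a rough illustration of the suggestion above, the server command line can be assembled like this. The helper name `build_server_command` is hypothetical, and the model id is a placeholder taken from the Hugging Face page linked in the PR description; `--max-num-seqs` and `--gpu-memory-utilization` are the flags named in the comment. Whether these values avoid OOM on a given card is not guaranteed.

```python
import shlex
from typing import List


def build_server_command(model: str,
                         max_num_seqs: int = 32,
                         gpu_memory_utilization: float = 1.0) -> List[str]:
    """Build the argument list for vLLM's OpenAI-compatible server (sketch)."""
    if max_num_seqs < 1:
        raise ValueError("max_num_seqs must be at least 1")
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    return [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,
        "--max-num-seqs", str(max_num_seqs),
        "--gpu-memory-utilization", str(gpu_memory_utilization),
    ]


# Placeholder model id; substitute the checkpoint you actually deploy.
cmd = build_server_command("openbmb/MiniCPM-V-2")
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd)` directly; `shlex.join` is used here only to print a copy-pasteable command.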
I think this could work:
msgs = [
    {'role': 'user', 'content': "(<image>./</image>)" + question}, {'role': 'assistant', 'content': answer1},
    {'role': 'user', 'content': "(<image>./</image>)" + question}, {'role': 'assistant', 'content': answer2},
    {'role': 'user', 'content': "(<image>./</image>)" + question}
]
prompt = tokenizer.apply_chat_template(
    msgs,
    tokenize=False,
    add_generation_prompt=True
)
inputs = {
    "prompt": prompt,
    "multi_modal_data": {
        "image": [image1, image2, image_test]
    },
}
|
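The few-shot layout suggested above (one image placeholder per user turn, images listed in the same order) can be generated for any number of examples. The sketch below is untested against vLLM itself; `build_fewshot_messages` is a hypothetical helper, and the strings standing in for images are placeholders for real image objects.

```python
from typing import Any, Dict, List, Sequence, Tuple

IMAGE_PLACEHOLDER = "(<image>./</image>)"


def build_fewshot_messages(
    examples: Sequence[Tuple[Any, str]],
    question: str,
    query_image: Any,
) -> Tuple[List[Dict[str, str]], List[Any]]:
    """Build chat messages and the matching image list for few-shot prompting.

    `examples` is a sequence of (image, answer) pairs shown before the query.
    """
    msgs: List[Dict[str, str]] = []
    images: List[Any] = []
    for image, answer in examples:
        msgs.append({"role": "user", "content": IMAGE_PLACEHOLDER + question})
        msgs.append({"role": "assistant", "content": answer})
        images.append(image)
    # The final user turn carries the image to be answered.
    msgs.append({"role": "user", "content": IMAGE_PLACEHOLDER + question})
    images.append(query_image)
    return msgs, images


# Strings stand in for real image objects in this illustration.
msgs, images = build_fewshot_messages(
    [("image1", "answer1"), ("image2", "answer2")],
    "What is shown?",
    "image_test",
)
print(len(msgs), images)
```

The returned `msgs` would go through `tokenizer.apply_chat_template` and `images` into `multi_modal_data["image"]`, as in the comment above.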
vLLM will send dummy data (with multiple dummy images) to the model. Since |
Signed-off-by: Alvant <[email protected]>
Adding support for MiniCPM-V-2, please review.
HuggingFace Page: https://huggingface.co/openbmb/MiniCPM-V-2
FIX #4943
FIX #5808
NOTE: This model was added after the release of 0.5.3.post1, so it'll only be included in the next release (e.g. 0.5.4). If you want to use it now, please install vLLM from source (i.e. main branch).