
Commit

0.15.0 +cogvlm2
matatonic committed May 20, 2024
1 parent 1b9109f commit 8a66a15
Showing 8 changed files with 168 additions and 101 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,6 +1,6 @@
 FROM python:3.11-slim
 
-RUN apt-get update && apt-get install -y git
+RUN apt-get update && apt-get install -y git gcc
 RUN pip install --no-cache-dir --upgrade pip
 
 RUN mkdir -p /app
16 changes: 11 additions & 5 deletions README.md
@@ -13,6 +13,8 @@ An OpenAI API compatible vision server, it functions like `gpt-4-vision-preview`
 - - [X] [InternVL-Chat-V1-5](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5) (won't gpu split yet, 4bit not recommended)
 - - [X] [InternVL-Chat-V1-5-Int8](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5-Int8) (won't gpu split yet)
 - [X] [THUDM/CogVLM](https://github.com/THUDM/CogVLM)
+- - [X] [cogvlm2-llama3-chat-19B](https://huggingface.co/THUDM/cogvlm2-llama3-chat-19B)
+- - [X] [cogvlm2-llama3-chinese-chat-19B](https://huggingface.co/THUDM/cogvlm2-llama3-chinese-chat-19B)
 - - [X] [cogvlm-chat-hf](https://huggingface.co/THUDM/cogvlm-chat-hf)
 - - [X] [cogagent-chat-hf](https://huggingface.co/THUDM/cogagent-chat-hf)
 - [X] [InternLM](https://huggingface.co/internlm/)
@@ -29,7 +31,7 @@ An OpenAI API compatible vision server, it functions like `gpt-4-vision-preview`
 - - [X] [idefics2-8b-chatty-AWQ](https://huggingface.co/HuggingFaceM4/idefics2-8b-chatty-AWQ) (main docker only, won't gpu split)
 - [X] [qihoo360](https://huggingface.co/qihoo360)
 - - [X] [360VL-8B](https://huggingface.co/qihoo360/360VL-8B)
-- - [X] [360VL-70B](https://huggingface.co/qihoo360/360VL-70B) (loading error, [see note](https://huggingface.co/qihoo360/360VL-70B/discussions/1), also too large for me to test)
+- - [X] [360VL-70B](https://huggingface.co/qihoo360/360VL-70B) (untested)
 - [X] [LlavaNext](https://huggingface.co/llava-hf) (main docker only)
 - - [X] [llava-v1.6-34b-hf](https://huggingface.co/llava-hf/llava-v1.6-34b-hf) (main docker only)
 - - [X] [llava-v1.6-vicuna-13b-hf](https://huggingface.co/llava-hf/llava-v1.6-vicuna-13b-hf) (main docker only)
@@ -39,9 +41,6 @@ An OpenAI API compatible vision server, it functions like `gpt-4-vision-preview`
 - - [X] [llava-v1.5-vicuna-7b-hf](https://huggingface.co/llava-hf/llava-v1.5-vicuna-7b-hf)
 - - [X] [llava-v1.5-vicuna-13b-hf](https://huggingface.co/llava-hf/llava-v1.5-vicuna-13b-hf)
 - - [ ] [llava-v1.5-bakLlava-7b-hf](https://huggingface.co/llava-hf/llava-v1.5-bakLlava-7b-hf) (currently errors)
-- [X] [01-ai/Yi-VL](https://huggingface.co/01-ai)
-- - [ ] [Yi-VL-6B](https://huggingface.co/01-ai/Yi-VL-6B) (currently errors)
-- - [ ] [Yi-VL-34B](https://huggingface.co/01-ai/Yi-VL-34B) (currently errors)
 - [X] [qresearch](https://huggingface.co/qresearch/)
 - - [X] [llama-3-vision-alpha-hf](https://huggingface.co/qresearch/llama-3-vision-alpha-hf) (main docker only, won't gpu split)
 - [X] [BAAI](https://huggingface.co/BAAI/)
@@ -72,6 +71,9 @@ An OpenAI API compatible vision server, it functions like `gpt-4-vision-preview`
 - - [X] [MGM-34B-HD](https://huggingface.co/YanweiLi/MGM-34B-HD) (alternate docker only)
 - - [X] [MGM-8x7B-HD](https://huggingface.co/YanweiLi/MGM-8x7B-HD) (alternate docker only)
 - [X] [qnguyen3/nanoLLaVA](https://huggingface.co/qnguyen3/nanoLLaVA) (main docker only, won't gpu split)
+- [ ] [01-ai/Yi-VL](https://huggingface.co/01-ai)
+- - [ ] [Yi-VL-6B](https://huggingface.co/01-ai/Yi-VL-6B) (currently errors)
+- - [ ] [Yi-VL-34B](https://huggingface.co/01-ai/Yi-VL-34B) (currently errors)
 - [ ] [Deepseek-VL-7b-chat](https://huggingface.co/deepseek-ai/deepseek-vl-7b-chat)
 - [ ] [Deepseek-VL-1.3b-chat](https://huggingface.co/deepseek-ai/deepseek-vl-1.3b-chat)
 - [ ] [NousResearch/Obsidian-3B-V0.5](https://huggingface.co/NousResearch/Obsidian-3B-V0.5)
@@ -81,6 +83,10 @@ See: [OpenVLM Leaderboard](https://huggingface.co/spaces/opencompass/open_vlm_le

 ## Recent updates
 
+Version 0.15.0
+
+- new model support: cogvlm2-llama3-chinese-chat-19B, cogvlm2-llama3-chat-19B
+
 Version 0.14.1
 
 - new model support: idefics2-8b-chatty, idefics2-8b-chatty-AWQ (it worked already, no code change)
@@ -89,7 +95,7 @@ Version 0.14.1
 Version 0.14.0
 
 - docker-compose.yml: Assume the runtime supports the device (i.e. nvidia)
-- new model support: qihoo360/360VL-8B, qihoo360/360VL-70B (70B loading error, [see note](https://huggingface.co/qihoo360/360VL-70B/discussions/1), also too large for me to test)
+- new model support: qihoo360/360VL-8B, qihoo360/360VL-70B (70B is untested, too large for me)
 - new model support: BAAI/Emu2-Chat, can be slow to load, may need the --max-memory option to control loading across multiple gpus
 - new model support: TIGER-Labs/Mantis: Mantis-8B-siglip-llama3, Mantis-8B-clip-llama3, Mantis-8B-Fuyu
 
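For context, the cogvlm2 models added above are served through the same OpenAI-compatible chat completions API as the existing models, so any standard OpenAI client should work. A minimal client-side sketch; the base_url, api_key value, and image URL below are illustrative assumptions, not values from this commit:

from openai import OpenAI

# Placeholder endpoint and key: point base_url at wherever the server runs.
client = OpenAI(base_url="http://localhost:5006/v1", api_key="skip")

response = client.chat.completions.create(
    model="THUDM/cogvlm2-llama3-chat-19B",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }],
    max_tokens=512,
)
print(response.choices[0].message.content)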
48 changes: 48 additions & 0 deletions backend/cogvlm2.py
@@ -0,0 +1,48 @@
from transformers import AutoTokenizer, AutoModelForCausalLM

from vision_qna import *

# THUDM/cogvlm2-llama3-chat-19B
# THUDM/cogvlm2-llama3-chinese-chat-19B
import transformers
transformers.logging.set_verbosity_error()

class VisionQnA(VisionQnABase):
model_name: str = "cogvlm2"
format: str = 'llama3'

def __init__(self, model_id: str, device: str, device_map: str = 'auto', extra_params = {}, format = None):
super().__init__(model_id, device, device_map, extra_params, format)

self.tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=self.params.get('trust_remote_code', False))
self.model = AutoModelForCausalLM.from_pretrained(**self.params).eval()

print(f"Loaded on device: {self.model.device} with dtype: {self.model.dtype}")

async def chat_with_images(self, request: ImageChatRequest) -> str:

query, history, images, system_message = await prompt_history_images_system_from_messages(
request.messages, img_tok='', url_handler=url_to_image)

input_by_model = self.model.build_conversation_input_ids(self.tokenizer, query=query, history=history, images=images, template_version='chat')
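        # The tensors returned above are unbatched; below, unsqueeze(0) adds a
        # batch dimension, and CogVLM expects images as a nested list (one list
        # of image tensors per batch element) cast to the model's dtype.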

inputs = {
'input_ids': input_by_model['input_ids'].unsqueeze(0).to(self.model.device),
'token_type_ids': input_by_model['token_type_ids'].unsqueeze(0).to(self.model.device),
'attention_mask': input_by_model['attention_mask'].unsqueeze(0).to(self.model.device),
'images': [[input_by_model['images'][0].to(self.model.device).to(self.model.dtype)]] if images else None,
}

default_params = {
'max_new_tokens': 2048,
'pad_token_id': 128002,
'top_p': None, # 0.9
'temperature': None, # 0.6
}

params = self.get_generation_params(request, default_params)

response = self.model.generate(**inputs, **params)
answer = self.tokenizer.decode(response[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True).strip()

return answer
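A note on the decode step above: model.generate() returns the prompt tokens followed by the completion, so slicing off the first inputs['input_ids'].shape[1] tokens keeps only the newly generated answer. A self-contained sketch of the same pattern, using a small generic causal LM purely for illustration:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Illustrative model only; the backend above applies the same slice to cogvlm2.
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The capital of France is", return_tensors="pt").input_ids
out = lm.generate(ids, max_new_tokens=8, pad_token_id=tok.eos_token_id)

# out[0] = prompt ids + newly generated ids; keep only the completion.
answer = tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True).strip()
print(answer)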
2 changes: 2 additions & 0 deletions model_conf_tests.alt.json
@@ -4,6 +4,8 @@
["echo840/Monkey"],
["echo840/Monkey-Chat"],
["OpenGVLab/InternVL-Chat-V1-5", "--device-map", "cuda:0"],
["THUDM/cogvlm2-llama3-chat-19B"],
["THUDM/cogvlm2-llama3-chinese-chat-19B"],
["THUDM/cogvlm-chat-hf"],
["THUDM/cogagent-chat-hf"],
["Qwen/Qwen-VL-Chat"],
2 changes: 2 additions & 0 deletions model_conf_tests.json
@@ -7,6 +7,8 @@
["qnguyen3/nanoLLaVA", "--use-flash-attn", "--device-map", "cuda:0"],
["echo840/Monkey"],
["echo840/Monkey-Chat"],
["THUDM/cogvlm2-llama3-chat-19B"],
["THUDM/cogvlm2-llama3-chinese-chat-19B"],
["THUDM/cogvlm-chat-hf"],
["THUDM/cogagent-chat-hf"],
["Qwen/Qwen-VL-Chat"],