0.31.1 minicpm v2.6-int4 fix, container rename
matatonic committed Sep 14, 2024
1 parent 1239d89 commit e1576c6
Showing 6 changed files with 15 additions and 12 deletions.
5 changes: 5 additions & 0 deletions README.md
@@ -69,6 +69,7 @@ Can't decide which to use? See the [OpenVLM Leaderboard](https://huggingface.co/
- - [X] [Pixtral-12B](https://huggingface.co/mistralai/Pixtral-12B-2409)
- [X] [openbmb](https://huggingface.co/openbmb)
- - [X] [MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6) (video not supported yet)
- - [X] [MiniCPM-V-2_6-int4](https://huggingface.co/openbmb/MiniCPM-V-2_6-int4)
- - [X] [MiniCPM-Llama3-V-2_5](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5)
- - [X] [MiniCPM-V-2](https://huggingface.co/openbmb/MiniCPM-V-2) (alternate docker only)
- - [X] [MiniCPM-V aka. OmniLMM-3B](https://huggingface.co/openbmb/MiniCPM-V) (alternate docker only)
@@ -138,6 +139,10 @@ If you can't find your favorite model, you can [open a new issue](https://github

## Recent updates

Version 0.31.1

- Fix support for openbmb/MiniCPM-V-2_6-int4

Version 0.31.0

- new model support: Qwen/Qwen2-VL family of models (video untested, GPTQ not working yet, but AWQ and BF16 are fine)
6 changes: 4 additions & 2 deletions backend/minicpm-v-2_6.py
@@ -5,6 +5,7 @@
from decord import VideoReader, cpu

# openbmb/MiniCPM-V-2_6
# openbmb/MiniCPM-V-2_6-int4

MAX_NUM_FRAMES=64 # if cuda OOM set a smaller number

@@ -36,8 +37,9 @@ def __init__(self, model_id: str, device: str, device_map: str = 'auto', extra_p
        self.model = AutoModel.from_pretrained(**self.params).eval()

        # bitsandbytes already moves the model to the device, so we don't need to do it again.
        if not (extra_params.get('load_in_4bit', False) or extra_params.get('load_in_8bit', False)):
            self.model = self.model.to(dtype=self.params['torch_dtype'], device=self.device)
        if '-int4' not in model_id:
            if not (extra_params.get('load_in_4bit', False) or extra_params.get('load_in_8bit', False)):
                self.model = self.model.to(dtype=self.params['torch_dtype'], device=self.device)

        self.loaded_banner()
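For context, a minimal sketch of the device-placement rule this hunk encodes. The `should_move_to_device` helper below is hypothetical (it is not part of this backend): pre-quantized `-int4` checkpoints and models loaded with `load_in_4bit`/`load_in_8bit` are already placed on the device by bitsandbytes during `from_pretrained()`, so the explicit `.to()` call is skipped for them.

```python
# Hypothetical, illustrative sketch of the placement rule added in this commit;
# the helper name and structure are not code from this repository.

def should_move_to_device(model_id: str, extra_params: dict) -> bool:
    """Return True only when the loader should still call model.to(dtype, device)."""
    if '-int4' in model_id:
        # Pre-quantized checkpoints (e.g. openbmb/MiniCPM-V-2_6-int4) are
        # placed on the device by bitsandbytes inside from_pretrained().
        return False
    if extra_params.get('load_in_4bit', False) or extra_params.get('load_in_8bit', False):
        # On-the-fly bitsandbytes quantization also handles device placement.
        return False
    return True

if __name__ == '__main__':
    # The new int4 checkpoint and any 4/8-bit load are never moved manually.
    assert should_move_to_device('openbmb/MiniCPM-V-2_6-int4', {}) is False
    assert should_move_to_device('openbmb/MiniCPM-V-2_6', {'load_in_4bit': True}) is False
    # A full-precision load is still moved to the requested dtype and device.
    assert should_move_to_device('openbmb/MiniCPM-V-2_6', {}) is True
```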

7 changes: 2 additions & 5 deletions docker-compose.alt.yml
@@ -1,17 +1,14 @@
services:
  server:
  openedai-vision-alt:
    build:
      args:
        - VERSION=alt
      dockerfile: Dockerfile
    tty: true
    container_name: openedai-vision-alt
    image: ghcr.io/matatonic/openedai-vision-alt
    env_file: vision-alt.env # your settings go here
    volumes:
      - ./hf_home:/app/hf_home # for Hugginface model cache
      # be sure to review and run prepare_minigemini.sh before starting a mini-gemini model
      - ./model_zoo:/app/model_zoo # for MiniGemini
      - ./YanweiLi:/app/YanweiLi # for MiniGemini
      - ./model_conf_tests.alt.json:/app/model_conf_tests.json
    ports:
      - 5006:5006
7 changes: 2 additions & 5 deletions docker-compose.yml
@@ -1,17 +1,14 @@
services:
  server:
  openedai-vision:
    build:
      args:
        - VERSION=latest
      dockerfile: Dockerfile
    tty: true
    container_name: openedai-vision
    image: ghcr.io/matatonic/openedai-vision
    env_file: vision.env # your settings go here
    volumes:
      - ./hf_home:/app/hf_home # for Hugginface model cache
      # be sure to review and run prepare_minigemini.sh before starting a mini-gemini model
      - ./model_zoo:/app/model_zoo # for MiniGemini
      - ./YanweiLi:/app/YanweiLi # for MiniGemini
      - ./model_conf_tests.json:/app/model_conf_tests.json
    ports:
      - 5006:5006
1 change: 1 addition & 0 deletions model_conf_tests.json
@@ -94,6 +94,7 @@
["microsoft/Phi-3.5-vision-instruct", "-A", "flash_attention_2", "--load-in-4bit"],
["microsoft/Phi-3.5-vision-instruct", "-A", "flash_attention_2"],
["mistralai/Pixtral-12B-2409"],
["openbmb/MiniCPM-V-2_6-int4", "-A", "flash_attention_2", "--device-map", "cuda:0"],
["openbmb/MiniCPM-V-2_6", "-A", "flash_attention_2", "--device-map", "cuda:0", "--load-in-4bit"],
["openbmb/MiniCPM-V-2_6", "-A", "flash_attention_2", "--device-map", "cuda:0"],
["openbmb/MiniCPM-Llama3-V-2_5", "-A", "flash_attention_2", "--device-map", "cuda:0", "--load-in-4bit"],
1 change: 1 addition & 0 deletions vision.sample.env
@@ -98,6 +98,7 @@ HF_HUB_ENABLE_HF_TRANSFER=1
#CLI_COMMAND="python vision.py -m microsoft/Phi-3.5-vision-instruct -A flash_attention_2 --load-in-4bit" # test pass✅, time: 8.5s, mem: 4.4GB, 13/13 tests passed.
#CLI_COMMAND="python vision.py -m microsoft/Phi-3.5-vision-instruct -A flash_attention_2" # test pass✅, time: 7.5s, mem: 9.3GB, 13/13 tests passed.
#CLI_COMMAND="python vision.py -m mistralai/Pixtral-12B-2409" # test pass✅, time: 16.3s, mem: 35.6GB, 13/13 tests passed.
#CLI_COMMAND="python vision.py -m openbmb/MiniCPM-V-2_6-int4 -A flash_attention_2 --device-map cuda:0" # test pass✅, time: 15.5s, mem: 9.5GB, 13/13 tests passed.
#CLI_COMMAND="python vision.py -m openbmb/MiniCPM-V-2_6 -A flash_attention_2 --device-map cuda:0 --load-in-4bit" # test pass✅, time: 13.6s, mem: 9.4GB, 13/13 tests passed.
#CLI_COMMAND="python vision.py -m openbmb/MiniCPM-V-2_6 -A flash_attention_2 --device-map cuda:0" # test pass✅, time: 12.1s, mem: 18.8GB, 13/13 tests passed.
#CLI_COMMAND="python vision.py -m openbmb/MiniCPM-Llama3-V-2_5 -A flash_attention_2 --device-map cuda:0 --load-in-4bit" # test pass✅, time: 22.8s, mem: 9.0GB, 13/13 tests passed.
