added gemma2 9b and 27b vllm with streaming #318
Conversation
vLLM's next release will add support for Gemma 2 9B/27B. Until then you'd have to build vLLM from source on top of a PyTorch image, which takes 30+ minutes to deploy. vllm-project/vllm#5806
Left a few small suggestions, but looking forward to having 27B; folks have been asking for it.
logger.info(f"tensor parallelism: {model_metadata['tensor_parallel']}") | ||
logger.info(f"max num seqs: {model_metadata['max_num_seqs']}") | ||
|
||
self.model_args = AsyncEngineArgs( |
A potential improvement is to move everything to config, e.g. as in this example:
https://github.com/vshulman/truss-examples/tree/main/ultravox-vllm
We can merge without this change, as other vLLM examples also support only a partial list of arguments.
Looking through it, that example uses the vLLM OpenAI-compatible server instead of explicitly instantiating the vLLM AsyncLLMEngine for the model.
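For context, a minimal sketch of the direct-engine approach this truss takes, as opposed to launching vLLM's standalone OpenAI-compatible server; the model id here is just a placeholder for illustration:

```python
from vllm import AsyncEngineArgs, AsyncLLMEngine

# Instantiate the async engine directly inside the truss's model code,
# rather than running `python -m vllm.entrypoints.openai.api_server`.
# "google/gemma-2-27b-it" is a placeholder model id, not taken from this PR.
engine_args = AsyncEngineArgs(
    model="google/gemma-2-27b-it",
    tensor_parallel_size=2,
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```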
100% -- I just think the same kwargs pattern can apply here. The benefit I see is that, going forward, all it would take to pass a new argument into vLLM (either the standalone OpenAI server or the Python API above) is adding it to config.yaml.
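A rough sketch of what that kwargs pattern could look like here, assuming the engine arguments live under a hypothetical `engine_args` key in config.yaml's `model_metadata`:

```python
from vllm import AsyncEngineArgs


def build_engine_args(model_metadata: dict) -> AsyncEngineArgs:
    # model_metadata is the dict parsed from config.yaml. "engine_args" is a
    # hypothetical key holding arbitrary vLLM engine arguments, e.g.
    #   engine_args:
    #     model: google/gemma-2-27b-it
    #     tensor_parallel_size: 2
    #     max_num_seqs: 16
    # With this, a new vLLM option only needs a config.yaml edit, not a code change.
    return AsyncEngineArgs(**model_metadata.get("engine_args", {}))
```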
… template. TODO: confirm 27B working