Support custom conversation template in multi_model_worker #2434

hi-jin · 2023-09-17T04:55:15Z

Why are these changes needed?

The single-model model_worker already supports a custom conv_template, as you can see in the create_model_worker function in model_worker.py.
However, multi_model_worker did not support custom conversation templates, so every model it served fell back to its default template.
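To illustrate what per-model template support involves, here is a minimal standalone sketch. It assumes that a repeatable --conv-template flag is matched by position against the repeatable --model-path flag (the i-th template applies to the i-th model). The helper names parse and pair_templates are hypothetical and are not FastChat's actual code, and the template names in the usage example are illustrative only.

```python
import argparse
from typing import List, Optional, Tuple


def parse(argv: List[str]) -> argparse.Namespace:
    """Hypothetical CLI: each flag may be repeated once per served model."""
    parser = argparse.ArgumentParser()
    # action="append" collects every occurrence of a flag into a list.
    parser.add_argument("--model-path", action="append", required=True)
    # Kept as the raw string given for each model (e.g. "name1,name2").
    parser.add_argument("--model-names", action="append", default=None)
    # Assumed flag: one conversation template per --model-path, in the same order.
    parser.add_argument("--conv-template", action="append", default=None)
    return parser.parse_args(argv)


def pair_templates(
    model_paths: List[str],
    model_names: Optional[List[str]],
    conv_templates: Optional[List[str]],
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Pair each model path with its names and template by position.

    A flag that was never passed (None) means "no override" for every model,
    so each model keeps its default template.
    """
    n = len(model_paths)
    names = model_names if model_names is not None else [None] * n
    templates = conv_templates if conv_templates is not None else [None] * n
    if len(names) != n or len(templates) != n:
        raise ValueError(
            "--model-names and --conv-template must be given once per --model-path"
        )
    return list(zip(model_paths, names, templates))


if __name__ == "__main__":
    args = parse([
        "--model-path", "lmsys/vicuna-7b-v1.5", "--conv-template", "vicuna_v1.1",
        "--model-path", "meta-llama/Llama-2-7b-chat-hf", "--conv-template", "llama-2",
    ])
    for path, names, template in pair_templates(
        args.model_path, args.model_names, args.conv_template
    ):
        print(path, names, template)
```

Matching by position keeps the command line flat. The trade-off is that once --conv-template is used at all, it has to be supplied for every model, which is why the sketch rejects mismatched counts instead of guessing.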

Related issue number (if applicable)

Closes #2383

Checks

I've run format.sh to lint the changes in this PR.
I've included any doc changes needed.
I've made sure the relevant tests are passing (if applicable).

<[email protected]> Co-authored-by: Cristian Gutiérrez <[email protected]> Co-authored-by: ali asaria <[email protected]> Co-authored-by: wulixuan <[email protected]> Co-authored-by: staoxiao <[email protected]> Co-authored-by: Zaida Zhou <[email protected]> Co-authored-by: dheeraj-326 <[email protected]> Co-authored-by: bjwswang <[email protected]> Co-authored-by: Zhanghao Wu <[email protected]> Co-authored-by: Ted Li <[email protected]> Co-authored-by: Shukant Pal <[email protected]> Co-authored-by: Lisa Dunlap <[email protected]> Co-authored-by: Logan Kilpatrick <[email protected]>

Commit ea571dd: Support custom conversation template in multi_model_worker

hi-jin changed the title from "Support custom conversation template in multi_model_worker (#2383)" to "Support custom conversation template in multi_model_worker" on Sep 17, 2023.

merrymercy merged commit c7e3e67 into lm-sys:main on Sep 18, 2023 (1 check passed).
