v0.12.2
What's new in 0.12.2 (2024-06-21)
These are the changes in inference v0.12.2.
New features
- FEAT: Add Tools Support for Qwen Series MOE Models by @zhanghx0905 in #1642
- FEAT: [UI]Modify the deletion function of a custom model. by @yiboyasss in #1656
- FEAT: [UI]Custom model presents JSON data and modifies it. by @yiboyasss in #1670
- FEAT: Add Rerank model token input/output usage by @wxiwnd in #1657
Enhancements
- ENH: Continuous batching supports all the models with `transformers` backend by @ChengjieLi28 in #1659
Bug fixes
- BUG: Show error when user launches quantized model without a supported device by @Minamiyama in #1645
- BUG: Fix default rerank type by @codingl2k1 in #1649
- BUG: chat_completion does not respond after more than 100 errors occur by @liuzhenghua in #1663
Tests
- TST: Fix CI due to `tenacity` by @ChengjieLi28 in #1660
Others
- CHORE: [pre-commit] Add exclude thirdparty rules by @frostyplanet in #1678
Full Changelog: v0.12.1...v0.12.2