Releases: turboderp-org/exllamav2
0.1.7
- Support Gemma2
- Support InternLM2
- Various bugfixes and optimizations
Full Changelog: v0.1.6...v0.1.7
0.1.6
- Fix dynamic generator fallback mode (was broken for prompts longer than max_input_len)
- Fix inference on ROCm wave64 devices
- Made model conversion script part of exllamav2 package
- CPU optimizations
Full Changelog: v0.1.5...v0.1.6
0.1.5
- Added Q6 and Q8 cache modes
- Defragment cache in dynamic generator
- Use SDPA with Torch 2.3.0+
- Updated wheels to Torch 2.3.1
- Added Python 3.12 wheels, plus Python 3.9 for ROCm
Full Changelog: v0.1.4...v0.1.5
0.1.4
- Option to keep calibration states in VRAM while measuring
- Fix for Q4 cache for odd key/value sizes (MiniCPM specifically)
- Alternative fasttensors option on Windows to solve system memory issues
- Prefix filter with multiple prefixes
Full Changelog: v0.1.3...v0.1.4
0.1.3
- Fixes CFG
Full Changelog: v0.1.2...v0.1.3
0.1.2
- Support MiniCPM architecture
- Optimized prompt processing for page generator with Q4 cache
- New HumanEval and MMLU tests using dynamic generator
- Some bugfixes and small QoL improvements
Full Changelog: v0.1.1...v0.1.2
0.1.1
- Fix performance of Q4 cache in dynamic generator
- Add paged attn support for FP16 models
- Add xformers support
Full Changelog: v0.1.0...v0.1.1
0.1.0
- Paged attention support (requires flash-attn>=2.5.7)
- New generator with dynamic batching support (requires paged attn)
- Examples updated for dynamic generator
- Faster draft model SD (speculative decoding)
- Various optimizations, bugfixes and QoL improvements
Full Changelog: v0.0.21...v0.1.0
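The flash-attn>=2.5.7 requirement noted for 0.1.0 can be checked before relying on paged attention. The following is a minimal sketch, not part of exllamav2: the helper names `parse_version` and `paged_attention_available` are hypothetical, the distribution name `flash-attn` is assumed to match the installed package metadata, and the version parsing is deliberately simplified (pre-release suffixes such as `rc1` are ignored).

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum flash-attn version stated in the 0.1.0 release notes.
MIN_FLASH_ATTN = (2, 5, 7)


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple.

    Simplified on purpose: local build tags ("+cu122") and suffixes such as
    "post1" or "rc1" are dropped rather than ordered.
    """
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def paged_attention_available(installed=None):
    """Return True if the given (or installed) flash-attn meets the minimum.

    `installed` is a hypothetical override for testing; when omitted, the
    version is read from the installed package metadata.
    """
    if installed is None:
        try:
            installed = version("flash-attn")
        except PackageNotFoundError:
            return False
    return parse_version(installed) >= MIN_FLASH_ATTN
```

Calling `paged_attention_available()` with no argument reads the installed package metadata and returns False when flash-attn is not installed, so a caller could fall back to a non-paged code path in that case.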
0.0.21
- Support for Granite architecture
- Support for GPT2 architecture
- Support for banned strings in streaming generator
- A bit more work on multimodal support (still unfinished)
- Few bugfixes and stuff
- Windows wheels for PyTorch 2.2.0 are included below to work around an apparent (likely temporary) issue in PyTorch. See #434 and pytorch/pytorch#125109
Full Changelog: v0.0.20...v0.0.21
0.0.20
- Adds Phi3 support
- Wheels compiled for PyTorch 2.3.0
- ROCm 6.0 wheels
Full Changelog: v0.0.19...v0.0.20