Releases: turboderp-org/exllamav2

0.1.7

11 Jul 13:20
  • Support Gemma2
  • Support InternLM2
  • Various bugfixes and optimizations

Full Changelog: v0.1.6...v0.1.7

0.1.6

24 Jun 00:36
  • Fix dynamic generator fallback mode (was broken for prompts longer than max_input_len)
  • Fix inference on ROCm wave64 devices
  • Made model conversion script part of exllamav2 package
  • CPU optimizations

Full Changelog: v0.1.5...v0.1.6

0.1.5

09 Jun 00:19
  • Added Q6 and Q8 cache modes
  • Defragment cache in dynamic generator
  • Use SDPA with Torch 2.3.0+
  • Updated wheels to Torch 2.3.1
  • Added Python 3.12 wheels, plus Python 3.9 for ROCm
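The sketch below gives a rough sense of why the Q6 and Q8 cache modes matter. It estimates K/V cache size as 2 × layers × KV heads × head dim × sequence length × bits per element. The model dimensions are hypothetical (a Llama-style layout). The estimate also ignores the per-group scale data a quantized cache stores, so real usage will be somewhat higher.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int, seq_len: int, bits: int) -> int:
    """Approximate K/V cache size in bytes, ignoring quantization scale overhead."""
    # Factor of 2: one tensor for keys, one for values.
    elements = 2 * layers * kv_heads * head_dim * seq_len
    return elements * bits // 8


# Hypothetical Llama-style model: 32 layers, 8 KV heads, head_dim 128, 32k context.
dims = dict(layers=32, kv_heads=8, head_dim=128, seq_len=32768)
for name, bits in [("FP16", 16), ("Q8", 8), ("Q6", 6), ("Q4", 4)]:
    gib = kv_cache_bytes(bits=bits, **dims) / 2**30
    print(f"{name}: {gib:.2f} GiB")  # FP16: 4.00, Q8: 2.00, Q6: 1.50, Q4: 1.00
```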

Full Changelog: v0.1.4...v0.1.5

0.1.4

03 Jun 23:34
  • Option to keep calibration states in VRAM while measuring
  • Fix for Q4 cache for odd key/value sizes (MiniCPM specifically)
  • Alternative fasttensors option on Windows to solve system memory issues
  • Prefix filter with multiple prefixes
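A prefix filter with multiple prefixes constrains sampling so that the output begins with any one of several strings. The following minimal pure-Python sketch shows that constraint over a toy vocabulary of text pieces. The function `allowed_pieces` and the vocabulary are hypothetical, and this is not exllamav2's own filter code.

```python
def allowed_pieces(generated: str, vocab: list[str], prefixes: list[str]) -> list[str]:
    """Vocabulary pieces that keep the output consistent with at least one prefix."""
    allowed = []
    for piece in vocab:
        text = generated + piece
        # OK if the text is still on the way to some prefix, or has already completed one.
        if any(p.startswith(text) or text.startswith(p) for p in prefixes):
            allowed.append(piece)
    return allowed


vocab = ["Yes", "No", "Ye", "s", "Maybe", " ok"]  # hypothetical toy vocabulary
prefixes = ["Yes", "No"]
print(allowed_pieces("", vocab, prefixes))     # ['Yes', 'No', 'Ye']
print(allowed_pieces("Ye", vocab, prefixes))   # ['s']
print(allowed_pieces("Yes", vocab, prefixes))  # every piece: a prefix is already complete
```

After any one prefix has been fully produced, the filter no longer restricts sampling.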

Full Changelog: v0.1.3...v0.1.4

0.1.3

01 Jun 19:32
  • Fixes CFG

Full Changelog: v0.1.2...v0.1.3

0.1.2

01 Jun 17:58
  • Support MiniCPM architecture
  • Optimized prompt processing for page generator with Q4 cache
  • New HumanEval and MMLU tests using dynamic generator
  • Some bugfixes and small QoL improvements

Full Changelog: v0.1.1...v0.1.2

0.1.1

27 May 16:53
  • Fix performance of Q4 cache in dynamic generator
  • Add paged attn support for FP16 models
  • Add xformers support

Full Changelog: v0.1.0...v0.1.1

0.1.0

25 May 20:56
  • Paged attention support (requires flash-attn>=2.5.7)
  • New generator with dynamic batching support (requires paged attn)
  • Examples updated for dynamic generator
  • Faster speculative decoding (SD) with a draft model
  • Various optimizations, bugfixes and QoL improvements
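Paged attention, and with it the dynamic generator, needs flash-attn 2.5.7 or later. A quick environment check can therefore head off a confusing failure. The helper names below are hypothetical and not part of exllamav2. The check reads the installed version through `importlib.metadata` and compares numeric components, so that 2.10.0 correctly ranks above 2.5.7. A plain string comparison would get this wrong.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple[int, ...]:
    """Leading numeric components of a version string, e.g. '2.5.7.post1' -> (2, 5, 7)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            # Simplification: suffixes like '+cu121' or 'rc1' are dropped, so '2.5.7rc1' counts as 2.5.7.
            break
    return tuple(parts)


def meets_min_version(installed: str, required: str) -> bool:
    # Tuple comparison is numeric per component, unlike plain string comparison.
    return version_tuple(installed) >= version_tuple(required)


def flash_attn_supports_paged(required: str = "2.5.7") -> bool:
    """True if an installed flash-attn distribution meets the paged-attention minimum."""
    for dist_name in ("flash-attn", "flash_attn"):  # try both spellings of the distribution name
        try:
            return meets_min_version(metadata.version(dist_name), required)
        except metadata.PackageNotFoundError:
            continue
    return False


print(flash_attn_supports_paged())
```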

Full Changelog: v0.0.21...v0.1.0

0.0.21

11 May 13:31
  • Support for Granite architecture
  • Support for GPT2 architecture
  • Support for banned strings in streaming generator
  • A bit more work on multimodal support (still unfinished)
  • A few bugfixes and minor changes
  • Windows wheels for PyTorch 2.2.0 are included below to work around an apparent (likely temporary) issue in PyTorch. See #434 and pytorch/pytorch#125109

Full Changelog: v0.0.20...v0.0.21

0.0.20

27 Apr 00:56
  • Adds Phi3 support
  • Wheels compiled for PyTorch 2.3.0
  • ROCm 6.0 wheels

Full Changelog: v0.0.19...v0.0.20