Releases: turboderp/exllamav2

0.2.6

07 Dec 14:58
  • Some small fixes, most notably for Qwen2-VL inference on Windows

Full Changelog: v0.2.5...v0.2.6

0.2.5

01 Dec 13:32
  • Initial support for Qwen2-VL (images for now, no video)
  • Some bugfixes

Full Changelog: v0.2.4...v0.2.5

0.2.4

12 Nov 03:21
  • Support Pixtral
  • Refactoring for more multimodal support
  • Faster filter evaluation
  • Various optimizations and bugfixes
  • Various quality of life improvements

Full Changelog: v0.2.3...v0.2.4

0.2.3

29 Sep 11:04
  • No longer use safetensors for loading weights (fixes virtual memory issues, especially on Windows)
  • Disable fasttensors option (now redundant)
  • Prioritize HF Tokenizers model when both HF and SPM models available
  • Add XTC sampler
  • Add YaRN support
  • Various fixes and QoL improvements
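
For readers unfamiliar with XTC ("exclude top choices"), the idea can be sketched in a few lines of plain Python. This is a minimal illustration of the technique as it is commonly described, not exllamav2's implementation: the function name `xtc_filter` and its parameters `threshold` and `probability` are hypothetical, and the library's own sampler settings may be named and applied differently.

```python
import random


def xtc_filter(probs, threshold=0.1, probability=0.5, rng=random):
    """Apply an XTC-style filter to a {token: probability} mapping.

    Hypothetical helper for illustration only; not part of exllamav2.
    """
    # The filter only fires on a random fraction of sampling steps.
    if rng.random() >= probability:
        return dict(probs)

    # Tokens at or above the threshold count as "top choices".
    top = [tok for tok, p in probs.items() if p >= threshold]
    if len(top) < 2:
        # With fewer than two top choices there is nothing to exclude.
        return dict(probs)

    # Keep only the least likely top choice and drop the others.
    keep = min(top, key=lambda tok: probs[tok])
    kept = {tok: p for tok, p in probs.items() if tok == keep or tok not in top}

    # Renormalize so the remaining probabilities sum to 1.
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}
```

Calling `xtc_filter(probs, threshold=0.1, probability=1.0)` always applies the filter, which is convenient for checking its behavior on a small distribution.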

Full Changelog: v0.2.2...v0.2.3

0.2.2

14 Sep 19:20
  • Small fixes related to LMFE (lm-format-enforcer)
  • Allow SDPA during normal inference with custom bias

Full Changelog: v0.2.1...v0.2.2

0.2.1

08 Sep 17:26
  • TP: fallback SDPA mode when flash-attn is unavailable
  • Faster filter/grammar path
  • Add DRY
  • Fix issues since 0.1.9 (streams/graphs) when loading certain models via Tabby
  • Banish Râul

Full Changelog: v0.2.0...v0.2.1

0.2.0

28 Aug 21:00

Small release to fix various issues in 0.1.9

Full Changelog: v0.1.9...v0.2.0

0.1.9

22 Aug 11:54
  • Add experimental tensor-parallel mode. Currently supports Llama (1, 2 and 3), Qwen2 and Mistral models
  • Use CUDA Graphs to reduce overhead and CPU bottlenecking
  • Various other optimizations
  • Some bugfixes

Full Changelog: v0.1.8...v0.1.9

0.1.8

24 Jul 06:36
  • Support Llama 3.1 (correct RoPE scaling etc.)
  • Support IndexTeam architecture
  • Some bugfixes and QoL improvements
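
As background on the RoPE scaling item, the sketch below shows the frequency-dependent scaling rule that Llama 3.1 models use, following the publicly released reference code. It is an assumption-laden illustration, not exllamav2's code: the function name `llama31_scale_freqs` is hypothetical, and the default values (factor 8, low/high frequency factors 1 and 4, original context length 8192) are those typically found in Llama 3.1 model configs.

```python
import math


def llama31_scale_freqs(freqs, factor=8.0, low_freq_factor=1.0,
                        high_freq_factor=4.0, original_max_pos=8192):
    """Rescale RoPE inverse frequencies in the Llama 3.1 style.

    Hypothetical helper for illustration only; not part of exllamav2.
    """
    low_freq_wavelen = original_max_pos / low_freq_factor
    high_freq_wavelen = original_max_pos / high_freq_factor

    scaled = []
    for freq in freqs:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # High-frequency components are left unchanged.
            scaled.append(freq)
        elif wavelen > low_freq_wavelen:
            # Low-frequency components are divided by the full factor.
            scaled.append(freq / factor)
        else:
            # In between, interpolate smoothly between the two regimes.
            smooth = (original_max_pos / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1 - smooth) * freq / factor + smooth * freq)
    return scaled
```

The input would normally be the standard RoPE inverse frequencies, `base ** (-2 * i / head_dim)` for each pair index `i`, computed before this rescaling step.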

Full Changelog: v0.1.7...v0.1.8

0.1.7

11 Jul 13:20
  • Support Gemma2
  • Support InternLM2
  • Various bugfixes and optimizations

Full Changelog: v0.1.6...v0.1.7