Releases: turboderp-org/exllamav2

0.1.7

11 Jul 13:20

github-actions

v0.1.7

ca9aecf

Support Gemma2
Support InternLM2
Various bugfixes and optimizations

Full Changelog: v0.1.6...v0.1.7

0.1.6

24 Jun 00:36

github-actions

v0.1.6

6a8172c

Fix dynamic generator fallback mode (was broken for prompts longer than max_input_len)
Fix inference on ROCm wave64 devices
Made model conversion script part of exllamav2 package
CPU optimizations

Full Changelog: v0.1.5...v0.1.6

0.1.5

09 Jun 00:19

github-actions

v0.1.5

3a3e69f

Added Q6 and Q8 cache modes
Defragment cache in dynamic generator
Use SDPA with Torch 2.3.0+
Updated wheels to Torch 2.3.1
Added Python 3.12 wheels, plus Python 3.9 for ROCm

Full Changelog: v0.1.4...v0.1.5

0.1.4

03 Jun 23:34

github-actions

v0.1.4

ee0e84b

Option to keep calibration states in VRAM while measuring
Fix for Q4 cache for odd key/value sizes (MiniCPM specifically)
Alternative fasttensors option on Windows to solve system memory issues
Prefix filter with multiple prefixes

Full Changelog: v0.1.3...v0.1.4

0.1.3

01 Jun 19:32

github-actions

v0.1.3

08bfd2b

Fixes CFG

Full Changelog: v0.1.2...v0.1.3

0.1.2

01 Jun 17:58

github-actions

v0.1.2

18a2580

Support MiniCPM architecture
Optimized prompt processing for page generator with Q4 cache
New HumanEval and MMLU tests using dynamic generator
Some bugfixes and small QoL improvements

Full Changelog: v0.1.1...v0.1.2

0.1.1

27 May 16:53

github-actions

v0.1.1

8a57be1

Fix performance of Q4 cache in dynamic generator
Add paged attn support for FP16 models
Add xformers support

Full Changelog: v0.1.0...v0.1.1

0.1.0

25 May 20:56

github-actions

v0.1.0

e6f230b

Paged attention support (requires flash-attn>=2.5.7)
New generator with dynamic batching support (requires paged attn)
Examples updated for dynamic generator
Faster draft model speculative decoding (SD)
Various optimizations, bugfixes and QoL improvements
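
The flash-attn>=2.5.7 requirement above can be checked before enabling paged attention. The following is a minimal sketch, not exllamav2's own check: the helper names `version_tuple`, `meets_minimum`, and `paged_attention_possible` are hypothetical, and the distribution name "flash-attn" is an assumption about how the package is registered in your environment.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(version_string: str) -> tuple[int, ...]:
    """Parse the leading numeric components, e.g. '2.5.7.post1' -> (2, 5, 7)."""
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "2.5.7") -> bool:
    """Compare numerically, so '2.10.0' correctly ranks above '2.5.7'."""
    return version_tuple(installed) >= version_tuple(minimum)


def paged_attention_possible(dist_name: str = "flash-attn") -> bool:
    """Hypothetical gate: True only if the named distribution is new enough."""
    try:
        return meets_minimum(version(dist_name))
    except PackageNotFoundError:
        return False
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where "2.10.0" sorts before "2.5.7". For full PEP 440 handling (pre-releases, epochs), a dedicated version parser would be the safer choice.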

Full Changelog: v0.0.21...v0.1.0

0.0.21

11 May 13:31

github-actions

v0.0.21

a349847

Support for Granite architecture
Support for GPT2 architecture
Support for banned strings in streaming generator
A bit more work on multimodal support (still unfinished)
A few bugfixes and other small changes
Windows wheels for PyTorch 2.2.0 are included below to work around an apparent (likely temporary) issue in PyTorch. See #434 and pytorch/pytorch#125109

Full Changelog: v0.0.20...v0.0.21

0.0.20

27 Apr 00:56

github-actions

v0.0.20

68f1eba

Adds Phi3 support
Wheels compiled for PyTorch 2.3.0
ROCm 6.0 wheels

Full Changelog: v0.0.19...v0.0.20
