Releases: turboderp-org/exllamav2

0.0.19

19 Apr 06:44

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194


- More accurate Q4 cache using groupwise rotations
- Better prompt ingestion speed when using flash-attn
- Minor fixes for issues when quantizing Llama 3
- New, more robust optimizer
- Fix for a bug in long-sequence inference on GPTQ models

Full Changelog: v0.0.18...v0.0.19
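The idea behind a rotated Q4 cache can be shown with a toy round trip. This is a minimal pure-Python sketch, not exllamav2's CUDA implementation: it assumes the rotation is an orthonormal Walsh-Hadamard transform applied to each group of 32 values before symmetric 4-bit quantization with one scale per group. The group size and all function names are illustrative assumptions.

```python
import math

GROUP = 32  # assumed group size; this transform needs a power of two

def hadamard(vec):
    """Orthonormal fast Walsh-Hadamard transform; applying it twice returns the input."""
    v = list(vec)
    n = len(v)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)
    return [x * norm for x in v]

def quantize_q4(group):
    """Symmetric 4-bit quantization: integers in [-8, 7] plus one scale per group."""
    scale = max(abs(x) for x in group) / 7.0 or 1.0
    q = [max(-8, min(7, round(x / scale))) for x in group]
    return q, scale

def dequantize_q4(q, scale):
    return [v * scale for v in q]

def roundtrip(values, rotate):
    """Quantize and dequantize group by group, optionally inside a Hadamard rotation."""
    out = []
    for start in range(0, len(values), GROUP):
        g = values[start:start + GROUP]
        if rotate:
            g = hadamard(g)
        g = dequantize_q4(*quantize_q4(g))
        if rotate:
            g = hadamard(g)  # the normalized transform is its own inverse
        out.extend(g)
    return out

def rms_error(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Moderate values plus one outlier: without rotation the outlier sets a coarse
# scale that rounds every other value in its group to zero.
values = [0.5 * math.sin(i) for i in range(GROUP)]
values[5] = 8.0
plain = rms_error(values, roundtrip(values, rotate=False))
rotated = rms_error(values, roundtrip(values, rotate=True))
print(f"RMS error without rotation: {plain:.4f}")
print(f"RMS error with rotation:    {rotated:.4f}")
```

The rotation spreads the outlier's energy across the whole group, so the per-group scale is set by a flatter distribution and fewer values collapse to zero.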


0.0.18

07 Apr 18:41

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194


- Support for Command-R-plus
- Fix for pre-AVX2 CPUs
- VRAM optimizations for quantization
- Very preliminary multimodal support
- Various other small fixes and optimizations

Full Changelog: v0.0.17...v0.0.18


0.0.17

31 Mar 03:19


Mostly just minor fixes and support for DBRX models.

Full Changelog: v0.0.16...v0.0.17


0.0.16

20 Mar 07:23


- Adds support for Cohere models
- N-gram decoding
- A few bugfixes
- Lots of optimizations

Full Changelog: v0.0.15...v0.0.16
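N-gram decoding is commonly understood as speculative decoding in which draft tokens are copied from an earlier repetition of the current context suffix instead of coming from a draft model. The sketch below assumes that interpretation and shows only the drafting and greedy acceptance steps; `propose_draft` and `accept_draft` are hypothetical names, not exllamav2 functions.

```python
def propose_draft(tokens, n=3, max_draft=4):
    """Return up to max_draft tokens that followed the most recent earlier
    occurrence of the last n tokens, or [] if that n-gram has not occurred before."""
    if len(tokens) <= n:
        return []
    key = tokens[-n:]
    # Search backwards, skipping the suffix itself (which starts at len(tokens) - n).
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == key:
            return tokens[start + n:start + n + max_draft]
    return []

def accept_draft(draft, verified):
    """Keep the longest prefix of the draft that matches the tokens the model
    itself produced when verifying the draft (greedy speculative decoding)."""
    accepted = []
    for d, v in zip(draft, verified):
        if d != v:
            break
        accepted.append(d)
    return accepted

context = [1, 2, 3, 4, 5, 1, 2, 3]
draft = propose_draft(context)
print(draft)
print(accept_draft(draft, [4, 5, 9, 9]))
```

In a real decoder the whole draft would be scored in a single forward pass, so every accepted token saves one sequential step.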


0.0.15

07 Mar 02:26


- Adds Q4 cache mode
- Support for StarCoder2
- Minor optimizations and a couple of bugfixes

Full Changelog: v0.0.14...v0.0.15
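To put the Q4 cache mode in perspective, the K/V cache size can be estimated as 2 (keys and values) × layers × sequence length × KV heads × head dimension × bytes per element. This sketch assumes FP16 costs 2 bytes per element and that 4-bit storage costs 0.5 bytes per element plus one 2-byte scale per group of 32 values; exllamav2's actual layout may differ, and the model dimensions are hypothetical.

```python
def kv_cache_bytes(layers, seq_len, kv_heads, head_dim, bits, group_size=None, scale_bytes=2):
    """Estimate K/V cache size in bytes for a given element width."""
    elements = 2 * layers * seq_len * kv_heads * head_dim  # keys + values
    total = elements * bits / 8
    if group_size:
        total += (elements // group_size) * scale_bytes  # assumed per-group scales
    return int(total)

# Hypothetical model: 32 layers, 8 KV heads of dimension 128, 32k context.
cfg = dict(layers=32, seq_len=32768, kv_heads=8, head_dim=128)
fp16 = kv_cache_bytes(**cfg, bits=16)
q4 = kv_cache_bytes(**cfg, bits=4, group_size=32)
print(f"FP16 cache: {fp16 / 2**30:.2f} GiB")
print(f"Q4 cache:   {q4 / 2**30:.2f} GiB")
```

Under these assumptions the 4-bit cache needs a little over a quarter of the FP16 footprint, the extra coming from the per-group scales.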


0.0.14

24 Feb 05:54


Adds support for Qwen1.5 and Gemma architectures.

Various fixes and optimizations.

Full Changelog: v0.0.13...v0.0.14


0.0.13.post2

15 Feb 00:28

turboderp


Full Changelog: 0.0.13.post1...0.0.13.post2


0.0.13.post1

04 Feb 23:11

turboderp


Fixes inference on models with vocab sizes that are not multiples of 32
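A common way to handle a vocabulary whose size is not a multiple of a kernel's tile width is to round the width up and mask the padded logits so they can never be sampled. The sketch below illustrates that general technique under that assumption; it does not describe what this particular fix changed, and the helper names and vocabulary size are hypothetical.

```python
import math

def padded_size(vocab_size, multiple=32):
    """Round vocab_size up to the next multiple."""
    return ((vocab_size + multiple - 1) // multiple) * multiple

def mask_padding(logits, vocab_size):
    """Set logits for padded (non-existent) token ids to -inf."""
    return [x if i < vocab_size else -math.inf for i, x in enumerate(logits)]

vocab_size = 32001  # hypothetical: a 32000-token vocabulary plus one added token
width = padded_size(vocab_size)
logits = [0.0] * width
masked = mask_padding(logits, vocab_size)
print(width, sum(1 for x in masked if x == -math.inf))
```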


0.0.13

02 Feb 18:17


This release mainly updates the prebuilt wheels to Torch 2.2, since Torch 2.2 won't load extensions built for earlier versions.

Adds dynamic temperature and quadratic sampling. Also fixes a performance regression on some GPUs that appeared after the batch optimizations, along with various other small issues.
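Both samplers are easy to express on a list of logits. This sketch follows the formulations as they are commonly described (for example in text-generation-webui): dynamic temperature interpolates between a minimum and maximum temperature using the normalized entropy of the distribution, and quadratic sampling replaces each logit with `max_logit - smoothing_factor * (logit - max_logit)**2`. exllamav2's parameter names and exact formulas may differ; the names below are assumptions.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_temperature(logits, min_temp=0.5, max_temp=1.5, exponent=1.0):
    """Pick a temperature from the normalized entropy of softmax(logits)."""
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    max_entropy = math.log(len(probs))
    normalized = entropy / max_entropy if max_entropy > 0 else 0.0
    return min_temp + (max_temp - min_temp) * normalized ** exponent

def quadratic_transform(logits, smoothing_factor=0.3):
    """Bend logits along a parabola centered on the largest logit."""
    top = max(logits)
    return [top - smoothing_factor * (x - top) ** 2 for x in logits]

logits = [2.0, 1.0, 0.5, -1.0]
print(dynamic_temperature(logits))
print(softmax(quadratic_transform(logits)))
```

Confident (low-entropy) distributions get a temperature near `min_temp`, uncertain ones near `max_temp`. With a small smoothing factor, logits within `1 / smoothing_factor` of the top move closer to it, while more distant ones are pushed further down.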


0.0.12

22 Jan 20:04


Lots of fixes and tweaks. Main feature updates:

Model support:

- Basic LoRA support for MoE models
- Support for Orion models (also groundwork for other layernorm models)
- Support for loading/converting from Axolotl checkpoints
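For reference, a LoRA adapter adds a low-rank update to a frozen weight: a linear layer's output becomes `x·W + (alpha / r)·(x·A)·B`, where `A` is d_in × r and `B` is r × d_out. The pure-Python sketch below shows that standard formulation for a single linear layer with made-up matrices. The expert projections in an MoE layer are ordinary linear layers too, but how exllamav2 attaches adapters to them is not shown here.

```python
def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    return [[x * s for x in row] for row in a]

def lora_linear(x, W, A, B, alpha):
    """Frozen base projection x @ W plus the scaled low-rank update (x @ A) @ B."""
    r = len(B)  # rank = number of rows of B
    base = matmul(x, W)
    update = matmul(matmul(x, A), B)
    return add(base, scale(update, alpha / r))

x = [[1.0, 2.0]]              # one token, d_in = 2
W = [[1.0, 0.0], [0.0, 1.0]]  # identity base weight, d_out = 2
A = [[0.5], [0.5]]            # d_in x r, with r = 1
B = [[2.0, -2.0]]             # r x d_out
print(lora_linear(x, W, A, B, alpha=1.0))
```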

Generation/sampling:

- Fused kernels enabled for num_experts = 4
- Option to return probs from streaming generator
- Add top-A sampling
- Add freq/pres penalties
- CFG support in streaming generator
- Disable flash-attn for non-causal attention (fixes left-padding until FA2 implements custom bias)
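Three of these sampling additions can be stated on a plain list of logits. The sketch assumes commonly used definitions rather than exllamav2's code: top-A drops tokens whose probability is below `a * max_prob**2`; frequency and presence penalties subtract `count * frequency_penalty + presence_penalty` from the logit of every token already generated (the OpenAI API convention); and CFG mixes conditional and unconditional logits as `uncond + scale * (cond - uncond)`. All function and parameter names are hypothetical.

```python
import math
from collections import Counter

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_a_filter(logits, a=0.2):
    """Mask tokens whose probability is below a * (max probability)**2."""
    probs = softmax(logits)
    threshold = a * max(probs) ** 2
    return [x if p >= threshold else -math.inf for x, p in zip(logits, probs)]

def apply_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    """Penalize tokens by how often (frequency) and whether (presence) they were generated."""
    out = list(logits)
    for token, count in Counter(generated).items():
        out[token] -= count * frequency_penalty + presence_penalty
    return out

def cfg_mix(cond, uncond, scale=1.5):
    """Classifier-free guidance: push conditional logits away from unconditional ones."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

print(top_a_filter([10.0, 0.0, 9.0]))
print(apply_penalties([1.0, 1.0, 1.0], [0, 0, 2], 0.5, 0.25))
print(cfg_mix([2.0, 0.0], [1.0, 0.0], scale=2.0))
```

Because top-A's threshold scales with the square of the top probability, it prunes aggressively when the model is confident and barely at all when the distribution is flat.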

Testing/evaluation:

- HumanEval test
- Script to compare two models layer by layer (e.g. quantized vs. original model)
- "Standard" ppl test that attempts to mimic text-generation-webui

Conversion:

- VRAM optimizations
- Optimized quantization kernels

IO:

- Cache safetensors context managers for faster loading
- Optional direct IO loader (for very fast storage arrays)
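The perplexity ("ppl") test listed under Testing/evaluation reduces to exponentiating the mean negative log-likelihood of each next token. The sketch below computes it from per-position probability distributions; it does not reproduce text-generation-webui's windowing or exllamav2's evaluation scripts, and the toy distributions are made up.

```python
import math

def perplexity(token_ids, predicted_probs):
    """predicted_probs[i] is the model's distribution for the token at position i + 1."""
    nll = 0.0
    for target, dist in zip(token_ids[1:], predicted_probs):
        nll -= math.log(dist[target])
    return math.exp(nll / (len(token_ids) - 1))

tokens = [0, 1, 2, 1]
dists = [
    [0.1, 0.8, 0.1],  # next token is 1: p = 0.8
    [0.2, 0.2, 0.6],  # next token is 2: p = 0.6
    [0.3, 0.5, 0.2],  # next token is 1: p = 0.5
]
print(round(perplexity(tokens, dists), 3))
```

A model that spreads probability uniformly over k tokens scores a perplexity of exactly k, which is a handy sanity check for any implementation.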

