Releases: turboderp/exllamav2
0.2.6
- Some small fixes, most notably for Qwen2-VL inference on Windows
Full Changelog: v0.2.5...v0.2.6
0.2.5
- Initial support for Qwen2-VL (images for now, no video)
- Some bugfixes
Full Changelog: v0.2.4...v0.2.5
0.2.4
- Support Pixtral
- Refactoring for more multimodal support
- Faster filter evaluation
- Various optimizations and bugfixes
- Various quality of life improvements
Full Changelog: v0.2.3...v0.2.4
0.2.3
- No longer use the safetensors library for loading weights (fixes virtual memory issues, especially on Windows)
- Disable fasttensors option (now redundant)
- Prioritize the HF Tokenizers model when both HF and SPM models are available
- Add XTC sampler
- Add YaRN support
- Various fixes and QoL improvements
Full Changelog: v0.2.2...v0.2.3
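The XTC (Exclude Top Choices) sampler added in 0.2.3 can be sketched roughly as below. This is a minimal illustration of the general XTC idea, not exllamav2's actual implementation; the function name, parameter names and defaults are all assumptions:

```python
import random

def xtc_sample(probs, threshold=0.1, probability=0.5, rng=random):
    """Sketch of Exclude Top Choices (XTC) sampling.

    probs: dict mapping token -> probability (assumed normalized).
    With chance `probability`, find all tokens whose probability is at
    least `threshold`; if there are two or more, remove all of them
    except the least likely one, renormalize, and sample. Otherwise
    sample from the distribution unchanged.
    """
    if rng.random() < probability:
        above = [t for t, p in probs.items() if p >= threshold]
        if len(above) >= 2:
            above.sort(key=lambda t: probs[t])
            keep = above[0]  # least probable token above threshold survives
            probs = {t: p for t, p in probs.items()
                     if t == keep or p < threshold}
            total = sum(probs.values())
            probs = {t: p / total for t, p in probs.items()}
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights)[0]
```

The effect is to cut off the most predictable continuations while leaving low-probability tokens untouched, which is why it is typically applied with a probability rather than on every step.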
0.2.2
- Small fixes related to LMFE
- Allow SDPA during normal inference with custom bias
Full Changelog: v0.2.1...v0.2.2
0.2.1
- TP: fallback SDPA mode when flash-attn is unavailable
- Faster filter/grammar path
- Add DRY
- Fix issues since 0.1.9 (streams/graphs) when loading certain models via Tabby
- Banish Râul
Full Changelog: v0.2.0...v0.2.1
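The DRY ("Don't Repeat Yourself") sampler added in 0.2.1 penalizes tokens that would extend a sequence already present in the context. A minimal sketch of the idea follows; it is not exllamav2's implementation, and the function name and default parameters are assumptions:

```python
def dry_penalty(context, logits, multiplier=0.8, base=1.75, allowed_length=2):
    """Sketch of a DRY-style repetition penalty.

    context: list of token ids generated so far.
    logits: dict mapping candidate token -> logit.
    For each candidate, find the longest context suffix such that
    (suffix + candidate) already occurs earlier in the context; if that
    match length exceeds `allowed_length`, subtract
    multiplier * base ** (match_length - allowed_length) from its logit.
    """
    penalized = dict(logits)
    n = len(context)
    for tok in logits:
        best = 0
        for i in range(n):
            if context[i] != tok:
                continue
            # length of the match ending just before position i that
            # also ends at the current end of the context
            k = 0
            while k < i and context[i - 1 - k] == context[n - 1 - k]:
                k += 1
            best = max(best, k)
        if best > allowed_length:
            penalized[tok] -= multiplier * base ** (best - allowed_length)
    return penalized
```

The exponential growth in the penalty means short incidental repeats are tolerated while verbatim loops are suppressed quickly.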
0.2.0
Small release to fix various issues in 0.1.9
Full Changelog: v0.1.9...v0.2.0
0.1.9
- Add experimental tensor-parallel mode. Currently supports Llama (1, 2 and 3), Qwen2 and Mistral models
- CUDA Graphs to reduce overhead and CPU bottlenecking
- Various other optimizations
- Some bugfixes
Full Changelog: v0.1.8...v0.1.9
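The core idea behind the tensor-parallel mode added in 0.1.9 is to shard a layer's weights across devices, compute each shard's partial result locally, and gather the pieces. A toy single-process sketch of row-wise sharding for a matrix-vector product, with hypothetical names (real tensor parallelism runs the shards on separate GPUs and gathers over an interconnect):

```python
def matvec(W, x):
    """Reference dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def tp_matvec(W, x, shards=2):
    """Toy tensor-parallel sketch: split the output rows of W across
    `shards` devices, compute each partial product independently, then
    concatenate the results (an all-gather in a real implementation)."""
    n = len(W)
    chunk = (n + shards - 1) // shards
    parts = [matvec(W[i:i + chunk], x) for i in range(0, n, chunk)]
    out = []
    for part in parts:
        out.extend(part)
    return out
```

Since each shard only holds and multiplies its own rows, both the weight memory and the compute are divided across devices, at the cost of communication to reassemble the output.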
0.1.8
- Support Llama 3.1 (correct RoPE scaling etc.)
- Support IndexTeam architecture
- Some bugfixes and QoL improvements
Full Changelog: v0.1.7...v0.1.8
0.1.7
- Support Gemma2
- Support InternLM2
- Various bugfixes and optimizations
Full Changelog: v0.1.6...v0.1.7