Releases · NVIDIA/kvpress
v0.2.1
v0.2.0
Transformers v4.48 introduced breaking changes that are handled in this release. The release also features `AdaKVPress`, the first press allowing head-wise compression, implemented by patching the attention functions registered in `ALL_ATTENTION_FUNCTIONS` since v4.48. When combined with `ExpectedAttentionPress`, `AdaKVPress` achieved the best results observed yet on the RULER benchmark (see this post).
v0.1.1
What's Changed
- #33 by @SimJeg fixes a small bug in the pipeline
- #36 by @maxjeblick pins the dependency to `transformers<4.48`
Full Changelog: v0.1.0...v0.1.1
v0.1.0
#24 by @maxjeblick and #29 by @SimJeg introduce a non-breaking refactoring:
- A press no longer requires the `compression_ratio` input argument, since some presses do not explicitly need it (e.g. `ThinKPress`, `SimLayerKVPress`). However, every press must have a `compression_ratio` attribute after any forward pass (an assertion was added in tests) to allow measuring the average compression ratio on a benchmark.
- The core compression logic has been moved from `BasePress.forward_hook` to `BasePress.compress`. `BasePress.forward_hook` now only checks whether `compress` must be called (pre-filling vs. decoding), de-quantizes the cache before `compress`, and re-quantizes it afterwards.
- `BasePress` no longer implements a `score` method; it has been moved to `ScorerPress`, along with the associated `ScorerPress.compress` method (a minimal sketch of the new structure follows this list).
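To make the new split concrete, here is a minimal sketch of a custom press under this structure. `RandomScorePress` is a hypothetical example, and the exact `score` signature is an assumption inferred from these notes:

```python
from dataclasses import dataclass

import torch

from kvpress import ScorerPress


@dataclass
class RandomScorePress(ScorerPress):
    """Hypothetical press: assigns random scores, so eviction is random."""

    # Every press must expose this attribute after a forward pass.
    compression_ratio: float = 0.25

    def score(self, module, hidden_states, keys, values, attentions, kwargs):
        # Return one score per key/value pair, shape (batch, kv_heads, seq_len);
        # the shared compress step then keeps the highest-scoring pairs.
        return torch.rand(*keys.shape[:-1], device=keys.device)
```

Under this split, a new press only has to define how key/value pairs are ranked, while the shared `ScorerPress.compress` handles the actual eviction.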
Other features:
- Add `SimLayerKVPress`, #28 by @SimJeg and @dame-cell
- Add `ComposedPress`, #29 by @SimJeg (see the sketch after this list)
- Add `KeyReRotationPress`, #31 by @maxjeblick and @giulio98
- Fix `QuantizedCache`, #30 by @maxjeblick
- Add new tests, including an integration test on a sample from RULER
v0.0.4
v0.0.3
v0.0.2
Initial release
v0.0.1
Install poetry in workflows (#1)