
Releases: NVIDIA/kvpress

v0.2.1

21 Jan 15:21
72cc784

v0.2.0

13 Jan 17:44
fe4610e

Transformers v4.48 introduced breaking changes that are handled in this release. The release also features AdaKVPress, the first press to support head-wise compression, implemented by patching the attention functions registered in ALL_ATTENTION_FUNCTIONS since v4.48. Combined with ExpectedAttentionPress, AdaKVPress achieved the best results observed so far on the RULER benchmark (see this post).
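
A minimal usage sketch of the new press, assuming AdaKVPress wraps a scorer press such as ExpectedAttentionPress and that the kv-press-text-generation pipeline is used as in the README; constructor arguments and the model name below are assumptions, not verbatim API documentation:

```python
# Hedged sketch: AdaKVPress wrapping ExpectedAttentionPress for head-wise compression.
from transformers import pipeline
from kvpress import AdaKVPress, ExpectedAttentionPress

# Importing kvpress registers the custom "kv-press-text-generation" pipeline
pipe = pipeline(
    "kv-press-text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # assumed model for illustration
    device="cuda:0",
)

context = "..."   # long context compressed during pre-filling
question = "..."  # question answered from the compressed KV cache

# AdaKVPress redistributes the pruning budget across attention heads
# instead of applying the same per-head compression ratio everywhere
press = AdaKVPress(ExpectedAttentionPress(compression_ratio=0.5))
answer = pipe(context, question=question, press=press)["answer"]
```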

v0.1.1

07 Jan 10:44
d242538

What's Changed

Full Changelog: v0.1.0...v0.1.1

v0.1.0

12 Dec 09:22
2b350b0

#24 by @maxjeblick and #29 by @SimJeg introduce a non-breaking refactoring (a minimal sketch follows the list below):

  • a press no longer requires the compression_ratio input argument, since some presses do not need it explicitly (e.g. ThinKPress, SimLayerKVPress). However, every press must expose a compression_ratio attribute after any forward pass (an assertion was added to the tests) so that the average compression ratio can be measured on a benchmark
  • the core compression logic has moved from BasePress.forward_hook to BasePress.compress. BasePress.forward_hook now only checks whether compress must be called (pre-filling vs. decoding), de-quantizes the cache before compression, and re-quantizes it afterwards
  • BasePress no longer implements a score method; this has been moved to ScorerPress along with the associated ScorerPress.compress method
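
As a hedged illustration of the refactored layout, a toy ScorerPress subclass might look like the sketch below; the score signature and the class name are assumptions based on the description above, not copied from the repository:

```python
from dataclasses import dataclass

import torch
from kvpress import ScorerPress


@dataclass
class DummyKeyNormPress(ScorerPress):
    """Toy press (hypothetical): scores each key/value pair by its key norm."""

    def score(self, module, hidden_states, keys, values, attentions, kwargs):
        # One score per key/value pair, shape (batch, num_kv_heads, seq_len).
        # The inherited ScorerPress.compress prunes the lowest-scoring pairs,
        # and the compression_ratio attribute controls how much of the cache
        # is removed during pre-filling.
        return keys.norm(dim=-1)
```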

Other features:

v0.0.4

03 Dec 15:31
ac2445e

v0.0.3

26 Nov 13:14
51f3877

v0.0.2

21 Nov 15:54
64b3c17

v0.0.1

13 Nov 16:34
2536a98

Initial release

install poetry in workflows (#1)