This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

add desc to h2o in readme
Signed-off-by: n1ck-guo <[email protected]>
n1ck-guo committed Jul 15, 2024
1 parent 0894b6d commit d241c25
examples/huggingface/pytorch/text-generation/h2o/README.md
# H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Code for the paper "**H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models**"

**Heavy-Hitter Oracle (H2O)** is a novel KV cache eviction approach that significantly reduces the memory footprint of generative inference.

This method is based on the observation that the accumulated attention scores of all tokens in attention blocks follow a power-law distribution. This suggests that a small set of influential tokens, called heavy hitters (H2), is critical during generation. Identifying these tokens makes it possible to step away from the combinatorial search problem and find an eviction policy that maintains accuracy.
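
To make "accumulated attention score" concrete, here is a minimal sketch in plain Python, assuming a single attention head and a causal attention matrix given as a list of rows (row `t` holds the softmax weights of query step `t` over keys `0..t`). The function names `accumulate_attention` and `heavy_hitters` are hypothetical and are not this repository's API; the sketch only illustrates summing the attention each token receives and picking the top-k tokens as heavy hitters.

```python
from typing import List


def accumulate_attention(attn_rows: List[List[float]]) -> List[float]:
    """Sum, for each key token, the attention it receives across all query steps.

    Assumes causal attention: attn_rows[t] holds softmax weights of query t
    over keys 0..t, so row t has length t + 1.
    """
    n = max((len(row) for row in attn_rows), default=0)
    scores = [0.0] * n
    for row in attn_rows:
        for j, weight in enumerate(row):
            scores[j] += weight  # column sum = accumulated attention for token j
    return scores


def heavy_hitters(scores: List[float], k: int) -> List[int]:
    """Return the positions of the k tokens with the largest accumulated score."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # keep original token order


if __name__ == "__main__":
    attn = [
        [1.0],
        [0.7, 0.3],
        [0.6, 0.1, 0.3],
        [0.5, 0.1, 0.1, 0.3],
    ]
    scores = accumulate_attention(attn)
    print(heavy_hitters(scores, 2))  # [0, 1]
```

In this toy matrix, token 0 keeps receiving most of the attention, so it stands out as a heavy hitter even though it is the oldest token.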

H2O dynamically retains a balance of recent and H2 tokens, which significantly increases model throughput while preserving accuracy.
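
The balance between recent and heavy-hitter tokens can be sketched as a simple eviction rule. This is a simplified illustration, assuming a single head, a per-token accumulated score list kept aligned with the cache, and a fixed budget split into `heavy_size` and `recent_size` (both hypothetical parameter names, not this repository's configuration): the newest `recent_size` tokens are always kept, the `heavy_size` highest-scoring older tokens are kept as heavy hitters, and everything else is evicted.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def h2o_keep_indices(scores: Sequence[float], heavy_size: int, recent_size: int) -> List[int]:
    """Choose which cached positions survive under a heavy-hitter + recent budget.

    scores[i] is the accumulated attention received by cached token i (oldest first).
    """
    n = len(scores)
    if n <= heavy_size + recent_size:
        return list(range(n))  # within budget: nothing to evict
    recent_start = n - recent_size
    # Among older tokens, keep the ones with the largest accumulated attention.
    older_ranked = sorted(range(recent_start), key=lambda i: scores[i], reverse=True)
    heavy = sorted(older_ranked[:heavy_size])
    return heavy + list(range(recent_start, n))


def evict(
    keys: Sequence[T], values: Sequence[U], scores: Sequence[float],
    heavy_size: int, recent_size: int,
) -> Tuple[List[T], List[U], List[float]]:
    """Drop evicted entries from keys, values, and scores so they stay aligned."""
    keep = h2o_keep_indices(scores, heavy_size, recent_size)
    return [keys[i] for i in keep], [values[i] for i in keep], [scores[i] for i in keep]


if __name__ == "__main__":
    scores = [2.8, 0.5, 0.4, 0.3, 0.9, 0.2, 0.1, 0.05]
    print(h2o_keep_indices(scores, heavy_size=2, recent_size=2))  # [0, 4, 6, 7]
```

Keeping the score list aligned with the surviving keys and values lets the next decoding step keep accumulating attention only for tokens that are still cached.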


For more info, please refer to the paper [H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models](https://arxiv.org/pdf/2306.14048).


![](./imgs/1.png)


## Usage and Examples
### Evaluation on tasks from [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness) framework
