add desc to h2o in readme

Signed-off-by: n1ck-guo <[email protected]>
intel · Jul 15, 2024 · d241c25
1 parent 0894b6d
commit d241c25
Showing 1 changed file with 13 additions and 1 deletion.
diff --git a/examples/huggingface/pytorch/text-generation/h2o/README.md b/examples/huggingface/pytorch/text-generation/h2o/README.md
@@ -1,5 +1,17 @@
 # H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
-Code for the paper "**H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models**"
+
+**Heavy-Hitter Oracle (H2O)** is a novel approach to implementing the KV cache that significantly reduces its memory footprint.
+
+This method is based on the observation that the accumulated attention scores of all tokens in attention blocks follow a power-law distribution. This suggests that a small set of influential tokens, named heavy-hitters (H2), is critical during generation. H2 provides an opportunity to step away from the combinatorial search problem and identify an eviction policy that maintains accuracy.
+
+H2O dynamically retains a balance of recent and H2 tokens in the KV cache, which significantly increases model throughput while preserving accuracy.
+
+
+For more info, please refer to the paper [H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models](https://arxiv.org/pdf/2306.14048).
+
+
+![](./imgs/1.png)
+
 
 ## Usage and Examples
 ### Evaluation on tasks from [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness) framework
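
Note on the eviction policy described in the added README text: the idea of keeping a balance of recent tokens and heavy-hitter (H2) tokens can be illustrated with a minimal pure-Python sketch. This is not the code from this repository or from the paper. The function name `h2o_keep_indices` and the parameters `num_heavy` and `num_recent` are hypothetical, and the sketch assumes a single flat list of accumulated attention scores, whereas a real implementation would track them per layer and per head.

```python
from typing import List, Sequence


def h2o_keep_indices(
    acc_scores: Sequence[float], num_heavy: int, num_recent: int
) -> List[int]:
    """Return the sorted cache positions to keep under a fixed budget.

    acc_scores[i] is the attention mass that token i has accumulated so far
    (hypothetical input for illustration only).
    """
    n = len(acc_scores)
    if n <= num_heavy + num_recent:
        # The cache still fits in the budget, so nothing is evicted.
        return list(range(n))

    # The most recent tokens are always kept.
    recent_start = n - num_recent
    recent = list(range(recent_start, n))

    # Among the older tokens, keep those with the largest accumulated score.
    older = range(recent_start)
    heavy = sorted(older, key=lambda i: acc_scores[i], reverse=True)[:num_heavy]

    return sorted(heavy + recent)


scores = [5.0, 0.1, 3.2, 0.2, 0.3, 0.1, 0.4, 0.2]
kept = h2o_keep_indices(scores, num_heavy=2, num_recent=3)
print(kept)  # [0, 2, 5, 6, 7]
```

In this example, tokens 0 and 2 are retained as heavy-hitters because they hold most of the accumulated attention, tokens 5 to 7 are retained because they are recent, and the rest are evicted.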