Releases · DefTruth/Awesome-LLM-Inference
v2.6.6
What's Changed
- Add code link to BPT by @DefTruth in #95
- add vAttention code link by @KevinZeng08 in #96
- 🔥[SageAttention] SAGEATTENTION: ACCURATE 8-BIT ATTENTION FOR PLUG-AND-PLAY INFERENCE ACCELERATION (@thu-ml) by @DefTruth in #97
- 🔥[SageAttention-2] SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration (@thu-ml) by @DefTruth in #98
- 🔥[Squeezed Attention] SQUEEZED ATTENTION: Accelerating Long Context Length LLM Inference (@UC Berkeley) by @DefTruth in #99
- 🔥[SparseInfer] SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference by @DefTruth in #100
New Contributors
- @KevinZeng08 made their first contribution in #96
Full Changelog: v2.6.5...v2.6.6
v2.6.5
v2.6.4
v2.6.3
v2.6.2
What's Changed
- Early exit of LLM inference by @boyi-liu in #85
- Add paper AdaKV by @FFY0 in #86
- Efficient Hybrid Inference for LLMs: Reward-Based Token Modelling with Selective Cloud Assistance by @aharshms in #87
- 🔥[FastAttention] FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs for Efficient Inference by @DefTruth in #88
New Contributors
- @boyi-liu made their first contribution in #85
- @FFY0 made their first contribution in #86
- @aharshms made their first contribution in #87
Full Changelog: v2.6.1...v2.6.2
v2.6.1
What's Changed
- [From Author] Link CacheGen and CacheBlend to LMCache by @KuntaiDu in #80
- 🔥[LORC] Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy by @DefTruth in #81
- Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation by @DefTruth in #82
- [LLM Inference] LARGE LANGUAGE MODEL INFERENCE ACCELERATION: A COMPREHENSIVE HARDWARE PERSPECTIVE by @DefTruth in #83
- 🔥[PARALLELSPEC] PARALLELSPEC: PARALLEL DRAFTER FOR EFFICIENT SPECULATIVE DECODING by @DefTruth in #84
Full Changelog: v2.6...v2.6.1
v2.6
What's Changed
- 🔥[VPTQ] VPTQ: EXTREME LOW-BIT VECTOR POST-TRAINING QUANTIZATION FOR LARGE LANGUAGE MODELS by @DefTruth in #70
- fix typo by @DefTruth in #71
- 🔥🔥[INT-FLASHATTENTION] INT-FLASHATTENTION: ENABLING FLASH ATTENTION FOR INT8 QUANTIZATION by @DefTruth in #72
- [Low-bit] A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms by @DefTruth in #73
- 🔥🔥[HiFloat8] Ascend HiFloat8 Format for Deep Learning by @DefTruth in #74
- 🔥[AlignedKV] AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization by @DefTruth in #75
- 🔥🔥[Tensor Cores] Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores by @DefTruth in #76
- 🔥[KV-COMPRESS] PAGED KV-CACHE COMPRESSION WITH VARIABLE COMPRESSION RATES PER ATTENTION HEAD by @DefTruth in #77
- 🔥[LayerKV] Optimizing Large Language Model Serving with Layer-wise KV Cache Management by @DefTruth in #78
- Bump up to v2.6 by @DefTruth in #79
Full Changelog: v2.5...v2.6
v2.5
What's Changed
- 🔥[InstInfer] InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference by @DefTruth in #65
- Update codebase of paper "Parallel Speculative Decoding with Adaptive Draft Length" by @smart-lty in #66
- move RetrievalAttention -> long context by @DefTruth in #67
- 🔥🔥[CRITIPREFILL] CRITIPREFILL: A SEGMENT-WISE CRITICALITY-BASED APPROACH FOR PREFILLING ACCELERATION IN LLMS by @DefTruth in #68
- Bump up to v2.5 by @DefTruth in #69
New Contributors
- @smart-lty made their first contribution in #66
Full Changelog: v2.4...v2.5
v2.4
What's Changed
- 🔥[RetrievalAttention] Accelerating Long-Context LLM Inference via Vector Retrieval by @DefTruth in #62
- 🔥[Inf-MLLM] Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU by @DefTruth in #63
- Bump up to v2.4 by @DefTruth in #64
Full Changelog: v2.3...v2.4