Any other mechanisms to save gpu memory other than paged attention? #290
Hi guys, are there any other mechanisms implemented in vLLM to save GPU memory besides paged attention? Thank you.
Answered by zhuohan123 on Jun 29, 2023
Replies: 1 comment
Another technique is continuous batching, which reduces the memory and computation spent on padding. You can read this blog post to learn more.
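To make the padding savings concrete, here is a minimal sketch (not vLLM's actual scheduler, just a hypothetical illustration): static batching pads every sequence in a batch to the length of the longest one, while continuous batching lets each sequence occupy only the token slots it actually uses and frees slots as soon as a request finishes.

```python
# Hypothetical illustration of why continuous batching saves memory.
# This is NOT vLLM's real scheduler; it only counts token slots.

def static_batch_slots(seq_lens):
    # Static batching: every sequence is padded to the longest one,
    # so the batch consumes batch_size * max_len slots.
    return len(seq_lens) * max(seq_lens)

def continuous_batch_slots(seq_lens):
    # Continuous batching: sequences occupy only the tokens they
    # actually have; no per-batch padding to a common length.
    return sum(seq_lens)

lens = [12, 87, 34, 5]                 # example generated lengths
padded = static_batch_slots(lens)      # 4 * 87 = 348 slots
actual = continuous_batch_slots(lens)  # 138 slots
waste = padded - actual
print(f"padding waste: {waste} slots ({100 * waste / padded:.0f}%)")
```

With these example lengths, roughly 60% of the statically batched slots would be pure padding, which is the memory (and wasted computation) that continuous batching reclaims.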
Answer selected by zhuohan123