Any other mechanisms to save gpu memory other than paged attention? #290
Hi guys, are there any other mechanisms implemented in vLLM to save GPU memory besides paged attention? Thank you.
Answered by zhuohan123 on Jun 29, 2023
Replies: 1 comment
Another technique is continuous batching, which reduces the memory and computation spent on padding. You can read this blog post to learn more.
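To make the padding savings concrete, here is a minimal sketch (not vLLM's actual scheduler, just a hypothetical illustration): static batching pads every sequence in a batch to the length of the longest one, while continuous batching lets each sequence occupy only the token slots it actually uses and frees slots as soon as a request finishes.

```python
# Hypothetical illustration of why continuous batching saves memory.
# This is NOT vLLM's real scheduler; it only counts token slots.

def static_batch_slots(seq_lens):
    # Static batching: every sequence is padded to the longest one,
    # so the batch consumes batch_size * max_len slots.
    return len(seq_lens) * max(seq_lens)

def continuous_batch_slots(seq_lens):
    # Continuous batching: sequences occupy only the tokens they
    # actually have; no per-batch padding to a common length.
    return sum(seq_lens)

lens = [12, 87, 34, 5]                 # example generated lengths
padded = static_batch_slots(lens)      # 4 * 87 = 348 slots
actual = continuous_batch_slots(lens)  # 138 slots
waste = padded - actual
print(f"padding waste: {waste} slots ({100 * waste / padded:.0f}%)")
```

With these example lengths, roughly 60% of the statically batched slots would be pure padding, which is the memory (and wasted computation) that continuous batching reclaims.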
Answer selected by zhuohan123