I am trying to load a very large model (70B in 4-bit) across two 24 GB GPUs. The model itself loads fine (taking about 20 GB on each GPU), but the KV cache then fails to allocate the memory it needs (it tries to grab about 5 GB). How do I limit the KV cache size?
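
For reference, here is a minimal sketch of the usual knobs, assuming the model is being served with vLLM (the post does not name the inference library, and the checkpoint name, context length, and memory fraction below are placeholders): vLLM pre-allocates the KV cache at startup, so lowering `max_model_len` shrinks the reservation it tries to make, while `gpu_memory_utilization` caps the overall per-GPU budget.

```python
# Hedged sketch only: assumes vLLM with a 4-bit AWQ checkpoint; all values are illustrative.
from vllm import LLM

llm = LLM(
    model="TheBloke/Llama-2-70B-AWQ",  # placeholder 4-bit checkpoint
    quantization="awq",                # assumed quantization format
    tensor_parallel_size=2,            # shard the weights across the two 24 GB GPUs
    gpu_memory_utilization=0.95,       # fraction of each GPU vLLM may use in total
    max_model_len=4096,                # shorter context -> smaller KV cache reservation
)

out = llm.generate("Hello, world")
print(out[0].outputs[0].text)
```

If a different stack is in use (e.g. llama.cpp or Hugging Face Transformers), the equivalent lever is still the maximum context length, since the KV cache grows linearly with it.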

Replies: 1 comment

- Setting |