Hi,
I've been running olah for a while now and tried pulling larger models such as Llama-70B (relevant files between 5G and 17G).
huggingface-cli stopped at some point with incomplete message errors. I dug into it and found that olah-server ran out of memory (OOM) and got killed.
The VM it was running on was Red Hat 9.4 with 4 vCPUs, 16G RAM, and a 500G SSD for the local cache. huggingface-cli usually pulls with 8 workers, hence 8 concurrent HTTP requests, and it seems that olah tries to cache all of them in RAM while sending the response. At least that's how it looked. Maybe the chunk size could adapt dynamically to the available RAM?
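To illustrate what I mean by a chunk size that adapts to the available RAM, here is a minimal sketch. It is not olah's actual code: `pick_chunk_size`, `stream_and_cache`, and the clamp values are hypothetical names and numbers chosen for illustration, and it assumes the upstream response and the cache file are ordinary file-like objects.

```python
import io


def pick_chunk_size(ram_budget_bytes, concurrent_requests,
                    min_chunk=64 * 1024, max_chunk=8 * 1024 * 1024):
    """Split a RAM budget evenly across requests, clamped to fixed bounds.

    All names and default bounds here are hypothetical.
    """
    per_request = ram_budget_bytes // max(concurrent_requests, 1)
    return max(min_chunk, min(max_chunk, per_request))


def stream_and_cache(upstream, cache_file, chunk_size):
    """Yield chunks to the client while writing each one to the cache.

    Only one chunk is read and handed on at a time, so the memory held
    per request is bounded by chunk_size rather than by the file size.
    """
    while True:
        chunk = upstream.read(chunk_size)
        if not chunk:
            break
        cache_file.write(chunk)
        yield chunk


if __name__ == "__main__":
    # Stand-ins for the upstream HTTP body and the on-disk cache file.
    upstream = io.BytesIO(b"x" * 1000)
    cache = io.BytesIO()
    size = pick_chunk_size(ram_budget_bytes=1024 ** 3, concurrent_requests=8)
    sent = sum(len(c) for c in stream_and_cache(upstream, cache, size))
    print(size, sent)
```

With something along these lines, memory use per request would depend on the chunk size and not on whether the file is 5G or 17G.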
For now I have increased the VM's memory to 64G. Maybe you want to add some information on RAM considerations or on sizing the caching server in general.
Best
Andreas
Actually, for a single huggingface-cli download of Llama 70B to work, I needed even more than 64G RAM on the olah-server. It failed when trying to download 8 files of 17.4GB each. I have now set it to 128G.
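As a rough back-of-the-envelope check, here is the worst case if every in-flight file were held in RAM in full. That is only an assumption based on the behaviour I observed, not something I have confirmed in the olah code, and the helper name `worst_case_buffer_gb` is made up for this example.

```python
def worst_case_buffer_gb(workers, largest_file_gb):
    """Upper bound on buffered data if each worker's file sits fully in RAM."""
    return workers * largest_file_gb


if __name__ == "__main__":
    # 8 parallel downloads of 17.4GB files, as in the failing run.
    print(worst_case_buffer_gb(8, 17.4))  # 139.2
```

Under that assumption the upper bound is about 139GB, which would be consistent with 64G not being enough.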