Resource estimation #35

henkela · 2024-11-22T07:45:39Z

Hi,
I've been running olah for a while now and tried pulling larger models like Llama-70B and similar (relevant files between 5G and 17G).
huggingface-cli stopped at some point with incomplete message errors. I dug into it and found that olah-server went OOM and got killed.
The VM I had it running on was Red Hat 9.4 with 4 vCPUs, 16G RAM, and a 500G SSD for the local cache. huggingface-cli usually pulls with 8 workers, hence 8 HTTP requests, and it seems olah tries to cache all of them in RAM while sending the responses. At least that's how it looked. Maybe the chunk size could adapt dynamically to the available RAM?
For now I increased the VM's memory to 64G. Maybe you want to add some info on RAM considerations, or on sizing the caching server in general.
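To make the chunk-size idea concrete, here is a minimal Python sketch. It is not olah's actual code: `pick_chunk_size`, `stream_in_chunks`, the 25% budget and the min/max bounds are all hypothetical names and values chosen for illustration. The idea is to give each concurrent request an equal share of a fixed memory budget and to copy data in pieces of that size, so memory use is bounded by roughly `concurrent_requests * chunk_size` rather than by file size.

```python
import io


def pick_chunk_size(available_bytes, concurrent_requests,
                    budget_fraction=0.25,
                    min_chunk=64 * 1024,
                    max_chunk=8 * 1024 * 1024):
    """Return a per-request chunk size in bytes (hypothetical helper).

    The budget (a fraction of available RAM) is split evenly across the
    concurrent requests, then clamped to [min_chunk, max_chunk].
    """
    budget = int(available_bytes * budget_fraction)
    per_request = budget // max(concurrent_requests, 1)
    return max(min_chunk, min(max_chunk, per_request))


def stream_in_chunks(src, dst, chunk_size):
    """Copy src to dst, reading at most chunk_size bytes per read() call."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # Assumed figures from this report: 16 GiB of RAM, 8 parallel requests.
    size = pick_chunk_size(16 * 1024**3, 8)
    copied = stream_in_chunks(io.BytesIO(b"x" * 1_000_000), io.BytesIO(), size)
    print(size, copied)
```

The amount of available memory is passed in as a plain number here; a real server would have to measure it somehow, which is outside this sketch.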
Best
Andreas

henkela · 2024-11-22T08:10:47Z

Actually, for a single huggingface-cli download of Llama 70B to work I needed even more than 64G RAM on the olah-server. It failed when trying to download 8 files of 17.4GB each. I have now set it to 128G.
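As a rough sanity check on these numbers, the sketch below computes an upper bound on buffered data. It assumes every concurrent response is held fully in RAM, which this thread only suspects and has not confirmed. `worst_case_buffer_gb` is a hypothetical helper, not part of olah.

```python
def worst_case_buffer_gb(concurrent_requests, file_size_gb):
    """Upper bound on buffered data if each response is held in full."""
    return concurrent_requests * file_size_gb


if __name__ == "__main__":
    # 8 parallel downloads of 17.4 GB files, as reported above.
    estimate = worst_case_buffer_gb(8, 17.4)
    print(f"{estimate:.1f} GB")
```

Under that assumption the bound is 8 × 17.4 GB = 139.2 GB, which is consistent with 64G not being enough. Actual usage depends on how much of each response is resident at any one time.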

carlsonp · 2024-12-17T21:49:04Z

I've also seen some OOM issues. Making adjustments and increasing the available RAM seemed to help.

jstzwj self-assigned this Nov 24, 2024

jstzwj added the enhancement (New feature or request) label Dec 2, 2024
