Driver build fails on AWS g5g.xlarge #570
Comments
After digging a bit deeper, the root cause seems to be in the [...]. I'll try to overwrite the script to limit concurrency here.
Thanks for reporting this, @martin31821. We will look into making the maximum thread count configurable for low-memory systems.
@shivamerla created a [...]. I'm not sure whether we also want to update this operator so that it reacts automatically, via NFD data, to cases such as GPU cores outweighing the available memory in GB when it generates the driver spec and passes in some determined [...]
Updated PR after the relevant repo moved to GitHub: NVIDIA/gpu-driver-container#6
1. Quick Debug Information
2. Issue or feature description
On AWS g5g.xlarge (the smallest GPU node), the driver build fails because it runs out of system memory.
It might be possible to limit build concurrency to a much lower level so that the build can run with 8 GB of memory.
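To illustrate the idea, here is a minimal sketch of capping the number of parallel build jobs by available memory as well as by CPU count. It is not taken from the driver container's build script: the function name `max_build_jobs` and the 2 GB per-job budget are hypothetical, and the real memory use per compile job would need to be measured.

```python
import os


def max_build_jobs(cpu_count: int, mem_gb: float, gb_per_job: float = 2.0) -> int:
    """Return a parallel job count bounded by both CPU count and memory.

    gb_per_job is an assumed (hypothetical) memory budget per build job.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    if gb_per_job <= 0:
        raise ValueError("gb_per_job must be positive")
    # How many jobs fit in memory under the assumed per-job budget.
    by_memory = int(mem_gb // gb_per_job)
    # Never exceed the CPU count, and always allow at least one job.
    return max(1, min(cpu_count, by_memory))


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    # 8 GB matches the node described in this report.
    print(max_build_jobs(cpus, mem_gb=8.0))
```

With this rule, a node with many cores but only 8 GB of memory would be limited by the memory term rather than by the core count.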
3. Steps to reproduce the issue
4. Information to attach (optional if deemed irrelevant)
Is there already a way to limit concurrency in the nvcr.io/nvidia/driver container or is that not possible at the moment?