Monitor nvidia-smi output to see GPU resource consumption #72

samhodge-aiml · 2024-03-13T06:03:59Z

Is your feature request related to a problem? Please describe.
I need to see how much VRAM and GPU compute a process in a container is using, and keep a historical record in a SQL table, so that I can keep narrowing the gap between resources allocated and resources consumed.
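For illustration only, the historical record could be kept in a small SQLite table. This is a minimal sketch that is independent of watchme's own storage: the table name `gpu_usage`, its columns, and the helpers `open_history`, `record_sample` and `peak_memory_fraction` are all hypothetical names chosen for this example.

```python
import sqlite3
import time


def open_history(path=":memory:"):
    """Open (or create) a SQLite database with a hypothetical gpu_usage table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS gpu_usage (
               ts REAL NOT NULL,              -- Unix timestamp of the sample
               gpu_index INTEGER NOT NULL,    -- GPU number as reported by the driver
               memory_used_mib INTEGER,
               memory_total_mib INTEGER,
               utilization_gpu_pct INTEGER
           )"""
    )
    return conn


def record_sample(conn, gpu_index, memory_used_mib, memory_total_mib,
                  utilization_gpu_pct, ts=None):
    """Append one sample; the timestamp defaults to the current time."""
    conn.execute(
        "INSERT INTO gpu_usage VALUES (?, ?, ?, ?, ?)",
        (time.time() if ts is None else ts, gpu_index,
         memory_used_mib, memory_total_mib, utilization_gpu_pct),
    )
    conn.commit()


def peak_memory_fraction(conn, gpu_index):
    """Highest observed used/total memory ratio for one GPU, or None if no rows."""
    row = conn.execute(
        "SELECT MAX(memory_used_mib * 1.0 / memory_total_mib) "
        "FROM gpu_usage WHERE gpu_index = ?",
        (gpu_index,),
    ).fetchone()
    return row[0]
```

Comparing a value such as `peak_memory_fraction(conn, 0)` with the memory requested for the container gives one rough measure of the allocated-versus-consumed gap.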

Describe the solution you'd like
I would like to wrap the output of nvidia-smi and have it reported in the same dictionary as the rest of the watchme metrics, or in a sidecar structure alongside them.
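As a rough sketch of what such a wrapper might look like (not watchme's actual implementation), `nvidia-smi` can be queried in CSV mode and each row turned into a dictionary. The function names `parse_gpu_csv` and `sample_gpus` and the dictionary keys are hypothetical; the `--query-gpu` and `--format=csv,noheader,nounits` options are standard `nvidia-smi` flags. The parser assumes GPU names contain no commas.

```python
import subprocess

# Fields requested from nvidia-smi. With "nounits", memory values are plain
# numbers in MiB and utilization is a plain percentage.
FIELDS = ["index", "name", "memory.used", "memory.total", "utilization.gpu"]


def _to_int(value):
    """Convert a field to int, or None for values such as "[N/A]"."""
    try:
        return int(value)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Turn `--format=csv,noheader,nounits` output into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(FIELDS):
            continue  # skip lines that do not match the requested fields
        row = dict(zip(FIELDS, values))
        gpus.append({
            "index": _to_int(row["index"]),
            "name": row["name"],
            "memory_used_mib": _to_int(row["memory.used"]),
            "memory_total_mib": _to_int(row["memory.total"]),
            "utilization_gpu_pct": _to_int(row["utilization.gpu"]),
        })
    return gpus


def sample_gpus():
    """Run nvidia-smi once and return one dict per GPU (needs an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

The resulting list could then be merged into, or stored next to, whatever dictionary the monitoring decorator already returns.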

Describe alternatives you've considered
Use https://github.com/petronny/nvsmi and dump its output into a dictionary at the same time as the watchme decorator runs.

Additional context
Getting computation to closely match the resources allocated is a problem with commercial value. Anyone who uses GPUs should care how heavily these resources are occupied, because buying or renting them is not cheap.

samhodge-aiml · 2024-03-14T06:34:08Z

Sorry, I found the correct documentation:

https://github.com/vsoch/watchme/blob/f209d3d4bf99a25cd2dcaeaa2431cf3ecfe68585/docs/_docs/watcher-tasks/gpu.md#use-as-a-decorator

vsoch · 2024-03-15T05:17:06Z

Hey @samhodge-aiml! This seems like a cool idea (and simple to implement), but I'm not sure I'll have time to work on it soon - too many cool things going on <3
