Pods scheduled with gpu-manager fail to start and report Segmentation fault (core dumped); CUDA version is 11.1 #177

Justin-ZL · 2023-02-08T08:24:03Z

No description provided.

DennisYoung96 · 2023-05-31T09:13:23Z

So which GPU card are you using?

ls-2018 · 2023-11-20T16:15:09Z

I had the same problem, on a Tesla P4.

hnyoumfk · 2024-05-29T08:01:31Z

Maybe your application calls nvmlInit() to initialize the CUDA environment.

Try adding this code at the start of your Python entrypoint to find out which function causes the segfault (if your application is written in Python):

import faulthandler
faulthandler.enable()
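
To illustrate what faulthandler reports, here is a minimal sketch that crashes a child interpreter on purpose and captures the traceback it writes to stderr. The helper name `run_crashing_child` and the deliberate `ctypes.string_at(0)` crash are illustrative assumptions, not part of the original report; in a real pod the crash would come from whatever native call your application makes.

```python
import subprocess
import sys

# Child script: enable faulthandler, then deliberately dereference a NULL
# pointer through ctypes to provoke a crash (illustrative only).
CHILD_SCRIPT = """
import faulthandler
import ctypes
faulthandler.enable()
ctypes.string_at(0)
"""


def run_crashing_child():
    """Run the crashing script in a separate interpreter.

    Returns the child's exit code and everything it wrote to stderr,
    which is where faulthandler dumps the Python traceback.
    """
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    code, err = run_crashing_child()
    print("exit code:", code)
    print(err)
```

The last frames of the dumped traceback point at the Python line that entered the crashing native call, which is usually enough to tell whether an NVML or CUDA initialization call is involved.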

If nvmlInit() is what triggers the crash, this code can get past it:

from ctypes import CDLL, c_int, byref
# Load the NVML shared library directly and call the versioned init function.
nvml_h = CDLL("libnvidia-ml.so.1")
nvml_h.nvmlInit_v2()
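
A slightly more defensive variant of the same idea, as a sketch: it checks whether the library loads at all and inspects the return code of `nvmlInit_v2()`. The function name `try_nvml_init` and the status strings are hypothetical, chosen for illustration. Note that if the call itself segfaults, Python cannot catch it, so combine this with faulthandler.

```python
from ctypes import CDLL

NVML_SUCCESS = 0  # NVML functions return 0 on success


def try_nvml_init(lib_name="libnvidia-ml.so.1"):
    """Try to load NVML and initialize it.

    Returns a (status, detail) tuple where status is one of
    "unavailable" (library could not be loaded), "error" (init returned
    a non-zero code), or "ok".
    """
    try:
        nvml = CDLL(lib_name)
    except OSError as exc:
        # Library missing or not loadable inside this container.
        return ("unavailable", str(exc))

    rc = nvml.nvmlInit_v2()
    if rc != NVML_SUCCESS:
        return ("error", rc)

    # Release NVML again; this sketch only probes whether init works.
    nvml.nvmlShutdown()
    return ("ok", rc)


if __name__ == "__main__":
    print(try_nvml_init())
```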
