Pods scheduled with gpu-manager fail to start and report Segmentation fault (core dumped); CUDA version is 11.1 #177

Open
Justin-ZL opened this issue Feb 8, 2023 · 3 comments

Comments

@Justin-ZL

No description provided.

@DennisYoung96

So which GPU card are you using?

@ls-2018

ls-2018 commented Nov 20, 2023

I had the same problem, on a Tesla P4.

@hnyoumfk

hnyoumfk commented May 29, 2024

Maybe your application calls nvmlInit() to initialize the CUDA environment.

Try adding this code at the start of your Python entrypoint to find out which function causes the segfault (if your application is written in Python):

import faulthandler
faulthandler.enable()
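
To see what faulthandler reports when a process segfaults, here is a minimal sketch that is not tied to gpu-manager or NVML. It starts a child Python process that enables faulthandler and then crashes on purpose by reading address 0 through ctypes. The `CHILD` script and the `run_crashing_child` helper are hypothetical names made up for this illustration.

```python
import subprocess
import sys
import textwrap

# Hypothetical child script: enable faulthandler, then crash on purpose.
CHILD = textwrap.dedent("""
    import ctypes
    import faulthandler

    faulthandler.enable()
    ctypes.string_at(0)  # deliberately read address 0 to force a crash
""")


def run_crashing_child():
    """Run the crashing script in a subprocess; return (exit code, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    code, err = run_crashing_child()
    print("exit code:", code)
    # stderr holds the Python traceback that faulthandler dumped at crash time.
    print(err)
```

In a real pod, the last frames of that traceback should point at the Python call that entered the crashing native code. If editing the entrypoint is inconvenient, the same handler can be enabled with `python -X faulthandler` or by setting the `PYTHONFAULTHANDLER=1` environment variable.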

And if nvmlInit() turns out to be the cause, this code can get past it:

from ctypes import CDLL
# Load the NVML shared library directly and call the versioned init symbol.
nvml_h = CDLL("libnvidia-ml.so.1")
nvml_h.nvmlInit_v2()
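
A slightly more defensive variant of the snippet above, as a sketch only: it handles the case where the library or symbol is missing and returns the NVML status code instead of discarding it. The helper name `try_nvml_init` is hypothetical, and whether calling `nvmlInit_v2` directly avoids the segfault under gpu-manager is the suggestion made in this comment, not something verified here.

```python
from ctypes import CDLL, c_int

NVML_SUCCESS = 0  # nvmlReturn_t value for a successful call


def try_nvml_init(lib_name="libnvidia-ml.so.1"):
    """Call nvmlInit_v2 via ctypes; return its status code, or None if unavailable."""
    try:
        lib = CDLL(lib_name)
        init = lib.nvmlInit_v2
    except (OSError, AttributeError) as exc:
        # Library not found, or the symbol is not exported by this library.
        print(f"NVML not available: {exc}")
        return None
    init.restype = c_int
    return init()


if __name__ == "__main__":
    status = try_nvml_init()
    if status == NVML_SUCCESS:
        print("nvmlInit_v2 succeeded")
    elif status is not None:
        print(f"nvmlInit_v2 returned error code {status}")
```

Checking the return value helps tell a clean NVML error apart from a crash: a segfault kills the process before any status is returned, while a non-zero code means the call completed but initialization failed.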
