[ci] CUDA CI jobs are broken: "driver/library version mismatch" #5546
Comments
@shiyu1994 since you are the only person with administrative access to the machine the CUDA jobs run on, can you please try rebooting that machine and investigating other fixes for this? I'm happy to help with other research however I can, but you are the only person who can reboot the machine.
I just re-triggered a CUDA job, and it is still broken: https://github.com/microsoft/LightGBM/actions/runs/3281397887/jobs/5452120895 @shiyu1994 is there any way I can help you resolve this?
I just triggered another run, and this is still happening: https://github.com/microsoft/LightGBM/actions/runs/3350110051/jobs/5563225006 @shiyu1994 I really hope you're able to get to this soon.
@jameslamb Sorry for the long delay. I've fixed the virtual machine, and the CI tests should now be able to run.
Excellent, thanks @shiyu1994! I'll try re-running the checks from #5545 right now. If they work, I'll work on merging some of the approved PRs today.
🎉 🎉 🎉 That worked! https://github.com/microsoft/LightGBM/actions/runs/3281397887/jobs/5609753729 Thank you so much for the help @shiyu1994!
Description
The CUDA CI jobs for this project are all failing with a "driver/library version mismatch" error.
Reproducible example
References
Here is the line where these jobs are failing: LightGBM/.github/workflows/cuda.yml, line 109 (at commit 0c0eb2a).
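The answers at https://stackoverflow.com/questions/43022843/nvidia-nvml-driver-library-version-mismatch explain that this error typically appears when the NVIDIA kernel module loaded at boot no longer matches the installed user-space driver libraries (for example, after a driver upgrade without a reboot), and suggest that rebooting the machine resolves it.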
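A reboot helps because it loads a kernel module whose version matches the user-space libraries again. As a rough diagnostic before rebooting, one could compare the version of the loaded kernel module with the version of the installed user-space library. The Python sketch below assumes that the loaded module's version appears in the text of /proc/driver/nvidia/version on a line beginning with "NVRM version:", and that you already have the library version as a string (how you obtain it is left out). The helper names `kernel_module_version` and `versions_mismatch` are hypothetical and not part of LightGBM or of any NVIDIA tool.

```python
import re
from typing import Optional


def kernel_module_version(proc_text: str) -> Optional[str]:
    """Extract the loaded kernel module version from the text of
    /proc/driver/nvidia/version (assumed format: the first dotted version
    number on the line starting with "NVRM version:")."""
    match = re.search(r"NVRM version:.*?\s(\d+\.\d+(?:\.\d+)?)\s", proc_text)
    return match.group(1) if match else None


def versions_mismatch(proc_text: str, library_version: str) -> bool:
    """Return True when the loaded kernel module and the user-space library
    report different versions. Returns False if the module version can't be
    parsed, since a mismatch can't be confirmed in that case."""
    loaded = kernel_module_version(proc_text)
    return loaded is not None and loaded != library_version


if __name__ == "__main__":
    # Illustrative sample text, not taken from the CI machine.
    sample = (
        "NVRM version: NVIDIA UNIX x86_64 Kernel Module  470.82.01  "
        "Tue Oct 26 23:06:05 UTC 2021\n"
        "GCC version:  gcc version 9.3.0\n"
    )
    print(kernel_module_version(sample))
    print(versions_mismatch(sample, "470.141.03"))
```

If the two versions differ, the machine is in the state described in the Stack Overflow thread, and reloading the kernel module (most simply, by rebooting) should clear the error.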