Error: invalid memory address or nil pointer dereference #1553
Comments
Hi, thanks for the update. Yes, it seems that might be the cause. OK, so this was fixed in kubeflow/common#178; however, the PR is from 26.11.2021 and the latest kubeflow/common release, v0.4.1, was on 24.11.2021. As far as I can tell, kubeflow/training-operator release 1.4.0 still uses kubeflow/common v0.4.1. To fix this, the kubeflow/common version needs to be bumped, right? Correct me if I am wrong. Thanks,
I guess so, you can try to update it and
Yeah, I figured building an image would be a workaround. It would be great to have a kubeflow/common v0.4.2 and a kubeflow/training-operator 1.4.1 release though.
kubeflow/common is now at v0.4.3 (f554921). Any chance we can get a training-operator minor release?
Ref: #1622
Awesome possum, looking forward to the release. Still, I think the project should maybe consider bugfix releases in between major releases? Maybe there is or was already a discussion about that, just mentioning it. Anywho, keep up the good work!
You can try RC release #1622 (comment)
Should be fixed in 1.5.0
Hi,
I'm running into an "invalid memory address or nil pointer dereference" error when a PyTorchJob on the cluster fails. The PyTorchJob Pod runs into a Python exception, which then causes the training-operator Deployment to crash.
Training Operator Release: v1.3.0
Kubernetes: rke2 v1.21.10
PyTorchJob Manifest:
Error Message:
I don't know if this is related to #1382 (which seemed to be already fixed).
Let me know if you need further information.
Cheers,
Markus
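For readers unfamiliar with this class of Go panic, the following is a minimal sketch of how a nil pointer dereference can arise when an optional status field is read before it is set, and how a guard avoids it. The types `podStatus` and `containerState` and the function `exitCode` are hypothetical illustrations; they are not taken from training-operator or kubeflow/common, and this is not the change made in kubeflow/common#178.

```go
package main

import (
	"errors"
	"fmt"
)

// containerState is a hypothetical stand-in for an optional status field.
// It is not a type from training-operator or kubeflow/common.
type containerState struct {
	ExitCode int32
}

// podStatus is likewise hypothetical. Terminated stays nil until the
// container has actually exited.
type podStatus struct {
	Terminated *containerState
}

// exitCode returns the exit code, or an error when the state is not set,
// instead of dereferencing a nil pointer and panicking.
func exitCode(s *podStatus) (int32, error) {
	if s == nil || s.Terminated == nil {
		return 0, errors.New("terminated state not available")
	}
	return s.Terminated.ExitCode, nil
}

func main() {
	// An unguarded read of Terminated.ExitCode on an empty podStatus would
	// panic with "invalid memory address or nil pointer dereference".
	if _, err := exitCode(&podStatus{}); err != nil {
		fmt.Println("skipped:", err)
	}

	code, _ := exitCode(&podStatus{Terminated: &containerState{ExitCode: 1}})
	fmt.Println("exit code:", code)
}
```

In a controller, returning an error from a guard like this lets the reconcile loop log and retry rather than crash the whole process.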