RDMA allocatable resources changed to 0 after kubelet restart #74
Comments
What K8s version are you using? I see in the logs:
Does the following path exist on your system:
Please check #82; it should solve the issue.
v1.4.0 is out, please check :)
@adrianchiris The v1.4.0 release seems to be broken; I cannot find the release: Nor can it be seen here: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/cloud-native/containers/k8s-rdma-shared-dev-plugin/tags
Version: v1.3.2
RDMA device plugin log:
As the log shows, a kubelet restart cancels the stream context, and the plugin restart then blocks because the channel size is 0 (the channel is unbuffered). The context listener was added in this issue: #51.
When the kubelet restarts, ListAndWatch already receives the event from the stop channel, so there is no need to watch the context as well. I fixed the bug by removing the context listener. If necessary, I can submit a PR.
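To illustrate the blocking behaviour described above, here is a minimal, self-contained sketch. It is not the plugin's actual code: `watch`, `trySignalStop`, and `simulate` are hypothetical names, and the timeout exists only so the demo terminates instead of hanging. It assumes the loop returns as soon as the context is cancelled, which leaves nobody receiving on the unbuffered stop channel.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// watch is a hypothetical stand-in for a ListAndWatch-style loop.
// When watchCtx is true it also returns on context cancellation.
func watch(ctx context.Context, stop <-chan struct{}, watchCtx bool, done chan<- string) {
	var ctxDone <-chan struct{} // a nil channel is never ready in a select
	if watchCtx {
		ctxDone = ctx.Done()
	}
	select {
	case <-ctxDone:
		done <- "context"
	case <-stop:
		done <- "stop"
	}
}

// trySignalStop sends on the unbuffered stop channel and reports whether
// a receiver accepted the value before the timeout expired.
func trySignalStop(stop chan<- struct{}, timeout time.Duration) bool {
	select {
	case stop <- struct{}{}:
		return true
	case <-time.After(timeout):
		return false // a plain send here would block forever
	}
}

// simulate cancels the context (as assumed to happen on a kubelet restart)
// and then tries to stop the loop through the stop channel.
func simulate(watchCtx bool) bool {
	ctx, cancel := context.WithCancel(context.Background())
	stop := make(chan struct{}) // channel size 0
	done := make(chan string, 1)
	go watch(ctx, stop, watchCtx, done)
	cancel()
	if watchCtx {
		<-done // wait until the loop has already exited via the context
	}
	return trySignalStop(stop, 200*time.Millisecond)
}

func main() {
	fmt.Println("with context listener, stop delivered:", simulate(true))
	fmt.Println("without context listener, stop delivered:", simulate(false))
}
```

With the context listener the loop has already returned when the stop signal is sent, so the send finds no receiver and the demo reports `false`. Without it, the loop is still waiting on the stop channel and the send succeeds.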