Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

not working on a local one-node cluster created by local-up-cluster.sh #32

Closed
myrfy001 opened this issue Feb 28, 2018 · 4 comments
Closed

Comments

@myrfy001
Copy link

myrfy001 commented Feb 28, 2018

Hi, I'm building a single machine testing environment, because Minikube doesn't support GPU well, so I use the local-up-cluster.sh provided at https://github.com/kubernetes/kubernetes/blob/master/hack/local-up-cluster.sh to build up single node cluster, but it not work with k8s-device-plugin well.

Do the following to reproduce it:

  • get source code using go get -d k8s.io/kubernetes

  • In order to make the local-up-cluster.sh launch kubelet with gate options, I insert the following line to the top of local-up-cluster.sh
    FEATURE_GATES="DevicePlugins=true"

  • start the cluster using sudo ./hack/local-up-cluster.sh

When running
docker run -it -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins nvidia/k8s-device-plugin:1.9

I got:

2018/02/28 12:14:51 Loading NVML
2018/02/28 12:14:51 Fetching devices.
2018/02/28 12:14:51 Starting FS watcher.
2018/02/28 12:14:51 Starting OS watcher.
2018/02/28 12:14:51 Starting to serve on /var/lib/kubelet/device-plugins/nvidia.sock
2018/02/28 12:14:51 Could not register device plugin: rpc error: code = Unimplemented desc = unknown service deviceplugin.Registration
2018/02/28 12:14:51 Could not contact Kubelet, retrying. Did you enable the device plugin feature gate?
2018/02/28 12:14:51 You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
2018/02/28 12:14:51 You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
2018/02/28 12:14:51 Starting to serve on /var/lib/kubelet/device-plugins/nvidia.sock
2018/02/28 12:14:51 Could not register device plugin: rpc error: code = Unimplemented desc = unknown service deviceplugin.Registration
2018/02/28 12:14:51 Could not contact Kubelet, retrying. Did you enable the device plugin feature gate?
2018/02/28 12:14:51 You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
2018/02/28 12:14:51 You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
2018/02/28 12:14:51 Starting to serve on /var/lib/kubelet/device-plugins/nvidia.sock
2018/02/28 12:14:51 Could not register device plugin: rpc error: code = Unimplemented desc = unknown service deviceplugin.Registration
2018/02/28 12:14:51 Could not contact Kubelet, retrying. Did you enable the device plugin feature gate?
2018/02/28 12:14:51 You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
2018/02/28 12:14:51 You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
2018/02/28 12:14:51 Starting to serve on /var/lib/kubelet/device-plugins/nvidia.sock

Here are some of my configs:
/etc/docker/daemon.json

root@ubuntu-10-53-66-17:~# cat /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

The command line to run kubelet

# ps aux | grep kubelet

root      71370  3.4  0.0 2621216 123892 pts/0  Sl+  20:20   0:06 /home/mi/go/src/k8s.io/kubernetes/_output/local/bin/linux/amd64/hyperkube kubelet --v=3 --vmodule= --chaos-chance=0.0 --container-runtime=docker --rkt-path= --rkt-stage1-image= --hostname-override=127.0.0.1 --cloud-provider= --cloud-config= --address=127.0.0.1 --kubeconfig /var/run/kubernetes/kubelet.kubeconfig --feature-gates=DevicePlugins=true --cpu-cfs-quota=true --enable-controller-attach-detach=true --cgroups-per-qos=true --cgroup-driver=cgroupfs --keep-terminated-pod-volumes=true --eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5% --eviction-soft= --eviction-pressure-transition-period=1m --pod-manifest-path=/var/run/kubernetes/static-pods --fail-swap-on=false --cluster-dns=10.0.0.10 --cluster-domain=cluster.local --port=10250

The output of nvidia-smi on the machine:

Wed Feb 28 20:26:18 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:05:00.0 Off |                    0 |
| N/A   34C    P8    26W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:06:00.0 Off |                    0 |
| N/A   28C    P8    30W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:84:00.0 Off |                    0 |
| N/A   40C    P8    27W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:85:00.0 Off |                    0 |
| N/A   33C    P8    29W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
@RenaudWasTaken
Copy link
Contributor

RenaudWasTaken commented Feb 28, 2018

Hello!

You are using master (i.e soon to be 1.10) and the device plugin you are using is for 1.9.

Master recently broke the pluginapi before graduating to beta and we have yet to publish an implementation for 1.10 (which we'll be doing before the official release).

In the meantime you can just checkout the release-1.9 tag (git ck release-1.9) or the v1.9.3 tag

@mlushpenko
Copy link

I am now running k8s 1.14.1 and device-plugin version 1.9 wasn't working, while 1.11 worked fine.

@doomzhou
Copy link

k8s 1.15.3 and plugin version 1.0.0-beta worked.

@h6mx
Copy link

h6mx commented Jul 26, 2023

hello,i follow your step to build a single machine testing environment, all is work well , but when i use kubectl get nodes, i get no resource . how can i create a node ? Thanks !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants