"missing RDMA device spec for device 0000:e5:00.1, RDMA device \"issm\" not found" #94

Open
gurumohan123 opened this issue Jan 9, 2024 · 4 comments

Comments

@gurumohan123

I get the error "missing RDMA device spec for device 0000:e5:00.1, RDMA device \"issm\" not found" when creating a pod after installing k8s-rdma-shared-dev-plugin. What is the solution for this error?

@sober-wang

You can run these commands:

mst start 
mst status -v 

Maybe the device at 0000:e5:00.1 is an Ethernet interface card.
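
If the mst tools are not installed, the vendor ID, device ID and PCI class of the address can also be read from sysfs. The following is only a minimal sketch: the helper name read_pci_ids and its sysfs_root parameter are made up for illustration, and it assumes the standard Linux layout under /sys/bus/pci/devices.

```python
from pathlib import Path

# PCI base class 0x02 is "network controller"; the subclass tells the two apart.
PCI_CLASS_NAMES = {
    "0x0200": "Ethernet controller",
    "0x0207": "InfiniBand controller",
}


def _strip_0x(value):
    """Drop a leading '0x' so the IDs look like the ones used in selectors."""
    return value[2:] if value.startswith("0x") else value


def read_pci_ids(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return vendor ID, device ID and a class description for a PCI address.

    read_pci_ids is a hypothetical helper, not part of the device plugin.
    Raises FileNotFoundError if the address does not exist under sysfs_root.
    """
    dev = Path(sysfs_root) / pci_addr

    def read_attr(name):
        return (dev / name).read_text().strip().lower()

    pci_class = read_attr("class")  # e.g. "0x020000"
    return {
        "vendor": _strip_0x(read_attr("vendor")),  # e.g. "15b3"
        "device": _strip_0x(read_attr("device")),  # e.g. "1019"
        "class": PCI_CLASS_NAMES.get(pci_class[:6], "other (" + pci_class + ")"),
    }
```

Calling read_pci_ids("0000:e5:00.1") on the affected node would show whether the card is reported as an Ethernet or InfiniBand controller, and gives the vendor and device IDs in the form used by the plugin's selectors.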

@a-c-dream

I also encountered the same issue. My logs were as follows:

error creating new device: "missing RDMA device spec for device 0000:e1:00.0, RDMA device \"issm\" not found"

I added the device's deviceID and vendor information to the RDMA Shared Device Plugin configurations, then applied the changes and restarted the pods. The problem was resolved.

My RDMA Shared Device Plugin Configuration is as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: rdma-devices
  namespace: kube-system
data:
  config.json: |
    {
        "periodicUpdateInterval": 300,
        "configList": [{
             "resourceName": "cx5_bond_shared_devices_a",
             "rdmaHcaMax": 1000,
             "selectors": {
               "vendors": ["15b3"],
               "deviceIDs": ["1017","1019"]
             }
           },
           {
             "resourceName": "cx6dx_shared_devices_b",
             "rdmaHcaMax": 500,
             "selectors": {
               "vendors": ["15b3"],
               "deviceIDs": ["101d"]
             }
           }
        ]
    }
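
To check whether a given NIC would be picked up by a configuration like the one above, the selectors can be evaluated by hand. This is a rough sketch, not the plugin's own code: matching_resources is a hypothetical helper, and it assumes that a device must match every selector that is present and that an empty or missing selector list means "no filter".

```python
import json

# The config.json payload from the ConfigMap above.
SAMPLE_CONFIG = """
{
    "periodicUpdateInterval": 300,
    "configList": [
        {
            "resourceName": "cx5_bond_shared_devices_a",
            "rdmaHcaMax": 1000,
            "selectors": {"vendors": ["15b3"], "deviceIDs": ["1017", "1019"]}
        },
        {
            "resourceName": "cx6dx_shared_devices_b",
            "rdmaHcaMax": 500,
            "selectors": {"vendors": ["15b3"], "deviceIDs": ["101d"]}
        }
    ]
}
"""


def matching_resources(config_json, vendor, device_id):
    """Return the resourceName of every config entry that selects the device.

    Assumption (not taken from the plugin source): all selectors that are
    present must match, and an absent or empty list does not filter anything.
    """
    vendor = vendor.lower()
    device_id = device_id.lower()
    matches = []
    for entry in json.loads(config_json).get("configList", []):
        selectors = entry.get("selectors", {})
        vendors = [v.lower() for v in selectors.get("vendors", [])]
        device_ids = [d.lower() for d in selectors.get("deviceIDs", [])]
        if vendors and vendor not in vendors:
            continue
        if device_ids and device_id not in device_ids:
            continue
        matches.append(entry["resourceName"])
    return matches
```

For example, matching_resources(SAMPLE_CONFIG, "15b3", "101d") returns ["cx6dx_shared_devices_b"], while a device ID that is not listed in any entry returns an empty list, which is the situation where the device's ID needs to be added to the selectors.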

@adrianchiris
Collaborator

adrianchiris commented Jun 24, 2024

Assuming 0000:e5:00.1 belongs to an MLNX NIC, the "issm" not found error means that a Linux char device (found under /dev/infiniband/issm<N>) is missing for the selected NIC. That means not all RDMA modules were loaded. Do you have the rdma-core package installed? It sets up udev rules to bind the needed drivers to the MLNX NIC.
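
A quick way to confirm this on the node is to list the RDMA char devices and check whether the ib_umad module (the kernel module that creates the umad<N> and issm<N> nodes) is loaded. The sketch below is illustrative only: both function names are hypothetical, and the module check reads /proc/modules, so it will report False if ib_umad is built into the kernel rather than loaded as a module.

```python
from pathlib import Path


def find_rdma_char_devices(dev_root="/dev/infiniband"):
    """Group the entries under dev_root by their well-known name prefix.

    Hypothetical helper; returns empty lists if the directory is missing.
    """
    found = {"issm": [], "umad": [], "uverbs": [], "rdma_cm": []}
    root = Path(dev_root)
    if not root.is_dir():
        return found
    for entry in sorted(root.iterdir()):
        for prefix in found:
            if entry.name.startswith(prefix):
                found[prefix].append(entry.name)
                break
    return found


def ib_umad_loaded(proc_modules="/proc/modules"):
    """Return True if ib_umad is listed as a loaded module.

    A built-in ib_umad does not appear in /proc/modules, so False here is
    a hint to investigate, not proof that the driver is absent.
    """
    try:
        lines = Path(proc_modules).read_text().splitlines()
    except OSError:
        return False
    return any(line.split(" ", 1)[0] == "ib_umad" for line in lines)
```

If find_rdma_char_devices()["issm"] comes back empty and ib_umad_loaded() is False, that matches the situation described above, where not all RDMA modules were loaded.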

@souleb

souleb commented Jul 10, 2024

Assuming 0000:e5:00.1 belongs to an MLNX NIC, the "issm" not found error means that a Linux char device (found under /dev/infiniband/issm<N>) is missing for the selected NIC. That means not all RDMA modules were loaded. Do you have the rdma-core package installed? It sets up udev rules to bind the needed drivers to the MLNX NIC.

That solved it for me! Thanks.
