bug: rdma exclusive handling #603

Open
wants to merge 1 commit into base: master

Conversation

rollandf
Contributor

@rollandf rollandf commented Oct 21, 2024

When an RDMA device in exclusive mode is in use by a Pod, the device plugin (DP) did not report it as a resource after a DP restart.

The following changes are introduced in RdmaSpec:

  • isRdma: if no RDMA resources are found, check whether the netlink "enable_rdma" parameter is available.
  • GetRdmaDeviceSpec: the device specs are now retrieved dynamically on each call, not once at the discovery stage as before.
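
The isRdma fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `rdmaProbe` interface, its method names, and the `fakeProbe` type are hypothetical stand-ins for the plugin's real RDMA and netlink helpers.

```go
package main

import "fmt"

// rdmaProbe is a HYPOTHETICAL abstraction over the host lookups;
// the real plugin uses its own RDMA and netlink providers.
type rdmaProbe interface {
	// RdmaResources lists the RDMA resources currently visible on the host.
	RdmaResources(pciAddr string) []string
	// HasEnableRdma reports whether the "enable_rdma" parameter is available.
	HasEnableRdma(pciAddr string) bool
}

// isRdma treats a device as RDMA capable if it has visible RDMA resources,
// or, when none are visible (for example because a Pod holds them in
// exclusive mode), if the "enable_rdma" parameter is available.
func isRdma(p rdmaProbe, pciAddr string) bool {
	if len(p.RdmaResources(pciAddr)) > 0 {
		return true
	}
	return p.HasEnableRdma(pciAddr)
}

// fakeProbe is an in-memory probe used only for this illustration.
type fakeProbe struct {
	resources  map[string][]string
	enableRdma map[string]bool
}

func (f fakeProbe) RdmaResources(pciAddr string) []string { return f.resources[pciAddr] }
func (f fakeProbe) HasEnableRdma(pciAddr string) bool     { return f.enableRdma[pciAddr] }

func main() {
	p := fakeProbe{
		resources:  map[string][]string{"0000:03:00.2": {"/dev/infiniband/uverbs0"}},
		enableRdma: map[string]bool{"0000:03:00.3": true},
	}
	fmt.Println("resources visible:", isRdma(p, "0000:03:00.2"))
	fmt.Println("resources hidden, enable_rdma present:", isRdma(p, "0000:03:00.3"))
	fmt.Println("neither:", isRdma(p, "0000:03:00.4"))
}
```

The point of the fallback is that a device whose RDMA resources are hidden by a running Pod is still classified as RDMA capable when the plugin rediscovers it.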

Computing the RDMA specs dynamically, rather than at discovery, solves the following scenario in exclusive mode:

  • Discover RDMA device
  • Allocate to Pod (resources are hidden on host)
  • Restart DP pod
  • Discovery
  • Deallocate
  • Reallocate
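
The scenario above can be simulated with a small sketch that contrasts specs cached at discovery with specs computed on each call. All types here (`host`, `cachedSpec`, `dynamicSpec`) and the device paths are hypothetical and only model the behaviour described in this PR; they are not the plugin's actual types.

```go
package main

import "fmt"

// host is a HYPOTHETICAL model of which RDMA character devices are
// visible on the host for each PCI address.
type host struct {
	visible map[string][]string
}

// cachedSpec snapshots the device list once, at discovery time.
type cachedSpec struct{ mounts []string }

func newCachedSpec(h *host, pci string) *cachedSpec {
	return &cachedSpec{mounts: append([]string(nil), h.visible[pci]...)}
}

func (c *cachedSpec) DeviceSpecs() []string { return c.mounts }

// dynamicSpec looks the device list up again on every call.
type dynamicSpec struct {
	h   *host
	pci string
}

func (d *dynamicSpec) DeviceSpecs() []string { return d.h.visible[d.pci] }

// runScenario replays the steps and returns how many device specs each
// strategy can offer at reallocation time.
func runScenario() (cached, dynamic int) {
	const pci = "0000:03:00.2"
	devs := []string{"/dev/infiniband/uverbs0", "/dev/infiniband/rdma_cm"}
	h := &host{visible: map[string][]string{pci: devs}}

	// Allocate to Pod: in exclusive mode the resources are hidden on the host.
	delete(h.visible, pci)

	// Restart DP pod, then discovery: both specs are built while hidden.
	c := newCachedSpec(h, pci)
	d := &dynamicSpec{h: h, pci: pci}

	// Deallocate: the resources become visible on the host again.
	h.visible[pci] = devs

	// Reallocate: ask each strategy for the device specs.
	return len(c.DeviceSpecs()), len(d.DeviceSpecs())
}

func main() {
	cached, dynamic := runScenario()
	fmt.Println("cached specs at reallocation:", cached)
	fmt.Println("dynamic specs at reallocation:", dynamic)
}
```

In this model the cached variant keeps the empty snapshot taken while the Pod held the device, whereas the dynamic variant sees the devices once they return to the host.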

Fixes #565

Signed-off-by: Fred Rolland
@coveralls
Collaborator

Pull Request Test Coverage Report for Build 11442729496

Details

  • 21 of 43 (48.84%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.5%) to 74.778%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| pkg/utils/utils.go | 0 | 6 | 0.0% |
| pkg/devices/rdma.go | 20 | 27 | 74.07% |
| pkg/utils/netlink_provider.go | 0 | 9 | 0.0% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 10918054008: | -0.5% |
| Covered Lines: | 2102 |
| Relevant Lines: | 2811 |

💛 - Coveralls

Successfully merging this pull request may close these issues.

Capacity and Allocatable number shows wrong if sriov-network-device-plugin restarts