Implement RDMA subsystem mode change #666

Closed
wants to merge 1 commit into from

Conversation

e0ne
Collaborator

@e0ne e0ne commented Mar 24, 2024

Now it's possible to configure the RDMA subsystem mode using the SR-IOV Network Operator in systemd mode.

We can't configure the RDMA subsystem in daemon mode because it must be done on the host before any network namespace is created.

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@coveralls

coveralls commented Mar 24, 2024

Pull Request Test Coverage Report for Build 11229466077

Details

  • 84 of 189 (44.44%) changed or added relevant lines in 11 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.1%) to 45.063%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| pkg/host/internal/lib/netlink/netlink.go | 0 | 3 | 0.0% |
| pkg/daemon/writer.go | 0 | 6 | 0.0% |
| pkg/host/internal/network/network.go | 19 | 25 | 76.0% |
| controllers/sriovnetworknodepolicy_controller.go | 0 | 7 | 0.0% |
| pkg/daemon/daemon.go | 1 | 8 | 12.5% |
| api/v1/zz_generated.deepcopy.go | 2 | 12 | 16.67% |
| pkg/helper/mock/mock_helper.go | 0 | 21 | 0.0% |
| pkg/host/mock/mock_host.go | 0 | 21 | 0.0% |
| controllers/helper.go | 50 | 74 | 67.57% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| pkg/host/internal/network/network.go | 2 | 57.56% |

| Totals | |
|---|---|
| Change from base Build 11212485228 | -0.1% |
| Covered Lines | 6700 |
| Relevant Lines | 14868 |

💛 - Coveralls

newFile := false
// remove the device plugin revision as we don't need it here
newState.Spec.DpConfigVersion = ""

// shared mode is a default on OS
rdmaMode := consts.RdmaSubsystemModeShared
Collaborator

I think we should query/change the mode only if the rdmaMode parameter is explicitly set in the poolConfig, to provide safer behavior for environments that don't use RDMA.

@@ -152,6 +152,17 @@ func phasePre(setupLog logr.Logger, conf *systemd.SriovConfig, hostHelpers helpe
hostHelpers.TryEnableTun()
hostHelpers.TryEnableVhostNet()

rdmaSubsystem, err := hostHelpers.GetRDMASubsystem()
Collaborator

I think we should execute this logic only if mode configuration is explicitly requested by a user.

Collaborator Author

done


// +kubebuilder:validation:Enum=shared;exclusive
// RDMA subsystem. Allowed value "shared", "exclusive".
RdmaMode string `json:"rdmaMode,omitempty"`
Collaborator

This option is only valid for systemd mode?
Do we want to document this somehow?

Collaborator Author

Done, as a log message in the SriovNetworkPoolConfig controller.

Collaborator

@ykulazhenkov ykulazhenkov left a comment

I added a few additional comments.

if conf.RdmaMode != "" {
rdmaSubsystem, err := hostHelpers.GetRDMASubsystem()
if err != nil {
setupLog.Error(err, "failed to get RDMA subsystem mode")
Collaborator

If conf.RdmaMode is not an empty string, then the user explicitly requested RDMA mode configuration. I think we can return an error in this case. WDYT?

Collaborator Author

done

if rdmaSubsystem != conf.RdmaMode {
err = hostHelpers.SetRDMASubsystem(conf.RdmaMode)
if err != nil {
setupLog.Error(err, "failed to set RDMA subsystem mode")
Collaborator

Do we want to return an error here?

Collaborator Author

Yes, we do. Thanks!
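
The behavior agreed on in the comments above (touch the RDMA subsystem only when a mode is explicitly requested, and return errors instead of only logging them) can be sketched as follows. This is a minimal illustration, not the PR's code: the `rdmaHost` interface, `ensureRdmaMode`, and `fakeHost` are hypothetical names; only the `GetRDMASubsystem`/`SetRDMASubsystem` method names come from the diff.

```go
package main

import (
	"errors"
	"fmt"
)

// rdmaHost is a hypothetical, narrowed-down stand-in for the host helpers in the diff.
type rdmaHost interface {
	GetRDMASubsystem() (string, error)
	SetRDMASubsystem(mode string) error
}

// ensureRdmaMode applies the requested mode and reports whether it changed anything.
// An empty mode means the user did not ask for RDMA configuration, so the host is left alone.
func ensureRdmaMode(h rdmaHost, requested string) (bool, error) {
	if requested == "" {
		return false, nil
	}
	current, err := h.GetRDMASubsystem()
	if err != nil {
		return false, fmt.Errorf("failed to get RDMA subsystem mode: %w", err)
	}
	if current == requested {
		return false, nil
	}
	if err := h.SetRDMASubsystem(requested); err != nil {
		return false, fmt.Errorf("failed to set RDMA subsystem mode: %w", err)
	}
	return true, nil
}

// fakeHost is an in-memory rdmaHost used only for this demonstration.
type fakeHost struct {
	mode    string
	failGet bool
	queries int
}

func (f *fakeHost) GetRDMASubsystem() (string, error) {
	f.queries++
	if f.failGet {
		return "", errors.New("query failed")
	}
	return f.mode, nil
}

func (f *fakeHost) SetRDMASubsystem(mode string) error {
	f.mode = mode
	return nil
}

func main() {
	h := &fakeHost{mode: "shared"}
	changed, err := ensureRdmaMode(h, "exclusive")
	fmt.Println(changed, err, h.mode)
}
```

Returning the error lets the caller fail the pre-configuration phase when an explicitly requested mode cannot be applied.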

@@ -522,6 +522,34 @@ func (k *kernel) InstallRDMA(packageManager string) error {
return nil
}

func (k *kernel) GetRDMASubsystem() (string, error) {
log.Log.Info("GetRDMASubsystem(): retrieving RDMA subsystem mode")
chrootDefinition := utils.GetChrootExtension()
Collaborator

We have a helper to enter chroot (part of utilsHelper). Do we want to use it here?

Collaborator Author

We have the same implementation in all `kernel` methods. Let's do it in the scope of a separate PR.

@e0ne
Collaborator Author

e0ne commented Apr 15, 2024

@SchSeba could you please review this PR?

adrianchiris
adrianchiris previously approved these changes Jun 25, 2024
@@ -161,3 +171,94 @@ func AnnotateNode(ctx context.Context, nodeName string, key, value string, c cli

return AnnotateObject(ctx, node, key, value, c)
}

func FindNodePoolConfig(ctx context.Context, node *corev1.Node, c client.Client) (*sriovnetworkv1.SriovNetworkPoolConfig, []corev1.Node, error) {
Collaborator

nit: add docstring

Collaborator

Also, I'm thinking we should have two functions:

  1. find node pool for node
  2. find nodes for node pool (with special handling for the case where the default node pool was provided)

WDYT?

Also, please add a UT for whatever we end up with.
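
A rough sketch of the proposed split, under heavy simplification: the `node` and `poolConfig` types below are hypothetical stand-ins for `corev1.Node` and `SriovNetworkPoolConfig`, selectors are plain exact-match label maps, and an empty selector selects nothing (unlike a real Kubernetes label selector). A nil pool plays the role of the default pool.

```go
package main

import (
	"fmt"
	"sort"
)

// node and poolConfig are simplified, hypothetical stand-ins for the real API types.
type node struct {
	Name   string
	Labels map[string]string
}

type poolConfig struct {
	Name     string
	Selector map[string]string
}

// selects reports whether every selector entry is present in labels.
// In this sketch an empty selector selects nothing.
func selects(selector, labels map[string]string) bool {
	if len(selector) == 0 {
		return false
	}
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// findPoolForNode returns the single pool selecting the node, nil if none does
// (the caller then falls back to the default pool), or an error on ambiguity.
func findPoolForNode(n node, pools []poolConfig) (*poolConfig, error) {
	var found *poolConfig
	for i := range pools {
		if !selects(pools[i].Selector, n.Labels) {
			continue
		}
		if found != nil {
			return nil, fmt.Errorf("node %s matches more than one pool: %s and %s", n.Name, found.Name, pools[i].Name)
		}
		found = &pools[i]
	}
	return found, nil
}

// findNodesForPool returns the sorted names of the nodes owned by pool.
// A nil pool stands for the default pool: every node that no explicit pool selects.
func findNodesForPool(pool *poolConfig, nodes []node, pools []poolConfig) ([]string, error) {
	var names []string
	for _, n := range nodes {
		owner, err := findPoolForNode(n, pools)
		if err != nil {
			return nil, err
		}
		switch {
		case pool == nil && owner == nil:
			names = append(names, n.Name)
		case pool != nil && owner != nil && owner.Name == pool.Name:
			names = append(names, n.Name)
		}
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	pools := []poolConfig{{Name: "rdma", Selector: map[string]string{"pool": "rdma"}}}
	nodes := []node{{Name: "a", Labels: map[string]string{"pool": "rdma"}}, {Name: "b"}}
	rest, err := findNodesForPool(nil, nodes, pools)
	fmt.Println(rest, err)
}
```

Because `findNodesForPool` reuses `findPoolForNode`, every listed node is also checked for matching exactly one pool.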

Collaborator

+1

@@ -26,6 +28,14 @@ const (
controlPlaneNodeLabelKey = "node-role.kubernetes.io/control-plane"
)

var (
oneNode = intstr.FromInt32(1)
defaultNpcl = &sriovnetworkv1.SriovNetworkPoolConfig{Spec: sriovnetworkv1.SriovNetworkPoolConfigSpec{
Collaborator

nit: can we use the full name here, e.g. defaultPoolConfig?

Also, the 'l' at the end doesn't relate to anything.

return nil, nil, err
}

// list all the nodes that are also part of this pool and return them
Collaborator

For those nodes, why aren't we validating that they match exactly one ncp, like in L223?

@@ -420,6 +421,13 @@ func (dn *Daemon) nodeStateSyncHandler() error {
// When using systemd configuration we write the file
if vars.UsingSystemdMode {
log.Log.V(0).Info("nodeStateSyncHandler(): writing systemd config file to host")
// get node object
node := &corev1.Node{}
Collaborator

I don't see node being used in this scope.

@@ -92,6 +92,23 @@ spec:
mountPath: /host/etc/os-release
readOnly: true
{{- end }}
{{- if .RDMACNIImage }}
Collaborator

Hi, please rebase this PR now that we have merged the rdma-cni deployment.

@@ -152,6 +152,21 @@ func phasePre(setupLog logr.Logger, conf *systemd.SriovConfig, hostHelpers helpe
hostHelpers.TryEnableTun()
hostHelpers.TryEnableVhostNet()

if conf.Spec.System.RdmaMode != "" {
Collaborator

I think we can remove this one, as we do the configuration via the modprobe file.

@@ -114,10 +115,15 @@ type OVSUplinkConfigExt struct {
Interface OVSInterfaceConfig `json:"interface,omitempty"`
}

type System struct {
RdmaMode string `json:"rdmaMode,omitempty"`
Collaborator

I think you can add here also

// +kubebuilder:validation:Enum=shared;exclusive
// RDMA subsystem. Allowed value "shared", "exclusive".

Collaborator Author

done

@@ -269,6 +270,13 @@ func (r *SriovNetworkNodePolicyReconciler) syncAllSriovNetworkNodeStates(ctx con
ns.Name = node.Name
ns.Namespace = vars.Namespace
j, _ := json.Marshal(ns)
netPoolConfig, _, err := utils.FindNodePoolConfig(context.Background(), &node, r.Client)
Collaborator

Please use the context from the function; don't create a new one.

@@ -73,6 +73,19 @@ func (r *SriovNetworkPoolConfigReconciler) Reconcile(ctx context.Context, req ct
return reconcile.Result{}, err
}

// RdmaMode could be set in systemd mode only
if instance.Spec.RdmaMode != "" {
Collaborator

Please remove this one, as we support this in both modes.

@@ -522,6 +523,29 @@ func (k *kernel) InstallRDMA(packageManager string) error {
return nil
}

func (k *kernel) DiscoverRDMASubsystem() (string, error) {
Collaborator

I think we can move this function to the network or sriov package

@@ -522,6 +523,29 @@ func (k *kernel) InstallRDMA(packageManager string) error {
return nil
}

func (k *kernel) DiscoverRDMASubsystem() (string, error) {
log.Log.Info("DiscoverRDMASubsystem(): retrieving RDMA subsystem mode")
subsystem, err := netlink.RdmaSystemGetNetnsMode()
Collaborator

please use the netlink interface in the project so we can have a mock for it on unit tests
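
To illustrate why going through an interface helps unit tests, here is a small sketch. `netnsModeGetter`, `discoverRDMASubsystem`, and `fakeNetlink` are hypothetical names for this example; the project's real wrapper is the `NetlinkLib` interface, and the method name mirrors `netlink.RdmaSystemGetNetnsMode` from the diff.

```go
package main

import (
	"errors"
	"fmt"
)

// netnsModeGetter is a hypothetical one-method slice of a netlink wrapper interface.
type netnsModeGetter interface {
	RdmaSystemGetNetnsMode() (string, error)
}

// discoverRDMASubsystem depends only on the interface, so tests can inject a fake
// instead of talking to the kernel.
func discoverRDMASubsystem(nl netnsModeGetter) (string, error) {
	mode, err := nl.RdmaSystemGetNetnsMode()
	if err != nil {
		return "", fmt.Errorf("failed to get RDMA subsystem mode: %w", err)
	}
	return mode, nil
}

// fakeNetlink returns a canned mode or a canned failure.
type fakeNetlink struct {
	mode string
	fail bool
}

func (f fakeNetlink) RdmaSystemGetNetnsMode() (string, error) {
	if f.fail {
		return "", errors.New("netlink unavailable")
	}
	return f.mode, nil
}

func main() {
	mode, err := discoverRDMASubsystem(fakeNetlink{mode: "shared"})
	fmt.Println(mode, err)
}
```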

return subsystem, nil
}

func (k *kernel) SetRDMASubsystem(mode string) error {
Collaborator

I think this function is not needed now that we use the modprobe file.

@@ -161,3 +171,94 @@ func AnnotateNode(ctx context.Context, nodeName string, key, value string, c cli

return AnnotateObject(ctx, node, key, value, c)
}

func FindNodePoolConfig(ctx context.Context, node *corev1.Node, c client.Client) (*sriovnetworkv1.SriovNetworkPoolConfig, []corev1.Node, error) {
Collaborator

+1

@SchSeba
Collaborator

SchSeba commented Aug 27, 2024

Hi @e0ne can you please rebase the PR?

@e0ne
Collaborator Author

e0ne commented Aug 27, 2024

Hi @e0ne can you please rebase the PR?

done

@e0ne e0ne force-pushed the rdma-subsytem-mode branch 2 times, most recently from e294415 to 288e028 Compare August 28, 2024 08:34
Collaborator

@SchSeba SchSeba left a comment

nice work!

I left some small comments

}
return defaultNpcl, defaultNodeLists, nil
}
return utils.FindNodePoolConfig(ctx, node, dr.Client)
Collaborator

Should we put this in the helper of the controllers?
I don't want utils to start growing again.

Collaborator Author

it makes sense. done

@@ -272,6 +273,13 @@ func (r *SriovNetworkNodePolicyReconciler) syncAllSriovNetworkNodeStates(ctx con
ns.Name = node.Name
ns.Namespace = vars.Namespace
j, _ := json.Marshal(ns)
netPoolConfig, _, err := utils.FindNodePoolConfig(ctx, &node, r.Client)
Collaborator

Just a general TODO here: I think we should have an in-memory map for this.

Collaborator Author

could you please elaborate on this?

@@ -23,6 +24,8 @@ import (
"github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/vars"
)

var ManifestsPath = "./bindata/manifests"
Collaborator

lets put this in consts

Collaborator Author

It's not needed anymore, so I deleted it

pkg/host/internal/network/network.go
modeValue = 0
}
config := fmt.Sprintf("options ib_core netns_mode=%d\n", modeValue)
err := os.WriteFile("/etc/modprobe.d/ib_core.conf", []byte(config), 0644)
Collaborator

Please use the getExtension helper here so we know whether we are inside a chroot.

return fmt.Errorf("failed to write ib_core config: %v", err)
}

err = os.WriteFile(path.Join(consts.Chroot, "/etc/modprobe.d/ib_core.conf"), []byte(config), 0644)
Collaborator

this looks like a duplicate?

Collaborator Author

rebase issue, it's deleted now

@@ -77,6 +77,14 @@ func RenderDir(manifestDir string, d *RenderData) ([]*unstructured.Unstructured,
return out, nil
}

func RenderToString(path string, d *RenderData) (string, error) {
Collaborator

I don't think we need this function. Where do we use it?

@@ -161,3 +171,94 @@ func AnnotateNode(ctx context.Context, nodeName string, key, value string, c cli

return AnnotateObject(ctx, node, key, value, c)
}

func FindNodePoolConfig(ctx context.Context, node *corev1.Node, c client.Client) (*sriovnetworkv1.SriovNetworkPoolConfig, []corev1.Node, error) {
Collaborator

Please move this function to the helpers in controllers; that's better than adding more stuff to utils.

Collaborator Author

deleted

@@ -272,6 +272,13 @@ func (r *SriovNetworkNodePolicyReconciler) syncAllSriovNetworkNodeStates(ctx con
ns.Name = node.Name
ns.Namespace = vars.Namespace
j, _ := json.Marshal(ns)
netPoolConfig, _, err := findNodePoolConfig(ctx, &node, r.Client)
Collaborator

nit: move this before L274 so that j contains the RdmaMode information?

@@ -272,6 +272,13 @@ func (r *SriovNetworkNodePolicyReconciler) syncAllSriovNetworkNodeStates(ctx con
ns.Name = node.Name
ns.Namespace = vars.Namespace
j, _ := json.Marshal(ns)
netPoolConfig, _, err := findNodePoolConfig(ctx, &node, r.Client)
if err != nil {
log.Log.Error(err, "nodeStateSyncHandler(): failed to get SriovNetworkPoolConfig for the current node")
Collaborator

nit: the function name in the error message is wrong.

@@ -68,6 +68,8 @@ type NetlinkLib interface {
RdmaLinkByName(name string) (*netlink.RdmaLink, error)
// IsLinkAdminStateUp checks if the admin state of a link is up
IsLinkAdminStateUp(link Link) bool
// DiscoverRDMASubsystem returns RDMA subsystem mode
DiscoverRDMASubsystem() (string, error)
Collaborator

nit: any chance to stick to the method name from the netlink lib (RdmaSystemGetNetnsMode)?

return subsystem, nil
}

func (n *network) SetRDMASubsystem(mode string) error {
Collaborator

Should we distinguish between the following cases?

  1. mode is "shared"
  2. mode is "exclusive"
  3. mode is unspecified (i.e. ""), which means system default

The last case would mean we need to delete the file.

Changing the default value in the kernel is a one-line change:
https://github.com/torvalds/linux/blob/d3d1556696c1a993eec54ac585fe5bf677e07474/drivers/infiniband/core/device.c#L127
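
The three cases could be modeled roughly as below. This is a sketch only: `fileAction` and `planNetnsMode` are hypothetical names, and the value mapping (1 for shared, 0 for exclusive) follows the `netns_mode` handling shown in the diff.

```go
package main

import "fmt"

// fileAction says what should happen to the operator-managed modprobe file.
type fileAction int

const (
	actionNone   fileAction = iota // invalid input, do nothing
	actionWrite                    // write the file with the returned netns_mode value
	actionRemove                   // delete the file so the kernel default applies
)

// planNetnsMode maps the requested RDMA mode to a file action and a netns_mode value.
// The value is only meaningful together with actionWrite.
func planNetnsMode(mode string) (fileAction, int, error) {
	switch mode {
	case "shared":
		return actionWrite, 1, nil
	case "exclusive":
		return actionWrite, 0, nil
	case "":
		return actionRemove, 0, nil
	default:
		return actionNone, 0, fmt.Errorf("unsupported RDMA mode %q", mode)
	}
}

func main() {
	for _, m := range []string{"shared", "exclusive", ""} {
		action, value, err := planNetnsMode(m)
		fmt.Printf("%q -> action=%d value=%d err=%v\n", m, action, value, err)
	}
}
```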

modeValue = 0
}
config := fmt.Sprintf("options ib_core netns_mode=%d\n", modeValue)
path := filepath.Join(vars.FilesystemRoot, consts.Host, "etc", "modprobe.d", "ib_core.conf")
Collaborator

@adrianchiris adrianchiris Oct 10, 2024

Maybe use a more unique name, e.g. sriov_network_operator_modules_config.conf?
ib_core.conf feels like a file that might already exist with some values that we would override when rewriting it.

Collaborator

Also, I wonder if we should search all conf files to see whether the value is already set there, or whether we have a conflict, and log it.

Generally, we don't expect this module parameter to be specified in the system.
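
One way such a conflict check might look, as a sketch: it parses the text of modprobe configuration files for `options ib_core ... netns_mode=N` lines. The function names and the in-memory `map[string]string` of file name to content are assumptions for illustration; a real implementation would read the files under /etc/modprobe.d.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// findNetnsMode returns every netns_mode value that conf sets for ib_core.
func findNetnsMode(conf string) []string {
	var values []string
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expected shape: options ib_core key=value [key=value ...]
		if len(fields) < 3 || fields[0] != "options" || fields[1] != "ib_core" {
			continue
		}
		for _, kv := range fields[2:] {
			if strings.HasPrefix(kv, "netns_mode=") {
				values = append(values, strings.TrimPrefix(kv, "netns_mode="))
			}
		}
	}
	return values
}

// conflictingFiles lists, in sorted order, the files that set netns_mode to
// something other than wanted.
func conflictingFiles(files map[string]string, wanted string) []string {
	var names []string
	for name, content := range files {
		for _, v := range findNetnsMode(content) {
			if v != wanted {
				names = append(names, name)
				break
			}
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	files := map[string]string{
		"a.conf": "# local tuning\noptions ib_core netns_mode=1\n",
		"b.conf": "options ib_core send_queue_size=128 netns_mode=0\n",
	}
	fmt.Println(conflictingFiles(files, "0"))
}
```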

if mode == "exclusive" {
modeValue = 0
}
config := fmt.Sprintf("options ib_core netns_mode=%d\n", modeValue)
Collaborator

perhaps add some comment to the beginning of the file like

# This file is managed by sriov-network-operator do not edit.
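
A sketch of rendering the file content with that header. The header text is taken verbatim from the comment above and the `options` line from the diff; `renderIbCoreConf` is a hypothetical helper name.

```go
package main

import "fmt"

// managedHeader is the marker comment suggested in the review.
const managedHeader = "# This file is managed by sriov-network-operator do not edit."

// renderIbCoreConf builds the modprobe file content for an explicit mode.
func renderIbCoreConf(mode string) (string, error) {
	var value int
	switch mode {
	case "shared":
		value = 1
	case "exclusive":
		value = 0
	default:
		return "", fmt.Errorf("unsupported RDMA mode %q", mode)
	}
	return fmt.Sprintf("%s\noptions ib_core netns_mode=%d\n", managedHeader, value), nil
}

func main() {
	content, err := renderIbCoreConf("exclusive")
	fmt.Print(content)
	fmt.Println(err)
}
```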

@@ -429,6 +429,16 @@ func (dn *Daemon) nodeStateSyncHandler() error {
reqReboot = reqReboot || r
}

if dn.currentNodeState.Status.System.RdmaMode != dn.desiredNodeState.Spec.System.RdmaMode {
Collaborator

We need to handle the case when dn.desiredNodeState.Spec.System.RdmaMode is empty (system default).

In this case we need to delete the file if it's present and decide whether a reboot is needed depending on the current kernel default.

root# modinfo ib_core
filename:       /lib/modules/5.15.0-121-generic/kernel/drivers/infiniband/core/ib_core.ko
alias:          rdma-netlink-subsys-4
license:        Dual BSD/GPL
description:    core kernel InfiniBand API
author:         Roland Dreier
alias:          net-pf-16-proto-20
alias:          rdma-netlink-subsys-5
srcversion:     C45D89EC6DCCFE96001D79F
depends:
retpoline:      Y
intree:         Y
name:           ib_core
vermagic:       5.15.0-121-generic SMP mod_unload modversions
sig_id:         PKCS#7
signer:         Build time autogenerated kernel key
sig_key:        5E:7B:57:CA:17:D7:74:58:75:3F:84:AD:DE:07:46:5C:DC:AD:16:4E
sig_hashalgo:   sha512
signature:      AE:90:AA:07:BB:6C:07:8C:AD:25:51:4B:1A:C6:FC:9F:D1:14:5B:B9:
                90:F0:F5:84:E6:85:10:7E:AD:79:B5:04:5E:38:CF:5F:EC:6C:CD:BD:
                E5:BD:4D:4A:5D:7F:76:56:5E:DA:F0:C3:EA:63:98:0A:EE:B8:51:06:
                42:8F:FD:08:51:28:DC:AD:4A:38:2E:A4:C4:7C:9E:42:4F:37:98:AD:
                4D:8F:7F:5C:5C:41:93:27:62:C2:A1:D8:A0:5E:D5:15:25:5A:B9:C6:
                8C:4D:17:CC:1F:A1:72:FE:18:5C:08:55:64:E6:A2:A7:2C:DD:57:1D:
                03:A1:8C:12:17:76:61:72:E7:F9:A4:8F:F9:26:8F:36:02:8F:C6:56:
                7B:A4:9E:6D:1D:ED:28:0E:7A:B5:81:F2:F0:FC:C4:05:0F:37:44:D3:
                C6:F4:00:B9:81:E2:32:EB:9B:1B:8E:EF:E5:CA:73:8F:4D:5E:11:80:
                51:80:EB:AD:EC:97:2D:30:15:E9:8F:6B:9B:DB:40:5F:89:99:94:B1:
                01:16:82:EF:22:01:5A:0F:14:F2:DE:64:68:76:3F:8B:26:F5:E9:97:
                E3:7F:DD:23:18:B2:A6:8F:8F:0F:A2:74:E1:B0:18:9F:E0:46:9F:7A:
                BE:89:9C:B7:C6:D4:47:64:70:E9:28:69:DC:A1:B0:F9:CB:A3:84:67:
                DF:68:A3:3D:E5:93:63:7D:91:A4:86:A9:CC:AA:DA:08:A8:64:97:D5:
                CC:BB:13:BB:28:17:87:1B:10:1B:2C:43:A6:0D:A0:05:6F:DB:45:03:
                1C:0B:C5:67:37:94:CB:E3:CB:CF:03:6F:81:80:F2:77:E1:FD:09:2A:
                8F:0F:FE:EA:C0:B8:CD:14:D2:69:55:0F:2F:82:3D:2D:30:0B:6E:72:
                42:0C:F4:AB:6C:F8:D4:CA:45:AF:74:C9:A1:5D:EC:BE:C6:8C:81:4B:
                2F:F4:46:EE:F6:28:83:11:B5:0D:EE:38:53:68:EF:1E:AC:AC:A9:B0:
                91:C6:76:D4:46:2E:DA:CB:47:66:99:42:84:E2:31:99:35:C2:A5:4B:
                04:F8:6A:34:E7:8A:AA:76:F3:83:DF:A8:82:E9:C8:14:05:51:90:F3:
                18:31:3D:A7:40:F8:EE:32:B9:F7:C2:01:9F:71:2A:B1:8C:00:34:0F:
                F2:7C:DE:50:54:E3:CF:4B:EA:05:43:AF:E3:9D:A1:05:E6:A8:48:EE:
                82:B7:6B:06:E3:C5:3D:AA:48:92:63:D8:7B:54:3E:F4:45:C7:5B:F6:
                77:97:DD:32:93:ED:AC:DB:AD:EB:24:81:89:24:4F:25:A8:34:EA:63:
                A1:D4:FC:D8:B2:B2:41:61:C3:D3:E3:F5
parm:           send_queue_size:Size of send queue in number of work requests (int)
parm:           recv_queue_size:Size of receive queue in number of work requests (int)
parm:           netns_mode:Share device among net namespaces; default=1 (shared) (bool)
parm:           force_mr:Force usage of MRs for RDMA READ/WRITE operations (bool)

maybe parse the cmd above for:
parm: netns_mode:Share device among net namespaces; default=1 (shared) (bool)
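
Parsing that `parm:` line could look like the sketch below. It assumes the `default=1`/`default=0` wording seen in the modinfo output above, which may differ between kernel versions; `kernelDefaultNetnsMode` and `rebootNeeded` are hypothetical helpers.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// kernelDefaultNetnsMode extracts the default RDMA netns mode from `modinfo ib_core` output.
// The boolean is false when the parameter line is missing or has an unexpected form.
func kernelDefaultNetnsMode(modinfo string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(modinfo))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "parm:") {
			continue
		}
		desc := strings.TrimSpace(strings.TrimPrefix(line, "parm:"))
		if !strings.HasPrefix(desc, "netns_mode:") {
			continue
		}
		switch {
		case strings.Contains(desc, "default=1"):
			return "shared", true
		case strings.Contains(desc, "default=0"):
			return "exclusive", true
		}
		return "", false
	}
	return "", false
}

// rebootNeeded treats an empty desired mode as "use the kernel default".
func rebootNeeded(current, desired, kernelDefault string) bool {
	effective := desired
	if effective == "" {
		effective = kernelDefault
	}
	return current != effective
}

func main() {
	out := "parm:           netns_mode:Share device among net namespaces; default=1 (shared) (bool)\n"
	def, ok := kernelDefaultNetnsMode(out)
	fmt.Println(def, ok, rebootNeeded("exclusive", "", def))
}
```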

Collaborator

For now the default is shared; I don't know how likely it is to change. Maybe we can assume shared is the default.

Collaborator

@SchSeba SchSeba left a comment

In general, another point we need to take care of here: when we create/update a pool, it will not do anything by itself; we need to wait for the nodePolicy controller to apply it.

We have two options here:

  1. we pass a channel so the pool controller can trigger a policy one
  2. we handle the system section in the pool controller directly

I must say I am not sure which option is best. @adrianchiris @zeeke WDYT?

@SchSeba
Collaborator

SchSeba commented Oct 10, 2024

/hold

This doesn't work on OCP; it puts the node in a boot loop because the mode didn't change.

@github-actions github-actions bot added the hold label Oct 10, 2024
@adrianchiris
Collaborator

In general, another point we need to take care of here: when we create/update a pool, it will not do anything by itself; we need to wait for the nodePolicy controller to apply it.

We have two options here:

  1. we pass a channel so the pool controller can trigger a policy one
  2. we handle the system section in the pool controller directly

I must say I am not sure which option is best. @adrianchiris @zeeke WDYT?

We can watch the pool object as well and trigger a reconcile event.

@adrianchiris
Collaborator

This doesn't work on OCP; it puts the node in a boot loop because the mode didn't change.

Is writing files under /etc/modprobe.d not persistent on CoreOS?

@SchSeba
Collaborator

SchSeba commented Oct 10, 2024

We can watch the pool object as well and trigger a reconcile event.

Sure, that can also work, as we don't expect many changes on it.

@SchSeba
Collaborator

SchSeba commented Oct 10, 2024

Is writing files under /etc/modprobe.d not persistent on CoreOS?

I checked with our kernel team on the OCP platform.
The only way to do it for OCP is to add it to the kernel args, like we do with iommu :(
We can't use the /etc/modprobe.d files.

@e0ne let me know if you want me to work on this and push the changes for you.
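
For reference, module parameters are passed on the kernel command line in the generic `module.parameter=value` form, so the modprobe option from the diff would correspond to an argument like `ib_core.netns_mode=0`. The sketch below only builds that string; `kernelArgForRdmaMode` is a hypothetical helper, and how the follow-up work actually applies kernel args is not shown in this thread.

```go
package main

import "fmt"

// kernelArgForRdmaMode returns the kernel command-line argument for an explicit RDMA mode,
// using the generic module.parameter=value syntax.
func kernelArgForRdmaMode(mode string) (string, error) {
	switch mode {
	case "shared":
		return "ib_core.netns_mode=1", nil
	case "exclusive":
		return "ib_core.netns_mode=0", nil
	default:
		return "", fmt.Errorf("unsupported RDMA mode %q", mode)
	}
}

func main() {
	arg, err := kernelArgForRdmaMode("exclusive")
	fmt.Println(arg, err)
}
```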

@SchSeba
Collaborator

SchSeba commented Oct 31, 2024

Closing this one in favor of #799.

@SchSeba SchSeba closed this Oct 31, 2024
5 participants