Implement RDMA subsystem mode change #666
Thanks for your PR,
To skip the vendors CIs use one of:
|
c986251
to
d3641ab
Compare
d3641ab
to
a0af988
Compare
a0af988
to
a3ce3a3
Compare
a3ce3a3
to
3952e04
Compare
3952e04
to
bcb0804
Compare
Pull Request Test Coverage Report for Build 11229466077
💛 - Coveralls
pkg/systemd/systemd.go
Outdated
	newFile := false
	// remove the device plugin revision as we don't need it here
	newState.Spec.DpConfigVersion = ""

	// shared mode is a default on OS
	rdmaMode := consts.RdmaSubsystemModeShared
I think we should query or change the mode only if the rdmaMode parameter is explicitly set in the poolConfig. That gives safer behavior for environments that don't use RDMA.
@@ -152,6 +152,17 @@ func phasePre(setupLog logr.Logger, conf *systemd.SriovConfig, hostHelpers helpe
	hostHelpers.TryEnableTun()
	hostHelpers.TryEnableVhostNet()

	rdmaSubsystem, err := hostHelpers.GetRDMASubsystem()
I think we should execute this logic only if mode configuration is explicitly requested by a user.
done
	// +kubebuilder:validation:Enum=shared;exclusive
	// RDMA subsystem. Allowed value "shared", "exclusive".
	RdmaMode string `json:"rdmaMode,omitempty"`
Is this option only valid for systemd mode?
Do we want to document this somehow?
Done, as a log message in the SriovNetworkPoolConfig controller.
@e0ne can't we set this using a module parameter?
bcb0804
to
79013d7
Compare
79013d7
to
f60fdf7
Compare
I added a few additional comments
	if conf.RdmaMode != "" {
		rdmaSubsystem, err := hostHelpers.GetRDMASubsystem()
		if err != nil {
			setupLog.Error(err, "failed to get RDMA subsystem mode")
If conf.RdmaMode is not an empty string, then the user explicitly requested RDMA mode configuration. I think we can return an error in this case. WDYT?
done
		if rdmaSubsystem != conf.RdmaMode {
			err = hostHelpers.SetRDMASubsystem(conf.RdmaMode)
			if err != nil {
				setupLog.Error(err, "failed to set RDMA subsystem mode")
Do we want to return an error here?
We do, thanks!
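The two review threads above converge on one behavior: touch the host only when a mode is explicitly requested, and return failures instead of only logging them. A minimal sketch of that control flow follows. The `rdmaHost` interface, `ensureRdmaMode`, and `fakeHost` are hypothetical names introduced for illustration; only `GetRDMASubsystem` and `SetRDMASubsystem` come from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// rdmaHost is a hypothetical, narrowed-down stand-in for the host helpers
// used in the PR; only the two RDMA calls are modelled.
type rdmaHost interface {
	GetRDMASubsystem() (string, error)
	SetRDMASubsystem(mode string) error
}

// ensureRdmaMode leaves the host untouched unless a mode was requested,
// and returns (rather than only logs) any failure.
func ensureRdmaMode(h rdmaHost, requested string) error {
	if requested == "" {
		return nil // nothing requested: do not even query the host
	}
	current, err := h.GetRDMASubsystem()
	if err != nil {
		return fmt.Errorf("failed to get RDMA subsystem mode: %w", err)
	}
	if current == requested {
		return nil // already in the requested mode
	}
	if err := h.SetRDMASubsystem(requested); err != nil {
		return fmt.Errorf("failed to set RDMA subsystem mode: %w", err)
	}
	return nil
}

// fakeHost is an in-memory host used only to exercise the sketch.
type fakeHost struct {
	mode    string
	failGet bool
	calls   int
}

func (f *fakeHost) GetRDMASubsystem() (string, error) {
	f.calls++
	if f.failGet {
		return "", errors.New("netlink unavailable")
	}
	return f.mode, nil
}

func (f *fakeHost) SetRDMASubsystem(mode string) error {
	f.calls++
	f.mode = mode
	return nil
}

func main() {
	h := &fakeHost{mode: "shared"}

	err := ensureRdmaMode(h, "")
	fmt.Println("no mode requested:", err, "host calls:", h.calls)

	err = ensureRdmaMode(h, "exclusive")
	fmt.Println("exclusive requested:", err, "mode now:", h.mode)

	err = ensureRdmaMode(&fakeHost{failGet: true}, "shared")
	fmt.Println("query failure:", err)
}
```

Returning the wrapped error lets the caller decide whether to abort the phase, which is what the reviewers ask for here.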
pkg/host/internal/kernel/kernel.go
Outdated
@@ -522,6 +522,34 @@ func (k *kernel) InstallRDMA(packageManager string) error {
	return nil
}

func (k *kernel) GetRDMASubsystem() (string, error) {
	log.Log.Info("GetRDMASubsystem(): retrieving RDMA subsystem mode")
	chrootDefinition := utils.GetChrootExtension()
We have a helper to enter chroot (part of utilsHelper). Do we want to use it here?
All the `kernel` methods use this same implementation. Let's change it in the scope of a separate PR.
f60fdf7
to
a5f0d3b
Compare
feb4bd0
to
ab92392
Compare
ab92392
to
f37c6d1
Compare
@SchSeba could you please review this PR?
f37c6d1
to
3d19033
Compare
pkg/utils/cluster.go
Outdated
@@ -161,3 +171,94 @@ func AnnotateNode(ctx context.Context, nodeName string, key, value string, c cli
	return AnnotateObject(ctx, node, key, value, c)
}

func FindNodePoolConfig(ctx context.Context, node *corev1.Node, c client.Client) (*sriovnetworkv1.SriovNetworkPoolConfig, []corev1.Node, error) {
nit: add a docstring
Also, I'm thinking we should have two functions:
- find the node pool for a node
- find the nodes for a node pool (with special handling for the case where the default node pool was provided)
WDYT?
Also, please add UTs for whatever we end up with.
+1
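The proposed split into two lookups can be sketched without any Kubernetes dependencies. Everything below is an illustrative assumption: the `node` and `poolConfig` types, the equality-only selector, and the rule that the default pool (passed as `nil`) owns every node no explicit pool selects. The real code would use `corev1.Node`, `SriovNetworkPoolConfig`, and label selectors.

```go
package main

import "fmt"

// node and poolConfig are simplified stand-ins for the Kubernetes types.
type node struct {
	Name   string
	Labels map[string]string
}

type poolConfig struct {
	Name     string
	Selector map[string]string // equality-based selector only, for brevity
}

// matches reports whether every selector key/value is present in labels.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// findPoolForNode returns the single pool selecting n, nil when no pool
// selects it (the caller falls back to the default pool), or an error
// when more than one pool selects it.
func findPoolForNode(n node, pools []poolConfig) (*poolConfig, error) {
	var found *poolConfig
	for i := range pools {
		if !matches(pools[i].Selector, n.Labels) {
			continue
		}
		if found != nil {
			return nil, fmt.Errorf("node %s belongs to more than one pool: %s and %s",
				n.Name, found.Name, pools[i].Name)
		}
		found = &pools[i]
	}
	return found, nil
}

// findNodesForPool lists the nodes of a pool. A nil pool stands for the
// default pool, which is assumed to own every node not selected by any
// explicit pool.
func findNodesForPool(pool *poolConfig, nodes []node, pools []poolConfig) []node {
	var out []node
	for _, n := range nodes {
		if pool != nil {
			if matches(pool.Selector, n.Labels) {
				out = append(out, n)
			}
			continue
		}
		selected := false
		for _, p := range pools {
			if matches(p.Selector, n.Labels) {
				selected = true
				break
			}
		}
		if !selected {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	pools := []poolConfig{{Name: "rdma", Selector: map[string]string{"pool": "rdma"}}}
	nodes := []node{
		{Name: "worker-0", Labels: map[string]string{"pool": "rdma"}},
		{Name: "worker-1", Labels: map[string]string{}},
	}
	pool, err := findPoolForNode(nodes[0], pools)
	fmt.Println(pool.Name, err)
	fmt.Println(len(findNodesForPool(pool, nodes, pools)))
	fmt.Println(len(findNodesForPool(nil, nodes, pools)))
}
```

Keeping the two lookups separate also makes each one straightforward to cover with unit tests, as requested in the thread.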
pkg/utils/cluster.go
Outdated
@@ -26,6 +28,14 @@ const (
	controlPlaneNodeLabelKey = "node-role.kubernetes.io/control-plane"
)

var (
	oneNode = intstr.FromInt32(1)
	defaultNpcl = &sriovnetworkv1.SriovNetworkPoolConfig{Spec: sriovnetworkv1.SriovNetworkPoolConfigSpec{
nit: can we use the full name here, e.g. defaultPoolConfig?
Also, the 'l' at the end does not seem related to anything.
pkg/utils/cluster.go
Outdated
		return nil, nil, err
	}

	// list all the nodes that are also part of this pool and return them
For those nodes, why aren't we validating that they match exactly one pool config, like in L223?
pkg/daemon/daemon.go
Outdated
@@ -420,6 +421,13 @@ func (dn *Daemon) nodeStateSyncHandler() error {
	// When using systemd configuration we write the file
	if vars.UsingSystemdMode {
		log.Log.V(0).Info("nodeStateSyncHandler(): writing systemd config file to host")
		// get node object
		node := &corev1.Node{}
I don't see node being used in this scope.
@@ -92,6 +92,23 @@ spec:
mountPath: /host/etc/os-release
readOnly: true
{{- end }}
{{- if .RDMACNIImage }}
Hi, please rebase this PR now that we have merged the rdma-cni deployment.
@@ -152,6 +152,21 @@ func phasePre(setupLog logr.Logger, conf *systemd.SriovConfig, hostHelpers helpe
	hostHelpers.TryEnableTun()
	hostHelpers.TryEnableVhostNet()

	if conf.Spec.System.RdmaMode != "" {
I think we can remove this one, as we do the configuration via the modprobe file.
@@ -114,10 +115,15 @@ type OVSUplinkConfigExt struct {
	Interface OVSInterfaceConfig `json:"interface,omitempty"`
}

type System struct {
	RdmaMode string `json:"rdmaMode,omitempty"`
I think you can also add here:
// +kubebuilder:validation:Enum=shared;exclusive
// RDMA subsystem. Allowed value "shared", "exclusive".
done
@@ -269,6 +270,13 @@ func (r *SriovNetworkNodePolicyReconciler) syncAllSriovNetworkNodeStates(ctx con
		ns.Name = node.Name
		ns.Namespace = vars.Namespace
		j, _ := json.Marshal(ns)
		netPoolConfig, _, err := utils.FindNodePoolConfig(context.Background(), &node, r.Client)
Please use the context from the function; don't create a new one.
@@ -73,6 +73,19 @@ func (r *SriovNetworkPoolConfigReconciler) Reconcile(ctx context.Context, req ct
		return reconcile.Result{}, err
	}

	// RdmaMode could be set in systemd mode only
	if instance.Spec.RdmaMode != "" {
Please remove this one, as we support this in both modes.
pkg/host/internal/kernel/kernel.go
Outdated
@@ -522,6 +523,29 @@ func (k *kernel) InstallRDMA(packageManager string) error {
	return nil
}

func (k *kernel) DiscoverRDMASubsystem() (string, error) {
I think we can move this function to the network or sriov package
pkg/host/internal/kernel/kernel.go
Outdated
@@ -522,6 +523,29 @@ func (k *kernel) InstallRDMA(packageManager string) error {
	return nil
}

func (k *kernel) DiscoverRDMASubsystem() (string, error) {
	log.Log.Info("DiscoverRDMASubsystem(): retrieving RDMA subsystem mode")
	subsystem, err := netlink.RdmaSystemGetNetnsMode()
Please use the netlink interface in the project so we can have a mock for it in unit tests.
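The point of routing the call through an interface is that a unit test can substitute a fake for the real netlink library. A minimal sketch, assuming a trimmed-down `NetlinkLib` interface with a single method; the `network` struct layout and `fakeNetlink` are hypothetical, and the real wrapper would delegate to `netlink.RdmaSystemGetNetnsMode()`.

```go
package main

import (
	"errors"
	"fmt"
)

// NetlinkLib is a trimmed-down, hypothetical version of the project's
// netlink wrapper interface; only the RDMA call is shown.
type NetlinkLib interface {
	RdmaSystemGetNetnsMode() (string, error)
}

// network depends on the interface, not on the netlink package directly.
type network struct {
	netlinkLib NetlinkLib
}

// DiscoverRDMASubsystem returns the RDMA subsystem mode reported by netlink.
func (n *network) DiscoverRDMASubsystem() (string, error) {
	mode, err := n.netlinkLib.RdmaSystemGetNetnsMode()
	if err != nil {
		return "", fmt.Errorf("failed to get RDMA subsystem mode: %w", err)
	}
	return mode, nil
}

// fakeNetlink is a hand-written test double standing in for a generated mock.
type fakeNetlink struct {
	mode string
	fail bool
}

func (f fakeNetlink) RdmaSystemGetNetnsMode() (string, error) {
	if f.fail {
		return "", errors.New("netlink error")
	}
	return f.mode, nil
}

func main() {
	n := &network{netlinkLib: fakeNetlink{mode: "exclusive"}}
	mode, err := n.DiscoverRDMASubsystem()
	fmt.Println(mode, err)
}
```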
pkg/host/internal/kernel/kernel.go
Outdated
	return subsystem, nil
}

func (k *kernel) SetRDMASubsystem(mode string) error {
I think this function is not needed now that we use the modprobe file.
pkg/utils/cluster.go
Outdated
@@ -161,3 +171,94 @@ func AnnotateNode(ctx context.Context, nodeName string, key, value string, c cli
	return AnnotateObject(ctx, node, key, value, c)
}

func FindNodePoolConfig(ctx context.Context, node *corev1.Node, c client.Client) (*sriovnetworkv1.SriovNetworkPoolConfig, []corev1.Node, error) {
+1
e0ae318
to
8ce9b72
Compare
Hi @e0ne can you please rebase the PR?
done
e294415
to
288e028
Compare
nice work!
I left some small comments
controllers/drain_controller.go
Outdated
		}
		return defaultNpcl, defaultNodeLists, nil
	}
	return utils.FindNodePoolConfig(ctx, node, dr.Client)
Should we put this in the controllers' helpers?
I don't want utils to start growing again.
It makes sense. Done.
@@ -272,6 +273,13 @@ func (r *SriovNetworkNodePolicyReconciler) syncAllSriovNetworkNodeStates(ctx con
		ns.Name = node.Name
		ns.Namespace = vars.Namespace
		j, _ := json.Marshal(ns)
		netPoolConfig, _, err := utils.FindNodePoolConfig(ctx, &node, r.Client)
Just a general TODO here: I think we should have an in-memory map for this.
Could you please elaborate on this?
pkg/host/internal/network/network.go
Outdated
@@ -23,6 +24,8 @@ import (
	"github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/vars"
)

var ManifestsPath = "./bindata/manifests"
Let's put this in consts.
It's not needed anymore, so I deleted it.
pkg/host/internal/network/network.go
Outdated
		modeValue = 0
	}
	config := fmt.Sprintf("options ib_core netns_mode=%d\n", modeValue)
	err := os.WriteFile("/etc/modprobe.d/ib_core.conf", []byte(config), 0644)
Please use the getExtension helper here so that we know whether or not we are inside a chroot.
pkg/host/internal/network/network.go
Outdated
		return fmt.Errorf("failed to write ib_core config: %v", err)
	}

	err = os.WriteFile(path.Join(consts.Chroot, "/etc/modprobe.d/ib_core.conf"), []byte(config), 0644)
This looks like a duplicate?
Rebase issue; it's deleted now.
pkg/render/render.go
Outdated
@@ -77,6 +77,14 @@ func RenderDir(manifestDir string, d *RenderData) ([]*unstructured.Unstructured,
	return out, nil
}

func RenderToString(path string, d *RenderData) (string, error) {
I don't think we need this function. Where do we use it?
pkg/utils/cluster.go
Outdated
@@ -161,3 +171,94 @@ func AnnotateNode(ctx context.Context, nodeName string, key, value string, c cli
	return AnnotateObject(ctx, node, key, value, c)
}

func FindNodePoolConfig(ctx context.Context, node *corev1.Node, c client.Client) (*sriovnetworkv1.SriovNetworkPoolConfig, []corev1.Node, error) {
Please move this function to the helpers in controllers; that is better than adding more stuff to utils.
deleted
288e028
to
596e73d
Compare
596e73d
to
4e5c92b
Compare
4e5c92b
to
60432e0
Compare
@@ -272,6 +272,13 @@ func (r *SriovNetworkNodePolicyReconciler) syncAllSriovNetworkNodeStates(ctx con
		ns.Name = node.Name
		ns.Namespace = vars.Namespace
		j, _ := json.Marshal(ns)
		netPoolConfig, _, err := findNodePoolConfig(ctx, &node, r.Client)
nit: move this before L274 so that j contains the rdmaMode information?
@@ -272,6 +272,13 @@ func (r *SriovNetworkNodePolicyReconciler) syncAllSriovNetworkNodeStates(ctx con
		ns.Name = node.Name
		ns.Namespace = vars.Namespace
		j, _ := json.Marshal(ns)
		netPoolConfig, _, err := findNodePoolConfig(ctx, &node, r.Client)
		if err != nil {
			log.Log.Error(err, "nodeStateSyncHandler(): failed to get SriovNetworkPoolConfig for the current node")
nit: the function name in the error message is wrong
@@ -68,6 +68,8 @@ type NetlinkLib interface {
	RdmaLinkByName(name string) (*netlink.RdmaLink, error)
	// IsLinkAdminStateUp checks if the admin state of a link is up
	IsLinkAdminStateUp(link Link) bool
	// DiscoverRDMASubsystem returns RDMA subsystem mode
	DiscoverRDMASubsystem() (string, error)
nit: any chance we can stick to the method name from the netlink lib (RdmaSystemGetNetnsMode)?
	return subsystem, nil
}

func (n *network) SetRDMASubsystem(mode string) error {
Should we make the distinction between the following?
- mode is "shared"
- mode is "exclusive"
- mode is unspecified (i.e. ""), which means the system default
The latter would mean we need to delete the file.
Changing the default value in the kernel is a one-line change:
https://github.com/torvalds/linux/blob/d3d1556696c1a993eec54ac585fe5bf677e07474/drivers/infiniband/core/device.c#L127
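The three cases listed above can be sketched as a small write-or-delete routine. This is an illustration under stated assumptions, not the PR's implementation: `renderIbCoreOptions` and `applyRdmaMode` are hypothetical helpers, and the dedicated file name and the "managed by" header are taken from suggestions made elsewhere in this review.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

const (
	// File name and header follow suggestions made in this review.
	fileName      = "sriov_network_operator_modules_config.conf"
	managedHeader = "# This file is managed by sriov-network-operator do not edit.\n"
)

// renderIbCoreOptions maps an explicit mode to modprobe.d file content.
func renderIbCoreOptions(mode string) (string, error) {
	switch mode {
	case "shared":
		return managedHeader + "options ib_core netns_mode=1\n", nil
	case "exclusive":
		return managedHeader + "options ib_core netns_mode=0\n", nil
	default:
		return "", fmt.Errorf("unsupported RDMA mode %q", mode)
	}
}

// applyRdmaMode writes the file for "shared" or "exclusive" and removes it
// when mode is "", so the kernel default applies again after the module
// is next loaded.
func applyRdmaMode(path, mode string) error {
	if mode == "" {
		err := os.Remove(path)
		if err != nil && !errors.Is(err, os.ErrNotExist) {
			return err
		}
		return nil
	}
	content, err := renderIbCoreOptions(mode)
	if err != nil {
		return err
	}
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return err
	}
	return os.WriteFile(path, []byte(content), 0o644)
}

func main() {
	// A temporary directory stands in for the host root.
	root, err := os.MkdirTemp("", "rdma-mode")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	path := filepath.Join(root, "etc", "modprobe.d", fileName)
	for _, mode := range []string{"exclusive", "shared", ""} {
		applyErr := applyRdmaMode(path, mode)
		_, statErr := os.Stat(path)
		fmt.Printf("mode=%q err=%v file exists=%t\n", mode, applyErr, statErr == nil)
	}
}
```

Treating a missing file as success on removal keeps the operation idempotent, which matters when the daemon reconciles repeatedly.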
		modeValue = 0
	}
	config := fmt.Sprintf("options ib_core netns_mode=%d\n", modeValue)
	path := filepath.Join(vars.FilesystemRoot, consts.Host, "etc", "modprobe.d", "ib_core.conf")
Maybe use a more unique name, e.g. sriov_network_operator_modules_config.conf?
ib_core.conf feels like a file that might already exist with some values that we would override when rewriting it.
Also, I wonder if we should search all conf files to see whether the value is already there or whether we have a conflict, and log it.
Generally, we don't expect this module parameter to be specified in the system.
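The conflict search suggested above could look roughly like the sketch below. It assumes that only `options ib_core ...` lines in `*.conf` files directly under the modprobe.d directory matter; `parseNetnsMode` and `findConflicts` are hypothetical helpers, and the real code would also need to resolve the directory relative to the host root.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// parseNetnsMode returns the last netns_mode value set for ib_core in the
// content of one modprobe.d file, and whether any was found.
func parseNetnsMode(content string) (string, bool) {
	value, found := "", false
	for _, line := range strings.Split(content, "\n") {
		fields := strings.Fields(line)
		// Comment lines and options for other modules are skipped here.
		if len(fields) < 3 || fields[0] != "options" || fields[1] != "ib_core" {
			continue
		}
		for _, opt := range fields[2:] {
			if strings.HasPrefix(opt, "netns_mode=") {
				value, found = strings.TrimPrefix(opt, "netns_mode="), true
			}
		}
	}
	return value, found
}

// findConflicts reports files other than ownFile that set netns_mode to a
// value different from want.
func findConflicts(dir, ownFile, want string) ([]string, error) {
	files, err := filepath.Glob(filepath.Join(dir, "*.conf"))
	if err != nil {
		return nil, err
	}
	var conflicts []string
	for _, f := range files {
		if filepath.Base(f) == ownFile {
			continue
		}
		data, err := os.ReadFile(f)
		if err != nil {
			return nil, err
		}
		if v, ok := parseNetnsMode(string(data)); ok && v != want {
			conflicts = append(conflicts, fmt.Sprintf("%s sets netns_mode=%s", f, v))
		}
	}
	return conflicts, nil
}

func main() {
	conflicts, err := findConflicts("/etc/modprobe.d",
		"sriov_network_operator_modules_config.conf", "0")
	fmt.Println(conflicts, err)
}
```

The sketch only reports conflicts; whether to log them, fail, or overwrite is the policy question the reviewers leave open.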
	if mode == "exclusive" {
		modeValue = 0
	}
	config := fmt.Sprintf("options ib_core netns_mode=%d\n", modeValue)
Perhaps add a comment to the beginning of the file, like:
# This file is managed by sriov-network-operator do not edit.
@@ -429,6 +429,16 @@ func (dn *Daemon) nodeStateSyncHandler() error {
		reqReboot = reqReboot || r
	}

	if dn.currentNodeState.Status.System.RdmaMode != dn.desiredNodeState.Spec.System.RdmaMode {
We need to handle the case when dn.desiredNodeState.Spec.System.RdmaMode is empty (system default).
In this case we need to delete the file if it's present and decide whether a reboot is needed depending on the current kernel default.
root# modinfo ib_core
filename: /lib/modules/5.15.0-121-generic/kernel/drivers/infiniband/core/ib_core.ko
alias: rdma-netlink-subsys-4
license: Dual BSD/GPL
description: core kernel InfiniBand API
author: Roland Dreier
alias: net-pf-16-proto-20
alias: rdma-netlink-subsys-5
srcversion: C45D89EC6DCCFE96001D79F
depends:
retpoline: Y
intree: Y
name: ib_core
vermagic: 5.15.0-121-generic SMP mod_unload modversions
sig_id: PKCS#7
signer: Build time autogenerated kernel key
sig_key: 5E:7B:57:CA:17:D7:74:58:75:3F:84:AD:DE:07:46:5C:DC:AD:16:4E
sig_hashalgo: sha512
signature: AE:90:AA:07:BB:6C:07:8C:AD:25:51:4B:1A:C6:FC:9F:D1:14:5B:B9:
90:F0:F5:84:E6:85:10:7E:AD:79:B5:04:5E:38:CF:5F:EC:6C:CD:BD:
E5:BD:4D:4A:5D:7F:76:56:5E:DA:F0:C3:EA:63:98:0A:EE:B8:51:06:
42:8F:FD:08:51:28:DC:AD:4A:38:2E:A4:C4:7C:9E:42:4F:37:98:AD:
4D:8F:7F:5C:5C:41:93:27:62:C2:A1:D8:A0:5E:D5:15:25:5A:B9:C6:
8C:4D:17:CC:1F:A1:72:FE:18:5C:08:55:64:E6:A2:A7:2C:DD:57:1D:
03:A1:8C:12:17:76:61:72:E7:F9:A4:8F:F9:26:8F:36:02:8F:C6:56:
7B:A4:9E:6D:1D:ED:28:0E:7A:B5:81:F2:F0:FC:C4:05:0F:37:44:D3:
C6:F4:00:B9:81:E2:32:EB:9B:1B:8E:EF:E5:CA:73:8F:4D:5E:11:80:
51:80:EB:AD:EC:97:2D:30:15:E9:8F:6B:9B:DB:40:5F:89:99:94:B1:
01:16:82:EF:22:01:5A:0F:14:F2:DE:64:68:76:3F:8B:26:F5:E9:97:
E3:7F:DD:23:18:B2:A6:8F:8F:0F:A2:74:E1:B0:18:9F:E0:46:9F:7A:
BE:89:9C:B7:C6:D4:47:64:70:E9:28:69:DC:A1:B0:F9:CB:A3:84:67:
DF:68:A3:3D:E5:93:63:7D:91:A4:86:A9:CC:AA:DA:08:A8:64:97:D5:
CC:BB:13:BB:28:17:87:1B:10:1B:2C:43:A6:0D:A0:05:6F:DB:45:03:
1C:0B:C5:67:37:94:CB:E3:CB:CF:03:6F:81:80:F2:77:E1:FD:09:2A:
8F:0F:FE:EA:C0:B8:CD:14:D2:69:55:0F:2F:82:3D:2D:30:0B:6E:72:
42:0C:F4:AB:6C:F8:D4:CA:45:AF:74:C9:A1:5D:EC:BE:C6:8C:81:4B:
2F:F4:46:EE:F6:28:83:11:B5:0D:EE:38:53:68:EF:1E:AC:AC:A9:B0:
91:C6:76:D4:46:2E:DA:CB:47:66:99:42:84:E2:31:99:35:C2:A5:4B:
04:F8:6A:34:E7:8A:AA:76:F3:83:DF:A8:82:E9:C8:14:05:51:90:F3:
18:31:3D:A7:40:F8:EE:32:B9:F7:C2:01:9F:71:2A:B1:8C:00:34:0F:
F2:7C:DE:50:54:E3:CF:4B:EA:05:43:AF:E3:9D:A1:05:E6:A8:48:EE:
82:B7:6B:06:E3:C5:3D:AA:48:92:63:D8:7B:54:3E:F4:45:C7:5B:F6:
77:97:DD:32:93:ED:AC:DB:AD:EB:24:81:89:24:4F:25:A8:34:EA:63:
A1:D4:FC:D8:B2:B2:41:61:C3:D3:E3:F5
parm: send_queue_size:Size of send queue in number of work requests (int)
parm: recv_queue_size:Size of receive queue in number of work requests (int)
parm: netns_mode:Share device among net namespaces; default=1 (shared) (bool)
parm: force_mr:Force usage of MRs for RDMA READ/WRITE operations (bool)
Maybe parse the output of the command above for:
parm: netns_mode:Share device among net namespaces; default=1 (shared) (bool)
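Parsing that `parm:` line could be done as in the sketch below. It assumes the description text keeps the `default=<0|1>` wording shown in the `modinfo` output above; since this is free-form text from the kernel source, the approach is brittle and the helper name `defaultNetnsMode` is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// defaultNetnsMode extracts the kernel's default RDMA netns mode from
// `modinfo ib_core` output by reading the netns_mode parameter description.
func defaultNetnsMode(modinfo string) (string, error) {
	for _, line := range strings.Split(modinfo, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, "parm:") {
			continue
		}
		desc := strings.TrimSpace(strings.TrimPrefix(line, "parm:"))
		if !strings.HasPrefix(desc, "netns_mode:") {
			continue
		}
		idx := strings.Index(desc, "default=")
		if idx < 0 {
			return "", errors.New("netns_mode description has no default")
		}
		rest := desc[idx+len("default="):]
		switch {
		case strings.HasPrefix(rest, "1"):
			return "shared", nil
		case strings.HasPrefix(rest, "0"):
			return "exclusive", nil
		}
		return "", fmt.Errorf("unexpected default in %q", line)
	}
	return "", errors.New("netns_mode parameter not found")
}

func main() {
	// Excerpt of the modinfo output quoted in the comment above.
	sample := `name:           ib_core
parm:           recv_queue_size:Size of receive queue in number of work requests (int)
parm:           netns_mode:Share device among net namespaces; default=1 (shared) (bool)`
	mode, err := defaultNetnsMode(sample)
	fmt.Println(mode, err)
}
```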
For now the default is shared; I don't know how likely it is to change. Maybe we can assume shared is the default.
In general, another point we need to take care of here: creating or updating a pool will not do anything by itself; we need to wait for the nodePolicy controller to apply the change.
We have two options here:
- we pass a channel so the pool controller can trigger the policy one
- we handle the system section in the pool controller directly
I must say I am not sure which is the best option. @adrianchiris @zeeke WDYT?
/hold this doesn't work on OCP: it puts the node in a boot loop because the mode doesn't change.
We can watch the pool object as well and trigger a reconcile event.
writing files under |
Sure, that can also work, as we don't expect machine changes on it.
I checked with our kernel team on the OCP platform. @e0ne let me know if you want me to work on this and push the changes for you.
Closing this one in favor of #799.
It is now possible to configure the RDMA subsystem mode using the SR-IOV Network Operator in systemd mode.
We can't configure the RDMA subsystem in daemon mode because it must be done on the host before any network namespace is created.