[windows] Port creation may fail with error "could not add network device xxx to ofproto (Invalid argument)" in containerd environment #343

Open
twofish197 opened this issue Oct 2, 2024 · 3 comments

@twofish197

We deployed a large Windows cluster with 3 Ubuntu control-plane nodes and 100+ Windows worker nodes.

On a Windows node, when a containerd pod is created, a vNIC is created on the host and the OVS port type is set to internal to connect OVS and the pod. During the test, port creation failed on some Windows nodes (2%-3%). Below is the output of "ovs-vsctl show" on a failed node.

    Bridge br-int
        datapath_type: system
        Port antrea-gw0
            Interface antrea-gw0
                type: internal
        Port antrea-tun0
            Interface antrea-tun0
                type: geneve
                options: {key=flow, local_ip="10.244.3.24", remote_ip=flow}
        Port eth0
            Interface eth0
        Port br-int
            Interface br-int
                type: internal
        Port vsphere--c546b0
            Interface vsphere--c546b0
                type: internal
                error: "could not add network device vsphere--c546b0 to ofproto (Invalid argument)"
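With 100+ worker nodes, it may help to scan the "ovs-vsctl show" output for failed ports automatically. The following is a minimal sketch, not part of the original report: the function name `find_failed_interfaces` is hypothetical, and it assumes the output has the same "Port / Interface / error:" layout as shown above.

```python
SAMPLE = """\
    Bridge br-int
        datapath_type: system
        Port antrea-gw0
            Interface antrea-gw0
                type: internal
        Port vsphere--c546b0
            Interface vsphere--c546b0
                type: internal
                error: "could not add network device vsphere--c546b0 to ofproto (Invalid argument)"
"""


def find_failed_interfaces(show_output: str) -> dict[str, str]:
    """Map interface name -> error text found in `ovs-vsctl show` output.

    Hypothetical helper: assumes each `error:` line belongs to the most
    recent `Interface` line, as in the output quoted in this issue.
    """
    failed: dict[str, str] = {}
    current = None
    for raw in show_output.splitlines():
        line = raw.strip()
        if line.startswith("Port "):
            # A new port starts; forget the previous interface.
            current = None
        elif line.startswith("Interface "):
            current = line[len("Interface "):].strip().strip('"')
        elif line.startswith("error:") and current is not None:
            failed[current] = line[len("error:"):].strip().strip('"')
    return failed


if __name__ == "__main__":
    for name, err in find_failed_interfaces(SAMPLE).items():
        print(f"{name}: {err}")
```

On a real node, the text would come from running "ovs-vsctl show" and capturing its output instead of using `SAMPLE`.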

The port can be recovered by any ovsdb-server configuration change; restarting ovs-vswitchd also fixes the issue. Below are the commands used (just one example).

ovs-vsctl.exe --no-wait add-port br-int podvif38 -- set interface podvif38 
ovs-vsctl.exe --no-wait del-port br-int podvif38
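The add/delete workaround above could be wrapped in a small script for use across many nodes. This is only a sketch under assumptions: the dummy port name `resync-dummy0` and the helper names are hypothetical, and the trailing "-- set interface" clause is truncated in the report, so it is omitted here. Whether the workaround still takes effect without that clause has not been verified.

```python
import subprocess


def nudge_commands(bridge="br-int", dummy_port="resync-dummy0",
                   ovs_vsctl="ovs-vsctl.exe"):
    """Build the add-port/del-port pair that forces an ovsdb config change.

    `dummy_port` is a hypothetical throwaway name; the issue used podvif38.
    """
    return [
        [ovs_vsctl, "--no-wait", "add-port", bridge, dummy_port],
        [ovs_vsctl, "--no-wait", "del-port", bridge, dummy_port],
    ]


def nudge(run=subprocess.run, **kwargs):
    """Run both commands in order, raising if either exits non-zero.

    `run` is injectable so the sequence can be checked without OVS installed.
    """
    for cmd in nudge_commands(**kwargs):
        run(cmd, check=True)
```

Calling `nudge()` on an affected node would run both commands in sequence; passing a stub as `run` allows a dry run on a machine without OVS.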

Below is the complete ovs-vswitchd.log from a failed node.
@twofish197

Attached the log from the failed node: ovs-vswitchd_failed_port_allocating.log

@twofish197

After debugging on some failed Windows VMs, this appears to be a known issue that already has a fix in the commit below.

So it is likely that OVS on Windows deliberately blocks some port allocations to avoid an unrecoverable state.

netdev-windows: Add checking when creating netdev with system type on Windows
openvswitch/ovs@1cdc052

Quoting the bug description here:
Some system type port will be created netdev successfully and it will cause conflict as in the dpif side it will be internal type. So finally the port will be created failed and it could not be easily recovered.

With the patch, on Windows the netdev creating will be blocked for system type when the ovs_type got on dpif is internal. More detailed case description is in the reported issue No.262 with link below.
#262

With the current OVS Windows logic, recovering from a failed port add requires an extra configuration change on ovsdb-server. We could check whether some logic can be added to OVS userspace to resync automatically when OVS on Windows blocks a port add.
This will be tracked by this upstream OVS issue.
