Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't delete OVS datapath port after renaming interface #284

Open
ghost opened this issue Jul 18, 2023 · 10 comments
Open

Can't delete OVS datapath port after renaming interface #284

ghost opened this issue Jul 18, 2023 · 10 comments

Comments

@ghost
Copy link

ghost commented Jul 18, 2023

Hi,

We are hitting an issue in which we can't add a port to OVS anymore, because the netdev has been renamed while it was managed by OVS.

Steps to reproduce:

Consider the following host setup:

root@ubuntu-Virtual-Machine:/home/ubuntu# uname -arn
Linux ubuntu-Virtual-Machine 5.15.0-67-generic #74~20.04.1-Ubuntu SMP Wed Feb 22 14:52:34 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu-Virtual-Machine:/home/ubuntu# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128
eth0             UP             172.23.172.197/20 fe80::e6b5:302:7570:730e/64
docker0          DOWN           172.17.0.1/16
eth1             DOWN

and on that host we are trying to do the following:

BR=br-temp
PORT=eth1
NAME=new_eth1
ovs-vsctl add-br $BR
ip l set $PORT down
ip l set $PORT name $NAME
ovs-vsctl add-port $BR $NAME
ip l set $NAME down
ip l set $NAME name $PORT
ovs-vsctl del-port $BR $NAME 
ovs-vsctl add-port $BR $PORT

This will bring us to the error we are hitting:

ovs-vsctl: Error detected while setting up 'eth1': could not add network device eth1 to ofproto (File exists).  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch".

root@ubuntu-Virtual-Machine:/home/ubuntu# ovs-vsctl show
758a2d65-0a95-4402-a3dc-f332636b1318
    Bridge br-temp
        Port br-temp
            Interface br-temp
                type: internal
        Port eth1
            Interface eth1
                error: "could not add network device eth1 to ofproto (File exists)"
    ovs_version: "3.0.90"
root@ubuntu-Virtual-Machine:/home/ubuntu# ovs-dpctl show
system@ovs-system:
  lookups: hit:0 missed:0 lost:0
  flows: 0
  masks: hit:0 total:0 hit/pkt:0.00
  port 0: ovs-system (internal)
  port 1: br-temp (internal)
  port 2: eth1

Trying to delete the port eth1 from the userspace, or even forcing it from the datapath does not work.

root@ubuntu-Virtual-Machine:/home/ubuntu# ovs-vsctl del-port eth1
root@ubuntu-Virtual-Machine:/home/ubuntu# ovs-vsctl show
758a2d65-0a95-4402-a3dc-f332636b1318
    Bridge br-temp
        Port br-temp
            Interface br-temp
                type: internal
    ovs_version: "3.0.90"
root@ubuntu-Virtual-Machine:/home/ubuntu# ovs-dpctl show
system@ovs-system:
  lookups: hit:0 missed:0 lost:0
  flows: 0
  masks: hit:0 total:0 hit/pkt:0.00
  port 0: ovs-system (internal)
  port 1: br-temp (internal)
  port 2: eth1
root@ubuntu-Virtual-Machine:/home/ubuntu# ovs-dpctl del-if ovs-system eth1
ovs-dpctl: no port named eth1

CC: @igsilya

@ghost
Copy link
Author

ghost commented Jul 18, 2023

CC: @dickmanmaor, @girishmg

@girishmg
Copy link

this is happening because we are renaming the device owned by OVS using ip-link(1m) command and that rename is reflected on netdev unbeknownst to OVS. this causes a disconnect between ovs and kernel.

the only way to get out of this is to restart ovs-vswitchd @aserdean ?

@igsilya
Copy link
Member

igsilya commented Jul 18, 2023

Yeah, renaming the interface while it is attached to OVS is not a great idea. The main reason being that OVS database stores port names, i.e. the configuration is applied to a port by name in most cases. And many other operations inside ovs-vswitchd will expect the correct name.

Re-start may fix the problem, because OVS will remove all the ports it doesn't recognize on start up. I'm not sure why dpctl call doesn't work though. There might be some clash on the kernel side as well, I guess. You may try removing the datapath port by the port number instead: ovs-appctl dpctl/del-if system@ovs-system 2. I didn't try, but I hope that works.

Note: Please, don't use ovs-dpctl utility while ovs-vswitchd is running, use ovs-appctl dpctl/ instead.

@girishmg
Copy link

girishmg commented Jul 18, 2023

You may try removing the datapath port by the port number instead: ovs-appctl dpctl/del-if system@ovs-system 2. I didn't try, but I hope that works.

Using the port number works. Thanks @igsilya.

The larger question is whether we can do anything in OVS to prevent this? When we delete a OVS port from an OVS bridge should we try to also delete it from the datapath using the port number instead of the name? (do both, first try the name and then try the port number).

@ghost
Copy link
Author

ghost commented Jul 18, 2023

@girishmg, @igsilya thanks for the quick feedback.
Both solutions work:

  • restarting the systemd service openvswitch-switch(I'm unsure what it does behind the scenes besides restarting ovs-vswitchd)
  • deleting the port using the datapath number.

As Girish was mentioning above we should have something from the OVS side to mitigate this.

I was wondering if we could enforce operation not supported when renaming a interface which is currently managed by OVS or by auto-removing the port from the datapath since its state has changed.

FWIW we faced somewhat the same challenge on Windows and there we added two properties to figure out if port is added by both OVS and the kernel.

L.E. Sorry for using ovs-dpctl command directly, I forgot it is deprecated and it is a bad habit.

@igsilya
Copy link
Member

igsilya commented Jul 19, 2023

restarting the systemd service openvswitch-switch(I'm unsure what it does behind the scenes besides restarting ovs-vswitchd)

On restart ovs-vswitchd will dump all the ports from the datapath and remove ones that it doesn't recognize. This is done for a case where users change the database while ovs-vswitchd is down.

deleting the port using the datapath number.

As Girish was mentioning above we should have something from the OVS side to mitigate this.

I was wondering if we could enforce operation not supported when renaming a interface which is currently managed by OVS or by auto-removing the port from the datapath since its state has changed.

I don't think it's a good thing to do as we will most likely just shift the problem to another module that is not expecting the name change to fail.

One possible solution from the OVS side is to try to perform deletion of stale datapath ports every once in a while. For example, we could trigger removal of stale ports while processing ovs-vsctl port-del if we can't find a corresponding datapath port to remove. It's still a bit tricky, because sometimes it's a legitimate case where datapath port is getting removed underneath us (e.g. tunctl -d or unload of a kernel module that is responsible for a virtual interface).

I'm still not sure why the del-if with a name fails though, because it supposed to talk directly to the kernel and if the current name is used it should be able to find that port by name. Strange.

@dickmanmaor
Copy link

I'm still not sure why the del-if with a name fails though, because it supposed to talk directly to the kernel and if the current name is used it should be able to find that port by name. Strange.

In the kernel, the ovs vports are stored in hash table [1] which use the dev name as a key [2].
When vport is created it is added to hash bucket according to the original name, when calling del-if with the new name it fails because the new name is mapped to different hash bucket which doesn't contain the vport.

[1] static struct hlist_head *dev_table;

[2]
static struct hlist_head *hash_bucket(const struct net *net, const char *name)
{
unsigned int hash = jhash(name, strlen(name), (unsigned long) net);
return &dev_table[hash & (VPORT_HASH_BUCKETS - 1)];
}

@igsilya
Copy link
Member

igsilya commented Jul 19, 2023

Hmm. Thanks, @dickmanmaor !
So, it's a kernel bug then. I suppose, at least net/openvswitch/dp_notify.c:dp_device_event() should handle NETDEV_CHANGENAME by re-hashing vport table and notifying the userspace about the port change via dp_vport_genl_family.

We may also consider just removing the port from a datapath from the kernel side on rename. But garbage collection from userspace might be a better call.

@igsilya
Copy link
Member

igsilya commented Jul 19, 2023

I posted a patch that, I hope, should fix most of the issues: https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
Would be great if you can test it out in your setup.

The kernel part still needs fixing though.

ovsrobot pushed a commit to ovsrobot/ovs that referenced this issue Jul 19, 2023
OVS configuration is based on port names and OpenFlow port numbers.
Names are stored in the database and translated later to OF ports.
On the datapath level, each port has a name and a datapath port number.
Port name in the database has to match datapath port name, unless it's
a tunnel port.

If a datapath port is renamed with 'ip link set DEV name NAME',
ovs-vswitchd will wake up, destroy all the OpenFlow-related structures
and clean other things up.  This is because the port no longer
represents the port from a database due to a name difference.

However, ovs-vswitch will not actually remove the port from the
datapath, because it thinks that this port is no longer there.  This
is happening because lookup is performed by name and the name have
changed.  As a result we have a port in a datapath that is not related
to any port known to ovs-vswitchd and ovs-vswitchd can't remove it.
This port also occupies a datapath port number and prevents the port
to be added back with a new name.

Fix that by performing lookup by a datapath port number during the port
destruction.  The name was used only to avoid spurious warnings in a
normal case where the port was successfully deleted by other parts of
OVS.  Adding an extra flag to avoid these warnings instead.

Fixes: 02f8d64 ("ofproto-dpif: Query port existence by name to prevent warnings.")
Reported-at: openvswitch/ovs-issues#284
Signed-off-by: Ilya Maximets <[email protected]>
Signed-off-by: 0-day Robot <[email protected]>
@ghost
Copy link
Author

ghost commented Jul 20, 2023

Thanks for the quick fix!
I reviewed and tested the patch.

I'm unsure if I should leave the issue open until the kernel counterpart gets fixed also.

igsilya added a commit to igsilya/ovs that referenced this issue Jul 31, 2023
OVS configuration is based on port names and OpenFlow port numbers.
Names are stored in the database and translated later to OF ports.
On the datapath level, each port has a name and a datapath port number.
Port name in the database has to match datapath port name, unless it's
a tunnel port.

If a datapath port is renamed with 'ip link set DEV name NAME',
ovs-vswitchd will wake up, destroy all the OpenFlow-related structures
and clean other things up.  This is because the port no longer
represents the port from a database due to a name difference.

However, ovs-vswitch will not actually remove the port from the
datapath, because it thinks that this port is no longer there.  This
is happening because lookup is performed by name and the name have
changed.  As a result we have a port in a datapath that is not related
to any port known to ovs-vswitchd and ovs-vswitchd can't remove it.
This port also occupies a datapath port number and prevents the port
to be added back with a new name.

Fix that by performing lookup by a datapath port number during the port
destruction.  The name was used only to avoid spurious warnings in a
normal case where the port was successfully deleted by other parts of
OVS.  Adding an extra flag to avoid these warnings instead.

Fixes: 02f8d64 ("ofproto-dpif: Query port existence by name to prevent warnings.")
Reported-at: openvswitch/ovs-issues#284
Tested-by: Alin-Gabriel Serdean <[email protected]>
Acked-by: Alin-Gabriel Serdean <[email protected]>
Acked-by: Aaron Conole <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
igsilya added a commit to igsilya/ovs that referenced this issue Jul 31, 2023
OVS configuration is based on port names and OpenFlow port numbers.
Names are stored in the database and translated later to OF ports.
On the datapath level, each port has a name and a datapath port number.
Port name in the database has to match datapath port name, unless it's
a tunnel port.

If a datapath port is renamed with 'ip link set DEV name NAME',
ovs-vswitchd will wake up, destroy all the OpenFlow-related structures
and clean other things up.  This is because the port no longer
represents the port from a database due to a name difference.

However, ovs-vswitch will not actually remove the port from the
datapath, because it thinks that this port is no longer there.  This
is happening because lookup is performed by name and the name have
changed.  As a result we have a port in a datapath that is not related
to any port known to ovs-vswitchd and ovs-vswitchd can't remove it.
This port also occupies a datapath port number and prevents the port
to be added back with a new name.

Fix that by performing lookup by a datapath port number during the port
destruction.  The name was used only to avoid spurious warnings in a
normal case where the port was successfully deleted by other parts of
OVS.  Adding an extra flag to avoid these warnings instead.

Fixes: 02f8d64 ("ofproto-dpif: Query port existence by name to prevent warnings.")
Reported-at: openvswitch/ovs-issues#284
Tested-by: Alin-Gabriel Serdean <[email protected]>
Acked-by: Alin-Gabriel Serdean <[email protected]>
Acked-by: Aaron Conole <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
igsilya added a commit to igsilya/ovs that referenced this issue Jul 31, 2023
OVS configuration is based on port names and OpenFlow port numbers.
Names are stored in the database and translated later to OF ports.
On the datapath level, each port has a name and a datapath port number.
Port name in the database has to match datapath port name, unless it's
a tunnel port.

If a datapath port is renamed with 'ip link set DEV name NAME',
ovs-vswitchd will wake up, destroy all the OpenFlow-related structures
and clean other things up.  This is because the port no longer
represents the port from a database due to a name difference.

However, ovs-vswitch will not actually remove the port from the
datapath, because it thinks that this port is no longer there.  This
is happening because lookup is performed by name and the name have
changed.  As a result we have a port in a datapath that is not related
to any port known to ovs-vswitchd and ovs-vswitchd can't remove it.
This port also occupies a datapath port number and prevents the port
to be added back with a new name.

Fix that by performing lookup by a datapath port number during the port
destruction.  The name was used only to avoid spurious warnings in a
normal case where the port was successfully deleted by other parts of
OVS.  Adding an extra flag to avoid these warnings instead.

Fixes: 02f8d64 ("ofproto-dpif: Query port existence by name to prevent warnings.")
Reported-at: openvswitch/ovs-issues#284
Tested-by: Alin-Gabriel Serdean <[email protected]>
Acked-by: Alin-Gabriel Serdean <[email protected]>
Acked-by: Aaron Conole <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
igsilya added a commit to igsilya/ovs that referenced this issue Jul 31, 2023
OVS configuration is based on port names and OpenFlow port numbers.
Names are stored in the database and translated later to OF ports.
On the datapath level, each port has a name and a datapath port number.
Port name in the database has to match datapath port name, unless it's
a tunnel port.

If a datapath port is renamed with 'ip link set DEV name NAME',
ovs-vswitchd will wake up, destroy all the OpenFlow-related structures
and clean other things up.  This is because the port no longer
represents the port from a database due to a name difference.

However, ovs-vswitch will not actually remove the port from the
datapath, because it thinks that this port is no longer there.  This
is happening because lookup is performed by name and the name have
changed.  As a result we have a port in a datapath that is not related
to any port known to ovs-vswitchd and ovs-vswitchd can't remove it.
This port also occupies a datapath port number and prevents the port
to be added back with a new name.

Fix that by performing lookup by a datapath port number during the port
destruction.  The name was used only to avoid spurious warnings in a
normal case where the port was successfully deleted by other parts of
OVS.  Adding an extra flag to avoid these warnings instead.

Fixes: 02f8d64 ("ofproto-dpif: Query port existence by name to prevent warnings.")
Reported-at: openvswitch/ovs-issues#284
Tested-by: Alin-Gabriel Serdean <[email protected]>
Acked-by: Alin-Gabriel Serdean <[email protected]>
Acked-by: Aaron Conole <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
igsilya added a commit to igsilya/ovs that referenced this issue Jul 31, 2023
OVS configuration is based on port names and OpenFlow port numbers.
Names are stored in the database and translated later to OF ports.
On the datapath level, each port has a name and a datapath port number.
Port name in the database has to match datapath port name, unless it's
a tunnel port.

If a datapath port is renamed with 'ip link set DEV name NAME',
ovs-vswitchd will wake up, destroy all the OpenFlow-related structures
and clean other things up.  This is because the port no longer
represents the port from a database due to a name difference.

However, ovs-vswitch will not actually remove the port from the
datapath, because it thinks that this port is no longer there.  This
is happening because lookup is performed by name and the name have
changed.  As a result we have a port in a datapath that is not related
to any port known to ovs-vswitchd and ovs-vswitchd can't remove it.
This port also occupies a datapath port number and prevents the port
to be added back with a new name.

Fix that by performing lookup by a datapath port number during the port
destruction.  The name was used only to avoid spurious warnings in a
normal case where the port was successfully deleted by other parts of
OVS.  Adding an extra flag to avoid these warnings instead.

Fixes: 02f8d64 ("ofproto-dpif: Query port existence by name to prevent warnings.")
Reported-at: openvswitch/ovs-issues#284
Tested-by: Alin-Gabriel Serdean <[email protected]>
Acked-by: Alin-Gabriel Serdean <[email protected]>
Acked-by: Aaron Conole <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
hzhou8 pushed a commit to hzhou8/ovs that referenced this issue Apr 8, 2024
OVS configuration is based on port names and OpenFlow port numbers.
Names are stored in the database and translated later to OF ports.
On the datapath level, each port has a name and a datapath port number.
Port name in the database has to match datapath port name, unless it's
a tunnel port.

If a datapath port is renamed with 'ip link set DEV name NAME',
ovs-vswitchd will wake up, destroy all the OpenFlow-related structures
and clean other things up.  This is because the port no longer
represents the port from a database due to a name difference.

However, ovs-vswitch will not actually remove the port from the
datapath, because it thinks that this port is no longer there.  This
is happening because lookup is performed by name and the name have
changed.  As a result we have a port in a datapath that is not related
to any port known to ovs-vswitchd and ovs-vswitchd can't remove it.
This port also occupies a datapath port number and prevents the port
to be added back with a new name.

Fix that by performing lookup by a datapath port number during the port
destruction.  The name was used only to avoid spurious warnings in a
normal case where the port was successfully deleted by other parts of
OVS.  Adding an extra flag to avoid these warnings instead.

Fixes: 02f8d64 ("ofproto-dpif: Query port existence by name to prevent warnings.")
Reported-at: openvswitch/ovs-issues#284
Tested-by: Alin-Gabriel Serdean <[email protected]>
Acked-by: Alin-Gabriel Serdean <[email protected]>
Acked-by: Aaron Conole <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
roseoriorden pushed a commit to roseoriorden/ovs that referenced this issue Jul 1, 2024
OVS configuration is based on port names and OpenFlow port numbers.
Names are stored in the database and translated later to OF ports.
On the datapath level, each port has a name and a datapath port number.
Port name in the database has to match datapath port name, unless it's
a tunnel port.

If a datapath port is renamed with 'ip link set DEV name NAME',
ovs-vswitchd will wake up, destroy all the OpenFlow-related structures
and clean other things up.  This is because the port no longer
represents the port from a database due to a name difference.

However, ovs-vswitch will not actually remove the port from the
datapath, because it thinks that this port is no longer there.  This
is happening because lookup is performed by name and the name have
changed.  As a result we have a port in a datapath that is not related
to any port known to ovs-vswitchd and ovs-vswitchd can't remove it.
This port also occupies a datapath port number and prevents the port
to be added back with a new name.

Fix that by performing lookup by a datapath port number during the port
destruction.  The name was used only to avoid spurious warnings in a
normal case where the port was successfully deleted by other parts of
OVS.  Adding an extra flag to avoid these warnings instead.

Fixes: 02f8d64 ("ofproto-dpif: Query port existence by name to prevent warnings.")
Reported-at: openvswitch/ovs-issues#284
Tested-by: Alin-Gabriel Serdean <[email protected]>
Acked-by: Alin-Gabriel Serdean <[email protected]>
Acked-by: Aaron Conole <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants