forked from ovn-org/ovn
Commit
extend-table: Fix table ID double allocation after OVS restart.
After an OVS restart, flow bundle installation from ovn-controller occasionally failed with a "GROUP_EXISTS" error, leaving flows, groups, and meters missing in OVS until ovn-controller was restarted.

Example error logs in OVS:

2022-07-08T01:38:22.837Z|00676|connmgr|INFO|br-int<->unix#0: sending OFPGMFC_GROUP_EXISTS error reply to OFPT_BUNDLE_ADD_MESSAGE message
2022-07-08T01:38:22.913Z|00677|connmgr|INFO|br-int<->unix#0: sending OFPBFC_MSG_FAILED error reply to OFPT_BUNDLE_CONTROL message

The root cause is that, with ovn-ofctrl-wait-before-clear set, the ofctrl module calls ovn_extend_table_clear() to clear the "existing" group table AFTER ovn-controller has finished computing the desired flows/groups/meters in the state S_CLEAR. However, ovn_extend_table_clear() also clears the bitmap of group IDs while those IDs are still in use by the "desired" group table. This is not a problem if a recompute happens soon: the desired group table is cleared first, the IDs are reallocated, and the bitmap again reflects the actual allocations. However, if any group creation (related to LB, ECMP, etc.) happens before the recompute, newly allocated IDs may conflict with existing IDs, because the cleared bitmap no longer reflects the IDs actually in use. The conflicting IDs cause OVS to reply with the "GROUP_EXISTS" error when ovn-controller tries to install the desired groups. Even worse, because group modifications are now wrapped in a bundle with flow modifications, the failure results in not only missing groups but also missing flows.

The desired table and the existing table share the same bitmap, to avoid reallocating an ID that still exists in OVS, but the current logic appears to assume that "existing" table entries are always deleted AFTER the "desired" entries. This assumption no longer holds since the introduction of the ovn-ofctrl-wait-before-clear feature.
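The failure sequence described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `toy_table`, `toy_alloc_desired()`, `toy_clear_existing_buggy()`, and the 8-entry ID space are hypothetical stand-ins and not the actual OVN extend-table code. It shows how wiping a shared bitmap while "desired" entries still hold IDs lets the next allocation return an ID that is already in use.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_IDS 8   /* hypothetical tiny ID space for the sketch */

/* Hypothetical stand-in for an extend table: one shared ID bitmap plus
 * a record of which IDs each of the two tables currently holds. */
struct toy_table {
    bool id_in_use[MAX_IDS];   /* shared allocation bitmap */
    bool desired[MAX_IDS];     /* IDs held by "desired" entries */
    bool existing[MAX_IDS];    /* IDs held by "existing" entries */
};

/* Allocate the lowest free ID (0 is treated as reserved) for a desired
 * entry.  Returns 0 if no ID is free. */
static uint32_t toy_alloc_desired(struct toy_table *t)
{
    for (uint32_t id = 1; id < MAX_IDS; id++) {
        if (!t->id_in_use[id]) {
            t->id_in_use[id] = true;
            t->desired[id] = true;
            return id;
        }
    }
    return 0;
}

/* Buggy clear of the "existing" table: it also wipes the whole shared
 * bitmap, even though desired entries may still hold IDs. */
static void toy_clear_existing_buggy(struct toy_table *t)
{
    memset(t->existing, 0, sizeof t->existing);
    memset(t->id_in_use, 0, sizeof t->id_in_use);
}

/* Returns true if the sequence hands out an ID that a desired entry
 * already holds, i.e. a double allocation. */
static bool toy_demo_conflict(void)
{
    struct toy_table t;
    memset(&t, 0, sizeof t);

    uint32_t first = toy_alloc_desired(&t);   /* desired state computed */
    toy_clear_existing_buggy(&t);             /* bitmap wiped afterwards */
    bool still_held = t.desired[first];       /* desired entry keeps its ID */
    uint32_t second = toy_alloc_desired(&t);  /* new group before recompute */

    return still_held && second == first;
}
```

In this model `toy_demo_conflict()` reports a conflict because the second allocation consults only the wiped bitmap, which is the double allocation that OVS rejects with the group-exists error.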
The fix is to introduce a reference between the desired and existing entries, so that when an entry is deleted from either table, it knows whether its peer still uses the ID and can decide whether it is the right time to clear the bit from the bitmap, regardless of the order of deletion.

Fixes: 896adfd ("ofctrl: Support ovn-ofctrl-wait-before-clear to reduce down time during upgrade.")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2112111
Signed-off-by: Han Zhou <[email protected]>
Acked-by: Numan Siddique <[email protected]>
(cherry picked from commit db15cf2)
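The peer-reference idea can be sketched roughly as follows. The names `toy_entry`, `toy_ids`, `toy_link()`, and `toy_remove()` are hypothetical and chosen for illustration only; the actual patch may structure this differently. The point is that an ID bit is released only when the entry being removed has no peer left, so the result does not depend on which table is emptied first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IDS 8   /* hypothetical tiny ID space for the sketch */

/* Hypothetical shared ID bitmap. */
struct toy_ids {
    bool in_use[MAX_IDS];
};

/* Hypothetical table entry.  'peer' points at the entry with the same
 * ID in the other table (desired <-> existing), or is NULL. */
struct toy_entry {
    uint32_t id;
    struct toy_entry *peer;
};

/* Link a desired entry and an existing entry that share one ID. */
static void toy_link(struct toy_entry *desired, struct toy_entry *existing)
{
    desired->peer = existing;
    existing->peer = desired;
}

/* Remove an entry from its table.  The ID bit is released only if no
 * peer still holds the ID.  Returns true if the bit was released. */
static bool toy_remove(struct toy_ids *ids, struct toy_entry *e)
{
    if (e->peer) {
        /* The peer still uses the ID: just break the link. */
        e->peer->peer = NULL;
        e->peer = NULL;
        return false;
    }
    ids->in_use[e->id] = false;
    return true;
}

/* Returns true if removing the existing entry BEFORE the desired one
 * keeps the ID reserved until the last holder is gone. */
static bool toy_demo_order_independent(void)
{
    struct toy_ids ids = { { false } };
    struct toy_entry desired = { 3, NULL };
    struct toy_entry existing = { 3, NULL };

    ids.in_use[3] = true;
    toy_link(&desired, &existing);

    bool released_early = toy_remove(&ids, &existing);
    bool held_after_first = ids.in_use[3];
    bool released_late = toy_remove(&ids, &desired);

    return !released_early && held_after_first
           && released_late && !ids.in_use[3];
}
```

Swapping the two `toy_remove()` calls gives the same final state, which is the order independence the fix aims for.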
Showing 3 changed files with 73 additions and 37 deletions.