Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add back tranmutex #153

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Add back tranmutex #153

wants to merge 2 commits into from

Conversation

trozet
Copy link
Contributor

@trozet trozet commented Sep 10, 2021

Need a RWMutex to protect nil client usage during reconnect. Also includes a workaround for https://bugzilla.redhat.com/show_bug.cgi?id=2000375

Mutex is needed to protect client usage when client may be nil during a
reconnect. Otherwise the code will panic in multiple places.

Co-authored-by: Surya Seetharaman <[email protected]>

Signed-off-by: Tim Rozet <[email protected]>
This is a workaround due to a bug in ovsdb-server where it may silently
drop monitors:
https://bugzilla.redhat.com/show_bug.cgi?id=2000375

Related to ovnkube bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1998614

tl;dr ovsdb-server may drop the monitor to go-ovn silently which will
cause the go-ovn cache to go stale. Subsequent retry txns for the
resource will result in errors from ovsdb-server (since the resource may
have been created in ovsdb, but we are not getting updates) threfore a
txn error is encountered force reconnect to rebuild cache and get a new
monitor.

Co-authored-by: Surya Seetharaman <[email protected]>

Signed-off-by: Tim Rozet <[email protected]>
@trozet
Copy link
Contributor Author

trozet commented Sep 10, 2021

@hzhou8
Copy link
Contributor

hzhou8 commented Sep 17, 2021

Hi @trozet, the problem looks clear but I wonder if this is the right way to fix. There can be different reasons for a transaction to fail - it can be temporary. Are we going to blindly reconnect regardless of the reason?

@@ -164,7 +166,8 @@ func (odbi *ovndb) transact(db string, ops ...libovsdb.Operation) ([]libovsdb.Op
if i < len(ops) {
opsInfo = fmt.Sprintf("%v", ops[i])
}
return nil, fmt.Errorf("Transaction Failed due to an error: %v details: %v in %s",
odbi.close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't trigger reconnect if odbi.reconn is not true.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants