Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mutex debug #866

Open
wants to merge 28 commits into
base: master
Choose a base branch
from
Open

Mutex debug #866

wants to merge 28 commits into from

Conversation

wadey
Copy link
Member

@wadey wadey commented May 9, 2023

experimental test to see if we can have a test mode that verifies
mutexes lock in the order we want, while having no hit on production
performance. Since this uses a build tag, it should all compile out
during the build process and be a no-op unless the tag is set.

This ensures that we don't do any re-entrant locking, and also that we don't grab two locks of different types unless we expect it to happen (and in a specific order)

If the locks are grabbed in a different order, the debug mutex will panic.

wadey added 3 commits May 8, 2023 11:17
experimental test to see if we can have a test mode that verifies
mutexes lock in the order we want, while having no hit on production
performance. Since this uses a build tag, it should all compile out
during the build process and be a no-op unless the tag is set.
@wadey wadey added this to the v1.8.0 milestone May 9, 2023
@wadey
Copy link
Member Author

wadey commented May 9, 2023

It looks like the test is finding a case where we have two hostinfo locks at the same time, during a relay flow:

We need to decide if this is okay, or if there is the potential for deadlock.

panic: grabbing hostinfo lock and already have a hostinfo lock: state=map[{hostinfo  176160770}:{/home/runner/work/nebula/nebula/handshake_manager.go 108}] add={hostinfo  176160896}

goroutine 112 [running]:
github.com/slackhq/nebula.checkMutex(0xc0003db020, {{0xc8edcf, 0x8}, {0x0, 0x0}, 0xa800080})
	/home/runner/work/nebula/nebula/mutex_debug.go:44 +0x425
github.com/slackhq/nebula.(*syncRWMutex).Lock(0xc0004ab4a0)
	/home/runner/work/nebula/nebula/mutex_debug.go:62 +0x97
github.com/slackhq/nebula.(*Interface).getOrHandshake(0xc0003fa5a0, 0x20?)
	/home/runner/work/nebula/nebula/inside.go:140 +0x10b
github.com/slackhq/nebula.(*Interface).Handshake(0xc00047cf50?, 0x4?)
	/home/runner/work/nebula/nebula/inside.go:113 +0x19
github.com/slackhq/nebula.(*HandshakeManager).handleOutbound(0xc0000a1200, 0xa800002, {0xde4830, 0xc0003fa5a0}, 0x0)
	/home/runner/work/nebula/nebula/handshake_manager.go:210 +0x163f
github.com/slackhq/nebula.(*HandshakeManager).NextOutboundHandshakeTimerTick(0xc0000a1200, {0xc00066ef24?, 0x39?, 0x12d82e0?}, {0xde4830, 0xc0003fa5a0})
	/home/runner/work/nebula/nebula/handshake_manager.go:99 +0x6b
github.com/slackhq/nebula.(*HandshakeManager).Run(0xc0000a1200, {0xde3cd0, 0xc0002d4d20}, {0xde4830, 0xc0003fa5a0})
	/home/runner/work/nebula/nebula/handshake_manager.go:87 +0x1a5
created by github.com/slackhq/nebula.Main
	/home/runner/work/nebula/nebula/main.go:323 +0x263d

@wadey
Copy link
Member Author

wadey commented May 11, 2023

Once we fix the first issue, this one fails next:

panic: grabbing hostinfo lock and already have a hostinfo lock: state=map[{hostinfo  176160867}:{handshake_ix.go 345}] add={hostinfo  176160867}

goroutine 66 [running, locked to thread]:
github.com/slackhq/nebula.checkMutex(0xc0005f64e0, {{0xc8ffef, 0x8}, {0x0, 0x0}, 0xa800063})
	mutex_debug.go:44 +0x425
github.com/slackhq/nebula.(*syncRWMutex).Lock(0xc000642000)
	mutex_debug.go:62 +0x97
github.com/slackhq/nebula.(*Interface).getOrHandshake(0xc0004bc360, 0x526000?)
	inside.go:140 +0x10b
github.com/slackhq/nebula.ixHandshakeStage2(0xc0004bc360, 0xc00015be40, 0x0, 0xc000526000, {0xc00036f680, 0x10f, 0x10f}, 0xc000166f60)
	handshake_ix.go:429 +0xb52
github.com/slackhq/nebula.HandleIncomingHandshake(0xc0004bc360, 0xc00015be40, 0x0?, {0xc00036f680, 0x10f, 0x10f}, 0xc000166f60, 0x0?)
	handshake.go:24 +0x185
github.com/slackhq/nebula.(*Interface).readOutsidePackets(0xc0004bc360, 0xc00015be40, 0x0?, {0xc0002e5400, 0x0, 0x2329}, {0xc00036f680, 0x10f, 0x10f}, 0xc000166f60, ...)
	outside.go:201 +0xf7b
github.com/slackhq/nebula.readOutsidePackets.func1(0xc0004c76e8?, {0xc0002e5400?, 0xc000137dd0?, 0x18?}, {0xc00036f680?, 0xbf1300?, 0xc00012b860?}, 0x7fcc4eab4408?, 0x7fcc75e5e5b8?, 0xc000137dd0, ...)
	outside.go:36 +0xb0
github.com/slackhq/nebula/udp.(*Conn).ListenOut(0xc00032a740, 0xc000265810, 0x0?, 0x0?, 0x0?)
	udp/udp_tester.go:125 +0x1c4
github.com/slackhq/nebula.(*Interface).listenOut(0xc0004bc360, 0x0)
	interface.go:256 +0x26d
created by github.com/slackhq/nebula.(*Interface).run
	interface.go:234 +0x30

@wadey wadey modified the milestones: v1.8.0, v1.9.0 Oct 30, 2023
@nbrownus nbrownus modified the milestones: v1.9.0, v1.10.0 Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants