
Fix libp2p helper concurrent writes #14467

Merged
merged 10 commits into rampup from fix/libp2p-helper-concurrent-writes on Nov 17, 2023

Conversation

@nholland94 (Member) commented Oct 31, 2023

This PR:

  • Wraps access to a few data structures in the libp2p helper in mutex-guarded sections
  • Fixes one FIXME
  • Introduces fine-grained locking for stream writes

Fixes the race conditions mentioned in #14345
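[Editor's note: for illustration, here is a minimal sketch of the mutex-guarded access pattern the PR description refers to. The type and field names are hypothetical, not taken from the PR's code.]

```go
package libp2phelper

import "sync"

// peerTable stands in for one of the shared structures the PR guards.
type peerTable struct {
	mutex sync.Mutex
	peers map[string]struct{}
}

func newPeerTable() *peerTable {
	return &peerTable{peers: make(map[string]struct{})}
}

// addPeer wraps the map mutation in a mutex-guarded section so that
// concurrent request handlers cannot race on the underlying map.
func (t *peerTable) addPeer(id string) {
	t.mutex.Lock()
	defer t.mutex.Unlock()
	t.peers[id] = struct{}{}
}
```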

@nholland94 requested a review from a team as a code owner on October 31, 2023 19:19
@tizoc (Member) commented Nov 14, 2023

Some remaining data races when running the node with -race enabled:

==================
WARNING: DATA RACE
Read at 0x00c00010a3a8 by goroutine 371:
  main.AddPeerReq.handle()
      ./src/app/libp2p_helper/src/libp2p_helper/peer_msg.go:43 +0x23d
  main.(*AddPeerReq).handle()
      <autogenerated>:1 +0xac
  main.(*app).handleIncomingMsg.func1()
      ./src/app/libp2p_helper/src/libp2p_helper/incoming_msg.go:67 +0x473
  main.(*app).handleIncomingMsg()
      ./src/app/libp2p_helper/src/libp2p_helper/incoming_msg.go:68 +0xd0
  main.main.func5()
      ./src/app/libp2p_helper/src/libp2p_helper/main.go:191 +0x47

Previous write at 0x00c00010a3a8 by goroutine 369:
  main.AddPeerReq.handle()
      ./src/app/libp2p_helper/src/libp2p_helper/peer_msg.go:43 +0x31c
  main.(*AddPeerReq).handle()
      <autogenerated>:1 +0xac
  main.(*app).handleIncomingMsg.func1()
      ./src/app/libp2p_helper/src/libp2p_helper/incoming_msg.go:67 +0x473
  main.(*app).handleIncomingMsg()
      ./src/app/libp2p_helper/src/libp2p_helper/incoming_msg.go:68 +0xd0
  main.main.func5()
      ./src/app/libp2p_helper/src/libp2p_helper/main.go:191 +0x47

Goroutine 371 (running) created at:
  main.main()
      ./src/app/libp2p_helper/src/libp2p_helper/main.go:191 +0x95e

Goroutine 369 (running) created at:
  main.main()
      ./src/app/libp2p_helper/src/libp2p_helper/main.go:191 +0x95e
==================
==================
WARNING: DATA RACE
Read at 0x00c0002e6470 by goroutine 370:
  main.AddPeerReq.handle()
      ./src/app/libp2p_helper/src/libp2p_helper/peer_msg.go:53 +0x60f
  main.(*AddPeerReq).handle()
      <autogenerated>:1 +0xac
  main.(*app).handleIncomingMsg.func1()
      ./src/app/libp2p_helper/src/libp2p_helper/incoming_msg.go:67 +0x473
  main.(*app).handleIncomingMsg()
      ./src/app/libp2p_helper/src/libp2p_helper/incoming_msg.go:68 +0xd0
  main.main.func5()
      ./src/app/libp2p_helper/src/libp2p_helper/main.go:191 +0x47

Previous write at 0x00c0002e6470 by goroutine 369:
  main.AddPeerReq.handle()
      ./src/app/libp2p_helper/src/libp2p_helper/peer_msg.go:53 +0x724
  main.(*AddPeerReq).handle()
      <autogenerated>:1 +0xac
  main.(*app).handleIncomingMsg.func1()
      ./src/app/libp2p_helper/src/libp2p_helper/incoming_msg.go:67 +0x473
  main.(*app).handleIncomingMsg()
      ./src/app/libp2p_helper/src/libp2p_helper/incoming_msg.go:68 +0xd0
  main.main.func5()
      ./src/app/libp2p_helper/src/libp2p_helper/main.go:191 +0x47

Goroutine 370 (running) created at:
  main.main()
      ./src/app/libp2p_helper/src/libp2p_helper/main.go:191 +0x95e

Goroutine 369 (running) created at:
  main.main()
      ./src/app/libp2p_helper/src/libp2p_helper/main.go:191 +0x95e
==================

@tizoc (Member) commented Nov 14, 2023

By the way, here is a patch that adds some wrapper methods for accessing/modifying shared data structures; so far, with these extra changes, I am not getting any race conditions. The solution is not ideal, because it would be better not to have to use so many mutexes and instead have a single thread handling the state, but that would require a bigger change.

openmina@7e9f7ab

(EDIT: PR targeting this branch here)
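[Editor's note: the shape of such a wrapper method might look like the sketch below. This is the general approach only, with illustrative names, not the actual patch.]

```go
package libp2phelper

import "sync"

type subscription struct{ /* fields elided */ }

// subscriptions hides the shared map behind wrapper methods, so every
// access goes through the same mutex and callers cannot race on the map.
type subscriptions struct {
	mutex sync.RWMutex
	subs  map[uint64]subscription
}

func (s *subscriptions) Get(id uint64) (subscription, bool) {
	s.mutex.RLock()
	defer s.mutex.RUnlock()
	sub, ok := s.subs[id]
	return sub, ok
}

func (s *subscriptions) Set(id uint64, sub subscription) {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	s.subs[id] = sub
}
```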

@georgeee (Member) left a comment:

Nits

prevOnConnect(net, c)
f(net, c)
}
cm.onConnectMutex.Unlock()
Use defer idiom
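[Editor's note: the defer idiom being suggested, in general form — a sketch, not the PR's actual code.]

```go
package libp2phelper

import "sync"

type connectionManager struct {
	onConnectMutex sync.Mutex
	onConnect      func()
}

// Deferring the Unlock immediately after the Lock guarantees it runs on
// every exit path (early return or panic), instead of relying on a
// manually placed Unlock at the end of the critical section.
func (cm *connectionManager) callOnConnect() {
	cm.onConnectMutex.Lock()
	defer cm.onConnectMutex.Unlock()
	if cm.onConnect != nil {
		cm.onConnect()
	}
}
```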

prevOnDisconnect(net, c)
f(net, c)
}
cm.onDisconnectMutex.Unlock()
Use defer idiom

h.gatingState.TrustedAddrFilters = ma.NewFilters()
h.gatingState.trustedAddrFiltersMutex.Lock()
h.gatingState.trustedAddrFilters = ma.NewFilters()
h.gatingState.trustedAddrFiltersMutex.Unlock()
Use defer idiom

cm.OnConnect(net, c)
cm.onConnectMutex.Lock()
cm.onConnect(net, c)
cm.onConnectMutex.Unlock()
Use defer idiom

Use RLock
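[Editor's note: for read-only accesses such as the trustedPeers/bannedPeers lookups shown in the snippets below, the suggestion is a sync.RWMutex — RLock lets multiple readers proceed concurrently while still excluding writers. A sketch, with assumed names and a plain string key for self-containment.]

```go
package libp2phelper

import "sync"

type gatingState struct {
	trustedPeersMutex sync.RWMutex
	trustedPeers      map[string]struct{}
}

// isTrusted only reads the map, so a shared read lock suffices: any
// number of goroutines may hold RLock at once, while Lock (taken by
// writers) still excludes everyone.
func (gs *gatingState) isTrusted(p string) bool {
	gs.trustedPeersMutex.RLock()
	defer gs.trustedPeersMutex.RUnlock()
	_, ok := gs.trustedPeers[p]
	return ok
}
```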

cm.OnDisconnect(net, c)
cm.onDisconnectMutex.Lock()
cm.onDisconnect(net, c)
cm.onDisconnectMutex.Unlock()
Use defer idiom

func (gs *CodaGatingState) TrustPeer(p peer.ID) {
gs.trustedPeersMutex.Lock()
gs.trustedPeers[p] = struct{}{}
gs.trustedPeersMutex.Unlock()
Use defer idiom

_, isTrusted := gs.TrustedPeers[p]
gs.trustedPeersMutex.Lock()
_, isTrusted := gs.trustedPeers[p]
gs.trustedPeersMutex.Unlock()
Use defer idiom

_, isBanned := gs.BannedPeers[p]
gs.bannedPeersMutex.Lock()
_, isBanned := gs.bannedPeers[p]
gs.bannedPeersMutex.Unlock()
Use defer idiom

gs.TrustedPeers[testInfo.ID] = struct{}{}
gs.trustedPeersMutex.Lock()
gs.trustedPeers[testInfo.ID] = struct{}{}
gs.trustedPeersMutex.Unlock()
Use defer idiom

handleStreamReads(appB, stream, streamIdx)
streamIdx++
streamMutex.Unlock()
Move handleStreamReads out of the locked section
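[Editor's note: in other words, only the shared counter update needs to stay under the lock; the blocking handleStreamReads call should run outside the critical section. A sketch reusing the identifiers from the snippet above (streamMutex, streamIdx, handleStreamReads, appB, and stream come from that snippet, not from new code).]

```go
// Take and bump the shared index under the lock, then run the blocking
// read handler outside the critical section so it does not serialize
// all streams behind one mutex.
streamMutex.Lock()
idx := streamIdx
streamIdx++
streamMutex.Unlock()

handleStreamReads(appB, stream, idx)
```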

@georgeee force-pushed the fix/libp2p-helper-concurrent-writes branch from e64d3d5 to 5ec0334 on November 16, 2023 10:46
Problem: the RPC protocol requires the response to a stream-open RPC to arrive before any message from that stream. This was implemented with an ugly hack.

Solution: remove the hack and introduce the notion of an after-write handler, to be executed after the RPC response is written to the output.
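[Editor's note: one way to structure such an after-write handler — a sketch of the idea, not the actual implementation.]

```go
package libp2phelper

import "io"

// response pairs an RPC response with an optional callback that must run
// only after the response bytes have been written to the output.
type response struct {
	payload    []byte
	afterWrite func()
}

// writeLoop flushes responses in order; the after-write handler of a
// stream-open response can then safely start forwarding stream messages,
// since the response is guaranteed to precede them on the wire.
func writeLoop(out io.Writer, responses <-chan response) {
	for r := range responses {
		if _, err := out.Write(r.payload); err != nil {
			return // sketch: real code would handle the error
		}
		if r.afterWrite != nil {
			r.afterWrite()
		}
	}
}
```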
Problem: writing to a stream pauses all other stream writes. This can be troublesome, e.g. when a peer disconnects: until the connection timeout is propagated, we may get stuck trying to send it bytes while writes to the streams of other peers are blocked.

Solution: use fine-grained locking at the stream level.

P.S. The previous commit introduces a potential concurrency issue: stream reset and write are not synchronized. This commit fixes that issue.
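[Editor's note: a sketch of stream-level locking as described. Each stream owns its mutex, so a stalled write to one peer cannot block writes on other streams, while write and reset on the same stream stay mutually exclusive. Names are hypothetical; the import path is an assumption.]

```go
package libp2phelper

import (
	"sync"

	net "github.com/libp2p/go-libp2p/core/network" // assumed import path
)

type streamState struct {
	mutex  sync.Mutex // guards write/reset on this stream only
	stream net.Stream
}

func (s *streamState) write(data []byte) error {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	_, err := s.stream.Write(data)
	return err
}

// reset shares the same mutex as write, addressing the unsynchronized
// reset/write issue mentioned in the commit message.
func (s *streamState) reset() error {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	return s.stream.Reset()
}
```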
@georgeee (Member):

!ci-build-me

@georgeee (Member):

!ci-build-me

@georgeee (Member):

!ci-build-me

@georgeee force-pushed the fix/libp2p-helper-concurrent-writes branch from 827ecf2 to 7543edf on November 17, 2023 09:52
@georgeee force-pushed the fix/libp2p-helper-concurrent-writes branch from 7543edf to 8548686 on November 17, 2023 09:53
@georgeee (Member):

!ci-build-me

@nholland94 (Member, Author) left a comment:

LGTM!

ValidatorMutex: &sync.Mutex{},
Validators: make(map[uint64]*validationStatus),
Streams: make(map[uint64]net.Stream),
_subs: make(map[uint64]subscription),
@nholland94:

Why the underscore prefixes? We don't do this in other parts of the code.

@georgeee:
This is @tizoc's decision; I guess it is to visually discourage usage of these fields.

We may roll back the prefixes; I agree they look a bit alien to the rest of the codebase. Though I liked the idea of making the use of these fields visually uncomfortable, so I thought, why not.

@tizoc:

> I guess it is to visually discourage usage of these fields.

That was exactly it. I am not terribly familiar with Go, so I guess that naming convention is not idiomatic at all. From what I found, the proper way would be to separate this into a separate package to make those fields truly private.
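[Editor's note: what tizoc describes maps to Go's package-level visibility — unexported (lower-case) fields are inaccessible outside their defining package, so moving the state into its own package enforces the encapsulation at compile time. An illustrative sketch; the package and names are hypothetical.]

```go
// Package subs would own the subscription table; its fields are
// unexported, so code outside the package cannot bypass the mutex.
package subs

import "sync"

type Subscription struct{ /* fields elided */ }

type Table struct {
	mutex sync.Mutex
	subs  map[uint64]Subscription
}

func NewTable() *Table {
	return &Table{subs: make(map[uint64]Subscription)}
}

func (t *Table) Add(id uint64, s Subscription) {
	t.mutex.Lock()
	defer t.mutex.Unlock()
	t.subs[id] = s
}
```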

@georgeee:
👍 on moving to a different package later

@nholland94 (Member, Author):

!approved-for-mainnet

@deepthiskumar merged commit 44af5c2 into rampup on Nov 17, 2023
2 checks passed
@deepthiskumar deleted the fix/libp2p-helper-concurrent-writes branch on November 17, 2023 17:06