feat: enhancing protocol management with mutex locks #2137

Open
danisharora099 wants to merge 22 commits into master from fix/peer-management


Collaborator

@danisharora099 danisharora099 commented Sep 17, 2024

Problem

The current implementation showed several inconsistencies in peer management for protocols.
Several unexpected behaviours were observed:

  • a protocol could not find new peers in the available pool even though they existed
  • peer renewal for a protocol failed, but the Peer Manager count did not reflect that
  • a protocol sometimes reported far more available peers than were actually connected

Solution

  • Implemented mutex locks to ensure thread-safe protocol management
  • Refactored peer management methods in the BaseProtocol class to enhance performance and reliability
  • Improved error handling and logging in the FilterCore class
  • Used async-mutex for all lock-related work
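
To illustrate the kind of race a lock removes, here is a minimal sketch rather than the PR's actual code. `SimpleMutex` is a hand-rolled stand-in for a library lock such as async-mutex, and `PeerPool`, `addUnsafe`, `addSafe`, and `maxPeers` are hypothetical names used only for this example. Without the lock, two concurrent calls both pass the capacity check before either one records its peer, which is one way a peer count can end up larger than intended.

```typescript
// Minimal promise-chain mutex (illustrative stand-in for a library lock).
class SimpleMutex {
  private tail: Promise<void> = Promise.resolve();

  // Runs `task` only after every previously queued task has settled.
  public runExclusive<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Keep the chain usable even if the task rejects.
    this.tail = result.then(
      () => undefined,
      () => undefined
    );
    return result;
  }
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical peer pool; names and shape are assumptions for this sketch.
class PeerPool {
  private peers: string[] = [];
  private readonly mutex = new SimpleMutex();

  public constructor(private readonly maxPeers: number) {}

  // Unsafe: an await separates the capacity check from the push,
  // so two concurrent calls can both pass the check.
  public async addUnsafe(peerId: string): Promise<boolean> {
    if (this.peers.length >= this.maxPeers) return false;
    await sleep(1); // simulates async work such as dialing the peer
    this.peers.push(peerId);
    return true;
  }

  // Safe: the check and the push run as one critical section.
  public addSafe(peerId: string): Promise<boolean> {
    return this.mutex.runExclusive(() => this.addUnsafe(peerId));
  }

  public get count(): number {
    return this.peers.length;
  }
}

async function runPoolDemo(): Promise<{ unsafeCount: number; safeCount: number }> {
  const unsafePool = new PeerPool(1);
  await Promise.all([unsafePool.addUnsafe("a"), unsafePool.addUnsafe("b")]);

  const safePool = new PeerPool(1);
  await Promise.all([safePool.addSafe("a"), safePool.addSafe("b")]);

  return { unsafeCount: unsafePool.count, safeCount: safePool.count };
}
```

With a limit of one peer, the unlocked pool ends up holding two peers while the locked pool holds one.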

Notes

Contribution checklist:

  • covered by unit tests;
  • covered by e2e test;
  • add ! in title if breaks public API;


github-actions bot commented Sep 17, 2024

size-limit report 📦

| Path | Size | Loading time (3g) | Running time (snapdragon) | Total time |
| --- | --- | --- | --- | --- |
| Waku node | 84.45 KB (+1% 🔺) | 1.7 s (+1% 🔺) | 406 ms (-0.64% 🔽) | 2.1 s |
| Waku Simple Light Node | 136.45 KB (+0.88% 🔺) | 2.8 s (+0.88% 🔺) | 877 ms (-1.5% 🔽) | 3.7 s |
| ECIES encryption | 22.92 KB (-0.11% 🔽) | 459 ms (-0.11% 🔽) | 208 ms (+3.86% 🔺) | 666 ms |
| Symmetric encryption | 22.39 KB (-0.02% 🔽) | 448 ms (-0.02% 🔽) | 248 ms (-2.14% 🔽) | 696 ms |
| DNS discovery | 72.35 KB (+0.11% 🔺) | 1.5 s (+0.11% 🔺) | 326 ms (-21.37% 🔽) | 1.8 s |
| Peer Exchange discovery | 73.65 KB (-0.28% 🔽) | 1.5 s (-0.28% 🔽) | 385 ms (-43.39% 🔽) | 1.9 s |
| Local Peer Cache Discovery | 67.56 KB (-0.11% 🔽) | 1.4 s (-0.11% 🔽) | 574 ms (-13.87% 🔽) | 2 s |
| Privacy preserving protocols | 74.84 KB (+0.07% 🔺) | 1.5 s (+0.07% 🔺) | 610 ms (-8.07% 🔽) | 2.2 s |
| Waku Filter | 79.38 KB (+0.89% 🔺) | 1.6 s (+0.89% 🔺) | 531 ms (-0.91% 🔽) | 2.2 s |
| Waku LightPush | 77.33 KB (+0.58% 🔺) | 1.6 s (+0.58% 🔺) | 652 ms (+47.08% 🔺) | 2.2 s |
| History retrieval protocols | 76.55 KB (+0.69% 🔺) | 1.6 s (+0.69% 🔺) | 548 ms (-2.26% 🔽) | 2.1 s |
| Deterministic Message Hashing | 7.38 KB (0%) | 148 ms (0%) | 148 ms (+42.49% 🔺) | 296 ms |

@danisharora099 danisharora099 force-pushed the fix/peer-management branch 2 times, most recently from 4de00a5 to aeb05cd Compare September 24, 2024 05:58
@danisharora099 danisharora099 changed the title from "chore: fix" to "feat: enhancing protocol management with mutex locks" Sep 30, 2024
@danisharora099 danisharora099 marked this pull request as ready for review October 1, 2024 13:18
@danisharora099 danisharora099 requested a review from a team as a code owner October 1, 2024 13:18
@@ -45,7 +45,7 @@ class Metadata extends BaseProtocol implements IMetadata {
pubsubTopicsToShardInfo(this.pubsubTopics)
);

- const peer = await this.peerStore.get(peerId);
+ const peer = await this.libp2pComponents.peerStore.get(peerId);
Collaborator

why this change?

Collaborator Author

We already had access to libp2pComponents on the base class; I just changed it from private to protected and removed the additional PeerStore argument.

@@ -18,7 +18,7 @@ export class ReliabilityMonitorManager {
public static createReceiverMonitor(
pubsubTopic: PubsubTopic,
getPeers: () => Peer[],
renewPeer: (peerId: PeerId) => Promise<Peer>,
Collaborator

In which cases can it be undefined? I would expect it to always be provided.

Collaborator Author

If no new peer is found, this function will indeed return undefined:

const newPeer = await this.peerManager.findPeers(1);
if (newPeer.length === 0) {
  this.log.error(
    "Failed to find a new peer to replace the disconnected one"
  );
  return undefined;
}
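
A caller then has to handle the missing peer explicitly. The sketch below assumes the callback is typed as returning `Promise<Peer | undefined>`; the simplified `Peer` and `PeerId` types, `RenewPeer`, and `describeRenewal` are hypothetical and exist only to show the narrowing.

```typescript
// Hypothetical minimal types; the real Peer and PeerId come from libp2p.
type PeerId = string;
interface Peer {
  id: PeerId;
}

// Assumed shape of the renewal callback: undefined means no replacement was found.
type RenewPeer = (peerId: PeerId) => Promise<Peer | undefined>;

// Attempts a renewal and reports the outcome without dereferencing undefined.
async function describeRenewal(
  failing: PeerId,
  renewPeer: RenewPeer
): Promise<string> {
  const replacement = await renewPeer(failing);
  if (replacement === undefined) {
    // Nothing to swap in: the caller can retry later or surface the failure.
    return `no replacement found for ${failing}`;
  }
  return `replaced ${failing} with ${replacement.id}`;
}
```

Typing the result as a union makes the compiler reject any use of the renewed peer that skips the `undefined` check.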

@@ -54,6 +54,9 @@ class FilterSDK extends BaseProtocolSDK implements IFilterSDK {

await subscription.processIncomingMessage(wakuMessage, peerIdStr);
},
async (error: Error) => {
Collaborator

I think this is not needed

@@ -1,4 +1,4 @@
export const DEFAULT_KEEP_ALIVE = 30 * 1000;
Collaborator

Ideally we should be fine with increasing this number, not decreasing.
How did you reach this number?

@@ -36,10 +35,6 @@ export type NetworkConfig = StaticSharding | AutoSharding;
* Options for using LightPush and Filter
*/
export type ProtocolUseOptions = {
/**
Collaborator

Why are these properties removed?

Collaborator Author

Not required for the backoff

sortPeersByLatency
} from "@waku/utils/libp2p";
import { Logger } from "@waku/utils";
import { getPeersForProtocol, sortPeersByLatency } from "@waku/utils/libp2p";
Collaborator

@weboko weboko Oct 2, 2024

nit: these utils are used in only one place, so let's move them there

Collaborator Author

let's do it in a different PR

);
this.updatePeers(updatedPeers);
await this.connectionManager.dropConnection(peerToDisconnect);
await this.peerManager.removePeer(peerToDisconnect);
Collaborator

Maybe the .replace operation should be implemented with a lock inside? That should be enough, and we could avoid using an extra lib.

Collaborator Author

removePeer indeed has a lock inside
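
For reference, the reviewer's suggestion of a replace operation with its own lock and no extra library could look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: `LockedPeerSet`, `replace`, and `snapshot` are hypothetical names, and the lock is a plain promise chain.

```typescript
// Hypothetical peer set with a built-in lock; no external library is used.
class LockedPeerSet {
  private readonly peers = new Set<string>();
  private queue: Promise<void> = Promise.resolve();

  public constructor(initial: string[]) {
    initial.forEach((peer) => this.peers.add(peer));
  }

  // Runs `task` after all previously queued tasks have settled.
  private locked<T>(task: () => Promise<T>): Promise<T> {
    const run = this.queue.then(task);
    this.queue = run.then(
      () => undefined,
      () => undefined
    );
    return run;
  }

  // Remove and add happen in one critical section, so no other locked
  // operation sees the set with the old peer gone and the new one missing.
  public replace(oldPeer: string, newPeer: string): Promise<boolean> {
    return this.locked(async () => {
      if (!this.peers.has(oldPeer)) return false;
      await Promise.resolve(); // stands in for async work such as dropping a connection
      this.peers.delete(oldPeer);
      this.peers.add(newPeer);
      return true;
    });
  }

  // Reads also go through the lock so they never observe a half-done replace.
  public snapshot(): Promise<string[]> {
    return this.locked(async () => Array.from(this.peers).sort());
  }
}
```

If two callers try to replace the same peer concurrently, the second one finds the old peer already gone and returns false instead of adding a second replacement.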

{
payload: utf8ToBytes("Hello_World")
},
{ forceUseAllPeers: true }
Collaborator

Why is this added?

Collaborator Author

So that PeerManager is forced to keep finding peers until numPeersToUse is reached, and the call awaits that process.

Collaborator

@weboko weboko left a comment

added comments

@danisharora099 danisharora099 requested a review from a team October 3, 2024 11:04
Development

Successfully merging this pull request may close these issues.

  • bug: inconsistent protocol peer management
  • bug: Filter subscriptions are not stable and missing messages
2 participants