-
Notifications
You must be signed in to change notification settings - Fork 289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cluster info on geyser #46
Add cluster info on geyser #46
Conversation
add cluster notifier registration and test Add cluster node remove geyzer notification rebase on master and correct PR comments fix rebase, fmt, clippy
c15fe52
to
897e0bc
Compare
@godmodegalactus -- sorry for getting to this late. Can you address the conflicts? |
why do we need to stream contact info for every gossip update? could we get away with only streaming contact info when an actual also, do we need to stream contact info when a |
Yeah i think you are right. I will pick up this PR again and try to resolve all the issues. |
if let CrdsData::LegacyContactInfo(ref cluster_info) = value.value.data { | ||
let notifier = &clusterinfo_update_notifier.read().unwrap(); | ||
notifier.notify_clusterinfo_update(cluster_info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh jk I see you already set it up to update only on reception of LegacyContactInfo. but ya probably don't need to send an update if the contact info fails to insert
//notify geyser interface | ||
self.notify_clusterinfo_update(self.table.get(&gossip_label)); | ||
ret |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Crds::insert requires an exclusive lock on the crds table.
So we shouldn't put the notification code here.
Instead somewhere else, e.g. here:
https://github.com/anza-xyz/agave/blob/7629a152d/gossip/src/cluster_info.rs#L1618
we need to read locks crds table and obtain new entries since last time using a crds::Cursor
.
similar to the code here:
https://github.com/anza-xyz/agave/blob/7629a152d/gossip/src/crds_gossip_push.rs#L184-L190
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alternatively somewhere in geyser can pull
new contact-infos every once in a while or when needed.
for example in turbine we cache contact-infos for 5 seconds and if 5 seconds have passed we pull
again all contact-infos from gossip:
https://github.com/anza-xyz/agave/blob/7629a152d/turbine/src/cluster_nodes.rs#L287
#[derive(Debug, Clone, PartialEq, Eq)] | ||
#[repr(C)] | ||
/// Information about a node in the cluster. | ||
pub struct ReplicaClusterInfoNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens to this struct if we change gossip ContactInfo
, e.g. add a new socket for a new protocol?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ya so that is the goal of doing something with an FFI interface. See PR: #3434. We could modify that PR and choose to only expose a pointer to a ContactInfo struct, making it much easier to update ContactInfo
in the future. More details here: #3434 (comment)
@gregcusack @lijunwangs @behzadnouri #3434 PR superseeds this PR, or should I start working on this PR ? |
We can follow up on #3434 |
Problem
See issue solana-labs#32934
Summary of Changes
This PR is a POC to add cluster node's info to the Geyser plugin notification.
The LegacyContactInfo received on GOSSIP network are forwarded with the Geyser interface to a client subscriber.
Summary of Changes
A GRPC Yellowstone plugin update has been done to test the process and see the notified data volume. You can see the PR here.
An example of client can be found here. It subscribes to the Cluster info node notification and call from time to time the RPC get_cluster_nodes to compare with the notified data.
Early results
The test has been done on Testnet cluster with the Solana version 1.17.5 and Yellowstone plugin 1.11.0+solana.1.17.5.
Comparison with RPC call
It takes a few second for the plugin to get the majority of the Testnet cluster nodes and after 30s there's less than 10 nodes in error when comparing with the RPC call (around 3000 nodes in the cluster).
After a few minutes, Geyser and RPC has the same list of cluster's nodes. From time to time they can have some difference (a few nodes) from one part or the other depending on the notification time propagation of update or removal.
Data volume
I use Tcpdump to monitor the packet send between the validator grpc server and the client to build the get_cluster_nodes RPC call nodes list.
This graphic is an example of network activity and bandwidth use. The volume of data transmitted is between 0,5 Mbits/s and 3 Mbits/s.
I've done some monitoring of the network connection between the grpc server and the client for the cluster node info notification.
This is the graphic of the data transferred between the validator grpc server and the client to build the get_cluster_nodes RPC call nodes list.
Fixes #