Generalised Gossipsub #664
Just a small structural comment: can we put those under pubsub/ at the very least? Ideally it should go into the pubsub/gossipsub/ directory.
Co-authored-by: Pop Chunhapanya <[email protected]>
Yeah sure. This is still very much a rough draft. I can move the location and fix a lot of this up if there is any interest in this.
I've done an initial pass on these documents. Thank you for writing this! I like the general idea and want to clarify a couple high level points before diving into some of the details.
- How can new strategies be sure to be compatible with other strategies and their scoring methods?
- Specifically, how can new strategies stay backwards compatible with the expectations of the existing gossipsub network?
- Generally, I wonder if we need to define rules of what peers MUST NOT do in order to remain in good standing.
- This might be the trickiest part of this proposal.
- Do we need a way to add new messages to the protocol?
- It seems to me that Choke is needed here for the "random choke" strategy. I can imagine new strategies needing new messages as well.
- example: A variant on choke that allows for new original messages to be sent, while still avoiding forwarded messages.
- My concern is overfitting for these two strategies while claiming a generic interface.
- Maybe drafting a third strategy would be enough to address this concern.
- Is the expectation that within a given topic different peers may implement different strategies depending on their own concerns?
CHOKE is also needed for implementing the announcesub proposal, where we then use eager IHAVE instead of IANNOUNCE. |
@MarcoPolo Thanks for the comments.
However, I think you're asking more or less how the general network would look, and whether different scoring strategies can work together. In general they can't. I imagine this being spec'd outside of libp2p. For example, if Ethereum wanted to use a specific strategy, they would specify it so that all of their nodes used one strategy. There is no versioning, you are right. Perhaps that might be something to add, but in the Ethereum case, we can upgrade strategies on hard forks to avoid partial upgrades.
When a message is received that is valid and was not published by the router itself, the router informs the broadcast module via the
Forward(Topic) interface.

The broadcast module will return which peers to forward the message to and which
to gossip to (if any).
This seems to contradict what is said earlier in this file.
Upon receiving a `CHOKE` message, the router MUST no longer forward messages to
the peer that sent the `CHOKE` message, while it is still in the mesh. Instead
it MUST always send an IHAVE message (provided there are messages to send and
it does not hit the IHAVE message limit) immediately to the peer.
It's already stated in this file that you must send IHAVE instead of forwarding the message to peers that sent the CHOKE
message.
How can the broadcast module do otherwise? If it can't, there is no point in having the broadcast module at all, right?
Yes, if we want the choke action, then a broadcast module cannot send a message to a peer that has requested us to choke them (this would defeat the purpose of the choking mechanism).
A broadcast module can still decide who to choke, which non choked peers to send a message to, which peers to send IHAVE's to.
Here are some examples of different broadcast strategies that demonstrate the point of the broadcast module:
- Floodsub-like: Send the direct message to all known peers that are non-choked
- Episub-Like: Send direct message to all mesh peers that are non-choked
- Gossipsubv2-like: Pick a random set of mesh peers and send IHAVEs to them, send direct messages to non-choked peers.
- No Duplicates: Only send IHAVE messages to mesh peers.
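To make the split concrete, here is a minimal sketch of three of the strategies listed above plus a router-side guard for choke. Everything in it (`TopicView`, `Decision`, `enforce_choke`, the strategy function names) is hypothetical and not taken from the PR; it only assumes that a strategy returns a set of peers to forward to and a set to gossip to, and that the core protocol turns forwarding into an IHAVE for any mesh peer that has choked us.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class TopicView:
    """Hypothetical snapshot of what the router knows about one topic."""
    known_peers: Set[str]
    mesh_peers: Set[str]
    choked_by: Set[str] = field(default_factory=set)  # peers that sent us CHOKE


@dataclass
class Decision:
    forward: Set[str]  # peers that receive the full message
    gossip: Set[str]   # peers that receive an IHAVE


def floodsub_like(view: TopicView) -> Decision:
    # Full message to every known peer that has not choked us.
    return Decision(forward=view.known_peers - view.choked_by, gossip=set())


def episub_like(view: TopicView) -> Decision:
    # Full message to every mesh peer that has not choked us.
    return Decision(forward=view.mesh_peers - view.choked_by, gossip=set())


def no_duplicates(view: TopicView) -> Decision:
    # Never push the full message; only advertise it to mesh peers.
    return Decision(forward=set(), gossip=set(view.mesh_peers))


def enforce_choke(view: TopicView, decision: Decision) -> Decision:
    """Core-protocol guard (assumed): choked peers never get the full message,
    and choked mesh peers always get an IHAVE instead."""
    forward = decision.forward - view.choked_by
    gossip = decision.gossip | (view.mesh_peers & view.choked_by)
    return Decision(forward=forward, gossip=gossip)
```

With this shape the strategy still decides who gets what among non-choked peers, while the MUST from the spec text is enforced in one place regardless of which strategy is plugged in.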
### Interface

This section defines the basic logical interface that defines a mesh strategy. The exact API of how a mesh strategy interfaces with the core
This section defines the basic logical interface that defines a mesh strategy. The exact API of how a mesh strategy interfaces with the core
This section defines the basic logical interface that defines a broadcast strategy. The exact API of how a broadcast strategy interfaces with the core
## Interface Implementation

### Publish(Topic) and Forward(Topic)
### Publish(Topic) and Forward(Topic)
### Publish(Topic)
No, your random choke is not gossipsubv2. They are completely different things. Saying that random choke is gossipsubv2 is very misleading. In gsv2, when I receive the first IANNOUNCE, I send INEED. If the timeout is triggered, I send INEED to the peer who sent the second IANNOUNCE. An example timeline is as follows. Let's say the timeout is 500ms. In your random choke, the timeline will be different. Please also note that with your current generalization, gossipsubv2 strategies cannot be implemented at all. @AgeManning I encourage you to read the Gossipsub v2 spec to get familiar with the protocol.
After rereading gsv2 and random choke, it seems like the difference is whether the source or destination controls the push vs pull semantics. In gsv2, the source decides to push the message or rely on the peer to pull the message following an On the wire, |
That's not the only difference. Another one is explained here #664 (comment). The idea is that gsv2 has the timeout which is the core of the improvement.
The problem is that this PR doesn't allow me to do the timeout. |
My understanding is that the timeout could be implemented by the broadcast strategy. From this PR:
That makes it seem that the broadcast strategy could track these |
That would make sense if the generalization allows me to delay answering that question. You can see that I received IANNOUNCE at 0.0100s but decided to send INEED at 0.5001s. |
@AgeManning apologies if this is obvious, but what are the benefits of this spec versus, say, starting from scratch with knowledge of GossipSub? I don't yet have a strong opinion here. This is not a leading question. To play devil's advocate, in many ways the strategy for mesh and broadcasting is a big part of the solution. If we take that out, what is the valuable thing being spec'd here? Is it the control messages? What does an application gain by constraining itself to only these control messages? Is it the interface for the strategies? Is the hope that by defining the interface, applications can build against that and share code with other applications? Are we trying to define high level primitives for any pubsub-like system?
Ah, I think I see your point. Your point is that "random choke" as defined in this PR does not implement this delay. That's true. However, I don't see anything in the "generalized" part that would prohibit a strategy from implementing this delay. |
Let me explain.
The point is how can you try another peer? One way to do it is when you receive IANNOUNCE from that another peer, the core protocol will invoke the strategy's It will request if the peer before it doesn't respond the message and it won't if the message was received. So it's very unlikely that the node can decide at the time The issue is if the core protocol allows the strategy to block the Does it make sense? Disclaimer: I assumed that you cannot arbitrarily send IWANT any time you want. The only way to send it is to tell the core protocol to send it for you through the |
I think this is where we interpret the document differently. This highlights the need for more precise language 😄. I assumed the opposite, that a strategy could arbitrarily send IWANTs anytime.
Thanks guys. Keep in mind, this was just a quick writeup to explain the concept, I wouldn't consider it ready to merge and we can change it to adapt to peoples needs.
I never said that random choke is gossipsub v2 (that's why I named it differently); here is what I wrote:
And I do think it resembles gossipsub v2, which primarily makes a trade-off between sending messages directly in the mesh and gossiping them. It decides this on a per-message basis, with a random factor. So does random-choke, but per mesh-peer rather than per message.
I have read it, that's why I'm suggesting this (and our long telegram chat).
Not sure why the message from peer2 is coming earlier here. But as I explained in the telegram chat, IWANTs also have a timeout that is configurable. It's required by the gossipsub 1.1 spec. If you set that to 500ms, then the IWANT will fail as well. If you want to write an implementation that re-requests an IWANT from another peer when one expires, there is nothing in the specifications preventing you from doing that. If it is a major concern, we can also have an API in a broadcast strategy that decides how to handle failed IWANTs.
Honestly I'm surprised that the core improvement is re-requesting IWANTs. I didn't really think this would have an impact, so I didn't directly focus on it; I was focused more on the trade-off between direct and gossip sends in the meshes. I assumed this to be the major change. It would be very easy to re-request IWANTs in our current gossipsub implementation if it's core to the improvement.
I didn't think this would be such a big issue. This is just handling "broken promises" (i.e. an IWANT response didn't hit its deadline). As I mentioned, anyone can implement this, but if people want it in the spec, I'd suggest just adding an API to handle broken promises to the broadcast module.
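As a rough illustration of what such a broken-promise hook could look like, here is a sketch that remembers every peer that advertised a message and, when an IWANT misses its deadline, re-requests from the next advertiser. The `PromiseTracker` class and its method names are hypothetical and not part of the PR or of any existing implementation, the 500 ms default is only the figure used in this thread, and duplicate suppression via the seen-cache is assumed to happen elsewhere in the router.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Promise:
    peer: str        # peer the IWANT was sent to
    deadline: float  # time at which the promise counts as broken


@dataclass
class PromiseTracker:
    timeout: float = 0.5  # illustrative value, taken from the discussion
    advertisers: Dict[str, List[str]] = field(default_factory=dict)
    pending: Dict[str, Promise] = field(default_factory=dict)

    def on_ihave(self, msg_id: str, peer: str, now: float) -> Optional[str]:
        """Record an advertisement; return the peer to send IWANT to, if any."""
        if msg_id in self.pending:
            # Already waiting on someone else: keep this peer as a fallback.
            self.advertisers.setdefault(msg_id, []).append(peer)
            return None
        self.pending[msg_id] = Promise(peer, now + self.timeout)
        return peer

    def on_message(self, msg_id: str) -> None:
        """The message arrived, so the promise is fulfilled."""
        self.pending.pop(msg_id, None)
        self.advertisers.pop(msg_id, None)

    def on_tick(self, now: float) -> List[Tuple[str, str]]:
        """Return (msg_id, peer) pairs to re-request for broken promises."""
        retries = []
        for msg_id, promise in list(self.pending.items()):
            if now < promise.deadline:
                continue
            fallbacks = self.advertisers.get(msg_id, [])
            if fallbacks:
                peer = fallbacks.pop(0)
                self.pending[msg_id] = Promise(peer, now + self.timeout)
                retries.append((msg_id, peer))
            else:
                # Nobody else advertised it; give up until a new IHAVE arrives.
                del self.pending[msg_id]
        return retries
```

A router would call `on_tick` from its heartbeat or a dedicated timer; whether penalising the peer that broke the promise belongs here or in scoring is left open.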
I think the main principle of gossipsubv2 can be implemented. If it is just re-requesting IWANTs that you're concerned about, we can do that. If it's specifically the IANNOUNCE/INEED messages, we can use those instead of CHOKE/UNCHOKE, but I don't think they gain us much and have the following issue:
Exactly. |
Yeah, good question, I guess I didn't explain it explicitly. There has been out-of-band conversation about this, which is where this originated from. This adds little to no value in terms of specification (apart from the CHOKE/UNCHOKE, but I'm not entirely convinced we should add them). If you take those messages out, this is exactly the same spec as current gossipsub (as we include current gossipsub broadcast and mesh strategies). So I'm trying to make as little change to gossipsub as possible to avoid building anything new from scratch. I consider this more of an engineering refactor of code, where we clearly split up the parts of the spec that all implementations MUST agree on for the protocol to work from the other parts, where implementations could currently be wildly wrong (they build the mesh incorrectly, i.e. not to spec, or send IWANTs for every message, etc.) or nodes have different configurations (huge mesh degrees, etc.) and other nodes on the network wouldn't know the difference. I don't intend this to constrain gossipsub to only these control messages. More control messages can and probably should be added, just as we do with current gossipsub. Even the interface to strategies is unimportant here, because we're not spec'ing cross-client strategies. I don't even expect code sharing. The value here, I think, is to clearly label what MUST be implemented to be part of a gossipsub network (i.e. control messages) and what implementers can tweak and modify for their own individual use cases, for optimisations and different scenarios. The thing that has motivated this was the gossipsubv2 spec, which trades latency for bandwidth. This is an optimisation which has a specific use case. The choice of making this trade-off will be application and network specific. It will be good in some circumstances and bad in others. In general there are a lot more trade-offs that can be made, and I'd like us to be able to make other trade-offs and not go too deep down one path.
This PR is proposing to shift the decision of which trade-off to make outside of this spec, so every interested party is free to make their own trade-offs without affecting each other. Also, if one turns out not to work as expected, it's easy to revert and try something else. For example, an optimization designed for Ethereum doesn't necessarily affect the filecoin network. Each team is free to choose. Also, it's applied per-topic, which allows trade-offs per use (which is important in Ethereum). The main value in this is deciding what is common in gossipsub across every single party, and what we can split out so we can individually make our optimisations.
I was curious about the re-sending of the IWANTs, as I thought I had misjudged the importance of this. Here is a node running gossipsub on Ethereum mainnet. It is sending about 1700 IWANT messages every 12 seconds: Of those 1700 IWANT messages, barely any fail to be delivered in time: As such, I've always considered failed IWANT messages to be a minor issue that does not make a re-send worth it.
Message size raises IWANT request count because sending a large message takes considerable time. During this period, peers already receiving this message may also send IWANT requests (multiple IWANTS in v1.1) for the same message. The same is the limitation faced by IDONTWANT announcements. A receiver can send IDONTWANTs only after it finishes downloading the entire message. For a large download time, the receiver will likely start receiving the same message from multiple mesh members. Message preamble is crucial in eliminating these duplicates. |
repeated ControlPrune prune = 4;
repeated ControlChoke choke = 5;
repeated ControlUnChoke unchoke = 6;
}
There are additional message proposals that can be valuable for a broader range of GossipSub use cases. For example,
- PREAMBLE/IMRECEIVING: Message preamble enables receivers to learn about ongoing message receptions, and IMRECEIVING requests mesh members to defer sending the same message we are already receiving.
- OBSERVE can allow observing a topic without having to download entire messages. Analogous to CHOKE, but allows observing non-mesh members as well.
- INEED, already discussed here
Yep, like the current gossipsub, these are not excluded and can be added as spec changes.
The generalization proposed here aims to specify these strategies and allow the core protocol
to select each strategy per topic. This allows for scenarios where an
application may have a topic where resiliency is not very important, so a
low-bandwidth strategy could be chosen (i.e. low-mesh, sparse topology), and at
the same time have a topic where resiliency is important, so it chooses a
high-bandwidth strategy (i.e. high-mesh, dense topology).

A node that is heavily resource constrained might also wish to switch to a
combination of strategies that is known to perform better under those
conditions.

The goal of this protocol is to increase the degrees of freedom in fine-tuning
p2p message dissemination in a general way that doesn't require specification
changes to apply. It also aims to minimize engineering overhead for
implementations that already have gossipsub.
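For readers who want something concrete, per-topic selection could be as small as a lookup table with a default. The sketch below is only an assumption about how an implementation might wire it: `MeshStrategy`, `StrategyTable`, the topic names, and the degree values are all hypothetical and do not come from the PR.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MeshStrategy:
    """Hypothetical mesh strategy: a name plus target mesh degrees."""
    name: str
    d_low: int
    d: int
    d_high: int


# Illustrative values only; not taken from the PR or any spec.
SPARSE = MeshStrategy("low-bandwidth", d_low=2, d=3, d_high=4)
DENSE = MeshStrategy("high-bandwidth", d_low=6, d=8, d_high=12)


class StrategyTable:
    """Maps each topic to a strategy, falling back to a default."""

    def __init__(self, default: MeshStrategy) -> None:
        self._default = default
        self._by_topic: Dict[str, MeshStrategy] = {}

    def set(self, topic: str, strategy: MeshStrategy) -> None:
        self._by_topic[topic] = strategy

    def get(self, topic: str) -> MeshStrategy:
        return self._by_topic.get(topic, self._default)

    def constrain_all(self, strategy: MeshStrategy) -> None:
        """Switch every topic at once, e.g. when the node runs low on resources."""
        for topic in self._by_topic:
            self._by_topic[topic] = strategy
        self._default = strategy


table = StrategyTable(default=DENSE)
table.set("telemetry", SPARSE)  # resiliency matters less here
table.set("blocks", DENSE)      # resiliency matters here
```

Nothing in this table is visible on the wire; other peers only ever see the control messages that result from the chosen strategy.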
A broader use case encompassing different implementations may still require negotiating around either mesh/forwarding/scoring strategies or the appropriate GossipSub versions.
However, generalizing the GossipSub core and facilitating dynamicity and adaptiveness in mesh/forwarding/scoring strategies would probably encourage the adoption of new techniques with minimal engineering effort.
More importantly, remaining open to incorporating new messages and techniques into the GossipSub core is crucial for the continued growth and evolution of the broader ecosystem.
Yep, I think we agree here.
There have been a number of responses around the core spec proposed here that have assumed that no additions to the core protocol can be made. On the contrary, just like the current gossipsub spec, I expect we will make spec changes to this to add extra control messages or other changes, and increase the multistream select protocol-id to support this. I'm not proposing that this set is fixed by any means, simply that we don't necessarily spec the other changes, which I'm trying to remove from the spec as "strategies".
As part of working on the [generalized `gossipsub` strategies](libp2p/specs#664) I have been cleaning up code, here follows some of the improvements Pull-Request: #5991.
while refactoring scoring for libp2p/specs#664 I recalled #5711 and so this PR changed scope from a refactor to a feature. Per commit review is suggested for an easier experience. CC @drHuangMHT Pull-Request: #6020.
Interesting proposal. BTW we still have the multiple topics per message 'feature'. It could complicate things.
We killed it a while ago in go-libp2p-pubsub afair, just one topic per message now. |
Maybe we could legitimize it in the spec?
Yeah, we should. It doesn't make much sense in any real world applications, and it complicates implementation as well... R.I.P.
Overview
This is an idea for modularizing gossipsub to allow for greater optimizations while minimizing any engineering effort for implementations that currently support gossipsub.
Fundamentally this is just a re-factor of code, with the addition of two new control messages:
CHOKE/UNCHOKE
The added control messages are the only thing that needs to be implemented to support this change. Any variation of these control messages would also work; I think we just need this degree of freedom to adjust mesh broadcasts without having to adjust the mesh sizes.
tldr
We shift current gossipsub logic into conceptual modules called strategies:
Each of these can be applied per-topic (and dynamically if needed), allowing a complex network with competing constraints to choose different strategies for different applications (per-topic) or under different scenarios (e.g. bandwidth limitations).
Benefits (Intended at least)