-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: enhanced error propagation from publish + enable fast path send even observeres are added #1243
base: master
Are you sure you want to change the base?
Conversation
pubsub/gossipsub publish will raise specific exception according to problem occured during processing the message. pubsub observers are modified with distinch onBeforeSend and onAfterSent callbacks - this helps keep sending messages on fast path. (This is needed for nwaku - that has no intent on modifying message before forwarding but still observs them)
…riginal behavior and separate new exception based error propagation under publishEx interface
Commits must follow the Conventional Commits specification
|
the original sin here is that |
type | ||
PublishingError* = object of LPError | ||
|
||
NoTopicSpecifiedError* = object of PublishingError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an API violation, should not be a catchable error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get your point here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the caller is responsible for verifying that the topic is not empty - this is not something to raise an exception about (it could potentially be an assert)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In pubsub, a ""
string is allowed as there's no explicit validation rejecting it even if it semantically makes no sense. In https://github.com/libp2p/specs/blob/master/pubsub/README.md#the-topic-descriptor it is specified names can be descriptive or random or anything that the creator chooses
.
Not allowing empty topics should be a concern for the application using libp2p, and we should not enforce this in nim-libp2p.
PublishingError* = object of LPError | ||
|
||
NoTopicSpecifiedError* = object of PublishingError | ||
PayloadIsEmptyError* = object of PublishingError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is either an api violation or allowed (I don't remember if empty payloads are allowed by libp2p)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/libp2p/specs/tree/master/pubsub#the-message suggests that at least from libp2p spec perspective t's allowed:
The
data
(optional) field is an opaque blob of data representing the payload. It can contain any data that the publisher wants it to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will skip publishing anyway, hance this feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of questions:
-
Is this not basically another logic error in the calling side? If the caller doesn't want to send empty messages, and the caller presumably typically has some framing protocol itself whereby even say "empty" chat messages (or other Waku messages without Waku-application-level-payloads) would from a libp2p perspective have some content; and probably more importantly;
-
it's inexpensively detectable at the caller end, and creates a new exception where one doesn't current exist by involving libp2p in this detection process.
So it's basically another logic error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @tersec. The caller is the one having the context to determine whether a message with empty data is valid or not. Adding this validation in publish would mean we're no longer being compliant with the spec.
|
||
NoTopicSpecifiedError* = object of PublishingError | ||
PayloadIsEmptyError* = object of PublishingError | ||
DuplicateMessageError* = object of PublishingError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Publishing an "already known" message is not an error in distributed systems in general - it is expected that it can happen as a consequence of the chosen message identity function because messages can be generated in arbitrary order and sometimes end up being the same (imagine two nodes existing for redundancy and publishing the same data to the network from different entry points into the mesh).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, but it is still information for the caller why the message is not relayed, so it might not need to re-try with this peer, maybe not with other peers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, but it is still information for the caller why the message is not relayed, so it might not need to re-try with this peer, maybe not with other peers.
When the message is in seen
, the message has been relayed - either directly, indirectly or topped up with IHAVE/IWANT.
You can think of publish as "put this message on the pubsub network where it will stay for a period of time as agreed by the seen/cache timeouts" - after that's been done once by any peer for a particular message (as identified by the id function), the network, and not the caller, is responsible for ensuring that all peers get to observe it - it would be incorrect for lightpush to try again and the API of publish should reflect this point so that users of libp2p don't make this mistake in their code.
NoTopicSpecifiedError* = object of PublishingError | ||
PayloadIsEmptyError* = object of PublishingError | ||
DuplicateMessageError* = object of PublishingError | ||
NotSubscribedToTopicError* = object of PublishingError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't have to be subscribed to publish messages
PayloadIsEmptyError* = object of PublishingError | ||
DuplicateMessageError* = object of PublishingError | ||
NotSubscribedToTopicError* = object of PublishingError | ||
NoPeersToPublishError* = object of PublishingError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is covered by the return value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I choose the way to treat as a failure as your message has not been tried to be relayed at all.
Still there is the possiblity it won't be relayed due to various reasons. That uncertainty is another story to tackle within waku.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not failure from the point of view of pubsub - the message has been put in the pubsub message cache and when a peer connects it will eventually get broadcast. Peers come and go and the system as a whole is designed to handle this particular case - any protocol using pubsub (such as lightpush) must be robust according to these semantics, ie handle them gracefully.
DuplicateMessageError* = object of PublishingError | ||
NotSubscribedToTopicError* = object of PublishingError | ||
NoPeersToPublishError* = object of PublishingError | ||
GeneratingMessageIdError* = object of PublishingError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a programming error - you should not be attempting to publish messages that are invalid by your own message id generation function - the default id generation function cannot fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I used a dumb approach not questioning the valid occurrence of an error just followed the code where it can return log and error and return 0.
I think it is there as message id generator can be replaced and there is no guarantee not getting error from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a buggy or badly-specified message ID function, then.
p.observers = observers | ||
|
||
proc hasObservers*(p: PubSubPeer): bool = | ||
p.observers != nil and anyIt(p.observers[], it != nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it allowed to set nil
observers within the list? It creates checking for this basically logic-error condition throughout the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Obviously not as it's being initialize with the pubsub instance.
It just had been there from loooong time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why observers
needs to be ref
?
libp2p/protocols/pubsub/pubsub.nim
Outdated
): Future[int] {.async: (raises: [LPError]), public.} = | ||
try: | ||
return await p.doPublish(topic, data) | ||
except LPError as ex: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general smaller (not-major-design-point), Nim's code generation can/does sometimes create more efficient code when the as ex
part of this is excluded, as it simply never materializes the exception as a variable.
So if it's truly not used, better not to create it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, absolutely.
@arnetheduck, @tersec : First of all thank you guys looking after it, I really appreciate all and any comments. The initial story behind is waku lightpush as a non-relay protocol access to the relay network needed some more detailed responses towards client side (aka edge node). There were two restrictions I put first - not to change the interface (...as I checked other libp2p implementations having exact same publish interface) and try not break other clients. |
cc: @arnetheduck , @Ivansete-status, @richard-ramos: This tries to utilize a polymorphic approach to separate the interface from internal implementation down the inheritance chain. But. It turns to a seg-fault while running specific pubsub tests. running it result an immediate segfault right at start without any stack trace. I may overlooked something however I tried out this solution in a dummy code, where it worked. So I may overlook something more complicated initialization within pubsub/floodsub/gossipsub. Thank you. update: I found the issue that caused an infinite recursion... my bad. |
In general, #950 discusses this as well. |
I think the content of this PR may reflect some conceptual misunderstandings with respect to what publish does as specified by libp2p - in particular the points about publishing various forms of invalid messages which suggest bugs in waku (if they are happening) as well as publishing things already in the I see two outcomes from a PR like this:
|
There's currently an scenario in which nim-libp2p/libp2p/protocols/pubsub/rpc/message.nim Lines 85 to 86 in c5aa373
peerInfo when calling newSwitch . I identified this while adding the raises annotations in the pubsub module.
If |
yeah - that's probably reasonable - or we can log it, or we can maybe check for it during startup .. The risk we want to avoid is that libp2p crashes in potentially "uncommon" scenarios when it already looks like it's "up and running" - if we add an assert here, we should also add a "regular" error somewhere else so that it (ideally) becomes hard or impossible to reach the point of trying to publish messages like this. |
ie the intent in past discussions about |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for it! 🙌
Some comments so far
g: GossipSub, topic: string, data: seq[byte] | ||
): Future[int] {.async: (raises: [LPError]).} = | ||
logScope: | ||
topic | ||
|
||
if topic.len <= 0: # data could be 0/empty | ||
debug "Empty topic, skipping publish" | ||
return 0 | ||
raise newException(NoTopicSpecifiedError, "Empty topic, skipping publish") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest using Result
instead of exceptions. Applies elsewhere.
I also think we don't need new error types.
I'd rather go for something much simpler like:
Future[Result[int, string]]
I don't care the type of error. To me, it is only interesting that an error happened and I can get the detail directly from the underlying string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we had this debate and we can argue about it different ways.
It is clear the Future is designed to be able to handle different failure cases.
One is that is capable to catch exceptions and report it to the listener on the other end as error.
This is because async task might not get waited where it has been launched. So exception handling might detach.
If you do await it, it will be raised so you need to catch it, while if you just let it go you will be able to examine fail state of the future when it completes.
Such a solution implies that a future might fail in two different ways. If there is an exception. than f.failed() will be true. While if future is ok you still need to check result for isErr?... I'm not sure how it's gonna make code more clean in general.
For this particular case, of course, if we decide that publish should never raise an exception, that is ok to sign somehow that something went wrong you need to deal with it.
p.observers = observers | ||
|
||
proc hasObservers*(p: PubSubPeer): bool = | ||
p.observers != nil and anyIt(p.observers[], it != nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why observers
needs to be ref
?
var mm = msg | ||
# trigger send hooks | ||
p.sendObservers(mm) | ||
p.beforeSendObservers(mm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the benefit/need of having beforeSendObservers
? Wouldn't it suffice by just having the sendObservers
moved after the actual sendEncoded
call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it all comes from the original implementation where sendObservers were able to modify the messages before sending - peers by peer. Thus we went to slow path even if we did not intended to change anything internal to messages - I can imagine some use cases for some specific use of libp2p but not waku obviously.
So to keep the original option to influence messages so the sendObserver has became two. So user has the freedom to choose, and will definitely allow fast path sending if afterSentObserver is used.
@@ -495,6 +510,8 @@ proc send*( | |||
trace "sending msg to peer", peer = p, rpcMsg = shortLog(msg) | |||
asyncSpawn p.sendEncoded(encoded, isHighPriority) | |||
|
|||
p.afterSentObservers(msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we can ensure 100% that the msg
has been sent at this point. Notice that the sendEncoded
spawns an async task.
Maybe better to invoke the afterSentObservers(msg)
inside sendEncoded
?
@NagyZoltanPeter in #1244 I removed the exception being raised in |
Yeah, thanks! I will revert my solution to the original design that details errors in Result. |
Description
Nwaku new lightpush protocol enriches the information sent back to its clients in case a problem occurs during relaying messages. This implies enhancing pubsub/gossipsub publish method.
Along with the latest exception indication added in 1.8.0, I decided while keeping the original interface any errors will be propagated via exception to the caller.
Hence new exceptions are defined.
Disclaimer about the interface change.
To avoid breaking change in users of pubsub.publish method - that usually only expects a non-zero return value in case of success and zero in case of any problem occurs.
Therefore old publish interface will digest all exception and translate it zero return.
New publishEx interface is added to be used by waku and for any other heavy user that needs more comprehensive feedback from the operation.
Optimizing message observers
Another shortcoming found and enhanced within this PR. While using any kind of observer on pubsub message relaying forces messages to be encoded as many peers it sends to. It is totally useless if the observer does not want to modify the message. While keeping the option to add an invasive observer before message-send, it is extended to having a view-only after-send observer. This allows fast path send whenever possible.