share/p2p/shrexeds: Error handling #2176
Comments
Added 3 more possible errors and an edge case that need handling for edsclient.
From libp2p docs:
Action plan:
On the client side, it sounds like Application Error should be handled like a RateLimit/NotFound response and scheduled for a retry (so just cool down the peer).
No recent network activity.
Happens for multiple peers at the same time. Probably related to closing connections over the limit on libp2p level.
It's not and there is not so much value in doing so.
FN/BNs have unlimited resources. The limits are only on LN.
FNs do not have limits. This error is coming from a remote LN. We should not do anything special with this error; simply ignore it and let it go through the general error-handling flow.
No traffic went over the connection for more than 30 seconds. It might be a connection that was ungracefully dropped, e.g. the application was killed without telling the remote. This should go through the general error-handling flow as well.
Or the server side
It does not need a special error case and can go over the general error path.
There are other cases when this can happen. The client can gracefully stop their node while doing a request, which results in a stream reset. It's unlikely to be a bug.
To summarize: most of the errors above are unrelated to shrex, and shrex should not be responsible for handling them. They are various manifestations of issues on the remote side, which shrex can do nothing about. It should just ignore this peer, let it go, and move on. However, I really appreciate the effort here to document each.
Do we know when this will happen? AFAUR, this can happen when you kill all the connections (reboot) and reconnect to the same peers, which have not yet figured out that the old connection was dead. EDIT: quic-go/quic-go#2836 (comment), seems true.
Light nodes don't serve EDS via the shrex server, so the error happens when an FN/BN resets the stream on the server side.
If we can prevent them from happening by properly configuring the networking stack, it would benefit shrex stability. Why do you think there is not much value in fixing this error case?
I mean, there is not so much value in trying to save active transfers.
Hi,
and experience following issue:
This causes the container to stop, and it needs to be restarted (though the same problem happens again). Any advice, or is more information needed? Thanks,
Hey @bert2002, please open a separate issue for this. The good news is that we know what went wrong.
I am closing this one as resolved and in favor of #2234. If you (@distractedm1nd and @walldiss) think there are essential errors here that were not resolved, let's move them into a separate issue. This one has lots of unrelated context.
A lot of new errors are appearing client side against BSR.
io.EOF
. The deadline should be smaller here
7. RequestEDS checks child context for ctx error, then returns parent context error, leading to returning of nil, nil and causing panic
Resolved
Not an actual error, but server node could always return ErrNotFound and not get any punishment
No recent network activity.