fix: avoid uncaught exception when polling validator indices #6891
Conversation
Performance Report: ✔️ no performance regression detected. Full benchmark results
Sorry, I don't materially see what this PR solves that the other PR didn't.
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:

@@            Coverage Diff            @@
##           unstable    #6891   +/-  ##
=========================================
  Coverage     62.75%   62.75%
=========================================
  Files           578      578
  Lines         61347    61345     -2
  Branches       2114     2124    +10
=========================================
  Hits          38498    38498
+ Misses        22811    22809     -2
  Partials         38       38
The other PR returns an empty array if there is an error; if the upstream caller does not have a handler attached, this results in no error and wrongly reports no (new) indices. This can have unwanted side effects, e.g. in doppelganger protection (see #6888 (comment) and the sketch below): lodestar/packages/validator/src/services/doppelgangerService.ts, lines 142 to 143 in f29a6db
We can apply these minimal changes on your PR, I don't mind either way; I just opened a new PR as it seemed simpler.
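
A minimal TypeScript sketch of the concern (illustrative names, not the actual Lodestar code): swallowing the error and returning an empty array makes a failed poll indistinguishable from "no new indices" for the caller.

```ts
// Stand-in for the real request; it always fails here to show the problem.
async function fetchIndices(): Promise<number[]> {
  throw new Error("beacon node unreachable");
}

// Swallowing errors and returning [] (the pattern being criticized above).
async function pollNewIndices(): Promise<number[]> {
  try {
    return await fetchIndices();
  } catch {
    return []; // the error is hidden from the caller
  }
}

async function main(): Promise<void> {
  const newIndices = await pollNewIndices();
  // A doppelganger-style caller would treat this as "nothing new to check"
  // and proceed, even though the request actually failed.
  console.log(newIndices); // []
}

void main();
```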
There are different upstream consumers of this which need to handle it in different ways. E.g. when polling beacon proposers, we catch and log the error and continue, as it's not critical for this code path that the request succeeds: lodestar/packages/validator/src/services/prepareBeaconProposer.ts, lines 33 to 35 in 7e35c92
But we have a case where it is important that the code does not proceed (see the sketch after this comment).
It's just not the responsibility of pollValidatorIndices to decide how its errors are handled.
It does; the uncaught exception was due to a floating promise we accidentally created by not handling the return value of the polling call.
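
To illustrate the two consumer patterns described above, a rough sketch with placeholder names (not the actual Lodestar services):

```ts
// Stand-in for the polling call; it may reject if the beacon node request fails.
async function pollValidatorIndices(): Promise<number[]> {
  throw new Error("request failed");
}

// Non-critical consumer (prepareBeaconProposer-style): catch, log, and continue,
// because this code path does not require the request to succeed.
export async function prepareProposers(logger: {warn: (msg: string) => void}): Promise<void> {
  try {
    await pollValidatorIndices();
  } catch (e) {
    logger.warn(`Error polling validator indices: ${(e as Error).message}`);
  }
}

// Critical consumer (doppelganger-style): let the error propagate so the caller
// does not proceed as if the poll succeeded and found no new indices.
export async function pollForDoppelgangerCheck(): Promise<number[]> {
  return pollValidatorIndices(); // no catch here on purpose
}
```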
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see that in other places pollValidatorIndices has been caught and handled; however, this is a "harmless" service.
Let's say you polled 100 batches and 1 batch out of 100 always errors (say you got a bad BN): you will never do any duty, as the entire poll is invalidated.
We need to make sure that if we can go forward with the remaining keys for whatever duty/work/monitoring, we do so and don't block the other keys (see the sketch below).
This goes for other duties as well and will require some more work, so I am approving the PR for now as it does seem that it will handle the uncaught rejection.
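
A hedged sketch of that idea (assumed per-batch API, not the current implementation): poll each batch independently and keep whatever succeeded, so one bad batch does not block duties for keys in the other batches.

```ts
// Assumed per-batch poll; a real implementation would query the beacon node.
async function pollBatch(pubkeys: string[]): Promise<Map<string, number>> {
  return new Map(pubkeys.map((pubkey, i): [string, number] => [pubkey, i])); // fake indices
}

// Poll all batches and keep the results of the ones that succeeded.
export async function pollAllBatches(batches: string[][]): Promise<Map<string, number>> {
  const results = await Promise.allSettled(batches.map((batch) => pollBatch(batch)));
  const indices = new Map<string, number>();
  for (const res of results) {
    if (res.status === "fulfilled") {
      for (const [pubkey, index] of res.value) indices.set(pubkey, index);
    }
    // Rejected batches could be logged and retried later instead of
    // invalidating the results of the batches that did succeed.
  }
  return indices;
}
```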
Yeah, sorry about the confusion regarding doppelganger, our implementation is pretty robust and handles all cases well.
You mean specifically here because we have batches of 64 right now? I think we can kinda solve this by switching to
No no, small batches are good, but the trade-off between batch size and number of batches needs to be optimized, unless the APIs are sent in parallel (whose concurrency also needs to be fixed). E.g. here we send all indices, so we need to see, for a setup with 1000 keys, what the error rate/latency of batching this up is, etc.
Really not sure if small batches are good here; we have to apply the same filtering on the beacon node side for each request, which seems really inefficient to me, a single request would reduce the work on the BN side by a lot. One problem with this endpoint is definitely the lack of SSZ serialization; there is some work in progress on the spec side.
Right now, this is all done at startup; even if you have 10k keys, it spams the beacon node but usually works out fine, and after that you only poll undiscovered keys, so unless there are a lot of imported keys that are unknown (to be discovered), it's not a big deal. I've never seen an error on any of my nodes due to this API. A lot of clients (like LH) poll this API every epoch for all pubkeys because they keep track of validator statuses over time; even that seems fine to me, but we need to get rid of these 64-key chunked requests which are all executed in sequence, and doing them in parallel might be a DoS on the beacon node.
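
For a sense of the scale being discussed, a minimal sketch of the sequential chunking pattern (the batch size matches the 64 mentioned above; the helper names are assumptions, not Lodestar's actual code):

```ts
// Illustrative only: chunk pubkeys into batches of 64 and poll them in sequence,
// the pattern being questioned above (10,000 keys -> ~157 sequential round trips).
const BATCH_SIZE = 64;

function chunkify<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

export async function pollSequentially(
  pubkeys: string[],
  pollBatch: (batch: string[]) => Promise<void>
): Promise<void> {
  for (const batch of chunkify(pubkeys, BATCH_SIZE)) {
    await pollBatch(batch); // each round trip adds its full latency to the total
  }
}
```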
Yes, we need to empirically see what's what.
pollBeaconAttesters is done every epoch, I think. I am bringing this up because our error rate is high on Obol DVT setups, so whatever the reason is, someone needs to dig into it and make changes.
Yeah, what I meant is we only poll for pubkeys we haven't discovered yet, so unless there are tons of pending pubkeys it should be just one request as it's 64 per batch, or none at all. If I look at my metrics, there are no lodestar/packages/validator/src/services/indices.ts, lines 107 to 111 in 7e35c92
We need to have a proper testing setup for this, let's discuss on Discord.
🎉 This PR is included in v1.20.0 🎉
Motivation
See #6888 for logs and more context
Description
Avoid uncaught exception when polling validator indices. The finally was called in a different context, causing an uncaught exception as catch handlers of upstream code were called earlier.
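
A minimal sketch of the failure mode (illustrative, not the actual Lodestar code): attaching finally to the promise in a separate place creates a new promise chain whose rejection is not covered by the catch handlers attached upstream.

```ts
// A poll that rejects (stand-in for the real request).
const poll: Promise<number[]> = Promise.reject(new Error("beacon node unreachable"));

// Upstream code handles errors on its own chain, so `poll` itself is handled.
poll.catch((e) => console.log("handled upstream:", (e as Error).message));

// But .finally() returns a NEW promise that rejects with the same reason; since
// nothing handles that second chain, Node reports an unhandled rejection even
// though the catch above already ran.
poll.finally(() => console.log("cleanup"));

// A fix is to keep the finally on the chain that is eventually awaited or caught,
// e.g. poll.finally(cleanup).catch(handle), or to await inside try/finally.
```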