Handle terminally stuck transactions on send #14127
common/txmgr/confirmer.go
Outdated
@@ -999,6 +999,21 @@ func (ec *Confirmer[CHAIN_ID, HEAD, ADDR, TX_HASH, BLOCK_HASH, R, SEQ, FEE]) han
ec.SvcErrBuffer.Append(sendError)
// This will loop continuously on every new head so it must be handled manually by the node operator!
return ec.txStore.DeleteInProgressAttempt(ctx, attempt)
case client.TerminallyStuck:
// A transaction could broadcast successfully but then be considered terminally stuck on another attempt
// Even though the transaction can succeeed under different circumstances, we want to purge this transaction as soon as we get this error |
- // Even though the transaction can succeeed under different circumstances, we want to purge this transaction as soon as we get this error
+ // Even though the transaction can succeed under different circumstances, we want to purge this transaction as soon as we get this error
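The behavior described in the code comment above (purge as soon as the terminally stuck error is seen, even if an earlier attempt was broadcast) could be sketched roughly as below. This is only an illustration under stated assumptions: `Tx`, `MarkedForPurge`, and `handleSendResult` are hypothetical names, not the Confirmer's actual types or methods.

```go
package main

import "fmt"

// SendErrorReturnCode is a hypothetical, reduced version of the client's
// send-error classification.
type SendErrorReturnCode int

const (
	Successful SendErrorReturnCode = iota
	TerminallyStuck
)

// Tx is a hypothetical, simplified transaction record.
type Tx struct {
	ID             int64
	Broadcast      bool // an earlier attempt was broadcast successfully
	MarkedForPurge bool // a later step should purge this transaction
}

// handleSendResult flags the transaction for purging when a send attempt is
// classified as terminally stuck, regardless of whether an earlier attempt
// was broadcast. It reports whether the transaction was flagged.
func handleSendResult(tx *Tx, code SendErrorReturnCode) bool {
	switch code {
	case TerminallyStuck:
		tx.MarkedForPurge = true
		return true
	default:
		return false
	}
}

func main() {
	tx := &Tx{ID: 1, Broadcast: true}
	flagged := handleSendResult(tx, TerminallyStuck)
	fmt.Println(flagged, tx.MarkedForPurge)
}
```

The point of the sketch is that the earlier successful broadcast does not change the outcome: the terminally stuck classification alone triggers the purge.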
common/txmgr/confirmer.go
Outdated
case client.TerminallyStuck:
// A transaction could broadcast successfully but then be considered terminally stuck on another attempt
// Even though the transaction can succeeed under different circumstances, we want to purge this transaction as soon as we get this error
lggr.Errorw("terminally stuck transaction detected", "err", sendError.Error()) |
What are the criteria for an Error log vs. a Critical log?
From my understanding, this (tx stuck due to overflow, not enough keccak counters to continue the execution) is expected behavior, at least for zkSync. When it happens, we just need to cancel/purge the existing tx by reprocessing. It is not a critical/fatal issue in the Chainlink node that requires raising an alert. A critical/fatal one, for example "Invariant violation: fatal error while re-attempting transaction", should not happen.
That's my understanding as well. Since the TXM resolves this on its own, we don't have to raise a signal for NOPs to take any actions.
Agreed. I would also change the log level to warn.
The tx is bad, and there is nothing wrong with the TXM.
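To make the criteria from this thread concrete, here is a minimal sketch of how an outcome might map to a log level. `SendOutcome`, its constants, and `logLevelFor` are hypothetical names made up for illustration; they are not the TXM's actual API, and the real logger is not modeled here.

```go
package main

import "fmt"

// SendOutcome is a hypothetical stand-in for a send-error classification.
type SendOutcome int

const (
	Successful SendOutcome = iota
	TerminallyStuck        // bad tx; the TXM purges it on its own
	RecoverableFailure     // clear failure we can recover from, e.g. an RPC or DB write failed
	InvariantViolation     // should never happen; needs operator attention
)

// logLevelFor returns the log level suggested in the review thread
// for each outcome.
func logLevelFor(o SendOutcome) string {
	switch o {
	case TerminallyStuck:
		return "warn"
	case RecoverableFailure:
		return "error"
	case InvariantViolation:
		return "critical"
	default:
		return "debug"
	}
}

func main() {
	fmt.Println(logLevelFor(TerminallyStuck))
}
```

The rule of thumb being sketched: the level reflects whether a node operator has to act, not how bad the individual transaction is.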
return tx.Nonce() == uint64(346) && tx.Value().Cmp(big.NewInt(243)) == 0
}), fromAddress).Return(commonclient.Fatal, errors.New(terminallyStuckError)).Once()

// Do the thing |
nit: replace with more descriptive comment
lol, I notice we have a lot of "Do the thing" lines in many test files, confirmer_test.go for example
Ya this was just copied from another broadcaster test haha. But never too late to update at least the new tests to say something better.
lgtm except some lint errors
This looks good
321c21a
core/chains/evm/client/errors.go
Outdated
@@ -596,6 +596,11 @@ func ClassifySendError(err error, clientErrors config.ClientErrors, lggr logger.
)
return commonclient.ExceedsMaxFee
}
if sendError.IsTerminallyStuckConfigError(configErrors) {
lggr.Criticalw("Transaction that would have been terminally stuck in the mempool detected on send. Marking as fatal error.", "err", sendError, "etx", tx) |
This shouldn't be a critical log. It should be a warning.
Even the Errorw() log is for cases where we clearly see a failure, although one that we can recover from.
For example, an important RPC failed, or a database write failed.
A stuck tx is now expected behavior, so a warn log is enough.
Will do! I was just trying to match the behavior for Fatal. Think we're planning to rework logs in the near future so that might change anyways
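For readers following along, a rough sketch of what a config-driven terminally stuck check on send could look like. Everything here is an assumption made for illustration: `rpcError`, `exampleStuckPattern`, and this `classifySendError` signature are hypothetical, and the regular expression is an invented example rather than one taken from the actual client error configuration.

```go
package main

import (
	"fmt"
	"regexp"
)

// SendErrorReturnCode is a hypothetical, reduced classification result.
type SendErrorReturnCode int

const (
	Successful SendErrorReturnCode = iota
	TerminallyStuck
	Unknown
)

// rpcError is a hypothetical error type standing in for an RPC send error.
type rpcError string

func (e rpcError) Error() string { return string(e) }

// exampleStuckPattern is an illustrative pattern only; a real deployment
// would supply its own pattern through configuration.
var exampleStuckPattern = regexp.MustCompile(`not enough .* counters to continue the execution`)

// classifySendError returns TerminallyStuck when the error message matches
// the configured pattern. A nil pattern means the check is disabled.
func classifySendError(err error, terminallyStuck *regexp.Regexp) SendErrorReturnCode {
	if err == nil {
		return Successful
	}
	if terminallyStuck != nil && terminallyStuck.MatchString(err.Error()) {
		return TerminallyStuck
	}
	return Unknown
}

func main() {
	err := rpcError("not enough keccak counters to continue the execution")
	fmt.Println(classifySendError(err, exampleStuckPattern) == TerminallyStuck)
}
```

Keeping the pattern in configuration means each chain can supply its own error string without code changes; what level the match is logged at is a separate decision, as discussed above.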
Quality Gate passed
Cherry-pick of smartcontractkit/chainlink#14127
Required to successfully handle zk overflows on Polygon zkEVM and X Layer.
Co-authored-by: amit-momin <[email protected]>
Co-authored-by: Rens Rooimans <[email protected]>
BCI-3014