pcli view db corrupted with invalid anchors #4577

Closed
conorsch opened this issue Jun 7, 2024 · 3 comments · Fixed by #4578
Labels
A-CI/CD Relates to continuous integration & deployment of Penumbra A-client Area: Design and implementation for client functionality A-node Area: System design and implementation for node software

Comments

@conorsch

conorsch commented Jun 7, 2024

Describe the bug
Since launching ceremony 2.1, we've seen a marked uptick in reports of:

status: Internal, message: "Error submitting transaction: code 1, log: failed to deliver transaction: check_stateful failed: provided anchor d1b0a17ec504a6c6abd31a4a30aceb934e9d87eb43014ec97fe4a1d86ddc1909 is not a valid SCT root", details: [], metadata: MetadataMap { headers: {} }

To Reproduce
Several team members confirm this is reproducible by simply looping over pcli tx send. For example, here's an attempt I just submitted:

full terminal session from send-loop
❯ for i in $(seq 10); do pcli tx send --to $(pcli view address 10 -e) --memo "testing looped pcli sends $i" 0.1penumbra ; done
Scanning blocks from last sync height 438649 to latest height 438649
[0s] ██████████████████████████████████████████████████       0/0       0/s ETA: 0s
building transaction [3 actions, 3 proofs]...
finished proving in 0.547 seconds [3 actions, 3 proofs, 2402 bytes]
broadcasting transaction and awaiting confirmation...
transaction broadcast successfully: 0837b70f240bf405cd299443af16fa12d2a805d6f0849d67b22caf9d45c2a02a
transaction confirmed and detected: 0837b70f240bf405cd299443af16fa12d2a805d6f0849d67b22caf9d45c2a02a @ height 438651
Scanning blocks from last sync height 438651 to latest height 438651
[0s] ██████████████████████████████████████████████████       0/0       0/s ETA: 0s
building transaction [3 actions, 3 proofs]...
finished proving in 0.540 seconds [3 actions, 3 proofs, 2402 bytes]
broadcasting transaction and awaiting confirmation...
transaction broadcast successfully: 3fd02e3404905ca43b0460a6b34ead4524ad20db6d50980e1b67d71f811c5dfe
transaction confirmed and detected: 3fd02e3404905ca43b0460a6b34ead4524ad20db6d50980e1b67d71f811c5dfe @ height 438652
Scanning blocks from last sync height 438652 to latest height 438652
[0s] ██████████████████████████████████████████████████       0/0       0/s ETA: 0s
building transaction [4 actions, 4 proofs]...
finished proving in 0.700 seconds [4 actions, 4 proofs, 2784 bytes]
broadcasting transaction and awaiting confirmation...
transaction broadcast successfully: 01629c9cebe60fcc29e2a67061dd9af8b66d84b107c8accc80ecde86f7f1b8cf
transaction confirmed and detected: 01629c9cebe60fcc29e2a67061dd9af8b66d84b107c8accc80ecde86f7f1b8cf @ height 438653
Scanning blocks from last sync height 438653 to latest height 438653
[0s] ██████████████████████████████████████████████████       0/0       0/s ETA: 0s
building transaction [3 actions, 3 proofs]...
finished proving in 0.558 seconds [3 actions, 3 proofs, 2402 bytes]
broadcasting transaction and awaiting confirmation...
transaction broadcast successfully: e9a227a8272eb998df1312b6a6f6a6c6101791a99eb5c8d5c97960791fd9b4a9
transaction confirmed and detected: e9a227a8272eb998df1312b6a6f6a6c6101791a99eb5c8d5c97960791fd9b4a9
Scanning blocks from last sync height 438654 to latest height 438654
[0s] ██████████████████████████████████████████████████       0/0       0/s ETA: 0s
building transaction [3 actions, 3 proofs]...
2024-06-07T23:21:33.100482Z ERROR penumbra_view::worker: view worker error e=Wrong block height 438654 for latest sync height Some(438654)
finished proving in 0.548 seconds [3 actions, 3 proofs, 2402 bytes]
broadcasting transaction and awaiting confirmation...
transaction broadcast successfully: a8fede3c084fbb793a686f16dbe72c4f2ff1f4bd92faac56c50a8f68b6fc6f2c
transaction confirmed and detected: a8fede3c084fbb793a686f16dbe72c4f2ff1f4bd92faac56c50a8f68b6fc6f2c @ height 438655
Scanning blocks from last sync height 438656 to latest height 438656
[0s] ██████████████████████████████████████████████████       0/0       0/s ETA: 0s
building transaction [3 actions, 3 proofs]...
finished proving in 0.537 seconds [3 actions, 3 proofs, 2402 bytes]
broadcasting transaction and awaiting confirmation...
Error: error broadcasting transaction

Caused by:
    status: Internal, message: "Error submitting transaction: code 1, log: failed to deliver transaction: check_stateful failed: provided anchor aeee9dd3066c887e275b23121982764f4d0fd92fbe125d77dcf4dcea04acfd06 is not a valid SCT root", details: [], metadata: MetadataMap { headers: {} }
Scanning blocks from last sync height 438656 to latest height 438656
[0s] ██████████████████████████████████████████████████       0/0       0/s ETA: 0s
building transaction [3 actions, 3 proofs]...
finished proving in 0.550 seconds [3 actions, 3 proofs, 2402 bytes]
broadcasting transaction and awaiting confirmation...
Error: error broadcasting transaction

Caused by:
    status: Internal, message: "Error submitting transaction: code 1, log: failed to deliver transaction: check_stateful failed: provided anchor aeee9dd3066c887e275b23121982764f4d0fd92fbe125d77dcf4dcea04acfd06 is not a valid SCT root", details: [], metadata: MetadataMap { headers: {} }
Scanning blocks from last sync height 438656 to latest height 438656
[0s] ██████████████████████████████████████████████████       0/0       0/s ETA: 0s
building transaction [3 actions, 3 proofs]...
finished proving in 0.536 seconds [3 actions, 3 proofs, 2402 bytes]
broadcasting transaction and awaiting confirmation...
Error: error broadcasting transaction

Caused by:
    status: Internal, message: "Error submitting transaction: code 1, log: failed to deliver transaction: check_stateful failed: provided anchor aeee9dd3066c887e275b23121982764f4d0fd92fbe125d77dcf4dcea04acfd06 is not a valid SCT root", details: [], metadata: MetadataMap { headers: {} }
Scanning blocks from last sync height 438656 to latest height 438656
[0s] ██████████████████████████████████████████████████       0/0       0/s ETA: 0s
building transaction [3 actions, 3 proofs]...
finished proving in 0.543 seconds [3 actions, 3 proofs, 2402 bytes]
broadcasting transaction and awaiting confirmation...
Error: error broadcasting transaction

Caused by:
    status: Internal, message: "Error submitting transaction: code 1, log: failed to deliver transaction: check_stateful failed: provided anchor aeee9dd3066c887e275b23121982764f4d0fd92fbe125d77dcf4dcea04acfd06 is not a valid SCT root", details: [], metadata: MetadataMap { headers: {} }
Scanning blocks from last sync height 438657 to latest height 438657
[0s] ██████████████████████████████████████████████████       0/0       0/s ETA: 0s
building transaction [3 actions, 3 proofs]...
^C

Steps to reproduce the behavior:

  1. Run a command like: for i in $(seq 10); do pcli tx send --to $(pcli view address 10 -e) --memo "testing looped pcli sends $i" 0.1penumbra ; done
  2. Observe eventual corruption, after a variable number of txs

Once pcli gets into this state, all subsequent txs will fail, which is very bad:

❯ pcli tx send --to $(pcli view address 10 -e) --memo "testing looped pcli sends $i" 0.1penumbra
Scanning blocks from last sync height 438719 to latest height 438719
[0s] ██████████████████████████████████████████████████       0/0       0/s ETA: 0s
building transaction [3 actions, 3 proofs]...
finished proving in 0.536 seconds [3 actions, 3 proofs, 2402 bytes]
broadcasting transaction and awaiting confirmation...
Error: error broadcasting transaction

Caused by:
    status: Internal, message: "Error submitting transaction: code 1, log: failed to deliver transaction: check_stateful failed: provided anchor eedc922763b21aa8dc94447868e1d29e037cfdf92046a88fde87e2da15419910 is not a valid SCT root", details: [], metadata: MetadataMap { headers: {} }

Simply running pcli view reset and rerunning a single, not aggressively looped, tx will work fine.

Expected behavior
I should be able to throw transactions quickly at a remote endpoint and not encounter errors.

Additional context
So far it appears this bug only occurs on the primary load-balanced endpoints, i.e. https://grpc.penumbra.testnet.zone. If I test against a solo node, I've not been able to reproduce.

Workarounds

  • If one observes the "not a valid SCT root" error message, run pcli view reset and try the transaction again.
  • If running many successive transactions against a load-balanced (LB) endpoint, sleep ~5s between each
  • If running many successive transactions, run your own node and point the client at that
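
The first two workarounds can be combined in a small wrapper. This is a minimal sketch, not an official tool: the function names `send_with_recovery` and `send_loop` and the `SLEEP_SECS` variable are made up for illustration, and it only uses the pcli commands already shown in this issue. It assumes the error text contains "is not a valid SCT root", as in the logs above.

```shell
# Hypothetical helpers: pause between sends and, if the anchor error
# shows up, reset the view database and retry the same send once.
SLEEP_SECS="${SLEEP_SECS:-5}"   # illustrative default, per the ~5s suggestion

send_with_recovery() {
  local memo="$1" amount="$2" dest output
  dest="$(pcli view address 10 -e)" || return 1
  # Capture stdout and stderr so the error text can be inspected.
  if output="$(pcli tx send --to "$dest" --memo "$memo" "$amount" 2>&1)"; then
    printf '%s\n' "$output"
    return 0
  fi
  printf '%s\n' "$output" >&2
  case "$output" in
    *"is not a valid SCT root"*)
      # Only the invalid-anchor error is treated as recoverable here.
      echo "invalid anchor; running 'pcli view reset' and retrying once" >&2
      pcli view reset || return 1
      pcli tx send --to "$dest" --memo "$memo" "$amount"
      return $?
      ;;
  esac
  return 1
}

send_loop() {
  local count="$1" amount="$2" i
  for i in $(seq "$count"); do
    send_with_recovery "testing looped pcli sends $i" "$amount" || return 1
    sleep "$SLEEP_SECS"
  done
}
```

For example, `send_loop 10 0.1penumbra` mirrors the loop from the reproduction steps. Resetting discards the local view database, so the retry has to resync before it can build the transaction.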
@conorsch conorsch added A-node Area: System design and implementation for node software A-client Area: Design and implementation for client functionality labels Jun 7, 2024
@github-actions github-actions bot added the needs-refinement unclear, incomplete, or stub issue that needs work label Jun 7, 2024
@conorsch conorsch added A-CI/CD Relates to continuous integration & deployment of Penumbra and removed needs-refinement unclear, incomplete, or stub issue that needs work labels Jun 7, 2024
cronokirby added a commit that referenced this issue Jun 10, 2024
This should fix an issue wherein a load balanced RPC can cause data
corruption by delivering the same block twice in the stream.

## Issue ticket number and link

This should close #4577.


## Checklist before requesting a review

- [x] If this code contains consensus-breaking changes, I have added the
"consensus-breaking" label. Otherwise, I declare my belief that there
are not consensus-breaking changes, for the following reason:

  > Just a client change
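
To illustrate the failure mode described in this commit message: if a stream delivers a height the client has already processed (compare the "Wrong block height 438654 for latest sync height Some(438654)" log line in the issue), one possible guard is to skip anything that does not advance past the last synced height. The sketch below is not the code from the fix; it is a shell illustration of that idea, and `filter_new_heights` is a hypothetical name.

```shell
# Reads one block height per line on stdin and prints only heights that
# advance past the last one processed. Repeated or older heights are
# reported on stderr and skipped. Gaps in the sequence are not handled.
filter_new_heights() {
  local last="$1" height   # $1: last sync height already recorded
  while read -r height; do
    if [ "$height" -le "$last" ]; then
      echo "skipping already-processed height $height" >&2
      continue
    fi
    echo "$height"
    last="$height"
  done
}
```

For example, `printf '%s\n' 438654 438654 438655 | filter_new_heights 438653` passes through 438654 and 438655 and drops the repeated 438654.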
conorsch pushed a commit that referenced this issue Jun 20, 2024
This should fix an issue wherein a load balanced RPC can cause data
corruption by delivering the same block twice in the stream.

## Issue ticket number and link

This should close #4577.

## Checklist before requesting a review

- [x] If this code contains consensus-breaking changes, I have added the
"consensus-breaking" label. Otherwise, I declare my belief that there
are not consensus-breaking changes, for the following reason:

  > Just a client change

(cherry picked from commit 4245ebc)
@cratelyn

this seems to still be happening sometimes. i am going to reopen this until we're confident that this has been resolved.

@cratelyn cratelyn reopened this Jun 21, 2024
@conorsch

We released v0.77.3 yesterday (#4648) so I suspect that anyone still encountering this error is on 0.77.2 or below. Let's make sure to check versions.
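
A version check along these lines could be scripted. This is a sketch under assumptions: `version_at_least` is a hypothetical helper, it relies on `sort -V` (available in GNU coreutils and some other implementations), and how the installed pcli version string is obtained is left to the reader.

```shell
# Succeeds when version $1 is greater than or equal to version $2,
# using sort's version ordering (so 0.77.10 sorts after 0.77.3).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_at_least 0.77.2 0.77.3` fails, which would flag a client that predates the release mentioned above.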

@cratelyn

ah! my apologies for the noise, let's keep this closed.
