document historical reindexing process for abci events #4566

Closed
conorsch opened this issue Jun 6, 2024 · 4 comments · Fixed by #4603
Labels
A-CI/CD Relates to continuous integration & deployment of Penumbra A-docs Area: Documentation needs for the project A-upgrades Area: Relates to chain upgrades C-chore Codebase maintenance that doesn't fix bugs or add features, and isn't urgent or blocking.

Comments

@conorsch
Contributor

conorsch commented Jun 6, 2024

We need to figure out the process for creating an ABCI event database from scratch. The process should look something like:

  1. Download historical archive of chain state
  2. Set up new node to run in archive mode
  3. Sync node up to upgrade boundary
  4. Update versions, rinse, repeat, until current height

Let's do it manually for now, working from private backups. When I last tried this, getting cometbft to "replay" blocks to pd over ABCI was fairly simple: unfortunately, that replay process did not result in ABCI events being emitted to a newly created sidecar indexing db. Why is that? Is there some other process we should use to handle this use case?

Closely related to #4525 & #4526.

@github-actions github-actions bot added the needs-refinement unclear, incomplete, or stub issue that needs work label Jun 6, 2024
@hdevalence
Member

Syncing a new cometbft instance will solve the event emission problem.

conorsch added a commit that referenced this issue Jun 7, 2024
Due to an oversight in the volume-mount path, our postgres databases for
indexing ABCI events via CometBFT's psql indexer setup were not
persisting across node restarts. This was due to the fact that the
upstream `postgres` container image declares its own VOLUME at
`/var/lib/postgresql/data`, visible here:

    ❯ podman inspect docker.io/postgres:16-bookworm | jq '.[0].Config.Volumes'
    {
      "/var/lib/postgresql/data": {}
    }

which clobbered our manual mount at `/var/lib/postgresql`, a level up.
We must override the volume mountpoint in order for the PVC to take
precedence over the anonymous volume. Done by adding the full path
to that exact mount, and then overriding PGDATA to use a subdir therein.

Also pins a stable version of the container image tag, so we don't get
surprised by postgres jumps.

Refs #4526. Will make sure this applies cleanly automatically on preview
post-merge, then will update the deployments for currently-running
networks. Does not handle historical reingestion; that's tracked
separately in #4566.
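
For anyone reproducing the volume fix locally, here is a minimal sketch of the same idea with podman rather than the deployment's PVC: mount storage at the exact path the image declares as a VOLUME, and point PGDATA at a subdirectory of it. The volume name `penumbra-pg`, the `pgdata` subdirectory, and the placeholder password are assumptions for illustration, not values taken from the actual deployment config.

```shell
#!/usr/bin/env bash
# Sketch only: volume name, subdirectory name, and password are hypothetical.
PG_IMAGE="docker.io/postgres:16-bookworm"
PG_MOUNT="/var/lib/postgresql/data"   # exact path the image declares as VOLUME
PG_DATA_DIR="${PG_MOUNT}/pgdata"      # subdirectory used as the data directory

# Print the container arguments, one per line, so they can be reviewed first.
pg_container_args() {
  printf '%s\n' \
    --volume "penumbra-pg:${PG_MOUNT}" \
    --env "PGDATA=${PG_DATA_DIR}" \
    --env "POSTGRES_PASSWORD=change-me" \
    "${PG_IMAGE}"
}

# Example (not executed here):
#   mapfile -t args < <(pg_container_args)
#   podman run -d --name abci-events-db "${args[@]}"
```

Mounting at the declared path means the named volume replaces the anonymous one instead of sitting a level above it, which is the failure mode described in the commit message.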
@conorsch
Contributor Author

conorsch commented Jun 8, 2024

Progress! Dumping some notes here so I can pick this back up. The process so far is a bit clunky, but it's definitely tractable.

We want to reindex events on the penumbra-testnet-deimos-8 chain. There have been several upgrades already, so we'll need to rewind several versions to do that. Let's go back to 0.75.1. That version won't run out of the box with archived node state, because the halt-height will have been reached, causing pd to exit. So, branch from the v0.75.1 tag, increment the (now-defunct) const TOTAL_HALT_COUNT by one, and build that version of pd, saving it as pd-0.75.1-plus-halt. Use that to set up the initial version of the nodes.

  1. Restore node state from a private backup. This backup was created by tarring up the entire ~/.penumbra/testnet_data/node0 directory on a fullnode. (Related: we should sanitize this archive and host it; see ci: testnet archives should include cometbft data #4525.)
  2. Install pd-0.75.1-plus-halt to the system. Run it.
  3. Configure CometBFT v0.37.5 with extra options: --p2p.pex=false --p2p.seeds='' --moniker archive-node-1. Make sure its CometBFT RPC is accessible to a second node, soon to be created.
  4. Set up another node with pd-0.75.1-plus-halt and cometbft v0.37.5. Join this node to the first node via pd-0.75.1-plus-halt testnet join <url>. Run its cometbft with extra options: --p2p.pex=false --p2p.seeds='' --moniker='reindexer-1' --p2p.persistent_peers='tcp://<node_1_id>@<node_1_ip>:<node_1_port>'
  5. Edit node2's cometbft config [tx_index] block and set indexer="psql" & psql-conn="<POSTGRES_DB_URL>". Run it.
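
The config edit in the last step can be scripted. The following is a rough sketch, assuming a stock CometBFT config.toml in which `indexer` and the connection key already exist under `[tx_index]`; the function name `enable_psql_indexer` is made up for this example. The stock file spells the key `psql-conn`, so the pattern accepts either a hyphen or an underscore and keeps whichever spelling it finds.

```shell
#!/usr/bin/env bash
# Sketch: switch the [tx_index] section of a CometBFT config.toml to the psql indexer.
# Usage: enable_psql_indexer <path/to/config.toml> <postgres-url>
enable_psql_indexer() {
  local cfg="$1" url="$2" esc
  # Escape characters that are special in a sed replacement (&, |, backslash).
  esc=$(printf '%s' "$url" | sed 's/[&|\\]/\\&/g')
  # Only touch keys between the [tx_index] header and the next section header.
  sed -e '/^\[tx_index\]/,/^\[/{' \
      -e 's|^indexer *=.*|indexer = "psql"|' \
      -e "s|^\(psql[-_]conn\) *=.*|\1 = \"${esc}\"|" \
      -e '}' "$cfg" > "${cfg}.tmp" && mv "${cfg}.tmp" "$cfg"
}

# Example (not executed here; the path depends on where node2's cometbft home lives):
#   enable_psql_indexer <node2-cometbft-home>/config/config.toml "<POSTGRES_DB_URL>"
```

Writing to a temporary file and moving it into place avoids relying on `sed -i`, whose syntax differs between GNU and BSD sed.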

Now you should see blocks syncing from node1 -> node2, and more importantly, node2 -> postgres. Once node2 catches up to node1's halt height, the ABCI event db will have blocks 1 -> halt height. Then it's time to shift versions, redeploy backups, and sync again. That piecemeal process, unscripted, should get us full recovery of historical events. Once we have a db, we can shove that behind the relevant frontends.
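
To avoid watching logs by hand, a small polling helper can report when node2 has caught up. This is a sketch, not part of the process as run: it assumes node2's CometBFT RPC is reachable (CometBFT's default RPC port is 26657), that `curl` and `jq` are installed, and that the node keeps answering `/status` near the halt height, which may not hold once pd exits. The function names are invented for this example.

```shell
#!/usr/bin/env bash
# Sketch: poll a CometBFT RPC endpoint until the node reaches a target height.

# Extract latest_block_height from a /status JSON document read on stdin.
parse_height() {
  jq -r '.result.sync_info.latest_block_height'
}

# Succeed once the current height is at or past the target.
height_reached() {
  local current="$1" target="$2"
  [ "$current" -ge "$target" ]
}

# Usage: wait_for_height <rpc-url> <target-height> [poll-seconds]
wait_for_height() {
  local rpc="$1" target="$2" interval="${3:-30}" h
  while true; do
    h=$(curl -fsS "${rpc}/status" | parse_height)
    # Treat an unreachable node or a non-numeric answer as height 0.
    case "$h" in ''|*[!0-9]*) h=0 ;; esac
    if height_reached "$h" "$target"; then
      echo "reached height ${h}"
      return 0
    fi
    echo "at height ${h}, waiting for ${target}"
    sleep "$interval"
  done
}

# Example (not executed here):
#   wait_for_height http://localhost:26657 <halt-height>
```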

@conorsch conorsch removed the needs-refinement unclear, incomplete, or stub issue that needs work label Jun 8, 2024
@conorsch
Contributor Author

Update on progress: I've successfully reindexed blocks 0 through 219220 on penumbra-testnet-deimos-8 with pd v0.75.1, following the process described above. Next up, we need to reindex events from height 220220 to 326700, which matches pd versions 0.76.x to 0.77.x. To do so, we must perform some version gymnastics, related to the overhaul of halting logic in #4373: specifically, we want to run v0.76.0, but a node restored from a backup of that state will not resume the chain. So we'll briefly use v0.77.0 to edit the halt-bit, then fall back to v0.76.0 to sync.

So the steps for reindexing 220220 to 326700 on penumbra-testnet-deimos-8 are:

  1. Restore node0 state from a private backup of pre-77 state. This backup was created by tarring up the entire ~/.penumbra/testnet_data/node0 directory on a fullnode.
  2. Install both v0.76.0 and v0.77.2 pd to the system.
  3. Run pd-v0.77.2 migrate --ready-to-start to disable the halt bit.
  4. Configure CometBFT v0.37.5 with extra options: --p2p.pex=false --p2p.seeds='' --moniker archive-node-1. Make sure its CometBFT RPC is accessible to a second node, soon to be created.
  5. Set up a second node with pd-v0.76.0 and cometbft v0.37.5. Join this node to the first node via pd-v0.76.0 testnet join <url> --archive-url https://snapshots.penumbra.zone/testnet/pd-migrated-state-75-76.tar.gz. The archive URL is mandatory. Run its cometbft with extra options: --p2p.pex=false --p2p.seeds='' --moniker='reindexer-1' --p2p.persistent_peers='tcp://<node_1_id>@<node_1_ip>:<node_1_port>'
  6. Edit node2's cometbft config [tx_index] block and set indexer="psql" & psql-conn="<POSTGRES_DB_URL>". Run it.
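
Since the same CometBFT flags recur in every round of this process, it may help to assemble them in one place. A sketch under assumptions: the helper names are invented, the IP address in the example is a placeholder, and 26656 is CometBFT's default P2P port rather than a value confirmed for these nodes.

```shell
#!/usr/bin/env bash
# Sketch: assemble the CometBFT flags for the reindexing node (node2).

# Usage: peer_address <node-id> <ip> <p2p-port>
peer_address() {
  printf 'tcp://%s@%s:%s\n' "$1" "$2" "$3"
}

# Usage: reindexer_flags <moniker> <peer-address>; prints one flag per line.
reindexer_flags() {
  printf '%s\n' \
    "--p2p.pex=false" \
    "--p2p.seeds=" \
    "--moniker=$1" \
    "--p2p.persistent_peers=$2"
}

# Example (not executed here):
#   node1_id=$(cometbft show-node-id --home <node1-cometbft-home>)   # run on node1
#   mapfile -t flags < <(reindexer_flags reindexer-1 "$(peer_address "$node1_id" 10.0.0.5 26656)")
#   cometbft start --home <node2-cometbft-home> "${flags[@]}"
```

Disabling PEX and clearing the seeds keeps node2 from discovering other peers, so it syncs only from the archive node named in persistent_peers.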

You should now see blocks streaming from node1 to node2, and ABCI events for each block added to the postgresql db. Wait for the next chain halt, at height 326700, and then it's time to update versions once again.

@conorsch conorsch self-assigned this Jun 10, 2024
@conorsch conorsch added A-CI/CD Relates to continuous integration & deployment of Penumbra C-chore Codebase maintenance that doesn't fix bugs or add features, and isn't urgent or blocking. A-docs Area: Documentation needs for the project A-upgrades Area: Relates to chain upgrades labels Jun 10, 2024
@erwanor
Member

erwanor commented Jun 10, 2024

Nice!

conorsch added a commit that referenced this issue Jun 12, 2024
Collects some information from the cuiloa README [0], as well as some
generalized instructions captured in #4566, particularly the use of
`--ready-to-start` from #4499. Refs #4494, closes #4566.

[0] https://github.com/penumbra-zone/cuiloa/blob/dc4133f7b36706cdf5a3ee6b4e0fb2c09e5a8bb8/README.md