document historical reindexing process for abci events #4566

conorsch · 2024-06-06T00:37:58Z

We need to figure out the process for creating an ABCI event database from scratch. The process should look something like:

Download historical archive of chain state
Set up new node to run in archive mode
Sync node up to upgrade boundary
Update versions, rinse, repeat, until current height

Let's do it manually for now, working from private backups. When I last tried this, getting cometbft to "replay" blocks to pd over ABCI was fairly simple. Unfortunately, that replay process did not result in ABCI events being emitted to a newly created sidecar indexing db. Why is that? Is there some other process we should use to handle this use case?

Closely related to #4525 & #4526.

hdevalence · 2024-06-06T00:40:51Z

Syncing a new cometbft instance will solve the event emission problem.

Due to an oversight in the volume-mount path, our postgres databases for indexing ABCI events via CometBFT's psql indexer setup were not persisting across node restarts. This was because the upstream `postgres` container image declares its own VOLUME at `/var/lib/postgresql/data`, visible here:

❯ podman inspect docker.io/postgres:16-bookworm | jq '.[0].Config.Volumes'
{
  "/var/lib/postgresql/data": {}
}

which clobbered our manual mount at `/var/lib/postgresql`, a level up. We must override the volume mountpoint in order for the PVC to take precedence over the anonymous volume. This is done by adding the full path to that exact mount, and then overriding PGDATA to use a subdir therein.

Also pins a stable version of the container image tag, so we don't get surprised by postgres version jumps.

Refs #4526. Will make sure this applies cleanly automatically on preview post-merge, then will update the deployments for currently-running networks. Does not handle historical reingestion; that's tracked separately in #4566.
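A rough local equivalent of that mount fix, as a sketch only. The actual change was made to the deployment's PVC configuration, which isn't shown in this thread. The podman invocation, the volume name, and the placeholder password are assumptions for illustration; `PGDATA` and `POSTGRES_PASSWORD` are environment variables read by the official postgres image.

```shell
# Image tag taken from the inspect command above; pinning it avoids surprise upgrades.
PG_IMAGE="docker.io/postgres:16-bookworm"
# The path the image declares as a VOLUME. Mount here, not one level up.
PG_MOUNT="/var/lib/postgresql/data"

# Print (not run) a podman command that mounts a named volume at the exact
# VOLUME path and points PGDATA at a subdirectory inside it.
postgres_run_cmd() {
    local volume="$1"   # hypothetical named volume, e.g. "indexer-db"
    echo "podman run -d" \
         "-v ${volume}:${PG_MOUNT}" \
         "-e PGDATA=${PG_MOUNT}/pgdata" \
         "-e POSTGRES_PASSWORD=changeme" \
         "${PG_IMAGE}"
}
```

Calling `postgres_run_cmd indexer-db` prints the command for review; because the named volume sits at the same path as the image's declared VOLUME, no anonymous volume is created to shadow it.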

conorsch · 2024-06-08T00:20:47Z

Progress! Dumping some notes here so I can pick this back up. The process so far is a bit clunky, but it's definitely tractable.

We want to reindex events on the penumbra-testnet-deimos-8 chain. There have been several upgrades already, so we'll need to rewind several versions to do that. Let's go back to 0.75.1. That version won't run out of the box with archived node state, because the halt-height will have been reached, causing pd to exit. So, branch from the v0.75.1 tag, increment the (now-defunct) const TOTAL_HALT_COUNT by one, and build that version of pd, saving it as pd-0.75.1-plus-halt. Use that to set up the initial version of the nodes.
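The patched build could be scripted roughly as follows. This is a sketch under assumptions: it expects to run inside a checkout of the penumbra repo, the branch name and install path are made up, and the `TOTAL_HALT_COUNT` edit itself stays a manual step because its file location isn't given here. Set `DRY_RUN=1` to print the commands instead of running them.

```shell
# Echo commands when DRY_RUN=1, otherwise execute them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

TAG="v0.75.1"

# Step 1: branch from the release tag and locate the constant to edit by hand.
start_halt_branch() {
    run git checkout -b "${TAG}-plus-halt" "$TAG"
    run git grep -n "TOTAL_HALT_COUNT"
}

# Step 2: after incrementing TOTAL_HALT_COUNT by one in an editor,
# build pd and install it under a distinct name.
build_pd_plus_halt() {
    local dest="${1:-/usr/local/bin/pd-0.75.1-plus-halt}"   # hypothetical install path
    run cargo build --release --bin pd
    run cp target/release/pd "$dest"
}
```

Splitting the work into two functions leaves room for the manual edit between them.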

Restore node state from a private backup. This backup was created by tarring up the entire ~/.penumbra/testnet_data/node0 directory on a fullnode. (Related: we should sanitize this archive and host it; see "ci: testnet archives should include cometbft data", #4525.)
Install pd-0.75.1-plus-halt to the system. Run it.
Configure CometBFT v0.37.5 with extra options: --p2p.pex=false --p2p.seeds='' --moniker archive-node-1. Make sure its CometBFT RPC is accessible to a second node, soon to be created.
Set up another node with pd-0.75.1-plus-halt and cometbft v0.37.5. Join this node to the first node via pd-0.75.1-plus-halt testnet join <url>. Run its cometbft with extra options: --p2p.pex=false --p2p.seeds='' --moniker='reindexer-1' --p2p.persistent_peers='tcp://<node_1_id>@<node_1_ip>:<node_1_port>'
Edit node2's cometbft config [tx_index] block and set indexer="psql" & psql-conn="<POSTGRES_DB_URL>". Run it.
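The [tx_index] edit in the last step can be scripted. A minimal sketch, assuming the key is spelled `psql-conn` in config.toml (as in the CometBFT config template I'm aware of) and that both keys already exist in the [tx_index] section; the function name is hypothetical.

```shell
# Rewrite the indexer settings inside the [tx_index] section only.
# Keys in other sections are left untouched. The database URL is passed
# through the environment so characters like "&" need no escaping.
enable_psql_indexer() {
    local config="$1" db_url="$2" tmp
    tmp="$(mktemp)" || return 1
    DB_URL="$db_url" awk '
        /^\[/ { in_tx = ($0 == "[tx_index]") }
        in_tx && /^indexer[ \t]*=/   { print "indexer = \"psql\""; next }
        in_tx && /^psql-conn[ \t]*=/ { print "psql-conn = \"" ENVIRON["DB_URL"] "\""; next }
        { print }
    ' "$config" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Overwrite in place so the original file permissions are kept.
    cat "$tmp" > "$config"
    rm -f "$tmp"
}
```

Example: `enable_psql_indexer <path-to-node2-config.toml> "<POSTGRES_DB_URL>"`. If the `psql-conn` line is missing from the file, this sketch does not add it.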

Now you should see blocks syncing from node1 -> node2, and more importantly, node2 -> postgres. Once node2 catches up to node1's halt height, the ABCI event db will have blocks 1 -> halt height. Then it's time to shift versions, redeploy backups, and sync again. That piecemeal process, unscripted, should get us full recovery of historical events. Once we have a db, we can shove that behind the relevant frontends.
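One way to tell when node2 has caught up is to poll its CometBFT RPC `/status` endpoint and compare `latest_block_height` with the expected halt height. A sketch, assuming the RPC is reachable and that the final synced height equals the halt height (it may be off by one in practice); the helper names are hypothetical.

```shell
# Extract latest_block_height from a /status JSON response read on stdin.
latest_height() {
    sed -n 's/.*"latest_block_height"[[:space:]]*:[[:space:]]*"\([0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Succeeds once the height reported on stdin has reached the given halt height.
caught_up() {
    local halt_height="$1" height
    height="$(latest_height)"
    [ -n "$height" ] && [ "$height" -ge "$halt_height" ]
}
```

Usage, with node2's RPC address filled in: `curl -s http://<node2_ip>:26657/status | caught_up <halt_height> && echo "ready to switch versions"`.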

conorsch · 2024-06-10T21:04:00Z

Update on progress: I've successfully reindexed blocks 0 through 219220 on penumbra-testnet-deimos-8 with pd v0.75.1, following the process described above. Next up, we need to reindex events from height 220220 to 326700, which matches pd versions 0.76.x to 0.77.x. To do so, we must perform some version gymnastics, related to the overhaul of halting logic in #4373: specifically, we want to run v0.76.0, but restoring from a backup of that state will not let us resume the chain. So we'll briefly use v0.77.0 to edit the halt-bit, then fall back to v0.76.0 to sync.

So the steps for reindexing 220220 to 326700 on penumbra-testnet-deimos-8 are:

Restore node0 state from a private backup of pre-77 state. This backup was created by tarring up the entire ~/.penumbra/testnet_data/node0 directory on a fullnode.
Install both v0.76.0 and v0.77.2 pd to the system.
Run pd-v0.77.2 migrate --ready-to-start to disable the halt bit.
Configure CometBFT v0.37.5 with extra options: --p2p.pex=false --p2p.seeds='' --moniker archive-node-1. Make sure its CometBFT RPC is accessible to a second node, soon to be created.
Set up a second node with pd-v0.76.0 and cometbft v0.37.5. Join this node to the first node via pd-v0.76.0 testnet join <url> --archive-url https://snapshots.penumbra.zone/testnet/pd-migrated-state-75-76.tar.gz. The archive URL is mandatory. Run its cometbft with extra options: --p2p.pex=false --p2p.seeds='' --moniker='reindexer-1' --p2p.persistent_peers='tcp://<node_1_id>@<node_1_ip>:<node_1_port>'
Edit node2's cometbft config [tx_index] block and set indexer="psql" & psql-conn="<POSTGRES_DB_URL>". Run it.
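The commands from the steps above, collected per node as a sketch. The functions only print the commands so they can be reviewed before running. Assumptions: the flags are passed to `cometbft start`, and the node1 URL and peer address are supplied by the caller. Starting pd itself and pointing cometbft at the node directory with `--home` are left out, since those details aren't spelled out here.

```shell
ARCHIVE_URL="https://snapshots.penumbra.zone/testnet/pd-migrated-state-75-76.tar.gz"
# Flags shared by both nodes: no peer exchange, no seeds.
P2P_FLAGS="--p2p.pex=false --p2p.seeds=''"

# node1: clear the halt bit with the newer binary, then run cometbft in isolation.
node1_plan() {
    echo "pd-v0.77.2 migrate --ready-to-start"
    echo "cometbft start ${P2P_FLAGS} --moniker archive-node-1"
}

# node2: join node1 using the migrated-state archive, then peer only with node1.
node2_plan() {
    local node1_url="$1"   # CometBFT RPC URL of node1
    local peer="$2"        # tcp://<node_1_id>@<node_1_ip>:<node_1_port>
    echo "pd-v0.76.0 testnet join ${node1_url} --archive-url ${ARCHIVE_URL}"
    echo "cometbft start ${P2P_FLAGS} --moniker='reindexer-1' --p2p.persistent_peers='${peer}'"
}
```

For example, `node2_plan "<url>" "tcp://<node_1_id>@<node_1_ip>:<node_1_port>"` prints the join and start commands with the peer filled in.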

You should now see blocks streaming from node1 to node2, and ABCI events for each block added to the postgresql db. Wait for the next chain halt, at height 326700, and then it's time to update versions once again.

erwanor · 2024-06-10T21:32:29Z

Nice!

Collects some information from the cuiloa README [0], as well as some generalized instructions captured in #4566, particularly the use of `--ready-to-start` from #4499. Refs #4494, closes #4566. [0] https://github.com/penumbra-zone/cuiloa/blob/dc4133f7b36706cdf5a3ee6b4e0fb2c09e5a8bb8/README.md

github-actions bot added the needs-refinement label (unclear, incomplete, or stub issue that needs work) Jun 6, 2024

conorsch mentioned this issue Jun 7, 2024

ci: fix persistence for indexing dbs #4575

Merged

1 task

conorsch removed the needs-refinement label (unclear, incomplete, or stub issue that needs work) Jun 8, 2024

conorsch self-assigned this Jun 10, 2024

conorsch added labels Jun 10, 2024: A-CI/CD (Relates to continuous integration & deployment of Penumbra), C-chore (Codebase maintenance that doesn't fix bugs or add features, and isn't urgent or blocking), A-docs (Area: Documentation needs for the project), A-upgrades (Area: Relates to chain upgrades)

conorsch mentioned this issue Jun 12, 2024

ci: db persistence for large index dbs #4603

Merged

1 task

conorsch closed this as completed in #4603 Jun 13, 2024
