Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: abci event databases do not persist #4526

Closed
conorsch opened this issue Jun 3, 2024 · 0 comments · Fixed by #4603
Closed

ci: abci event databases do not persist #4526

conorsch opened this issue Jun 3, 2024 · 0 comments · Fixed by #4603
Assignees
Labels
A-CI/CD Relates to continuous integration & deployment of Penumbra C-bug Category: a bug _P-V1 Priority: slated for V1 release
Milestone

Comments

@conorsch
Copy link
Contributor

conorsch commented Jun 3, 2024

Describe the bug
The Postgres databases storing ABCI event info for frontend apps like the block-explorer are not persisting. This has resulted in data loss for events on the penumbra-testnet-deimos-8 chain. Let's fix the persistence problem, then try to reindex historical events (#4525).

Additional context
Most likely the problem is a naive application of the CometBFT Postgres schema on application start: if the upstream schema declaration is not idempotent, e.g. via use of "IF NOT EXISTS" then re-applying it is a destructive action, essentially dropping existing tables. If that's the case, let's drop a flag on-disk in the PVC and only apply the schema if that flag is missing.

@conorsch conorsch added C-bug Category: a bug A-CI/CD Relates to continuous integration & deployment of Penumbra labels Jun 3, 2024
@github-project-automation github-project-automation bot moved this to Backlog in Penumbra Jun 3, 2024
@github-actions github-actions bot added the needs-refinement unclear, incomplete, or stub issue that needs work label Jun 3, 2024
@conorsch conorsch self-assigned this Jun 3, 2024
@conorsch conorsch added this to the Sprint 8 milestone Jun 3, 2024
@aubrika aubrika moved this from Backlog to Todo in Penumbra Jun 3, 2024
@aubrika aubrika added _P-V1 Priority: slated for V1 release and removed needs-refinement unclear, incomplete, or stub issue that needs work labels Jun 3, 2024
conorsch added a commit that referenced this issue Jun 7, 2024
Due to an oversight in the volume-mount path, out postgres databases for
indexing ABCI events via CometBFT's psql indexer setup were not
persisting across node restarts. This was due to the fact that the
upstream `postgres` container image declares its own VOLUME at
`/var/lib/postgresql/data`, visible here:

    ❯ podman inspect docker.io/postgres:16-bookworm | jq '.[0].Config.Volumes'
    {
      "/var/lib/postgresql/data": {}
    }

which clobbered our manual mount at `/var/lib/postgresql`, a level up.
We must override the volume mountpoint in order for the PVC to take
precedence over the anonymous volume. Done by adding the fullpath
to that exact mount, and then overriding PGDATA to use a subdir therein.

Also pins a stable version of the container image tag, so we don't get
surprised by postgres jumps.

Refs #4526. Will make sure this applies cleanly automatically on preview
post-merge, then will update the deployments for currently-running
networks. Does not handle historical reingestion; that's tracked
separately in #4566.
conorsch added a commit that referenced this issue Jun 8, 2024
Due to an oversight in the volume-mount path, out postgres databases for
indexing ABCI events via CometBFT's psql indexer setup were not
persisting across node restarts. This was due to the fact that the
upstream `postgres` container image declares its own VOLUME at
`/var/lib/postgresql/data`, visible here:

    ❯ podman inspect docker.io/postgres:16-bookworm | jq '.[0].Config.Volumes'
    {
      "/var/lib/postgresql/data": {}
    }

which clobbered our manual mount at `/var/lib/postgresql`, a level up.
We must override the volume mountpoint in order for the PVC to take
precedence over the anonymous volume. Done by adding the fullpath
to that exact mount, and then overriding PGDATA to use a subdir therein.

Also pins a stable version of the container image tag, so we don't get
surprised by postgres jumps.

Refs #4526. Will make sure this applies cleanly automatically on preview
post-merge, then will update the deployments for currently-running
networks. Does not handle historical reingestion; that's tracked
separately in #4566.
conorsch added a commit that referenced this issue Jun 12, 2024
Makes a few changes:

  * raises db storage 1GB -> 10GB
  * leaves db running in maintenanceMode, for dump/restore
  * increases available shm size for db container

Ideally we'd make the db persistence customizable, but this is good
enough for now. Looks like 500k blocks maps to roughly 3.5GB of dbdump.
We can adjust over time if we plan to keep running long-lived testnet
chains.

The increased shm size is specifically to support large joins by e.g.
the dex explorer frontend, otherwise db connection was reporting
"could not resize shared memory segment".

Refs #4526.
conorsch added a commit that referenced this issue Jun 13, 2024
Makes a few changes:

  * raises db storage 1GB -> 10GB
  * leaves db running in maintenanceMode, for dump/restore
  * increases available shm size for db container

Ideally we'd make the db persistence customizable, but this is good
enough for now. Looks like 500k blocks maps to roughly 3.5GB of dbdump.
We can adjust over time if we plan to keep running long-lived testnet
chains.

The increased shm size is specifically to support large joins by e.g.
the dex explorer frontend, otherwise db connection was reporting
"could not resize shared memory segment".

Refs #4526.
@github-project-automation github-project-automation bot moved this from Todo to Done in Penumbra Jun 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-CI/CD Relates to continuous integration & deployment of Penumbra C-bug Category: a bug _P-V1 Priority: slated for V1 release
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants