pageserver: accounting error on secondary tenant resident size #9628

Open
3 tasks
VladLazar opened this issue Nov 4, 2024 · 0 comments
Assignees
Labels
c/storage/pageserver Component: storage: pageserver c/storage Component: storage t/bug Issue Type: Bug triaged bugs that were already triaged

Comments

@VladLazar
Contributor

This testing assertion fired during test_storage_controller_many_tenants on PR #8613 (Allure).
There's a Slack thread with a bit of context here.

I spent some time looking at it but didn't spot the cause. It happened shortly after a live migration, and since one side of the assertion is exactly twice the other, the error makes me think we somehow counted the only existing layer twice.

2024-11-01T15:41:55.639814Z ERROR secondary_download{tenant_id=11bd770f785975106c00a57e1b60ae2c shard_id=0408}:panic{thread=background op worker location=pageserver/src/tenant/secondary/downloader.rs:778:13}: assertion `left == right` failed
  left: 2605056
 right: 5210112

<backtrace snipped>

2024-11-01T15:41:55.873307Z ERROR secondary_download_scheduler:panic{thread=background op worker location=pageserver/src/tenant/secondary/scheduler.rs:307:36}: Panic in background task: JoinError::Panic(Id(1360288), ...)

<backtrace snipped>

2024-11-01T15:41:55.883651Z ERROR Task panicked, exiting process: Any { .. } task_name="secondary tenant downloads"
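
To make the double-counting hypothesis concrete, here is a minimal sketch of how touching the same layer twice could produce a 1:2 mismatch like the one in the assertion. Everything in it is hypothetical: `SecondaryState`, `touch_layer_buggy`, `touch_layer`, and the layer name are invented for illustration and are not taken from `downloader.rs`, and I have not confirmed that this is the actual root cause.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a secondary tenant's layer bookkeeping.
/// None of these names come from the pageserver code.
#[derive(Default)]
struct SecondaryState {
    /// Layers on local disk: layer name -> size in bytes.
    on_disk_layers: HashMap<String, u64>,
    /// Running counter that should always equal the sum of `on_disk_layers`.
    resident_size: u64,
}

impl SecondaryState {
    /// Buggy variant: bumps the counter even if the layer is already tracked.
    fn touch_layer_buggy(&mut self, name: &str, size: u64) {
        self.on_disk_layers.insert(name.to_string(), size);
        self.resident_size += size;
    }

    /// Safer variant: subtracts whatever was previously counted for this layer.
    fn touch_layer(&mut self, name: &str, size: u64) {
        let previous = self
            .on_disk_layers
            .insert(name.to_string(), size)
            .unwrap_or(0);
        self.resident_size = self.resident_size - previous + size;
    }

    /// Ground truth: sum of the sizes of all tracked layers.
    fn layer_sum(&self) -> u64 {
        self.on_disk_layers.values().sum()
    }
}

fn main() {
    let size = 2_605_056;

    // The same layer is registered twice, e.g. once before and once after
    // a migration-related event (an assumption, not a confirmed sequence).
    let mut buggy = SecondaryState::default();
    buggy.touch_layer_buggy("layer-a", size);
    buggy.touch_layer_buggy("layer-a", size);
    println!("buggy: sum={} counter={}", buggy.layer_sum(), buggy.resident_size);

    let mut fixed = SecondaryState::default();
    fixed.touch_layer("layer-a", size);
    fixed.touch_layer("layer-a", size);
    println!("fixed: sum={} counter={}", fixed.layer_sum(), fixed.resident_size);
}
```

In the buggy variant the map still holds a single layer while the counter has grown by its size twice, which is the kind of state a sum-versus-counter assertion would catch.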

One observation was that the panic completely killed the pageserver process. This was unexpected to me, so panic handling on that code path should be checked as well.
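
For reference when checking the panic handling, this is a generic sketch of containing a panic in a background job so that it surfaces as an error instead of ending the process. It uses plain `std::thread` rather than the pageserver's actual task machinery, assumes the default unwinding panic strategy (not `panic=abort`), and the names `run_download` and `run_contained` are made up for the example. It does not say which behavior is the right one for this code path.

```rust
use std::thread;

/// Hypothetical background job; panics to simulate a failed assertion.
fn run_download(should_panic: bool) -> u64 {
    if should_panic {
        panic!("simulated accounting assertion failure");
    }
    42
}

/// Runs the job on its own thread and turns a panic into an `Err`
/// carrying the panic message, leaving the caller free to decide
/// whether to log, retry, or shut down.
fn run_contained(should_panic: bool) -> Result<u64, String> {
    let handle = thread::spawn(move || run_download(should_panic));
    handle.join().map_err(|payload| {
        // The panic payload is a `Box<dyn Any + Send>`; it is usually
        // a `&'static str` or a `String`, depending on how `panic!` was called.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            msg.to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown panic payload".to_string()
        }
    })
}

fn main() {
    println!("ok run: {:?}", run_contained(false));
    println!("panicking run: {:?}", run_contained(true));
    println!("process is still alive after the panic");
}
```

The default panic hook still prints the panic message to stderr; only the unwinding is contained by the `join`.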

Todo:

  • Root cause the bug
  • Fix
  • Check that panic handling is correct on the secondary download code path
@VladLazar VladLazar added c/storage Component: storage c/storage/pageserver Component: storage: pageserver t/bug Issue Type: Bug labels Nov 4, 2024
@jcsp jcsp self-assigned this Nov 12, 2024
@jcsp jcsp added the triaged bugs that were already triaged label Nov 13, 2024