Sentry: replica_consistency.go:850: log.Fatal: ATTENTION: (1) attached stack trace -- stack trace: | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).computeChecksumPostApply.func2 | ... #131540

Open
cockroach-sentry opened this issue Sep 27, 2024 · 0 comments
Labels
branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report.

cockroach-sentry commented Sep 27, 2024

This issue was auto filed by Sentry. It represents a crash or reported error on a live cluster with telemetry enabled.

Sentry Link: https://cockroach-labs.sentry.io/issues/5917804722/?referrer=webhooks_plugin

Panic Message:

replica_consistency.go:850: log.Fatal: ATTENTION:
(1) attached stack trace
  -- stack trace:
  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).computeChecksumPostApply.func2
  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_consistency.go:850
  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
  | 	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484
  | runtime.goexit
  | 	src/runtime/asm_amd64.s:1650
Wraps: (2) log.Fatal: ATTENTION:
  |
  | This node is terminating because a replica inconsistency was detected between [n1,s1,r39/1:×]
  | and its other replicas: (n1,s1):1,(n2,s2):2,(n3,s3):3. Please check your cluster-wide log files for more
  | information and contact the CockroachDB support team. It is not necessarily safe
  | to replace this node; cluster data may still be at risk of corruption.
  |
  | A checkpoints directory to aid (expert) debugging should be present in:
  | /mnt/provision/crdb-stage/node0/auxiliary
  |
  | A file preventing this node from restarting was placed at:
  | /mnt/provision/crdb-stage/node0/auxiliary/_CRITICAL_ALERT.txt
  |
  | Checkpoints are created on each node/store hosting this range, to help
  | investigate the cause. Only nodes that are more likely to have incorrect data
  | are terminated, and usually a majority of replicas continue running. Checkpoints
  | are partial, i.e. contain only the data from to the inconsistent range, and
  | possibly its neighbouring ranges.
  |
  | The storage checkpoint directories can/should be deleted when no longer needed.
  | They are very helpful in debugging this issue, so before deleting them, please
  | consider alternative actions:
  |
  | - If the store has enough capacity, hold off the deletion until CRDB staff has
  |   diagnosed the issue.
  | - Back up the checkpoints for later investigation.
  | - If the stores are nearly full, but the cluster has enough capacity, consider
  |   gradually decommissioning the affected nodes, to retain the checkpoints.
  |
  | To inspect the checkpoints, one can use the cockroach debug range-data tool, and
  | command line tools like diff. For example:
  |
  | $ cockroach debug range-data --replicated data/auxiliary/checkpoints/rN_at_M N
  |
  | Note that a directory that ends with "_pending" might not represent a valid
  | checkpoint. Such directories can exist if the node fails during checkpoint
  | creation. These directories should be deleted, or inspected with caution.
Error types: (1) *withstack.withStack (2) *errutil.leafError
-- report composition:
*errutil.leafError: log.Fatal: ATTENTION:
replica_consistency.go:850: *withstack.withStack (top exception)
Stacktrace (expand for inline code snippets):

src/runtime/asm_amd64.s#L1649-L1651

sp.UpdateGoroutineIDToCurrent()
f(ctx)
}()

https://github.com/cockroachdb/cockroach/blob/8a4e55e40110d752c158414bfce22846faeb1722/pkg/kv/kvserver/pkg/kv/kvserver/replica_consistency.go#L849-L851

src/runtime/asm_amd64.s in runtime.goexit at line 1650
pkg/util/stop/stopper.go in pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2 at line 484
pkg/kv/kvserver/pkg/kv/kvserver/replica_consistency.go in pkg/kv/kvserver.(*Replica).computeChecksumPostApply.func2 at line 850

Tags

| Tag | Value |
| --- | --- |
| Command | server |
| Environment | v23.2.7 |
| Go Version | go1.21.10 X:nocoverageredesign |
| Platform | linux amd64 |
| Distribution | CCL |
| Cockroach Release | v23.2.7 |
| Cockroach SHA | 8a4e55e |
| # of CPUs | 8 |
| # of Goroutines | 510 |

Jira issue: CRDB-42589
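
The panic message above suggests inspecting the checkpoints with `cockroach debug range-data` and `diff`, and warns that directories ending in `_pending` may not be valid checkpoints. A minimal sketch of that workflow might look like the following. The helper names `list_valid_checkpoints` and `dump_range` are hypothetical, and the `checkpoints` subdirectory under `auxiliary` is assumed from the example command in the message; only the `cockroach debug range-data --replicated <dir> <rangeID>` invocation is taken from the message itself.

```shell
#!/bin/sh
# Hypothetical helper: print every checkpoint directory under $1,
# skipping non-directories and names ending in "_pending" (which,
# per the panic message, might not represent a valid checkpoint).
list_valid_checkpoints() {
  checkpoints_dir=$1
  for dir in "$checkpoints_dir"/*; do
    [ -d "$dir" ] || continue
    case "$dir" in
      *_pending) continue ;;
    esac
    printf '%s\n' "$dir"
  done
}

# Hypothetical helper: dump the replicated data of range $2 from the
# checkpoint directory $1 into the text file $3, using the command
# quoted in the panic message.
dump_range() {
  cockroach debug range-data --replicated "$1" "$2" > "$3"
}
```

One possible use, assuming the range in question is r39 as reported above: run `dump_range <checkpoint dir> 39 n1.txt` against a directory printed by `list_valid_checkpoints` on each node hosting the range, copy the resulting files to one machine, and compare them with `diff n1.txt n2.txt`.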

@cockroach-sentry cockroach-sentry added branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report. labels Sep 27, 2024