TiFlash compute node crashes after executing ALTER RANGE in TiDB
#9750
Reproduce steps:
1. Deploy a cluster with TiKV and disaggregated TiFlash.
2. Load the TPC-H dataset into the cluster and create a TiFlash replica.
3. Create a placement policy.

Under the disaggregated architecture, the TiFlash compute node's store carries its own label. These steps create a placement rule as follows, and PD then tries to add a peer on the TiFlash compute node:
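The policy-creation step above can be sketched in SQL. This is a hedged sketch, not the reporter's exact statements: the policy name `p1` and the zone constraint are illustrative, while `test.region` is the table named later in this report.

```sql
-- Illustrative only: give a TPC-H table a TiFlash replica, then attach a
-- user-defined placement policy that pins replicas by zone label.
CREATE PLACEMENT POLICY p1 CONSTRAINTS='[+zone=z1]';
ALTER TABLE test.region SET TIFLASH REPLICA 1;
ALTER TABLE test.region PLACEMENT POLICY=p1;
```

Because the policy's constraints say nothing about the `engine` label, the generated rule can match any store, including the TiFlash compute node.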
Disaster recovery measures
If a user runs into a crash as above, he/she can:
1. Restore the policy to default:
ALTER TABLE test.region PLACEMENT POLICY=default;
-- or
ALTER RANGE global PLACEMENT POLICY=default;
2. Scale in all the TiFlash compute nodes.
3. Wait for PD to remove all peers from the existing TiFlash compute nodes.
4. Re-deploy the TiFlash compute nodes.
Root cause
When a placement policy is created through TiDB, TiDB creates a corresponding placement rule. The logic PD uses to choose a store for a rule is in pkg/schedule/placement/label_constraint.go @ pd. As a result, PD may pick a TiFlash compute node as the target store to place the Region peer.

Workaround
As a workaround, users can explicitly exclude tiflash and tiflash_compute in their rules until this behavior is fixed. For example:
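A sketch of such a policy, assuming the store labels named in this report (`engine=tiflash` and `engine=tiflash_compute`); the policy name `p_safe` is illustrative:

```sql
-- Exclude both TiFlash engine labels so PD does not schedule the
-- Region peer onto a TiFlash storage or compute store.
CREATE PLACEMENT POLICY p_safe
  CONSTRAINTS='[-engine=tiflash, -engine=tiflash_compute]';
ALTER TABLE test.region PLACEMENT POLICY=p_safe;
```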
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
Reported from https://asktug.com/t/topic/1037727
2. What did you expect to see? (Required)
3. What did you see instead? (Required)
4. What is your TiFlash version? (Required)
v7.5.4, deployed with the storage/compute disaggregated architecture