metrics: The learner-peer-count cannot be restored after PD leader changes #7728

Closed
AndreMouche opened this issue Jan 17, 2024 · 2 comments · Fixed by #7748


Bug Report

The learner-peer-count metric cannot be restored after the PD leader changes.

What did you do?

  1. Set up a cluster with tiup playground:
tiup playground --tag v7.1.1 v7.1.1 --db 1 --pd 3 --kv 3 --tiflash 1 --monitor --host=0.0.0.0 & 
  1. Create the tables and insert some data into the cluster:
tiup bench tpcc --warehouses 4 --parts 4 prepare 
  1. Add a TiFlash replica for each of the tables above:
mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| customer       |
| district       |
| history        |
| item           |
| new_order      |
| order_line     |
| orders         |
| stock          |
| warehouse      |
+----------------+
9 rows in set (0.00 sec)

mysql> alter table customer set tiflash replica 1;
Query OK, 0 rows affected (0.21 sec)

mysql> alter table district set tiflash replica 1;
Query OK, 0 rows affected (0.18 sec)

mysql> alter table history set tiflash replica 1;
Query OK, 0 rows affected (0.23 sec)

mysql> alter table item set tiflash replica 1;
Query OK, 0 rows affected (0.14 sec)

mysql> alter table new_order set tiflash replica 1;
Query OK, 0 rows affected (0.25 sec)

mysql> alter table order_line set tiflash replica 1;
Query OK, 0 rows affected (0.19 sec)

mysql> alter table orders set tiflash replica 1;
Query OK, 0 rows affected (0.21 sec)

mysql> alter table stock set tiflash replica 1;
Query OK, 0 rows affected (0.33 sec)

mysql> alter table warehouse set tiflash replica 1;
Query OK, 0 rows affected (0.26 sec)

  1. Check the metric at PD -> region-healthy -> learner-peer-count. After the learner-peer-count value stops changing, manually transfer the PD leader:
fun :: ~ » tiup ctl:v7.1.1 pd -u http://127.0.0.1:2379 member leader transfer pd-2
Starting component `ctl`: /home/fun/.tiup/components/ctl/v7.1.1/ctl pd -u http://127.0.0.1:2379 member leader transfer pd-2
Success!
fun :: ~ » tiup ctl:v7.1.1 pd -u http://127.0.0.1:2379 member leader transfer pd-0
Starting component `ctl`: /home/fun/.tiup/components/ctl/v7.1.1/ctl pd -u http://127.0.0.1:2379 member leader transfer pd-0
Success!
fun :: ~ » date
20240117 17:59:24 CST
  1. Check the learner-peer-count metric again. It had not recovered after 30 minutes (in some clusters this number has not recovered even after a day):

[Screenshot attachment: 893249a6-d51d-47ac-b7bf-549d2d72e99d]
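
As a convenience for anyone reproducing this, the nine ALTER TABLE statements in step 3 could also be generated by a small loop. This is only a sketch: the function name `gen_tiflash_replica_sql` is made up for illustration, and the connection details in the commented usage line (host, port 4000, user `root`, database `test`) are assumptions that the report does not state.

```shell
#!/usr/bin/env bash
# Sketch only: prints one ALTER TABLE statement per table from step 3.
# The table names are copied from the `show tables` output in the report.
tables="customer district history item new_order order_line orders stock warehouse"

# gen_tiflash_replica_sql is a hypothetical helper, not part of tiup or TiDB.
gen_tiflash_replica_sql() {
  # $1: desired TiFlash replica count (1 in the report)
  local count="$1"
  local t
  for t in $tables; do
    printf 'ALTER TABLE %s SET TIFLASH REPLICA %s;\n' "$t" "$count"
  done
}

# Print the statements so they can be reviewed before being run.
gen_tiflash_replica_sql 1

# Assumed usage (connection details are NOT from the report):
#   gen_tiflash_replica_sql 1 | mysql -h 127.0.0.1 -P 4000 -u root test
```

Printing the statements rather than executing them directly keeps the sketch safe to run anywhere; piping the output into a MySQL client is left as an explicit, separate step.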

What did you expect to see?

The learner-peer-count should recover within 10 minutes (https://docs-archive.pingcap.com/tidb/v7.2/tikv-configuration-file#peer-stale-state-check-interval), since a hibernated region should send a heartbeat to PD within 10 minutes by default.

What did you see instead?

It had not recovered after 30 minutes (in some clusters this number has not recovered even after a day).

What version of PD are you using (pd-server -V)?

v7.1.1. This also happens on v6.5.0.

@AndreMouche added the type/bug label (The issue is confirmed as a bug.) Jan 17, 2024
@AndreMouche

Meanwhile, when I restart only PD (without restarting TiKV or TiFlash), the learner-peer-count recovers:
[Screenshot attachment: 7901bf23-061e-4429-ac92-a435a7bb0905]

@CabinfeverB self-assigned this Jan 18, 2024
@ti-chi-bot bot closed this as completed in #7748 Feb 6, 2024
ti-chi-bot bot pushed a commit that referenced this issue Feb 6, 2024
close #7728

Signed-off-by: Cabinfever_B <[email protected]>
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue Feb 6, 2024
ti-chi-bot bot pushed a commit that referenced this issue Feb 10, 2024
close #7728

Signed-off-by: ti-chi-bot <[email protected]>
Signed-off-by: Cabinfever_B <[email protected]>

Co-authored-by: Yongbo Jiang <[email protected]>
Co-authored-by: Cabinfever_B <[email protected]>
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue Feb 22, 2024
ti-chi-bot bot pushed a commit that referenced this issue Feb 23, 2024
close #7728

Signed-off-by: Cabinfever_B <[email protected]>

Co-authored-by: Yongbo Jiang <[email protected]>
Co-authored-by: Cabinfever_B <[email protected]>
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue Mar 25, 2024
CabinfeverB added a commit to ti-chi-bot/pd that referenced this issue Mar 27, 2024
ti-chi-bot bot pushed a commit that referenced this issue Mar 27, 2024
close #7728

Signed-off-by: husharp <[email protected]>
Signed-off-by: Cabinfever_B <[email protected]>

Co-authored-by: husharp <[email protected]>
Co-authored-by: Yongbo Jiang <[email protected]>
Co-authored-by: Cabinfever_B <[email protected]>
@seiya-annie

/found customer

@ti-chi-bot bot added the report/customer label (Customers have encountered this bug.) Jun 4, 2024