Fix stale region cache with no leader #445

yongman · 2024-03-11T03:47:46Z

When a client is created during tikv-server startup, before a leader has been elected for a region, the client's region cache may store that region without a leader.

This stale entry causes every access to the region to return a no-leader error until the region's id_ver changes.
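The fix can be illustrated with a toy cache in Rust. This is a minimal sketch under stated assumptions, not the client's actual region cache: `VerId`, `Region`, `Pd`, `RegionCache::get` and `MockPd` are hypothetical names. It shows one way to avoid the stale entry: a cached region whose leader is unknown is treated as a cache miss and reloaded from PD, rather than being served until its id_ver changes.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a region version id (region id + epoch version).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct VerId {
    id: u64,
    ver: u64,
}

/// Hypothetical cached region: `leader` is the leader's store id, or `None`
/// if no leader had been elected when the region was loaded.
#[derive(Clone, Debug)]
struct Region {
    ver_id: VerId,
    leader: Option<u64>,
}

/// Hypothetical PD interface used to (re)load a region.
trait Pd {
    fn get_region(&mut self, id: u64) -> Region;
}

struct RegionCache {
    regions: HashMap<u64, Region>,
}

impl RegionCache {
    fn new() -> Self {
        Self { regions: HashMap::new() }
    }

    /// Serves a cached region only if it has a leader. A leaderless entry is
    /// treated as a miss and reloaded from PD, even if its id_ver is unchanged.
    fn get(&mut self, id: u64, pd: &mut impl Pd) -> Region {
        if let Some(cached) = self.regions.get(&id) {
            if cached.leader.is_some() {
                return cached.clone();
            }
        }
        let fresh = pd.get_region(id);
        self.regions.insert(id, fresh.clone());
        fresh
    }
}

/// Hypothetical mock PD: no leader on the first query (server still starting),
/// leader on store 1 afterwards. The region version never changes.
struct MockPd {
    calls: u32,
}

impl Pd for MockPd {
    fn get_region(&mut self, id: u64) -> Region {
        self.calls += 1;
        let leader = if self.calls == 1 { None } else { Some(1) };
        Region { ver_id: VerId { id, ver: 1 }, leader }
    }
}
```

With `MockPd`, the second lookup of a region reloads it even though its version is unchanged, and later lookups are served from the cache. A real client would also need to avoid hammering PD while a leader is still missing, which is the backoff concern pingyu raises in the review.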

Signed-off-by: yongman <[email protected]>

pingyu · 2024-03-11T07:08:53Z

It seems that if the region still has no leader when it is read through the PD server, we would get the same no-leader error anyway.

How about handling this situation uniformly in handle_region_error? Then the error can be retried, with backoff to avoid putting too much pressure on the PD servers.

(Some related code will probably need to change too, since this error is raised in apply_shard. Maybe we can pass the region_store to single_shard_handler and handle the no-leader condition there.)
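A minimal sketch of the uniform retry-with-backoff handling suggested above, written in Rust. It assumes a hypothetical `RegionError::NoLeader` variant and a hypothetical `retry_region_errors` helper; neither is the client's real handle_region_error API. Only the no-leader error is retried, each retry first calls a hook where a real client would invalidate the cached region, and delays grow exponentially up to a cap to limit load on PD.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error type; `NoLeader(region_id)` stands in for the
/// region error raised when a region has no leader.
#[derive(Debug, PartialEq)]
enum RegionError {
    NoLeader(u64),
    Other(String),
}

/// Exponential backoff: `base * 2^attempt`, capped at `cap`.
struct Backoff {
    base: Duration,
    cap: Duration,
    max_attempts: u32,
}

impl Backoff {
    fn delay(&self, attempt: u32) -> Duration {
        let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
        self.base.saturating_mul(factor).min(self.cap)
    }
}

/// Runs `op`, retrying only no-leader errors until `max_attempts` is reached.
/// `on_region_error` is where a real client would drop the stale cache entry.
fn retry_region_errors<T>(
    backoff: &Backoff,
    mut on_region_error: impl FnMut(&RegionError),
    mut op: impl FnMut() -> Result<T, RegionError>,
) -> Result<T, RegionError> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e @ RegionError::NoLeader(_)) if attempt + 1 < backoff.max_attempts => {
                on_region_error(&e);
                sleep(backoff.delay(attempt));
                attempt += 1;
            }
            // Non-retryable error, or attempts exhausted.
            Err(e) => return Err(e),
        }
    }
}
```

With `base` = 1 ms, `cap` = 4 ms and `max_attempts` = 5, a region that never gets a leader is tried five times with waits of 1, 2, 4 and 4 ms before the error is returned; any other error is returned immediately.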

yongman · 2024-03-12T01:36:58Z

@pingyu Thanks for your advice. Handling the NotLeader error in single_shard_handler alone is not enough: Shardable::shards, store_stream_for_keys, store_stream_for_range, store_stream_for_ranges and resolve_locks can also raise this error.

That would require many modifications, which could take a lot of time and introduce more risk. Moreover, the application logic should already be able to retry and back off when it handles this error, so simply refreshing the region cache seems reasonable.

Fix stale region cache with no leader

6d318ca

Signed-off-by: yongman <[email protected]>

yongman force-pushed the ym/fix-region-cache branch from 21cbeca to 04c77d2 on March 11, 2024 06:23

fix unit test

45b5fae

Signed-off-by: yongman <[email protected]>

yongman force-pushed the ym/fix-region-cache branch from 04c77d2 to 45b5fae on March 11, 2024 06:46

manelmontilla mentioned this pull request Apr 16, 2024

Bug: SureralDB can't survive TiKV cluster restart surrealdb/surrealdb#3570

Open

