Avoid a region being opened by two Datanodes at the same time. #2289

Closed
WenyXu opened this issue Aug 31, 2023 · 1 comment
Labels: C-enhancement (Category Enhancements), O-chaos (Found by chaos tests)

Comments

WenyXu (Member) commented Aug 31, 2023

What type of enhancement is this?

Tech debt reduction

What does the enhancement do?

The Failover Procedure waits 40 s when a Datanode is unreachable. If the unreachable Datanode restarts at, say, the 23rd second, it reopens the failed region, because the table route has not been updated since the failure occurred. Its RegionAliveKeeper then waits up to 20 s for the first lease to arrive and, if none does, closes the region (with a flush). Meanwhile, the procedure may step into its next stage at the 40th second and open the failed region on another Datanode, before the keeper's wait ends at the 43rd second. The region is then open on two Datanodes at once, and the newer Datanode overwrites the existing manifest, which may cause us to lose the indexes of the flushed files.
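The timing above can be checked with a small calculation. This is only a sketch of the arithmetic, not GreptimeDB code: the function name and parameters are hypothetical, and it assumes that a Datanode restarting after the failover wait has elapsed sees an updated table route and does not reopen the region.

```python
from typing import Optional, Tuple


def dual_open_window(
    failover_wait_s: float,
    restart_at_s: float,
    keeper_grace_s: float,
) -> Optional[Tuple[float, float]]:
    """Return the (start, end) interval in which two Datanodes may both
    hold the region open, or None if the intervals do not overlap."""
    # Assumption: a restart after the failover wait sees the updated route,
    # so the old Datanode does not reopen the region at all.
    if restart_at_s >= failover_wait_s:
        return None
    # The restarted Datanode keeps the region open until its keeper gives up
    # waiting for the first lease.
    old_open_until = restart_at_s + keeper_grace_s
    # The failover procedure opens the region elsewhere once its wait elapses.
    new_open_from = failover_wait_s
    if old_open_until <= new_open_from:
        return None
    return (new_open_from, old_open_until)


# Numbers from the description: 40 s wait, restart at 23 s, 20 s keeper wait.
print(dual_open_window(40, 23, 20))
```

With the numbers from the description, the restarted Datanode holds the region from second 23 to second 43, so it overlaps with the new Datanode from second 40 onward. A restart at second 10 would not overlap, because the keeper closes the region at second 30.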


Implementation challenges

We may need to introduce intermediate states:

  1. Remove the failed region in the TableRoute
  2. Deactivate the failed region
  3. Update TableRoute
  4. Activate the region
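
One way to picture the four steps is as an ordered state machine in which the region may be activated on the new Datanode only after the earlier steps have completed. The sketch below is illustrative only: `FailoverStep`, `advance`, and `may_activate_on_new_datanode` are hypothetical names, not part of the existing procedure.

```python
from enum import Enum


class FailoverStep(Enum):
    """Hypothetical intermediate states, in the order of the steps listed."""
    FAILURE_DETECTED = 0
    ROUTE_ENTRY_REMOVED = 1   # step 1: failed region removed from the TableRoute
    REGION_DEACTIVATED = 2    # step 2: failed region deactivated
    ROUTE_UPDATED = 3         # step 3: TableRoute updated
    REGION_ACTIVATED = 4      # step 4: region activated


def advance(current: FailoverStep) -> FailoverStep:
    """Move to the next step; steps cannot be skipped or repeated."""
    if current is FailoverStep.REGION_ACTIVATED:
        raise ValueError("failover already finished")
    return FailoverStep(current.value + 1)


def may_activate_on_new_datanode(current: FailoverStep) -> bool:
    """Activation (step 4) is allowed only once the route has been updated."""
    return current is FailoverStep.ROUTE_UPDATED


state = FailoverStep.FAILURE_DETECTED
while state is not FailoverStep.REGION_ACTIVATED:
    print(state.name, "-> may activate:", may_activate_on_new_datanode(state))
    state = advance(state)
```

Persisting the current step would let a restarted procedure resume from the last completed state instead of jumping straight to activation.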

WenyXu added the C-enhancement Category Enhancements label Aug 31, 2023

fengjiachun (Collaborator) commented

Until a node has been granted permission (the first lease), we should prohibit it from accessing shared resources.

The problem now is that the restarted node accessed the shared resource (the region) without permission. This is the wrong approach. The correct approach is to obtain permission first (that is, obtain the first lease) and only then access the shared resource.
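
The lease-first rule can be sketched as a gate that every region open must pass. This is a minimal illustration under assumptions, not the project's API: `RegionLeaseGate`, `grant`, `may_access`, and `open_region` are hypothetical names, and time is passed in explicitly as seconds to keep the example deterministic.

```python
from typing import Dict


class RegionLeaseGate:
    """Tracks, per region, the time until which this node holds a lease."""

    def __init__(self) -> None:
        self._deadline: Dict[int, float] = {}  # region_id -> lease expiry (s)

    def grant(self, region_id: int, now_s: float, ttl_s: float) -> None:
        # Called when a lease arrives; also used for renewals.
        self._deadline[region_id] = now_s + ttl_s

    def may_access(self, region_id: int, now_s: float) -> bool:
        # No lease yet, or an expired lease, means no access.
        deadline = self._deadline.get(region_id)
        return deadline is not None and now_s < deadline


def open_region(gate: RegionLeaseGate, region_id: int, now_s: float) -> str:
    """Refuse to touch the shared region unless a valid lease is held."""
    if not gate.may_access(region_id, now_s):
        raise PermissionError(f"no valid lease for region {region_id}")
    return f"region {region_id} opened"


gate = RegionLeaseGate()
gate.grant(region_id=1, now_s=0.0, ttl_s=20.0)
print(open_region(gate, 1, now_s=5.0))
```

Under this rule a restarted node would hold no lease at startup, so it would not reopen the old region until it had been granted one.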

WenyXu closed this as completed Oct 13, 2023
WenyXu added the O-chaos Found by chaos tests label Oct 25, 2023
2 participants