Lock service #315

Open · yfei-z wants to merge 16 commits into master

Conversation

@yfei-z (Contributor) commented Oct 25, 2024

A RAFT implementation of a lock service.

@jabolina (Member) commented:

Hey, @yfei-z. I'll take a look. A few things are missing: a design document in the ./doc/design/ folder and expanded Javadocs on the LockService class. They should include information such as the guarantees, what happens in case of failures (node holding a lock, leader loss, majority loss, etc.), and, if necessary, usage patterns.

On a side note, I am skeptical about distributed locks, even in consensus. I am not the author, but I agree with https://belaban.blogspot.com/2020/11/i-hate-distributed-locks.html
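
For reference, a rough sketch of the kind of class-level Javadoc being requested above; the wording and guarantees below are placeholders, not the PR's actual documentation:

/**
 * A distributed lock service built on top of a RAFT state machine.
 *
 * <p>Guarantees (placeholder): at most one member holds a given lock id at a time,
 * and lock/unlock operations are ordered through the replicated log.
 *
 * <p>Failure behavior to document: what happens when the node holding a lock leaves,
 * when the leader is lost, and when the majority is lost.
 *
 * <p>Usage patterns: how callers should react when a lock they hold is revoked.
 */
public class LockService {
    // ...
}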

@yfei-z (Contributor, Author) commented Oct 28, 2024

Yes, I will complete the docs.

@yfei-z (Contributor, Author) commented Nov 6, 2024

Hi @jabolina. I have finished the initial doc, and I will continue to improve it if I think of anything else. You can take a look.

@jabolina (Member) commented Nov 7, 2024

Thanks for the work, @yfei-z! I'll take a look.

@jabolina (Member) left a comment:

Hey, @yfei-z, I've done a pass over the LockService class. I'll look into the tests next.

src/org/jgroups/raft/blocks/LockService.java: five review comments (outdated, resolved)
Comment on lines 371 to 373
protected void cleanup() {
    // Notify listeners that every tracked lock falls back to NONE.
    lockStatus.forEach((k, v) -> notifyListeners(k, v, NONE, true));
}
@jabolina (Member):

I see this notification is submitted when the current node leaves or the majority is lost. To identify the real cause, we should include two different methods in the Listener interface.
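
For illustration, a minimal sketch of what such a split might look like; the names and signatures are illustrative only, not the PR's actual Listener interface:

// Hypothetical shape: one callback for status changes applied by the state machine,
// plus dedicated callbacks for the two local events discussed here.
public interface Listener {
    enum Status { NONE, WAITING, HOLDING }

    // A status change applied by the replicated state machine.
    void statusChanged(long lockId, Status previous, Status current);

    // The local node left the cluster and should treat its locks as released.
    void nodeLeft(long lockId);

    // The local node lost contact with the majority.
    void majorityLost(long lockId);
}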

@yfei-z (Contributor, Author) commented Nov 11, 2024

My intention is to make both of them real: whether the member is disconnected or partitioned into a minority subgroup, its locks will be released by a reset command, immediately or eventually. But I found a problem while writing the design document. I'm working on it and hope it can be fixed; otherwise I will consider separating the notifications, but that is very different from what I intended to do, so it would be a big change.

@yfei-z (Contributor, Author):

I think it has been fixed, and I adjusted the test as well. The method has been renamed to resign. It notifies listeners based not on what has already happened in the state machine, but on what will immediately or eventually happen in it.

@jabolina (Member):

Hrrrmm, a bit tricky. I believe we would want to be notified in these cases (shutdown, partitioning), but doing so from outside the state machine seems like a recipe for much more work and patching of corner cases. Would it be easier to handle with dedicated methods? Notifications would come from the state machine, and the other methods would notify of what just happened.

@yfei-z (Contributor, Author):

First of all, the state machine should release the lock if the holder leaves the cluster unexpectedly; otherwise the lock service wouldn't be very useful. A member that has left cannot know what really happened in the state machine: perhaps the leader has released the lock it was holding, or the cluster has lost its leader. Either way, the member can only assume that the lock it was holding may now have a new holder. Therefore, regardless of how many types of notifications there are, the only thing the user can do is treat it as unlocked. So I think the real question is how to maintain a consistent state between the state machine and the member that left, although there are a lot of corner cases. I have some analysis in the design document; it may not cover every case, but the approach is pessimistic: the state machine releases the lock whenever possible, and the holding status is retained only under certain conditions.
Because the notification mechanism has changed to being response-triggered, I added some query scenarios to deal with the inconsistent state caused by over-pessimism; the purpose is to synchronize the states after members reconnect.
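
As a rough illustration of the re-synchronization idea, assuming a read-only query API; the QueryService interface and method names below are assumptions, not the PR's actual API:

import java.util.Set;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: after a reconnect, ask the state machine for the current
// status of the locks the client thinks it holds and reconcile the local view.
class ReconnectResync {
    enum Status { NONE, WAITING, HOLDING }

    interface QueryService {
        // Read-only query of a lock's status in the state machine.
        CompletableFuture<Status> query(long lockId);
    }

    // Anything not confirmed as HOLDING is pessimistically treated as unlocked.
    static void resync(QueryService service, Set<Long> locallyHeld) {
        for (long lockId : locallyHeld) {
            service.query(lockId).whenComplete((status, err) -> {
                if (err != null || status != Status.HOLDING)
                    System.out.println("lock " + lockId + " is no longer held locally");
            });
        }
    }
}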

src/org/jgroups/raft/blocks/LockService.java: two review comments (outdated, resolved)
Comment on lines 700 to 726
if (curr != HOLDING && holder != null) {
    // The mutex was held locally but the new status is no longer HOLDING:
    // record the new status and invoke the unlock handler.
    status = curr;
    var handler = unlockHandler;
    if (handler != null) try {
        handler.accept(this);
    } catch (Throwable e) {
        log.error("Error occurred on unlock handler", e);
    }
} else if (curr != NONE && acquirers.get() == 0) {
    // The status became non-NONE while no local thread is acquiring:
    // record the new status and invoke the lock handler.
    status = curr;
    var handler = lockHandler;
    if (handler != null) try {
        handler.accept(this);
    } catch (Throwable e) {
        log.error("Error occurred on lock handler", e);
    }
} else if (prev == WAITING) {
    // A local thread was waiting: update the status under the delegate lock
    // and wake up all waiters.
    delegate.lock();
    try {
        if (status == WAITING) {
            status = curr;
            notWaiting.signalAll();
        }
    } finally {
        delegate.unlock();
    }
}
@jabolina (Member):

This seems to be handling some corner cases I wouldn't expect to ever happen. I would expect this method to just assert that prev is equal to the Mutex's state and do a switch on the curr value.

IIUC, the inconsistencies can exist if the user utilizes the Mutex API and the LockService simultaneously? If so, I would say to make the methods in the LockService private and expose them only through the Mutex class. You could also push the Mutex outside and create a RaftLock or something like that.
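
For illustration, a minimal sketch of the RaftLock idea: the service methods stay internal and callers only see java.util.concurrent.locks.Lock. The Service interface and its methods below are stand-ins, not the PR's actual LockService API:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;

// Hypothetical wrapper exposing a lock id through the standard Lock interface.
public class RaftLock implements Lock {
    // Minimal stand-in for the real service; only what this sketch needs.
    public interface Service {
        void lock(long lockId) throws InterruptedException;
        boolean tryLock(long lockId);
        void unlock(long lockId);
    }

    private final Service service;
    private final long lockId;

    public RaftLock(Service service, long lockId) {
        this.service = service;
        this.lockId = lockId;
    }

    @Override public void lock() {
        boolean interrupted = false;
        while (true) {
            try { service.lock(lockId); break; }
            catch (InterruptedException e) { interrupted = true; }
        }
        if (interrupted) Thread.currentThread().interrupt();
    }
    @Override public void lockInterruptibly() throws InterruptedException { service.lock(lockId); }
    @Override public boolean tryLock() { return service.tryLock(lockId); }
    @Override public boolean tryLock(long time, TimeUnit unit) throws InterruptedException {
        return service.tryLock(lockId); // simplified: no timed wait in this sketch
    }
    @Override public void unlock() { service.unlock(lockId); }
    @Override public Condition newCondition() { throw new UnsupportedOperationException(); }
}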

@yfei-z (Contributor, Author):

The notified status could be far behind the status returned from command execution, so it can't be set directly. But the WAITING status is special: if the mutex is in the WAITING status, all calling threads will wait for the notification, and no further commands will be executed before that.
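
For illustration, a self-contained sketch of the waiting pattern described above; the field names mirror the quoted hunk, but the surrounding class is hypothetical:

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration: calling threads block while the local status is WAITING
// and are released by the notification callback.
class WaitingStatusExample {
    enum Status { NONE, WAITING, HOLDING }

    private final ReentrantLock delegate = new ReentrantLock();
    private final Condition notWaiting = delegate.newCondition();
    private Status status = Status.WAITING;

    // Called by acquiring threads: wait until the status is no longer WAITING.
    void awaitNotWaiting() throws InterruptedException {
        delegate.lock();
        try {
            while (status == Status.WAITING)
                notWaiting.await();
        } finally {
            delegate.unlock();
        }
    }

    // Called from the notification path: set the new status and wake all waiters.
    void onNotification(Status curr) {
        delegate.lock();
        try {
            if (status == Status.WAITING) {
                status = curr;
                notWaiting.signalAll();
            }
        } finally {
            delegate.unlock();
        }
    }
}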

@yfei-z (Contributor, Author):

Mutex is just a Lock representation of the lock service. I think that, from the user's point of view, the requirement is either thread exclusivity or process exclusivity; even if both are used, they can be separated by different lockIds.

@yfei-z (Contributor, Author):

State inconsistency is inevitable; it may be caused by a disconnection, a partition, and so on. Unlike the lock service, where it can be handled by a notification to the listener, for the mutex it is more like an error, which is why there are the unexpected handlers.
As I mentioned, the status cannot be set directly from the lock service's notification, so the status-setting code above does have a problem. I will fix it; maybe set the status via notifications only?

@jabolina (Member):

It makes sense to make this update only via notifications from the LockService when handling commands. This might still be tricky when installing or recovering from the persistent state: the delegate lock might not be held, which could lead to some internal inconsistency. The mutex must be completely deterministic, regardless of whether commands come from the log or from the user. I believe that making the updates come from the LockService might help.

@yfei-z (Contributor, Author) commented Nov 19, 2024

I have changed the notification source from log application to command responses; log application is now just a hint for the client to query the latest state from the server, and a read-only QUERY command has been added. It becomes a pure client mode, I think: local status and notifications are based only on command responses. This relies on the fact that command response callbacks are synchronous and ordered, whether on the leader or a follower.
Currently, the lockStatus method in the lock service returns the latest status from the client's perspective, just like the status field in Mutex. This relies on the client's previous status being cleared before it connects, reconnects, or resumes from a partition.
For the mutex, the status change is notified just before the executing command completes with the same status, so the notified status is exactly the value about to be set if the command was sent by the mutex.
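
For illustration, a minimal sketch of the response-triggered ordering described above: the local status is updated and the listener is notified inside the response callback, just before the caller's future completes with the same status. The class and names are assumptions, not the PR's code:

import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

// Hypothetical sketch of response-triggered notifications.
class ResponseTriggeredNotifications {
    enum Status { NONE, WAITING, HOLDING }

    private volatile Status status = Status.NONE;
    private BiConsumer<Long, Status> listener = (id, s) -> { };

    // rpc completes with the status returned by the replicated command.
    CompletableFuture<Status> submit(long lockId, CompletableFuture<Status> rpc) {
        // The stage below runs before any stage the caller attaches to the returned
        // future, so the notification is observed before the command "completes".
        return rpc.thenApply(curr -> {
            status = curr;                 // update the local view first
            listener.accept(lockId, curr); // then notify, in response order
            return curr;                   // finally complete the caller with the same status
        });
    }
}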

src/org/jgroups/raft/blocks/LockService.java: review comment (resolved)