= Lock Service Design
Zhang Yifei;

The lock service maintains the holder and the waiters of a specified lockId. A lockId can be seen as the identity of
a lock, and a lock can have only one holder but multiple waiters at the same time.
Waiters are queued: the first waiter is promoted to holder by the unlock operation of the previous holder.
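The per-lockId state described above (one holder plus a FIFO queue of waiters) could be sketched as follows. This is a hypothetical illustration, not the actual code from this PR; member addresses are modeled as strings for brevity.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one holder plus a FIFO queue of waiters per lockId.
public class LockTable {
    static final class LockEntry {
        String holder;                                          // current holder's address
        final ArrayDeque<String> waiters = new ArrayDeque<>();  // queued waiters, FIFO
    }

    private final Map<Long, LockEntry> locks = new HashMap<>();

    /** Returns true if member becomes the holder, false if it is queued as a waiter. */
    public boolean lock(long lockId, String member) {
        LockEntry e = locks.computeIfAbsent(lockId, id -> new LockEntry());
        if (e.holder == null) {
            e.holder = member;
            return true;
        }
        if (!member.equals(e.holder) && !e.waiters.contains(member))
            e.waiters.add(member);
        return false;
    }

    /** Unlock by the holder; the first waiter (if any) is promoted and returned. */
    public String unlock(long lockId, String member) {
        LockEntry e = locks.get(lockId);
        if (e == null || !member.equals(e.holder))
            return null;                                        // not the holder: no-op
        e.holder = e.waiters.poll();                            // promote first waiter
        if (e.holder == null)
            locks.remove(lockId);                               // lock is now free
        return e.holder;
    }
}
```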

== Holder Identity
The identity of a holder or waiter has to be a member of the RAFT cluster, because there are server-initiated
messages. Currently the clients are stateless to the server after a reply, so there is no way to send a
server-initiated message to a client.
I considered creating a new protocol to maintain sessions for clients, but it would be a lot of work: for
example, a session's creation and destruction would need to be recorded in the RAFT log, sessions would need to remain
available on the new leader after a leadership change, and the client would actually have to keep connections to all members.
Holders and waiters on the server are therefore represented by the address (UUID) of the channel; the advantage of doing so is
that the server can clear disconnected holders and waiters based on the view of the cluster.

== Holding Status
The holding status is only meaningful for connected members. A disconnected member can assume that it has released all
of its locks, because the leader of the cluster clears leaving members from the locking status when the view-change
event arrives.
During a partition, members in a minority subgroup will likewise be cleared by the leader if a majority subgroup is still
present; if all subgroups are minorities, the newly elected leader will force-clear all previous locking status after the
cluster resumes.
A freshly started cluster clears all previous locking status as well, because all members have new addresses.
Since the locking status has the same lifecycle as the cluster, the log storage can be an in-memory implementation.
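The leader-side cleanup on a view change could look like this minimal sketch (hypothetical names; the real implementation would use JGroups `Address` objects and the actual view callback, not strings):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of leader-side cleanup when the view changes:
// members that left the view lose their holds and their queued wait requests.
public class ViewCleanup {
    /**
     * queues: per lockId, the first element is the holder, the rest are waiters.
     * view:   the addresses present in the new cluster view.
     */
    public static void onViewChange(Map<Long, Deque<String>> queues, Set<String> view) {
        Iterator<Map.Entry<Long, Deque<String>>> it = queues.entrySet().iterator();
        while (it.hasNext()) {
            Deque<String> q = it.next().getValue();
            q.removeIf(member -> !view.contains(member));  // clear leaving members
            if (q.isEmpty())
                it.remove();                               // lock is now free
        }
    }
}
```

Because the cleanup is driven by the view, a member that reconnects with a new address simply starts fresh, which matches the lifecycle argument above.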

== Waiting Status
Waiting status is treated the same as holding status in the cases of disconnection and partitioning.
The tricky part is how to let a waiter know that it has become the holder; this is the server-initiated message
mentioned earlier. Since every lock service is a member of the cluster, the leader can send messages to any of them,
but in what way?
Those messages must be ordered and must be neither lost nor duplicated. Suppose a dedicated message were used: the
leader would send it after log entries are applied, and the sending could be asynchronous; but if the leader left, the
new leader could not guarantee that those messages are neither lost nor duplicated.
Basing the notification on the log-applying process of each member is a reliable choice, although it is not perfect.
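The idea of deriving the notification from log application can be sketched as follows. This is a hypothetical illustration (in jgroups-raft the logic would live in the `StateMachine` apply path): every member applies the same entries in the same order, so each member can deterministically detect "I am the new holder" without any extra leader-to-member message.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongConsumer;

// Hypothetical sketch: notification derived from the log-applying process.
// Each member applies the same LOCK/UNLOCK entries in log order; when an
// unlock promotes a waiter, the member whose own address matches fires its
// local callback. No separate server-initiated message is needed.
public class LockStateMachine {
    private final String self;                          // this member's address
    private final LongConsumer onAcquired;              // "you now hold lockId"
    private final Map<Long, Deque<String>> queues = new HashMap<>();

    public LockStateMachine(String self, LongConsumer onAcquired) {
        this.self = self;
        this.onAcquired = onAcquired;
    }

    /** Applied on every member, in log order. */
    public void applyLock(long lockId, String member) {
        Deque<String> q = queues.computeIfAbsent(lockId, id -> new ArrayDeque<>());
        if (!q.contains(member))
            q.add(member);                              // head = holder, rest = waiters
        if (member.equals(q.peekFirst()) && member.equals(self))
            onAcquired.accept(lockId);                  // acquired immediately
    }

    /** Applied on every member, in log order. */
    public void applyUnlock(long lockId, String member) {
        Deque<String> q = queues.get(lockId);
        if (q == null || !member.equals(q.peekFirst()))
            return;                                     // not the holder: no-op
        q.pollFirst();                                  // remove the holder
        String next = q.peekFirst();
        if (next == null)
            queues.remove(lockId);
        else if (next.equals(self))
            onAcquired.accept(lockId);                  // deterministic local notify
    }
}
```

Since the notification is a pure function of the applied log, a leader change cannot lose or duplicate it; each member rebuilds the same conclusion from its own apply loop.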

== Mutex
Combining the lock service with a ReentrantLock, we can implement an exclusive lock across JVMs.

=== Command executing
The mutex's methods involve executing commands in the lock service; a RaftException is thrown when a command fails
to execute.
Command execution is uninterruptible to avoid leaving the state inconsistent, but a timeout can be set to bound
the waiting time.
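Uninterruptible waiting with an optional timeout could be sketched like this hypothetical helper, assuming the command submission returns a `CompletableFuture` (the actual jgroups-raft API may differ):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: wait for a replicated command uninterruptibly, swallowing
// InterruptedException while waiting and re-asserting the interrupt flag at the
// end; an optional timeout bounds the wait instead of interruption.
public final class Commands {
    /** timeoutMs <= 0 means wait without a time limit. */
    public static <T> T getUninterruptibly(CompletableFuture<T> cmd, long timeoutMs)
            throws TimeoutException, ExecutionException {
        boolean interrupted = false;
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        try {
            while (true) {
                try {
                    return timeoutMs <= 0
                        ? cmd.get()                               // no timeout
                        : cmd.get(deadline - System.nanoTime(), TimeUnit.NANOSECONDS);
                } catch (InterruptedException e) {
                    interrupted = true;                           // swallow, keep waiting
                }
            }
        } finally {
            if (interrupted)
                Thread.currentThread().interrupt();               // restore interrupt status
        }
    }
}
```

Swallowing the interrupt keeps the lock state consistent (the command has already been submitted to the cluster), while the restored interrupt flag lets callers still observe that an interrupt happened.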

=== Unexpected status
Many factors can cause unexpected unlocking or locking status, for example a disconnected channel, a network partition,
or even calling the lock service directly with the same lockId. Handlers can therefore be set to deal with the
unexpected status, letting users know the risks and decide how to react; the RaftException comes from the same idea.
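Such handlers could be modeled as simple callbacks; the interface below is a hypothetical shape, not the actual API of this PR:

```java
// Hypothetical sketch: callbacks that surface unexpected lock-state changes
// (channel disconnect, partition, concurrent use of the same lockId) to users,
// who then decide how to react (retry, fail fast, re-acquire, etc.).
public interface UnexpectedStatusHandler {
    /** Called when a lock this mutex believed it held was released externally. */
    void onUnexpectedUnlock(long lockId);

    /** Called when this mutex observes a holding status it never acquired. */
    void onUnexpectedLock(long lockId);
}
```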