= Lock Service Design
Zhang Yifei;

The lock service maintains the holder and the waiters of a specified lockId. A lockId can be seen as the identity of
a lock, and a lock can have only one holder but multiple waiters at the same time.
Waiters are queued: the first waiter is promoted to holder by the unlock operation of the previous holder.
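The per-lockId state described above (one holder plus a FIFO queue of waiters) could be sketched as follows. This is a hypothetical illustration, not the actual code from this PR; member addresses are modeled as strings for brevity.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one holder plus a FIFO queue of waiters per lockId.
public class LockTable {
    static final class LockEntry {
        String holder;                                          // current holder's address
        final ArrayDeque<String> waiters = new ArrayDeque<>();  // queued waiters, FIFO
    }

    private final Map<Long, LockEntry> locks = new HashMap<>();

    /** Returns true if member becomes the holder, false if it is queued as a waiter. */
    public boolean lock(long lockId, String member) {
        LockEntry e = locks.computeIfAbsent(lockId, id -> new LockEntry());
        if (e.holder == null) {
            e.holder = member;
            return true;
        }
        if (!member.equals(e.holder) && !e.waiters.contains(member))
            e.waiters.add(member);
        return false;
    }

    /** Unlock by the holder; the first waiter (if any) is promoted and returned. */
    public String unlock(long lockId, String member) {
        LockEntry e = locks.get(lockId);
        if (e == null || !member.equals(e.holder))
            return null;                                        // not the holder: no-op
        e.holder = e.waiters.poll();                            // promote first waiter
        if (e.holder == null)
            locks.remove(lockId);                               // lock is now free
        return e.holder;
    }
}
```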

== Holder Identity
The identity of a holder or waiter has to be a member of the RAFT cluster, because there are server-initiated
messages. Currently the clients are stateless to the server after a reply, so there is no way to send a
server-initiated message to a client.
I considered creating a new protocol to maintain sessions for clients, but it would be a lot of work: for
example, a session's creation and destruction would need to be recorded in the RAFT log, sessions would need to remain
available on the new leader after a leadership change, and the client would actually have to keep connections to all members.
Holders and waiters on the server are therefore represented by the address (UUID) of the channel; the advantage of doing so is
that the server can clear disconnected holders and waiters based on the view of the cluster.

== Holding Status
The holding status is only meaningful for connected members. A disconnected member can assume that it has released all
of its locks, because the leader of the cluster clears leaving members from the locking status when the view-change
event arrives.
During a partition, members in a minority subgroup will likewise be cleared by the leader if a majority subgroup is still
present; if all subgroups are minorities, the newly elected leader will force-clear all previous locking status after the
cluster resumes.
A freshly started cluster clears all previous locking status as well, because all members have new addresses.
Since the locking status has the same lifecycle as the cluster, the log storage can be an in-memory implementation.
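The leader-side cleanup on a view change could look like this minimal sketch (hypothetical names; the real implementation would use JGroups `Address` objects and the actual view callback, not strings):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of leader-side cleanup when the view changes:
// members that left the view lose their holds and their queued wait requests.
public class ViewCleanup {
    /**
     * queues: per lockId, the first element is the holder, the rest are waiters.
     * view:   the addresses present in the new cluster view.
     */
    public static void onViewChange(Map<Long, Deque<String>> queues, Set<String> view) {
        Iterator<Map.Entry<Long, Deque<String>>> it = queues.entrySet().iterator();
        while (it.hasNext()) {
            Deque<String> q = it.next().getValue();
            q.removeIf(member -> !view.contains(member));  // clear leaving members
            if (q.isEmpty())
                it.remove();                               // lock is now free
        }
    }
}
```

Because the cleanup is driven by the view, a member that reconnects with a new address simply starts fresh, which matches the lifecycle argument above.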

== Waiting Status
Waiting status is treated the same as holding status in the cases of disconnection and partitioning.
The tricky part is how to let a waiter know that it has become the holder; this is the server-initiated message
mentioned earlier. Since every lock service is a member of the cluster, the leader can send messages to any of them,
but in what way?
Those messages must be ordered and must be neither lost nor duplicated. Suppose a dedicated message were used: the
leader would send it after log entries are applied, and the sending could be asynchronous; but if the leader left, the
new leader could not guarantee that those messages are neither lost nor duplicated.
Basing the notification on the log-applying process of each member is a reliable choice, although it is not perfect.
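The idea of deriving the notification from log application can be sketched as follows. This is a hypothetical illustration (in jgroups-raft the logic would live in the `StateMachine` apply path): every member applies the same entries in the same order, so each member can deterministically detect "I am the new holder" without any extra leader-to-member message.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongConsumer;

// Hypothetical sketch: notification derived from the log-applying process.
// Each member applies the same LOCK/UNLOCK entries in log order; when an
// unlock promotes a waiter, the member whose own address matches fires its
// local callback. No separate server-initiated message is needed.
public class LockStateMachine {
    private final String self;                          // this member's address
    private final LongConsumer onAcquired;              // "you now hold lockId"
    private final Map<Long, Deque<String>> queues = new HashMap<>();

    public LockStateMachine(String self, LongConsumer onAcquired) {
        this.self = self;
        this.onAcquired = onAcquired;
    }

    /** Applied on every member, in log order. */
    public void applyLock(long lockId, String member) {
        Deque<String> q = queues.computeIfAbsent(lockId, id -> new ArrayDeque<>());
        if (!q.contains(member))
            q.add(member);                              // head = holder, rest = waiters
        if (member.equals(q.peekFirst()) && member.equals(self))
            onAcquired.accept(lockId);                  // acquired immediately
    }

    /** Applied on every member, in log order. */
    public void applyUnlock(long lockId, String member) {
        Deque<String> q = queues.get(lockId);
        if (q == null || !member.equals(q.peekFirst()))
            return;                                     // not the holder: no-op
        q.pollFirst();                                  // remove the holder
        String next = q.peekFirst();
        if (next == null)
            queues.remove(lockId);
        else if (next.equals(self))
            onAcquired.accept(lockId);                  // deterministic local notify
    }
}
```

Since the notification is a pure function of the applied log, a leader change cannot lose or duplicate it; each member rebuilds the same conclusion from its own apply loop.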

== Mutex
Combining the lock service with a ReentrantLock, we can implement an exclusive lock across JVMs.

=== Command executing
The mutex's methods involve executing commands in the lock service; a RaftException is thrown when a command fails
to execute.
Command execution is uninterruptible to avoid leaving the state inconsistent, but a timeout can be set to bound
the waiting time.
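Uninterruptible waiting with an optional timeout could be sketched like this hypothetical helper, assuming the command submission returns a `CompletableFuture` (the actual jgroups-raft API may differ):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: wait for a replicated command uninterruptibly, swallowing
// InterruptedException while waiting and re-asserting the interrupt flag at the
// end; an optional timeout bounds the wait instead of interruption.
public final class Commands {
    /** timeoutMs <= 0 means wait without a time limit. */
    public static <T> T getUninterruptibly(CompletableFuture<T> cmd, long timeoutMs)
            throws TimeoutException, ExecutionException {
        boolean interrupted = false;
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        try {
            while (true) {
                try {
                    return timeoutMs <= 0
                        ? cmd.get()                               // no timeout
                        : cmd.get(deadline - System.nanoTime(), TimeUnit.NANOSECONDS);
                } catch (InterruptedException e) {
                    interrupted = true;                           // swallow, keep waiting
                }
            }
        } finally {
            if (interrupted)
                Thread.currentThread().interrupt();               // restore interrupt status
        }
    }
}
```

Swallowing the interrupt keeps the lock state consistent (the command has already been submitted to the cluster), while the restored interrupt flag lets callers still observe that an interrupt happened.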

=== Unexpected status
Many factors can cause unexpected unlocking or locking status, for example a disconnected channel, a network partition,
or even calling the lock service directly with the same lockId. Handlers can therefore be set to deal with the
unexpected status, letting users know the risks and decide how to react; the RaftException comes from the same idea.
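Such handlers could be modeled as simple callbacks; the interface below is a hypothetical shape, not the actual API of this PR:

```java
// Hypothetical sketch: callbacks that surface unexpected lock-state changes
// (channel disconnect, partition, concurrent use of the same lockId) to users,
// who then decide how to react (retry, fail fast, re-acquire, etc.).
public interface UnexpectedStatusHandler {
    /** Called when a lock this mutex believed it held was released externally. */
    void onUnexpectedUnlock(long lockId);

    /** Called when this mutex observes a holding status it never acquired. */
    void onUnexpectedLock(long lockId);
}
```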