中文 | English
Book title: "Distributed Data Services: Transaction Models, Processing Language, Consistency and Architecture"
ISBN:978-7-111-73737-7
This book gives a detailed introduction to the implementation principles and source code of the eRaft prototype system in its distributed data services protocol library.
The eraft project turns the MIT 6.824 lab assignments into an industrial distributed storage system. We introduce the principles of distributed systems in the simplest, most straightforward way and guide you through designing and implementing an industrial-grade distributed storage system.
If you want to check the latest documentation, please visit the eraft official website.
First, let's look at the shortcomings of traditional single-node C/S or B/S systems:
A single-node system runs on just one machine, so its performance is limited by that machine, and higher-performance machines such as IBM mainframes are very expensive. At the same time, if the machine goes down or the process crashes because of a bug in the code, the system cannot tolerate the fault and becomes unavailable.
Having analyzed the shortcomings of single-node systems, we can summarize the design goals of a distributed system:
The distributed system we design must be scalable. Scalability here means we can obtain higher total system throughput and better performance by adding more machines. Of course, more machines do not always mean better performance; for some complex computing scenarios, adding nodes does not necessarily help.
The distributed system should not stop serving when one machine in the system fails. After a machine fails, the system should quickly switch traffic to a healthy machine and continue to provide service.
To achieve this, the most important algorithm is replication. We need a replication algorithm to ensure that the data on the failed machine and on the machine that takes over is consistent; in the distributed systems field, a consensus algorithm is usually used to ensure that replication proceeds correctly.
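As a tiny illustration of why consensus algorithms rely on a majority of replicas, here is a hedged Go sketch (not eraft code; the function name and signature are hypothetical) of the commit rule: an entry is only treated as committed once a majority has acknowledged it, and since any two majorities overlap, two partitions can never both commit conflicting data.

```go
package main

import "fmt"

// isCommitted reports whether an entry acknowledged by ackCount replicas
// out of totalReplicas can safely be considered committed. Any two
// majorities share at least one node, which is what prevents a
// "split brain" from committing conflicting data on both sides.
func isCommitted(ackCount, totalReplicas int) bool {
	return ackCount >= totalReplicas/2+1
}

func main() {
	// In a 3-node group, 2 acknowledgements form a majority.
	fmt.Println(isCommitted(2, 3)) // true
	// In a 5-node group, 2 acknowledgements are not enough.
	fmt.Println(isCommitted(2, 5)) // false
}
```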
We recommend reading the Raft paper.
Read it with the following questions in mind:
- What is split brain?
- What is our solution to split brain?
- Why does a majority help avoid split brain?
- Why the logs?
- Why a leader?
- How to ensure at most one leader in a term?
- How does a server learn about a newly elected leader?
- What happens if an election doesn't succeed?
- How does Raft avoid split votes?
- How to choose the election timeout?
- What if an old leader isn't aware a new leader has been elected?
- How can logs disagree?
- What would we like to happen after a server crashes?
Well, with the basic Raft algorithm we can build a highly available Raft server group. This solves the availability and consistency problems discussed above, but one problem remains: there is only one leader in a Raft group to receive read and write traffic. You can, of course, let followers take part of the read traffic to improve performance (this raises some transaction-related issues, which we will discuss later), but there is still a limit to what a single group can provide.
At this point we need to think about partitioning the requests written by clients, just as MapReduce first splits a huge data set into small pieces in the map stage.
eraft uses hash sharding. Data is mapped into buckets by a hash algorithm, and different Raft groups are responsible for different sets of buckets. The number of buckets a Raft group is responsible for can be adjusted.
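A minimal sketch of this idea, assuming a fixed bucket count and a CRC32-style hash (the names `NBuckets` and `keyToBucket` are illustrative, not the actual eraft API):

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// NBuckets is the fixed number of logical buckets the key space is split into.
const NBuckets = 10

// keyToBucket maps a key to a bucket id by hashing the key and taking it
// modulo the bucket count. Every client and server must use the same
// function so they agree on which bucket a key belongs to.
func keyToBucket(key string) int {
	return int(crc32.ChecksumIEEE([]byte(key))) % NBuckets
}

func main() {
	for _, k := range []string{"user:1001", "order:42", "item:7"} {
		fmt.Printf("key %q -> bucket %d\n", k, keyToBucket(k))
	}
}
```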
A bucket is the logical unit of data management in the cluster; one service group can be responsible for the data of multiple buckets.
The cluster configuration table mainly maintains the mapping between cluster service groups and buckets. Before clients access cluster data, they query this table for the list of service groups that hold the bucket.
The configuration service is mainly responsible for version management of the cluster configuration table. It maintains an internal version chain of the configuration table, so changes to the cluster configuration can be recorded.
A data server group is mainly responsible for storing cluster data. Typically, three machines form a Raft group that provides a highly available service to the outside.
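To make these relationships concrete, here is a hedged Go sketch of what a versioned configuration table and a client-side lookup might look like; the type and field names are illustrative and do not reflect the real eraft data structures.

```go
package main

import "fmt"

// Config is one version of the cluster configuration table: it records
// which Raft group owns each bucket and which servers make up each group.
type Config struct {
	Version int              // monotonically increasing configuration version
	Buckets [10]int          // Buckets[b] = id of the Raft group owning bucket b
	Groups  map[int][]string // group id -> addresses of its (usually 3) servers
}

// lookupServers returns the server list a client should contact for a bucket.
func (c *Config) lookupServers(bucket int) []string {
	return c.Groups[c.Buckets[bucket]]
}

func main() {
	cfg := Config{
		Version: 1,
		Groups: map[int][]string{
			1: {"10.0.0.1:8088", "10.0.0.2:8088", "10.0.0.3:8088"},
			2: {"10.0.1.1:8088", "10.0.1.2:8088", "10.0.1.3:8088"},
		},
	}
	// Assign buckets 0-4 to group 1 and buckets 5-9 to group 2.
	for b := range cfg.Buckets {
		if b < 5 {
			cfg.Buckets[b] = 1
		} else {
			cfg.Buckets[b] = 2
		}
	}
	fmt.Println(cfg.lookupServers(7)) // prints the servers of group 2
}
```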
prerequisites
go version >= go 1.21
download the code and build it
git clone https://github.com/eraft-io/eraft.git
cd eraft
make
run the basic cluster test
go test -run TestBasicClusterRW tests/integration_test.go -v
run the basic cluster benchmark
go test -run TestClusterRwBench tests/integration_test.go -v