
Refine the documentation to eliminate ambiguity.
wangbin579 committed Nov 2, 2024
1 parent d6bcbf2 commit 7be5b9f
Showing 5 changed files with 26 additions and 26 deletions.
2 changes: 1 addition & 1 deletion Chapter2.md
@@ -154,7 +154,7 @@ The root problem lies in the certification database mechanism used by Group Repl

## 2.7 Modified Group Replication Outperforms Semisynchronous Replication

-Group Replication has been extensively enhanced while addressing scalability problems in MySQL 8.0.32. To validate these improvements, simultaneous testing of semisynchronous replication and Group Replication with Paxos log persistence was conducted. The deployment setup included two-node configurations for both semisynchronous and Group Replication, hosted on the same machine with independent SSDs and NUMA binding to isolate each node. Specifically, the MySQL primary utilized NUMA nodes 0 to 2, while the MySQL secondary utilized NUMA node 3. All settings, except those directly related to semisynchronous or Group Replication configurations, remained identical.
+Group Replication has been extensively enhanced while addressing scalability problems in MySQL 8.0.32. To validate these improvements, simultaneous testing of semisynchronous replication and Group Replication with Paxos log persistence was conducted. The deployment setup included two-node configurations for both semisynchronous and Group Replication, hosted on the same machine with independent NVMe SSDs and NUMA binding to isolate each node. Specifically, the MySQL primary utilized NUMA nodes 0 to 2, while the MySQL secondary utilized NUMA node 3. All settings, except those directly related to semisynchronous or Group Replication configurations, remained identical.

The following figure shows the throughput comparison of semisynchronous replication and Group Replication with Paxos log persistence under different concurrency levels.

2 changes: 1 addition & 1 deletion Chapter4_3.md
@@ -207,7 +207,7 @@ Dynamic programming simplifies a complex problem by breaking it down into simple

In the context of execution plan optimization, MySQL 8.0 has explored using dynamic programming algorithms to determine the optimal join order. This approach can greatly improve the performance of complex joins, though it remains experimental in its current implementation.

-It is important to note that, due to potentially inaccurate cost estimation, the join order determined by dynamic programming algorithms may not always be the true optimal solution. Dynamic programming algorithms often provide the best plan but can have high computational overhead and may suffer from large costs due to incorrect cost estimation [55]. For a deeper understanding of the complex mechanisms involved, readers can refer to the paper "Dynamic Programming Strikes Back".
+It is important to note that, due to potentially inaccurate cost estimation, the join order determined by dynamic programming algorithms may not always be the true optimal solution. Dynamic programming algorithms often provide the best plan but can have high computational overhead and may suffer from large costs due to incorrect cost estimation [55]. For a deeper understanding of the complex mechanisms involved, readers can refer to the paper "Dynamic Programming Strikes Back" [35].
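The classic dynamic-programming approach to join ordering (in the style of Selinger's System R optimizer) can be sketched as follows. This is an illustrative toy, not MySQL's actual implementation: the function names and the cost model are invented for the example, and real optimizers plug in cardinality-based cost estimates, which is exactly where the inaccuracy discussed above creeps in.

```python
from itertools import combinations

def dp_join_order(tables, scan_cost, join_cost):
    """Selinger-style DP: cheapest plan for every subset of tables.

    scan_cost: dict table -> estimated scan cost.
    join_cost(rest, last): estimated cost of joining `last` onto the
    partial plan covering `rest`. Both are illustrative estimates.
    """
    # best[subset] = (cheapest estimated cost, a join order achieving it)
    best = {frozenset([t]): (scan_cost[t], (t,)) for t in tables}
    for size in range(2, len(tables) + 1):
        for subset in map(frozenset, combinations(tables, size)):
            for last in subset:
                rest = subset - {last}
                cost = best[rest][0] + join_cost(rest, last)
                if subset not in best or cost < best[subset][0]:
                    best[subset] = (cost, best[rest][1] + (last,))
    return best[frozenset(tables)]

# Toy cost model: joining `last` costs len(rest) * scan_cost[last].
costs = {"a": 1, "b": 2, "c": 3}
cost, order = dp_join_order(
    ["a", "b", "c"], costs, lambda rest, last: len(rest) * costs[last]
)
print(cost)  # 7 under this toy model
```

Note that the search enumerates every subset of tables, so its cost grows exponentially with the number of joined tables — the "high computational overhead" mentioned above — and the returned plan is only as good as the `join_cost` estimates it is fed.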

### 4.3.5 Amortized Analysis

4 changes: 2 additions & 2 deletions Chapter4_5.md
@@ -20,9 +20,9 @@ With the increase in CPU core count, the CPU overhead for MySQL parsing has beco

Profile-guided optimization (PGO) is a compiler technique that improves program performance by using profiling data from test runs of the instrumented program. Rather than relying on programmer-supplied frequency information, PGO leverages profile data to optimize the final generated code, focusing on frequently executed areas of the program. This method reduces reliance on heuristics and can improve performance, provided the profiling data accurately represents typical usage scenarios.
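A typical GCC-based PGO workflow looks like the following. The file names and workload are placeholders, and this is not the exact build recipe used for MySQL; it only illustrates the instrument-run-rebuild cycle described above.

```shell
# 1. Build with instrumentation (paths and program are illustrative)
gcc -O2 -fprofile-generate -o myapp myapp.c

# 2. Run a representative workload so .gcda profile data is written
./myapp --typical-workload

# 3. Rebuild, letting the compiler optimize hot paths from the profiles
gcc -O2 -fprofile-use -o myapp myapp.c
```

The quality of the final binary depends entirely on step 2: if the profiled workload does not resemble production traffic, PGO can misplace its optimizations.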

-Extensive practice has shown that MySQL's large codebase is especially well-suited for PGO. However, the effectiveness of PGO can be influenced by I/O storage devices and network latency. On systems with slower I/O devices, like hard drives, I/O becomes the primary bottleneck, limiting PGO's performance gains due to Amdahl's Law. In contrast, on systems with faster I/O devices such as SSDs, PGO can lead to substantial performance improvements. Network latency also affects PGO effectiveness, with higher latency generally reducing the benefits.
+Extensive practice has shown that MySQL's large codebase is especially well-suited for PGO. However, the effectiveness of PGO can be influenced by I/O storage devices and network latency. On systems with slower I/O devices, like hard drives, I/O becomes the primary bottleneck, limiting PGO's performance gains due to Amdahl's Law. In contrast, on systems with faster I/O devices such as NVMe SSDs, PGO can lead to substantial performance improvements. Network latency also affects PGO effectiveness, with higher latency generally reducing the benefits.

-In summary, while MySQL 8.0's PGO capabilities can greatly improve computational performance, the actual improvement depends on the balance between computational and I/O bottlenecks in the server setup. The following figure demonstrates that with SSD hardware configuration and NUMA binding, PGO can significantly improve the performance of MySQL.
+In summary, while MySQL 8.0's PGO capabilities can greatly improve computational performance, the actual improvement depends on the balance between computational and I/O bottlenecks in the server setup. The following figure demonstrates that with NVMe SSD hardware configuration and NUMA binding, PGO can significantly improve the performance of MySQL.

<img src="media/image-20240829083941927.png" alt="image-20240829083941927" style="zoom:150%;" />

16 changes: 8 additions & 8 deletions Chapter4_6.md
@@ -8,13 +8,13 @@ The following figure depicts a scenario where MySQL primary and MySQL secondary

![](media/4daa989affba4bd90fadec0d4236343a.png)

-Figure 4-23. Testing architecture for Group Replication with pure Paxos protocol
+Figure 4-23. Testing architecture for Group Replication with modified Mencius protocol.

The cluster's Paxos algorithm employs a modified Mencius approach, removing batching and pipelining, making it similar to pure Paxos. Tests were conducted at various concurrency levels under a network latency of 10ms, as illustrated in the following figure:

<img src="media/image-20240830114557281.png" alt="image-20240830114557281" style="zoom:150%;" />

-Figure 4-24. Results of testing Group Replication with pure Paxos protocol
+Figure 4-24. Results of testing Group Replication with modified Mencius protocol.

In a WAN testing scenario, the throughput remains nearly constant across different concurrency levels—50, 100, or 150—because the time MySQL takes to process TPC-C transactions is negligible compared to the network latency of 10ms. This network latency dominates the overall transaction time, making the impact of concurrency changes relatively insignificant.
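A back-of-envelope calculation makes the latency-dominated bound concrete. Assuming fully serial consensus rounds over the 10ms network delay, and applying the empirical factor 0.45 cited in the text, the estimate lands close to the roughly 2833 tpmC measured for pure Paxos later in the chapter; the exact interpretation of the factor is this author's reading.

```python
network_latency_s = 0.010                    # 10 ms WAN delay (from the text)
rounds_per_second = 1 / network_latency_s    # serial protocol: one round in flight at a time
empirical_factor = 0.45                      # empirical factor cited in the text

tpmC_estimate = rounds_per_second * 60 * empirical_factor
print(tpmC_estimate)  # 2700.0 transactions per minute
```

Because the bound depends only on the round-trip latency, raising the concurrency level cannot raise throughput — which is exactly what the flat curves in the figure show.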

@@ -26,9 +26,9 @@ This closely matches the test results above, where 0.45 is an empirical factor d

![](media/484244432e5e53aff18ece6ad75eb616.png)

-Figure 4-25. Insights into the pure Paxos protocol from packet capture data.
+Figure 4-25. Insights into the modified Mencius protocol from packet capture data.

-In the figure, the network latency between the two Paxos instances is approximately 10ms, matching the exact network delay. Numerous examples suggest that pure Paxos communication is inherently serial. In scenarios where network latency is the predominant factor, it acts as a single queue bottleneck. Consequently, regardless of concurrency levels, the throughput of pure Paxos is limited by this network latency.
+In the figure, the network latency between the two Paxos instances is approximately 10ms, matching the exact network delay. Numerous examples suggest that Paxos communication is inherently serial. In scenarios where network latency is the predominant factor, it acts as a single queue bottleneck. Consequently, regardless of concurrency levels, the throughput of modified Mencius is limited by this network latency.

### 4.6.2 Multiple Queue Bottlenecks

@@ -100,10 +100,10 @@ To prevent performance degradation, controlling resource usage is crucial. For M

A practical transaction throttling mechanism for MySQL is as follows:

1. Before entering the transaction system, check if the number of concurrent processing threads exceeds the limit.
2. If the limit is exceeded, block the user thread until other threads activate this thread.
3. If the limit is not exceeded, allow the thread to proceed with processing within the transaction system.
4. Upon transaction completion, activate the first transaction in the waiting queue.
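The four steps above map naturally onto a counting semaphore. The sketch below is illustrative (class and method names are invented, not MySQL code), and step 4's "activate the first transaction in the waiting queue" is only approximated, since CPython's semaphore does not guarantee FIFO wakeup order.

```python
import threading

class TransactionThrottler:
    """Minimal sketch of the four-step throttling scheme."""

    def __init__(self, limit):
        # At most `limit` transactions may run concurrently.
        self._slots = threading.Semaphore(limit)

    def run(self, transaction):
        self._slots.acquire()        # steps 1-2: block if the limit is exceeded
        try:
            return transaction()     # step 3: proceed inside the transaction system
        finally:
            self._slots.release()    # step 4: wake one waiting transaction

throttler = TransactionThrottler(64)
print(throttler.run(lambda: "committed"))
```

A real implementation would also need a timeout on the blocking acquire so that a stalled transaction cannot starve all waiters indefinitely.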

This approach helps maintain performance by controlling concurrency and managing resource usage effectively. The following figure illustrates the relationship between TPC-C throughput and concurrency under transaction throttling conditions, with 1000 warehouses.

28 changes: 14 additions & 14 deletions Chapter4_7.md
@@ -14,16 +14,16 @@ The FLP impossibility theorem is valuable in problem-solving as it highlights th

The Mencius algorithm used in Group Replication addresses the FLP impossibility by using a failure detector oracle to bypass the result. Like Paxos, it relies on the failure detector only for liveness. Mencius requires that eventually, all and only faulty servers are suspected by the failure detector. This can be achieved by implementing failure detectors with exponentially increasing timeouts [32].
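A failure detector with exponentially increasing timeouts can be sketched as below. This is an invented illustration of the idea, not Group Replication's actual detector: each false suspicion doubles the timeout, so a merely slow server is eventually never suspected again, while a crashed server stays suspected — the eventual-accuracy property the oracle must provide.

```python
class ExponentialTimeoutDetector:
    """Sketch of a failure detector with exponentially increasing timeouts."""

    def __init__(self, initial_timeout=0.5):
        self.timeout = initial_timeout   # seconds of silence before suspicion
        self.suspected = False

    def heartbeat_missed_for(self, silence):
        # Suspect the peer once it has been silent longer than the timeout.
        if silence > self.timeout:
            self.suspected = True
        return self.suspected

    def peer_proved_alive(self):
        # False suspicion: forgive the peer and back off exponentially,
        # so transient slowness stops triggering suspicions over time.
        if self.suspected:
            self.suspected = False
            self.timeout *= 2
```

Under this policy, only a server that never responds again remains permanently suspected, which is how the detector sidesteps the FLP result in practice.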

-To avoid the problems posed by the FLP impossibility, careful design is needed. TCP, for example, addresses this with timeout retransmission and idempotent design, ensuring that even if duplicate messages are received due to transmission errors, they can be safely discarded.
+To avoid problems caused by uncertainty, careful design is needed. TCP, for example, addresses this with timeout retransmission and idempotent design, ensuring that even if duplicate messages are received due to transmission errors, they can be safely discarded.

### 4.7.2 TCP/IP Protocol Stack

The Internet protocol suite, commonly known as TCP/IP, organizes the set of communication protocols used in the Internet and similar computer networks [45]. It provides end-to-end data communication, specifying how data should be packetized, addressed, transmitted, routed, and received. The suite is divided into four abstraction layers, each classifying related protocols based on their networking scope:

1. **Link Layer**: Handles communication within a single network segment (link).
2. **Internet Layer**: Manages internetworking between independent networks.
3. **Transport Layer**: Facilitates host-to-host communication.
4. **Application Layer**: Enables process-to-process data exchange for applications.

An implementation of these layers for a specific application forms a protocol stack. The TCP/IP protocol stack is one of the most widely used globally, having operated successfully for many years since its design. The following figure illustrates how a client program interacts with a MySQL Server using the TCP/IP protocol stack.

@@ -33,9 +33,9 @@ Figure 4-34. A client program interacts with a MySQL Server using the TCP/IP pro

Due to the layered design of the TCP/IP protocol stack, a client program typically interacts only with the local TCP to access a remote MySQL server. This design is elegant in its simplicity:

1. **Client-Side TCP**: Handles sending SQL queries end-to-end to the remote MySQL server. It manages retransmission if packets are lost.
2. **Server-Side TCP**: Receives the SQL queries from the client-side TCP and forwards them to the MySQL server application. After processing, it sends the response back through its TCP stack.
3. **Routing and Forwarding**: TCP uses the IP layer for routing and forwarding, while the IP layer relies on the data link layer for physical transmission within the same network segment.
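The division of labor above is visible even in a toy client: the application only hands bytes to its local TCP stack, and retransmission, routing, and forwarding all happen below the socket API. The loopback echo server here is a stand-in for a remote MySQL server (a real client would speak the MySQL wire protocol; raw bytes keep the sketch self-contained).

```python
import socket
import threading

def echo_server(listener):
    # Server-side TCP delivers the request bytes; we simply echo them back.
    conn, _ = listener.accept()
    with conn:
        conn.sendall(conn.recv(1024))

listener = socket.socket()
listener.bind(("127.0.0.1", 0))      # OS picks a free port
listener.listen(1)
threading.Thread(target=echo_server, args=(listener,), daemon=True).start()

# The "client program": it only talks to its local TCP endpoint.
client = socket.create_connection(listener.getsockname())
client.sendall(b"SELECT 1")          # client-side TCP handles delivery
reply = client.recv(1024)
client.close()
listener.close()
print(reply)
```

Note that a successful `sendall` only means the bytes reached the local TCP buffer, which foreshadows the delivery-uncertainty problem discussed next.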

Although TCP ensures reliable transmission, it cannot guarantee that messages will always reach their destination due to potential network anomalies. For example, SQL requests might be blocked by a network firewall, preventing them from reaching the MySQL server. In such cases, the client application might not receive a response, leading to uncertainty about whether the request was processed or still in transit.

Expand All @@ -51,8 +51,8 @@ Figure 4-35. Classic TCP state machine overview.

A flexible understanding of state transitions is crucial for troubleshooting MySQL network problems. For example:

- **CLOSE_WAIT State**: A large number of *CLOSE_WAIT* states on the server indicates that the application did not close connections promptly or failed to initiate the close process, causing connections to linger in this state.
- **SYN_RCVD State**: Numerous *SYN_RCVD* states may suggest a SYN flood attack, where an excessive number of SYN requests overwhelm the server's capacity to handle them effectively.

Understanding these state transitions helps in diagnosing and addressing network-related problems more effectively.
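When diagnosing such problems on Linux, the per-connection state is exposed in `/proc/net/tcp` (fourth column, a hex code defined in the kernel's `tcp_states.h`). The small counter below is a Linux-specific sketch; on a server under SYN flood you would expect `SYN_RECV` to dominate the output, and a leak of unclosed connections shows up as a large `CLOSE_WAIT` count.

```python
import os

# Linux /proc/net/tcp state codes (from the kernel's tcp_states.h)
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

def count_states(lines):
    """Count TCP connection states from /proc/net/tcp-style lines."""
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) > 3 and fields[3] in TCP_STATES:
            name = TCP_STATES[fields[3]]
            counts[name] = counts.get(name, 0) + 1
    return counts

if os.path.exists("/proc/net/tcp"):   # live data is Linux-only
    with open("/proc/net/tcp") as f:
        next(f)                       # skip the header row
        print(count_states(f))
```

The same counting is usually done interactively with `ss` or `netstat`; parsing the proc file just makes the mapping from hex code to state name explicit.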

@@ -108,7 +108,7 @@ Why does pure Paxos perform poorly in WAN environments? Refer to the packet capt

Figure 4-41. Insights into the pure Paxos protocol from packet capture data.

-From the figure, it is evident that the delay between two Paxos instances is around 10ms, matching the network latency. The low throughput of pure Paxos stems from its serial interaction nature, where network latency primarily determines throughput.
+The figure clearly shows that when both pipelining and batching are disabled, referred to here as pure Paxos, throughput drops significantly to just 2833 tpmC. The low throughput of pure Paxos stems from its serial interaction nature, where network latency primarily determines throughput.

In general, the test conclusions of pipelining and batching are consistent with the conclusions in the following paper [48]:

@@ -138,9 +138,9 @@ Figure 4-44. Network partitioning types.

The figure categorizes network partitions into three types:

1. **Complete Network Partition (a)**: Two partitions are completely disconnected from each other, widely recognized as a complete network partition.
2. **Partial Network Partition (b)**: Group 1 and Group 2 are disconnected from each other, but Group 3 can still communicate with both. This is termed a partial network partition.
3. **Simplex Network Partition (c)**: Communication is possible in one direction but not the other, known as a simplex network partition.

The most complex type is the partial network partition. Partial partitions isolate a set of nodes from some, but not all, nodes in the cluster, leading to a confusing system state where nodes disagree on whether a server is up or down. These disagreements are poorly understood and tested, even by expert developers [6].

