Update the book based on feedback
wangbin579 committed Nov 1, 2024
1 parent 6fadc39 commit c385f9e
Showing 6 changed files with 102 additions and 88 deletions.
104 changes: 54 additions & 50 deletions Appendix.md
@@ -108,8 +108,8 @@ struct lock_t {
So, what distinguishes these two concepts? Consider this metaphor [19]:
- A **latch** secures a door, gate, or window in place but does not offer protection against unauthorized access.
- A **lock**, however, restricts entry to those without the key, ensuring security and control.
In MySQL, a global latch is employed to serialize specific processing procedures. For instance, the following is MySQL's description of the role of a global latch.
@@ -132,8 +132,8 @@ In MySQL, locks are integral to the transaction model, with common types includi

Understanding locks is crucial for:

- Implementing large-scale, busy, or highly reliable database applications
- Tuning MySQL performance

Familiarity with InnoDB locking and the InnoDB transaction model is essential for these tasks.

@@ -163,8 +163,6 @@ static void lock_grant(lock_t *lock) {
...
```
**15 Maintaining Transaction Order with replica_preserve_commit_order**
In MySQL, the *replica_preserve_commit_order* configuration ensures that transactions on secondary databases are committed in the same order as they appear in the relay log. This setting lays the foundation for maintaining the causal relationship between transactions: if transaction A commits before transaction B on the primary, transaction A will also commit before transaction B on the secondary. This prevents inconsistencies where transactions could be read in the reverse order on the secondary.
@@ -229,9 +227,10 @@ In computer programming, a thread pool is a design pattern used to achieve concu
Throughput measures the number of requests a system processes within a unit of time. Common statistical indicators include:
1. **Transactions Per Second (TPS):** The number of database transactions performed per second.
2. **Queries Per Second (QPS):** The number of database queries performed per second.
3. **tpmC for TPC-C:** The rate of New-Order transactions executed per minute in TPC-C benchmarks.
4. **tpmTOTAL for TPC-C:** The rate of total transactions executed per minute in TPC-C benchmarks.
**31 Thundering Herd**
@@ -269,11 +268,11 @@ The TPC-C benchmark, defined by the Transaction Processing Council, is an OLTP t
This schema is used by five different transactions, each creating varied access patterns:
1. **Item:** Read-only.
2. **Warehouse, District, Customer, Stock:** Read/write.
3. **New-Order:** Insert, read, and delete.
4. **Order and Order-Line:** Inserts with time-delayed updates, causing rows to become stale and infrequently read.
5. **History:** Insert-only.
The diverse access patterns of this small schema with a limited number of transactions contribute to TPC-C's ongoing significance as a major database benchmark. In this book, BenchmarkSQL is primarily employed to evaluate TPC-C performance in MySQL.
@@ -317,9 +316,9 @@ The preprocessor performs preliminary tasks such as verifying the existence of t
The query optimizer determines the execution plan for the SQL query. This phase includes:
- **Logical Query Rewrites:** Transforming queries into logically equivalent forms.
- **Cost-Based Join Optimization:** Evaluating different join methods to minimize execution cost.
- **Rule-Based Access Path Selection:** Choosing the best data access paths based on predefined rules.
The query optimizer generates the execution plan, which is then used by the query executor engine.
@@ -341,11 +340,11 @@ Since this query condition does not use an index, the optimizer chooses a full t
The execution process for the executor and storage engine is as follows:
1. The Server layer calls the storage engine's full scan interface to start reading records from the table.
2. The executor checks if the age of the retrieved record exceeds 20. Records that meet this condition are dispatched to the network write buffer if there is available space.
3. The executor requests the next record from the storage engine in a loop. Each record is evaluated against the query conditions, and those that meet the criteria are sent to the network write buffer, provided the buffer is not full.
4. Once the storage engine has read all records from the table, it notifies the executor that reading is complete.
5. Upon receiving the completion signal, the executor exits the loop and flushes the query results to the client.
To optimize performance, MySQL minimizes frequent write system calls by checking if the network buffer is full before sending records to the client. Records are sent only when the buffer is full or when the completion signal is received.
@@ -374,7 +373,7 @@ The execution process with an index is as follows:
2. The storage engine retrieves and returns the matching index record to the Server layer.
3. The executor checks if the record meets the additional query conditions (e.g., id \< 3). If the conditions are met, the corresponding name is added to the network buffer, unless it is full; if not, the executor skips the record and requests the next one from the storage engine.
4. This cycle continues as the executor repeatedly requests and evaluates the next index record that matches the query condition until all relevant index records are processed.
@@ -393,23 +392,23 @@ MySQL follows the client-server architecture, which divides the system into two
### 1 Client
1. The client is an application that interacts with the MySQL database server.
2. It can be a standalone application, a web application, or any program requiring a database.
3. The client sends SQL queries to the MySQL server for processing.
### 2 Server
1. The server is the MySQL database management system responsible for storing, managing, and processing data.
2. It receives SQL queries, processes them, and returns the result sets.
3. It manages data storage, security, and concurrent access for multiple clients.
The client communicates with the server over the network using the MySQL protocol, enabling multiple clients to interact concurrently. Applications use MySQL connectors to connect to the database server. MySQL also provides client tools, such as the terminal-based MySQL client, for direct interaction with the server.
The MySQL database server includes several daemon processes:
1. **SQL Interface**: Provides a standardized interface for applications to interact with the database using SQL queries.
2. **Query Parser**: Analyzes SQL queries to understand their structure and syntax, breaking them down into components for further processing.
3. **Query Optimizer**: Evaluates various execution plans for a given query and selects the most efficient one to improve performance.
In MySQL, a storage engine is responsible for storage, retrieval, and management of data. MySQL's pluggable storage engine architecture allows selecting different storage engines, such as InnoDB and MyISAM, to meet specific performance and scalability requirements while maintaining a consistent SQL interface.
@@ -423,9 +422,9 @@ The most common way to create a fault-tolerant system is to use redundant compon
Replication in MySQL copies data from one server (primary) to one or more servers (secondaries), offering several advantages:
1. **Scale-out solutions**: Spreads the load among multiple secondaries to improve performance. All writes and updates occur on the primary server, while reads can occur on secondaries, enhancing read speed.
2. **Analytics**: Permits analysis on secondaries without impacting primary performance.
3. **Long-distance data distribution**: Creates local data copies for remote sites without needing constant access to the primary.
The original replication type in MySQL is one-way asynchronous replication. Its advantage is that user response time is unaffected by the secondaries; however, there is a significant risk of data loss if the primary server fails before the secondaries are fully synchronized.
@@ -535,8 +534,6 @@ The testing command is as follows:
./tpcc_start -h127.0.0.1 -P 3306 -d tpcc200 -u xxx -p "yyy" -w 200 -c 100 -r 0 -l 60 -F 1
```
### 6 Configuration Parameters
Because many tests were run, only typical configurations are listed here; special configurations require corresponding parameter modifications.
@@ -587,7 +584,9 @@ slave_parallel_type=LOGICAL_CLOCK
slave_preserve_commit_order=on
```
Regarding the improved Group Replication, since it is similar between MySQL 8.0.32 and MySQL 8.0.40, we have provided a version available for online use at the following address: https://github.com/advancedmysql/mysql-8.0.40.
Accordingly, the configuration parameters for the primary server are as follows:
```
# for mgr
@@ -600,16 +599,13 @@ loose-group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-baaaaaaaaaab"
loose-group_replication_local_address=127.0.0.1:63318
loose-group_replication_group_seeds= "127.0.0.1:63318,127.0.0.1:53318,127.0.0.1:43318"
loose-group_replication_member_weight=50
loose-group_replication_flow_control_mode=disabled
loose-group_replication_broadcast_gtid_executed_period=1000

slave_parallel_workers=256
slave_parallel_type=LOGICAL_CLOCK
slave_preserve_commit_order=on
```
For the improved Group Replication, the configuration parameters for the secondary server are as follows:
```
# for mgr
@@ -622,28 +618,36 @@ loose-group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-baaaaaaaaaab"
loose-group_replication_local_address=127.0.0.1:53318
loose-group_replication_group_seeds= "127.0.0.1:63318,127.0.0.1:53318,127.0.0.1:43318"
loose-group_replication_member_weight=50
loose-group_replication_flow_control_mode=disabled
loose-group_replication_broadcast_gtid_executed_period=1000

slave_parallel_workers=256
slave_parallel_type=LOGICAL_CLOCK
slave_preserve_commit_order=on
```
Please note that we no longer provide the source code based on MySQL 8.0.32, but we do provide the source code based on MySQL 8.0.40.
The details related to semisynchronous replication can be found at the following address:
https://github.com/advancedmysql/mysql_8.0.27/blob/main/semisynchronous.txt
### 7 Source Code Repository
**Patch for "Percona Server for MySQL 8.0.27-18":**
Patch Address:
https://github.com/advancedmysql/mysql_8.0.27/blob/main/book_8.0.27_single.patch
This patch specifically targets optimizations for standalone MySQL instances, including:
- **MVCC ReadView** enhancements
- **Binlog group commit** improvements
- **Query execution plan** optimizations
**Cluster Source Code:**
The source code for MySQL cluster versions is available here: https://github.com/advancedmysql/mysql-8.0.40
For MySQL clusters, the patch introduces further optimizations for **Group Replication** and **MySQL secondary replay**.
## About the Author
34 changes: 19 additions & 15 deletions Chapter12.md
@@ -18,10 +18,10 @@ Currently, Group Replication faces challenging concurrent view change problems.

MySQL scalability can be further improved in the following areas:

1. Eliminating additional latch bottlenecks, particularly in non-partitioned environments.
2. Improving the stability of long-term performance testing.
3. Improving MySQL's NUMA-awareness in mainstream NUMA environments.
4. Addressing Performance Schema's adverse impact on NUMA environments during MySQL secondary replay processes.

## 12.5 Further Improving SQL Performance Under Low Concurrency

@@ -43,27 +43,31 @@ In mainstream NUMA environments, MySQL's primary server efficiency in handling l

Currently, jemalloc 4.5 is the best-found memory allocation tool, but it has high memory consumption and instability on ARM architecture. A key future focus could be developing a more efficient and stable memory allocation tool.

## 12.10 Integrating a High-Performance File System

Enhancing MySQL with a higher-performance file system could improve I/O efficiency, especially the performance of MySQL secondary replay.

## 12.11 Introducing AI into MySQL Systems

Integrating AI with MySQL for automated knob tuning and learning-based database monitoring could be another key focus for the future.

### 12.11.1 Knob Tuning

Integrating AI for parameter optimization can significantly reduce DBA workload. Key parameters suitable for AI-driven optimization include:

1. Buffer pool size
2. Spin delay settings
3. Dynamic transaction throttling limits based on environment
4. Dynamic XCom cache size adjustment
5. MySQL secondary worker max queue size
6. The number of Paxos pipelining instances and the size of batching
7. Automatic parameter adjustments under heavy load to improve processing capability

### 12.11.2 Learning-based Database Monitoring

AI could optimize database monitoring by determining the optimal times and methods for tracking various database metrics.

## 12.12 Summary

Programming demands strong logical reasoning skills, crucial for problem-solving, algorithm design, debugging, code comprehension, performance optimization, and testing. It helps in analyzing problems, creating solutions, correcting errors, and ensuring software reliability. Developing logical reasoning is essential for programmers to think systematically and build efficient, reliable software [56].

2 changes: 1 addition & 1 deletion Chapter2.md
@@ -6,7 +6,7 @@ This chapter introduces nine puzzling MySQL problems or phenomena that serve as

## 2.1 SysBench Read-Write Test Demonstrates Super-Linear Throughput Growth

Take the MySQL 8.0.27 release as an example: in a 4-way NUMA environment on x86 architecture, SysBench is used to remotely test MySQL's read-write capabilities. The MySQL transaction isolation level is set to Read Committed. MySQL instances 1 and 2 are deployed on the same machine, with a testing duration of 60 seconds. The results of separate SysBench tests for MySQL instance 1 and instance 2 are shown in the following figure.

<img src="media/image-20240829081346732.png" alt="image-20240829081346732" style="zoom:150%;" />
