From effc80912738c439b9092a8b7460e19ca285ed0c Mon Sep 17 00:00:00 2001 From: Airton Lastori Date: Wed, 14 May 2025 18:42:37 -0400 Subject: [PATCH 1/3] - Added documentation for the new Azure DM private link feature in the migration guide. - Included reference to the related pull request for additional context: https://github.com/pingcap/docs/pull/20873. --- tidb-cloud/migrate-from-mysql-using-data-migration.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/tidb-cloud/migrate-from-mysql-using-data-migration.md b/tidb-cloud/migrate-from-mysql-using-data-migration.md index f72c93472e6fe..f1ad6da1e8d55 100644 --- a/tidb-cloud/migrate-from-mysql-using-data-migration.md +++ b/tidb-cloud/migrate-from-mysql-using-data-migration.md @@ -143,6 +143,13 @@ If your MySQL service is in a Google Cloud VPC, take the following steps: +
+ Set up Azure Private Link + +If you want to connect to TiDB Cloud via private endpoint in Azure, refer to [Set up Azure Private Link](/tidb-cloud/private-link-to-azure.md) for detailed instructions. + +
+ ### Enable binary logs To perform incremental data migration, make sure the following requirements are met: From 1c6c20c56ea87d70e6bb76ee011ea62e03cef4f7 Mon Sep 17 00:00:00 2001 From: Airton Lastori Date: Fri, 16 May 2025 02:20:33 -0400 Subject: [PATCH 2/3] This commit significantly revises and expands the prerequisites section for migrating MySQL-compatible databases to TiDB Cloud using Data Migration in tidb-cloud/migrate-from-mysql-using-data-migration.md. The primary goals of this update are to: 1. Provide much more detailed and actionable guidance on network connectivity options, with a strong focus on Private Link/Endpoint configurations for AWS and Azure, alongside existing VPC Peering and Public IP methods. 2. Introduce comprehensive instructions for enabling and configuring binary logs on various MySQL source types (self-managed, AWS RDS/Aurora, Azure Database for MySQL, Google Cloud SQL). 3. Incorporate Azure Database for MySQL flexible servers as a fully supported data source throughout the document. 4. Improve clarity and consistency in terminology (e.g., "MySQL source database" instead of "upstream"). Key changes include: - Updated to include Azure and better articulate the benefits of the DM feature. - Restructured "Prerequisites": - Added "Azure Database for MySQL flexible servers" to supported data sources and expanded version support for other cloud MySQL services. - New major section "Ensure network connectivity": - Table summarizing Public Endpoints, Private Links (AWS/Azure), and VPC Peering (AWS/GCP). - Emphasis on end-to-end TLS/SSL encryption with links to provider docs. - Detailed setup guides for Public Endpoints, AWS PrivateLink, and Azure Private Link, including CLI verification examples and provider-specific considerations. - Existing VPC Peering instructions for AWS and GCP are now integrated here. - New major section "Enable binary logs for replication": - Table of required binlog settings. - Provider-specific instructions (self-managed, AWS, Azure, GCP) for configuring these binlog settings. - Clarified privilege requirements and user hostname considerations for DM service access. - **Terminology Updates:** Consistently used "MySQL source database" and "target TiDB Cloud cluster" for better readability. - Minor phrasing improvements throughout the document for clarity. These changes aim to provide users with a more complete, robust, and easier-to-follow guide for preparing their environment for data migration to TiDB Cloud, especially when using private networking solutions." --- ...migrate-from-mysql-using-data-migration.md | 246 ++++++++++++++---- 1 file changed, 190 insertions(+), 56 deletions(-) diff --git a/tidb-cloud/migrate-from-mysql-using-data-migration.md b/tidb-cloud/migrate-from-mysql-using-data-migration.md index f1ad6da1e8d55..9bacdc7323972 100644 --- a/tidb-cloud/migrate-from-mysql-using-data-migration.md +++ b/tidb-cloud/migrate-from-mysql-using-data-migration.md @@ -1,16 +1,16 @@ --- title: Migrate MySQL-Compatible Databases to TiDB Cloud Using Data Migration -summary: Learn how to migrate data from MySQL-compatible databases hosted in Amazon Aurora MySQL, Amazon Relational Database Service (RDS), Google Cloud SQL for MySQL, or a local MySQL instance to TiDB Cloud using Data Migration. 
+summary: Learn how to seamlessly migrate your MySQL databases from Amazon Aurora MySQL, Amazon RDS, Azure Database for MySQL flexible servers, Google Cloud SQL for MySQL, or self-managed MySQL instances to TiDB Cloud with minimal downtime using the Data Migration feature.
aliases: ['/tidbcloud/migrate-data-into-tidb','/tidbcloud/migrate-incremental-data-from-mysql']
---

# Migrate MySQL-Compatible Databases to TiDB Cloud Using Data Migration

-This document describes how to migrate data from a MySQL-compatible database on a cloud provider (Amazon Aurora MySQL, Amazon Relational Database Service (RDS), or Google Cloud SQL for MySQL) or self-hosted source database to TiDB Cloud using the Data Migration feature of the TiDB Cloud console.
+This document guides you through migrating your MySQL databases from Amazon Aurora MySQL, Amazon RDS, Azure Database for MySQL flexible servers, Google Cloud SQL for MySQL, or self-managed MySQL instances to TiDB Cloud using the Data Migration feature in the console.

-This feature helps you migrate your source databases' existing data and ongoing changes to TiDB Cloud (either in the same region or cross regions) directly in one go.
+This feature enables you to both migrate your existing MySQL data and replicate ongoing changes (binlog) from your MySQL source databases directly to TiDB Cloud, maintaining consistency whether in the same region or across different regions. The streamlined process eliminates the need for separate dump and load operations, reducing downtime and simplifying your migration from MySQL to a more scalable platform.

-If you want to migrate incremental data only, see [Migrate Incremental Data from MySQL-Compatible Databases to TiDB Cloud Using Data Migration](/tidb-cloud/migrate-incremental-data-from-mysql-using-data-migration.md).
+If you only want to replicate ongoing binlog changes from your MySQL database to TiDB Cloud, see [Migrate Incremental Data from MySQL-Compatible Databases to TiDB Cloud Using Data Migration](/tidb-cloud/migrate-incremental-data-from-mysql-using-data-migration.md).

## Limitations

@@ -40,9 +40,9 @@ You can create up to 200 migration jobs for each organization. To create more mi

### Limitations of incremental data migration

-- During incremental data migration, if the table to be migrated already exists in the target database with duplicated keys, an error is reported and the migration is interrupted. In this situation, you need to make sure whether the upstream data is accurate. If yes, click the "Restart" button of the migration job and the migration job will replace the downstream conflicting records with the upstream records.
+- During incremental data migration, if the table to be migrated already exists in the target database with duplicated keys, an error is reported and the migration is interrupted. In this situation, you need to check whether the MySQL source data is accurate. If it is, click the "Restart" button of the migration job, and the migration job will replace the conflicting records in the target TiDB Cloud cluster with the MySQL source records.

-- During incremental replication (migrating ongoing changes to your cluster), if the migration job recovers from an abrupt error, it might open the safe mode for 60 seconds. 
During the safe mode, `INSERT` statements are migrated as `REPLACE`, `UPDATE` statements as `DELETE` and `REPLACE`, and then these transactions are migrated to the downstream cluster to make sure that all the data during the abrupt error has been migrated smoothly to the downstream cluster. In this scenario, for upstream tables without primary keys or not-null unique indexes, some data might be duplicated in the downstream cluster because the data might be inserted repeatedly to the downstream.
+- During incremental replication (migrating ongoing changes to your cluster), if the migration job recovers from an abrupt error, it might open the safe mode for 60 seconds. During the safe mode, `INSERT` statements are migrated as `REPLACE`, `UPDATE` statements as `DELETE` and `REPLACE`, and then these transactions are migrated to the target TiDB Cloud cluster to make sure that all the data during the abrupt error is migrated smoothly. In this scenario, for MySQL source tables without primary keys or not-null unique indexes, some data might be duplicated in the target TiDB Cloud cluster because the data might be inserted repeatedly.

- In the following scenarios, if the migration job takes longer than 24 hours, do not purge binary logs in the source database to ensure that Data Migration can get consecutive binary logs for incremental replication:

@@ -51,20 +51,141 @@ You can create up to 200 migration jobs for each organization. To create more mi

## Prerequisites

-Before performing the migration, you need to check the data sources, prepare privileges for upstream and downstream databases, and set up network connections.
+Before migrating, check supported data sources, set up network connections, and prepare privileges for the MySQL source and target TiDB Cloud cluster databases.

### Make sure your data source and version are supported

Data Migration supports the following data sources and versions:

-- MySQL 5.6, 5.7, and 8.0 local instances or on a public cloud provider. Note that MySQL 8.0 is still experimental on TiDB Cloud and might have incompatibility issues.
-- Amazon Aurora (MySQL 5.6 and 5.7)
-- Amazon RDS (MySQL 5.7)
-- Google Cloud SQL for MySQL 5.6 and 5.7
+- Self-managed MySQL instances MySQL 8.0, 5.7, and 5.6 local instances or on a public cloud provider.
+- Amazon Aurora MySQL (8.0, 5.7, and 5.6)
+- Amazon RDS MySQL (8.0, and 5.7)
+- Azure Database for MySQL flexible servers (8.0, and 5.7)
+- Google Cloud SQL for MySQL (8.0, 5.7, and 5.6)

-### Grant required privileges to the upstream database
+### Ensure network connectivity

-The username you use for the upstream database must have all the following privileges:
+Before creating a migration job, you need to plan and set up proper network connectivity between your source MySQL instance, TiDB Cloud DM (Data Migration) service, and your target TiDB Cloud cluster. 
+
+Your options are:
+
+| Connectivity Pattern | Availability | Recommended for |
+|:---------------------|:-------------|:----------------|
+| Public Endpoints / IPs | All cloud providers | Quick proof-of-concept migrations, testing, or when private connectivity isn't available |
+| Private Links / Endpoints | AWS and Azure only | Production workloads without exposing data to the public internet |
+| VPC Peering | AWS and GCP only | Production workloads that need low-latency, intra-region connections and whose VPC/VNet CIDRs do not overlap |
+
+Choose the connection method that best fits your security requirements, network topology, and cloud provider, and then proceed with the setup for that connectivity pattern.
+
+#### End-to-end Encryption over TLS/SSL
+
+In any case, TLS/SSL is highly recommended for end-to-end encryption. Private Link and VPC Peering protect the network path, but end‑to‑end encryption protects the data and satisfies compliance checks.
+
+
+ Download and store the provider's certificates for TLS/SSL encrypted connections + +- [AWS RDS / Aurora using SSL/TLS to encrypt a connection to a DB instance or cluster](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/UsingWithRDS.SSL.html) +- [Azure Database for MySQL Flexible Server - connect with encrypted connections](https://learn.microsoft.com/en-us/azure/mysql/flexible-server/how-to-connect-tls-ssl) +- [GCP Cloud SQL - manage SSL/TLS certificates](https://cloud.google.com/sql/docs/mysql/manage-ssl-instance) + +
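+For example, you can fetch the AWS RDS/Aurora certificate bundle with `curl` (the URL below is the global bundle that AWS publishes; see the AWS page above for region-specific bundles):
+
+```shell
+# Download the CA bundle to pass to the mysql client via --ssl-ca
+curl -o global-bundle.pem https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem
+```
+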
+
+#### Public Endpoints / IPs
+
+- If you use a Public Endpoint for your source MySQL database, get its IP address or Hostname (FQDN) and make sure that it can be connected through the public network. You may also need to configure firewall rules or security groups accordingly to your cloud provider guides.
+
+- Identify and record the source MySQL instance endpoint hostname (FQDN) or public IP.
+- Add the TiDB Cloud DM egress IP range to the database's firewall/security‑group rules. See your provider’s docs for:
+  - [AWS RDS / Aurora VPC security groups](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Overview.RDSSecurityGroups.html).
+  - [Azure Database for MySQL Flexible Server Public Network Access](https://learn.microsoft.com/en-us/azure/mysql/flexible-server/concepts-networking-public)
+  - [Cloud SQL Authorized Networks](https://cloud.google.com/sql/docs/mysql/configure-ip#authorized-networks).
+- Verify connectivity from your machine with public internet access using the certificates:
+
+    ```shell
+    mysql -h <mysql_host> -P <port> -u <user> -p --ssl-ca=<path_to_ca.pem> -e "SELECT version();"
+    ```
+
+#### Private Link / Private Endpoint
+
+If you use a provider-native Private Link, create a Private Endpoint for the source MySQL instance (RDS, Aurora, or Azure Database for MySQL).
+
+ Set up AWS Private Link and Private Endpoint for the MySQL source database
+
+AWS doesn't expose RDS/Aurora directly through PrivateLink. Instead, create a Network Load Balancer (NLB) and publish that NLB as an Endpoint Service associated with the source MySQL instance you want to migrate to TiDB Cloud.
+
+1. In the AWS web console, create an NLB in the same subnet(s) as your RDS/Aurora writer. Add a TCP listener on port `3306` that targets the DB instance endpoint.
+2. Under VPC, Endpoint Services, create a service backed by the NLB and enable Require acceptance. Note the Service Name (format `com.amazonaws.vpce-svc-xxxxxxxxxxxxxxxxx`).
+3. Optionally, test connectivity from a bastion or client inside the same VPC/VNet before starting the migration:
+
+    ```shell
+    mysql -h <nlb_or_db_endpoint> -P 3306 -u <user> -p --ssl-ca=<path_to_ca.pem> -e "SELECT version();"
+    ```
+
+4. Later, when configuring TiDB Cloud DM to connect via PrivateLink, you will return to the AWS console to approve a pending connection request from TiDB Cloud DM to this endpoint service.
+
+For detailed instructions, see [AWS guide to access VPC resources through PrivateLink](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-access-resources.html).
+
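+Step 2 can also be scripted with the AWS CLI (a sketch; replace the NLB ARN placeholder with your own):
+
+```shell
+# Publish the NLB as an endpoint service that requires connection-request acceptance
+aws ec2 create-vpc-endpoint-service-configuration \
+    --network-load-balancer-arns <your_nlb_arn> \
+    --acceptance-required
+```
+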
+ +
+ Set up Azure Private Link and Private Endpoint for the MySQL source database
+
+Azure supports Private Endpoints natively on each MySQL Flexible Server instance. You can either create Private access (VNet Integration) during the MySQL instance creation or add a Private Endpoint later. To add a new Private Endpoint:
+
+1. In the Azure portal, open MySQL Flexible Server, Networking, Private Endpoints, and click the "+ Create private endpoint" button.
+2. In the wizard, select a VNet and subnet that TiDB Cloud can reach, keep Private DNS integration enabled, and finish the wizard. The hostname used to connect to the instance is shown under the Connect menu (typical format `<server_name>.mysql.database.azure.com`).
+3. Optionally, test connectivity from a bastion or client inside the same VPC/VNet before starting the migration:
+
+    ```shell
+    mysql -h <server_name>.mysql.database.azure.com -P 3306 -u <user> -p --ssl-ca=<path_to_ca.pem> -e "SELECT version();"
+    ```
+
+4. Go back to the MySQL Flexible Server instance (not the private-endpoint object), and note its resource ID. You can retrieve it in the MySQL Flexible Server instance JSON view (format `/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.DBforMySQL/flexibleServers/<server_name>`). You will use this resource ID (not the private endpoint) to configure TiDB Cloud DM.
+5. Later, when configuring TiDB Cloud DM to connect via Private Link, you will return to the Azure portal to approve a pending connection request from TiDB Cloud DM to this Private Endpoint.
+
+For detailed instructions, see [Azure guide to create a private endpoint via Private Link Center](https://learn.microsoft.com/en-us/azure/mysql/flexible-server/how-to-networking-private-link-portal#create-a-private-endpoint-via-private-link-center).
+
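+You can also retrieve the resource ID in step 4 with the Azure CLI (a sketch; the resource group and server names are placeholders):
+
+```shell
+# Print the resource ID of the MySQL Flexible Server instance
+az mysql flexible-server show \
+    --resource-group <resource_group> \
+    --name <server_name> \
+    --query id --output tsv
+```
+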
+ +- If you use AWS VPC Peering or Google Cloud VPC Network Peering, see the following instructions to configure the network. + +
+ Set up AWS VPC Peering + +If your MySQL service is in an AWS VPC, take the following steps: + +1. [Set up a VPC peering connection](/tidb-cloud/set-up-vpc-peering-connections.md) between the VPC of the MySQL service and your TiDB cluster. + +2. Modify the inbound rules of the security group that the MySQL service is associated with. + + You must add [the CIDR of the region where your TiDB Cloud cluster is located](/tidb-cloud/set-up-vpc-peering-connections.md#prerequisite-set-a-cidr-for-a-region) to the inbound rules. Doing so allows the traffic to flow from your TiDB cluster to the MySQL instance. + +3. If the MySQL URL contains a DNS hostname, you need to allow TiDB Cloud to be able to resolve the hostname of the MySQL service. + + 1. Follow the steps in [Enable DNS resolution for a VPC peering connection](https://docs.aws.amazon.com/vpc/latest/peering/modify-peering-connections.html#vpc-peering-dns). + 2. Enable the **Accepter DNS resolution** option. + +
+ +
+ Set up Google Cloud VPC Network Peering + +If your MySQL service is in a Google Cloud VPC, take the following steps: + +1. If it is a self-hosted MySQL, you can skip this step and proceed to the next step. If your MySQL service is Google Cloud SQL, you must expose a MySQL endpoint in the associated VPC of the Google Cloud SQL instance. You might need to use the [Cloud SQL Auth proxy](https://cloud.google.com/sql/docs/mysql/sql-proxy) developed by Google. + +2. [Set up a VPC peering connection](/tidb-cloud/set-up-vpc-peering-connections.md) between the VPC of your MySQL service and your TiDB cluster. + +3. Modify the ingress firewall rules of the VPC where MySQL is located. + + You must add [the CIDR of the region where your TiDB Cloud cluster is located](/tidb-cloud/set-up-vpc-peering-connections.md#prerequisite-set-a-cidr-for-a-region) to the ingress firewall rules. This allows the traffic to flow from your TiDB cluster to the MySQL endpoint. + +
+ +### Grant required privileges in the source MySQL database + +The username you use for migration in the source database must have all the following privileges: | Privilege | Scope | |:----|:----| @@ -76,12 +197,12 @@ The username you use for the upstream database must have all the following privi For example, you can use the following `GRANT` statement to grant corresponding privileges: ```sql -GRANT SELECT,LOCK TABLES,REPLICATION SLAVE,REPLICATION CLIENT ON *.* TO 'your_user'@'your_IP_address_of_host' +GRANT SELECT, LOCK TABLES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'your_user'@'your_IP_address_of_host' ``` -### Grant required privileges to the downstream TiDB Cloud cluster +### Grant required privileges in the target TiDB Cloud cluster -The username you use for the downstream TiDB Cloud cluster must have the following privileges: +The username you use for the migration in the target TiDB Cloud cluster must have the following privileges: | Privilege | Scope | |:----|:----| @@ -97,69 +218,82 @@ The username you use for the downstream TiDB Cloud cluster must have the followi For example, you can execute the following `GRANT` statement to grant corresponding privileges: ```sql -GRANT CREATE,SELECT,INSERT,UPDATE,DELETE,ALTER,DROP,INDEX ON *.* TO 'your_user'@'your_IP_address_of_host' +GRANT CREATE, SELECT, INSERT, UPDATE, DELETE, ALTER, DROP, INDEX ON *.* TO 'your_user'@'your_IP_address_of_host' ``` -To quickly test a migration job, you can use the `root` account of the TiDB Cloud cluster. +To quickly test a migration job, you can use the `root` account of the TiDB Cloud cluster. The MySQL user hostname part also needs to allow connections from the TiDB Cloud DM service, and you can use `%` for simplification. -### Set up network connection +### Enable binary logs for replication -Before creating a migration job, set up the network connection according to your connection methods. See [Connect to Your TiDB Cloud Dedicated Cluster](/tidb-cloud/connect-to-tidb-cluster.md). +To enable replication from the source MySQL database to the TiDB Cloud target cluster using DM for continuously capturing incremental changes, you need these MySQL configurations: -- If you use public IP (this is, public connection) for network connection, make sure that the upstream database can be connected through the public network. +| Configuration | Required value | Why | +|:--------------|:---------------|:----| +| `log_bin` | `ON` | Enables binary logging that DM reads to replay changes in TiDB | +| `binlog_format` | `ROW` | Captures all data changes accurately (other formats miss edge cases) | +| `binlog_row_image` | `FULL` | Includes all column values in events for safe conflict resolution | +| `binlog_expire_logs_seconds` | ≥ 86400 (1 day), 604800 (7 days) recommended | Ensures DM can access consecutive logs during migration | -- If you use AWS VPC Peering or Google Cloud VPC Network Peering, see the following instructions to configure the network. +#### Check current values and configure the source MySQL instance -
- Set up AWS VPC Peering +To confirm the current configurations, connect to the source MySQL instance and run: -If your MySQL service is in an AWS VPC, take the following steps: +```sql +SHOW VARIABLES WHERE Variable_name IN +('log_bin','server_id','binlog_format','binlog_row_image', +'binlog_expire_logs_seconds','expire_logs_days'); +``` -1. [Set up a VPC peering connection](/tidb-cloud/set-up-vpc-peering-connections.md) between the VPC of the MySQL service and your TiDB cluster. +If necessary, change the source MySQL instance configurations to match the requirements. -2. Modify the inbound rules of the security group that the MySQL service is associated with. +
+ Configure a self‑managed MySQL instance - You must add [the CIDR of the region where your TiDB Cloud cluster is located](/tidb-cloud/set-up-vpc-peering-connections.md#prerequisite-set-a-cidr-for-a-region) to the inbound rules. Doing so allows the traffic to flow from your TiDB cluster to the MySQL instance. +1. Open `/etc/my.cnf` and add: -3. If the MySQL URL contains a DNS hostname, you need to allow TiDB Cloud to be able to resolve the hostname of the MySQL service. + ```toml + [mysqld] + log_bin = mysql-bin + binlog_format = ROW + binlog_row_image = FULL + binlog_expire_logs_seconds = 604800 # 7 days retention + ``` - 1. Follow the steps in [Enable DNS resolution for a VPC peering connection](https://docs.aws.amazon.com/vpc/latest/peering/modify-peering-connections.html#vpc-peering-dns). - 2. Enable the **Accepter DNS resolution** option. +2. Restart: `sudo systemctl restart mysqld` + +3. Run the `SHOW VARIABLES` query again to verify that the settings took effect.
- Set up Google Cloud VPC Network Peering + Configure AWS RDS or Aurora MySQL -If your MySQL service is in a Google Cloud VPC, take the following steps: - -1. If it is a self-hosted MySQL, you can skip this step and proceed to the next step. If your MySQL service is Google Cloud SQL, you must expose a MySQL endpoint in the associated VPC of the Google Cloud SQL instance. You might need to use the [Cloud SQL Auth proxy](https://cloud.google.com/sql/docs/mysql/sql-proxy) developed by Google. - -2. [Set up a VPC peering connection](/tidb-cloud/set-up-vpc-peering-connections.md) between the VPC of your MySQL service and your TiDB cluster. - -3. Modify the ingress firewall rules of the VPC where MySQL is located. - - You must add [the CIDR of the region where your TiDB Cloud cluster is located](/tidb-cloud/set-up-vpc-peering-connections.md#prerequisite-set-a-cidr-for-a-region) to the ingress firewall rules. This allows the traffic to flow from your TiDB cluster to the MySQL endpoint. +1. In the AWS console, open RDS, Parameter groups, and create (or edit) a custom parameter group. +2. Set the four parameters above to the required values. +3. Attach the parameter group to your instance/cluster and reboot to apply changes. +4. After the reboot, connect and run the `SHOW VARIABLES` query to confirm.
- Set up Azure Private Link + Configure Azure Database for MySQL ‑ Flexible Server -If you want to connect to TiDB Cloud via private endpoint in Azure, refer to [Set up Azure Private Link](/tidb-cloud/private-link-to-azure.md) for detailed instructions. +1. In the Azure portal, open MySQL Flexible Server, Server parameters. +2. Search for each setting and update the values. +Most changes apply without restart; the portal indicates if a reboot is needed. +3. Verify with the `SHOW VARIABLES` query.
-### Enable binary logs - -To perform incremental data migration, make sure the following requirements are met: +
+ Configure Google Cloud SQL for MySQL -- Binary logs are enabled for the upstream database. -- The binary logs are retained for at least 24 hours. -- The binlog format for the upstream database is set to `ROW`. If not, update the format to `ROW` as follows to avoid the [format error](/tidb-cloud/tidb-cloud-dm-precheck-and-troubleshooting.md#error-message-check-whether-mysql-binlog_format-is-row): +1. In the Google Cloud console, go to Cloud SQL, ``, Flags. +2. Add or edit the necessary flags (`log_bin`, `binlog_format`, `binlog_row_image`, `binlog_expire_logs_seconds`). +3. Click Save. Cloud SQL prompts a restart if required. +4. After Cloud SQL restarts, run the `SHOW VARIABLES` query to confirm. - - MySQL: execute the `SET GLOBAL binlog_format=ROW;` statement. If you want to persist this change across reboots, you can execute the `SET PERSIST binlog_format=ROW;` statement. - - Amazon Aurora MySQL or RDS for MySQL: follow the instructions in [AWS documentation](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_WorkingWithDBInstanceParamGroups.html) to create a new DB parameter group. Set the `binlog_format=row` parameter in the new DB parameter group, modify the instance to use the new DB parameter group, and then restart the instance to take effect. +
## Step 1: Go to the **Data Migration** page @@ -216,9 +350,9 @@ To migrate data to TiDB Cloud once and for all, choose both **Existing data migr You can use **physical mode** or **logical mode** to migrate **existing data** and **incremental data**. -- The default mode is **logical mode**. This mode exports data from upstream databases as SQL statements, and then executes them on TiDB. In this mode, the target tables before migration can be either empty or non-empty. But the performance is slower than physical mode. +- The default mode is **logical mode**. This mode exports data from MySQL source databases as SQL statements, and then executes them on TiDB. In this mode, the target tables before migration can be either empty or non-empty. But the performance is slower than physical mode. -- For large datasets, it is recommended to use **physical mode**. This mode exports data from upstream databases and encodes it as KV pairs, writing directly to TiKV to achieve faster performance. This mode requires the target tables to be empty before migration. For the specification of 16 RCUs (Replication Capacity Units), the performance is about 2.5 times faster than logical mode. The performance of other specifications can increase by 20% to 50% compared with logical mode. Note that the performance data is for reference only and might vary in different scenarios. +- For large datasets, it is recommended to use **physical mode**. This mode exports data from MySQL source databases and encodes it as KV pairs, writing directly to TiKV to achieve faster performance. This mode requires the target tables to be empty before migration. For the specification of 16 RCUs (Replication Capacity Units), the performance is about 2.5 times faster than logical mode. The performance of other specifications can increase by 20% to 50% compared with logical mode. Note that the performance data is for reference only and might vary in different scenarios. Physical mode is available for TiDB clusters deployed on AWS and Google Cloud. @@ -227,9 +361,9 @@ Physical mode is available for TiDB clusters deployed on AWS and Google Cloud. > - When you use physical mode, you cannot create a second migration job or import task for the TiDB cluster before the existing data migration is completed. > - When you use physical mode and the migration job has started, do **NOT** enable PITR (Point-in-time Recovery) or have any changefeed on the cluster. Otherwise, the migration job will be stuck. If you need to enable PITR or have any changefeed, use logical mode instead to migrate data. -Physical mode exports the upstream data as fast as possible, so [different specifications](/tidb-cloud/tidb-cloud-billing-dm.md#specifications-for-data-migration) have different performance impacts on QPS and TPS of the upstream database during data export. The following table shows the performance regression of each specification. +Physical mode exports the MySQL source data as fast as possible, so [different specifications](/tidb-cloud/tidb-cloud-billing-dm.md#specifications-for-data-migration) have different performance impacts on QPS and TPS of the MySQL source database during data export. The following table shows the performance regression of each specification. 
-| Migration specification | Maximum export speed | Performance regression of the upstream database | +| Migration specification | Maximum export speed | Performance regression of the MySQL source database | |---------|-------------|--------| | 2 RCUs | 80.84 MiB/s | 15.6% | | 4 RCUs | 214.2 MiB/s | 20.0% | @@ -318,7 +452,7 @@ When scaling a migration job specification, note the following: - You can only scale a migration job specification when the job is in the **Running** or **Paused** status. - TiDB Cloud does not support scaling a migration job specification during the existing data export stage. - Scaling a migration job specification will restart the job. If a source table of the job does not have a primary key, duplicate data might be inserted. -- During scaling, do not purge the binary log of the source database or increase `expire_logs_days` of the upstream database temporarily. Otherwise, the job might fail because it cannot get the continuous binary log position. +- During scaling, do not purge the binary log of the source database or increase `expire_logs_days` of the MySQL source database temporarily. Otherwise, the job might fail because it cannot get the continuous binary log position. ### Scaling procedure From 2ff08b8c8734598920a2222e2e8fbebfcd7206b7 Mon Sep 17 00:00:00 2001 From: Airton Lastori Date: Fri, 16 May 2025 13:04:31 -0400 Subject: [PATCH 3/3] Key changes: - Added "Azure Database for MySQL flexible servers" to supported sources and updated the supported versions table. - Expanded the "Enable binary logs" section with specific MySQL configurations (`log_bin`, `binlog_format`, `binlog_row_image`, `binlog_expire_logs_seconds`) and detailed setup instructions for self-managed MySQL, AWS RDS/Aurora, Azure Database for MySQL, and Google Cloud SQL for MySQL. - Enhanced "Ensure network connectivity" section covering Public Endpoints, Private Link (for AWS and Azure with setup steps), and VPC Peering (for AWS and GCP with setup steps), along with recommendations for TLS/SSL. - Refined the "Grant required privileges" section with clearer tables detailing privileges, scope, and purpose for both source and target databases. - Various wording adjustments and formatting improvements for better readability and consistency. --- ...migrate-from-mysql-using-data-migration.md | 238 ++++++++++-------- 1 file changed, 128 insertions(+), 110 deletions(-) diff --git a/tidb-cloud/migrate-from-mysql-using-data-migration.md b/tidb-cloud/migrate-from-mysql-using-data-migration.md index 9bacdc7323972..a779b4b9962c9 100644 --- a/tidb-cloud/migrate-from-mysql-using-data-migration.md +++ b/tidb-cloud/migrate-from-mysql-using-data-migration.md @@ -51,17 +51,99 @@ You can create up to 200 migration jobs for each organization. To create more mi ## Prerequisites -Before migrating, check supported data sources, set up network connections, and prepare privileges for the MySQL source and target TiDB Cloud cluster databases. +Before migrating, check supported data sources, enable binary logging in your MySQL source, ensure network connectivity, and prepare appropriate privileges for both the MySQL source and target TiDB Cloud cluster databases. ### Make sure your data source and version are supported Data Migration supports the following data sources and versions: -- Self-managed MySQL instances MySQL 8.0, 5.7, and 5.6 local instances or on a public cloud provider. 
-- Amazon Aurora MySQL (8.0, 5.7, and 5.6)
-- Amazon RDS MySQL (8.0, and 5.7)
-- Azure Database for MySQL flexible servers (8.0, and 5.7)
-- Google Cloud SQL for MySQL (8.0, 5.7, and 5.6)
+| Data Source | Supported Versions |
+|:------------|:-------------------|
+| Self-managed MySQL (on-premises or public cloud) | 8.0, 5.7, 5.6 |
+| Amazon Aurora MySQL | 8.0, 5.7, 5.6 |
+| Amazon RDS MySQL | 8.0, 5.7 |
+| Azure Database for MySQL Flexible Servers | 8.0, 5.7 |
+| Google Cloud SQL for MySQL | 8.0, 5.7, 5.6 |
+
+### Enable binary logs in the source MySQL database for replication
+
+To let DM continuously capture incremental changes from the source MySQL database and replicate them to the target TiDB Cloud cluster, the source instance requires the following MySQL configurations:
+
+| Configuration | Required value | Why |
+|:--------------|:---------------|:----|
+| `log_bin` | `ON` | Enables binary logging that DM reads to replay changes in TiDB |
+| `binlog_format` | `ROW` | Captures all data changes accurately (other formats miss edge cases) |
+| `binlog_row_image` | `FULL` | Includes all column values in events for safe conflict resolution |
+| `binlog_expire_logs_seconds` | ≥ `86400` (1 day), `604800` (7 days, recommended) | Ensures DM can access consecutive logs during migration |
+
+#### Check current values and configure the source MySQL instance
+
+To confirm the current configurations, connect to the source MySQL instance and run:
+
+```sql
+SHOW VARIABLES WHERE Variable_name IN
+('log_bin','server_id','binlog_format','binlog_row_image',
+'binlog_expire_logs_seconds','expire_logs_days');
+```
+
+If necessary, change the source MySQL instance configurations to match the requirements.
+
+ Configure a self‑managed MySQL instance + +1. Open `/etc/my.cnf` and add: + + ```toml + [mysqld] + log_bin = mysql-bin + binlog_format = ROW + binlog_row_image = FULL + binlog_expire_logs_seconds = 604800 # 7 days retention + ``` + +2. Restart: `sudo systemctl restart mysqld` + +3. Run the `SHOW VARIABLES` query again to verify that the settings took effect. + +For detailed instructions, see [MySQL Server System Variables documentation](https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html) and [The Binary Log](https://dev.mysql.com/doc/refman/8.0/en/binary-log.html) in the MySQL Reference Manual. + +
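+On MySQL 8.0, you can alternatively apply the dynamic settings without waiting for a restart (a minimal sketch; `log_bin` itself can only be enabled in the configuration file followed by a restart):
+
+```sql
+SET PERSIST binlog_format = 'ROW';
+SET PERSIST binlog_row_image = 'FULL';
+SET PERSIST binlog_expire_logs_seconds = 604800;
+```
+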
+ +
+ Configure AWS RDS or Aurora MySQL + +1. In the AWS console, open RDS, Parameter groups, and create (or edit) a custom parameter group. +2. Set the four parameters above to the required values. +3. Attach the parameter group to your instance/cluster and reboot to apply changes. +4. After the reboot, connect and run the `SHOW VARIABLES` query to confirm. + +For detailed instructions, see [Working with DB Parameter Groups](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithParamGroups.html) and [Configuring MySQL Binary Logging](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_LogAccess.MySQL.BinaryFormat.html) in the AWS documentation. + +
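+For example, with the AWS CLI (a sketch; the parameter group name is a placeholder, and Aurora clusters use the equivalent `modify-db-cluster-parameter-group` command):
+
+```shell
+# Apply the binlog settings to a custom DB parameter group
+aws rds modify-db-parameter-group \
+    --db-parameter-group-name <your_parameter_group> \
+    --parameters "ParameterName=binlog_format,ParameterValue=ROW,ApplyMethod=immediate" \
+                 "ParameterName=binlog_row_image,ParameterValue=full,ApplyMethod=immediate"
+```
+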
+ +
+ Configure Azure Database for MySQL ‑ Flexible Server
+
+1. In the Azure portal, open MySQL Flexible Server, Server parameters.
+2. Search for each setting and update the values. Most changes apply without a restart; the portal indicates if a reboot is needed.
+3. Verify with the `SHOW VARIABLES` query.
+
+For detailed instructions, see [Server Parameters in Azure Database for MySQL - Flexible Server](https://learn.microsoft.com/en-us/azure/mysql/flexible-server/concepts-server-parameters) in the Microsoft Azure documentation.
+
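+For example, with the Azure CLI (a sketch; the resource group and server names are placeholders, and each parameter is set with a separate call):
+
+```shell
+# Set the binlog retention period on the Flexible Server
+az mysql flexible-server parameter set \
+    --resource-group <resource_group> \
+    --server-name <server_name> \
+    --name binlog_expire_logs_seconds --value 604800
+```
+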
+ +
+ Configure Google Cloud SQL for MySQL
+
+1. In the Google Cloud console, go to Cloud SQL, `<your_instance>`, Flags.
+2. Add or edit the necessary flags (`binlog_format`, `binlog_row_image`, `binlog_expire_logs_seconds`). Note that `log_bin` is not set through a flag; it is enabled by turning on binary logging (point-in-time recovery) for the instance.
+3. Click Save. Cloud SQL prompts a restart if required.
+4. After Cloud SQL restarts, run the `SHOW VARIABLES` query to confirm.
+
+For detailed instructions, see [Configure database flags](https://cloud.google.com/sql/docs/mysql/flags) and [Use point-in-time recovery](https://cloud.google.com/sql/docs/mysql/backup-recovery/pitr) in the Google Cloud documentation.
+
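+For example, you can apply the Cloud SQL settings above with the gcloud CLI (a sketch; the instance name is a placeholder, and `--database-flags` replaces the entire flag list, so include any flags the instance already uses):
+
+```shell
+# Enable binary logging and set the row-image and retention flags
+gcloud sql instances patch <your_instance> \
+    --enable-bin-log \
+    --database-flags=binlog_row_image=full,binlog_expire_logs_seconds=604800
+```
+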
### Ensure network connectivity

@@ -92,14 +174,14 @@ In any case, TLS/SSL is highly recommended for end-to-end encryption. Private 

#### Public Endpoints / IPs

-- If you use a Public Endpoint for your source MySQL database, get its IP address or Hostname (FQDN) and make sure that it can be connected through the public network. You may also need to configure firewall rules or security groups accordingly to your cloud provider guides.
+If you use a Public Endpoint for your source MySQL database, ensure you can connect to it via the public internet. You may also need to configure firewall rules or security groups according to your cloud provider's guides.

-- Identify and record the source MySQL instance endpoint hostname (FQDN) or public IP.
-- Add the TiDB Cloud DM egress IP range to the database's firewall/security‑group rules. See your provider’s docs for:
+1. Identify and record the source MySQL instance endpoint hostname (FQDN) or public IP.
+2. If necessary, add the TiDB Cloud DM egress IP range to the database's firewall/security‑group rules. See your provider’s docs for:
  - [AWS RDS / Aurora VPC security groups](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Overview.RDSSecurityGroups.html).
  - [Azure Database for MySQL Flexible Server Public Network Access](https://learn.microsoft.com/en-us/azure/mysql/flexible-server/concepts-networking-public)
  - [Cloud SQL Authorized Networks](https://cloud.google.com/sql/docs/mysql/configure-ip#authorized-networks).
-- Verify connectivity from your machine with public internet access using the certificates:
+3. Verify connectivity from your machine with public internet access using the certificate for in-transit encryption:

    ```shell
    mysql -h <mysql_host> -P <port> -u <user> -p --ssl-ca=<path_to_ca.pem> -e "SELECT version();"
    ```

@@ -148,7 +230,9 @@ For detailed instructions, see [Azure guide to create a private endpoint via Pri
-- If you use AWS VPC Peering or Google Cloud VPC Network Peering, see the following instructions to configure the network. +#### VPC Peering + +If you use AWS VPC Peering or Google Cloud VPC Network Peering, see the following instructions to configure the network.
Set up AWS VPC Peering @@ -183,118 +267,52 @@ If your MySQL service is in a Google Cloud VPC, take the following steps:
-### Grant required privileges in the source MySQL database +### Grant required privileges for migration -The username you use for migration in the source database must have all the following privileges: +Before starting migration, you need to set up appropriate database users with the correct privileges on both source and target databases. These privileges enable TiDB Cloud DM to read data from MySQL, replicate changes, and write to your TiDB Cloud cluster securely. Since migration involves both full data dumps and binlog replication for incremental changes, your migration user requires specific permissions beyond basic read access. -| Privilege | Scope | -|:----|:----| -| `SELECT` | Tables | -| `LOCK` | Tables | -| `REPLICATION SLAVE` | Global | -| `REPLICATION CLIENT` | Global | +#### Grant required privileges to the migration user in the source MySQL database -For example, you can use the following `GRANT` statement to grant corresponding privileges: +For testing purposes, you can use an administrative user (e.g., `root`) in your source MySQL database. -```sql -GRANT SELECT, LOCK TABLES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'your_user'@'your_IP_address_of_host' -``` +For production workloads, it is recommended to have a dedicated user for data dump and replication in the source MySQL database, and grant only the necessary privileges: -### Grant required privileges in the target TiDB Cloud cluster +| Privilege | Scope | Purpose | +|:----------|:------|:--------| +| `SELECT` | Tables | Allows reading data from all tables | +| `LOCK TABLES` | Tables | Ensures consistent snapshots during full dump | +| `REPLICATION SLAVE` | Global | Enables binlog streaming for incremental replication | +| `REPLICATION CLIENT` | Global | Provides access to binlog position and server status | -The username you use for the migration in the target TiDB Cloud cluster must have the following privileges: - -| Privilege | Scope | -|:----|:----| -| `CREATE` | Databases, Tables | -| `SELECT` | Tables | -| `INSERT` | Tables | -| `UPDATE` | Tables | -| `DELETE` | Tables | -| `ALTER` | Tables | -| `DROP` | Databases, Tables | -| `INDEX` | Tables | - -For example, you can execute the following `GRANT` statement to grant corresponding privileges: +For example, you can use the following `GRANT` statement in your source MySQL instance to grant corresponding privileges: ```sql -GRANT CREATE, SELECT, INSERT, UPDATE, DELETE, ALTER, DROP, INDEX ON *.* TO 'your_user'@'your_IP_address_of_host' +GRANT SELECT, LOCK TABLES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'dm_source_user'@'%'; ``` -To quickly test a migration job, you can use the `root` account of the TiDB Cloud cluster. The MySQL user hostname part also needs to allow connections from the TiDB Cloud DM service, and you can use `%` for simplification. - -### Enable binary logs for replication +#### Grant required privileges in the target TiDB Cloud cluster -To enable replication from the source MySQL database to the TiDB Cloud target cluster using DM for continuously capturing incremental changes, you need these MySQL configurations: +For testing purposes, you can use the `root` account of the TiDB Cloud cluster. 
-| Configuration | Required value | Why | -|:--------------|:---------------|:----| -| `log_bin` | `ON` | Enables binary logging that DM reads to replay changes in TiDB | -| `binlog_format` | `ROW` | Captures all data changes accurately (other formats miss edge cases) | -| `binlog_row_image` | `FULL` | Includes all column values in events for safe conflict resolution | -| `binlog_expire_logs_seconds` | ≥ 86400 (1 day), 604800 (7 days) recommended | Ensures DM can access consecutive logs during migration | +For production workloads, it is recommended to have a dedicated user for replication in the target TiDB Cloud cluster and grant only the necessary privileges: -#### Check current values and configure the source MySQL instance +| Privilege | Scope | Purpose | +|:----------|:------|:--------| +| `CREATE` | Databases, Tables | Creates schema objects in the target | +| `SELECT` | Tables | Verifies data during migration | +| `INSERT` | Tables | Writes migrated data | +| `UPDATE` | Tables | Modifies existing rows during incremental replication | +| `DELETE` | Tables | Removes rows during replication or updates | +| `ALTER` | Tables | Modifies table definitions when schema changes | +| `DROP` | Databases, Tables | Removes objects during schema sync | +| `INDEX` | Tables | Creates and modifies indexes | -To confirm the current configurations, connect to the source MySQL instance and run: +For example, you can execute the following `GRANT` statement to grant corresponding privileges: ```sql -SHOW VARIABLES WHERE Variable_name IN -('log_bin','server_id','binlog_format','binlog_row_image', -'binlog_expire_logs_seconds','expire_logs_days'); +GRANT CREATE, SELECT, INSERT, UPDATE, DELETE, ALTER, DROP, INDEX ON *.* TO 'dm_target_user'@'%'; ``` -If necessary, change the source MySQL instance configurations to match the requirements. - -
- Configure a self‑managed MySQL instance - -1. Open `/etc/my.cnf` and add: - - ```toml - [mysqld] - log_bin = mysql-bin - binlog_format = ROW - binlog_row_image = FULL - binlog_expire_logs_seconds = 604800 # 7 days retention - ``` - -2. Restart: `sudo systemctl restart mysqld` - -3. Run the `SHOW VARIABLES` query again to verify that the settings took effect. - -
- -
- Configure AWS RDS or Aurora MySQL - -1. In the AWS console, open RDS, Parameter groups, and create (or edit) a custom parameter group. -2. Set the four parameters above to the required values. -3. Attach the parameter group to your instance/cluster and reboot to apply changes. -4. After the reboot, connect and run the `SHOW VARIABLES` query to confirm. - -
- -
- Configure Azure Database for MySQL ‑ Flexible Server - -1. In the Azure portal, open MySQL Flexible Server, Server parameters. -2. Search for each setting and update the values. -Most changes apply without restart; the portal indicates if a reboot is needed. -3. Verify with the `SHOW VARIABLES` query. - -
- -
- Configure Google Cloud SQL for MySQL - -1. In the Google Cloud console, go to Cloud SQL, ``, Flags. -2. Add or edit the necessary flags (`log_bin`, `binlog_format`, `binlog_row_image`, `binlog_expire_logs_seconds`). -3. Click Save. Cloud SQL prompts a restart if required. -4. After Cloud SQL restarts, run the `SHOW VARIABLES` query to confirm. - -
## Step 1: Go to the **Data Migration** page

1. Log in to the [TiDB Cloud console](https://tidbcloud.com/) and navigate to the [**Clusters**](https://tidbcloud.com/console/clusters) page of your project.

@@ -307,9 +325,9 @@ Most changes apply without restart; the portal indicates if a reboot is needed.

3. On the **Data Migration** page, click **Create Migration Job** in the upper-right corner. The **Create Migration Job** page is displayed.

-## Step 2: Configure the source and target connection
+## Step 2: Configure the source and target connections

-On the **Create Migration Job** page, configure the source and target connection.
+On the **Create Migration Job** page, configure the source and target connections.

1. Enter a job name, which must start with a letter and must be less than 60 characters. Letters (A-Z, a-z), numbers (0-9), underscores (_), and hyphens (-) are acceptable.

@@ -394,12 +412,12 @@ For detailed instructions about incremental data migration, see [Migrate Only In

-   - If you click **Customize** and select some tables under a dataset name, the migration job only will migrate the existing data and migrate ongoing changes of the selected tables. Tables created afterwards in the same database will not be migrated.
+   - If you click **Customize** and select some tables under a dataset name, the migration job will only migrate the existing data and migrate ongoing changes of the selected tables. Tables created afterwards in the same database will not be migrated.

@@ -440,7 +458,7 @@ If you encounter any problems during the migration, see [Migration errors and so

TiDB Cloud supports scaling up or down a migration job specification to meet your performance and cost requirements in different scenarios.

-Different migration specifications have different performances. Your performance requirements might vary at different stages as well. For example, during the existing data migration, you want the performance to be as fast as possible, so you choose a migration job with a large specification, such as 8 RCU. Once the existing data migration is completed, the incremental migration does not require such a high performance, so you can scale down the job specification, for example, from 8 RCU to 2 RUC, to save cost.
+Different migration specifications provide different performance. Your performance requirements might vary at different stages as well. For example, during the existing data migration, you want the performance to be as fast as possible, so you choose a migration job with a large specification, such as 8 RCU. Once the existing data migration is completed, the incremental migration does not require such high performance, so you can scale down the job specification, for example, from 8 RCU to 2 RCU, to save cost.

When scaling a migration job specification, note the following: