diff --git a/media/tidb-cloud/migration-job-select-all.png b/media/tidb-cloud/migration-job-select-all.png deleted file mode 100644 index 6ba40a0607c9c..0000000000000 Binary files a/media/tidb-cloud/migration-job-select-all.png and /dev/null differ diff --git a/media/tidb-cloud/migration-job-select-db-blacklist1.png b/media/tidb-cloud/migration-job-select-db-blacklist1.png deleted file mode 100644 index 42791c57e1aec..0000000000000 Binary files a/media/tidb-cloud/migration-job-select-db-blacklist1.png and /dev/null differ diff --git a/media/tidb-cloud/migration-job-select-db-blacklist2.png b/media/tidb-cloud/migration-job-select-db-blacklist2.png deleted file mode 100644 index fd4eb232fe1ef..0000000000000 Binary files a/media/tidb-cloud/migration-job-select-db-blacklist2.png and /dev/null differ diff --git a/media/tidb-cloud/migration-job-select-db.png b/media/tidb-cloud/migration-job-select-db.png deleted file mode 100644 index 1dbe2b9189581..0000000000000 Binary files a/media/tidb-cloud/migration-job-select-db.png and /dev/null differ diff --git a/media/tidb-cloud/migration-job-select-tables.png b/media/tidb-cloud/migration-job-select-tables.png deleted file mode 100644 index 0e340b66ce94e..0000000000000 Binary files a/media/tidb-cloud/migration-job-select-tables.png and /dev/null differ diff --git a/tidb-cloud/migrate-from-mysql-using-data-migration.md b/tidb-cloud/migrate-from-mysql-using-data-migration.md index f72c93472e6fe..4beffc7ccbecb 100644 --- a/tidb-cloud/migrate-from-mysql-using-data-migration.md +++ b/tidb-cloud/migrate-from-mysql-using-data-migration.md @@ -1,16 +1,16 @@ --- title: Migrate MySQL-Compatible Databases to TiDB Cloud Using Data Migration -summary: Learn how to migrate data from MySQL-compatible databases hosted in Amazon Aurora MySQL, Amazon Relational Database Service (RDS), Google Cloud SQL for MySQL, or a local MySQL instance to TiDB Cloud using Data Migration. +summary: Learn how to seamlessly migrate your MySQL databases from Amazon Aurora MySQL, Amazon RDS, Azure Database for MySQL - Flexible Server, Google Cloud SQL for MySQL, or self-managed MySQL instances to TiDB Cloud with minimal downtime using the Data Migration feature. aliases: ['/tidbcloud/migrate-data-into-tidb','/tidbcloud/migrate-incremental-data-from-mysql'] --- # Migrate MySQL-Compatible Databases to TiDB Cloud Using Data Migration -This document describes how to migrate data from a MySQL-compatible database on a cloud provider (Amazon Aurora MySQL, Amazon Relational Database Service (RDS), or Google Cloud SQL for MySQL) or self-hosted source database to TiDB Cloud using the Data Migration feature of the TiDB Cloud console. +This document guides you through migrating your MySQL databases from Amazon Aurora MySQL, Amazon RDS, Azure Database for MySQL - Flexible Server, Google Cloud SQL for MySQL, or self-managed MySQL instances to TiDB Cloud using the Data Migration feature in the [TiDB Cloud console](https://tidbcloud.com/). -This feature helps you migrate your source databases' existing data and ongoing changes to TiDB Cloud (either in the same region or cross regions) directly in one go. +This feature enables you to migrate your existing MySQL data and continuously replicate ongoing changes (binlog) from your MySQL-compatible source databases directly to TiDB Cloud, maintaining data consistency whether in the same region or across different regions. 
The streamlined process eliminates the need for separate dump and load operations, reducing downtime and simplifying your migration from MySQL to a more scalable platform. -If you want to migrate incremental data only, see [Migrate Incremental Data from MySQL-Compatible Databases to TiDB Cloud Using Data Migration](/tidb-cloud/migrate-incremental-data-from-mysql-using-data-migration.md). +If you only want to replicate ongoing binlog changes from your MySQL-compatible database to TiDB Cloud, see [Migrate Incremental Data from MySQL-Compatible Databases to TiDB Cloud Using Data Migration](/tidb-cloud/migrate-incremental-data-from-mysql-using-data-migration.md). ## Limitations @@ -18,7 +18,7 @@ If you want to migrate incremental data only, see [Migrate Incremental Data from - The Data Migration feature is available only for **TiDB Cloud Dedicated** clusters. -- The Data Migration feature is only available to clusters that are created in [certain regions](https://www.pingcap.com/tidb-cloud-pricing-details/#dm-cost) after November 9, 2022. If your **project** was created before the date or if your cluster is in another region, this feature is not available to your cluster and the **Data Migration** tab will not be displayed on the cluster overview page in the TiDB Cloud console. +- If you don't see the [Data Migration](/tidb-cloud/migrate-from-mysql-using-data-migration.md#step-1-go-to-the-data-migration-page) entry for your TiDB Cloud Dedicated cluster in the [TiDB Cloud console](https://tidbcloud.com/), the feature might not be available in your region. To request support for your region, contact [TiDB Cloud Support](/tidb-cloud/tidb-cloud-support.md). - Amazon Aurora MySQL writer instances support both existing data and incremental data migration. Amazon Aurora MySQL reader instances only support existing data migration and do not support incremental data migration. @@ -34,84 +34,229 @@ You can create up to 200 migration jobs for each organization. To create more mi ### Limitations of existing data migration -- During existing data migration, if the table to be migrated already exists in the target database with duplicated keys, the duplicate keys will be replaced. +- During existing data migration, if the target database already contains the table to be migrated and there are duplicate keys, the rows with duplicate keys will be replaced. - If your dataset size is smaller than 1 TiB, it is recommended that you use logical mode (the default mode). If your dataset size is larger than 1 TiB, or you want to migrate existing data faster, you can use physical mode. For more information, see [Migrate existing data and incremental data](#migrate-existing-data-and-incremental-data). ### Limitations of incremental data migration -- During incremental data migration, if the table to be migrated already exists in the target database with duplicated keys, an error is reported and the migration is interrupted. In this situation, you need to make sure whether the upstream data is accurate. If yes, click the "Restart" button of the migration job and the migration job will replace the downstream conflicting records with the upstream records. +- During incremental data migration, if the table to be migrated already exists in the target database with duplicate keys, an error is reported and the migration is interrupted. In this situation, you need to make sure whether the MySQL source data is accurate. 
If yes, click the "Restart" button of the migration job, and the migration job will replace the target TiDB Cloud cluster's conflicting records with the MySQL source records. -- During incremental replication (migrating ongoing changes to your cluster), if the migration job recovers from an abrupt error, it might open the safe mode for 60 seconds. During the safe mode, `INSERT` statements are migrated as `REPLACE`, `UPDATE` statements as `DELETE` and `REPLACE`, and then these transactions are migrated to the downstream cluster to make sure that all the data during the abrupt error has been migrated smoothly to the downstream cluster. In this scenario, for upstream tables without primary keys or not-null unique indexes, some data might be duplicated in the downstream cluster because the data might be inserted repeatedly to the downstream. +- During incremental replication (migrating ongoing changes to your cluster), if the migration job recovers from an abrupt error, it might open the safe mode for 60 seconds. During the safe mode, `INSERT` statements are migrated as `REPLACE`, `UPDATE` statements as `DELETE` and `REPLACE`, and then these transactions are migrated to the target TiDB Cloud cluster to make sure that all the data during the abrupt error has been migrated smoothly to the target TiDB Cloud cluster. In this scenario, for MySQL source tables without primary keys or non-null unique indexes, some data might be duplicated in the target TiDB Cloud cluster because the data might be inserted repeatedly into the target TiDB Cloud cluster. - In the following scenarios, if the migration job takes longer than 24 hours, do not purge binary logs in the source database to ensure that Data Migration can get consecutive binary logs for incremental replication: - - During existing data migration. + - During the existing data migration. - After the existing data migration is completed and when incremental data migration is started for the first time, the latency is not 0ms. ## Prerequisites -Before performing the migration, you need to check the data sources, prepare privileges for upstream and downstream databases, and set up network connections. +Before migrating, check whether your data source is supported, enable binary logging in your MySQL-compatible database, ensure network connectivity, and grant required privileges for both the source database and the target TiDB Cloud cluster database. ### Make sure your data source and version are supported Data Migration supports the following data sources and versions: -- MySQL 5.6, 5.7, and 8.0 local instances or on a public cloud provider. Note that MySQL 8.0 is still experimental on TiDB Cloud and might have incompatibility issues. 
-- Amazon Aurora (MySQL 5.6 and 5.7) -- Amazon RDS (MySQL 5.7) -- Google Cloud SQL for MySQL 5.6 and 5.7 +| Data source | Supported versions | +|:------------|:-------------------| +| Self-managed MySQL (on-premises or public cloud) | 8.0, 5.7, 5.6 | +| Amazon Aurora MySQL | 8.0, 5.7, 5.6 | +| Amazon RDS MySQL | 8.0, 5.7 | +| Azure Database for MySQL - Flexible Server | 8.0, 5.7 | +| Google Cloud SQL for MySQL | 8.0, 5.7, 5.6 | -### Grant required privileges to the upstream database +### Enable binary logs in the source MySQL-compatible database for replication -The username you use for the upstream database must have all the following privileges: +To continuously replicate incremental changes from the source MySQL-compatible database to the TiDB Cloud target cluster using DM, you need the following configurations to enable binary logs in the source database: -| Privilege | Scope | -|:----|:----| -| `SELECT` | Tables | -| `LOCK` | Tables | -| `REPLICATION SLAVE` | Global | -| `REPLICATION CLIENT` | Global | +| Configuration | Required value | Why | +|:--------------|:---------------|:----| +| `log_bin` | `ON` | Enables binary logging, which DM uses to replicate changes to TiDB | +| `binlog_format` | `ROW` | Captures all data changes accurately (other formats miss edge cases) | +| `binlog_row_image` | `FULL` | Includes all column values in events for safe conflict resolution | +| `binlog_expire_logs_seconds` | ≥ `86400` (1 day), `604800` (7 days, recommended) | Ensures DM can access consecutive logs during migration | -For example, you can use the following `GRANT` statement to grant corresponding privileges: +#### Check current values and configure the source MySQL instance + +To check the current configurations, connect to the source MySQL instance and execute the following statement: ```sql -GRANT SELECT,LOCK TABLES,REPLICATION SLAVE,REPLICATION CLIENT ON *.* TO 'your_user'@'your_IP_address_of_host' +SHOW VARIABLES WHERE Variable_name IN +('log_bin','server_id','binlog_format','binlog_row_image', +'binlog_expire_logs_seconds','expire_logs_days'); ``` -### Grant required privileges to the downstream TiDB Cloud cluster +If necessary, change the source MySQL instance configurations to match the required values. -The username you use for the downstream TiDB Cloud cluster must have the following privileges: +
+ Configure a self‑managed MySQL instance -| Privilege | Scope | -|:----|:----| -| `CREATE` | Databases, Tables | -| `SELECT` | Tables | -| `INSERT` | Tables | -| `UPDATE` | Tables | -| `DELETE` | Tables | -| `ALTER` | Tables | -| `DROP` | Databases, Tables | -| `INDEX` | Tables | +1. Open `/etc/my.cnf` and add the following: -For example, you can execute the following `GRANT` statement to grant corresponding privileges: + ``` + [mysqld] + log_bin = mysql-bin + binlog_format = ROW + binlog_row_image = FULL + binlog_expire_logs_seconds = 604800 # 7 days retention + ``` -```sql -GRANT CREATE,SELECT,INSERT,UPDATE,DELETE,ALTER,DROP,INDEX ON *.* TO 'your_user'@'your_IP_address_of_host' -``` +2. Restart the MySQL service to apply the changes: + + ``` + sudo systemctl restart mysqld + ``` + +3. Run the `SHOW VARIABLES` statement again to verify that the settings take effect. + +For detailed instructions, see [MySQL Server System Variables](https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html) and [The Binary Log](https://dev.mysql.com/doc/refman/8.0/en/binary-log.html) in MySQL documentation. + +
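+
+If you cannot restart MySQL immediately, the three dynamic settings can also be applied at runtime. The following is a minimal sketch, assuming MySQL 8.0 (where `SET PERSIST` is available); note that `log_bin` itself is not dynamic, so it still requires the configuration file change and a restart:
+
+```sql
+-- Persist the dynamic binlog settings across restarts (MySQL 8.0).
+-- log_bin cannot be changed at runtime and must stay in /etc/my.cnf.
+SET PERSIST binlog_format = 'ROW';
+SET PERSIST binlog_row_image = 'FULL';
+SET PERSIST binlog_expire_logs_seconds = 604800; -- 7 days
+```
+
+Existing connections keep their current `binlog_format`, so apply this during a quiet window or reconnect clients afterward.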
+ +
+ Configure AWS RDS or Aurora MySQL + +1. In the AWS Management Console, open the [Amazon RDS console](https://console.aws.amazon.com/rds/), click **Parameter groups** in the left navigation pane, and then create or edit a custom parameter group. +2. Set the four parameters above to the required values. +3. Attach the parameter group to your instance or cluster, and then reboot to apply the changes. +4. After the reboot, connect to the instance and run the `SHOW VARIABLES` statement to verify the configuration. + +For detailed instructions, see [Working with DB Parameter Groups](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithParamGroups.html) and [Configuring MySQL Binary Logging](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_LogAccess.MySQL.BinaryFormat.html) in AWS documentation. + +
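+
+Note that Amazon RDS and Aurora also cap binlog retention separately from `binlog_expire_logs_seconds`. As a hedged sketch, you can set that retention window with the stored procedure that RDS and Aurora provide (the `168` hours below, that is, 7 days, is an assumption chosen to match the recommendation above):
+
+```sql
+-- Run on the RDS or Aurora instance. A NULL value lets the engine purge
+-- binlogs as soon as possible, which can break incremental migration.
+CALL mysql.rds_set_configuration('binlog retention hours', 168);
+-- Verify the setting:
+CALL mysql.rds_show_configuration();
+```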
+ +
+ Configure Azure Database for MySQL - Flexible Server
+
+1. In the [Azure portal](https://portal.azure.com/), search for and select **Azure Database for MySQL servers**, click your instance name, and then click **Settings** > **Server parameters** in the left navigation pane.
+
+2. Search for each parameter and update its value.
+
+    Most changes take effect without a restart. If a restart is required, you will get a prompt from the portal.
+
+3. Run the `SHOW VARIABLES` statement to verify the configuration.
+
+For detailed instructions, see [Configure server parameters in Azure Database for MySQL - Flexible Server using the Azure portal](https://learn.microsoft.com/en-us/azure/mysql/flexible-server/how-to-configure-server-parameters-portal) in the Microsoft Azure documentation.
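+
+If you prefer scripting over the portal, the following sketch sets one of the parameters with the Azure CLI. It assumes the CLI is installed and logged in, and the resource group and server names are placeholders:
+
+```shell
+# Repeat for binlog_format and binlog_row_image as needed.
+az mysql flexible-server parameter set \
+  --resource-group <your-resource-group> \
+  --server-name <your-server-name> \
+  --name binlog_expire_logs_seconds \
+  --value 604800
+```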
+ +
+ Configure Google Cloud SQL for MySQL
+
+1. In the [Google Cloud console](https://console.cloud.google.com/project/_/sql/instances), select the project that contains your instance, click your instance name, and then click **Edit**.
+2. Ensure that binary logging is enabled. On Cloud SQL, `log_bin` is not a configurable flag; it is turned on by enabling automated backups with point-in-time recovery.
+3. Add or modify the required flags that Cloud SQL supports, such as `binlog_expire_logs_seconds` and `binlog_row_image` (availability can vary by MySQL version).
+4. Click **Save**. If a restart is required, you will get a prompt from the console.
+5. After the restart, run the `SHOW VARIABLES` statement to confirm the changes.
+
+For detailed instructions, see [Configure database flags](https://cloud.google.com/sql/docs/mysql/flags) and [Use point-in-time recovery](https://cloud.google.com/sql/docs/mysql/backup-recovery/pitr) in Google Cloud documentation.
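+
+As a scripted alternative to the console, the following gcloud sketch enables binary logging and sets the retention flag. The instance name is a placeholder, and flag availability can vary by MySQL version, so check the flags reference first:
+
+```shell
+# Enabling binary logs might restart the instance.
+gcloud sql instances patch <your-instance-name> --enable-bin-log
+
+# Caution: --database-flags replaces ALL flags currently set on the
+# instance, so include every flag you want to keep in a single call.
+gcloud sql instances patch <your-instance-name> \
+    --database-flags=binlog_expire_logs_seconds=604800
+```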
+ +### Ensure network connectivity + +Before creating a migration job, you need to plan and set up proper network connectivity between your source MySQL instance, the TiDB Cloud Data Migration (DM) service, and your target TiDB Cloud cluster. + +The available connection methods are as follows: + +| Connection method | Availability | Recommended for | +|:---------------------|:-------------|:----------------| +| Public endpoints or IP addresses | All cloud providers supported by TiDB Cloud | Quick proof-of-concept migrations, testing, or when private connectivity is unavailable | +| Private links or private endpoints | AWS and Azure only | Production workloads without exposing data to the public internet | +| VPC peering | AWS and Google Cloud only | Production workloads that need low-latency, intra-region connections and have non-overlapping VPC/VNet CIDRs | + +Choose a connection method that best fits your cloud provider, network topology, and security requirements, and then follow the setup instructions for that method. + +#### End-to-end encryption over TLS/SSL + +Regardless of the connection method, it is strongly recommended to use TLS/SSL for end-to-end encryption. While private endpoints and VPC peering secure the network path, TLS/SSL secures the data itself and helps meet compliance requirements. + +
+ Download and store the cloud provider's certificates for TLS/SSL encrypted connections + +- Amazon Aurora MySQL or Amazon RDS MySQL: [Using SSL/TLS to encrypt a connection to a DB instance or cluster](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/UsingWithRDS.SSL.html) +- Azure Database for MySQL - Flexible Server: [Connect with encrypted connections](https://learn.microsoft.com/en-us/azure/mysql/flexible-server/how-to-connect-tls-ssl) +- Google Cloud SQL for MySQL: [Manage SSL/TLS certificates](https://cloud.google.com/sql/docs/mysql/manage-ssl-instance) + +
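+
+After connecting with the downloaded CA certificate, you can confirm that the session is actually encrypted. A minimal check that works on any MySQL-compatible source:
+
+```sql
+-- A non-empty Ssl_cipher value means the current connection uses TLS.
+SHOW SESSION STATUS LIKE 'Ssl_cipher';
+-- ON means the server rejects unencrypted connections.
+SHOW VARIABLES LIKE 'require_secure_transport';
+```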
+
+#### Public endpoints or IP addresses
+
+When using public endpoints, you can verify network connectivity and access both now and later during the DM job creation process. TiDB Cloud will provide specific egress IP addresses and prompt instructions at that time.
+
+1. Identify and record the source MySQL instance's endpoint hostname (FQDN) or public IP address.
+2. Ensure you have the required permissions to modify the firewall or security group rules for your database. Refer to your cloud provider's documentation for guidance as follows:
+
+    - Amazon Aurora MySQL or Amazon RDS MySQL: [Controlling access with security groups](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Overview.RDSSecurityGroups.html)
+    - Azure Database for MySQL - Flexible Server: [Public Network Access](https://learn.microsoft.com/en-us/azure/mysql/flexible-server/concepts-networking-public)
+    - Google Cloud SQL for MySQL: [Authorized Networks](https://cloud.google.com/sql/docs/mysql/configure-ip#authorized-networks)
+
+3. Optional: Verify connectivity to your source database from a machine with public internet access using the appropriate certificate for in-transit encryption:
+
+    ```shell
+    mysql -h <your-mysql-host> -P <port> -u <username> -p --ssl-ca=<CA_certificate_path> -e "SELECT version();"
+    ```
+
+4. Later, during the Data Migration job setup, TiDB Cloud will provide an egress IP range. At that time, you need to add this IP range to your database's firewall or security group rules following the same procedure above.
+
+#### Private link or private endpoint
+
+If you use a provider-native private link or private endpoint, create a private endpoint for your source MySQL instance (RDS, Aurora, or Azure Database for MySQL).
+ Set up AWS PrivateLink and Private Endpoint for the MySQL source database
+
+AWS does not support direct PrivateLink access to RDS or Aurora. Therefore, you need to create a Network Load Balancer (NLB) and publish it as an endpoint service associated with your source MySQL instance.
+
+1. In the [Amazon EC2 console](https://console.aws.amazon.com/ec2/), create an NLB in the same subnet(s) as your RDS or Aurora writer. Configure the NLB with a TCP listener on port `3306` that forwards traffic to the database endpoint.
+
+    For detailed instructions, see [Create a Network Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/create-network-load-balancer.html) in AWS documentation.
+
+2. In the [Amazon VPC console](https://console.aws.amazon.com/vpc/), click **Endpoint Services** in the left navigation pane, and then create an endpoint service. During the setup, select the NLB created in the previous step as the backing load balancer, and enable the **Require acceptance for endpoint** option. After the endpoint service is created, copy the service name (in the `com.amazonaws.vpce-svc-xxxxxxxxxxxxxxxxx` format) for later use.
+
+    For detailed instructions, see [Create an endpoint service](https://docs.aws.amazon.com/vpc/latest/privatelink/create-endpoint-service.html) in AWS documentation.
+
+3. Optional: Test connectivity from a bastion or client inside the same VPC before starting the migration:
+
+    ```shell
+    mysql -h <your-mysql-host> -P 3306 -u <username> -p --ssl-ca=<CA_certificate_path> -e "SELECT version();"
+    ```
+
+4. Later, when configuring TiDB Cloud DM to connect via PrivateLink, you will need to return to the AWS console and approve the pending connection request from TiDB Cloud to this private endpoint.
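+
+Before relying on the endpoint service, it can help to confirm that the NLB actually forwards to the database. A sketch with the AWS CLI, assuming the target group ARN from the NLB created in step 1:
+
+```shell
+# All registered targets should report "healthy". An "unhealthy" state
+# usually means the listener or a security group does not reach port 3306.
+aws elbv2 describe-target-health --target-group-arn <your-target-group-arn>
+```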
- Set up AWS VPC Peering
+ Set up Azure PrivateLink and private endpoint for the MySQL source database
+
+Azure Database for MySQL - Flexible Server supports native private endpoints. You can either enable private access (VNet Integration) during MySQL instance creation or add a private endpoint later.
+
+To add a new private endpoint, take the following steps:
+
+1. In the [Azure portal](https://portal.azure.com/), search for and select **Azure Database for MySQL servers**, click your instance name, and then click **Settings** > **Networking** in the left navigation pane.
+2. On the **Networking** page, scroll down to the **Private endpoints** section, click **+ Create private endpoint**, and then follow the on-screen instructions to set up the private endpoint.
+
+    During the setup, select the virtual network and subnet that TiDB Cloud can access in the **Virtual Network** tab, and keep **Private DNS integration** enabled in the **DNS** tab. After the private endpoint is created and deployed, click **Go to resource**, click **Settings** > **DNS configuration** in the left navigation pane, and find the hostname to be used to connect with the instance in the **Customer Visible FQDNs** section. Typically, the hostname is in the `<your-server-name>.mysql.database.azure.com` format.
+
+    For detailed instructions, see [Create a private endpoint via Private Link Center](https://learn.microsoft.com/en-us/azure/mysql/flexible-server/how-to-networking-private-link-portal#create-a-private-endpoint-via-private-link-center) in Azure documentation.
+
+3. Optional: Test connectivity from a bastion or client inside the same VNet before starting the migration:
+
+    ```shell
+    mysql -h <your-server-FQDN> -P 3306 -u <username> -p --ssl-ca=<CA_certificate_path> -e "SELECT version();"
+    ```
+
+4. In the [Azure portal](https://portal.azure.com/), return to the overview page of your MySQL Flexible Server instance (not the private endpoint object), click **JSON View** for the **Essentials** section, and then copy the resource ID for later use. The resource ID is in the `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.DBforMySQL/flexibleServers/<server-name>` format. You will use this resource ID (not the private endpoint ID) to configure TiDB Cloud DM.
+
+5. Later, when configuring TiDB Cloud DM to connect via PrivateLink, you will need to return to the Azure portal and approve the pending connection request from TiDB Cloud to this private endpoint.
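+
+As an alternative to clicking through **JSON View**, the following Azure CLI sketch prints the same resource ID. It assumes the CLI is installed and logged in:
+
+```shell
+# Prints the server resource ID expected by TiDB Cloud DM
+# (not the private endpoint ID).
+az mysql flexible-server show \
+  --resource-group <your-resource-group> \
+  --name <your-server-name> \
+  --query id --output tsv
+```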
+ +#### VPC peering + +If you use AWS VPC peering or Google Cloud VPC network peering, see the following instructions to configure the network. + +
+ Set up AWS VPC peering If your MySQL service is in an AWS VPC, take the following steps: @@ -129,7 +274,7 @@ If your MySQL service is in an AWS VPC, take the following steps:
- Set up Google Cloud VPC Network Peering + Set up Google Cloud VPC network peering If your MySQL service is in a Google Cloud VPC, take the following steps: @@ -143,16 +288,52 @@ If your MySQL service is in a Google Cloud VPC, take the following steps:
-### Enable binary logs +### Grant required privileges for migration + +Before starting migration, you need to set up appropriate database users with the required privileges on both the source and target databases. These privileges enable TiDB Cloud DM to read data from MySQL, replicate changes, and write to your TiDB Cloud cluster securely. Because the migration involves both full data dumps for existing data and binlog replication for incremental changes, your migration user requires specific permissions beyond basic read access. + +#### Grant required privileges to the migration user in the source MySQL database + +For testing purposes, you can use an administrative user (such as `root`) in your source MySQL database. + +For production workloads, it is recommended to have a dedicated user for data dump and replication in the source MySQL database, and grant only the necessary privileges: + +| Privilege | Scope | Purpose | +|:----------|:------|:--------| +| `SELECT` | Tables | Allows reading data from all tables | +| `LOCK TABLES` | Tables | Ensures consistent snapshots during full dump | +| `REPLICATION SLAVE` | Global | Enables binlog streaming for incremental replication | +| `REPLICATION CLIENT` | Global | Provides access to binlog position and server status | + +For example, you can use the following `GRANT` statement in your source MySQL instance to grant corresponding privileges: + +```sql +GRANT SELECT, LOCK TABLES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'dm_source_user'@'%'; +``` + +#### Grant required privileges in the target TiDB Cloud cluster -To perform incremental data migration, make sure the following requirements are met: +For testing purposes, you can use the `root` account of your TiDB Cloud cluster. -- Binary logs are enabled for the upstream database. -- The binary logs are retained for at least 24 hours. -- The binlog format for the upstream database is set to `ROW`. If not, update the format to `ROW` as follows to avoid the [format error](/tidb-cloud/tidb-cloud-dm-precheck-and-troubleshooting.md#error-message-check-whether-mysql-binlog_format-is-row): +For production workloads, it is recommended to have a dedicated user for replication in the target TiDB Cloud cluster and grant only the necessary privileges: - - MySQL: execute the `SET GLOBAL binlog_format=ROW;` statement. If you want to persist this change across reboots, you can execute the `SET PERSIST binlog_format=ROW;` statement. - - Amazon Aurora MySQL or RDS for MySQL: follow the instructions in [AWS documentation](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_WorkingWithDBInstanceParamGroups.html) to create a new DB parameter group. Set the `binlog_format=row` parameter in the new DB parameter group, modify the instance to use the new DB parameter group, and then restart the instance to take effect. 
+| Privilege | Scope | Purpose |
+|:----------|:------|:--------|
+| `CREATE` | Databases, Tables | Creates schema objects in the target |
+| `SELECT` | Tables | Verifies data during migration |
+| `INSERT` | Tables | Writes migrated data |
+| `UPDATE` | Tables | Modifies existing rows during incremental replication |
+| `DELETE` | Tables | Removes rows during replication or updates |
+| `ALTER` | Tables | Modifies table definitions when schema changes |
+| `DROP` | Databases, Tables | Removes objects during schema sync |
+| `INDEX` | Tables | Creates and modifies indexes |
+| `CREATE VIEW` | Views | Creates views used during migration |
+
+For example, you can execute the following `GRANT` statement in your target TiDB Cloud cluster to grant corresponding privileges:
+
+```sql
+GRANT CREATE, SELECT, INSERT, UPDATE, DELETE, ALTER, DROP, INDEX, CREATE VIEW ON *.* TO 'dm_target_user'@'%';
+```

## Step 1: Go to the **Data Migration** page

@@ -166,38 +347,66 @@ To perform incremental data migration, make sure the following requirements are met:

3. On the **Data Migration** page, click **Create Migration Job** in the upper-right corner. The **Create Migration Job** page is displayed.

-## Step 2: Configure the source and target connection
+## Step 2: Configure the source and target connections

-On the **Create Migration Job** page, configure the source and target connection.
+On the **Create Migration Job** page, configure the source and target connections.

1. Enter a job name, which must start with a letter and must be less than 60 characters. Letters (A-Z, a-z), numbers (0-9), underscores (_), and hyphens (-) are acceptable.

2. Fill in the source connection profile.

-    - **Data source**: the data source type.
-    - **Region**: the region of the data source, which is required for cloud databases only.
-    - **Connectivity method**: the connection method for the data source. Currently, you can choose public IP, VPC Peering, or Private Link according to your connection method.
-    - **Hostname or IP address** (for public IP and VPC Peering): the hostname or IP address of the data source.
-    - **Service Name** (for Private Link): the endpoint service name.
-    - **Port**: the port of the data source.
-    - **Username**: the username of the data source.
-    - **Password**: the password of the username.
-    - **SSL/TLS**: if you enable SSL/TLS, you need to upload the certificates of the data source, including any of the following:
-        - only the CA certificate
-        - the client certificate and client key
-        - the CA certificate, client certificate and client key
+    - **Data source**: the data source type.
+    - **Connectivity method**: select a connection method for your data source based on your security requirements and cloud provider:
+        - **Public IP**: available for all cloud providers (recommended for testing and proof-of-concept migrations).
+        - **Private Link**: available for AWS and Azure only (recommended for production workloads requiring private connectivity).
+        - **VPC Peering**: available for AWS and Google Cloud only (recommended for production workloads needing low-latency, intra-region connections with non-overlapping VPC/VNet CIDRs).
+    - Based on the selected **Connectivity method**, do the following:
+        - If **Public IP** or **VPC Peering** is selected, fill in the **Hostname or IP address** field with the hostname or IP address of the data source.
+        - If **Private Link** is selected, fill in the following information:
+            - **Endpoint Service Name** (available if **Data source** is from AWS): enter the VPC endpoint service name (format: `com.amazonaws.vpce-svc-xxxxxxxxxxxxxxxxx`) that you created for your RDS or Aurora instance.
+            - **Private Endpoint Resource ID** (available if **Data source** is from Azure): enter the resource ID of your MySQL Flexible Server instance (format: `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.DBforMySQL/flexibleServers/<server-name>`).
+    - **Port**: the port of the data source.
+    - **User Name**: the username of the data source.
+    - **Password**: the password of the username.
+    - **SSL/TLS**: enable SSL/TLS for end-to-end data encryption (highly recommended for all migration jobs). Upload the appropriate certificates based on your MySQL server's SSL configuration.
+
+ SSL/TLS configuration options
+
+    - **Option 1: Server authentication only**
+        - If your MySQL server is configured for server authentication only, upload only the **CA Certificate**.
+        - In this option, the MySQL server presents its certificate to prove its identity, and TiDB Cloud verifies the server certificate against the CA.
+        - The CA certificate protects against man-in-the-middle attacks and is required if the MySQL server is started with `require_secure_transport = ON`.
+    - **Option 2: Client certificate authentication**
+        - If your MySQL server is configured for client certificate authentication, upload the **Client Certificate** and **Client private key**.
+        - In this option, TiDB Cloud presents its certificate to the MySQL server for authentication, but TiDB Cloud does not verify the MySQL server's certificate.
+        - This option is typically used when the MySQL server is configured with options such as `REQUIRE SUBJECT '...'` or `REQUIRE ISSUER '...'` without `REQUIRE X509`, allowing it to check specific attributes of the client certificate without full CA validation of that client certificate.
+        - This option is often used when the MySQL server accepts client certificates in self-signed or custom PKI environments. Note that this configuration is vulnerable to man-in-the-middle attacks and is not recommended for production environments unless other network-level controls guarantee server authenticity.
+    - **Option 3: Mutual TLS (mTLS) - highest security**
+        - If your MySQL server is configured for mutual TLS (mTLS) authentication, upload the **CA Certificate**, **Client Certificate**, and **Client private key**.
+        - In this option, the MySQL server verifies TiDB Cloud's identity using the client certificate, and TiDB Cloud verifies the MySQL server's identity using the CA certificate.
+        - This option is required when the MySQL server has `REQUIRE X509` configured for the migration user. Note that `REQUIRE SSL` alone only enforces an encrypted connection and does not require a client certificate.
+    - You can get the certificates from the following sources:
+        - Download from your cloud provider (see [TLS certificate links](#end-to-end-encryption-over-tlsssl)).
+        - Use your organization's internal CA certificates.
+        - Self-signed certificates (for development/testing only).
+
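+
+To decide between these options, you can inspect how the source server and the migration user are configured. A quick sketch, run as an administrative user on the source (the `dm_source_user` name follows the earlier example and is an assumption):
+
+```sql
+-- ON means unencrypted connections are rejected, so at least a CA
+-- certificate (Option 1 or Option 3) is needed.
+SHOW VARIABLES LIKE 'require_secure_transport';
+-- An ssl_type of X509 or SPECIFIED means this user must present a client
+-- certificate (Option 2 or Option 3).
+SELECT user, host, ssl_type FROM mysql.user WHERE user = 'dm_source_user';
+```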
3. Fill in the target connection profile.

-    - **Username**: enter the username of the target cluster in TiDB Cloud.
-    - **Password**: enter the password of the TiDB Cloud username.
+    - **User Name**: enter the username of the target cluster in TiDB Cloud.
+    - **Password**: enter the password of the TiDB Cloud username.

4. Click **Validate Connection and Next** to validate the information you have entered.

5. Take action according to the message you see:

-    - If you use Public IP or VPC Peering, you need to add the Data Migration service's IP addresses to the IP Access List of your source database and firewall (if any).
-    - If you use AWS Private Link, you are prompted to accept the endpoint request. Go to the [AWS VPC console](https://us-west-2.console.aws.amazon.com/vpc/home), and click **Endpoint services** to accept the endpoint request.
+    - If you use **Public IP** or **VPC Peering** as the connectivity method, you need to add the Data Migration service's IP addresses to the IP Access List of your source database and firewall (if any).
+    - If you use **Private Link** as the connectivity method, you are prompted to accept the endpoint request:
+        - For AWS: go to the [AWS VPC console](https://us-west-2.console.aws.amazon.com/vpc/home), click **Endpoint services**, and accept the endpoint request from TiDB Cloud.
+        - For Azure: go to the [Azure portal](https://portal.azure.com), search for your MySQL Flexible Server by name, click **Settings** > **Networking** in the left navigation pane, locate the **Private endpoint** section on the right side, and then approve the pending connection request from TiDB Cloud.

## Step 3: Choose migration job type

@@ -209,20 +418,18 @@ To migrate data to TiDB Cloud once and for all, choose both **Existing data migr

You can use **physical mode** or **logical mode** to migrate **existing data** and **incremental data**.

-- The default mode is **logical mode**. This mode exports data from upstream databases as SQL statements, and then executes them on TiDB. In this mode, the target tables before migration can be either empty or non-empty. But the performance is slower than physical mode.
-
-- For large datasets, it is recommended to use **physical mode**. This mode exports data from upstream databases and encodes it as KV pairs, writing directly to TiKV to achieve faster performance. This mode requires the target tables to be empty before migration. For the specification of 16 RCUs (Replication Capacity Units), the performance is about 2.5 times faster than logical mode. The performance of other specifications can increase by 20% to 50% compared with logical mode. Note that the performance data is for reference only and might vary in different scenarios.
+- The default mode is **logical mode**. This mode exports data from MySQL source databases as SQL statements and then executes them on TiDB. In this mode, the target tables before migration can be either empty or non-empty. But the performance is slower than physical mode.

-Physical mode is available for TiDB clusters deployed on AWS and Google Cloud.

+- For large datasets, it is recommended to use **physical mode**. This mode exports data from MySQL source databases and encodes it as KV pairs, writing directly to TiKV to achieve faster performance. This mode requires the target tables to be empty before migration. For the specification of 16 RCUs (Replication Capacity Units), the performance is about 2.5 times faster than logical mode.
The performance of other specifications can increase by 20% to 50% compared with logical mode. Note that the performance data is for reference only and might vary in different scenarios. > **Note:** > > - When you use physical mode, you cannot create a second migration job or import task for the TiDB cluster before the existing data migration is completed. > - When you use physical mode and the migration job has started, do **NOT** enable PITR (Point-in-time Recovery) or have any changefeed on the cluster. Otherwise, the migration job will be stuck. If you need to enable PITR or have any changefeed, use logical mode instead to migrate data. -Physical mode exports the upstream data as fast as possible, so [different specifications](/tidb-cloud/tidb-cloud-billing-dm.md#specifications-for-data-migration) have different performance impacts on QPS and TPS of the upstream database during data export. The following table shows the performance regression of each specification. +Physical mode exports the MySQL source data as fast as possible, so [different specifications](/tidb-cloud/tidb-cloud-billing-dm.md#specifications-for-data-migration) have different performance impacts on QPS and TPS of the MySQL source database during data export. The following table shows the performance regression of each specification. -| Migration specification | Maximum export speed | Performance regression of the upstream database | +| Migration specification | Maximum export speed | Performance regression of the MySQL source database | |---------|-------------|--------| | 2 RCUs | 80.84 MiB/s | 15.6% | | 4 RCUs | 214.2 MiB/s | 20.0% | @@ -246,22 +453,8 @@ For detailed instructions about incremental data migration, see [Migrate Only In 1. On the **Choose Objects to Migrate** page, select the objects to be migrated. You can click **All** to select all objects, or click **Customize** and then click the checkbox next to the object name to select the object. - If you click **All**, the migration job will migrate the existing data from the whole source database instance to TiDB Cloud and migrate ongoing changes after the full migration. Note that it happens only if you have selected the **Existing data migration** and **Incremental data migration** checkboxes in the previous step. - - - - If you click **Customize** and select some databases, the migration job will migrate the existing data and migrate ongoing changes of the selected databases to TiDB Cloud. Note that it happens only if you have selected the **Existing data migration** and **Incremental data migration** checkboxes in the previous step. - - - - - If you click **Customize** and select some tables under a dataset name, the migration job only will migrate the existing data and migrate ongoing changes of the selected tables. Tables created afterwards in the same database will not be migrated. - - - - + - If you click **Customize** and select some tables under a dataset name, the migration job will only migrate the existing data and migrate ongoing changes of the selected tables. Tables created afterwards in the same database will not be migrated. 2. Click **Next**. @@ -299,7 +492,7 @@ If you encounter any problems during the migration, see [Migration errors and so TiDB Cloud supports scaling up or down a migration job specification to meet your performance and cost requirements in different scenarios. -Different migration specifications have different performances. Your performance requirements might vary at different stages as well. 
For example, during the existing data migration, you want the performance to be as fast as possible, so you choose a migration job with a large specification, such as 8 RCU. Once the existing data migration is completed, the incremental migration does not require such a high performance, so you can scale down the job specification, for example, from 8 RCU to 2 RUC, to save cost. +Different migration specifications have different performances. Your performance requirements might vary at different stages as well. For example, during the existing data migration, you want the performance to be as fast as possible, so you choose a migration job with a large specification, such as 8 RCU. Once the existing data migration is completed, the incremental migration does not require such a high performance, so you can scale down the job specification, for example, from 8 RCU to 2 RCU, to save cost. When scaling a migration job specification, note the following: @@ -311,7 +504,7 @@ When scaling a migration job specification, note the following: - You can only scale a migration job specification when the job is in the **Running** or **Paused** status. - TiDB Cloud does not support scaling a migration job specification during the existing data export stage. - Scaling a migration job specification will restart the job. If a source table of the job does not have a primary key, duplicate data might be inserted. -- During scaling, do not purge the binary log of the source database or increase `expire_logs_days` of the upstream database temporarily. Otherwise, the job might fail because it cannot get the continuous binary log position. +- During scaling, do not purge the binary log of the source database or increase `expire_logs_days` of the MySQL source database temporarily. Otherwise, the job might fail because it cannot get the continuous binary log position. ### Scaling procedure