---
title: Replicate Data to MySQL-compatible Databases
summary: Learn how to replicate data to TiDB or MySQL using TiCDC.
---

# Replicate Data to MySQL-compatible Databases

This document describes how to replicate incremental data to the downstream TiDB database or other MySQL-compatible databases using TiCDC. It also introduces how to use the eventually consistent replication feature in disaster scenarios.

## Create a replication task

Create a replication task by running the following command:

```shell
cdc cli changefeed create \
    --server=http://10.0.10.25:8300 \
    --sink-uri="mysql://root:[email protected]:3306/" \
    --changefeed-id="simple-replication-task"
```

```
Create changefeed successfully!
ID: simple-replication-task
Info: {"sink-uri":"mysql://root:[email protected]:3306/","opts":{},"create-time":"2023-11-28T22:04:08.103600025+08:00","start-ts":415241823337054209,"target-ts":0,"admin-job-type":0,"sort-engine":"unified","sort-dir":".","config":{"case-sensitive":false,"filter":{"rules":["*.*"],"ignore-txn-start-ts":null,"ddl-allow-list":null},"mounter":{"worker-num":16},"sink":{"dispatchers":null},"scheduler":{"type":"table-number","polling-time":-1}},"state":"normal","history":null,"error":null}
```
- `--server`: The address of any TiCDC server in the TiCDC cluster.
- `--changefeed-id`: The ID of the replication task. The format must match the `^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$` regular expression. If this ID is not specified, TiCDC automatically generates a UUID (the version 4 format) as the ID.
- `--sink-uri`: The downstream address of the replication task. For details, see [Configure sink URI for MySQL or TiDB](#configure-sink-uri-for-mysql-or-tidb).
- `--start-ts`: Specifies the starting TSO of the changefeed. From this TSO, the TiCDC cluster starts pulling data. The default value is the current time.
- `--target-ts`: Specifies the ending TSO of the changefeed. The TiCDC cluster stops pulling data when it reaches this TSO. The default value is empty, which means that TiCDC does not automatically stop pulling data.
- `--config`: Specifies the changefeed configuration file. For details, see TiCDC Changefeed Configuration Parameters.

> **Note:**
>
> - TiCDC only replicates incremental data. To initialize full data, use Dumpling/TiDB Lightning or BR.
> - After the full data is initialized, you need to specify `start-ts` as the TSO at which the upstream backup was performed: for example, the `Pos` value in the metadata file in the Dumpling output directory, or the `backupTS` value in the log output after BR completes the backup.
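For illustration, the following sketch shows one way to extract the TSO from a Dumpling metadata file and use it as `start-ts`. The directory layout and metadata content below are fabricated for the example; in practice, Dumpling writes the `metadata` file into its output directory.

```shell
# Simulate a Dumpling output directory with a metadata file.
# In a real workflow, Dumpling creates this file during the backup.
mkdir -p ./dumpling-output
cat > ./dumpling-output/metadata <<'EOF'
Started dump at: 2023-11-28 21:55:01
SHOW MASTER STATUS:
    Log: tidb-binlog
    Pos: 415241823337054209
Finished dump at: 2023-11-28 21:58:32
EOF

# Extract the Pos value, which is the TSO to use as --start-ts.
START_TS=$(grep -oE 'Pos: [0-9]+' ./dumpling-output/metadata | awk '{print $2}')
echo "$START_TS"
```

You can then pass the extracted value when creating the changefeed, for example, `cdc cli changefeed create ... --start-ts=${START_TS}`.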

## Configure sink URI for MySQL or TiDB

Sink URI is used to specify the connection information of the TiCDC target system. The format is as follows:

```
[scheme]://[userinfo@][host]:[port][/path]?[query_parameters]
```

> **Note:**
>
> `/path` is not used for the MySQL sink.

Sample configuration for MySQL:

```shell
--sink-uri="mysql://root:[email protected]:3306"
```

The following are descriptions of sink URI parameters and parameter values that can be configured for MySQL or TiDB:

| Parameter/Parameter value | Description |
| :------------------------ | :---------- |
| `root` | The username of the downstream database. To replicate data to TiDB or other MySQL-compatible databases, make sure that the downstream database user has the required permissions. |
| `123456` | The password of the downstream database (can be encoded using Base64). |
| `127.0.0.1` | The IP address of the downstream database. |
| `3306` | The port of the downstream database. |
| `worker-count` | The number of SQL statements that can be concurrently executed to the downstream (optional, `16` by default). |
| `cache-prep-stmts` | Controls whether to use prepared statements when executing SQL in the downstream and enable the prepared statement cache on the client side (optional, `true` by default). |
| `max-txn-row` | The size of a transaction batch that can be executed to the downstream (optional, `256` by default). |
| `ssl-ca` | The path of the CA certificate file needed to connect to the downstream MySQL instance (optional). |
| `ssl-cert` | The path of the certificate file needed to connect to the downstream MySQL instance (optional). |
| `ssl-key` | The path of the certificate key file needed to connect to the downstream MySQL instance (optional). |
| `time-zone` | The time zone used when connecting to the downstream MySQL instance, effective since v4.0.8. This is an optional parameter. If this parameter is not specified, the time zone of TiCDC service processes is used. If this parameter is set to an empty value, such as `time-zone=""`, no time zone is specified when TiCDC connects to the downstream MySQL instance and the default time zone of the downstream is used. |
| `transaction-atomicity` | The atomicity level of a transaction. This is an optional parameter, with the default value of `none`. When the value is `table`, TiCDC ensures the atomicity of a single-table transaction. When the value is `none`, TiCDC can split a single-table transaction. |
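For example, several of the optional parameters above can be combined as query parameters in the sink URI. The host, credentials, and parameter values below are illustrative:

```shell
--sink-uri="mysql://root:[email protected]:3306/?worker-count=32&max-txn-row=512&transaction-atomicity=table"
```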

To encode the database password in the sink URI using Base64, run the following command:

```shell
echo -n '123456' | base64   # '123456' is the password to be encoded.
```

The encoded password is `MTIzNDU2`:

```
MTIzNDU2
```
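To check the result, you can decode it back (assuming GNU coreutils `base64`):

```shell
# Decoding the Base64 string recovers the original password.
echo -n 'MTIzNDU2' | base64 -d   # prints 123456
```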

> **Note:**
>
> When the sink URI contains special characters such as `! * ' ( ) ; : @ & = + $ , / ? % # [ ]`, you need to percent-encode these special characters, for example, by using a URI encoder.
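As a sketch, the following Bash helper percent-encodes every character outside the unreserved URI character set, so a password containing special characters can be embedded in the sink URI. The function name and sample password are illustrative, not part of the TiCDC toolchain:

```shell
#!/usr/bin/env bash
# Percent-encode every character that is not an unreserved URI character
# (letters, digits, '-', '.', '_', '~'), as defined by RFC 3986.
urlencode() {
    local s="$1" out="" c hex i
    for (( i = 0; i < ${#s}; i++ )); do
        c="${s:i:1}"
        case "$c" in
            [a-zA-Z0-9._~-]) out+="$c" ;;
            *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;
        esac
    done
    printf '%s\n' "$out"
}

urlencode 'p@ss!word'   # prints p%40ss%21word
```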

## Permissions required for the downstream database user

To replicate data to TiDB or other MySQL-compatible databases, the downstream database user needs the following permissions:

- `Select`
- `Index`
- `Insert`
- `Update`
- `Delete`
- `Create`
- `Drop`
- `Alter`
- `Create View`

To replicate `RECOVER TABLE` to the downstream TiDB, the downstream database user also needs the `Super` permission.

If the downstream TiDB cluster has read-only mode enabled, the downstream database user also needs the `RESTRICTED_REPLICA_WRITER_ADMIN` permission.
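As a minimal sketch, the permissions above could be granted as follows. The user name, host, and password are placeholders for this example, not values required by TiCDC:

```shell
# Create a dedicated downstream user for TiCDC and grant the privileges
# listed above. Run this against the downstream MySQL-compatible database.
mysql -h 127.0.0.1 -P 3306 -u root -p -e "
CREATE USER IF NOT EXISTS 'ticdc_user'@'%' IDENTIFIED BY '123456';
GRANT SELECT, INDEX, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER, CREATE VIEW
    ON *.* TO 'ticdc_user'@'%';
"
```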

## Eventually consistent replication in disaster scenarios

Starting from v5.3.0, TiCDC supports backing up incremental data from an upstream TiDB cluster to object storage or an NFS of the downstream cluster; this feature became GA in v6.1.1. When the upstream cluster encounters a disaster and becomes unavailable, TiCDC can restore the downstream data to a recent, eventually consistent state. This is the eventually consistent replication capability provided by TiCDC. With this capability, you can switch applications to the downstream cluster quickly, avoiding long downtime and improving service continuity.

Currently, TiCDC can replicate incremental data from a TiDB cluster to another TiDB cluster or a MySQL-compatible database system (including Aurora, MySQL, and MariaDB). If the upstream cluster crashes, TiCDC can restore data in the downstream cluster within 5 minutes, provided that TiCDC was replicating data normally before the crash and the replication lag was small. It allows at most 10 seconds of data loss, that is, RTO <= 5 min and P95 RPO <= 10s.

TiCDC replication lag increases in the following scenarios:

- The TPS increases significantly in a short time.
- Large or long transactions occur in the upstream.
- The TiKV or TiCDC cluster in the upstream is reloaded or upgraded.
- Time-consuming DDL statements, such as `add index`, are executed in the upstream.
- The PD is configured with aggressive scheduling strategies, resulting in frequent transfer of Region leaders, or frequent Region merge or Region split.

> **Note:**
>
> Starting from v6.1.1, the eventually consistent replication feature of TiCDC supports Amazon S3-compatible object storage. Starting from v6.1.4, this feature supports GCS- and Azure-compatible object storage.

### Prerequisites

- Prepare a highly available object storage or NFS for storing TiCDC's real-time incremental data backup files. These files can be accessed in case of a disaster in the upstream.
- Enable this feature for changefeeds that need eventual consistency in disaster scenarios. To enable it, add the following configuration to the changefeed configuration file:

    ```toml
    [consistent]
    # Consistency level. Options include:
    # - none: the default value. In a non-disaster scenario, eventual consistency is guaranteed only if finished-ts is specified.
    # - eventual: uses redo log to guarantee eventual consistency in case of disasters in the primary cluster.
    level = "eventual"

    # Individual redo log file size, in MiB. By default, it is 64. It is recommended to be no more than 128.
    max-log-size = 64

    # The interval for flushing or uploading redo logs to Amazon S3, in milliseconds. It is recommended that this configuration be equal to or greater than 2000.
    flush-interval = 2000

    # The path under which redo log backups are stored. The scheme can be nfs (NFS directory), or Amazon S3, GCS, or Azure (uploaded to object storage).
    storage = "$SCHEME://logbucket/test-changefeed?endpoint=http://$ENDPOINT/"
    ```

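Assuming the configuration above is saved as `changefeed.toml` (the filename and addresses here are illustrative), you can then create a changefeed with this feature enabled by passing `--config`:

```shell
cdc cli changefeed create \
    --server=http://10.0.10.25:8300 \
    --sink-uri="mysql://root:[email protected]:3306/" \
    --changefeed-id="eventual-consistency-task" \
    --config=changefeed.toml
```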
### Disaster recovery

When a disaster happens in the primary cluster, you need to recover data manually in the secondary cluster by running the `cdc redo` command. The recovery process is as follows:

1. Ensure that all the TiCDC processes have exited. This is to prevent the primary cluster from resuming service during data recovery and prevent TiCDC from restarting data synchronization.
2. Use the cdc binary for data recovery. Run the following command:

    ```shell
    cdc redo apply --tmp-dir="/tmp/cdc/redo/apply" \
        --storage="s3://logbucket/test-changefeed?endpoint=http://10.0.10.25:24927/" \
        --sink-uri="mysql://normal:[email protected]:3306/"
    ```

In this command:

- `--tmp-dir`: Specifies the temporary directory for downloading TiCDC incremental data backup files.
- `--storage`: Specifies the address where the TiCDC incremental data backup files are stored, either a URI of object storage or an NFS directory.
- `--sink-uri`: Specifies the secondary cluster address to restore the data to. The scheme can only be `mysql`.