diff --git a/TOC-tidb-cloud.md b/TOC-tidb-cloud.md
index d594708a6efc8..2647cabac7914 100644
--- a/TOC-tidb-cloud.md
+++ b/TOC-tidb-cloud.md
@@ -289,7 +289,7 @@
      - [To Cloud Storage](/tidb-cloud/changefeed-sink-to-cloud-storage.md)
    - Reference
      - [Setup Self Hosted Kafka Private Link Service in AWS](/tidb-cloud/setup-self-hosted-kafka-pls.md)
-      - [Setup Self Hosted Kafka Private Service Connect in GCP](/tidb-cloud/setup-self-hosted-kafka-psc.md)
+      - [Setup Self Hosted Kafka Private Service Connect in Google Cloud](/tidb-cloud/setup-self-hosted-kafka-psc.md)
    - Disaster Recovery
      - [Recovery Group Overview](/tidb-cloud/recovery-group-overview.md)
      - [Get Started](/tidb-cloud/recovery-group-get-started.md)
diff --git a/media/tidb-cloud/changefeed/connect-to-aws-self-hosted-kafka-privatelink-service.png b/media/tidb-cloud/changefeed/connect-to-aws-self-hosted-kafka-privatelink-service.png
new file mode 100644
index 0000000000000..d9e0ab963ede0
Binary files /dev/null and b/media/tidb-cloud/changefeed/connect-to-aws-self-hosted-kafka-privatelink-service.png differ
diff --git a/tidb-cloud/changefeed-sink-to-apache-kafka.md b/tidb-cloud/changefeed-sink-to-apache-kafka.md
index ffa986d74a3dc..5bb22589addbc 100644
--- a/tidb-cloud/changefeed-sink-to-apache-kafka.md
+++ b/tidb-cloud/changefeed-sink-to-apache-kafka.md
@@ -18,6 +18,11 @@ This document describes how to create a changefeed to stream data from TiDB Clou
- Currently, TiDB Cloud does not support uploading self-signed TLS certificates to connect to Kafka brokers.
- Because TiDB Cloud uses TiCDC to establish changefeeds, it has the same [restrictions as TiCDC](https://docs.pingcap.com/tidb/stable/ticdc-overview#unsupported-scenarios).
- If the table to be replicated does not have a primary key or a non-null unique index, the absence of a unique constraint during replication could result in duplicated data being inserted downstream in some retry scenarios.
+- If you select **Private Link** or **Private Service Connect** as the network connectivity method, make sure that your TiDB cluster version meets the following requirements:
+    - For v6.5.x, the version must be v6.5.9 or later.
+    - For v7.1.x, the version must be v7.1.4 or later.
+    - For v7.5.x, the version must be v7.5.1 or later.
+    - All versions of v8.1.x and later are supported.

## Prerequisites

@@ -36,11 +41,11 @@ Make sure that your TiDB cluster can connect to the Apache Kafka service. There
If you want a quick trial, choose **Public IP**. If you want a cost-effective option, choose **VPC Peering**, at the cost of handling potential VPC CIDR conflicts and security considerations. If you want to avoid VPC CIDR conflicts and meet security compliance requirements, **Private Connect** is the choice, but it introduces an extra [Private Data Link Cost](/tidb-cloud/tidb-cloud-billing-ticdc-rcu.md#private-data-link-cost).

#### Private Connect

-Private Connect leverages Private Link or Private Service Connect technologies which provided by cloud vendors, that allow the resources in your VPC to connect to services in other VPCs using private IP addresses, as if those services were hosted directly in your VPC.
+Private Connect leverages the **Private Link** or **Private Service Connect** technologies provided by cloud vendors, which allow resources in your VPC to connect to services in other VPCs using private IP addresses, as if those services were hosted directly in your VPC.

Currently, Private Connect only supports connections to self-hosted Kafka.
1. If your Apache Kafka service is deployed or will be deployed in AWS, follow [Setup Self Hosted Kafka Private Link Service in AWS](/tidb-cloud/setup-self-hosted-kafka-pls.md) to make sure that the network connection is set up properly.
-2. If your Apache Kafka service already or will be setup in GCP, please follow [Setup Self Hosted Kafka Private Service Connect in GCP](/tidb-cloud/setup-self-hosted-kafka-psc.md) to make sure the network connection is set up properly.
+2. If your Apache Kafka service is deployed or will be deployed in Google Cloud, follow [Setup Self Hosted Kafka Private Service Connect in Google Cloud](/tidb-cloud/setup-self-hosted-kafka-psc.md) to make sure that the network connection is set up properly.

#### VPC Peering

@@ -83,21 +88,21 @@ For example, if your Kafka cluster is in Confluent Cloud, you can see [Resources

## Step 2. Configure the changefeed target

-TODO:
1. For **Kafka Provider**, currently only the **Self-hosted Kafka** option is available. More providers will be supported later.

    > **Note:**
    >
    > Currently, TiDB Cloud treats all Apache Kafka services as self-hosted, because it does not provide provider-specific integrations for Kafka providers such as Amazon MSK or Confluent. This does not mean that TiDB Cloud cannot connect to Amazon MSK or Confluent Kafka: if the Kafka provider offers standard network connection methods, such as VPC Peering, Public IP, Private Link, or Private Service Connect, TiDB Cloud can connect to it. Note that connecting to Amazon MSK through its multi-VPC connectivity, which is powered by Private Link technology, is not supported yet, because it is not a standard Private Link. It might be supported later.

2. Select a **Connectivity Method** according to your Apache Kafka service setup.
    1. If you select **VPC Peering** or **Public IP**, fill in your Kafka broker endpoints. You can use commas `,` to separate multiple endpoints.
    2. If you select **Private Link**:
-        1. Make sure you select the same **Kafka Type**, **Suggested Kafka Endpoint Service AZ** and fill the same unique ID in **Kafka Advertised Listener Pattern** when you [Setup Self Hosted Kafka Private Link Service in AWS](/tidb-cloud/setup-self-hosted-kafka-pls.md) in **Network** section.
-        2. Double-check the **Kafka Advertised Listener Pattern** by clicking the button **Check usage and generate**, which will show message to help you validate the unique ID.
-        3. Fill the **Endpoint Service Name** which is configured in [Setup Self Hosted Kafka Private Link Service in AWS](/tidb-cloud/setup-self-hosted-kafka-pls.md)
-        4. Fill the **Boostrap Ports**, suggest at least one port for one AZ. You can use commas `,` to separate multiple ports.
+        1. Authorize the AWS account of TiDB Cloud to create an endpoint for your endpoint service. You can find the AWS account of TiDB Cloud in the tip on the web page.
+        2. Make sure that you select the same **Kafka Type** and **Suggested Kafka Endpoint Service AZ**, and fill in the same unique ID in **Kafka Advertised Listener Pattern**, as the ones you use in the **Network** section of [Setup Self Hosted Kafka Private Link Service in AWS](/tidb-cloud/setup-self-hosted-kafka-pls.md).
+        3. Double-check the **Kafka Advertised Listener Pattern** by clicking **Check usage and generate**, which shows a message to help you validate the unique ID.
+        4. Fill in the **Endpoint Service Name** that is configured in [Setup Self Hosted Kafka Private Link Service in AWS](/tidb-cloud/setup-self-hosted-kafka-pls.md).
+        5. Fill in the **Bootstrap Ports**. It is recommended to provide at least one port for each AZ. You can use commas `,` to separate multiple ports.
        3. If you select **Private Service Connect**:
-            1. Make sure you fill the same unique ID in **Kafka Advertised Listener Pattern** when you [Setup Self Hosted Kafka Private Service Connect in GCP](/tidb-cloud/setup-self-hosted-kafka-psc.md) in **Network** section.
+            1. Make sure that you fill in the same unique ID in **Kafka Advertised Listener Pattern** as the one you use in the **Network** section of [Setup Self Hosted Kafka Private Service Connect in Google Cloud](/tidb-cloud/setup-self-hosted-kafka-psc.md).
            2. Double-check the **Kafka Advertised Listener Pattern** by clicking **Check usage and generate**, which shows a message to help you validate the unique ID.
-            3. Fill the **Service Attachment** which is configured in [Setup Self Hosted Kafka Private Service Connect in GCP](/tidb-cloud/setup-self-hosted-kafka-psc.md)
+            3. Fill in the **Service Attachment** that is configured in [Setup Self Hosted Kafka Private Service Connect in Google Cloud](/tidb-cloud/setup-self-hosted-kafka-psc.md).
            4. Fill in the **Bootstrap Ports**. It is recommended to provide more than one port. You can use commas `,` to separate multiple ports.
2. Select an **Authentication** option according to your Kafka authentication configuration.
    - If your Kafka does not require authentication, keep the default option **Disable**.
@@ -106,8 +111,12 @@ TODO:
3. Select your **Kafka Version**. If you do not know which one to use, use Kafka V2.
4. Select a desired **Compression** type for the data in this changefeed.
5. Enable the **TLS Encryption** option if your Kafka has enabled TLS encryption and you want to use TLS encryption for the Kafka connection.
-6. Click **Next** to check the configurations you set and go to the next page.
-
+6. Click **Validate Connection and Next** to test the network connection. If the test succeeds, you will be directed to the next page.
+> **Note:**
+> If you select **Private Link** or **Private Service Connect** as the network connectivity method, there are extra steps compared with **Public IP** and **VPC Peering**:
+> 1. After you click the button, TiDB Cloud starts creating the endpoint for **Private Link** or **Private Service Connect**, which might take several minutes.
+> 2. After the endpoint is created, log in to your cloud vendor console and accept the connection request.
+> 3. Go back to the TiDB Cloud console and confirm that you have accepted the connection request. TiDB Cloud then validates the connection and navigates you to the next page.

## Step 3. Set the changefeed

1. Customize **Table Filter** to filter the tables that you want to replicate. For the rule syntax, refer to [table filter rules](/table-filter.md).
diff --git a/tidb-cloud/setup-self-hosted-kafka-pls.md b/tidb-cloud/setup-self-hosted-kafka-pls.md
new file mode 100644
index 0000000000000..a38558aa6f41a
--- /dev/null
+++ b/tidb-cloud/setup-self-hosted-kafka-pls.md
@@ -0,0 +1,528 @@
+---
+title: Setup Self-hosted Kafka Private Link Service in AWS
+summary: This document explains how to set up Private Link service for self-hosted Kafka in AWS and how to make it work with TiDB Cloud.
+---
+
+# Setup Self-hosted Kafka Private Link Service in AWS
+
+This document explains how to set up Private Link service for self-hosted Kafka in AWS, and how to make it work with TiDB Cloud.
+
+The main idea is as follows:
+
+1. The TiDB Cloud VPC connects to the Kafka VPC through a limited number of private endpoints.
+2. Kafka clients need to communicate directly with all Kafka brokers.
+3. Therefore, every Kafka broker must be mapped to a different port, so that each broker is uniquely addressable in the TiDB Cloud VPC.
+4. The mapping is achieved by leveraging the Kafka bootstrap mechanism and AWS resources, as shown in the sketch below.
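+
+To see why the port mapping works, recall how Kafka bootstrap behaves: a client first fetches the cluster metadata from the bootstrap address, and then connects to every broker at the advertised listener returned in that metadata. The following sketch reuses the `kafka-broker-api-versions.sh` tool that appears later in this document; the bootstrap address is a placeholder:
+
+```shell
+# Phase 1: the client bootstraps from a single shared address to fetch the cluster metadata.
+./kafka_2.13-3.7.1/bin/kafka-broker-api-versions.sh --bootstrap-server {private_endpoint_dns}:9092
+# Phase 2: the client then dials the advertised listener of each broker returned in the metadata,
+# for example "b1.usw2-az1.abc.us-west-2.aws.3199015.tidbcloud.com:9093". As long as each advertised
+# "host:port" resolves and routes to the private endpoint with a broker-unique port, all brokers
+# are reachable through the limited endpoints.
+```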
+
+The following example shows how to connect to a three-AZ Kafka Private Link service in AWS. It is not the only way to set up a Private Link service for self-hosted Kafka, and there might be other ways based on a similar port mapping mechanism. This document only shows the fundamentals of a Kafka Private Link service. If you want to set up a Kafka Private Link service in production, you might need to build a more resilient service with better operational maintainability and observability.
+
+![main idea](/media/tidb-cloud/changefeed/connect-to-aws-self-hosted-kafka-privatelink-service.png)
+
+## Prerequisites
+
+- Make sure that you have created a TiDB Cloud Dedicated cluster first.
+- Make sure that you have the authorization to set up a Kafka Private Link service in your own AWS account.
+
+## Steps
+
+### Align Deployment Information with the TiDB Cluster
+
+1. In the [TiDB Cloud console](https://tidbcloud.com), navigate to the cluster overview page of the TiDB cluster, and then click **Changefeed** in the left navigation pane.
+2. On the overview page, you can find the region of the TiDB cluster. Make sure that your Kafka cluster will be deployed to the same region.
+3. Click **Create Changefeed**.
+    1. Select **Kafka** as **Target Type**.
+    2. Select **Self-hosted Kafka** as **Kafka Provider**.
+    3. Select **Private Link** as **Connectivity Method**.
+4. Take note of the AWS account ARN in the **Reminders before proceeding** information. You will use it to authorize TiDB Cloud to create an endpoint for the Kafka Private Link service.
+5. Select a **Kafka Type** to confirm whether you will deploy your Kafka cluster to **Single AZ** or **3 AZ**. In this example, **3 AZ** is selected. Take note of the IDs of the AZs in which you want to deploy your Kafka cluster. If you do not know the relationship between your AZ names and AZ IDs, refer to the [AWS document](https://docs.aws.amazon.com/ram/latest/userguide/working-with-az-ids.html) to find it.
+6. Pick a unique **Kafka Advertised Listener Pattern** for your Kafka Private Link service.
+    1. Enter a unique random string that can only include numbers or lowercase letters. It will be used to generate the **Kafka Advertised Listener Pattern** later.
+    2. Click **Check usage and generate** to check whether the random string is unique, and to generate the **Kafka Advertised Listener Pattern**, which will be used to assemble the EXTERNAL advertised listeners for the Kafka brokers.
+
+Take note of all the deployment information. You will use it to configure your Kafka Private Link service later. The following table shows an example of the deployment information.
+
+| Information | Value | Reminder |
+|--------|--------|---------|
+| Region | Oregon (us-west-2) |  |
+| Principal of TiDB Cloud AWS Account | `arn:aws:iam:::root` |  |
+| AZ IDs | 1. usw2-az1 <br> 2. usw2-az2 <br> 3. usw2-az3 | Align the AZ IDs to the AZ names in your AWS account. <br> Example: <br> 1. usw2-az1 => us-west-2a <br> 2. usw2-az2 => us-west-2c <br> 3. usw2-az3 => us-west-2b |
+| Kafka Advertised Listener Pattern | The unique random string: abc <br> Generated pattern for the AZs: <br> 1. usw2-az1 => `<broker_id>.usw2-az1.abc.us-west-2.aws.3199015.tidbcloud.com:<port>` <br> 2. usw2-az2 => `<broker_id>.usw2-az2.abc.us-west-2.aws.3199015.tidbcloud.com:<port>` <br> 3. usw2-az3 => `<broker_id>.usw2-az3.abc.us-west-2.aws.3199015.tidbcloud.com:<port>` | Map the AZ names to the AZ-specific patterns, and make sure that you configure the right pattern for the brokers in each AZ later: <br> 1. us-west-2a => `<broker_id>.usw2-az1.abc.us-west-2.aws.3199015.tidbcloud.com:<port>` <br> 2. us-west-2c => `<broker_id>.usw2-az2.abc.us-west-2.aws.3199015.tidbcloud.com:<port>` <br> 3. us-west-2b => `<broker_id>.usw2-az3.abc.us-west-2.aws.3199015.tidbcloud.com:<port>` |
+
+### 1. Setup Kafka VPC
+
+The Kafka VPC requires the following:
+
+1. Three private subnets for brokers, one for each AZ.
+2. One public subnet in any AZ, with a bastion node that can connect to the internet and to the three private subnets, which makes it easy to set up the Kafka cluster. In a production environment, you might have your own bastion node that can connect to the Kafka VPC.
+
+Before creating subnets, create them in AZs based on the mapping of AZ IDs to AZ names. Take the following mapping as an example:
+
+1. usw2-az1 => us-west-2a
+2. usw2-az2 => us-west-2c
+3. usw2-az3 => us-west-2b
+
+Create the private subnets in these three AZs: us-west-2a, us-west-2c, and us-west-2b.
+
+The following are the detailed steps to create the Kafka VPC.
+
+#### 1.1. Create the Kafka VPC
+
+1. Go to the [AWS Console -> VPC dashboard](https://console.aws.amazon.com/vpcconsole/home?#vpcs:), and switch to the region in which you want to deploy Kafka.
+2. Click "Create VPC", and fill in the "VPC settings" page as follows.
+    1. Select "VPC only".
+    2. Fill in the "Name tag", for example ```Kafka VPC```.
+    3. Select "IPv4 CIDR manual input", and fill in the "IPv4 CIDR", for example ```10.0.0.0/16```.
+    4. Leave the other options as default, and click "Create VPC".
+    5. The page navigates to the VPC detail page. Take note of the VPC ID, for example ```vpc-01f50b790fa01dffa```.
+
+#### 1.2. Create the private subnets in the Kafka VPC
+
+1. Go to the [Subnets listing page](https://console.aws.amazon.com/vpcconsole/home?#subnets:).
+2. Click "Create subnet" to navigate to the "Create subnet" page.
+3. Select the "VPC ID" (```vpc-01f50b790fa01dffa```) that you noted down before.
+4. Add three subnets with the following inputs. It is recommended to put the AZ ID in the subnet names to make it easy to configure the brokers later, because TiDB Cloud requires encoding the AZ ID in the brokers' "advertised.listener" configuration.
+    1. Subnet1 in us-west-2a
+        - Subnet name: broker-usw2-az1
+        - Availability Zone: us-west-2a
+        - IPv4 subnet CIDR block: 10.0.0.0/18
+    2. Subnet2 in us-west-2c
+        - Subnet name: broker-usw2-az2
+        - Availability Zone: us-west-2c
+        - IPv4 subnet CIDR block: 10.0.64.0/18
+    3. Subnet3 in us-west-2b
+        - Subnet name: broker-usw2-az3
+        - Availability Zone: us-west-2b
+        - IPv4 subnet CIDR block: 10.0.128.0/18
+5. Click "Create subnet". The "Subnets listing" page is displayed.
+
+#### 1.3. Create the public subnet in the Kafka VPC
+
+1. Click "Create subnet" to navigate to the "Create subnet" page.
+2. Select the "VPC ID" (```vpc-01f50b790fa01dffa```) that you noted down before.
+3. Add the public subnet in any AZ with the following inputs:
+    - Subnet name: bastion
+    - IPv4 subnet CIDR block: 10.0.192.0/18
+4. Click "Create subnet". The "Subnets listing" page is displayed.
+5. Configure the "bastion" subnet as a public subnet.
+    1. Go to [VPC dashboard -> Internet gateways](https://console.aws.amazon.com/vpcconsole/home#igws:), and create an internet gateway named "kafka-vpc-igw".
+    2. On the "Internet gateways" detail page, click "Actions -> Attach to VPC" to attach the internet gateway to the Kafka VPC.
+    3. Go to [VPC dashboard -> Route tables](https://console.aws.amazon.com/vpcconsole/home#CreateRouteTable:), create a route table in the Kafka VPC, and add a new route with the following values:
+        - Name: kafka-vpc-igw-route-table
+        - VPC: Kafka VPC
+        - Route: Destination - 0.0.0.0/0; Target - Internet Gateway, kafka-vpc-igw
+    4. Attach the route table to the bastion subnet. On the detail page of the route table, click "Subnet associations -> Edit subnet associations", add the bastion subnet, and save the changes.
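+
+If you prefer scripting the setup over clicking through the console, the following AWS CLI sketch performs the equivalent steps. The region, names, and CIDR blocks match the example values in this document, and the ```{...}``` IDs are placeholders for the IDs returned by the earlier commands:
+
+```shell
+# Create the Kafka VPC and take note of the VpcId in the output
+aws ec2 create-vpc --cidr-block 10.0.0.0/16 --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=Kafka-VPC}]'
+
+# Create the three private broker subnets, one per AZ
+aws ec2 create-subnet --vpc-id {vpc_id} --availability-zone us-west-2a --cidr-block 10.0.0.0/18
+aws ec2 create-subnet --vpc-id {vpc_id} --availability-zone us-west-2c --cidr-block 10.0.64.0/18
+aws ec2 create-subnet --vpc-id {vpc_id} --availability-zone us-west-2b --cidr-block 10.0.128.0/18
+
+# Create the public bastion subnet and wire it to an internet gateway
+aws ec2 create-subnet --vpc-id {vpc_id} --cidr-block 10.0.192.0/18
+aws ec2 create-internet-gateway
+aws ec2 attach-internet-gateway --internet-gateway-id {igw_id} --vpc-id {vpc_id}
+aws ec2 create-route-table --vpc-id {vpc_id}
+aws ec2 create-route --route-table-id {rtb_id} --destination-cidr-block 0.0.0.0/0 --gateway-id {igw_id}
+aws ec2 associate-route-table --route-table-id {rtb_id} --subnet-id {bastion_subnet_id}
+```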
At the "Detail Page" of the route table, click "Subnet associations-> Edit subnet associations" to add bastion subnet and save changes. + +### 2. Setup Kafka Brokers + +#### 2.1. Create bastion node +Go to [EC2 Listing Page](https://console.aws.amazon.com/ec2/home#Instances:), create the bastion node in bastion subnet. + - Name: bastion-node + - Amazon Machine Image: Amazon linux + - Instance Type: t2.small + - Key pair: kafka-vpc-key-pair. PS: create a new key pair name "kafka-vpc-key-pair", download "kafka-vpc-key-pair.pem" to your local for following configuration. + - Network settings + - VPC: Kafka VPC + - Subnet: bastion + - Auto-assign public IP: Enable + - Security Group: create a new security group allow ssh from anywhere. PS: you may narrow the rule for safety in production environment. +#### 2.2. Create broker nodes +Go to [EC2 Listing Page](https://console.aws.amazon.com/ec2/home#Instances:), create 3 broker nodes in broker subnet, one per AZ. +1. Broker 1 in subnet broker-usw2-az1 + - Name: broker-node1 + - Amazon Machine Image: Amazon linux + - Instance Type: t2.large + - Key pair: reuse "kafka-vpc-key-pair" + - Network settings + - VPC: Kafka VPC + - Subnet: broker-usw2-az1 + - Auto-assign public IP: Disable + - Security Group: create a new security group allow all TCP from Kafka VPC. PS: you may narrow the rule for safety in production environment. + - Protocol: TCP + - Port range: 0 - 65535 + - Source: 10.0.0.0/16 +2. Broker 2 in subnet broker-usw2-az2 + - Name: broker-node2 + - Amazon Machine Image: Amazon linux + - Instance Type: t2.large + - Key pair: reuse "kafka-vpc-key-pair" + - Network settings + - VPC: Kafka VPC + - Subnet: broker-usw2-az2 + - Auto-assign public IP: Disable + - Security Group: create a new security group allow all TCP from Kafka VPC. PS: you may narrow the rule for safety in production environment. + - Protocol: TCP + - Port range: 0 - 65535 + - Source: 10.0.0.0/16 +3. Broker 3 in subnet broker-usw2-az3 + - Name: broker-node3 + - Amazon Machine Image: Amazon linux + - Instance Type: t2.large + - Key pair: reuse "kafka-vpc-key-pair" + - Network settings + - VPC: Kafka VPC + - Subnet: broker-usw2-az3 + - Auto-assign public IP: Disable + - Security Group: create a new security group allow all TCP from Kafka VPC. PS: you may narrow the rule for safety in production environment. + - Protocol: TCP + - Port range: 0 - 65535 + - Source: 10.0.0.0/163. +#### 2.3. Prepare kafka runtime binaries +1. Go to detail page of bastion node, get the "Public IPv4 address", ssh login to the node with previous download "kafka-vpc-key-pair.pem". +```shell +chmod 400 kafka-vpc-key-pair.pem +ssh -i "kafka-vpc-key-pair.pem" ec2-user@{bastion_public_ip} # replace {bastion_public_ip} to your bastion's ip, for example 54.186.149.187 +scp -i "kafka-vpc-key-pair.pem" kafka-vpc-key-pair.pem ec2-user@{bastion_public_ip}:~/ +``` +2. Download binaries +```shell +# Download kafka & openjdk, decompress. PS: your can choose the binary version as you like +wget https://downloads.apache.org/kafka/3.7.1/kafka_2.13-3.7.1.tgz +tar -zxf kafka_2.13-3.7.1.tgz +wget https://download.java.net/java/GA/jdk22.0.2/c9ecb94cd31b495da20a27d4581645e8/9/GPL/openjdk-22.0.2_linux-x64_bin.tar.gz +tar -zxf openjdk-22.0.2_linux-x64_bin.tar.gz +``` +3. 
+
+#### 2.4. Set up Kafka nodes on every broker node
+
+1. Set up a KRaft Kafka cluster with three nodes. Each node acts in both the broker and controller roles. For every broker:
+    1. For the "listeners" item, all three brokers are the same, acting in the broker and controller roles:
+        1. Configure the same CONTROLLER listener for all **controller** role nodes. If you only want to add **broker** role nodes, you do not need the CONTROLLER listener in ```server.properties```.
+        2. Configure two **broker** listeners: INTERNAL for internal access, and EXTERNAL for external access from TiDB Cloud.
+    2. For the "advertised.listeners" item:
+        1. Configure an INTERNAL advertised listener for every broker with the internal IP address of the broker node. Internal Kafka clients use this address to visit the broker.
+        2. Configure an EXTERNAL advertised listener based on the **Kafka Advertised Listener Pattern** you got from TiDB Cloud for every broker node, to help TiDB Cloud differentiate the brokers. Different EXTERNAL advertised listeners help the Kafka client on the TiDB Cloud side route requests to the right broker.
+            - ```<port>``` differentiates the brokers from the access point of the Kafka Private Link service, so plan a port range for the EXTERNAL advertised listeners of all brokers. These ports do not have to be the actual ports that the brokers listen on. They are the ports that the load balancer for the Private Link service listens on, and the load balancer forwards the requests to the different brokers.
+            - The ```AZ ID``` in the **Kafka Advertised Listener Pattern** indicates where the broker is deployed. TiDB Cloud routes requests to different endpoint DNS names based on the AZ ID.
+            - It is recommended to configure a different ```<broker_id>``` for different brokers, which makes troubleshooting easy.
+    3. The planned values:
+        - CONTROLLER port: 29092
+        - INTERNAL port: 9092
+        - EXTERNAL port: 39092
+        - EXTERNAL advertised listener port range: 9093~9095
+
+2. Log in to every broker node over SSH, and create the configuration file ```~/config/server.properties``` with the following content.
configure EXTERNAL in "advertised.listeners" based on the "Kafka Advertised Listener Pattern" in "Align Deployment Information with the TiDB Cluster" section +# 2.1 the pattern for AZ(ID: usw2-az1) is ".usw2-az1.abc.us-west-2.aws.3199015.tidbcloud.com:" +# 2.2 so the EXTERNAL can be "b1.usw2-az1.abc.us-west-2.aws.3199015.tidbcloud.com:9093", replace with "b" prefix plus "node.id" properties, replace with a unique port(9093) in EXTERNAL advertised listener ports range +# 2.3 if there are more broker role node in same AZ, you can configure them in same way +process.roles=broker,controller +node.id=1 +controller.quorum.voters=1@{broker-node1-ip}:29092,2@{broker-node2-ip}:29092,3@{broker-node3-ip}:29092 +listeners=INTERNAL://0.0.0.0:9092,CONTROLLER://0.0.0.0:29092,EXTERNAL://0.0.0.0:39092 +inter.broker.listener.name=INTERNAL +advertised.listeners=INTERNAL://{broker-node1-ip}:9092,EXTERNAL://b1.usw2-az1.abc.us-west-2.aws.3199015.tidbcloud.com:9093 +controller.listener.names=CONTROLLER +listener.security.protocol.map=INTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT,EXTERNAL:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL +log.dirs=./data +``` +```properties +# brokers in usw2-az2 + +# broker-node2 ~/config/server.properties +# 1. replace {broker-node1-ip}, {broker-node2-ip}, {broker-node3-ip} to real ips +# 2. configure EXTERNAL in "advertised.listeners" based on the "Kafka Advertised Listener Pattern" in "Align Deployment Information with the TiDB Cluster" section +# 2.1 the pattern for AZ(ID: usw2-az2) is ".usw2-az2.abc.us-west-2.aws.3199015.tidbcloud.com:" +# 2.2 so the EXTERNAL can be "b2.usw2-az2.abc.us-west-2.aws.3199015.tidbcloud.com:9094", replace with "b" prefix plus "node.id" properties, replace with a unique port(9094) in EXTERNAL advertised listener ports range +# 2.3 if there are more broker role node in same AZ, you can configure them in same way +process.roles=broker,controller +node.id=2 +controller.quorum.voters=1@{broker-node1-ip}:29092,2@{broker-node2-ip}:29092,3@{broker-node3-ip}:29092 +listeners=INTERNAL://0.0.0.0:9092,CONTROLLER://0.0.0.0:29092,EXTERNAL://0.0.0.0:39092 +inter.broker.listener.name=INTERNAL +advertised.listeners=INTERNAL://{broker-node2-ip}:9092,EXTERNAL://b2.usw2-az2.abc.us-west-2.aws.3199015.tidbcloud.com:9094 +controller.listener.names=CONTROLLER +listener.security.protocol.map=INTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT,EXTERNAL:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL +log.dirs=./data +``` +```properties +# brokers in usw2-az3 + +# broker-node3 ~/config/server.properties +# 1. replace {broker-node1-ip}, {broker-node2-ip}, {broker-node3-ip} to real ips +# 2. 
configure EXTERNAL in "advertised.listeners" based on the "Kafka Advertised Listener Pattern" in "Align Deployment Information with the TiDB Cluster" section +# 2.1 the pattern for AZ(ID: usw2-az3) is ".usw2-az3.abc.us-west-2.aws.3199015.tidbcloud.com:" +# 2.2 so the EXTERNAL can be "b3.usw2-az3.abc.us-west-2.aws.3199015.tidbcloud.com:9095", replace with "b" prefix plus "node.id" properties, replace with a unique port(9095) in EXTERNAL advertised listener ports range +# 2.3 if there are more broker role node in same AZ, you can configure them in same way +process.roles=broker,controller +node.id=3 +controller.quorum.voters=1@{broker-node1-ip}:29092,2@{broker-node2-ip}:29092,3@{broker-node3-ip}:29092 +listeners=INTERNAL://0.0.0.0:9092,CONTROLLER://0.0.0.0:29092,EXTERNAL://0.0.0.0:39092 +inter.broker.listener.name=INTERNAL +advertised.listeners=INTERNAL://{broker-node3-ip}:9092,EXTERNAL://b3.usw2-az3.abc.us-west-2.aws.3199015.tidbcloud.com:9095 +controller.listener.names=CONTROLLER +listener.security.protocol.map=INTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT,EXTERNAL:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL +log.dirs=./data +``` +2. Create script and execute it to start kafka broker in every broker node. +```shell +#!/bin/bash + +# Get the directory of the current script +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +# Set JAVA_HOME to the Java installation within the script directory +export JAVA_HOME="$SCRIPT_DIR/jdk-22.0.2" +# Define the vars +KAFKA_DIR="$SCRIPT_DIR/kafka_2.13-3.7.1/bin" +KAFKA_STORAGE_CMD=$KAFKA_DIR/kafka-storage.sh +KAFKA_START_CMD=$KAFKA_DIR/kafka-server-start.sh +KAFKA_DATA_DIR=$SCRIPT_DIR/data +KAFKA_LOG_DIR=$SCRIPT_DIR/log +KAFKA_CONFIG_DIR=$SCRIPT_DIR/config + +# Cleanup step make it easy to multiple experiments +# Find all Kafka process IDs +KAFKA_PIDS=$(ps aux | grep 'kafka.Kafka' | grep -v grep | awk '{print $2}') +if [ -z "$KAFKA_PIDS" ]; then + echo "No Kafka processes are running." +else + # Kill each Kafka process + echo "Killing Kafka processes with PIDs: $KAFKA_PIDS" + for PID in $KAFKA_PIDS; do + kill -9 $PID + echo "Killed Kafka process with PID: $PID" + done + echo "All Kafka processes have been killed." +fi + +rm -rf $KAFKA_DATA_DIR +mkdir -p $KAFKA_DATA_DIR +rm -rf $KAFKA_LOG_DIR +mkdir -p $KAFKA_LOG_DIR + +# magic id: BRl69zcmTFmiPaoaANybiw, you can use your own +$KAFKA_STORAGE_CMD format -t "BRl69zcmTFmiPaoaANybiw" -c "$KAFKA_CONFIG_DIR/server.properties" > $KAFKA_LOG_DIR/server_format.log +LOG_DIR=$KAFKA_LOG_DIR nohup $KAFKA_START_CMD "$KAFKA_CONFIG_DIR/server.properties" & +``` +#### 2.5. Test cluster setup in bastion node. +1. Test Kafka bootstrap +```shell +export JAVA_HOME=/home/ec2-user/jdk-22.0.2 + +# bootstrap from INTERNAL listener +./kafka_2.13-3.7.1/bin/kafka-broker-api-versions.sh --bootstrap-server {one_of_broker_ip}:9092 | grep 9092 +# expected output, order may be different. +{broker-node1-ip}:9092 (id: 1 rack: null) -> ( +{broker-node2-ip}:9092 (id: 2 rack: null) -> ( +{broker-node3-ip}:9092 (id: 3 rack: null) -> ( + +# bootstrap from EXTERNAL listener +./kafka_2.13-3.7.1/bin/kafka-broker-api-versions.sh --bootstrap-server {one_of_broker_ip}:39092 +# expected output(last 3 lines), order may be different. +# the differences of output from "bootstrap from INTERNAL listener" is that there are exceptions or errors since the listener can not be resolved in Kafka VPC. +# we will make it resolvable in TiDB Cloud side and make it route to the right broker. 
+
+#### 2.5. Test the cluster setup from the bastion node
+
+1. Test the Kafka bootstrap.
+
+```shell
+export JAVA_HOME=/home/ec2-user/jdk-22.0.2
+
+# Bootstrap from the INTERNAL listener
+./kafka_2.13-3.7.1/bin/kafka-broker-api-versions.sh --bootstrap-server {one_of_broker_ip}:9092 | grep 9092
+# Expected output (the order might be different)
+{broker-node1-ip}:9092 (id: 1 rack: null) -> (
+{broker-node2-ip}:9092 (id: 2 rack: null) -> (
+{broker-node3-ip}:9092 (id: 3 rack: null) -> (
+
+# Bootstrap from the EXTERNAL listener
+./kafka_2.13-3.7.1/bin/kafka-broker-api-versions.sh --bootstrap-server {one_of_broker_ip}:39092
+# Expected output (the last 3 lines; the order might be different).
+# The difference from the "bootstrap from the INTERNAL listener" output is that exceptions or errors occur,
+# because the advertised listeners cannot be resolved in the Kafka VPC.
+# TiDB Cloud will make them resolvable on its side and route requests to the right broker.
+b1.usw2-az1.abc.us-west-2.aws.3199015.tidbcloud.com:9093 (id: 1 rack: null) -> ERROR: org.apache.kafka.common.errors.DisconnectException
+b2.usw2-az2.abc.us-west-2.aws.3199015.tidbcloud.com:9094 (id: 2 rack: null) -> ERROR: org.apache.kafka.common.errors.DisconnectException
+b3.usw2-az3.abc.us-west-2.aws.3199015.tidbcloud.com:9095 (id: 3 rack: null) -> ERROR: org.apache.kafka.common.errors.DisconnectException
+```
+
+2. Create a producer script ```produce.sh``` on the bastion node.
+
+```shell
+#!/bin/bash
+BROKER_LIST=$1 # "{broker_address1},{broker_address2}..."
+
+# Get the directory of the current script
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+# Set JAVA_HOME to the Java installation within the script directory
+export JAVA_HOME="$SCRIPT_DIR/jdk-22.0.2"
+# Define the Kafka directory
+KAFKA_DIR="$SCRIPT_DIR/kafka_2.13-3.7.1/bin"
+TOPIC="test-topic"
+
+# Create a topic if it does not exist
+create_topic() {
+    echo "Creating topic if it does not exist..."
+    $KAFKA_DIR/kafka-topics.sh --create --topic $TOPIC --bootstrap-server $BROKER_LIST --if-not-exists --partitions 3 --replication-factor 3
+}
+
+# Produce messages to the topic
+produce_messages() {
+    echo "Producing messages to the topic..."
+    for ((chrono=1; chrono <= 10; chrono++)); do
+        message="Test message "$chrono
+        echo "Create "$message
+        echo $message | $KAFKA_DIR/kafka-console-producer.sh --broker-list $BROKER_LIST --topic $TOPIC
+    done
+}
+create_topic
+produce_messages
+```
+
+3. Create a consumer script ```consume.sh``` on the bastion node.
+
+```shell
+#!/bin/bash
+
+BROKER_LIST=$1 # "{broker_address1},{broker_address2}..."
+
+# Get the directory of the current script
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+# Set JAVA_HOME to the Java installation within the script directory
+export JAVA_HOME="$SCRIPT_DIR/jdk-22.0.2"
+# Define the Kafka directory
+KAFKA_DIR="$SCRIPT_DIR/kafka_2.13-3.7.1/bin"
+TOPIC="test-topic"
+CONSUMER_GROUP="test-group"
+# Consume messages from the topic
+consume_messages() {
+    echo "Consuming messages from the topic..."
+    $KAFKA_DIR/kafka-console-consumer.sh --bootstrap-server $BROKER_LIST --topic $TOPIC --from-beginning --timeout-ms 5000 --consumer-property group.id=$CONSUMER_GROUP
+}
+consume_messages
+```
+
+4. Execute ```produce.sh``` and ```consume.sh``` to verify that the Kafka cluster is running. These scripts will also be reused for the later network connection testing. The scripts create a topic with ```--partitions 3 --replication-factor 3```, which ensures that all three brokers hold data and that the scripts connect to all three brokers, so the network connection is fully tested.
+
+```shell
+# Test write messages.
+./produce.sh {one_of_broker_ip}:9092
+```
+
+```text
+# Expected output
+Creating topic if it does not exist...
+
+Producing messages to the topic...
+Create Test message 1
+>>Create Test message 2
+>>Create Test message 3
+>>Create Test message 4
+>>Create Test message 5
+>>Create Test message 6
+>>Create Test message 7
+>>Create Test message 8
+>>Create Test message 9
+>>Create Test message 10
+```
+
+```shell
+# Test read messages
+./consume.sh {one_of_broker_ip}:9092
+```
+
+```text
+# Expected example output (the message order might be different)
+Consuming messages from the topic...
+Test message 3
+Test message 4
+Test message 5
+Test message 9
+Test message 10
+Test message 6
+Test message 8
+Test message 1
+Test message 2
+Test message 7
+[2024-11-01 08:54:27,547] ERROR Error processing message, terminating consumer process: (kafka.tools.ConsoleConsumer$)
+org.apache.kafka.common.errors.TimeoutException
+Processed a total of 10 messages
+```
+
+### 3. Setup Load Balancer
+
+Create a network load balancer with four target groups on different ports. One target group is for bootstrap, and the others map to different brokers:
+
+1. bootstrap target group => 9092 => broker-node1:39092, broker-node2:39092, broker-node3:39092
+2. broker target group 1 => 9093 => broker-node1:39092
+3. broker target group 2 => 9094 => broker-node2:39092
+4. broker target group 3 => 9095 => broker-node3:39092
+
+If you have more broker role nodes, you need to add more mappings. There should be at least one node in the bootstrap target group. It is recommended to add three nodes, one for each AZ, for resilience.
+
+The operation steps are as follows:
+
+1. Go to [Target groups](https://console.aws.amazon.com/ec2/home#CreateTargetGroup:) to create the four target groups.
+    1. Bootstrap target group
+        - Target type: Instances
+        - Target group name: bootstrap-target-group
+        - Protocol: TCP
+        - Port: 9092
+        - IP address type: IPv4
+        - VPC: Kafka VPC
+        - Health check protocol: TCP
+        - Register targets: broker-node1:39092, broker-node2:39092, broker-node3:39092
+    2. Broker target group 1
+        - Target type: Instances
+        - Target group name: broker-target-group-1
+        - Protocol: TCP
+        - Port: 9093
+        - IP address type: IPv4
+        - VPC: Kafka VPC
+        - Health check protocol: TCP
+        - Register targets: broker-node1:39092
+    3. Broker target group 2
+        - Target type: Instances
+        - Target group name: broker-target-group-2
+        - Protocol: TCP
+        - Port: 9094
+        - IP address type: IPv4
+        - VPC: Kafka VPC
+        - Health check protocol: TCP
+        - Register targets: broker-node2:39092
+    4. Broker target group 3
+        - Target type: Instances
+        - Target group name: broker-target-group-3
+        - Protocol: TCP
+        - Port: 9095
+        - IP address type: IPv4
+        - VPC: Kafka VPC
+        - Health check protocol: TCP
+        - Register targets: broker-node3:39092
+2. Go to [Load balancers](https://console.aws.amazon.com/ec2/home#LoadBalancers:) to create a network load balancer with the following values:
+    - Load balancer name: kafka-lb
+    - Schema: Internal
+    - Load balancer IP address type: IPv4
+    - VPC: Kafka VPC
+    - Availability Zones:
+        - usw2-az1 with the broker-usw2-az1 subnet
+        - usw2-az2 with the broker-usw2-az2 subnet
+        - usw2-az3 with the broker-usw2-az3 subnet
+    - Security groups: create a new security group with the following rules.
+        - The inbound rule allows all TCP from the Kafka VPC: Type - All TCP; Source - Anywhere-IPv4
+        - The outbound rule allows all TCP to the Kafka VPC: Type - All TCP; Destination - Anywhere-IPv4
+    - Listeners and routing:
+        - Protocol: TCP; Port: 9092; Forward to: bootstrap-target-group
+        - Protocol: TCP; Port: 9093; Forward to: broker-target-group-1
+        - Protocol: TCP; Port: 9094; Forward to: broker-target-group-2
+        - Protocol: TCP; Port: 9095; Forward to: broker-target-group-3
+3. Test the load balancer from the bastion node. This test only covers the Kafka bootstrap, because the load balancer listens on the Kafka EXTERNAL listener, and the addresses of the EXTERNAL advertised listeners cannot be resolved on the bastion node. Take note of the kafka-lb DNS name from the load balancer detail page, for example ```kafka-lb-77405fa57191adcb.elb.us-west-2.amazonaws.com```, and then execute the following script on the bastion node.
+
+```shell
+# Replace {lb_dns_name} with the actual DNS name of your load balancer
+export JAVA_HOME=/home/ec2-user/jdk-22.0.2
+./kafka_2.13-3.7.1/bin/kafka-broker-api-versions.sh --bootstrap-server {lb_dns_name}:9092
+
+# Expected output (the last 3 lines; the order might be different)
+b1.usw2-az1.abc.us-west-2.aws.3199015.tidbcloud.com:9093 (id: 1 rack: null) -> ERROR: org.apache.kafka.common.errors.DisconnectException
+b2.usw2-az2.abc.us-west-2.aws.3199015.tidbcloud.com:9094 (id: 2 rack: null) -> ERROR: org.apache.kafka.common.errors.DisconnectException
+b3.usw2-az3.abc.us-west-2.aws.3199015.tidbcloud.com:9095 (id: 3 rack: null) -> ERROR: org.apache.kafka.common.errors.DisconnectException
+
+# You can also try to bootstrap from the other ports 9093/9094/9095. The attempt only succeeds probabilistically,
+# because an NLB in AWS resolves the LB DNS name to the IP address of any Availability Zone, and cross-zone load
+# balancing is disabled by default.
+# If you enable cross-zone load balancing in the LB, the bootstrap will certainly succeed, but it is unnecessary
+# and might introduce potential cross-AZ traffic.
+```
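+
+Alternatively, if you prefer the AWS CLI to the console, the following sketch reproduces the target group and load balancer setup above. The values mirror the console steps; the ```{...}``` instance IDs, subnet IDs, and ARNs are placeholders. Repeat the target group and listener commands for ports 9093~9095:
+
+```shell
+# Create the bootstrap target group and register all three brokers on port 39092
+aws elbv2 create-target-group --name bootstrap-target-group --protocol TCP --port 9092 \
+    --vpc-id {vpc_id} --target-type instance --health-check-protocol TCP
+aws elbv2 register-targets --target-group-arn {bootstrap_tg_arn} \
+    --targets Id={broker_node1_id},Port=39092 Id={broker_node2_id},Port=39092 Id={broker_node3_id},Port=39092
+
+# Create the internal network load balancer across the three broker subnets
+aws elbv2 create-load-balancer --name kafka-lb --type network --scheme internal \
+    --subnets {broker_usw2_az1_subnet_id} {broker_usw2_az2_subnet_id} {broker_usw2_az3_subnet_id}
+
+# Forward port 9092 on the load balancer to the bootstrap target group
+aws elbv2 create-listener --load-balancer-arn {kafka_lb_arn} --protocol TCP --port 9092 \
+    --default-actions Type=forward,TargetGroupArn={bootstrap_tg_arn}
+```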
+
+### 4. Setup Private Link Service
+
+1. Go to [Endpoint services](https://console.aws.amazon.com/vpcconsole/home#EndpointServices:), and click "Create endpoint service" to create a Private Link service for the Kafka load balancer with the following values:
+    - Name: kafka-pl-service
+    - Load balancer type: Network
+    - Load balancers: kafka-lb
+    - Included Availability Zones: usw2-az1, usw2-az2, usw2-az3
+    - Require acceptance for endpoint: Acceptance required
+    - Enable private DNS name: No
+2. After the creation is done, take note of the **Service name**, which will be provided to TiDB Cloud, for example ```com.amazonaws.vpce.us-west-2.vpce-svc-0f49e37e1f022cd45```.
+3. On the detail page of the kafka-pl-service, click the "Allow principals" tab, and allow the AWS account of TiDB Cloud to create the endpoint. You can get the AWS account of TiDB Cloud from the "Align Deployment Information with the TiDB Cluster" section, for example ```arn:aws:iam:::root```.
+
+### 5. Connect from TiDB Cloud
+
+1. Go back to the TiDB Cloud console, and create a changefeed for the cluster to connect to the Kafka cluster by **Private Link**. For details, refer to [To Kafka Sink](/tidb-cloud/changefeed-sink-to-apache-kafka.md).
+2. After you proceed to "Configure the changefeed target -> Connectivity Method -> Private Link", fill in the following fields with the corresponding values, and fill in the other fields as needed.
+    - Kafka Type: 3 AZ. Make sure that it is the same as your Kafka cluster deployment.
+    - Kafka Advertised Listener Pattern: abc. Make sure that it is the same as the unique random string used to generate the **Kafka Advertised Listener Pattern** in the "Align Deployment Information with the TiDB Cluster" section.
+    - Endpoint Service Name: the Kafka service name you noted down in the "Setup Private Link Service" section.
+    - Bootstrap Ports: 9092. One port is enough, because there is a special bootstrap target group behind it.
+3. Continue to follow the guidelines in [To Kafka Sink](/tidb-cloud/changefeed-sink-to-apache-kafka.md).
+4. If everything goes well, you will successfully create the changefeed.
diff --git a/tidb-cloud/setup-self-hosted-kafka-psc.md b/tidb-cloud/setup-self-hosted-kafka-psc.md
new file mode 100644
index 0000000000000..f7d986dab7a8f
--- /dev/null
+++ b/tidb-cloud/setup-self-hosted-kafka-psc.md
@@ -0,0 +1,5 @@
+Architecture overview, and what you should know before you begin.
+
+Steps:
+1. Go to the TiDB Cloud console to find the deployment information of the TiDB cluster.
+2. Get the necessary information to set up Kafka.
diff --git a/tidb-cloud/tidb-cloud-billing-ticdc-rcu.md b/tidb-cloud/tidb-cloud-billing-ticdc-rcu.md
index aa0d42795f60f..0a230af14bc5f 100644
--- a/tidb-cloud/tidb-cloud-billing-ticdc-rcu.md
+++ b/tidb-cloud/tidb-cloud-billing-ticdc-rcu.md
@@ -34,7 +34,7 @@ To learn about the supported regions and the price of TiDB Cloud for each TiCDC

## Private Data Link Cost

-If you choose "Private Link" or "Private Service Connect" network connectivity method, we will charge you extra "Private Data Link" cost which in [Data Transfer Cost](https://www.pingcap.com/tidb-dedicated-pricing-details/#data-transfer-cost) category.
+If you choose **Private Link** or **Private Service Connect** as the network connectivity method, you will be charged an extra **Private Data Link** cost, which belongs to the [Data Transfer Cost](https://www.pingcap.com/tidb-dedicated-pricing-details/#data-transfer-cost) category.

The price of **Private Data Link** will be **$0.01 per GiB**, which is the same as the **Data Processed** price of [AWS Interface Endpoint pricing](https://aws.amazon.com/privatelink/pricing/#Interface_Endpoint_pricing) and the **Consumer data processing** price of [Google Cloud Private Service Connect pricing](https://cloud.google.com/vpc/pricing#psc-forwarding-rules).