Cobliteam/uReplicator

 
 


uReplicator


Update

As of 11/20/2018, the previous master branch has moved to branch-0.1. The new master is backward-compatible.

=========

uReplicator provides the ability to replicate data across Kafka clusters in other data centers. Instead of publishing to a single Kafka cluster, you can publish data to multiple regional Kafka clusters and aggregate it all in one Kafka cluster.

=========

Kafka's current MirrorMaker design (as of 0.8.2) consumes data from a given regional Kafka cluster using the Kafka high-level consumer. With this design, a rebalance in the high-level consumer (caused by the addition or deletion of topics, source cluster problems, network issues, and so on) affects all the topics being replicated by that MirrorMaker.

Goals of uReplicator

  1. Stability: Rebalancing occurs only during startup (when a node is added or deleted)
  2. Simple operations: The cluster is easy to scale up, and whitelisting topics requires no server restart
  3. High throughput: Max offset lag is consistently 0.
  4. Time SLA (~5min)

uReplicator Quick Start

Get the Code

Check out the uReplicator project:

git clone git@github.com:uber/uReplicator.git
cd uReplicator

This project contains everything you’ll need to run uReplicator (both mirrormaker-controller and mirrormaker-worker).

Build uReplicator

Before you can run uReplicator, you need to build a package for it. This package is what your deployment tool uses to deploy uReplicator.

mvn clean package

Or, to skip the tests (the full build above takes a long time to run):

mvn clean package -DskipTests
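
Once the build finishes, the start scripts used in the examples below should be under uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin. The following is a minimal sketch of a sanity check, assuming that layout; the function name check_distribution is hypothetical and not part of uReplicator.

```shell
# Hypothetical helper (not shipped with uReplicator): checks that the build
# produced the example start scripts referenced in this README.
# Usage: check_distribution [PKG_DIR]
check_distribution() {
  local pkg_dir="${1:-./uReplicator-Distribution/target/uReplicator-Distribution-pkg}"
  local missing=0
  for script in start-controller-example1.sh start-worker-example1.sh \
                start-controller-example2.sh start-worker-example2.sh; do
    if [ ! -f "$pkg_dir/bin/$script" ]; then
      # Report each missing script on stderr so stdout stays clean.
      echo "missing: $pkg_dir/bin/$script" >&2
      missing=1
    fi
  done
  # Non-zero exit status if any script is missing.
  return "$missing"
}
```

Run check_distribution from the repository root; it lists any missing script on stderr and returns a non-zero status.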

Set Up Local Test Environment

To test uReplicator locally, you need two systems: Kafka and ZooKeeper. The script “grid” helps you set up these systems.

  • Make the scripts generated by Maven executable:
chmod u+x bin/pkg/*.sh
  • The command below downloads, installs, and starts ZooKeeper and Kafka. It starts two Kafka clusters: kafka1, which we use as the source Kafka cluster, and kafka2, which we use as the destination Kafka cluster:
bin/grid bootstrap
  • Create a topic named dummyTopic in kafka1 and produce some dummy data:
./bin/produce-data-to-kafka-topic-dummyTopic.sh
  • Check that the data was produced to kafka1 by opening another console tab and running the command below:
./deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181/cluster1 --topic dummyTopic
  • You should see this data:
Kafka topic dummy topic data 1
Kafka topic dummy topic data 2
Kafka topic dummy topic data 3
Kafka topic dummy topic data 4
…
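
If you would rather check the consumer output from a script than by eye, a small filter can count the dummy messages. This is a sketch that assumes every message follows the format of the sample output above; count_dummy_messages and has_dummy_messages are hypothetical helper names.

```shell
# Hypothetical helper: counts lines on stdin that look like the sample
# messages ("Kafka topic dummy topic data N").
count_dummy_messages() {
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps
  # the printed count without adding any output.
  grep -cE '^Kafka topic dummy topic data [0-9]+$' || true
}

# Hypothetical helper: succeeds if stdin contains at least MIN such
# messages (default 1).
# Usage: ... | has_dummy_messages [MIN]
has_dummy_messages() {
  local min="${1:-1}"
  [ "$(count_dummy_messages)" -ge "$min" ]
}
```

For example, assuming GNU coreutils timeout is available to stop the consumer after 15 seconds:
timeout 15 ./deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181/cluster1 --topic dummyTopic --from-beginning | has_dummy_messages 4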

Start uReplicator

Example 1: Copy data from source cluster to destination cluster

  • Start the uReplicator controller (keep it running):
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-controller-example1.sh
  • Start the uReplicator worker (keep it running). It is normal to see kafka.consumer.ConsumerTimeoutException at this point, since no topic has been added for copying yet:
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-worker-example1.sh
  • Add the topic to the uReplicator controller to start copying from kafka1 to kafka2:
curl -X POST -d '{"topic":"dummyTopic", "numPartitions":"1"}' http://localhost:9000/topics
  • To check that the data was copied to kafka2, open another console tab and run the command below:
./deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181/cluster2 --topic dummyTopic1
  • You should see the same messages that were produced in kafka1:
Kafka topic dummy topic data 1
Kafka topic dummy topic data 2
Kafka topic dummy topic data 3
Kafka topic dummy topic data 4
…
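
The whitelisting request above can also be wrapped in a small function so the topic name and partition count are not edited inside the JSON by hand. A minimal sketch, assuming the controller listens on http://localhost:9000 as in this example; topic_payload and whitelist_topic are hypothetical names. The payload keeps numPartitions as a string, exactly as in the curl command above, and the function rejects names outside the characters Kafka allows in topic names (letters, digits, ".", "_", "-"), so no JSON escaping is needed.

```shell
# Hypothetical helper: prints the JSON body used by the POST /topics example.
# Usage: topic_payload TOPIC PARTITIONS
topic_payload() {
  local topic="$1" partitions="$2"
  # Only legal Kafka topic characters, so the name can be embedded unescaped.
  case "$topic" in
    ''|*[!A-Za-z0-9._-]*) echo "invalid topic name: $topic" >&2; return 1 ;;
  esac
  case "$partitions" in
    ''|*[!0-9]*) echo "invalid partition count: $partitions" >&2; return 1 ;;
  esac
  printf '{"topic":"%s", "numPartitions":"%s"}' "$topic" "$partitions"
}

# Hypothetical helper: posts the payload to the controller.
# Usage: whitelist_topic TOPIC PARTITIONS [CONTROLLER_URL]
whitelist_topic() {
  local controller="${3:-http://localhost:9000}" body
  # Separate assignment so a validation failure stops before curl runs.
  body="$(topic_payload "$1" "$2")" || return 1
  curl -X POST -d "$body" "$controller/topics"
}
```

With the controller from this example running, whitelist_topic dummyTopic 1 sends the same request as the curl command above.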

Example 2: Copy data from the source Kafka cluster to the destination Kafka cluster without explicitly whitelisting topics

  • Start the uReplicator controller (keep it running):
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-controller-example2.sh
  • Start the uReplicator worker (keep it running). It is normal to see kafka.consumer.ConsumerTimeoutException at this point, since no topic has been added for copying yet:
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-worker-example2.sh
  • Create the topic in kafka2. Example 2 enables topic auto-whitelisting, so you don't need to whitelist topics manually: if a topic exists in both the source and destination Kafka clusters, the controller whitelists it automatically and starts copying data.
./deploy/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181/cluster2 --topic dummyTopic --partitions 1 --replication-factor 1
  • To check that the data was copied to kafka2, open another console tab and run this command (you might need to wait about 20 seconds for the controller to refresh):
./deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181/cluster2 --topic dummyTopic
  • You should see the same messages that were produced in kafka1:
Kafka topic dummy topic data 1
Kafka topic dummy topic data 2
Kafka topic dummy topic data 3
Kafka topic dummy topic data 4
…
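
Because auto-whitelisting only takes effect when the controller refreshes, a scripted check has to retry instead of running once. A minimal retry loop, sketched under the assumption that a command returning exit status 0 means "ready"; wait_until is a hypothetical helper, not a uReplicator tool.

```shell
# Hypothetical helper: reruns a command until it succeeds or the timeout
# (in seconds) is reached, sleeping INTERVAL seconds between attempts.
# Usage: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until() {
  local timeout="$1" interval="$2" elapsed=0
  shift 2
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${elapsed}s: $*" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}
```

For example, to poll the controller for up to 60 seconds (this assumes the controller answers /topics/dummyTopic with an HTTP error status until the topic is whitelisted, which you should confirm for your version):
wait_until 60 5 curl -sf -o /dev/null http://localhost:9000/topics/dummyTopic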

Shutdown

When you’re done, you can clean everything up with the stop-all.sh script:

./bin/pkg/stop-all.sh

Congratulations! You’ve now set up a local grid that includes Kafka and ZooKeeper, and you've run a uReplicator worker on it.

Start uReplicator on Docker

Example 3: Run uReplicator on Docker

  • Build the uReplicator devenv image:
docker build -t devenv devenv/.
  • Start the devenv container:
docker run -d -p 2181:2181 -p 9093:9093 -p 9094:9094  --add-host devenv:127.0.0.1 --name devenv devenv
  • Build the uReplicator image:
docker build -t ureplicator .
  • Start the uReplicator controller:
docker run -d --link devenv:devenv -p 9000:9000 --name controller --expose=9000 ureplicator "controller" -mode auto \
-enableAutoWhitelist true \
-port 9000 \
-refreshTimeInSeconds 10 \
-srcKafkaZkPath devenv:2181/cluster1 \
-zookeeper devenv:2181 \
-destKafkaZkPath devenv:2181/cluster2 \
-helixClusterName testMirrorMaker
  • Start the uReplicator worker:
docker run -d --link devenv:devenv --name worker ureplicator "worker" \
--consumer.config example/example-consumer.properties \
--producer.config example/example-producer.properties \
--helix.config example/example-helix.properties
  • Create the topic in kafka2. Example 3 enables topic auto-whitelisting, so you don't need to whitelist topics manually: if a topic exists in both the source and destination Kafka clusters, the controller whitelists it automatically and starts copying data.
docker exec -it devenv bash
$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181/cluster2 --topic dummyTopic --partitions 4 --replication-factor 1
  • Produce test data:
docker exec -it devenv bash
./usr/kafka/produce-data-to-kafka-topic-dummyTopic.sh 

  • To check that the data was copied to kafka2, open another console tab and run this command:
docker exec -it devenv bash 
$KAFKA_HOME/bin/kafka-console-consumer.sh --zookeeper localhost:2181/cluster2 --topic dummyTopic
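
The controller flags in Example 3 are easy to get subtly wrong, for instance by pointing -srcKafkaZkPath and -destKafkaZkPath at the same cluster. The sketch below, using a hypothetical controller_flags helper, prints the same flags as the docker run command above, one per line, and refuses identical source and destination paths. The defaults (devenv:2181, port 9000, testMirrorMaker, a 10-second refresh) are copied from that command.

```shell
# Hypothetical helper: prints the Example 3 controller flags, one per line.
# Fails if the source and destination ZooKeeper paths are identical, which
# would replicate a cluster onto itself.
# Usage: controller_flags SRC_ZK_PATH DEST_ZK_PATH [ZOOKEEPER] [PORT]
controller_flags() {
  local src="$1" dest="$2" zk="${3:-devenv:2181}" port="${4:-9000}"
  if [ -z "$src" ] || [ -z "$dest" ]; then
    echo "usage: controller_flags SRC_ZK_PATH DEST_ZK_PATH [ZOOKEEPER] [PORT]" >&2
    return 2
  fi
  if [ "$src" = "$dest" ]; then
    echo "source and destination ZooKeeper paths are identical: $src" >&2
    return 1
  fi
  printf '%s\n' -mode auto -enableAutoWhitelist true -port "$port" \
    -refreshTimeInSeconds 10 -srcKafkaZkPath "$src" -zookeeper "$zk" \
    -destKafkaZkPath "$dest" -helixClusterName testMirrorMaker
}
```

A possible invocation (the unquoted $flags is intentional so each line becomes one argument; none of the values contain spaces):
flags=$(controller_flags devenv:2181/cluster1 devenv:2181/cluster2) && docker run -d --link devenv:devenv -p 9000:9000 --name controller --expose=9000 ureplicator "controller" $flags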

Example 4: Run uReplicator with 2 workers

  • Build the Docker Compose services:
docker-compose -f docker-compose-example4.yml build
  • Start the Docker Compose services:
docker-compose -f docker-compose-example4.yml up
  • Produce test data:
docker exec -it ureplicator_devenv_1 bash
./usr/kafka/produce-data-to-kafka-topic-dummyTopic.sh 

  • Create a topic in kafka2 with 4 partitions:
docker exec -it ureplicator_devenv_1 bash
$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181/cluster2 --topic dummyTopic1 --partitions 4 --replication-factor 1
  • Add the topic to the uReplicator controller to start copying from kafka1 to kafka2:
curl -X POST -d '{"topic":"dummyTopic", "numPartitions":"4"}' http://localhost:9000/topics
  • To check that the data was copied to kafka2, open another console tab and run this command:
docker exec -it ureplicator_devenv_1 bash
$KAFKA_HOME/bin/kafka-console-consumer.sh --zookeeper localhost:2181/cluster2 --topic dummyTopic1
  • Check the topic's partition allocation:
curl localhost:9000/topics/dummyTopic | jq .

  • Check the online instances:
curl localhost:9000/instances | jq .
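
Each docker exec step in Examples 3 and 4 opens an interactive shell first. For scripting, the same commands can be passed to bash -c directly. A minimal sketch; in_devenv is a hypothetical helper, and it assumes KAFKA_HOME is set in the container's environment (not only in an interactive shell profile).

```shell
# Hypothetical helper: runs one command inside a devenv container without
# an interactive shell. Pass the command in single quotes so variables such
# as $KAFKA_HOME are expanded inside the container, not on the host.
# DOCKER may be overridden (e.g. with a wrapper); it defaults to docker.
# Usage: in_devenv CONTAINER 'COMMAND'
in_devenv() {
  local container="$1" cmd="$2"
  "${DOCKER:-docker}" exec "$container" bash -c "$cmd"
}
```

For example, the topic-creation step of Example 4 becomes:
in_devenv ureplicator_devenv_1 '$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181/cluster2 --topic dummyTopic1 --partitions 4 --replication-factor 1'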

About

Improvement of Apache Kafka Mirrormaker


Languages

  • Java 86.1%
  • Scala 11.5%
  • Shell 1.9%
  • Dockerfile 0.5%