As of 11/20/2018, the previous master branch has moved to branch-0.1. The new master is backward-compatible.
=========
uReplicator replicates data across Kafka clusters in different data centers. Instead of publishing to a single Kafka cluster, you can publish data to multiple regional Kafka clusters and aggregate it all into one Kafka cluster.
=========
Kafka's current MirrorMaker design (part of 0.8.2) consumes data from a given regional Kafka cluster using a Kafka high-level consumer. With this design, a rebalance in the high-level consumer (triggered by adding or deleting topics, source cluster problems, network issues, and so on) affects all the topics being replicated through that MirrorMaker. uReplicator is designed to provide:
- Stability: rebalancing occurs only during startup (when a node is added or deleted)
- Simple operations: the cluster is easy to scale up, and whitelisting topics requires no server restart
- High throughput: max offset lag is consistently 0
- Time SLA: ~5 min
Check out the uReplicator project:
git clone git@github.com:uber/uReplicator.git
cd uReplicator
This project contains everything (both mirrormaker-controller and mirrormaker-worker) you’ll need to run uReplicator.
Before you can run uReplicator, you need to build a package for it. This package is what your deployment tool uses to deploy uReplicator.
mvn clean package
Or, to skip the tests (the full build above takes a long time), run:
mvn clean package -DskipTests
To test uReplicator locally, you need two systems: Kafka and ZooKeeper. The “grid” script helps you set up both.
- Make the scripts generated by Maven executable:
chmod u+x bin/pkg/*.sh
- Download, install, and start ZooKeeper and Kafka. This starts two Kafka clusters: kafka1, which serves as the source cluster, and kafka2, which serves as the destination cluster:
bin/grid bootstrap
- Create a topic named dummyTopic in kafka1 and produce some dummy data to it:
./bin/produce-data-to-kafka-topic-dummyTopic.sh
- To check that the data was produced to kafka1, open another console tab and run:
./deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181/cluster1 --topic dummyTopic
- You should get this data:
Kafka topic dummy topic data 1
Kafka topic dummy topic data 2
Kafka topic dummy topic data 3
Kafka topic dummy topic data 4
…
Example 1: Copy data from source cluster to destination cluster
- Start the uReplicator controller and keep it running:
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-controller-example1.sh
- Start the uReplicator worker and keep it running. A kafka.consumer.ConsumerTimeoutException at this point is normal, because no topic has been added for copying yet:
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-worker-example1.sh
- Add the topic to the uReplicator controller to start copying from kafka1 to kafka2:
curl -X POST -d '{"topic":"dummyTopic", "numPartitions":"1"}' http://localhost:9000/topics
- To check that the data was copied to kafka2, open another console tab and run:
./deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181/cluster2 --topic dummyTopic1
- You should see the same messages that were produced to kafka1:
Kafka topic dummy topic data 1
Kafka topic dummy topic data 2
Kafka topic dummy topic data 3
Kafka topic dummy topic data 4
…
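The `curl` command in Example 1 whitelists one topic per request. When several topics need copying, a small wrapper can repeat that same request. This is a minimal sketch, assuming the controller listens on http://localhost:9000 as in Example 1; the helper names `topic_payload` and `whitelist_topics` and the `CONTROLLER_URL` variable are hypothetical and not part of uReplicator.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of uReplicator): whitelist several topics by
# repeating the POST /topics request from Example 1.

# Controller base URL; defaults to the local example controller on port 9000.
CONTROLLER_URL="${CONTROLLER_URL:-http://localhost:9000}"

# Print the JSON body used in Example 1 for one topic and its partition count.
topic_payload() {
  local topic="$1" partitions="$2"
  printf '{"topic":"%s", "numPartitions":"%s"}' "$topic" "$partitions"
}

# Whitelist every "topic:partitions" argument, stopping if curl itself fails
# (for example, when the controller cannot be reached).
whitelist_topics() {
  local spec topic partitions
  for spec in "$@"; do
    topic="${spec%%:*}"       # text before the colon
    partitions="${spec##*:}"  # text after the colon
    curl -sS -X POST -d "$(topic_payload "$topic" "$partitions")" \
      "$CONTROLLER_URL/topics" || return 1
    echo  # put each controller reply on its own line
  done
}
```

For example, `whitelist_topics dummyTopic:1 anotherTopic:4` sends two requests (anotherTopic is a hypothetical second topic). Kafka topic names cannot contain a colon, so splitting on `:` is unambiguous.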
Example 2: Copy data from source Kafka to destination Kafka cluster without explicitly whitelisting topics
- Start the uReplicator controller and keep it running:
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-controller-example2.sh
- Start the uReplicator worker and keep it running. A kafka.consumer.ConsumerTimeoutException at this point is normal, because no topic has been added for copying yet:
./uReplicator-Distribution/target/uReplicator-Distribution-pkg/bin/start-worker-example2.sh
- Create the topic in kafka2. Example 2 enables topic auto-whitelisting, so you don't need to whitelist topics manually: if a topic exists in both the source and destination Kafka clusters, the controller whitelists it automatically and starts copying data.
./deploy/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181/cluster2 --topic dummyTopic --partitions 1 --replication-factor 1
- To check that the data was copied to kafka2, open another console tab and run the command below (you might need to wait about 20 seconds for the controller to refresh):
./deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181/cluster2 --topic dummyTopic
- You should see the same messages that were produced to kafka1:
Kafka topic dummy topic data 1
Kafka topic dummy topic data 2
Kafka topic dummy topic data 3
Kafka topic dummy topic data 4
…
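The auto-whitelisting rule in Example 2 amounts to an intersection of topic names: a topic qualifies when it exists in both clusters. The sketch below reproduces only that stated rule from the command line, so you can preview which topics should qualify; it is an illustration, not the controller's code, and it ignores anything else the controller may check. `common_topics` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of uReplicator): print the topics that appear
# in both newline-separated lists, i.e. the ones Example 2's auto-whitelisting
# rule would pick up. Arguments: $1 = source topics, $2 = destination topics.
common_topics() {
  # Drop blank lines and sort both lists in a fixed (C) collation, as comm requires.
  LC_ALL=C comm -12 \
    <(printf '%s\n' "$1" | sed '/^$/d' | LC_ALL=C sort -u) \
    <(printf '%s\n' "$2" | sed '/^$/d' | LC_ALL=C sort -u)
}

# Example against the local grid used in Example 2:
# common_topics \
#   "$(./deploy/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181/cluster1)" \
#   "$(./deploy/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181/cluster2)"
```

`comm -12` suppresses the lines unique to either input and keeps only the shared ones, which is why both lists must be sorted the same way first.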
When you’re done, you can stop and clean up everything with:
./bin/pkg/stop-all.sh
Congratulations! You’ve now set up a local grid that includes Kafka and ZooKeeper, and you've run a uReplicator worker on it.
Example 3: Run uReplicator on Docker
- Build the uReplicator devenv image:
docker build -t devenv devenv/.
- Start the devenv container:
docker run -d -p 2181:2181 -p 9093:9093 -p 9094:9094 --add-host devenv:127.0.0.1 --name devenv devenv
- Build the uReplicator image:
docker build -t ureplicator .
- Start the uReplicator controller:
docker run -d --link devenv:devenv -p 9000:9000 --name controller --expose=9000 ureplicator "controller" -mode auto \
-enableAutoWhitelist true \
-port 9000 \
-refreshTimeInSeconds 10 \
-srcKafkaZkPath devenv:2181/cluster1 \
-zookeeper devenv:2181 \
-destKafkaZkPath devenv:2181/cluster2 \
-helixClusterName testMirrorMaker
- Start the uReplicator worker:
docker run -d --link devenv:devenv --name worker ureplicator "worker" \
--consumer.config example/example-consumer.properties \
--producer.config example/example-producer.properties \
--helix.config example/example-helix.properties
- Create the topic in kafka2. Example 3 enables topic auto-whitelisting, so you don't need to whitelist topics manually: if a topic exists in both the source and destination Kafka clusters, the controller whitelists it automatically and starts copying data.
docker exec -it devenv bash
$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181/cluster2 --topic dummyTopic --partitions 4 --replication-factor 1
- Produce test data:
docker exec -it devenv bash
./usr/kafka/produce-data-to-kafka-topic-dummyTopic.sh
- To check that the data was copied to kafka2, open another console tab and run:
docker exec -it devenv bash
$KAFKA_HOME/bin/kafka-console-consumer.sh --zookeeper localhost:2181/cluster2 --topic dummyTopic
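In Example 3 the controller and worker start detached (`docker run -d`), so the controller's REST port may not answer right away. This is a minimal sketch of a hypothetical `retry_until` helper (not part of uReplicator) that re-runs a check until it succeeds. The commented usage line assumes the `-p 9000:9000` mapping above and polls the `/instances` endpoint that Example 4 also queries.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of uReplicator): run a command repeatedly
# until it succeeds or the attempt budget is exhausted.
# Usage: retry_until <attempts> <interval-seconds> <command> [args...]
retry_until() {
  local attempts="$1" interval="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    # Stop as soon as the command exits with status 0.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if ((i < attempts)); then
      sleep "$interval"
    fi
  done
  echo "retry_until: gave up after $attempts attempts: $*" >&2
  return 1
}

# Example: wait roughly a minute for the controller started above to answer.
# retry_until 30 2 curl -sf http://localhost:9000/instances
```

`curl -f` makes curl exit non-zero on HTTP errors as well as on refused connections, so the loop keeps waiting until the controller returns a successful response.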
Example 4: Run uReplicator with 2 workers
- Build the Docker Compose services:
docker-compose -f docker-compose-example4.yml build
- Start the Docker Compose services:
docker-compose -f docker-compose-example4.yml up
- Produce test data:
docker exec -it ureplicator_devenv_1 bash
./usr/kafka/produce-data-to-kafka-topic-dummyTopic.sh
- Create the topic in kafka2 with 4 partitions:
docker exec -it ureplicator_devenv_1 bash
$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181/cluster2 --topic dummyTopic1 --partitions 4 --replication-factor 1
- Add topic to uReplicator Controller to start copying from kafka1 to kafka2:
curl -X POST -d '{"topic":"dummyTopic", "numPartitions":"4"}' http://localhost:9000/topics
- To check that the data was copied to kafka2, open another console tab and run:
docker exec -it ureplicator_devenv_1 bash
$KAFKA_HOME/bin/kafka-console-consumer.sh --zookeeper localhost:2181/cluster2 --topic dummyTopic1
- Check how the topic's partitions are allocated:
curl localhost:9000/topics/dummyTopic | jq .
- Check the online instances:
curl localhost:9000/instances | jq .