This is the demo kit that accompanies the talk I gave at PyConZA 2015 on Apache Kafka. It contains scripts for demoing a producer and balanced consumers.
You can watch the talk here:
This demo requires Kafka 0.8.2. If you do not have access to a Kafka cluster, you can set one up in standalone mode by downloading it here:
and installing it following the instructions here:
You will also need the pykafka client library. I used version 2.0.0, which was the latest version at the time of writing. Get it using
pip install pykafka
or from the source repo here:
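Once pykafka is installed, you can do a quick connectivity check from Python. A minimal sketch, assuming a standalone broker on 127.0.0.1:9092 (the default); the pykafka import is deferred into the main block so the helper stays importable without a running broker:

```python
def list_topic_names(client):
    # client.topics maps topic names to Topic objects.
    return sorted(client.topics.keys())

if __name__ == "__main__":
    # Deferred import: only needed when actually talking to a broker.
    from pykafka import KafkaClient
    # Assumption: standalone broker on the default port.
    client = KafkaClient(hosts="127.0.0.1:9092")
    print(list_topic_names(client))
```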
Edit settings.conf in this repo to connect to your Kafka and Zookeeper hosts. By default, it is set up for a standalone Kafka installation on localhost.
For the tests below to run properly, create a topic called "test" by executing
cd /opt/kafka # This is the default install directory
./bin/kafka-topics.sh --topic test --partitions 6 --create \
--zookeeper localhost:2181 --replication-factor 1
The test topic will have 6 partitions (more on this in Demo 4 below).
To check that your topic got created properly, you can execute
./bin/kafka-topics.sh --describe --zookeeper localhost:2181
This will display information about all Kafka topics.
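You can also verify the partition count from Python. A sketch using pykafka, again assuming the standalone broker address:

```python
def partition_count(topic):
    # topic.partitions maps partition ids to Partition objects.
    return len(topic.partitions)

if __name__ == "__main__":
    from pykafka import KafkaClient  # deferred: needs a running broker
    client = KafkaClient(hosts="127.0.0.1:9092")
    # The "test" topic created above should report 6 partitions.
    print(partition_count(client.topics["test"]))
```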
python demo-producer.py
This will produce messages as fast as possible and display how many got produced, once every 5 seconds.
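For reference, the core of such a producer might look like the sketch below. This is an illustration, not the actual demo-producer.py; the topic name and broker address are the defaults assumed above:

```python
def make_message(n):
    # pykafka takes message payloads as bytes.
    return ("message %d" % n).encode("utf-8")

if __name__ == "__main__":
    from pykafka import KafkaClient  # deferred: needs a running broker
    client = KafkaClient(hosts="127.0.0.1:9092")
    topic = client.topics["test"]
    # get_producer() returns an async producer usable as a context manager.
    with topic.get_producer() as producer:
        for n in range(1000):
            producer.produce(make_message(n))
```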
python demo-consumer.py
This will consume already-produced messages as fast as possible and display how many got consumed, once every 5 seconds. It will also show the list of partitions from which messages were consumed. More on partitions in Demo 4.
The consumer will block when it runs out of messages.
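A consumer along these lines might look like the following pykafka sketch (an illustration, not the actual demo-consumer.py; broker address assumed as above):

```python
def partitions_seen(messages):
    # The sorted set of partition ids the messages were read from.
    return sorted(set(m.partition_id for m in messages))

if __name__ == "__main__":
    from pykafka import KafkaClient  # deferred: needs a running broker
    client = KafkaClient(hosts="127.0.0.1:9092")
    topic = client.topics["test"]
    consumer = topic.get_simple_consumer()
    # Iteration blocks once all available messages have been consumed.
    for message in consumer:
        print(message.offset, message.partition_id)
```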
To delete all Kafka logs and Zookeeper data, you can do the following. This is for test purposes only and should obviously never be done in production.
- Stop your Kafka server
- Stop your Zookeeper server
- Run rm -Rf /tmp/zookeeper/*
This is the default data location for a standalone Zookeeper install. See the dataDir property in config/zookeeper.properties for where your installation is storing things.
- Run rm -Rf /tmp/kafka-logs/*
This is the default data location for a standalone Kafka install. See the log.dirs property in config/server.properties for where your installation is storing its Kafka logs.
- Start your Zookeeper server
- Start your Kafka server
- Re-create your Kafka topics
Run demo-producer.py and a few copies of demo-consumer.py simultaneously, in different terminals. You should see that the different consumers automatically read from different partitions. If a consumer is killed or added, the partitions being read will automatically be rebalanced between the consumers.
Note that this will work only if your topic has more than one partition; if you have more consumers than partitions, the extra consumers will sit idle. See the Configuration section above for how to create a topic with multiple partitions.
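Such a consumer can be built with pykafka's get_balanced_consumer. The sketch below also includes a toy round-robin split to illustrate the idea of dividing partitions among group members; the group name and addresses are assumptions, and the split function is only an illustration, not pykafka's actual assignment algorithm:

```python
def split_partitions(partition_ids, n_consumers, my_index):
    # Toy round-robin split: NOT pykafka's real assignment algorithm,
    # just an illustration of dividing partitions among consumers.
    return [p for i, p in enumerate(sorted(partition_ids))
            if i % n_consumers == my_index]

if __name__ == "__main__":
    from pykafka import KafkaClient  # deferred: needs a running cluster
    client = KafkaClient(hosts="127.0.0.1:9092")
    topic = client.topics["test"]
    consumer = topic.get_balanced_consumer(
        consumer_group="demo-group",        # illustrative group name
        zookeeper_connect="127.0.0.1:2181",  # standalone ZK default
    )
    for message in consumer:
        print(message.offset, message.partition_id)
```

With the 6-partition "test" topic and 3 consumers, each consumer ends up reading from 2 partitions.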