
Apache Kafka®

What is Apache Kafka?

Apache Kafka is an open-source event streaming platform used to collect, process, store, and integrate data at scale in real time. It powers numerous use cases including stream processing, data integration, and pub/sub messaging.

Kafka was originally developed at LinkedIn, was open sourced in 2011, and became an Apache Software Foundation project in 2012. It is used by thousands of organizations globally to power mission-critical real-time applications, from stock exchanges to e-commerce applications to IoT monitoring and analytics, to name a few.

When to use this image

This Docker image runs a GraalVM-based native Kafka broker. GraalVM's ahead-of-time Native Image compilation turns the broker into a native binary executable, and the image runs in KRaft combined mode by default (i.e., it serves as both broker and KRaft controller). Compared to the JVM-based Apache Kafka image, the native binary offers the following benefits:

  1. Smaller image size (faster download time)
  2. Faster startup time
  3. Lower memory usage

Given these benefits, this image is well-suited for non-production development and testing scenarios. Testcontainers supports this image for automated unit or integration tests that require a Kafka cluster as opposed to a mock.

For more on the introduction of this image into the Apache Kafka project, refer to KIP-974.

Quick start

Start a Kafka broker, mapping the port that Kafka listens on to the same port on your host machine:

docker run -d -p 9092:9092 --name broker apache/kafka-native:latest
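
To verify that the broker container is up, you can use standard Docker commands (nothing here is specific to the kafka-native image):

docker ps --filter name=broker
docker logs broker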

Download Apache Kafka to get its command line tools; tools like kafka-topics.sh and kafka-console-producer.sh are not included in the kafka-native image. Once you have extracted the latest Kafka release, cd into its bin directory.

cd <KAFKA HOME>/bin/

A topic is a logical grouping of events in Kafka. Create a topic called test-topic:

./kafka-topics.sh --bootstrap-server localhost:9092 --create --topic test-topic
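
If you'd like to confirm that the topic exists, you can describe it with the same tool:

./kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic test-topic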

Write two string events into the test-topic topic using the console producer that ships with Kafka:

./kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test-topic

This command waits for input at a > prompt. Type hello and press Enter, then type world and press Enter again. Press Ctrl+C to exit the console producer.
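
The interaction looks roughly like this, where > is the producer's prompt and hello and world are the lines you type:

>hello
>world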

Now read the events in the test-topic topic from the beginning of the log:

./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --from-beginning

You will see the two strings that you previously produced:

hello
world

The consumer will continue to run until you exit it by pressing Ctrl+C.
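
Alternatively, the console consumer's --max-messages option makes it exit on its own after reading a given number of events, for example:

./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --from-beginning --max-messages 2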

When you are finished, stop and remove the container by running the following command on your host machine:

docker rm -f broker

Overriding the default broker configuration

Apache Kafka supports a broad set of broker configurations that you may override via environment variables. The environment variables must begin with KAFKA_, and any dots in broker configurations should be specified as underscores in the corresponding environment variable. For example, to set the default number of partitions in topics, num.partitions, set the environment variable KAFKA_NUM_PARTITIONS. See the Kafka Docker Image Usage Guide for more information on overriding broker configuration in Docker.
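
A few more examples of the naming convention (log.retention.hours and auto.create.topics.enable are simply other standard broker configurations, shown here for illustration):

num.partitions             ->  KAFKA_NUM_PARTITIONS
log.retention.hours        ->  KAFKA_LOG_RETENTION_HOURS
auto.create.topics.enable  ->  KAFKA_AUTO_CREATE_TOPICS_ENABLE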

Note that if you override any configuration, none of the default configurations will be used. For example, to run Kafka in KRaft combined mode (meaning that the broker handling client requests and the controller handling cluster coordination run in the same container) and set the default number of topic partitions to 3 instead of the default of 1, we specify KAFKA_NUM_PARTITIONS in addition to the other required configurations:

docker run -d  \
  -p 9092:9092 \
  --name broker \
  -e KAFKA_NODE_ID=1 \
  -e KAFKA_PROCESS_ROLES=broker,controller \
  -e KAFKA_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092 \
  -e KAFKA_CONTROLLER_LISTENER_NAMES=CONTROLLER \
  -e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT \
  -e KAFKA_CONTROLLER_QUORUM_VOTERS=1@localhost:9093 \
  -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
  -e KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR=1 \
  -e KAFKA_TRANSACTION_STATE_LOG_MIN_ISR=1 \
  -e KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=0 \
  -e KAFKA_NUM_PARTITIONS=3 \
  apache/kafka-native:latest
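
Once this container is running, you can check that the override took effect by creating a topic and describing it; with KAFKA_NUM_PARTITIONS=3 the topic should show three partitions (the topic name demo is arbitrary):

./kafka-topics.sh --bootstrap-server localhost:9092 --create --topic demo
./kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic demo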

Specifying this many environment variables on the command line gets cumbersome. It's simpler to use Docker Compose to specify and manage Kafka in Docker. Depending on how you installed Docker, you may already have Docker Compose; verify that it's available by checking that the following command succeeds, and refer to the Docker Compose installation documentation if it doesn't:

docker compose version

To run Kafka with Docker Compose and override the default number of topic partitions to be 3, first copy the following into a file named docker-compose.yml:

services:
  broker:
    image: apache/kafka-native:latest
    container_name: broker
    ports:
      - 9092:9092
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@localhost:9093
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_NUM_PARTITIONS: 3

Now, from the directory containing this file, bring Kafka up in detached mode so that the containers run in the background:

docker compose up -d
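
To confirm the broker container is running, you can use the standard Compose status command:

docker compose ps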

The quick start steps above will work the same way if you'd like to test topic creation and producing/consuming messages.

When you are finished, stop and remove the container by running the following command on your host machine from the directory containing the docker-compose.yml file:

docker compose down

Multiple nodes

In this section you will explore a more realistic Kafka deployment consisting of three brokers and three controllers, each running in its own container (i.e., KRaft isolated mode). We'll also configure it so that we can connect to Kafka from within Docker or from the host machine. Bear in mind that doing this exercise in Docker is a convenient way to learn about multi-broker configurations and the Kafka protocol, but this Docker Compose example isn't appropriate for a production deployment.

Compared to a single-node Kafka deployment, there is a bit more to do on the configuration front:

  1. KAFKA_PROCESS_ROLES is either broker or controller depending on the container's role, not the KRaft combined mode value broker,controller
  2. KAFKA_CONTROLLER_QUORUM_VOTERS is a comma-separated list of the three controllers
  3. We accept the default values for KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR (3), KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR (3), and KAFKA_TRANSACTION_STATE_LOG_MIN_ISR (2) now that there are enough brokers to support the default settings, so we don't specify these configurations (a partition's replicas must reside on different brokers for fault tolerance)
  4. Brokers have two listeners: one for communication within the Docker network, and one for connections from the host machine. Because Kafka clients connect directly to brokers after the initial bootstrap connection, one listener advertises the container name, which is resolvable by all containers on the Docker network; this listener is also used for inter-broker communication. The second listener advertises localhost on a unique port that gets mapped on the host (29092 for broker-1, 39092 for broker-2, and 49092 for broker-3). With one node, a single listener on localhost works because that name happens to be correct both within the container and from the host machine, but this doesn't hold in a multi-node setup.

To deploy this six-node setup on your machine, copy the following into a file named docker-compose.yml:

services:
  controller-1:
    image: apache/kafka-native:latest
    container_name: controller-1
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: controller
      KAFKA_LISTENERS: CONTROLLER://:9093
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@controller-1:9093,2@controller-2:9093,3@controller-3:9093
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0

  controller-2:
    image: apache/kafka-native:latest
    container_name: controller-2
    environment:
      KAFKA_NODE_ID: 2
      KAFKA_PROCESS_ROLES: controller
      KAFKA_LISTENERS: CONTROLLER://:9093
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@controller-1:9093,2@controller-2:9093,3@controller-3:9093
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0

  controller-3:
    image: apache/kafka-native:latest
    container_name: controller-3
    environment:
      KAFKA_NODE_ID: 3
      KAFKA_PROCESS_ROLES: controller
      KAFKA_LISTENERS: CONTROLLER://:9093
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@controller-1:9093,2@controller-2:9093,3@controller-3:9093
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0

  broker-1:
    image: apache/kafka-native:latest
    container_name: broker-1
    ports:
      - 29092:9092
    environment:
      KAFKA_NODE_ID: 4
      KAFKA_PROCESS_ROLES: broker
      KAFKA_LISTENERS: 'PLAINTEXT://:19092,PLAINTEXT_HOST://:9092'
      KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://broker-1:19092,PLAINTEXT_HOST://localhost:29092'
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@controller-1:9093,2@controller-2:9093,3@controller-3:9093
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
    depends_on:
      - controller-1
      - controller-2
      - controller-3

  broker-2:
    image: apache/kafka-native:latest
    container_name: broker-2
    ports:
      - 39092:9092
    environment:
      KAFKA_NODE_ID: 5
      KAFKA_PROCESS_ROLES: broker
      KAFKA_LISTENERS: 'PLAINTEXT://:19092,PLAINTEXT_HOST://:9092'
      KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://broker-2:19092,PLAINTEXT_HOST://localhost:39092'
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@controller-1:9093,2@controller-2:9093,3@controller-3:9093
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
    depends_on:
      - controller-1
      - controller-2
      - controller-3

  broker-3:
    image: apache/kafka-native:latest
    container_name: broker-3
    ports:
      - 49092:9092
    environment:
      KAFKA_NODE_ID: 6
      KAFKA_PROCESS_ROLES: broker
      KAFKA_LISTENERS: 'PLAINTEXT://:19092,PLAINTEXT_HOST://:9092'
      KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://broker-3:19092,PLAINTEXT_HOST://localhost:49092'
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@controller-1:9093,2@controller-2:9093,3@controller-3:9093
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
    depends_on:
      - controller-1
      - controller-2
      - controller-3

Start the containers from the directory containing the docker-compose.yml file:

docker compose up -d
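
One way to check that the three controllers have formed a quorum is the kafka-metadata-quorum.sh tool that ships with Kafka; run it from the same bin directory as the other CLI tools:

./kafka-metadata-quorum.sh --bootstrap-server localhost:29092 describe --status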

The above quick start works the same way, only now we include all three broker listener endpoints in the --bootstrap-server argument:

./kafka-topics.sh --bootstrap-server localhost:29092,localhost:39092,localhost:49092 --create --topic test-topic
./kafka-console-producer.sh --bootstrap-server localhost:29092,localhost:39092,localhost:49092 --topic test-topic
./kafka-console-consumer.sh --bootstrap-server localhost:29092,localhost:39092,localhost:49092 --topic test-topic --from-beginning
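
To connect from within Docker instead of the host, a client container attached to the same Compose network can bootstrap using the broker container names and the internal listener port 19092. The sketch below assumes the JVM-based apache/kafka image (which ships the CLI tools under /opt/kafka/bin) and Compose's default network name of <project directory>_default; adjust both to your setup:

docker run --rm --network <project-directory>_default apache/kafka:latest \
  /opt/kafka/bin/kafka-topics.sh --bootstrap-server broker-1:19092,broker-2:19092,broker-3:19092 --list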

When you are finished, stop and remove the Kafka deployment by running the following command on your host machine from the directory containing the docker-compose.yml file:

docker compose down

Additional resources