How to persist data when used with Docker Swarm? #263

Open
ikus060 opened this issue Feb 9, 2018 · 13 comments
@ikus060

ikus060 commented Feb 9, 2018

Mostly a question about how to get this setup working.
I want to start Kafka on Docker Swarm. When I reboot the service, I don't want to lose any data. Reading previous, I've already created a volume for /kafka. The issue I have right now is the broker id: I understand it needs to be static for each instance.
How can I make it static when using deploy: mode: global?
Here is my docker-compose:

version: '3.2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"

  kafka:
    image: wurstmeister/kafka
    ports:
      - target: 9094
        published: 9094
        protocol: tcp
        mode: host
    environment:
      HOSTNAME_COMMAND: "docker info | grep ^Name: | cut -d' ' -f 2"
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_ADVERTISED_PROTOCOL_NAME: OUTSIDE
      KAFKA_ADVERTISED_PORT: 9094
      KAFKA_PROTOCOL_NAME: INSIDE
      KAFKA_PORT: 9092
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'true'
      JMX_PORT: 9010
      KAFKA_JMX_OPTS: "-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=kafka -Dcom.sun.management.jmxremote.rmi.port=9010"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /srv/kafka:/kafka
    deploy:
      mode: global

  kafka-manager:
    image: sheepkiller/kafka-manager
    ports:
    - target: 9000
      published: 9090
      protocol: tcp
      mode: ingress
    environment:
      KM_CONFIG: /srv/kafka-manager/custom.conf
      ZK_HOSTS: "zookeeper:2181"
      APPLICATION_SECRET: letmein
      #KAFKA_MANAGER_AUTH_ENABLED: 'true'
      #KAFKA_MANAGER_USERNAME: admin
      #KAFKA_MANAGER_PASSWORD: password
    volumes:
      - /srv/kafka-manager:/srv/kafka-manager
    deploy:
      placement:
        constraints:
        - node.role==manager
@saule1508

Same question here. I guess we have to use BROKER_ID_COMMAND; in the docs there is an example:

BROKER_ID_COMMAND: "hostname | awk -F'-' '{print $2}'"

However, the broker id must be an integer (starting at 0 and incrementing), so we need a command that generates such an integer.

@jordijansen

jordijansen commented Mar 26, 2018

Yeah, I have the same issue. My current workaround is to have 3 service definitions (kafka1, kafka2, kafka3), each with a static KAFKA_BROKER_ID and placement on a specific node using:

  kafka1:
    deploy:
      placement:
        constraints:
        - node.labels.sector == messaging
        - node.hostname == EVT01

  kafka2:
    deploy:
      placement:
        constraints:
        - node.labels.sector == messaging
        - node.hostname == EVT02

  kafka3:
    deploy:
      placement:
        constraints:
        - node.labels.sector == messaging
        - node.hostname == EVT03
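
For completeness, a sketch of one such service definition with the static broker id included (the image and values are illustrative, based on the compose file earlier in the thread, not jordijansen's exact config):

  kafka1:
    image: wurstmeister/kafka
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    volumes:
      - /srv/kafka:/kafka
    deploy:
      placement:
        constraints:
        - node.labels.sector == messaging
        - node.hostname == EVT01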

@saule1508

It makes sense; I think it is the way to go, and besides I did not like using deploy: mode: global (it is overkill).

Something else that puzzles me: in start-kafka.sh there is this piece of code that generates a new logs directory every time the container is started (because the container's hostname changes):

if [[ -z "$KAFKA_LOG_DIRS" ]]; then
    export KAFKA_LOG_DIRS="/kafka/kafka-logs-$HOSTNAME"
fi

so in the docker-compose we must set the KAFKA_LOG_DIRS environment variable, I believe:

    environment:
      KAFKA_LOG_DIRS: "/kafka/kafka-logs"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /srv/kafka:/kafka

@raarts

raarts commented May 16, 2018

Here is my swarm stack setup:

version: "3.2"

services:
  zoo1:
    image: zookeeper
    volumes:
      - zookeeper_conf:/conf
      - zookeeper_data:/data
      - zookeeper_datalog:/datalog
    networks:
      - zookeeper
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
    deploy:
      placement:
        constraints:
          - node.labels.zookeeper == 1
      replicas: 1
      restart_policy:
        delay: 2s
        window: 20s

  zoo2:
    image: zookeeper
    volumes:
      - zookeeper_conf:/conf
      - zookeeper_data:/data
      - zookeeper_datalog:/datalog
    networks:
      - zookeeper
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zoo3:2888:3888
    deploy:
      placement:
        constraints:
          - node.labels.zookeeper == 2
      replicas: 1
      restart_policy:
        delay: 2s
        window: 20s

  zoo3:
    image: zookeeper
    volumes:
      - zookeeper_conf:/conf
      - zookeeper_data:/data
      - zookeeper_datalog:/datalog
    networks:
      - zookeeper
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=0.0.0.0:2888:3888
    deploy:
      placement:
        constraints:
          - node.labels.zookeeper == 3
      replicas: 1
      restart_policy:
        delay: 2s
        window: 20s

  kafka:
    image: wurstmeister/kafka:latest
    ports:
      - target: ${KAFKA_PORT}
        published: ${KAFKA_PORT}
        protocol: tcp
        mode: host
    networks:
      - zookeeper
      - kafka
    volumes:
      - kafka_data:/kafka
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_ADVERTISED_HOST_NAME: "{{.Node.Hostname}}"
      KAFKA_ADVERTISED_PROTOCOL_NAME: OUTSIDE
      KAFKA_ADVERTISED_PORT: ${KAFKA_PORT}
      KAFKA_PROTOCOL_NAME: INSIDE
      KAFKA_PORT: 9094
      KAFKA_LOG_DIRS: /kafka/logs
    deploy:
      mode: global
      placement:
        constraints:
          - node.labels.kafka == True
      restart_policy:
        delay: 2s
        window: 20s

# local-persist driver only works with swarm if it finds at least one volume at startup
volumes:
  zookeeper_conf:
    driver: local-persist
    driver_opts:
      mountpoint: /data/${ENV}/${CI_PROJECT_PATH_SLUG}/zookeeper/conf
  zookeeper_data:
    driver: local-persist
    driver_opts:
      mountpoint: /data/${ENV}/${CI_PROJECT_PATH_SLUG}/zookeeper/data
  zookeeper_datalog:
    driver: local-persist
    driver_opts:
      mountpoint: /data/${ENV}/${CI_PROJECT_PATH_SLUG}/zookeeper/datalog
  kafka_data:
    driver: local-persist
    driver_opts:
      mountpoint: /data/${ENV}/${CI_PROJECT_PATH_SLUG}/kafka/data

networks:
  zookeeper:
    driver: overlay
  kafka:
    driver: overlay

Basically I label a few nodes with kafka: True. I also use the local-persist volume driver on every node.
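
For anyone unfamiliar with node labels: they can be added from a manager node, for example (the node name is a placeholder):

docker node update --label-add kafka=True <node-name>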

@saule1508

You also need a KAFKA_BROKER_ID for each Kafka instance.

What I did in the end is assign a docker label to each node that contains the broker id for that node (a unique integer starting at 1).

Then I use BROKER_ID_COMMAND to look up the node's label and derive the broker id (the command is fairly simple; I don't have it here). Since the command runs inside the container, /var/run/docker.sock must be bind-mounted into the container.
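
As an illustration only, such a command could look roughly like the sketch below, assuming the label is an engine label named broker.id set in each node's daemon.json (the label name and mechanism are assumptions; saule1508's actual command may differ):

    environment:
      # hypothetical engine label broker.id=<N> configured on each Docker daemon
      BROKER_ID_COMMAND: "docker info | grep broker.id | cut -d'=' -f2"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock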

@raarts

raarts commented May 19, 2018

Specifying a broker id is not required; Kafka will automatically assign one. The id is saved in the volume, so if the container is destroyed and a new one mounts that volume, the same broker id will be used.
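
(For context: the generated id is persisted in a meta.properties file under the log directory, which is why it survives as long as that directory lives on the volume. The file typically looks something like the following; the value is illustrative, as auto-generated ids start above 1000:)

# <log dir>/meta.properties
version=0
broker.id=1001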

@frranck

frranck commented Jul 23, 2018

"Reading previous"
@ikus060 Reading previous what ? I'm looking for instruction to persist log on a single instance, but can't find any.

@jordijansen

@frranck you should set the KAFKA_LOG_DIRS environment variable. By default this Kafka image creates a new folder based on the hostname of the container (which is dynamic in Docker). So set the env variable and mount a volume at that location.
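
For a single instance, a minimal sketch could look like this (the host path /srv/kafka and the zookeeper service name are assumptions, not something from frranck's setup):

  kafka:
    image: wurstmeister/kafka
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LOG_DIRS: /kafka/kafka-logs
    volumes:
      - /srv/kafka:/kafka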

@tunix

tunix commented Nov 14, 2018

@raarts How do you connect to those Kafka instances from other services inside the swarm? As far as I understand, the hostname is basically the container id, and whenever the container restarts, that hostname changes. What am I missing?

@raarts

raarts commented Nov 14, 2018

@tunix if you look at the stack config file, you'll notice there's a 'kafka' network on which all kafka services are located. The kafka service has deploy mode: global and runs only on nodes where node.labels.kafka == True. In my case that is three machines, so there are three kafka nodes.

Every service that wants to talk to Kafka needs to be put on the kafka network and can then just use kafka as the hostname. You can test this by entering a container and pinging kafka from there.
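
For example, another service in the same stack could be attached like this (the service name, image, and env var are hypothetical; the port must match the INSIDE listener of the kafka service):

  my-consumer:
    image: example/my-consumer
    networks:
      - kafka
    environment:
      KAFKA_BOOTSTRAP_SERVERS: kafka:9092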

@tunix

tunix commented Nov 14, 2018

> @tunix if you look at the stack config file, you'll notice there's a 'kafka' network on which all kafka services are located. The kafka service has deploy mode: global and runs only on nodes where node.labels.kafka == True. In my case that is three machines, so there are three kafka nodes.
>
> Every service that wants to talk to Kafka needs to be put on the kafka network and can then just use kafka as the hostname. You can test this by entering a container and pinging kafka from there.

I have almost the same configuration, but since I wasn't able to telnet to port 9094 of the Kafka instances, I wasn't sure whether they're communicating properly. (I currently don't know why they're not responding to telnet, but I can see some logs from the Kafka service regarding partition assignments etc.)

Also, writing {{.Node.Hostname}} essentially outputs something like ip-1.2.3.4 (on AWS), and I think Kafka binds to that address. Instead, I used a HOSTNAME_COMMAND that resolves the IP address of the node, and put the IP addresses of the kafka-labelled hosts into the app configuration (1.2.3.4:9094, 1.2.3.5:9094, ...). Although this does work, it's less fault tolerant than your configuration.
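
(For reference, on AWS such a HOSTNAME_COMMAND could query the EC2 instance metadata service, as in the sketch below; this is an assumption about the approach, not necessarily the exact command used:)

    environment:
      # resolves the node's private IPv4 via the EC2 metadata endpoint (assumes IMDSv1)
      HOSTNAME_COMMAND: "wget -t3 -T2 -qO- http://169.254.169.254/latest/meta-data/local-ipv4"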

Could you please share your thoughts?

@raarts

raarts commented Nov 14, 2018

I use netstat -plan | grep LISTEN from the container itself to see exactly where services bind, then try to telnet to that port from inside the container. I also use tcpdump to check whether a container is sending data.

Also, from inside a container you cannot bind to the external address unless you use network mode host for the container, but if you do that, the container cannot be on internal networks.

Hope this helps.

@hossein-kshvrz

@jordijansen you're right. But when I set the log directory I still have the problem that my topics are not persistent: the next time I deploy the stack, Kafka doesn't recognize them. However, when I go to the log directory that I set, I see the logs, and I also see the directories named after my topics from the previous deployment. Do you have any idea what the problem is?
