Investigate support for distributed kafka connect #109

Open
yatharthranjan opened this issue Aug 3, 2021 · 3 comments
Comments

@yatharthranjan
Member

The Kafka connectors (e.g. Fitbit) are currently deployed in standalone mode. To take full advantage of the scalability of Kubernetes, they could be deployed in distributed mode (in which the connectors themselves are stateless and store their state in Kafka). At KCL we have some use cases where running in distributed mode is necessary.

@blootsvoets
Contributor

blootsvoets commented Aug 23, 2021

This needs to be mulled over a bit. Pros and cons as I see them, using the current Dockerfiles:

- when in distributed mode, connectors must be provisioned twice: once when starting up the connector engine, and once by sending the JSON message that starts the connector.
- with the current Docker images, we can't just use a generic "kafka connector" chart, because it won't include all the connector plugins we use (s3, upload, jdbc, fitbit). So the JSON message can only be sent to the appropriate Kafka connector pod.
- liveness of the connector becomes harder to determine, since the probe cannot tell whether the connector plugin has crashed, has not yet been started, or has been actively stopped.
+ multiple nodes can be used for processing.
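
To make the liveness concern concrete, here is a minimal sketch of how a probe might interpret the Kafka Connect REST response from `GET /connectors/<name>/status`. The function name `classify_connector` and the verdict strings are hypothetical, and the state values are assumed to follow the usual Connect states (`RUNNING`, `PAUSED`, `FAILED`, `UNASSIGNED`; newer versions may also report `STOPPED`).

```python
from typing import Optional


def classify_connector(http_status: int, body: Optional[dict]) -> str:
    """Map a status response to a probe verdict (hypothetical labels)."""
    if http_status == 404:
        # The engine is up but nobody has sent the connector JSON yet.
        return "not-started"
    if http_status != 200 or body is None:
        return "unknown"
    # Collect the connector state plus the state of every task.
    states = [body.get("connector", {}).get("state")]
    states += [task.get("state") for task in body.get("tasks", [])]
    if "FAILED" in states:
        return "crashed"
    if "PAUSED" in states or "STOPPED" in states:
        return "stopped"
    if all(state == "RUNNING" for state in states):
        return "running"
    # e.g. UNASSIGNED while a rebalance is in progress
    return "unknown"
```

Only the "crashed" verdict would clearly justify restarting the pod; how to treat "not-started" and "stopped" is exactly the ambiguity described above.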

I can see this as an alternate run mode that could be added to the respective charts. The operator would then be in charge of sending the appropriate JSON file every time a pod restarts. This looks very error-prone to me.

Alternatively, each relevant Dockerfile is adapted to use distributed mode. The entry point script then polls until the distributed connector engine has started and sends a JSON file from a predefined path. If you could make the relevant change to the connector you want to run in distributed mode, we could adapt the Helm chart to cater for this change.
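
A rough sketch of what such an entry point helper could look like, assuming the engine exposes the standard Kafka Connect REST API and the JSON file has the usual `{"name": ..., "config": {...}}` shape. The function names, retry counts and paths are hypothetical and not taken from the current images. `PUT /connectors/<name>/config` is used because it creates or updates the connector, so repeating it on every pod restart should be harmless.

```python
import json
import time
import urllib.parse
import urllib.request


def wait_for_engine(base_url, attempts=60, delay=2.0,
                    opener=urllib.request.urlopen):
    """Poll GET /connectors until the Connect REST API answers with 200."""
    for _ in range(attempts):
        try:
            with opener(base_url + "/connectors") as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError and HTTPError are OSError subclasses: engine not ready.
            pass
        time.sleep(delay)
    return False


def build_register_request(base_url, doc):
    """Build the idempotent PUT /connectors/<name>/config request."""
    name = urllib.parse.quote(doc["name"], safe="")
    return urllib.request.Request(
        f"{base_url}/connectors/{name}/config",
        data=json.dumps(doc["config"]).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Possible use from an entry point (URL and path are assumptions):
#   with open("/etc/kafka-connect/connector.json") as f:
#       doc = json.load(f)
#   if wait_for_engine("http://localhost:8083"):
#       urllib.request.urlopen(build_register_request("http://localhost:8083", doc))
```

The `opener` parameter exists only so the polling loop can be exercised without a running engine.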

@keyvaann
Collaborator

If we alter the Dockerfile to use distributed mode, can we still run them with only a single instance?

@blootsvoets
Contributor

If it automatically starts up the actual connector during startup, then from the K8s point of view it would be the same. The only difference is that it will then store offsets in Kafka instead of in a persistent volume.
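
As an illustration of that difference, the sketch below contrasts the worker settings involved. The property keys are standard Kafka Connect worker settings; the group id, topic names and file path are made-up placeholders, and a replication factor of 1 is only an assumption for a small test setup.

```python
def standalone_worker_settings(offset_file):
    """Standalone mode: offsets live in a local file (hence the persistent volume)."""
    return {"offset.storage.file.filename": offset_file}


def distributed_worker_settings(group_id, replication_factor=1):
    """Distributed mode: offsets, configs and status live in Kafka topics."""
    settings = {"group.id": group_id}
    for kind in ("offset", "config", "status"):
        # Topic names are placeholders derived from the group id.
        settings[f"{kind}.storage.topic"] = f"{group_id}-{kind}s"
        settings[f"{kind}.storage.replication.factor"] = str(replication_factor)
    return settings


def to_properties(settings):
    """Render settings in the key=value form used by worker properties files."""
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))
```

A single replica in distributed mode would use the second set of settings and would no longer need a volume for offsets.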


3 participants