Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kafka integration #1

Open
wants to merge 61 commits into
base: main
Choose a base branch
from
Open

Kafka integration #1

wants to merge 61 commits into from

Conversation

hippalus
Copy link
Owner

@hippalus hippalus commented Dec 27, 2024

Fixes parseablehq#936

Description

This pull request implements the Kafka connector for Parseable by introducing better stream management, modularizing the code, and integrating Prometheus metrics. It also adds groundwork for dynamic configurations.

How It Works

Partition Management

  • The connector dynamically manages streams for each partition using StreamState.
  • Each Kafka message is routed to the corresponding PartitionStreamReceiver for processing.

Rebalance Handling

  • On partition assignment:
    • New streams are created for untracked partitions.
  • On partition revocation:
    • Streams for revoked partitions are terminated and cleaned up.

Stream Processing

  • The StreamWorker processes messages in batches, ensuring backpressure handling with configurable buffer_size and buffer_timeout.

Metrics

  • Exposed metrics via Prometheus include: Partition lag, consumer throughput, etc provided by rd-kafka statistics.

TODO

  1. Integration Tests:
    • Add tests using test containers to simulate Kafka.
  2. Docker Improvements:
    • Add build dependencies to the Dockerfile for rd-kafka
    • Add docker-compose to test distributed env.
  3. Metrics Collector:
    • Complete KafkaConsumerMetricsCollector

This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

…le for rdkafka dependencies. Implement retrying for consumer.rcv() fn to handle temporary Kafka unavailability.
Implement KafkaMetricsCollector to collect and expose Kafka client and broker metrics. Refactor ParseableServer.init(..) and connectors::init(..).
Repository owner deleted a comment from github-actions bot Jan 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feat: Add native Kafka integration for Parseable server
1 participant