Kafka connector integration #3398

Open
killme2008 opened this issue Feb 27, 2024 · 2 comments
Labels: C-feature, help wanted

Comments

@killme2008 (Contributor)

What type of enhancement is this?

API improvement, User experience, Other

What does the enhancement do?

Kafka is a popular open-source messaging queue used by many applications for message and data transmission in big-data environments. It is supported as an ingestion data source by numerous databases and data warehouses.

For example, ClickHouse provides a Kafka table engine: https://clickhouse.com/docs/en/engines/table-engines/integrations/kafka

CREATE TABLE queue (
    timestamp UInt64,
    level String,
    message String
) ENGINE = Kafka('localhost:9092', 'topic', 'group1', 'JSONEachRow');

Databend has a standalone ingestor project, bend-ingest-kafka: https://github.com/databendcloud/bend-ingest-kafka

bend-ingest-kafka \
  --kafka-bootstrap-servers="127.0.0.1:9092,127.0.0.2:9092" \
  --kafka-topic="Your Topic" \
  --kafka-consumer-group="Consumer Group" \
  --databend-dsn="http://root:[email protected]:8000" \
  --databend-table="db1.tbl" \
  --data-format="json" \
  --batch-size=100000 \
  --batch-max-interval=300s

Apache Doris supports creating a routine load job to import data from Kafka: https://doris.apache.org/docs/dev/data-operate/import/import-scenes/kafka-load/

CREATE ROUTINE LOAD demo.my_first_routine_load_job ON test_1
COLUMNS TERMINATED BY ",",
PROPERTIES
(
    "max_batch_interval" = "20",
    "max_batch_rows" = "300000",
    "max_batch_size" = "209715200",
)
FROM KAFKA
(
   "kafka_broker_list"= "broker1:9091,broker2:9091",
   "kafka_topic" = "my_topic",
   "property.security.protocol" = "ssl",
   "property.ssl.ca.location" = "FILE:ca.pem",
   "property.ssl.certificate.location" = "FILE:client.pem",
   "property.ssl.key.location" = "FILE:client.key",
   "property.ssl.key.password" = "abcdefg"
);

Since GreptimeDB is a time series database (TSDB), I believe it should also support a Kafka connector. This could be implemented either as an independent project (like Databend's ingestor) or as a table engine extension similar to ClickHouse's.

I am not sure which approach is preferable, but I personally lean toward the latter. I have opened this issue to start a discussion on the matter.
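
To make the table-engine option concrete, here is a rough sketch of what such syntax could look like in GreptimeDB, modeled on the ClickHouse example above. The Kafka engine name, the option keys, and the format value are all hypothetical placeholders for discussion, not an existing API:

-- Hypothetical sketch only: none of this syntax exists today.
-- The engine name and WITH options are placeholders modeled on
-- ClickHouse's Kafka engine.
CREATE TABLE kafka_metrics (
    host STRING,
    ts TIMESTAMP,
    cpu FLOAT64,
    TIME INDEX (ts),
    PRIMARY KEY (host)
) ENGINE = Kafka
  WITH (
    'kafka.bootstrap.servers' = 'localhost:9092',
    'kafka.topic' = 'metrics',
    'kafka.consumer.group' = 'greptime-group1',
    'format' = 'json'
  );

With a shape like this, the database itself would own consumer offsets and batching, much as ClickHouse's Kafka engine does.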

Implementation challenges

No response

@killme2008 added the help wanted and C-feature labels on Feb 27, 2024
@killme2008 changed the title from Kafka ingestor integration to Kafka connector integration on Feb 27, 2024
@tisonkun (Collaborator)

"as a table extension similar to ClickHouse"

Currently, the CREATE TABLE clause in GreptimeDB doesn't support switching engines, does it? From the docs, it seems we can only define the columns, with a "default engine".
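
(For context, a minimal sketch of the current grammar as I read the GreptimeDB docs: CREATE TABLE does carry an ENGINE clause, but mito is effectively the only regular table engine today, which matches the observation above.)

-- Sketch from my reading of the current docs: an ENGINE clause is
-- accepted, with mito as the default storage engine.
CREATE TABLE monitor (
    host STRING,
    ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP(),
    cpu FLOAT64 DEFAULT 0,
    TIME INDEX (ts),
    PRIMARY KEY (host)
) ENGINE = mito;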

@sunng87 (Member) commented Feb 28, 2024

I'm +1 for a standalone ingester for better scalability, especially in a shared environment.
