
Rework the Kafka Guide #124

Merged: 2 commits, Sep 17, 2024

Conversation

@matkuliak (Contributor) commented on Sep 16, 2024

Summary of the changes / Why this is an improvement

The original guide no longer worked. This version is similar, just dockerized and updated.

Preview

https://cratedb-guide--124.org.readthedocs.build/integrate/etl/kafka-connect.html


@amotl (Member) left a comment

Hi. Thanks a stack for bringing this up to speed. I did not verify the procedure, but the document looks good. In two spots, I added suggestions about converting to active voice, but overall it is good to go.

Comment on lines -149 to -222
For this example, this Python script will simulate the creation
of random sensor data and push it into the ``metrics`` topic:

.. code-block:: python

   import time
   import random

   from confluent_kafka import avro
   from confluent_kafka.avro import AvroProducer

   # Define the Avro schema we want our produced records to conform to.
   VALUE_SCHEMA_STR = """
   {
       "namespace": "cratedb.metrics",
       "name": "value",
       "type": "record",
       "fields": [
           {"name": "id", "type": "string"},
           {"name": "timestamp", "type": "float"},
           {"name": "payload", "type": {
               "type": "record",
               "name": "payload",
               "fields": [
                   {"name": "temperature", "type": "float"},
                   {"name": "humidity", "type": "float"},
                   {"name": "pressure", "type": "float"},
                   {"name": "luminosity", "type": "float"}
               ]
           }}
       ]
   }
   """

   # Load the Avro schema.
   VALUE_SCHEMA = avro.loads(VALUE_SCHEMA_STR)

   # Create an Avro producer using the defined schema, assuming that our
   # Kafka servers are running at localhost:9092 and the Schema Registry
   # server is running at localhost:8081.
   AVRO_PRODUCER = AvroProducer(
       {
           "bootstrap.servers": "localhost:9092",
           "schema.registry.url": "http://localhost:8081",
       },
       default_value_schema=VALUE_SCHEMA,
   )

   # Create a metric payload from a simulated sensor device.
   def create_metric():
       return {
           "id": "sensor-" + str(random.choice(list(range(1, 21)))),
           "timestamp": int(time.time()),
           "payload": {
               "temperature": random.uniform(-5, 35),
               "humidity": random.uniform(0, 100),
               "pressure": random.uniform(1000, 1030),
               "luminosity": random.uniform(0, 65000),
           },
       }

   # Create a new metric every 0.25 seconds and push it to the metrics topic.
   while True:
       AVRO_PRODUCER.produce(topic="metrics", value=create_metric())
       AVRO_PRODUCER.flush()
       time.sleep(0.25)

Install the dependencies, then run the script:

.. code-block:: console

   $ pip install "confluent-kafka[avro]" "avro-python3"
   $ python simulator.py
Member

I am sad to see this Python snippet removed. Maybe we can bring it back in one way or another, if not within this tutorial, maybe at another spot.

@matkuliak (Contributor, Author)

Absolutely. The current setup is pretty crude; I'm sure we can improve it with a similar script.
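A simpler, Avro-free variant of such a script might look like the sketch below. It is illustrative only: the topic name `metrics`, the broker address `localhost:9092`, and the metric shape are carried over from the removed snippet, and `confluent-kafka` must be installed for the producing part to run.

```python
import json
import random
import time


def create_metric():
    """Build one simulated sensor reading, mirroring the removed snippet."""
    return {
        "id": "sensor-" + str(random.choice(range(1, 21))),
        "timestamp": int(time.time()),
        "payload": {
            "temperature": random.uniform(-5, 35),
            "humidity": random.uniform(0, 100),
            "pressure": random.uniform(1000, 1030),
            "luminosity": random.uniform(0, 65000),
        },
    }


def run(bootstrap="localhost:9092", topic="metrics", interval=0.25):
    # Import deferred so create_metric() is usable without Kafka installed.
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": bootstrap})
    while True:
        # Plain JSON payload instead of Avro; no Schema Registry needed.
        producer.produce(topic, value=json.dumps(create_metric()))
        producer.flush()
        time.sleep(interval)


if __name__ == "__main__":
    run()
```

Dropping Avro trades schema enforcement for a smaller dependency footprint, which may fit a dockerized tutorial better.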

Comment on lines +4 to +6
This guide describes a dockerized procedure for integrating CrateDB with Kafka
Connect. By following these steps, you will set up a pipeline to ingest data
from Kafka topics into CrateDB seamlessly.
Member

Active voice, very good.
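The pipeline the quoted excerpt describes is typically wired up by registering a sink connector with the Kafka Connect REST API. The sketch below is an assumption-laden illustration, not taken from the guide: the connector name `cratedb-sink`, the connector class, the connection URL, and the Connect endpoint `localhost:8083` are all hypothetical placeholders.

```python
import json
import urllib.request

# Illustrative values only; the real connector class, connection URL,
# and endpoint come from the guide's dockerized setup.
CONNECT_URL = "http://localhost:8083/connectors"
CONNECTOR = {
    "name": "cratedb-sink",  # hypothetical connector name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "connection.url": "jdbc:crate://localhost:5432/",  # assumed CrateDB JDBC URL
        "topics": "metrics",
        "tasks.max": "1",
        "auto.create": "true",
    },
}


def register(url=CONNECT_URL, payload=CONNECTOR):
    """POST the connector definition to the Kafka Connect REST API."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Posting the same definition twice returns an error from Connect, so tutorials often pair this with a `DELETE /connectors/<name>` cleanup step.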

docs/integrate/etl/kafka-connect.md: two outdated review threads, resolved.
@matkuliak matkuliak merged commit eb62278 into main Sep 17, 2024
4 checks passed
@matkuliak matkuliak deleted the mm/kafka-connect-update branch September 17, 2024 12:32