Rework the Kafka Guide #124
Conversation
Force-pushed from 66b387f to 09c21ee
Hi. Thanks a stack for bringing this up to speed. I did not verify the procedure, but the document looks good. In two spots, I added suggestions about converting to active voice, but in general it is good to go.
For this example, this Python script will simulate the creation
of random sensor data and push it into the ``metrics`` topic:

.. code-block:: python

    import time
    import random

    from confluent_kafka import avro
    from confluent_kafka.avro import AvroProducer

    # Define the Avro schema we want our produced records to conform to.
    VALUE_SCHEMA_STR = """
    {
        "namespace": "cratedb.metrics",
        "name": "value",
        "type": "record",
        "fields": [
            {"name": "id", "type": "string"},
            {"name": "timestamp", "type": "float"},
            {"name": "payload", "type": {
                "type": "record",
                "name": "payload",
                "fields": [
                    {"name": "temperature", "type": "float"},
                    {"name": "humidity", "type": "float"},
                    {"name": "pressure", "type": "float"},
                    {"name": "luminosity", "type": "float"}
                ]
            }}
        ]
    }
    """

    # Load the Avro schema.
    VALUE_SCHEMA = avro.loads(VALUE_SCHEMA_STR)

    # Create an Avro producer using the defined schema, assuming that our
    # Kafka servers are running at localhost:9092 and the Schema Registry
    # server is running at localhost:8081.
    AVRO_PRODUCER = AvroProducer(
        {
            "bootstrap.servers": "localhost:9092",
            "schema.registry.url": "http://localhost:8081",
        },
        default_value_schema=VALUE_SCHEMA,
    )

    # Create a metric payload from a simulated sensor device.
    def create_metric():
        return {
            "id": "sensor-" + str(random.choice(list(range(1, 21)))),
            "timestamp": int(time.time()),
            "payload": {
                "temperature": random.uniform(-5, 35),
                "humidity": random.uniform(0, 100),
                "pressure": random.uniform(1000, 1030),
                "luminosity": random.uniform(0, 65000),
            },
        }

    # Create a new metric every 0.25 seconds and push it to the metrics topic.
    while True:
        AVRO_PRODUCER.produce(topic="metrics", value=create_metric())
        AVRO_PRODUCER.flush()
        time.sleep(0.25)

To run this script, install the following dependencies, then execute it:

.. code-block:: console

    $ pip install "confluent-kafka[avro]" "avro-python3"
    $ python simulator.py
I am sad to see this Python snippet removed. Maybe we can bring it back in one way or another, if not within this tutorial, maybe at another spot.
Absolutely. The current setup is pretty crude; I'm sure we can improve it with a similar script.
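As one possible direction for such a replacement, here is a minimal sketch of a simulator that produces the same metric shape as plain JSON rather than Avro, so it needs no Schema Registry. It uses only the standard library; feeding the serialized record to a Kafka producer (e.g. ``confluent_kafka.Producer``) is left as a comment, and the sensor count and field ranges are carried over from the removed snippet, not verified against the new guide.

```python
import json
import random
import time


def create_metric():
    """Simulate one reading from one of 20 hypothetical sensor devices."""
    return {
        "id": f"sensor-{random.randint(1, 20)}",
        "timestamp": int(time.time()),
        "payload": {
            "temperature": random.uniform(-5, 35),
            "humidity": random.uniform(0, 100),
            "pressure": random.uniform(1000, 1030),
            "luminosity": random.uniform(0, 65000),
        },
    }


# Serialize to JSON; this string could be handed to any Kafka producer,
# e.g. producer.produce(topic="metrics", value=record).
record = json.dumps(create_metric())
print(record)
```

Because the payload is schema-less JSON, the Connect side would need a JSON converter instead of the Avro one; that trade-off is the main thing to weigh before reviving the script this way.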
This guide describes a dockerized procedure for integrating CrateDB with Kafka
Connect. By following these steps, you will set up a pipeline to ingest data
from Kafka topics into CrateDB seamlessly.
Active voice, very good.
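To make the pipeline described above concrete, here is a hedged sketch of what registering a CrateDB sink with Kafka Connect might look like. The connector name, topic, table, and connection URL are placeholder assumptions, not values from the guide; the configuration only builds and prints the JSON that would be POSTed to the Connect REST API.

```python
import json

# Hypothetical sink connector configuration. CrateDB speaks the PostgreSQL
# wire protocol, so a JDBC sink with a postgresql:// URL is a common choice;
# all concrete values below are illustrative placeholders.
connector_config = {
    "name": "cratedb-metrics-sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "metrics",
        "connection.url": "jdbc:postgresql://cratedb:5432/doc?user=crate",
        "tasks.max": "1",
        "insert.mode": "insert",
        "auto.create": "true",
    },
}

# Registering it would be a POST against the Kafka Connect REST API, e.g.:
#   curl -X POST -H "Content-Type: application/json" \
#        -d @connector.json http://localhost:8083/connectors
print(json.dumps(connector_config, indent=2))
```

This is a sketch of the general mechanism only; the dockerized guide itself remains the authoritative source for the exact connector settings.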
Co-authored-by: Andreas Motl <[email protected]>
Summary of the changes / Why this is an improvement
The original guide wasn't working anymore. The new one is similar, just dockerized and updated.
Preview
https://cratedb-guide--124.org.readthedocs.build/integrate/etl/kafka-connect.html