Real-Time Demo

A demo of CrateDB's real-time analytics capabilities.

Architecture Overview

Architecture Diagram

The solution consists of three major layers running in the AWS Cloud:

Data Producer (Temperature Data)

Purpose: Simulate and send temperature data events into the system.

  • AWS EC2: Hosts a lightweight data generator script that loads temperature readings from a source dataset. These readings are serialized into JSON events.

  • Event Stream (Amazon MSK / Kafka): The producer publishes these temperature events to a Kafka topic. Kafka provides scalable, fault-tolerant buffering between producers and consumers.

Flow: EC2 → Kafka (MSK)
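As a minimal sketch of this publish step (not the repository's actual producer code), sending one temperature event to the Kafka topic with the kafka-python client could look like the following. The broker address is a placeholder, and any MSK authentication settings are omitted and must match your cluster:

# Minimal publish sketch using kafka-python; not the repository's producer.py.
# The broker address is a placeholder and MSK authentication is omitted.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["b-1.example.kafka.us-east-1.amazonaws.com:9092"],  # placeholder broker
    value_serializer=lambda doc: json.dumps(doc).encode("utf-8"),
)

event = {
    "timestamp": 1756684800000000000,
    "temperature": 295.6215515136719,
    "latitude": 40.90000000000115,
    "longitude": 31.100000000000172,
}

producer.send("dev-1", value=event)  # topic name as used in the Trigger section below
producer.flush()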

Data Consumer

Purpose: Listen for new temperature events, process them, and write batches into CrateDB.

  • AWS Lambda: Subscribes to the Kafka topic and is triggered on new events. It processes messages and groups them into batches.

  • CrateDB Cloud: The Lambda function writes the processed temperature data into CrateDB.

Flow: Kafka → Lambda → CrateDB
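An MSK trigger hands records to the Lambda function grouped by topic-partition, with each record value base64-encoded. The following sketch (not the repository's handler) shows how such an event can be decoded back into the JSON documents emitted by the producer:

# Sketch of decoding an Amazon MSK trigger event inside a Lambda handler.
# Illustrative only; the deployed handler lives in the msk-to-crate-python code.
import base64
import json

def handler(event, context):
    rows = []
    # event["records"] maps "topic-partition" keys to lists of records
    for records in event.get("records", {}).values():
        for record in records:
            payload = base64.b64decode(record["value"])
            rows.append(json.loads(payload))  # {"timestamp": ..., "temperature": ...}
    # rows would then be written to CrateDB in one batch (see the sketch in the
    # environment-variables section below)
    return {"batch_size": len(rows)}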

User Interface

Purpose: Visualize the temperature data in real time.

  • AWS EC2 (Frontend Host): Runs a Grafana instance, accessible to users via a browser.

  • Grafana Dashboard: Connects directly to CrateDB using its PostgreSQL endpoint and executes queries to fetch and visualize temperature metrics such as:

    • Current and historical readings
    • Geo/time-based charts
  • Users: Access the Grafana dashboard through a secure web interface.

Flow: CrateDB → Grafana → Users
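The kind of query Grafana runs can also be reproduced from Python against CrateDB's PostgreSQL endpoint (port 5432). The sketch below assumes psycopg2 and a hypothetical doc.temperature_readings table; host, credentials, and table name must be adjusted to your setup:

# Illustrative query against CrateDB's PostgreSQL endpoint, similar to what a
# Grafana panel would run. Host, credentials, and the table name are assumptions.
import psycopg2

conn = psycopg2.connect(
    host="your-cluster.cratedb.net",  # placeholder CrateDB Cloud host
    port=5432,
    user="admin",
    password="secret",
    dbname="doc",
)

with conn.cursor() as cur:
    cur.execute(
        """
        SELECT date_trunc('hour', "timestamp") AS ts_hour,
               avg(temperature) AS avg_temperature
        FROM doc.temperature_readings
        GROUP BY 1
        ORDER BY 1 DESC
        LIMIT 24
        """
    )
    for ts_hour, avg_temperature in cur.fetchall():
        print(ts_hour, avg_temperature)

conn.close()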

Data Producer

We use publicly available data from the Climate Data Store. The ERA5-Land data set includes atmospheric variables, such as air temperature and air humidity from around the globe.

data/parser.py generates a report for a given date range, parses the retrieved NetCDF file, and converts it to JSON documents. Below is an example of the final JSON document.

{
    "timestamp": 1756684800000000000,
    "temperature": 295.6215515136719,
    "latitude": 40.90000000000115,
    "longitude": 31.100000000000172
}
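A rough sketch of this conversion (not the actual data/parser.py) using xarray is shown below. The variable and coordinate names (t2m, valid_time, latitude, longitude) are assumptions about the downloaded ERA5-Land NetCDF file and may differ:

# Illustrative NetCDF-to-JSON conversion, loosely mirroring data/parser.py.
# Variable and coordinate names are assumptions about the ERA5-Land file layout.
import json
import xarray as xr

ds = xr.open_dataset("era5_land.nc")  # placeholder file name
temperature = ds["t2m"]  # 2 m air temperature in Kelvin (assumed variable name)

documents = []
for t in temperature["valid_time"].values:
    frame = temperature.sel(valid_time=t)
    for lat in frame["latitude"].values:
        for lon in frame["longitude"].values:
            value = float(frame.sel(latitude=lat, longitude=lon).values)
            if value != value:  # skip NaN grid points (ERA5-Land has no data over the sea)
                continue
            documents.append(
                {
                    "timestamp": int(t.astype("datetime64[ns]").astype("int64")),  # ns since epoch
                    "temperature": value,
                    "latitude": float(lat),
                    "longitude": float(lon),
                }
            )

print(json.dumps(documents[0], indent=4))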

Prerequisites

Climate Data Store API

We retrieve data from the Climate Data Store API, so you will need access to it. Please retrieve your personal CDS API key and store it in ~/.cdsapirc:

echo "url: https://cds.climate.copernicus.eu/api
key: INSERT-YOUR-KEY" > ~/.cdsapirc
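With the key in place, an ERA5-Land download can be requested from Python via the cdsapi client. The request fields below are illustrative only; the actual request is built by data/parser.py:

# Illustrative ERA5-Land download using the cdsapi client; the request fields
# are examples and not copied from data/parser.py.
import cdsapi

client = cdsapi.Client()  # reads credentials from ~/.cdsapirc
client.retrieve(
    "reanalysis-era5-land",
    {
        "variable": ["2m_temperature"],
        "year": "2025",
        "month": "09",
        "day": ["01"],
        "time": ["00:00"],
        "area": [55, 5, 47, 16],  # rough bounding box for Germany: North, West, South, East
        "data_format": "netcdf",
    },
    "era5_land.nc",
)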

AWS

We use Amazon Managed Streaming for Apache Kafka (MSK). Please set up credentials using aws configure.

Producer

To run the producer, we need to set up a virtual Python environment and install dependencies:

cd data
python3 -m venv .venv
source .venv/bin/activate
pip3 install -U -r requirements.txt

Next, copy the example .env file and adjust values as needed:

cp .env.example .env

Execution

To run the producer, simply execute it. If you want to adjust the date range that will be downloaded, please edit the main method accordingly.

python3 producer.py
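The exact contents of the main method are not reproduced in this README. Purely as a hypothetical illustration (the import, the method name generate_report, and its parameters are made up here; only Parser("DEU") appears in the repository), an adjusted date range might look like this:

# Hypothetical sketch of a date-range adjustment in producer.py's main method.
# The import, generate_report, and its parameters are illustrative, not taken
# from the repository.
from datetime import date

from parser import Parser  # assumed import, matching data/parser.py

parser = Parser("DEU")
parser.generate_report(start=date(2025, 9, 1), end=date(2025, 9, 7))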

Select alternate geo data

The producer has embedded data; by default it retrieves data from the Climate Data Store for Germany. To change this, update the following line in producer.py:

parser = Parser("DEU")

All ISO_A3 country codes should work here. Note that some countries are more complex, with multiple country bounds, some of them a distance away from each other.

Trigger

The trigger is configured to fire on a single new Kafka (MSK) record (batch size 1). To restart the ingest from the beginning, delete the existing trigger and create a new one; this is very straightforward. There is only one MSK cluster option. Make sure authentication is set, and to ignore bad test data, set the starting position to the timestamp 2025-10-14T00:00:00.000Z. Set the topic to dev-1 (or a different topic as needed).
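The same trigger can also be created programmatically. The boto3 sketch below mirrors those settings; the cluster ARN, function name, and authentication configuration are placeholders for your environment:

# Illustrative recreation of the MSK trigger with boto3; the ARN, function name,
# and authentication configuration are placeholders.
from datetime import datetime, timezone
import boto3

lambda_client = boto3.client("lambda")
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kafka:us-east-1:123456789012:cluster/example/abc",  # placeholder
    FunctionName="real-time-demo-app-msktocratepython",  # placeholder function name
    Topics=["dev-1"],
    BatchSize=1,
    StartingPosition="AT_TIMESTAMP",
    StartingPositionTimestamp=datetime(2025, 10, 14, tzinfo=timezone.utc),
    # SourceAccessConfigurations=[...],  # add the authentication settings your cluster requires
)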

Development

This repository uses black for consistent formatting of Python code. Please format your code before committing. To set up black, run these steps within your virtual Python environment:

cd data
pip3 install -U -r requirements-dev.txt
black .

Create Lambda code

From the msk-to-crate-python directory, run:

sam build && sam deploy

This will first build, then deploy the Lambda code (including any pip dependencies) to AWS Lambda. It will be viewable at:

https://us-east-1.console.aws.amazon.com/lambda/home?region=us-east-1#/functions/real-time-demo-app-msktocratepython-zHHmuN8vwScx?subtab=envVars&tab=configure

On macOS, the sam CLI can be installed with Homebrew from the aws-sam-cli formula.

The build step creates a hidden .aws-sam directory.

Several environment variables also need to be defined on the Lambda function (a usage sketch follows the list):

  • CRATEDB_DB: CrateDB database name (typically doc)
  • CRATEDB_HOST: CrateDB host name
  • CRATEDB_USER: CrateDB username
  • CRATEDB_PASS: CrateDB password
  • CRATEDB_PORT: CrateDB port for HTTP, typically 4200
  • SOURCE_TOPIC: Topic name, excluding the first part of the records' path
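To illustrate how these variables fit together, the sketch below reads them and performs a bulk insert through CrateDB's HTTP endpoint. It is not the deployed handler; the table name doc.temperature_readings and the http:// scheme are assumptions (CrateDB Cloud typically requires https://):

# Illustrative use of the Lambda environment variables for a bulk insert via
# CrateDB's HTTP _sql endpoint. Not the deployed handler; the table name and
# the http:// scheme are assumptions.
import base64
import json
import os
import urllib.request

CRATEDB_URL = f"http://{os.environ['CRATEDB_HOST']}:{os.environ.get('CRATEDB_PORT', '4200')}/_sql"
AUTH = base64.b64encode(
    f"{os.environ['CRATEDB_USER']}:{os.environ['CRATEDB_PASS']}".encode()
).decode()

def insert_batch(rows):
    """Bulk-insert decoded temperature documents (see the handler sketch above)."""
    payload = {
        "stmt": f"INSERT INTO {os.environ.get('CRATEDB_DB', 'doc')}.temperature_readings "
                '("timestamp", temperature, latitude, longitude) VALUES (?, ?, ?, ?)',
        "bulk_args": [
            [row["timestamp"], row["temperature"], row["latitude"], row["longitude"]]
            for row in rows
        ],
    }
    request = urllib.request.Request(
        CRATEDB_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {AUTH}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())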

Creating ZIP file for Lambda function

To generate a deployable ZIP archive containing the Lambda function's code and dependencies, run the corresponding bash script:

cd msk-to-cratedb-python
./create_zip_package.sh

The resulting ZIP file can then be used to deploy code to a newly created Lambda function. Make sure it is kept as a .zip file and not extracted.
