Design and implementation of a textual domain-specific language for producing machine learning applications on data streams
Machine learning is one of the technological areas that has seen explosive growth in recent years. In the current information age, data is rarely produced or recorded without an accompanying machine learning or data-processing algorithm for denoising or automatic inference. Driven by the huge market demand for machine learning, a large number of low-code/no-code platforms, free or commercially licensed, offer graphical tools for designing ML solutions that are simple enough even for citizen developers. The vast majority of these platforms, however, operate only on static datasets, which must be imported into the corresponding package before any analysis can run. This thesis implements DSLs oriented towards machine learning and data-processing algorithms, applied primarily to live data, with the results also offered as streams. The goal is to create tools that can process data generated by cyber-physical systems and, through transformations into other DSLs, visualize them in dashboards. The stack is built on the following technologies:
- Apache Kafka
- Quix Streams
- River
- Docker
- Cassandra
- textX
A visual representation of the pipeline to be built is shown below:
```mermaid
graph LR
    IOT[IoT Devices] -->|Data| Kafka
    Kafka -->|Stream Data| Quix[Quix Streams]
    Quix -->|Filtered Data| River
    River -->|ML Results| Cassandra
    Kafka -->|Aggregated Data| Cassandra
```
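To make the middle of this pipeline concrete, here is a minimal sketch of what the Quix Streams to River stage could look like: it consumes JSON readings from a Kafka topic, scores each one with an online anomaly detector, keeps learning as data arrives, and publishes the results to another topic. The broker address, topic names, and feature layout are assumptions for illustration, not taken from the actual implementation.

```python
# Sketch of the Quix Streams -> River stage: score each incoming reading with
# an online anomaly detector, then update the model with it.
from quixstreams import Application
from river import anomaly

app = Application(
    broker_address="localhost:9092",   # assumed local broker
    consumer_group="ml-pipeline-sketch",
)
readings = app.topic("sensor-data", value_deserializer="json")   # illustrative topic
results = app.topic("ml-results", value_serializer="json")

model = anomaly.HalfSpaceTrees(seed=42)   # online anomaly detector from River

def score_and_learn(value: dict) -> dict:
    """Score one reading, then let the model learn from it."""
    features = {"temperature": value["temperature"]}   # illustrative feature
    score = model.score_one(features)
    model.learn_one(features)
    return {**value, "anomaly_score": score}

sdf = app.dataframe(readings)
sdf = sdf.apply(score_and_learn)
sdf = sdf.to_topic(results)

if __name__ == "__main__":
    app.run()
```

River models update one sample at a time, which is what lets them sit naturally inside a per-message streaming callback like this, with no batch retraining step.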
Each tool and its role in the project is described below.
The kafka folder contains scripts and configuration for setting up and managing a Kafka environment: producing and consuming messages, as well as administrative tasks such as creating and deleting topics. It also includes a Docker Compose file for spinning up a Kafka cluster.
Contains functions for Kafka administrative tasks using the confluent_kafka library, including creating and deleting topics via its AdminClient.
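As a sketch, assuming a broker at localhost:9092 and an illustrative topic name, such helpers could look like:

```python
# Minimal topic administration with confluent_kafka's AdminClient.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})   # assumed broker

def create_topic(name: str, partitions: int = 3) -> None:
    """Create a topic and block until the broker confirms it."""
    futures = admin.create_topics(
        [NewTopic(name, num_partitions=partitions, replication_factor=1)]
    )
    for topic, future in futures.items():
        future.result()   # raises on failure
        print(f"Created topic {topic}")

def delete_topic(name: str) -> None:
    """Delete a topic and block until the broker confirms it."""
    for topic, future in admin.delete_topics([name]).items():
        future.result()
        print(f"Deleted topic {topic}")

if __name__ == "__main__":
    create_topic("sensor-data")   # illustrative topic name
```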
consumer2.py: Script for consuming messages from a Kafka topic. Uses the confluent_kafka library to create a Kafka consumer and includes logic for handling messages and closing the consumer gracefully.
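A minimal consumer loop in that style might look like this, assuming a local broker; the topic and group names are illustrative:

```python
# Poll loop with error handling and graceful shutdown on Ctrl+C.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # assumed broker
    "group.id": "sketch-consumer",           # illustrative group id
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["sensor-data"])          # illustrative topic

try:
    while True:
        msg = consumer.poll(timeout=1.0)     # wait up to 1s for a message
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        print(f"{msg.topic()}[{msg.partition()}] {msg.value().decode('utf-8')}")
except KeyboardInterrupt:
    pass
finally:
    consumer.close()   # commit offsets and leave the group cleanly
```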
Docker Compose configuration file for setting up a Kafka cluster. Defines services for Kafka controllers and brokers, including the environment variables and dependencies of each service, as well as a Kafka UI service for managing the cluster.
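For orientation, a deliberately minimal single-node sketch is shown below, assuming the apache/kafka image, which runs a combined broker/controller in KRaft mode with usable defaults; the actual file in the repository defines separate controller and broker services plus the Kafka UI:

```yaml
# Minimal single-node sketch, not the project's multi-broker compose file.
services:
  kafka:
    image: apache/kafka:latest   # assumed image; single-node KRaft out of the box
    ports:
      - "9092:9092"              # expose the broker to the host
```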
Contains utility functions for parsing command-line arguments. Used by other scripts to standardize argument parsing.
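A sketch of what such a shared parsing helper can look like, assuming argparse; the option names here are illustrative, not the repository's actual flags:

```python
# Shared CLI options so producer and consumer scripts parse arguments the same way.
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Kafka script options")
    parser.add_argument("--bootstrap-servers", default="localhost:9092",
                        help="Kafka broker address")
    parser.add_argument("--topic", required=True,
                        help="Topic to produce to or consume from")
    parser.add_argument("--group-id", default=None,
                        help="Consumer group id (consumers only)")
    return parser.parse_args()
```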
Script for producing messages to a Kafka topic. Uses the confluent_kafka library to create a Kafka producer. Includes functions for constructing events and IDs, initializing namespaces, and handling delivery callbacks. Parses command-line arguments to configure the producer.
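A minimal sketch of such a producer with a delivery callback, assuming a local broker; the topic and event payload are illustrative:

```python
# Produce one JSON event and report delivery via a callback.
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})   # assumed broker

def delivery_report(err, msg):
    """Called once per message to report delivery success or failure."""
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()}[{msg.partition()}] @ {msg.offset()}")

event = {"device_id": "sensor-1", "temperature": 21.7}   # illustrative event
producer.produce(
    "sensor-data",                 # illustrative topic
    key=event["device_id"],
    value=json.dumps(event),
    callback=delivery_report,
)
producer.poll(0)    # serve the delivery callback
producer.flush()    # block until all queued messages are delivered
```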