Design and implementation of a textual domain-specific language for producing machine learning applications on data streams
Machine learning is one of the technological areas that has seen explosive growth in recent years. In the current information age, data is rarely produced or recorded without an accompanying machine learning or data-processing algorithm for denoising or automatic inference. Driven by the huge market demand for machine learning, a large number of low-code/no-code platforms, free or commercially licensed, offer graphical tools for designing ML solutions that are simple enough even for citizen developers. The vast majority of these platforms, however, operate only on static datasets, which must be imported into the corresponding package before any analysis can run. This thesis implements DSLs oriented towards machine learning and data-processing algorithms, applied primarily to live data, with the results also offered as streams. The goal is to create tools that can process data generated by cyber-physical systems and, through transformations into other DSLs, visualize them in dashboards. The stack is built on the following technologies:
- Apache Kafka
- Quix Streams
- River
- Docker
- Cassandra
- textX
A visual representation of the pipeline to be built is shown below:
```mermaid
graph LR
    IOT[IoT Devices] -->|Data| Kafka
    Kafka -->|Stream Data| Quix[Quix Streams]
    Quix -->|Filtered Data| River
    River -->|ML Results| Cassandra
    Kafka -->|Aggregated Data| Cassandra
```
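To make the middle of this pipeline concrete, here is a minimal sketch of what the Quix Streams to River stage could look like: it consumes JSON readings from a Kafka topic, scores each one with an online anomaly detector, keeps learning as data arrives, and publishes the results to another topic. The broker address, topic names, and feature layout are assumptions for illustration, not taken from the actual implementation.

```python
# Sketch of the Quix Streams -> River stage: score each incoming reading with
# an online anomaly detector, then update the model with it.
from quixstreams import Application
from river import anomaly

app = Application(
    broker_address="localhost:9092",   # assumed local broker
    consumer_group="ml-pipeline-sketch",
)
readings = app.topic("sensor-data", value_deserializer="json")   # illustrative topic
results = app.topic("ml-results", value_serializer="json")

model = anomaly.HalfSpaceTrees(seed=42)   # online anomaly detector from River

def score_and_learn(value: dict) -> dict:
    """Score one reading, then let the model learn from it."""
    features = {"temperature": value["temperature"]}   # illustrative feature
    score = model.score_one(features)
    model.learn_one(features)
    return {**value, "anomaly_score": score}

sdf = app.dataframe(readings)
sdf = sdf.apply(score_and_learn)
sdf = sdf.to_topic(results)

if __name__ == "__main__":
    app.run()
```

River models update one sample at a time, which is what lets them sit naturally inside a per-message streaming callback like this, with no batch retraining step.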
Each tool and its role in the project is described below.
The kafka folder contains scripts and configuration for setting up and managing a Kafka environment: producing and consuming messages, as well as administrative tasks such as creating and deleting topics. It also includes a Docker Compose file for spinning up a Kafka cluster.
Contains functions for Kafka administrative tasks using the confluent_kafka library, including creating and deleting topics via its AdminClient.
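As a sketch, assuming a broker at localhost:9092 and an illustrative topic name, such helpers could look like:

```python
# Minimal topic administration with confluent_kafka's AdminClient.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})   # assumed broker

def create_topic(name: str, partitions: int = 3) -> None:
    """Create a topic and block until the broker confirms it."""
    futures = admin.create_topics(
        [NewTopic(name, num_partitions=partitions, replication_factor=1)]
    )
    for topic, future in futures.items():
        future.result()   # raises on failure
        print(f"Created topic {topic}")

def delete_topic(name: str) -> None:
    """Delete a topic and block until the broker confirms it."""
    for topic, future in admin.delete_topics([name]).items():
        future.result()
        print(f"Deleted topic {topic}")

if __name__ == "__main__":
    create_topic("sensor-data")   # illustrative topic name
```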
consumer2.py: Script for consuming messages from a Kafka topic. Uses the confluent_kafka library to create a Kafka consumer and includes logic for handling messages and closing the consumer gracefully.
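A minimal consumer loop in that style might look like this, assuming a local broker; the topic and group names are illustrative:

```python
# Poll loop with error handling and graceful shutdown on Ctrl+C.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # assumed broker
    "group.id": "sketch-consumer",           # illustrative group id
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["sensor-data"])          # illustrative topic

try:
    while True:
        msg = consumer.poll(timeout=1.0)     # wait up to 1s for a message
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        print(f"{msg.topic()}[{msg.partition()}] {msg.value().decode('utf-8')}")
except KeyboardInterrupt:
    pass
finally:
    consumer.close()   # commit offsets and leave the group cleanly
```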
Docker Compose configuration file for setting up a Kafka cluster. Defines services for Kafka controllers and brokers, including the environment variables and dependencies of each service, as well as a Kafka UI service for managing the cluster.
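For orientation, a deliberately minimal single-node sketch is shown below, assuming the apache/kafka image, which runs a combined broker/controller in KRaft mode with usable defaults; the actual file in the repository defines separate controller and broker services plus the Kafka UI:

```yaml
# Minimal single-node sketch, not the project's multi-broker compose file.
services:
  kafka:
    image: apache/kafka:latest   # assumed image; single-node KRaft out of the box
    ports:
      - "9092:9092"              # expose the broker to the host
```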
Contains utility functions for parsing command-line arguments. Used by other scripts to standardize argument parsing.
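A sketch of what such a shared parsing helper can look like, assuming argparse; the option names here are illustrative, not the repository's actual flags:

```python
# Shared CLI options so producer and consumer scripts parse arguments the same way.
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Kafka script options")
    parser.add_argument("--bootstrap-servers", default="localhost:9092",
                        help="Kafka broker address")
    parser.add_argument("--topic", required=True,
                        help="Topic to produce to or consume from")
    parser.add_argument("--group-id", default=None,
                        help="Consumer group id (consumers only)")
    return parser.parse_args()
```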
Script for producing messages to a Kafka topic. Uses the confluent_kafka library to create a Kafka producer. Includes functions for constructing events and IDs, initializing namespaces, and handling delivery callbacks. Parses command-line arguments to configure the producer.
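A minimal sketch of such a producer with a delivery callback, assuming a local broker; the topic and event payload are illustrative:

```python
# Produce one JSON event and report delivery via a callback.
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})   # assumed broker

def delivery_report(err, msg):
    """Called once per message to report delivery success or failure."""
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()}[{msg.partition()}] @ {msg.offset()}")

event = {"device_id": "sensor-1", "temperature": 21.7}   # illustrative event
producer.produce(
    "sensor-data",                 # illustrative topic
    key=event["device_id"],
    value=json.dumps(event),
    callback=delivery_report,
)
producer.poll(0)    # serve the delivery callback
producer.flush()    # block until all queued messages are delivered
```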