Design and implementation of a textual domain language to produce machine learning applications on data streams

Description

Machine learning is one of the technological areas that has grown explosively in recent years. In today's information age, data is rarely produced or recorded without an accompanying machine learning algorithm or data-processing step for denoising or automatic inference. Driven by the huge market demand for machine learning, a large number of low-code/no-code platforms, free to use or commercially licensed, offer graphical design of ML solutions simple enough even for citizen developers. However, the vast majority of these platforms work on static datasets, which must first be imported into the corresponding package before any analysis can run. In this thesis, DSLs oriented towards machine learning and data-processing algorithms will be implemented and applied (mainly) to live data, offering their results as streams as well. The goal is to create tools that can process data generated by cyber-physical systems and visualize it in dashboards, through transformations to other DSLs.

Tools that are being used

- Apache Kafka
- Quix Streams
- River
- Docker
- Cassandra
- textX

A visual representation of the process that will be built is shown below:

```mermaid
graph LR
    IOT[IoT Devices] -->|Data| Kafka
    Kafka -->|Stream Data| Quix[Quix Streams]
    Quix -->|Filtered Data| River
    River -->|ML Results| Cassandra
    Kafka -->|Aggregated Data| Cassandra
```

Below, I explain each tool and how it is used in the project.

Kafka

Purpose

The kafka folder contains scripts and configurations for setting up and managing a Kafka environment. This includes producing and consuming messages, as well as administrative tasks such as creating and deleting topics. The folder also includes a Docker Compose file for setting up a Kafka cluster using Docker.

Files

admin_kafka.py

Contains functions for Kafka administrative tasks using the confluent_kafka library. Functions include creating and deleting topics, with example usage of AdminClient to manage Kafka topics.
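
As a rough sketch of the idea, assuming a broker at localhost:9092 (the helper names below are illustrative, not necessarily those in admin_kafka.py):

```python
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # assumed broker address

def create_topic(name: str, partitions: int = 3, replication: int = 1) -> None:
    """Create a topic and block until the broker confirms the operation."""
    futures = admin.create_topics([NewTopic(name, num_partitions=partitions,
                                            replication_factor=replication)])
    for topic, future in futures.items():
        try:
            future.result()  # raises on failure
            print(f"Topic '{topic}' created")
        except Exception as exc:
            print(f"Failed to create '{topic}': {exc}")

def delete_topic(name: str) -> None:
    """Delete a topic and block until the operation completes."""
    for topic, future in admin.delete_topics([name]).items():
        try:
            future.result()
            print(f"Topic '{topic}' deleted")
        except Exception as exc:
            print(f"Failed to delete '{topic}': {exc}")
```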

consumer2.py

Script for consuming messages from a Kafka topic. Uses the confluent_kafka library to create a Kafka consumer. Includes logic for handling messages and closing the consumer gracefully.
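
A minimal consumer loop in the same spirit, assuming a local broker and a hypothetical sensor-data topic; the actual message handling in consumer2.py is project-specific:

```python
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumed broker address
    "group.id": "demo-group",               # illustrative group id
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["sensor-data"])  # hypothetical topic name

try:
    while True:
        msg = consumer.poll(timeout=1.0)  # wait up to 1s for a message
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        # Handle the message (project-specific logic goes here)
        print(f"{msg.topic()}[{msg.partition()}] @ {msg.offset()}: "
              f"{msg.value().decode('utf-8')}")
except KeyboardInterrupt:
    pass
finally:
    consumer.close()  # commit final offsets and leave the group cleanly
```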

docker-compose.yml

Docker Compose configuration file for setting up a Kafka cluster. Defines services for Kafka controllers and brokers. Includes environment variables and dependencies for each service. Also includes a Kafka UI service for managing the Kafka cluster.
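
For orientation, a heavily stripped-down sketch of such a Compose file, assuming the apache/kafka image in single-node KRaft mode; the project's actual file splits controllers and brokers into separate services and adds the Kafka UI on top:

```yaml
# Single-node KRaft sketch for local development (illustrative only).
services:
  kafka:
    image: apache/kafka:3.7.0
    ports:
      - "9092:9092"   # host clients connect to localhost:9092
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller   # combined roles in one node
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1  # single broker, so RF = 1
```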

kafka_server_funcs.py

Contains utility functions for parsing command-line arguments. Used by other scripts to standardize argument parsing.
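
A sketch of what such a shared helper can look like; the actual flag names in kafka_server_funcs.py may differ:

```python
import argparse

def parse_kafka_args() -> argparse.Namespace:
    """Standardized CLI arguments shared by the producer/consumer scripts."""
    parser = argparse.ArgumentParser(description="Kafka client settings")
    parser.add_argument("--bootstrap-servers", default="localhost:9092",
                        help="Comma-separated list of broker addresses")
    parser.add_argument("--topic", required=True,
                        help="Topic to produce to or consume from")
    parser.add_argument("--group-id", default=None,
                        help="Consumer group id (consumers only)")
    return parser.parse_args()
```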

producer_v2.py

Script for producing messages to a Kafka topic. Uses the confluent_kafka library to create a Kafka producer. Includes functions for constructing events and IDs, initializing namespaces, and handling delivery callbacks. Parses command-line arguments to configure the producer.
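
A producer sketch along these lines, assuming a local broker and the same hypothetical sensor-data topic; the event and ID construction in producer_v2.py is project-specific:

```python
import json
import time
import uuid

from confluent_kafka import Producer

def delivery_callback(err, msg):
    """Report delivery success or failure for each produced message."""
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()}[{msg.partition()}] @ {msg.offset()}")

def build_event(value: float) -> bytes:
    """Construct an illustrative event with a generated id and timestamp."""
    return json.dumps({
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "value": value,
    }).encode("utf-8")

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed broker
for i in range(10):
    producer.produce("sensor-data", value=build_event(float(i)),
                     callback=delivery_callback)
    producer.poll(0)  # serve queued delivery callbacks without blocking
producer.flush()      # block until all outstanding messages are delivered
```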