Real-time distributed platform for analyzing maritime traffic data from multiple sources, developed by Emiliano Sescu and Giovanni Bellini.
Project for the Scalable and Distributed Computing Course - Final Grade: x/30
Maritime traffic monitoring is crucial for navigation safety, port management, and operational efficiency. The Automatic Identification System (AIS), placed in every vessel, provides vital data on vessel position, speed, and route. However, AIS data is often scattered across:
- AIS antennas and VTS stations for coverage in coastal areas.
- Satellite-based systems (e.g., INMARSAT) for remote or open ocean regions or for ensuring redundancy.
Click here to have a look at a real map containing those elements.
This project aims to build a scalable platform to ingest, process, and analyze these data streams, enhancing anomaly detection and decision-making capabilities for maritime operators.
The platform ensures scalability, fault tolerance, and low latency using modern distributed technologies.
-
Ingest data: Data streams from AIS antennas, VTS stations, and satellite providers are fetched via provider APIs (for testing purposes logs are generated) ingested into Kafka clusters. Kafka acts as a durable buffer, ensuring reliable data delivery to downstream systems.
-
Process data streams: Flink Structured Streaming processes data from Kafka in real-time. This includes:
- Identify and track unregistered vessels in Cassandra.
- Generate alarms when a vessel stops providing logs before reaching its destination, or when it deviates from its expected route beyond a defined threshold. All alarms are stored in Cassandra.
-
Stores data streams: Logs are stored in Apache Druid in daily segments for fast querying.
-
Visualize results: The Operator UI provides a dynamic map that updates in real time to track vessel movements, along with dedicated pages for alarms and vessel details. Grafana enables monitoring of Kubernetes cluster performance.
Component | Technology |
---|---|
Ingestion | Apache Kafka |
Stream Processing | Apache Flink |
Real-time Storage | Apache Druid (logs), Cassandra (alarms & vessel info) |
Block Storage | Longhorn (Kafka persistence, Druid metadata in PostgreSQL) |
Object Storage | MinIO (Druid segments, Flink checkpoints) |
Orchestration | Kubernetes (K8s) |
Ingress Controller | Traefik |
Continuous Deployment | ArgoCD |
Infrastructure as Code | Ansible (with Kubespray) |
Monitoring | Grafana (Cluster state and performance) |
The platform was deployed and tested on a UniPi cluster consisting of four machines:
- 1 control plane node running only Kubernetes.
- 3 worker nodes hosting the distributed application components.
Services were distributed across multiple nodes to ensure fault tolerance. If a node failed, workloads were automatically rescheduled onto healthy nodes. For better understanding of how fault tolerance was achieved through distribution, please refer to the documentation.
- Kubernetes managed the orchestration of Flink, Kafka, and Druid, ensuring scalability and high availability.
- ArgoCD handled continuous deployment, automating application updates.
- Traefik was used as the ingress controller for managing external traffic.
- Ansible with Kubespray automated Kubernetes cluster provisioning and infrastructure setup.
-
Advanced Anomaly Detection:
- Explore frameworks for applying machine learning to detect anomalies in vessel trajectories.
- Integrate methodologies for detecting deviations from expected routes, clustering unusual patterns, etc.
-
Multi-Region Support:
- Optimize for global deployments with data replication and geo-distributed processing.
This map example is taken from a National Systems of Safety and Protection of Navigation paper.
Distributed under the MIT License. See LICENSE.txt
for more information.