Real-time distributed platform for analyzing maritime traffic data using Kafka, Flink, Druid, and Kubernetes.

Faxatos/AquaScope

AcquaScope: Maritime Traffic Analysis

Real-time distributed platform for analyzing maritime traffic data from multiple sources, developed by Emiliano Sescu and Giovanni Bellini.

Project for the Scalable and Distributed Computing Course - Final Grade: x/30

Problem Context

Maritime traffic monitoring is crucial for navigation safety, port management, and operational efficiency. The Automatic Identification System (AIS), installed aboard vessels, provides vital data on vessel position, speed, and route. However, AIS data is often scattered across multiple sources:

  • AIS antennas and VTS stations for coverage in coastal areas.
  • Satellite-based systems (e.g., INMARSAT) for remote or open ocean regions or for ensuring redundancy.

See the Map Example section for a real map containing these elements.

This project aims to build a scalable platform to ingest, process, and analyze these data streams, enhancing anomaly detection and decision-making capabilities for maritime operators.

Platform Architecture

The platform ensures scalability, fault tolerance, and low latency using modern distributed technologies.

[Figure: platform architecture diagram]

Objectives

  1. Ingest data: Data streams from AIS antennas, VTS stations, and satellite providers are fetched via provider APIs (for testing purposes, logs are generated) and ingested into Kafka clusters. Kafka acts as a durable buffer, ensuring reliable data delivery to downstream systems.

  2. Process data streams: Apache Flink processes data from Kafka in real time. This includes:

    • Identifying and tracking unregistered vessels in Cassandra.
    • Generating alarms when a vessel stops providing logs before reaching its destination, or when it deviates from its expected route beyond a defined threshold. All alarms are stored in Cassandra.
  3. Store data streams: Logs are stored in Apache Druid in daily segments for fast querying.

  4. Visualize results: The Operator UI provides a dynamic map that updates in real time to track vessel movements, along with dedicated pages for alarms and vessel details. Grafana enables monitoring of Kubernetes cluster performance.
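The two alarm rules from objective 2 can be illustrated with a minimal, framework-free sketch. This is not the project's Flink job. The `AisLog` field names, the threshold defaults, and the arrival radius are all hypothetical. Route deviation is approximated as the distance to the nearest waypoint of the expected route, which is simpler than a true distance-to-track computation.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class AisLog:
    """One position report; field names are illustrative assumptions."""
    vessel_id: str
    timestamp: float  # seconds since epoch
    lat: float        # degrees
    lon: float        # degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def deviation_km(log, route):
    """Distance from the reported position to the nearest route waypoint."""
    return min(haversine_km(log.lat, log.lon, wlat, wlon) for wlat, wlon in route)


def check_alarms(last_log, now, route,
                 max_silence_s=600.0,      # assumed threshold
                 max_deviation_km=5.0,     # assumed threshold
                 arrival_radius_km=1.0):   # assumed threshold
    """Return the alarm names raised by the latest log of one vessel.

    `route` is a list of (lat, lon) waypoints; the last one is the destination.
    """
    alarms = []
    dest_lat, dest_lon = route[-1]
    arrived = haversine_km(last_log.lat, last_log.lon, dest_lat, dest_lon) <= arrival_radius_km

    # Rule 1: the vessel went silent before reaching its destination.
    if not arrived and now - last_log.timestamp > max_silence_s:
        alarms.append("SILENT_BEFORE_DESTINATION")

    # Rule 2: the vessel is farther from its expected route than allowed.
    if deviation_km(last_log, route) > max_deviation_km:
        alarms.append("ROUTE_DEVIATION")

    return alarms
```

In a streaming deployment, the same checks would typically run per vessel key, with the silence rule driven by a timer rather than by an explicit `now` argument.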

Technology Stack

| Component | Technology |
| --- | --- |
| Ingestion | Apache Kafka |
| Stream Processing | Apache Flink |
| Real-time Storage | Apache Druid (logs), Cassandra (alarms & vessel info) |
| Block Storage | Longhorn (Kafka persistence, Druid metadata in PostgreSQL) |
| Object Storage | MinIO (Druid segments, Flink checkpoints) |
| Orchestration | Kubernetes (K8s) |
| Ingress Controller | Traefik |
| Continuous Deployment | ArgoCD |
| Infrastructure as Code | Ansible (with Kubespray) |
| Monitoring | Grafana (cluster state and performance) |

Deployment & Testing in UniPi Cluster

The platform was deployed and tested on a UniPi cluster consisting of four machines:

  • 1 control plane node running only Kubernetes.
  • 3 worker nodes hosting the distributed application components.

Services were distributed across multiple nodes to ensure fault tolerance. If a node failed, workloads were automatically rescheduled onto healthy nodes. For a better understanding of how fault tolerance was achieved through distribution, please refer to the documentation.

Orchestration & Deployment

  • Kubernetes managed the orchestration of Flink, Kafka, and Druid, ensuring scalability and high availability.
  • ArgoCD handled continuous deployment, automating application updates.
  • Traefik was used as the ingress controller for managing external traffic.
  • Ansible with Kubespray automated Kubernetes cluster provisioning and infrastructure setup.

Future Work

  1. Advanced Anomaly Detection:

    • Explore frameworks for applying machine learning to detect anomalies in vessel trajectories.
    • Integrate methodologies for detecting deviations from expected routes, clustering unusual patterns, etc.
  2. Multi-Region Support:

    • Optimize for global deployments with data replication and geo-distributed processing.

Map Example

[Figure: map example]

Acknowledgments

This map example is taken from a National Systems of Safety and Protection of Navigation paper.

License

Distributed under the MIT License. See LICENSE.txt for more information.
