System Architecture
Our Building Information Enhancer tool is designed with a micro-service architecture in mind, being both easily expandable and scalable. Each of the services is deployed as a separate Docker container. The overall architecture of our system can be seen in the diagram below, with a detailed description of the individual components following the diagram.
Our React-based frontend is designed to be minimalistic, functional, and easily exchangeable. With one of our key requirements focusing on the backend data lake, while still needing to present our work in a simple, readable fashion, we followed the principle of "less is more". The frontend does not execute any data logic; it always fetches the data from the backend using HTTP REST API requests. It can therefore be easily exchanged for other designs, making the frontend and the backend fully independent of each other and dataset agnostic.
The communication between the frontend and the API Gateway happens through HTTP REST API requests with GeoJSON provided as the payload. The JSON structure for each of the endpoints can be found in the frontend/src/types folder or below.
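As an illustration of this request flow, a minimal sketch of such a frontend call is shown below. The gateway base URL, the import path, and the use of a plain GET request are assumptions for illustration; the endpoint name and the response type correspond to the definitions listed further down.

import { DatasetBasicData } from "./types/DatasetTypes"; // assumed file inside frontend/src/types

// Assumed address of the API Gateway container; in practice this comes from the deployment configuration.
const API_GATEWAY_URL = "http://localhost:8080";

/**
 * Fetches the list of all available datasets from the API Gateway.
 */
export async function fetchDatasetList(): Promise<DatasetBasicData[]> {
  const response = await fetch(`${API_GATEWAY_URL}/getDatasetLists`);
  if (!response.ok) {
    throw new Error(`Gateway request failed with status ${response.status}`);
  }
  return (await response.json()) as DatasetBasicData[];
}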
Following the principles of the API Gateway pattern, it is the first and only backend component to which the Frontend container connects. Our gateway provides all necessary API endpoints for accessing our backend micro-services, with its main function being HTTP request routing and the propagation of requests to the appropriate backend services. We provide the following endpoints:
- /getDatasetLists - for querying the metadata database and providing a list of all available datasets, including their name, ID, a short description, and the main menu icon.
/**
 * Type for the basicData from the metadata database.
 */
export interface DatasetBasicData {
  datasetId: string;
  name: string;
  shortDescription: string;
  icon: string;
}
- /getDatasetMetadata - for querying the metadata database with the dataset ID as a required parameter. Returns the metadata object of a specific dataset, including its map marker type, map icon, minimum zoom level, and possibly other dataset-specific values.
/**
 * Type of the additionalData from the metadata database.
 */
export interface DatasetMetaData {
  icon: string;
  type: MarkersTypes;
  longDescription: string;
  minZoomLevel: number;
  markersThreshold: number;
  displayProperty: DisplayProperty[];
  tables: Table[];
  polygonColoring: PolygonColoring | null;
}

/**
 * Display property type used for the marker popups.
 */
export interface DisplayProperty {
  displayName: string;
  value: string;
}

/**
 * Table type for storing the number of ingested lines for each dataset.
 */
export interface Table {
  name: string;
  numberOfLines: number;
}

/**
 * The map of types of colors for the polygons.
 */
export interface PolygonColoring {
  attributeName: string;
  colors: PolygonColor[];
}

/**
 * Individual entry in the color map.
 */
export interface PolygonColor {
  color: string;
  values: string[];
}
- /getDatasetViewportData - returns the data for a specified viewport window and the requested dataset. This endpoint corresponds to the viewport of the map window on the frontend, including the top-left and bottom-right coordinates of the map and the zoom level. The request is propagated to the API Composer, which returns a list of data points in GeoJSON's FeatureCollection<Geometry> format.
- /loadLocationData - returns data corresponding to a specific location (coordinate or area) selected on the frontend map. When called, it propagates the request to the API Composer, which queries all available datasets in the spatial database and aggregates their results.
/**
 * A response object from the location endpoint.
 */
export interface LocationDataResponse {
  individualData: DatasetItem[];
  selectionData: DatasetItem[];
}

/**
 * A single dataset row visible in the data view.
 */
export interface DatasetItem {
  displayName: string;
  value: string | null;
  datasetID: string | null;
  coordinate: number[] | null;
  subdata: SubdataItem[] | null;
}

/**
 * Sub rows for the data view and the dataset row.
 */
export interface SubdataItem {
  key: string;
  value: string;
}
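A minimal sketch of how the frontend might call this endpoint is shown below. The request body shape (selected coordinates and zoom level), the use of a POST request, and the import path are assumptions for illustration; the response types are the interfaces defined above.

import { LocationDataResponse, DatasetItem } from "./types/LocationDataTypes"; // assumed file inside frontend/src/types

const API_GATEWAY_URL = "http://localhost:8080"; // assumed gateway address

/**
 * Requests aggregated data for a location (point or area) selected on the map.
 * The payload field names are assumptions; the actual schema lives in frontend/src/types.
 */
export async function loadLocationData(
  selectedCoordinates: number[][],
  zoomLevel: number
): Promise<LocationDataResponse> {
  const response = await fetch(`${API_GATEWAY_URL}/loadLocationData`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ coordinates: selectedCoordinates, zoomLevel }),
  });
  if (!response.ok) {
    throw new Error(`Gateway request failed with status ${response.status}`);
  }
  return (await response.json()) as LocationDataResponse;
}

/**
 * Flattens the returned dataset rows into simple display strings for the data view.
 */
export function formatDatasetItems(items: DatasetItem[]): string[] {
  return items.map((item) => `${item.displayName}: ${item.value ?? "-"}`);
}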
The metadata database is responsible for storing all the metadata regarding the available datasets, including their names, descriptions, minimum zoom levels, icons, and possibly more. For our project, we are using the MongoDB Docker image and deploying it as a separate container, with other services connecting to it through MongoDB drivers (such as the MongoDB C# Driver).
The main purpose of the metadata database is to provide all necessary dataset information to all micro-services, behaving similarly to a dataset registry. By separating this metadata into its own database, we are able to scale the system both in the number of micro-services and in the number of supported geospatial databases. The structure of the metadata document can be seen below.
{
  basicData: {
    DatasetId: string,
    Name: string,
    ShortDescription: string,
    Icon: SVG
  },
  additionalData: {
    Icon: SVG,
    Type: "areas" | "markers" | "none",
    DataType: "SHAPE" | "CITYGML" | "CSV" | "none",
    LongDescription: string,
    MinZoomLevel: int,
    MarkersThreshold: int,
    DisplayProperty: [ {displayName: string, value: string} ],
    PolygonColoring: {attributeName: string, colors: [ {color: string, values: string[] } ]},
    Tables: [ {name: string, numberOfLines: number} ], // Filled in automatically by the data pipeline.
  },
}
Here the basicData section is fetched by the /getDatasetLists endpoint and contains the dataset's ID, name, short description, and icon. The additionalData section is fetched by the /getDatasetMetadata endpoint and contains the dataset's longer description, the type of map markers, the marker icon, two map thresholds (MinZoomLevel for the zoom warning and MarkersThreshold for the zoom level at which polygons are switched into markers), the DisplayProperty entries used for the on-click popups of markers, the PolygonColoring map of colors for polygons with specific attribute values, and the Tables section, which is filled in automatically by the data pipeline.
For each of the supported datasets, such a metadata JSON document is created. For more information about how the metadata database is used, visit the corresponding metadata page.
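As an illustration of how this metadata can be used, the sketch below shows how a frontend renderer could resolve a polygon's fill color from the PolygonColoring map. The feature-property lookup, the import path, and the grey fallback color are assumptions for illustration; the types are the ones defined above.

import { PolygonColoring } from "./types/DatasetMetadataTypes"; // assumed file inside frontend/src/types

/**
 * Resolves the fill color of a polygon from the dataset's PolygonColoring metadata.
 * Returns a neutral fallback color when no entry matches the attribute value.
 */
export function resolvePolygonColor(
  coloring: PolygonColoring,
  featureProperties: Record<string, string>
): string {
  const attributeValue = featureProperties[coloring.attributeName];
  const match = coloring.colors.find((entry) => entry.values.includes(attributeValue));
  return match ? match.color : "#808080"; // assumed fallback color
}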
The API Composer is the primary service responsible for querying geospatial databases, acting as a technical interface for accessing the data lake. Written in C#, it connects to these databases to retrieve:
- Specific datasets within a given viewport, corresponding to the /getDatasetViewportData endpoint.
- Aggregated results about a specific location, queried from multiple databases and tables, corresponding to the /loadLocationData endpoint.
The API Composer is designed to encapsulate the business layer logic of the product, with its main functions including querying the data lake based on dataset metadata and aggregating the data suitably for analysis. Finally, the results are returned to the API Gateway service. An example of on-demand aggregation, generating summaries for arbitrary polygons, can be seen below.
The data lake in our project serves the primary function of storing diverse datasets for user access. It supports multiple geospatial and non-geospatial databases through a microservice-oriented approach, with each database deployed as a separate service and accessible via the API Composer service. Currently, a single geospatial SQL database is in use, namely MS SQL Server with geo-query support.
This database employs several performance enhancement strategies, including specific data types, index creation, and suitable levels of accuracy. More details on these optimizations can be found here. Finally, specific datasets are ingested into the corresponding database using the Data Pipeline tool described below.
The C#-based Data Pipeline populates the database with the specified datasets, where for each dataset one or more data sources can be defined. Each data source is represented by a .yaml definition file containing information about the data's type, structure, and location, with one example of such a .yaml file visible below. The definition files are separated into multiple folders (common, development, and production), which are used appropriately in specific deployments.
During this process, a new instance of the data pipeline is set up for each of the .yaml files, and the data starts being ingested. The source of the data can be specified either as a local file path or as a URL used for download. Furthermore, appropriate conversions are performed on the data, some metrics are pre-calculated to further increase performance, database indexes are created, and bounding boxes are calculated. For more information about how to use and set up the Data Pipeline, visit the dedicated Data Pipeline page.
Supported data source formats:
- CSV
- SHAPE
- CITYGML
# Describe the source
source:
  # Type of the source: URL or filepath
  type: URL
  # File path or URL pointing to the data
  location: https://data.bundesnetzagentur.de/Bundesnetzagentur/SharedDocs/Downloads/DE/Sachgebiete/Energie/Unternehmen_Institutionen/E_Mobilitaet/Ladesaeulenregister.csv
  # The format of the data. Options: CSV, SHAPE, CITYGML
  # SHAPE expects a .zip file.
  data_format: CSV
options:
  # Skip lines at the beginning
  skip_lines: 0
  # Discard any rows that have null values
  discard_null_rows: false
  # How to deal with an existing table. Options: ignore, replace, skip (default).
  if_table_exists: replace
  # The name of the table in the database
  table_name: EV_charging_stations
  # The delimiter used for the CSV data type
  delimiter: ";"
  # Description of the table columns
  table_cols:
    - name: Betreiber
      name_in_table: operator
      # If true, any row that has this column as null is discarded. Defaults to false.
      is_not_nullable: true
    - name: Bundesland
      name_in_table: state
      # The SQL type of this column. Defaults to VARCHAR(500).
      type: INT