Data Pipeline
The data pipeline is a CLI application used to insert a dataset into the database. Currently, it supports the following file types: `CSV`, `SHAPE`, and `CITYGML`. Each data source is described in a separate `.yaml` file, but multiple files can point to the same dataset ID.
- CLI execution:

```
DataPipeline datasource.yaml
```
- Options:
  - `-b | --behaviour`: behavior when inserting into a database:
    - `replace`: drop the existing table before inserting.
    - `skip`: do not insert when the table already exists.
    - `ignore`: always try to insert, regardless of whether the table already exists.
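The three behaviour modes can be summarized with a small sketch. This is not the pipeline's actual code (the real tool is a .NET CLI); the function name and the returned action labels are illustrative only.

```python
def resolve_behaviour(behaviour: str, table_exists: bool) -> list[str]:
    """Return the actions the importer would take for a target table."""
    if behaviour == "replace":
        # drop the existing table first, then insert
        return (["DROP TABLE"] if table_exists else []) + ["INSERT"]
    if behaviour == "skip":
        # do nothing when the table already exists
        return [] if table_exists else ["INSERT"]
    if behaviour == "ignore":
        # always insert, whether or not the table exists
        return ["INSERT"]
    raise ValueError(f"unknown behaviour: {behaviour}")
```

For example, `resolve_behaviour("skip", table_exists=True)` yields no actions, while `replace` drops the table before inserting.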
```yaml
# describe the source
source:
  # link | filepath
  type: URL
  # filepath or URL pointing to the file.
  location: https://data.bundesnetzagentur.de/Bundesnetzagentur/SharedDocs/Downloads/DE/Sachgebiete/Energie/Unternehmen_Institutionen/E_Mobilitaet/Ladesaeulenregister.csv
  # the format of the data. Options: CSV, SHAPE
  # SHAPE expects a .zip file.
  data_format: CSV
options:
  # skip lines at the beginning
  skip_lines: 0
  # discard any rows that have null values
  discard_null_rows: false
  # how to deal with an existing table. Options: ignore, replace, skip (default).
  if_table_exists: replace
  # the name of the table in the database
  table_name: EV_charging_stations
  # the delimiter used for the CSV datatype.
  delimiter: ";"
table_cols:
  - name: Betreiber
    name_in_table: operator
    # if this is true, any row that has this column as null is discarded. Defaults to false.
    is_not_nullable: true
  - name: Bundesland
    name_in_table: state
    # the SQL type of this column. Defaults to VARCHAR(500).
    type: INT
```
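The `table_cols` entries map source column names to table column names and SQL types, with the documented `VARCHAR(500)` default when no `type` is given. A minimal sketch of that mapping (hypothetical helper, not the pipeline's real code):

```python
def column_ddl(table_cols: list[dict]) -> list[str]:
    """Build SQL column definitions from parsed table_cols entries."""
    ddl = []
    for col in table_cols:
        # VARCHAR(500) is the documented default when no type is specified
        sql_type = col.get("type", "VARCHAR(500)")
        ddl.append(f"{col['name_in_table']} {sql_type}")
    return ddl

# the two columns from the example .yaml above
cols = [
    {"name": "Betreiber", "name_in_table": "operator", "is_not_nullable": True},
    {"name": "Bundesland", "name_in_table": "state", "type": "INT"},
]
```

With the example config, this would produce `operator VARCHAR(500)` and `state INT`.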
To make the import of a `.yaml` file work when starting the application from inside Visual Studio, you need to set it up in the project properties:

- Right-click on the solution -> Properties.
- Go to Debug.
- Click "Open debug launch profiles UI".
- Set the arguments to `test.yaml` (or another yaml file) and the working directory to the `/yaml` directory inside the project directory.
- Download the SQL Server Developer edition from here.
- Then install SQL Server.
- After that, download and install SQL Server Management Studio (SSMS) from here.
- Now you will be able to use SQL Server from SSMS.
- In SSMS, right-click on the database server, then go to Properties -> Security. Under Server authentication, select "SQL Server and Windows Authentication mode".
- Then restart the SQL Server.
- First, create the DB using the command below:

```sql
IF EXISTS (SELECT 1 FROM sys.databases WHERE [name] = 'BIEDB')
    DROP DATABASE [BIEDB]
GO
USE [master]
CREATE DATABASE [BIEDB]
GO
USE [BIEDB]
GO
```
- Now create a user by following the steps below:
  - In SSMS, navigate to Security.
  - Then enter the login name.
  - Then select SQL Server authentication and enter the password.
  - Then, in User Mapping, select the DB you created and also select db_owner.
- After the user is created, the necessary permissions need to be granted to it:
  - Go into the database properties.
  - Then into Permissions.
  - Then grant the relevant permissions, such as CREATE TABLE and INSERT.
To establish a connection with SQL Server, you need to pass the following environment variables:

- `DB_NAME=BIEDB` // Specifies the name of the database.
- `DB_PASSWORD=MyPass@1234` // Specifies the password for the SQL Server user profile.
- `DB_SERVER=db` // Specifies the hostname or IP address of the SQL Server instance.
- `DB_USERNAME=sa` // Specifies the SQL Server user profile name.
- `DB_TYPE=SQL` // Specifies the type of database server being used.
- `TRUSTED=False` // Setting this to false indicates that Windows Authentication is not used; SQL Server login credentials are provided instead.
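How these variables combine into a connection string can be sketched as follows. This is an assumption about the shape of an ADO.NET-style connection string, not the DataPipeline's actual implementation:

```python
def build_connection_string(env: dict) -> str:
    """Assemble a SQL Server connection string from the documented variables."""
    parts = [f"Server={env['DB_SERVER']}", f"Database={env['DB_NAME']}"]
    if env.get("TRUSTED", "False").lower() == "true":
        # Windows Authentication: no explicit credentials needed
        parts.append("Trusted_Connection=True")
    else:
        # SQL Server login: pass the user profile name and password
        parts.append(f"User Id={env['DB_USERNAME']}")
        parts.append(f"Password={env['DB_PASSWORD']}")
    return ";".join(parts)
```

With the example values above (`TRUSTED=False`), this yields a string carrying the server, database, user, and password fields.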
- For local use, you can pass environment variables by following these steps:
  - Right-click on the solution -> Properties.
  - Go to Debug.
  - Click "Open debug launch profiles UI".
  - Then set the environment variables.
- On the server, you can pass these parameters to the DataPipeline service as environment variables in the docker-compose file.
The CSV importer is used to read the data from a given CSV file and prepare it for storage in the database. The path to the CSV file and other arguments are given via the `.yaml` file.

The CSV importer can work with local CSV files (given as a file path) or remote CSV files (using URL links).

The filtering process inside the CSV importer includes:

- Removing empty rows
- Removing columns not mentioned in the yaml file
- Removing rows where fields that are marked as not nullable in the yaml file are empty
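The three filtering rules above can be illustrated with a short sketch. The function and its parameters are hypothetical; the real importer is part of the .NET pipeline:

```python
import csv
import io

def filter_rows(rows, keep_cols, not_nullable):
    """Apply the three documented filtering rules to parsed CSV rows."""
    out = []
    for row in rows:
        if all(v == "" for v in row.values()):
            continue  # rule 1: drop empty rows
        # rule 2: keep only columns mentioned in the yaml file
        row = {k: v for k, v in row.items() if k in keep_cols}
        if any(row.get(c, "") == "" for c in not_nullable):
            continue  # rule 3: drop rows with an empty not-nullable field
        out.append(row)
    return out

# tiny sample in the style of the charging-station CSV (";" delimiter)
data = "Betreiber;Bundesland;Extra\nE.ON;Bayern;x\n;;\n;Hessen;y\n"
rows = list(csv.DictReader(io.StringIO(data), delimiter=";"))
filtered = filter_rows(rows, {"Betreiber", "Bundesland"}, {"Betreiber"})
```

In the sample, the all-empty second row and the third row (empty not-nullable `Betreiber`) are discarded, and the `Extra` column is dropped from the surviving row.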