danielwiegand/bahn_delays

Monitor train delays

A website to visualize train delays (under development)

Goal

Using Deutsche Bahn's Timetables API for timetables and timetable changes (https://developer.deutschebahn.com/store/apis/info?name=Timetables&version=v1&provider=DBOpenData&#/), this app collects and displays delay data for trains departing from a specific train station.
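
To illustrate the idea of deriving a delay from a planned timetable entry and a later change, here is a minimal sketch. It is not taken from this repository: the function name `delay_minutes` is hypothetical, and the compact 10-digit `YYMMddHHmm` timestamp format is an assumption about how the API encodes times, so verify it against the API documentation.

```python
from datetime import datetime
from typing import Optional

# Assumption: timestamps arrive as compact 10-digit strings (YYMMddHHmm).
TIME_FORMAT = "%y%m%d%H%M"


def delay_minutes(planned: str, changed: Optional[str]) -> int:
    """Return the delay in whole minutes.

    Returns 0 when no changed time was reported. A negative value means
    the changed time is earlier than the planned time.
    """
    if not changed:
        return 0
    planned_dt = datetime.strptime(planned, TIME_FORMAT)
    changed_dt = datetime.strptime(changed, TIME_FORMAT)
    return int((changed_dt - planned_dt).total_seconds() // 60)
```

Comparing full timestamps rather than clock times alone keeps the result correct for departures that slip past midnight.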

Tools used

Kafka, Docker, Spark, Airflow, Streamlit

Structure

  • Docker-compose to orchestrate the Docker containers needed
  • Airflow to manage DAGs (task workflows) that run at fixed intervals
  • Apache Kafka to handle streaming messages from the Deutsche Bahn API
  • Apache Spark to work with streaming data
  • Streamlit to display train delay data

How to use

  1. Clone this repository and install docker-compose
  2. Get a token from Deutsche Bahn to use its API (see https://developer.deutschebahn.com/store/apis/info?name=Timetables&version=v1&provider=DBOpenData&#/)
  3. Create a file .env in the main project folder with the following content:
AIRFLOW_CONN_SPARK_DEFAULT=spark://airflow:airflow@spark%3A%2F%2Fspark:8080
BEARER=<your_deutsche_bahn_token>
  4. Use docker-compose up -d to start the pipeline
  5. Go to localhost:8501 to see the data collected over time
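
Before starting the containers, it can help to confirm that the .env file defines both variables and that the token placeholder has been replaced. The following is a minimal sketch using only the standard library. The helper names `parse_env` and `missing_keys` are hypothetical and not part of this repository, and the parser handles only simple `KEY=value` lines.

```python
REQUIRED_KEYS = ("AIRFLOW_CONN_SPARK_DEFAULT", "BEARER")


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str) -> list:
    """Return required keys that are absent, empty, or still a <placeholder>."""
    values = parse_env(text)
    missing = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        is_placeholder = value.startswith("<") and value.endswith(">")
        if not value or is_placeholder:
            missing.append(key)
    return missing
```

For example, `missing_keys(open(".env").read())` returns an empty list when the file is ready to use.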

Currently, data are collected for the München-Pasing train station. To collect data for a different station, change the eva variable in functions.py. The eva (station ID) of any Deutsche Bahn station can be looked up with the GET /station/{pattern} endpoint of the API (see https://developer.deutschebahn.com/store/apis/info?name=Timetables&version=v1&provider=DBOpenData&#!/default/get_station_pattern).
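
The station lookup could be sketched as follows. This is an illustration under stated assumptions, not code from this repository: the base URL, the Bearer authorization header, and the XML layout (`station` elements carrying `name` and `eva` attributes) are assumptions to verify against the API documentation, and all function names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: base URL of the Timetables API; check the API page for the current value.
BASE_URL = "https://api.deutschebahn.com/timetables/v1"


def build_station_request(pattern, token, base_url=BASE_URL):
    """Build an authenticated request for /station/{pattern}."""
    # Percent-encode the pattern so names such as "München-Pasing" are URL-safe.
    url = f"{base_url}/station/{urllib.parse.quote(pattern)}"
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/xml"}
    return urllib.request.Request(url, headers=headers)


def parse_stations(xml_text):
    """Extract (name, eva) pairs, assuming <station name=... eva=.../> elements."""
    root = ET.fromstring(xml_text)
    return [(s.get("name"), s.get("eva")) for s in root.iter("station")]


def find_stations(pattern, token):
    """Query the API and return all matching (name, eva) pairs."""
    request = build_station_request(pattern, token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_stations(response.read().decode("utf-8"))
```

Calling `find_stations("München-Pasing", token)` with the token from the .env file would then yield the value to use for the eva variable, provided the assumptions above hold.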

License

MIT license
