From ff2a0adfdda8c853d837942b9c214819ecc6e7e0 Mon Sep 17 00:00:00 2001 From: Carsten Schmotz Date: Wed, 28 Jun 2023 12:25:22 +0200 Subject: [PATCH] updated readme --- README.md | 20 +++-- report copy.ipynb | 221 ---------------------------------------------- 2 files changed, 14 insertions(+), 227 deletions(-) delete mode 100644 report copy.ipynb diff --git a/README.md b/README.md index f4d1760ff..97f9394bb 100644 --- a/README.md +++ b/README.md @@ -1,13 +1,12 @@ # AMSE/SAKI 2023 Template Project -This your open data project in the AMSE/SAKI module for FAU in data engineering. +This is an open data project in the AMSE/SAKI module for FAU in data engineering. This repository contains a data science project that is developed by the student over the course of the semester, and (b) the exercises that are submitted over the course of the semester. -## Project Setup +# Project Setup The following files are part of this project: -- `data.sqlite`: -The final, cleaned dataset. +- `data.sqlite`: The final, cleaned dataset. - `exploration.ipynb`: A Jupyter notebook that you can use to explore your data and show in detail what it looks like. You can refer to this file in your report for users that want more information about your data. - `report.ipynb`: Your final report as a Jupyter notebook. This is the result of your project work and should lead with a question that you want to answer using open data. The content of the report should answer the question, ideally using fitting visualizations, based on the data in `data.sqlite`. @@ -21,20 +20,29 @@ The final, cleaned dataset. - `project-plan.md`: The organistion file for the project. - `gitignore`: Prevents that `.sql` files get summited online to github in order to prevent storage shortage. -## Manual +# Manual -First of all am automated data pipeline `AutomatedDataPipeline.py` downloads the relevant data from the internet. + -The second part is to filter the Datatables with `tablefilter.py` which deleted redundant data. 
The tables are reduced to the summary of the year and the rows are inverse so that the data sets fits each other. -Lastly the data is stored in `data.sqlite` for the exploration and the report. +# Notes + +GitHub Actions test the pipeline on every push. This ensures that the data is downloaded correctly. +Folder `github/workflows`: +`continuous_integration.yml`: Starts the GitHub Actions test for the pipeline. +`exercise-feedback.yml`: Activates the grading for the exercises. + + ## Exercises -The exercises folder in the repository contains the results of the exercises that had to be completed over the semester. Exercises one, three and five are completed in Jayvee while exercises two and four are completed using Python. +The exercises folder in the repository contains the results of the exercises that had to be completed over the semester. Exercises one, three and five are completed in Jayvee while exercises two and four are completed using Python. GitHub Actions are used to test and grade the exercises. diff --git a/report copy.ipynb b/report copy.ipynb deleted file mode 100644 index b4f9d3b43..000000000 --- a/report copy.ipynb +++ /dev/null @@ -1,221 +0,0 @@ -{ - "cells": [ - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Report: Correlation between increasing energy prize because more selled electric cars\n", - "\n", - "This projekt uses open data from Mobilithek (https://download-data.deutschebahn.com/static/datasets/haltestellen/D_Bahnhof_2020_alle.CSV) to render a map of germany with all train stops and operators marked.\n", - "\n", - "The question that interests us is: Who runs trainstops in germany and where?" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Install dependencies\n", - "Initially, install all required dependencies. The specific version of SQLAlchemy is needed because SQLAlchemy 2.0 does not work with pandas yet. 
nbformat allows the use of the \"notebook\" formatter for the plot, others can not be rendered to HTML." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Requirement already satisfied: pandas in c:\\users\\besitzer\\appdata\\local\\programs\\python\\python311\\lib\\site-packages (2.0.1)\n", - "Requirement already satisfied: python-dateutil>=2.8.2 in c:\\users\\besitzer\\appdata\\roaming\\python\\python311\\site-packages (from pandas) (2.8.2)\n", - "Requirement already satisfied: pytz>=2020.1 in c:\\users\\besitzer\\appdata\\local\\programs\\python\\python311\\lib\\site-packages (from pandas) (2023.3)\n", - "Requirement already satisfied: tzdata>=2022.1 in c:\\users\\besitzer\\appdata\\local\\programs\\python\\python311\\lib\\site-packages (from pandas) (2023.3)\n", - "Requirement already satisfied: numpy>=1.21.0 in c:\\users\\besitzer\\appdata\\local\\programs\\python\\python311\\lib\\site-packages (from pandas) (1.24.3)\n", - "Requirement already satisfied: six>=1.5 in c:\\users\\besitzer\\appdata\\roaming\\python\\python311\\site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)\n", - "Note: you may need to restart the kernel to use updated packages.\n", - "Requirement already satisfied: plotly in c:\\users\\besitzer\\appdata\\local\\programs\\python\\python311\\lib\\site-packages (5.14.1)\n", - "Requirement already satisfied: tenacity>=6.2.0 in c:\\users\\besitzer\\appdata\\local\\programs\\python\\python311\\lib\\site-packages (from plotly) (8.2.2)\n", - "Requirement already satisfied: packaging in c:\\users\\besitzer\\appdata\\roaming\\python\\python311\\site-packages (from plotly) (23.1)\n", - "Note: you may need to restart the kernel to use updated packages.\n", - "Note: you may need to restart the kernel to use updated packages.\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "ERROR: Invalid requirement: 
\"'SQLAlchemy==1.4.46'\"\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Requirement already satisfied: nbformat in c:\\users\\besitzer\\appdata\\local\\programs\\python\\python311\\lib\\site-packages (5.8.0)\n", - "Requirement already satisfied: fastjsonschema in c:\\users\\besitzer\\appdata\\local\\programs\\python\\python311\\lib\\site-packages (from nbformat) (2.16.3)\n", - "Requirement already satisfied: jsonschema>=2.6 in c:\\users\\besitzer\\appdata\\local\\programs\\python\\python311\\lib\\site-packages (from nbformat) (4.17.3)\n", - "Requirement already satisfied: jupyter-core in c:\\users\\besitzer\\appdata\\roaming\\python\\python311\\site-packages (from nbformat) (5.3.0)\n", - "Requirement already satisfied: traitlets>=5.1 in c:\\users\\besitzer\\appdata\\roaming\\python\\python311\\site-packages (from nbformat) (5.9.0)\n", - "Requirement already satisfied: attrs>=17.4.0 in c:\\users\\besitzer\\appdata\\local\\programs\\python\\python311\\lib\\site-packages (from jsonschema>=2.6->nbformat) (23.1.0)\n", - "Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in c:\\users\\besitzer\\appdata\\local\\programs\\python\\python311\\lib\\site-packages (from jsonschema>=2.6->nbformat) (0.19.3)\n", - "Requirement already satisfied: platformdirs>=2.5 in c:\\users\\besitzer\\appdata\\roaming\\python\\python311\\site-packages (from jupyter-core->nbformat) (3.5.0)\n", - "Requirement already satisfied: pywin32>=300 in c:\\users\\besitzer\\appdata\\roaming\\python\\python311\\site-packages (from jupyter-core->nbformat) (306)\n", - "Note: you may need to restart the kernel to use updated packages.\n" - ] - } - ], - "source": [ - "%pip install pandas\n", - "%pip install plotly\n", - "%pip install 'SQLAlchemy==1.4.46'\n", - "%pip install nbformat" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Load data\n", - "Create a pandas dataframe using the local sqlite file." 
- ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "\n", - "df = pd.read_sql_table('carregistration', 'sqlite:///data/CarRegistration.sqlite')" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Who runs trainstops in germany and where?\n", - "To answer our initial question, we use plotly to draw a scatterplot of all train stops in the dataset, overlaying it on a map from OpenStreetMap.\n", - "\n", - "The train stops will be colored based on the `Betreiber_Name`, allowing us to see what area an operator services." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - " \n", - " " - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "import plotly.io as pio\n", - "import plotly.express as px\n", - "\n", - "pio.renderers.default = \"notebook\"\n", - "\n", - "fig = px.scatter_mapbox(df, \n", - " lat=\"Breite\", \n", - " lon=\"Laenge\", \n", - " hover_name=\"NAME\", \n", - " hover_data=[\"EVA_NR\", \"DS100\", \"Betreiber_Name\"],\n", - " color=\"Betreiber_Name\",\n", - " zoom=5, \n", - " height=800,\n", - " width=1200)\n", - "\n", - "fig.update_layout(mapbox_style=\"open-street-map\")\n", - "fig.update_layout(margin={\"r\":0,\"t\":0,\"l\":0,\"b\":0})\n", - "fig.show()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "env", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.3" - }, - "orig_nbformat": 4 - }, - "nbformat": 4, - "nbformat_minor": 2 -}