diff --git a/README copy.md b/README copy.md deleted file mode 100644 index c3f139485c..0000000000 --- a/README copy.md +++ /dev/null @@ -1,39 +0,0 @@ -# AMSE/SAKI 2023 Template Project -This template project provides some structure for your open data project in the AMSE/SAKI module. -This repository contains (a) a data science project that is developed by the student over the course of the semester, and (b) the exercises that are submitted over the course of the semester. -Before you begin, make sure you have [Python](https://www.python.org/) and [Jayvee](https://github.com/jvalue/jayvee) installed. We will work with [Jupyter notebooks](https://jupyter.org/). The easiest way to do so is to set up [VSCode](https://code.visualstudio.com/) with the [Jupyter extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter). - -## Project Setup -The following files are part of this template repository as examples and should be **replaced by you** over the semester: - -- `data.sqlite`: Your final, cleaned dataset. You will create an automated data pipeline that creates this SQLite database from multiple open data sources. The template repository includes data about train stations in germany, you need to replace this with your data! -- `exploration.ipynb`: A Jupyter notebook that you can use to explore your data and show in detail what it looks like. You can refer to this file in your report for users that want more information about your data. -- `report.ipynb`: Your final report as a Jupyter notebook. This is the result of your project work and should lead with a question that you want to answer using open data. The content of the report should answer the question, ideally using fitting visualizations, based on the data in `data.sqlite`. - - -## Exercises -During the semester you will need to complete exercises, sometimes using [Python](https://www.python.org/), sometimes using [Jayvee](https://github.com/jvalue/jayvee). 
You **must** place your submission in the `exercises` folder in your repository and name them according to their number from one to five: `exercise.`. - -In regular intervalls, exercises will be given as homework to complete during the semester. We will divide you into two groups, one completing an exercise in Jayvee, the other in Python, switching each exercise. Details and deadlines will be discussed in the lecture, also see the [course schedule](https://amse.uni1.de/). At the end of the semester, you will therefore have the following files in your repository: - -1. `./exercises/exercise1.jv` or `./exercises/exercise1.py` -2. `./exercises/exercise2.jv` or `./exercises/exercise2.py` -3. `./exercises/exercise3.jv` or `./exercises/exercise3.py` -4. `./exercises/exercise4.jv` or `./exercises/exercise4.py` -5. `./exercises/exercise5.jv` or `./exercises/exercise5.py` - -### Exercise Feedback -We provide automated exercise feedback using a GitHub action (that is defined in `.github/workflows/exercise-feedback.yml`). To view your exercise feedback, navigate to Actions -> Exercise Feedback in your repository (or use the direct link [/actions/workflows/exercise-feedback.yml](/actions/workflows/exercise-feedback.yml)). - -The exercise feedback is executed whenever you make a change in files in the `exercise` folder and push your local changes to the repository on GitHub. To see the feedback, open the latest GitHub Action run, open the `exercise-feedback` job and `Exercise Feedback` step. You should see command line output that contains output like this: - -```sh -Found exercises/exercise1.jv, executing model... -Found output file airports.sqlite, grading... 
-Grading Exercise 1 - Overall points 17 of 17 - --- - By category: - Shape: 4 of 4 - Types: 13 of 13 -``` diff --git a/README.md b/README.md index 97f9394bb0..d8992662d5 100644 --- a/README.md +++ b/README.md @@ -27,22 +27,15 @@ The following files are part of this project: -The second part is to filter the Datatables with `tablefilter.py` which deleted redundant data. The tables are reduced to the summary of the year and the rows are inverse so that the data sets fits each other. --Lastly the data is stored in `data.sqlite` for the exploration and the report. +-Lastly the data is stored in `data.sqlite` for the exploration and the report.# Notes -# Notes +# Additional +GitHub Actions are enabled to test the pipeline on every push. This ensures that the data is downloaded correctly. +`github/workflows` folder: +`continuous_integration.yml`: Starts the GitHub Actions test for the pipeline. +`exercise-feedback.yml`: Enables grading for the exercises. -Github Actions are active to test for pipeline on every push. This ensures that the data is correctly downloaded. -Folder`github/workflows`: -`continuous_integration.yml`: Starts the Github action test for the pipeline. -`exercise-feedback.yml`: Activates the grading for the exercises. - - - - - - - -## Exercises -The exercises folder in the repository contains the results of the exercises that had to be completed over the semester. Exercises one, three and five are completed in Jayvee while exercises two and four are completed using Python. Github actions are used to test and grade the exercises. +# Exercises +The exercises folder in the repository contains the results of the exercises that were completed throughout the semester. Exercises one, three, and five are completed in Jayvee, while exercises two and four are completed in Python. GitHub Actions are used to test and grade the exercises. 
diff --git a/report.ipynb b/report.ipynb index 9b24d3947c..21864d5a60 100644 --- a/report.ipynb +++ b/report.ipynb @@ -5,27 +5,27 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Report: This projects analyzes the number of car registraitons with electric drive compare to the with the prize for electric energy.\n", + "## Report: This project analyzes the number of car registrations with electric drive in comparison with the price for electric energy.\n", "\n", - "First of all the question to ask is a higher number of electric cars a factor that raises the energy prize in germany?\n", "\n", - "## The following two data source are used:\n", - "#### Source 1: Car registration per year in germany with alternative drives (including electric drives)\n", + "## The following two data sources are used:\n", + "#### Source 1: New car registrations per year in Germany with alternative drives (including electric drives)\n", "\n", "- Metadata URL: https://mobilithek.info/offers/573358160767496192\n", "- URL: (https://www.kba.de/SharedDocs/Downloads/DE/Statistik/Fahrzeuge/FZ28/fz28_2022_09.xlsx?__blob=publicationFile&v=4) \n", "- Type: xlsx\n", - " This project uses open data from Mobilithek to get the number on cars with electric drive which are new on the streets each year.\n", "\n", + "This project uses open data from the Mobilithek to calculate the number of new electric cars on the road each year.\n", "\n", - "#### Source 2: The energy prizes per year in germany\n", + "\n", + "#### Source 2: Energy prices per year in Germany\n", "\n", "- Metadata URL: https://www.govdata.de/web/guest/suchen/-/details/strompreise-fur-haushalte-deutschland-jahrejahresverbrauchsklassen-preisbestandteile\n", "- URL: (https://www-genesis.destatis.de/genesis/downloads/00/tables/61243-0002_00.csv)\n", "- Type: CSV\n", - " The second data is from govdata.com to get required energy bills.\n", + " The second data set comes from GovData (govdata.de) and provides the electricity prices for households.\n", "\n", - "### The 
question that interests me is: Does more electric cars cause increasing energy prizes?" + "### The question that interests me is: Will more electric cars cause energy prices to rise?" ] }, { @@ -33,8 +33,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Install dependencies\n", - "Initially, install all required dependencies. The specific version of SQLAlchemy is needed because SQLAlchemy 2.0 does not work with pandas yet. nbformat allows the use of the \"notebook\" formatter for the plot, others can not be rendered to HTML. Matplotlib is to visualise the data in a coordinate system. " + "## Installing dependencies\n", + "First, install all required dependencies. The specific version of SQLAlchemy is needed because SQLAlchemy 2.0 does not yet work with pandas. nbformat allows the use of the \"notebook\" formatter for plotting; other formatters cannot be rendered to HTML. Matplotlib is used to visualize the data in a coordinate system." ] }, { @@ -112,13 +112,15 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Load data\n", - "Create a pandas dataframe using the local sqlite file. The module operating system is loaded to run the pipeline und the filter in order to get the Result.sqlite File. Result is used to get a good representation of the data." + "## Loading data\n", + "Create a pandas DataFrame using the local sqlite file. The os module is loaded to run the pipeline and the filter, which produce the Result.sqlite file. \n", + "\n", + "The file 'data.sqlite' is used to get a good representation of the data." ] }, { "cell_type": "code", - "execution_count": 39, + "execution_count": 52, "metadata": {}, "outputs": [ { @@ -163,8 +165,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Does electric cars on the streets of germany have an impact on the electric bill?\n", - "To answer our initial question, we use matplotlib.pyplot to draw two graphs. One for the number of new electric cars in germany per year. 
The othter to visualise the energy cost for household in germany per year." + "## Do electric cars on the streets of Germany have an impact on the electricity bill?\n", + "To answer our first question, we use matplotlib.pyplot to plot two graphs. One for the number of new electric cars in Germany per year. The other to visualize the energy costs for a household in Germany per year." ] }, { @@ -221,12 +223,12 @@ }, { "cell_type": "code", - "execution_count": 49, + "execution_count": 51, "metadata": {}, "outputs": [ { "data": { - "image/png": "", + "image/png": "", "text/plain": [ "
" ] @@ -251,14 +253,14 @@ "df2 = pd.read_sql_query(prize_axes, conn)\n", "conn.close()\n", "\n", - "x1= ['2019','2020','2021','2022']\n", - "\n", + "#x1= ['2019','2020','2021','2022']\n", + "x2 = df2['Jahr']\n", "y1 = df1['Insgesamt']\n", "y2 = df2['Insgesamt']\n", "\n", "fig, ax2= plt.subplots()\n", "\n", - "ax2.plot(x1,y2,'r-',label='Energy bill')\n", + "ax2.plot(x2,y2,'r-',label='Energy bill')\n", "ax2.set_xlabel('Jahre')\n", "ax2.set_ylabel('bill [Euro/Kwh]')\n", "ax2.legend()\n", @@ -274,9 +276,28 @@ "cell_type": "markdown", "metadata": {}, "source": [ + "## Interpretation\n", + "\n", "The graph for the car registration form 2016 to 2022 in the unit 1 million.\n", - "It shows a decrease per year till 2019 and after that number drop from 3.6 million to 2.6 million.\n", - "On the other side the energy bill " + "It shows a decrease per year until 2019 and after that the number drops from 3.6 million to 2.6 million.\n", + "\n", + "On the other hand, the energy bill has a range from 2022 to 2019. The data set does not get the order of the data correctly. The reverse process of the data set creates some error. \n", + "The graph shows a constant increase in energy prices in Germany.\n", + "\n", + "## Result.\n", + "\n", + "The two graphs together give the impression that the decrease in car registrations has increased energy prices. \n", + "The first question cannot be answered clearly. It can be interpreted as no, there is no effect on the prices.\n", + "\n", + "To get a clearer answer, more data needs to be examined and other variables/factors need to be taken into account.\n", + "The Ukraine crisis has increased prices everywhere. Some power plants are being shut down because of environmental laws in Germany.\n", + "\n", + "These things and more have to be considered.\n", + "\n", + "## All in all \n", + "The question: Will more electric cars cause energy prices to rise?\n", + "\n", + "#### The answer is no." ] } ],