-
Notifications
You must be signed in to change notification settings - Fork 5
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Signed-off-by: Jonita Ruiter <[email protected]>
- Loading branch information
1 parent
9389e66
commit 3841e81
Showing
11 changed files
with
336 additions
and
620,342 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
[submodule "openstef"] | ||
path = openstef | ||
url = https://github.com/OpenSTEF/openstef.git | ||
branch = Update-pandas |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -101,7 +101,7 @@ | |
}, | ||
"outputs": [], | ||
"source": [ | ||
"train_data.head()" | ||
"display(train_data.head())" | ||
] | ||
}, | ||
{ | ||
|
299 changes: 299 additions & 0 deletions
299
examples/04. Split net load into components using DAZLS.ipynb
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,299 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "64765210", | ||
"metadata": {}, | ||
"source": [ | ||
"# Dazls model\n", | ||
"\n", | ||
"Energy splitting method in openSTEF. The DAZLS model is able to split a forecast into wind (on shore), solar and other.\n", | ||
"\n", | ||
"It trains one splitting model which can be used for every prediction job. As input it uses data from multiple substations with known components and it outputs the prediction of solar and wind power for unkown target substations.\n", | ||
"\n", | ||
"The model contains 2-steps which are deployed in sequence:\n", | ||
"1. Domain model (any data-driven model can be used);\n", | ||
"2. Adaptation model (any data-driven model can be used).\n", | ||
"\n", | ||
"#### For reference, see: [dazls.rst](https://github.com/OpenSTEF/openstef/tree/main/docs/dazls.rst)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "69f0a54a", | ||
"metadata": {}, | ||
"source": [ | ||
"## This notebook contains:\n", | ||
"\n", | ||
"1. Preprocessing of the data which will be used for the model;\n", | ||
"\n", | ||
"We use the \"combined_data\" folder, which contains the raw data, to preprocess the data and save it in the folder \"prep_data\". After this, the preprocessed data with metadata can be found in the \"prep_data\" file in the path we have set. Then we use the prep_data to run the dazls model.\n", | ||
"\n", | ||
"2. Train dazls and generate a prediction for 1 out-of-sample csv file; \n", | ||
"\n", | ||
"\n", | ||
"3. Load and store the model.\n", | ||
"\n", | ||
"The dazls_stored.sav file is being produced and is being used in openstef for the components forecast ([create_component_forecast](https://github.com/OpenSTEF/openstef/blob/main/openstef/pipeline/create_component_forecast.py))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "056a49f8", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import numpy as np\n", | ||
"from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error\n", | ||
"import glob\n", | ||
"import pandas as pd\n", | ||
"import os\n", | ||
"from sklearn.model_selection import train_test_split\n", | ||
"from sklearn.preprocessing import MinMaxScaler\n", | ||
"from sklearn.neighbors import KNeighborsRegressor\n", | ||
"import random \n", | ||
"from sklearn.utils import shuffle\n", | ||
"import joblib\n", | ||
"\n", | ||
"from openstef.model.regressors.dazls import Dazls\n", | ||
"\n", | ||
"pd.options.plotting.backend = 'plotly'\n", | ||
"\n", | ||
"#Seed and preparation\n", | ||
"random.seed(999)\n", | ||
"np.random.seed(999)\n", | ||
"\n", | ||
"#Path\n", | ||
"combined_data=[]\n", | ||
"station_name=[]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "b7b4284a", | ||
"metadata": {}, | ||
"source": [ | ||
"## Preprocess the data\n", | ||
"Create the \"prep_data\" folder" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "cb4637ed", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"#Read, create metadata and save in prep_data folder\n", | ||
"for file_name in glob.glob('combined_data/*.csv'):\n", | ||
" \n", | ||
" #Read and fill missing values\n", | ||
" x = pd.read_csv(file_name, low_memory=False,parse_dates=[\"datetime\"],index_col=0)\n", | ||
" x[\"datetime\"]=pd.to_datetime(x[\"datetime\"])\n", | ||
" x=x.set_index('datetime')\n", | ||
" x.replace([np.inf, -np.inf], np.nan, inplace=True)\n", | ||
" x=x.ffill()\n", | ||
" # x=x.interpolate(method='ffill')\n", | ||
"\n", | ||
" ## Get variance metadata ####\n", | ||
" var=x.iloc[:,:3].var()\n", | ||
" for i in range(3):\n", | ||
" x.loc[:, 'var'+str(i)] = var.iloc[i]\n", | ||
" ### end get variance ####\n", | ||
"\n", | ||
" ## Get sem metadata ####\n", | ||
" sem=x.iloc[:,:3].sem()\n", | ||
" for i in range(3):\n", | ||
" x.loc[:, 'sem'+str(i)] = sem.iloc[i]\n", | ||
" ### end get sem ####\n", | ||
"\n", | ||
"\n", | ||
" ## Get min-max capacity physical metadata ####\n", | ||
" mini=x.iloc[:,3:5].min()\n", | ||
" maxi=x.iloc[:,3:5].max()\n", | ||
" for i in range(2):\n", | ||
" x.loc[:, 'min'+str(i)] = mini.iloc[i]\n", | ||
" for i in range(2):\n", | ||
" x.loc[:, 'max'+str(i)] = maxi.iloc[i] \n", | ||
" ### end get sem #### \n", | ||
" \n", | ||
" combined_data.append(x)\n", | ||
" sn=os.path.basename(file_name)\n", | ||
" station_name.append(sn[:len(sn)-4])\n", | ||
" x.to_csv(\"prep_data/\"+sn, index=True)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "2d045c67", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"combined_data = []\n", | ||
"station_name = []\n", | ||
"\n", | ||
"# Read prepared data\n", | ||
"for file_name in glob.glob('prep_data/*.csv'):\n", | ||
" x = pd.read_csv(file_name, low_memory=False, parse_dates=[\"datetime\"])\n", | ||
" x[\"datetime\"] = pd.to_datetime(x[\"datetime\"])\n", | ||
" x = x.set_index('datetime')\n", | ||
" x.columns=[x.lower() for x in x.columns]\n", | ||
" combined_data.append(x)\n", | ||
" sn = os.path.basename(file_name)\n", | ||
" station_name.append(sn[:len(sn) - 4])\n", | ||
"\n", | ||
"\n", | ||
"# Split data in train and test (the first substation is being used for the testing)\n", | ||
"training_data = pd.concat(combined_data[1:])\n", | ||
"test_data = combined_data[0]\n", | ||
"target_columns =['total_solar_part', 'total_wind_part']\n", | ||
"feature_columns = [x for x in test_data.columns if x not in target_columns]\n", | ||
"print('Testing station:',station_name[0])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "8fca6b90", | ||
"metadata": {}, | ||
"source": [ | ||
"## Initialize DAZLS model" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "375a8b47", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"model = Dazls()\n", | ||
"# Fit model\n", | ||
"model.fit(training_data.loc[:,feature_columns], training_data.loc[:,target_columns])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "75e033fc", | ||
"metadata": {}, | ||
"source": [ | ||
"## Generate a prediction\n", | ||
"In this section, a prediction is generated using the DAZLS model." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "66e7dcdc", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# get predicted y\n", | ||
"y = model.predict(test_data.loc[:,feature_columns])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "d53da56c", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Examine the results, which are stored in the variable y. The other variables are added to these results by copying the test_data.\n", | ||
"result = test_data.loc[:,target_columns].copy()\n", | ||
"result['DAZLS_wind_split'] = y[:,0]\n", | ||
"result['DAZLS_solar_split'] = y[:,1]\n", | ||
"result.iloc[60:]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "53c59b68", | ||
"metadata": {}, | ||
"source": [ | ||
"## Evaluate the results\n", | ||
"Examine the results by looking at the build-in metrics and visualisation." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "6bb8ce3a", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# print prediction performance\n", | ||
"RMSE, R2=model.score(test_data.loc[:,target_columns], y)\n", | ||
"\n", | ||
"print(\"The root-mean-square error (RMSE) is {} \".format(RMSE) + \"and the R2 score is {}\".format(R2))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "0cc8e7e0", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Visualize the results by plotting the results of the DAZLS model.\n", | ||
"\n", | ||
"fig_result=result.plot()\n", | ||
"fig_result.show()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "eacf2558", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Compare the wind split results to the actual value\n", | ||
"fig_compare_result_wind=pd.concat([result['DAZLS_wind_split'], test_data['total_wind_part']], axis=1).plot()\n", | ||
"fig_compare_result_wind.show()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "ba34fcc9", | ||
"metadata": {}, | ||
"source": [ | ||
"## Store the model" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "4ac275ad", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Store the dazls file\n", | ||
"filename = 'dazls_stored.sav'\n", | ||
"joblib.dump(model, filename)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.13" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
File renamed without changes.
Oops, something went wrong.