This repository contains the codes and visualization tools for Western Power Distribution Data Challenge (Part 2). The aim of the challenge is to predict the peak EV usages across eight weeks in three substations in UK by using the demand and weather data only. You can find more details on the task can be found here. The data for this task can be found here. YouTube kick-off can be found here. Our team WOJJ members include Wangkun Xu, Olayinka Ayo, Jemima Graham, and Jiaruijue Wang. The work is supervised by Dr. Fei Teng. All the team members are from Control & Power group, Dept. EEE, Imperial College London, UK.
- Step 1: Load original data
- Step 2: Pre-process data
- Weather: exponential smoothing with/without cubic transformation(smoothed by default)
- National demand: weighted smoothing (smoothed by default)
- Load: averaged smoothing and outlier removal (controlled by
SMOOTH_INPUT
)
- Step 3: Prepare dataset
- Load data of 56 days ago (smoothed)
- Smoothed national data
- Calendar features: month, hour, day of year and temporal-encoded day of week
- Weather features:
temperature,solar_irradiance, windspeed_north and windspeed_east - Nearby-station load (smoothed)
- Step 4: Train GAM
- Combination of TensorTerms of different features
- Step 5: Post-process
- Smoothed combined load (controlled by
COMB_SMOOTH_METHOD
and window sizeWS
)- daily max
- hourly mean
- hourly max
- averaged smoothing + daily max
- weighted smoothing + daily max
- Negative value removal (controlled by
APPLY_ABS
)
- Smoothed combined load (controlled by
Generalized Additive Model (GAM) generalizes the linear models to capture different nonlinearities on each feature.
GAMs relax the restriction that the relationship must be a simple weighted sum, and instead assume that the outcome can be modeled by a sum of arbitrary functions of each feature (source).
GAM can be modelled as
$$
g(E_Y(y|x))=\beta_0+f_1(x_1)+f_2(x_2)+...+f_p(x_p)
$$
where
To run the model on windows OS, we suggest to use Anaconda.
- Install Anaconda;
- Clone everything in this folder to a local folder, e.g. named as WPD2;
- In the terminal run
conda create --name WPD2
to build the new environment; - Add the conda forge channel by
conda config --append channels conda-forge
; - Install the extra requirements
conda install --file requirements.txt
.
To execute the codes, simply run the entry script run.py
.
python ./run.py
Before executing the entry script, you may need to modify the DATA_FOLDER
and the SUBMISSION_PATH
configurations in run.py
and ensure the structure inside your data folder matches the description in the data_loader.py
.
The following file structure is expected in the data_folder:
- data_folder:
- phase-1
- {STATION} Combined Load xxxxx.csv
- {STATION} Training Data.csv
- template_1.csv
- solution_phase1.csv
- phase-2
- {STATION} Combined Load xxxxx.csv
- {STATION} Training Data.csv
- template_2.csv
- weather_data
- df_weather_{id}_hourly.csv
- national_demand
- demanddata_{year}.csv
- phase-1
We incorporate the following parameters for GAM:
-
N_SPLINES
: Number of splines to use for each marginal term. Must be of same length as feature. -
LAMBDA
: Strength of smoothing penalty. Must be a positive float. Larger values enforce stronger smoothing. |
With SHOW_ERROR
set to True, various smoothing methods with different window size will be evaluated, which can guide the parameter selection in phase 2. Our best results (error matrix) over stations of phase 1 are summarized as followed, which is a pandas.DataFrame
object.
Criterion: Mean Absolute Percentage Error (MAPE)
daily_max | hourly_mean | hourly_max | avg-9 | avg-13 | avg-17 | wgt-9 | wgt-13 | wgt-17 | |
---|---|---|---|---|---|---|---|---|---|
BOURNVILLE CB 7 | 0.04363 | 0.04433 | 0.06339 | 0.04053 | 0.03930 | 0.04052 | 0.04081 | 0.03996 | 0.03945 |
BRADLEY STOKE CB 8 | 0.01887 | 0.01939 | 0.02598 | 0.01884 | 0.01808 | 0.01570 | 0.01853 | 0.01878 | 0.01793 |
STRATTON CB 4041 | 0.02663 | 0.03460 | 0.07314 | 0.02566 | 0.02220 | 0.02744 | 0.02465 | 0.02517 | 0.02341 |
Overall (mean) | 0.02971 | 0.03277 | 0.05417 | 0.02834 | 0.02653 | 0.02788 | 0.02800 | 0.02797 | 0.02693 |
*The avg and wgt stand for averaged and weighted smoothing respectively, while the number followed stands for the window size.
BOURNVILLE CB 7 | BRADLEY STOKE CB 8 | STRATTON CB 4041 | |
---|---|---|---|
Training MAPE | 0.05173 | 0.02452 | 0.07264 |
*Using the averaged smoothing method with window size equal to 13 for post-processing.
BRIDPORT CB 306 | HEMYOCK CB 56_24 | PORTISHEAD ASHLANDS CB 4 | |
---|---|---|---|
Training MAPE | 0.04185 | 0.06348 | 0.07580 |
*Using the averaged smoothing method with window size equal to 13 for post-processing.