Explore Feature Engineering to Improve Models Performance #11

SverreNystad · 2023-09-24T18:37:45Z

To enhance the predictive performance of our models (Linear Regression, Random Forest, Gradient Boosting, LSTM, ARIMA, SARIMA), we need to explore and implement various feature engineering strategies. Feature engineering can help in uncovering hidden patterns in the data, dealing with missing or noisy data, and improving model generalization.

1. Feature Transformation:

Log Transformation: Apply a logarithmic transformation to skewed features to bring them closer to a Gaussian distribution.
Polynomial Features: Create polynomial features to capture non-linear effects and interactions between features.
Scaling: Standardize or normalize the features, especially for models that are sensitive to the scale of the input variables, such as the LSTM (and Linear Regression if regularization is used). Tree-based models like Random Forest and Gradient Boosting are largely unaffected by feature scaling.
Trigonometric Transformation: Cyclic features such as hour of day or day of year can be encoded as sine and cosine values, so that the end of a cycle sits next to its start (e.g. 23:00 next to 00:00). This makes periodic patterns such as the yearly cycle easier to learn. The LSTM in particular could gain from this.
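
A minimal sketch of the trigonometric transformation, assuming the data sits in a pandas DataFrame. The `timestamp` and `hour` column names and the `add_cyclic_encoding` helper are made up for illustration and are not taken from our pipeline.

```python
import numpy as np
import pandas as pd


def add_cyclic_encoding(df: pd.DataFrame, column: str, period: float) -> pd.DataFrame:
    """Return a copy of df with sine/cosine encodings of a cyclic column."""
    out = df.copy()
    angle = 2.0 * np.pi * out[column] / period
    out[f"{column}_sin"] = np.sin(angle)
    out[f"{column}_cos"] = np.cos(angle)
    return out


# Hypothetical hourly timestamps; the real data may use other column names.
frame = pd.DataFrame(
    {"timestamp": pd.date_range("2023-01-01", periods=48, freq=pd.Timedelta(hours=1))}
)
frame["hour"] = frame["timestamp"].dt.hour

# Hour of day repeats every 24 hours, so the period is 24.
encoded = add_cyclic_encoding(frame, "hour", period=24)
```

Both the sine and the cosine column are needed: with only one of them, two different hours would map to the same value.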

2. Feature Creation:

Time-based Features: Extract hour, day, month, and other time-related features from the timestamp.
Lagged Features: Create lagged features to incorporate information from previous time steps.
Rolling Statistics: Calculate rolling mean, median, and standard deviation as new features to capture trends and seasonality.
Domain-specific Features: Explore creating features based on domain knowledge in solar energy production, such as sunlight intensity and optimal sunlight hours.
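
As a rough sketch of the time-based, lagged and rolling features above, assuming a pandas DataFrame with a datetime column. The column names (`timestamp`, `sun_intensity`) and the helper `add_time_lag_rolling` are hypothetical. The rolling window is computed on values shifted by one step, so each row only sees earlier observations.

```python
import pandas as pd


def add_time_lag_rolling(df, time_column, column, lags=(1, 2), window=3):
    """Return a copy of df with calendar, lagged and rolling features."""
    out = df.copy()

    # Time-based features extracted from the timestamp
    out["hour"] = out[time_column].dt.hour
    out["day"] = out[time_column].dt.day
    out["month"] = out[time_column].dt.month

    # Lagged features: the value observed `lag` steps earlier
    for lag in lags:
        out[f"{column}_lag_{lag}"] = out[column].shift(lag)

    # Rolling statistics over past values only (shift first to avoid
    # including the current row in its own window)
    past = out[column].shift(1)
    out[f"{column}_roll_mean_{window}"] = past.rolling(window).mean()
    out[f"{column}_roll_median_{window}"] = past.rolling(window).median()
    out[f"{column}_roll_std_{window}"] = past.rolling(window).std()
    return out


# Toy data, made up for the example
frame = pd.DataFrame(
    {
        "timestamp": pd.date_range("2023-06-01", periods=6, freq=pd.Timedelta(hours=1)),
        "sun_intensity": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)
features = add_time_lag_rolling(frame, "timestamp", "sun_intensity")
```

The first rows contain NaN because no earlier values exist yet; they have to be dropped or imputed before training.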

3. Handling Missing Data:

Investigate and implement advanced imputation methods like k-NN imputation or model-based imputation for dealing with missing data.
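
One possible starting point for k-NN imputation is scikit-learn's `KNNImputer`. The sketch below assumes scikit-learn is available and uses a tiny made-up array. The imputer is fitted on the training data only and then reused on new data, so no information flows from the test set into the imputed values.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Made-up training data with one missing value in the second column
train = np.array(
    [
        [1.0, 2.0],
        [2.0, np.nan],
        [3.0, 6.0],
        [4.0, 8.0],
    ]
)
# Made-up new data that also has a gap
new_rows = np.array([[3.5, np.nan]])

# Each missing value is replaced by the mean of that feature over the
# n_neighbors training rows that are closest on the observed features.
imputer = KNNImputer(n_neighbors=2)
train_filled = imputer.fit_transform(train)
new_filled = imputer.transform(new_rows)
```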

4. Feature Selection:

Correlation Analysis: Analyze the correlation between each feature and the target variable to select relevant features.
Recursive Feature Elimination (RFE): Use RFE to select the most important features.
Feature Importance from Tree-based Models: Extract feature importance from models like Random Forest and Gradient Boosting to select relevant features.
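
A small sketch of the correlation analysis, assuming numeric features in a pandas DataFrame. The helper name `rank_by_target_correlation` and the synthetic columns are invented for the example. Pearson correlation only detects linear relationships, so a low score here does not prove that a feature is useless for the tree-based models or the LSTM.

```python
import numpy as np
import pandas as pd


def rank_by_target_correlation(df: pd.DataFrame, target: str) -> pd.Series:
    """Rank numeric features by absolute Pearson correlation with the target."""
    numeric = df.select_dtypes("number")
    corr = numeric.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False)


# Synthetic data: one informative feature and one pure-noise feature
rng = np.random.default_rng(0)
useful = rng.normal(size=200)
noise = rng.normal(size=200)
data = pd.DataFrame(
    {
        "useful": useful,
        "noise": noise,
        "y": 3.0 * useful + 0.1 * rng.normal(size=200),
    }
)
ranking = rank_by_target_correlation(data, "y")
```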

5. Temporal Features:

Since our data is time-series, experiment with different ways to incorporate temporal information, such as:
- Seasonal decomposition to separate trend, seasonality, and residuals.
- Fourier Transforms to capture cyclical patterns.
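
For the Fourier part, a hedged sketch using only NumPy: `dominant_period` estimates the strongest cycle from the FFT spectrum, and `fourier_terms` builds sine/cosine columns for a known period that can be fed to any of the models. Both function names are made up for illustration, and the sketch assumes evenly spaced samples without gaps.

```python
import numpy as np


def dominant_period(values, sample_spacing=1.0):
    """Estimate the strongest cycle length, in units of sample_spacing."""
    values = np.asarray(values, dtype=float)
    spectrum = np.abs(np.fft.rfft(values - values.mean()))
    freqs = np.fft.rfftfreq(len(values), d=sample_spacing)
    peak = np.argmax(spectrum[1:]) + 1  # skip the zero-frequency bin
    return 1.0 / freqs[peak]


def fourier_terms(t, period, order):
    """Build sin/cos columns for the first `order` harmonics of `period`."""
    t = np.asarray(t, dtype=float)
    columns = []
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)


# Ten days of synthetic hourly data with a daily cycle
hours = np.arange(24 * 10)
signal = np.sin(2.0 * np.pi * hours / 24.0)

period = dominant_period(signal)
terms = fourier_terms(hours, period=24.0, order=2)
```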

6. Data Augmentation:

Explore data augmentation techniques suitable for time-series data, such as time warping, to increase the diversity and amount of training data.
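
One simple way to sketch time warping with NumPy is to stretch and compress the time axis with a smooth random speed profile and then resample by linear interpolation. The function `time_warp` and its parameters are assumptions for illustration, not an established recipe from our codebase. If the series has aligned input features, the same warp would have to be applied to all of them.

```python
import numpy as np


def time_warp(series, rng, strength=0.2, knots=4):
    """Return a randomly time-warped copy of a 1-D series (same length).

    strength must be below 1 so that the local speed stays positive.
    """
    series = np.asarray(series, dtype=float)
    n = len(series)
    steps = np.arange(n)

    # Random speed factors at a few knots, interpolated to every time step
    knot_positions = np.linspace(0, n - 1, knots + 2)
    knot_speeds = rng.uniform(1.0 - strength, 1.0 + strength, size=knots + 2)
    speed = np.interp(steps, knot_positions, knot_speeds)

    # Cumulative speed gives a monotonic warped time axis, rescaled to [0, n-1]
    warped = np.cumsum(speed)
    warped = (warped - warped[0]) / (warped[-1] - warped[0]) * (n - 1)

    # Sample the original series at the warped positions
    return np.interp(warped, steps, series)


rng = np.random.default_rng(42)
original = np.sin(np.linspace(0.0, 4.0 * np.pi, 100))
augmented = time_warp(original, rng)
```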

7. Encoding Categorical Features:

If applicable, explore different encoding techniques for categorical features, such as One-Hot Encoding or Target Encoding.
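
A short sketch of both encodings with pandas. The `location` column and the values are made up. For target encoding, the category means are computed on the training data only and unseen categories fall back to the global training mean, which keeps target information from the test set out of the features.

```python
import pandas as pd

# Made-up data with a hypothetical categorical column
train = pd.DataFrame(
    {
        "location": ["A", "A", "B", "C"],
        "y": [10.0, 14.0, 4.0, 7.0],
    }
)
test = pd.DataFrame({"location": ["A", "B", "D"]})

# One-Hot Encoding: one indicator column per category
one_hot = pd.get_dummies(train["location"], prefix="location")

# Target Encoding: replace each category by the mean target seen in training
global_mean = train["y"].mean()
category_means = train.groupby("location")["y"].mean()
test_encoded = test["location"].map(category_means).fillna(global_mean)
```

With few rows per category, plain target means overfit easily, so smoothing or out-of-fold encoding may be worth trying.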

Tasks:

Implement Feature Transformation methods. #24
Create new features based on domain knowledge and time information.
Handle missing data. #18
Perform feature selection to retain the most informative features.
Experiment with temporal feature engineering methods.
Apply suitable data augmentation techniques for time-series data.
Encode categorical features effectively. #22
Evaluate the impact of implemented feature engineering strategies on model performance.

Acceptance Criteria:

Improved model performance in terms of the evaluation metric (MAE), measured against the models without the new features.
Clear documentation of the feature engineering methods implemented and their impact on the model.
Successful integration of the engineered features in the modeling pipeline.

Additional Context:

Ensure that the engineered features make sense in the domain context and do not lead to data leakage.
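
To illustrate the leakage point, a minimal sketch that splits the data chronologically and derives scaling statistics from the training part only. The helper names and the `time` and `temperature` columns are hypothetical.

```python
import numpy as np
import pandas as pd


def chronological_split(df, time_column, test_fraction=0.2):
    """Split so that every test row is later than every training row."""
    ordered = df.sort_values(time_column).reset_index(drop=True)
    n_test = max(1, int(round(len(ordered) * test_fraction)))
    return ordered.iloc[:-n_test], ordered.iloc[-n_test:]


def standardize_with_train_stats(train, test, columns):
    """Standardize both parts using mean and std from the training part."""
    mean = train[columns].mean()
    std = train[columns].std().replace(0, 1.0)  # avoid division by zero
    train_scaled = train.copy()
    test_scaled = test.copy()
    train_scaled[columns] = (train[columns] - mean) / std
    test_scaled[columns] = (test[columns] - mean) / std
    return train_scaled, test_scaled


# Made-up daily data
frame = pd.DataFrame(
    {
        "time": pd.date_range("2023-01-01", periods=10, freq="D"),
        "temperature": np.arange(10, dtype=float),
    }
)
train, test = chronological_split(frame, "time")
train_scaled, test_scaled = standardize_with_train_stats(train, test, ["temperature"])
```

A random shuffle split would let the model train on rows that come after the rows it is evaluated on, which usually makes the MAE look better than it will be on new data.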

SverreNystad · 2023-09-24T18:41:33Z

It is also very important to document every experiment so that we can learn from them. I propose that we create a database with four attributes: Model, MAE Score, Description, git-SHA.
If it is hard to log the model itself, the three other attributes will be enough to retrieve it.

This will make it easy to go back to earlier models and to see what works and what does not.

Create Database for models
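
A possible starting point for such a database, sketched with Python's built-in `sqlite3`. The table layout, function names and example rows are assumptions. The MAE values and SHAs below are placeholders, not real results. The model is stored by name here; the git-SHA is what lets us rebuild it.

```python
import sqlite3


def open_experiment_log(path=":memory:"):
    """Open (or create) the experiment database and its single table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS experiments (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            model TEXT NOT NULL,
            mae REAL NOT NULL,
            description TEXT,
            git_sha TEXT NOT NULL,
            logged_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    return conn


def log_experiment(conn, model, mae, description, git_sha):
    """Insert one experiment; the transaction commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO experiments (model, mae, description, git_sha) "
            "VALUES (?, ?, ?, ?)",
            (model, mae, description, git_sha),
        )


def best_experiments(conn, limit=5):
    """Return the experiments with the lowest MAE first."""
    cursor = conn.execute(
        "SELECT model, mae, description, git_sha FROM experiments "
        "ORDER BY mae ASC LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()


# Placeholder entries, made up for the example
conn = open_experiment_log()
log_experiment(conn, "RandomForest", 123.4, "baseline features", "aaaaaaa")
log_experiment(conn, "GradientBoosting", 98.7, "added lag features", "bbbbbbb")
top = best_experiments(conn)
```

Passing a file path instead of `":memory:"` keeps the log between runs.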

SverreNystad · 2023-09-27T10:51:28Z

It could be valuable to compute the momentum of features and add that as well, i.e. whether a feature is currently increasing or decreasing.
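
A minimal sketch of what momentum could look like, assuming it means the change since an earlier time step. The helper `add_momentum` and the `temperature` column are made up for illustration.

```python
import numpy as np
import pandas as pd


def add_momentum(df: pd.DataFrame, column: str, periods: int = 1) -> pd.DataFrame:
    """Add the change over `periods` steps and its direction (-1, 0 or 1)."""
    out = df.copy()
    change = out[column].diff(periods)
    out[f"{column}_momentum"] = change
    out[f"{column}_direction"] = np.sign(change)
    return out


# Toy values: rising, flat, then falling
frame = pd.DataFrame({"temperature": [1.0, 3.0, 3.0, 2.0]})
with_momentum = add_momentum(frame, "temperature")
```

The first row has no earlier value to compare against, so its momentum and direction are NaN.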

SverreNystad assigned SverreNystad, pskoland and Gunnar2908 Sep 24, 2023

Gunnar2908 added a commit that referenced this issue Sep 25, 2023

#11 Remove snow_density as its a useless feature

56f23ad

Gunnar2908 added a commit that referenced this issue Sep 25, 2023

#11 Fix: using variable before assignment

9195897

Gunnar2908 added a commit that referenced this issue Sep 25, 2023

#11 Fix: Correct variable use

e62b8e0

Gunnar2908 added a commit that referenced this issue Sep 29, 2023

#11 Fix dropping all data rows

1934532

Gunnar2908 added a commit that referenced this issue Oct 2, 2023

#11 Remove snow features - model improved

2558d9b

Gunnar2908 added a commit that referenced this issue Oct 2, 2023

Removed more features #11

4e0cf26

Gunnar2908 added a commit that referenced this issue Oct 5, 2023

#11 Add multiple features - best model

6043cb4

Gunnar2908 added a commit that referenced this issue Oct 5, 2023

#11 2d and 3d correlation plots

d9e59f2

Gunnar2908 added a commit that referenced this issue Oct 7, 2023

#11 Add features

afa18a2

Gunnar2908 added a commit that referenced this issue Oct 7, 2023

#11 feat

a740d72
