Explore Feature Engineering to Improve Models Performance #11

Open
8 tasks done
SverreNystad opened this issue Sep 24, 2023 · 2 comments

SverreNystad commented Sep 24, 2023

To enhance the predictive performance of our models (Linear Regression, Random Forest, Gradient Boosting, LSTM, ARIMA, SARIMA), we need to explore and implement various feature engineering strategies. Feature engineering can help in uncovering hidden patterns in the data, dealing with missing or noisy data, and improving model generalization.

1. Feature Transformation:

  • Log Transformation: Apply a logarithmic transformation to skewed features so that their distribution is closer to Gaussian.
  • Polynomial Features: Create polynomial features to capture interaction between different features.
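  • Scaling: Standardize or normalize the features, especially for models that are sensitive to the scale of the input variables, such as the LSTM and, when regularization or gradient-based fitting is used, Linear Regression. Tree-based models like Random Forest and Gradient Boosting are largely insensitive to feature scale.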
  • Trigonometric Transformation: Encode cyclic features such as time of day or day of year as sine and cosine values, so that the end of a cycle sits next to its start. This should make periodic patterns, such as the yearly cycle, easier to learn from. The LSTM in particular could gain from this.
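
A minimal sketch of the log and trigonometric transformations above, using only the Python standard library. The function names are illustrative and not taken from the project code, and `log1p` is chosen on the assumption that the skewed feature is non-negative and may contain zeros.

```python
import math

def log_transform(values):
    """Compress a right-skewed, non-negative feature with log(1 + x)."""
    return [math.log1p(v) for v in values]

def encode_cyclic(value, period):
    """Map a cyclic value (e.g. hour of day with period=24) to a (sin, cos) pair."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def cyclic_distance(a, b, period):
    """Euclidean distance between two cyclic values after encoding."""
    return math.dist(encode_cyclic(a, period), encode_cyclic(b, period))

# Hour 23 and hour 0 are far apart as raw numbers but close after encoding.
print(cyclic_distance(23, 0, 24) < cyclic_distance(12, 0, 24))  # True
```

Both the sine and the cosine are needed: with only one of them, two different points in the cycle would map to the same value.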

2. Feature Creation:

  • Time-based Features: Extract hour, day, month, and other time-related features from the timestamp.
  • Lagged Features: Create lagged features to incorporate information from previous time steps.
  • Rolling Statistics: Calculate rolling mean, median, and standard deviation as new features to capture trends and seasonality.
  • Domain-specific Features: Explore creating features based on domain knowledge in solar energy production, such as sunlight intensity and optimal sunlight hours.
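
The time-based, lagged, and rolling features could be sketched as follows. This is a standard-library illustration under the assumption that observations are evenly spaced and ordered by time; the helper names and the sample values are made up. The rolling window deliberately excludes the current value so that the feature does not leak the target when it is computed on the target series.

```python
from datetime import datetime
from statistics import mean, median

def time_features(ts):
    """Extract calendar features from a datetime."""
    return {"hour": ts.hour, "day": ts.day, "month": ts.month, "weekday": ts.weekday()}

def lag(values, k):
    """Value from k steps earlier; None where there is not enough history."""
    return [None if i < k else values[i - k] for i in range(len(values))]

def rolling(values, window, stat=mean):
    """Apply `stat` to the `window` values strictly before each position."""
    out = []
    for i in range(len(values)):
        past = values[max(0, i - window):i]
        out.append(stat(past) if len(past) == window else None)
    return out

production = [0.0, 1.0, 3.0, 6.0, 4.0]  # made-up example values
print(time_features(datetime(2023, 9, 24, 14, 0)))
print(lag(production, 1))                   # [None, 0.0, 1.0, 3.0, 6.0]
print(rolling(production, 2))               # [None, None, 0.5, 2.0, 4.5]
print(rolling(production, 3, stat=median))  # [None, None, None, 1.0, 3.0]
```

With pandas, `Series.shift` and `Series.rolling` cover the same ground on a DataFrame.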

3. Handling Missing Data:

  • Investigate and implement advanced imputation methods like k-NN imputation or model-based imputation for dealing with missing data.
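
To make the k-NN idea concrete, here is a small pure-Python sketch. It is a simplification under stated assumptions: missing values are represented as `None`, distance is the root-mean-square difference over the columns both rows have observed, and the function name `knn_impute` is hypothetical. scikit-learn ships a `KNNImputer` that would be the more practical choice in the pipeline.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows."""
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # no common columns, so the rows cannot be compared
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that actually observed column j.
            donors = sorted(
                (distance(row, other), other[j])
                for n, other in enumerate(rows)
                if n != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

rows = [[1.0, 10.0], [2.0, 20.0], [1.1, None], [9.0, 90.0]]  # made-up data
print(knn_impute(rows, k=2)[2])  # [1.1, 15.0]
```

Whatever imputer is used, it should be fitted on the training period only, otherwise information from the test period leaks into the training features.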

4. Feature Selection:

  • Correlation Analysis: Analyze correlation between features and target variable to select relevant features.
  • Recursive Feature Elimination (RFE): Use RFE to select the most important features.
  • Feature Importance from Tree-based Models: Extract feature importance from models like Random Forest and Gradient Boosting to select relevant features.
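
As a starting point for the correlation analysis, the sketch below ranks features by the absolute Pearson correlation with the target. The feature names and numbers are invented for illustration, and treating a constant column as having zero correlation is a choice made here, not a convention from the project.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant column; treated as 0 here
    return cov / math.sqrt(var_x * var_y)

def rank_features(features, target):
    """Return (name, r) pairs sorted by |r|, strongest first."""
    scores = {name: pearson(col, target) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

target = [1.0, 2.0, 3.0, 4.0]
features = {
    "irradiance": [2.0, 4.0, 6.0, 8.0],  # hypothetical feature names
    "noise": [1.0, -1.0, 1.0, -1.0],
    "cloud_cover": [4.0, 3.0, 2.0, 1.0],
}
for name, r in rank_features(features, target):
    print(f"{name}: {r:+.3f}")
```

Pearson correlation only measures linear association, so a feature with a low score can still matter to Random Forest or Gradient Boosting; the tree-based importances in the list above complement it.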

5. Temporal Features:

  • Since our data is a time series, experiment with different ways to incorporate temporal information, such as:
    • Seasonal decomposition to separate trend, seasonality, and residuals.
    • Fourier Transforms to capture cyclical patterns.
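
One way the Fourier idea could be used: find the dominant period with a discrete Fourier transform, then feed sine and cosine terms at that period to the model. The sketch below uses a naive O(n²) DFT for clarity and assumes evenly spaced samples; the function names are illustrative. For real data, `numpy.fft.rfft` would be much faster.

```python
import cmath
import math

def dominant_period(values):
    """Period (in samples) of the strongest non-constant DFT component, or None."""
    n = len(values)
    avg = sum(values) / n
    centered = [v - avg for v in values]  # remove the mean so frequency 0 is ignored
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k if best_k else None

def fourier_terms(t, period, order=2):
    """Sine/cosine pairs for the first `order` harmonics at time index t."""
    terms = []
    for h in range(1, order + 1):
        angle = 2.0 * math.pi * h * t / period
        terms += [math.sin(angle), math.cos(angle)]
    return terms

# Synthetic hourly signal with a daily cycle, four days long.
signal = [math.sin(2.0 * math.pi * t / 24) for t in range(96)]
print(dominant_period(signal))  # 24.0
```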

6. Data Augmentation:

  • Explore data augmentation techniques suitable for time-series data, such as time warping, to increase the diversity and amount of training data.
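
A very simplified form of time warping is sketched below: the series is resampled along a randomly distorted but still monotonic time axis using linear interpolation. Published time-warping methods usually use smooth warping curves, so this is only an assumption-laden illustration, and `time_warp` and its parameters are hypothetical names.

```python
import random

def time_warp(values, strength=0.2, seed=None):
    """Resample `values` along a randomly stretched/compressed time axis.

    `strength` in [0, 1) bounds how much each step may be sped up or slowed down.
    The output has the same length and the same first and last values.
    """
    if not 0.0 <= strength < 1.0:
        raise ValueError("strength must be in [0, 1)")
    n = len(values)
    if n < 3:
        return list(values)
    rng = random.Random(seed)
    steps = [1.0 + rng.uniform(-strength, strength) for _ in range(n - 1)]
    total = sum(steps)
    positions = [0.0]
    for step in steps:
        positions.append(positions[-1] + step * (n - 1) / total)
    positions[-1] = float(n - 1)  # guard against rounding drift
    warped = []
    for p in positions:
        lo = min(int(p), n - 2)
        frac = p - lo
        warped.append(values[lo] * (1.0 - frac) + values[lo + 1] * frac)
    return warped

series = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0]  # made-up example values
print(time_warp(series, strength=0.3, seed=42))
```

If features and target are augmented, the same warp (same seed) has to be applied to both so that they stay aligned.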

7. Encoding Categorical Features:

  • If applicable, explore different encoding techniques for categorical features, such as One-Hot Encoding or Target Encoding.
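
For reference, both encodings fit in a few lines. The sketch assumes a single categorical column; the category values are invented, and the smoothing parameter `prior_weight` is an illustrative choice rather than something specified in this issue.

```python
def one_hot(values, categories=None):
    """Return (categories, rows) where each row has a 1 in its category's column."""
    cats = sorted(set(values)) if categories is None else list(categories)
    return cats, [[1 if v == c else 0 for c in cats] for v in values]

def fit_target_encoding(train_values, train_target, prior_weight=5.0):
    """Smoothed mean target per category, computed from training rows only."""
    global_mean = sum(train_target) / len(train_target)
    sums, counts = {}, {}
    for v, y in zip(train_values, train_target):
        sums[v] = sums.get(v, 0.0) + y
        counts[v] = counts.get(v, 0) + 1
    mapping = {
        c: (sums[c] + prior_weight * global_mean) / (counts[c] + prior_weight)
        for c in sums
    }
    return mapping, global_mean

def apply_target_encoding(values, mapping, global_mean):
    """Unseen categories fall back to the global training mean."""
    return [mapping.get(v, global_mean) for v in values]

cats, rows = one_hot(["A", "B", "A"])  # hypothetical category labels
print(cats, rows)  # ['A', 'B'] [[1, 0], [0, 1], [1, 0]]
```

Target encoding uses the target itself, so the mapping must be fitted on the training period only to avoid the data leakage this issue warns about.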

Tasks:

Acceptance Criteria:

  • Improved model performance as measured by the evaluation metric, mean absolute error (MAE).
  • Clear documentation of the feature engineering methods implemented and their impact on the model.
  • Successful integration of the engineered features in the modeling pipeline.

Additional Context:

  • Ensure that the engineered features make sense in the domain context and do not lead to data leakage.
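
To check the MAE criterion without introducing leakage, a time-ordered split is the usual safeguard for time-series data. The following is a minimal sketch with illustrative helper names; the 80/20 split fraction is an assumption, not a value taken from the project.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows without shuffling, so the test set lies in the future."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

train, test = chronological_split(list(range(10)))
print(train, test)                                    # [0, ..., 7] and [8, 9]
print(mean_absolute_error([1, 2, 3], [1, 3, 5]))      # 1.0
```

Any statistic used to build features (scaling parameters, imputation values, target encodings) should be computed on `train` only and then applied to `test`.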

SverreNystad commented Sep 24, 2023

It is also very important to document every experiment so that we can learn from it. I propose that we create a database with four attributes: Model, MAE Score, Description, and git-SHA.
If it is hard to log the model itself, the other three attributes will be enough to retrieve it.

This will make it easy to get back to earlier models and to see which changes worked.

  • Create a database for models
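
One lightweight way to build such a database is SQLite via Python's built-in `sqlite3` module. The sketch below is only a suggestion: the table and column names are assumptions, the `model` column stores a model name rather than the serialized model, and the logged values are made up.

```python
import sqlite3

def open_experiment_db(path=":memory:"):
    """Open (or create) the experiment log and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS experiments (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               model TEXT NOT NULL,
               mae REAL NOT NULL,
               description TEXT NOT NULL,
               git_sha TEXT NOT NULL
           )"""
    )
    return conn

def log_experiment(conn, model, mae, description, git_sha):
    """Insert one experiment; the `with` block commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO experiments (model, mae, description, git_sha) VALUES (?, ?, ?, ?)",
            (model, mae, description, git_sha),
        )

def best_experiments(conn, limit=5):
    """Return the experiments with the lowest MAE first."""
    query = "SELECT model, mae, description, git_sha FROM experiments ORDER BY mae ASC LIMIT ?"
    return conn.execute(query, (limit,)).fetchall()

conn = open_experiment_db()
log_experiment(conn, "random_forest", 0.42, "added lag features", "abc1234")     # made-up values
log_experiment(conn, "linear_regression", 0.57, "baseline", "def5678")           # made-up values
print(best_experiments(conn, limit=1))
```

Passing a file path instead of `":memory:"` keeps the log between runs, and the git SHA lets anyone check out the exact code that produced a score.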

SverreNystad commented

It could also be of value to compute the momentum of features and add that as well, i.e. whether each feature is increasing or decreasing.
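
A simple reading of momentum is the change of a feature over the last k time steps, plus its sign. The sketch below assumes evenly spaced, time-ordered values; the function names and sample numbers are illustrative.

```python
def momentum(values, k=1):
    """Change over the last k steps (values[t] - values[t - k]); None without enough history."""
    return [None if i < k else values[i] - values[i - k] for i in range(len(values))]

def direction(delta):
    """+1 if increasing, -1 if decreasing, 0 if flat; None is passed through."""
    if delta is None:
        return None
    return (delta > 0) - (delta < 0)

temperature = [1, 3, 2, 2]  # made-up example values
deltas = momentum(temperature)
print(deltas)                          # [None, 2, -1, 0]
print([direction(d) for d in deltas])  # [None, 1, -1, 0]
```

If this is applied to the target rather than to an input feature, the result should be lagged by at least one step so that the current target value does not leak into its own predictor.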

Gunnar2908 added a commit that referenced this issue Sep 29, 2023
Gunnar2908 added a commit that referenced this issue Oct 2, 2023
Gunnar2908 added a commit that referenced this issue Oct 5, 2023
Gunnar2908 added a commit that referenced this issue Oct 7, 2023
Gunnar2908 added a commit that referenced this issue Oct 7, 2023