Updated Assignment 3 with my Answers #71
Closed
What changes are you trying to make? (e.g. Adding or removing code, refactoring existing code, adding reports)
I am adding code to build and evaluate model pipelines for predicting the burned area of forest fires, using the Forest Fires dataset from the UCI Machine Learning Repository. This includes preprocessing steps, building and tuning multiple regression models, and evaluating their performance.
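A minimal sketch of what these pipelines might look like. The column names follow the standard UCI Forest Fires dataset (`month`/`day` categorical, the rest numeric); the hyperparameters are illustrative placeholders, not the tuned values from the assignment, and the data below is synthetic just to show the pipelines run end to end.

```python
# Sketch of the preprocessing + modelling pipelines described above.
# Column names assume the UCI "Forest Fires" schema; hyperparameters
# are placeholders, not the tuned values from this PR.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["FFMC", "DMC", "DC", "ISI", "temp", "RH", "wind", "rain"]
categorical_cols = ["month", "day"]

# Scale the numeric features, one-hot encode the categorical ones.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

pipelines = {
    "ridge": Pipeline([("prep", preprocessor),
                       ("model", Ridge(alpha=1.0))]),
    "rf": Pipeline([("prep", preprocessor),
                    ("model", RandomForestRegressor(n_estimators=100,
                                                    random_state=0))]),
}

# Tiny synthetic frame standing in for the real dataset.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    **{c: rng.normal(size=40) for c in numeric_cols},
    "month": rng.choice(["jan", "aug", "sep"], size=40),
    "day": rng.choice(["mon", "fri", "sun"], size=40),
})
y = rng.normal(size=40)

for name, pipe in pipelines.items():
    pipe.fit(X, y)
    print(name, pipe.predict(X).shape)
```

Keeping the preprocessing inside the `Pipeline` ensures the scaler and encoder are fit only on the training folds during cross-validation, avoiding leakage.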
What did you learn from the changes you have made?
I learned how to effectively preprocess data using column transformers, including scaling and one-hot encoding. I also gained experience in building and tuning both linear and non-linear regression models using Ridge regression and RandomForestRegressor. Furthermore, I learned how to use SHAP values for feature importance analysis and how to systematically remove less important features to enhance model performance.
Was there another approach you were thinking about making? If so, what approach(es) were you thinking of?
Were there any challenges? If so, what issue(s) did you face? How did you overcome it?
How were these changes tested?
The changes were tested with cross-validation to ensure consistent and reliable performance across different model configurations. I used negative mean squared error (neg_MSE) to compare model performance before and after feature removal. SHAP values were also computed and visualized to validate the impact of each feature on the model's predictions, ensuring the changes led to meaningful improvements.
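The before/after comparison described above might be sketched as follows, using synthetic data and scikit-learn's `neg_mean_squared_error` scorer (the feature split is an assumption for illustration):

```python
# Sketch of the cross-validated comparison: same model, full vs. reduced
# feature set, scored with negative MSE. Data here is synthetic.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 5))
# Only the first three columns carry signal; the last two are noise.
y = X[:, 0] + 2 * X[:, 1] - X[:, 2] + rng.normal(scale=0.5, size=150)

full = cross_val_score(Ridge(), X, y, cv=5,
                       scoring="neg_mean_squared_error")
reduced = cross_val_score(Ridge(), X[:, :3], y, cv=5,
                          scoring="neg_mean_squared_error")

# neg-MSE is always negative; closer to zero is better.
print("full features   :", full.mean())
print("reduced features:", reduced.mean())
```

Comparing the fold-wise means (and their spread) rather than a single train/test split is what makes the before/after comparison reliable.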
A reference to a related issue in your repository (if applicable)
Checklist