UofT-DSI | production - Assignment 3 #72

sijiao-liu · 2024-11-05T09:41:16Z

What changes are you trying to make? (e.g. Adding or removing code, refactoring existing code, adding reports)

I am attempting to use SHAP values to explain the predictions of the best-performing model. This involves:

Implementing code to explain the impact of features on a specific observation from the test set.
Identifying which features are most and least important across the entire training set.
Providing guidance on feature removal and performance testing.
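
The feature-removal guidance above can be checked with a quick experiment. This is a minimal sketch, not the code from this PR: the synthetic data, the `score_without` helper, and the choice of `LinearRegression` are all assumptions made for illustration. It compares cross-validated performance with and without a given feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def score_without(model, X, y, drop_idx=None, cv=5):
    """Mean cross-validated R^2, optionally with one feature column removed."""
    if drop_idx is not None:
        X = np.delete(X, drop_idx, axis=1)
    return cross_val_score(model, X, y, cv=cv, scoring="r2").mean()


# Hypothetical stand-in data: column 0 drives the target, column 2 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

model = LinearRegression()
baseline = score_without(model, X, y)
without_noise = score_without(model, X, y, drop_idx=2)
without_signal = score_without(model, X, y, drop_idx=0)

print(f"all features:            {baseline:.3f}")
print(f"least important removed: {without_noise:.3f}")
print(f"most important removed:  {without_signal:.3f}")
```

If removing a low-ranked feature leaves the score essentially unchanged, the feature is a reasonable candidate for removal; a clear drop suggests it should stay.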

What did you learn from the changes you have made?

I learned that the SHAP library can be quite resource-intensive, and using it in restricted environments may cause issues related to library compatibility (e.g., CUDA or GPU dependencies).
I also realized the importance of ensuring that the data passed to SHAP matches the transformed format that the model uses, especially when categorical features have been one-hot encoded.

Was there another approach you were thinking about making? If so, what approach(es) were you thinking of?

Using KernelExplainer from SHAP, which is model-agnostic and does not require specialized GPU support, as a fallback.
Providing manual feature importance analysis using model coefficients (for linear models) or feature importances from tree-based models if SHAP visualizations continued to fail.
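
One way the two fallbacks above could be combined is sketched below. It is an assumption about how this might look, not the code in this PR: `rank_features` and `manual_importance` are hypothetical helpers, and the data is synthetic. The sketch tries `shap.KernelExplainer` first and, if SHAP cannot be imported or run, falls back to the model's own importances.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression


def manual_importance(model):
    """Importance scores read directly from a fitted single-output model."""
    if hasattr(model, "feature_importances_"):  # tree-based models
        return np.asarray(model.feature_importances_)
    if hasattr(model, "coef_"):  # linear models; comparable only for similarly scaled features
        return np.abs(np.ravel(model.coef_))
    raise ValueError("model exposes neither feature_importances_ nor coef_")


def rank_features(model, X, feature_names, background_size=20, explain_size=20):
    """Rank features by mean |SHAP value|, falling back to manual importances."""
    try:
        import shap  # imported lazily so a broken install does not stop the script

        explainer = shap.KernelExplainer(model.predict, X[:background_size])
        values = np.asarray(explainer.shap_values(X[:explain_size]))
        scores = np.abs(values).mean(axis=0)
        if scores.shape != (len(feature_names),):
            raise ValueError(f"unexpected SHAP output shape {values.shape}")
        source = "shap"
    except Exception as exc:  # broad on purpose: import and runtime failures both trigger the fallback
        scores = manual_importance(model)
        source = f"manual ({type(exc).__name__})"
    order = np.argsort(scores)[::-1]
    return [(feature_names[i], float(scores[i])) for i in order], source


# Hypothetical stand-in data: "signal" dominates the target, "noise" is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)
names = ["signal", "weak", "noise"]

results = {}
for model in (RandomForestRegressor(n_estimators=50, random_state=0), LinearRegression()):
    model.fit(X, y)
    ranking, source = rank_features(model, X, names)
    results[type(model).__name__] = ranking
    print(type(model).__name__, "via", source, ranking)
```

Note that SHAP scores and manual importances are on different scales, so only the ordering of features should be compared between the two paths.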

Were there any challenges? If so, what issue(s) did you face? How did you overcome it?

Challenges Faced:

Encountering compatibility issues with libraries that require GPU resources, leading to errors when importing SHAP or running specific visualizations.
Dealing with a dimension mismatch when trying to match SHAP values with the transformed features.

Overcoming the Challenges:

I fixed the dimension mismatch by ensuring that the features passed to SHAP were in the same transformed format used by the model.
For the CUDA errors, I planned to switch to a simpler SHAP setup that does not require GPU dependencies.

How were these changes tested?

The changes were tested by:

Running train-test split evaluations to ensure the model pipelines worked correctly.
Using SHAP to generate explanations, although visual outputs were difficult to render in the current environment.
Debugging errors and validating data transformations to ensure correctness.

A reference to a related issue in your repository (if applicable)

Checklist

I can confirm that my changes are working as intended

github-actions · 2024-11-05T09:41:29Z

Hello, thank you for your contribution. If you are a participant, please close this pull request and open it in your own forked repository instead of here. Please read the instructions on your onboarding Assignment Submission Guide more carefully. If you are not a participant, please give us up to 72 hours to review your PR. Alternatively, you can reach out to us directly to expedite the review process.

sijiao-liu added 2 commits November 5, 2024 03:30

completed production assignment 2

58b9e3c

completed production assignment 3

bfafa9c

sijiao-liu closed this Nov 5, 2024
