
Complete implementation of assignment_2 model pipeline and metrics #79

Open · wants to merge 1 commit into main

Conversation

marjanrajabi437

What changes are you trying to make? (e.g. Adding or removing code, refactoring existing code, adding reports)

I am adding performance metric calculations, including negative log loss, ROC AUC, accuracy, and balanced accuracy for test data. Additionally, I’m restructuring the code to display fold-level results in a more readable, sorted format.
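A minimal sketch of the test-set metric calculations described above, using scikit-learn. The estimator, dataset, and variable names (`clf`, `X_test`, `y_test`) are illustrative assumptions, not taken from the actual assignment_2 code:

```python
# Hypothetical sketch of the four test-set metrics named in this PR.
# Model and variable names are assumptions, not the assignment's real code.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    log_loss,
    roc_auc_score,
)

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_pred = clf.predict(X_test)
y_proba = clf.predict_proba(X_test)  # shape (n_samples, n_classes)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "balanced_accuracy": balanced_accuracy_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_proba[:, 1]),  # positive-class column
    "neg_log_loss": -log_loss(y_test, y_proba),       # negate to match scorer
}
for name, value in sorted(metrics.items()):
    print(f"{name}: {value:.4f}")
```

Note that `log_loss` returns a loss (lower is better), so it is negated here to match scikit-learn's `neg_log_loss` scorer convention.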

What did you learn from the changes you have made?

From these changes, I learned how to use cross-validation (scikit-learn's cross_validate) to extract and evaluate metrics for each fold, and I gained a better understanding of calculating model performance metrics on test data.
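The fold-level extraction described above can be sketched with scikit-learn's `cross_validate`, including the sorted display mentioned in the checklist. The estimator and data are placeholder assumptions:

```python
# Sketch of fold-level metric extraction with cross_validate, with the
# per-fold results displayed sorted by neg_log_loss. Estimator and data
# are illustrative assumptions, not the assignment's real pipeline.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=500, random_state=0)

scoring = ["neg_log_loss", "roc_auc", "accuracy", "balanced_accuracy"]
results = cross_validate(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring=scoring
)

# cross_validate returns one "test_<metric>" array per scorer, one entry
# per fold; collect them into a DataFrame with one row per fold.
folds = pd.DataFrame({s: results[f"test_{s}"] for s in scoring})
print(folds.sort_values("neg_log_loss"))
```

Sorting by `neg_log_loss` ascending puts the worst-performing fold first, which makes unstable folds easy to spot.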

Was there another approach you were thinking about making? If so, what approach(es) were you thinking of?

Yes, I considered using a separate function to handle the metric calculations and display results for both training and testing, to make the code more modular and reusable. This approach could improve readability, especially as the complexity of metrics increases.

Were there any challenges? If so, what issue(s) did you face? How did you overcome it?

Yes, I faced some challenges with the cross_validate function, especially in sorting results and calculating fold-level metrics directly. There were also issues with handling prediction probability arrays in metrics like roc_auc and log_loss. I addressed these by adjusting the data passed to predict_proba and breaking down the metric calculations step-by-step.
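The predict_proba issue mentioned above is a common one: for binary classification, `roc_auc_score` expects only the positive-class probability column, while `log_loss` accepts the full `(n_samples, 2)` array. A small sketch (names are illustrative assumptions):

```python
# Sketch of the predict_proba shape issue: roc_auc_score wants the
# positive-class column for binary problems, while log_loss takes the
# full probability array. Data and model are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, roc_auc_score

X, y = make_classification(n_samples=200, random_state=1)
clf = LogisticRegression(max_iter=1000).fit(X, y)

proba = clf.predict_proba(X)         # shape (200, 2)
auc = roc_auc_score(y, proba[:, 1])  # slice out P(class == 1)
ll = log_loss(y, proba)              # the full array is fine here

print(f"roc_auc={auc:.4f}, log_loss={ll:.4f}")
```

Passing the full two-column array to `roc_auc_score` for a binary target raises an error, which is consistent with the breakage described above.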

How were these changes tested?

These changes were tested by running the pipeline on a sample of test data and checking that each metric matched the expected output. I also inspected intermediate outputs at each step to confirm the metrics aligned with known baseline values for model performance.

A reference to a related issue in your repository (if applicable)

Refer to sections 3a, 3b, and 4 in the lab materials within the repository.

Checklist

  • I can confirm that my changes are working as intended

  • Code is free of syntax errors and runs without issues

  • Metrics display correctly for each fold and are sorted by neg_log_loss for clarity

  • Code is modular and well-documented for easy future adjustments

  • Output is formatted to improve readability and analysis


Hello, thank you for your contribution. If you are a participant, please close this pull request and open it in your own forked repository instead of here. Please read the instructions in your onboarding Assignment Submission Guide more carefully. If you are not a participant, please give us up to 72 hours to review your PR. Alternatively, you can reach out to us directly to expedite the review process.

@marjanrajabi437 (Author) commented Nov 11, 2024 via email
