UofT-DSI | Scaling to production - Assignment 2 #53

Closed
wants to merge 7 commits into from

@movcha movcha commented Jul 13, 2024

What changes are you trying to make? (e.g. Adding or removing code, refactoring existing code, adding reports)

I implemented a complete machine learning pipeline on the Adult dataset: a preprocessing workflow, a model pipeline with parameter tuning, evaluation with cross-validation metrics, and a final check of predictive performance on the test set.

What did you learn from the changes you have made?

  • How to preprocess numerical and categorical data, including imputation and scaling.
  • How to construct pipelines in scikit-learn to streamline model training and evaluation.
  • Why cross-validation matters for assessing model performance and tuning hyperparameters.
  • Why setting a random state matters for reproducibility in machine learning experiments.
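
The points above can be sketched roughly as follows. This is an illustration, not the notebook's code: the synthetic data stands in for the Adult dataset, and the column names, the `LogisticRegression` classifier, the imputation strategies, and `RANDOM_STATE = 42` are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

RANDOM_STATE = 42  # assumed value; any fixed seed makes the run repeatable

# Synthetic stand-in for the Adult dataset (hypothetical columns).
rng = np.random.default_rng(RANDOM_STATE)
n = 200
X = pd.DataFrame({
    "age": rng.integers(18, 80, size=n).astype(float),
    "hours_per_week": rng.integers(10, 60, size=n).astype(float),
    "workclass": pd.Series(rng.choice(["Private", "Self-emp", "Gov"], size=n), dtype=object),
})
X.loc[::10, "age"] = np.nan        # inject missing numeric values
X.loc[::15, "workclass"] = np.nan  # inject missing categorical values
y = (X["hours_per_week"] > 35).astype(int)

# Numeric columns: impute, then scale.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: impute, then one-hot encode.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "hours_per_week"]),
    ("cat", categorical, ["workclass"]),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=RANDOM_STATE, stratify=y
)

# Cross-validate on the training split only, then score once on the test split.
cv_results = cross_validate(model, X_train, y_train, cv=5, scoring=["accuracy", "roc_auc"])
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(cv_results["test_accuracy"].mean(), test_accuracy)
```

Because the preprocessing lives inside the pipeline, each cross-validation fold fits the imputers and scaler on its own training portion, which avoids leaking validation data into the preprocessing statistics.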

Was there another approach you were thinking about making? If so, what approach(es) were you thinking of?

I thought about incorporating feature selection techniques to improve model efficiency and interpretability.
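
One way this could look, as a sketch only: univariate selection with `SelectKBest` placed inside the pipeline. The PR does not say which technique was considered, so the selector, the scoring function `f_classif`, `k=5`, and the synthetic data from `make_classification` are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 20 features, of which only a few carry signal.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=2, random_state=42
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=5)),  # keep the 5 top-scoring features
    ("clf", LogisticRegression(max_iter=1000)),
])

# Selection is refit inside every fold, so the scores are not biased by it.
scores = cross_val_score(pipe, X, y, cv=5)

pipe.fit(X, y)
kept = pipe.named_steps["select"].get_support(indices=True)
print(scores.mean(), kept)
```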

Were there any challenges? If so, what issue(s) did you face? How did you overcome it?

One challenge was keeping the pipeline robust across different datasets, particularly when handling missing data and categorical variables with varying cardinality. I addressed this by trying different preprocessing strategies and validating the pipeline's performance across the cross-validation folds.

How were these changes tested?

The changes were tested by running the notebook and verifying the output at each step.

A reference to a related issue in your repository (if applicable)

N/A

Checklist

  • I can confirm that my changes are working as intended

Hello, thank you for your contribution. If you are a participant, please close this pull request and open it in your own forked repository instead of here. Please read the instructions in your onboarding Assignment Submission Guide more carefully. If you are not a participant, please give us up to 72 hours to review your PR. Alternatively, you can reach out to us directly to expedite the review process.

@movcha movcha closed this Jul 13, 2024