Submission: 10: Investment Outcome Predictor #10
Comments
Data analysis review checklist
Reviewer:
Conflict of interest
Code of Conduct
General checks
Documentation
Code quality
Reproducibility
Analysis report
Estimated hours spent reviewing: 1 hour
Review Comments:
Overall, this project is very well written and covers all the essential bases. As noted in the comments above, most of the issues I spotted in your project are very minor and can be fixed relatively quickly. It is interesting that you decided to use R Markdown to render your report; Jupyter Book might have been an alternative worth considering, though I am not sure it would have been better. Well done!
Attribution
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Reviewer: Isabela Lucas Bruxellas
Conflict of interest
Code of Conduct
General checks
Documentation
Code quality
Reproducibility
Analysis report
Estimated hours spent reviewing: 3 hours
Review Comments:
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
Things that were done particularly well:
Things to improve:
Attribution
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Data analysis review checklist
Reviewer: Jaskaran1116
Conflict of interest
Code of Conduct
General checks
Documentation
Code quality
Reproducibility
Analysis report
Estimated hours spent reviewing: 1 hour 15 minutes
Review Comments:
Components that are constructed well
Components to improve on
Overall, great job! You guys have adhered to the guidelines and have created a very well-structured project. I feel the suggestions are just some minor changes to the repository and can be fixed quickly. I also liked that you guys used R Markdown to render your report, since it allows the results of R code to be inserted directly into formatted documents.
Attribution
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Data analysis review checklist
Reviewer: <GITHUB_USERNAME>
Conflict of interest
Code of Conduct
General checks
Documentation
Code quality
Reproducibility
Analysis report
Estimated hours spent reviewing: 2 hrs.
Review Comments:
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
Well made points
I think this is a good project that is strong on the computational side. I found the functions and tests to be well developed and carefully thought out. The project is well structured, and the idea behind it was solid from the beginning. It seems like a project where everyone worked together fluently, which led to a result that does not read as separate parts glued together. The project is easy to deploy thanks to the well-written README file, and the Makefile is well developed and ran with no errors. The conclusions are solid.
Points to improve
Attribution
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Submitting authors: @nkoda @mahdiheydar @izk20 @harrysyz99
Repository: https://github.com/DSCI-310/DSCI-310-Group-10
Abstract/executive summary:
A K-nearest neighbors (KNN) classification model was applied to 2017 Canadian census data to predict whether an individual made money on their investments (true class) or broke even or lost money (false class), using two features: their family size and whether they are the major income earner in their family.
All investments carry risk, so the rationale for this analysis was to gain insight into whether the pressures of being the main income earner in a family and having a larger family size help predict someone's investment outcomes. The results could inform further analysis of the risks associated with specific investments.
The KNN model was tuned for the number-of-nearest-neighbors hyperparameter. A value of 26 was used, yielding approximately 57% accuracy, so the model did not perform much better than a dummy classifier. Unlike the pattern found in the actual data, the KNN classification model was not able to distinguish between individuals in the same family size group.
To obtain more solid results, it is important to build other models, such as a support vector machine (SVM), to carry out feature engineering, or to add other features that may serve as better predictors. This would strengthen the investigation of the original research question: how family size, and whether an individual is the major income earner in their family, can be used to predict investment outcomes.
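The comparison between a KNN classifier and a dummy classifier described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code (the project itself renders its report with R Markdown): the data below are synthetic, and the function names, the data generator, and its probabilities are all assumptions made for illustration. Only the two features and the value k = 26 come from the abstract. The sketch also shows why two discrete features limit KNN: individuals with identical feature values are the same point, so they always receive the same prediction.

```python
import random
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of a dummy classifier that always predicts the most
    common label seen in the training set."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return sum(1 for y in test_labels if y == most_common) / len(test_labels)


def knn_predict(train_X, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest
    training points (squared Euclidean distance, unscaled features)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), i)
        for i, row in enumerate(train_X)
    )
    votes = Counter(train_y[i] for _, i in dists[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(train_X, train_y, test_X, test_y, k):
    """Fraction of test points whose KNN prediction matches the label."""
    correct = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y)
    )
    return correct / len(test_y)


def make_synthetic(n, seed):
    """Hypothetical data: (family_size, major_earner) -> made_money.
    The weak signal below is invented, not taken from the census data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        family_size = rng.randint(1, 7)
        major_earner = rng.randint(0, 1)
        p_made_money = 0.5 + 0.05 * major_earner  # assumed, illustrative only
        X.append((family_size, major_earner))
        y.append(rng.random() < p_made_money)
    return X, y


if __name__ == "__main__":
    train_X, train_y = make_synthetic(400, seed=1)
    test_X, test_y = make_synthetic(100, seed=2)
    k = 26  # the value reported in the abstract
    print("KNN accuracy:  ", knn_accuracy(train_X, train_y, test_X, test_y, k))
    print("Dummy accuracy:", majority_baseline_accuracy(train_y, test_y))
```

With weakly informative features like these, the two accuracies are expected to be close, which is the kind of result the abstract reports. Richer or engineered features, as the abstract suggests, would give the classifier more distinct points to separate.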
Editor: @ttimbers
Reviewer: @YellowPrawn @ClaudioETC @isabelalucas @Jaskaran1116