Submission: 1: Predicting students’ grades using multi-variable regression #1
Comments
Data analysis review checklist
Reviewer: ayashaa
Conflict of interest
Code of Conduct
General checks
Documentation
Code quality
Reproducibility
Analysis report
Estimated hours spent reviewing: 3
Review Comments:
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
This was a very well written report with an interesting topic! I enjoyed reading and learning about the factors that may predict grades, and the analysis ran smoothly on my end.
I am not entirely sure if this was a bug on my end from running it locally, but after running make all, there was no report output in the _build folder. Instead, there were several html files, one for each section (intro, methods, etc.), which made it a bit more difficult to read through the whole report.
I did not check off style guidelines because, while many of them were followed, there were some style inconsistencies throughout the code and functions. For example, some function file names do not match the function definition, e.g. plotSquareData.py and plot_square_data. Furthermore, some variable names follow camelCase while others follow snake_case.
Finally, I think it would be interesting to expand your results/discussion section, specifically the paragraphs where you discuss the features impacting true high grades. I think this is a really interesting part of your analysis, and I would love to see a more in-depth analysis of these, such as a discussion about which maternal jobs have a positive vs. negative effect, or even some research or speculation as to why the mother's job has a higher impact than the father's, why being in a relationship has a negative impact, etc.
Overall, great work!! 👍
Attribution
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Data analysis review checklist
Reviewer: <GITHUB_USERNAME>
Conflict of interest
Code of Conduct
General checks
Documentation
Code quality
Reproducibility
Analysis report
Estimated hours spent reviewing: 1.5
Review Comments:
The analysis was well done and presents several interesting questions. It would be very interesting to see how combinations of features affect the final grade prediction, as that could give more interpretable results than comparing them all individually.
The code itself is readable and well commented, but doesn't exactly fit this class's guidelines. Some functions use different formats for documentation, and not all of them include examples of how to run them. These issues are to be expected when multiple people work on a project, and a standard style guide could fix them. I checked the box for documentation as it provides enough information to understand the code, but this could be an issue for final grading.
The tests are well done and cover edge cases. The only feedback I would give is to put the error messages in the assert statements, so that they are shown when a test fails, instead of using separate print statements; this would simplify the output.
Attribution
This was derived from the JOSE review checklist and the ROpenSci review checklist.
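The suggestion above about assert messages could look like the following minimal sketch. The helper `split_counts` and its test are hypothetical and are not taken from the repository; the only point is that the text after the comma in an `assert` is reported when the check fails, so no separate print call is needed.

```python
def split_counts(n_rows, test_fraction=0.2):
    """Hypothetical helper: return (n_train, n_test) for a train/test split."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = round(n_rows * test_fraction)
    return n_rows - n_test, n_test


def test_split_counts():
    n_train, n_test = split_counts(100, test_fraction=0.2)
    # The text after the comma becomes the AssertionError message on failure.
    assert n_train + n_test == 100, (
        f"rows were lost or duplicated: {n_train} + {n_test} != 100"
    )
    assert n_test == 20, f"expected 20 test rows, got {n_test}"


test_split_counts()
```

With a test runner such as pytest, a failing assertion would then show the message directly in the failure report.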
Data analysis review checklist
Reviewer: asmdrk
Conflict of interest
Code of Conduct
General checks
Documentation
Code quality
Reproducibility
Analysis report
Estimated hours spent reviewing: 1-2
Review Comments:
This was a great analysis. The introduction in particular set up the analysis really well, clearly outlining the question being explored and its importance and potential uses. In fact, the quality of writing throughout the report was great: it clearly explains the choices and decisions made for the analysis and why, and explains their significance in an easy-to-understand manner.
Attribution
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Submitting authors: @TheAMIZZguy @danielhou13 @TimothyZG @gzzen
Repository: https://github.com/DSCI-310/DSCI-310-Group-1
Abstract/executive summary:
For this project, we focus on predicting students’ final grades. Being able to efficiently predict the final grade allows a student to track their current progress and plan in advance. The dataset is hosted at the UCI Machine Learning Repository. We are particularly interested in how the following features provided in the data could contribute to the prediction of students’ final grade G3:
study time: The number of hours spent studying per week.
Pstatus: Whether the parents are living together or separated.
Medu: Mother's education level.
Fedu: Father's education level.
Mjob: The mother's job.
Fjob: The father’s job.
goout: Frequency of the student going out with friends.
romantic: Whether the student is in a romantic relationship.
travel time: How long it takes for the student to get to school.
Since we are using a mixture of categorical variables and numeric variables to predict a quantitative result, the concept of least squares regression analysis from DSCI 100 could be implemented and extended to fit our context.
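To illustrate how categorical and numeric features can be combined for least squares regression, here is a minimal sketch of one-hot (indicator) encoding in plain Python. The function `one_hot`, the records, and the job labels are made up for illustration and are not taken from the project's code or data.

```python
def one_hot(values):
    """Encode a list of category labels as indicator columns.

    Returns (categories, rows), where rows[i][j] is 1 if values[i] equals
    categories[j] and 0 otherwise.
    """
    categories = sorted(set(values))
    index = {category: j for j, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


# Made-up example records: weekly study time and mother's job.
study_time = [2, 1, 3]
mjob = ["teacher", "health", "teacher"]

categories, indicators = one_hot(mjob)
# Numeric features are kept as they are; indicator columns are appended.
design_matrix = [[hours] + row for hours, row in zip(study_time, indicators)]
print(categories)
print(design_matrix)
```

Each row of `design_matrix` is then fully numeric, which is the form a regression model expects.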
We performed an 80-20 split on the dataset and trained a multi-variable least squares regression model on the training data with the 9 features we selected. We used Ridge Regression, which is functionally similar to ordinary Linear Regression but adds a penalty on coefficient size, making it better at avoiding unexpectedly large coefficients.
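The split-and-fit step can be sketched as follows, assuming scikit-learn and NumPy are available. The data here are synthetic stand-ins with 9 numeric columns, not the UCI dataset, and `alpha=1.0` and `random_state=0` are placeholder choices rather than the authors' settings; categorical features would need to be encoded as numbers first.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 rows, 9 numeric features (NOT the UCI dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))
true_coefs = rng.normal(size=9)
y = X @ true_coefs + rng.normal(scale=0.5, size=200)

# 80-20 split: 80% of rows for training, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Ridge regression: least squares plus an L2 penalty on the coefficients.
# alpha=1.0 is scikit-learn's default, used here only as a placeholder.
model = Ridge(alpha=1.0)
model.fit(X_train, y_train)

# Root mean squared error on the held-out 20%.
test_rmse = float(np.sqrt(np.mean((model.predict(X_test) - y_test) ** 2)))
print(X_train.shape, X_test.shape)
print(model.coef_.round(2))
```

Larger values of `alpha` shrink the coefficients more strongly toward zero; as `alpha` approaches 0 the fit approaches ordinary least squares.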
We tested the model with cross-validation and obtained an average CV score of -4.61, which corresponds to an error of 4.61, and a final RMSE of 3.83.
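The negative score reads naturally if we assume the cross-validation scorer negates an error metric so that higher is better, as scikit-learn's "neg_..." scorers do; the abstract does not say which scorer was used. A minimal sketch of that convention, with made-up grades and hypothetical function names:

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def negated_error_score(y_true, y_pred):
    """Score under a 'higher is better' convention: the error times -1."""
    return -rmse(y_true, y_pred)


# Made-up grades for illustration only.
actual = [10, 12, 15, 8]
predicted = [11, 12, 13, 9]

score = negated_error_score(actual, predicted)
error = -score  # flip the sign to read the score as an error
print(round(score, 2), round(error, 2))
```

Under this convention a score closer to zero is better, and flipping its sign recovers the error itself.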
Editor: @ttimbers
Reviewer: @rlaze @asmdrk @ayashaa