Submission: 4: Default payments in Taiwan and comparison of the predictive accuracy of probability of default #4

ttimbers opened this issue Apr 1, 2022 · 4 comments

ttimbers commented Apr 1, 2022

Submitting authors: @Shravan37 @overcast-day @hmartin11 @YuYT98

Repository: https://github.com/DSCI-310/DSCI-310-Group-4

Abstract/executive summary:
Financial institutions incur monetary losses when a client or borrower is unable to pay their interest or initial principal on time. Thus, when determining loan eligibility in the first place, such institutions must assess the risk that a potential borrower cannot repay the loan. The present study endeavors to answer the question "Is there a way to effectively predict whether or not a client will default on their credit card payment?" and to uncover the most significant features that contribute to a higher likelihood of defaulting. From the standpoint of risk management, an accurate predicted probability of default will be more useful than a binary classification of customers as credible or not credible.

Editor: @ttimbers

Reviewers: @TheAMIZZguy @jossiej00 @sasiburi @zhangfred8


sasiburi commented Apr 7, 2022

Data analysis review checklist

Reviewer: sasiburi

Conflict of interest

  • As the reviewer, I confirm that I have no conflicts of interest in reviewing this work.

Code of Conduct

General checks

  • Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

  • Installation instructions: Is there a clearly stated list of dependencies?
  • Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
  • Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

  • Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
  • Style guidelines: Does the code adhere to well known language style guides?
  • Modularity: Is the code suitably abstracted into scripts and functions?
  • Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

  • Data: Is the raw data archived somewhere? Is it accessible?
  • Computational methods: Is all the source code required for the data analysis available?
  • Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
  • Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

  • Authors: Does the report include a list of authors with their affiliations?
  • What is the question: Do the authors clearly state the research question being asked?
  • Importance: Do the authors clearly state the importance for this research question?
  • Background: Do the authors provide sufficient background information so that readers can understand the report?
  • Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
  • Results: Do the authors clearly communicate their findings through writing, tables and figures?
  • Conclusions: Are the conclusions presented by the authors correct?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
  • Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 75min

Review Comments:

  1. I was able to clone this repository and set up the environment. Although the instructions are well explained, neither the pdf nor the html version of this report can be reproduced, because R and the required R packages are missing from the Docker image. I would recommend adding R, as well as the R packages used in the R Markdown report (tidyverse and tidymodels), to the Dockerfile so that others can successfully reproduce the pdf and html forms of your report.
  2. Overall, the report is well structured and clearly addresses an essential societal question from a comprehensive and effective data science perspective. However, the research question should be stated at the beginning of the introduction; that would make it easier for readers to follow the analysis. I really like the background section, but personally, as a reader, I think the importance of this topic could be emphasized further.
  3. For the documentation, I would suggest adding more content to CONTRIBUTING.md, including detailed steps for contributing directly, reporting bugs, and seeking other forms of support. Further, there is currently no link to your dataset in this repository; I would recommend including one in README.md.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.


jossiej00 commented Apr 7, 2022

Data analysis review checklist

Reviewer: jossiej00

Conflict of interest

  • As the reviewer, I confirm that I have no conflicts of interest in reviewing this work.

Code of Conduct

General checks

  • Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

  • Installation instructions: Is there a clearly stated list of dependencies?
  • Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
  • Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

  • Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
  • Style guidelines: Does the code adhere to well known language style guides?
  • Modularity: Is the code suitably abstracted into scripts and functions?
  • Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

  • Data: Is the raw data archived somewhere? Is it accessible?
  • Computational methods: Is all the source code required for the data analysis available?
  • Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
  • Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

  • Authors: Does the report include a list of authors with their affiliations?
  • What is the question: Do the authors clearly state the research question being asked?
  • Importance: Do the authors clearly state the importance for this research question?
  • Background: Do the authors provide sufficient background information so that readers can understand the report?
  • Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
  • Results: Do the authors clearly communicate their findings through writing, tables and figures?
  • Conclusions: Are the conclusions presented by the authors correct?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
  • Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 1.5 hr

Review Comments:

  1. Though the data source is accessible, I think the structure of data/ could be better organized. For example, you could put the raw data in one sub-folder and the processed data in another sub-folder under data/. Also, the summary plots generated during the analysis could be stored under results/.
  2. The list of Python dependencies and the documentation of the Dockerfile are very clear and easy to read! However, there are some inconsistencies between them: for example, tidyverse is listed in the README but not included in your Dockerfile. I am not a Python specialist, so I may be wrong.
  3. I think you could add more detail to your CONTRIBUTING.md. For example, you could standardize the contents of an issue (what details should be included) and clarify how an issue will be handled.
  4. A list of authors could be added to your report; "Group 4" is not that specific.
  5. Some filenames of the function files are not that clear, e.g. functions.py; it is hard for readers to know what kinds of functions are in that file. As a suggestion, I think you could adopt a consistent convention for naming functions and their files (see the sketch after this list).
  6. I got stuck at step 2 of "Running the project via Docker". I could launch Jupyter Lab, but there was nothing in my work folder. I tried rerunning the second command in step 2 after moving to the root of the project, but it still didn't work. I don't know whether it was just a one-off; you may want to double-check the commands.
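
A minimal sketch of the naming idea in comment 5 — the module and function names below are hypothetical, not taken from the group's repository:

```python
# Hypothetical example: instead of a catch-all functions.py, group helpers
# into purpose-named modules and give each function a verb-first name.

# file: src/plotting.py
import pandas as pd


def plot_default_counts(data: pd.DataFrame, column: str):
    """Return a bar chart of value counts for `column` (e.g. the default flag)."""
    ax = data[column].value_counts().plot(kind="bar")
    ax.set_xlabel(column)
    ax.set_ylabel("count")
    return ax
```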

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

@zhangfred8

Data analysis review checklist

Reviewer: zhangfred8

Conflict of interest

  • As the reviewer, I confirm that I have no conflicts of interest in reviewing this work.

Code of Conduct

General checks

  • Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

  • Installation instructions: Is there a clearly stated list of dependencies?
  • Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
  • Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

  • Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
  • Style guidelines: Does the code adhere to well known language style guides?
  • Modularity: Is the code suitably abstracted into scripts and functions?
  • Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

  • Data: Is the raw data archived somewhere? Is it accessible?
  • Computational methods: Is all the source code required for the data analysis available?
  • Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
  • Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

  • Authors: Does the report include a list of authors with their affiliations?
  • What is the question: Do the authors clearly state the research question being asked?
  • Importance: Do the authors clearly state the importance for this research question?
  • Background: Do the authors provide sufficient background information so that readers can understand the report?
  • Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
  • Results: Do the authors clearly communicate their findings through writing, tables and figures?
  • Conclusions: Are the conclusions presented by the authors correct?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
  • Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 1.2 hours

Review Comments:

  1. Hi, very well done report! Everything is well structured, so it is easy to get set up, jump into the code, and begin reading the analysis. The only hiccup I had through this whole process was trying to find the pdf or html version of the written report. I would like to see the correct modules added to the Dockerfile so that these reports can be generated correctly.

  2. As mentioned in another peer review, the introduction gives the reader a good idea of the history of the situation in Taiwan, as well as the kinds of people the problem of default payments affects. In the conclusion, I think it might be a good idea to write a small section that quickly summarizes everything this analysis has done, as well as its results, so that all your findings are in one place.

  3. For the test suite, some tests could be documented slightly better. Since the function names are written really well, it is easy to tell what each test is testing, but it would be nice to have some other information as well: a quick summary of what the test does and what parameters it passes to the function (a sketch of what this could look like follows this list).
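
A minimal sketch of the kind of test summary suggested in comment 3 — metrics_round and the values below are hypothetical stand-ins, not the group's actual code:

```python
import math


def metrics_round(value, digits=2):
    """Stand-in for the project's rounding helper."""
    return round(value, digits)


def test_metrics_round():
    """metrics_round should round a raw score to two decimal places.

    Passes a raw cross-validation accuracy (0.81234) and checks that the
    helper returns 0.81, the precision used in the report's tables.
    """
    assert math.isclose(metrics_round(0.81234), 0.81)
```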

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

@TheAMIZZguy

Data analysis review checklist

Reviewer: TheAMIZZguy

Conflict of interest

  • As the reviewer, I confirm that I have no conflicts of interest in reviewing this work.

Code of Conduct

General checks

  • Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

  • Installation instructions: Is there a clearly stated list of dependencies?
  • Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
  • Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

  • Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
  • Style guidelines: Does the code adhere to well known language style guides?
  • Modularity: Is the code suitably abstracted into scripts and functions?
  • Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

  • Data: Is the raw data archived somewhere? Is it accessible?
  • Computational methods: Is all the source code required for the data analysis available?
  • Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
  • Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

  • Authors: Does the report include a list of authors with their affiliations?
  • What is the question: Do the authors clearly state the research question being asked?
  • Importance: Do the authors clearly state the importance for this research question?
  • Background: Do the authors provide sufficient background information so that readers can understand the report?
  • Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
  • Results: Do the authors clearly communicate their findings through writing, tables and figures?
  • Conclusions: Are the conclusions presented by the authors correct?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
  • Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 1h50m

(includes 20 mins of dealing with some Dockerfile and running problems)

Review Comments:

Great work overall. I know it looks like a lot of problems, but they all seemed very minor; I felt I had to find a lot to justify the review.

  1. A little nitpicky, but there are some slight inconsistencies within the same files (the work is collaborative, so some personal style differences are fine, but within the same block things should be consistent). The code descriptions could use some minor tweaking: for example, the docstring for count_plot references 'x' and 'y' values while the parameters are actually 'x' and 'names'. In this case everything is still very readable, but it can get confusing as functions get larger; a sketch of consistent docstring naming follows this list. There is also inconsistent indentation style for multi-line blocks, which can lead to problems in Python (though it works here, so this is mostly a consistency comment).
  2. Relatedly, chunks of commented-out code are left in the analysis, as in test_count_plot; since that is part of the test suite, it raises questions about edge-case tests and whether the code truly is trustworthy and always works as intended. Likewise, many tests don't describe what situation they are testing (names like 'test_metrics_round' only take you so far). The best way to improve this would be to add more, and more descriptive, comments (the only comments being references like Submission: GROUP 18: Credit Card Default Prediction UBC-MDS/data-analysis-review-2021#31, #32 doesn't help a reviewer much). You also left a TODO in train_test_models.py, which is also worrisome for trustworthiness and reproducibility.
  3. I'm unsure whether storing files like heatmap.png in the data folder is appropriate, as opposed to somewhere closer to results. Likewise, for readability I would suggest splitting the data folder into raw and processed sub-folders (maybe even a 'final' one), rather than having everything in the same place.
  4. The code files are overall split very appropriately; each file has a specific use and isn't so large as to overwhelm a reviewer following the code. Nice!
  5. Arguably the part people most want to see is the report itself as a .html or .pdf, and it was not obvious where to find it (other than a comment saying to look for the .ipynb).
  6. In CONTRIBUTING.md, saying things like "reviews within 7 days" is great, but you should be wary of how long you can keep that promise.
  7. I had some problems following the instructions in the README.md, but I'll attribute that to my own computer being weird and old; I'm including this since another peer reviewer seems to have had the same issue.
  8. The report title is a bit odd; I could not understand what the analysis would be about until I read the report itself (good introduction!). This kept occurring with some strange phrasings that read clearly if you are familiar with the project, but not if you are looking at it for the first time (the question also isn't clearly stated until you discuss its importance, where it can be inferred).
  9. The writing described every step correctly, and why the code and graphs were there, though it is maybe a bit too technical (saying "payment = 0" instead of the plainer "when they paid vs. when they didn't").
  10. The use of random.seed() is very good for reproducibility.
  11. You correctly applied the purpose of a predictive question and even went into extrapolation to an extent.
  12. This is an interesting and important question; I learned a lot, and I can even think about it for my own life and analyses if I have to deal with a bank.
  13. Everything else not mentioned here is very good; I did not want to make the review much longer than it already is.
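
To illustrate the docstring point in comment 1, here is a minimal sketch — count_plot's real signature may differ, so the parameter names and body below are assumptions:

```python
import pandas as pd


def count_plot(x: pd.Series, names: list):
    """Bar chart of category counts for `x`.

    The docstring documents the same parameters the signature declares
    (`x` and `names`) rather than referring to a `y` that does not exist.

    Parameters
    ----------
    x : pd.Series
        Categorical values to count.
    names : list
        Category labels, in the order the bars should appear.
    """
    counts = x.value_counts().reindex(names)
    ax = counts.plot(kind="bar")
    ax.set_ylabel("count")
    return ax
```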

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.
