Code review checklists #46

Open
wants to merge 2 commits into
base: master
16 changes: 13 additions & 3 deletions dime-coding-standards/checklists/analysis-code-review.md
@@ -32,8 +32,18 @@ _Add names or links_:
## Analysis scripts
- [ ] No variables are created in analysis scripts unless very specific to a particular output
- [ ] Code is modular, such that individual outputs can be run in any order
- [ ] Research decisions are well documented in the code (e.g. sample used, treatment of missing variables)
- [ ] The code implements the models described in the documentation
- [ ] Research decisions (e.g. sample used, treatment of missing variables) are well documented and the code implements them as described
- [ ] The code implements the models as described in the documentation
- [ ] The functions used are appropriate for the analysis being performed
- [ ] Categorical variables are used correctly (i.e. labeled integers are not used as continuous variables)
- [ ] Categorical variables are used correctly (i.e. not used as continuous variables)
- [ ] Outputs are exported in a reproducible manner, and manual formatting is very limited
- [ ] Variable definitions are consistent across analyses
- [ ] Samples are consistent across analyses
- [ ] Models are consistent across analyses (e.g., outcomes, controls, standard error treatment)
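
A reviewer could check the "samples are consistent across analyses" item with a small helper like the one below. This is only a sketch in plain Python: the function name `sample_differences`, the analysis names, and the ID values are hypothetical, and it assumes each analysis can export the list of observation IDs it used.

```python
def sample_differences(samples):
    """For each analysis, list the IDs it lacks relative to the union of all samples.

    `samples` maps a (hypothetical) analysis name to the observation IDs it used.
    Analyses that use the full set of IDs are left out of the result.
    """
    all_ids = set().union(*samples.values())
    differences = {}
    for name, ids in samples.items():
        missing = all_ids - set(ids)
        if missing:
            differences[name] = sorted(missing)
    return differences


# Illustrative data only: observation 3 is silently dropped in one exhibit.
samples = {
    "table1": [1, 2, 3, 4],
    "table2": [1, 2, 3, 4],
    "figure1": [1, 2, 4],
}
diffs = sample_differences(samples)
print(diffs)  # {'figure1': [3]}
```

An empty result means every analysis used the same observations. A non-empty one is not necessarily an error, but each difference should be explained in the code or in the exhibit notes.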

## Analysis outputs
- [ ] Tables and graphs contain detailed notes explaining the methods, such that exhibits can be read and understood on their own
- [ ] The methods described in notes correspond to those implemented in the code
- [ ] Output tables are [easy to interpret](https://dimewiki.worldbank.org/Checklist:_Submit_Table)
- [ ] Output graphs are [easy to interpret](https://dimewiki.worldbank.org/Checklist:_Reviewing_Graphs)

28 changes: 19 additions & 9 deletions dime-coding-standards/checklists/construction-code-review.md
@@ -29,19 +29,29 @@ _Add names or links_:
- [ ] Common tasks are abstracted and automated (e.g. using functions or macros)

## Construction scripts
- [ ] Do you understand how and why each variable is constructed? If not, indicate to the author where more comments are needed.
- [ ] Check merges. Are any observations dropped or does the number of observations increase? If so, is there a clear justification for that? If any observations didn't match, is that explained in the comments?
- [ ] Check collapses, reshapes, and groupwise calculations (`bys` and `egen` for example). How are missing values treated? Does the sort order of the data matter or is the result uniquely determined? Is the number of observations correct and are the mathematics correct?
- [ ] Check winsorization or other techniques for handling outliers. Is the reason for the chosen transformation explained? Is there documentation explaining how parameters such as cutoff percentiles were chosen and why one or both tails were altered?
- [ ] Check treatment of missing values. Are research decisions well documented? Are there cases where you think missing values are being created or replaced unintentionally?
- [ ] Check creation of new variables. Does the code match the variable definition in the documentation? Is the correct function being used?
- [ ] Check creation of new variables.
- [ ] Do you understand how and why each variable is constructed? If not, indicate where more comments or documentation are needed.
- [ ] Does the code match the variable definition in the documentation?
- [ ] Is the correct function being used?
- [ ] Check merges (joins). Does the number of observations in the resulting data set change? If so, is there a clear justification for that? If any observations didn't match, is the reason for this explained in the comments?
- [ ] Check collapses (summarises), reshapes (pivots), and groupwise calculations (`bys` and `egen` for example).
- [ ] How are missing values treated by these commands? If missings are treated as zero, is there an explanation for why that is?
- [ ] How does the number of non-missing values in the resulting data compare to the original?
- [ ] How does the number of observations in the resulting data compare to the original?
- [ ] Does the sort order of the data affect the result?
- [ ] Is the number of observations correct and are the mathematics correct?
- [ ] Check winsorization or other techniques for handling outliers.
- [ ] Is the reason for the chosen transformation explained?
- [ ] Is there documentation explaining how parameters such as cutoff percentiles were chosen and why one or both tails were altered?
- [ ] Check treatment of missing values, such as imputation or removal of observations.
- [ ] Is the reason for the chosen treatment explained?
- [ ] Are there cases where you think missing values are being created or replaced unintentionally?
- [ ] Is there always clear documentation of why observations are dropped, if any?
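
The merge and groupwise checks above can be illustrated with a minimal sketch. It uses plain Python rather than the project's own tooling, and the helpers `left_join_report` and `group_sum`, the variable names, and the data are all hypothetical. The point is the kind of evidence a reviewer might ask for: row counts before and after a merge, the keys that did not match, and an explicit rule for groups whose values are all missing.

```python
from collections import defaultdict


def left_join_report(master, using, key):
    """Left-join two lists of dict rows on `key` and report how the row count changes."""
    using_by_key = defaultdict(list)
    for row in using:
        using_by_key[row[key]].append(row)

    merged, unmatched = [], []
    for row in master:
        matches = using_by_key.get(row[key])
        if not matches:
            # Keep the unmatched master row, but record its key for the report.
            unmatched.append(row[key])
            merged.append(dict(row))
            continue
        for match in matches:
            merged.append({**row, **match})

    report = {
        "rows_before": len(master),
        "rows_after": len(merged),
        "unmatched_keys": unmatched,
        "duplicated_using_keys": sorted(
            k for k, rows in using_by_key.items() if len(rows) > 1
        ),
    }
    return merged, report


def group_sum(rows, by, value):
    """Sum `value` within groups; a group with no observed values yields None, not 0."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row.get(value))
    result = {}
    for group, values in groups.items():
        observed = [v for v in values if v is not None]
        result[group] = sum(observed) if observed else None
    return result


# Illustrative data only.
households = [
    {"hh_id": 1, "village": "A"},
    {"hh_id": 2, "village": "A"},
    {"hh_id": 3, "village": "B"},
]
income = [
    {"hh_id": 1, "income": 100},
    {"hh_id": 1, "income": 50},    # duplicated key: the merge adds a row
    {"hh_id": 2, "income": None},  # matched, but the value is missing
]

merged, report = left_join_report(households, income, "hh_id")
print(report)
# {'rows_before': 3, 'rows_after': 4, 'unmatched_keys': [3], 'duplicated_using_keys': [1]}

print(group_sum(merged, "village", "income"))
# {'A': 150, 'B': None}
```

Here the row count grows from 3 to 4 because household 1 appears twice in the income data, and household 3 has no match. Village B's total stays missing instead of becoming zero. Whether that is the right rule is a research decision, but it should be a documented one.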

## Constructed dataset(s)
- [ ] Is the resulting dataset tidy (each row is an observation, each column is a variable)?
- [ ] Is the resulting dataset tidy (each row is an observation, each column is a variable, only one unit of observation is represented in the data)?
- [ ] Are variable names informative?
- [ ] Are variable labels informative?
- [ ] Are value labels informative?
- [ ] Are variable and value labels informative?
- [ ] Are all labels grammatically correct and free of special characters?
- [ ] Is there clear documentation (variable dictionary, variable labels, value labels, notes, comments) about variable definition?
- [ ] Are dataset file names informative?
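
One way to probe the tidy-data item is to confirm that the variables meant to identify the unit of observation do so uniquely. The sketch below is plain Python; the helper name `duplicate_ids`, the ID variables `hh_id` and `round`, and the rows are hypothetical.

```python
from collections import Counter


def duplicate_ids(rows, id_vars):
    """Return the ID combinations that appear on more than one row."""
    counts = Counter(tuple(row[v] for v in id_vars) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Illustrative data only: a household-round panel with one repeated row.
rows = [
    {"hh_id": 1, "round": 1},
    {"hh_id": 1, "round": 2},
    {"hh_id": 1, "round": 2},
    {"hh_id": 2, "round": 1},
]

dups = duplicate_ids(rows, ["hh_id", "round"])
print(dups)  # [(1, 2)]
```

An empty list is consistent with each row being one observation at the stated unit. Any duplicates suggest either an unintended repeat or more than one unit of observation in the same dataset.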