Reproducible code #61
Comments
How can the scripts be tested if they don't accept data as arguments? I think we need to add unit tests instead. They will test our code and provide users with examples at the same time.
I think this can be part of the documentation solution we talked about in #59. Using
So you suggest keeping the algorithms separate from the data, with unit tests that show how the algorithms are used? And the tests can be transformed into HTML reports for convenience? Sounds good to me.
Hmm, not exactly. What I mean is that specially formatted scripts can be turned into HTML reports (https://rdrr.io/cran/knitr/man/spin.html). Data would still need to be part of the algorithms. Because this function runs the actual script while compiling the report, errors would be thrown if there's any problem with the script, and such an error can serve as a test. At the same time, good scripts would compile to nice HTML reports. It will make more sense once we have a prototype running in https://github.com/Panquesito7/R/tree/documentation_stuff
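For readers who haven't seen the format, here is a rough sketch of what such a specially formatted script might look like. It is only an illustration: the file name, the chunk labels, and the choice of `mtcars` are assumptions, not taken from the repo. Lines starting with `#'` become prose in the report, lines starting with `#+` carry chunk options, and everything else is ordinary R. The `error=FALSE` chunk option is set on the assumption that a failing chunk should stop compilation rather than be printed into the report.

```r
#' # Linear regression on a built-in dataset
#'
#' Lines starting with #' are turned into report text by knitr::spin();
#' the remaining lines are plain R code that runs during compilation.

#+ fit-model
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)

#' With error=FALSE, a failing check in this chunk stops compilation.
#+ sanity-check, error=FALSE
stopifnot(length(coef(fit)) == 2)

# To compile the report (assumes knitr is installed and that this
# script is saved under the hypothetical name below):
# knitr::spin("linear_regression.R")
```

Since `#'` and `#+` lines are ordinary comments to R, the same file can still be pasted into a plain R session and run as-is.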
I agree with you on this fundamental issue; for linearRegressionRaw.R, I replaced a reference to the diamonds dataset with a simulated dataset that is reproducible via a set seed. Half of the challenge here is going to be eliminating extraneous library calls, such as those for tidyverse functions and datasets.
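To make the idea concrete, a seeded synthetic dataset can be built with base R alone. This is a minimal sketch, not the code from linearRegressionRaw.R: the seed, sample size, coefficients, and variable names are all made up for illustration.

```r
# Fix the random seed so every run produces the same data.
set.seed(42)

# Simulate a simple linear relationship: y = 2 + 3 * x + noise.
n <- 100
x <- runif(n, min = 0, max = 10)
y <- 2 + 3 * x + rnorm(n, sd = 1)
sim_data <- data.frame(x = x, y = y)

# Fit the model using base R only, with no external packages or datasets.
fit <- lm(y ~ x, data = sim_data)
print(coef(fit))
```

Because the seed is fixed, anyone pasting this into an R session gets the same data and the same fitted coefficients.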
I personally don't mind if third party packages are used, but either the In either case, some check should be done to see whether the packages are installed. Something like:
if (!require(ggplot2)) {
  install.packages("ggplot2")
  library(ggplot2)  # load the package once it has been installed
}
# The rest of the code
# ...
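If a script depends on several packages, the check could be wrapped in a small helper. The sketch below is one possible approach, not an agreed convention for this repo: `missing_packages` is a hypothetical name, and the helper only reports what is missing instead of installing anything.

```r
# Hypothetical helper: returns the names of packages that are not installed.
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (without an error) for unavailable packages.
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Example usage with packages that ship with R.
needed <- c("stats", "utils")
absent <- missing_packages(needed)
if (length(absent) > 0) {
  stop("Please install: ", paste(absent, collapse = ", "))
}
```

Reporting rather than installing leaves the decision to the user, which may be preferable for scripts that people copy into their own sessions.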
I think that, as a standard, all scripts should be completely independent and reproducible, i.e. people should be able to copy and paste the code into their R REPL session without errors. This is currently not the case with many scripts in this repo. Instead of supplying example data, many algorithms are written as "templates" where one has to input their own data. However, there's no information about what the data structure should even be.
R has many built-in datasets, so these can be used to run the algorithms. If the script is just a function definition, then there should be an example usage of the function.
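As a sketch of what that could look like, here is a function definition followed by an example call on a built-in dataset. The function `min_max_scale` is a made-up illustration and not one of the repo's scripts.

```r
# Hypothetical example: rescale a numeric vector to the range [0, 1].
min_max_scale <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0)
  rng <- range(x, na.rm = TRUE)
  # A constant vector has zero width; map it to 0 to avoid dividing by zero.
  if (rng[1] == rng[2]) {
    return(x - rng[1])
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Example usage on the built-in iris dataset, so the script runs as pasted.
scaled <- min_max_scale(iris$Sepal.Length)
summary(scaled)
```

The example call doubles as documentation of the expected input: a plain numeric vector.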
I could list here all scripts that need to be written this way.
What do you think?