karl-cottenie/bestpracticesR

bestpracticesR

Best practices for reproducible coding in R


Setting up your folder structure

  • one folder per project = one contained unit
    • definition of “contained unit” depends on your project, style, requirements
  • make sure that this is your working directory in R/RStudio
  • examples of folder structures to use [http://nicercode.github.io/blog/2013-04-05-projects]
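
A minimal sketch of creating the folder layout described in the sections that follow, from within R. The project name `my_project` and its location in a temporary directory are assumptions for illustration only; in practice you would use your own project folder.

```r
# Hypothetical project root; replace with the path to your own project folder.
project_dir <- file.path(tempdir(), "my_project")

# Subfolders described in this guide.
vc_folders <- c("R", "data", "doc", "figs", "output")

# Create the root and each subfolder (no warning if they already exist).
for (folder in vc_folders) {
  dir.create(file.path(project_dir, folder), recursive = TRUE, showWarnings = FALSE)
}

# List the subfolders that now exist in the project root.
list.dirs(project_dir, full.names = FALSE, recursive = FALSE)
```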

R/

  • contains analysis script
  • often used functions (potentially in different function script file)
  • if you open R/RStudio by double-clicking on the script file in your file explorer, this will also automatically set the working directory to this location, and all your file names will be relative to this location.
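
One possible way to keep often used functions in their own script file and load them from the analysis script is `source()`. The file name `functions.R` and the helper `standard_error()` are hypothetical; the file is written to a temporary folder here only so that the sketch runs on its own.

```r
# Hypothetical functions file, created in a temporary R/ folder for this demo.
script_dir <- file.path(tempdir(), "R")
dir.create(script_dir, recursive = TRUE, showWarnings = FALSE)
writeLines(
  "standard_error <- function(x) sd(x) / sqrt(length(x))",
  file.path(script_dir, "functions.R")
)

# In the analysis script: load the often used functions, then use them.
source(file.path(script_dir, "functions.R"))
standard_error(c(2, 4, 6, 8))
```

In a real project, with R/ as the working directory, the call would simply be `source("functions.R")`.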

data/

  • data in .csv file
  • sometimes also excel file
  • data organization [http://kbroman.org/dataorg/]
  • read_csv("../data/") with tab completion inside the quotes to pick the data file (this relative path assumes the working directory is the R/ folder)
    • if a column is in YYYY-MM-DD format, the import will convert it into a Date object automatically
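
A small sketch of the import step, assuming the readr package is installed. The file name `observations.csv` and its columns are made up for illustration, and the file is written to a temporary data/ folder so the example is self-contained.

```r
library(readr)

# Hypothetical data file, written to a temporary data/ folder for this demo.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
writeLines(
  c("site,date,count", "A,2021-03-01,12", "B,2021-03-02,7"),
  file.path(data_dir, "observations.csv")
)

# In a real project with R/ as the working directory, this would be
# read_csv("../data/observations.csv").
df_obs <- read_csv(file.path(data_dir, "observations.csv"), show_col_types = FALSE)

# The YYYY-MM-DD column is guessed as a date column without extra arguments.
class(df_obs$date)
```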

doc/

figs/

  • important figures
  • sharing with collaborators/advisors
  • publication figures
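
As a sketch, a figure can be written to figs/ directly from the script, so that it is regenerated every time the analysis is rerun. The file name is hypothetical, the built-in `cars` data set stands in for real data, and a temporary folder is used so the example runs anywhere.

```r
# Hypothetical figure file; figs/ is created in a temporary folder for this demo.
figs_dir <- file.path(tempdir(), "figs")
dir.create(figs_dir, recursive = TRUE, showWarnings = FALSE)
fig_file <- file.path(figs_dir, "speed_vs_distance.pdf")

# Open a PDF device, draw the plot, and close the device to write the file.
pdf(fig_file, width = 6, height = 4)
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
invisible(dev.off())
```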

output/

  • copy-paste from console
  • model output
  • p-values
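
As an alternative to copy-pasting from the console, model output can be written to output/ from the script itself. This is only a sketch: the file names are hypothetical, the model on the built-in `cars` data is a placeholder, and a temporary folder stands in for the project's output/ folder.

```r
# Hypothetical output folder, created in a temporary location for this demo.
output_dir <- file.path(tempdir(), "output")
dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)

# Placeholder model on a built-in data set.
fit <- lm(dist ~ speed, data = cars)

# Save the printed model summary as plain text.
out_file <- file.path(output_dir, "lm_dist_speed.txt")
capture.output(summary(fit), file = out_file)

# Save the coefficient table (estimates, standard errors, p-values) as .csv.
df_coef <- as.data.frame(coef(summary(fit)))
write.csv(df_coef, file.path(output_dir, "lm_dist_speed_coefficients.csv"))
```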

R style guide

  • script file = where everything happens
    • plain text => readable by anybody
    • easy to save
    • easy to repeat
    • easy to document
    • try to avoid going back to a spreadsheet to process the data
    • one exception: if errors are detected during data processing
  • follow a style guide, e.g. the one from Google [https://google.github.io/styleguide/Rguide.xml]; some points from that guide:
    • applying a style guide automatically [https://www.tidyverse.org/blog/2017/12/styler-1.0.0/]
    • spacing
    • commenting [https://stackoverflow.blog/2021/12/23/best-practices-for-writing-code-comments/]
      • Rule 1: Comments should not duplicate code, focus on the why, not the what
      • Rule 2: Good comments do not excuse unclear code
      • Coding explanations (#, often after the code, but not exclusively)
        • Code organization, see examples below (Rule 4) (## XXXXX -----)
        • Justification for a section of code, including links (Rules 6 and 7) ## XXX
        • Dead-end analyses that did not work, or lines of inquiry you are not pursuing; leave them in as a trace, to potentially solve the issue later or to avoid making the same mistake in the future (Rule 8) # (>_<)
        • Solutions/results/interpretations (Rule 7) (#==> XXX)
        • Reference to manuscript pieces, figures, results, tables, ... # (*_*)
        • TODO items (Rule 9) #TODO
  • names for data frames (df_name), for lists (ls_name), for vectors (vc_name) (Thanks Jacqueline May)
  • attach: avoid using it
  • use common sense and BE CONSISTENT
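
A short sketch of how the naming and commenting conventions above might look together in a script. The analysis itself (a linear model on the built-in `cars` data) and the comment texts are placeholders chosen for illustration.

```r
## _ Load data -----
df_cars <- cars               # data frame: df_ prefix
vc_speed <- df_cars$speed     # vector: vc_ prefix
ls_models <- list()           # list: ls_ prefix

## Justification (placeholder): start with the simplest model
ls_models$linear <- lm(dist ~ speed, data = df_cars)

# Instead of attach(df_cars), refer to columns explicitly or use with().
mean_dist <- with(df_cars, mean(dist))

#==> XXX (placeholder for the interpretation of the result)
#TODO check the residuals of the linear model
# (>_<) placeholder for a dead-end analysis kept as a trace
# (*_*) placeholder for a reference to a manuscript figure
```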

RStudio

Set up

  • 4 standard windows
    • script file = all the necessary code
      • should be able to run line-by-line to repeat whole analysis
    • console = output
      • try code here first
      • once you have a final solution, copy-paste it to the script file, or use the history window
  • tab completion => you can use descriptive (longer) file names
  • soft-wrap R source files (preferences > Code editing) => no need to scroll left-right

Code hierarchy and sections

  • # _ Heading 1 --------
  • # __ Subheading 1.1 ----------
  • # ___ subsub heading 1.1.1 -------------
  • any comment line with at least 4 trailing dashes (-), equal signs (=), or pound signs (#) creates a code section
  • you can add more trailing dashes to help subdivide your code visually
  • I start the headings with a different number of underscores (_) because this will indent the subheadings, and if you convert the script to some Markdown version, that will make the transition easier (Thank you Brent for that tip).
  • you can fold the code to hide lines that you are not working on
  • to navigate between code sections, use “Jump To” menu available at bottom of the editor
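
The heading convention above could look like this in a script. The section names and the code under them are placeholders using the built-in `cars` data set.

```r
# _ Data import --------
df_cars <- cars

# __ Data checks ----------
n_missing <- sum(is.na(df_cars))

# ___ Range of speeds -------------
vc_range <- range(df_cars$speed)

# _ Analysis --------
fit <- lm(dist ~ speed, data = df_cars)
```

In RStudio each of these comment lines shows up as an entry in the "Jump To" menu, and the code under it can be folded.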

Ultimate goal

  • create zip file of your project folder
  • send it to somebody else
  • naive intelligent observer should be able to repeat the whole analysis
  • understand every step of the way
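
A hedged sketch of creating the zip file from within R using `utils::zip()`. This relies on an external `zip` program being available on the system, which is an assumption; the project name `my_project` and its contents are hypothetical and live in a temporary folder.

```r
# Hypothetical project folder, built in a temporary location for this demo.
project_dir <- file.path(tempdir(), "my_project")
dir.create(file.path(project_dir, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines("# analysis script", file.path(project_dir, "R", "analysis.R"))

# Zip from the parent folder so the archive only contains relative paths.
old_wd <- setwd(dirname(project_dir))
zip_status <- utils::zip(zipfile = "my_project.zip", files = basename(project_dir))
setwd(old_wd)

# zip_status is 0 on success; a non-zero value usually means that no
# external zip program was found.
zip_status
```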


What's next?

The best introduction to data analysis with R. This free, continuously updated online book makes all our jobs of guiding people through their first (or 6th) steps using R much easier. Very highly recommended!

R for Data Science

http://r4ds.had.co.nz/

Useful cheat sheets

https://www.rstudio.com/resources/cheatsheets/

And you can sign up to receive updates to these sheets!

British Ecological Society's Guide to Reproducible Code in Ecology and Evolution

http://www.britishecologicalsociety.org/wp-content/uploads/2017/12/guide-to-reproducible-code.pdf
