diff --git a/.nojekyll b/.nojekyll index dd8586b..48fc10a 100644 --- a/.nojekyll +++ b/.nojekyll @@ -1 +1 @@ -10d02429 \ No newline at end of file +b8961dc3 \ No newline at end of file diff --git a/01-handson_github_website.html b/01-handson_github_website.html index 510a39d..8a8941e 100644 --- a/01-handson_github_website.html +++ b/01-handson_github_website.html @@ -297,15 +297,39 @@ @@ -418,7 +442,7 @@

Person 1 (owner):

-

+

@@ -438,7 +462,7 @@

Person 2 (collaborat
-

+

@@ -452,7 +476,7 @@

Person 2 (collaborat
-

+

diff --git a/02-handson_github_rstudio.html b/02-handson_github_rstudio.html index 7c42294..38c3734 100644 --- a/02-handson_github_rstudio.html +++ b/02-handson_github_rstudio.html @@ -331,15 +331,39 @@ diff --git a/03-handson_github_workflows.html b/03-handson_github_workflows.html index 7fb1a8c..733ef02 100644 --- a/03-handson_github_workflows.html +++ b/03-handson_github_workflows.html @@ -331,15 +331,39 @@ @@ -419,7 +443,7 @@

Person 1 (owner):

-

+

@@ -453,7 +477,7 @@

Person 2: Create a
-

+

@@ -510,7 +534,7 @@

Person 2: Create
-

+

diff --git a/about.html b/about.html index f4e990a..4e16378 100644 --- a/about.html +++ b/about.html @@ -30,7 +30,7 @@ - + @@ -296,15 +296,39 @@ @@ -784,8 +808,8 @@

About RDS

diff --git a/datamgmt_plan.html b/datamgmt_plan.html index e4a4048..52613f6 100644 --- a/datamgmt_plan.html +++ b/datamgmt_plan.html @@ -297,15 +297,39 @@ diff --git a/datamgmt_prompts.html b/datamgmt_prompts.html index 24c08b3..21a91eb 100644 --- a/datamgmt_prompts.html +++ b/datamgmt_prompts.html @@ -297,15 +297,39 @@ diff --git a/git_cli.html b/git_cli.html index 780895e..97fad6f 100644 --- a/git_cli.html +++ b/git_cli.html @@ -331,15 +331,39 @@ diff --git a/git_conflicts.html b/git_conflicts.html index 0b5e95b..e558417 100644 --- a/git_conflicts.html +++ b/git_conflicts.html @@ -331,15 +331,39 @@ diff --git a/git_further_readings.html b/git_further_readings.html index e77ad1b..724b053 100644 --- a/git_further_readings.html +++ b/git_further_readings.html @@ -297,15 +297,39 @@ diff --git a/git_rstudio.html b/git_rstudio.html index 9af11b6..eedc5f7 100644 --- a/git_rstudio.html +++ b/git_rstudio.html @@ -331,15 +331,39 @@ @@ -692,7 +716,7 @@

Sending cha
-

+

@@ -713,7 +737,7 @@

Sending cha
-

+

diff --git a/github_intro.html b/github_intro.html index 5d42d3f..9d45c05 100644 --- a/github_intro.html +++ b/github_intro.html @@ -297,15 +297,39 @@ diff --git a/github_org.html b/github_org.html index da97fe9..75c514d 100644 --- a/github_org.html +++ b/github_org.html @@ -297,15 +297,39 @@ diff --git a/github_teams.html b/github_teams.html index 31490ca..ecc4dac 100644 --- a/github_teams.html +++ b/github_teams.html @@ -297,15 +297,39 @@ diff --git a/github_template.html b/github_template.html index 5f76963..e180c54 100644 --- a/github_template.html +++ b/github_template.html @@ -297,15 +297,39 @@ diff --git a/github_workflows.html b/github_workflows.html index 55636f7..2c8b95e 100644 --- a/github_workflows.html +++ b/github_workflows.html @@ -297,15 +297,39 @@ diff --git a/img/code_handover.jpeg b/img/code_handover.jpeg new file mode 100644 index 0000000..41b1624 Binary files /dev/null and b/img/code_handover.jpeg differ diff --git a/img/document_diataxis.png b/img/document_diataxis.png new file mode 100644 index 0000000..5e5a1fa Binary files /dev/null and b/img/document_diataxis.png differ diff --git a/img/fish_length.png b/img/fish_length.png new file mode 100644 index 0000000..d36bb84 Binary files /dev/null and b/img/fish_length.png differ diff --git a/img/found_documentationjpg.jpeg b/img/found_documentationjpg.jpeg new file mode 100644 index 0000000..697b514 Binary files /dev/null and b/img/found_documentationjpg.jpeg differ diff --git a/img/metadata_cooking.png b/img/metadata_cooking.png new file mode 100644 index 0000000..a892e8a Binary files /dev/null and b/img/metadata_cooking.png differ diff --git a/index.html b/index.html index dabe317..44fb4c2 100644 --- a/index.html +++ b/index.html @@ -9,7 +9,7 @@ - + Designing a Reproducible and Collaborative Lab (RCL) + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ + +
+ +
+ + +
+ + + +
+ +
+
+

Preserve your code

+
+ + + +
+ + + + +
+ + + +
+ + +
+

Making your code readable

+
+
+
+
+

+
https://twitter.com/cjm4189/status/1557346489613094914
+
+
+
+
+

Making your code easy to read is essential if you hope that others will reuse it. This starts with using a consistent style within your scripts (at least within a project).

+ +
+
import this
+
+
The Zen of Python, by Tim Peters
+
+Beautiful is better than ugly.
+Explicit is better than implicit.
+Simple is better than complex.
+Complex is better than complicated.
+Flat is better than nested.
+Sparse is better than dense.
+Readability counts.
+Special cases aren't special enough to break the rules.
+Although practicality beats purity.
+Errors should never pass silently.
+Unless explicitly silenced.
+In the face of ambiguity, refuse the temptation to guess.
+There should be one-- and preferably only one --obvious way to do it.
+Although that way may not be obvious at first unless you're Dutch.
+Now is better than never.
+Although never is often better than *right* now.
+If the implementation is hard to explain, it's a bad idea.
+If the implementation is easy to explain, it may be a good idea.
+Namespaces are one honking great idea -- let's do more of those!
+
+
+

The visual aspect of code should not be neglected either. As with prose, if you receive a long text without any paragraphs, you might not be very excited about reading it. Use indentation, spaces, and empty lines to make a script visually inviting and easy to read. The good news is that most Integrated Development Environments (IDEs) can help you do so by auto-formatting your scripts according to conventions. Note also that many IDEs, such as RStudio, rely on conventions to ease the navigation of scripts and notebooks. For example, in an R script, try ending a comment line (a line starting with one or more #) with four - or # characters: RStudio will turn it into a section you can jump to!
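To make this concrete in Python (the other language used on this page), here is a minimal sketch assuming PEP 8 conventions. Both functions compute the same thing; the second one uses descriptive snake_case names, 4-space indentation, spaces around operators, and blank lines between logical steps. The function names and sample values are hypothetical.

```python
# Hard to read: cryptic names, no spacing, logic crammed on one line.
def ML(l):return sum(l)/len(l) if l else None


# Same logic, written in one consistent style (PEP 8): descriptive
# snake_case names, 4-space indentation, spaces around operators, and
# blank lines separating logical steps.
def mean_length(lengths_cm):
    """Return the mean of a list of lengths (cm), or None if it is empty."""
    if not lengths_cm:
        return None

    total_cm = sum(lengths_cm)
    return total_cm / len(lengths_cm)


sample_lengths_cm = [12.5, 14.0, 13.5]
print(f"Mean length: {mean_length(sample_lengths_cm):.2f} cm")  # Mean length: 13.33 cm
```

An auto-formatter in your IDE would typically rewrite the first version into something close to the second, which is one reason to let the tool enforce the layout for you.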

+
+
+

Comments

+

Real Programmers don’t comment their code. If it was hard to write, it should be hard to understand.
+Tom Van Vleck, based on people he knew… (https://multicians.org/thvv/realprogs.html)

+

Joking aside, it is really hard to over-comment your code: even steps that seem trivial today might not be so in a few weeks or months from now. In addition, well-commented code is more likely to be read by others. Note also that comments should complement the code and should not be seen as a workaround for vague variable or function names.

+
+
x <- 9.81  #  gravitational acceleration
+
+gravity_acc <- 9.81  #  gravitational acceleration
+
+ +
+

Inline

+

Whether you are using a script or a notebook, it is important to comment your code to complement it by:

+
    +
  • explaining what the code does
  • +
  • capturing the analytical decisions that were made, for example why a specific value was used as a threshold
  • +
  • flagging code that was added to handle an edge case, such as an unexpected value in the data (so a new user doesn’t have to guess what those lines do, or delete them assuming they are unnecessary)
  • +
+

Other thoughts:

+
    +
  • It is OK to state (what seems) the obvious (some might disagree with this statement)
  • +
  • Try to keep comments to the point and short
  • +
+
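Here is a minimal Python sketch of the kinds of comments listed above: one explaining what the code does, one capturing an analytical decision, and one flagging an edge case. The data, the -999 missing-value code, and the 5 cm threshold are all hypothetical, made up for illustration.

```python
# Hypothetical field measurements of fish length, in centimetres.
raw_lengths_cm = [12.5, -999, 3.2, 14.0, 13.5]

# Edge case: the (hypothetical) field sheet records -999 for "not measured".
# Drop these sentinel values before any computation so they are never
# treated as real lengths. Keep this step even if today's data has none.
MISSING_CODE = -999
measured_cm = [x for x in raw_lengths_cm if x != MISSING_CODE]

# Analytical decision: fish shorter than 5 cm are treated as juveniles and
# excluded from the adult analysis (5 cm is an illustrative value only).
ADULT_MIN_LENGTH_CM = 5.0

# Keep only adult-sized fish, then average their lengths.
adult_lengths_cm = [x for x in measured_cm if x >= ADULT_MIN_LENGTH_CM]
mean_adult_cm = sum(adult_lengths_cm) / len(adult_lengths_cm)

print(f"{len(measured_cm)} fish measured, {len(adult_lengths_cm)} adults, "
      f"mean adult length {mean_adult_cm:.2f} cm")
```

Naming the threshold as a constant and explaining it right where it is defined means a reader can find and question the decision without digging through the rest of the script.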
+
+

Functions

+

Both Python and R have conventions for documenting functions. Adopting them will not only make your code more readable but also let you automate part of the documentation work.

+
+

Roxygen2

+

The goal of roxygen2 is to make documenting your code as easy as possible. It can dynamically inspect the objects that it’s documenting, so it can automatically add data that you’d otherwise have to write by hand.

+

How do you insert it? Place your cursor inside the function you want to document, then, from the RStudio menu, select Code -> Insert Roxygen Skeleton.

+

Example:

+
+
#' Add together two numbers
+#'
+#' @param x A number
+#' @param y A number
+#' @return The sum of \code{x} and \code{y}
+#' @examples
+#' add(1, 1)
+#' add(10, 1)
+add <- function(x, y) {
+  x + y
+}
+
+

Try it!
  • Copy the function (without the documentation) into a new script
  • Add a third parameter to the function so that it sums three numbers
  • Add the Roxygen skeleton
  • Fill it in to best describe your function

+

Note that when you are developing an R package, Roxygen comments can be used to generate the help pages of your package, so you only have one place to update and the help pages are regenerated from it.

+
+
+

Python Docstring

+

A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition. Such a docstring becomes the __doc__ special attribute of that object.

+
+
def complex(real=0.0, imag=0.0):
+    """Form a complex number.
+
+    Keyword arguments:
+    real -- the real part (default 0.0)
+    imag -- the imaginary part (default 0.0)
+    """
+    # Combine both parts using Python's imaginary unit literal (1j)
+    return real + imag * 1j
+
+

For more details, see PEP 257: https://www.python.org/dev/peps/pep-0257/
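As a Python counterpart to the R function documented with Roxygen earlier, here is a minimal sketch of the same `add` function documented with a docstring laid out as PEP 257 suggests (a one-line summary, a blank line, then details). Because the docstring is stored in `__doc__`, tools such as `help()` can display it, and the `>>>` examples can be checked automatically with the standard `doctest` module. The docstring's section labels are one possible layout, not a required format.

```python
import doctest


def add(x, y):
    """Add together two numbers.

    Arguments:
    x -- a number
    y -- a number

    Returns the sum of x and y.

    Examples:
    >>> add(1, 1)
    2
    >>> add(10, 1)
    11
    """
    return x + y


# The docstring is stored in the function's __doc__ attribute...
print(add.__doc__.splitlines()[0])  # Add together two numbers.

# ...which is what help() displays in an interactive session.
help(add)

# The >>> examples double as tests: this prints nothing if they all pass.
doctest.run_docstring_examples(add, {"add": add})
```

Like Roxygen's `@examples`, keeping runnable examples inside the docstring means the documentation and the code are checked together.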

+
+
+
+
+

Leveraging Notebooks

+

As we have discussed and experimented with during the week, notebooks provide space to develop content, such as methodology, around the code of your analysis. Notebooks also let you integrate the outputs of your scientific research with the code that produced them. Finally, notebooks can be rendered into various formats, which makes them easy to share with a broad audience.

+

Notebooks are not only used within the scientific community; see here for some thoughts from the Airbnb data science team.

+
+
+
+

Hands-on

+
+

Documenting

+
+
getPercent <- function( value, pct ) {
+    result <- value * ( pct / 100 )
+    return( result )
+}
+
+

Try adding the Roxygen skeleton to this function and fill in all the information you think is necessary to document it.

+
+
+

Commenting

+

Let’s try to improve the readability and documentation of this repository: https://github.com/brunj7/better-comments. Follow the instructions in the README.

+

For inspiration, you can check out the NASA code for Apollo 11, dating from 1969: https://github.com/chrislgarry/Apollo-11

+
+
+
+
+

Code repositories

+

Online code repositories are a great way to version and share your code. Here are a few examples of Git-based code repositories:

+
    +
  • GitHub
  • +
  • GitLab
  • +
  • Bitbucket
  • +
  • SourceForge
  • +
+

Note, however, that none of these main code repositories makes a long-term preservation commitment, so archiving the specific snapshot of your code used for an analysis alongside your data is a great idea. Several data repositories offer integrations that let you do this. For example, Zenodo has a great integration with GitHub that lets you issue a DOI for a specific release (read: snapshot) of your repository and preserve it independently of the code repository. See here for more details.

+

Note that the two are not mutually exclusive. On the contrary, you can see your code repository as the live version of your work and the archived code snapshot as the historical record of a specific analysis. In other words, we recommend linking both the code repository and its snapshot to the data archive.

+ + +
+ +
+
+


+

UCSB library logo

+

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License

+
+ + + +
+ + + + + \ No newline at end of file diff --git a/datamgmt_preserve.html b/preserve_data.html similarity index 81% rename from datamgmt_preserve.html rename to preserve_data.html index d952587..fea29c4 100644 --- a/datamgmt_preserve.html +++ b/preserve_data.html @@ -7,7 +7,7 @@ -Designing a Reproducible and Collaborative Lab (RCL) - Preserve your data for reuse +Designing a Reproducible and Collaborative Lab (RCL) - Preserve your data + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ + +
+ +
+ + +
+ + + +
+ +
+
+

Choosing a License

+
+ + + +
+ + + + +
+ + + +
+ + +

It is good practice to add a license to a repository/project. It clarifies the expectations regarding using, and potentially contributing to, this work.

+
+

Code

+

Here is a good website to choose a license: https://choosealicense.com/

+

Here is also a good set of instructions on how to add a license to a GitHub repository: https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/creating-a-repository-on-github/licensing-a-repository

+

Note that these types of licenses are mostly meant for software (such as R packages) rather than analytical scripts. For such scripts, we recommend keeping the license minimal, such as MIT or 3-Clause BSD.

+
+
+

Data

+

Technically, facts are not copyrightable; only interpretations of facts are. However, data licensing falls under the content licensing framework, which is more closely related to copyright. Here are the types of licenses that can be used: https://creativecommons.org/licenses/

+

CC0 is recommended for data.

+

Note that the license type might be dictated by the data repository you choose, as most data repositories offer only one or two licenses to choose from.

+ + +
+ +
+
+


+

UCSB library logo

+

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License

+
+ + + +
+ + + + + \ No newline at end of file diff --git a/preserve_prompts.html b/preserve_prompts.html new file mode 100644 index 0000000..057f6fe --- /dev/null +++ b/preserve_prompts.html @@ -0,0 +1,866 @@ + + + + + + + + + +Designing a Reproducible and Collaborative Lab (RCL) - Preserving your work + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ + +
+ +
+ + +
+ + + +
+ +
+
+

Preserving your work

+
+ + + +
+ + + + +
+ + + +
+ + +

Preserving your work includes developing enough documentation that your future self or another researcher can make sense of it, at least enough to reproduce your work and reuse it to answer their own research questions. It also includes deciding where to store and share your work in data/code repositories.

+
+

Documentation

+

Very few people with free time ahead of them will sit wondering what to do next and think “what if I were to write some documentation!?”. Make it part of your workflow, and do not let it drift too far out of sync as you iterate on your analysis.

+
+
+
+
+

+
https://twitter.com/JenMsft/status/1557218211971489792
+
+
+
+
+
+

Know your audience

+

There are actually various ways to document your work! Here is a potential framework to help you think about these different types of documentation and their related audiences:

+
+
+
+
+

+
source: https://diataxis.fr/
+
+
+
+
+

The potential audience(s) for your documentation might be a future collaborator, an external researcher with no direct insight into your work, or a potential user of a tool you developed.

+
+
+
+

Prompts

+
    +
  • What are the specific expectations of your discipline regarding documentation and data sharing? For example, should you also share the raw data along with the analysis results?

  • +
  • Considering funder and publisher mandates, as well as disciplinary norms, do you anticipate (or have you had) any challenges regarding data sharing and preservation?

  • +
+ + +
+ +
+
+


+

UCSB library logo

+

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License

+
+ + + +
+ + + + + \ No newline at end of file diff --git a/preserve_readme.html b/preserve_readme.html new file mode 100644 index 0000000..9d68231 --- /dev/null +++ b/preserve_readme.html @@ -0,0 +1,853 @@ + + + + + + + + + +Designing a Reproducible and Collaborative Lab (RCL) - The power of READMEs + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ + +
+ +
+ + +
+ + + +
+ +
+
+

The power of READMEs

+
+ + + +
+ + + + +
+ + + +
+ + +

README files are not new; they have accompanied computer projects since the early days. One great benefit of the widespread support for Markdown syntax (and its web rendering in most code repositories) is that you can move beyond a simple text file and present a compelling entry point to your project, linking to its various parts and to external resources.

+

Good types of information to include in a README:

+
    +
  • Title capturing the essence of the project
  • +
  • List of current contributors
  • +
  • A short explanation of the goal / purpose
  • +
  • How to install / where to start
  • +
  • A quick demo on how to use the content (can be a link to another document as well)
  • +
  • What to do if a bug is spotted
  • +
  • How to contribute
  • +
  • Licensing
  • +
  • Acknowledgements of authors, contributors, sponsors or other related work
  • +
+

Adding images, short videos, or animations can make a README more engaging.
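If you start many projects, a small script can give each one a README skeleton with the sections listed above. The sketch below is hypothetical: the function names and section titles are placeholders to adapt, and it only produces a Markdown template for you to fill in (the title goes in the top-level heading).

```python
from pathlib import Path

# Sections taken from the list above; rename or reorder to suit your project.
README_SECTIONS = (
    "Contributors",
    "Purpose",
    "Getting started",
    "Usage",
    "Reporting bugs",
    "Contributing",
    "License",
    "Acknowledgements",
)


def readme_skeleton(title, sections=README_SECTIONS):
    """Return the Markdown text of a README with one heading per section."""
    lines = [f"# {title}", ""]
    for section in sections:
        lines += [f"## {section}", "", "TODO", ""]
    return "\n".join(lines)


def write_readme(folder, title):
    """Write README.md in `folder`, refusing to overwrite an existing one."""
    path = Path(folder) / "README.md"
    if path.exists():
        raise FileExistsError(f"{path} already exists; not overwriting it")
    path.write_text(readme_skeleton(title), encoding="utf-8")
    return path


if __name__ == "__main__":
    print(readme_skeleton("My project"))
```

Refusing to overwrite an existing file keeps the script safe to run on a project that already has a README.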

+
+

Data README

+

Most data repositories will ask you to provide some kind of README file to help describe the content you are archiving. Here is a template you may customize for your project needs: https://doi.org/10.5281/zenodo.10828379

+
+
+

Code README

+

Need some inspiration?

+ + + +
+ +
+
+


+

UCSB library logo

+

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License

+
+ + + +
+ + + + + \ No newline at end of file diff --git a/datamgmt_prompts-preserve.html b/preserve_self.html similarity index 92% rename from datamgmt_prompts-preserve.html rename to preserve_self.html index 3e0c71e..06ada22 100644 --- a/datamgmt_prompts-preserve.html +++ b/preserve_self.html @@ -7,7 +7,7 @@ -Designing a Reproducible and Collaborative Lab (RCL) - Preservation Prompts +Designing a Reproducible and Collaborative Lab (RCL) - Document yourself