Merge pull request #1058 from ethanwhite/r-intro-updates

R intro updates
datacarpentry · Aug 30, 2024 · cb98fc2
2 parents 2f8dfa2 + e885de2

Showing 8 changed files with 92 additions and 57 deletions.
diff --git a/exercises/Check-that-your-code-runs-R.md b/exercises/Check-that-your-code-runs-R.md
@@ -13,10 +13,14 @@ Follow these steps in RStudio to make sure your code really runs:
 
 ![Screenshot showing clicking session from the menu bar and selecting Restart R]({{ site.baseurl}}/exercises/restart-r-screenshot.png)
 
-2\. Check to make sure the `Environment` tab is empty:
+2\. If the `Environment` tab isn't empty, click on the broom icon to clear it:
+
+![Screenshot showing the Environment tab with the cursor hovering over the broom icon]({{ site.baseurl}}/exercises/clear-rstudio-environment-screenshot.png)
+
+The `Environment` tab should now say "Environment Is Empty":
 
 ![Screenshot showing the Environment tab with only the words Environment Is Empty]({{ site.baseurl}}/exercises/empty-rstudio-environment-screenshot.png)
 
-3\. Rerun your entire homework assignment to make sure it runs from start to finish and produces the expected results.
+3\. Rerun your entire homework assignment using "Source with Echo" to make sure it runs from start to finish and produces the expected results.
 
-![Screenshot showing the RStudio Run button]({{ site.baseurl}}/exercises/rstudio-run-button-screenshot.png)
+![Screenshot showing the RStudio Source with Echo item hovered in the Source dropdown]({{ site.baseurl}}/exercises/rstudio-source-with-echo-screenshot.png)
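Reviewer note on the steps in this exercise: the same check can be approximated from the R console. This is a minimal sketch, not part of the commit. It assumes a hypothetical homework file (a temporary file stands in for it here) and a made-up `leftover` object. `rm(list = ls())` is roughly the console counterpart of the broom icon, and `source(..., echo = TRUE)` corresponds to "Source with Echo". It does not restart R, so loaded packages stay loaded.

```r
# Hypothetical stand-in for a homework file; replace with your own path
homework_file <- tempfile(fileext = ".R")
writeLines(
  c("area <- c(3, 5, 1.9, 2.7)",
    "doubled_area <- area * 2",
    "doubled_area"),
  homework_file
)

# Simulate clutter left over from earlier interactive work
leftover <- "object from earlier work"

# Clear every object except the path we still need
rm(list = setdiff(ls(), "homework_file"))

# Run the whole file from top to bottom, printing each command as it runs
source(homework_file, echo = TRUE)
```

If the file runs to the end without errors after the environment was cleared, it does not depend on leftover objects.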
diff --git a/exercises/clear-rstudio-environment-screenshot.png b/exercises/clear-rstudio-environment-screenshot.png
diff --git a/exercises/rstudio-source-with-echo-screenshot.png b/exercises/rstudio-source-with-echo-screenshot.png
diff --git a/lectures/R-intro.md b/lectures/R-intro.md
@@ -7,5 +7,5 @@ language: R
 
 1. [Introduction]({{ site.baseurl }}/materials/r-intro)
 2. [Vectors]({{ site.baseurl }}/materials/vectors-R)
-3. [Checking that your code runs]({{ site.baseurl }}/materials/basic-reproducibility-R)
+3. [Checking that your code runs]({{ site.baseurl }}/materials/check-that-your-code-runs-R)
 4. [Using Large Language Models to Learn]({{ site.baseurl }}/materials/large-language-models)
diff --git a/materials/basic-reproducibility-R.md b/materials/basic-reproducibility-R.md
@@ -32,14 +32,15 @@ language: R
   * Doesn't unload packages
   * Useful when developing code
 * Restart R to get a clean environment
-  * Works locally (not in Posit Cloud; `Session` -> `Restart R` always reloads environment)
-  * Does unload packages
-  * Useful for making sure everything works
-  * As long is it doesn't secretly reload things
-* Run entire file using `Source` button or `Ctrl-Shift-S`
-* Makes sure that the code runs fully and produces desired result
-
-* Stop R from storing the state of the environment
+  * Unloads packages
+  * But won't clear environment by default (at least not on Posit Cloud)
+* Safest thing is to both clear the environment and restart R
+* Then run the entire file using the `Source with Echo` button or `Ctrl-Shift-Enter`
+* Ensures that the code runs fully and produces desired result
+* Last required exercise of every assignment will walk you through this process
+
+### Stop R from storing the state of the environment
+
 * When you close RStudio it will often ask if you want to save your workspace
 * *Start to close RStudio*
 * *Show Save dialog*

diff --git a/materials/check-that-your-code-runs-R.md b/materials/check-that-your-code-runs-R.md
@@ -0,0 +1,46 @@
+---
+layout: page
+element: notes
+title: Check That Your Code Runs
+language: R
+---
+
+### Setup
+
+> Make sure that `Tools` -> `Global Options` -> `General` ->
+> `Save workspace to ~/.RData on exit` is set to the default `Ask`
+
+### Introduction to Reproducibility
+
+* Goal - rerun full analysis with a single click (or command)
+* First step - Make sure your code runs anytime and anywhere
+  * next day (who has gotten code working & had it not work the next day?)
+  * desktop vs. laptop
+  * collaborators
+  * advisor
+
+### Make sure things you did before don't matter
+
+* Computers store the results of each command run in sequence
+* Change something
+* Looks like it still works
+* Only works because of something you did earlier in the same session
+
+### Clearing environments and restarting R
+
+* Clear R environment using the broom icon on the `Environment` tab.
+  * Doesn't unload packages
+  * Useful when developing code
+* Restart R to get a clean environment
+  * Unloads packages
+  * But won't clear environment by default (at least on Posit Cloud)
+* Safest thing is to both clear the environment and restart R
+* Then run the entire file using the `Source with Echo` button or `Ctrl-Shift-Enter`
+* Ensures that the code runs fully and produces desired result
+* Last required exercise of every assignment will walk you through this process
+
+### Force R to clear environment when restarting
+
+* `Tools` -> `Global Options` -> `General` ->
+  `Save workspace to ~/.RData on exit` -> `Never`
+* Uncheck `Restore .RData into workspace at startup`
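Reviewer note on "Make sure things you did before don't matter" in the new file: a small demonstration may help make the point concrete. This is a sketch under stated assumptions, not part of the commit. The script contents and variable names are hypothetical, and it is meant to be run at the top level of an R session.

```r
# Hypothetical script that uses `area` without ever defining it
script <- tempfile(fileext = ".R")
writeLines("doubled_area <- area * 2", script)

# Earlier in the session, `area` was created interactively at the console
area <- c(3, 5, 1.9, 2.7)
source(script)        # appears to work because `area` is still in the environment
print(doubled_area)

# Clearing the environment (as the broom icon does) exposes the hidden dependency
rm(area, doubled_area)
outcome <- tryCatch(
  {
    source(script)
    "ran without error"
  },
  error = function(e) paste("failed:", conditionMessage(e))
)
print(outcome)
```

The second run fails because the script only ever worked thanks to an object created earlier in the same session.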
diff --git a/materials/large-language-models.md b/materials/large-language-models.md
@@ -11,56 +11,37 @@ language: R
 * Who's heard about ChatGPT and other similar models?
 * One of the kinds of text these models can generate is code
 
-* The very simplistic version of how these models work is that they look at a string of words and figure out what word is most likely to come next
-* They learn the most likely next word by looking at millions of examples from the internet
-* In other words they are an advanced form of autocomplete or a parrot
+* The very simplistic version of how these models work is that they look at a string of words and figure out what words are most likely to come next
+* They learn this by looking at millions of examples from the internet
 
 * Since there is lots of code written by software developers on the internet they are pretty good at generating code
 * And since there are lots of lessons on how to learn to code on the internet they can also be good at generating text that explains code
 
 ### Examples
 
 * *Open ChatGPT*
-* Let's prompt ChatGPT to solve something like we've been working on
-* *Enter the following prompt*
-
-> How do you calculate the sum of the vector numbers <- c(2.1, 2.7, 2.7, 3.2, 2.9, NA, 3.9, 2.1, 4.5, 2.6) in R?
-
-* *ChatGPT will produce a code answer with a variable holding the sum*
-* This looks like the right answer and it even saw the NA, handled it appropriately, and explained that
-* Let's prompt ChatGPT for the result of this code
-
-> What is the value of <variable_name>?
-
-* *Copy the code into R and run it*
-* *The result returned is currently wrong, but that could change*
-* In this case the LLM is wrong
-* It can't run code and it doesn't know how to do math, it just knows that when the word "sum" is used for a string of numbers that looks like roughly like this one that there tends to be a number that looks roughly like 24.7 that follows it
-* So, LLMs can be powerful, but also wrong 
-
-### Right answer wrong approach
-
-* Because LLM aren't specifically designed for this course they may show you ways to do things that we aren't learning
-
-* *Start a new Chat*
-* *Type the following prompt*
-
-> In the R programming language use code to print the sum of the following vector.
->
-> numbers <- c(2.1, 2.7, 2.7, 3.2, 2.9, NA, 3.9, 2.1, 4.5, 2.6, 2.9, 3.1)
-
-* Because of the small differences in the phrasing of the question (and the stochasticity of LLMs) we get a different answer
-* It still works, but it's more complicated, and it's not the approach that we're learning
+* *Copy and paste an exercise from the assignment*
+* *In most cases the result will be reasonable*
 
 ### Using LLMs for learning
 
 * There are a variety of meaningful ethical concerns about using LLMs
     * They use a lot of energy to train and run and therefore put a lot of CO2 in the atmosphere
     * They use millions of people's work without credit or payment, arguably in violation of copyright and licenses
-    * And since they parrot what's on the internet they often lot of bias and bigotry
+    * And since they parrot what's on the internet they often include a lot of bias and bigotry
 * That said, LLMs can be useful for learning and you are welcome to use them for this in this course
 * Using them to directly answer the exercises won't help you learn, because humans need practice to learn
-* That's the only reason we have exercises
+* That's why we have exercises
+* But that's easy for me to say, so let's hear from someone actively learning to code - Hero from Coding with Strangers
+* (there is some swearing in this video, so if you're not comfortable with that you're welcome to step out for ~60 seconds)
+
+<iframe width="674" height="1198" src="https://www.youtube.com/embed/OhaGNTiMXmU" title="Sora AI is Like Batman&#39;s Utility Belt" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
+
+* I'm assuming that you're all here to learn, so I recommend listening to Hero and at least not using LLMs to directly answer the exercises
+* We won't be policing LLM content in submissions
+* But the exercises do need to be answered using the approaches we learned in class and often LLMs will use a different approach 
+
+
 * So what are useful ways to use them?
 * You can prompt them to explain things you don't understand and they will parrot relevant advice from material on the web
 * This can be easier, especially for folks learning to code, than trying to search for a specific site that has the answer
@@ -69,4 +50,8 @@ language: R
 ### Using LLMs for work
 
 * Once you've finished the course you can use them to automate things you already know how to do
-* But LLMs are parrots so you know enough so that you can check and make sure that the model produced a valid result
+* In my lab's experience these models typically do about 90% of each simple task correctly
+* That means that the end result rarely works, but if you know how to fix what is wrong it can still be a time saver
+* But you have to know enough to fix the things that don't work and to check and make sure that the code is actually doing what you want
+* And sometimes they're really wrong
+* The other day I asked GitHub Copilot for help with a complex version control command and it gave me a command that would have unrecoverably deleted all of the work I'd done in the last hour
diff --git a/materials/vectors-R.md b/materials/vectors-R.md
@@ -26,8 +26,8 @@ str(states)
 * In general `[]` in R means, "give me a piece of something"
 * `states[1]` gives us the first value in the vector
 * `states[1:3]` gives us the first through the third values
-* `1:3` works by makeing a vector of the whole numbers 1 through 3.
-* So, this is the same as `states[1:3]` is the same as `states[c(1, 2, 3)]` 
+* `1:3` works by making a vector of the whole numbers 1 through 3.
+* So, `states[1:3]` is the same as `states[c(1, 2, 3)]`
 * You can use a vector to get any subset or order you want `states[c(4, 1, 3)]`
 
 * Many functions in R take a vector as input and return a value
@@ -48,7 +48,7 @@ min(count)
 sum(count)
 ```
 
-> Do [Basic Vectors]({{ site.baseurl }}/exercises/Vectors-basic-vectors-R/).
+> Do Exercise 6 - [Basic Vectors]({{ site.baseurl }}/exercises/Vectors-basic-vectors-R/).
 
 ### Null values
 
@@ -82,7 +82,7 @@ mean(count_na, na.rm = TRUE)
 
 ### Working with multiple vectors
 
-* Build on example where we have information on states and population counts by adding areas
+* Add information on area to our information on states and population counts
 
 ```r
 states <- c("FL", "FL", "GA", "SC")
@@ -99,7 +99,7 @@ area <- c(3, 5, 1.9, 2.7)
 area * 2
 ```
 
-* This works because when do this multiplication, R multiplies the first value in the vector by 2, then multiplies the second values in the vector by 2, and so on
+* When we run this, R multiplies the first value in the vector by 2, then multiplies the second value in the vector by 2, and so on
 * Element-wise: operating on one element at a time
 
 * Remember - this isn't saved unless we store it
@@ -109,7 +109,7 @@ area * 2
 area
 ```
 
-* If we want to keep the results of the calculation them in a new variable
+* To keep the results of the calculation, store them in a new variable
 
 ```r
 doubled_area <- area * 2
@@ -147,7 +147,7 @@ density[states != 'FL']
 ```
 
 * Numerical comparisons like greater or less than
-* Select states that meet with some restrictions on density
+* Select states that meet conditions related to density
 
 ```r
 states[density > 3]
@@ -168,8 +168,8 @@ density[density > 3]
 * What's actually happening when we subset vectors this way?
 * Let's look at the piece of the code inside the `[]`
 
 ```r
 density > 3
 ```
 
 * This does an element-wise check to see if each value is > 3
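
Reviewer note on the element-wise check described above: a short sketch of what the comparison returns and how it drives the subsetting. The `density` values here are hypothetical, chosen only for illustration, and `is_dense` is a made-up name.

```r
states <- c("FL", "FL", "GA", "SC")
# Hypothetical density values, chosen only for illustration
density <- c(3.3, 2.4, 5.2, 1.1)

# The comparison is applied to each element and returns a logical vector
is_dense <- density > 3
print(is_dense)

# Subsetting with a logical vector keeps the positions that are TRUE,
# so this is equivalent to states[density > 3]
print(states[is_dense])
```

With these values the logical vector is `TRUE FALSE TRUE FALSE`, so only the first and third states are kept.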