Merge pull request #1058 from ethanwhite/r-intro-updates
R intro updates
ethanwhite authored Aug 30, 2024
2 parents 2f8dfa2 + e885de2 commit cb98fc2
Showing 8 changed files with 92 additions and 57 deletions.
10 changes: 7 additions & 3 deletions exercises/Check-that-your-code-runs-R.md
@@ -13,10 +13,14 @@ Follow these steps in RStudio to make sure your code really runs:

![Screenshot showing clicking session from the menu bar and selecting Restart R]({{ site.baseurl}}/exercises/restart-r-screenshot.png)

2\. Check to make sure the `Environment` tab is empty:
2\. If the `Environment` tab isn't empty click on the broom icon to clear it:

![Screenshot showing the Environment tab with the cursor hovering over the broom icon]({{ site.baseurl}}/exercises/clear-rstudio-environment-screenshot.png)

The `Environment` tab should now say "Environment Is Empty":

![Screenshot showing the Environment tab with only the words Environment Is Empty]({{ site.baseurl}}/exercises/empty-rstudio-environment-screenshot.png)

3\. Rerun your entire homework assignment to make sure it runs from start to finish and produces the expected results.
3\. Rerun your entire homework assignment using "Source with Echo" to make sure it runs from start to finish and produces the expected results.

![Screenshot showing the RStudio Run button]({{ site.baseurl}}/exercises/rstudio-run-button-screenshot.png)
![Screenshot showing the RStudio Source with Echo item hovered in the Source dropdown]({{ site.baseurl}}/exercises/rstudio-source-with-echo-screenshot.png)
Binary file added exercises/rstudio-source-with-echo-screenshot.png
2 changes: 1 addition & 1 deletion lectures/R-intro.md
@@ -7,5 +7,5 @@ language: R

1. [Introduction]({{ site.baseurl }}/materials/r-intro)
2. [Vectors]({{ site.baseurl }}/materials/vectors-R)
3. [Checking that your code runs]({{ site.baseurl }}/materials/basic-reproducibility-R)
3. [Checking that your code runs]({{ site.baseurl }}/materials/check-that-your-code-runs-R)
4. [Using Large Language Models to Learn]({{ site.baseurl }}/materials/large-language-models)
17 changes: 9 additions & 8 deletions materials/basic-reproducibility-R.md
@@ -32,14 +32,15 @@ language: R
* Doesn't unload packages
* Useful when developing code
* Restart R to get a clean environment
* Works locally (not in Posit Cloud; `Session` -> `Restart R` always reloads environment)
* Does unload packages
* Useful for making sure everything works
* As long as it doesn't secretly reload things
* Run entire file using `Source` button or `Ctrl-Shift-S`
* Makes sure that the code runs fully and produces desired result

* Stop R from storing the state of the environment
* Unloads packages
* But won't clear environment by default (at least not on Posit Cloud)
* Safest thing is to both clear the environment and restart R
* Then run the entire file using `Source with Echo` button or `Ctrl-Shift-S`
* Ensures that the code runs fully and produces desired result
* Last required exercise of every assignment will walk you through this process

### Stop R from storing the state of the environment

* When you close RStudio it will often ask if you want to save your workspace
* *Start to close RStudio*
* *Show Save dialog*
46 changes: 46 additions & 0 deletions materials/check-that-your-code-runs-R.md
@@ -0,0 +1,46 @@
---
layout: page
element: notes
title: Check That Your Code Runs
language: R
---

### Setup

> Make sure that `Tools` -> `Global Options` -> `General` ->
> `Save workspace to ~/.RData on exit` is set to the default `Ask`

### Introduction to Reproducibility

* Goal - rerun full analysis with a single click (or command)
* First step - Make sure your code runs anytime and anywhere
* next day (who has gotten code working & had it not work the next day?)
* desktop vs. laptop
* collaborators
* advisor

### Make sure things you did before don't matter

* Computers store the results of each command run in sequence
* Change something
* Looks like it still works
* Only works because of something you did earlier in the same session

### Clearing environments and restarting R

* Clear R environment using the broom icon on the `Environment` tab.
* Doesn't unload packages
* Useful when developing code
* Restart R to get a clean environment
* Unloads packages
* But won't clear environment by default (at least on Posit Cloud)
* Safest thing is to both clear the environment and restart R
* Then run the entire file using `Source with Echo` button or `Ctrl-Shift-S`
* Ensures that the code runs fully and produces desired result
* Last required exercise of every assignment will walk you through this process
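
* The same steps can be sketched at the R console. This is only an illustration, not the workflow the exercise walks through: `rm()`, `ls()`, and `source()` are base R functions, while the temporary script and the names `leftover` and `total` are made up for this example. Restarting R itself still has to be done from the `Session` menu.

```r
# Leftover state from earlier in the session
leftover <- 42

# Console equivalent of the broom icon: remove every object in the environment
rm(list = ls())
exists("leftover")

# Stand-in for a homework file (hypothetical contents, written to a temporary file)
script <- tempfile(fileext = ".R")
writeLines(c("count <- c(9, 16, 3, 10)", "total <- sum(count)", "total"), script)

# Console equivalent of Source with Echo: run the whole file, echoing each line as it runs
source(script, echo = TRUE)
```

* If the file depended on `leftover`, sourcing it after clearing the environment would stop with an error, which is exactly the problem this check is meant to catch.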

### Force R to clear environment when restarting

* `Tools` -> `Global Options` -> `General` ->
`Save workspace to ~/.RData on exit` -> `Never`
* Unclick `Restore .RData into workspace at startup`
57 changes: 21 additions & 36 deletions materials/large-language-models.md
@@ -11,56 +11,37 @@ language: R
* Who's heard about ChatGPT and other similar models?
* One of the kinds of text these models can generate is code

* The very simplistic version of how these models work is that they look at a string of words and figure out what word is most likely to come next
* They learn the most likely next word by looking at millions of examples from the internet
* In other words they are an advanced form of autocomplete or a parrot
* The very simplistic version of how these models work is that they look at a string of words and figure out what words are most likely to come next
* They learn this by looking at millions of examples from the internet

* Since there is lots of code written by software developers on the internet they are pretty good at generating code
* And since there are lots of lessons on how to learn to code on the internet they can also be good at generating text that explains code

### Examples

* *Open ChatGPT*
* Let's prompt ChatGPT to solve something like we've been working on
* *Enter the following prompt*

> How do you calculate the sum of the vector numbers <- c(2.1, 2.7, 2.7, 3.2, 2.9, NA, 3.9, 2.1, 4.5, 2.6) in R?
* *ChatGPT will produce a code answer with a variable holding the sum*
* This looks like the right answer and it even saw the NA, handled it appropriately, and explained that
* Let's prompt ChatGPT for the result of this code

> What is the value of <variable_name>?
* *Copy the code into R and run it*
* *The result returned is currently wrong, but that could change*
* In this case the LLM is wrong
* It can't run code and it doesn't know how to do math; it just knows that when the word "sum" is used with a string of numbers that looks roughly like this one, a number that looks roughly like 24.7 tends to follow it
* So, LLMs can be powerful, but also wrong

### Right answer wrong approach

* Because LLMs aren't specifically designed for this course they may show you ways to do things that we aren't learning

* *Start a new Chat*
* *Type the following prompt*

> In the R programming language use code to print the sum of the following vector.
>
> numbers <- c(2.1, 2.7, 2.7, 3.2, 2.9, NA, 3.9, 2.1, 4.5, 2.6, 2.9, 3.1)
* Because of the small differences in the phrasing of the question (and the stochasticity of LLMs) we get a different answer
* It still works, but it's more complicated, and it's not the approach that we're learning
* *Copy and paste an exercise from the assignment*
* *In most cases the result will be reasonable*

### Using LLMs for learning

* There are a variety of meaningful ethical concerns about using LLMs
* They use a lot of energy to train and run and therefore put a lot of CO2 in the atmosphere
* They use millions of people's work without credit or payment, arguably in violation of copyright and licenses
* And since they parrot what's on the internet they often lot of bias and bigotry
* And since they parrot what's on the internet they often include a lot of bias and bigotry
* That said, LLMs can be useful for learning and you are welcome to use them for this in this course
* Using them to directly answer the exercises won't help you learn, because humans need practice to learn
* That's the only reason we have exercises
* That's why we have exercises
* But that's easy for me to say, so let's hear from someone actively learning to code - Hero from Coding with Strangers
* (there is some swearing in this video, so if you're not comfortable with that you're welcome to step out for ~60 seconds)

<iframe width="674" height="1198" src="https://www.youtube.com/embed/OhaGNTiMXmU" title="Sora AI is Like Batman&#39;s Utility Belt" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

* I'm assuming that you're all here to learn, so I recommend listening to Hero and at least not using LLMs to directly answer the exercises
* We won't be policing LLM content in submissions
* But the exercises do need to be answered using the approaches we learned in class and often LLMs will use a different approach


* So what are useful ways to use them?
* You can prompt them to explain things you don't understand and they will parrot relevant advice from material on the web
* This can be easier, especially for folks learning to code, than trying to search for a specific site that has the answer
@@ -69,4 +50,8 @@ language: R
### Using LLMs for work

* Once you've finished the course you can use them to automate things you already know how to do
* But LLMs are parrots so you know enough so that you can check and make sure that the model produced a valid result
* In our experience in my lab these models typically do about 90% of each simple task correctly
* That means that the end result rarely works, but if you know how to fix what is wrong it can still be a time saver
* But you have to know enough to fix the things that don't work and to check and make sure that the code is actually doing what you want
* And sometimes they're really wrong
* The other day I asked GitHub Copilot for help with a complex version control command and it gave me a command that would have unrecoverably deleted all of the work I'd done in the last hour
17 changes: 8 additions & 9 deletions materials/vectors-R.md
@@ -26,8 +26,8 @@ str(states)
* In general `[]` in R means, "give me a piece of something"
* `states[1]` gives us the first value in the vector
* `states[1:3]` gives us the first through the third values
* `1:3` works by makeing a vector of the whole numbers 1 through 3.
* So, this is the same as `states[1:3]` is the same as `states[c(1, 2, 3)]`
* `1:3` works by making a vector of the whole numbers 1 through 3.
* So, this is the same as `states[c(1, 2, 3)]`
* You can use a vector to get any subset or order you want `states[c(4, 1, 3)]`
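
* These subsetting rules can be checked directly. A minimal sketch, assuming `states` holds the example values used later in these notes; `identical()` is a base R function that tests whether two objects are exactly the same.

```r
# Example values borrowed from later in these notes
states <- c("FL", "FL", "GA", "SC")

states[1]             # the first value
states[1:3]           # the first through the third values
1:3                   # the index vector that `1:3` creates

# TRUE: both forms select the same values
identical(states[1:3], states[c(1, 2, 3)])

# Any subset, in any order: "SC" "FL" "GA"
states[c(4, 1, 3)]
```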

* Many functions in R take a vector as input and return a value
@@ -48,7 +48,7 @@ min(count)
sum(count)
```

> Do [Basic Vectors]({{ site.baseurl }}/exercises/Vectors-basic-vectors-R/).
> Do Exercise 6 - [Basic Vectors]({{ site.baseurl }}/exercises/Vectors-basic-vectors-R/).

### Null values

@@ -82,7 +82,7 @@ mean(count_na, na.rm = TRUE)

### Working with multiple vectors

* Build on example where we have information on states and population counts by adding areas
* Add information on area to our information on states and population counts

```r
states <- c("FL", "FL", "GA", "SC")
@@ -99,7 +99,7 @@ area <- c(3, 5, 1.9, 2.7)
area * 2
```

* This works because when do this multiplication, R multiplies the first value in the vector by 2, then multiplies the second values in the vector by 2, and so on
* When we run this, R multiplies the first value in the vector by 2, then multiplies the second value in the vector by 2, and so on
* Element-wise: operating on one element at a time
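
* To see what element-wise means, the multiplication can be written out one element at a time. This is a sketch for illustration only; the written-out version is not how you would normally write it, and `all.equal()` is a base R function for comparing numbers with a small tolerance.

```r
area <- c(3, 5, 1.9, 2.7)

# Element-wise multiplication, the way we normally write it
area * 2

# The same calculation written out one element at a time
by_hand <- c(area[1] * 2, area[2] * 2, area[3] * 2, area[4] * 2)
by_hand

# Check that both versions give the same values
all.equal(area * 2, by_hand)
```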

* Remember - this isn't saved unless we store it
@@ -109,7 +109,7 @@ area * 2
area
```

* If we want to keep the results of the calculation them in a new variable
* To keep the results of the calculation store them in a new variable

```r
doubled_area <- area * 2
@@ -147,7 +147,7 @@ density[states != 'FL']
```

* Numerical comparisons like greater or less than
* Select states that meet with some restrictions on density
* Select states that meet conditions related to density

```r
states[density > 3]
@@ -168,8 +168,7 @@ density[density > 3]
* What's actually happening when we subset vectors this way?
* Let's look at the piece of the code inside the `[]`

```r
density > 3
```
* This does an element-wise check to see if each value is > 3
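
* A small sketch of this check and how `[]` uses it. The `density` values here are hypothetical and chosen only for illustration; `states` uses the example values from earlier in these notes.

```r
states <- c("FL", "FL", "GA", "SC")
density <- c(3.0, 3.2, 1.9, 2.7)  # hypothetical values, for illustration only

# The comparison returns one TRUE or FALSE per element
density > 3

# Subsetting with that logical vector keeps only the positions that are TRUE
states[density > 3]

# The same subset with the logical vector written out by hand
states[c(FALSE, TRUE, FALSE, FALSE)]
```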
