edit dplyr/SQL lectures

GotelliLab · Mar 19, 2024 · 46bcb7d
1 parent 7e2aa1a
commit 46bcb7d
Showing 2 changed files with 28 additions and 8 deletions.
diff --git a/Lectures/dplyr_Lecture1.Rmd b/Lectures/dplyr_Lecture1.Rmd
@@ -37,7 +37,10 @@ class(starwars)
 ```
 ### What is a tibble?    
 + "a modern take on data frames"   
-+ said to keep the great aspects of data frames and drops the frustrating ones (i.e. changing variable names, changing an input type)   
++ said to keep the great aspects of data frames and drop the frustrating ones (e.g. changing variable names, changing an input type). In general, tibbles do "less" as a tradeoff to make code simpler and less prone to errors. A few examples of how they differ from data frames:
+
+Tibbles never change the names of variables and never create row names. They also print more compactly (only the first 10 rows and the columns that fit on screen), so they are often a good choice when you are working with large datasets and don't want to flood your console.
+
 
 ```{r}
 glimpse(starwars) #more effective than str() in this case   
@@ -50,12 +53,12 @@ head(starwars) #looks a little different since it's a tibble (mentions dimension
 #dataClean <- starwars[complete.cases(starwars),] # removes all rows with NA values but doesn't work with lists
 
 # to get complete cases for just certain columns 
-starwarsClean <-starwars[complete.cases(starwars[,1:3]),]
+starwarsClean <-starwars[complete.cases(starwars[,1:10]),]
 
 
 # Check for NAs
-is.na(starwarsClean[1,1]) #useful for only a few observations since it returns a list of True/False
-anyNA(starwarsClean)
+is.na(starwarsClean[,1]) #useful for only a few observations since it returns a logical matrix of TRUE/FALSE values
+anyNA(starwarsClean[,1:10])
 anyNA(starwars[,1:10]) # compared to our original dataset
 
 # What does our data look like now?
@@ -103,7 +106,7 @@ select(starwarsClean, name:species) # you can use variables names too
 select(starwarsClean, -(films:starships)) # you can subset everything except particular variables
 
 ## Rearrange columns
-select(starwarsClean, name, gender, species, everything()) # using the everything() helper function is useful if you have a few variables to move to the beginning
+select(starwarsClean, homeworld, name, gender, species, everything()) # using the everything() helper function is useful if you have a few variables to move to the beginning
 select(starwarsClean, contains("color")) ## other helpers include: ends_with, starts_with, matches (reg ex), num_range
 
 ## Renaming columns
@@ -189,14 +192,16 @@ wideSW <- starwarsClean %>%
   pivot_wider(names_from = sex, values_from = height, values_fill = NA)
 wideSW
 
-starwars %>%
+pivotSW<-starwars %>%
   select(name, homeworld) %>%
   group_by(homeworld) %>%
   mutate(rn = row_number()) %>%
   ungroup() %>%
   pivot_wider(names_from = homeworld, values_from = name) %>%
   select(-rn)
 
+pivotSW
+
 ## make data set longer
 glimpse(wideSW)
 wideSW %>%

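Note on the tibble text added in `dplyr_Lecture1.Rmd`: the two claims there (tibbles never change variable names and never create row names) can be checked with a short sketch like the one below. It is not part of the commit; the column name `body mass` and the object names `df`, `tb`, and `cars_tb` are made up for illustration, and `mtcars` is used only because it ships with R and has row names.

```r
library(tibble)

# data.frame() repairs non-syntactic column names by default (check.names = TRUE)
df <- data.frame(`body mass` = c(77, 32))
names(df)   # "body.mass"

# tibble() keeps the name exactly as it was given
tb <- tibble(`body mass` = c(77, 32))
names(tb)   # "body mass"

# mtcars stores the car models as row names; converting to a tibble drops them
has_rownames(mtcars)    # TRUE
cars_tb <- as_tibble(mtcars)
has_rownames(cars_tb)   # FALSE

# If the row names carry information, they can be kept as an ordinary column
cars_tb <- as_tibble(mtcars, rownames = "model")
head(cars_tb$model, 2)
```

The last step matters in practice: because tibbles drop row names, any identifier stored there has to be moved into a regular column before or during conversion.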
diff --git a/Lectures/dplyr_SQL_Lecture.html b/Lectures/dplyr_SQL_Lecture.html
@@ -357,11 +357,26 @@ <h4 class="date">2023-02-27</h4>
 </div>
 
 
+<p>SQL stands for Structured Query Language, and it’s useful for
+storing and processing datasets. It is used in many other
+applications; for example, I use it a lot when running GIS applications
+like ArcGIS Pro or ArcMap, which usually don’t have a dedicated way
+to interact with datasets except through SQL.</p>
+<p>So, the first thing we’re going to do is download the data.</p>
 <p>Here’s a link to the sample datasets that we’ll be using: <a href="https://datadryad.org/stash/dataset/doi:10.5061%2Fdryad.4mw6m90b9">Download</a></p>
 <pre class="r"><code>#Loading packages
-library(sqldf)
-library(dplyr)
+library(sqldf)</code></pre>
+<pre><code>## Warning: package &#39;RSQLite&#39; was built under R version 4.2.3</code></pre>
+<pre class="r"><code>library(dplyr)
 library(tidyverse)</code></pre>
+<pre><code>## Warning: package &#39;tidyverse&#39; was built under R version 4.2.3</code></pre>
+<pre><code>## Warning: package &#39;ggplot2&#39; was built under R version 4.2.3</code></pre>
+<pre><code>## Warning: package &#39;tibble&#39; was built under R version 4.2.3</code></pre>
+<pre><code>## Warning: package &#39;tidyr&#39; was built under R version 4.2.3</code></pre>
+<pre><code>## Warning: package &#39;readr&#39; was built under R version 4.2.3</code></pre>
+<pre><code>## Warning: package &#39;purrr&#39; was built under R version 4.2.3</code></pre>
+<pre><code>## Warning: package &#39;forcats&#39; was built under R version 4.2.3</code></pre>
+<pre><code>## Warning: package &#39;lubridate&#39; was built under R version 4.2.3</code></pre>
 <div id="take-a-look-at-the-datasets-first" class="section level1">
 <h1>Take a look at the datasets first</h1>
 <pre class="r"><code>head(species_clean)