edit dplyr/SQL lectures
georgeni2442 committed Mar 19, 2024
1 parent 7e2aa1a commit 46bcb7d
Showing 2 changed files with 28 additions and 8 deletions.
17 changes: 11 additions & 6 deletions Lectures/dplyr_Lecture1.Rmd
@@ -37,7 +37,10 @@ class(starwars)
```
### What is a tibble?
+ "a modern take on data frames"
+ said to keep the great aspects of data frames and drops the frustrating ones (i.e. changing variable names, changing an input type)
+ said to keep the great aspects of data frames and drop the frustrating ones (e.g. changing variable names, changing an input type). In general, tibbles do "less" as a tradeoff to make code simpler and less prone to crashing. A few examples of what they change from data frames:

Tibbles never change the names of variables and never create row names. They also print more compactly, so they are often a good choice when you are working with large datasets and don't want to overload your console.
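
The two differences above can be checked directly. This is a minimal sketch, not part of the lecture itself: the column name `my var` and the object names `df`, `tb`, `mt_df`, and `mt_tb` are made up for illustration, and it assumes the `tibble` package is installed.

```r
library(tibble)

# data.frame() repairs non-syntactic names by default; tibble() leaves them alone
df <- data.frame(`my var` = 1:3)
tb <- tibble(`my var` = 1:3)
names(df)
names(tb)

# A data frame carries row names; converting to a tibble drops them
mt_df <- head(mtcars, 3)
mt_tb <- as_tibble(mt_df)
rownames(mt_df)
rownames(mt_tb)
```

If the row names hold real information, `as_tibble()` can keep them as an ordinary column through its `rownames` argument.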


```{r}
glimpse(starwars) #more effective than str() in this case
@@ -50,12 +53,12 @@ head(starwars) #looks a little different since it's a tibble (mentions dimension
#dataClean <- starwars[complete.cases(starwars),] # removes all rows with NA values but doesn't work with lists
# to get complete cases for just certain columns
starwarsClean <-starwars[complete.cases(starwars[,1:3]),]
starwarsClean <-starwars[complete.cases(starwars[,1:10]),]
# Check for NAs
is.na(starwarsClean[1,1]) #useful for only a few observations since it returns a list of True/False
anyNA(starwarsClean)
is.na(starwarsClean[,1]) #only practical for a few observations, since it returns one TRUE/FALSE per value
anyNA(starwarsClean[,1:10])
anyNA(starwars[,1:10]) # compared to our original dataset
# What does our data look like now?
@@ -103,7 +106,7 @@ select(starwarsClean, name:species) # you can use variables names too
select(starwarsClean, -(films:starships)) # you can subset everything except particular variables
## Rearrange columns
select(starwarsClean, name, gender, species, everything()) # using the everything() helper function is useful if you have a few variables to move to the beginning
select(starwarsClean, homeworld, name, gender, species, everything()) # using the everything() helper function is useful if you have a few variables to move to the beginning
select(starwarsClean, contains("color")) ## other helpers include: ends_with, starts_with, matches (reg ex), num_range
## Renaming columns
@@ -189,14 +192,16 @@ wideSW <- starwarsClean %>%
pivot_wider(names_from = sex, values_from = height, values_fill = NA)
wideSW
starwars %>%
pivotSW<-starwars %>%
select(name, homeworld) %>%
group_by(homeworld) %>%
mutate(rn = row_number()) %>%
ungroup() %>%
pivot_wider(names_from = homeworld, values_from = name) %>%
select(-rn)
pivotSW
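# A minimal sketch (not part of this commit) of why the rn helper is needed,
# assuming dplyr and tidyr are loaded as in the rest of the lecture.
# The toy tibble and its values are made up for illustration.
toySW <- tibble(name = c("A", "B", "C"), homeworld = c("X", "X", "Y"))
toyWide <- toySW %>%
  group_by(homeworld) %>%
  mutate(rn = row_number()) %>% # rn counts 1, 2 within X and 1 within Y
  ungroup() %>%
  pivot_wider(names_from = homeworld, values_from = name)
toyWide # one row per rn: X holds A then B, Y holds C then NA
# Without rn, both X names would compete for a single row, and pivot_wider()
# would warn and return list-columns instead.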
## make data set longer
glimpse(wideSW)
wideSW %>%
19 changes: 17 additions & 2 deletions Lectures/dplyr_SQL_Lecture.html
@@ -357,11 +357,26 @@ <h4 class="date">2023-02-27</h4>
</div>


<p>SQL stands for Structured Query Language, and it’s useful for
storing and processing datasets. It is used in many other
applications. For example, I use it a lot when running GIS applications
like ArcGIS Pro or ArcMap, which usually don’t have a dedicated way
to interact with datasets except through SQL.</p>
<p>So, the first thing we’re going to do is download the data.</p>
<p>Here’s a link to the sample datasets that we’ll be using: <a href="https://datadryad.org/stash/dataset/doi:10.5061%2Fdryad.4mw6m90b9">Download</a></p>
<pre class="r"><code>#Loading packages
library(sqldf)
library(dplyr)
library(sqldf)</code></pre>
<pre><code>## Warning: package &#39;RSQLite&#39; was built under R version 4.2.3</code></pre>
<pre class="r"><code>library(dplyr)
library(tidyverse)</code></pre>
<pre><code>## Warning: package &#39;tidyverse&#39; was built under R version 4.2.3</code></pre>
<pre><code>## Warning: package &#39;ggplot2&#39; was built under R version 4.2.3</code></pre>
<pre><code>## Warning: package &#39;tibble&#39; was built under R version 4.2.3</code></pre>
<pre><code>## Warning: package &#39;tidyr&#39; was built under R version 4.2.3</code></pre>
<pre><code>## Warning: package &#39;readr&#39; was built under R version 4.2.3</code></pre>
<pre><code>## Warning: package &#39;purrr&#39; was built under R version 4.2.3</code></pre>
<pre><code>## Warning: package &#39;forcats&#39; was built under R version 4.2.3</code></pre>
<pre><code>## Warning: package &#39;lubridate&#39; was built under R version 4.2.3</code></pre>
<div id="take-a-look-at-the-datasets-first" class="section level1">
<h1>Take a look at the datasets first</h1>
<pre class="r"><code>head(species_clean)
