create Data Curation and Experimental Design lectures
ngotelli committed Feb 22, 2024
1 parent e0da42c commit a80d345
Showing 6 changed files with 1,507 additions and 2 deletions.
74 changes: 74 additions & 0 deletions Lectures/DataCuration.Rmd
@@ -0,0 +1,74 @@
---
title: "Data Curation"
author: "Nicholas J. Gotelli"
date: "22 February 2024"
output:
  html_document:
    highlight: tango
    theme: united
    keep_md: yes
---
## Data curation
* GitHub repository
* Metadata
* [Metadata template](https://gotellilab.github.io/Bio381/Scripts/Feb02/ExcelDataTemplate.xlsx)
* Example of Lauren's data

## Exporting and importing data

First create a tiny data set in Excel:
```
# comments at the top
# beaucoup metadata
ID, Treatment, Biomass, Notes
1, Control, 30.3,
2, HighN, 13.0,
3, HighN, NA, broken scale
4, Control, 35.3,
```
- Save as .csv
- Inspect in RStudio Editor

### Use `read.table` to bring in data
```{r, eval=FALSE}
my_data <- read.table(file="path/to/data.csv",
header=TRUE,
sep=",",
comment.char="#")
# inspect object
str(my_data)
# now add a column
my_data$newVar <- runif(4)
head(my_data)
```
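A self-contained sketch of the same import step, assuming the tiny data set above has been written to a temporary file instead of exported from Excel. The `tmp_csv` name and the `strip.white` and `na.strings` arguments are additions for illustration and are not part of the lecture code:

```r
# write the tiny data set to a temporary file (stand-in for the Excel export)
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("# comments at the top",
             "# beaucoup metadata",
             "ID, Treatment, Biomass, Notes",
             "1, Control, 30.3,",
             "2, HighN, 13.0,",
             "3, HighN, NA, broken scale",
             "4, Control, 35.3,"),
           con = tmp_csv)

my_data <- read.table(file = tmp_csv,
                      header = TRUE,
                      sep = ",",
                      comment.char = "#",   # skip the metadata lines
                      strip.white = TRUE,   # drop the spaces that follow each comma
                      na.strings = "NA")    # treat "NA" as a missing value
str(my_data)
```

Stripping white space keeps the treatment labels clean (`"HighN"` rather than `" HighN"`), which matters later when the column is used as a grouping variable.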

### Use `write.table` to export to a data file

```{r, eval=FALSE}
write.table(x=my_data,
            file="Path/To/OutputFileName.csv",
            col.names=TRUE, # write the column names as a header row
            row.names=FALSE, # omit the row names column
            sep=",")
```



But this is not a good way to save or share data objects if we are working only in R. Some researchers use the `save()` function, which stores objects together with their names; when the file is restored with `load()`, the objects come back under those original names (overwriting anything in the workspace with the same name), and you cannot choose new names at load time. It is better to use `saveRDS()`.
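The naming behavior of `save()` and `load()` can be seen in a minimal sketch. The data frame and the temporary file name `tmp_rdata` are made up for illustration:

```r
# save() stores the object together with its name
my_data <- data.frame(ID = 1:3, Biomass = c(30.3, 13.0, 35.3))
tmp_rdata <- tempfile(fileext = ".RData")
save(my_data, file = tmp_rdata)

# change the object in the current session
my_data <- "something else entirely"

# load() restores under the original name and overwrites the current value
restored_names <- load(tmp_rdata)
restored_names   # character vector of the object names that were restored
class(my_data)   # a data frame again
```

Because `load()` decides the names, a saved file can silently replace an object you are currently working on.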

### `saveRDS()`: useful when you are working only in R
```{r, eval=FALSE}
saveRDS(my_data, file="Path/To/FileName.RDS") # .RDS suffix is not required, but good for clarity
```

This only saves a single R object as a binary, but remember, you can bundle up many things into a single list!

Use `readRDS()` to restore it.

### `readRDS()`
```{r, eval=FALSE}
data_in <- readRDS("Path/To/FileName.RDS")
```
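As a rough sketch of the "bundle many things into a single list" idea, the round trip below saves a list to a temporary file and reads it back under a different name. The list contents and the names `bundle`, `bundle_in`, and `tmp_rds` are hypothetical:

```r
# bundle several objects into one list
bundle <- list(data = data.frame(ID = 1:4,
                                 Biomass = c(30.3, 13.0, NA, 35.3)),
               notes = "scale broke on sample 3",
               created = as.Date("2024-02-22"))

tmp_rds <- tempfile(fileext = ".RDS")
saveRDS(bundle, file = tmp_rds)

# readRDS() returns the object, so it can be assigned to any name
bundle_in <- readRDS(tmp_rds)
all.equal(bundle, bundle_in)   # compare the restored copy with the original
names(bundle_in)
```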

549 changes: 549 additions & 0 deletions Lectures/DataCuration.html

Large diffs are not rendered by default.

218 changes: 218 additions & 0 deletions Lectures/ExperimentalDesigns.Rmd
@@ -0,0 +1,218 @@
---
title: 'Experimental Designs'
author: "Nicholas J. Gotelli"
date: "February 22, 2024"
output:
  html_document:
    highlight: tango
    theme: united
  pdf_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,eval=FALSE)
```


### Archetype Experimental Designs
- independent versus dependent variables
- discrete versus continuous variables
- continuous variables (integer and real)
- direction of cause and effect, x axis is independent
- continuous versus discrete (natural or arbitrary or statistical bins)

### Regression (dependent: continuous, independent: continuous)
- linear model of $y = a + bx$
- statistical tests of the null hypothesis that slope and/or intercept = 0
- confidence and prediction intervals of uncertainty
- goodness of fit tests for linearity
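
The difference between confidence and prediction intervals can be sketched with base R. The simulated data, the seed, and the names `fit`, `new_x`, `conf_int`, and `pred_int` are assumptions made for this illustration:

```r
set.seed(100)                      # arbitrary seed for reproducibility
x <- runif(50)
y <- 5.5 + 10 * x + rnorm(50)      # simulated data: a = 5.5, b = 10, plus noise
fit <- lm(y ~ x)

confint(fit)                       # 95% confidence intervals for a and b

new_x <- data.frame(x = c(0.25, 0.75))
# uncertainty in the fitted mean at each new x
conf_int <- predict(fit, newdata = new_x, interval = "confidence")
# uncertainty for a single new observation at each new x
pred_int <- predict(fit, newdata = new_x, interval = "prediction")
conf_int
pred_int
```

The prediction interval is wider than the confidence interval at the same x, because it adds the residual variation of an individual observation to the uncertainty in the fitted line.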

### Set-up
```{r}
library(tidyverse)
```


### Data Frame construction for Regression Data

```{r}
n <- 50 # number of observations (rows)
varA <- runif(n) # random uniform values (independent)
varB <- runif(n) # a second random column (dependent)
varC <- 5.5 + varA*10 + rnorm(n) # a noisy linear relationship with varA
ID <- seq_len(n) # creates a sequence from 1:n (if n > 0!)
regData <- data.frame(ID,varA,varB,varC)
head(regData)
str(regData)
```

### Basic regression analysis in R
```{r}
# model
regModel <- lm(varB~varA,data=regData)
# model output
regModel # printed output is sparse
str(regModel) # complicated, but has "coefficients"
head(regModel$residuals) # contains residuals
# summary() of the model returns a list with more elements
summary(regModel)
summary(regModel)$coefficients
str(summary(regModel))
# best to examine entire matrix of coefficients:
summary(regModel)$coefficients[] #shows all
# can pull results from this, but a little wordy
summary(regModel)$coefficients[1,4] #p value for intercept
summary(regModel)$coefficients["(Intercept)","Pr(>|t|)"] # uggh
# alternatively unfurl this into a flat, one-level list with names
z <- unlist(summary(regModel))
str(z)
z
z$coefficients7
# grab what we need and put into a tidy list
regSum <- list(intercept=z$coefficients1,
slope=z$coefficients2,
interceptP=z$coefficients7,
slopeP=z$coefficients8,
r2=z$r.squared)
# much easier to query and use
print(regSum)
regSum$r2
regSum[[5]]
```
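An alternative sketch that pulls the same quantities by row and column name rather than by the position-based names (`coefficients7`, `coefficients8`) that `unlist()` generates. The seed and the names `coefTable` and `regSum2` are made up for this example; the data are regenerated here so that the block runs on its own:

```r
set.seed(101)                                  # arbitrary seed
regData <- data.frame(varA = runif(50), varB = runif(50))
regModel <- lm(varB ~ varA, data = regData)

coefTable <- coef(summary(regModel))           # same matrix as summary(regModel)$coefficients
regSum2 <- list(intercept  = coefTable["(Intercept)", "Estimate"],
                slope      = coefTable["varA", "Estimate"],
                interceptP = coefTable["(Intercept)", "Pr(>|t|)"],
                slopeP     = coefTable["varA", "Pr(>|t|)"],
                r2         = summary(regModel)$r.squared)
str(regSum2)
```

Indexing by name is wordier, but it keeps working if the model gains another predictor, whereas the numbered names shift.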

### Basic ggplot of regression model

```{r}
regPlot <- ggplot(data=regData,aes(x=varA,y=varB)) +
geom_point() +
stat_smooth(method=lm,level=0.99) # default level=0.95
print(regPlot)
# ggsave(filename="Plot1.pdf",plot=regPlot,device="pdf")
```

### Data frame construction for one-way ANOVA

```{r}
nGroup <- 3 # number of treatment groups
nName <- c("Control","Treat1", "Treat2") # names of groups
nSize <- c(12,17,9) # number of observations in each group
nMean <- c(40,41,60) # mean of each group
nSD <- c(5,5,5) # standard deviation of each group
ID <- 1:(sum(nSize)) # id vector for each row
resVar <- c(rnorm(n=nSize[1],mean=nMean[1],sd=nSD[1]),
rnorm(n=nSize[2],mean=nMean[2],sd=nSD[2]),
rnorm(n=nSize[3],mean=nMean[3],sd=nSD[3]))
TGroup <- rep(nName,nSize)
ANOdata <- data.frame(ID,TGroup,resVar)
str(ANOdata)
```

### Basic ANOVA in R

```{r}
ANOmodel <- aov(resVar~TGroup,data=ANOdata)
print(ANOmodel)
print(summary(ANOmodel))
z <- summary(ANOmodel)
str(z)
aggregate(resVar~TGroup,data=ANOdata,FUN=mean)
unlist(z)
unlist(z)[7]
ANOsum <- list(Fval=unlist(z)[7],probF=unlist(z)[9])
ANOsum
```
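The position-based lookups `unlist(z)[7]` and `unlist(z)[9]` can also be written with column names. This is a sketch with regenerated data; the seed and the names `aovTable` and `ANOsum2` are assumptions for illustration:

```r
set.seed(102)                                  # arbitrary seed
ANOdata <- data.frame(TGroup = rep(c("Control", "Treat1", "Treat2"), c(12, 17, 9)),
                      resVar = c(rnorm(12, 40, 5), rnorm(17, 41, 5), rnorm(9, 60, 5)))
ANOmodel <- aov(resVar ~ TGroup, data = ANOdata)

aovTable <- summary(ANOmodel)[[1]]             # the ANOVA table as a data frame
ANOsum2 <- list(Fval  = aovTable[1, "F value"],   # row 1 is the TGroup term
                probF = aovTable[1, "Pr(>F)"])
ANOsum2
```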


### Basic ggplot of ANOVA data

```{r}
ANOPlot <- ggplot(data=ANOdata,aes(x=TGroup,y=resVar,fill=TGroup)) +
geom_boxplot()
print(ANOPlot)
# ggsave(filename="Plot2.pdf",plot=ANOPlot,device="pdf")
```


### Data frame construction for logistic regression
```{r}
xVar <- sort(rgamma(n=200,shape=5,scale=5))
yVar <- sample(rep(c(1,0),each=100),prob=seq_len(200))
lRegData <- data.frame(xVar,yVar)
```
### Logistic regression analysis in R

```{r}
lRegModel <- glm(yVar ~ xVar,
data=lRegData,
family=binomial(link=logit))
summary(lRegModel)
summary(lRegModel)$coefficients
```
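The coefficients of a logistic regression are on the log-odds scale, so a short sketch of how to convert them may help. The seed, the new x values, and the names `newX` and `predProb` are hypothetical:

```r
set.seed(103)                                  # arbitrary seed
xVar <- sort(rgamma(n = 200, shape = 5, scale = 5))
yVar <- sample(rep(c(1, 0), each = 100), prob = seq_len(200))
lRegData <- data.frame(xVar, yVar)
lRegModel <- glm(yVar ~ xVar, data = lRegData, family = binomial(link = logit))

# slope on the odds scale: multiplicative change in odds per unit of xVar
exp(coef(lRegModel)["xVar"])

# predicted probabilities of yVar = 1 at a few x values
newX <- data.frame(xVar = c(10, 25, 40))
predProb <- predict(lRegModel, newdata = newX, type = "response")
predProb
```

Using `type = "response"` returns probabilities; the default `type = "link"` would return values on the log-odds scale instead.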

### Basic ggplot of logistic regression
```{r}
lRegPlot <- ggplot(data=lRegData, aes(x=xVar,y=yVar)) +
geom_point() +
stat_smooth(method=glm, method.args=list(family=binomial))
print(lRegPlot)
```

### Data for contingency table analysis
```{r}
# integer counts of different data groups
vec1 <- c(50,66,22)
vec2 <- c(120,22,30)
dataMatrix <- rbind(vec1,vec2)
rownames(dataMatrix) <- c("Cold","Warm")
colnames(dataMatrix) <-c("Aphaenogaster",
"Camponotus",
"Crematogaster")
str(dataMatrix)
```

### Basic contingency table analysis in R

```{r}
print(chisq.test(dataMatrix))
```
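The object returned by `chisq.test()` holds more than the printed summary. A minimal sketch, with `chiOut` as a made-up name and the matrix rebuilt so the block runs on its own:

```r
dataMatrix <- rbind(Cold = c(50, 66, 22),
                    Warm = c(120, 22, 30))
colnames(dataMatrix) <- c("Aphaenogaster", "Camponotus", "Crematogaster")

chiOut <- chisq.test(dataMatrix)
chiOut$expected    # counts expected if rows and columns are independent
chiOut$residuals   # Pearson residuals: (observed - expected) / sqrt(expected)
chiOut$parameter   # degrees of freedom: (rows - 1) * (columns - 1)
chiOut$p.value
```

Large positive or negative residuals point to the cells that contribute most to the test statistic.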

### Plotting contingency table analyses

```{r}
# some simple plots using base R
mosaicplot(x=dataMatrix,
col=c("goldenrod","grey","black"),
shade=FALSE)
barplot(height=dataMatrix,
beside=TRUE,
col=c("cornflowerblue","tomato"))
dFrame <- as.data.frame(dataMatrix)
dFrame <- cbind(dFrame,list(Treatment=c("Cold","Warm")))
dFrame <- gather(dFrame,key=Species,Aphaenogaster:Crematogaster,value=Counts)
p <- ggplot(data=dFrame,aes(x=Species,y=Counts,fill=Treatment)) + geom_bar(stat="identity",position="dodge",color=I("black")) +
scale_fill_manual(values=c("cornflowerblue","coral"))
print(p)
```
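The wide-to-long reshaping step can also be done without `gather()`. The sketch below uses only base R; the name `dLong` is an assumption for illustration, and the resulting data frame has the same three columns that the `ggplot()` call above expects:

```r
dataMatrix <- rbind(Cold = c(50, 66, 22),
                    Warm = c(120, 22, 30))
colnames(dataMatrix) <- c("Aphaenogaster", "Camponotus", "Crematogaster")

# as.table() followed by as.data.frame() gives one row per cell of the matrix
dLong <- as.data.frame(as.table(dataMatrix))
names(dLong) <- c("Treatment", "Species", "Counts")
dLong
```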

664 changes: 664 additions & 0 deletions Lectures/ExperimentalDesigns.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion index.Rmd
@@ -47,7 +47,7 @@ February 14 | - | [Homework #5](Homeworks/Homework_05.html)
February 15 | [Matrices, Lists, Data Frames II](Lectures/MatricesListsDataFrames.html) | -
February 20 | [Probability Distributions](Lectures/ProbabilityDistributions.html)| -
February 21 | - | [Homework # 6](Homeworks/Homework_06.html)
- February 22 | [Data Curation](Lectures/Lecture_10.xhtml)| -
+ February 22 | [Data Curation](Lectures/DataCuration.xhtml)| -
February 27 |[Functions](Lectures/Lecture_12.xhtml) | [(Experimental Designs)](Lectures/Lecture_12a.xhtml)
February 28 | | [Homework # 7](Homeworks/Homework_08.html)
February 29 | [Functions/Structured Programming](Lectures/) |
2 changes: 1 addition & 1 deletion index.html
@@ -196,7 +196,7 @@ <h2>Lecture Outlines &amp; Homework Assignments</h2>
</tr>
<tr class="even">
<td>February 22</td>
- <td><a href="Lectures/Lecture_10.xhtml">Data Curation</a></td>
+ <td><a href="Lectures/DataCuration.xhtml">Data Curation</a></td>
<td>-</td>
</tr>
<tr class="odd">