-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.Rmd
147 lines (95 loc) · 5.49 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
---
title: "Basic Usage"
author: "Fernando Campos"
date: "`r Sys.Date()`"
output:
md_document:
variant: markdown_github
vignette: >
%\VignetteIndexEntry{Vignette Title}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
# paceR
![ACG Costa Rica](ACG.jpg)
`paceR` is a collection of functions that make it easy to get data from the University of Calgary's PACE Database into R for further analysis.
To use the tools, you must have access to the PACE Database. If you don't know how to do this, you can ask [Fernando](mailto:[email protected]), [Urs](mailto:[email protected]), or [John](mailto:[email protected]).
## Preparation
To use this package, you first need to install `devtools` with:
```{r, eval = FALSE}
install.packages("devtools")
```
Then, you can install the latest development version of `paceR` from github:
```{r, eval = FALSE}
library(devtools)
devtools::install_github("camposfa/paceR")
```
After you have installed the package once, you can simply load it in the future using:
```{r, message = FALSE, warning = FALSE}
library(paceR)
```
This package makes heavy use of the data manipulation packages [stringr](http://cran.r-project.org/package=stringr), [lubridate](http://cran.r-project.org/package=lubridate), [tidyr](http://cran.r-project.org/package=tidyr), and [dplyr](http://cran.r-project.org/package=dplyr). If not already installed, `paceR` will install these packages automatically. It also provides a convenient function to load them all into your R session, if you want to use them for other tasks:
```{r, message = FALSE}
load_pace_packages()
```
## Connecting to the PACE Database
To get data from PACE, you must have a username and password for the PACE database, and you must create an SSH tunnel to the database server.
### Connecting on a Mac
If you have set up an SSH key (Fernando can help you do this), you can create the SSH tunnel by running this command in R:
```{r, eval = FALSE}
system('ssh -f [email protected] -L 3307:localhost:3306 -N')
```
If you have not set up an SSH key but you do have a username and password for the PACE database, then you can still create the tunnel by opening Terminal and typing the following __in Terminal, not in R!__:
```{r, eval = FALSE}
ssh -f [email protected] -L 3307:localhost:3306 -N
```
Replace the "camposf" part with your username. You will then be prompted to enter your password. Once this is done, the tunnel has been created and you can return to R.
### Connecting on a Windows machine
When you made an account, you should have received instructions from John about how to create the SSH tunnel using Putty/Plink. Just run this and have it open in the background.
### Connecting to a database
With the tunnel created, we can now connect to the database(s) in PACE. The primary database is called `monkey`. There is also a secondary database called `paceR` that has a variety of convenient views that are designed to be used with this package. We'll create connections to both.
```{r}
# Connect to monkey database
pace_db <- src_mysql(group = "PACE", user = "camposf", dbname = "monkey", password = NULL)
# Connect to paceR database
paceR_db <- src_mysql(group = "PACE", user = "camposf", dbname = "paceR", password = NULL)
```
It should go without saying, but the above lines are just an example--if copied verbatim, they will **not** work for anyone but me (Fernando). You must first set up the SSH key and then modify these lines for your particular account.
## Getting data from the database
Once you get the connection worked out, you now can pull data from the database. Whenever you download data, you must pass the name of the database connection that you're using. To do this correctly, it is crucial that you understand a major design decision that affects how the functions can be used.
### Downloading data using saved views (RECOMMENDED METHOD!)
The best way to get data from the database is to use the convenient saved "views". These are stored in the `paceR` database, and they should be called using the functions that begin with `getv_`
```{r}
# Get a Individuals data
(i2 <- getv_Individual(paceR_db))
# Get a condensed version
(i2 <- getv_Individual(paceR_db, full = FALSE))
```
Currently, the following functions are available:
* `getv_CensusMonthly()`
* `getv_Individual()`
* `getv_Phenology()`
### Downloading raw database tables (NOT RECOMMENDED!)
If you want to download __raw database tables__, then you should use function `get_pace_tbl()`. All tables are stored in the `monkey` database, and so when you use this function, you must pass the connection to the `monkey` database in addition to the name of the table that you want to download. For example:
```{r}
# Get the raw individuals table
(i <- get_pace_tbl(pace_db, "tblIndividual"))
# Get the raw deaths table
(d <- get_pace_tbl(pace_db, "tblIndividualDeath"))
```
Note that the "foreign keys" (i.e., the columns that end with "ID") are just uninformative numbers! To make use of the data, you might need to join the tables by their ID relevant fields.
```{r}
# Join the individuals and deaths tables
id <- left_join(i, d, by = c("ID" = "IndividualID"))
```
You can see that there are many more ID fields that would need to be joined. It can very inconvenient to work with the data this way!
## Tutorials for specific kinds of data
* [Phenology](Phenology.md)
* [Fruit Biomass for Santa Rosa](BiomassSR.md)