# A quick introduction to R
## Using R as a big calculator {#r-calculator}
### Basic arithmetic
The end of the [Get up and running with R and RStudio] chapter demonstrated that R can handle familiar arithmetic operations: addition, subtraction, multiplication and division. If we want to add or subtract a pair of numbers, we just place the `+` or `-` symbol between them and hit Enter. R will read the expression, evaluate it, and print the result to the Console. This works exactly as we expect it to:
```{r}
3 + 2
5 - 1
```
Multiplication and division are no different, though we don't use `x` or `÷` for these operations. Instead, we use `*` and `/` to multiply and divide:
```{r}
7 * 2
3 / 2
```
We can also exponentiate numbers: raise one number to the power of another. We use the `^` operator to do this:
```{r}
4^2
```
This raises 4 to the power of 2 (i.e. we squared it). In general, we can raise a number `x` to the power of `y` using `x^y`. Neither `x` nor `y` needs to be a whole number.
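As a quick illustration that neither number has to be a whole number, here are a few more powers. The particular numbers are our own choice, picked only for illustration:

```{r}
9^0.5    # a fractional exponent: the square root of 9
2.5^2    # a fractional base: 2.5 squared
2^-1     # a negative exponent: the reciprocal of 2
```

These should print 3, 6.25 and 0.5.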
Arithmetic operations can also be combined into one expression. Assume we want to subtract 6 from 2^3^. The expression to perform this calculation is:
```{r}
2^3 - 6
```
$2^3=8$ and $8-6=2$. Simple enough, but what if we had wanted to carry out a slightly longer calculation that required the last answer to then be divided by 2? This is the **wrong** way to do it:
```{r}
2^3 - 6 / 2
```
The answer we were looking for is $1$. So what happened? R evaluated $6/2$ first and then subtracted this answer from $2^3$.
If that's obvious, great. If not, it's time to learn a bit about the __order of precedence__ used by R. R uses a standard set of rules to decide the order in which arithmetic calculations feed into one another so that it can unambiguously evaluate any expression. It uses the same order as most other programming languages, which thankfully is the same one we all learned in mathematics class at school. The order of precedence used is:
1. exponents and roots ("taking powers")
2. multiplication and division
3. addition and subtraction
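
A few small expressions show these rules in action. The numbers are arbitrary examples of ours, and the comments spell out which part R works out first:

```{r}
2 + 3 * 4     # multiplication first: 2 + 12
2 * 3^2       # exponent first: 2 * 9
10 - 4 / 2    # division first: 10 - 2
```

These print 14, 18 and 8 respectively.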
```{block, type="info"}
#### BODMAS and friends
If you find it difficult to remember the order of precedence used by R, there are a load of [mnemonics](http://en.wikipedia.org/wiki/Order_of_operations#Mnemonics) that can help. Pick one you like and remember that instead.
```
In order to get the answer we were looking for we need to take control of the order of evaluation. We do this by grouping the necessary bits of the calculation inside parentheses ("round brackets"). That is, we place `(` and `)` either side of them. The order in which expressions inside different pairs of parentheses are evaluated follows the rules we all had to learn at school. The R expression we should have used is therefore:
```{r}
(2^3 - 6) / 2
```
We can use more than one pair of parentheses to control the order of evaluation in more complex calculations. For example, if we want to find the cube root of 2 (i.e. 2^1/3^) rather than 2^3^ in that last calculation we would instead write:
```{r}
(2^(1/3) - 6) / 2
```
The parentheses around the `1/3` in the exponent are needed to ensure this is evaluated prior to being used as the exponent.
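To see why those inner parentheses matter, compare the two versions side by side. This is only an illustrative sketch:

```{r}
2^1/3      # without parentheses: (2^1) / 3
2^(1/3)    # with parentheses: the cube root of 2
```

The first expression gives about 0.667 because the exponent is applied before the division. The second gives about 1.26, the cube root we wanted.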
### Problematic calculations
Now is a good time to highlight how R handles certain kinds of awkward numerical calculations. One of these involves division of a number by 0. Some programming languages will respond to an attempt to do this with an error. R is a bit more forgiving:
```{r}
1/0
```
Loosely speaking, division of a non-zero finite number by `0` equals A Very Large Number: infinity. R has a special built-in data value that allows it to handle this kind of thing. This is `Inf`, which of course stands for "infinity". The other special kind of value we sometimes run into can be generated by numerical calculations that don't have a well-defined result. For example, it arises when we try to divide 0 or infinity by themselves:
```{r}
0/0
```
The `NaN` in this result stands for Not a Number. R produces `NaN` because $0/0$ is not defined mathematically: it produces something that is Not a Number. The reason we are pointing out `Inf` and `NaN` is not because we expect to use them. It's important to know what they represent because they often arise as a result of a mistake somewhere in a program. It's hard to track down such mistakes if we don't know how `Inf` and `NaN` arise.
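The sketch below shows a couple more ways these values can arise, along with two base R functions, `is.finite()` and `is.nan()`, for detecting them. These functions are not used elsewhere in this chapter. We mention them only as one possible way to check a result:

```{r}
-1 / 0              # negative infinity
Inf - Inf           # not well-defined, so NaN
is.finite(1 / 0)    # FALSE: Inf is not a finite number
is.nan(0 / 0)       # TRUE: the result is NaN
```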
That is enough about using R as a calculator for now. What we've seen---even though we haven't said it yet---is that R functions as a REPL: a read-eval-print loop (there's no need to remember this term). R takes user input, evaluates it, prints the results, and then waits for the next input. This is handy, because it means we can use it interactively, working through an analysis line-by-line. However, to use R to solve complex problems we need to learn how to store and reuse results. We'll look at this in the next section.
```{block, type="speed"}
#### Working efficiently at the Console
Working at the Console soon gets tedious if we have to retype similar things over and over again. There is no need to do this though. Place the cursor at the prompt and hit the up arrow. What happens? This brings back the last expression sent to R's interpreter. Hit the up arrow again to see the last-but-one expression, and so on. We go back down the list using the down arrow. Once we're at the line we need, we use the left and right arrows to move around the expression and the delete key to remove the parts we want to change. Once an expression has been edited like this we hit Enter to send it to R again. Try it!
```
## Storing and reusing results {#assignment}
So far we've not tried to do anything remotely complicated or interesting, though we now know how to construct longer calculations using parentheses to control the order of evaluation. This approach is fine if the calculation is very simple. It quickly becomes unwieldy for dealing with anything more. The best way to see what we mean is by working through a simple example---solving a quadratic equation. Quadratic equations look like this: $ax^2 + bx + c = 0$. If we know the values of $a$, $b$ and $c$ then we can solve this equation to find the values of $x$ that ensure the left hand side equals the right hand side. Here's the well-known formula for these solutions:
$$
x = \frac{-b\pm\sqrt{b^2-4ac}}{2a}
$$
Let's use R to calculate these solutions for us. Say that we want to find the solutions to the quadratic equation when $a=1$, $b=6$ and $c=5$. We just have to turn the above equation into a pair of R expressions:
```{r}
(-6 + (6^2 -4 * 1 * 5)^(1/2)) / (2 * 1)
```
```{r}
(-6 - (6^2 -4 * 1 * 5)^(1/2)) / (2 * 1)
```
The output tells us that the two values of $x$ that satisfy this particular quadratic equation are -1 and -5. What should we do if we now need to solve a different quadratic equation? Working at the Console, we could bring up the expressions we typed (using the up arrow) and then go through each of these, changing the numbers to match the new values of $a$, $b$ and $c$. Editing individual expressions like this is fairly tedious, and more importantly, it's fairly error prone because we have to make sure we substitute the new numbers at exactly the right positions.
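One way to convince ourselves that these really are solutions is to substitute each one back into the left hand side, which for $a=1$, $b=6$ and $c=5$ in the formula above is $x^2 + 6x + 5$. This check is our own addition rather than part of the original calculation:

```{r}
(-1)^2 + 6 * (-1) + 5    # substitute x = -1
(-5)^2 + 6 * (-5) + 5    # substitute x = -5
```

Both expressions evaluate to 0, as they should. Note the parentheses around the negative numbers: without them, `-1^2` would be evaluated as `-(1^2)` because exponentiation comes first.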
A partial solution to this problem is to store the values of $a$, $b$ and $c$. We'll see precisely why this is useful in a moment. First, we need to learn how to store results in R. The key to this is to use the __assignment operator__, written as a left arrow ` <- `. Sticking with our original example, we need to store the numbers 1, 6 and 5. We do this using three expressions, one after another:
```{r}
a <- 1
```
```{r}
b <- 6
```
```{r}
c <- 5
```
Notice that we don't put a space between `<` and `-`. If we add one, R reads the two symbols separately, so `a < - 1` asks whether `a` is less than -1 rather than storing anything. R didn't print anything to screen, so what actually happened? We asked R to first evaluate the expression on the right hand side of each ` <- ` (just a number in this case) and then __assign the result__ of that evaluation instead of printing it. Each result has a name associated with it, which appears on the left hand side of the ` <- `.
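Here is a small sketch of what happens if a space does creep in between `<` and `-`. The 'less than' comparison `<` has not been introduced yet, so it appears here purely for illustration:

```{r}
a <- 1     # assignment: store 1 under the name a
a < - 1    # comparison: is a less than -1?
```

The second line prints `FALSE` and leaves `a` unchanged.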
```{block, type="speed"}
#### RStudio shortcut
We use the assignment operator ` <- ` all the time when working with R, and because it's inefficient to have to type the `<` and `-` characters over and over again, RStudio has a built-in shortcut for typing the assignment operator: Alt + `-` . Try it. Move the cursor to the Console, hold down the Alt key ('Option' on a Mac), and press the `-` sign key. RStudio will auto-magically insert ` <- `.
```
The net result of all this is that we have stored the numbers 1, 6 and 5 somewhere in R, associating them with the letters `a`, `b` and `c`, respectively. What does this mean? Here's what happens if we type the letter `a` into the Console and hit Enter:
```{r}
a
```
It looks the same as if we had typed the number `1` directly into the Console. The result of typing `b` or `c` is hopefully obvious. What we just did was to store the output that results from evaluating three separate R expressions, associating each with a name so that we can access them again^[Technically, this is called __binding__ the name to a value. You really don't need to remember this though.].
Whenever we use the assignment operator ` <- ` we are telling R to keep whatever kind of value results from the calculation on the right hand side of ` <- `, giving it the name on the left hand side so that we can access it later. Why is this useful? Let's imagine we want to do more than one thing with our three numbers. If we want to know their sum or their product we can now use:
```{r}
a + b + c
```
```{r}
a * b * c
```
So once we've stored a result and associated it with a name we can reuse it wherever it's needed. Returning to our motivating example, we can now calculate the solutions to the quadratic equation by typing these two expressions into the Console:
```{r}
(-b + (b^2 -4 * a * c)^(1/2)) / (2 * a)
```
```{r}
(-b - (b^2 -4 * a * c)^(1/2)) / (2 * a)
```
Imagine we'd like to find the solutions to a different quadratic equation where $a=1$, $b=5$ and $c=5$. We just changed the value of $b$ here to keep things simple. To find our new solutions we have to do two things. First we change the value of the number associated with `b`...
```{r}
b <- 5
```
...then we bring up those lines that calculate the solutions to the quadratic equation and run them, one after the other:
```{r}
(-b + (b^2 -4 * a * c)^(1/2)) / (2 * a)
```
```{r}
(-b - (b^2 -4 * a * c)^(1/2)) / (2 * a)
```
We didn't have to retype those two expressions. We could just use the up arrow to bring each one back to the prompt and hit Enter. This is much simpler than editing the expressions. More importantly, we are beginning to see the benefits of using something like R: we can break down complex calculations into a series of steps, storing and reusing intermediate results as required.
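As a sketch of what storing an intermediate result might look like, we could keep the part under the square root in its own variable and reuse it in both solutions. The names `disc`, `root_1` and `root_2` are hypothetical choices of ours, not names used elsewhere in this chapter:

```{r}
a <- 1
b <- 5
c <- 5

# Hypothetical names: disc, root_1 and root_2 are our own choices
disc <- b^2 - 4 * a * c                  # the part under the square root
root_1 <- (-b + disc^(1/2)) / (2 * a)    # first solution
root_2 <- (-b - disc^(1/2)) / (2 * a)    # second solution

root_1
root_2
```

Storing `disc` means the part under the square root is calculated once and then reused, which also makes each of the remaining expressions shorter and easier to check.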
## How does assignment work?
It's important to understand, at least roughly, how assignment works. The first thing to note is that when we use the assignment operator ` <- ` to associate names and values, we informally refer to this as creating (or modifying) __a variable__. This is much less tedious than using words like "bind", "associate", "value", and "name" all the time. Why is it called a variable? What happens when we run these lines:
```{r}
myvar <- 1
myvar <- 7
```
The first time we used ` <- ` with `myvar` on the left hand side we __created__ a variable `myvar` associated with the value 1. The second line `myvar <- 7` __modified__ the value of `myvar` to be 7. This is why we refer to `myvar` as a variable: we can change its value as we please. What happened to the old value associated with `myvar`? In short, it is gone, kaput, lost... forever. The moment we assign a new value to `myvar` the old one is destroyed and can no longer be accessed. Remember this.
Keep in mind that the expression on the right hand side of ` <- ` can be any kind of calculation, not just a number. For example, if we want to store the number 1, associating it with `answer`, we could do this:
```{r}
answer <- (1 + 2^3) / (2 + 7)
```
That is a strange way to assign the number 1, but it illustrates the point. More generally, as long as the expression on the right hand side generates an output it can be used with the assignment operator. For example, we can create new variables from old variables:
```{r}
newvar <- 2 * answer
```
What happened here? Start at the right hand side of ` <- `. The expression on this side contained the variable `answer` so R went to see if `answer` actually exists in the global environment. It does, so it then substituted the value associated with `answer` into the requested calculation, and then assigned the resulting value of 2 to `newvar`. We created a new variable `newvar` using information associated with `answer`.
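It's worth noting that `newvar` stores the value that was calculated, not the calculation itself. The short sketch below, which is our own illustration, changes `answer` afterwards and shows that `newvar` stays as it was:

```{r}
answer <- (1 + 2^3) / (2 + 7)
newvar <- 2 * answer
answer <- 10    # change answer after newvar was created
newvar          # still 2
```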
Now look at what happens if we just copy a variable using the assignment operator:
```{r}
myvar <- 7
mycopy <- myvar
```
At this point we have two variables, `myvar` and `mycopy`, each associated with the number 7. There is something very important going on here: each of these is associated with a __different copy__ of this number. If we change the value associated with one of these variables it does not change the value of the other, as this shows:
```{r}
myvar <- 10
```
```{r}
myvar
```
```{r}
mycopy
```
R always behaves like this unless we work hard to alter this behaviour (we never do this in this book). So remember, every time we assign one variable to another, we actually make a completely new, independent copy of its associated value. For our purposes this is a good thing because it makes it much easier to understand what a long sequence of R expressions will do. That probably doesn't seem like an obvious or important point, but trust us, it is.
## Global environment
Whenever we associate a name with a value we create a copy of both these things somewhere in the computer's memory. In R the "somewhere" is called an environment. We aren't going to get into a discussion of R's many different kinds of environments---that's an advanced topic well beyond the scope of this book. The one environment we do need to be aware of though is the __Global Environment__.
Whenever we perform an assignment in the Console the name-value pair we create (i.e. the variable) is placed into the Global Environment. The current variables are all listed in the __Environment__ tab in RStudio. Take a look. Assuming that at least one variable has been made, there will be two columns in the __Environment__ tab. The first shows us the names of all the variables, while the second summarises their values.
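We can also list the variables in the Global Environment from the Console. A minimal sketch, assuming we use the base R function `ls()` (not otherwise covered in this chapter) and a made-up variable called `my_value`:

```{r}
my_value <- 42    # hypothetical variable, created only for this example
ls()              # names of the variables currently in the Global Environment
```

The exact output depends on what else has been created in the current session, but `my_value` should be among the names listed.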
```{block, type="warning"}
#### The Global Environment is temporary
By default, R will offer to save the Global Environment whenever we close it down and then restore it in the next R session. It does this by writing a copy of the Global Environment to disk. In theory this means we can close down R, reopen it, and pick things up from where we left off. Don't do this---it only increases the risk of making a serious mistake. Assume that when R and RStudio are shut down, everything in the Global Environment will be lost.
```
## Naming rules and conventions
We don't have to use a single letter to name things in R. The words `tom`, `dick` and `harry` could be used in place of `a`, `b` and `c`. It might be confusing to use them, but `tom`, `dick` and `harry` are all legal names as far as R is concerned:
* A legal name in R is any sequence of letters, numbers, `.`, or `_`, but the sequence of characters we use must begin with a letter. Both upper and lower case letters are allowed. For example, `num_1`, `num.1`, `num1`, `NUM1`, `myNum1` are all legal names, but `1num` and `_num1` are not because they begin with `1` and `_`.
* R is case sensitive---it treats upper and lower case letters as different characters. This means that `num` and `Num` are treated as distinct names. Forgetting about case sensitivity is a good way to create errors when using R. Try to remember that.
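
The following sketch illustrates both points. The names and values are arbitrary choices of ours:

```{r}
num_1 <- 10    # legal: letters, a number and an underscore
num.1 <- 20    # legal: letters, a number and a dot
num <- 1
Num <- 2       # a different variable from num
num
Num
# 1num <- 30   # not legal: uncommenting this line produces an error
```

The two printed values are 1 and 2, confirming that `num` and `Num` are stored separately.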
```{block, type="warning"}
#### Don't begin a name with `.`
We are allowed to begin a name with a `.`, but this usually is A Bad Idea. Why? Because variables whose names begin with `.` are hidden from view in the Global Environment---the values they refer to exist but they're invisible. This behaviour exists to allow R to create invisible variables that control how it behaves. This is useful, but it isn't really meant to be used by the average user.
```
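To see this hiding in action, here is a small sketch using the base R function `ls()`, which lists variable names, and its `all.names` argument. Neither is covered elsewhere in this chapter, and the variable names are made up for the example:

```{r}
.hidden_value <- 3      # hypothetical name beginning with a dot
visible_value <- 4      # hypothetical ordinary name
ls()                    # does not list .hidden_value
ls(all.names = TRUE)    # lists hidden names as well
```

The value stored in `.hidden_value` can still be used like any other. It is only left out of the default listing.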