# Basics
Set the working directory of the project with `setwd()`:
```{r}
setwd("C:/Users/vicer06/OneDrive - Linköpings universitet/Documents/GitHub/R4EnvChemAnalysis")
```
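To confirm which folder R is currently using, `getwd()` can be called before and after `setwd()`. The chunk below is a minimal sketch; it switches to `tempdir()` only because that folder exists on every system, and the object name `start_dir` is an illustrative assumption. Note that paths in R are written with `/` (or `\\`), also on Windows.

```{r}
# Sketch: check and temporarily change the working directory
start_dir <- getwd()   # the folder R currently reads from and writes to
setwd(tempdir())       # tempdir() is used here only because it always exists
getwd()                # now prints the temporary folder
setwd(start_dir)       # switch back to the original folder
```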
RStudio consists of four panes. The top left pane is where R script files are opened and where code can be written and edited. The bottom left pane holds the console, where commands can be typed. The top right pane shows, for instance, the workspace with all created objects, and the bottom right pane shows, for instance, the files in the working directory, created plots, and packages.
![Figure. Overview of RStudio](images/rstudio-01.png){width="574"}
Some functions in R come from add-on packages. For example, to create more complex plots the `ggplot()` function can be used. To be able to use it, the `ggplot2` package first needs to be installed, either by running `install.packages("ggplot2")` in the console or by going to *Tools\>Install Packages...* and searching for `ggplot2`.
A package only needs to be installed once. To be able to use the functions included in a package, it needs to be loaded in each new R session with:
```{r}
#| output: true
library("ggplot2")
```
`ggplot2` can also be installed by installing the `tidyverse` package, which bundles multiple packages, and then loaded with `library()`:
```{r}
library("tidyverse")
```
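Since `library()` stops with an error when a package is missing, it can help to check first whether the package is installed. The chunk below is a sketch of one way to do that with `requireNamespace()`; the object names `pkg_name` and `is_installed` are illustrative assumptions.

```{r}
# Sketch: load a package only if it is already installed
pkg_name <- "ggplot2"
is_installed <- requireNamespace(pkg_name, quietly = TRUE)  # TRUE or FALSE, no error
if (is_installed) {
  library(pkg_name, character.only = TRUE)  # character.only is needed because the name is stored in a string
} else {
  message("Package '", pkg_name, "' is not installed; run install.packages(\"", pkg_name, "\") once.")
}
```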
If one wants to use a function and is not sure which arguments it requires, one can use `args("function name")`. Further, by using `?"name of function/package/.."`, one can read about that function or package in the Help tab in the bottom right corner.
```{r}
args(plot) #shows the arguments of plot(): x and y, plus further optional arguments (...)
```
```{r}
args(round) #round() rounds a numeric vector, which may consist of a single number
x <- 6.02214072
round(x) #by default rounds the number to zero decimal places
round(x, digits = 3) #digits sets how many decimal places to keep
```
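As a complement to `args()`, the arguments and their default values can also be inspected programmatically with `formals()`. This is a small sketch using `sd()` as the example; the object name `sd_args` is an illustrative assumption. `formals()` returns `NULL` for primitive functions such as `round()`, which is why `sd()` is used here.

```{r}
# Sketch: list the arguments of sd() and look up one default value
sd_args <- formals(sd)
names(sd_args)   # "x" "na.rm"
sd_args$na.rm    # FALSE, i.e. NA values are not removed by default
```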
------------------------------------------------------------------------
## Simple commands
Objects are a way to store data in R. An object is created with the assignment operator `<-`, which points towards the name of the object. Further, comments that make the code easier to interpret can be written after `#`:
```{r}
#| output: true
#This code creates the object a, which contains the sum of 2 + 5
a <- 2 + 5
a
#A vector of numbers can also be assigned to an object, as done here for b, which contains the numbers 2, 3, 4, 5, 6
b <- c(2:6)
b
#Further, it is possible to use addition, subtraction, etc. on objects
c <- b+a
c
d <- b*a
d
d <- d - 2
d
#It is also possible to use matrix operators on vectors
e1 <- b %*% b #inner product (returns a 1 x 1 matrix)
e1
e2 <- b %o% b #outer product (returns a 5 x 5 matrix)
e2
```
Important: object names cannot start with a number or contain symbols such as \^, !, \$, \@, +, -, /, \*
```{r}
#2a <- c(2,4)
#Error: unexpected symbol in "2a"
```
```{r}
#object! <- c(1:4)
#Error: unexpected '!' in "object!"
```
```{r}
#However, it is possible to use "_" in an object's name
object_name <- c(5:8)
```
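Whether a name follows these rules can be checked with `make.names()`, which converts any text into a syntactically valid name. The sketch below compares a few candidate names with their converted versions; the object names `candidate_names` and `valid_names` are illustrative assumptions.

```{r}
# Sketch: a name is valid if make.names() leaves it unchanged
candidate_names <- c("2a", "object!", "object_name")
valid_names <- make.names(candidate_names)
valid_names                      # "X2a" "object." "object_name"
candidate_names == valid_names   # FALSE FALSE TRUE
```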
If a new object is given the same name as an existing object, the content of the existing object is overwritten by the new content:
```{r}
f <- 2+6 #object f is created, which contains the sum of 2+6
f
f <- c(2,4,6) #object f now has the values 2, 4, and 6
f
```
To avoid overwriting existing objects, one can either look in the "Environment" tab in the top right part of RStudio to see which names are already in use, or use the command `ls()`:
```{r}
ls()
```
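A single name can also be checked directly with `exists()`, and an object that is no longer needed can be removed with `rm()`. The chunk below is a minimal sketch; the object names `tmp_result`, `before_rm`, and `after_rm` are illustrative assumptions.

```{r}
# Sketch: check whether a name is in use, then remove the object
tmp_result <- 2 + 6
before_rm <- exists("tmp_result")   # TRUE, the name is in use
rm(tmp_result)                      # deletes the object
after_rm <- exists("tmp_result")    # FALSE, the name is free again
c(before_rm, after_rm)
```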
------------------------------------------------------------------------
[When a script is running and you would like to cancel it, you can press `Esc` in the RStudio console (or `Ctrl+C` when R runs in a terminal).]{.underline}
------------------------------------------------------------------------
**More useful functions:**
```{r}
sum(b) #sums a set of values
mean(b) #returns the mean of a set of values
replicate(n = 3, a+2) #repeats the expression (here a+2) n times
trunc(3.14) #drops the decimals and returns only the integer part of a number
sample(x = object_name, size = 2, replace = TRUE) #returns random values from x (here object_name); size sets the number of values to return, and replace = TRUE means the same value can be returned several times (with FALSE, each value is returned at most once)
a1 <- data.frame(a11=1:10,
                 a12=2:11)
apply(a1[,1:2], 2, mean) #applies the given function (here mean) to each of columns 1 and 2 in a1
apply(a1[,1:2], 1, mean) #applies the given function to each row across columns 1 and 2 in a1
sd(b, na.rm=TRUE) #calculates the standard deviation of vector b and ignores NA values
```
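Because `sample()` draws random values, its output changes from run to run. If the same draw is needed every time, for example to make an analysis reproducible, the random number generator can be fixed with `set.seed()`. This is a sketch; the seed value 42 and the object names are arbitrary illustrative choices.

```{r}
# Sketch: the same seed gives the same random draw
pool <- 5:8
set.seed(42)
first_draw <- sample(x = pool, size = 2, replace = TRUE)
set.seed(42)
second_draw <- sample(x = pool, size = 2, replace = TRUE)
identical(first_draw, second_draw)   # TRUE
```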
------------------------------------------------------------------------
## Data types
The main data types in R include:
- Numeric ⟹ `3.14, 2, 6.022`
- Integer ⟹ `2L, 3L, 4L` (the `L` suffix marks an integer; a plain `2` is stored as numeric)
- Complex ⟹ `4i+5`
- Logical ⟹ `TRUE/FALSE`
- Character ⟹ `"x", "hello", "RT"`
To check whether a vector `x` is numeric, use `is.numeric(x)`, which returns `TRUE` if `x` is numeric (this includes both integers and decimal numbers).
`class(x)` returns the type of the data, e.g. `numeric`.
`typeof(x)` is more specific, e.g. it returns `double` if `x` contains decimal numbers.
To convert a vector to integers, use `as.integer(x)`.
- Factor ⟹ categorical variable, which can be used to divide data according to different levels, e.g. when the data has the categories `influent` and `effluent`
```{r}
data <- data.frame(sample_type = as.factor(c("influent", "effluent")), #as.factor converts the sample type into a factor; the two values are recycled to fill all six rows
                   RT_min = c(1.1,1.1,2.3,2.3,3.4,3.4),
                   m_z = c(180, 190, 200, 180, 230, 200))
str(data) #shows the structure of the data frame
```
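The type checks described above can be tried on concrete values. The chunk below is a small sketch; the object names `num_value`, `int_value`, and `sample_factor` are illustrative assumptions.

```{r}
# Sketch: inspect the type of a few values
num_value <- 6.022
int_value <- 2L
class(num_value)        # "numeric"
typeof(num_value)       # "double"
typeof(int_value)       # "integer"
is.numeric(int_value)   # TRUE, integers also count as numeric
as.integer(3.9)         # 3, the decimals are dropped, not rounded
sample_factor <- factor(c("influent", "effluent", "influent"))
levels(sample_factor)   # "effluent" "influent", sorted alphabetically by default
```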
A data frame can be used to store data in a table, similar to Excel. It can have several columns (in the example above, `sample_type`, `RT_min`, and `m_z`), and here each column holds data in six rows. The columns can contain different types of data, e.g. `character`, `numeric`, `logical`, but they must all have the same length.
A certain part of a data frame can be extracted by using `[]`:
```{r}
#data[row, column]
data[2] #second column (RT_min) with all row data
data[2,] #second row, with all column data
data[,2:3] #column 2 and 3 (RT_min, m_z), all rows
data[4:6, 2:3] #row 4 to 6, for column 2 and 3
data[1, c("RT_min")] #returns first value for column named "RT_min"
data$RT_min #returns all values from column named "RT_min"
data$RT_min[1] <- 1.2 #changes first value of column "RT_min" to 1.2
data$new <- c(1:6) #adds new column "new" to the data with the values 1 to 6
```
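Besides row and column numbers, a logical condition can be placed inside `[]` to keep only the rows that fulfil it. The sketch below uses its own small data frame, `df_demo`, so that it does not change `data`; all object names in it are illustrative assumptions.

```{r}
# Sketch: select rows with a logical condition
df_demo <- data.frame(sample_type = rep(c("influent", "effluent"), times = 3),
                      RT_min = c(1.1, 1.1, 2.3, 2.3, 3.4, 3.4),
                      m_z = c(180, 190, 200, 180, 230, 200))
late_peaks <- df_demo[df_demo$RT_min > 2, ]                       # rows with RT_min above 2, all columns
influent_mz <- df_demo$m_z[df_demo$sample_type == "influent"]     # m_z values of influent samples only
late_peaks
influent_mz
class(df_demo[2])     # "data.frame", single brackets keep the table structure
class(df_demo[[2]])   # "numeric", double brackets return the column as a vector
```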
[To store data with different lengths, a `list` can be used instead.]{.underline}
```{r}
data_list <- list(sample_type = c("influent", "effluent"),
retention_time = c(1.1, 2.1, 2.4, 3.0, 4.6, 4.8, 5.3),
m_z = c(180, 190, 200, 210))
str(data_list)
```
```{r}
data_list[[1]] #returns the first element of the list (sample_type)
data_list[[2]][1:4] #returns the first 4 values of the second element (retention_time)
data_list[[3]][3] #returns the third value of the third element (m_z)
```
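When the elements of a list are named, they can also be accessed by name, which is often easier to read than positions. The chunk below is a sketch with its own list, `list_demo`, an illustrative name chosen so that `data_list` is left unchanged.

```{r}
# Sketch: access list elements by name
list_demo <- list(sample_type = c("influent", "effluent"),
                  retention_time = c(1.1, 2.1, 2.4, 3.0, 4.6, 4.8, 5.3),
                  m_z = c(180, 190, 200, 210))
list_demo$m_z                         # all values of the element named m_z
list_demo[["retention_time"]][1:2]    # first two retention times
lengths(list_demo)                    # number of values in each element: 2, 7, 4
class(list_demo["m_z"])               # "list", single brackets return a shorter list
```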
Two objects can be compared with logical tests:
```{r}
h <- c(1,2,3)
i <- c(2,2,4)
#elementwise comparison
h != i #is h not equal to i? Gives TRUE, FALSE, TRUE
h == i #is h equal to i? Gives FALSE, TRUE, FALSE
j <- 4
j %in% h #is j in h, 4 is not in h
j %in% i #is j in i, 4 is in i
```
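The result of an elementwise comparison is a logical vector, which can be summarised with `any()`, `all()`, `which()`, and `sum()`. This is a small sketch; the object names `h_demo`, `i_demo`, and `matches` are illustrative assumptions.

```{r}
# Sketch: summarise an elementwise comparison
h_demo <- c(1, 2, 3)
i_demo <- c(2, 2, 4)
matches <- h_demo == i_demo   # FALSE TRUE FALSE
any(matches)     # TRUE, at least one pair is equal
all(matches)     # FALSE, not every pair is equal
which(matches)   # 2, the position of the equal pair
sum(matches)     # 1, TRUE counts as 1 and FALSE as 0
```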
```{r}
data$new[data$sample_type=="influent"] <- "is influent" #writes "is influent" in column "new" on the rows where sample_type equals "influent" (the column is converted to character)
```
To check whether the data contains `NA`:
```{r}
data$new[data$new=="is influent"] <- NA #replaces "is influent" with NA
is.na(data) #returns TRUE for every cell that is NA
```
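Missing values also affect calculations: most summary functions return `NA` unless the missing values are removed. The sketch below shows this on a short vector; `conc` is a hypothetical name with made-up values used only for illustration.

```{r}
# Sketch: detect NA values and handle them in a calculation
conc <- c(1.2, NA, 3.4)
anyNA(conc)                 # TRUE, at least one value is missing
sum(is.na(conc))            # 1, the number of missing values
mean(conc)                  # NA, because one value is unknown
mean(conc, na.rm = TRUE)    # 2.3, the mean of the remaining values
```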
------------------------------------------------------------------------
## Writing Functions
Instead of using the functions included in R (e.g. `plot()`) or in a package (e.g. `ggplot()`), it is possible to write your own functions.
To make a function, three parts are needed: *name, set of arguments, and body of code*. A function has the general layout: `function_name <- function("set of arguments"){"function code"}`
```{r}
#Function that sums the values in the object "vector", without using any arguments
vector <- 1:10
first_function <- function(){
output <- sum(vector)
return(output)
}
first_function()
#This function does the same, but here the argument "samp" is needed for it to work
second_function <- function(samp){
sum_sample <- sum(samp)
return(sum_sample)
}
second_function(samp=vector)
#Same type of function again, but here the argument "samp" has a default value, so if the argument is not given, the default is used
third_function <- function(samp=1:4){
sum_sample <- sum(samp)
return(sum_sample)
}
third_function()
third_function(samp=vector)
```
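The same three parts can be used for a function with more than one argument. The sketch below computes a relative standard deviation in percent; the function name `rsd_percent`, its arguments, and the example values in `replicate_areas` are all hypothetical and serve only as an illustration.

```{r}
# Sketch: a function with two arguments and a simple input check
rsd_percent <- function(values, na.rm = FALSE) {
  stopifnot(is.numeric(values))   # stops with an error if the input is not numeric
  rsd <- sd(values, na.rm = na.rm) / mean(values, na.rm = na.rm) * 100
  return(rsd)
}
replicate_areas <- c(9, 10, 11)   # made-up example values
rsd_percent(replicate_areas)      # 10, since sd = 1 and mean = 10
```

The check with `stopifnot()` is a design choice: it makes the function fail early with a clear error instead of returning a confusing result for non-numeric input.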
If one has written some commands and wants to turn them into a function, RStudio can do that if the commands are highlighted and `Ctrl + Alt + X` is pressed (or via *Code \> Extract Function*).
------------------------------------------------------------------------
## If Statements/ For Loops
A useful way to write code is to use **If statements** and **For loops**.
- In If statements, a certain command runs only if a certain criterion is true. In R they have the general form: `if("criterion"){"command to run if true"}`.
```{r}
#Only if v1 is smaller than v2 will v3 be created as the sum of v1 and v2
v1 <- 5
v2 <- 10
if (v1<v2) {
v3 <- v1+v2
print(v3)
}
if (v1>v2) {
v4 <- v1+v2
print(v4)
}
#Returns nothing since statement is not true
```
It is also possible to add a "plan B" with if statements by using `else{}` .
```{r}
#returns the sum of w1 and w2 if the if statement is true, otherwise returns w1-w2
w1 <- 5
w2 <- 10
if (w1>w2) {
w3 <- w1+w2
print(w3)
} else {
w3 <- w1-w2
print(w3)
}
```
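When more than two outcomes are possible, further conditions can be added with `else if`. The conditions are tested from top to bottom and only the first one that is true is executed. This is a minimal sketch; the object names `u1`, `u2`, and `comparison` are illustrative assumptions.

```{r}
# Sketch: three possible outcomes with else if
u1 <- 5
u2 <- 5
if (u1 > u2) {
  comparison <- "u1 is larger"
} else if (u1 < u2) {
  comparison <- "u1 is smaller"
} else {
  comparison <- "u1 and u2 are equal"   # reached only if both tests above are FALSE
}
comparison
```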
```{r}
#If statements can be combined with functions. In this case, if the input value of x is larger than 0, y is returned as y+x*-1
x1 <- 2
y1 <- 3
a_function <- function(x,y){
if (x > 0){
y <- y+x*-1
}
return(y)
}
a_function(x=x1, y=y1)
```
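An `if` statement expects a single `TRUE` or `FALSE`. To apply a condition to every value of a vector, the vectorised function `ifelse()` can be used instead. The chunk below is a sketch; the object names `rt_demo` and `peak_label` and the cut-off of 2 are illustrative assumptions.

```{r}
# Sketch: apply a condition to each value of a vector
rt_demo <- c(1.1, 2.3, 3.4)
peak_label <- ifelse(rt_demo > 2, "late", "early")   # one label per value
peak_label   # "early" "late" "late"
```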
- In For loops, certain commands are repeated a certain number of times. In R they have the general form: `for("item in object"){"command to run for each item in the object"}`.
```{r}
#The for loop takes each number in o1, adds 1, and stores the result in the initially empty vector o2
o1 <- 1:4
o2 <- c()
for (i in o1) {
o2[i] <- i+1
}
o2
```
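The loop above uses the values of `o1` directly as positions in `o2`, which works because `o1` is `1:4`. For vectors with other values, a common pattern is to loop over the positions with `seq_along()` and to create the result vector with its final length beforehand. This is a sketch; the object names `loop_in`, `loop_out`, and `k` are illustrative assumptions.

```{r}
# Sketch: loop over positions instead of values
loop_in <- c(10, 20, 30)
loop_out <- numeric(length(loop_in))   # result vector of the right length, filled with 0
for (k in seq_along(loop_in)) {        # k takes the values 1, 2, 3
  loop_out[k] <- loop_in[k] + 1
}
loop_out                               # 11 21 31
identical(loop_out, loop_in + 1)       # TRUE, the vectorised form gives the same result
```

For a simple calculation like this, the vectorised form `loop_in + 1` is shorter; a loop becomes useful when each step involves several commands.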
------------------------------------------------------------------------