-
Notifications
You must be signed in to change notification settings - Fork 109
/
basic_r_tutorial.Rmd
839 lines (695 loc) · 28.2 KB
/
basic_r_tutorial.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
# (PART) Working with data {-}
Most of the chpaters in this section are based, but there are a few Python ones as well!
# Basic R
Xin Guo, Aiden Zhang
```{r, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,warning = FALSE, message = FALSE)
```
R is a tool for data processing, it is important for us to understand the basic operations. In this cheatsheat/tutorial we will cover some basic operations of data. We will start with the building blocks of data types then go to the data structures. We try to cover the create, update(delete included), select operations for each data object following the logic of basic database operations.
## Data types
For different data types, the attributes of an object is of most interest, we would use the following functions to explore objects.
We will classify them into two types, one is the read-only functions, and the others are writable functions. If we want to change the attributes of an object, we might use some functions to implement, and we will also cover these operations.
### (1) character
```{r}
temp_char = "hellow world"
```
Let us explore the attributes of the character. The following are some of the general "read-only" functions.
```{r}
summary(temp_char)
```
str function returns some information, but it is less detailed than the summary.
```{r}
str(temp_char)
```
**summary** and **str** are efficient functions to know about an object, but it is worth noticed that function **attributes** does not give us information about the variable.
```{r}
attributes(temp_char)
```
Then let us explore more about the "read-only" functions and get to know the functions that can change their return values.
**length** function will return the length of the object.
```{r}
length(temp_char)
```
We can change the length of the object by appending another character to it. If we want append a number to it, the number will be converted into a character to be consistent.
```{r}
long_temp = append(temp_char,"hahaha")
longer_test = append(temp_char,1)
length(longer_test)
```
**unique** function returns the unique element in the object.
```{r}
unique(temp_char)
```
For each unique element in the object, **table** function returns their frequencies. **range** function returns the minimum and maximum value.
```{r}
table(temp_char)
range(temp_char)
```
We can change the value of the character by using the **substr** function.
If we want to add more value to the end, we can use the **paste** function.It will return a longer size character object.
If we add a number using **paste**, like the **append** function, it will convert the number into character. This is useful when we have a different data to print out and the **print** function cannot support multiple items.
```{r}
substr(temp_char,6,7) = "? "
a = 2019
paste(temp_char,"!",a,sep ="")
```
We can use the class function to find out the data type of this object.
```{r}
class(temp_char)
```
If we want to change the class of this object, we would normally use the **as** function. We might try to turn a number into character sometimes.
```{r}
# To turn a number into character
temp_num = 1
class(temp_num)
temp_num_new = as.character(temp_num)
class(temp_num_new)
```
We should pay attention to that the **as** function does not modify the parameter passed into it. So we will have a new object.
Now, let us look at some writable attributes of the object.
Levels is the order for the items in the object, that is why a character object will not have order. We can set the levels though it does not have any meaning for this certain character. It will not change the data type of the object.
```{r}
levels(temp_char)
levels(temp_char) = 1
levels(temp_char)
class(temp_char)
```
**names** function will return the name of an object if it has, and we can assign value to it to name the object.
The dimension and the length of the names we want to assign must match the dimension and length of the object, otherwise it will return error.
```{r}
names(temp_char)
names(temp_char) = c('1')
```
Other attributes related to names are **rownames**, **colnames** and **dimnames**, since this object does not have dimension, so it will return values NULL and it will not allow us to make any changes to them.
```{r}
rownames(temp_char)
colnames(temp_char)
dimnames(temp_char)
```
### (2)numeric
There are two types of numerical data in R, numeric and integer.
We will use the numeric class as interpretation since integer can be regarded as a specific kind of numeric.
```{r}
temp_numer = 3.14
```
Let us explore the attributes of the numeric object. The following are some of the general "read-only" functions.
Based on different object passed into the function, the summary will give us different details about the object.
```{r}
summary(temp_numer)
```
**str** function returns the class and value information.
```{r}
str(temp_numer)
```
**attributes** function still does not give us information about the object.
```{r}
attributes(temp_numer)
```
Then let us explore more about the "read-only" functions and get to know the functions that will change their return values.
**length** function will return the length of the object.
```{r}
length(temp_numer)
```
We can change the length of the object by appending another numeric number to it.
```{r}
long_temp = append(temp_numer,2.72)
class(long_temp)
length(long_temp)
```
**unique** function returns the unique element in the object.
```{r}
unique(temp_numer)
```
For each unique element in the object, **table** function returns their frequencies. **range** function returns the lowest and largest values in the input vector
```{r}
table(temp_numer)
range(temp_numer)
```
To change the value of a numeric object, we usually reassign the value of it.
```{r}
temp_numer = 2.72
```
**class** function can find out the data type of this object.
```{r}
class(temp_numer)
```
If we want to change the class of this object, we would normally use the **as** function. We might try to turn character into a numeric data.
The function only works if it makes sense. We cannot turn a character of "hello world" into a number, the function will return error.
```{r}
# To turn a number into character
temp_ch = "3.14"
temp_num_new = as.numeric(temp_ch)
temp_num_new
class(temp_num_new)
```
We should pay attention to that the **as** function does not modify the parameter passed into it. So we will have a new object.
And if we try to turn a character into a integer, we can see that if it is a real number in meaning, it will be converted into a integer at the output.
```{r}
as.integer("2.7")
```
It will have problem when the character is not in a good format, for example with a comma.
We need to fix the comma problem before we transform the class type. Here, **gsub** and **sub** does not have a functional difference, because the matching pattern is not a regular expression. **gsub** is more powerful and strict in matching pattern when regular expression is used.
```{r}
as.integer("1,234")
as.numeric("1,234")
as.numeric(gsub(",","","1,234"))
as.numeric(sub(",","","1,234"))
```
Now, let us look at some writable attributes of the object.
Levels can be modified but still does not effect the class type of the numeric object.
```{r}
levels(temp_numer)
levels(temp_numer) = 1
levels(temp_numer)
class(temp_numer)
```
**names** function will return the name of an object if it has, and we can assign value to it to name the object.
```{r}
names(temp_numer)
names(temp_numer) = c('1')
```
Other attributes related to names are **rownames**, **colnames** and **dimnames**, since this object does not have dimension, so it will return values NULL and it will not allow us to make any changes to them.
```{r}
rownames(temp_numer)
colnames(temp_numer)
dimnames(temp_numer)
```
### (3)Logical
logical data is the data that contains `TRUE` and `FALSE`, or you can abbreviate them as `T` and `F` in R. We can easily convert between logicals and numericals with 1 and 0.
```{r}
temp_logical = c(TRUE, FALSE, T, T, F)
as.numeric(temp_logical)
as.logical(temp_logical)
```
Let us explore the attributes of the logicals. The following are some of the general “read-only” functions. Based on different object passed into the function, the summary will give us different details about the object.
```{r}
summary(temp_logical)
```
**str** function returns the some information, it is less detailed than the summary.
```{r}
str(temp_logical)
```
For each unique element in the object, table function returns their frequencies. range function returns the lowest and largest values in the input vector.
```{r}
range(temp_logical)
```
```{r}
table(temp_logical)
```
There are many useful functions related to logicals. We will illustrate a few here:
**ifelse**(\<logical\>, value1, value2) returns value1 if logical is true, return value2 if logical is false. It is especially useful when we need to inspect or modify data based on its value. Note that this function takes a vector.
```{r}
temp <- c(1,NA,1,NA,2,2,3,4)
is.na(temp)
temp <- ifelse(is.na(temp), 0, temp)
temp
```
Now we have all NA values changed to 0.
**all**(\<logical vector\>) returns if all the logical values are true.
```{r}
temp <- c(1,NA,1,NA,2,2,3,4)
all(is.na(temp))
```
Not all values of `temp` are NAs.
**any**(\<logical vector\>) returns if any of the logical values are true
```{r}
temp <- c(1,NA,1,NA,2,2,3,4)
any(is.na(temp))
```
There exists an NA value in `temp`.
**which**(\<logical vector\>) return the index which the logical value is true
```{r}
temp <- c(1,NA,1,NA,2,2,3,4)
which(is.na(temp))
```
The element at indexes 2 and 4 of `temp` are NAs.
The basic classes of R includes numeric, logical, character, integer, complex, since numeric, character, logical are the most common ones, so we illustrated these 3 classes. Complex is a broader range than the real number, in reality, the data we collect will rarely relate to it.
Now we will head to different data structures.
## data structure
### (1) vector:
A vector can only contain objects from the same class.
We can use these two ways to create a vector. It will lead different operations when we want to update this vector object. There is no better between these two methods, they should be used according to our coding needs.
```{r}
temp_vector = c(1,2,3)
temp_vector2 = c("a","b","c")
num_vec = vector("numeric", length = 10)
char_vec = vector("character",length = 10)
```
Let us explore the attributes of the vector. The following are some of the general "read-only" functions.
```{r}
summary(temp_vector)
```
```{r}
str(temp_vector)
```
```{r}
attributes(temp_vector)
```
Then let us explore more about the "read-only" functions and get to know the functions that will change their return values.
**length** function will return the length of the object.
```{r}
length(temp_vector)
```
We can change the length of the object by appending another character to it. If we want append a number to a character vector, the number will be converted into a character to be consistent.
```{r}
long_temp = append(temp_vector,2.72)
class(long_temp)
length(long_temp)
```
**unique** function returns the unique element in the object.
```{r}
unique(temp_vector)
```
For each unique element in the object, **table** function returns their frequencies. **range** function returns the maximum and minimum values.
```{r}
table(temp_vector)
range(temp_vector)
range(temp_vector2)
```
To update the value of a vector object, we use the index to access and change that value.
```{r}
temp_vector[1] = 1.1
temp_vector
```
**match** function can be used to find the corresponding item based on value and update its values.
**which** function can be used to find all the matches.
This is like the relationship between **regexpr** and **gregexpr**, where
**regexpr** only gives you the first match of the string (reading left to right) and **gregexpr** will return all of the matches in a given string if there are is more than one match.
For character vector, we might also use gregexpr if we look for certain items with some patterns.
```{r}
temp_vector[match(c(1.1,2,3),temp_vector)]
temp_vector[which(temp_vector %in% c(1.1,2,3))]
temp_vector2[match(c("a","b"),temp_vector2)]
temp_vector2[which(temp_vector2 %in% c("a","b"))]
```
If we want to change the values of some items within specific range of values, we can use the logical expression to subset them.
```{r}
temp_vector[temp_vector <= 2]
sapply(temp_vector[temp_vector <= 2],FUN = function(x){x=1})
sapply(temp_vector[temp_vector <= 2],FUN = function(x){x+1})
```
```{r}
```
**class** function can find out the data type of this object.
We can see that the class type for vector is not vector but the data type of the item inside the vector.
```{r}
class(temp_vector)
class(num_vec)
class(char_vec)
```
If we want to change the class of this object, we would normally use the **as** function. We might try to use the **sapply** functions. We might try to turn character into a numeric data.
The function only works if it makes sense.
The **sapply** function behaves similarly to **lapply**; the only real difference is in the return value. **sapply** will try to simplify the result of **lapply**.
```{r}
vec_char = c("3.14","2.72")
sapply(vec_char, as.numeric)
lapply(vec_char,as.numeric)
```
We should pay attention to that the **as** function does not modify the parameter passed into it. So we will have a new object.
Now, let us look at some writable attributes of the object.
Levels is NULL and can be modified but still does not effect the class type of the numeric object.
```{r}
levels(temp_vector)
levels(temp_vector) = 1
class(temp_vector)
```
**names** function will return the name of an object if it has, and we can assign value to it to name the object. Then we will be able to access the items with their names.
```{r}
names(temp_vector)
names(temp_vector) = c('alpha','beta','ceta')
temp_vector['alpha']
```
If we want to find some items with certain names, we can find them in the following ways.
```{r}
temp_vector[names(temp_vector) %in% c("hello","world","alpha") ]
```
Other attributes related to names are **rownames**, **colnames** and **dimnames**, since this object does not have dimension, so it will return values NULL and it will not allow us to make any changes to them.
```{r}
rownames(temp_numer)
colnames(temp_numer)
dimnames(temp_numer)
```
### (2)list:
A list can contain objects from the different classes.
We can use two ways to create a list.
```{r}
temp_list = list("a" = c(1,2), 2.72,"pi" = 3.14, "c" = "hello world")
list_vec = vector("list", length = 5)
```
Let us explore the attributes of the list. The following are some of the general "read-only" functions.
```{r}
summary(temp_list)
```
```{r}
str(temp_list)
str(list_vec)
```
```{r}
attributes(temp_list)
```
Then let us explore more about the "read-only" functions and get to know the functions that will change their return values.
**length** function will return the length of the object.
```{r}
length(temp_list)
```
We can change the length of the object by **append** function.
```{r}
long_temp = append(temp_list,0)
class(long_temp)
length(long_temp)
```
**unique** function returns the unique element in the object.
```{r}
unique(temp_list)
```
For each unique element in the object,we cannot use the **table** function directly but to apply it to each item of the the object. **range** function returns the maximum and minimum values.
```{r}
sapply(temp_list,table)
range(temp_list)
```
To update the value of a vector object, we use the index/name to access and change that value.
The following three ways can be used to acces and modify an object in the list.
```{r}
temp_list[["a"]]
temp_list$a = c(1,3)
temp_list[[1]]
```
The following indexings are used to access the first item, it is worth noticed that it returns the whole item including its value.
```{r}
temp_list["a"]
temp_list[1]
temp_list[1][1]
```
If we want to find the nested element in an item of a list, we will use the following ways to select.
```{r}
temp_list[[c(1,1)]]
temp_list[[1]][[2]] = 2
temp_list[[1]][2]
```
**class** function can find out the data type of this object.
```{r}
class(temp_list)
```
Now, let us look at some writable attributes of the object.
Levels is NULL and can be modified but still does not effect the class type of the numeric object.
```{r}
levels(temp_list)
levels(temp_list) = 1
class(temp_list)
```
**names** function will return the name of an object if it has, and we can assign value to it to name the object. Recall previously that we cannot name the item when appending it, but we can name it later.
```{r}
names(temp_list)
names(temp_list)[2] = "e"
```
If we want to find some items with certain names, we can find them in the following ways.
```{r}
temp_list[names(temp_list) %in% c("hello","world","e","c") ]
```
Other attributes related to names are **rownames**, **colnames** and **dimnames**, since this object does not have dimension, so it will return values NULL and it will not allow us to make any changes to them.
```{r}
rownames(temp_list)
colnames(temp_list)
dimnames(temp_list)
```
### (3)factor
Often factors will be automatically created when we read a dataset in using a function like read.table().
**factor** function will create a factor array.
```{r}
temp_factor = factor(c("yes", "yes", "no", "yes", "no"))
```
Let us explore the attributes of the factor. The following are some of the general "read-only" functions.
```{r}
summary(temp_factor)
```
```{r}
str(temp_factor)
```
**attributes** function tells us the levels of the factor, it is set by default. And we can know from the **str** function that "no" corresponds to the level 1 and "yes" to 2.
```{r}
attributes(temp_factor)
```
Then let us explore more about the "read-only" functions and get to know the functions that will change their return values.
**length** function will return the length of the object.
```{r}
length(temp_factor)
```
It is not common to add new item into a factor, we can try using the append function to do it. And we have found that during the append operation, the function takes the levels attribute to do the work. So if we append a character to it, the resulting object is a character vector and so does the numeric object. So this is not a proper operation and it would be better if we create a vector than turn it into a factor with **as.factor** or **factor** function.
```{r}
temp_factor
class(temp_factor)
long_temp_char = append(temp_factor,"no")
long_temp_num = append(temp_factor,2019)
long_temp_char
long_temp
```
**unique** function returns the unique element in the object.
```{r}
unique(temp_factor)
```
For each unique element in the object, **table** function returns their frequencies. **range** function does not have meanings under the level order, so it will return error.
```{r}
table(temp_factor)
#range(temp_factor)
```
To update the value of a vector object, we use the index to access and change that value. We can only modify the value within the current levels, out of scope modification will return error, which means we cannot assign value like "neutral" to the item inside the factor.
```{r}
temp_factor[1] = "no"
temp_factor
```
**match** function can be used to find the corresponding item based on value and update its values.
**which** function can be used to find all the matches.
Then we can use the **sapply** function to modify the value of the elements.
```{r}
temp_factor
temp_factor[match(c("yes","no"),temp_factor)]
temp_factor[which(temp_factor %in% c("yes","no"))]
```
**class** function can find out the data type of this object.
```{r}
class(temp_factor)
```
Now, let us look at some writable attributes of the object.
levels function returns the default level, we would try to reset the order with the **factor** function. We should notice that though we can modify the levels directly, but this will not return the desired result, so we should not use this way to modify it.
```{r}
temp_factor = factor(c("yes", "yes", "no", "yes", "no"))
levels(temp_factor)
sort(temp_factor)
# build a new factor
sort(factor(temp_factor,levels = c("yes","no")))
# directly change the level
levels(temp_factor) = c("yes","no")
sort(temp_factor)
# build a new factor
sort(factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no")) )
```
**names** function will return the name of an object if it has, and we can assign value to it to name the object. Then we will be able to access the items with their names.
```{r}
temp_factor = factor(c("yes", "yes", "no", "yes", "no"))
names(temp_factor)
names(temp_factor) = c('alpha','beta','ceta','gamma','omega')
temp_factor['alpha']
```
If we want to find some items with certain names, we can find them in the following ways.
```{r}
temp_factor[names(temp_factor) %in% c("hello","world","alpha") ]
```
Other attributes related to names are **rownames**, **colnames** and **dimnames**, since this object does not have dimension, so it will return values NULL and it will not allow us to make any changes to them.
```{r}
rownames(temp_factor)
colnames(temp_factor)
dimnames(temp_factor)
```
### (4)matrix
matrix is a data structure with dimensions.
**matrix** function will create a matrix.
```{r}
temp_matrix <- matrix(0, nrow = 2, ncol = 5)
```
Let us explore the attributes of the factor. The following are some of the general "read-only" functions.
```{r}
summary(temp_matrix)
```
```{r}
str(temp_matrix)
```
```{r}
attributes(temp_matrix)
```
Then let us explore more about the "read-only" functions and get to know the functions that will change their return values.
**length** function will return the length of the object, the length is calculated by multiplying dimensions.
```{r}
length(temp_matrix)
nrow(temp_matrix)
ncol(temp_matrix)
length(temp_matrix)
```
If we want to add another matrix into the current matrix, we will use the **rbind** and **cbind** functions.
**rbind** means binding along the row, so it will require the column sizes of two matrices are the same.
**cbind** means binding along the column, so it will require the row sizes of two matrices are the same.
**append** function does not work, because it would turn the matrix into a 1 dimensions data structure which is not we want.
```{r}
long_temp = append(temp_matrix,cbind(1,2,3,4,5))
attributes(long_temp)
temp_matrix
long_temp_row = rbind(temp_matrix,cbind(1,2,3,4,5))
long_temp_col = cbind(temp_matrix,rbind(1,2))
```
**unique** function returns the unique element in the object.
```{r}
temp_matrix[2,2] = 3
unique(temp_matrix)
```
For each unique element in the object, **table** function returns their frequencies. **range** will return the minimum and maximum object. The operations are done across the matrix, not confined to certain row or column.
```{r}
table(temp_matrix)
range(temp_matrix)
```
To update the value of a matrix object, we use the index to access and change that value. We cannot update with a different data type.
```{r}
# this is not valid:
# temp_matrix[1][2]
temp_matrix
temp_matrix[1,2] = 1
#for a row
temp_matrix[2,]
#for a column
temp_matrix[,2]
```
Random access item like this will return value by spreading the matrix.
```{r}
temp_matrix[4]
```
The **in** function can help us to determine if an value falls into our criteria, we directly use the TRUE and False result to subset the matrix we want. When using the **which** function, we need to be careful that we are selecting rows so that the subset format should be correct.
Then we can use the **sapply** function to modify the value of the elements.
```{r}
temp_matrix
temp_matrix[,2] %in% c(2,3,4)
temp_matrix[temp_matrix[,2] %in% c(2,3,4)]
which(temp_matrix[,2] %in% c(2,3,4) )
temp_matrix[which(temp_matrix[,2] %in% c(2,3,4)),]
```
**class** function can find out the data type of this object.
```{r}
class(temp_matrix)
```
Now, let us look at some writable attributes of the object.
levels function returns NULL because this is not a factor matrix. We create a factor matrix but it will lose the attributes as a factor when we apply matrix operations to it. So levels does not have significant meaning in matrix.
```{r}
levels(temp_matrix)
```
**names** function will return the name of an object if it has, and we can assign value to it to name the object. Then we will be able to access the items with their names.
```{r}
names(temp_matrix) = c('1','2','3','4','5','6','7','8')
temp_matrix
temp_matrix['3']
```
If we want to find some items with certain names, we can find them in the following ways.
```{r}
temp_matrix[names(temp_matrix) %in% c('4','5','6') ]
```
But selecting a single item is not usually what we want, we might want to select a whole column/row based on its name.
Other attributes related to names are **rownames**, **colnames** and **dimnames**, the dimensions of the assignment value must match the matrix size.
```{r}
rownames(temp_matrix) = c("1",'2')
colnames(temp_matrix) = c('a','b','c','d','e')
temp_matrix
dimnames(temp_matrix) = list(c("1",'2'),c('a','b','c','d','e'))
```
To access or update value with their names we use the following format.
```{r}
temp_matrix['1',]
temp_matrix[,'a']
```
### (5) dataframe
Data frames are usually created by reading in a dataset using the read.table() or read.csv(). Each column can be viewed as a vector. And can be viewed as list with vectors of same length.
We load a dataframe from the basic R **data** function
```{r}
data("women")
```
Let us explore the attributes of the factor. The following are some of the general "read-only" functions.
```{r}
summary(women)
```
```{r}
str(women)
```
```{r}
attributes(women)
```
Then let us explore more about the "read-only" functions and get to know the functions that will change their return values.
**length** function will return the length of the object, the length is calculated by multiplying dimensions.
```{r}
length(women)
nrow(women)
ncol(women)
dim(women)
```
If we want to add another dataframe into the current matrix, we will use the **rbind** method because this is a new observation. If we want to add a new variable then we will use **cbind**.
```{r}
long_temp_row = rbind(women,data.frame(height=c(110), weight=c(77)))
dim(long_temp_row)
```
**unique** function returns the unique element in the object.
```{r}
unique(women)
```
For each unique element in the object, **table** function returns their frequencies. **range** will return the minimum and maximum object. The operations are done across the matrix, not confined to certain row or column.
```{r}
table(women)
range(women)
```
To update the value of a dataframe item, we use the index to access and change that value. We cannot update with a different data type.
```{r}
women[3,2]
#for a row
women[2,]
#for a column
women[,2]
```
Random access item like this will return the 1st column value of the dataframe which is different from the matrix operation.
```{r}
women[1]
```
To change the values in a row or in a column, we can use **sapply** function or vectorized function to modify the data.
And if we want to apply changes to all the rows/columns, we will use the **apply** function. To create new variable based on the other variables, we will use the **mapply** function.
```{r}
women[,2] = as.factor(women[,2])
women
data.frame(apply(women,2,as.numeric) ) # 1 for row statistics, 2 for column statistics
data("women")
mapply(function(h,w){w/(h*h)},women[,1],women[,2])
```
To subset some dataframe, we use the **in** function and the **which** function. We need to be careful that we are selecting rows so that the subset format should be correct, unlike matrix, when **in** function can be easily converted to the result we want.
And we can also use the split function and then access the part we want with corresponding index.
```{r}
#split(women,women[,2])
women[women[,2] %in% c(117,120),]
women[which(women [,2] %in% c(117,120)),]
```
**class** function can find out the data type of this object.
```{r}
class(women)
```
Now, let us look at some writable attributes of the object.
levels function returns NULL even if the dataframe contains factor variables.
```{r}
levels(women)
```
**names** function will return the name of an object if it has, and we can assign value to it to name the object. Then we will be able to access the items with their names. Unlike matrix, in dataframe the names is for the column names, so by names, we will access a column.
```{r}
names(women)
names(women) = c('1')
women['1']
```
Other attributes related to names are **rownames**, **colnames** and **dimnames**, the dimensions of the assignment value must match the matrix size.
```{r}
rownames(women)
colnames(women)
dimnames(women)
```
To access or update value with their names we use the following format.
```{r}
women['1',]
women[,'1']
```