---
title: 'R and statistics'
description: 'Basics of R, the amazing statistical programming language. Do not be afraid of the art of programming!'
---
## What is R?
```yaml
type: NormalExercise
key: 168121d0f0
lang: r
xp: 100
skills: 1
```
R is a programming language designed for statistical analysis. R is completely open source, and all analyses done with R are reproducible and easy to share. R is flexible, powerful and includes implementations of the latest research.
For these reasons, R is a very popular language in the scientific community. However, R is also widely [used](http://blog.revolutionanalytics.com/2014/05/companies-using-r-in-2014.html) for data analysis and visualisation by companies such as Google, Facebook, Twitter and Reaktor. R is well suited to data science.
Researchers and experts often use programming to visualize and explore data and implement new statistical methods. The purpose of this course is to expose you to the basics of programming and data analysis with R.
Follow the instructions below to complete the exercise.
`@instructions`
- Take a peek at the code written on the right in script.R. When executed, the code gives your computer the instructions to (1) load a dataset and (2) produce a graphic.
- Can you tell which lines of code relate to (1) and which to (2)? A hash sign (`#`) marks a line as a comment.
- When you're done pondering, click 'Submit Answer' to execute the code and move on to the next exercise! (You do not need to make any changes to the code in this exercise.)
`@hint`
- Simply press 'Submit Answer' when you're ready to move forward.
`@pre_exercise_code`
```{r}
data_url <- "http://www.helsinki.fi/~kvehkala/JYTmooc/JYTOPKYS-data.txt"
students2014 <- read.table(data_url, sep="\t", header=TRUE)
```
`@sample_code`
```{r}
# This is an example of the power of R
# Create a character object holding data location
data_url <- "http://www.helsinki.fi/~kvehkala/JYTmooc/JYTOPKYS-data.txt"
# Load and save data with the read.table() function
students2014 <- read.table(data_url, sep = "\t", header = TRUE)
# Draw a graph of student shoe size and height using plot()
with(students2014, plot(x = kenka, y = pituus, col = sukup, pch = 20, main = "Scatter plot of student shoe size and height \n(color indicates sex)"))
```
`@solution`
```{r}
# This is an example of the power of R
# Create a character object holding data location
data_url <- "http://www.helsinki.fi/~kvehkala/JYTmooc/JYTOPKYS-data.txt"
# Load and save data with the read.table() function
students2014 <- read.table(data_url, sep = "\t", header = TRUE)
# Draw a graph of student shoe size and height using plot()
with(students2014, plot(x = kenka, y = pituus, col = sukup, pch = 20, main = "Scatter plot of student shoe size and height \n(color indicates sex)"))
```
`@sct`
```{r}
success_msg("Great! Move on to the first exercise.")
```
---
## Basic tools
```yaml
type: NormalExercise
key: 35ad6c43ec
lang: r
xp: 100
skills: 1
```
On your right you see the R editor area (the script) and, below it, the R console. The editor area is a simple text editor where you write code just as you would write text.
You can first write code in the editor area and then execute the line where your cursor currently is by pressing `Ctrl + Enter` (`Cmd + Enter` on a Mac). The input and output will then appear in the console.
It is also possible to write code directly in the console and press `Enter` to execute it, but working in the script area is preferred.
Try it!
`@instructions`
- Type "Hello world!" in the editor area. Use quotation marks.
- Press `Ctrl + Enter` to execute your "Hello world!" code.
- Make sure to use a capital "H" and an exclamation mark.
- Click 'Submit Answer' when done.
`@hint`
- Remember to use quotation marks and write the sentence exactly as instructed.
`@pre_exercise_code`
```{r}
# no pec
```
`@sample_code`
```{r}
# This is the R editor!
# A hashtag at the beginning of the line defines the line as a comment
# Write your code below
# Below is the R console, where you will see output
```
`@solution`
```{r}
# This is the R editor!
# A hashtag at the beginning of the line defines the line as a comment
# Write your code below
"Hello world!"
# Below is the R console, where you will see output
```
`@sct`
```{r}
test_student_typed("Hello world!", not_typed_msg="Please type 'Hello world!' in the editor.")
test_error()
success_msg("Great work! You have now executed your first line of code :)")
```
---
## Arithmetic
```yaml
type: NormalExercise
key: b0a54af3ff
lang: r
xp: 100
skills: 1
```
Let's continue with something simple. R can do amazing things like scrape websites and draw beautiful graphics, but it can also do basic calculations. Consider the following arithmetic operators:
- Addition: `+`
- Subtraction: `-`
- Multiplication: `*`
- Division: `/`
- Exponentiation: `^` (or `**`)
The `^` operator raises the number to its left to the power of the number to its right: for example `3^2` is 9. With this knowledge, follow the instructions below to complete the exercise.
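As a small sketch of how these operators combine (the numbers are arbitrary illustrations, not part of the exercise), R follows the usual order of operations: exponentiation first, then multiplication and division, then addition and subtraction. Parentheses override that order.

```r
# Exponentiation is evaluated before multiplication, and multiplication before addition
2 + 3 * 4^2      # 2 + (3 * 16) = 50

# Parentheses change the order of evaluation
(2 + 3) * 4^2    # 5 * 16 = 80

# `**` is accepted as an alternative spelling of `^`
3 ** 2 == 3^2    # TRUE
```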
`@instructions`
- Study and execute the examples in the R script. (Use `Ctrl + Enter` or `Cmd + Enter`)
- Type `2^5` in the editor and calculate 2 to the power 5.
- When done, click 'Submit Answer'.
`@hint`
- If you have trouble typing the `^` sign, you can also use a double star, `**`.
`@pre_exercise_code`
```{r}
# no pec
```
`@sample_code`
```{r}
# An addition
5 + 5
# A subtraction
5 - 5
# A multiplication
3 * 5
# A division
(5 + 5) / 2
# Exponentiation
```
`@solution`
```{r}
# An addition
5 + 5
# A subtraction
5 - 5
# A multiplication
3 * 5
# A division
(5 + 5) / 2
# Exponentiation
2 ^ 5
```
`@sct`
```{r}
test_output_contains("2^5", incorrect_msg = "The exponentiation example is not correct. Please write `2 ^ 5` on a new line.")
success_msg("Great work! Head over to the next exercise.")
```
---
## Objects
```yaml
type: NormalExercise
key: 856fab934f
lang: r
xp: 100
skills: 1
```
Here's where things start to get interesting. In R you can create and operate on things called *objects*.
An object is something that can store information such as numerical values and names. Once an object is created, it will be stored in memory and the information it contains will be available to you later.
Objects are created using the assignment operator: `<-` (< and -). The value of an object can be printed by typing its name.
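The following is a minimal sketch of creating and reusing objects. The object names `course_name`, `n_weeks` and `n_days` and their values are hypothetical and chosen only for illustration.

```r
# Assign a character value and a numeric value to two (hypothetical) objects
course_name <- "Introduction to R"
n_weeks <- 6

# Typing an object's name prints its value
course_name
n_weeks

# An object can be used in a calculation, and the result stored in a new object
n_days <- n_weeks * 7
n_days           # 42
```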
`@instructions`
- Execute the lines that create and operate on `my_character_object` and `my_numeric_object`.
- Assign your name or nickname to `my_name`. Use quotation marks.
- Assign a positive message to yourself to `my_message`. Use quotation marks.
- Click 'Submit Answer'.
`@hint`
- Remember that `Ctrl + Enter` executes a single row.
`@pre_exercise_code`
```{r}
# no pec
```
`@sample_code`
```{r}
# Create an object
my_character_object <- "Hi there!"
# Print the contents of the object
my_character_object
# Create another object
my_numeric_object <- 5 + 5
# Do further calculations with the object
my_numeric_object / 5
# Override the value of an object by assigning a new value to it
my_character_object <- my_numeric_object
my_character_object
# Create character objects my_name and my_message here. Use quotation marks.
my_name <-
my_message <-
```
`@solution`
```{r}
# Create an object
my_character_object <- "Hi there!"
# Print the contents of the object
my_character_object
# Create another object
my_numeric_object <- 5 + 5
# Do further calculations with the object
my_numeric_object / 5
# Override the value of an object by assigning a new value to it
my_character_object <- my_numeric_object
my_character_object
# Create character objects my_name and my_message here. Use quotation marks.
my_name <- "Hey You"
my_message <- "You are awesome!"
```
`@sct`
```{r}
test_object("my_name", eval = FALSE, incorrect_msg = "Did you assign your name to `my_name`?")
test_object("my_message", eval = FALSE, incorrect_msg = "Did you assign a message to `my_message`?")
test_error(incorrect_msg = "Did you use quotation marks to assign your name and a message to the objects `my_name` and `my_message`?")
success_msg(paste0("Great job ", my_name, ". ", my_message, "!"))
```
---
## Functions
```yaml
type: NormalExercise
key: b81cdce659
lang: r
xp: 100
skills: 1
```
In R you operate on objects using *functions*. A function is a special kind of object that uses other objects as inputs, performs operations, and usually outputs a result. The inputs of a function are called *arguments*.
A function is *called* by typing its name and giving it the necessary arguments inside parentheses. When a function is called, it will perform an action.
R has many functions ready for you to use. Functions are how you're actually going to do all the magic. Here we will do a little bit of magic using data that will become more familiar to you later on.
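As a rough sketch of what calling a function looks like, the example below uses a few of R's built-in functions on a small made-up vector. The object `heights` and its values are hypothetical and are not taken from the course data.

```r
# A small made-up numeric vector (hypothetical values, not from the course data)
# Note that c() is itself a function: it combines its arguments into a vector
heights <- c(160, 172, 181, 167)

# Call a function by typing its name with the argument inside parentheses
mean(heights)     # arithmetic mean: 170
max(heights)      # largest value: 181
length(heights)   # number of values: 4

# The result of a function call can be stored in a new object
average_height <- mean(heights)
average_height
```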
`@instructions`
- Study and execute the examples in the R script. Don't worry about the `$` sign for now; we will get to that later.
- Follow the examples and compute the mean age of students.
- When done, click 'Submit Answer'.
`@hint`
- Use the function `mean()` on the object `student_age`, just as `mean(student_height)` is used in script.R.
`@pre_exercise_code`
```{r}
students2014 <- read.table("http://www.helsinki.fi/~kvehkala/JYTmooc/JYTOPKYS-data.txt", sep="\t", header=TRUE)
# keep a couple background variables
students2014 <- students2014[,c("sukup","toita","ika","pituus","kenka","kone")]
# recode missing (NA) values of the kone variable as a factor level
students2014$kone <- addNA(students2014$kone)
# choose rows without missing values
students2014 <- students2014[complete.cases(students2014),]
# integers to numeric
students2014$ika <- as.numeric(students2014$ika)
students2014$pituus <- as.numeric(students2014$pituus)
students2014$kenka <- as.numeric(students2014$kenka)
```
`@sample_code`
```{r}
# students2014 data is available
# Create object student_height
student_height <- students2014$pituus
# Compute the average height of the students
mean(student_height)
# Create object student_age
student_age <- students2014$ika
# Compute the average age of the students
```
`@solution`
```{r}
# students2014 data is available
# Create object student_height
student_height <- students2014$pituus
# Compute the average height of the students
mean(student_height)
# Create object student_age
student_age <- students2014$ika
# Compute the average age of the students
mean(student_age)
```
`@sct`
```{r}
test_output_contains("mean(student_age)", incorrect_msg = "Please use the `mean()` function to compute the mean of student_age.")
test_error()
success_msg("Excellent! You are making great progress.")
```
---
## Good arguments
```yaml
type: NormalExercise
key: 88d1d679eb
lang: r
xp: 100
skills: 1
```
During this course you will use many of R's ready-made functions. As we already saw, functions take objects as their arguments and then perform actions using those objects. Using a function is called *calling* it.
Functions usually have more than one possible argument. Some arguments have default values, while others must be specified. Arguments have names, which can (and often should) be used when specifying their values. The recommended style when calling a function is to specify all but the first argument by name.
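To illustrate default values and argument names, here is a short sketch using the base R function `round()`, whose arguments are `x` (the number to round) and `digits` (default 0). The number 3.14159 is an arbitrary example value.

```r
# digits is not given, so its default value 0 is used
round(3.14159)                   # 3

# Recommended style: first argument by position, the rest by name
round(3.14159, digits = 2)       # 3.14

# Positional matching gives the same result, but is harder to read
round(3.14159, 2)

# Named arguments can be given in any order
round(digits = 2, x = 3.14159)
```

All three of the last calls return the same value; naming the argument makes it clear what the `2` means.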
`@instructions`
- Execute the example code, where the function `head()` is used to explore the first few observations in the `students2014` dataset.
- Using the assignment operator `<-`, create the object `first_ten_students` by calling `head()` on `students2014` with `n = 10`.
`@hint`
- Directly assign the output of `head()` to `first_ten_students` to create the object.
- Remember to separate the arguments with a comma.
- The answer is of the form: `first_ten_students <- head(arguments_here)`
`@pre_exercise_code`
```{r}
# load data from web
students2014 <- read.table("http://www.helsinki.fi/~kvehkala/JYTmooc/JYTOPKYS-data.txt", sep="\t", header=TRUE)
# keep a couple background variables
students2014 <- students2014[,c("sukup","toita","ika","pituus","kenka","kone")]
# recode missing (NA) values of the kone variable as a factor level
students2014$kone <- addNA(students2014$kone)
# keep only rows without missing values
students2014 <- students2014[complete.cases(students2014),]
# integers to numeric
students2014$ika <- as.numeric(students2014$ika)
students2014$pituus <- as.numeric(students2014$pituus)
students2014$kenka <- as.numeric(students2014$kenka)
```
`@sample_code`
```{r}
# students2014 is available
# Use the function head() on the object students2014
head(students2014, n = 4)
# Argument 'n' of head() has a default value 6
head(students2014)
# These calls are identical. Try them.
head(students2014, n = 3) # recommended style
head(students2014, 3) # style usually not recommended
head(n = 3, students2014) # style not recommended
# Save the first ten observations of the data
first_ten_students <-
```
`@solution`
```{r}
# students2014 is available
# Use the function head() on the object students2014
head(students2014, n = 4)
# Argument 'n' of head() has a default value 6
head(students2014)
# These calls are identical. Try them.
head(students2014, n = 3) # recommended style
head(students2014, 3) # style usually not recommended
head(n = 3, students2014) # style not recommended
first_ten_students <- head(students2014, n = 10)
```
`@sct`
```{r}
test_error(incorrect_msg = "Your code seems to produce an error. Did you create the object `first_ten_students` as instructed?")
test_object("first_ten_students")
success_msg("Great job!")
```
---
## students2014
```yaml
type: MultipleChoiceExercise
key: 8d2100f40a
lang: r
xp: 50
skills: 1
```
We will be doing data science using a dataset that was collected in 2014 from students of the Faculty of Social Sciences at the University of Helsinki.
The students filled out [this short questionnaire](http://www.helsinki.fi/~kvehkala/JYTmooc/ASSIST-2014-FI-0-20140903.pdf). Please go ahead and read both pages of the PDF linked above, then come back and answer the question below.
How many numbered sections does the questionnaire have?
`@possible_answers`
- two
- four
- seven
- ten
- fourteen
`@hint`
- If the link does not work, the URL is: http://www.helsinki.fi/~kvehkala/JYTmooc/ASSIST-2014-FI-0-20140903.pdf
`@pre_exercise_code`
```{r}
# pre exercise code here
```
`@sct`
```{r}
# submission correctness tests
msg1 <- msg3 <- msg4 <- msg5 <- "Sorry, not quite."
msg2 <- "Yes, correct!"
test_mc(correct = 2, feedback_msgs = c(msg1, msg2, msg3, msg4, msg5))
# Final message the student will see upon completing the exercise
success_msg("Great work! Next time we'll explore the data :)")
```