-
Notifications
You must be signed in to change notification settings - Fork 10
/
ch3.Rmd
336 lines (246 loc) · 7.26 KB
/
ch3.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
---
title: "簡單的繪圖"
author: "郭耀仁"
date: "`r Sys.Date()`"
output: slidy_presentation
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, results = 'hide', warning = FALSE)
```
## 視覺化的力量
- [Hans Rosling](https://www.youtube.com/watch?v=jbkSRLYSojo)
> He rose to international celebrity status after producing a [Ted Talk](https://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen?language=zh-tw) in which he promoted the use of data to explore development issues.
> [Hans Rosling - Wikipedia](https://en.wikipedia.org/wiki/Hans_Rosling)
## 視覺化的力量(2)
- Less than 5 minutes
- 200 years
- 200+ countries
- 120,000+ observations
## 玩具資料(Toy datasets)
- 在 RStudio 命令列輸入 `data()` 可以看有哪些玩具資料可以直接使用
```{r}
cars
iris
mtcars
?cars #help(cars)
?iris #help(iris)
?mtcars #help(mtcars)
```
## 觀察玩具資料
- 使用函數觀察玩具資料
```{r}
head(cars)
tail(cars)
dim(cars)
nrow(cars)
ncol(cars)
summary(cars)
str(cars)
```
## Getting Started
- Generic X-Y plotting
- $y = x^2$
- `xlim` 調整 X 軸上下界
- `ylim` 調整 Y 軸上下界
```{r}
x <- seq(from = -5, to = 5, by = 0.1)
y <- x^2
plot(x, y)
```
## Getting Started(2)
- `xlab` X 軸標籤
- `ylab` Y 軸標籤
- `main` 標題
- `sub` 副標題
```{r}
plot(x, y, xlab = "x label", ylab = "y label",
main = expression(y == x^2), sub = "Getting started")
```
## Getting Started(3)
- `pch` 標記的樣式
![](images/ch301.png)
```{r}
plot(x, y, pch = 4)
```
## Getting Started(4)
- `type = "l"` 描繪線條
- `lty` 線條的樣式
- `lwd` 線條的寬度
![](images/ch302.png)
```{r}
plot(x, y, type = "l", lty = 2, lwd = 3)
```
## Getting Started(5)
- `col` 顏色
- `col.axis`
- `col.lab`
- `col.main`
- `col.sub`
- `bg`
```{r}
plot(x, y, xlab = "x label", ylab = "y label",
main = expression(y == x^2), sub = "Getting started",
col.axis = "red", col.lab = "green", col.main = "blue",
col.sub = "orange", col = "red", bg = "green", pch = 21)
# bg 僅對 pch 21~25 有效
```
## Getting Started(6)
- 重要的 `par()` 參數
- `las` 軸標籤的方向
- `mar` 圖的上下左右邊界寬
- `mfrow` 子圖形的 row 數與 col 數
- 利用 `dev.off()` 恢復預設值
```{r}
par("las")
par("mar")
par("mfrow")
```
## Getting Started(7)
- `text()` 加入文字
```{r}
plot(mtcars$hp, mtcars$mpg, xlim = c(min(mtcars$hp), max(mtcars$hp)*1.2), pch = 21, col = "blue", bg = "blue", xlab = "Horse Power", ylab = "Miles per Gallon")
text(mtcars$hp, mtcars$mpg, labels = row.names(mtcars), cex = 0.5, pos = 4)
```
## Getting Started(8)
- `legend()` 加入圖例
```{r}
plot(iris$Sepal.Length, iris$Petal.Length, col = iris$Species,
xlab = "Sepal Length", ylab = "Petal Length", pch = 16, cex = 1.5)
legend("bottomright", col = 1:3, legend = levels(iris$Species), pch = 16, bty = "n")
```
## Getting Started(9)
- 練習
```{r}
x <- seq(-5, 5, by = 0.1)
y <- x**3
plot(x, y, type = "n", xaxt = "n", yaxt = "n", ylab = "", xlab = "")
title(expression(y == x^3))
mtext("y", side = 2, line = 1, las = 1)
points(x[x < 0], y[y < 0], col = "blue")
lines(x[x >= 0], y[y >= 0], col = "red", lwd = 2)
```
## 散佈圖
- `plot()` 繪製散佈圖
```{r}
plot(cars$speed, cars$dist)
```
## 散佈圖(2)
- 練習加入標題(`main`)、X 軸標籤(`xlab`)與 Y 軸標籤(`ylab`)
## 散佈圖(3)
- 加入 **fit line**
```{r}
plot(cars$speed, cars$dist)
fitline <- lm(dist ~ speed, data = cars)
abline(fitline, lwd = 2, col = "blue")
```
## 線圖
- `plot(..., type = "l")` 繪製線圖
```{r}
temperature <- round(runif(30) * 10 + 25)
dates <- as.Date("2017-06-01"):as.Date("2017-06-30")
dates <- as.Date(dates, origin = "1970-01-01")
plot(x = dates, y = temperature, type = "l")
```
## 線圖(2)
- 練習調整 `ylim` 參數
## 直方圖
- 繪製標準常態分佈與均勻分布
- `runif()` 函數可以幫你產生 0 到 1之間**均勻分配**的隨機數
- `rnorm()` 函數可以幫你產生**標準常態分配**的隨機數
- 調整 `breaks` 參數
```{r}
n <- 100
par(mfrow = c(2, 1)) # 建立一個 2x1 的網格畫布
hist(runif(n), main = paste("Distribution of", n, "uniformly distributed variables")) # 試著增加隨機數的個數 n
hist(rnorm(n), main = paste("Distribution of", n, "normally distributed variables")) # 試著增加隨機數的個數 n
```
## 盒鬚圖
- `boxplot()` 繪製盒鬚圖
- 使用 `~` 運算子將類別變數納入
```{r}
str(iris)
par(mfrow = c(2, 2))
boxplot(iris$Sepal.Length ~ iris$Species, main = "Sepal length by species")
boxplot(iris$Sepal.Width ~ iris$Species, main = "Sepal width by species")
boxplot(iris$Petal.Length ~ iris$Species, main = "Petal length by species")
boxplot(iris$Petal.Width ~ iris$Species, main = "Petal width by species")
```
## 函數圖
- `curve()` 繪製函數圖
```{r}
curve(sin, from = -pi, to = pi)
grid()
```
## 函數圖(2)
- 繪製自訂函數也不是問題
- 我們會教同學如何自訂函數
```{r}
sigmoid_func <- function(x){
return(1 / (1 + exp(-x)))
}
curve(sigmoid_func, from = -10, to = 10)
abline(h = 0.5, lty = 2)
abline(v = 0, lty = 2)
grid()
```
## 長條圖
- `barplot()` 繪製長條圖
```{r}
tbl_gear <- table(mtcars$gear)
barplot(tbl_gear, main = "Vehicle counts by gear types",
xlab = "Gear", ylab = "Vehicle counts")
```
## 長條圖(2)
- 不只能呈現 count
- 練習畫出下圖
```{r echo=FALSE}
vehicle_names <- row.names(mtcars)
barplot(mtcars$hp, names = vehicle_names,
main = "Horse power of each vehicle", xlab = "Horse power",
horiz = TRUE, las = 1, cex.names = 0.5)
```
## 長條圖(3)
- 在 bar 上方顯示數值
```{r}
y_lim <- c(0, 1.2 * max(table(mtcars$gear)))
bar_plt <- barplot(table(mtcars$gear), main = "Vehicle counts by gear types", xlab = "Gear", ylab = "Vehicle counts", ylim = y_lim)
text(x = bar_plt, y = table(mtcars$gear), label = table(mtcars$gear), pos = 3)
```
```{r}
vehicle_names <- row.names(mtcars)
x_lim <- c(0, 1.1 * max(mtcars$hp))
bar_plt <- barplot(mtcars$hp, names = vehicle_names, main = "Horse power of each vehicle", xlab = "Horse power", horiz = TRUE, las = 1, cex.names = 0.5, xlim = x_lim)
text(x = mtcars$hp, y = bar_plt, label = mtcars$hp, pos = 4, cex = 0.5)
```
## (Optional)氣泡圖
- 首先準備資料 `country`
```{r}
library(DBI)
con <- dbConnect(RMySQL::MySQL(),
dbname = "world",
host = "rsqltrain.ced04jhfjfgi.ap-northeast-1.rds.amazonaws.com",
port = 3306,
user = "trainstudent",
password = "csietrain")
country <- dbReadTable(con, "country")
dbDisconnect(con)
```
## (Optional)氣泡圖(2)
- 利用 `symbols()` 函數繪畫
```{r}
symbols(country$GNP, country$LifeExpectancy, circles = sqrt(country$Population / pi),
bg = factor(country$Continent), fg = "white",
xlab = "Income", ylab = "Life Expectancy")
```
## (Optional)將 `ylab` 轉換成水平
```{r}
?mtext
plot(cars, ylab = "")
mtext("Dist", side = 2, las = 1, line = 2)
#mtext("Dist", side = 2, las = 1, at = max(cars$dist) * 1.2)
```
## 如何學習作圖
- 查字典 vs. 讀字典
- 學習如何搜尋關鍵字
## 期中作業
- 用一個 2x2 的畫布練習使用 Base plotting system 繪製任意四個圖形