---
title: "ggplot2 extensions: ggTimeSeries"
---
### ggTimeSeries
<https://github.com/Ather-Energy/ggTimeSeries>
This R package offers novel time series visualisations. It is based on `ggplot2` and offers `geom`s and pre-packaged functions for easily creating any of the offered charts. Some examples are listed below.
```{r, message=FALSE,warning=FALSE}
# Example from https://github.com/Ather-Energy/ggTimeSeries
library(ggplot2)
library(ggthemes)
library(data.table)
library(ggTimeSeries)
```
## Line Charts Legacy
IoT devices generate a lot of sequential data over time, also called time series data. Legacy portrayals of such data centre around line charts. Line charts have reportedly been around since the early 1700s (source: Wikipedia) and we have nothing against them. They facilitate trend detection and comparison, are simple to draw, and are easy to understand; all in all, a well-behaved visualisation. In modern times their use is widespread, from the heartbeat monitor at a hospital to the multiple-monitor display at a trader's desk.
```{r excel97_line, ext = 'png', fig.align = 'center', echo = FALSE, message = F, warning = F}
set.seed(10)
dfData = data.frame(
Time = 1:100,
Signal = abs(
c(
cumsum(rnorm(100, 0, 3)),
cumsum(rnorm(100, 0, 4)),
cumsum(rnorm(100, 0, 1)),
cumsum(rnorm(100, 0, 2))
)
),
Variable = c(rep('a', 100), rep('b', 100), rep('c', 100), rep('d', 100)),
VariableLabel = c(rep('Class A', 100), rep('Class B', 100), rep('Class C', 100), rep('Class D', 100))
)
Excel97Plot = ggplot(dfData, aes(x = Time, y = Signal, color = VariableLabel)) +
geom_line() +
geom_point() +
theme_excel() +
scale_colour_excel()
print("Excel 97 look recreated in R with the ggthemes package")
plot(Excel97Plot)
```
## Alternatives
However, there are cases when the data scientist becomes more demanding and specific. Five alternatives available to such a data scientist are listed below. All of these options are available as `geom`s or packaged functions in the `ggplot2`-based `ggTimeSeries` package.
Before that, we set a minimal theme:
```{r minimalTheme}
minimalTheme = theme_set(theme_bw(12))
minimalTheme = theme_update(
axis.ticks = element_blank(),
legend.position = 'none',
strip.background = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
panel.grid = element_blank()
)
```
### Calendar Heatmap
Available as `stat_calendar_heatmap` and `ggplot_calendar_heatmap`.
A calendar heatmap is a great way to visualise daily data. Its structure makes it easy to detect weekly, monthly, or seasonal patterns.
```{r calendar_heatmap, fig.align = 'center', echo = TRUE, message = F, warning = F}
# creating some data
set.seed(1)
dtData = data.table(
DateCol = seq(
as.Date("1/01/2014", "%d/%m/%Y"),
as.Date("31/12/2015", "%d/%m/%Y"),
"days"
),
ValueCol = runif(730)
)
dtData[, ValueCol := ValueCol + (strftime(DateCol,"%u") %in% c(6,7) * runif(1) * 0.75), .I]
dtData[, ValueCol := ValueCol + (abs(as.numeric(strftime(DateCol,"%m")) - 6.5)) * runif(1) * 0.75, .I]
# base plot
p1 = ggplot_calendar_heatmap(
dtData,
'DateCol',
'ValueCol'
)
# adding some formatting
p1 +
xlab('') +
ylab('') +
scale_fill_continuous(low = 'green', high = 'red') +
facet_wrap(~Year, ncol = 1)
# creating some categorical data
dtData[, CategCol := letters[1 + round(ValueCol * 7)]]
# base plot
p2 = ggplot_calendar_heatmap(
dtData,
'DateCol',
'CategCol'
)
# adding some formatting
p2 +
xlab('') +
ylab('') +
facet_wrap(~Year, ncol = 1)
```
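The calendar layout rests on splitting each date into a week index and a day-of-week index. The chunk below is a minimal base R sketch of that decomposition. It is an illustration only, not the package's internal code, and the helper name `calendar_coords` is hypothetical.

```{r calendar_coords_sketch}
# Hypothetical helper (not part of ggTimeSeries): map dates to
# calendar-grid coordinates.
calendar_coords = function(dates) {
  data.frame(
    Date = dates,
    Year = as.integer(strftime(dates, "%Y")),
    # weekday number: 1 = Monday, ..., 7 = Sunday
    Weekday = as.integer(strftime(dates, "%u")),
    # week of the year with Monday as the first day of the week (0 to 53)
    Week = as.integer(strftime(dates, "%W"))
  )
}
calendar_coords(as.Date(c("2014-01-01", "2014-01-05", "2014-01-06")))
```

Plotting `Week` on one axis and `Weekday` on the other, with one panel per `Year`, gives the familiar calendar grid.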
### Horizon Plots
Available as `stat_horizon` and `ggplot_horizon`.
Imagine an area chart which has been chopped into multiple chunks of equal height. If you overlay these chunks one on top of the other, and colour them to indicate which chunk it is, you get a horizon plot. Horizon plots are useful when visualising y values spanning a vast range but with a skewed distribution, and/or when trying to highlight outliers without losing context of variation in the rest of the data.
```{r horizon, fig.align = 'center', echo = TRUE, message = F, warning = F}
# creating some data
set.seed(1)
dfData = data.frame(x = 1:1000, y = cumsum(rnorm(1000)))
# base plot
p1 = ggplot_horizon(dfData, 'x', 'y')
print("If you're seeing any vertical white stripes, it's a display thing.")
# adding some formatting
p1 +
xlab('') +
ylab('') +
scale_fill_continuous(low = 'green', high = 'red') +
coord_fixed( 0.5 * diff(range(dfData$x)) / diff(range(dfData$y)))
```
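The chopping step described above can be sketched in base R. This is a rough illustration under stated assumptions: three bands measured from the minimum of the series, and a hypothetical helper named `horizon_bands`. It is not necessarily how `stat_horizon` computes its bands.

```{r horizon_bands_sketch}
# Hypothetical helper (not part of ggTimeSeries): assign each value to a
# band of equal height and record how far it reaches inside that band.
# Assumes y is not constant, otherwise the band height would be zero.
horizon_bands = function(y, n_bands = 3) {
  band_height = diff(range(y)) / n_bands
  offset = y - min(y)  # height above the lowest value
  # band index: 1 for the lowest band, n_bands for the highest
  band = pmin(floor(offset / band_height) + 1, n_bands)
  # height reached inside the point's own band; all lower bands are full
  within = offset - (band - 1) * band_height
  data.frame(y = y, band = band, within = within)
}
horizon_bands(c(0, 1, 2, 3, 4, 5, 6))
```

Drawing `within` as the area height and mapping `band` to the fill colour overlays the chunks in the same vertical space.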
### Steamgraphs
Available as `stat_steamgraph`.
A steamgraph is a more aesthetically appealing version of a stacked area chart. It tries to highlight the changes in the data by placing the groups with the most variance on the edges, and the groups with the least variance towards the centre. This feature in conjunction with the centred alignment of each of the contributing areas makes it easier for the viewer to compare the contribution of any of the components across time.
```{r steamgraph, fig.align = 'center', echo = TRUE, message = F, warning = F}
# creating some data
set.seed(10)
dfData = data.frame(
Time = 1:1000,
Signal = abs(
c(
cumsum(rnorm(1000, 0, 3)),
cumsum(rnorm(1000, 0, 4)),
cumsum(rnorm(1000, 0, 1)),
cumsum(rnorm(1000, 0, 2))
)
),
VariableLabel = c(rep('Class A', 1000), rep('Class B', 1000), rep('Class C', 1000), rep('Class D', 1000))
)
# base plot
p1 = ggplot(dfData, aes(x = Time, y = Signal, group = VariableLabel, fill = VariableLabel)) +
stat_steamgraph()
# adding some formatting
p1 +
xlab('') +
ylab('') +
coord_fixed( 0.2 * diff(range(dfData$Time)) / diff(range(dfData$Signal)))
```
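The centred alignment mentioned above can be illustrated with a small base R sketch. It assumes the simplest centring rule, a baseline at minus half of the total at each time point, and it does not reorder groups by variance. The helper `centred_stack` is hypothetical and is not taken from `stat_steamgraph`.

```{r centred_stack_sketch}
# Hypothetical helper (not part of ggTimeSeries): stack groups around a
# centred baseline. m is a matrix with one row per time point and one
# column per group (at least two groups).
centred_stack = function(m) {
  total = rowSums(m)
  baseline = -total / 2
  # running total across groups, kept as time points by groups
  upper = baseline + t(apply(m, 1, cumsum))
  lower = upper - m
  list(lower = lower, upper = upper)
}
m = cbind(a = c(1, 2), b = c(3, 2))
centred_stack(m)
```

Each group is drawn as a ribbon between its `lower` and `upper` edge, so the whole stack stays symmetric about zero at every time point.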
### Waterfall
Available as `stat_waterfall` and `ggplot_waterfall`.
Rather than the values themselves, a waterfall plot tries to bring out the changes in the values.
```{r waterfall, fig.align = 'center', echo = TRUE, message = F, warning = F}
# creating some data
set.seed(1)
dfData = data.frame(x = 1:100, y = cumsum(rnorm(100)))
# base plot
p1 = ggplot_waterfall(
dtData = dfData,
'x',
'y'
)
# adding some formatting
p1 +
xlab('') +
ylab('')
```
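The changes that a waterfall plot emphasises are the first differences of the series. A minimal base R sketch of the segments involved follows; the helper `waterfall_segments` is hypothetical and only illustrates the idea, not the internals of `stat_waterfall`.

```{r waterfall_segments_sketch}
# Hypothetical helper (not part of ggTimeSeries): one segment per step,
# running from the previous value to the current one.
waterfall_segments = function(x, y) {
  change = diff(y)
  data.frame(
    x = x[-1],
    start = head(y, -1),
    end = y[-1],
    change = change,
    direction = ifelse(change >= 0, "up", "down")
  )
}
waterfall_segments(1:4, c(10, 12, 9, 9))
```

Colouring the segments by `direction` makes rises and falls stand out, which a plain line chart leaves to the reader.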
### Occurrence Dot Plot
Available as `stat_occurrence`.
This one is a favourite in infographics. For rare events, the reader would find it convenient to have the count of events encoded in the chart itself instead of having to map the value back to the Y axis.
```{r occurrence_dotplot, fig.align = 'center', echo = TRUE, message = F, warning = F}
# creating some data
set.seed(1)
dfData = data.table(x = 1:100, y = floor(4 * abs(rnorm(100, 0, 0.4))))
# base plot
p1 = ggplot(dfData, aes(x = x, y = y)) +
stat_occurrence()
# adding some formatting
p1 +
xlab('') +
ylab('') +
coord_fixed(ylim = c(0,1 + max(dfData$y)))
```
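Encoding the count in the chart itself amounts to drawing one dot per event. The base R sketch below expands a table of counts into individual dot positions. It is an illustration under that assumption, and the helper `occurrence_dots` is hypothetical, not the code behind `stat_occurrence`.

```{r occurrence_dots_sketch}
# Hypothetical helper (not part of ggTimeSeries): expand counts into one
# row per dot, stacked at heights 1, 2, ..., count for each x.
occurrence_dots = function(x, count) {
  data.frame(
    x = rep(x, times = count),
    dot = sequence(count)
  )
}
occurrence_dots(1:3, c(2, 0, 1))
```

An `x` with a count of zero contributes no rows, so periods without events simply stay empty.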