-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy path07-NumericData.Rmd
executable file
·274 lines (198 loc) · 10.5 KB
/
07-NumericData.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
# Control charts for numeric, normally-distributed data {#numeric}
| If your data involve... | use a/an ... | based on the ... distribution. |
| --------------------------------- | --------- | ----------------------------- |
| Individual points | *I* chart | normal |
| Subgroup average | $\bar{x}$ and *s* chart | normal |
| Exponentially weighted moving average | EWMA chart | normal |
| Cumulative sum | CUSUM chart | normal |
| Time between (rare) events | *t* chart | Weibull |
- For continuous data, the definition of the control limits will depend on your question and the data at hand. To detect small shifts in the mean quickly, an EWMA is probably best, while to understand natural variation and try to detect special cause variation, an $\bar{x}$ and *s* chart will be more useful.
- In the rare cases you may need an individual chart, do *not* use 3$\sigma$ for the control limits; you must use 2.66$MR_{bar}$ instead to ensure the limits are presented correctly.
- Note: EWMA and CUSUM charts aren't "standard" control charts in that the only guideline for detecting special cause variation is a point outside the limits. So while they can't detect special cause variation like control charts, they *can* detect shifts in mean with fewer points than a standard control chart. "MAY" DETECT SHIFTS? LOOK AT THE X-BAR S CHARTS FOR WAIT TIMES VERSUS THE EWMA CHART FOR WAIT TIMES.
<br/>
## *I-MR* chart
(Think we should move the IMR chart section just above the t-chart section -BB)
When you have a single measurement per subgroup, the *I-MR* combination chart is appropriate. They should always be used together.
**Mean($\bar{x}$):** $\bar{x} = \frac{\sum_{x_{i}}}{n}$
**Control limits for normal data (*I*):** 2.66$MR_{bar}$
*where*
$MR_{bar}$ = average moving range of *x*s, excluding those > 3.27$MR_{bar}$
<br/>
I DON'T SEE WHERE YOU'RE EXCLUDING VALUES OVER 3.27*MR OR IS THAT WHAT THE K DOES IN THE FIRST PLOT?
(Does excluding >3.27MR come from the t chart source (Provost and Murray)? I don't see it excluded in most IMR formulations, but added the exclusion the code below -BB)
*Lab results turnaround time*
```{r IMR_data}
# Generate fake data
arrival = cumsum(rexp(24, 1/10))
process = rnorm(24, 5)
exit = matrix( , length(arrival))
exit[1] = arrival[1] + process[1]
for (i in 1:length(arrival)) {
exit[i] = max(arrival[i], exit[i - 1]) + process[i]
}
# Calculate control chart inputs
subgroup.i = seq(1, length(exit))
subgroup.mr = seq(1, length(exit) - 1)
point.i = exit - arrival
point.mr = matrix(, length(point.i) - 1)
for (i in 1:length(point.i) - 1) {
point.mr[i] = abs(point.i[i + 1] - point.i[i])
}
mean.i = mean(point.i)
mean.mr0 = mean(point.mr)
mean.mr = mean(point.mr[point.mr<=3.27*mean.mr0])
sigma.i = rep(mean.mr, length(subgroup.i))
sigma.mr = rep(mean.mr, length(subgroup.mr))
```
Unlike the attribute control charts, the *I-MR* chart requires a little intepretation. The *I* portion is the data itself, but the *MR* part shows the variation over time, specifically, the range between successive data points.
Look at the *MR* part first; if it's in control, then any special cause variation in the *I* portion can be attributed to a change in process. If the *MR* chart out of control, the control limits for the *I* portion will be wrong, and should not be interpreted. ISN'T THIS THE CASE BELOW?
```{r MR_chart, fig.height = 3.5}
# Plot MR chart
spc.plot(subgroup.mr, point.mr, mean.mr, sigma.mr, k = 3.27,
lcl.show = FALSE, band.show = FALSE,
label.x = "Test number", label.y = "Turnaround time (moving range)")
```
```{r I_chart, fig.height = 3.5}
# Plot I chart
spc.plot(subgroup.i, point.i, mean.i, sigma.i, k = 2.66,
lcl.min = 0, band.show = FALSE,
label.x = "Test number", label.y = "Turnaround time")
```
<br/>
## $\bar{x}$ and *s* chart
When you have a sample or multiple measurements per subgroup, the $\bar{x}$ and *s* chart combination is the appropriate choice. As with the *I-MR* chart, they should always be used together.
Control limits (3σ) are calculated as follows:
**Variable averages ($\bar{x}$):** $3\frac{\bar{s}}{\sqrt{n_i}}$
**Variable standard deviation (*s*):** $3\bar{s}\sqrt{1-c_4^2}$
*where* $c_4 = \sqrt{\frac{2}{n-1}}\frac{\Gamma(\frac{n}{2})}{\Gamma(\frac{n-1}{2})}$
<br/>
*Patient wait times*
``` {r xex}
# Generate fake patient wait times data
set.seed(777)
waits = c(rnorm(1700, 30, 5), rnorm(650, 29.5, 5))
months = strftime(sort(as.Date('2013-10-01') + sample(0:729,
length(waits), TRUE)), "%Y-%m-01")
sample.n = as.numeric(table(months))
dfw = data.frame(months, waits)
# Calculate control chart inputs
subgroup.x = as.Date(unique(months))
subgroup.s = subgroup.x
point.x = aggregate(dfw$waits, by = list(months), FUN = mean, na.rm = TRUE)$x
point.s = aggregate(dfw$waits, by = list(months), FUN = sd, na.rm = TRUE)$x
mean.x = mean(waits)
mean.s = sqrt(sum((sample.n - 1) * point.s ^ 2) /
(sum(sample.n) - length(sample.n)))
sigma.x = mean.s / sqrt(sample.n)
c4 = sqrt(2 / (sample.n - 1)) * gamma(sample.n / 2) /
gamma((sample.n - 1) / 2)
sigma.s = mean.s * sqrt(1 - c4 ^ 2)
```
As with the *I-MR* chart, you need to look at the *s* chart first---if it shows special-cause variation, the control limits for the $\bar{x}$ chart will be wrong. If it doesn't, you can go on to interpret the $\bar{x}$ chart.
```{r s_chart, fig.height = 3.5}
# Plot s chart
spc.plot(subgroup.s, point.s, mean.s, sigma.s, k = 3,
label.x = "Month", label.y = "Wait times standard deviation (s)")
```
```{r x_chart, fig.height = 3.5}
# Plot xbar chart
spc.plot(subgroup.x, point.x, mean.x, sigma.x, k = 3,
label.x = "Month", label.y = "Wait times average (x)")
```
<br/>
## EWMA chart
**Control limits for exponentially weighted moving average (EWMA):** $3\frac{\bar{s}}{\sqrt{n_i}}\sqrt{\frac{\lambda}{2-\lambda}[1 - (1 - \lambda)^{2i}]}$
*where* $\lambda$ is a weight that determines the influence of past observations. If unsure choose $\lambda = 0.2$, but $0.05 \leq \lambda \leq 0.3$ is acceptable (where larger values give stronger weights to past observations).
<br/>
*Patient wait times (continued)*
``` {r ewmaex}
# Calculate control chart inputs
subgroup.z = subgroup.x
lambda = 0.2
point.z = matrix( , length(point.x))
point.z[1] = mean.x
for (i in 2:length(point.z)) {
point.z[i] = lambda * point.x[i] + (1 - lambda) * point.z[i-1]
}
mean.z = mean.x
sigma.z = (mean.s / sqrt(sample.n)) *
sqrt(lambda/(2-lambda) * (1 - (1-lambda)^(seq(1:length(point.z)))))
```
```{r EMWA_chart, fig.height = 3.5}
# Plot EWMA chart
spc.plot(subgroup.z, point.z, mean.z, sigma.z, k = 3, band.show = FALSE,
rule.show = FALSE, label.x = "Month",
label.y = "Wait times moving average")
```
<br/>
## CUSUM chart
Lower and upper cumulative sums are calculated as follows:
$S_{l,i} = -\max{[0, -z_i -k + S_{l,i-1}]},$
$S_{h,i} = \max{[0, z_i -k + S_{h,i-1}]}$
*where* $z_i$ is the standardized normal score for subgroup $i$ and $0.5 \leq k \leq 1$ is a slack value.
It is common to choose "decision limits" of $\pm 4$ or $\pm 5$.
<br/>
``` {r cusumex, fig.height = 3.5}
# Calculate control chart inputs
subgroup.cusum = subgroup.x
slack = 0.5
zscore = (point.x - mean.x)/sigma.x
point.cusuml = matrix(, length(zscore))
point.cusuml[1] = -max(0, -zscore[1] - slack)
for (i in 2:length(point.cusuml)) {
point.cusuml[i] = -max(0, -zscore[i] - slack - point.cusuml[i-1])
}
point.cusumh = matrix(, length(zscore))
point.cusumh[1] = max(0, zscore[1] - slack)
for (i in 2:length(point.cusuml)) {
point.cusumh[i] = max(0, zscore[i] - slack - point.cusumh[i - 1])
}
mean.cusum = 0
sigma.cusum = rep(1, length(subgroup.cusum))
````
```{r CUSUM_chart, fig.height = 3.5}
# Plot CUSUM chart
lower.plot = spc.plot(subgroup.cusum, point.cusuml, mean.cusum, sigma.cusum,
k = 5, band.show = FALSE, rule.show = FALSE,
label.y = "Wait Times? Cumulative sum")
lower.plot + geom_line(aes(y = point.cusumh), col = "royalblue3") +
geom_point(aes(y = point.cusumh), col = "royalblue3")
```
## Rare events: *t*-chart {#tchart}
If the time between rare events is best represented by a continuous time scale, use a *t*-chart. If a discrete time scale is reasonable, a *g*-chart (see the [previous chapter](#gchart)) may be simpler to implement and easier to interpret without transformation, though a *t*-chart is also acceptable.
### *t*-chart example
**Mean for time between events (*t*)(not shown):** $t = \bar{x}({y_i})$
*where*
$t$ = time between events, where *t* is always > 0
$y = t^{\frac{1}{3.6}}$
**Control limits for time between events (*t*)(not shown):** 2.66$MR_{bar}$
$MR_{bar}$ = average moving range of *y*s, excluding those > 3.27$MR_{bar}$
Note: *t* chart mean and limits can be transformed back to the original scale by raising those values to the 3.6 power. In addition, the y axis can be plotted on a log scale to make the display more symmetrical (which can be easier than explaining how the distribution works to a decision maker).
*Days between infections*
``` {r tex}
# Generate sample data using g-chart example data
y = linedays.btwn ^ (1/3.6)
mr = matrix(, length(y) - 1)
for (i in 1:length(y) - 1) {
mr[i] = abs(y[i + 1] - y[i])
}
#==================================================================================
# Is this the right way to interpret exclude > 3.27MR? Is this exclusion recursive?
#According to Provost and Murray (https://books.google.com/books?id=pRLcaOkswQsC) you first calculate MRbar, then remove all points that greater than 3.27*MRbar, then recalculate MRbar and use that recalculated MRbar for the UL and LL. which is what you seem to do. Made some edits to make this clear...maybe? I think?
#==================================================================================
mr = mr[mr <= 3.27*mean(mr)]
mr_prime = mean(mr)
# Calculate t chart inputs
subgroup.t = subgroup.g
point.t = y
central.t = mean(y)
sigma.t = rep(mr_prime, length(point.t))
```
```{r t_chart, fig.height = 3.5}
# Plot t chart
spc.plot(subgroup.t, point.t, central.t, sigma.t, lcl.show = FALSE,
band.show = FALSE, rule.show = FALSE,
lcl.min = 0, k = 2.66, label.x = "Infection number",
label.y = "Line days between infections (transformed)")
```
<br/>