-
Notifications
You must be signed in to change notification settings - Fork 0
/
Chapter_8.R
407 lines (334 loc) · 16.2 KB
/
Chapter_8.R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
# ========================================================================================
# Multi-parameter Evidence Synthesis --------------------------------------
# ========================================================================================
library(rjags)
library(R2jags)
library(bayesplot)
library(tidyverse)
library(parallel)
options(mc.cores = detectCores())
set.seed(20202204)
# ========================================================================================
# Prior and posterior simulation in a probabilistic model: Maple Syrup Urine Disease (MSUD)
# ========================================================================================
# Each of these basic parameters requires prior distributions, and as they are probabilities,
# Beta distributions are a natural choice. We choose minimally informative (uniform)
# distributions, Beta(1,1). Next we form expressions for the functional parameters that can
# be used to assess the impact of screening on the numbers of MSUD-associated cases of MR.
# Functional parameters can, by definition, be written as functions of basic parameters.
# Pr (mental retardation in MSUD with screening):
# theta[sm] = phi[s] * theta[em] + (1 - psi[s]) * theta[lm]
# Pr (mental retardation in MSUD without screening):
# theta[nm] = phi[n] * theta[em] + (1 - psi[n]) * theta[lm]
# Birth rate of MSUD-associated retardation with screening:
# theta[s] = 100, 000 * r * theta[sm]
# Birth rate of MSUD-associated retardation without screening:
# theta[n] = 100, 000 * r * theta[nm]
# Reduction in MSUD-associated retardation due to screening:
# e[d] = theta[s] - theta[n]
model_String <- "model {
# Likelihood models:
# for no. of
# MSUD cases:
r.r ~ dbin(r, n.r)
# no. detected early
# (screening):
r.s ~ dbin(phi.s, n.s)
# no. deteced early
# (no screening):
r.n ~ dbin(phi.n, n.n)
# no. retarded
# (early detection):
r.em ~ dbin(theta.em, n.em)
# no retarded
# (late detection):
r.lm ~ dbin(theta.lm, n.lm)
# Priors for basic parameters:
r ~ dbeta(1, 1)
phi.s ~ dbeta(1, 1)
phi.n ~ dbeta(1, 1)
theta.em ~ dbeta(1, 1)
theta.lm ~ dbeta(1, 1)
# Functional parameters:
theta.sm <- phi.s * theta.em + (1 - phi.s) * theta.lm
theta.nm <- phi.n * theta.em + (1 - phi.n) * theta.lm
theta.s <- (theta.sm * r) * 100000
theta.n <- (theta.nm * r) * 100000
e.d <- theta.s - theta.n
}
"
writeLines(text = model_String, con = "MSUDmod.txt")
Data_jags <- list(n.r = 724262, r.r = 7, n.s = 276, r.s = 253, n.n = 18,
r.n = 8, n.em = 10, r.em = 2, n.lm = 10, r.lm = 10)
Data_jags
params <- c("e.d", "theta.sm", "theta.nm", "theta.s", "theta.n")
jags_Mod <- jags(data = Data_jags, model.file = "MSUDmod.txt", parameters.to.save = params,
n.chains = 2, n.iter = 30000, n.burnin = 10000)
jags_Mod
posterior <- as.array(jags_Mod$BUGSoutput$sims.array)
color_scheme_set("viridisA")
theme_set(theme_minimal())
mcmc_trace(posterior, pars = params,
facet_args = list(ncol = 1, strip.position = "left"))
color_scheme_set("viridisB")
theme_set(theme_minimal())
mcmc_trace(posterior, pars = params,
size = 1, ) +
panel_bg(fill = "white", color = NA) +
legend_move("top")
color_scheme_set("mix-teal-pink")
mcmc_dens_overlay(posterior, pars = params)
color_scheme_set("pink")
mcmc_pairs(posterior, pars = params,
off_diag_args = list(size = 1.5))
color_scheme_set("mix-blue-brightblue")
mcmc_acf(posterior, pars = params, lags = 10)
# Note that in this particular case all the data directly informs five basic parameters,
# and these parameters are sufficient to define the target quantity e.d, the reduction in
# retardation that would follow screening for MSUD. The simplicity of the problem lies in
# the fact that there are no data directly informing the functional parameters. Further,
# because the Beta priors are conjugate with the Binomial likelihood, we can obtain the
# posterior distributions for the basic parameters in closed form. A Beta(a, b) prior
# together with binomial data with numerator y and denominator n gives us a
# Beta(a + y, b + n ! y) posterior. It is this that makes it possible to sample from the
# Bayesian posterior distributions using standard forward MC simulation. In the next
# example, however, forward MC simulation would not be sufficient and we must appeal to
# more powerful methods for Bayesian updating, such as MCMC, which do not rely on conjugacy.
# ========================================================================================
# A model for Prenatal HIV testing ----------------------------------------
# ========================================================================================
# The following analysis is simplified and somewhat stylised, but it illustrates an
# important characteristic of the data sources that must often be used in studies of
# screening, namely that data may not be available on the key parameters, but may instead
# be available on functions of the parameters. Also, and in contrast to the MSUD example,
# there may be data on more functions of parameters than there are basic parameters, which
# raises the possibility of conflict between sources of evidence, or – to put it another
# way – a chance to validate parameter estimates against independent evidence.
# The critical parameters in an economic analysis of screening are those that relate to the
# prevalence of the undetected condition. Similarly, in a comparison of universal and
# targeted testing, the key parameters are those relating to the prevalence of undiagnosed
# infection in the Low Risk group, as the higher risk groups will be covered by both
# screening strategies. The crucial parameters are, therefore, e, the prevalence of HIV in
# the Low Risk, and h, the proportion of Low Risk HIV infected who are already diagnosed.
# This intuition is confirmed by the Net Benefit equation. If M is the net monetary benefit
# of an early diagnosis, T is the cost of an HIV test, then the Incremental Net
# Benefit (INB) of universal relative to targeted screening for the women is:
# INB = (1 - a - b) * (Me(1 - h) - T(1 - eh))
# If this example was like MSUD, an investigator would be able to look through the
# literature and find data sources that informed each of the four parameters in the Net
# Benefit function. These could then be represented by suitable Beta distributions. Samples
# would then be drawn repeatedly from these distributions and INB calculated on each cycle,
# and finally a distribution of INB would be obtained. The problem, however, is that there
# are no data that directly inform the key parameters e and h, and it is difficult to
# imagine that there ever could be.
# An introductory synthesis exercise --------------------------------------
# We set ourselves the task of writing BUGS code to estimate the five parameters a, b, c,
# d and e from the first six items of data. We begin by assigning priors to these five
# basic parameters. For c, d and e we can straightforwardly assign vague uniform priors
# Beta(1, 1). For a and b we have to add a constraint that (a + b ≤ 1) because the
# proportions in each risk group must each be greater or equal to zero, but must always sum
# to 1. This is a Dirichlet distribution. A convenient way to represent a Dirichlet
# distribution in BUGS is to express it as a series of binomial distributions. For example,
# the code
# a ~ dbeta(1, 2)
# z ~ dbeta(1, 1)
# b <- z * (1 - a)
# sets up three variables: a, z = (1 - a), and b = z * (1 - a), where a is a probability,
# and b is a proportion of 1 - a, so that the required constraint is satisfied whatever
# values a and b may take. Note also that the priors give a, b and (1 - a - b) equal unit
# weight, which represents a vague uniform Dirichlet(1, 1, 1).
# The remaining basic parameters can be given uniform distributions:
# c ~ dbeta(1, 1)
# d ~ dbeta(1, 1)
# e ~ dbeta(1, 1)
# The next step is to define the relationships between the basic parameters and the
# functional parameters that the data estimates directly. Here it is convenient to create a
# vector p[ ], which in effect monitors the fitted values:
# p[1] <- a
# p[2] <- b
# p[3] <- c
# p[4] <- d
# p[5] <- (b * d + (1 - a - b) * e / (1 - a))
# p[6] <- (a * c + b * d + (1 - a - b) * e)
# The final step is to specify the Binomial likelihood, looping over the six items, and then
# to supply the data, in the form of a vector r[ ] for the numerators and a vector n[ ] for
# the denominators.
# So, the declared model logic takes the form:
model_String <- "model {
# Binomial likelihood:
for (i in 1:6) {
r[i] ~ dbin(p[i], n[i])
}
# Declared relationships
# between basic and
# functional parameters:
p[1] <- a
p[2] <- b
p[3] <- c
p[4] <- d
p[5] <- (b * d + (1 - a - b) * e / (1 - a))
p[6] <- (a * c + b * d + (1 - a - b) * e)
# Estimated basic
# prior parameters:
a ~ dbeta(1, 2)
c ~ dbeta(1, 1)
d ~ dbeta(1, 1)
e ~ dbeta(1, 1)
z ~ dbeta(1, 1)
# Estiamted functional
# prior parameters:
b <- z * (1 - a)
}
"
writeLines(text = model_String, con = "SyntEx.txt")
Data_jags <- list(
r = c(11044, 12, 252, 10, 74, 254),
n = c(104577, 882, 15428, 473, 136139, 102287)
)
Data_jags
params <- c("a", "b", "c", "d", "e", "z")
jags_Mod <- jags(data = Data_jags, model.file = "SyntEx.txt",
parameters.to.save = params, n.iter = 50000,
n.burnin = 10000, n.chains = 2)
jags_Mod
posterior <- as.array(jags_Mod$BUGSoutput$sims.array)
color_scheme_set("viridisA")
theme_set(theme_minimal())
mcmc_trace(posterior, pars = params,
facet_args = list(ncol = 1, strip.position = "left"))
color_scheme_set("viridisB")
theme_set(theme_minimal())
mcmc_trace(posterior, pars = params,
size = 1, ) +
panel_bg(fill = "white", color = NA) +
legend_move("top")
color_scheme_set("mix-teal-pink")
mcmc_dens_overlay(posterior, pars = params)
color_scheme_set("pink")
mcmc_pairs(posterior, pars = params,
off_diag_args = list(size = 1.5))
color_scheme_set("mix-blue-brightblue")
mcmc_acf(posterior, pars = params, lags = 10)
# ========================================================================================
# Model criticism in multi-parameter models -------------------------------
# ========================================================================================
# As we shall see, statistical investigation can reveal that inconsistency is present, but
# is unlikely by itself to tell us which data input is the source of that inconsistency.
# A further issue to consider in advance of any analysis is the number of potential
# inconsistencies that there can be.
# A natural approach is to compare the number of parameters (nine) with the number of
# functions of parameters on which data is available, and regard the difference (three) as
# a measure of the number of independent inconsistencies there could be in the data set.
# We might say that there are three degrees of freedom for inconsistency.
# Inconsistency in the HIV data --------------------------------------------
# The formal way to examine this is by cross-validation. Taking as an example, data point
# 4, we need to form a predictive distribution for the numerator r[4], based only on data
# points 1 – 3 and 5 – 12, that is excluding data point 4, and then compare this predictive
# distribution with the original data.
# To do this the priors and the model sections of the BUGS/JAGS code remain unaltered, but
# the likelihood is rewritten to exclude r[4]. The data list also remains unaltered. The
# cross-validation method then requires us to create a ‘replicate’ binomial variable based
# on the p[4] (d), now estimated without the benefit of data point 4, and n[4], and
# compare this to the original r[4].
model_String <- "model {
# Binomial likelihood:
for (i in 1:3) {
r[i] ~ dbin(p[i], n[i])
}
for (i in 5:12) {
r[i] ~ dbin(p[i], n[i])
}
# Declared relationships
# between basic and
# functional parameters:
p[1] <- a
p[2] <- b
p[3] <- c
p[4] <- d
p[5] <- (b * d + (1 - a - b) * e / (1 - a))
p[6] <- (a * c + b * d + (1 - a - b) * e)
p[7] <- (a * c * f) / ((a * c * f) + (b * d * g) + (e * h * (1 - a - b)))
p[8] <- (b * d * g) / ((b * d * g) + (e * h * (1 - a - b)))
p[9] <- ((a * c * d) + (b * d * g) + (e * h * (1 - a - b)) /
((a * c) + (b * d) + e * (1 - a - b)))
p[10] <- g
p[11] <- w
p[12] <- ((b * d / ((b * d) + e * (1 - a - b)))) + (e * w) * (1 - a - b) /
((b * d) + e * (1 - a - b))
# Estimated basic
# prior parameters:
a ~ dbeta(1, 2)
c ~ dbeta(1, 1)
d ~ dbeta(1, 1)
e ~ dbeta(1, 1)
f ~ dbeta(1, 1)
g ~ dbeta(1, 1)
h ~ dbeta(1, 1)
w ~ dbeta(1, 1)
z ~ dbeta(1, 1)
# Estiamted functional
# prior parameters:
b <- z * (1 - a)
r.rep ~ dbin(p[4], n[4])
p.xval <- step(r.rep - r[4]) - 0.5 * equals(r.rep, r[4])
dev[4] <- 0
}
"
writeLines(text = model_String, con = "ModCrit.txt")
Data_jags <- list(
r = c(11044, 12, 252, 10, 74, 254, 43, 4, 87, 12, 14, 5),
n = c(104577, 882, 15428, 473, 136139, 102287, 60, 17, 254, 15, 118, 31)
)
Data_jags
params <- c("p", "p.xval", "r.rep")
jags_Mod <- jags(data = Data_jags, model.file = "ModCrit.txt",
parameters.to.save = params, n.iter = 50000,
n.burnin = 10000, n.chains = 2)
jags_Mod
# The posterior for p.xval and hence the probability of observing a sample r[4] as high as
# 10 with a sample of n[4], given our model and the remaining data is 0.004.
posterior <- as.array(jags_Mod$BUGSoutput$sims.array)
color_scheme_set("viridisA")
theme_set()
mcmc_trace(posterior, pars = c("r.rep", "p[4]"),
facet_args = list(ncol = 1, strip.position = "left"))
color_scheme_set("viridisB")
theme_set(theme_minimal())
mcmc_trace(posterior, pars = c("r.rep", "p[4]"),
size = 1, ) +
panel_bg(fill = "white", color = NA) +
legend_move("top")
color_scheme_set("mix-teal-pink")
mcmc_dens_overlay(posterior, pars = c("p[4]", "r.rep"))
color_scheme_set("pink")
mcmc_hist(x = posterior, pars = c("r.rep"), binwidth = 1)
color_scheme_set("mix-blue-brightblue")
mcmc_acf(posterior, pars = c("r.rep"), lags = 10)
# ========================================================================================
# Summary of key points ---------------------------------------------------
# ========================================================================================
# We distinguish between forward Monte Carlo simulation from distributions of parameters,
# and Monte Carlo sampling from the joint posterior distribution of parameters.
# Multi-parameter evidence synthesis combines evidence not just from different sources, but
# on different parameters, within a coherent overall model. This may often involve more
# sources of evidence, or more types of sources, than there are parameters, introducing the
# possibility of inconsistency between different evidence sources.
# But, equally, the possibility of inconsistency is also an opportunity for independent
# validation of a model.
# We emphasise the advantages of assessing the potential sources of bias in evidence before
# attempting synthesis, due to the dangers of post hoc adjustments.
# Bayesian multi-parameter evidence synthesis meets the requirements we should expect from
# methods under-pinning ‘evidence-based’ policy: incorpo- ration of all available evidence,
# assessment of inconsistency, and uncertainty propagation.
# The danger with post hoc adjustments intended to resolve inconsistency is that the same
# inconsistency can be resolved equally well by removing any one of a number of data points,
# or adding bias parameters in several different parts of the network. Each different
# approach constitutes an alternative model with different parameter values, and potentially
# with fundamentally different public health impli- cations, and yet there is no statistical
# way of choosing between them. The entire analysis becomes prey to selective
# interpretation. Therefore, it is clearly preferable to identify potential sources of bias
# attaching to each data point prior to putting data together in a synthesis.
# Thus! the power and convenience of the method must not be allowed to lure the user away
# from a careful consideration of the assumptions being made, nor from careful attention to
# convergence.
# End file ----------------------------------------------------------------