forked from corona-warn-app/cwa-documentation
-
Notifications
You must be signed in to change notification settings - Fork 0
/
transmission_risk.Rmd
965 lines (764 loc) · 65.3 KB
/
transmission_risk.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
---
title: "Epidemiological Motivation of the Transmission Risk Level"
author: "CWA Team"
date: "`r Sys.Date()`"
output:
pdf_document:
toc: yes
latex_engine: xelatex
number_sections: true
fig_caption: yes
keep_tex: true #necessary for proper debugging
html_document:
theme: united
toc: yes
number_sections: true
word_document:
toc: yes
editor_options:
chunk_output_type: console
header-includes:
- \usepackage{float}
- \usepackage{romannum}
- \usepackage{hyperref}
- \hypersetup{colorlinks = true, linkcolor = [rgb]{.1,.1,.44}, urlcolor = blue, citecolor = [rgb]{.0,.39,.0}}
bibliography: transmission_risk_references.bib
---
```{r setup_knitr, include=FALSE, echo=FALSE, purl=FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.width = 8, fig.height = 2.5, fig.align="center")
options(knitr.duplicate.label = "allow")
knitr::opts_chunk$set(fig.pos = "H", out.extra = "")
```
```{r setup, include=FALSE, echo=FALSE, purl=TRUE, eval=TRUE}
library(tidyverse)
# Default ggplot theme
theme_set(theme_minimal())
# Fix seed for reproducibility
set.seed(42)
```
```{r init, ref.label=c("convolute","helper_pmf_plot", "helper_discretize"), eval=TRUE, echo = FALSE, purl=FALSE}
```
\pagenumbering{arabic}
# Abstract
This document contains an epidemiological description of the *transmission risk level* used in the German [Corona-Warn-App](https://github.com/corona-warn-app/cwa-documentation) (CWA). As its name suggests, the transmission risk is an essential part when estimating the overall risk of a person to get infected in an exposure incident. Usage of the transmission risk level is specified in the [ExposureNotification API](https://developer.apple.com/documentation/exposurenotification) and in the [CWA Architecture](https://github.com/corona-warn-app/cwa-documentation/blob/master/solution_architecture.md#risk-score-calculation). In particular we use epidemiological information about COVID-19 from the literature to motivate the choice of levels for this parameter. To enhance transparency and reproducibility of the computations, we provide the mathematical derivations and the computations in one [Rmarkdown](https://rmarkdown.rstudio.com/) document. The methods sketched below are likely to be subject to change, once additional information about the characteristics of COVID-19 is obtained or as feedback from the use of the app arrives.
# Introduction
We are interested in the situation where a person **A** (potential infector) at time $t_0$ uploads information about being a laboratory confirmed SARS-CoV-2 case. The upload happens in terms of *A*'s *diagnosis keys* [see @ExposureNotificationCrypto]. Each *diagnosis key* is associated with a particular day in A's history (current and past 14 days) and also has an optional *transmission risk level* from I--VIII [see @AndroidExposureNotificationsApi].
Users can periodically download *diagnosis keys* from the *diagnosis server*. For each app user **B** (potential infectee), who downloads the list of valid *diagnosis keys* and discovers he or she has been in contact with *A*, a risk assessment will be made.[^a2b] This risk assessment is operationalized by the *total risk score*, of which one component is the *transmission risk level* $\lambda_A$ computed by *A*. The *transmission risk level* (provided by *A*) and its associated *transmission risk value* (set by *B*) are app-defined and should be based on the probability of transmission between the two persons being in close contact. In the present document we interpret this probability as a function of epidemiological information about *A* and the time of contact. Information about the closeness and the duration of the contact are not considered part of the transmission risk component, because they are handled separately in the computation of the *total risk score*. For more information on how to calculate the *total risk score* see the [Exposure Notification API](https://developer.apple.com/documentation/exposurenotification).
As the *transmission risk level* is computed on *A*'s device, additional information about *A* such as *being symptomatic or not*, *date of onset of symptoms*, *date of sampling* or *date of test result* could be used to estimate the **infectiousness of A** more precisely. We will use the currently known characteristics of COVID-19, especially its [infectiousness profile due to viral shedding](#viral-shedding) and the [operational delays](#delays) of its handling to estimate the infectiousness at certain times. We can obtain the information on how many days ago from **now** (i.e. $t_0$) the contact between *A* and *B* was: Let $t_C$ be the time of contact between *A* and *B*, then the contact was $d=t_0-t_C$ days ago. The aim of this document is thus to parametrize the time-dependent infectiousness of *A* as a function of $d$.
The better we can assess the probability of a transmission from *A* to *B*, the more accurate is the *combined risk score*[^risk] that is used to warn the user to take further action, e.g., to contact a local health authority. Having digital support for this type of contact tracing appears helpful in order to obtain a more complete coverage of contact tracing and to do this much quicker.
The present document is structured as follows. In section [Scenarios](#scenarios) we distinguish between four possible information states about *A* at the time of upload[^upload] depending on whether an onset of COVID-19 symptoms has occurred or not and whether this information can be used or not. We calculate a transmission risk for each of the four cases. However, since the initial version of the app did not allow a distinction between cases 1--4, and since the *transmission risk level* is only one of 4 components for the *total risk score*, we thus normalize the *transmission risk level* in a so called [base case](#base_case), which was used by the initial version of the app and is still used if no information on symptoms is provided. A section [Discussion](#discussion) summarizes the results and points out important limitations.
[^a2b]: Throughout we will assume that the contact between *A* and *B* was such that *A* infected *B*. At this point we explicitly ignore the possibility that *B* actually infected *A*.
[^risk]: For a more detailed description of the connection between the *total risk score* and the *combined risk score* see the [risk score calculation section](https://github.com/corona-warn-app/cwa-documentation/blob/master/solution_architecture.md#risk-score-calculation) of the [solution architecture document](https://github.com/corona-warn-app/cwa-documentation/blob/master/solution_architecture.md).
[^upload]: Due to technical restrictions, the upload of the diagnosis keys may not be a single-point-in-time event, but may happen over two consecutive days, without further interaction with the user after his or her initial consent. Given that the time point of the user's consent for upload may also be the last opportunity for providing additional information such as date of onset of symptoms, we use "consent for upload" when calculating delays and refer to it as simply "upload" throughout this document.
# Scenarios and Events {#scenarios}
Since the API allows for a customization of the transmission component $\lambda_A$ in the above, we shall study it in more detail here. Particular interest will be in four information scenarios about *A*, the potential infector, at the time of the upload.
For an infected person the following sequence of event times occurs (but not necessarily in the given order):
* $T_E=T_{\text{infection}}$: transmission of SARS-CoV-2 to an **e**xposed person *A* from some unknown source
* $T_I=T_{\text{infectious}}$: start of the infectious period in person *A*, i.e. *A* is able to infect others
* $T_S=T_{\text{symptoms}}$: onset of symptoms in person *A* (also referred to as **DSO**, day of symptom onset)
* $T_P=T_{\text{sampling}}$: time of sampling of person *A* [dt. **P**robenentnahme]
* $T_R=T_{\text{result}}$: time of *A* obtaining the positive test result
* $T_U=T_{\text{upload}}$: time where person *A* uploads the positive test result to the system
Note that before observation these times are to be considered as random quantities and, hence, are denoted by an uppercase letter. Once observed, a lower case letter shall be used.
Furthermore, note that for SARS-CoV-2 it is likely that $T_I$ occurs before $T_S$, but it cannot be ruled out that the order is reversed. For never-symptomatic cases $T_S$ will never occur, but $T_I$ might already have occurred or will occur in the future. For pre-symptomatic cases $T_S$ lies in the future, i.e. $T_S > t_0$, but $T_I$ might already have occurred or lie in the future. From our above description we would usually have $T_{\text{upload}}=t_0$. In a later section we will study the delays between the different event times. For this we will have to use the [convolution of random variables](#convolution) to get the distribution of their sum.
To derive the transmission component $\lambda_A$ we will distinguish persons who will eventually develop symptoms and those which are never-symptomatic. The former set will be denoted $\mathcal{Symp}$ and the later $\mathcal{N\!symp}$. Throughout the text we will use the shorthand notation $\mathcal{N\!symp}_A$ to denote the event that person *A* belongs to the set of never-symptomatic. Likewise for $\mathcal{Symp}_A$. We reserve the notion of *never-symptomatic* for those persons who never develop symptoms, whereas *pre-symptomatic* at a particular time $t$ are those who will eventually develop symptoms, but at a later point in time than $t$. If $T_S^A$ denotes the time of symptom onset in *A*, we will use the shorthand notation $\mathcal{Symp}_A(T^A_S > t)$ to denote the event that *A* belongs to the set of individuals who will eventually develop symptoms, but the onset of these symptoms has not yet occurred by time $t$, i.e.
$$
\mathcal{Symp}_A(T^A_S > t) = \{ \omega\in\mathcal{Symp}_A \;|\; \omega: T^A_S > t \}.
$$
Likewise we define the event $\mathcal{Symp}_A(T^A_S \leq t)$. Both never-symptomatic and pre-symptomatic are characterized as being *asymptomatic* at a given time of reference $t$. We will use the following shorthand to denote this event:
$$
\mathcal{Asymp}_A(T^A_S > t) = \mathcal{N\!symp}_A \cup \mathcal{Symp}_A(T^A_S > t),
$$
which is just a formal way of saying that in order for *A* to be asymptomatic at time t, *A* either belongs to the set of never-symptomatic cases or *A* is pre-symptomatic at time $t$.
Then we have (under the assumption that *A* is infected)
$$
\begin{aligned}
1 &= P(\mathcal{Symp}_A) + P(\mathcal{N\!symp}_A) \\
&= P(\mathcal{Symp}_A(T^A_S \leq t) + P(\mathcal{Symp}_A (T^A_S > t)) + P(\mathcal{N\!symp}_A) \\
&= P(\mathcal{Symp}_A(T^A_S \leq t)) + P(\mathcal{Asymp}_A (T^A_S > t))
\end{aligned}
$$
The four information scenarios about *A* are then:
1. symptomatic and day of symptom onset $t^A_S \leq t_0$ known at time $t_0$, i.e. the event $\mathcal{Symp}_A(T^A_S = t^A_S)$,
2. symptomatic but unknown day of symptom onset at time $t_0$, i.e. the event $\mathcal{Symp}_A(T^A_S \leq t_0)$,
3. asymptomatic at time $t_0$, i.e. the event $\mathcal{Asymp}_A(T^A_S>t_0)$,
4. no knowledge of symptom status at time $t_0$ (the *base case*), i.e. the event $\Omega = \mathcal{Symp}_A(T^A_S \leq t_0) \cup \mathcal{Asymp}_A (T^A_S > t_0)$.
Differentiation of these scenarios requires person *A* to provide additional (optional) information on potential symptom onset and respective date. The 2nd scenario occurs, if *A* accepts to reveal that COVID-19 relevant symptoms have been observed by the time of upload, but *A* does not want to reveal (or does not know) the day of symptom onset. If *A* despite a positive test result either has not yet developed or never will develop any symptoms, then we would be in scenario 3. If *A* does not provide any additional information, this would lead to scenario 4.
## Infectiousness Profile due to Viral Shedding {#viral-shedding}
Infectiousness of COVID-19, i.e. how much infectious material is being shed, varies as a function of time since infection, the development of symptoms and (if available) the DSO, see for example @he_etal2020. Assuming the amount of virus shed by a symptomatic case *A* is described by the function $v_A(d)$, where $d$ is the days since DSO in *A*. We expect the function to be positive even for negative $d$ values, due to pre-symptomatic transmission. Note that the scale of $v_A(d)$ is in principle arbitrary for our purposes as we are interested in the amount of virus shedding compared to the potential maximum value which informs the eventual classification into the risk levels I to VIII. Note also that symptomatic cases with unknown date of onset and never-symptomatic cases are not handled by the function $v_A(d)$. These will be addressed later in this section.
The study by @he_etal2020 examines the temporal dynamics of viral shedding and infectiousness of symptomatic COVID-19 cases. They provide results informing on the transmission risk for contacts of symptomatic cases.
Based on 77 identified transmission pairs @he_etal2020 inferred the infector's infectiousness profile with respect to symptom onset by fitting a left-shifted gamma-distribution to the empirical frequency of observed transmissions occurring at $d$ days before or after symptom onset of the infector. The left-shifting of the gamma-distribution allows for potential pre-symptomatic transmission. The resulting infectious profile is shown in Fig. 1c (middle) in @he_etal2020 and is subsequently
corrected in @he_etal2020correction, suggesting that infectiousness starts as early as 5-6 days prior to DSO and ends 8 days after DSO, with peak infectiousness at one day before DSO.
Thus, for the profile function $v_A(d)$ we implement a discretized version of the infectiousness profile as inferred by the corrected version of @he_etal2020. The discretized profile for each day $d$ computes the mean infectiousness within $[d,d+1)$, for instance the value $v_A(-2)$ refers to the mean infectiousness within the interval $[-2,-1)$ days, i.e. with respect to time of symptom onset. The profile $v_A(d)$ is normalized such that the maximum infectiousness by day equals one, which occurs at $d = - 1$.
<!-- He et al. (2020) work -->
```{r script_from_He, eval=FALSE, cache=TRUE, echo=FALSE}
# Run Script from He et al. (2020) - git clone https://github.com/ehylau/COVID-19
wd <- getwd()
setwd("he_etal2020")
source("Fig1c_RScript.R", local = FALSE)
setwd(wd)
```
```{r infectiousness_profile_as_in_He, echo=FALSE}
# Based on the work of He et al. (2020) we discretize the infectious profile
# in their Fig 1c middle.
# The following parameter estimates for the shifted gamma distribution are derived
# from the "Fig1c_RScript.R" script referenced above and hardcoded here for ease
# of computation.
inf.par <- c(2.11577895060007, 0.689858288192386, 2.30669123253302)
inf.par <- c(20.516508, 1.592124, 12.272481)
# Grid where to compute the values
x_grid <- seq(-7, 14, by = 1)
# Calculate CDF of the shifted gamma
cdf <- pgamma(x_grid + inf.par[3], inf.par[1], inf.par[2])
# Discretize to get PMF and name the vector with the support from -3, ... 13
pmf <- cdf[-1] - cdf[-length(cdf)]
names(pmf) <- x_grid[-length(x_grid)]
# Show result
(d_infprofile_check <- round(pmf, digits = 3))
```
```{r, echo=FALSE}
# Original values resulting from He et al. (2020)
# d_infprofile <- c(
# "-3" = 0.015, "-2" = 0.185, "-1" = 0.237, "0" = 0.197, "1" = 0.139,
# "2" = 0.091, "3" = 0.057, "4" = 0.034, "5" = 0.02, "6" = 0.011,
# "7" = 0.006, "8" = 0.004, "9" = 0.002, "10" = 0.001, "11" = 0.001,
# "12" = 0
# )
# Corrected values from He et al. (2020) - https://github.com/ehylau/COVID-19
# see below
```
```{r infectiousness_profile, fig.cap = "Assumed infectiousness profile."}
# Discretized values resulting from from He et al. (2020) correction
d_infprofile <- c(
"-7" = 0.002, "-6" = 0.009, "-5" = 0.025, "-4" = 0.054,
"-3" = 0.090, "-2" = 0.122, "-1" = 0.140, "0" = 0.140, "1" = 0.124,
"2" = 0.100, "3" = 0.073, "4" = 0.049, "5" = 0.031, "6" = 0.019,
"7" = 0.010, "8" = 0.006, "9" = 0.003, "10" = 0.001, "11" = 0.001,
"12" = 0, "13" = 0
)
d_infprofile <- d_infprofile / max(d_infprofile)
ggplot_pmf(d_infprofile) +
xlab("Time relative to symptom onset (Days) of infector") +
ylab("relative infectiousness") +
coord_cartesian(ylim = c(0, 1))
```
It should be noted that the inferred infectious profile by @he_etal2020 has some bias, since the infectiousness profile represents the conditional probability density of the time point of transmission relative to the infector's date of symptom onset, given that a transmission from A to B occurs. Thus, for transmission pairs with frequent contacts within a wide exposure window, this observed time of transmission merely represents the time of 'first successful transmission'. However, this eliminates the possibility for further transmissions (to be observed) between these persons, although the actual infectiousness of the infector might persist (and even further increase) after the observed transmission. The infectiousness was estimated to have completely vanished by 8 days after symptom onset, which might be just a consequence of this bias.
An indicator for this bias is the amount of virus shedding as a function of time since onset of symptoms, which is also provided within @he_etal2020. Virus shedding very likely corresponds well to the actual infectiousness of a symptomatic person and it was found that viral load only gradually decays until the end of the testing window (21 days after symptom onset), which suggests that infectiousness persists even beyond 8 days after symptom onset as inferred through the transmission pairs. However, a study by @woelfel_etal2020 suggests, that persistent viral shedding beyond 8 days after symptom onset does not yield viable viruses that could be amplified in cell culture, and @arons_etal2020 observed in a 'Skilled Nursing Facility' setting, that viable viruses were recovered more often in the days prior to symptom onset than in the subsequent days. Thus, the exact connection between the amount of shedding and the corresponding risk of transmitting the virus is still unclear.
Still, a more realistic infectiousness profile could be informed by also accounting for the duration of virus shedding, resulting in a longer duration of persisting infectiousness. Combining both of these data sources from @he_etal2020 with available and future data on virus viability in cell culture within a holistic statistical framework might yield such an adjusted infectiousness profile curve, which should be utilized in a future adoption of the transmission risk calculation.
## Operational Delays {#delays}
The delay between sampling until test result, i.e. $T_{\text{result}} - T_{\text{sampling}}$, is given by
```{r d_samp2res}
d_samp2res <- c("0" = 0.1, "1" = 0.7, "2" = 0.2)
```
The delay between the test result and the upload, i.e. $T_{\text{upload}} - T_{\text{result}}$, is given by
```{r d_res2upload}
d_res2upload <- c("0" = 0.7, "1" = 0.25, "2" = 0.05)
```
Note that the chosen probabilities in these two distributions are no more than educated expert guesses. They have to be adjusted based on real data once the system is running.
### Sampling to Upload
The convolution of the time from (sampling to getting the result) and (getting the result to uploading) leads to the following distribution for the time between sampling and upload:
```{r d_samp2upload, fig.cap = "PMF of the duration between sampling and upload."}
(d_samp2upload <- convolute(d_samp2res, d_res2upload))
ggplot_pmf(d_samp2upload) + xlab("Duration Sampling until Upload (Days)")
```
```{r d_samp2upload_check, echo=FALSE, results="hide"}
# Check by simulation that the `convolution` function works
X <- sample(as.numeric(names(d_samp2res)), size = 1e7, replace = TRUE, prob = d_samp2res)
Y <- sample(as.numeric(names(d_res2upload)), size = 1e7, replace = TRUE, prob = d_res2upload)
Z <- X + Y
pmf_z <- prop.table(table(Z))
all.equal(as.numeric(pmf_z), as.numeric(d_samp2upload), tolerance = 1e-3)
```
### Symptoms to Upload for Symptomatic
To compute the delay between onset of symptoms and and upload we need a distribution for the time between symptom onset and time of sampling. Note that for this distribution we assume that the infected person gets tested *because* of the developed symptoms and thus this delay cannot be negative. For the alternative scenario, in which a person gets tested for other reasons, the onset of symptom might occur after taking a sample. This case is treated further below.
We assume $T_{\text{sampling}} - T_{\text{symptoms}}$ for symptomatic individuals is a discrete distribution with the following PMF:
```{r d_symp2samp}
d_symp2samp <- c("0" = 0.1, "1" = 0.6, "2" = 0.2, "3" = 0.1)
```
This leads to the convoluted time from DSO to upload displayed as follows:
```{r d_symp2upload, fig.cap="PMF of the duration between DSO and upload."}
(d_symp2upload <- convolute(d_symp2samp, d_samp2upload))
ggplot_pmf(d_symp2upload) + xlab("Duration Symptom Onset until Upload (Days)")
```
### Upload to Symptoms for Pre-Syptomatic
Here, we cover the scenario in which people get tested independent of having developed any symptoms beforehand and in case of a positive test result, upload their diagnosis keys. This could, e.g., be the case for individuals which are tested as part of contact tracing, because they were identified as a contact at risk after having had contact with an infected person. For these cases the time from symptom onset to upload may be in fact negative, since symptoms might begin after receiving the test result and upload. Note that we here condition on DSO actually occurring, which is not the case for truly asymptomatic cases.
To compute the delay distribution for this case we define the delay distribution from sampling to DSO for cases which were tested within their pre-symptomatic phase and will develop symptoms afterwards. Since a PCR-test result is most likely to be positive in a phase of considerable viral shedding, we assume that only tests within the pre-symptomatic phase will come out positive, which means that only tests from samples taken within the 7 days prior to DSO will be positive. Thus, as we have no further information when suspected cases get tested during their pre-symptomatic stage, we assume that it takes between 1 and 7 days from sampling to symptom onset, each with probability $1/7$. In other words, the sampling occurs 1 to 7 days _prior_ to DSO for pre-symptomatically tested. Thus the time from DSO to sampling is negative. We denote the distribution of this delay by `d_symp2samp_presymptomatic`.
```{r d_symp2samp_presymptomatic}
(d_symp2samp_presymptomatic <- convolute(
c("0" = 1 / 7, "1" = 1 / 7, "2" = 1 / 7,
"3" = 1 / 7, "4" = 1 / 7, "5" = 1 / 7, "6" = 1 / 7), c("-7" = 1)))
```
By convoluting the (negative) time delay from symptom onset to sampling `d_symp2samp_presymptomatic` with the delay from sampling to upload `d_samp2upload` we obtain the overall time delay from upload to symptom onset, denoted by `d_upload2symp`. Note that this delay may be negative or positive, since some cases, although tested in a pre-symptomatic stage, might have already developed symptoms at the time of upload.
```{r d_upload2symp, fig.cap = "PMF of the duration between DSO and upload."}
d_upload2symp <- convolute(d_symp2samp_presymptomatic, d_samp2upload)
ggplot_pmf(d_upload2symp) + xlab("DSO relative to upload date (Days)")
```
```{r d_upload2symp_check, echo=FALSE}
# Sanity check
stopifnot(isTRUE(all.equal(sum(d_upload2symp), 1)))
```
## Raw Transmission Risk {#transmission_raw}
Assume that we know that *A* got a positive test result, which is uploaded to the server on day $t_0$. All contacts that occurred $d = t_0 - t_C$ days ago will be assigned the same *transmission risk level* $\lambda_A(d)$, containing information on the infectiousness of a generic contact of *A* on that day. The infectiousness and therefore also the risk level $\lambda_A(d)$ might depend on further information provided by *A*, in particular whether *A* developed symptoms by the time of upload and in that case the day of symptom onset $T^A_S=t^A_S$, which would imply that $t^A_S \leq t_0$.
The *transmission risk level* $\lambda_A(d)$ is derived in two steps:
1. First, we compute the _raw relative infectiousness_ (also referred to as _transmission risk_) $\lambda^{(raw)}_A(d)$ of *A* given the provided information, which serves as a continuous infectiousness value on the scale $[0,1]$. For this step we distinguish between the four possible scenarios of available information introduced in section [Scenarios and Events](#scenarios).
2. In a second step, this raw infectiousness is translated into a *transmission risk level* $\lambda_A(d)$, which is denoted by roman numerals from $\text{\Romannum{1}}$ to $\text{\Romannum{8}}$ and which will be further translated into a transmission risk value to be used within the *total risk score* calculation. This classification will be explained in section [Transmission Risk Levels](#trl).
### Case 1: Availability of $t^A_S$
If the date of symptom onset is available, i.e. in the event $\mathcal{Symp}_A(T^A_S = t^A_S)$, then at time $t_0$, the onset of symptoms in the primary case *A* happened $t_0 - t^A_S \geq 0$ days ago. So the relative infectiousness of case *A* relative to $t_0$ is just $v_A(-d)$ shifted $t_0 - t^A_S$ days to the right, i.e $v_A(-d + (t_0 - t^A_S)) = v_A(t_C- t^A_S)$.
```{r d_infprofile_t0, fig.height=5, warning=FALSE, fig.cap = "Plot of the infectiousness profile, if symptom onsets happened 1 (a) or 2 (b) days before upload."}
# Infectious profile relative to t0
d_infprofile_t0 <- function(t0_minus_tS, infprofile = d_infprofile) {
res <- infprofile
names(res) <- as.numeric(names(res)) - t0_minus_tS
res
}
plot_it <- function(t0_minus_tS, xlim) {
d_infprofile_t0_diff1 <- d_infprofile_t0(t0_minus_tS = t0_minus_tS)
ggplot_pmf(d_infprofile_t0_diff1) +
xlab(expression("Time of exposure of secondary case relative to "
* t[0] * " and " * t[S]^A *
" in primary case known (Days)")) +
ylab("relative infectiousness") +
ggtitle(substitute(t[0] - t[S]^A == a, list(a = t0_minus_tS))) +
xlim(xlim)
}
gridExtra::grid.arrange(plot_it(1, xlim = c(-10, 10)), plot_it(2, xlim = c(-10, 10)))
```
Hence, for a given DSO $t_S^A$ we define the raw transmission risk
$$\lambda^{(raw)}_A(d, \mathcal{Symp}_A(T^A_S = t^A_S)) \>=\> v_a(-d+t_0-t_S^A),$$
where $d = t_0 - t_C$ refers to the duration in days between the time of exposure and the time of upload. Again, here we only provide the raw value $\lambda^{(raw)}_A$, i.e. the relative infectiousness on a $[0,1]$-scale. These raw infectiousness values as a function of DSO and time since exposure are shown below.
```{r matrixdsoexposure}
# Maximum number of days since exposure to upload
max_dse <- 14
# Maximum number of days since onset of symptoms
max_dso <- 21
M_case1 <- matrix(0, nrow = max_dso + 1, ncol = max_dse + 1,
dimnames = list(days_since_symptoms = 0:max_dso, days_since_exposure = 0:max_dse))
for (i in 0:max_dso) {
inf_profile <- d_infprofile_t0(t0_minus_tS = i)
days <- as.numeric(names(inf_profile))
days_since_exposure <- days * (-1)
# Only pick events in the past since you condition on the two having met
reasonable_days <- (days <= 0) & (days_since_exposure <= max_dse)
days_since_exposure <- days_since_exposure[reasonable_days]
M_case1[i + 1, days_since_exposure + 1] <- inf_profile[reasonable_days]
}
```
```{r ggplot_rel_inf, echo=FALSE, message=FALSE, fig.width=8, fig.height=8, fig.cap="Raw transmission risk for Case 1: A is symptomatic with DSO provided."}
# Convert to data.frame
matrix_to_df <- function(mat) {
data.frame(
x = as.vector(row(mat) - 1),
y = as.vector(col(mat) - 1),
M = as.vector(mat)
)
}
df_M_case1 <- matrix_to_df(M_case1)
# Show the scale
plot_rel_infectiousness(
df_M_case1,
ylab_text = "Days since symptoms in infector",
breaks_y = 0:max_dso
)
```
### Case 2: Symptomatic, but no Availability of $t_S^A$, only Day of Upload
Here we consider the case, in which we know that *A* is symptomatic at time of upload, but do not know the exact DSO, i.e. the event $\mathcal{Symp}_A(T^A_S \leq t_0)$. In this case, we can infer a probability distribution for the DSO $T_S^A$ from our knowledge about the duration between DSO and upload given by `d_symp2upload` and knowing the date of upload $t_0$. We therefore define the raw transmission risk as the conditional expectation of $\lambda_A^{(raw)}$ given that *A* was symptomatic somewhere before $t_0$. We obtain this expectation by marginalizing over the possible DSO $T_S^A$, i.e.
$$
\begin{aligned}
\lambda_A^{(raw)}(d,\,\mathcal{Symp}_A(T^A_S \leq t_0)) &= \mathbb{E}_{T_S^A}\Big[ \lambda^{(raw)}_A(d, \mathcal{Symp}_A(T^A_S = t^A_S)) \>\Big|\> \mathcal{Symp}_A(T^A_S \leq t_0)\Big] \\
&= \sum_{t_S^A = t_0-14}^{t_0} \lambda^{(raw)}_A(d, \mathcal{Symp}_A(T^A_S = t^A_S)) \> \cdot d_{\texttt{symp2upload}}(t_0-t_S^A)
\end{aligned}
$$
```{r m_case2_and_plt, , fig.height=2, fig.cap="Raw transmission risk for Case 2: A is symptomatic with unknown DSO."}
M_case2 <- numeric(max_dse + 1)
for (days_since_exposure in 0:max_dse) {
case1_val <- M_case1[as.numeric(names(d_symp2upload)) + 1, days_since_exposure + 1]
M_case2[days_since_exposure + 1] <- sum(case1_val * d_symp2upload)
}
# Convert result to matrix for better comparison
M_case2 <- matrix(
M_case2,
ncol = max_dse + 1,
nrow = 1,
dimnames = list("no info about DSO at upload", days_since_exposure = 0:max_dse)
)
# Convert to data.frame
df_M_case2 <- matrix_to_df(M_case2)
# Show the scale
plot_rel_infectiousness(df_M_case2)
```
Due to marginalization, the resulting raw transmission value (as a function of $d = t_0 - t_C$) is flatter and wider compared to a raw transmission value curve for a known date of symptom onset $t_S^A$ (see case 1). In particular, the maximum raw infectiousness value is reached at `r round(max(df_M_case2$M), 2)` compared to 1 in the case of a known DSO. This difference shows the additional gain in risk assessment within the case of knowing the DSO.
Note that for intermediate scenarios in which some additional information, like the date of sampling, is provided by *A*, one can compute the raw transmission value similarly by marginalizing out the date of symptom onset with respect to the corresponding delay distribution. This again could yield a more informative value compared to relying on the upload date alone.
#### Addendum -- DSO only known by week
It might be, that a person does not remember the DSO precisely, but can only constrain the DSO to lie within a certain period of time, e.g. only the week it was in ("in the last 7 days", "1-2 weeks ago", "more than 2 weeks ago"). In this case, we can assume an uniformly distributed DSO within that period or with a slightly skewer (geometric) distribution (motivated by the [Forgetting Curve](https://en.wikipedia.org/wiki/Forgetting_curve)). In either case we we can argue similarly as before:
$$
\begin{aligned}
\lambda_A^{(raw)}(d,\,\mathcal{Symp}_A(t_{\text{min}} \leq T^A_S \leq t_{\text{max}})) &= \mathbb{E}_{T_S^A}\Big[ \lambda^{(raw)}_A(d, \mathcal{Symp}_A(T^A_S = t^A_S)) \>\Big|\> \mathcal{Symp}_A(t_{\text{min}} \leq T^A_S \leq t_{\text{max}})\Big] \\
&= \sum_{t_S^A = t_{\text{min}}}^{t_{\text{max}}} \lambda^{(raw)}_A(d, \mathcal{Symp}_A(T^A_S = t^A_S)) \> \cdot d_{\texttt{missingDSO}}(t_S^A).
\end{aligned}
$$
where $[t_{\text{min}}, t_{\text{max}}]$ is the interval in which we know the DSO to be located in (e.g. the week) and $d_{\texttt{missingDSO}}(t_S^A)$ is the probability mass function representing the belief in where the true DSO value is located in the interval $\{t_{\text{min}}, \ldots, t_{\text{max}}\}$.
```{r m_case2_mass_fun}
normalize <- function(x) x/sum(x, na.rm = TRUE)
# Discrete uniform over 1:n
d_missing_dso_unif <- function(n) normalize(rep(1, n))
# d_missing_dso_unif(10)
# Geometric over the interval
d_missing_dso_geom <- function(min, max, p=0.1) normalize(pgeom((min:max), prob = p))
# d_missing_dso_geom(7, 13, 0.2)
```
The difference in the geometric and the uniform approach is marginal. In the current week the later days are slightly rated higher, since we can assume that more recent days are remembered more often correctly and thus given as a precise date, not as a period. For the previous and penultimate week the differences to the uniform case almost disappear:
```{r m_missing_dso_dists, fig.height=2, fig.cap="The probability mass function representing the likelihood of the DSO.", include=TRUE, echo=FALSE}
rem <- .1
dist0 <- Reduce(
function(x, y, ...) merge(x, y, all = TRUE, ...),
list(
tibble(
i = 0:6,
y = d_missing_dso_unif(7),
v = "u"),
tibble(
i = 0:6,
y = d_missing_dso_geom(0,6,rem),
v = "0"),
tibble(
i = 0:6,
y = d_missing_dso_geom(7,13,rem),
v = "1"),
tibble(
i = 0:6,
y = d_missing_dso_geom(14,20,rem),
v = "2")))
ggplot(dist0, aes(i,y)) +
geom_point(aes(colour = v), size = 2) +
geom_line(aes(colour = v), size = 1) +
labs(y = "Probability", x="Day", color="", title = "Probability Mass Function") +
scale_colour_brewer(palette = "PiYG",
labels = c("in the last 7 days", "1-2 weeks ago", "more than 2 weeks ago", "uniform")) +
theme_minimal() + ylim(0.0, max(dist0$y))
```
With the geometric approach we get:
```{r m_case2_and_plt_w}
M_case2_minmax <- function(min, max, dist = d_missing_dso_unif(max-min+1)) {
M_case2_h <- numeric(max_dse + 1)
for (days_since_exposure in 0:max_dse) {
case1_val <- M_case1[min:max + 1, days_since_exposure + 1]
M_case2_h[days_since_exposure + 1] <- sum(case1_val * dist, na.rm = TRUE)
}
M_case2_h
}
# Convert result to matrix for better comparison
M_case2_minmax_m <- function(min, max, dist = d_missing_dso_unif(max-min+1)) {
matrix(
M_case2_minmax(min, max, dist=dist),
ncol = max_dse + 1,
nrow = 1,
dimnames = list("no info about DSO at upload", days_since_exposure = 0:max_dse)
)
}
```
```{r m_case2_and_plt_w_test, include=FALSE, echo=FALSE}
# Convert to data.frame
# df_M_case2_00d <- matrix_to_df(M_case2_minmax_m(0,0))
# df_M_case2_01d <- matrix_to_df(M_case2_minmax_m(1,1))
# df_M_case2_12d <- matrix_to_df(M_case2_minmax_m(12,12))
# Show the scale
# plot_rel_infectiousness(df_M_case2_00d)
# plot_rel_infectiousness(df_M_case2_01d)
# plot_rel_infectiousness(df_M_case2_12d)
```
```{r m_case2_and_plt_w0, fig.height=2, fig.cap="Raw transmission risk for Case 2: A is symptomatic with DSO only known to be in the current week (in the last 7 days)."}
df_M_case2_0w <- matrix_to_df(M_case2_minmax_m(0, 6, d_missing_dso_geom(0, 6, 0.1)))
plot_rel_infectiousness(df_M_case2_0w)
```
```{r m_case2_and_plt_w1, fig.height=2, fig.cap="Raw transmission risk for Case 2: A is symptomatic with DSO only known to be in the previous week (1-2 weeks ago)."}
df_M_case2_1w <- matrix_to_df(M_case2_minmax_m(7, 13, dist=d_missing_dso_geom(7, 13, 0.1)))
plot_rel_infectiousness(df_M_case2_1w)
```
```{r m_case2_and_plt_w2, fig.height=2, fig.cap="Raw transmission risk for Case 2: A is symptomatic with DSO only known to be in the penultimate week (more than 2 weeks ago)."}
df_M_case2_2w <- matrix_to_df(M_case2_minmax_m(14, 20, dist=d_missing_dso_geom(14, 20, 0.1)))
plot_rel_infectiousness(df_M_case2_2w)
```
### Case 3: Never-Symptomatic or Pre-Symptomatic
For this case we consider the scenario in which *A* at the time of upload provides the information that there was no onset of symptoms so far, i.e. the event $\mathcal{Asymp}_A(T_S^A > t_0)$. This case implies the two mutually exclusive situations:
i. $\mathcal{Symp}_A(T_S^A > t_0)$, i.e. *A* will develop symptoms at a later time point $T_S^A > t_0$ and is currently in a pre-symptomatic stage
ii. $\mathcal{N\!symp}_A$, i.e. *A* will never develop any symptoms
At the time of upload we will not know which of these two situations applies. For the purpose of computing a transmission risk for case 3 we thus require the probabilities for each of the two scenarios (and for the pre-symptomatic scenario for each sub-scenario $T_S^A = t_0 + n$, for $n \geq 1$) which will then be appropriately weighted together. Thus, let
$$w_n = P(\mathcal{Symp}_A(T_S^A = t_0 + n) \>|\> \mathcal{Asymp}_A(T_S^A > t_0))$$
denote the probabilities for the different sub-cases of the pre-symptomatic scenario, i.e. $w_n$ is the conditional probability for the event that *A* develops symptoms $n$ days after the upload date, where $n \in \{1, \ldots, N\}$ with $N$ being some plausible maximum time delay between upload date and DSO. Analogously, we define
$$w_{\texttt{neversymptomatic}} = P(\mathcal{N\!symp}_A \>|\> \mathcal{Asymp}_A(T_S^A > t_0))$$
which denotes the probability for the never-symptomatic scenario.
Thus, for the pre-symptomatic situation, i.e. $1 \leq n \leq N$, we obtain
$$
\begin{aligned}
w_n &= P\Big(\mathcal{Symp}_A(T_S^A = t_0 + n) \>\Big|\> \mathcal{Symp}_A(T_S^A > t_0) \>\cup\> \mathcal{N\!symp}_A\Big) \\
&= \frac{P\Big(\mathcal{Symp}_A(T_S^A = t_0 + n) \>\cap\> \big(\mathcal{Symp}_A(T_S^A > t_0) \>\cup\> \mathcal{N\!symp}_A\big)\Big)}{P\Big(\mathcal{Symp}_A(T_S^A > t_0) \>\cup\> \mathcal{N\!symp}_A\Big)} \\
&= \frac{P\Big( \mathcal{Symp}_A(T_S^A = t_0 + n)\Big)}{P\big(\mathcal{Symp}_A(T_S^A > t_0)\big) + P(\mathcal{N\!symp}_A)} \\
&= \frac{P\Big(\mathcal{Symp}_A(T_S^A = t_0+n) \>\Big|\> \mathcal{Symp}_A\Big) P(\mathcal{Symp}_A)}{P\Big(\mathcal{Symp}_A(T_S^A > t_0) \>\Big|\> \mathcal{Symp}_A\Big)P(\mathcal{Symp}_A) + P(\mathcal{N\!symp}_A)}
\end{aligned}
$$
```{r p_symp, echo=FALSE}
p_symp <- 0.86
```
In order to compute $w_n$ we thus require the probability that symptom onset occurs $n$ days after the upload date given that onset of symptoms will happen, i.e. $P(\mathcal{Symp}_A(T_S^A = t_0 + n) \>|\> \mathcal{Symp}_A)$. This is covered by the delay distribution `d_upload2symp` introduced further above. Furthermore we need the probability for any infected person to develop symptoms at all, i.e. $P(\mathcal{Symp}_A)$. This probability was in @mizumoto_etal2020 estimated to be $\hat{P}(\mathcal{Symp}_A) = `r p_symp`$. For $P(\mathcal{Symp}_A(T_S^A > t_0) \>|\> \mathcal{Symp}_A)$ we rely on the random time from data upload to symptom onset for suspected cases, who were tested in a pre-symptomatic stage.
Since according to `d_upload2symp` the maximum delay between upload and DSO is three days we set $N=3$. By plugging in the individual numbers we can compute the scenario probabilities $w_n$ for $n \in \{1,2,3\}$. For instance, for $w_1$ we obtain
$$w_1 = \frac{`r sprintf("%.4f",d_upload2symp["1"])` \cdot `r p_symp`}{(`r paste(sprintf("%.4f",d_upload2symp[c("1","2","3")]), collapse=" + ")`) \cdot `r p_symp` + `r 1- p_symp`} = `r sprintf("%.3f",(d_upload2symp["1"] * p_symp)/( sum(d_upload2symp[c("1","2","3")])*p_symp + (1 - p_symp)))`$$
The overall results for $w_n$, $n=1,2,3$, are given below:
```{r wn_computation}
w_n <- (d_upload2symp[c("1", "2", "3")] * p_symp) /
(sum(d_upload2symp[c("1", "2", "3")]) * p_symp + (1 - p_symp))
data.frame(n = 1:3, w_n = w_n)
```
Similarly, we obtain
$$
\begin{aligned}
w_{\texttt{neversymptomatic}}
&= P\Big(\mathcal{N\!symp}_A \>\Big|\> \mathcal{Symp}_A(T_S^A > t_0) \>\cup\> \mathcal{N\!symp}_A\Big) \\
&=\frac{P(\mathcal{N\!symp}_A)}{P\Big(\mathcal{Symp}_A(T_S^A > t_0) \>\Big|\> \mathcal{Symp}_A\Big)P(\mathcal{Symp}_A) + P(\mathcal{N\!symp}_A)}\\
&= \frac{`r 1-p_symp`}{(`r paste(sprintf("%.4f",d_upload2symp[c("1","2","3")]), collapse=" + ")`) \cdot `r p_symp` + `r 1-p_symp`} =
`r sprintf("%.3f",(1- p_symp)/( sum(d_upload2symp[c("1","2","3")])*p_symp + (1 - p_symp)))`
\end{aligned}
$$
```{r w_infty, echo=FALSE}
w_neversymptomatic <- (1 - p_symp) / (sum(d_upload2symp[c("1", "2", "3")]) * p_symp + (1 - p_symp))
### check sum equals 1
stopifnot(abs(sum(w_n) + w_neversymptomatic) - 1 < 1e-10)
```
#### Computation of the Transmission Risk
Knowing the probabilities of the possible scenarios and sub-scenarios in the case of a person being non-symptomatic at the time of uploading a positive test result, we can compute the transmission risk for case 3 by applying the [law of total probability](https://en.wikipedia.org/wiki/Law_of_total_probability), i.e.
$$
\begin{aligned}
&\quad \lambda_A^{(raw)}(d, \mathcal{Asymp}_A(T_S^A > t_0)) \\
&= \mathbb{E}\Big[ \lambda^{(raw)}_A(d, \cdot) \>\Big|\> \mathcal{Symp}_A(T_S^A > t_0) \>\cup\> \mathcal{N\!symp}_A\Big] \\
&= w_{\texttt{neversymptomatic}} \cdot \lambda^{(raw)}_A(d, \mathcal{N\!symp}_A) + \sum_{n = 1}^{3} w_n \cdot \lambda^{(raw)}_A(d, \mathcal{Symp}_A(T_S^A = t_0 + n))
\end{aligned}
$$
```{r factor_asymp_reduce_infectiousness, echo=FALSE, results="hide"}
factor_asymp_reduce_infectiousness <- 0.4
```
Thus, the first term shows that we require the transmission risk function for never-symptomatic people. Here we assume, that never-symptomatic cases have the same infectiousness profile (as a function around the date of upload) like symptomatic people, but with their infectiousness reduced by a factor of `r factor_asymp_reduce_infectiousness` [see @mizumoto_etal2020]. However, note that in this case we cannot rely on the computations from case 2 (symptomatic with unknown DSO), since here the DSO does not follow the distribution `d_symp2upload`, which applies for people who were known to be symptomatic at the time of upload. For this case we instead focus on people who were tested in an non-symptomatic stage, such that the DSO (for people who develop symptoms) is distributed according to `d_upload2symp` subject to the upload date $t_0$. Utilizing the transmission risk from case 1 with known DSO, this yields
$$
\begin{aligned}
&\quad \lambda^{(raw)}_A(d, \mathcal{Asymp}_A) \\
&= `r factor_asymp_reduce_infectiousness`\cdot \mathbb{E}_{T_S^A}\Big[ \lambda^{(raw)}_A(d, \mathcal{Symp}_A(T_S^A = t_S^A)) \>\Big|\> \mathcal{Symp}_A(T_S^A > T_{\text{sampling}}^A),\> \mathcal{Symp}_A\Big] \\
&= `r factor_asymp_reduce_infectiousness`\cdot \sum_{t_S^A = t_0-3}^{t_0 + 3} \lambda^{(raw)}_A(d, \mathcal{Symp}_A(T_S^A = t_S^A)) \cdot d_{\texttt{upload2symp}(t_S^A - t_0)}
\end{aligned}
$$
In code:
```{r m_case_3, , fig.height=2, fig.cap="Raw transmission risk for Case 3: A is non-symptomatic at day of upload."}
### add days_since_symptoms from -7 to -1 to M_case1
M_case1_extended <- rbind(
matrix(0,
ncol = ncol(M_case1),
nrow = 7,
dimnames = modifyList(dimnames(M_case1), list(days_since_symptoms = seq(-7, -1)))
),
M_case1
)
# Go backwards from delay 0 to -7 and shift \lambda_A by one to the left
for (rn in 7:1) {
M_case1_extended[rn, ] <- c(M_case1_extended[rn + 1, -1], 0)
}
### case pre-symptomatic
M_case3_presymtomatic <- w_n %*% M_case1_extended[c("-1", "-2", "-3"), ]
### case non-symptomatic
M_case3_nonsymptomatic <- matrix(0,
ncol = ncol(M_case1), nrow = 1,
dimnames = modifyList(dimnames(M_case1), list(days_since_symptoms = NA_character_))
)
for (d in names(d_upload2symp)) {
M_case3_nonsymptomatic <- M_case3_nonsymptomatic +
(factor_asymp_reduce_infectiousness * M_case1_extended[d, ] * d_upload2symp[d])
}
### combined case
M_case3 <- M_case3_presymtomatic + w_neversymptomatic * M_case3_nonsymptomatic
# Convert to data.frame
df_M_case3 <- matrix_to_df(M_case3)
# Show the scale
plot_rel_infectiousness(df_M_case3)
```
If person *A* has not developed any symptoms up to the day of upload, the transmission risk is highest at the day of upload itself, and gradually decays for each day that a risk contact lays further in the past.
### Case 4: Unknown Symptom Status {#base_case}
The final case represents the scenario in which no further information is available beyond the fact that person *A* uploaded a positive test results at time $t_0$. This case covers several possible scenarios from above and summarizes them into a condensed transmission risk.
We decompose the case into the following possible mutually exclusive scenarios and apply corresponding probabilities for each subscenario:
(i) $\mathcal{Symp}_A(T^A_S \leq T^A_{\text{sampling}})$, i.e. person *A* got tested because they developed COVID-19 related symptoms,
(ii) $\mathcal{Asymp}_A(T^A_S > T^A_{\text{sampling}})$, i.e. person *A* got tested as a suspected case, e.g. because of a risk contact, and was non-symptomatic at the time of sampling. This can be decomposed into:
(a) $\mathcal{Symp}_A(T^A_S > T^A_{\text{sampling}})$, i.e. person *A* develops symptoms after being tested,
(b) $\mathcal{N\!symp}_A \cap \mathcal{Asymp}_A(T^A_S > T^A_{\text{sampling}})$, i.e. person *A* is a never-symptomatic case and will thus never develop symptoms.
In summary we get the following disjoint decomposition:
$$
\Omega = \mathcal{Symp}_A(T^A_S \leq T^A_{\text{sampling}}) \;\cup\; \mathcal{Symp}_A(T^A_S > T^A_{\text{sampling}})
\;\cup\; \Big(\mathcal{N\!symp}_A \cap \mathcal{Asymp}_A(T^A_S > T^A_{\text{sampling}})\Big).
$$
The transmission risk as a function of time difference $d$ between day of upload and day of contact for each of these scenarios can be derived based on results and considerations from the above cases 1 to 3.
For (i) we can apply the risk value function from case 2 (symptomatic with unknown DSO), i.e.
$$
\lambda_A^{(raw)}(d,\mathcal{Symp}_A(T^A_S \leq T^A_{\text{sampling}})) = \lambda_A^{(raw)}(d,\mathcal{Symp}_A(T^A_S \leq t_0)).
$$
For (ii.a) we can rely on the same considerations as made for case 3 (non-symptomatic at time of upload). Again, note that for the here reflected scenario we assume that person *A* was pre-symptomatic at time of testing. For considering the potential DSOs, we therefore again rely on the delay distribution between upload and DSO used for pre-symptomatically tested people, i.e. `d_upload2symp`. Thus, we compute the expected risk value by marginalizing over the DSO which follows `d_upload2symp` subject to the given date of upload $t_0$, i.e.
$$
\begin{aligned}
&\quad \lambda_A^{(raw)}(d, \mathcal{Symp}_A(T^A_S > T^A_{\text{sampling}})) \\
&= \mathbb{E}_{T_S^A}\Big[ \lambda^{(raw)}_A(d, \mathcal{Symp}_A(T_S^A = t_S^A)) \>\Big|\> \mathcal{Symp}_A(T_S^A > T_{\text{sampling}}^A)\Big] \\
&= \sum_{t_S^A = t_0-3}^{t_0 + 3} \lambda^{(raw)}_A(d, \mathcal{Symp}_A(T_S^A = t_S^A)) \cdot d_{\texttt{upload2symp}(t_S^A - t_0)} \\
\end{aligned}
$$
For (ii.b) we follow the same arguments as in case 3 for never-symptomatic cases, which effectively yields $\lambda_A^{(raw)}(d, \mathcal{Symp}_A(T^A_S > T^A_{\text{sampling}}))$ times a factor for the reduced relative infectiousness, i.e.
$$
\begin{aligned}
&\quad \lambda_A^{(raw)}(d, \mathcal{N\!symp}_A \cap \mathcal{Asymp}_A(T^A_S > T^A_{\text{sampling}})) \\
&= \lambda^{(raw)}_A(d, \mathcal{N\!symp}_A) \\
&= `r factor_asymp_reduce_infectiousness` \cdot \lambda_A^{(raw)}(d, \mathcal{Symp}_A(T^A_S > T^A_{\text{sampling}}))
\end{aligned}
$$
```{r p_test_asymp, echo=FALSE}
p_suspect <- 0.2
```
To define the overall transmission risk for case 4, we require probabilities for how likely each of the distinct scenarios is to occur. In order to weigh scenarios (ii.a) and (ii.b) we again rely on the probability that a proportion of `r scales::percent(p_symp)` of infected cases develop symptoms (@mizumoto_etal2020). This leaves the question on what proportion of cases, who upload their positive test result, were already tested while being asymptomatic. For now, we can only assume that this proportion is rather low, at $p_{\texttt{suspect}} = `r p_suspect`$. However, the proportion $p_{\texttt{suspect}}$ is likely subject to change over time, in particular due to the impact of the _Corona-Warn-App_ itself, as it has the primary purpose of identifying potentially infected people and initiate corresponding tests in an early stage of their course of infection. Altogether, this leads to the transmission risk for case 4 being defined as follows.
$$
\begin{aligned}
&\quad \lambda_A^{(raw)}(d, \Omega) \\
&= (1 - p_{\texttt{suspect}}) \cdot \lambda_A^{(raw)}(d, \mathcal{Symp}_A(T^A_S \leq T^A_{\text{sampling}})) \> + \\
&\qquad p_{\texttt{suspect}} \cdot `r p_symp` \cdot \lambda_A^{(raw)}(d, \mathcal{Symp}_A(T^A_S > T^A_{\text{sampling}})) \> + \\
&\qquad p_{\texttt{suspect}} \cdot (1 - `r p_symp`) \cdot \lambda_A^{(raw)}(d, \mathcal{N\!symp}_A \cap \mathcal{Asymp}_A(T^A_S > T^A_{\text{sampling}}))
\end{aligned}
$$
This leads to the following transmission risks.
```{r m_case_4, fig.height=2, fig.cap="Raw transmission risk for Case 4: no information about A at upload."}
### combined case
M_case4 <- (1 - p_suspect) * M_case2 +
p_suspect * p_symp * (1 / factor_asymp_reduce_infectiousness) * M_case3_nonsymptomatic +
p_suspect * (1 - p_symp) * M_case3_nonsymptomatic
# Convert result to matrix for better comparison
M_case4 <- matrix(M_case4,
ncol = max_dse + 1,
nrow = 1,
dimnames = list("no information at upload", days_since_exposure = 0:max_dse)
)
# Convert to data.frame
df_M_case4 <- matrix_to_df(M_case4)
# Show the scale
plot_rel_infectiousness(df_M_case4)
```
Note that case 4 represents the baseline case that is applied if the app does not or cannot collect any additional information about person *A*, which will apply at least for the initial public version of the _Corona-Warn-App_.
## Transmission Risk Level {#trl}
The final step for generating the *transmission risk level* is to translate the raw transmission risk from the continuous scale $[0,1]$ into a discrete scale (categories from _I_ to _VIII_ in roman numbers).
To do so, we set equidistant thresholds within the $[0,1]$-scale, such that the scale is decomposed into 8 equal-sized intervals. A raw transmission risk $\lambda_A^{(raw)}(d, \cdot)$ within such an interval is then mapped onto the respective level $\lambda_A(d, \cdot)$, where the $\cdot$ stands for one of the four case scenarios, i.e.
$$
\lambda_A(d, \cdot) = l \quad\Leftrightarrow\quad \frac{l-1}{8} \leq \lambda^{(raw)}_A(d, \cdot) < \frac{l}{8},
$$
where $l \>\in \{1, \ldots, 8\}$. Note: we define that raw values with $\lambda^{(raw)}_A(d, \cdot) = 1$ are also classified as *transmission risk level* VIII.
### Transmission Risk Levels for Cases 1--4
This classification rule leads to the following risk levels for the raw risks obtained from each of the above considered case scenarios.
```{r plot_case1, echo=FALSE, message=FALSE, fig.cap="Transmission risk level for Case 1: A is symptomatic with DSO provided.", fig.width=5.5, fig.height=8}
df_M_case1_discrete <- discretize_matrix(M_case1)
plot_risk_levels(
df_M_case1_discrete,
breaks_y = 0:21,
ylab_text = "Days since symptoms in A"
)
```
```{r plot_case2, echo=FALSE, message=FALSE, fig.height=1.3, fig.width=5, fig.cap="Transmission risk level for Case 2: A is symptomatic but with unknown DSO."}
df_M_case2_discrete <- discretize_matrix(M_case2, max_value = max(M_case1))
plot_risk_levels(df_M_case2_discrete)
```
```{r plot_case3, echo=FALSE, message=FALSE, fig.height=1.3, fig.width=5, fig.cap="Transmission risk level for Case 3: A is non-symptomatic at day of upload."}
df_M_case3_discrete <- discretize_matrix(M_case3, max_value = max(M_case1))
plot_risk_levels(df_M_case3_discrete)
```
```{r plot_case4, echo=FALSE, message=FALSE, fig.height=1.3, fig.width=5, fig.cap="Transmission risk level for Case 4: no information about A at upload."}
df_M_case4_discrete <- discretize_matrix(M_case4, max_value = max(M_case1))
plot_risk_levels(df_M_case4_discrete)
```
#### Addendum -- DSO only known by week
For the DSO only known by the week it was in, we get:
```{r plot_case2_w0, echo=FALSE, message=FALSE, fig.height=1.3, fig.width=5, fig.cap="Transmission risk level for Case 2: A is symptomatic but with DSO only known to be in the current week (in the last 7 days)."}
df_M_case2_discrete_0w <- discretize_matrix(M_case2_minmax_m(0, 6, d_missing_dso_geom(0, 6, 0.1)),
max_value = max(M_case1))
plot_risk_levels(df_M_case2_discrete_0w)
```
```{r plot_case2_w1, echo=FALSE, message=FALSE, fig.height=1.3, fig.width=5, fig.cap="Transmission risk level for Case 2: A is symptomatic but with DSO only known to be in the previous week (1-2 weeks ago)."}
df_M_case2_discrete_1w <- discretize_matrix(M_case2_minmax_m(7, 13, d_missing_dso_geom(7, 13, 0.1)),
max_value = max(M_case1))
plot_risk_levels(df_M_case2_discrete_1w)
```
```{r plot_case2_w2, echo=FALSE, message=FALSE, fig.height=1.3, fig.width=5, fig.cap="Transmission risk level for Case 2: A is symptomatic but with DSO only known to be in the penultimate week (more than 2 weeks ago)."}
df_M_case2_discrete_2w <- discretize_matrix(M_case2_minmax_m(14, 20, d_missing_dso_geom(14, 20, 0.1)),
max_value = max(M_case1))
plot_risk_levels(df_M_case2_discrete_2w)
```
### Transmission Risk Levels for the Base Case
The initial version of the _Corona-Warn-App_ does not offer the possibility to provide additional information on symptom status when uploading a positive test result. Thus, the transmission risk is always calculated as in case 4, where no further information is available, i.e. the event $\Omega$.
For the risk level calculation, the transmission risks are classified into levels based on the highest achievable value from case 1, in which the DSO is known. This leads to the highest possible *transmission risk level* in case 4 being at `r df_M_case4_discrete$M %>% max %>% as.roman`. However, since no additional information is provided for the initial version of the app, this case does never actually apply. Due to averaging over all possible scenarios in case 4, selecting the transmission level from case 4 in the above calculations implies that the *transmission risk level* has an upper level of `r df_M_case4_discrete$M %>% max %>% as.roman`. Therefore the app does not make use of the full level scale for the transmission risk, in particular levels VII and VIII are effectively never used. This could mean that the transmission risk weights less in the subsequent *total risk score* computations.
To address this issue we re-adjust the level classification by accounting for the maximum possible transmission risk achieved within case 4. For this purpose we will also refer to case 4 as the _base case_ since it represents the only possible case within the initial app version. Thus we obtain
$$
\lambda_A(d) = \lambda_A(d, \text{Base case}) = l \quad\Leftrightarrow\quad \frac{l-1}{8} \leq \frac{\lambda^{(raw)}_A(d, \Omega)}{\max_{d \in\{0\ldots,`r max_dse`\}} \lambda^{(raw)}_A(d, \Omega)} < \frac{l}{8},
$$
where $l \>\in \{1, \ldots, 8\}$ and where again raw values with $\lambda^{(raw)}_A(d, \Omega) = \max_{d \in\{0\ldots,`r max_dse`\}} \lambda^{(raw)}_A(d, \Omega)$ are also classified as level 8. This leads to the following risk levels for this single base case:
```{r plot_level_case_4, echo=FALSE, message=FALSE, fig.width=5, fig.height=1.3, fig.cap="Transmission risk level for the Base Case."}
df_M_isolated_case4_discrete <- discretize_matrix(M_case4)
plot_risk_levels(df_M_isolated_case4_discrete, title = "Base Case")
```
These are the *transmission risk levels* to be used for the initial app version. Once the app offers the possibility to provide additional information, which then enables the cases 1 to 3, we advise to fall back to the case-specific risk levels $\lambda_A(d, \cdot)$ as defined above (and maybe adjust alarm thresholds within the app accordingly).
# Discussion {#discussion}
The present document reflects an epidemiological motivation of the *transmission risk level* used as part of the Corona-Warn-App. In particular we show how to inform this parameter by epidemiological information about COVID-19 and the processes leading to a confirmed diagnosis in order to identify periods of increased infectiousness. A stochastic framework was used to characterize the sequence of events. However, any such stochastic model is subject to assumptions and parameter choices, which need to be carefully evaluated. Given the present state of information about COVID-19 and the current simultaneous demands on resources, such an extensive evaluation and sensitivity analysis was not possible. As a consequence the present document reflects our best knowledge at the time of publication, but we strongly urge that relevant areas are identified for further evaluation. In particular this could consist in quantifying some of the operational delays from real-world data, once the app is in use.
With regards to some of the other parameters, a special challenge lies in the decentralized structure of the app. This was a non-negotiable design decision in the run-up of developing the app. However, in order to improve the app, epidemiological studies with a sample of voluntary users could be designed, in order to evaluate the appropriateness of not just the *transmission risk level*, but the overall classificational approach of the app.
One component to evaluate continuously should be the infectiousness profile utilized in this model. Our current estimate inspired by @he_etal2020 contains several time- and context-specific quantities such as the contact structure and the resulting serial interval. As an example, the serial interval is a time-dependent quantity of an outbreak, e.g., because susceptible persons are depleted (@svensson2007), or because increased contact tracing efforts, reduce the period of infectiousness. Finally, the introduction and increased and successful use of the app may alter the very parameters that we estimated for its calibration.
Altogether, it would be important to continuously evaluate and update the infectiousness profile, e.g., by being better able to connect information about viral shedding with the risk of infection.
Another important limitation is that throughout the document we assumed that *A* infected *B*. However, the likelihood and the consequences that it was *B* who infected *A* should be further investigated - this seems particularly important in a setting with many asymptomatic or pre-symptomatic transmissions.
One important point of this document is, that a better transmission risk assessment would be possible, if information about day of symptom onset of the person uploading his or her *diagnosis keys* would be available. The specific gain, for example in terms of sensitivity and specificity, is currently not provided, but could be quantified using simulation.
However, besides the epidemiological aspect two other aspects are important: Firstly, using additional information about *A* and mapping that to levels from I-VIII can be a privacy risk, because a potential attacker could reveal some information about *A* from level and time of upload. In the specific case, if provided, the DSO of *A* can probably be inferred in some situations. Secondly, the information about DSO will be self-reported. In our computations we assume DSO to be reported correctly, but misreports are likely to occur, which could also lead to serious bias in the *transmission risk level*. A malicious user can also falsely report his or her DSO to cause false alarms. Hence, one has to carefully balance these two aspects, e.g., by evaluating different formats of DSO specification in the App. For a thoroughly discussion of potential risk of a proximity tracing app see [Privacy and Security Attacks on Digital Proximity Tracing Systems](https://github.com/DP-3T/documents/blob/master/Security%20analysis/Privacy%20and%20Security%20Attacks%20on%20Digital%20Proximity%20Tracing%20Systems.pdf).
The epidemiological motivated *transmission risk level* has value in contact tracing beyond the App. Close work together with local health authorities and those actually performing the contact tracing in the field is, however, necessary to make this added-value more specific.
## Quality Assurance
The calculations in the present document reflect one of two implementations, which were made as part of a [two-mindset approach](https://staff.math.su.se/hoehle/blog/2017/09/02/pairprogramming.html) for quality assurance of the *transmission risk level* computations.
## Acknowledgements
We thank Eric H. Y. Lau for his quick and friendly response to some additional questions regarding underlying data of the @he_etal2020 publication.
# Appendix
## Convolution of Discrete Random Variables {#convolution}
In what follows we will use the [fact](https://en.wikipedia.org/wiki/Convolution_of_probability_distributions) that for two *independent* integer random variables $X$ and $Y$ with PMFs $f_X$ and $f_Y$, respectively, we will have for their sum $Z=X+Y$ that
$$
f_Z(z) = \sum_{x} f_X(x) f_Y(z-x).
$$
This can be handled in `R` using the following function, where we assume that the PMF is formulated as a vector where the elements denote the individual probabilities and the names of the vector contain the support of the distribution.
```{r convolute, purl=TRUE}
#' Function to convolute two discrete probability distributions with
#' support on 0, 1, ... I.e. we compute the PMF of Z = X + Y.
#'
#' @param fX - the PMF of X given as a named vector, where the names
#' represent the support X_min, ..., 0, 1, ..., X_max
#' @param fY - the PMF of Y given as a named vector, where the names
#' represent the support Y_min, ..., 0, 1, ..., Y_max
#'
#' @return A names vector containing the PMF of Z = X+Y.
#'
#' @examples
#' convolute(
#' c(`-2` = .2, `-1` = .3, `0` = .2, `1` = .1, `2` = .1),
#' c(`-1` = .2,`0` = .3, `1` = .2, `2` = .2, `3` = 0.1)
#' )
convolute <- function(fX, fY) {
fZ <- rep(0, 1 + (length(fX) - 1) + (length(fY) - 1))
names(fZ) <- as.character(seq(
min(as.numeric(names(fX))) + min(as.numeric(names(fY))),
max(as.numeric(names(fX))) + max(as.numeric(names(fY)))
))
for (i in names(fX)) {
for (k in names(fY)) {
j <- as.numeric(i) + as.numeric(k)
fZ[as.character(j)] <- fZ[as.character(j)] + fX[i] * fY[k]
}
}
fZ
}
```
## Helper Functions for Discrete PMFs and Plots
```{r helper_pmf_plot, include=TRUE, purl=TRUE}
#' Helper function to convert a names vector containing a PMF to a data.frame
#'
#' @param pmf - the PMF of X given as a named vector, where the names
#' represent the support X_min, ..., 0, 1, ..., X_max
#'
#' @return A data frame containing the PMF with columns for name (x) and value (pmf).
#'
#' @examples
#' pmf2df(c("-1" = 0.2, "0" = 0.3, "1" = 0.4, "2" = 0.1))
pmf2df <- function(pmf) {
data.frame(x = as.numeric(names(pmf)), pmf = pmf)
}
#' Helper function to draw a PMF
#'
#' @examples
#' ggplot_pmf(c("-1" = 0.2, "0" = 0.3, "1" = 0.4, "2" = 0.1))
ggplot_pmf <- function(pmf) {
ggplot(pmf2df(pmf), aes(x = x, ymin = 0, ymax = pmf)) +
geom_linerange() +
ylab("PMF")
}
#' Helper function to plot the risk levels
plot_risk_levels <- function(data, title = "",
breaks_y = NULL, ylab_text = "",
show_roman = TRUE,
fill_limits = c(1, 8),
fill_name = NULL) {
ggplot(data, aes(x = y, y = x)) +
geom_tile(aes(fill = M)) +
geom_text(aes(label = as.character(M_roman)), size = 2.5) +
xlab("Delay from Exposure to Consent for Upload") +
ylab(ylab_text) +
#labs(title = title) +
scale_fill_distiller(
palette = "PiYG", limits = fill_limits
) +
scale_y_continuous(breaks = breaks_y) +
scale_x_continuous(breaks = 0:max_dse) +
theme_minimal() +
coord_fixed(expand = FALSE) +
guides(fill = FALSE)
}
#' Helper function to plot the relative infectiousness
plot_rel_infectiousness <- function(data, breaks_y = NULL, ylab_text = "") {
ggplot(data, aes(x = y, y = x)) +
geom_tile(aes(fill = M)) +
geom_text(aes(label = as.character(round(M,digits = 2))), size = 2.5) +
xlab("Delay from Exposure to Consent for Upload") +
ylab(ylab_text) +
scale_fill_distiller(
name = "Relative Infectiousness",
palette = "PiYG", limits = c(0, 1)
) +
scale_y_continuous(breaks = breaks_y) +
scale_x_continuous(breaks = 0:max_dse) +
theme_minimal() +
coord_fixed(expand = FALSE) +
theme(legend.position = "bottom")
}
```
## Discretization of the Values in a Matrix
```{r helper_discretize, purl=TRUE}
#' Take the values in the matrix mat and divide it into 8 equidistant levels
#' The result is returned as a df in long format with rows and cols variables
discretize_matrix <- function(mat, max_value = max(mat, na.rm=TRUE)) {
cuts <- cut(
mat,
breaks = seq(0, max_value, length = 9),
right = FALSE,
include.lowest = TRUE
)
mat <- matrix(
as.numeric(cuts),
nrow = nrow(mat),
ncol = ncol(mat),
byrow = FALSE
)
data.frame(
x = as.vector(row(mat) - 1),
y = as.vector(col(mat) - 1),
M = as.vector(mat),
M_roman = as.character(as.roman(as.vector(mat))),
stringsAsFactors = TRUE
)
}
```
```{r plot_case2_wX, include=FALSE, echo=FALSE, message=FALSE, fig.height=1.3, fig.width=5, fig.cap="Transmission risk level for Case 2: A is symptomatic but with DSO only known to be in a certain period."}
# Plot TRL table for all periods (2 til 18)
# (change max_dso to a higher value if you want the full range til 21)
x <- c()
i <- 0
for(q in c(2:18))
for(d in unique(c(0:(21-q+1)))) {
df_M_case2_discrete_2X <- discretize_matrix(M_case2_minmax_m(d, d+q-1,
d_missing_dso_geom(d, d+q-1, 0.1)),
max_value = max(M_case1)) %>%
mutate(d=d, q=q, x = i, l=paste0(as.character(q),'-',as.character(d)))
x <- rbind(x, df_M_case2_discrete_2X)
i <- i+1
}
plt_x <- plot_risk_levels(
x,
ylab_text = "Offset and minimal days since onset of symptoms in A"
) + scale_y_continuous(breaks = c(1:length(unique(x$l))-1), labels = unique(x$l)) +
theme(axis.text.x = element_text(angle = 90, vjust=0.5, hjust=1)) +
coord_flip()
# ggsave("../img/Parameters_Offset_2-18.pdf", plot=plt_x, device = "pdf")
# write.csv(x,"../img/Parameters_Offset_2-18.csv")
```
# Literature