From 93414035ed3e3d5f35c7506cfbdc1c2e129dd04b Mon Sep 17 00:00:00 2001 From: Michael McCarthy <51542091+mccarthy-m-g@users.noreply.github.com> Date: Sat, 25 May 2024 12:42:20 -0700 Subject: [PATCH 1/6] fix typo --- R/data.R | 2 +- man/cocaine_relapse_1.Rd | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/R/data.R b/R/data.R index cf2acb1..c9fe715 100644 --- a/R/data.R +++ b/R/data.R @@ -391,7 +391,7 @@ NULL #' Weeks to cocaine relapse after treatment #' -#' A subset of data from Hall, Havassy, and Wasserman's (1990) measuring the +#' A subset of data from Hall, Havassy, and Wasserman (1990) measuring the #' number of weeks of relapse to cocaine use in a sample of 104 former addicts #' released from an in-patient treatment program. In-patients were followed for #' up to 12 weeks or until they used cocaine for 7 consecutive days. diff --git a/man/cocaine_relapse_1.Rd b/man/cocaine_relapse_1.Rd index ac05253..d6bba90 100644 --- a/man/cocaine_relapse_1.Rd +++ b/man/cocaine_relapse_1.Rd @@ -25,7 +25,7 @@ Journal of Consulting and Clinical Psychology, 58, 175–181. cocaine_relapse_1 } \description{ -A subset of data from Hall, Havassy, and Wasserman's (1990) measuring the +A subset of data from Hall, Havassy, and Wasserman (1990) measuring the number of weeks of relapse to cocaine use in a sample of 104 former addicts released from an in-patient treatment program. In-patients were followed for up to 12 weeks or until they used cocaine for 7 consecutive days. 
From 88a97c9f49f51c379b7f1680ae740d9b3073b4db Mon Sep 17 00:00:00 2001 From: Michael McCarthy <51542091+mccarthy-m-g@users.noreply.github.com> Date: Sat, 25 May 2024 12:45:12 -0700 Subject: [PATCH 2/6] fix typo --- R/data.R | 2 +- man/first_sex.Rd | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/R/data.R b/R/data.R index c9fe715..86aa959 100644 --- a/R/data.R +++ b/R/data.R @@ -415,7 +415,7 @@ NULL #' Age of first sexual intercourse #' -#' A subset of data from Capaldi, Crosby, and Stoolmiller's (1996) measuring the +#' A subset of data from Capaldi, Crosby, and Stoolmiller (1996) measuring the #' grade year of first sexual intercourse in a sample of 180 at-risk #' heterosexual adolescent males. Adolescent males were followed from Grade 7 up #' to Grade 12 or until they reported having had sexual intercourse for the diff --git a/man/first_sex.Rd b/man/first_sex.Rd index 477d544..2bf5dc9 100644 --- a/man/first_sex.Rd +++ b/man/first_sex.Rd @@ -25,7 +25,7 @@ of first sexual intercourse for at-risk adolescent males. Child Development, first_sex } \description{ -A subset of data from Capaldi, Crosby, and Stoolmiller's (1996) measuring the +A subset of data from Capaldi, Crosby, and Stoolmiller (1996) measuring the grade year of first sexual intercourse in a sample of 180 at-risk heterosexual adolescent males. 
Adolescent males were followed from Grade 7 up to Grade 12 or until they reported having had sexual intercourse for the From f91a13f0dbda56b8d438e3c2d6149ce5666daca5 Mon Sep 17 00:00:00 2001 From: Michael McCarthy <51542091+mccarthy-m-g@users.noreply.github.com> Date: Sat, 25 May 2024 22:38:19 -0700 Subject: [PATCH 3/6] add missing word --- vignettes/articles/chapter-5.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/articles/chapter-5.Rmd b/vignettes/articles/chapter-5.Rmd index 84f6732..70c6ae1 100644 --- a/vignettes/articles/chapter-5.Rmd +++ b/vignettes/articles/chapter-5.Rmd @@ -259,7 +259,7 @@ The multilevel model may fail to converge or be unable to estimate one or more v - **Removing boundary constraints**, where the software is permitted to obtain negative variance components. - **Fixing rates of change**, where the model is simplified by removing the varying slope change. -For this example we a subset of the `dropout_wages` data purposefully constructed to be severely unbalanced. +For this example we use a subset of the `dropout_wages` data purposefully constructed to be severely unbalanced. 
```{r} dropout_wages_subset From d50eef788f0d2c961303a9fa0c102ebd24dafd39 Mon Sep 17 00:00:00 2001 From: Michael McCarthy <51542091+mccarthy-m-g@users.noreply.github.com> Date: Sun, 26 May 2024 20:58:46 -0700 Subject: [PATCH 4/6] revise column names --- R/data.R | 6 +++--- data-raw/data.R | 5 +++-- data-raw/data/tidy/congresswomen.csv | 2 +- data-raw/data/tidy/suicide_ideation.csv | 2 +- data/congresswomen.rda | Bin 2610 -> 2624 bytes data/suicide_ideation.rda | Bin 2133 -> 2160 bytes man/congresswomen.Rd | 2 +- man/suicide_ideation.Rd | 4 ++-- 8 files changed, 11 insertions(+), 10 deletions(-) diff --git a/R/data.R b/R/data.R index 86aa959..fb1c811 100644 --- a/R/data.R +++ b/R/data.R @@ -457,9 +457,9 @@ NULL #' #' \describe{ #' \item{`id`}{Participant ID.} -#' \item{`time`}{Reported age of first suicide ideation.} +#' \item{`age`}{Reported age of first suicide ideation.} #' \item{`censor`}{Censoring status.} -#' \item{`age`}{Participant age at the time of the survey.} +#' \item{`age_now`}{Participant age at the time of the survey.} #' } #' @source #' Bolger, N., Downey, G., Walker, E., & Steininger, P. (1989). 
The onset of @@ -480,7 +480,7 @@ NULL #' \describe{ #' \item{`id`}{Participant ID.} #' \item{`name`}{Representative name.} -#' \item{`time`}{Number of terms in office.} +#' \item{`terms`}{Number of terms in office.} #' \item{`censor`}{Censoring status.} #' \item{`democrat`}{Party affiliation.} #' } diff --git a/data-raw/data.R b/data-raw/data.R index 223336c..d3cd5e3 100644 --- a/data-raw/data.R +++ b/data-raw/data.R @@ -196,10 +196,11 @@ first_sex <- tidy_data$firstsex |> parental_antisociality = pas ) -suicide_ideation <- tidy_data$suicide_orig +suicide_ideation <- tidy_data$suicide_orig |> + rename(age_now = age, age = time) congresswomen <- tidy_data$congress_orig |> - rename(democrat = dem) + rename(terms = time, democrat = dem) # Chapter 11 ------------------------------------------------------------------ diff --git a/data-raw/data/tidy/congresswomen.csv b/data-raw/data/tidy/congresswomen.csv index 846b3a2..b1d7e1a 100644 --- a/data-raw/data/tidy/congresswomen.csv +++ b/data-raw/data/tidy/congresswomen.csv @@ -1,4 +1,4 @@ -id,name,time,censor,democrat +id,name,terms,censor,democrat 1,"Abzug, Bella",3,0,1 2,"Andrews, Elizabeth",1,0,1 3,"Ashbrook, Jean",1,0,0 diff --git a/data-raw/data/tidy/suicide_ideation.csv b/data-raw/data/tidy/suicide_ideation.csv index 10faf88..ba1a781 100644 --- a/data-raw/data/tidy/suicide_ideation.csv +++ b/data-raw/data/tidy/suicide_ideation.csv @@ -1,4 +1,4 @@ -id,time,censor,age +id,age,censor,age_now 1,16,0,18 2,10,0,19 3,16,0,19 diff --git a/data/congresswomen.rda b/data/congresswomen.rda index f12fd295c7b2e4fe7aa310f388736bb063024603..e291bcc5b5a11234a1d1cc0cc885c70e44184d4e 100644 GIT binary patch delta 2530 zcmV<82_5#b6u=Y?LRx4!F+o`-Q&}D)X{(VAB!ADoA0GlfxHvElYY@;_F_>kBrb1!~ zfdfKfU`&}Oq)h>#qeDiG0000001S-)Xda^`fB*mjo~20rQ~!FYQ~ye!00000XaE2J z0000q000lw0000001X3yFoum9X{L;T4Gc`033!i~yQqG{6yrZ4DraQK6=q z84Wbh&;uqzBOo*Zpavkw0|;P1XfT5#MvQ<1A+$z}nE*+tA|_8#o)Mq~^#EuZ27mwn 
z&;S4c0B8UJ0000015GzUB-iO?mW1A#ZhvOCXw;BW_Y*N7+Jry=igpRpf>Cp&-N*nci&;5_I!VGG>TqXy@f2Dj-kwBbr>z7#L|$2@K!>2TuG} zgaYXZ=Z=QL`^%=18m(PurQiY>GHKd~9!|E18T*=$n)D%l?by4!q?!mrXPN~-+J7+B zFg7HFT_hs$3*;~BxC|I@;>VLFR-<2OPBuDJ*$MJ;g(y>(nVP9u%;l@I*RW#CL7PUpAU1B{&X$|4O;>u_`x_u_ZM-N8 zaklRB*U#VzgA0Fyh94IkPEKBKcs8-4^mO%g_DGZ4E4#nL)$q^Ydwaq7`GpVaBwxS5 zpZ>l9!C7T`G05Nm0SG}3;xs@20*e^RlVbu%f9v6F(MlXGl=AC1ZpEmA*5TR9FH5Iz zi&-EeuVy5Y7>m0FXe9#{#rB#S8H&F53IcGftZN9w8mB8W@s`>X^4Vo(PqniLrn_`!d%E{ zXbz1f2lYnm0HnEVVb*Z2KefG7jYjRWkB0(f15<)>FBoabFBqIq*QG_KVlAPoKPIH(5VCP~_Q>}fUusTT!CRFA#*MFwGoU( z6qaNumO|#tU!kPir8t-hL1BHWHA+&jSxzl>?})l$Ln1nrWv7J9thb6pU@SA787!zQ zfTUueC8Gf3M+l%(=J_zy>l!eLNO&;GTVo;8JrEj}l5!jt$HzZU1J{p%89w zOYL#+QWld*tFLq^Q{Ivvd?U#!8qTRPa&%`NElga$pNf@E+SXffAbY~dEHmjs_3Sbi-6_s zo`)0Fon1Ns&JfsNbq7fF)oJIZ;QCBvfW0iMn5*VS7|#i^gH~X#SJ_m!F6I~u+2ZIE zGAcgZj)A60ko=@`ocHoCz~a|B;%_rD<~f3>-09kzhlRj2XayrZ<1Zna_w0T0Vgg3b z3A_adZ~hJrf8QBAc5ofQOX!^Eno7JK%M;fEno^jciY8e$PXJ141Cxz5+o-zeg~f6$ z!@^~xAy&|cl-8QjxG#pWnrVJ^@peq?u;^UDj57^(GjVKrUyEg_p_;124jDWO7YjRy zm85((hD;Bk7&ZElRVIOI_0d>ZS0)(15DM;v#ffaLfBW7J8PWLN>Ox}cO{5GnKU)8b zQ{3Y^2(IkI>!8q|0{oS|*x^O4ud#5o_%+tNF%=nQvN;XxSb?6h8%BG1$`x3Vz9OVC z5SE90)Ie7e8Vb*L+f=M9Fm6n1UTH3<(*#>Fkzu&7%z2tCW5ioHRx7DeSJ;P<14Jkf z(U(GVf5{OFF4((z%ZL#mlPj)NSQeOrxmuuHskkPGMLGl9u02VWl6s{s7A1#R?nM+E zM1hK!Cv4VZm5dcD2~1Wg^{Jl7G&KSvbr$c~q2mt`5-v3L)LPbTZmX(sXme}@T!62cuU1b!@Rd6!of952a@GB8p#R7~93TLFGD? 
zf2IH&qcT8AU??+{?1~|1M0ZM5Zs7*bR?1KkB!4iT zN{M415RnnoGDUD19@SLW5FzGJnI|X7In=0H{kU{sn^?&go+}yB(82@|H}?|@hN8o4 zURzmp+B+)Z^;N02G0B6K&1VgPm}I{BzSQC~Epc{bN;o2+yiKjfOt<74RBkfb>4r)X zve3q<-)R8h_{ZUGJ!ndhs38&{8b?hfa1vhR^PGu{nt$K|NC7Sm;A7iu8w zwgPi$mJZnIenju9q*iZx*L9xAX2MBau^3n)?$)lhv82pT1-v&3G#;OniIO(k`nA&4 znWF>}GRRCzB!ej001-q z0000002%-Q2kL+T00006ffA!bO*AqZX`!G7Oom24XahhDL68Ozz<|(U21bn;00u*7 zjei+30Mj7?6G9UylgUBpKUB!{AZP}jkO0sDqd))v001-q27mwuqI!YAm_tU5G}A^v z28Je01_Bux88l=uGGQ@21ZdD27!c8*#0;7SjD~>FB-D`;N2yN)&`c+&14fMl)Bpem zfB*mh13&-(000008fm%-Cc{TCv?lh`bAL74Mx=t5xS5Ft)FJ=`R5(ze65$6#(ZMhZ zFhj(|D3@{`W{5l`DF+~=@f3M^Xi-HJCWwcZp7EG95GVSP%r0hx3^ZtjhG+l-r+zC! z0d$0OgwNh@ytvUrOIcb8cmReBns%ZGldhr$cIKoe!3bZwb}sJeCVCLr<^fPPjDIwk z7fA@HkcTMnFU)g1*Eh}}LxF>Yi5?yz zM3}WW*y&SbC&|hbp-x_AYNcy4maflU!HX>hZ5rr+*}I22T5h#9UF&P?Y=O15@SrWm z+q=(SKY%I>E&dJ|d|YukIeEF_#x;-8)7976B2R3t?*C6$!#{!T?+4@N6hEkue*Xr4 z`uGP0WtHg1BY*$|Ap|>!(EtDnEN3c{WCBQk?RD@-b~zKbt>f?C)gZW_L&0_6cd!wA zqy+cQ?k7TbTpl139+6ppk9HjWnWM+rUW-EEsHm>!tH~Mv-a>P^X-T-x-`pFZMK(?gy?1jnU%gM z;KgIUknXdUTTYdbR76yTIylejA!k;9Cp5NM?={k1ENPY11-2Ys|1)Mn_cVN;Y0pYj zO)(p7gC;%3SxxZA<73NVxs!<)87jD*j~9#S@h5M3;h$GNeEIX}(V@A|NOW3F;-t77|hyGT(bq7{pOY zW&Rg#joA*7feWhWJgl0wC%F1F7Zf=1%_*bC6xt` z6pT~_0aivZMq5|w_+3kRsS@A^EQQ2#j@2SgAHeY7 ze%&;77H6}t0T(6|8|EnBXhkNE1PLs(YN?DH3QMpi{uQr|zEuE{Ln0e()eA|!NtkgT z32wH3#T2zhozmBZH(cqK&5*->sE{+d-?0J2Ko}viwnl9a6vd-<{7rW)AP6yvZGlbt z+7uul=P#aEsPwU2_olnBBXONj86~pXi$ik&V3UdZx*|_6p9)cGv$h__h*)WkRn%D)#Mb7I%!>IYu~Z=$%qLXJSOlI z9l!WEJA7pE*}!)JFQRjv#JXG^%X8WSno^jciY8n(PXJ0|2QM1N+o-A03+m)vhlIOf&Bd|hzZT0=KQUE8hYX$t3x&7OIzodNqp&jX9=bb?@rWAK zO}UK66-Lz=nC8xdUDzA8x%0G^;ryszr!$B`vWU zsgSTl?JHhq<*Q*~gK}hA@=H>UFh#R}Ar>p^49Ah8&OKt;#Ib2gme#~Ph#Db4evN#| z%Pd4Fs@+&FIDrxgGdlch0_zZW?8X-|Z;KG-r!aeU$Bx3QpGc;}#IWq0_@aYokTt2D zmM%#%(unMdj9Dr4>7L*;H3B1b7qu z?PZr;64J6(^xGcHI5ZSCi>;!WYOV@g{jqep<$^9Ek)W>s7|BI|({1YwpErrY*ONT@ zLN|x8BC=qCfuj@1s}ZYXqh*k{P{fI4O(R^~@d|LV4QzQr)fV{yWrR;A6yg}tUm8Jk3 
zqcT8AVJI?|3e-aAi0UbkyM!A#TPZ+DmK%Dt=2)HXZQPuL$>-4qtXiCFXl!lXUBU>f zmn~qiE2t6pTlvN$gGixc*0;BX>JYLs48Oq6jQ6lp20!;z| zQ3yCjiztqS2Q25s6-Y|aq^TrHK#=7kC$a=C|GDm-)noxk0=fY`VHYF^NCXZslt#{@ zw35~w6mrJTFlO$X7Xy2|-B3I-M8l2;fc_4GC~jw+4lb=U?D=;IzmA6JE=wCz8sN-{ zN?^?D#58EwX}+Uk4^V`Ef#Ojk84w_53=v^u!S}Z9M+VSaPI(m+R)%4vM97*!@d;5R zW=OOHB-+wVp#mOd36^qvla)$^uil3?8yaRv!0}tkmWB`@gMe``xN0mm<>j@vU7@nB zFG{sFOmbl5xs2hkGK`nYbrX3DJP5L6J7Ab>Zq^0yHwDue7jo(6b+0z^^<8<6!tCjS+oNgfSZwd|QTtn%$JbCKd zaJ3^Sw@Y3?@0%rspw8hKhrPI7=Cm_R*G%A5unBrd~x} z^nNKVq*XKMOh$-zgk)tr4~g>YZ!4&>QwXGP3JSGM5nMrr)5~(0$jH;O&!p(HQ1W-24Ubt9PzT1ki~DdCh^%5+F`X(aV|;7pJT-~*d)uA6J{&eE}3 eWVPE{iN~@?eINvk+VaJJ;_gVN3K9qinR9@|fLO`^ diff --git a/data/suicide_ideation.rda b/data/suicide_ideation.rda index c33a37313f9520258fc773fb8bd04a532509bd0b..ad52ae549259c530e58295a6c0623693fc5add09 100644 GIT binary patch literal 2160 zcmah~d0Z3e8UAvJTtPsPYi(l)NCYvFAg~?{G{HqQ}&xm?}(=Ta0J#eE(MSQLd`_?S(O$^ zJOK$&K_;)y&h|u65bE$cL>=%XkAMse2B0W8r%f3-l^@?~PO+=F?|bg>rU>@YbbzwX z03)6NuCE}EV1ShvQ2_vk1Tx4dEa)S{Ei4!~p85(wHUbp@h$l?2e31ky7I8V+_$e;) z_GKc72qI9aFw9`2r~EWxjiAz6cGc83n3bfJppyJJwhG;Y6>@&_{ zqiirTVhLaebr`T1F7k^*5JM5!_)pDd)@GJ;Z_T+^o1~=ahD$PhaTiW1%2DROZ{hE-)KzVW>s1- zozv{=OV_hKuC6X@+^X7RQYuOnr+QUEgZ-VD1`bZ-3f1nY_sO)F#@=$q8Y4VQ%XA z-jJ)jAiI(qQdIttAx84~?#B|6^)D$9%~$5tRwb)X47eR}|4oliAMXd31XR?5aaPbq zMV1Y^9+Q+gjgV;6_L)5dP_5}Fc>3cgM(<6rBVS^9-8#iEi5fbPINrPL$t#XUMGmagCRz`r~L!+{^GUgq@r4qxA;Ep%|sD z>q=#n0_7gN&5bh1Z`K@gL|==k!$tV>Qhi+F)-)FDT~m{PMSb_X^;G%kCNwqNxp>H_ zIST5?JbL_+HybJSI@xztvftG~O?_ZeMep1Lv@wfl0{$F`>Q z>f|I_MM^!vHC3G;$YZfsc|7TPeX#ywh`y6V3k~h)3Z3#gFR{~u`5l^oFc?pQt-yPAL{C@Z5i98nt`WwSnb-Dz}9tb#Dhb5#$Ykb!3!`14;7q3q|m;HvecN_$5u6^%59Fh8Ri8qB3F<37Cy!|)Wtq5?3&(8!vIl$kpHGG{(u~(8XTcR|$O1hLs zV;3u>G+fq-SIyr(b1R1QGu}D)=~%7#;*J@^u2*k0|AiKAjuPq3pF4^=SJ<-5W<^0VbYQbENhn3gCc!} zSybLNTwhQN>95Ek06cfpNJ~#LN%5VtIo+dnI%2)7*{=jX)JpO+E5^5$L566KEMG;w zM9w;c=`a;yF4tu{lE37~P(jKvhAXQ3lNC2@ewg)o35KeU!t>yOX4q9SBHcU{Qo907(sB@aDk(FXBoucRFMZbN?{#> 
z!BN&BT4#UU|6aTQX-oIGt`!F(SRy$RHXv6v5E$GOFCXvE27^(OrTLH;EvJj|28oq$ z0pP2}4hs(@NC;1HhKR_KLoRHZ;Ty<5;KI#aM)+aa3>TcZjBnp}3V+`bUHxaU@URj< zc(IMo9LivjEZw~hth}oLc81P;sLj0uy9#~9+x@!z+T0}I4B-#Mcd8O2`x)^o=B;C4 z@K)Js&dK2yKtdm`gi(eJgHmAEGVY^r?oChN}CG^2jfo}U9jS&E^9R(^m z#oWR!+eTo(1J;rOJST=jAKcpX?_0qV8LV|)mPkL&AvLTH8-AS$12#+L(JY>OUtC@2 z?S9(Q1{cAu>A+u^Q%)*NcfYm@-iUZzHU(HU_C4Q#o!<~)^@txS%5RH|{zom+uw^WB zG_$Z@Wm88*n7LBL3dD}BWFawNEV+t}PH6XwTgz`v<~a1M(Iy1$5JKb^rPCr53xCoD z7_UEswZyPsIvoJlJYEbE7IAQ{7muuOw%^nmm{4|2r?D+bwxbWW<5E1Vj=v2vJ*R%~ zOmlDWo7WlUT-oHjW1Yg>32>P=)vc}#WUU{k?G#?x*CdFYd9-&fG2)%WP~@7cWwhZv w0-bwLX%d%J6VQK@fc!uKAkd;-908r}I5Q^9Nf*%%++U!dwRK*4i@&n&KR-%v*#H0l literal 2133 zcmah~eLU3J8b9+wLybj4GTE4hW}041hLXAsM%U0t6O&OQ!>oy4wY!mxYsX8PgfwWG zvQ5d{yp}ODL|I9$QfLNY5_(bD$}PR9{n+-5_U`B2zwY;Qe&;;rd!F-sp67hd?{mCD zW8CSUe5Y-E$%laguz>&h|6&w(8qB%<8b4|uo~+I6Lk7ttDsv_f01zcoY~H+OX*pov z8KP1qWNerK!vv_a@<GO<){;9^8g;$I_Ip80mj*xPGY)^xxok90!+l%n_>WP?P(EE_J4qm z0KIQr@ebwG)n5fdW&5{clp>u8eNgg+csRhUW`U9$i=pAZF{L7CMpcp0XK$d5#dw1g z01N=L{$kF~4IPG2w&cdPTQR!c4faFFxAii5zNjs(fQz;O(+jivZ!l&?x#@>Tre02u zg&)a9ei{4t)O{~8J+=F-^_cQmCg({1ienFGr}&L%o>HGs+Z4uEOeZQ$yYSa|<-v8P z4-}t5g)2!@E-_!l(D;K7C)%fZdhes{zpAc%qg$}4Q8=NG8#w0jH=VgaR6j>GCTje& z!|LAJVf&jU^n{t_ zsdgtR*QpO^msxq{s80pmS05M}3=BwL<;d)Jo#QEL*7lml>!NF%NAnzquY0gp&NF{j z>z?E4+hfPC5-PhmR;a{EPjg>^f*x~qkKcpnxi*Pu);U(|Ts+kNS2o*3n!RYK2>L$q zz2`A9uXcNd(Zo(lw#wU@8;A8Wdu+R*$gMh%r}@4{_-@EsV8+iyEj7Mb<8{vb)r-(t z(}np9ni?#OZbW#?8cN-Tq1RhCKgsROIuZ6*@3Z{fNr_}$OGiYzoiuiBu)?-4%}&?M z&mv>4ToXGL#w|eiEq`A0qFb|w(tj$o__!i@+%S_a8%SLd3OVY*e)Hvx|zM zcGbAmuW49&***0@e8R!R9}XoY|0Cr`acWw6#^KDvMY9s}YL#^=wWhw|a%0n#tIfZ(Tx-4FcBB1Y9XD@v z-oDe--E&u~>+S2mH!wJ)zd!t7WOVG|qsLFipFW#TP6^*wn($^Mynf{M$Dd|WKA#zR zzc0Yw$*1|U1ptmMe_i+l>*XN~`29ryqiOihSi^ag7#9+08<9uJ@*z?Ae*QjGLKdrY z#jlC?IjdOtNK>jxiqLp3_4Rv)o05#;?}h!@pg2mSN5ub;bQ@yhC(6XY3@!kGX!H?q zfFwB|xny9zP>dAhrNx>=8OI4HQpw~pgmTxBiq(qNP%YW2RE;bk z%@2=Z;u+fA=khqTvrLPiB4_(NXX6qaKwYE)?BF6X%Jw@f+t+Q=P#(#TnWH$5lkM?T 
zGVZ4>5mD*<9ID8vplBvBR<9=#35CU);UZvLP)5Ze4PID*O?!1nJ3WH{sD9i6;QHH# z$usJUCGqz{`_+$GEY_x43yW1roa$IOpG+}ca!Bk^hE8)JUqG|T6w9sX)L|B|^rK)| zUS6qWDZzwX2yW|w%DH8SrpSd19p}&}JS=P()sHFV?I5xD5=fR;%6VYVp)k#4@bFYF61GC|7rd%Qs zTe3ZxAnIJ2eE^{=E+5ehcEuML7e|sH`qi*(dbY0>s|BF^0%08jo7Sq`w!(5(cDX|I z^kSxvrhQe>nhXG%OgxSUr`y_T8SoN}k+s&z?|C$oR%3I4^p(0Cux3 zsOscLxv-w~Qy~%8lq^U|iWc8SL2_cOKuR&DJ~0cTUeA4E@?8C0x?@u;#)_PkZloS<2x3 zL~qkDe6CgCWTkm|n(Wj3>Muj$Pl0Za4Gvrl?VTDYyZs&4M~lq8>;TcUk4K0e`J?kqKWD+XX! f2aJi$$loGjE84W}0_EcACMmTwBjfMQZSp?=lS6=X diff --git a/man/congresswomen.Rd b/man/congresswomen.Rd index 843be36..f0d5e59 100644 --- a/man/congresswomen.Rd +++ b/man/congresswomen.Rd @@ -11,7 +11,7 @@ A person-level data frame with 168 rows and \describe{ \item{\code{id}}{Participant ID.} \item{\code{name}}{Representative name.} -\item{\code{time}}{Number of terms in office.} +\item{\code{terms}}{Number of terms in office.} \item{\code{censor}}{Censoring status.} \item{\code{democrat}}{Party affiliation.} } diff --git a/man/suicide_ideation.Rd b/man/suicide_ideation.Rd index 274f4d5..77be2b7 100644 --- a/man/suicide_ideation.Rd +++ b/man/suicide_ideation.Rd @@ -10,9 +10,9 @@ A person-level data frame with 391 rows and \describe{ \item{\code{id}}{Participant ID.} -\item{\code{time}}{Reported age of first suicide ideation.} +\item{\code{age}}{Reported age of first suicide ideation.} \item{\code{censor}}{Censoring status.} -\item{\code{age}}{Participant age at the time of the survey.} +\item{\code{age_now}}{Participant age at the time of the survey.} } } \source{ From 40a7b1d7ada7111e47c261c3163206dbfc187d02 Mon Sep 17 00:00:00 2001 From: Michael McCarthy <51542091+mccarthy-m-g@users.noreply.github.com> Date: Sun, 26 May 2024 20:59:05 -0700 Subject: [PATCH 5/6] update Chapter 10 --- vignettes/articles/chapter-10.Rmd | 417 +++++++++++++++++++++++------- 1 file changed, 321 insertions(+), 96 deletions(-) diff --git a/vignettes/articles/chapter-10.Rmd b/vignettes/articles/chapter-10.Rmd index 
816fc20..22202b1 100644 --- a/vignettes/articles/chapter-10.Rmd +++ b/vignettes/articles/chapter-10.Rmd @@ -2,14 +2,12 @@ title: "Chapter 10: Describing discrete-time event occurrence data" --- -::: {.alert .alert-warning} -This chapter is under construction. -::: - ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, - comment = "#>" + comment = "#>", + warning = FALSE, + message = FALSE ) ggplot2::theme_set(ggplot2::theme_bw()) ggplot2::theme_update( @@ -22,157 +20,384 @@ ggplot2::theme_update( library(alda) library(dplyr) library(tidyr) +library(purrr) library(ggplot2) +library(patchwork) library(survival) library(broom) ``` -## 10.1 The Life Table +## 10.1 The life table + +In Section 10.1 Singer and Willett (2003) introduce the **life table**---the primary tool for summarizing the sample distribution of event occurrence---using a subset of data from Singer (1993), who measured how many years 3941 newly hired special educators in Michigan stayed in teaching between 1972 and 1978. Teachers were followed for up to 13 years or until they stopped teaching in the state. -Table 10.1, page 327: +For this example we return to the `teachers` data set introduced in Chapter 9, a person-level data frame with 3941 rows and 3 columns: + +- `id`: Teacher ID. +- `years`: The number of years between a teacher's dates of hire and departure from the Michigan public schools. +- `censor`: Censoring status. + +```{r} +teachers +``` + +As Singer and Willett (2003) discuss, a life table tracks the event histories of a sample of individuals over a series of contiguous intervals---from the beginning of time through the end of data collection---by including information on the number of individuals who: + +- Entered each interval. +- Experienced the target event during an interval. +- Were censored at the end of an interval. 
+ +We can either construct a life table "by hand" by first converting the person-level data set to a person-period data set, then cross-tabulating **time period** and **event-indicator** variables; or by using a prepackaged routine. Because we will be constructing a life table "by hand" in Section 10.5, here we demonstrate the prepackaged routine approach. + +Conceptually, the life table is simply the tabular form of a **survival function** (see Section 10.2); thus, an easy way to construct a life table is to first fit a survival function to the person-level data set, then use the summary of the fit as a starting point to construct the remainder of the table. + +We can fit a survival function using the `survfit()` function from the **survival** package. The model formula for the `survfit()` function takes the form `response ~ terms`, where the response must be a "survival object" created by the `Surv()` function. For right-censored data, the survival object can be created by supplying two unnamed arguments to the `Surv()` function corresponding to `time` and `event` variables, in that order. Note that we can recode a `censor` variable into an `event` variable by reversing its values. For 0-1 coded data, we can write the event status as `event = 1 - censor`. ```{r} -# A life table is the tabular form of a survival curve, so begin by fitting a -# Kaplan-Meir curve to the data. teachers_fit <- survfit(Surv(years, 1 - censor) ~ 1, data = teachers) -table_10.1 <- teachers_fit |> - # Add a starting time (time 0) for the table. +summary(teachers_fit) +``` + +Next, we'll collect the summary information from the `survfit` object into a tibble using the `tidy()` function from the **broom** package. For now we will exclude any statistical summaries from the life table, focusing exclusively on columns related to the event histories of the `teachers` data. 
Note also that the summary information from the `survfit` object starts at the time of the first event, not the "beginning of time". We can add a "beginning of time" to the `survfit` object using the `survfit0()` function from the survival package, which (by default) adds a starting time of 0 to the life table. + +```{r} +teachers_lifetable <- teachers_fit |> survfit0() |> tidy() |> - # The summary of the fit gives most of what we want, but to match Table 10.1 - # we need to do a little more wrangling. - select(-c(std.error:conf.low)) |> - mutate( - interval = paste0("[", time, ", ", time + 1, ")"), - haz.estimate = n.event / n.risk - ) |> - rename(year = time, surv.estimate = estimate) |> - relocate( - year, interval, n.risk, n.event, n.censor, haz.estimate, surv.estimate - ) + select(-c(estimate:conf.low)) |> + mutate(interval = paste0("[", time, ", ", time + 1, ")"), .after = time) |> + rename(year = time) -table_10.1 +teachers_lifetable ``` -## 10.2 A Framework for Characterizing the Distribution of Discrete-Time Event Occurrence Data +As Singer and Willett (2003) discuss, we interpret the columns of the life table as follows: + +- `year`: Defines the time of each interval using ordinal numbers. +- `interval`: Defines precisely which event times appear in each interval using the **interval notation**, `[start, end)`, where each interval includes the starting time and excludes the ending time. +- `n.risk`: Defines the **risk set** during each interval; that is, the number of (remaining) individuals who are eligible to experience the target event during an interval. +- `n.event`: Defines the number of individuals who experienced the target event during an interval. +- `n.censor`: Defines the number of individuals who were censored during an interval. + +Importantly, notice that once an individual experiences the target event or is censored during an interval, they drop out of the risk set in all future intervals; thus, the risk set is inherently irreversible. 
+ +## 10.2 A framework for characterizing the distribution of discrete-time event occurrence data + +In Section 10.2 Singer and Willett (2003) introduce three statistics for summarizing the event history information of the life table, which can be estimated directly from the life table: + +- **Hazard** is the fundamental quantity used to assess the risk of event occurrence in each discrete time period. The **discrete-time hazard function** is the *conditional probability* that the $i$th individual will experience the target event in the $j$th interval, given that they did not experience it in any prior interval: -Figure 10.1, page 333: + $$ + h(t_{ij}) = \Pr[T_i = j \mid T_i \geq j], + $$ + + whose maximum likelihood estimates are given by the proportion of each interval's risk set that experiences the event during that interval: + + $$ + \hat h(t_j) = \frac{n \text{ events}_j}{n \text{ at risk}_j}. + $$ + +- The **survival function** is the *cumulative probability* that the $i$th individual will *not* experience the target event through the $j$th interval: + + $$ + S(t_{ij}) = \Pr[T_i > j], + $$ + + whose maximum likelihood estimates are given by the cumulative product of the complement of the estimated hazard probabilities across the current and all previous intervals: + + $$ + \hat S(t_j) = [1 - \hat h(t_j)] [1 - \hat h(t_{j-1})] [1 - \hat h(t_{j-2})] \cdots [1 - \hat h(t_1)]. + $$ + +- The **median lifetime** is a measure of central tendency identifying the point in time by which we estimate that half of the sample has experienced the target event and half has not, given by: + + $$ + \text{Estimated median lifetime} = m + + \Bigg[ \frac{\hat S(t_m) - .5}{\hat S(t_m) - \hat S(t_{m + 1})} \Bigg] + \big( (m + 1) - m \big), + $$ + + where $m$ is the time interval immediately before the median lifetime, $\hat S(t_m)$ is the value of the survivor function in the $m$th interval, and $\hat S(t_{m + 1})$ is the value of the survivor function in the next interval. 
+ +First, the discrete-time hazard function and the survival function. Note the use of if-else statements to provide preset values for the "beginning of time", which by definition will always be `NA` for the discrete-time hazard function and `1` for the survival function. ```{r} -ggplot(table_10.1, aes(x = year, y = haz.estimate)) + - geom_line() + - scale_x_continuous(breaks = 0:13, limits = c(1, 13)) + - scale_y_continuous(breaks = c(0, .05, .1, .15), limits = c(0, .15)) + - coord_cartesian(xlim = c(0, 13)) +teachers_lifetable <- teachers_lifetable |> + mutate( + haz.estimate = if_else(year != 0, n.event / n.risk, NA), + surv.estimate = if_else(year != 0, 1 - haz.estimate, 1), + surv.estimate = cumprod(surv.estimate) + ) + +# Table 10.1, page 327: +teachers_lifetable +``` + +Then the median lifetime. Here we use the `slice()` function from the **dplyr** package to select the time intervals immediately before and after the median lifetime, then do a bit of wrangling to make applying the median lifetime equation easier and clearer. -# First interpolate median lifetime -median_lifetime <- table_10.1 |> - # Get the row indices for the first survival estimate immediately below and - # immediately above 0.5. This will only work correctly if the values are in - # descending order, otherwise min() and max() must be swapped. By default, the - # survival estimates are in descending order, however, I've added the - # redundant step of ensuring they are here for demonstration purposes. - arrange(desc(surv.estimate)) |> - slice(min(which(surv.estimate <= .5)), max(which(surv.estimate >= .5))) |> - select(year, surv.estimate) |> - # Linearly interpolate between the two values of the survival estimates that - # bracket .5 following Miller's (1981) equation. 
+```{r} +teachers_median_lifetime <- teachers_lifetable |> + slice(max(which(surv.estimate >= .5)), min(which(surv.estimate <= .5))) |> + mutate(m = c("before", "after")) |> + select(m, year, surv = surv.estimate) |> + pivot_wider(names_from = m, values_from = c(year, surv)) |> summarise( - year = - min(year) + - ((max(surv.estimate) - .5) / - (max(surv.estimate) - min(surv.estimate))) * - ((min(year) + 1) - min(year)), - surv.estimate = .5 + surv.estimate = .5, + year = year_before + + ((surv_before - .5) / (surv_before - surv_after)) + * (year_after - year_before) ) + +teachers_median_lifetime +``` + +A valuable way of examining these statistics is to plot their trajectories over time. + +```{r} +teachers_haz <- ggplot(teachers_lifetable, aes(x = year, y = haz.estimate)) + + geom_line() + + scale_x_continuous(breaks = 0:13) + + coord_cartesian(xlim = c(0, 13), ylim = c(0, .15)) -ggplot(table_10.1, aes(x = year, y = surv.estimate)) + +teachers_surv <- ggplot(teachers_lifetable, aes(x = year, y = surv.estimate)) + geom_line() + geom_segment( - aes(xend = year, y = 0, yend = .5), data = median_lifetime, linetype = 2 + aes(xend = year, y = 0, yend = .5), + data = teachers_median_lifetime, + linetype = 2 ) + geom_segment( - aes(xend = 0, yend = .5), data = median_lifetime, linetype = 2 + aes(xend = 0, yend = .5), + data = teachers_median_lifetime, + linetype = 2 ) + scale_x_continuous(breaks = 0:13) + - scale_y_continuous(breaks = c(0, .5, 1), limits = c(0, 1)) + + scale_y_continuous(breaks = c(0, .5, 1)) + coord_cartesian(xlim = c(0, 13)) + +# Figure 10.1, page 333: +teachers_haz + teachers_surv + plot_layout(ncol = 1, axes = "collect") ``` -## 10.3 Developing Intuition About Hazard Functions, Survivor Functions, and Median Lifetimes +When examining plots like these, Singer and Willett (2003) recommend looking for patterns in and between the trajectories to answer questions like: + +- What is the overall shape of the hazard function? 
+- When are the time periods of high and low risk? +- Are time periods with elevated risk likely to affect large or small numbers of people, given the value of the survivor function? + +## 10.3 Developing intuition about hazard functions, survivor functions, and median lifetimes + +In Section 10.3 Singer and Willett (2003) examine and describe the estimated discrete-time hazard functions, survivor functions, and median lifetimes from four studies that differ by their type of target event, metric for clocking time, and underlying profile of risk: + + - `cocaine_relapse_1`: A person-level data frame with 104 rows and 4 columns containing a subset of data from Hall, Havassy, and Wasserman (1990), who measured the number of weeks of relapse to cocaine use in a sample of 104 former addicts released from an in-patient treatment program. In-patients were followed for up to 12 weeks or until they used cocaine for 7 consecutive days. -Figure 10.2, page 340: + ```{r} + cocaine_relapse_1 + ``` + + - `first_sex`: A person-level data frame with 180 rows and 5 columns containing a subset of data from Capaldi, Crosby, and Stoolmiller (1996), who measured the grade year of first sexual intercourse in a sample of 180 at-risk heterosexual adolescent males. Adolescent males were followed from Grade 7 up to Grade 12 or until they reported having had sexual intercourse for the first time. + + ```{r} + first_sex + ``` + + - `suicide_ideation`: A person-level data frame with 391 rows and 4 columns containing a subset of data from Bolger and colleagues (1989) measuring age of first suicide ideation in a sample of 391 undergraduate students aged 16 to 22. Age of first suicide ideation was measured with a two-item survey asking respondents "Have you ever thought of committing suicide?" and if so, "At what age did the thought first occur to you?" 
+ + ```{r} + suicide_ideation + ``` + + - `congresswomen`: A person-level data frame with 168 rows and 5 columns containing data measuring how long all 168 women who were elected to the U.S. House of Representatives between 1919 and 1996 remained in office. Representatives were followed for up to eight terms or until 1998. + + ```{r} + congresswomen + ``` + +We can plot the discrete-time hazard functions, survivor functions, and median lifetimes from each of these four studies in a single call using the `pmap()` function from the **purrr** package. ```{r} -relapse_fit <- survfit(Surv(weeks, 1 - censor) ~ 1, data = cocaine_relapse_1) -relapse_tidy <- tidy(relapse_fit) -relapse_summary <- glance(relapse_fit) +#| fig.height: 10 +study_plots <- pmap( + list( + list("cocaine_relapse_1", "first_sex", "suicide_ideation", "congresswomen"), + list(cocaine_relapse_1, first_sex, suicide_ideation, congresswomen), + list("weeks", "grade", "age", "terms"), + list(0, 6, 5, 0) + ), + \(.title, .study, .time, .beginning) { + + # Get life table statistics. + study_fit <- survfit(Surv(.study[[.time]], 1 - censor) ~ 1, data = .study) + + study_lifetable <- study_fit |> + survfit0(start.time = .beginning) |> + tidy() |> + rename(surv.estimate = estimate) |> + mutate(haz.estimate = if_else(time != .beginning, n.event / n.risk, NA)) + + study_median_lifetime <- study_lifetable |> + slice(max(which(surv.estimate >= .5)), min(which(surv.estimate <= .5))) |> + mutate(m = c("before", "after")) |> + select(m, time, surv = surv.estimate) |> + pivot_wider(names_from = m, values_from = c(time, surv)) |> + summarise( + surv.estimate = .5, + time = time_before + + ((surv_before - .5) / (surv_before - surv_after)) + * (time_after - time_before) + ) + + # Plot discrete-time hazard and survival functions. 
+ study_haz <- ggplot(study_lifetable, aes(x = time, y = haz.estimate)) + + geom_line() + + xlab(.time) + + study_surv <- ggplot(study_lifetable, aes(x = time, y = surv.estimate)) + + geom_line() + + geom_segment( + aes(xend = time, y = 0, yend = .5), + data = study_median_lifetime, + linetype = 2 + ) + + geom_segment( + aes(xend = .beginning, yend = .5), + data = study_median_lifetime, + linetype = 2 + ) + + xlab(.time) + + wrap_elements(panel = (study_haz | study_surv)) + ggtitle(.title) + } +) + +# Figure 10.2, page 340: +wrap_plots(study_plots, ncol = 1) ``` -## 10.4 Quantifying the Effects of Sampling Variation +Focusing on the overall shape of the discrete-time hazard functions, and contextualizing their shape against their respective survival functions, Singer and Willett (2003) make the following observations: -Table 10.2, page 349: +- `cocaine_relapse_1`: The discrete-time hazard function follows a **monotonically decreasing** pattern---peaking immediately after the "beginning of time" and decreasing thereafter---which is common when studying target events related to recurrence and relapse. This is reflected in the survival function, which drops rapidly in early time periods, then more slowly over time as the hazard decreases, indicating that the prevalence of relapse to cocaine use was greatest shortly after leaving treatment. -```{r} -summary(teachers_fit) +- `first_sex`: The discrete-time hazard function follows a **monotonically increasing** pattern---starting low immediately after the "beginning of time" and increasing thereafter---which is common when studying target events that are ultimately inevitable or near universal. This is reflected in the survival function, which drops slowly in early time periods, then more rapidly over time as the hazard increases, indicating that the prevalence of first sexual intercourse in those still at risk progressively increased as time progressed. 
-teachers_fit |> - tidy() |> +- `suicide_ideation`: The discrete-time hazard function follows a **nonmonotonic** pattern with multiple distinctive peaks and troughs, which generally arise in studies of long duration due to the data collection period being of sufficient length to capture reversals in otherwise seemingly monotonic trends. This is reflected in the survival function, which has multiple periods of slow and rapid decline, indicating that prevalence of suicide ideation was low during childhood, peaked during adolescence, and then declined to near early-childhood levels in late adolescence for those still at risk. + +- `congresswomen`: The discrete-time hazard function follows a **U-shaped** pattern---with periods of high risk immediately after the "beginning of time" and again at the end of time---which is common when studying target events that have different causes near the beginning and end of time. This is reflected in the survival function, which drops rapidly in early and late time periods, but more slowly in the middle, indicating that prevalence of leaving office was greatest shortly after their first election, and then after having served for a long period of time for those still at risk. + +## 10.4 Quantifying the effects of sampling variation + +In Section 10.4 Singer and Willett (2003) return to the `teachers` data to discuss standard errors for the estimated discrete-time hazard probabilities and survival probabilities, which can also be estimated directly from the life table: + +- Because the estimated discrete-time hazard probability is simply a sample proportion, its standard error in the $j$th time period can be estimated using the usual formula for estimating the standard error of a proportion: + + $$ + se \big(\hat h(t_j) \big) = + \sqrt{\frac{\hat h(t_j) \big(1 - \hat h(t_j) \big)}{n \text{ at risk}_j}}.
+ $$ + +- For risk sets greater than size 20, the standard error of the survival probability in the $j$th time period can be estimated using Greenwood's approximation: + + $$ + se \big(\hat S(t_j) \big) = + \hat S(t_j) \sqrt{ + \frac{\hat h(t_1)}{n \text{ at risk}_1 \big(1 - \hat h(t_1) \big)} + + \frac{\hat h(t_2)}{n \text{ at risk}_2 \big(1 - \hat h(t_2) \big)} + + \cdots + + \frac{\hat h(t_j)}{n \text{ at risk}_j \big(1 - \hat h(t_j) \big)} + }. + $$ + +We estimate these standard errors here using the `teachers_lifetable` from Section 10.2. + +```{r} +# Table 10.2, page 349: +teachers_lifetable |> + filter(year != 0) |> mutate( - # The tidy() method for survfit objects returns the standard error for the - # cumulative hazard instead of the survival probability. Multiplying the - # survival estimate with the cumulative hazard's standard error will return - # the standard error for the survival probability. Note that it is unlikely - # the tidy() method will ever change to return the the standard error for - # the survival probability instead.
See: - # - https://github.com/tidymodels/broom/pull/1162 - # Other transformations of the survival probability can be found here: - # - https://stat.ethz.ch/pipermail/r-help/2014-June/376247.html - surv.std.error = estimate * std.error, - haz.estimate = n.event / n.risk, haz.std.error = sqrt(haz.estimate * (1 - haz.estimate) / n.risk), - sqrt = (std.error)^2 / (estimate)^2 + surv.std.error = surv.estimate * sqrt( + cumsum(haz.estimate / (n.risk * (1 - haz.estimate))) + ) ) |> - select( - year = time, - n.risk, - haz.estimate, - haz.std.error, - surv.estimate = estimate, - sqrt, - surv.std.error - ) + select(year, n.risk, starts_with("haz"), starts_with("surv")) ``` -## 10.5 A Simple and Useful Strategy for Constructing the Life Table +## 10.5 A simple and useful strategy for constructing the life table + +In Section 10.5 Singer and Willett (2003) introduce the **person-period** format for event occurrence data, demonstrating how it can be used to construct the life table "by hand" using the **person-level** `teachers` data set. -Figure 10.4, page 353: +### The person-level data set + +In the person-level format for event occurrence data, each person has only one row of data with columns for their event time and censorship status, and (optionally) a participant identifier variable or any other variables of interest. This is demonstrated in the `teachers` data set, a person-level data frame with 3941 rows and 3 columns: + +- `id`: Teacher ID. +- `years`: The number of years between a teacher's dates of hire and departure from the Michigan public schools. +- `censor`: Censoring status. ```{r} -filter(teachers, id %in% c(20, 126, 129)) +teachers +``` + +Note that unlike when modelling change, the person-level data set does not contain multiple columns for each time period; thus, as we will demonstrate below, a new strategy is needed to convert a person-level data set into a person-period data set.
Additionally, and also unlike when modelling change, the person-level data set is often useful for analyzing event occurrence---as we have demonstrated through several examples in the current and previous chapters. + +### The person-period data set + +In the person-period format for event occurrence data, each person has one row of data for each **time period** when they were at risk, with a **participant identifier variable** for each person, and an **event-indicator variable** for each time period. +We can use the `reframe()` function from the **dplyr** package to convert a person-level data set into a person-period data set. The `reframe()` function works similarly to dplyr's `summarise()` function, except that it can return an arbitrary number of rows per group. We take advantage of this property to add rows for each time period when individuals were at risk, then use the information stored in these new rows and the person-level data set to identify whether an event occurred in each individual's last period, given their censorship status. + +```{r} teachers_pp <- teachers |> + group_by(id) |> reframe( - year = 1:max(years), - event = if_else(year == years & censor == 0, 1, 0), - .by = id + year = 1:years, + event = if_else(year == years & censor == 0, true = 1, false = 0) + ) + +teachers_pp +``` + +Following similar logic, we can use the `summarise()` function from the dplyr package to convert a person-period data set to a person-level data set. + +```{r} +teachers_pl <- teachers_pp |> + group_by(id) |> + summarise( + years = max(year), + censor = if_else(all(event == 0), true = 1, false = 0) ) +teachers_pl +``` + +The difference between the person-level and person-period formats is best seen by examining the data from a subset of individuals with different (censored) event times.
+ +```{r} +# Figure 10.4, page 353: +filter(teachers_pl, id %in% c(20, 126, 129)) + teachers_pp |> filter(id %in% c(20, 126, 129)) |> print(n = 27) ``` -Table 10.3, page 355: +### Using the person-period data set to construct the life table + +The life table can be constructed using the person-period data set through cross-tabulation of the time period and event-indicator variables. This can be accomplished using a standard `df |> group_by(...) |> summarise(...)` statement with the dplyr package, where we count the number of individuals who were at risk, who experienced the target event, and who were censored for each time period. After this, statistics for summarizing the event history information of the life table can be estimated using the methods demonstrated in Section 10.2. ```{r} +# Table 10.3, page 355: teachers_pp |> group_by(year) |> - count(event) |> - pivot_wider(names_from = event, names_prefix = "event_", values_from = n) |> - mutate( - total = event_0 + event_1, - p.event_1 = event_1 / total + summarise( + n.risk = n(), + n.event = sum(event == 1), + n.censor = sum(event == 0), + haz.estimate = n.event / n.risk ) ``` From 7a59c5bf1798371397bea20e43c381f5c082670e Mon Sep 17 00:00:00 2001 From: Michael McCarthy <51542091+mccarthy-m-g@users.noreply.github.com> Date: Sun, 26 May 2024 21:07:58 -0700 Subject: [PATCH 6/6] add subheading --- vignettes/articles/chapter-10.Rmd | 2 ++ 1 file changed, 2 insertions(+) diff --git a/vignettes/articles/chapter-10.Rmd b/vignettes/articles/chapter-10.Rmd index 22202b1..3cc71a4 100644 --- a/vignettes/articles/chapter-10.Rmd +++ b/vignettes/articles/chapter-10.Rmd @@ -124,6 +124,8 @@ In Section 10.2 Singer and Willett (2003) introduce three statistics for summari where $m$ is the time interval immediately before the median lifetime, $\hat S(t_m)$ is the value of the survivor function in the $m$th interval, and $\hat S(t_{m + 1})$ is the value of the survivor function in the next interval. 
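As a rough worked illustration of this interpolation, assume purely hypothetical values (chosen for easy arithmetic, not taken from any of the data sets in this chapter): $m = 6$, $\hat S(t_m) = 0.6$, and $\hat S(t_{m + 1}) = 0.4$. Interpolating linearly between the two intervals that bracket a survival probability of .5 would then give an estimated median lifetime of:

$$
6 + \left[ \frac{0.6 - 0.5}{0.6 - 0.4} \right] \big((6 + 1) - 6 \big) = 6 + 0.5 = 6.5.
$$

Because .5 sits exactly halfway between the two hypothetical survival probabilities, the estimate lands halfway between time periods 6 and 7. With real life table values the bracketed fraction will generally fall elsewhere between 0 and 1.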
+### Using the life table to estimate hazard probability, survival probability, and median lifetime + First, the discrete-time hazard function and the survival function. Note the use of if-else statements to provide preset values for the "beginning of time", which by definition will always be `NA` for the discrete-time hazard function and `1` for the survival function. ```{r}