`fitMcmcNull()` does not handle vectors of `NA` or `NULL` #9

mvankessel-EMC · 2024-08-21T09:41:51Z

While running an SCCS analysis I run into the following error during the calibration of the estimates:

#> Error in optim(c(0, 100), loglikelihoodNullMcmc, logRr = logRr, seLogRr = seLogRr):
#>  function cannot be evaluated at initial parameters

What seems to be the case is that the supplied logRr and SeLogRr are both vectors containing NA's. So they get all get removed here. Which results in a vector of numeric(0) (length = 0) for both logRr and SeLogRr.

If after removal should there be a check that asserts the length of both and if they are of length 0, set the value to 0, or a vector of the original length of 0's? I'm not sure what 'standard' value you'd use in this case.

I.e.:

if (any(is.na(seLogRr))) {
  warning("Estimate(s) with NA standard error detected. Removing before fitting null distribution")
  logRr <- logRr[!is.na(seLogRr)]
  seLogRr <- seLogRr[!is.na(seLogRr)]
}
if (any(is.na(logRr))) {
  warning("Estimate(s) with NA logRr detected. Removing before fitting null distribution")
  seLogRr <- seLogRr[!is.na(logRr)]
  logRr <- logRr[!is.na(logRr)]
}
if (length(seLogRr) == 0) {
  # Or should this be a vector of 0's of the length of the original seLogRr vector?
  seLogRr <- 0
}
if (length(logRr) == 0) {
  logRr <- 0
}

The text was updated successfully, but these errors were encountered:

mvankessel-EMC · 2024-08-21T09:46:55Z

Here are some toy examples with various logRr and seLogRr inputs:

logRr <- c(1, 2, 3)
seLogRr <- c(4, 5, 6)
EmpiricalCalibration::fitMcmcNull(logRr, seLogRr)
#> Estimated null distribution (using MCMC)
#> 
#>           Estimate lower .95 upper .95
#> Mean      1.491929  1.491929    1.4919
#> Precision 1.475598  0.017284   24.2599
#> 
#> Acceptance rate: 0.859931400685993

logRr <- c(1, 2, NULL)
seLogRr <- c(4, 5, NULL)
EmpiricalCalibration::fitMcmcNull(logRr, seLogRr)
#> Estimated null distribution (using MCMC)
#> 
#>           Estimate lower .95 upper .95
#> Mean      0.942659  0.942659    0.9427
#> Precision 2.661310  0.011013   35.4243
#> 
#> Acceptance rate: 0.91490085099149

logRr <- c(1, 2, NA)
seLogRr <- c(4, 5, NA)
EmpiricalCalibration::fitMcmcNull(logRr, seLogRr)
#> Warning in EmpiricalCalibration::fitMcmcNull(logRr, seLogRr): Estimate(s) with
#> NA standard error detected. Removing before fitting null distribution
#> Estimated null distribution (using MCMC)
#> 
#>            Estimate lower .95 upper .95
#> Mean      0.9426589 0.9426589    0.9427
#> Precision 0.6259526 0.0067681   11.5137
#> 
#> Acceptance rate: 0.871571284287157

logRr <- c(NULL, NULL, NULL)
seLogRr <- c(NULL, NULL, NULL)
EmpiricalCalibration::fitMcmcNull(logRr, seLogRr)
#> Error in eval(expr, envir, enclos): Not compatible with requested type: [type=NULL; target=double].

logRr <- c(NA, NA, NA)
seLogRr <- c(NA, NA, NA)
EmpiricalCalibration::fitMcmcNull(logRr, seLogRr)
#> Warning in EmpiricalCalibration::fitMcmcNull(logRr, seLogRr): Estimate(s) with
#> NA standard error detected. Removing before fitting null distribution
#> Error in optim(c(0, 100), logLikelihoodNullMcmc, logRr = logRr, seLogRr = seLogRr): function cannot be evaluated at initial parameters

^{Created on 2024-08-21 with reprex v2.1.1}

Session info

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14 ucrt)
#>  os       Windows 11 x64 (build 22631)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Dutch_Netherlands.utf8
#>  ctype    Dutch_Netherlands.utf8
#>  tz       Europe/Amsterdam
#>  date     2024-08-21
#>  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package              * version date (UTC) lib source
#>  cli                    3.6.3   2024-06-21 [1] CRAN (R 4.4.1)
#>  digest                 0.6.37  2024-08-19 [1] CRAN (R 4.4.1)
#>  EmpiricalCalibration   3.1.2   2024-08-21 [1] Github (OHDSI/EmpiricalCalibration@b3ade3b)
#>  evaluate               0.24.0  2024-06-10 [1] CRAN (R 4.4.1)
#>  fastmap                1.2.0   2024-05-15 [1] CRAN (R 4.4.1)
#>  fs                     1.6.4   2024-04-25 [1] CRAN (R 4.4.1)
#>  glue                   1.7.0   2024-01-09 [1] CRAN (R 4.4.1)
#>  htmltools              0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1)
#>  knitr                  1.48    2024-07-07 [1] CRAN (R 4.4.1)
#>  lifecycle              1.0.4   2023-11-07 [1] CRAN (R 4.4.1)
#>  Rcpp                   1.0.13  2024-07-17 [1] CRAN (R 4.4.1)
#>  reprex                 2.1.1   2024-07-06 [1] CRAN (R 4.4.1)
#>  rlang                  1.1.4   2024-06-04 [1] CRAN (R 4.4.1)
#>  rmarkdown              2.28    2024-08-17 [1] CRAN (R 4.4.1)
#>  rstudioapi             0.16.0  2024-03-24 [1] CRAN (R 4.4.1)
#>  sessioninfo            1.2.2   2021-12-06 [1] CRAN (R 4.4.1)
#>  withr                  3.0.1   2024-07-31 [1] CRAN (R 4.4.1)
#>  xfun                   0.47    2024-08-17 [1] CRAN (R 4.4.1)
#>  yaml                   2.3.10  2024-07-26 [1] CRAN (R 4.4.1)
#> 
#>  [1] C:/R/R-4.4.1/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

schuemie · 2024-08-22T12:04:24Z

Hmmm, I guess if there are no valid estimates at all, the fitted distribution should have NA parameters, and calibration should return NA. I'll add this as a special case and make sure it gets handled correctly.

Where do you see NULL estimates?

schuemie · 2024-08-22T13:24:49Z

This should now be handled in this commit: 37e237c

The real question is: why did you have NA estimates for all your negative controls?

mvankessel-EMC · 2024-08-26T07:27:41Z

So sometimes the logRr and seLogRr would be vectors of numeric values, but sometimes would be vectors of NA. Could this be due to no records existing for all the negative controls? Would you typically handle this by explicitly removing negative controls that have 0 counts?

mvankessel-EMC · 2024-08-26T10:04:11Z

So I tried the above. There are values in both logRr and seLogRr now. It crashes when EmpiricalCalibration:::logLikelihoodNullMcmc() returns Inf. See the repex:

logRr <- c(
  -0.040514268, 0.481692514, 0.036482466, -0.095145701, 0.314433784,
  -1.497334478, 0.183195428, 0.289136733, 0.402694353, 0.367573632,
  0.463256874, -0.054422481, -0.060860638, 0.013167817, -18.000212194,
  1.052980223, -0.008283057, -0.031046398, -1.321463935, 0.559708684,
  0.768082068, 0.116301485, 0.164942591, 0.746996019
)

seLogRr <- c(
  0.1483240473, 0.1497285553, 0.1020657210, 0.1609180653, 0.0937424754,
  0.7320934116, 0.1442173789, 0.0959738294, 0.1462481972, 0.3990881643,
  0.3010253775, 0.4164489931, 0.2158912359, 0.0003913046, 0.0000000000,
  0.6348588999, 0.5327961873, 0.3360717777, 1.1168928052, 0.2971940110,
  0.7763132355, 0.6266069969, 0.2506407727, 1.3517969687
)

optim(c(0, 100), EmpiricalCalibration:::logLikelihoodNullMcmc, logRr = logRr, seLogRr = seLogRr)
#> Error in optim(c(0, 100), EmpiricalCalibration:::logLikelihoodNullMcmc, : function cannot be evaluated at initial parameters
EmpiricalCalibration:::logLikelihoodNullMcmc(theta = c(0, 100), logRr = logRr, seLogRr = seLogRr)
#> [1] Inf

If I remove the value 0 from the seLogRr and the corresponding value from logRr it seems to not return Inf.

# Removing logRr 0 (index 15)
logRrNoZero <- logRr[c(1:14, 16:length(logRr))]
seLogRrNoZero <- seLogRr[c(1:14, 16:length(seLogRr))]

EmpiricalCalibration:::logLikelihoodNullMcmc(theta = c(0, 100), logRr = logRrNoZero, seLogRr = seLogRrNoZero)
#> [1] 28.44091
optim(c(0, 100), EmpiricalCalibration:::logLikelihoodNullMcmc, logRr = logRrNoZero, seLogRr = seLogRrNoZero)
#> $par
#> [1]  0.1692758 34.8485282
#> 
#> $value
#> [1] 21.13314
#> 
#> $counts
#> function gradient 
#>       79       NA 
#> 
#> $convergence
#> [1] 0
#> 
#> $message
#> NULL

Finally I checked the delta between each logRr and seLogRr value and the 15th value has a delta of ~18.

abs(logRr - seLogRr)
#> [1]  0.188838315  0.331963959  0.065583255  0.256063766  0.220691309  2.229427890  0.038978049
#> [8]  0.193162904  0.256446156  0.031514532  0.162231496  0.470871474  0.276751874  0.012776512
#> [15] 18.000212194  0.418121323  0.541079244  0.367118176  2.438356740  0.262514673  0.008231167
#> [22]  0.510305512  0.085698182  0.604800950

^{Created on 2024-08-26 with reprex v2.1.1}

I can reproduce it in a toy example aswel:

logRr <- c(rnorm(9), -19)
seLogRr <- c(rnorm(9), 0)
theta <- c(0, 100)

optim(c(0, 100), EmpiricalCalibration:::logLikelihoodNullMcmc, logRr = logRr, seLogRr = seLogRr)
#> Error in optim(c(0, 100), EmpiricalCalibration:::logLikelihoodNullMcmc, : function cannot be evaluated at initial parameters
EmpiricalCalibration:::logLikelihoodNullMcmc(theta = c(0, 100), logRr = logRr, seLogRr = seLogRr)
#> [1] Inf

^{Created on 2024-08-26 with reprex v2.1.1}

I'm out of my depth of what the implications are or how to solve it. But hopefully this is helpful.

mvankessel-EMC · 2024-08-26T11:00:17Z

This should now be handled in this commit: 37e237c

The real question is: why did you have NA estimates for all your negative controls?

Regarding this build, I run into a different error:

#> Error in quantile.default(dist, c(0.5, alpha/2, 1 - (alpha/2))) : 
#>   missing values and NaN's not allowed if 'na.rm' is FALSE
#> In addition: Warning messages:
#> 1: In EmpiricalCalibration::fitMcmcNull(logRr = ncs$logRr, seLogRr = ncs$seLogRr) :
#>   Estimate(s) with NA standard error detected. Removing before fitting null distribution
#> 2: In EmpiricalCalibration::fitMcmcNull(logRr = ncs$logRr, seLogRr = ncs$seLogRr) :
#>   No valid estimates left. Returning undefined null distribution

I'm assuming it is this quantile() call, that also should have na.rm = TRUE.

…ematicError()`. Fixes #9

schuemie · 2024-10-11T11:43:08Z

Thanks! Should now be fixed in develop

mvankessel-EMC changed the title ~~optim() cannot be evaluated at initial parameters~~ fitMcmcNull() does not handle vectors of NA or NULL Aug 21, 2024

mvankessel-EMC mentioned this issue Aug 21, 2024

Calibrating estimates OHDSI/SelfControlledCaseSeries#61

Open

schuemie added a commit that referenced this issue Oct 11, 2024

Now also handling undefined MCMC null in `computeExpectedAbsoluteSyst…

63b3252

…ematicError()`. Fixes #9

schuemie self-assigned this Oct 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`fitMcmcNull()` does not handle vectors of `NA` or `NULL` #9

`fitMcmcNull()` does not handle vectors of `NA` or `NULL` #9

mvankessel-EMC commented Aug 21, 2024 •

edited

Loading

mvankessel-EMC commented Aug 21, 2024

schuemie commented Aug 22, 2024

schuemie commented Aug 22, 2024

mvankessel-EMC commented Aug 26, 2024 •

edited

Loading

mvankessel-EMC commented Aug 26, 2024 •

edited

Loading

mvankessel-EMC commented Aug 26, 2024

schuemie commented Oct 11, 2024

fitMcmcNull() does not handle vectors of NA or NULL #9

fitMcmcNull() does not handle vectors of NA or NULL #9

Comments

mvankessel-EMC commented Aug 21, 2024 • edited Loading

mvankessel-EMC commented Aug 21, 2024

schuemie commented Aug 22, 2024

schuemie commented Aug 22, 2024

mvankessel-EMC commented Aug 26, 2024 • edited Loading

mvankessel-EMC commented Aug 26, 2024 • edited Loading

mvankessel-EMC commented Aug 26, 2024

schuemie commented Oct 11, 2024

`fitMcmcNull()` does not handle vectors of `NA` or `NULL` #9

`fitMcmcNull()` does not handle vectors of `NA` or `NULL` #9

mvankessel-EMC commented Aug 21, 2024 •

edited

Loading

mvankessel-EMC commented Aug 26, 2024 •

edited

Loading

mvankessel-EMC commented Aug 26, 2024 •

edited

Loading