Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fitMcmcNull() does not handle vectors of NA or NULL #9

Open
mvankessel-EMC opened this issue Aug 21, 2024 · 6 comments
Open

fitMcmcNull() does not handle vectors of NA or NULL #9

mvankessel-EMC opened this issue Aug 21, 2024 · 6 comments

Comments

@mvankessel-EMC
Copy link

mvankessel-EMC commented Aug 21, 2024

While running an SCCS analysis I run into the following error during the calibration of the estimates:

#> Error in optim(c(0, 100), loglikelihoodNullMcmc, logRr = logRr, seLogRr = seLogRr):
#>  function cannot be evaluated at initial parameters

What seems to be the case is that the supplied logRr and SeLogRr are both vectors containing NA's. So they get all get removed here. Which results in a vector of numeric(0) (length = 0) for both logRr and SeLogRr.

If after removal should there be a check that asserts the length of both and if they are of length 0, set the value to 0, or a vector of the original length of 0's? I'm not sure what 'standard' value you'd use in this case.

I.e.:

if (any(is.na(seLogRr))) {
  warning("Estimate(s) with NA standard error detected. Removing before fitting null distribution")
  logRr <- logRr[!is.na(seLogRr)]
  seLogRr <- seLogRr[!is.na(seLogRr)]
}
if (any(is.na(logRr))) {
  warning("Estimate(s) with NA logRr detected. Removing before fitting null distribution")
  seLogRr <- seLogRr[!is.na(logRr)]
  logRr <- logRr[!is.na(logRr)]
}
if (length(seLogRr) == 0) {
  # Or should this be a vector of 0's of the length of the original seLogRr vector?
  seLogRr <- 0
}
if (length(logRr) == 0) {
  logRr <- 0
}
@mvankessel-EMC
Copy link
Author

Here are some toy examples with various logRr and seLogRr inputs:

logRr <- c(1, 2, 3)
seLogRr <- c(4, 5, 6)
EmpiricalCalibration::fitMcmcNull(logRr, seLogRr)
#> Estimated null distribution (using MCMC)
#> 
#>           Estimate lower .95 upper .95
#> Mean      1.491929  1.491929    1.4919
#> Precision 1.475598  0.017284   24.2599
#> 
#> Acceptance rate: 0.859931400685993

logRr <- c(1, 2, NULL)
seLogRr <- c(4, 5, NULL)
EmpiricalCalibration::fitMcmcNull(logRr, seLogRr)
#> Estimated null distribution (using MCMC)
#> 
#>           Estimate lower .95 upper .95
#> Mean      0.942659  0.942659    0.9427
#> Precision 2.661310  0.011013   35.4243
#> 
#> Acceptance rate: 0.91490085099149

logRr <- c(1, 2, NA)
seLogRr <- c(4, 5, NA)
EmpiricalCalibration::fitMcmcNull(logRr, seLogRr)
#> Warning in EmpiricalCalibration::fitMcmcNull(logRr, seLogRr): Estimate(s) with
#> NA standard error detected. Removing before fitting null distribution
#> Estimated null distribution (using MCMC)
#> 
#>            Estimate lower .95 upper .95
#> Mean      0.9426589 0.9426589    0.9427
#> Precision 0.6259526 0.0067681   11.5137
#> 
#> Acceptance rate: 0.871571284287157

logRr <- c(NULL, NULL, NULL)
seLogRr <- c(NULL, NULL, NULL)
EmpiricalCalibration::fitMcmcNull(logRr, seLogRr)
#> Error in eval(expr, envir, enclos): Not compatible with requested type: [type=NULL; target=double].

logRr <- c(NA, NA, NA)
seLogRr <- c(NA, NA, NA)
EmpiricalCalibration::fitMcmcNull(logRr, seLogRr)
#> Warning in EmpiricalCalibration::fitMcmcNull(logRr, seLogRr): Estimate(s) with
#> NA standard error detected. Removing before fitting null distribution
#> Error in optim(c(0, 100), logLikelihoodNullMcmc, logRr = logRr, seLogRr = seLogRr): function cannot be evaluated at initial parameters

Created on 2024-08-21 with reprex v2.1.1

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14 ucrt)
#>  os       Windows 11 x64 (build 22631)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Dutch_Netherlands.utf8
#>  ctype    Dutch_Netherlands.utf8
#>  tz       Europe/Amsterdam
#>  date     2024-08-21
#>  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package              * version date (UTC) lib source
#>  cli                    3.6.3   2024-06-21 [1] CRAN (R 4.4.1)
#>  digest                 0.6.37  2024-08-19 [1] CRAN (R 4.4.1)
#>  EmpiricalCalibration   3.1.2   2024-08-21 [1] Github (OHDSI/EmpiricalCalibration@b3ade3b)
#>  evaluate               0.24.0  2024-06-10 [1] CRAN (R 4.4.1)
#>  fastmap                1.2.0   2024-05-15 [1] CRAN (R 4.4.1)
#>  fs                     1.6.4   2024-04-25 [1] CRAN (R 4.4.1)
#>  glue                   1.7.0   2024-01-09 [1] CRAN (R 4.4.1)
#>  htmltools              0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1)
#>  knitr                  1.48    2024-07-07 [1] CRAN (R 4.4.1)
#>  lifecycle              1.0.4   2023-11-07 [1] CRAN (R 4.4.1)
#>  Rcpp                   1.0.13  2024-07-17 [1] CRAN (R 4.4.1)
#>  reprex                 2.1.1   2024-07-06 [1] CRAN (R 4.4.1)
#>  rlang                  1.1.4   2024-06-04 [1] CRAN (R 4.4.1)
#>  rmarkdown              2.28    2024-08-17 [1] CRAN (R 4.4.1)
#>  rstudioapi             0.16.0  2024-03-24 [1] CRAN (R 4.4.1)
#>  sessioninfo            1.2.2   2021-12-06 [1] CRAN (R 4.4.1)
#>  withr                  3.0.1   2024-07-31 [1] CRAN (R 4.4.1)
#>  xfun                   0.47    2024-08-17 [1] CRAN (R 4.4.1)
#>  yaml                   2.3.10  2024-07-26 [1] CRAN (R 4.4.1)
#> 
#>  [1] C:/R/R-4.4.1/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

@mvankessel-EMC mvankessel-EMC changed the title optim() cannot be evaluated at initial parameters fitMcmcNull() does not handle vectors of NA or NULL Aug 21, 2024
@schuemie
Copy link
Member

Hmmm, I guess if there are no valid estimates at all, the fitted distribution should have NA parameters, and calibration should return NA. I'll add this as a special case and make sure it gets handled correctly.

Where do you see NULL estimates?

@schuemie
Copy link
Member

This should now be handled in this commit: 37e237c

The real question is: why did you have NA estimates for all your negative controls?

@mvankessel-EMC
Copy link
Author

mvankessel-EMC commented Aug 26, 2024

So sometimes the logRr and seLogRr would be vectors of numeric values, but sometimes would be vectors of NA. Could this be due to no records existing for all the negative controls? Would you typically handle this by explicitly removing negative controls that have 0 counts?

@mvankessel-EMC
Copy link
Author

mvankessel-EMC commented Aug 26, 2024

So I tried the above. There are values in both logRr and seLogRr now. It crashes when EmpiricalCalibration:::logLikelihoodNullMcmc() returns Inf. See the repex:

logRr <- c(
  -0.040514268, 0.481692514, 0.036482466, -0.095145701, 0.314433784,
  -1.497334478, 0.183195428, 0.289136733, 0.402694353, 0.367573632,
  0.463256874, -0.054422481, -0.060860638, 0.013167817, -18.000212194,
  1.052980223, -0.008283057, -0.031046398, -1.321463935, 0.559708684,
  0.768082068, 0.116301485, 0.164942591, 0.746996019
)

seLogRr <- c(
  0.1483240473, 0.1497285553, 0.1020657210, 0.1609180653, 0.0937424754,
  0.7320934116, 0.1442173789, 0.0959738294, 0.1462481972, 0.3990881643,
  0.3010253775, 0.4164489931, 0.2158912359, 0.0003913046, 0.0000000000,
  0.6348588999, 0.5327961873, 0.3360717777, 1.1168928052, 0.2971940110,
  0.7763132355, 0.6266069969, 0.2506407727, 1.3517969687
)

optim(c(0, 100), EmpiricalCalibration:::logLikelihoodNullMcmc, logRr = logRr, seLogRr = seLogRr)
#> Error in optim(c(0, 100), EmpiricalCalibration:::logLikelihoodNullMcmc, : function cannot be evaluated at initial parameters
EmpiricalCalibration:::logLikelihoodNullMcmc(theta = c(0, 100), logRr = logRr, seLogRr = seLogRr)
#> [1] Inf

If I remove the value 0 from the seLogRr and the corresponding value from logRr it seems to not return Inf.

# Removing logRr 0 (index 15)
logRrNoZero <- logRr[c(1:14, 16:length(logRr))]
seLogRrNoZero <- seLogRr[c(1:14, 16:length(seLogRr))]

EmpiricalCalibration:::logLikelihoodNullMcmc(theta = c(0, 100), logRr = logRrNoZero, seLogRr = seLogRrNoZero)
#> [1] 28.44091
optim(c(0, 100), EmpiricalCalibration:::logLikelihoodNullMcmc, logRr = logRrNoZero, seLogRr = seLogRrNoZero)
#> $par
#> [1]  0.1692758 34.8485282
#> 
#> $value
#> [1] 21.13314
#> 
#> $counts
#> function gradient 
#>       79       NA 
#> 
#> $convergence
#> [1] 0
#> 
#> $message
#> NULL

Finally I checked the delta between each logRr and seLogRr value and the 15th value has a delta of ~18.

abs(logRr - seLogRr)
#> [1]  0.188838315  0.331963959  0.065583255  0.256063766  0.220691309  2.229427890  0.038978049
#> [8]  0.193162904  0.256446156  0.031514532  0.162231496  0.470871474  0.276751874  0.012776512
#> [15] 18.000212194  0.418121323  0.541079244  0.367118176  2.438356740  0.262514673  0.008231167
#> [22]  0.510305512  0.085698182  0.604800950

Created on 2024-08-26 with reprex v2.1.1

I can reproduce it in a toy example aswel:

logRr <- c(rnorm(9), -19)
seLogRr <- c(rnorm(9), 0)
theta <- c(0, 100)

optim(c(0, 100), EmpiricalCalibration:::logLikelihoodNullMcmc, logRr = logRr, seLogRr = seLogRr)
#> Error in optim(c(0, 100), EmpiricalCalibration:::logLikelihoodNullMcmc, : function cannot be evaluated at initial parameters
EmpiricalCalibration:::logLikelihoodNullMcmc(theta = c(0, 100), logRr = logRr, seLogRr = seLogRr)
#> [1] Inf

Created on 2024-08-26 with reprex v2.1.1

I'm out of my depth of what the implications are or how to solve it. But hopefully this is helpful.

@mvankessel-EMC
Copy link
Author

This should now be handled in this commit: 37e237c

The real question is: why did you have NA estimates for all your negative controls?

Regarding this build, I run into a different error:

#> Error in quantile.default(dist, c(0.5, alpha/2, 1 - (alpha/2))) : 
#>   missing values and NaN's not allowed if 'na.rm' is FALSE
#> In addition: Warning messages:
#> 1: In EmpiricalCalibration::fitMcmcNull(logRr = ncs$logRr, seLogRr = ncs$seLogRr) :
#>   Estimate(s) with NA standard error detected. Removing before fitting null distribution
#> 2: In EmpiricalCalibration::fitMcmcNull(logRr = ncs$logRr, seLogRr = ncs$seLogRr) :
#>   No valid estimates left. Returning undefined null distribution

I'm assuming it is this quantile() call, that also should have na.rm = TRUE.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants