SBC FAQ

SBC argument recommendations

The number of datasets N and the number of posterior samples from each dataset M affect the SBC test results; M is understood as the minimum targeted effective sample size for all parameters. Within the SBC workflow, these choices are encoded in the n_datasets, iter_warmup, iter_sampling, and thin_ranks arguments.
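As a rough illustration of how these arguments relate to N and M (the values and the derived quantity below are hypothetical, not package defaults):

n_datasets    <- 100    # N: number of simulated datasets
iter_warmup   <- 1000   # warmup iterations per posterior fit
iter_sampling <- 1000   # M: posterior draws per fit, the targeted minimum ESS
thin_ranks    <- 10     # thinning applied before computing ranks

# number of possible ranks after thinning (see the rank smoothing section below)
n_ranks <- 1 + ceiling(iter_sampling / thin_ranks)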

An example of diagnostics affected by the thin_ranks argument (increased from the default of 10 to 40 and then 50) shows the need for a more robust design of these input arguments.

Autocorrelation

It is recommended that the posterior sample be free of autocorrelation, which can be addressed in two ways. The first is online: double the number of sampling iterations while inspecting ess_tail during sampling until the targeted ESS is reached. The second is offline: double the thinning applied at the compute_results stage via the thin_ranks argument.
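As a minimal sketch of what the offline adjustment does (not the package's internal code; posterior_theta is a placeholder vector of posterior draws), thinning keeps every thin_ranks-th draw so the retained draws are less autocorrelated:

# sketch: keep every thin_ranks-th posterior draw
thin_draws <- function(draws, thin_ranks = 10) {
  draws[seq(1, length(draws), by = thin_ranks)]
}
# e.g. thinned <- thin_draws(posterior_theta, thin_ranks = 20)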

Which parameter and diagnostics to use?

This rests upon the modeler's choice, but we have observed that lp__ and ess_tail are the most conservative diagnostics in most cases. As illustrated below, lp__ tends to be the most robust, giving non-NA ESS values; a higher ess_bulk does not necessarily mean a higher ess_tail. Refer to this Rhat_ESS for a comparison of the two.

> fit[[1]]
 variable  rhat ess_bulk ess_tail
 lp__      1.01      224      219
 rate[1]   1.04      129      243
 rate[2]   1.04      118       NA
 rate[3]   1.05      120       NA
 p21       1.01      308      340
 p31       1.01      342      253
> fit[[11]]
 variable  rhat ess_bulk ess_tail
 lp__      1.02      185      249
 rate[1]   1.01      236       NA
 rate[2]   1.01      229      226
 rate[3]   1.01      210      200
 p21       1.01      258      301
 p31       1.01      369       NA

Is SBC_hacking ok?

Definition: changing arguments until the SBC test is passed, which could understate the false positive rate.

Power, false positive rate, false discovery rate

Ongoing discussion on the SBC test can be found in this post.

Diagnostics

Rank smoothing

When dealing with discrete parameter values, the samples may include ties. Eq. 1 and 2 from this paper give two ways to "smooth" the ranked samples back to a discrete uniform distribution with 1 + ceiling(M / thin_ranks) possible ranks. Implemented in this package is the randomized rank smoothing defined in Eq. 1 of the article. In short, a prior sample theta is assigned a random rank chosen uniformly between sum(posterior_theta < theta) and sum(posterior_theta <= theta), where posterior_theta is a sample from the posterior distribution of the parameter theta, conditioned on the data generated from the prior draw theta.
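A minimal sketch of that randomized rank, written for illustration rather than as the package's internal implementation:

# randomized rank smoothing (Eq. 1): a prior draw theta gets a rank drawn
# uniformly among the ranks it could take once ties are broken
smoothed_rank <- function(theta, posterior_theta) {
  lower <- sum(posterior_theta < theta)    # strictly smaller posterior draws
  upper <- sum(posterior_theta <= theta)   # smaller-or-equal posterior draws
  # sample.int avoids R's sample(x, 1) pitfall when there are no ties
  lower + sample.int(upper - lower + 1, size = 1) - 1
}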

Interpreting numeric summary

Numeric diagnostics that compare the prior and posterior come in three flavors:

  1. point to set metric
  2. set to set metric
  3. rank to uniform

1. point to set comparison

  • z-score and contraction (c): A z-score versus contraction plot is an example of a point to set comparison (Betancourt, 2018). The z-score and c are defined below, where theta_tilde is a prior sample from the true model; the z-score measures relative bias while c measures posterior contraction for each prior sample.
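A common formulation consistent with this description (an assumption here, following Betancourt, 2018) is:

# assumed definitions, written as R pseudocode: theta_tilde is a prior draw,
# posterior_theta the matching posterior sample, prior_var the prior variance
z_score     <- (mean(posterior_theta) - theta_tilde) / sd(posterior_theta)
contraction <- 1 - var(posterior_theta) / prior_var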

2. set to set comparison

This extends the previous metric by measuring the distance between the prior sample and the posterior samples as a whole. The figure below illustrates the difference between 1 and 2, where A in the right panel indicates that prior samples go through data simulation and then posterior simulation.
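For illustration only, since the exact distance is not spelled out here, one possible set to set distance is the one-dimensional Wasserstein distance between equal-sized sets of prior and posterior draws:

# illustrative set-to-set distance: 1-D Wasserstein (earth mover's) distance
# between two equal-sized samples, computed from their sorted values
wasserstein_1d <- function(prior_draws, posterior_draws) {
  stopifnot(length(prior_draws) == length(posterior_draws))
  mean(abs(sort(prior_draws) - sort(posterior_draws)))
}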

3. rank to uniform

Distance metrics between two discrete distributions, D(sum_m(posterior_mn < prior_n), uniform), are supported, such as pval, max_diff, Wasserstein, and cumulative Jensen-Shannon divergence.
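As a minimal sketch (not necessarily the package's exact definition of max_diff), such a distance can be computed as the largest gap between the empirical CDF of the ranks and the CDF of the discrete uniform distribution they should follow:

# sketch of a rank-to-uniform distance: maximum absolute difference between
# the empirical CDF of the SBC ranks and the discrete uniform CDF on 0..S
rank_to_uniform_max_diff <- function(ranks, S) {
  support     <- 0:S
  ecdf_ranks  <- sapply(support, function(r) mean(ranks <= r))
  uniform_cdf <- (support + 1) / (S + 1)
  max(abs(ecdf_ranks - uniform_cdf))
}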

Interpreting graphic summary

Bias and dispersion are the two main calibration targets for a given joint distribution simulator. In the four quadrants of the SBC_diff plot, one breach (a curve going outside the interval) might indicate bias, whereas two breaches, in quadrants (1,3) or (2,4), might suggest a dispersion problem.