Integration with InferenceObjects.jl #381
Comments
It looks like this would involve implementing [...]
Since there have been no objections to these steps, I'm going to move forward with opening a PR for Step 1.
Okay, thank you!
I wonder, actually, if this is the wrong way to go about this. MCMCChains destructively flattens draws into one large array, and a converter then needs to infer from the variable names how to unflatten the draws, whereas Turing can return NamedTuples containing unflattened draws. It might be better to just directly implement DynamicPPL/AbstractMCMC interfaces for [...]. I've opened a draft PR at TuringLang/Turing.jl#1913
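To make the name-inference problem concrete, here is a minimal sketch in plain Julia (Base only) of what such a converter has to do. The helpers `parse_name` and `unflatten` are hypothetical and are not part of MCMCChains, ArviZ.jl, or InferenceObjects. The sketch assumes column names of the form `x[i,j]`; names that do not follow this pattern would need extra handling.

```julia
# Hypothetical helpers for illustration only; not an existing package API.

# Split a flattened column name such as "x[1,2]" into (:x, (1, 2)).
# Names without an index, such as "mu", are treated as scalars.
function parse_name(name::AbstractString)
    m = match(r"^(.+?)\[([\d,\s]+)\]$", name)
    m === nothing && return Symbol(name), ()
    idx = Tuple(parse.(Int, strip.(split(m.captures[2], ','))))
    return Symbol(m.captures[1]), idx
end

# Rebuild one array per variable from a (draws x columns) matrix.
function unflatten(flat::AbstractMatrix, colnames::Vector{String})
    # Group column positions by base variable name, remembering each index.
    groups = Dict{Symbol,Vector{Tuple{Tuple,Int}}}()
    for (col, name) in enumerate(colnames)
        var, idx = parse_name(name)
        push!(get!(groups, var, Tuple{Tuple,Int}[]), (idx, col))
    end
    ndraws = size(flat, 1)
    out = Dict{Symbol,Array}()
    for (var, entries) in groups
        ndim = length(first(entries)[1])
        # The shape is guessed from the largest index seen in each dimension.
        shape = ntuple(d -> maximum(e[1][d] for e in entries), ndim)
        arr = Array{eltype(flat)}(undef, ndraws, shape...)
        for (idx, col) in entries
            arr[:, idx...] = flat[:, col]
        end
        out[var] = arr
    end
    return out
end

colnames = ["mu", "x[1,1]", "x[2,1]", "x[1,2]", "x[2,2]"]
flat = reshape(collect(1.0:15.0), 3, 5)  # 3 draws, 5 flattened columns
draws = unflatten(flat, colnames)
```

Here `draws[:x]` has size `(3, 2, 2)` and `draws[:mu]` is a length-3 vector. Note that the shape and the grouping are both guesses made from strings, and the element type is whatever the flat array used, which is the fragility described above.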
On Twitter, @yebai suggested adding integration with InferenceObjects to MCMCChains: https://twitter.com/Hong_Ge2/status/1560343482216103938. I'm opening this issue for further discussion.
`InferenceObjects.InferenceData` is the storage format for Monte Carlo draws used by ArviZ.jl. Along with Python's `arviz.InferenceData`, it follows the cross-language InferenceData schema. PyMC uses Python's implementation as its official sample storage format. `InferenceData` can be serialized to NetCDF to standardize communicating results of Bayesian analyses across languages and PPLs. In Julia, it is built on DimensionalData. See example usage and plotting examples (using the Tables interface).

@yebai's suggestion is ultimately to deprecate `Chains` and use `InferenceData` instead. I see several upsides of this approach:

- `Chains` is based on the somewhat outdated AxisArrays, while DimensionalData is more modern.
- `Chains` flattens all draws and sampling statistics into a single 3D float array, which discards a lot of the structure of the sampled types (which may themselves be multidimensional or have non-float eltypes, such as `Int` or even `Cholesky`).
- `InferenceData`'s features are a superset of those of `Chains`. It can get closer to the original structure of the user's samples with named dimensions, but it also supports storing other metadata and can store prior, predictive, log-likelihood, and warmup draws, as well as the original data.
- `InferenceObjects` is a relatively light dependency (~0.120-0.2s load time on Julia v1.7-1.8 vs MCMCChains with 1.7-3.6s), so it would not add much to MCMCChains's load time.

Currently ArviZ.jl has a converter `from_mcmcchains`, which is used to convert `Chains` to `InferenceData`. Integration between `Chains` and `InferenceData` might look like the following steps:

- Move `ArviZ.from_mcmcchains` here (with a better name).
- Make `InferenceData` a supported `chain_type` for `AbstractMCMC.sample` (https://beta.turing.ml/AbstractMCMC.jl/dev/api/#Chains), which would bypass `Chains`'s flattening entirely. I'm not sure this should live here, but it should not live in InferenceObjects.
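As a rough illustration of the flattening concern raised above, the following sketch uses plain Julia (Base only) and made-up data. It does not call MCMCChains or InferenceObjects. The names `draws`, `flatten_draw`, `flat`, and `structured` are hypothetical, and the NamedTuple of arrays only stands in for the kind of per-variable storage that `InferenceData` aims for.

```julia
# Made-up structured draws: an Int count `k` and a 2x2 matrix `x` per draw.
draws = [(k = 3, x = [0.1 0.2; 0.3 0.4]),
         (k = 5, x = [0.5 0.6; 0.7 0.8])]

# Flattened storage: every component becomes one Float64 column.
flatten_draw(d) = vcat(Float64(d.k), vec(d.x))
flat = reduce(vcat, [permutedims(flatten_draw(d)) for d in draws])
colnames = ["k", "x[1,1]", "x[2,1]", "x[1,2]", "x[2,2]"]

# Structured storage: one array per variable, with the draw dimension last.
# Each variable keeps its own element type and shape.
structured = (
    k = [d.k for d in draws],
    x = cat((d.x for d in draws)...; dims = 3),
)

# The flat matrix no longer records that `k` was an Int or that `x` was 2x2;
# only the column names hint at the original layout.
@show size(flat) eltype(flat)
@show eltype(structured.k) size(structured.x)
```

In the flat form, the shape of `x` survives only in the strings in `colnames`, whereas the structured form carries it in the array itself. A `chain_type` that writes structured output directly would not need to go through the flat form at all.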