Currently we have two classes of conversion functions. The first class consists of from_XXX functions, such as from_namedtuple or from_mcmcchains, which dispatch on the posterior type and have a number of keywords specific to that type. The second class consists of the generic functions convert_to_inference_data and convert_to_dataset, whose methods dispatch to the from_XXX functions. The generic functions can even be used within other from_XXX functions to allow groups of one type to be mixed with a posterior of another type.
When a user wants their type to be convertible to an InferenceData, they in general implement a from_XXX function and a special convert_to_inference_data method.
Here I propose a major redesign of this pipeline, based on the following principles:
There are 2 types of objects we want to convert: objects that contain data for a single group (or part of a group) and objects that contain data for multiple groups.
A user may want to split the data in the first type of object into several groups.
We drop the prefix convert_to_, since we're not in general doing conversion.
The first type of object can be hooked into the pipeline by implementing a single function dataset.
The second type of object can be hooked into the pipeline by implementing the functions inferencedata and dataset.
All conversion functions should absorb unused keywords into kwargs, so that a single inferencedata call can use keywords for multiple conversion methods so long as they don't clash.
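The keyword-absorption principle can be sketched in isolation. This is a minimal illustration rather than part of the proposal: the functions to_posterior, to_stats and convert_both, and the keywords dims and prefix, are hypothetical names invented for this example.

```julia
# Hypothetical converters (not part of InferenceObjects): each one reads the
# keywords it understands and absorbs everything else into `kwargs`.
to_posterior(x; dims=(), kwargs...) = (data=x, dims=dims)
to_stats(x; prefix="", kwargs...) = (data=x, prefix=prefix)

# A single entry point forwards the same keyword set to both converters.
# This only works because neither converter rejects keywords it does not use.
function convert_both(x, y; kwargs...)
    return (
        posterior=to_posterior(x; kwargs...),
        sample_stats=to_stats(y; kwargs...),
    )
end

result = convert_both([1, 2], [3, 4]; dims=(:draw,), prefix="stat_")
```

Here dims is consumed by to_posterior and ignored by to_stats, and prefix is handled the other way round. If both converters gave different meanings to the same keyword name, the keywords would clash, which is the caveat stated in the principle.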
Working prototype of the pipeline
using InferenceObjects

# fallback to current pipeline for demonstration purposes
dataset(data; kwargs...) = convert_to_dataset(data; kwargs...)

# an InferenceData is already converted
inferencedata(data::InferenceData; kwargs...) = data
# a bare object is treated as the posterior group
inferencedata(data; kwargs...) = inferencedata(:posterior => data; kwargs...)
# a single `group => data` pair becomes an InferenceData with one group
function inferencedata(data::Pair{Symbol}; kwargs...)
    k, v = data
    ds = if k ∈ (:constant_data, :observed_data)
        dataset(v; default_dims=(), kwargs...)
    else
        dataset(v; kwargs...)
    end
    return InferenceData(; k => ds)
end
# convert the first argument, then process the remaining pairs
function inferencedata(data, next::Pair{Symbol}, others::Pair{Symbol}...; kwargs...)
    inferencedata(inferencedata(data; kwargs...), next, others...; kwargs...)
end
# merge in the next group, then recurse over the remaining pairs
# (merge itself is only given InferenceData arguments)
function inferencedata(data::InferenceData, next::Pair{Symbol}, others::Pair{Symbol}...; kwargs...)
    inferencedata(merge(data, inferencedata(next; kwargs...)), others...; kwargs...)
end

# marks variables to be moved out of the group `source`;
# `var_map` holds `new_name => old_name` pairs
struct Subset{V}
    source::Symbol
    var_map::V
end
function subset(source::Symbol, var_map::Tuple{Vararg{Union{Symbol,Pair{Symbol,Symbol}}}})
    # a bare Symbol keeps its name
    var_map_new = map(var_map) do v
        v isa Pair && return v
        return v => v
    end
    return Subset(source, var_map_new)
end
function inferencedata(data::InferenceData, next::Pair{Symbol,<:Subset}, others::Pair{Symbol}...; kwargs...)
    k, s = next
    source_vars = map(last, s.var_map)
    source_ds = data[s.source]
    # variables that stay in the source group
    source_ds_new = source_ds[filter(∉(source_vars), keys(source_ds))]
    # variables that move to group `k`, under their new names
    subset_nt = NamedTuple(source_ds[source_vars])
    subset = Dataset(NamedTuple{map(first, s.var_map)}(values(subset_nt)))
    idata_merged = merge(data, InferenceData(; s.source => source_ds_new, k => subset))
    return inferencedata(idata_merged, others...; kwargs...)
end
The subsetting machinery is generic, so we don't need to customize it for every type like we currently do in the from_XXX methods.
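The core of that splitting step can be illustrated on plain NamedTuples, without any InferenceObjects types. This is only a sketch: the helper split_namedtuple and the example variable names are hypothetical, and it assumes the same new_name => old_name convention as the var_map in the prototype.

```julia
# Hypothetical helper (not part of InferenceObjects): split `source` into the
# variables that stay and the variables that move, renaming the moved ones.
# `var_map` holds `new_name => old_name` pairs.
function split_namedtuple(source::NamedTuple, var_map::Tuple{Vararg{Pair{Symbol,Symbol}}})
    new_names = map(first, var_map)
    old_names = map(last, var_map)
    # names that are not moved stay in the source
    remaining_names = filter(∉(old_names), keys(source))
    remaining = NamedTuple{remaining_names}(source)
    # moved variables are looked up by old name and stored under the new name
    moved = NamedTuple{new_names}(map(k -> source[k], old_names))
    return remaining, moved
end

draws = (mu=[0.1, 0.2], tau=[1.0, 1.1], lp=[-3.2, -3.0])
remaining, stats = split_namedtuple(draws, (:log_density => :lp,))
```

Because the split only needs variable names, it never has to know which type the data originally came from. Dimension metadata is not modelled in this sketch.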
There are still some kinks to work out in this pipeline, like correct handling of dimensions when the variables are renamed, but let's check the extensibility of the pipeline.
Demonstration of pipeline extensibility
Here we define two types of storage of MCMC results, representing the two types defined above.
# represents some object containing data from a single dataset
struct PosteriorStorage
    nt
end
dataset(post::PosteriorStorage; kwargs...) = dataset(post.nt; kwargs...)

# represents some object containing data from multiple datasets, here a posterior and sample_stats
# we allow it to be converted to an InferenceData or to a dataset, in which case a single Dataset is extracted, here the posterior
# e.g. MCMCChains.Chains or SampleChains.MultiChain
struct MultiGroupStorage
    nt
end
function inferencedata(post::MultiGroupStorage; kwargs...)
    inferencedata(post.nt, :sample_stats => subset(:posterior, (:lp,)); kwargs...)
end
dataset(post::MultiGroupStorage; kwargs...) = inferencedata(post; kwargs...).posterior
Now let's wrap our NamedTuple in these types and execute the pipeline: