Samples as a parameterization #170

aimalz · 2023-04-28T17:48:31Z

Sometimes a distribution is defined directly by a set of samples, which changes in particular how the CDF/PPF would be calculated. It's also relevant to conversions: many other parameterizations could logically be instantiated by fitting to samples but currently aren't (spline being a notable exception).

Once this is ready, it should be immediately propagated to the PIT metric, RAIL's trainZ estimator, and the SOM summarizer/estimator/classifier, among others.

EDIT: Another important application of this is mentioned in #180:

There needs to be a way to access the raw PIT values rather than a parameterization of the histogram thereof so users can make a qq-plot or their own histogram from the samples. (And making these plots should be part of the RAIL evaluation demo as well as the qp demo -- I might make a RAIL issue for this separately.)
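For illustration only, here is a minimal sketch of what "raw PIT values" means and how a qq-plot or custom histogram could be built from them. Everything here is an assumption made for the example: `pit_values`, `qq_points`, and `pit_histogram` are hypothetical helper names, not qp or RAIL API, and the Gaussian CDFs stand in for whatever per-object CDFs an estimator produces.

```python
from statistics import NormalDist


def pit_values(cdfs, truths):
    """Raw PIT values: each object's CDF evaluated at its true value."""
    return [cdf(z) for cdf, z in zip(cdfs, truths)]


def qq_points(pits):
    """Pairs (uniform quantile, sorted PIT value) for a qq-plot.

    A well-calibrated ensemble falls close to the diagonal y = x.
    """
    n = len(pits)
    theoretical = [(i + 0.5) / n for i in range(n)]
    return list(zip(theoretical, sorted(pits)))


def pit_histogram(pits, n_bins=10):
    """User-defined histogram of raw PIT values on [0, 1]."""
    counts = [0] * n_bins
    for p in pits:
        # Clamp so that p == 1.0 lands in the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


# Hypothetical toy ensemble: three Gaussian PDFs and their true values.
example_cdfs = [NormalDist(mu, 0.1).cdf for mu in (0.3, 0.5, 0.8)]
example_truths = [0.3, 0.6, 0.7]
example_pits = pit_values(example_cdfs, example_truths)
```

With the raw values in hand, the plotting itself is a one-liner in any plotting library, and the binning is left entirely to the user.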

eacharles · 2023-04-28T23:29:53Z

Sorry, it's not clear to me what you want here. Specifically, are you thinking of storing the samples as the persistent representation? Sure, you can do that, but then you need to specify how to extract the pdf() and cdf() from the samples, e.g., by doing something like a kernel density estimate. But if you are going to do that, why not store the KDE parameters as the persistent representation?

So, what we need to decide to implement this is:

what is the persistent representation?
how do you go from that to pdf() and cdf()?

aimalz · 2023-05-02T20:34:17Z

This was discussed in a recent RAIL TT tag-up, and the conclusion was that there could be reconstruction options like there are for quantiles. One possibility is to go all-in on it being essentially a discrete distribution, which, though not very attractive, is trivially self-consistent and would meet the needs of real estimators that output samples (e.g. SOM). Another is to make a KDE (with a bandwidth determined by Scott's Rule or another algorithm) that corresponds to what more users will expect when interpreting/propagating the samples from such a method, though it might be tricky to make self-consistent as Eric noted above; probably it would be safer to consider a KDE parameterization to be distinct from a samples parameterization.
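The two reconstruction options above can be sketched roughly as follows. This is not qp code: the function names are hypothetical, the KDE uses a Gaussian kernel as an assumed choice, and the bandwidth is the simple form of Scott's rule for 1-D data (sample standard deviation times n^(-1/5)); some references include an extra constant close to 1.

```python
import bisect
import math
from statistics import NormalDist, stdev


def ecdf(samples, x):
    """Discrete option: fraction of samples less than or equal to x."""
    s = sorted(samples)
    return bisect.bisect_right(s, x) / len(s)


def eppf(samples, q):
    """Discrete option: smallest sample whose empirical CDF is >= q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    s = sorted(samples)
    return s[max(math.ceil(q * len(s)) - 1, 0)]


def scott_bandwidth(samples):
    """Assumed bandwidth rule: h = sigma * n**(-1/5) for 1-D data."""
    return stdev(samples) * len(samples) ** (-1.0 / 5.0)


def kde_pdf(samples, x, h=None):
    """KDE option: average of Gaussian kernels centred on the samples."""
    h = scott_bandwidth(samples) if h is None else h
    return sum(NormalDist(s, h).pdf(x) for s in samples) / len(samples)


def kde_cdf(samples, x, h=None):
    """KDE option: average of the kernels' CDFs, hence smooth in x."""
    h = scott_bandwidth(samples) if h is None else h
    return sum(NormalDist(s, h).cdf(x) for s in samples) / len(samples)
```

The discrete pair is self-consistent by construction, in that `ecdf(samples, eppf(samples, q)) >= q` for every `q`. The KDE pair depends on the bandwidth as well as the samples, which is one way to see why a KDE parameterization might be better treated as distinct from a samples parameterization.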

aimalz · 2023-08-02T01:35:54Z

This is a duplicate of #33, but the conversation there was more stale, so I'll keep this issue open and close that one.

elts6570 · 2023-09-20T14:03:52Z

We also require the samples parameterization in one of the Bayesian Pipelines Topical Team cosmology projects: LSSTDESC/bayesian-pipelines-cosmology#5

aimalz added the labels enhancement (New feature or request), good first issue (Good for newcomers), and parameterization (new/upgraded PDF parameterization need) on Apr 28, 2023

aimalz mentioned this issue Jun 22, 2023

Fix inconsistent methods syntax (read/write, get/set, etc.) #180

Open

aimalz mentioned this issue Aug 2, 2023

Samples parameterization #33

Closed
