
Samples as a parameterization #170

Open
aimalz opened this issue Apr 28, 2023 · 4 comments
Labels: enhancement (New feature or request), good first issue (Good for newcomers), parameterization (new/upgraded PDF parameterization need)

Comments

@aimalz
Collaborator

aimalz commented Apr 28, 2023

Sometimes the distribution is really defined by a set of samples, which in particular changes how the CDF/PPF would be calculated. It's also relevant to converting to many other distributions that could logically be instantiated by fitting to samples but currently aren't (spline being a notable exception).
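A minimal sketch of how the CDF/PPF change when the samples themselves define the distribution, assuming they are read as an empirical (discrete) distribution: the CDF becomes a step function and the PPF its generalized inverse. The function names are hypothetical and not part of qp's API.

```python
import numpy as np


def empirical_cdf(samples, x):
    """Fraction of samples <= x (a right-continuous step function)."""
    s = np.sort(np.asarray(samples, dtype=float))
    return np.searchsorted(s, x, side="right") / s.size


def empirical_ppf(samples, q):
    """Smallest sample value whose empirical CDF is >= q."""
    s = np.sort(np.asarray(samples, dtype=float))
    # CDF level reached at each sorted sample: 1/n, 2/n, ..., 1
    levels = np.arange(1, s.size + 1) / s.size
    idx = np.searchsorted(levels, q, side="left")
    return s[np.clip(idx, 0, s.size - 1)]
```

With samples `[4, 1, 3, 2]`, the CDF at 2.5 is 0.5 and the PPF maps `[0.5, 0.51, 1.0]` to `[2.0, 3.0, 4.0]`: no interpolation between samples, unlike the current quantile-based reconstructions.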

Once this is ready, it should be immediately propagated to the PIT metric, RAIL's trainZ estimator, and the SOM summarizer/estimator/classifier, among others.

EDIT: Another important application of this is mentioned in #180:

There needs to be a way to access the raw PIT values rather than a parameterization of the histogram thereof so users can make a qq-plot or their own histogram from the samples. (And making these plots should be part of the RAIL evaluation demo as well as the qp demo -- I might make a RAIL issue for this separately.)
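A rough sketch of what "raw PIT values" could look like when each object's PDF is a set of samples, assuming the standard definition PIT = CDF of the photo-z PDF evaluated at the true redshift. The function names and array layout are hypothetical, not RAIL's or qp's API.

```python
import numpy as np


def pit_from_samples(sample_matrix, z_true):
    """PIT per object: fraction of that object's samples <= its true redshift.

    sample_matrix has shape (n_objects, n_samples); z_true has shape (n_objects,).
    """
    sample_matrix = np.asarray(sample_matrix, dtype=float)
    z_true = np.asarray(z_true, dtype=float)
    return np.mean(sample_matrix <= z_true[:, None], axis=1)


def qq_points(pit):
    """(uniform quantile, sorted PIT) pairs for a QQ plot against U(0, 1)."""
    pit_sorted = np.sort(np.asarray(pit, dtype=float))
    n = pit_sorted.size
    uniform_q = (np.arange(1, n + 1) - 0.5) / n
    return uniform_q, pit_sorted
```

Because the raw values are returned, users can build their own histogram (e.g. `np.histogram(pit, bins=10, range=(0, 1))`) or plot `qq_points(pit)` directly instead of relying on a fixed parameterization of the PIT histogram.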

aimalz added the enhancement, good first issue, and parameterization labels on Apr 28, 2023
@eacharles
Collaborator

Sorry, it's not clear to me what you want here. Specifically, are you thinking of storing the samples as the persistent representation? Sure, you can do that, but then you need to specify how to extract the pdf() and cdf() from the samples, e.g., by doing something like a kernel density estimate. But if you are going to do that, why not store the KDE parameters as the persistent representation?

So, to implement this we need to decide:

  1. what is the persistent representation?
  2. how do you go from that to pdf() and cdf()?
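One way to make the two questions concrete, as a sketch only: the persistent representation is just the samples plus a tag naming the reconstruction, and `cdf()` dispatches on that tag. The class name, the `reconstruction` field, and the dict layout are all hypothetical; only the discrete (step-function) reconstruction is filled in here.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SamplesParams:
    """Hypothetical persistent representation: raw samples plus a reconstruction tag."""

    samples: np.ndarray  # shape (n_objects, n_samples)
    reconstruction: str = "discrete"  # how pdf()/cdf() are derived from the samples

    def to_arrays(self):
        # Everything needed to rebuild the object, e.g. for writing to HDF5
        return {
            "samples": np.asarray(self.samples, dtype=float),
            "reconstruction": np.array(self.reconstruction),
        }

    @classmethod
    def from_arrays(cls, data):
        return cls(
            samples=np.asarray(data["samples"], dtype=float),
            reconstruction=str(data["reconstruction"]),
        )

    def cdf(self, x):
        """CDF of every object at scalar or 1D x; shape (n_objects, len(x))."""
        x = np.atleast_1d(np.asarray(x, dtype=float))
        if self.reconstruction == "discrete":
            # Step function: fraction of each object's samples <= x
            return np.mean(self.samples[:, :, None] <= x[None, None, :], axis=1)
        raise NotImplementedError(f"reconstruction {self.reconstruction!r} not sketched here")
```

Under this design the file holds nothing but the samples, so question 1 is settled, and question 2 becomes a choice of reconstruction, similar to the reconstruction options that exist for quantiles.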

@aimalz
Collaborator Author

aimalz commented May 2, 2023

This was discussed in a recent RAIL TT tag-up, and the conclusion was that there could be reconstruction options, as there are for quantiles. One possibility is to go all-in on treating it as essentially a discrete distribution, which, though not very attractive, is trivially self-consistent and would meet the needs of real estimators that output samples (e.g., SOM). Another is to build a KDE (with a bandwidth determined by Scott's rule or another algorithm), which corresponds to what more users will expect when interpreting or propagating the samples from such a method. However, as Eric noted above, that might be tricky to make self-consistent, so it would probably be safer to treat a KDE parameterization as distinct from a samples parameterization.
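A minimal sketch of the KDE option for a single object's 1D samples, assuming a Gaussian kernel and Scott's rule (h = sample std × n^(-1/5) in one dimension, the same default `scipy.stats.gaussian_kde` uses). Function names are hypothetical; numpy and the standard library are used so the bandwidth choice is explicit.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)  # elementwise error function


def scott_bandwidth(samples):
    """Scott's rule in 1D: h = sigma * n**(-1/5), sigma = sample std (ddof=1)."""
    s = np.asarray(samples, dtype=float)
    return np.std(s, ddof=1) * s.size ** (-1.0 / 5.0)


def kde_pdf(samples, x, h=None):
    """Gaussian KDE density at x; bandwidth from Scott's rule unless h is given."""
    s = np.asarray(samples, dtype=float)
    h = scott_bandwidth(s) if h is None else h
    u = (np.atleast_1d(x)[:, None] - s[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (s.size * h * math.sqrt(2 * math.pi))


def kde_cdf(samples, x, h=None):
    """Gaussian KDE CDF at x: the average of the per-sample normal CDFs."""
    s = np.asarray(samples, dtype=float)
    h = scott_bandwidth(s) if h is None else h
    u = (np.atleast_1d(x)[:, None] - s[None, :]) / h
    return (0.5 * (1.0 + _erf(u / math.sqrt(2.0)))).mean(axis=1)
```

Unlike the discrete reading, the result depends on `h`, so two users holding the same samples get the same distribution only if they also agree on the bandwidth rule. That is one reason to store the KDE as its own parameterization rather than as a reconstruction of raw samples.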

@aimalz
Collaborator Author

aimalz commented Aug 2, 2023

This is a duplicate of #33, but the conversation there was more stale, so I'll keep this issue open and close that one.

@elts6570
Collaborator

elts6570 commented Sep 20, 2023

We also require the samples parameterization in one of the Bayesian Pipelines Topical Team cosmology projects: LSSTDESC/bayesian-pipelines-cosmology#5


3 participants