[ENH] concat operation on distributions #340

Open
fkiraly opened this issue May 17, 2024 · 1 comment
Labels
enhancement module:probability&simulation probability distributions and simulators

Comments

@fkiraly
Collaborator

fkiraly commented May 17, 2024

There should be a concat operation on distributions.

This will require:

  • a "concatenated distribution" compositor skpro.concat, similar to pd.concat
  • a dunder-like method in distributions to implement specific concat operations that allow "flattening" concatenated distributions of the same type

A direct consumer of this interface would be sktime.forecasting's predict_proba vectorization.

@SaiRevanth25
Contributor

HackMD notes from the previous discussion

concat operation

Example usage

d1 = Normal(mu=[[1, 2], [3, 4]], sigma=1)  # 2 x 2
d2 = Normal(mu=0, sigma=[[2, 42]])  # 1 x 2

d = concat([d1, d2], axis=0)  # 3 x 2

This d should then be the same abstract distribution (not necessarily the same object) as if constructed directly by

d = Normal(
    mu=[[1, 2], [3, 4], [0, 0]],
    sigma=[[1, 1], [1, 1], [2, 42]],
)  # 3 x 2

(the repetition of 0s and 1s is due to broadcasting)
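The broadcasting step can be illustrated without any distribution class at all. The following is a minimal sketch using only NumPy; the helper name `broadcast_params` is hypothetical and not part of skpro. It broadcasts the parameters of d1 and d2 to their full shapes and then stacks them along axis 0.

```python
import numpy as np

# Parameters of d1 and d2 exactly as written in the example above.
mu1, sigma1 = np.array([[1, 2], [3, 4]]), 1   # d1: 2 x 2
mu2, sigma2 = 0, np.array([[2, 42]])          # d2: 1 x 2


def broadcast_params(mu, sigma):
    """Broadcast mu and sigma against each other to a common 2D shape.

    Hypothetical helper for illustration only, not an skpro function.
    """
    return np.broadcast_arrays(np.atleast_2d(mu), np.atleast_2d(sigma))


mu1_b, sigma1_b = broadcast_params(mu1, sigma1)  # both 2 x 2
mu2_b, sigma2_b = broadcast_params(mu2, sigma2)  # both 1 x 2

# Stack the broadcast parameters row-wise, as concat(..., axis=0) would.
mu = np.concatenate([mu1_b, mu2_b], axis=0)           # 3 x 2
sigma = np.concatenate([sigma1_b, sigma2_b], axis=0)  # 3 x 2
```

The resulting `mu` and `sigma` are the arrays passed to the directly constructed Normal above.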

We observe the following:

  • pd.concat([d1.mean(), d2.mean()]) is the same as skpro.concat([d1, d2]).mean()
  • pd.concat([d1.var(), d2.var()]) is the same as skpro.concat([d1, d2]).var()
  • pd.concat([d1.pdf(x1), d2.pdf(x2)]) is the same as skpro.concat([d1, d2]).pdf(pd.concat([x1, x2]))

different distributions

SR - what happens if there are two different distributions, e.g., Normal and Laplace?

Example:

d1 = Normal(mu=[[1, 2], [3, 4]], sigma=1)  # 2 x 2
d2 = Laplace(mu=0, scale=[[2, 42]])  # 1 x 2

d = concat([d1, d2], axis=0)  # 3 x 2

What is d?

FK - good question. I think it needs to be the "outer product" distribution, i.e., the outer product of probability measures. This could be a separate composite distribution object, and the same as

ConcatDistr([d1, d2], axis=0)

This happens whenever the distribution types differ, i.e., we are not concatenating Normal with Normal or Laplace with Laplace.

mean, var, and pdf behave as one would expect, same as above:

  • pd.concat([d1.mean(), d2.mean()]) is the same as skpro.concat([d1, d2]).mean()
  • pd.concat([d1.var(), d2.var()]) is the same as skpro.concat([d1, d2]).var()
  • pd.concat([d1.pdf(x1), d2.pdf(x2)]) is the same as skpro.concat([d1, d2]).pdf(pd.concat([x1, x2]))

Nothing would change here, except that ConcatDistr has to do the concatenations under the hood.
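A rough sketch of what "doing the concatenations under the hood" might look like is given below. All classes here (`ToyNormal`, `ToyLaplace`, `ToyConcatDistr`) are hypothetical stand-ins written for this illustration, not skpro classes, and `np.concatenate` stands in for `pd.concat` to keep the example dependency-light. The wrapper only delegates each method to its components and joins the results.

```python
import math
import numpy as np


def _bcast(a, b):
    """Broadcast two parameters to a common 2D float shape."""
    return np.broadcast_arrays(
        np.atleast_2d(a).astype(float), np.atleast_2d(b).astype(float)
    )


class ToyNormal:
    """Hypothetical stand-in for a Normal distribution with array parameters."""

    def __init__(self, mu, sigma):
        self.mu, self.sigma = _bcast(mu, sigma)

    def mean(self):
        return self.mu

    def var(self):
        return self.sigma ** 2

    def pdf(self, x):
        z = (np.asarray(x) - self.mu) / self.sigma
        return np.exp(-0.5 * z ** 2) / (self.sigma * math.sqrt(2 * math.pi))


class ToyLaplace:
    """Hypothetical stand-in for a Laplace distribution with array parameters."""

    def __init__(self, mu, scale):
        self.mu, self.scale = _bcast(mu, scale)

    def mean(self):
        return self.mu

    def var(self):
        return 2 * self.scale ** 2

    def pdf(self, x):
        return np.exp(-np.abs(np.asarray(x) - self.mu) / self.scale) / (2 * self.scale)


class ToyConcatDistr:
    """Wrapper that applies each method per component, then concatenates."""

    def __init__(self, distributions, axis=0):
        self.distributions = list(distributions)
        self.axis = axis

    def _concat(self, parts):
        return np.concatenate(parts, axis=self.axis)

    def mean(self):
        return self._concat([d.mean() for d in self.distributions])

    def var(self):
        return self._concat([d.var() for d in self.distributions])

    def pdf(self, x):
        # Split x into blocks whose sizes match the component distributions.
        sizes = [d.mean().shape[self.axis] for d in self.distributions]
        blocks = np.split(np.asarray(x), np.cumsum(sizes)[:-1], axis=self.axis)
        return self._concat([d.pdf(b) for d, b in zip(self.distributions, blocks)])


d1 = ToyNormal(mu=[[1, 2], [3, 4]], sigma=1)  # 2 x 2
d2 = ToyLaplace(mu=0, scale=[[2, 42]])        # 1 x 2
d = ToyConcatDistr([d1, d2], axis=0)          # 3 x 2

x1, x2 = np.zeros((2, 2)), np.ones((1, 2))
lhs = d.pdf(np.concatenate([x1, x2], axis=0))
rhs = np.concatenate([d1.pdf(x1), d2.pdf(x2)], axis=0)
```

In this sketch `lhs` and `rhs` agree by construction, which is the pdf identity from the list above; the mean and var identities hold in the same way.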

Implementation

FK: I would do two cases

First, detect whether all participating distributions are the same (type/class).

If yes, unwrap and concatenate the parameters, then construct the distribution again. Perhaps allow only a certain set of distributions to behave like this.

If no, wrap in ConcatDistr. This distribution type has special _mean, _var, _pdf, etc., which apply these per component distribution and then concatenate the results via pd.concat.

Thought: maybe there should be an option in concat, whether we always use ConcatDistr, or not (default?)
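The two cases and the optional switch could be sketched roughly as follows. Everything in this block is an assumption made for illustration: the `Toy*` classes, the `FLATTENABLE` allow-list, the function name `toy_concat`, and the `always_wrap` flag with its default of `False` are all hypothetical, and the default in particular is left open in the discussion above.

```python
import numpy as np


class ToyDistr:
    """Hypothetical base class: keyword parameters broadcast to one 2D shape."""

    def __init__(self, **params):
        names = list(params)
        arrays = np.broadcast_arrays(*[np.atleast_2d(params[n]) for n in names])
        self.params = dict(zip(names, arrays))


class ToyNormal(ToyDistr):
    """Stand-in for Normal(mu, sigma)."""


class ToyLaplace(ToyDistr):
    """Stand-in for Laplace(mu, scale)."""


class ToyConcatDistr:
    """Stand-in for the ConcatDistr wrapper; only stores its components here."""

    def __init__(self, distributions, axis=0):
        self.distributions = list(distributions)
        self.axis = axis


# Assumed allow-list of types whose parameters may be flattened.
FLATTENABLE = (ToyNormal, ToyLaplace)


def toy_concat(distributions, axis=0, always_wrap=False):
    """Concatenate distributions, flattening when all share a flattenable type."""
    distributions = list(distributions)
    first_type = type(distributions[0])
    same_type = all(type(d) is first_type for d in distributions)

    # Case "no": mixed types, a non-flattenable type, or wrapping was forced.
    if always_wrap or not same_type or first_type not in FLATTENABLE:
        return ToyConcatDistr(distributions, axis=axis)

    # Case "yes": unwrap and concatenate the parameters, then construct again.
    merged = {
        name: np.concatenate([d.params[name] for d in distributions], axis=axis)
        for name in distributions[0].params
    }
    return first_type(**merged)


n1 = ToyNormal(mu=[[1, 2], [3, 4]], sigma=1)  # 2 x 2
n2 = ToyNormal(mu=0, sigma=[[2, 42]])         # 1 x 2
lap = ToyLaplace(mu=0, scale=[[2, 42]])       # 1 x 2

flat = toy_concat([n1, n2])                       # flattened ToyNormal, 3 x 2
mixed = toy_concat([n1, lap])                     # wrapped, types differ
forced = toy_concat([n1, n2], always_wrap=True)   # wrapped on request
```

An exact type check (`type(d) is first_type`) is used rather than `isinstance`, so that a subclass with extra parameters is not silently flattened into its parent; whether that is the desired behaviour is a design choice this sketch does not settle.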
