[ENH] `concat` operation on distributions #340

fkiraly · 2024-05-17T11:28:19Z

There should be a concat operation on distributions.

This will require:

a "concatenated distribution" compositor skpro.concat, similar to pd.concat
a dunder-like method in distributions that implements type-specific concat operations, so that concatenated distributions of the same type can be "flattened" into a single distribution of that type

A direct consumer of this interface would be sktime.forecasting's predict_proba vectorization.


SaiRevanth25 · 2024-11-03T05:21:35Z

HackMD notes from the previous discussion


concat operation

Example usage

d1 = Normal(mu=[[1, 2], [3, 4]], sigma=1)  # 2 x 2
d2 = Normal(mu=0, sigma=[[2, 42]])  # 1 x 2

d = concat([d1, d2], axis=0)  # 3 x 2

This d should then be the same abstract distribution (not necessarily the same object) as if constructed directly by

d = Normal(
    mu = [[1, 2], [3, 4], [0, 0]],
    sigma = [[1, 1], [1, 1], [2, 42]],
)  # 3 x 2

(the repeated 0s and 1s are due to broadcasting: mu=0 in d2 and sigma=1 in d1 are broadcast to the shapes of their respective distributions)
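The parameter arithmetic above can be reproduced with plain NumPy. This is a sketch of the broadcast-then-concatenate step only, assuming that each distribution's shape is the broadcast shape of its parameters; it does not call skpro.

```python
import numpy as np

# Parameters of d1 = Normal(mu=[[1, 2], [3, 4]], sigma=1), shape 2 x 2
mu1, sigma1 = np.array([[1, 2], [3, 4]]), 1
# Parameters of d2 = Normal(mu=0, sigma=[[2, 42]]), shape 1 x 2
mu2, sigma2 = 0, np.array([[2, 42]])

# Each distribution's shape is the broadcast shape of its own parameters
shape1 = np.broadcast_shapes(np.shape(mu1), np.shape(sigma1))  # (2, 2)
shape2 = np.broadcast_shapes(np.shape(mu2), np.shape(sigma2))  # (1, 2)

# Broadcast every parameter to its distribution's shape, then stack rows (axis=0)
mu = np.concatenate(
    [np.broadcast_to(mu1, shape1), np.broadcast_to(mu2, shape2)], axis=0
)
sigma = np.concatenate(
    [np.broadcast_to(sigma1, shape1), np.broadcast_to(sigma2, shape2)], axis=0
)

print(mu.tolist())     # [[1, 2], [3, 4], [0, 0]]
print(sigma.tolist())  # [[1, 1], [1, 1], [2, 42]]
```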

We observe the following:

pd.concat([d1.mean(), d2.mean()]) is the same as skpro.concat([d1, d2]).mean()
pd.concat([d1.var(), d2.var()]) is the same as skpro.concat([d1, d2]).var()
pd.concat([d1.pdf(x1), d2.pdf(x2)]) is the same as skpro.concat([d1, d2]).pdf(pd.concat([x1, x2]))
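These identities hold because mean, var and pdf act entry by entry, so they commute with stacking rows. A minimal check, using `scipy.stats.norm` as a stand-in for skpro's Normal (for which mean is mu and var is sigma squared). The helper `normal_summaries`, the column names and the evaluation points `x1`, `x2` are hypothetical, chosen only for illustration:

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

COLUMNS = ["a", "b"]  # illustrative column names


def normal_summaries(mu, sigma, x, index):
    """Hypothetical helper: mean, var and pdf(x) of an entry-wise Normal as DataFrames."""
    def frame(values):
        return pd.DataFrame(values, index=index, columns=COLUMNS)

    return frame(mu), frame(sigma**2), frame(norm.pdf(x, loc=mu, scale=sigma))


# d1 (rows 0-1) and d2 (row 2), parameters already broadcast to full shape
mu1, sigma1 = np.array([[1.0, 2.0], [3.0, 4.0]]), np.ones((2, 2))
mu2, sigma2 = np.zeros((1, 2)), np.array([[2.0, 42.0]])
x1, x2 = np.array([[0.5, 2.0], [3.0, 5.0]]), np.array([[1.0, -1.0]])

# The concatenated distribution d has row-stacked parameters; x is stacked the same way
mu, sigma, x = np.vstack([mu1, mu2]), np.vstack([sigma1, sigma2]), np.vstack([x1, x2])

parts1 = normal_summaries(mu1, sigma1, x1, index=[0, 1])
parts2 = normal_summaries(mu2, sigma2, x2, index=[2])
whole = normal_summaries(mu, sigma, x, index=[0, 1, 2])

# pd.concat of the per-component results equals the result on the concatenated distribution
for part1, part2, full in zip(parts1, parts2, whole):
    pd.testing.assert_frame_equal(pd.concat([part1, part2]), full)
```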

different distributions

SR - what happens if the two distributions are of different types, e.g., Normal and Laplace?

Example:

d1 = Normal(mu=[[1, 2], [3, 4]], sigma=1)  # 2 x 2
d2 = Laplace(mu=0, scale=[[2, 42]])  # 1 x 2

d = concat([d1, d2], axis=0)  # 3 x 2

What is d?

FK - good question. I think it needs to be the "outer product" distribution, i.e., the outer product of the probability measures. This could be a separate composite distribution object, the same as

ConcatDistr([d1, d2], axis=0)

This happens whenever the distribution types differ, i.e., whenever we are not concatenating Normal with Normal or Laplace with Laplace.
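One way to write the "outer product" down, as a sketch with notation introduced here: let $Y^{(1)}$ be the $2 \times 2$ block distributed as d1 and $Y^{(2)}$ the $1 \times 2$ block distributed as d2. The product measure makes the two blocks independent, so for measurable sets $A_1, A_2$, and for the densities where they exist:

```math
\mathbb{P}_{d}\left(Y^{(1)} \in A_1,\; Y^{(2)} \in A_2\right)
  = \mathbb{P}_{d_1}\left(Y^{(1)} \in A_1\right) \cdot \mathbb{P}_{d_2}\left(Y^{(2)} \in A_2\right),
\qquad
p_{d}\left(y^{(1)}, y^{(2)}\right) = p_{d_1}\left(y^{(1)}\right) \cdot p_{d_2}\left(y^{(2)}\right).
```

Each entry of d therefore has the marginal of the corresponding entry of d1 or d2, which is why entry-wise methods such as mean, var and pdf can be computed per component and then concatenated.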

mean, var and pdf behave as one would expect, same as above:

pd.concat([d1.mean(), d2.mean()]) is the same as skpro.concat([d1, d2]).mean()
pd.concat([d1.var(), d2.var()]) is the same as skpro.concat([d1, d2]).var()
pd.concat([d1.pdf(x1), d2.pdf(x2)]) is the same as skpro.concat([d1, d2]).pdf(pd.concat([x1, x2]))

Nothing would change here, except that ConcatDistr has to do the concatenations under the hood.

Implementation

FK: I would distinguish two cases.

First, detect whether all participating distributions are the same (type/class).

If yes, unwrap the parameters, concatenate them, and construct a new distribution of the same type. Perhaps allow only a certain set of distributions to behave like this.

If no, wrap in ConcatDistr. This distribution type has special _mean, _var, _pdf, etc., which apply these methods per component distribution and then concatenate the results via pd.concat.
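A minimal sketch of the ConcatDistr idea for axis=0, assuming each component is a frozen `scipy.stats` distribution (standing in for skpro's Normal and Laplace) whose array parameters already have the component's shape. `ToyConcatDistr` and its constructor arguments are hypothetical, not skpro API:

```python
import pandas as pd
from scipy.stats import laplace, norm


class ToyConcatDistr:
    """Hypothetical sketch of ConcatDistr along axis=0 (rows).

    components: list of (frozen scipy.stats distribution, number of rows) pairs,
    where each distribution's parameters already have that component's shape.
    """

    def __init__(self, components, columns):
        self.components = components
        self.columns = columns

    def _concat(self, per_component):
        # wrap each component's array result in a frame, then stack rows via pd.concat
        frames = [pd.DataFrame(values, columns=self.columns) for values in per_component]
        return pd.concat(frames, ignore_index=True)

    def mean(self):
        return self._concat([dist.mean() for dist, _ in self.components])

    def var(self):
        return self._concat([dist.var() for dist, _ in self.components])

    def pdf(self, x):
        # split the rows of x into the blocks belonging to each component
        results, start = [], 0
        for dist, n_rows in self.components:
            results.append(dist.pdf(x.iloc[start:start + n_rows].to_numpy()))
            start += n_rows
        return self._concat(results)


d1 = norm(loc=[[1.0, 2.0], [3.0, 4.0]], scale=1.0)  # stand-in for Normal, 2 x 2
d2 = laplace(loc=0.0, scale=[[2.0, 42.0]])          # stand-in for Laplace, 1 x 2
d = ToyConcatDistr([(d1, 2), (d2, 1)], columns=["a", "b"])

print(d.mean().to_numpy().tolist())  # [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
print(d.var().to_numpy().tolist())   # [[1.0, 1.0], [1.0, 1.0], [8.0, 3528.0]]

x = pd.DataFrame([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]], columns=["a", "b"])
print(d.pdf(x).iloc[2, 0])  # Laplace density at its location: 1 / (2 * 2) = 0.25
```

The wrapper never merges parameters, so it works for any mix of families; the cost is that every method call loops over the components.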

Thought: maybe there should be an option in concat that controls whether we always use ConcatDistr or not (and which behavior should be the default?).
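A sketch of the two-case dispatch together with the option from this thought, under these assumptions: the flattening logic lives in a dunder-like class hook (called `_concat` here), only classes that define that hook can be flattened, and `always_wrap` is a hypothetical name for the option. All class and function names are illustrative, not skpro API:

```python
import numpy as np


class ToyConcatDistr:
    """Hypothetical wrapper that records its components (methods omitted here)."""

    def __init__(self, dists, axis=0):
        self.dists, self.axis = list(dists), axis


class ToyNormal:
    """Hypothetical Normal that opts in to parameter-level concatenation."""

    def __init__(self, mu, sigma):
        self.mu, self.sigma = np.asarray(mu, dtype=float), np.asarray(sigma, dtype=float)
        self.shape = np.broadcast_shapes(self.mu.shape, self.sigma.shape)

    @classmethod
    def _concat(cls, dists, axis=0):
        # dunder-like hook: broadcast each parameter, concatenate, construct again
        def stack(name):
            return np.concatenate(
                [np.broadcast_to(getattr(d, name), d.shape) for d in dists], axis=axis
            )

        return cls(mu=stack("mu"), sigma=stack("sigma"))


class ToyLaplace:
    """Hypothetical Laplace that does not define the hook."""

    def __init__(self, mu, scale):
        self.mu, self.scale = mu, scale


def concat(dists, axis=0, always_wrap=False):
    """Hypothetical compositor: flatten via the class hook if all components share
    a class that defines _concat, unless always_wrap=True; otherwise wrap."""
    cls = type(dists[0])
    same_type = all(type(d) is cls for d in dists)
    if same_type and hasattr(cls, "_concat") and not always_wrap:
        return cls._concat(dists, axis=axis)
    return ToyConcatDistr(dists, axis=axis)


d1 = ToyNormal(mu=[[1, 2], [3, 4]], sigma=1)
d2 = ToyNormal(mu=0, sigma=[[2, 42]])
d3 = ToyLaplace(mu=0, scale=[[2, 42]])

print(type(concat([d1, d2])).__name__)                    # ToyNormal
print(type(concat([d1, d3])).__name__)                    # ToyConcatDistr
print(type(concat([d3, d3])).__name__)                    # ToyConcatDistr (no hook)
print(type(concat([d1, d2], always_wrap=True)).__name__)  # ToyConcatDistr
```

Keeping the hook on the class means the compositor needs no per-family logic: a family opts in to flattening by defining the hook, and everything else falls back to the wrapper.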

fkiraly added the enhancement and module:probability&simulation (probability distributions and simulators) labels on May 17, 2024

fkiraly mentioned this issue Oct 18, 2024

[ENH] Extending predict_proba support to Hierarchical and Panel datatypes sktime/sktime#7148 (Open)

SaiRevanth25 mentioned this issue Nov 3, 2024

[ENH] support for row multi-index in distributions #212 (Open)

SaiRevanth25 mentioned this issue Nov 19, 2024

[ENH] concat operation on distributions #499 (Draft, 2 tasks)
