Pierre-Elouan Réthoré edited this page Jun 19, 2015 · 9 revisions

Motivation

We want to represent the inputs and outputs of the models in FUSED-Wind together with information about their probability distribution and/or their uncertainty. This extension proposal addresses how the I/O definition dictionary could be complemented with this additional information. How a component then takes this information into account is not the focus of this discussion.

Proposal

Basic Idea

The idea is to extend the definition of a scalar value, e.g.

wind_speed:
    name:   wind_speed
    desc:   wind speed value
    type:   float
    value:  5.0
    min:    4.0
    max:    25.0
    units:  m/s

to a definition of a variable that represents a distribution instead, e.g.

wind_speed:
    name:   wind_speed
    desc:   wind speed distribution (truncated Weibull)
    type:   weibull
    A:      4.0
    k:      2.0
    min:    4.0
    max:    25.0
    units:  m/s
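
To illustrate how a component might consume both forms of the definition, here is a minimal Python sketch. It is only an assumption about one possible reading of the dictionary: the helper name `sample_variable` is hypothetical, and the `weibull` type is interpreted as a Weibull with scale `A` and shape `k`, truncated to `[min, max]`.

```python
import math
import random

def sample_variable(defn, rng):
    """Draw one value from an I/O definition dictionary (hypothetical helper)."""
    if defn["type"] == "float":
        return defn["value"]  # plain scalar: nothing to sample
    if defn["type"] == "weibull":
        A, k, lo, hi = defn["A"], defn["k"], defn["min"], defn["max"]
        # (x / A)**k follows a unit exponential law, so the truncation to
        # [min, max] becomes a truncated exponential, sampled by inversion.
        t_lo, t_hi = (lo / A) ** k, (hi / A) ** k
        mass = -math.expm1(-(t_hi - t_lo))  # 1 - exp(-(t_hi - t_lo))
        t = t_lo - math.log1p(-rng.random() * mass)
        x = A * t ** (1.0 / k)
        return min(max(x, lo), hi)  # guard against floating-point overshoot
    raise ValueError("unsupported type: %r" % defn["type"])

scalar_def = {"name": "wind_speed", "type": "float", "value": 5.0,
              "min": 4.0, "max": 25.0, "units": "m/s"}
weibull_def = {"name": "wind_speed", "type": "weibull", "A": 4.0, "k": 2.0,
               "min": 4.0, "max": 25.0, "units": "m/s"}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_variable(weibull_def, rng) for _ in range(1000)]
print(min(samples), max(samples))
```

Dispatching on `type` means a component that only understands scalars can still read the `float` form unchanged.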

Concepts

Different representations of distributions

  • Gaussian
    type:   gaussian
    mean:   10.0
    sigma:  2.0
  • Weibull
    type:   weibull
    A:      4.0
    k:      2.0
  • Beta...

  • Kernel Density Estimator
    type:   KDE
    value:  array([])  # size: n*3 array, with n the number of normal centers

  • Gaussian process ?
    type:       GP
    value:      array([])  # size: n*2 array, with n the number of points of the GP
    covariance: cubic      # type of covariance matrix used
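
As an illustration of the KDE representation, the sketch below evaluates the density encoded by an n*3 array. The proposal does not say what the three columns hold; here we assume each row is `(center, sigma, weight)` of one normal kernel. The numbers and the function name `kde_pdf` are made up for the example.

```python
import math

# ASSUMPTION: each row is (center, sigma, weight) of one normal kernel.
# The proposal only states that the array has shape n*3.
kde_value = [
    (6.0, 1.5, 0.5),
    (10.0, 2.0, 0.3),
    (15.0, 3.0, 0.2),
]

def kde_pdf(x, rows):
    """Weighted sum of normal kernels, normalised by the total weight."""
    total_weight = sum(weight for _, _, weight in rows)
    density = 0.0
    for center, sigma, weight in rows:
        z = (x - center) / sigma
        density += weight * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return density / total_weight

# Numerical sanity check: the density should integrate to about 1.
step = 0.01
area = sum(kde_pdf(-20.0 + i * step, kde_value) * step for i in range(6000))
print(area)
```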


#### Additive uncertainty
Additive uncertainty can be represented as an overall distribution. In this example we have a time series
of wind direction measurements with an estimated overall uncertainty that is assumed to be normal.

```yaml
wind_directions:
    name:   wind_directions
    desc:   wind directions with measurement uncertainty
    type:   array
    value:  array([])
    uncertainty:
            type:   gaussian
            mean:   0.0
            sigma:  10.0
```
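
One possible way to use such a definition is to draw a realization of the time series by adding a sample of the `uncertainty` block to every entry. This is only a sketch: the measurement values are invented, and both the unit (degrees) and the wrapping to [0, 360) are assumptions that the definition above does not state.

```python
import random

wind_directions = {
    "name": "wind_directions",
    "type": "array",
    "value": [350.0, 5.0, 12.0, 270.0],  # made-up measurements, assumed in degrees
    "uncertainty": {"type": "gaussian", "mean": 0.0, "sigma": 10.0},
}

def realize(defn, rng):
    """Return one perturbed copy of defn['value'] (hypothetical helper)."""
    unc = defn["uncertainty"]
    if unc["type"] != "gaussian":
        raise ValueError("only gaussian additive uncertainty is sketched here")
    # ASSUMPTION: directions are wrapped back into [0, 360) after perturbation.
    return [(v + rng.gauss(unc["mean"], unc["sigma"])) % 360.0
            for v in defn["value"]]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
realization = realize(wind_directions, rng)
print(realization)
```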





#### Distribution hyper-parameter uncertainty
What if the parameters of the distribution are themselves uncertain? This information could come from fitting the
distribution to a dataset.

```yaml
wind_speed:
    name:   wind_speed
    desc:   wind speed truncated weibull distribution fit with parameter uncertainty
    type:   weibull
    A:
            type:   gaussian
            mean:   4.0
            sigma:  1.0
    k:
            type:   gaussian
            mean:   2.0
            sigma:  0.2
    min:    4.0
    max:    25.0
    units:  m/s
```
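
A definition with uncertain hyper-parameters could be sampled in two stages: first draw `A` and `k` from their own distributions, then draw the wind speed from the resulting truncated Weibull. The sketch below shows this under stated assumptions. The helper names are hypothetical, and redrawing non-positive parameter values is a choice made here, not something the proposal specifies.

```python
import math
import random

wind_speed = {
    "type": "weibull",
    "A": {"type": "gaussian", "mean": 4.0, "sigma": 1.0},
    "k": {"type": "gaussian", "mean": 2.0, "sigma": 0.2},
    "min": 4.0,
    "max": 25.0,
}

def resolve(param, rng):
    """Return a float: the value itself, or one draw if it is a gaussian block."""
    if not isinstance(param, dict):
        return float(param)
    if param["type"] != "gaussian":
        raise ValueError("only gaussian hyper-parameters are sketched here")
    while True:
        draw = rng.gauss(param["mean"], param["sigma"])
        if draw > 0.0:  # ASSUMPTION: Weibull parameters must stay positive
            return draw

def sample_truncated_weibull(A, k, lo, hi, rng):
    # (x / A)**k is unit exponential; sample its truncation by inversion.
    t_lo, t_hi = (lo / A) ** k, (hi / A) ** k
    mass = -math.expm1(-(t_hi - t_lo))
    t = t_lo - math.log1p(-rng.random() * mass)
    return min(max(A * t ** (1.0 / k), lo), hi)

def draw_wind_speed(defn, rng):
    A = resolve(defn["A"], rng)  # stage 1: draw the hyper-parameters
    k = resolve(defn["k"], rng)
    return sample_truncated_weibull(A, k, defn["min"], defn["max"], rng)  # stage 2

rng = random.Random(1)  # fixed seed so the sketch is reproducible
draws = [draw_wind_speed(wind_speed, rng) for _ in range(500)]
print(min(draws), max(draws))
```

Because `resolve` accepts either a number or a nested block, the same code also handles the simpler definition where `A` and `k` are plain floats.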

Distribution additive uncertainty

Another way of describing the uncertainty of the fit would be to estimate the overall error of the fit:

wind_speed:
    name:   wind_speed
    desc:   wind speed weibull distribution with overall fit uncertainty
    type:   weibull
    A:      4.0
    k:      2.0
    min:    4.0
    max:    25.0
    uncertainty:
            type:   gaussian
            mean:   0.0
            sigma:  2.0

If for some reason we know that the uncertainty of the fit is higher in some region of the distribution (e.g. because of too few data points in the measurement, a model uncertainty, or a propagated uncertainty), we can represent that uncertainty using another distribution (e.g. here a Kernel Density Estimator, KDE).

wind_speed:
    name:   wind_speed
    desc:   wind speed weibull distribution with distribution dependent fit uncertainty
    type:   weibull
    A:      4.0
    k:      2.0
    min:    4.0
    max:    25.0
    uncertainty:
            type:   KDE           # kernel density estimator
            value:  array([...])  # size: n*3 array, with n the number of normal centers

Joint distributions

A joint distribution combines several inputs. For instance, a wind rose combines wind speed and wind direction:

wind_rose:
    name:   wind_rose
    desc:   local wind resource
    type:   binned_weibull
    dimensions: ['wind_speed', 'wind_direction']
    wind_speed: weibull
    wind_direction: cdf
    values: array([])
    columns: ['wind_direction', 'frequency', 'A', 'k']
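
To make the `binned_weibull` layout concrete, the sketch below samples a (direction, speed) pair from a table laid out with the `columns` of the wind rose definition. The table values are invented, and we assume that `frequency` is the probability of each direction sector and that each sector carries an untruncated Weibull with scale `A` and shape `k`.

```python
import random

columns = ["wind_direction", "frequency", "A", "k"]
values = [  # made-up table: one row per direction sector
    [0.0, 0.20, 8.0, 2.0],
    [90.0, 0.10, 7.0, 1.9],
    [180.0, 0.30, 9.5, 2.2],
    [270.0, 0.40, 10.0, 2.1],
]

def sample_wind_rose(values, columns, rng):
    """Pick a sector by frequency, then a speed from that sector's Weibull."""
    col = {name: i for i, name in enumerate(columns)}
    weights = [row[col["frequency"]] for row in values]
    row = rng.choices(values, weights=weights, k=1)[0]
    # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
    speed = rng.weibullvariate(row[col["A"]], row[col["k"]])
    return row[col["wind_direction"]], speed

rng = random.Random(2)  # fixed seed so the sketch is reproducible
pairs = [sample_wind_rose(values, columns, rng) for _ in range(500)]
print(pairs[:3])
```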

#### More complicated examples:
A joint distribution with 4 dimensions built from the marginal distribution of each dimension and a copula that
correlates them:

```yaml
wind_atlas:
    name:   wind_atlas
    desc:   joint distribution built from a copula for U, D, TI, S.
            U is a truncated Weibull, TI is a lognormal and D & S are KDEs.
    dimensions: ['wind_speed', 'wind_direction', 'TI', 'stability']
    type:   copula
    copula:
            type:           gaussian   # gumbel, ...
            correlation:    array([])  # 4x4 correlation on the inverse CDF (cdf^-1) of the variables (it might be more useful to store the inverse and determinant)
    marginals:
            wind_speed:
                type:   weibull
                A:      12.0
                k:      2.0
                min:    4.0
                max:    25.0
            wind_direction:
                type:   KDE
                value:  array([...])  # n*3 array, with 1 dimension, n the number of normal centers
            TI:
                type:   lognormal
                mean:   5.0
                sigma:  1.0
            stability:
                type:   KDE
                value:  array([...])  # n*3 array, with 1 dimension, n the number of normal centers
```
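
The following sketch shows how a Gaussian copula ties marginals together, reduced to two dimensions (wind speed and TI) to stay short. Everything numeric is an assumption: the correlation `rho` is invented, and the lognormal parameters `mu` and `sigma` are made-up log-space values, since the proposal does not say how `mean` and `sigma` of a lognormal should be interpreted.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def truncated_weibull_ppf(u, A, k, lo, hi):
    """Inverse CDF of a Weibull(A, k) truncated to [lo, hi]."""
    t_lo, t_hi = (lo / A) ** k, (hi / A) ** k
    mass = -math.expm1(-(t_hi - t_lo))
    t = t_lo - math.log1p(-u * mass)
    return min(max(A * t ** (1.0 / k), lo), hi)

def lognormal_ppf(u, mu, sigma):
    """Inverse CDF of a lognormal with log-space parameters mu and sigma."""
    return math.exp(NormalDist(mu, sigma).inv_cdf(u))

def sample_pair(rho, rng):
    # Correlated standard normals (Cholesky factor of a 2x2 correlation matrix).
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    z1, z2 = e1, rho * e1 + math.sqrt(1.0 - rho * rho) * e2
    # Map to uniforms, clamped so the inverse CDFs never see exactly 0 or 1.
    eps = 1e-12
    u1 = min(max(STD_NORMAL.cdf(z1), eps), 1.0 - eps)
    u2 = min(max(STD_NORMAL.cdf(z2), eps), 1.0 - eps)
    # Push each uniform through its own marginal.
    speed = truncated_weibull_ppf(u1, A=12.0, k=2.0, lo=4.0, hi=25.0)
    ti = lognormal_ppf(u2, mu=-2.3, sigma=0.3)  # hypothetical TI parameters
    return speed, ti

rng = random.Random(3)  # fixed seed so the sketch is reproducible
pairs = [sample_pair(0.6, rng) for _ in range(2000)]  # rho = 0.6 is invented
print(pairs[:3])
```

The copula only fixes the dependence structure, so any marginal with an inverse CDF, including a one-dimensional KDE, could be substituted without changing `sample_pair`'s first two steps.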

A joint 4-dimensional distribution built from a multi-dimensional KDE

wind_atlas:
    name:   wind_atlas
    desc:   joint KDE distribution for U,D,TI,S with fixed gaussian uncertainty
    type:   KDE
    value:  array([...])  # d*n*3 array, with d the number of dimensions, n the number of normal centers
    dimensions: ['wind_speed', 'wind_direction', 'TI', 'stability']
    uncertainty:
            type:       multivariate_gaussian
            mean:       [0.0, ...]  # list of length 4
            covariance: array([])   # 4x4 array

A joint 4-dimensional distribution built from a multi-dimensional KDE with an additional KDE to model its uncertainty

wind_atlas:
    name:   wind_atlas
    desc:   joint distribution for U,D,TI,S with a probability uncertainty function of the inputs
    type:   KDE
    value:  array([...])  # d*n*3 array, with d the number of dimensions, n the number of normal centers
    dimensions: ['wind_speed', 'wind_direction', 'TI', 'stability']
    uncertainty:
            type:   KDE
            value:  array([...])  # d*n*3 array, with d the number of dimensions, n the number of normal centers
            dimensions: ['wind_speed', 'wind_direction', 'TI', 'stability']