
Simplex projection #150

Draft · wants to merge 19 commits into base: main
1 change: 1 addition & 0 deletions docs/Project.toml
@@ -1,5 +1,6 @@
[deps]
DelayEmbeddings = "5732040d-69e3-5649-938a-b6b4f237613f"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
DocumenterTools = "35a29f4d-8980-5a13-9543-d66fff28ecb8"
DynamicalSystems = "61744808-ddfa-5f27-97ff-6e42cc95d634"
3 changes: 2 additions & 1 deletion docs/make.jl
@@ -3,7 +3,7 @@ using Pkg
CI = get(ENV, "CI", nothing) == "true" || get(ENV, "GITHUB_TOKEN", nothing) !== nothing
CI && Pkg.activate(@__DIR__)
CI && Pkg.instantiate()
CI && (ENV["GKSwstype"] = "100")
ENV["GKSwstype"] = "100"
using DelayEmbeddings
using TransferEntropy
using Documenter
@@ -49,6 +49,7 @@ PAGES = [
"info_estimators.md",
],

"edm.md",
"example_systems.md",
"Utilities" => [
"invariant_measure.md",
237 changes: 237 additions & 0 deletions docs/src/edm.md
@@ -0,0 +1,237 @@
# Empirical dynamical modelling

## Simplex projection

```@docs
simplex_predictions
```

### Example: reproducing Sugihara & May (1990)

The simplex projection method was introduced in Sugihara & May (1990)[^Sugihara1990]. Let's try to reproduce their figure 1.
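
The gist of the method: to predict a value, find the `d + 1` nearest neighbours of the embedded predictee in the training set (the vertices of a simplex surrounding it), project each neighbour `k` steps forward in time, and average the projected values with exponential distance weights. Here is a minimal sketch of a single prediction (an illustration only, with hypothetical `library`/`future` inputs; the package's `simplex_predictions` handles the embedding and bookkeeping):

```julia
using LinearAlgebra

# Sketch of a single simplex-projection prediction. `library` holds the
# embedded training states; `future[i]` is the value observed `k` steps
# after `library[i]`. Both inputs are assumptions of this sketch.
function simplex_predict(query, library, future)
    d = length(query)
    dists = [norm(p .- query) for p in library]
    nn = partialsortperm(dists, 1:d+1)   # the d+1 nearest neighbours (a simplex)
    # Exponential weights, normalised by the distance to the closest neighbour
    w = exp.(-dists[nn] ./ max(dists[nn[1]], eps()))
    sum(w .* future[nn]) / sum(w)        # weighted average of the projections
end
```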

We start by defining the tent map and generating a time series for `μ = 1.98`. Following Sugihara & May, we analyse the first differences of the time series.

```@example simplex_projection
using CausalityTools, DynamicalSystems, Plots, Statistics, Distributions; gr()

function eom_tentmap(dx, x, p, n)
x = x[1]
μ = p[1]
dx[1] = x < 0.5 ? μ*x : μ*(1 - x)

return
end

function tentmap(u₀ = rand(); μ = 1.98)
DiscreteDynamicalSystem(eom_tentmap, [u₀], [μ])
end

npts = 2000
sys = tentmap(μ = 1.98)
ts = diff(trajectory(sys, npts, Ttr = 1000)[:, 1])

p_ts = plot(xlabel = "Time step", ylabel = "Value", ylims = (-1.1, 1.1))
plot!(ts[1:100], label = "")
savefig("simplex_ts.svg") # hide
```

![](simplex_ts.svg)

Next, let's compute the predicted and observed values for `k = 1` and `k = 7` using embedding dimension 3 and
embedding lag 1. We'll use the first 1000 points as our training set, and try to predict the next 500 points.

```@example simplex_projection
d, τ = 3, 1
training = 1:1000
prediction = 1001:1500
x_1, x̃_1 = simplex_predictions(ts, 1, d = d, τ = τ, training = training, prediction = prediction)
x_7, x̃_7 = simplex_predictions(ts, 7, d = d, τ = τ, training = training, prediction = prediction)

p_obs_vs_pred = plot(xlabel = "Observed values", ylabel = "Predicted values")
scatter!(x_1, x̃_1, label = "k = 1", shape = :circle)
scatter!(x_7, x̃_7, label = "k = 7", shape = :star5)
savefig("simplex_correspondence.svg") # hide
```

![](simplex_correspondence.svg)

There is high correlation between observed and predicted values when predicting only one time step (`k = 1`)
into the future. As `k` increases, the performance drops off. Let's investigate this systematically.

```@example simplex_projection
kmax = 20
cors = zeros(kmax)
for k = 1:kmax
X, X̃ = simplex_predictions(ts, k, d = d, τ = τ, training = training, prediction = prediction)
cors[k] = cor(X, X̃)
end

plot(legend = :bottomleft, ylabel = "Correlation coefficient (ρ)",
xlabel = "Prediction time (k)",
ylims = (-1.1, 1.1))
hline!([0], ls = :dash, label = "", color = :grey)
scatter!(1:kmax, cors, label = "")
savefig("simplex_correlation.svg") # hide
```

![](simplex_correlation.svg)

The correlation between observed and predicted values is near perfect up to `k = 3`, then rapidly
drops off as `k` increases. At `k = 8`, there is virtually no correlation between observed and predicted values.
For this particular system, with this choice of embedding and training/prediction sets, good predictions are therefore limited to roughly 4 to 5 time steps into the future.
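
If a single number for this horizon is wanted, one crude option is the first `k` at which the skill drops below a threshold (the cutoff of `ρ = 0.9` here is an arbitrary choice, for illustration only):

```@example simplex_projection
# First prediction time k at which skill drops below the (arbitrary) cutoff
horizon = findfirst(ρ -> ρ < 0.9, cors)
```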

The main point of Sugihara & May's paper was that this drop-off of prediction accuracy with `k` is characteristic of chaotic systems, and can be used to distinguish chaos from regular behaviour in time series.

Let's demonstrate this by also investigating how the correlation between observed and predicted values behaves as a function of `k` for a regular, non-chaotic time series. We'll use a sine wave with additive noise.

```@example simplex_projection
𝒩 = Uniform(-0.5, 0.5)
xs = 0.0:1.0:2000.0
r = sin.(0.5 .* xs) .+ rand(𝒩, length(xs))
plot(r[1:200])

cors_sine = zeros(kmax)
for k = 1:kmax
X, X̃ = simplex_predictions(r, k, d = d, τ = τ, training = training, prediction = prediction)
cors_sine[k] = cor(X, X̃)
end

plot(legend = :bottomleft, ylabel = "Correlation coefficient (ρ)",
xlabel = "Prediction time (k)",
ylims = (-1.1, 1.1))
hline!([0], ls = :dash, label = "", color = :grey)
scatter!(1:kmax, cors, label = "tent map", marker = :star)
scatter!(1:kmax, cors_sine, label = "sine")
savefig("simplex_correlation_sine_tent.svg") # hide
```

![](simplex_correlation_sine_tent.svg)

In contrast to the tent map, for which prediction accuracy drops off and stabilizes around zero for increasing `k`, the prediction accuracy is rather insensitive to the choice of `k` for the noisy sine time series.
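
A quick way to summarise this contrast numerically (a rough sketch, not part of Sugihara & May's methodology) is to compare the least-squares slope of ρ as a function of `k` for the two series; the chaotic series should show a clearly negative slope, while the noisy sine should not:

```@example simplex_projection
# Least-squares slope of ρ versus k for both series
slope(k, ρ) = cov(k, ρ) / var(k)
ks_fit = collect(1.0:kmax)
slope(ks_fit, cors), slope(ks_fit, cors_sine)
```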

### Example: determining optimal embedding dimension

```@docs
delay_simplex
```

The simplex projection method can also be used to determine the optimal embedding dimension for a time series.
Given an embedding lag `τ`, we can embed a time series `x` for a range of embedding dimensions `d ∈ 2:dmax` and
compute the average prediction skill over multiple prediction times `ks` using the simplex projection method.

Here, we compute the average prediction skills from `k=1` up to `k=10` time steps into the future, for
embedding dimensions `d = 2:10`.

```@example
using CausalityTools, DynamicalSystems, Plots; gr()

sys = CausalityTools.ExampleSystems.lorenz_lorenz_bidir()
T, Δt = 150, 0.05
lorenz = trajectory(sys, T, Δt = Δt, Ttr = 100)[:, 1:3]
x1, x2, x3 = columns(lorenz)

# Determine the optimal embedding delay
τ = estimate_delay(x1, "ac_zero")

# Compute the average prediction skill for each embedding dimension,
# averaged over the prediction times ks
ds, ks = 2:10, 1:10
ρs = delay_simplex(x1, τ, ds = ds, ks = ks)

plot(xlabel = "Embedding dimension", ylabel = "ρ̄(observed, predicted")
plot!(ds, ρs, label = "", c = :black, marker = :star)
savefig("simplex_embedding.svg") # hide
```

![](simplex_embedding.svg)

Based on the predictability criterion, the optimal embedding dimension, for this particular realization
of the first variable of the Lorenz system, seems to be 2.
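
Programmatically, the optimum is just the dimension maximising the returned skill (continuing from `ds` and `ρs` in the block above):

```julia
# Assumes `ds` and `ρs` from the example above
best_d = ds[argmax(ρs)]
```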

## S-map

```@docs
smap
```

The s-map, or sequential locally weighted global map, was introduced in Sugihara (1994)[^Sugihara1994]. The s-map approximates the dynamics of a system as a locally weighted global map, with a tuning parameter ``\theta`` that controls the degree of nonlinearity in the model. For ``\theta = 0``, the model is the maximum likelihood global linear solution (of eq. 2 in Sugihara, 1994), and for increasing ``\theta > 0``, the model becomes increasingly nonlinear and localized (Sugihara, 1996)[^Sugihara1996].
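
To get a feel for how ``\theta`` controls locality, here is a sketch of the exponential weighting applied to library points, assuming the kernel ``w_i = \exp(-\theta d_i / \bar{d})`` from Sugihara (1994), where ``d_i`` is the distance from the predictee to library point ``i`` and ``\bar{d}`` is the mean of those distances:

```julia
using Statistics

# Weight of each library point as a function of its distance to the predictee
smap_weights(dists, θ) = exp.(-θ .* dists ./ mean(dists))

dists = [0.1, 0.5, 1.0, 2.0]
smap_weights(dists, 0.0)  # θ = 0: uniform weights, i.e. a global linear model
smap_weights(dists, 2.0)  # θ = 2: nearby points dominate, i.e. a local model
```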

When such a model has been constructed, it can be used as a prediction tool for out-of-sample points, and to characterize nonlinearity in a time series (Sugihara, 1994). Let's demonstrate with an example.

### Example: prediction power for the Lorenz system

In our implementation of `smap`, the input is a multivariate dataset: either a `Dataset` of the raw variables of a multivariate dynamical system, or a `Dataset` containing an embedding of a single time series.
Here, we'll show an example of the former.
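
For the latter case, a single time series can be turned into a suitable `Dataset` with a delay embedding. A minimal sketch (the variables `x` and `τ` here are placeholders):

```julia
using DelayEmbeddings

x, τ = rand(1000), 2
# A 3-dimensional delay embedding of x, using lags 0, -τ and -2τ.
# The resulting Dataset can be passed to `smap` in place of a raw orbit.
embedding = genembed(x, (0, -τ, -2τ))
```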

Let's generate an example orbit from a bidirectionally coupled set of Lorenz systems. We'll use the built-in
`CausalityTools.ExampleSystems.lorenz_lorenz_bidir` system, and select the first three variables for analysis.

```@example smap_lorenz
using CausalityTools, Plots, DynamicalSystems, Statistics; gr()

sys = CausalityTools.ExampleSystems.lorenz_lorenz_bidir()
T, Δt = 150, 0.05
lorenz = trajectory(sys, T, Δt = Δt, Ttr = 100)[:, 1:3]
p_orbit = plot(xlabel = "x", ylabel = "y", zlabel = "z")
plot!(columns(lorenz)..., marker = :circle, label = "", ms = 2, lα = 0.5)
savefig("lorenz_attractor.svg") # hide
```

![](lorenz_attractor.svg)

Now, we compute the `k`-step forward predictions for `k` ranging from `1` to `15`. The tuning parameter `θ`
varies from `0.0` (linear model) to `2.0` (strongly nonlinear model). Our goal is to see which model
yields the best predictions across multiple `k`.

We'll use the first 500 points of the orbit to train the model. Then, using that model, we try to
predict the next 500 points (which are not part of the training set).
Finally, we compute the correlation between the predicted values and the observed values, which measures
the model prediction skill. This procedure is repeated for each combination of `k` and `θ`.

```@example smap_lorenz
ks, θs = 1:15, 0.0:0.5:2.0
n_trainees, n_predictees = 500, 500;

# Compute correlations between predicted values `preds` and actual values `truths`
# for all parameter combinations
cors = zeros(length(ks), length(θs))
for (i, k) in enumerate(ks)
for (j, θ) in enumerate(θs)
pred = n_trainees+1:n_trainees+n_predictees
train = 1:n_trainees

preds, truths = smap(lorenz, θ = θ, k = k, trainees = train, predictees = pred)
cors[i, j] = cor(preds, truths)
end
end

p_θ_k_sensitivity = plot(xlabel = "Prediction time (k)", ylabel = "cor(observed, predicted)",
legend = :bottomleft, ylims = (-1.1, 1.1))
hline!([0], ls = :dash, c = :grey, label = "")
markers = [:star :circle :square :star5 :hexagon :circle :star]
cols = [:black, :red, :blue, :green, :purple, :grey, :black]
labels = ["θ = $θ" for θ in θs]
for i = 1:length(θs)
plot!(ks, cors[:, i], marker = markers[i], c = cols[i], ms = 5, label = labels[i])
end

# Let's also plot the subset of points we're actually using.
p_orbit = plot(columns(lorenz[1:n_trainees+n_predictees])..., marker = :circle, label = "", ms = 2, lα = 0.5)

plot(p_orbit, p_θ_k_sensitivity, layout = grid(1, 2), size = (700, 400))
savefig("smap_sensitivity_lorenz.svg") # hide
```

![](smap_sensitivity_lorenz.svg)

The nonlinear models (colored lines and symbols) far outperform the linear model (black line + stars).

Because the predictions for our system improve with increasingly nonlinear models, this indicates that our
system has some inherent nonlinearity. This is, of course, correct, since the Lorenz system is chaotic.

A formal way to test for nonlinearity is, for example, to define the null hypothesis
"H0: predictions do not improve when using an equivalent nonlinear versus a linear model" (equivalent in the sense that the only parameter is `θ`), or, equivalently, "`H0: ρ_linear = ρ_nonlinear`". If predictions do in fact improve, we
instead accept the alternative hypothesis that predictions *do* improve when using nonlinear models versus linear models. This can be formally tested using a z-statistic [^Sugihara1994].
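
One standard way to build such a statistic (a sketch assuming the Fisher r-to-z comparison of two correlation coefficients, each estimated from `n` predictions; see Sugihara (1994) for the details of the test used there):

```@example smap_lorenz
# Fisher r-to-z transform; the difference of the transformed correlations is
# approximately normal with variance 2/(n - 3) for equal sample sizes.
fisher_z(ρ) = atanh(ρ)
z_stat(ρ_lin, ρ_nonlin, n) = (fisher_z(ρ_nonlin) - fisher_z(ρ_lin)) / sqrt(2 / (n - 3))

# Compare the linear (θ = 0) and most nonlinear (θ = 2) models at k = 1,
# using `cors` from above (rows index k, columns index θ) and n = 500.
z = z_stat(cors[1, 1], cors[1, end], 500)
```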

[^Sugihara1990]: Sugihara, George, and Robert M. May. "Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series." Nature 344.6268 (1990): 734-741.
[^Sugihara1994]: Sugihara, George. "Nonlinear forecasting for the classification of natural time series." Philosophical Transactions of the Royal Society of London. Series A: Physical and Engineering Sciences 348.1688 (1994): 477-495.
[^Sugihara1996]: Sugihara, George, et al. "Nonlinear control of heart rate variability in human infants." Proceedings of the National Academy of Sciences 93.6 (1996): 2608-2613.
49 changes: 49 additions & 0 deletions simplex.jl
@@ -0,0 +1,49 @@
using CausalityTools, DynamicalSystems, Plots, StatsBase, Statistics, Distributions, Neighborhood, LinearAlgebra; gr()

function eom_tentmap(dx, x, p, n)
x = x[1]
μ = p[1]
dx[1] = x < 0.5 ? μ*x : μ*(1 - x)

return
end

function tentmap(u₀ = rand(); μ = 1.9)
DiscreteDynamicalSystem(eom_tentmap, [u₀], [μ])
end

npts = 2000
sys = tentmap(μ = 1.9)
ts = trajectory(sys, npts, Ttr = 1000)[:, 1]

kmax = 20
cors = zeros(kmax)
τ = 1
d = 3
training = 1:200
prediction = 201:500
for k = 1:kmax
X, X̃ = simplex_predictions(ts, k, d = d, τ = τ, training = training, prediction = prediction)
cors[k] = cor(X, X̃)
end

plot(legend = :bottomleft, ylabel = "Correlation coefficient (ρ)",
xlabel = "Prediction time (k)",
ylims = (-0.05, 1.1))
scatter!(1:kmax, cors, label = "")



1 change: 1 addition & 0 deletions src/CausalityTools.jl
@@ -15,6 +15,7 @@ module CausalityTools
include("CrossMappings/CrossMappings.jl")
include("SMeasure/smeasure.jl")
include("PredictiveAsymmetry/PredictiveAsymmetry.jl")
include("EmpiricalDynamicalModelling/EmpiricalDynamicalModelling.jl")
include("example_systems/ExampleSystems.jl")

using Requires
8 changes: 8 additions & 0 deletions src/EmpiricalDynamicalModelling/EmpiricalDynamicalModelling.jl
@@ -0,0 +1,8 @@
using Reexport

@reexport module EmpiricalDynamicalModelling

include("simplex_projection.jl")
include("delay_simplexprojection.jl")
include("smap.jl")
end
47 changes: 47 additions & 0 deletions src/EmpiricalDynamicalModelling/delay_simplexprojection.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
using DelayEmbeddings
using Statistics: cor, mean

export delay_simplex

"""
delay_simplex(x, τ; ds = 2:10, ks = 1:10) → ρs

Determine the optimal embedding dimension for `x` based on the simplex projection algorithm
from Sugihara & May (1990)[^Sugihara1990].

For each `d ∈ ds`, we compute the correlation between observed and predicted values
for different prediction times `ks`, and average the correlation coefficients. The
embedding dimension for which the average correlation is highest is taken as the optimal
dimension. The embedding delay `τ` is given as a positive number.

Returns the prediction skills `ρs` - one `ρ` for each `d ∈ ds`.

Note: the library/training and prediction sets are automatically taken as the first and
second halves of the data, respectively. This convenience method does not allow tuning
the libraries further.

[^Sugihara1990]: Sugihara, George, and Robert M. May. "Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series." Nature 344.6268 (1990): 734-741.
"""
function delay_simplex(x, τ; ds = 2:10, ks = 1:10)
    ρs = zeros(length(ds))
    for (i, d) in enumerate(ds)
        ρs_k = zeros(length(ks))
        for (j, k) in enumerate(ks)
            train = 1:length(x) ÷ 2
            # Predictees need somewhere to go when projected forward in time,
            # so exclude predictees among the last τ*(d-1) + k points (these
            # would result in simplices for which one of the vertices has no
            # target `k` steps in the future).
            pred = length(x) ÷ 2 + 1 : length(x) - τ*(d-1) - k

            x̄, x̄_actual = simplex_predictions(x, k; τ = τ, d = d,
                training = train,
                prediction = pred)
            ρs_k[j] = cor(x̄, x̄_actual)
        end
        ρs[i] = mean(ρs_k)
    end

    return ρs
end