diff --git a/Project.toml b/Project.toml
index 72e38e72e..a0fbf9981 100644
--- a/Project.toml
+++ b/Project.toml
@@ -9,4 +9,4 @@ Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Weave = "44d3d7a6-8a23-5bf8-98c5-b353f8df5ec9"
[compat]
-julia = "1"
\ No newline at end of file
+julia = "1"
diff --git a/tutorials/bayesian-deep-learning/01_bayesian-neural-network.jmd b/tutorials/bayesian-deep-learning/01_bayesian-neural-network.jmd
index fc5af42f0..9b82181ca 100644
--- a/tutorials/bayesian-deep-learning/01_bayesian-neural-network.jmd
+++ b/tutorials/bayesian-deep-learning/01_bayesian-neural-network.jmd
@@ -10,7 +10,7 @@ We will begin with importing the relevant libraries.
```julia
# Import libraries.
-using Turing, Flux, Plots, Random
+using Turing, Flux, Plots, Random, ReverseDiff
# Hide sampling progress.
Turing.turnprogress(false);
@@ -19,17 +19,6 @@ Turing.turnprogress(false);
Turing.setadbackend(:reversediff)
```
- ┌ Info: [Turing]: progress logging is disabled globally
- └ @ Turing /home/cameron/.julia/packages/Turing/cReBm/src/Turing.jl:22
-
-
-
-
-
- :reversediff
-
-
-
Our goal here is to use a Bayesian neural network to classify points in an artificial dataset. The code below generates data points arranged in a box-like pattern and displays a graph of the dataset we'll be working with.
@@ -68,13 +57,6 @@ end
plot_data()
```
-
-
-
-![svg](/tutorials/3_BayesNN_files/3_BayesNN_4_0.svg)
-
-
-
## Building a Neural Network
The next step is to define a [feedforward neural network](https://en.wikipedia.org/wiki/Feedforward_neural_network) where we express our parameters as distributions, and not single points as with traditional neural networks. The two functions below, `unpack` and `nn_forward`, are helper functions we need when we specify our model in Turing.
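To make these helpers concrete, here is a minimal sketch of what they might look like, assuming a 2→3→2→1 architecture consistent with the 20-element parameter vector used below; the exact reshaping is an assumption for illustration.

```julia
# Illustrative sketch: split a flat 20-element parameter vector into the
# weights and biases of a 2→3→2→1 network and run a forward pass.
function unpack(nn_params::AbstractVector)
    W₁ = reshape(nn_params[1:6], 3, 2)
    b₁ = nn_params[7:9]
    W₂ = reshape(nn_params[10:15], 2, 3)
    b₂ = nn_params[16:17]
    Wₒ = reshape(nn_params[18:19], 1, 2)
    bₒ = nn_params[20:20]
    return W₁, b₁, W₂, b₂, Wₒ, bₒ
end

function nn_forward(xs, nn_params::AbstractVector)
    W₁, b₁, W₂, b₂, Wₒ, bₒ = unpack(nn_params)
    nn = Chain(Dense(W₁, b₁, tanh), Dense(W₂, b₂, tanh), Dense(Wₒ, bₒ, σ))
    return nn(xs)
end
```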
@@ -121,7 +103,7 @@ alpha = 0.09
sig = sqrt(1.0 / alpha)
# Specify the probabilistic model.
-@model bayes_nn(xs, ts) = begin
+@model function bayes_nn(xs, ts)
# Create the weight and bias vector.
nn_params ~ MvNormal(zeros(20), sig .* ones(20))
@@ -138,7 +120,6 @@ end;
Inference can now be performed by calling `sample`. We use the `HMC` sampler here.
-
```julia
# Perform inference.
N = 5000
@@ -147,10 +128,9 @@ ch = sample(bayes_nn(hcat(xs...), ts), HMC(0.05, 4), N);
Now we extract the weights and biases from the sampled chain. We'll use these primarily in determining how good a classifier our model is.
-
```julia
# Extract all weight and bias parameters.
-theta = ch[:nn_params].value.data;
+theta = MCMCChains.group(ch, :nn_params).value;
```
## Prediction Visualization
@@ -163,7 +143,7 @@ We can use [MAP estimation](https://en.wikipedia.org/wiki/Maximum_a_posteriori_e
plot_data()
# Find the index that provided the highest log posterior in the chain.
-_, i = findmax(ch[:lp].value.data)
+_, i = findmax(ch[:lp])
# Extract the max row value from i.
i = i.I[1]
@@ -175,24 +155,16 @@ Z = [nn_forward([x, y], theta[i, :])[1] for x=x_range, y=y_range]
contour!(x_range, y_range, Z)
```
-
-
-
-![svg](/tutorials/3_BayesNN_files/3_BayesNN_16_0.svg)
-
-
-
The contour plot above shows that the MAP method is not too bad at classifying our data.
Now we can visualize our predictions.
-\$\$
+$$
p(\tilde{x} | X, \alpha) = \int_{\theta} p(\tilde{x} | \theta) p(\theta | X, \alpha) \, d\theta \approx \sum_{\theta \sim p(\theta | X, \alpha)}f_{\theta}(\tilde{x})
-\$\$
+$$
The `nn_predict` function takes the average predicted value from a network parameterized by weights drawn from the MCMC chain.
-
```julia
# Return the average predicted value across
# multiple weights.
@@ -203,7 +175,6 @@ end;
Next, we use the `nn_predict` function to predict the value at a sample of points where the `x` and `y` coordinates range between -6 and 6. As we can see below, we still have a satisfactory fit to our data.
-
```julia
# Plot the average prediction.
plot_data()
@@ -215,16 +186,8 @@ Z = [nn_predict([x, y], theta, n_end)[1] for x=x_range, y=y_range]
contour!(x_range, y_range, Z)
```
-
-
-
-![svg](/tutorials/3_BayesNN_files/3_BayesNN_21_0.svg)
-
-
-
If you are interested in how the predictive power of our Bayesian neural network evolved between samples, the following animation shows the contour plot generated from the network weights in samples 1 to 500.
-
```julia
# Number of iterations to plot.
n_end = 500
@@ -232,24 +195,10 @@ n_end = 500
anim = @gif for i=1:n_end
plot_data()
Z = [nn_forward([x, y], theta[i,:])[1] for x=x_range, y=y_range]
- contour!(x_range, y_range, Z, title="Iteration $$i", clim = (0,1))
+ contour!(x_range, y_range, Z, title="Iteration $i", clim = (0,1))
end every 5
-
-
```
- ┌ Info: Saved animation to
- │ fn = /home/cameron/code/TuringTutorials/tmp.gif
- └ @ Plots /home/cameron/.julia/packages/Plots/cc8wh/src/animation.jl:98
-
-
-
-
-
-
-
-
-
## Variational Inference (ADVI)
We can also use Turing's variational inference tools to estimate the parameters of this model. See [variational inference](https://turing.ml/dev/docs/for-developers/variational_inference) for more information.
@@ -258,6 +207,7 @@ We can also use Turing's variational inference tools to estimate the parameters
```julia
using Bijectors
using Turing: Variational
+using AdvancedVI
m = bayes_nn(hcat(xs...), ts);
@@ -266,27 +216,20 @@ q = Variational.meanfield(m)
μ = randn(length(q))
ω = -1 .* ones(length(q))
-q = Variational.update(q, μ, exp.(ω));
+q = AdvancedVI.update(q, μ, exp.(ω));
-advi = ADVI(10, 1000)
+advi = ADVI(10, 5_000)
q_hat = vi(m, advi, q);
```
- ┌ Info: [ADVI] Should only be seen once: optimizer created for θ
- │ objectid(θ) = 3812708583762184342
- └ @ Turing.Variational /home/cameron/.julia/packages/Turing/cReBm/src/variational/VariationalInference.jl:204
-
-
-
```julia
samples = transpose(rand(q_hat, 5000))
-ch_vi = Chains(reshape(samples, size(samples)..., 1), ["nn_params[$$i]" for i = 1:20]);
+ch_vi = Chains(reshape(samples, size(samples)..., 1), string.(MCMCChains.namesingroup(ch, :nn_params)));
# Extract all weight and bias parameters.
-theta = ch_vi[:nn_params].value.data;
+theta = MCMCChains.group(ch_vi, :nn_params).value;
```
-
```julia
# Plot the average prediction.
plot_data()
@@ -298,13 +241,6 @@ Z = [nn_predict([x, y], theta, n_end)[1] for x=x_range, y=y_range]
contour!(x_range, y_range, Z)
```
-
-
-
-![svg](/tutorials/3_BayesNN_files/3_BayesNN_28_0.svg)
-
-
-
## Generic Bayesian Neural Networks
The code below is intended for more general applications, where you need to be able to change the basic network shape fluidly. The code above is highly rigid, and adapting it for other architectures would be time-consuming. Currently the code below only supports networks of `Dense` layers.
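For illustration, such a shape might be specified as a list of layer tuples; the `(outputs, inputs, activation)` layout and the derived parameter count are assumptions for this sketch, chosen to match the 2→3→2→1 network used earlier.

```julia
# A possible shape specification (tuple layout assumed: outputs, inputs, activation).
network_shape = [
    (3, 2, :tanh),
    (2, 3, :tanh),
    (1, 2, :σ)
]

# Total number of weights and biases implied by that shape.
num_params = sum(o * i + o for (o, i, _) in network_shape)   # 20
```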
@@ -366,23 +302,11 @@ end
end
end
-# Set the backend.
-Turing.setadbackend(:reverse_diff)
-
# Perform inference.
num_samples = 500
ch2 = sample(bayes_nn_general(hcat(xs...), ts, network_shape, num_params), NUTS(0.65), num_samples);
```
- ┌ Warning: `Turing.setadbackend(:reverse_diff)` is deprecated. Please use `Turing.setadbackend(:tracker)` to use `Tracker` or `Turing.setadbackend(:reversediff)` to use `ReverseDiff`. To use `ReverseDiff`, please make sure it is loaded separately with `using ReverseDiff`.
- │ caller = setadbackend(::Symbol) at ad.jl:5
- └ @ Turing.Core /home/cameron/.julia/packages/Turing/cReBm/src/core/ad.jl:5
- ┌ Info: Found initial step size
- │ ϵ = 0.2
- └ @ Turing.Inference /home/cameron/.julia/packages/Turing/cReBm/src/inference/hmc.jl:556
-
-
-
```julia
# This function makes predictions based on network shape.
function nn_predict(x, theta, num, network_shape)
@@ -390,7 +314,7 @@ function nn_predict(x, theta, num, network_shape)
end;
# Extract the θ parameters from the sampled chain.
-params2 = ch2[:θ].value.data
+params2 = MCMCChains.group(ch2, :θ).value
plot_data()
@@ -400,11 +324,4 @@ Z = [nn_predict([x, y], params2, length(ch2), network_shape)[1] for x=x_range, y
contour!(x_range, y_range, Z)
```
-
-
-
-![svg](/tutorials/3_BayesNN_files/3_BayesNN_31_0.svg)
-
-
-
This has been an introduction to the applications of Turing and Flux in defining Bayesian neural networks.
diff --git a/tutorials/bayesian-deep-learning/Project.toml b/tutorials/bayesian-deep-learning/Project.toml
new file mode 100644
index 000000000..df324f01c
--- /dev/null
+++ b/tutorials/bayesian-deep-learning/Project.toml
@@ -0,0 +1,8 @@
+[deps]
+AdvancedVI = "b5ca4192-6429-45e5-a2d9-87aec30a685c"
+Bijectors = "76274a88-744f-5084-9051-94815aaf08c4"
+Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
+Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
+Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
+ReverseDiff = "37e2e3b7-166d-5795-8a7a-e32c996b4267"
+Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
diff --git a/tutorials/differential-equations/01_bayesian-diff-eq.jmd b/tutorials/differential-equations/01_bayesian-diff-eq.jmd
index 7f01456ce..4c70407dd 100644
--- a/tutorials/differential-equations/01_bayesian-diff-eq.jmd
+++ b/tutorials/differential-equations/01_bayesian-diff-eq.jmd
@@ -25,8 +25,6 @@ $$\frac{dx}{dt} = (\alpha - \beta y)x$$
$$\frac{dy}{dt} = (\delta x - \gamma)y$$
-
-
```julia
function lotka_volterra(du,u,p,t)
x, y = u
@@ -41,30 +39,13 @@ sol = solve(prob,Tsit5())
plot(sol)
```
-
-
-
-![svg](/tutorials/10_BayesianDiffEq_files/10_BayesianDiffEq_3_0.svg)
-
-
-
We'll generate the data to use for the parameter estimation from simulation.
With the `saveat` [argument](https://docs.sciml.ai/latest/basics/common_solver_opts/) we specify that the solution is stored only at intervals of `0.1` time units.
-
```julia
odedata = Array(solve(prob,Tsit5(),saveat=0.1))
```
-
-
-
- 2×101 Array{Float64,2}:
- 1.0 1.03981 1.05332 1.03247 0.972908 … 0.133965 0.148601 0.165247
- 1.0 1.22939 1.52387 1.88714 2.30908 0.476902 0.450153 0.426924
-
-
-
## Fitting Lotka-Volterra with DiffEqBayes
[DiffEqBayes.jl](https://github.com/SciML/DiffEqBayes.jl) is a high-level package that provides a set of extension functionality for estimating the parameters of differential equations using Bayesian methods. It allows the choice of using CmdStan.jl, Turing.jl, DynamicHMC.jl and ApproxBayes.jl to perform a Bayesian estimation of a differential equation problem specified via the DifferentialEquations.jl interface. You can read the [docs](https://docs.sciml.ai/latest/analysis/parameter_estimation/#Bayesian-Methods-1) for an understanding of the available functionality.
@@ -77,145 +58,18 @@ priors = [truncated(Normal(1.5,0.5),0.5,2.5),truncated(Normal(1.2,0.5),0,2),trun
bayesian_result_turing = turing_inference(prob,Tsit5(),t,odedata,priors,num_samples=10_000)
```
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Info: Found initial step size
- │ ϵ = 0.00625
- └ @ Turing.Inference /home/cameron/.julia/packages/Turing/GMBTf/src/inference/hmc.jl:629
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
-
-
-
-
-
- Object of type Chains, with data of type 9000×17×1 Array{Float64,3}
-
- Iterations = 1:9000
- Thinning interval = 1
- Chains = 1
- Samples per chain = 9000
- internals = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, max_hamiltonian_energy_error, n_steps, nom_step_size, numerical_error, step_size, tree_depth
- parameters = theta[1], theta[2], theta[3], theta[4], σ[1]
-
- 2-element Array{ChainDataFrame,1}
-
- Summary Statistics
- parameters mean std naive_se mcse ess r_hat
- ────────── ────── ────── ──────── ────── ───────── ──────
- theta[1] 2.3263 0.1073 0.0011 0.0021 2202.3643 1.0000
- theta[2] 1.5434 0.0957 0.0010 0.0019 2575.4033 1.0002
- theta[3] 3.1259 0.1983 0.0021 0.0031 4127.1344 1.0000
- theta[4] 1.8356 0.0827 0.0009 0.0017 2189.2825 1.0000
- σ[1] 0.8569 0.0436 0.0005 0.0005 6856.5421 0.9999
-
- Quantiles
- parameters 2.5% 25.0% 50.0% 75.0% 97.5%
- ────────── ────── ────── ────── ────── ──────
- theta[1] 2.1185 2.2428 2.3337 2.4169 2.4916
- theta[2] 1.3655 1.4750 1.5422 1.6075 1.7367
- theta[3] 2.7571 2.9893 3.1166 3.2546 3.5440
- theta[4] 1.6902 1.7708 1.8307 1.9006 1.9868
- σ[1] 0.7755 0.8266 0.8551 0.8847 0.9484
-
-
-
-
The estimated parameters are clearly very close to the desired parameter values. We can also check that the chains have converged in the plot.
-
```julia
plot(bayesian_result_turing)
```
-
-
-
-![svg](/tutorials/10_BayesianDiffEq_files/10_BayesianDiffEq_9_0.svg)
-
-
-
## Direct Handling of Bayesian Estimation with Turing
You may want to do some sort of reduction with the differential equation's solution or use it in some other way as well. In those cases DiffEqBayes might not be useful. Turing and DifferentialEquations are completely composable and you can write the differential equation inside a Turing `@model` and it will just work.
We can rewrite the Lotka Volterra parameter estimation problem with a Turing `@model` interface as below
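A minimal sketch of such a model is given below; the priors, initial condition, and Gaussian observation model are assumptions for illustration, and the sketch reuses the `lotka_volterra` function defined earlier.

```julia
# Sketch: embed the ODE solve directly inside a Turing model.
@model function fitlv_sketch(data)
    σ ~ InverseGamma(2, 3)                       # observation noise
    α ~ truncated(Normal(1.5, 0.5), 0.5, 2.5)
    β ~ truncated(Normal(1.2, 0.5), 0, 2)
    γ ~ truncated(Normal(3.0, 0.5), 1, 4)
    δ ~ truncated(Normal(1.0, 0.5), 0, 2)

    # Solve the ODE with the sampled parameters.
    p = [α, β, γ, δ]
    prob = ODEProblem(lotka_volterra, [1.0, 1.0], (0.0, 10.0), p)
    predicted = solve(prob, Tsit5(), saveat=0.1)

    # Compare the solution to the observed data.
    for i in 1:length(predicted)
        data[:, i] ~ MvNormal(predicted[i], σ)
    end
end
```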
-
```julia
Turing.setadbackend(:forwarddiff)
@@ -239,47 +93,6 @@ model = fitlv(odedata)
chain = sample(model, NUTS(.65),10000)
```
- ┌ Info: Found initial step size
- │ ϵ = 0.2
- └ @ Turing.Inference /home/cameron/.julia/packages/Turing/GMBTf/src/inference/hmc.jl:629
- [32mSampling: 100%|█████████████████████████████████████████| Time: 0:02:48[39m
-
-
-
-
-
- Object of type Chains, with data of type 9000×17×1 Array{Float64,3}
-
- Iterations = 1:9000
- Thinning interval = 1
- Chains = 1
- Samples per chain = 9000
- internals = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, max_hamiltonian_energy_error, n_steps, nom_step_size, numerical_error, step_size, tree_depth
- parameters = α, β, γ, δ, σ
-
- 2-element Array{ChainDataFrame,1}
-
- Summary Statistics
- parameters mean std naive_se mcse ess r_hat
- ────────── ────── ────── ──────── ────── ───────── ──────
- α 1.4999 0.0060 0.0001 0.0001 2341.1779 0.9999
- β 0.9999 0.0037 0.0000 0.0001 2440.6968 0.9999
- γ 3.0001 0.0047 0.0000 0.0001 4070.6419 1.0003
- δ 1.0001 0.0032 0.0000 0.0001 2324.4733 0.9999
- σ 0.0151 0.0011 0.0000 0.0000 4591.2728 0.9999
-
- Quantiles
- parameters 2.5% 25.0% 50.0% 75.0% 97.5%
- ────────── ────── ────── ────── ────── ──────
- α 1.4881 1.4960 1.4998 1.5038 1.5118
- β 0.9925 0.9975 0.9999 1.0024 1.0074
- γ 2.9911 2.9970 3.0000 3.0032 3.0095
- δ 0.9937 0.9979 1.0001 1.0022 1.0066
- σ 0.0131 0.0143 0.0150 0.0158 0.0173
-
-
-
-
## Scaling to Large Models: Adjoint Sensitivities
DifferentialEquations.jl's efficiency for large stiff models has been shown in multiple [benchmarks](https://github.com/SciML/DiffEqBenchmarks.jl). To learn more about how to optimize solving performance for stiff problems you can take a look at the [docs](https://docs.sciml.ai/latest/tutorials/advanced_ode_example/).
@@ -297,7 +110,9 @@ All we had to do is switch the AD backend to one of the adjoint-compatible backe
```julia
+using Zygote
Turing.setadbackend(:zygote)
+
@model function fitlv(data)
σ ~ InverseGamma(2, 3)
α ~ truncated(Normal(1.5,0.5),0.5,2.5)
@@ -315,50 +130,8 @@ model = fitlv(odedata)
chain = sample(model, NUTS(.65),1000)
```
- ┌ Info: Found initial step size
- │ ϵ = 0.2
- └ @ Turing.Inference /home/cameron/.julia/packages/Turing/GMBTf/src/inference/hmc.jl:629
- [32mSampling: 100%|█████████████████████████████████████████| Time: 0:10:42[39m
-
-
-
-
-
- Object of type Chains, with data of type 500×17×1 Array{Float64,3}
-
- Iterations = 1:500
- Thinning interval = 1
- Chains = 1
- Samples per chain = 500
- internals = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, max_hamiltonian_energy_error, n_steps, nom_step_size, numerical_error, step_size, tree_depth
- parameters = α, β, γ, δ, σ
-
- 2-element Array{ChainDataFrame,1}
-
- Summary Statistics
- parameters mean std naive_se mcse ess r_hat
- ────────── ────── ────── ──────── ────── ──────── ──────
- α 1.4997 0.0052 0.0002 0.0003 201.5277 1.0046
- β 0.9999 0.0033 0.0001 0.0001 219.1974 1.0027
- γ 3.0003 0.0047 0.0002 0.0003 290.3332 1.0014
- δ 1.0002 0.0029 0.0001 0.0002 210.0807 1.0046
- σ 0.0151 0.0010 0.0000 0.0001 246.6502 1.0017
-
- Quantiles
- parameters 2.5% 25.0% 50.0% 75.0% 97.5%
- ────────── ────── ────── ────── ────── ──────
- α 1.4892 1.4962 1.5002 1.5030 1.5108
- β 0.9934 0.9978 1.0000 1.0019 1.0066
- γ 2.9910 2.9971 3.0002 3.0039 3.0084
- δ 0.9943 0.9983 1.0000 1.0021 1.0060
- σ 0.0131 0.0143 0.0151 0.0158 0.0172
-
-
-
-
Now we can control which sensitivity analysis method is used via the `sensealg` keyword argument. Let's choose the `InterpolatingAdjoint` from the available AD [methods](https://docs.sciml.ai/latest/analysis/sensitivity/#Sensitivity-Algorithms-1) and enable a compiled ReverseDiff vector-Jacobian product:
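Concretely, the only change inside the model is that the `solve` call receives the `sensealg` keyword, roughly as follows; this is a sketch of that single line, with the rest of the model left as above.

```julia
# Sketch of the solve call with an explicit adjoint sensitivity choice.
using DiffEqSensitivity: InterpolatingAdjoint, ReverseDiffVJP

predicted = solve(prob, saveat=0.1,
                  sensealg=InterpolatingAdjoint(autojacvec=ReverseDiffVJP(true)))
```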
-
```julia
@model function fitlv(data)
σ ~ InverseGamma(2, 3)
@@ -377,98 +150,6 @@ model = fitlv(odedata)
@time chain = sample(model, NUTS(.65),1000)
```
- ┌ Info: Found initial step size
- │ ϵ = 0.2
- └ @ Turing.Inference /home/cameron/.julia/packages/Turing/GMBTf/src/inference/hmc.jl:629
- [32mSampling: 11%|████▍ | ETA: 0:06:27[39m┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- [32mSampling: 13%|█████▍ | ETA: 0:05:58[39m┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- [32mSampling: 15%|██████▎ | ETA: 0:05:27[39m┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- [32mSampling: 21%|████████▌ | ETA: 0:04:20[39m┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- [32mSampling: 23%|█████████▎ | ETA: 0:04:03[39m┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- [32mSampling: 24%|██████████ | ETA: 0:03:48[39m┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- [32mSampling: 28%|███████████▌ | ETA: 0:03:27[39m┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- [32mSampling: 29%|███████████▊ | ETA: 0:03:24[39m┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- [32mSampling: 29%|████████████ | ETA: 0:03:20[39m┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- [32mSampling: 36%|███████████████ | ETA: 0:02:45[39m┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- [32mSampling: 37%|███████████████▏ | ETA: 0:02:44[39m┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- [32mSampling: 39%|████████████████ | ETA: 0:02:36[39m┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- [32mSampling: 46%|██████████████████▉ | ETA: 0:02:08[39m┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- [32mSampling: 48%|███████████████████▊ | ETA: 0:02:03[39m┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- [32mSampling: 49%|████████████████████▏ | ETA: 0:02:01[39m┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- [32mSampling: 50%|████████████████████▎ | ETA: 0:02:00[39m┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- [32mSampling: 100%|█████████████████████████████████████████| Time: 0:03:32[39m
-
-
- 225.663919 seconds (1.41 G allocations: 66.216 GiB, 5.25% gc time)
-
-
-
-
-
- Object of type Chains, with data of type 500×17×1 Array{Float64,3}
-
- Iterations = 1:500
- Thinning interval = 1
- Chains = 1
- Samples per chain = 500
- internals = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, max_hamiltonian_energy_error, n_steps, nom_step_size, numerical_error, step_size, tree_depth
- parameters = α, β, γ, δ, σ
-
- 2-element Array{ChainDataFrame,1}
-
- Summary Statistics
- parameters mean std naive_se mcse ess r_hat
- ────────── ────── ────── ──────── ────── ──────── ──────
- α 0.9122 0.2810 0.0126 0.0152 211.4497 0.9992
- β 1.8499 0.1141 0.0051 0.0055 302.7650 1.0018
- γ 2.5879 0.3299 0.0148 0.0228 307.5110 0.9997
- δ 0.1259 0.0221 0.0010 0.0007 219.5371 1.0006
- σ 0.8746 0.0437 0.0020 0.0017 342.6660 1.0008
-
- Quantiles
- parameters 2.5% 25.0% 50.0% 75.0% 97.5%
- ────────── ────── ────── ────── ────── ──────
- α 0.5060 0.6920 0.8932 1.0874 1.5467
- β 1.5810 1.7796 1.8709 1.9437 1.9873
- γ 1.9519 2.3707 2.5999 2.8158 3.1966
- δ 0.0843 0.1103 0.1245 0.1410 0.1704
- σ 0.7984 0.8444 0.8722 0.9044 0.9651
-
-
-
-
For more examples of adjoint usage on large parameter models, consult the [DiffEqFlux documentation](https://diffeqflux.sciml.ai/dev/)
## Including Process Noise: Estimation of Stochastic Differential Equations
@@ -483,7 +164,6 @@ $$dx = (\alpha - \beta y)xdt + \phi_1 xdW_1$$
$$dy = (\delta x - \gamma)ydt + \phi_2 ydW_2$$
-
```julia
function lotka_volterra_noise(du,u,p,t)
du[1] = p[5]*u[1]
@@ -493,18 +173,8 @@ p = [1.5, 1.0, 3.0, 1.0, 0.3, 0.3]
prob = SDEProblem(lotka_volterra,lotka_volterra_noise,u0,(0.0,10.0),p)
```
-
-
-
- [36mSDEProblem[0m with uType [36mArray{Float64,1}[0m and tType [36mFloat64[0m. In-place: [36mtrue[0m
- timespan: (0.0, 10.0)
- u0: [1.0, 1.0]
-
-
-
Solving it repeatedly confirms the randomness of the solution:
-
```julia
sol = solve(prob,saveat=0.01)
p1 = plot(sol)
@@ -515,63 +185,29 @@ p3 = plot(sol)
plot(p1,p2,p3)
```
-
-
-
-![svg](/tutorials/10_BayesianDiffEq_files/10_BayesianDiffEq_23_0.svg)
-
-
-
With `MonteCarloSummary` it is easy to summarize the results from multiple runs through the `EnsembleProblem` interface. Here we run the problem for 500 `trajectories` and visualize the summary:
-
```julia
sol = solve(EnsembleProblem(prob),SRIW1(),saveat=0.1,trajectories=500)
summ = MonteCarloSummary(sol)
plot(summ)
```
-
-
-
-![svg](/tutorials/10_BayesianDiffEq_files/10_BayesianDiffEq_25_0.svg)
-
-
-
Get data from the means to fit:
-
```julia
using DiffEqBase.EnsembleAnalysis
averagedata = Array(timeseries_steps_mean(sol))
```
-
-
-
- 2×101 Array{Float64,2}:
- 1.0 1.04218 1.05885 1.03187 0.967422 … 0.190811 0.197071 0.203714
- 1.0 1.22803 1.5283 1.89036 2.30967 1.16424 1.11006 1.07984
-
-
-
Now fit the means with Turing.
We will utilize multithreading with the [`EnsembleProblem`](https://docs.sciml.ai/stable/tutorials/sde_example/#Ensemble-Simulations-1) interface to speed up the SDE parameter estimation.
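A sketch of how the threaded ensemble solve could look inside such a model is given below; the solver choice and trajectory count are assumptions for illustration.

```julia
# Sketch: solve an ensemble of SDE trajectories across threads and average them.
ensembleprob = EnsembleProblem(prob)
ensemble_sol = solve(ensembleprob, SRIW1(), EnsembleThreads();
                     saveat=0.1, trajectories=100)
predicted = Array(timeseries_steps_mean(ensemble_sol))
```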
-
```julia
Threads.nthreads()
```
-
-
-
- 16
-
-
-
-
```julia
Turing.setadbackend(:forwarddiff)
@@ -596,63 +232,4 @@ end;
model = fitlv(averagedata)
chain = sample(model, NUTS(.65),500)
-```
-
- ┌ Info: Found initial step size
- │ ϵ = 0.2
- └ @ Turing.Inference /home/cameron/.julia/packages/Turing/GMBTf/src/inference/hmc.jl:629
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- [32mSampling: 0%|▏ | ETA: 0:03:49[39m┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- [32mSampling: 100%|█████████████████████████████████████████| Time: 2:33:35[39m
-
-
-
-
-
- Object of type Chains, with data of type 250×19×1 Array{Float64,3}
-
- Iterations = 1:250
- Thinning interval = 1
- Chains = 1
- Samples per chain = 250
- internals = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, max_hamiltonian_energy_error, n_steps, nom_step_size, numerical_error, step_size, tree_depth
- parameters = α, β, γ, δ, σ, ϕ1, ϕ2
-
- 2-element Array{ChainDataFrame,1}
-
- Summary Statistics
- parameters mean std naive_se mcse ess r_hat
- ────────── ────── ────── ──────── ────── ────── ──────
- α 1.6255 0.0000 0.0000 0.0000 2.0325 2.5501
- β 1.1163 0.0000 0.0000 0.0000 2.0325 Inf
- γ 3.2056 0.0000 0.0000 0.0000 2.0325 0.9960
- δ 0.9268 0.0000 0.0000 0.0000 2.0325 2.9880
- σ 0.0669 0.0000 0.0000 0.0000 2.0325 1.1011
- ϕ1 0.2329 0.0000 0.0000 0.0000 2.0325 3.2549
- ϕ2 0.2531 0.0000 0.0000 0.0000 2.0325 0.9960
-
- Quantiles
- parameters 2.5% 25.0% 50.0% 75.0% 97.5%
- ────────── ────── ────── ────── ────── ──────
- α 1.6255 1.6255 1.6255 1.6255 1.6255
- β 1.1163 1.1163 1.1163 1.1163 1.1163
- γ 3.2056 3.2056 3.2056 3.2056 3.2056
- δ 0.9268 0.9268 0.9268 0.9268 0.9268
- σ 0.0669 0.0669 0.0669 0.0669 0.0669
- ϕ1 0.2329 0.2329 0.2329 0.2329 0.2329
- ϕ2 0.2531 0.2531 0.2531 0.2531 0.2531
-
-
-
-
-
-```julia
-
```
diff --git a/tutorials/differential-equations/Project.toml b/tutorials/differential-equations/Project.toml
new file mode 100644
index 000000000..4330259fe
--- /dev/null
+++ b/tutorials/differential-equations/Project.toml
@@ -0,0 +1,12 @@
+[deps]
+DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
+DiffEqBase = "2b5f629d-d688-5b77-993f-72d75c75574e"
+DiffEqBayes = "ebbdde9d-f333-5424-9be2-dbf1e9acfb5e"
+DiffEqSensitivity = "41bf760c-e81c-5289-8e54-58b1f1f8abe2"
+DifferentialEquations = "0c46a032-eb83-5123-abaf-570d42b7fbaa"
+Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
+MCMCChains = "c7f686f2-ff18-58e9-bc7b-31028e88f75d"
+Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
+Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
+StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
+Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
diff --git a/tutorials/graphical-models/01_hidden-markov-model.jmd b/tutorials/graphical-models/01_hidden-markov-model.jmd
index 2dc5f9057..bbe0bccd6 100644
--- a/tutorials/graphical-models/01_hidden-markov-model.jmd
+++ b/tutorials/graphical-models/01_hidden-markov-model.jmd
@@ -36,13 +36,6 @@ N = length(y); K = 3;
plot(y, xlim = (0,15), ylim = (-1,5), size = (500, 250))
```
-
-
-
-![svg](/tutorials/4_BayesHmm_files/4_BayesHmm_3_0.svg)
-
-
-
We can see that we have three states, one for each height of the plot (1, 2, 3). This height is also our emission parameter, so state one produces a value of one, state two produces a value of two, and so on.
Ultimately, we would like to understand three major parameters:
@@ -53,9 +46,9 @@ Ultimately, we would like to understand three major parameters:
With this in mind, let's set up our model. We are going to use some of our knowledge as modelers to provide additional information about our system. This takes the form of the prior on our emission parameter.
-\$\$
-m_i \sim Normal(i, 0.5), \space m = \{1,2,3\}
-\$\$
+$$
+m_i \sim \mathrm{Normal}(i, 0.5) \quad \text{where} \quad i \in \{1,2,3\}
+$$
Simply put, this says that we expect state one to emit values in a Normally distributed manner, where the mean of each state's emissions is that state's value. The standard deviation of 0.5 helps the model converge more quickly; consider instead a standard deviation of 1 or 2. In that case, the likelihood of observing a 2 when we are in state 1 is actually quite high, as it is within a standard deviation of the true emission value. Applying the prior that we are likely to be tightly centered around the mean prevents our model from being too confused about the state that is generating our observations.
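To see where this prior sits in the model, here is a minimal sketch of a hidden Markov model in Turing; the uniform prior on the initial state and the emission noise of 0.1 are assumptions for illustration.

```julia
# Sketch of a K-state HMM with Dirichlet transition rows and the prior above.
@model function hmm_sketch(y, K)
    N = length(y)

    # Transition matrix rows and emission means.
    T = Vector{Vector}(undef, K)
    m = Vector(undef, K)
    for k in 1:K
        T[k] ~ Dirichlet(ones(K) / K)
        m[k] ~ Normal(k, 0.5)              # the emission prior discussed above
    end

    # Hidden state sequence and observations.
    s = tzeros(Int, N)
    s[1] ~ Categorical(K)                  # uniform prior on the first state (assumed)
    y[1] ~ Normal(m[s[1]], 0.1)            # emission noise of 0.1 (assumed)
    for i in 2:N
        s[i] ~ Categorical(vec(T[s[i-1]]))
        y[i] ~ Normal(m[s[i]], 0.1)
    end
end
```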
@@ -120,8 +113,8 @@ The code below generates an animation showing the graph of the data above, and t
using StatsPlots
# Extract our m and s parameters from the chain.
-m_set = c[:m].value.data
-s_set = c[:s].value.data
+m_set = MCMCChains.group(c, :m).value
+s_set = MCMCChains.group(c, :s).value
# Iterate through the MCMC samples.
Ns = 1:length(c)
@@ -130,7 +123,7 @@ Ns = 1:length(c)
animation = @animate for i in Ns
m = m_set[i, :];
s = Int.(s_set[i,:]);
- emissions = collect(skipmissing(m[s]))
+ emissions = m[s]
p = plot(y, c = :red,
size = (500, 250),
@@ -139,60 +132,32 @@ animation = @animate for i in Ns
legend = :topright, label = "True data",
xlim = (0,15),
ylim = (-1,5));
- plot!(emissions, color = :blue, label = "Sample $$N")
-end every 10;
+ plot!(emissions, color = :blue, label = "Sample $N")
+end every 10
```
-![animation](https://user-images.githubusercontent.com/422990/50612436-de588980-0e8e-11e9-8635-4e3e97c0d7f9.gif)
-
Looks like our model did a pretty good job, but we should also check to make sure our chain converges. A quick check is to examine whether the diagonal (representing the probability of remaining in the current state) of the transition matrix appears to be stationary. The code below extracts the diagonal and shows a traceplot of each persistence probability.
```julia
# Index the chain with the persistence probabilities.
-subchain = c[:,["T[$$i][$$i]" for i in 1:K],:]
+subchain = MCMCChains.group(c, :T)
+# TODO: This doesn't work anymore. Not sure what it was originally doing.
# Plot the chain.
-plot(subchain,
+plot(
+ subchain,
colordim = :parameter,
seriestype=:traceplot,
title = "Persistence Probability",
legend=:right
- )
+)
```
-
-
-
-![svg](/tutorials/4_BayesHmm_files/4_BayesHmm_11_0.svg)
-
-
-
A cursory examination of the traceplot above indicates that at least `T[3,3]` and possibly `T[2,2]` have converged to something resembling stationarity. `T[1,1]`, on the other hand, has a slight "wobble", and seems less consistent than the others. We can use the diagnostic functions provided by [MCMCChain](https://github.com/TuringLang/MCMCChain.jl) to engage in some formal tests, like the Heidelberg and Welch diagnostic:
-
```julia
-heideldiag(c[:T])
+heideldiag(MCMCChains.group(c, :T))
```
-
-
-
- 1-element Array{ChainDataFrame{NamedTuple{(:parameters, Symbol("Burn-in"), :Stationarity, Symbol("p-value"), :Mean, :Halfwidth, :Test),Tuple{Array{String,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1}}}},1}:
- Heidelberger and Welch Diagnostic - Chain 1
- parameters Burn-in Stationarity p-value Mean Halfwidth Test
- ────────── ─────── ──────────── ─────── ────── ───────── ──────
- T[1][1] 50.0000 0.0000 0.0001 0.5329 0.0063 1.0000
- T[1][2] 50.0000 0.0000 0.0189 0.1291 0.0043 1.0000
- T[1][3] 50.0000 0.0000 0.0230 0.3381 0.0032 1.0000
- T[2][1] 30.0000 1.0000 0.2757 0.0037 0.0000 1.0000
- T[2][2] 0.0000 1.0000 0.1689 0.0707 0.0022 1.0000
- T[2][3] 0.0000 1.0000 0.1365 0.9255 0.0022 1.0000
- T[3][1] 50.0000 0.0000 0.0454 0.4177 0.0147 1.0000
- T[3][2] 40.0000 1.0000 0.0909 0.2549 0.0080 1.0000
- T[3][3] 50.0000 0.0000 0.0098 0.3274 0.0067 1.0000
-
-
-
-
The p-values on the test suggest that we cannot reject the hypothesis that the observed sequence comes from a stationary distribution, so we can be somewhat more confident that our transition matrix has converged to something reasonable.
diff --git a/tutorials/graphical-models/Project.toml b/tutorials/graphical-models/Project.toml
new file mode 100644
index 000000000..2704e9aad
--- /dev/null
+++ b/tutorials/graphical-models/Project.toml
@@ -0,0 +1,5 @@
+[deps]
+Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
+Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
+StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
+Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
diff --git a/tutorials/introduction/01_introduction.jmd b/tutorials/introduction/01_introduction.jmd
index 0fec4556d..452ff0a42 100644
--- a/tutorials/introduction/01_introduction.jmd
+++ b/tutorials/introduction/01_introduction.jmd
@@ -53,18 +53,6 @@ data = rand(Bernoulli(p_true), last(Ns))
data[1:5]
```
-
-
-
- 5-element Array{Bool,1}:
- 1
- 0
- 1
- 1
- 0
-
-
-
After flipping all our coins, we want to set a prior belief about what we think the distribution of coin flips looks like. In this case, we are going to choose a common prior distribution called the [Beta](https://en.wikipedia.org/wiki/Beta_distribution) distribution.
@@ -81,11 +69,11 @@ For the mathematically inclined, the `Beta` distribution is updated by adding ea
This works because the mean of the `Beta` distribution is defined as the following:
-\$\$ \text{E}[\text{Beta}] = \dfrac{\alpha}{\alpha+\beta} \$\$
+$$\text{E}[\text{Beta}] = \dfrac{\alpha}{\alpha+\beta}$$
This is 0.5 when $$\alpha = \beta$$, as we expect for a large enough number of coin flips. As we increase the number of samples, our variance will also decrease, such that the distribution will reflect less uncertainty about the probability of receiving a heads. The definition of the variance for the `Beta` distribution is the following:
-\$\$ \text{var}[\text{Beta}] = \dfrac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)} \$\$
+$$\text{var}[\text{Beta}] = \dfrac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$
The intuition about this definition is that the variance of the distribution will approach 0 with more and more samples, as the denominator will grow faster than the numerator. More samples means less variance.
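As a small worked example of this update (the counts are arbitrary), observing `h` heads in `N` flips turns `Beta(α, β)` into `Beta(α + h, β + N - h)`:

```julia
# Conjugate update of a Beta prior by observed coin flips.
prior = Beta(1, 1)
h, N = 7, 10
posterior = Beta(prior.α + h, prior.β + (N - h))
mean(posterior)                 # (1 + 7) / ((1 + 7) + (1 + 3)) = 2/3
var(posterior) < var(prior)     # true: more data, less uncertainty
```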
@@ -107,23 +95,16 @@ animation = @gif for (i, N) in enumerate(Ns)
# Plotting
plot(updated_belief,
size = (500, 250),
- title = "Updated belief after $$N observations",
+ title = "Updated belief after $N observations",
xlabel = "probability of heads",
ylabel = "",
legend = nothing,
xlim = (0,1),
fill=0, α=0.3, w=3)
vline!([p_true])
-end;
+end
```
- ┌ Info: Saved animation to
- │ fn = /home/cameron/code/TuringTutorials/tmp.gif
- └ @ Plots /home/cameron/.julia/packages/Plots/Xnzc7/src/animation.jl:104
-
-
-![animation](https://user-images.githubusercontent.com/7974003/44995702-37c1b200-af9c-11e8-8b26-c88a528956af.gif)
-
The animation above shows that with increasing evidence our belief about the probability of heads in a coin flip slowly adjusts towards the true value. The orange line in the animation represents the true probability of seeing heads on a single coin flip, while the mode of the distribution shows what the model believes the probability of a heads is given the evidence it has seen.
### Coin Flipping With Turing
@@ -151,7 +132,7 @@ First, we define the coin-flip model using Turing.
@model coinflip(y) = begin
# Our prior belief about the probability of heads in a coin.
- p ~ Beta(1, 1)
+ p ~ Beta(1, 1)
# The number of observations.
N = length(y)
@@ -184,13 +165,6 @@ p_summary = chain[:p]
plot(p_summary, seriestype = :histogram)
```
-
-
-
-![svg](/tutorials/0_Introduction_files/0_Introduction_21_0.svg)
-
-
-
Now we can build our plot:
@@ -212,11 +186,4 @@ plot!(p, range(0, stop = 1, length = 100), pdf.(Ref(updated_belief), range(0, st
vline!(p, [p_true], label = "True probability", c = :red)
```
-
-
-
-![svg](/tutorials/0_Introduction_files/0_Introduction_23_0.svg)
-
-
-
As we can see, the Turing model closely approximates the true probability. Hopefully this tutorial has provided an easy-to-follow, yet informative introduction to Turing's simpler applications. More advanced usage will be demonstrated in later tutorials.
diff --git a/tutorials/introduction/02_gaussian-mixture-model.jmd b/tutorials/introduction/02_gaussian-mixture-model.jmd
index 4ed0c4110..e444183ba 100644
--- a/tutorials/introduction/02_gaussian-mixture-model.jmd
+++ b/tutorials/introduction/02_gaussian-mixture-model.jmd
@@ -27,31 +27,24 @@ x = mapreduce(c -> rand(MvNormal([μs[c], μs[c]], 1.), N), hcat, 1:2)
scatter(x[1,:], x[2,:], legend = false, title = "Synthetic Dataset")
```
-
-
-
-![svg](/tutorials/1_GaussianMixtureModel_files/1_GaussianMixtureModel_2_0.svg)
-
-
-
## Gaussian Mixture Model in Turing
To cluster the data points shown above, we use a model that consists of two mixture components (clusters) and assigns each datum to one of the components. The assignment thereof determines the distribution that the data point is generated from.
In particular, in a Bayesian Gaussian mixture model with $$1 \leq k \leq K$$ components for 1-D data each data point $$x_i$$ with $$1 \leq i \leq N$$ is generated according to the following generative process.
First we draw the parameters for each cluster, i.e. in our example we draw location of the distributions from a Normal:
-\$\$
-\mu_k \sim Normal() \, , \; \forall k \\
-\$\$
+$$
+\mu_k \sim \mathrm{Normal}() \, , \; \forall k
+$$
and then draw mixing weight for the $$K$$ clusters from a Dirichlet distribution, i.e.
-\$\$
- w \sim Dirichlet(K, \alpha) \, . \\
-\$\$
+$$
+ w \sim \mathrm{Dirichlet}(K, \alpha) \, .
+$$
After having constructed all the necessary model parameters, we can generate an observation by first selecting one of the clusters and then drawing the datum accordingly, i.e.
-\$\$
- z_i \sim Categorical(w) \, , \; \forall i \\
- x_i \sim Normal(\mu_{z_i}, 1.) \, , \; \forall i
-\$\$
+$$
+ z_i \sim \mathrm{Categorical}(w) \, , \; \forall i \\
+ x_i \sim \mathrm{Normal}(\mu_{z_i}, 1.) \, , \; \forall i
+$$
For more details on Gaussian mixture models, we refer to Christopher M. Bishop, *Pattern Recognition and Machine Learning*, Section 9.
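A minimal sketch of this generative process in Turing syntax is shown below; the variable names and the two-dimensional isotropic likelihood are assumptions for illustration and need not match the model defined later in this tutorial.

```julia
# Sketch of the two-component generative process described above.
@model function gmm_sketch(x)
    # Cluster locations.
    μ1 ~ Normal()
    μ2 ~ Normal()
    # Mixing weights for the two clusters.
    w ~ Dirichlet(2, 1.0)
    # Assign each datum to a cluster and draw it from that component.
    N = size(x, 2)
    k = Vector{Int}(undef, N)
    for i in 1:N
        k[i] ~ Categorical(w)
        x[:, i] ~ MvNormal(fill([μ1, μ2][k[i]], 2), 1.0)
    end
    return k
end
```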
@@ -60,21 +53,9 @@ For more details on Gaussian mixture models, we refer to Christopher M. Bishop,
using Turing, MCMCChains
# Turn off the progress monitor.
-Turing.turnprogress(false)
+Turing.turnprogress(false);
```
- ┌ Info: [Turing]: progress logging is disabled globally
- └ @ Turing /home/cameron/.julia/packages/Turing/cReBm/src/Turing.jl:22
-
-
-
-
-
- false
-
-
-
-
```julia
@model GaussianMixtureModel(x) = begin
@@ -107,13 +88,6 @@ Turing.turnprogress(false)
end
```
-
-
-
- ##GaussianMixtureModel#361 (generic function with 2 methods)
-
-
-
After having specified the model in Turing, we can construct the model function and run an MCMC simulation to obtain assignments of the data points.
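One possible sampling setup combines particle Gibbs for the discrete assignments with HMC for the continuous parameters; the sampler settings here are assumptions for illustration.

```julia
# Sketch: compose samplers per variable block and draw a short chain.
gmm_model = GaussianMixtureModel(x)
gmm_sampler = Gibbs(PG(100, :k), HMC(0.05, 10, :μ1, :μ2, :w))
tchain = sample(gmm_model, gmm_sampler, 100)
```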
@@ -139,17 +113,10 @@ In particular, in this example we consider the sample values of the location par
```julia
-ids = findall(map(name -> occursin("μ", name), names(tchain)));
+ids = findall(map(name -> occursin("μ", string(name)), names(tchain)));
p=plot(tchain[:, ids, :], legend=true, labels = ["Mu 1" "Mu 2"], colordim=:parameter)
```
-
-
-
-![svg](/tutorials/1_GaussianMixtureModel_files/1_GaussianMixtureModel_13_0.svg)
-
-
-
You'll note here that it appears the location means are switching between chains. We will address this in future tutorials. For those who are keenly interested, see [this](https://mc-stan.org/users/documentation/case-studies/identifying_mixture_models.html) article on potential solutions.
For the moment, we will just use the first chain to ensure the validity of our inference.
@@ -173,44 +140,23 @@ function predict(x, y, w, μ)
end
```
-
-
-
- predict (generic function with 1 method)
-
-
-
-
```julia
contour(range(-5, stop = 3), range(-6, stop = 2),
- (x, y) -> predict(x, y, [0.5, 0.5], [mean(tchain[:μ1].value), mean(tchain[:μ2].value)])
+ (x, y) -> predict(x, y, [0.5, 0.5], [mean(tchain[:μ1]), mean(tchain[:μ2])])
)
scatter!(x[1,:], x[2,:], legend = false, title = "Synthetic Dataset")
```
-
-
-
-![svg](/tutorials/1_GaussianMixtureModel_files/1_GaussianMixtureModel_18_0.svg)
-
-
-
## Inferred Assignments
Finally, we can inspect the assignments of the data points inferred using Turing. As we can see, the dataset is partitioned into two distinct groups.
```julia
-assignments = collect(skipmissing(mean(tchain[:k].value, dims=1).data))
+# TODO: is there a better way than this icky `.nt.mean` stuff?
+assignments = mean(MCMCChains.group(tchain, :k)).nt.mean
scatter(x[1,:], x[2,:],
legend = false,
title = "Assignments on Synthetic Dataset",
zcolor = assignments)
```
-
-
-
-
-![svg](/tutorials/1_GaussianMixtureModel_files/1_GaussianMixtureModel_21_0.svg)
-
-
diff --git a/tutorials/introduction/Project.toml b/tutorials/introduction/Project.toml
new file mode 100644
index 000000000..c3d3e22a9
--- /dev/null
+++ b/tutorials/introduction/Project.toml
@@ -0,0 +1,7 @@
+[deps]
+Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
+MCMCChains = "c7f686f2-ff18-58e9-bc7b-31028e88f75d"
+Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
+Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
+StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
+Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
diff --git a/tutorials/non-parameteric/01_infinite-mixture-model.jmd b/tutorials/non-parameteric/01_infinite-mixture-model.jmd
index 310cad70a..5ac4ef9d8 100644
--- a/tutorials/non-parameteric/01_infinite-mixture-model.jmd
+++ b/tutorials/non-parameteric/01_infinite-mixture-model.jmd
@@ -23,12 +23,12 @@ The generative process of such a model can be written as:
$$
\begin{align}
-\pi_1 &\sim Beta(a, b) \\
+\pi_1 &\sim \mathrm{Beta}(a, b) \\
\pi_2 &= 1-\pi_1 \\
-\mu_1 &\sim Normal(\mu_0, \Sigma_0) \\
-\mu_2 &\sim Normal(\mu_0, \Sigma_0) \\
-z_i &\sim Categorical(\pi_1, \pi_2) \\
-x_i &\sim Normal(\mu_{z_i}, \Sigma)
+\mu_1 &\sim \mathrm{Normal}(\mu_0, \Sigma_0) \\
+\mu_2 &\sim \mathrm{Normal}(\mu_0, \Sigma_0) \\
+z_i &\sim \mathrm{Categorical}(\pi_1, \pi_2) \\
+x_i &\sim \mathrm{Normal}(\mu_{z_i}, \Sigma)
\end{align}
$$
@@ -36,41 +36,32 @@ where $$\pi_1, \pi_2$$ are the mixing weights of the mixture model, i.e. $$\pi_1
We can implement this model in Turing for 1D data as follows:
-
```julia
-@model two_model(x) = begin
-
+@model function two_model(x)
# Hyper-parameters
μ0 = 0.0
σ0 = 1.0
# Draw weights.
- π1 ~ Beta(1,1)
+ π1 ~ Beta(1,1)
π2 = 1-π1
# Draw locations of the components.
- μ1 ~ Normal(μ0, σ0)
- μ2 ~ Normal(μ0, σ0)
+ μ1 ~ Normal(μ0, σ0)
+ μ2 ~ Normal(μ0, σ0)
# Draw latent assignment.
- z ~ Categorical([π1, π2])
+ z ~ Categorical([π1, π2])
# Draw observation from selected component.
if z == 1
- x ~ Normal(μ1, 1.0)
+ x ~ Normal(μ1, 1.0)
else
- x ~ Normal(μ2, 1.0)
+ x ~ Normal(μ2, 1.0)
end
end
```
-
-
-
- DynamicPPL.ModelGen{var"###generator#282",(:x,),(),Tuple{}}(##generator#282, NamedTuple())
-
-
-
#### Finite Mixture Model
If we have more than two components, this model can elegantly be extended using a Dirichlet distribution as prior for the mixing weights $$\pi_1, \dots, \pi_K$$. Note that the Dirichlet distribution is the multivariate generalization of the beta distribution. The resulting model can be written as:
@@ -78,9 +69,9 @@ If we have more than two components, this model can elegantly be extend using a
$$
\begin{align}
(\pi_1, \dots, \pi_K) &\sim Dirichlet(K, \alpha) \\
-\mu_k &\sim Normal(\mu_0, \Sigma_0), \;\; \forall k \\
+\mu_k &\sim \mathrm{Normal}(\mu_0, \Sigma_0), \;\; \forall k \\
z &\sim Categorical(\pi_1, \dots, \pi_K) \\
-x &\sim Normal(\mu_z, \Sigma)
+x &\sim \mathrm{Normal}(\mu_z, \Sigma)
\end{align}
$$
@@ -101,9 +92,9 @@ We now will utilize the fact that one can integrate out the mixing weights in a
In fact, if the mixing weights are integrated out, the conditional prior for the latent variable $$z$$ is given by:
-\$\$
-p(z_i = k \mid z_{\not i}, \alpha) = \frac{n_k + \alpha/K}{N - 1 + \alpha}
-\$\$
+$$
+p(z_i = k \mid z_{\not i}, \alpha) = \frac{n_k + \alpha/K}{N - 1 + \alpha}
+$$
where $$z_{\not i}$$ are the latent assignments of all observations except observation $$i$$. Note that we use $$n_k$$ to denote the number of observations at component $$k$$ excluding observation $$i$$. The parameter $$\alpha$$ is the concentration parameter of the Dirichlet distribution used as prior over the mixing weights.
@@ -113,15 +104,15 @@ To obtain the Chinese restaurant process construction, we can now derive the con
For $$n_k > 0$$ we obtain:
-\$\$
-p(z_i = k \mid z_{\not i}, \alpha) = \frac{n_k}{N - 1 + \alpha}
-\$\$
+$$
+p(z_i = k \mid z_{\not i}, \alpha) = \frac{n_k}{N - 1 + \alpha}
+$$
and for all infinitely many clusters that are empty (combined) we get:
-\$\$
+$$
p(z_i = k \mid z_{\not i}, \alpha) = \frac{\alpha}{N - 1 + \alpha}
-\$\$
+$$
Those equations show that the conditional prior for component assignments is proportional to the number of observations already assigned to that component, meaning that the Chinese restaurant process has a rich-get-richer property.
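An illustrative way to draw from this conditional prior is sketched below; the helper name and the counts passed to it are hypothetical.

```julia
# Draw a component for the next observation from the CRP conditional above:
# nk[k] counts the observations already assigned to component k.
using Distributions

function crp_draw(nk::Vector{Int}, α::Real)
    N = sum(nk) + 1                     # index of the observation being placed
    p = vcat(nk, α) ./ (N - 1 + α)      # existing components, then a new one
    return rand(Categorical(p))         # length(nk) + 1 means "open a new component"
end

crp_draw([3, 1], 1.0)                   # returns 1, 2, or 3
```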
@@ -159,21 +150,14 @@ using Plots
# Plot the cluster assignments over time
@gif for i in 1:Nmax
scatter(collect(1:i), z[1:i], markersize = 2, xlabel = "observation (i)", ylabel = "cluster (k)", legend = false)
-end;
+end
```
- ┌ Info: Saved animation to
- │ fn = /home/cameron/code/TuringTutorials/tmp.gif
- └ @ Plots /home/cameron/.julia/packages/Plots/Xnzc7/src/animation.jl:104
-
-
-![tmp](https://user-images.githubusercontent.com/422990/55284032-90cfa980-5323-11e9-8a99-f9315db170cb.gif)
-
Further, we can see that the expected number of clusters grows only logarithmically in the number of observations. This is a side-effect of the "rich-get-richer" phenomenon, i.e. we expect large clusters, and thus the number of clusters has to be smaller than the number of observations.
-\$\$
-E[K \mid N] \approx \alpha \cdot log \big(1 + \frac{N}{\alpha}\big)
-\$\$
+$$
+\mathbb{E}[K \mid N] \approx \alpha \cdot \log \big(1 + \frac{N}{\alpha}\big)
+$$
We can see from the equation that the concentration parameter $$\alpha$$ allows us to control the number of clusters formed *a priori*.
@@ -181,8 +165,7 @@ In Turing we can implement an infinite Gaussian mixture model using the Chinese
```julia
-@model infiniteGMM(x) = begin
-
+@model function infiniteGMM(x)
# Hyper-parameters, i.e. concentration parameter and parameters of H.
α = 1.0
μ0 = 0.0
@@ -207,14 +190,14 @@ In Turing we can implement an infinite Gaussian mixture model using the Chinese
nk = Vector{Int}(map(k -> sum(z .== k), 1:K))
# Draw the latent assignment.
- z[i] ~ ChineseRestaurantProcess(rpm, nk)
+ z[i] ~ ChineseRestaurantProcess(rpm, nk)
# Create a new cluster?
if z[i] > K
push!(μ, 0.0)
# Draw location of new cluster.
- μ[z[i]] ~ H
+ μ[z[i]] ~ H
end
# Draw observation.
@@ -223,13 +206,6 @@ In Turing we can implement an infinite Gaussian mixture model using the Chinese
end
```
-
-
-
- DynamicPPL.ModelGen{var"###generator#800",(:x,),(),Tuple{}}(##generator#800, NamedTuple())
-
-
-
We can now use Turing to infer the assignments of some data points. First, we will create some random data that comes from three clusters, with means of 0, -5, and 10.
@@ -254,60 +230,45 @@ model_fun = infiniteGMM(data);
chain = sample(model_fun, SMC(), iterations);
```
- [32mSampling: 100%|█████████████████████████████████████████| Time: 0:00:00[39m
-
-
Finally, we can plot the number of clusters in each sample.
```julia
# Extract the number of clusters for each sample of the Markov chain.
-k = map(t -> length(unique(chain[:z].value[t,:,:])), 1:iterations);
+k = map(
+ t -> length(unique(vec(chain[t, MCMCChains.namesingroup(chain, :z), :].value))),
+ 1:iterations
+);
# Visualize the number of clusters.
plot(k, xlabel = "Iteration", ylabel = "Number of clusters", label = "Chain 1")
```
-
-
-
-![svg](/tutorials/6_InfiniteMixtureModel_files/6_InfiniteMixtureModel_31_0.svg)
-
-
-
If we visualize the histogram of the number of clusters sampled from our posterior, we observe that the model seems to prefer 3 clusters, which is the true number of clusters. Note that the number of clusters in a Dirichlet process mixture model is not limited a priori and will grow to infinity with probability one. However, if conditioned on data, the posterior will concentrate on a finite number of clusters, so the resulting model effectively uses only finitely many clusters. It is, however, not guaranteed that the posterior of a Dirichlet process Gaussian mixture model converges to the true number of clusters, given that the data comes from a finite mixture model. See Jeffrey Miller and Matthew Harrison: [A simple example of Dirichlet process mixture inconsistency for the number of components](https://arxiv.org/pdf/1301.2708.pdf) for details.
-
```julia
histogram(k, xlabel = "Number of clusters", legend = false)
```
-
-
-
-![svg](/tutorials/6_InfiniteMixtureModel_files/6_InfiniteMixtureModel_33_0.svg)
-
-
-
One issue with the Chinese restaurant process construction is that the number of latent parameters we need to sample scales with the number of observations. It may be desirable to use alternative constructions in certain cases. Alternative methods of constructing a Dirichlet process can be employed via the following representations:
Size-Biased Sampling Process
-\$\$
-j_k \sim Beta(1, \alpha) * surplus
-\$\$
+$$
+j_k \sim \mathrm{Beta}(1, \alpha) \cdot \mathrm{surplus}
+$$
Stick-Breaking Process
-\$\$
-v_k \sim Beta(1, \alpha)
-\$\$
+$$
+v_k \sim \mathrm{Beta}(1, \alpha)
+$$
Chinese Restaurant Process
-\$\$
+$$
p(z_n = k | z_{1:n-1}) \propto \begin{cases}
\frac{m_k}{n-1+\alpha}, \text{ if } m_k > 0\\\
\frac{\alpha}{n-1+\alpha}
\end{cases}
-\$\$
+$$
For more details see [this article](https://www.stats.ox.ac.uk/~teh/research/npbayes/Teh2010a.pdf).
diff --git a/tutorials/non-parameteric/Project.toml b/tutorials/non-parameteric/Project.toml
new file mode 100644
index 000000000..dc3ff2545
--- /dev/null
+++ b/tutorials/non-parameteric/Project.toml
@@ -0,0 +1,4 @@
+[deps]
+Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
+Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
+Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
diff --git a/tutorials/regression/01_logistic-regression.jmd b/tutorials/regression/01_logistic-regression.jmd
index 037ec08b2..7e4544379 100644
--- a/tutorials/regression/01_logistic-regression.jmd
+++ b/tutorials/regression/01_logistic-regression.jmd
@@ -33,17 +33,6 @@ Random.seed!(0);
Turing.turnprogress(false)
```
- ┌ Info: [Turing]: progress logging is disabled globally
- └ @ Turing /home/cameron/.julia/packages/Turing/cReBm/src/Turing.jl:22
-
-
-
-
-
- false
-
-
-
## Data Cleaning & Set Up
Now we're going to import our dataset. The first six rows of the dataset are shown below so you can get a good feel for what kind of data we have.
@@ -57,13 +46,6 @@ data = RDatasets.dataset("ISLR", "Default");
first(data, 6)
```
-
-
-
-
- 6 rows × 4 columns
-
- | | Default | Student | Balance | Income |
- |---|---|---|---|---|
- | | Categorical… | Categorical… | Float64 | Float64 |
- | 1 | No | No | 729.526 | 44361.6 |
- | 2 | No | Yes | 817.18 | 12106.1 |
- | 3 | No | No | 1073.55 | 31767.1 |
- | 4 | No | No | 529.251 | 35704.5 |
- | 5 | No | No | 785.656 | 38463.5 |
- | 6 | No | Yes | 919.589 | 7491.56 |
-
-
-
Most machine learning processes require some effort to tidy up the data, and this is no different. We need to convert the `Default` and `Student` columns, which say "Yes" or "No" into 1s and 0s. Afterwards, we'll get rid of the old words-based columns.
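A sketch of that recoding step (column names follow the tutorial; the exact code in the file may differ slightly):
```julia
# Convert the "Yes"/"No" columns into numeric indicator columns.
data[!, :DefaultNum] = [r.Default == "Yes" ? 1.0 : 0.0 for r in eachrow(data)]
data[!, :StudentNum] = [r.Student == "Yes" ? 1.0 : 0.0 for r in eachrow(data)]
```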
@@ -78,14 +60,6 @@ select!(data, Not([:Default, :Student]))
# Show the first six rows of our edited dataset.
first(data, 6)
```
-
-
-
-
- 6 rows × 4 columns
-
- | | Balance | Income | DefaultNum | StudentNum |
- |---|---|---|---|---|
- | | Float64 | Float64 | Float64 | Float64 |
- | 1 | 729.526 | 44361.6 | 0.0 | 0.0 |
- | 2 | 817.18 | 12106.1 | 0.0 | 1.0 |
- | 3 | 1073.55 | 31767.1 | 0.0 | 0.0 |
- | 4 | 529.251 | 35704.5 | 0.0 | 0.0 |
- | 5 | 785.656 | 38463.5 | 0.0 | 0.0 |
- | 6 | 919.589 | 7491.56 | 0.0 | 1.0 |
-
-
-
After we've done that tidying, it's time to split our dataset into training and testing sets, and separate the labels from the data. We split our data into two sets, `train` and `test`. You can use a larger (or smaller) training fraction by modifying the `at = 0.05` argument. We have deliberately used only a 5% training sample to show the power of Bayesian inference with small sample sizes.
We must rescale our variables so that they are centered around zero, by subtracting the mean from each column and dividing by the standard deviation. Without this step, Turing's sampler will have a hard time finding a place to start searching for parameter estimates. To do this we will leverage `MLDataUtils`, which also lets us effortlessly shuffle our observations and perform a stratified split to get a representative test set.
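For intuition, the standardization step amounts to the following (a minimal sketch using only `Statistics`; the tutorial itself relies on the `MLDataUtils` helpers):
```julia
using Statistics

# Center each column at zero and scale it to unit standard deviation.
standardize_cols(X) = (X .- mean(X, dims = 1)) ./ std(X, dims = 1)
```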
@@ -164,30 +138,6 @@ chain = mapreduce(c -> sample(logistic_regression(train, train_label, n, 1), HMC
describe(chain)
```
-
-
-
- 2-element Array{ChainDataFrame,1}
-
- Summary Statistics
- parameters mean std naive_se mcse ess r_hat
- ────────── ─────── ────── ──────── ────── ───────── ──────
- balance 1.6517 0.3099 0.0046 0.0080 110.2122 1.0004
- income -0.5174 0.3241 0.0048 0.0081 1440.4337 1.0010
- intercept -3.8265 0.5148 0.0077 0.0148 54.8792 1.0004
- student -1.8662 0.6088 0.0091 0.0223 840.9122 1.0037
-
- Quantiles
- parameters 2.5% 25.0% 50.0% 75.0% 97.5%
- ────────── ─────── ─────── ─────── ─────── ───────
- balance 1.1418 1.4534 1.6331 1.8242 2.2196
- income -1.1678 -0.7300 -0.5094 -0.3006 0.1079
- intercept -4.6202 -4.0685 -3.7947 -3.5465 -3.0855
- student -3.0690 -2.2803 -1.8574 -1.4528 -0.7137
-
-
-
-
Since we ran multiple chains, we may as well do a spot check to make sure each chain converges around similar points.
@@ -195,13 +145,6 @@ Since we ran multiple chains, we may as well do a spot check to make sure each c
plot(chain)
```
-
-
-
-![svg](/tutorials/2_LogisticRegression_files/2_LogisticRegression_13_0.svg)
-
-
-
Looks good!
We can also use the `corner` function from MCMCChains to show the distributions of the various parameters of our logistic regression.
@@ -215,13 +158,6 @@ l = [:student, :balance, :income]
corner(chain, l)
```
-
-
-
-![svg](/tutorials/2_LogisticRegression_files/2_LogisticRegression_15_0.svg)
-
-
-
Fortunately the corner plot appears to demonstrate unimodal distributions for each of our parameters, so it should be straightforward to take the means of each parameter's sampled values to estimate our model to make predictions.
## Making Predictions
@@ -233,10 +169,10 @@ The `prediction` function below takes a `Matrix` and a `Chain` object. It takes
```julia
function prediction(x::Matrix, chain, threshold)
# Pull the means from each parameter's sampled values in the chain.
- intercept = mean(chain[:intercept].value)
- student = mean(chain[:student].value)
- balance = mean(chain[:balance].value)
- income = mean(chain[:income].value)
+ intercept = mean(chain[:intercept])
+ student = mean(chain[:student])
+ balance = mean(chain[:balance])
+ income = mean(chain[:income])
# Retrieve the number of rows.
n, _ = size(x)
@@ -271,13 +207,6 @@ predictions = prediction(test, chain, threshold)
loss = sum((predictions - test_label).^2) / length(test_label)
```
-
-
-
- 0.1163157894736842
-
-
-
Perhaps more important is to see what percentage of defaults we correctly predicted. The code below simply counts defaults and predictions and presents the results.
@@ -288,23 +217,15 @@ not_defaults = length(test_label) - defaults
predicted_defaults = sum(test_label .== predictions .== 1)
predicted_not_defaults = sum(test_label .== predictions .== 0)
-println("Defaults: $$defaults
- Predictions: $$predicted_defaults
- Percentage defaults correct $$(predicted_defaults/defaults)")
+println("Defaults: $defaults
+ Predictions: $predicted_defaults
+ Percentage defaults correct $(predicted_defaults/defaults)")
-println("Not defaults: $$not_defaults
- Predictions: $$predicted_not_defaults
- Percentage non-defaults correct $$(predicted_not_defaults/not_defaults)")
+println("Not defaults: $not_defaults
+ Predictions: $predicted_not_defaults
+ Percentage non-defaults correct $(predicted_not_defaults/not_defaults)")
```
- Defaults: 316.0
- Predictions: 265
- Percentage defaults correct 0.8386075949367089
- Not defaults: 9184.0
- Predictions: 8130
- Percentage non-defaults correct 0.8852351916376306
-
-
The above shows that with a threshold of 0.07, we correctly predict a respectable portion of the defaults, and correctly identify most non-defaults. This is fairly sensitive to a choice of threshold, and you may wish to experiment with it.
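One way to probe that sensitivity is to sweep a few thresholds and recompute the share of defaults caught (a sketch; `prediction`, `test`, and `test_label` are the objects defined above):
```julia
for threshold in 0.05:0.01:0.10
    preds = prediction(test, chain, threshold)
    defaults_caught = sum((test_label .== 1) .& (preds .== 1)) / sum(test_label .== 1)
    println("threshold = $threshold: share of defaults caught = $defaults_caught")
end
```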
This tutorial has demonstrated how to use Turing to perform Bayesian logistic regression.
diff --git a/tutorials/regression/02_linear-regression.jmd b/tutorials/regression/02_linear-regression.jmd
index 72226a5da..f3f41c07e 100644
--- a/tutorials/regression/02_linear-regression.jmd
+++ b/tutorials/regression/02_linear-regression.jmd
@@ -34,20 +34,6 @@ Random.seed!(0)
Turing.turnprogress(false);
```
- ┌ Info: Precompiling Turing [fce5fe82-541a-59a6-adf8-730c64b5f9a0]
- └ @ Base loading.jl:1260
- ┌ Info: Precompiling RDatasets [ce6b1742-4840-55fa-b093-852dadbb1d8b]
- └ @ Base loading.jl:1260
- ┌ Info: Precompiling Plots [91a5bcdd-55d7-5caf-9e0b-520d859cae80]
- └ @ Base loading.jl:1260
- ┌ Info: Precompiling StatsPlots [f3b207a7-027a-5e70-b257-86293d7955fd]
- └ @ Base loading.jl:1260
- ┌ Info: Precompiling MLDataUtils [cc2ba9b6-d476-5e6d-8eaf-a92d5412d41d]
- └ @ Base loading.jl:1260
- ┌ Info: [Turing]: progress logging is disabled globally
- └ @ Turing /home/cameron/.julia/packages/Turing/GMBTf/src/Turing.jl:22
-
-
We will use the `mtcars` dataset from the [RDatasets](https://github.com/johnmyleswhite/RDatasets.jl) package. `mtcars` contains a variety of statistics on different car models, including their miles per gallon, number of cylinders, and horsepower, among others.
We want to know if we can construct a Bayesian linear regression model to predict the miles per gallon of a car, given the other statistics it has. Let's take a look at the data we have.
@@ -61,25 +47,10 @@ data = RDatasets.dataset("datasets", "mtcars");
first(data, 6)
```
-
-
-
- 6 rows × 12 columns (omitted printing of 3 columns)
-
- | | Model | MPG | Cyl | Disp | HP | DRat | WT | QSec | VS |
- |---|---|---|---|---|---|---|---|---|---|
- | | String | Float64 | Int64 | Float64 | Int64 | Float64 | Float64 | Float64 | Int64 |
- | 1 | Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.62 | 16.46 | 0 |
- | 2 | Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.875 | 17.02 | 0 |
- | 3 | Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.32 | 18.61 | 1 |
- | 4 | Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 |
- | 5 | Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.44 | 17.02 | 0 |
- | 6 | Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.46 | 20.22 | 1 |
-
-
-
-
```julia
size(data)
```
-
-
-
- (32, 12)
-
-
-
The next step is to get our data ready for testing. We'll split the `mtcars` dataset into two subsets, one for training our model and one for evaluating our model. Then, we separate the targets we want to learn (`MPG`, in this case) and standardize the datasets by subtracting each column's mean and dividing by the standard deviation of that column. The resulting data is not very familiar looking, but this standardization process helps the sampler converge far more easily.
@@ -110,15 +81,15 @@ rescale!(test_target, μtarget, σtarget; obsdim = 1);
In a traditional frequentist model using [OLS](https://en.wikipedia.org/wiki/Ordinary_least_squares), our model might look like:
-\$\$
-MPG_i = \alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}
-\$\$
+$$
+\mathrm{MPG}_i = \alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}
+$$
where $$\boldsymbol{\beta}$$ is a vector of coefficients and $$\boldsymbol{X}$$ is a vector of inputs for observation $$i$$. The Bayesian model we are more concerned with is the following:
-\$\$
-MPG_i \sim \mathcal{N}(\alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}, \sigma^2)
-\$\$
+$$
+\mathrm{MPG}_i \sim \mathcal{N}(\alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}, \sigma^2)
+$$
where $$\alpha$$ is an intercept term common to all observations, $$\boldsymbol{\beta}$$ is a coefficient vector, $$\boldsymbol{X_i}$$ is the observed data for car $$i$$, and $$\sigma^2$$ is a common variance term.
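In Turing, a model of this form might be written roughly as follows (a sketch, not the tutorial's exact definition; the $$\sigma^2$$ prior matches the one mentioned below, while the remaining priors are illustrative):
```julia
using Turing

@model function linear_regression_sketch(x, y)
    # Variance prior, truncated to be positive.
    σ₂ ~ truncated(Normal(0, 100), 0, Inf)
    # Intercept and coefficient priors (illustrative choices).
    intercept ~ Normal(0, sqrt(3))
    nfeatures = size(x, 2)
    coefficients ~ MvNormal(zeros(nfeatures), 10 * ones(nfeatures))
    # Gaussian likelihood around the linear predictor.
    μ = intercept .+ x * coefficients
    y ~ MvNormal(μ, sqrt(σ₂))
end
```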
@@ -144,32 +115,13 @@ For $$\sigma^2$$, we assign a prior of `truncated(Normal(0, 100), 0, Inf)`. This
end
```
-
-
-
- DynamicPPL.ModelGen{var"###generator#273",(:x, :y),(),Tuple{}}(##generator#273, NamedTuple())
-
-
-
With our model specified, we can call the sampler. We will use the No U-Turn Sampler ([NUTS](http://turing.ml/docs/library/#-turingnuts--type)) here.
-
```julia
model = linear_regression(train, train_target)
chain = sample(model, NUTS(0.65), 3_000);
```
- ┌ Info: Found initial step size
- │ ϵ = 1.6
- └ @ Turing.Inference /home/cameron/.julia/packages/Turing/GMBTf/src/inference/hmc.jl:629
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/P9wqk/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/P9wqk/src/hamiltonian.jl:47
-
-
As a visual check to confirm that our coefficients have converged, we show the densities and trace plots for our parameters using the `plot` functionality.
@@ -177,60 +129,12 @@ As a visual check to confirm that our coefficients have converged, we show the d
plot(chain)
```
-
-
-
-![svg](/tutorials/5_LinearRegression_files/5_LinearRegression_12_0.svg)
-
-
-
It looks like each of our parameters has converged. We can check our numerical estimates using `describe(chain)`, as below.
-
```julia
describe(chain)
```
-
-
-
- 2-element Array{ChainDataFrame,1}
-
- Summary Statistics
- parameters mean std naive_se mcse ess r_hat
- ──────────────── ─────── ────── ──────── ────── ──────── ──────
- coefficients[1] -0.0413 0.5648 0.0126 0.0389 265.1907 1.0010
- coefficients[2] 0.2770 0.6994 0.0156 0.0401 375.2777 1.0067
- coefficients[3] -0.4116 0.3850 0.0086 0.0160 695.3990 1.0032
- coefficients[4] 0.1805 0.2948 0.0066 0.0126 479.9290 1.0010
- coefficients[5] -0.2669 0.7168 0.0160 0.0316 373.0291 1.0009
- coefficients[6] 0.0256 0.3461 0.0077 0.0119 571.0954 1.0028
- coefficients[7] 0.0277 0.3899 0.0087 0.0174 637.1596 1.0007
- coefficients[8] 0.1535 0.3050 0.0068 0.0117 579.1998 1.0032
- coefficients[9] 0.1223 0.2839 0.0063 0.0105 587.6752 0.9995
- coefficients[10] -0.2839 0.3975 0.0089 0.0195 360.9612 1.0019
- intercept 0.0058 0.1179 0.0026 0.0044 580.0222 0.9995
- σ₂ 0.3017 0.1955 0.0044 0.0132 227.2322 1.0005
-
- Quantiles
- parameters 2.5% 25.0% 50.0% 75.0% 97.5%
- ──────────────── ─────── ─────── ─────── ─────── ──────
- coefficients[1] -1.0991 -0.4265 -0.0199 0.3244 1.1093
- coefficients[2] -1.1369 -0.1523 0.2854 0.7154 1.6488
- coefficients[3] -1.1957 -0.6272 -0.3986 -0.1800 0.3587
- coefficients[4] -0.3896 -0.0155 0.1663 0.3593 0.7818
- coefficients[5] -1.6858 -0.6835 -0.2683 0.1378 1.1995
- coefficients[6] -0.6865 -0.1672 0.0325 0.2214 0.7251
- coefficients[7] -0.7644 -0.1976 0.0090 0.2835 0.8185
- coefficients[8] -0.4980 -0.0194 0.1451 0.3428 0.7685
- coefficients[9] -0.4643 -0.0294 0.1237 0.2807 0.7218
- coefficients[10] -1.0898 -0.5091 -0.2846 -0.0413 0.5163
- intercept -0.2240 -0.0671 0.0083 0.0746 0.2364
- σ₂ 0.1043 0.1860 0.2525 0.3530 0.8490
-
-
-
-
## Comparing to OLS
A satisfactory test of our model is to evaluate how well it predicts. Importantly, we want to compare our model to existing tools like OLS. The code below uses the [GLM.jl](https://github.com/JuliaStats/GLM.jl) package to generate a traditional OLS multiple regression model on the same data as our probabilistic model.
@@ -256,10 +160,6 @@ p = GLM.predict(ols, test_with_intercept)
test_prediction_ols = μtarget .+ σtarget .* p;
```
- ┌ Info: Precompiling GLM [38e38edf-8417-5370-95a0-9cbb8c7f171a]
- └ @ Base loading.jl:1260
-
-
The function below accepts a chain and an input matrix and calculates predictions. We use the samples of the model parameters in the chain starting with sample 200, which is where the warm-up period for the NUTS sampler ended.
@@ -272,16 +172,8 @@ function prediction(chain, x)
end
```
-
-
-
- prediction (generic function with 1 method)
-
-
-
When we make predictions, we unstandardize them so they are more understandable.
-
```julia
# Calculate the predictions for the training and testing sets
# and unstandardize them.
@@ -298,20 +190,12 @@ DataFrame(
)
```
-
-
-
- 10 rows × 3 columns
-
- | | MPG | Bayes | OLS |
- |---|---|---|---|
- | | Float64 | Float64 | Float64 |
- | 1 | 19.2 | 18.3766 | 18.1265 |
- | 2 | 15.0 | 6.4176 | 6.37891 |
- | 3 | 16.4 | 13.9125 | 13.883 |
- | 4 | 14.3 | 11.8393 | 11.7337 |
- | 5 | 21.4 | 25.3622 | 25.1916 |
- | 6 | 18.1 | 20.7687 | 20.672 |
- | 7 | 19.7 | 16.03 | 15.8408 |
- | 8 | 15.2 | 18.2903 | 18.3391 |
- | 9 | 26.0 | 28.5191 | 28.4865 |
- | 10 | 17.3 | 14.498 | 14.534 |
-
-
-
Now let's evaluate the loss for each method, and each prediction set. We will use the mean squared error to evaluate loss, given by
-\$\$
-\text{MSE} = \frac{1}{n} \sum_{i=1}^n {(y_i - \hat{y_i})^2}
-\$\$
+$$
+\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^n {(y_i - \hat{y_i})^2}
+$$
where $$y_i$$ is the actual value (true MPG) and $$\hat{y_i}$$ is the predicted value using either OLS or Bayesian linear regression. A lower MSE indicates a closer fit to the data.
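Written as a helper, this is a one-liner (purely illustrative; the tutorial prints the losses inline below):
```julia
using Statistics

# Mean squared error between observed targets y and predictions ŷ.
mse(y, ŷ) = mean((y .- ŷ) .^ 2)
```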
-
```julia
println(
"Training set:",
@@ -330,12 +214,4 @@ println(
)
```
- Training set:
- Bayes loss: 4.664508273535872
- OLS loss: 4.648142085690519
- Test set:
- Bayes loss: 14.66153554719035
- OLS loss: 14.796847779051628
-
-
As we can see above, OLS and our Bayesian model fit our training and test data set about the same.
diff --git a/tutorials/regression/03_poisson-regression.jmd b/tutorials/regression/03_poisson-regression.jmd
index 2ca7349bb..db0dae1a6 100644
--- a/tutorials/regression/03_poisson-regression.jmd
+++ b/tutorials/regression/03_poisson-regression.jmd
@@ -26,17 +26,6 @@ Random.seed!(12);
Turing.turnprogress(false)
```
- ┌ Info: [Turing]: progress logging is disabled globally
- └ @ Turing /home/cameron/.julia/packages/Turing/cReBm/src/Turing.jl:22
-
-
-
-
-
- false
-
-
-
# Generating data
We start off by creating a toy dataset. We take the case of a person who takes medicine to prevent excessive sneezing. Alcohol consumption increases the rate of sneezing for that person. Thus, the two factors affecting the number of sneezes in a given day are alcohol consumption and whether the person has taken his medicine. Both of these variables are boolean valued, while the number of sneezes is a count-valued variable. We also take into consideration that the interaction between the two boolean variables affects the number of sneezes.
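The generative story is simply Poisson counts whose rate differs per group, roughly as follows (a sketch with illustrative rates; the tutorial's own values appear in the elided code below):
```julia
using Distributions, Random

Random.seed!(12)
# Illustrative rates: baseline, alcohol only, no meds only, alcohol and no meds.
θ = [1.0, 3.0, 6.0, 36.0]
nsneeze_example = vcat((rand(Poisson(θi), 100) for θi in θ)...)
```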
@@ -66,13 +55,6 @@ df = DataFrame(nsneeze = nsneeze_data, alcohol_taken = alcohol_data, nomeds_take
df[sample(1:nrow(df), 5, replace = false), :]
```
-
-
-
- 5 rows × 4 columns
-
- | | nsneeze | alcohol_taken | nomeds_taken | product_alcohol_meds |
- |---|---|---|---|---|
- | | Int64 | Float64 | Float64 | Float64 |
- | 1 | 5 | 0.0 | 0.0 | 0.0 |
- | 2 | 5 | 1.0 | 0.0 | 0.0 |
- | 3 | 8 | 1.0 | 0.0 | 0.0 |
- | 4 | 1 | 0.0 | 0.0 | 0.0 |
- | 5 | 38 | 1.0 | 1.0 | 1.0 |
-
-
-
# Visualisation of the dataset
We plot the distribution of the number of sneezes for the 4 different cases taken above. As expected, the person sneezes the most when he has taken alcohol and not taken his medicine. He sneezes the least when he doesn't consume alcohol and takes his medicine.
@@ -87,13 +69,6 @@ p4 = Plots.histogram((df[(df[:,:alcohol_taken] .== 1) .& (df[:,:nomeds_taken] .=
plot(p1, p2, p3, p4, layout = (2, 2), legend = false)
```
-
-
-
-![svg](/tutorials/7_PoissonRegression_files/7_PoissonRegression_5_0.svg)
-
-
-
We must convert our `DataFrame` data into `Matrix` form, as the manipulations that we are about to perform are designed to work with `Matrix` data. We also separate the features from the labels, which will later be used by the Turing sampler to generate samples from the posterior.
@@ -104,39 +79,6 @@ data_labels = df[:,:nsneeze]
data
```
-
-
-
- 400×3 Array{Float64,2}:
- 0.0 0.0 0.0
- 0.0 0.0 0.0
- 0.0 0.0 0.0
- 0.0 0.0 0.0
- 0.0 0.0 0.0
- 0.0 0.0 0.0
- 0.0 0.0 0.0
- 0.0 0.0 0.0
- 0.0 0.0 0.0
- 0.0 0.0 0.0
- 0.0 0.0 0.0
- 0.0 0.0 0.0
- 0.0 0.0 0.0
- ⋮
- 1.0 1.0 1.0
- 1.0 1.0 1.0
- 1.0 1.0 1.0
- 1.0 1.0 1.0
- 1.0 1.0 1.0
- 1.0 1.0 1.0
- 1.0 1.0 1.0
- 1.0 1.0 1.0
- 1.0 1.0 1.0
- 1.0 1.0 1.0
- 1.0 1.0 1.0
- 1.0 1.0 1.0
-
-
-
We must recenter our data about 0 to help the Turing sampler initialise the parameter estimates. So, we normalise the data in each column by subtracting the mean and dividing by the standard deviation:
@@ -145,39 +87,6 @@ We must recenter our data about 0 to help the Turing sampler in initialising the
data = (data .- mean(data, dims=1)) ./ std(data, dims=1)
```
-
-
-
- 400×3 Array{Float64,2}:
- -0.998749 -0.998749 -0.576628
- -0.998749 -0.998749 -0.576628
- -0.998749 -0.998749 -0.576628
- -0.998749 -0.998749 -0.576628
- -0.998749 -0.998749 -0.576628
- -0.998749 -0.998749 -0.576628
- -0.998749 -0.998749 -0.576628
- -0.998749 -0.998749 -0.576628
- -0.998749 -0.998749 -0.576628
- -0.998749 -0.998749 -0.576628
- -0.998749 -0.998749 -0.576628
- -0.998749 -0.998749 -0.576628
- -0.998749 -0.998749 -0.576628
- ⋮
- 0.998749 0.998749 1.72988
- 0.998749 0.998749 1.72988
- 0.998749 0.998749 1.72988
- 0.998749 0.998749 1.72988
- 0.998749 0.998749 1.72988
- 0.998749 0.998749 1.72988
- 0.998749 0.998749 1.72988
- 0.998749 0.998749 1.72988
- 0.998749 0.998749 1.72988
- 0.998749 0.998749 1.72988
- 0.998749 0.998749 1.72988
- 0.998749 0.998749 1.72988
-
-
-
# Declaring the Model: Poisson Regression
Our model, `poisson_regression`, takes four arguments:
@@ -228,110 +137,6 @@ chain = mapreduce(
1:num_chains);
```
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Info: Found initial step size
- │ ϵ = 2.384185791015625e-8
- └ @ Turing.Inference /home/cameron/.julia/packages/Turing/cReBm/src/inference/hmc.jl:556
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Info: Found initial step size
- │ ϵ = 0.00078125
- └ @ Turing.Inference /home/cameron/.julia/packages/Turing/cReBm/src/inference/hmc.jl:556
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
- ┌ Info: Found initial step size
- │ ϵ = 0.000390625
- └ @ Turing.Inference /home/cameron/.julia/packages/Turing/cReBm/src/inference/hmc.jl:556
- ┌ Info: Found initial step size
- │ ϵ = 0.05
- └ @ Turing.Inference /home/cameron/.julia/packages/Turing/cReBm/src/inference/hmc.jl:556
- ┌ Warning: The current proposal will be rejected due to numerical error(s).
- │ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
- └ @ AdvancedHMC /home/cameron/.julia/packages/AdvancedHMC/WJCQA/src/hamiltonian.jl:47
-
-
# Viewing the Diagnostics
We use the Gelman, Rubin, and Brooks Diagnostic to check whether our chains have converged. Note that we require multiple chains to use this diagnostic which analyses the difference between these multiple chains.
@@ -342,20 +147,6 @@ We expect the chains to have converged. This is because we have taken sufficient
gelmandiag(chain)
```
-
-
-
- Gelman, Rubin, and Brooks Diagnostic
- parameters PSRF 97.5%
- ────────── ────── ──────
- b0 1.1861 1.3924
- b1 1.1307 1.2582
- b2 1.1350 1.2865
- b3 1.0660 1.1118
-
-
-
-
From the above diagnostic, we can conclude that the chains have converged because the PSRF values of the coefficients are close to 1.
So, we have obtained the posterior distributions of the parameters. We transform the coefficients and recover the theta values by exponentiating the posterior means of the coefficients `b0`, `b1`, `b2` and `b3`. We exponentiate the means to get a better comparison of the relative values of the coefficients. We then compare this with the intuitive meaning that was described earlier.
@@ -366,38 +157,22 @@ So, we have obtained the posterior distributions of the parameters. We transform
c1 = chain[:,:,1]
# Calculating the exponentiated means
-b0_exp = exp(mean(c1[:b0].value))
-b1_exp = exp(mean(c1[:b1].value))
-b2_exp = exp(mean(c1[:b2].value))
-b3_exp = exp(mean(c1[:b3].value))
+b0_exp = exp(mean(c1[:b0]))
+b1_exp = exp(mean(c1[:b1]))
+b2_exp = exp(mean(c1[:b2]))
+b3_exp = exp(mean(c1[:b3]))
print("The exponent of the meaned values of the weights (or coefficients are): \n")
print("b0: ", b0_exp, " \n", "b1: ", b1_exp, " \n", "b2: ", b2_exp, " \n", "b3: ", b3_exp, " \n")
print("The posterior distributions obtained after sampling can be visualised as :\n")
```
- The exponent of the meaned values of the weights (or coefficients are):
- b0: 5.116678482496325
- b1: 1.8791946940293356
- b2: 2.5245646467859904
- b3: 1.3005130214177183
- The posterior distributions obtained after sampling can be visualised as :
-
-
- Visualising the posterior by plotting it:
-
+Visualising the posterior by plotting it:
```julia
plot(chain)
```
-
-
-
-![svg](/tutorials/7_PoissonRegression_files/7_PoissonRegression_19_0.svg)
-
-
-
# Interpreting the Obtained Mean Values
The exponentiated mean of the coefficient `b1` is roughly half of that of `b2`. This makes sense because in the data that we generated, the number of sneezes was more sensitive to the medicinal intake as compared to the alcohol consumption. We also get a weaker dependence on the interaction between the alcohol consumption and the medicinal intake as can be seen from the value of `b3`.
@@ -413,73 +188,14 @@ To remove these warmup values, we take all values except the first 200. This is
describe(chain)
```
-
-
-
- 2-element Array{ChainDataFrame,1}
-
- Summary Statistics
- parameters mean std naive_se mcse ess r_hat
- ────────── ────── ────── ──────── ────── ─────── ──────
- b0 1.2639 2.1637 0.0216 0.2114 42.6654 1.0565
- b1 0.7091 0.8433 0.0084 0.0728 41.7860 1.0620
- b2 1.1998 1.7572 0.0176 0.1676 42.5718 1.0675
- b3 0.2357 0.7392 0.0074 0.0596 91.3888 1.0240
-
- Quantiles
- parameters 2.5% 25.0% 50.0% 75.0% 97.5%
- ────────── ─────── ────── ────── ────── ──────
- b0 -4.7815 1.6189 1.6409 1.6624 1.7026
- b1 0.4366 0.5151 0.5548 0.5986 3.7771
- b2 0.7707 0.8461 0.8848 0.9259 8.4861
- b3 -1.7651 0.2497 0.2882 0.3275 0.4136
-
-
-
-
-
```julia
# Removing the first 200 values of the chains.
chains_new = chain[201:2500,:,:]
describe(chains_new)
```
-
-
-
- 2-element Array{ChainDataFrame,1}
-
- Summary Statistics
- parameters mean std naive_se mcse ess r_hat
- ────────── ────── ────── ──────── ────── ─────── ──────
- b0 1.6378 0.0823 0.0009 0.0055 46.6518 1.0182
- b1 0.5639 0.1729 0.0018 0.0117 45.5782 1.0196
- b2 0.8932 0.1727 0.0018 0.0118 45.0961 1.0195
- b3 0.2798 0.1544 0.0016 0.0104 46.0058 1.0195
-
- Quantiles
- parameters 2.5% 25.0% 50.0% 75.0% 97.5%
- ────────── ────── ────── ────── ────── ──────
- b0 1.5791 1.6226 1.6427 1.6637 1.7024
- b1 0.4413 0.5142 0.5516 0.5919 0.6726
- b2 0.7764 0.8448 0.8819 0.9187 0.9973
- b3 0.1785 0.2544 0.2893 0.3266 0.3942
-
-
-
-
-Visualising the new posterior by plotting it:
-
-
```julia
plot(chains_new)
```
-
-
-
-![svg](/tutorials/7_PoissonRegression_files/7_PoissonRegression_25_0.svg)
-
-
-
As can be seen from the numeric values and the plots above, the standard deviation values have decreased and all the plotted values are from the estimated posteriors. The exponentiated mean values, with the warmup samples removed, have not changed by much and they are still in accordance with their intuitive meanings as described earlier.
diff --git a/tutorials/regression/04_multinomial-logistic-regression.jmd b/tutorials/regression/04_multinomial-logistic-regression.jmd
index 4440737ea..2bf5934ad 100644
--- a/tutorials/regression/04_multinomial-logistic-regression.jmd
+++ b/tutorials/regression/04_multinomial-logistic-regression.jmd
@@ -33,25 +33,10 @@ Random.seed!(0)
Turing.turnprogress(false);
```
- ┌ Info: Precompiling Turing [fce5fe82-541a-59a6-adf8-730c64b5f9a0]
- └ @ Base loading.jl:1260
- ┌ Info: Precompiling RDatasets [ce6b1742-4840-55fa-b093-852dadbb1d8b]
- └ @ Base loading.jl:1260
- ┌ Info: Precompiling StatsPlots [f3b207a7-027a-5e70-b257-86293d7955fd]
- └ @ Base loading.jl:1260
- ┌ Info: Precompiling MLDataUtils [cc2ba9b6-d476-5e6d-8eaf-a92d5412d41d]
- └ @ Base loading.jl:1260
- ┌ Info: [Turing]: progress logging is disabled globally
- └ @ Turing /home/cameron/.julia/packages/Turing/3goIa/src/Turing.jl:23
- ┌ Info: [AdvancedVI]: global PROGRESS is set as false
- └ @ AdvancedVI /home/cameron/.julia/packages/AdvancedVI/PaSeO/src/AdvancedVI.jl:15
-
-
## Data Cleaning & Set Up
Now we're going to import our dataset. Twenty rows of the dataset are shown below so you can get a good feel for what kind of data we have.
-
```julia
# Import the "iris" dataset.
data = RDatasets.dataset("datasets", "iris");
@@ -60,16 +45,8 @@ data = RDatasets.dataset("datasets", "iris");
data[rand(1:size(data, 1), 20), :]
```
-
-
-
- 20 rows × 5 columns
-
- | | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
- |---|---|---|---|---|---|
- | | Float64 | Float64 | Float64 | Float64 | Cat… |
- | 1 | 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
- | 2 | 5.8 | 2.7 | 3.9 | 1.2 | versicolor |
- | 3 | 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
- | 4 | 6.7 | 3.3 | 5.7 | 2.5 | virginica |
- | 5 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
- | 6 | 5.1 | 3.8 | 1.5 | 0.3 | setosa |
- | 7 | 4.8 | 3.4 | 1.9 | 0.2 | setosa |
- | 8 | 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
- | 9 | 6.9 | 3.1 | 5.4 | 2.1 | virginica |
- | 10 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
- | 11 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
- | 12 | 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
- | 13 | 5.0 | 3.3 | 1.4 | 0.2 | setosa |
- | 14 | 7.7 | 3.0 | 6.1 | 2.3 | virginica |
- | 15 | 5.8 | 2.8 | 5.1 | 2.4 | virginica |
- | 16 | 4.4 | 3.0 | 1.3 | 0.2 | setosa |
- | 17 | 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
- | 18 | 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
- | 19 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
- | 20 | 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
-
-
-
In this data set, the outcome `Species` is currently coded as a string. We convert it to a numerical value by using indices `1`, `2`, and `3` to indicate species `setosa`, `versicolor`, and `virginica`, respectively.
-
```julia
# Recode the `Species` column.
species = ["setosa", "versicolor", "virginica"]
@@ -79,13 +56,6 @@ data[!, :Species_index] = indexin(data[!, :Species], species)
data[rand(1:size(data, 1), 20), [:Species, :Species_index]]
```
-
-
-
- 20 rows × 2 columns
-
- | | Species | Species_index |
- |---|---|---|
- | | Cat… | Union… |
- | 1 | versicolor | 2 |
- | 2 | setosa | 1 |
- | 3 | setosa | 1 |
- | 4 | versicolor | 2 |
- | 5 | setosa | 1 |
- | 6 | virginica | 3 |
- | 7 | versicolor | 2 |
- | 8 | versicolor | 2 |
- | 9 | setosa | 1 |
- | 10 | virginica | 3 |
- | 11 | setosa | 1 |
- | 12 | setosa | 1 |
- | 13 | versicolor | 2 |
- | 14 | versicolor | 2 |
- | 15 | virginica | 3 |
- | 16 | versicolor | 2 |
- | 17 | setosa | 1 |
- | 18 | setosa | 1 |
- | 19 | virginica | 3 |
- | 20 | setosa | 1 |
-
-
-
After we've done that tidying, it's time to split our dataset into training and testing sets, and separate the features and target from the data. Additionally, we must rescale our feature variables so that they are centered around zero, by subtracting the mean from each column and dividing by the standard deviation. Without this step, Turing's sampler will have a hard time finding a place to start searching for parameter estimates.
@@ -151,99 +121,30 @@ Now we can run our sampler. This time we'll use [`HMC`](http://turing.ml/docs/li
chain = sample(logistic_regression(train_features, train_target, 1), HMC(0.05, 10), MCMCThreads(), 1500, 3)
```
-
-
-
- Chains MCMC chain (1500×19×3 Array{Float64,3}):
-
- Iterations = 1:1500
- Thinning interval = 1
- Chains = 1, 2, 3
- Samples per chain = 1500
- parameters = coefficients_versicolor[1], coefficients_versicolor[2], coefficients_versicolor[3], coefficients_versicolor[4], coefficients_virginica[1], coefficients_virginica[2], coefficients_virginica[3], coefficients_virginica[4], intercept_versicolor, intercept_virginica
- internals = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, n_steps, nom_step_size, step_size
-
- Summary Statistics
- [0m[1m parameters [0m [0m[1m mean [0m [0m[1m std [0m [0m[1m naive_se [0m [0m[1m mcse [0m [0m[1m ess [0m [0m[1m rhat [0m [0m
- [0m[90m Symbol [0m [0m[90m Float64 [0m [0m[90m Float64 [0m [0m[90m Float64 [0m [0m[90m Float64 [0m [0m[90m Float64 [0m [0m[90m Float64 [0m [0m
- [0m [0m [0m [0m [0m [0m [0m [0m [0m [0m [0m [0m [0m [0m [0m
- [0m coefficients_versicolor[1] [0m [0m 1.5404 [0m [0m 0.6753 [0m [0m 0.0101 [0m [0m 0.0335 [0m [0m 332.4769 [0m [0m 1.0017 [0m [0m
- [0m coefficients_versicolor[2] [0m [0m -1.4298 [0m [0m 0.5098 [0m [0m 0.0076 [0m [0m 0.0171 [0m [0m 786.5622 [0m [0m 1.0015 [0m [0m
- [0m coefficients_versicolor[3] [0m [0m 1.1382 [0m [0m 0.7772 [0m [0m 0.0116 [0m [0m 0.0398 [0m [0m 328.8508 [0m [0m 1.0091 [0m [0m
- [0m coefficients_versicolor[4] [0m [0m 0.0693 [0m [0m 0.7300 [0m [0m 0.0109 [0m [0m 0.0374 [0m [0m 368.3007 [0m [0m 1.0048 [0m [0m
- [0m coefficients_virginica[1] [0m [0m 0.4251 [0m [0m 0.6983 [0m [0m 0.0104 [0m [0m 0.0294 [0m [0m 381.6545 [0m [0m 1.0017 [0m [0m
- [0m coefficients_virginica[2] [0m [0m -0.6744 [0m [0m 0.6036 [0m [0m 0.0090 [0m [0m 0.0250 [0m [0m 654.1030 [0m [0m 1.0012 [0m [0m
- [0m coefficients_virginica[3] [0m [0m 2.0076 [0m [0m 0.8424 [0m [0m 0.0126 [0m [0m 0.0390 [0m [0m 344.6077 [0m [0m 1.0067 [0m [0m
- [0m coefficients_virginica[4] [0m [0m 2.6704 [0m [0m 0.7982 [0m [0m 0.0119 [0m [0m 0.0423 [0m [0m 337.9600 [0m [0m 1.0043 [0m [0m
- [0m intercept_versicolor [0m [0m 0.8408 [0m [0m 0.5257 [0m [0m 0.0078 [0m [0m 0.0167 [0m [0m 874.4821 [0m [0m 1.0044 [0m [0m
- [0m intercept_virginica [0m [0m -0.7351 [0m [0m 0.6639 [0m [0m 0.0099 [0m [0m 0.0285 [0m [0m 525.8135 [0m [0m 1.0039 [0m [0m
-
- Quantiles
- [0m[1m parameters [0m [0m[1m 2.5% [0m [0m[1m 25.0% [0m [0m[1m 50.0% [0m [0m[1m 75.0% [0m [0m[1m 97.5% [0m [0m
- [0m[90m Symbol [0m [0m[90m Float64 [0m [0m[90m Float64 [0m [0m[90m Float64 [0m [0m[90m Float64 [0m [0m[90m Float64 [0m [0m
- [0m [0m [0m [0m [0m [0m [0m [0m [0m [0m [0m [0m [0m
- [0m coefficients_versicolor[1] [0m [0m 0.2659 [0m [0m 1.0755 [0m [0m 1.5231 [0m [0m 1.9860 [0m [0m 2.9059 [0m [0m
- [0m coefficients_versicolor[2] [0m [0m -2.4714 [0m [0m -1.7610 [0m [0m -1.4109 [0m [0m -1.0749 [0m [0m -0.4921 [0m [0m
- [0m coefficients_versicolor[3] [0m [0m -0.4377 [0m [0m 0.6358 [0m [0m 1.1456 [0m [0m 1.6500 [0m [0m 2.6215 [0m [0m
- [0m coefficients_versicolor[4] [0m [0m -1.3741 [0m [0m -0.4381 [0m [0m 0.0652 [0m [0m 0.5711 [0m [0m 1.4808 [0m [0m
- [0m coefficients_virginica[1] [0m [0m -0.9452 [0m [0m -0.0487 [0m [0m 0.4287 [0m [0m 0.8991 [0m [0m 1.7973 [0m [0m
- [0m coefficients_virginica[2] [0m [0m -1.8717 [0m [0m -1.0756 [0m [0m -0.6641 [0m [0m -0.2501 [0m [0m 0.4867 [0m [0m
- [0m coefficients_virginica[3] [0m [0m 0.3740 [0m [0m 1.4180 [0m [0m 1.9941 [0m [0m 2.5862 [0m [0m 3.6788 [0m [0m
- [0m coefficients_virginica[4] [0m [0m 1.1985 [0m [0m 2.1347 [0m [0m 2.6359 [0m [0m 3.1795 [0m [0m 4.3502 [0m [0m
- [0m intercept_versicolor [0m [0m -0.1652 [0m [0m 0.4888 [0m [0m 0.8340 [0m [0m 1.1858 [0m [0m 1.8891 [0m [0m
- [0m intercept_virginica [0m [0m -2.0101 [0m [0m -1.1944 [0m [0m -0.7453 [0m [0m -0.2834 [0m [0m 0.5836 [0m [0m
-
-
-
-
Since we ran multiple chains, we may as well do a spot check to make sure each chain converges around similar points.
-
```julia
plot(chain)
```
-
-
-
-![svg](/tutorials/8_MultinomialLogisticRegression_files/8_MultinomialLogisticRegression_13_0.svg)
-
-
-
Looks good!
We can also use the `corner` function from MCMCChains to show the distributions of the various parameters of our multinomial logistic regression. The corner function requires MCMCChains and StatsPlots.
-
```julia
corner(
- chain, [Symbol("coefficients_versicolor[$$i]") for i in 1:4];
- label=[string(i) for i in 1:4], fmt=:png
+ chain, MCMCChains.namesingroup(chain, :coefficients_versicolor);
+ label=[string(i) for i in 1:4]
)
```
-
-
-
-![png](/tutorials/8_MultinomialLogisticRegression_files/8_MultinomialLogisticRegression_15_0.png)
-
-
-
-
```julia
corner(
- chain, [Symbol("coefficients_virginica[$$i]") for i in 1:4];
- label=[string(i) for i in 1:4], fmt=:png
+ chain, MCMCChains.namesingroup(chain, :coefficients_virginica);
+ label=[string(i) for i in 1:4]
)
```
-
-
-
-![png](/tutorials/8_MultinomialLogisticRegression_files/8_MultinomialLogisticRegression_16_0.png)
-
-
-
Fortunately the corner plots appear to demonstrate unimodal distributions for each of our parameters, so it should be straightforward to take the means of each parameter's sampled values to estimate our model to make predictions.
## Making Predictions
@@ -258,8 +159,14 @@ function prediction(x::Matrix, chain)
# Pull the means from each parameter's sampled values in the chain.
intercept_versicolor = mean(chain, :intercept_versicolor)
intercept_virginica = mean(chain, :intercept_virginica)
- coefficients_versicolor = [mean(chain, "coefficients_versicolor[$$i]") for i in 1:4]
- coefficients_virginica = [mean(chain, "coefficients_virginica[$$i]") for i in 1:4]
+ coefficients_versicolor = [
+ mean(chain, k) for k in
+ MCMCChains.namesingroup(chain, :coefficients_versicolor)
+ ]
+ coefficients_virginica = [
+ mean(chain, k) for k in
+ MCMCChains.namesingroup(chain, :coefficients_virginica)
+ ]
# Compute the index of the species with the highest probability for each observation.
values_versicolor = intercept_versicolor .+ x * coefficients_versicolor
@@ -281,16 +188,8 @@ predictions = prediction(test_features, chain)
mean(predictions .== testset[!, :Species_index])
```
-
-
-
- 0.8533333333333334
-
-
-
Perhaps more important is to see the accuracy per class.
-
```julia
for s in 1:3
rows = testset[!, :Species_index] .== s
@@ -300,12 +199,4 @@ for s in 1:3
end
```
- Number of `setosa`: 22
- Percentage of `setosa` predicted correctly: 1.0
- Number of `versicolor`: 24
- Percentage of `versicolor` predicted correctly: 0.875
- Number of `virginica`: 29
- Percentage of `virginica` predicted correctly: 0.7241379310344828
-
-
This tutorial has demonstrated how to use Turing to perform Bayesian multinomial logistic regression.
diff --git a/tutorials/regression/Project.toml b/tutorials/regression/Project.toml
new file mode 100644
index 000000000..51d942647
--- /dev/null
+++ b/tutorials/regression/Project.toml
@@ -0,0 +1,14 @@
+[deps]
+DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
+Distances = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"
+Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
+GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a"
+MCMCChains = "c7f686f2-ff18-58e9-bc7b-31028e88f75d"
+MLDataUtils = "cc2ba9b6-d476-5e6d-8eaf-a92d5412d41d"
+NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
+Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
+RDatasets = "ce6b1742-4840-55fa-b093-852dadbb1d8b"
+Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
+StatsFuns = "4c63d2b9-4356-54db-8cca-17b64c39e42c"
+StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
+Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
diff --git a/tutorials/variational-inference/01_variational-inference.jmd b/tutorials/variational-inference/01_variational-inference.jmd
index f025b8775..d65e01cd2 100644
--- a/tutorials/variational-inference/01_variational-inference.jmd
+++ b/tutorials/variational-inference/01_variational-inference.jmd
@@ -8,7 +8,7 @@ In this post we'll have a look at what's know as **variational inference (VI)**,
Here we will focus on how to use VI in Turing rather than on the theory underlying VI. If you're interested in understanding the mathematics you can check out [our write-up](../../docs/for-developers/variational_inference) or any other resource online (there are a lot of great ones).
Using VI in Turing.jl is very straightforward. If `model` denotes a definition of a `Turing.Model`, performing VI is as simple as
-```julia
+```julia; eval = false
m = model(data...) # instantiate model on the data
q = vi(m, vi_alg) # perform VI on `m` using the VI method `vi_alg`, which returns a `VariationalPosterior`
```
@@ -18,7 +18,6 @@ To get a bit more into what we can do with `vi`, we'll first have a look at a si
## Setup
-
```julia
using Random
using Turing
@@ -41,15 +40,11 @@ Recall that *conjugate* refers to the fact that we can obtain a closed-form expr
First we generate some synthetic data, define the `Turing.Model` and instantiate the model on the data:
-
```julia
-# generate data, n = 2000
+# generate data
x = randn(2000);
```
-
-
-
```julia
@model model(x) = begin
s ~ InverseGamma(2, 3)
@@ -57,19 +52,11 @@ x = randn(2000);
for i = 1:length(x)
x[i] ~ Normal(m, sqrt(s))
end
-end
+end;
```
-
-
-
- ##model#344 (generic function with 2 methods)
-
-
-
-
```julia
-# construct model
+# Instantiate model
m = model(x);
```
@@ -84,12 +71,6 @@ If you don't understand what "adaptation" or "target acceptance rate" refers to,
samples_nuts = sample(m, NUTS(200, 0.65), 10000);
```
- ┌ Info: Found initial step size
- │ ϵ = 0.025
- └ @ Turing.Inference /home/cameron/.julia/packages/Turing/cReBm/src/inference/hmc.jl:556
- [32mSampling: 100%|█████████████████████████████████████████| Time: 0:00:02[39m
-
-
Now let's try VI. The most important function you need to know about to do VI in Turing is `vi`:
@@ -97,25 +78,6 @@ Now let's try VI. The most important function you need to now about to do VI in
print(@doc(Variational.vi))
```
- ```
- vi(model, alg::VariationalInference)
- vi(model, alg::VariationalInference, q::VariationalPosterior)
- vi(model, alg::VariationalInference, getq::Function, θ::AbstractArray)
- ```
-
- Constructs the variational posterior from the `model` and performs the optimization following the configuration of the given `VariationalInference` instance.
-
- # Arguments
-
- * `model`: `Turing.Model` or `Function` z ↦ log p(x, z) where `x` denotes the observations
- * `alg`: the VI algorithm used
- * `q`: a `VariationalPosterior` for which it is assumed a specialized implementation of the variational objective used exists.
- * `getq`: function taking parameters `θ` as input and returns a `VariationalPosterior`
- * `θ`: only required if `getq` is used, in which case it is the initial parameters for the variational posterior
-
-
-`vi` takes the `Model` you want to approximate, a `VariationalInference` whose type specifies the method to use and then its fields specify the configuration of the method.
-
Additionally, you can pass
- an initial variational posterior `q`, for which we assume there exists an implementation of `update(::typeof(q), θ::AbstractVector)` returning an updated posterior `q` with parameters `θ` (see the sketch after this list).
- a function mapping $$\theta \mapsto q_{\theta}$$ (denoted above `getq`) together with initial parameters `θ`. This provides more flexibility in the types of variational families that we can use, and can sometimes be slightly more convenient for quick and rough work.
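For example, the first option looks roughly like this (a sketch; `m` is the instantiated model from above, and `Variational.meanfield` is the helper documented below):
```julia
q0 = Variational.meanfield(m)   # initial mean-field variational posterior
q = vi(m, ADVI(10, 1000), q0)   # optimize the ELBO starting from q0
```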
@@ -127,14 +89,6 @@ By default, i.e. when calling `vi(m, advi)`, Turing use a *mean-field* approxima
print(@doc(Variational.meanfield))
```
- ```
- meanfield(model::Model)
- meanfield(rng::AbstractRNG, model::Model)
- ```
-
- Creates a mean-field approximation with multivariate normal as underlying distribution.
-
-
Currently the only implementation of `VariationalInference` available is `ADVI`, which is very convenient and applicable as long as your `Model` is differentiable with respect to the *variational parameters*, that is, the parameters of your variational distribution, e.g. mean and variance in the mean-field approximation.
@@ -142,28 +96,6 @@ Currently the only implementation of `VariationalInference` available is `ADVI`,
print(@doc(Variational.ADVI))
```
- ```julia
- struct ADVI{AD} <: Turing.Variational.VariationalInference{AD}
- ```
-
- Automatic Differentiation Variational Inference (ADVI) with automatic differentiation backend `AD`.
-
- # Fields
-
- * `samples_per_step::Int64`
-
- Number of samples used to estimate the ELBO in each optimization step.
- * `max_iters::Int64`
-
- Maximum number of gradient steps.
-
- ```
- ADVI([samples_per_step=1, max_iters=1000])
- ```
-
- Create an [`ADVI`](@ref) with the currently enabled automatic differentiation backend `ADBackend()`.
-
-
Performing VI on the model `m` using 10 samples for gradient estimation and 1000 gradient steps is then as simple as:
@@ -173,12 +105,6 @@ advi = ADVI(10, 1000)
q = vi(m, advi);
```
- ┌ Info: [ADVI] Should only be seen once: optimizer created for θ
- │ objectid(θ) = 12334556482979097499
- └ @ Turing.Variational /home/cameron/.julia/packages/Turing/cReBm/src/variational/VariationalInference.jl:204
- [32m[ADVI] Optimizing...: 100%|█████████████████████████████████████████| Time: 0:00:03[39m
-
-
Unfortunately, for such a small problem Turing's new `NUTS` sampler is *so* efficient now that it's not that much more efficient to use ADVI. So, so very unfortunate...
With that being said, this is not the case in general. For very complex models we'll later find that `ADVI` produces very reasonable results in a much shorter time than `NUTS`.
@@ -190,13 +116,6 @@ And one significant advantage of using `vi` is that we can sample from the resul
q isa MultivariateDistribution
```
-
-
-
- true
-
-
-
This means that we can call `rand` to sample from the variational posterior `q`
@@ -204,15 +123,6 @@ This means that we can call `rand` to sample from the variational posterior `q`
rand(q)
```
-
-
-
- 2-element Array{Float64,1}:
- 1.0134702063474585
- -0.07429020521027016
-
-
-
and `logpdf` to compute the log-probability
@@ -220,13 +130,6 @@ and `logpdf` to compute the log-probability
logpdf(q, rand(q))
```
-
-
-
- 4.277478745320889
-
-
-
Let's check the first and second moments of the data to see how our approximation compares to the point estimates from the data:
@@ -234,25 +137,10 @@ Let's check the first and second moments of the data to see how our approximatio
var(x), mean(x)
```
-
-
-
- (1.021109459575047, -0.028838703049547422)
-
-
-
-
```julia
(mean(rand(q, 1000); dims = 2)..., )
```
-
-
-
- (1.02716749432684, -0.02510701319723139)
-
-
-
That's pretty close! But we're Bayesian so we're not interested in *just* matching the mean.
Let's instead look at the actual density `q`.
@@ -263,51 +151,30 @@ For that we need samples:
samples = rand(q, 10000);
```
-
```julia
# setup for plotting
using Plots, LaTeXStrings, StatsPlots
-pyplot()
```
- ┌ Info: Precompiling PyPlot [d330b81b-6aea-500a-939a-2ce795aea3ee]
- └ @ Base loading.jl:1260
-
-
-
-
-
- Plots.PyPlotBackend()
-
-
-
-
```julia
p1 = histogram(samples[1, :], bins=100, normed=true, alpha=0.2, color = :blue, label = "")
density!(samples[1, :], label = "s (ADVI)", color = :blue, linewidth = 2)
-density!(collect(skipmissing(samples_nuts[:s].value)), label = "s (NUTS)", color = :green, linewidth = 2)
+density!(samples_nuts, :s; label = "s (NUTS)", color = :green, linewidth = 2)
vline!([var(x)], label = "s (data)", color = :black)
vline!([mean(samples[1, :])], color = :blue, label ="")
p2 = histogram(samples[2, :], bins=100, normed=true, alpha=0.2, color = :blue, label = "")
density!(samples[2, :], label = "m (ADVI)", color = :blue, linewidth = 2)
-density!(collect(skipmissing(samples_nuts[:m].value)), label = "m (NUTS)", color = :green, linewidth = 2)
+density!(samples_nuts, :m; label = "m (NUTS)", color = :green, linewidth = 2)
vline!([mean(x)], color = :black, label = "m (data)")
vline!([mean(samples[2, :])], color = :blue, label="")
plot(p1, p2, layout=(2, 1), size=(900, 500))
```
-
-
-
-![png](/tutorials/9_VariationalInference_files/9_VariationalInference_34_0.png)
-
-
-
For this particular `Model`, we can in fact obtain the posterior of the latent variables in closed form. This allows us to compare both `NUTS` and `ADVI` to the true posterior $$p(s, m \mid \{x_i\}_{i = 1}^n )$$.
-*The code below is just work to get the marginals $$p(s \mid \{x_i\}_{i = 1}^n)$$ and $$p(m \mid \{x_i\}_{i = 1}^n$$ from the posterior obtained using ConjugatePriors.jl. Feel free to skip it.*
+*The code below is just work to get the marginals $$p(s \mid \{x_i\}_{i = 1}^n)$$ and $$p(m \mid \{x_i\}_{i = 1}^n)$$ from the posterior obtained using ConjugatePriors.jl. Feel free to skip it.*
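For orientation, with the priors above ($$s \sim \mathrm{InverseGamma}(2, 3)$$ and $$m \mid s \sim \mathcal{N}(0, s)$$, i.e. variance $$s$$), the standard Normal-Inverse-Gamma update gives the posterior parameters that the code below computes via ConjugatePriors.jl (stated here only as a reference):
$$
\mu_n = \frac{n\bar{x}}{1 + n}, \qquad \kappa_n = 1 + n, \qquad \alpha_n = 2 + \frac{n}{2}, \qquad \beta_n = 3 + \frac{1}{2}\sum_{i=1}^n (x_i - \bar{x})^2 + \frac{n\bar{x}^2}{2(1 + n)}
$$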
```julia
@@ -351,7 +218,7 @@ p_μ_pdf = z -> exp(logpdf(p_μ, (z - μₙ) * exp(- 0.5 * log(βₙ) + 0.5 * lo
p1 = plot();
histogram!(samples[1, :], bins=100, normed=true, alpha=0.2, color = :blue, label = "")
density!(samples[1, :], label = "s (ADVI)", color = :blue)
-density!(collect(skipmissing(samples_nuts[:s].value)), label = "s (NUTS)", color = :green)
+density!(samples_nuts, :s; label = "s (NUTS)", color = :green)
vline!([mean(samples[1, :])], linewidth = 1.5, color = :blue, label ="")
# normalize using Riemann approx. because of (almost certainly) numerical issues
@@ -365,7 +232,7 @@ xlims!(0.75, 1.35);
p2 = plot();
histogram!(samples[2, :], bins=100, normed=true, alpha=0.2, color = :blue, label = "")
density!(samples[2, :], label = "m (ADVI)", color = :blue)
-density!(collect(skipmissing(samples_nuts[:m].value)), label = "m (NUTS)", color = :green)
+density!(samples_nuts, :m; label = "m (NUTS)", color = :green)
vline!([mean(samples[2, :])], linewidth = 1.5, color = :blue, label="")
@@ -381,26 +248,16 @@ xlims!(-0.25, 0.25);
p = plot(p1, p2; layout=(2, 1), size=(900, 500))
```
- ┌ Info: Precompiling ConjugatePriors [1624bea9-42b1-5fc1-afd3-e96f729c8d6c]
- └ @ Base loading.jl:1260
-
-
-
-
-
-![png](/tutorials/9_VariationalInference_files/9_VariationalInference_36_1.png)
-
-
# Bayesian linear regression example using `ADVI`
-This is simply a duplication of the tutorial [5. Linear regression](../../tutorials/5-linearregression) but now with the addition of an approximate posterior obtained using `ADVI`.
+This is simply a duplication of the tutorial [5. Linear regression](../regression/02_linear-regression) but now with the addition of an approximate posterior obtained using `ADVI`.
As we'll see, there is really no additional work required to apply variational inference to a more complex `Model`.
-## Copy-paste from [5. Linear regression](../../tutorials/5-linearregression)
+## Copy-paste from [5. Linear regression](../regression/02_linear-regression)
-This section is basically copy-pasting the code from the [linear regression tutorial](../../tutorials/5-linearregression).
+This section is basically copy-pasting the code from the [linear regression tutorial](../regression/02_linear-regression).
```julia
@@ -416,10 +273,6 @@ using RDatasets
Turing.turnprogress(true);
```
- ┌ Info: [Turing]: progress logging is enabled globally
- └ @ Turing /home/cameron/.julia/packages/Turing/cReBm/src/Turing.jl:22
-
-
```julia
# Import the "Default" dataset.
@@ -429,14 +282,6 @@ data = RDatasets.dataset("datasets", "mtcars");
first(data, 6)
```
-
-
-
- 6 rows × 12 columns (omitted printing of 3 columns)
-
- | | Model | MPG | Cyl | Disp | HP | DRat | WT | QSec | VS |
- |---|---|---|---|---|---|---|---|---|---|
- | | String | Float64 | Int64 | Float64 | Int64 | Float64 | Float64 | Float64 | Int64 |
- | 1 | Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.62 | 16.46 | 0 |
- | 2 | Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.875 | 17.02 | 0 |
- | 3 | Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.32 | 18.61 | 1 |
- | 4 | Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 |
- | 5 | Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.44 | 17.02 | 0 |
- | 6 | Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.46 | 20.22 | 1 |
-
-
-
-
```julia
# Function to split samples.
function split_data(df, at = 0.70)
@@ -458,14 +303,6 @@ function unstandardize(x, orig)
end
```
-
-
-
- unstandardize (generic function with 1 method)
-
-
-
-
```julia
# Remove the model column.
select!(data, Not(:Model))
@@ -527,32 +364,17 @@ q0 = Variational.meanfield(m)
typeof(q0)
```
-
-
-
- Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Bijectors.Stacked{Tuple{Bijectors.Inverse{Bijectors.TruncatedBijector{Float64},0},Bijectors.Identity{0},Bijectors.Identity{1}},3},Multivariate}
-
-
-
-
```julia
advi = ADVI(10, 10_000)
```
-
-
-
- ADVI{Turing.Core.ForwardDiffAD{40}}(10, 10000)
-
-
-
Turing also provides a couple of different optimizers:
- `TruncatedADAGrad` (default)
- `DecayedADAGrad`
as these are well-suited for problems with high-variance stochastic objectives, which is usually what the ELBO ends up being at different times in our optimization process.
With that being said, thanks to Requires.jl, if we add a `using Flux` prior to `using Turing` we can also make use of all the optimizers in `Flux`, e.g. `ADAM`, without any additional changes to your code! For example:
-```julia
+```julia; eval = false
using Flux, Turing
using Turing.Variational
@@ -568,28 +390,11 @@ opt = Variational.DecayedADAGrad(1e-2, 1.1, 0.9)
```
-
-
- Turing.Variational.DecayedADAGrad(0.01, 1.1, 0.9, IdDict{Any,Any}())
-
-
-
-
```julia
q = vi(m, advi, q0; optimizer = opt)
typeof(q)
```
- [ADVI] Optimizing...: 100%|█████████████████████████████████████████| Time: 0:00:05
-
-
-
-
-
- Bijectors.TransformedDistribution{DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Bijectors.Stacked{Tuple{Bijectors.Inverse{Bijectors.TruncatedBijector{Float64},0},Bijectors.Identity{0},Bijectors.Identity{1}},3},Multivariate}
-
-
-
*Note: as mentioned before, we internally define an `update(q::TransformedDistribution{<:TuringDiagMvNormal}, θ::AbstractVector)` method which takes the current variational approximation `q` together with new parameters `θ` and returns an updated variational approximation. This is required so that we can actually update the `Distribution` object after each optimization step.*
*Alternatively, we can provide the mapping $$\theta \mapsto q_{\theta}$$ directly together with initial parameters using the signature `vi(m, advi, getq, θ_init)` as mentioned earlier. We'll see an explicit example of this later on!*
@@ -608,88 +413,26 @@ Now we can for example look at the average
avg = vec(mean(z; dims = 2))
```
-
-
-
- 12-element Array{Float64,1}:
- 0.4606389176400052
- 0.05202909837745655
- 0.4064267006145497
- -0.11468688188714653
- -0.09745310785481277
- 0.6148587707658169
- 0.01308179579131569
- 0.09698898180610954
- -0.07232304322690832
- 0.13320265040493984
- 0.28561578772443025
- -0.829825963610117
-
-
-
The vector has the same ordering as the model, e.g. in this case `σ₂` has index `1`, `intercept` has index `2` and `coefficients` has indices `3:12`. If you forget, or if you want to do something programmatically with the result, you can obtain the `sym → indices` mapping as follows:
```julia
-_, sym2range = Variational.bijector(m; sym_to_ranges = Val(true));
+_, sym2range = bijector(m, Val(true));
sym2range
```
-
-
-
- (intercept = UnitRange{Int64}[2:2], σ₂ = UnitRange{Int64}[1:1], coefficients = UnitRange{Int64}[3:12])
-
-
-
-
```julia
avg[union(sym2range[:σ₂]...)]
```
-
-
-
- 1-element Array{Float64,1}:
- 0.4606389176400052
-
-
-
-
```julia
avg[union(sym2range[:intercept]...)]
```
-
-
-
- 1-element Array{Float64,1}:
- 0.05202909837745655
-
-
-
-
```julia
avg[union(sym2range[:coefficients]...)]
```
-
-
-
- 10-element Array{Float64,1}:
- 0.4064267006145497
- -0.11468688188714653
- -0.09745310785481277
- 0.6148587707658169
- 0.01308179579131569
- 0.09698898180610954
- -0.07232304322690832
- 0.13320265040493984
- 0.28561578772443025
- -0.829825963610117
-
-
-
*Note: as you can see, this is slightly awkward to work with at the moment. We'll soon add a better way of dealing with this.*
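In the meantime, if you want to work with the result programmatically rather than by eye, one option is to collect the per-variable means yourself using the `sym2range` mapping from above. The snippet below is purely illustrative and is not part of the original tutorial code:

```julia; eval = false
# Illustrative only: gather the mean-field ADVI means per variable by indexing
# the flat `avg` vector with the `sym → ranges` mapping obtained earlier.
var_means = Dict(sym => avg[union(ranges...)] for (sym, ranges) in pairs(sym2range))
var_means[:intercept]
```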
With a bit of work (this will be much easier in the future), we can also visualize the approximate marginals of the different variables, similar to `plot(chain)`:
@@ -705,14 +448,14 @@ function plot_variational_marginals(z, sym2range)
offset = 1
for r in indices
for j in r
- p = density(z[j, :], title = "$$(sym)[$$offset]", titlefontsize = 10, label = "")
+ p = density(z[j, :], title = "$(sym)[$offset]", titlefontsize = 10, label = "")
push!(ps, p)
offset += 1
end
end
else
- p = density(z[first(indices), :], title = "$$(sym)", titlefontsize = 10, label = "")
+ p = density(z[first(indices), :], title = "$(sym)", titlefontsize = 10, label = "")
push!(ps, p)
end
end
@@ -722,24 +465,10 @@ end
```
-
-
- plot_variational_marginals (generic function with 1 method)
-
-
-
-
```julia
plot_variational_marginals(z, sym2range)
```
-
-
-
-![png](/tutorials/9_VariationalInference_files/9_VariationalInference_68_0.png)
-
-
-
And let's compare this to using the `NUTS` sampler:
@@ -747,72 +476,19 @@ And let's compare this to using the `NUTS` sampler:
chain = sample(m, NUTS(0.65), 10_000);
```
- ┌ Info: Found initial step size
- │ ϵ = 0.4
- └ @ Turing.Inference /home/cameron/.julia/packages/Turing/cReBm/src/inference/hmc.jl:556
- Sampling: 100%|█████████████████████████████████████████| Time: 0:00:04
-
-
-
```julia
plot(chain)
```
-
-
-![png](/tutorials/9_VariationalInference_files/9_VariationalInference_71_0.png)
-
-
-
-
```julia
vi_mean = vec(mean(z; dims = 2))[[union(sym2range[:coefficients]...)..., union(sym2range[:intercept]...)..., union(sym2range[:σ₂]...)...]]
```
-
-
-
- 12-element Array{Float64,1}:
- 0.4064267006145497
- -0.11468688188714653
- -0.09745310785481277
- 0.6148587707658169
- 0.01308179579131569
- 0.09698898180610954
- -0.07232304322690832
- 0.13320265040493984
- 0.28561578772443025
- -0.829825963610117
- 0.05202909837745655
- 0.4606389176400052
-
-
-
-
```julia
mean(chain).nt.mean
```
-
-
-
- 12-element Array{Float64,1}:
- 0.40737234076000634
- -0.12119407949255825
- -0.09258229213058687
- 0.6075161662165318
- 0.010710254061742489
- 0.0962666098260057
- -0.07340041375352217
- 0.14124748712473906
- 0.2782293300542158
- -0.8234179979734787
- 0.049650076749642606
- 0.47011974512236054
-
-
-
One thing we can look at is simply the squared error between the means:
@@ -820,13 +496,6 @@ One thing we can look at is simply the squared error between the means:
sum(abs2, mean(chain).nt.mean .- vi_mean)
```
-
-
-
- 0.00038407031406971286
-
-
-
That looks pretty good! But let's see how the predictive distributions look for the two.
## Prediction
@@ -857,14 +526,6 @@ function prediction_chain(chain, x)
end
```
-
-
-
- prediction_chain (generic function with 1 method)
-
-
-
-
```julia
# Make a prediction using samples from the variational posterior given an input vector.
function prediction(samples::AbstractVector, sym2ranges, x)
@@ -880,14 +541,6 @@ function prediction(samples::AbstractMatrix, sym2ranges, x)
end
```
-
-
-
- prediction (generic function with 2 methods)
-
-
-
-
```julia
# Unstandardize the dependent variable.
train_cut.MPG = unstandardize(train_cut.MPG, data.MPG);
@@ -900,14 +553,6 @@ test_cut.MPG = unstandardize(test_cut.MPG, data.MPG);
first(test_cut, 6)
```
-
-     6 rows × 12 columns (omitted printing of 4 columns)
-
-     |   | MPG     | Cyl      | Disp      | HP        | DRat      | WT       | QSec      | VS        |
-     |---|---------|----------|-----------|-----------|-----------|----------|-----------|-----------|
-     |   | Float64 | Float64  | Float64   | Float64   | Float64   | Float64  | Float64   | Float64   |
-     | 1 | 116.195 | 1.01488  | 0.591245  | 0.0483133 | -0.835198 | 0.222544 | -0.307089 | -0.868028 |
-     | 2 | 114.295 | 1.01488  | 0.962396  | 1.4339    | 0.249566  | 0.636461 | -1.36476  | -0.868028 |
-     | 3 | 120.195 | 1.01488  | 1.36582   | 0.412942  | -0.966118 | 0.641571 | -0.446992 | -0.868028 |
-     | 4 | 128.295 | -1.22486 | -1.22417  | -1.17684  | 0.904164  | -1.31048 | 0.588295  | 1.11604   |
-     | 5 | 126.995 | -1.22486 | -0.890939 | -0.812211 | 1.55876   | -1.10097 | -0.642858 | -0.868028 |
-     | 6 | 131.395 | -1.22486 | -1.09427  | -0.491337 | 0.324377  | -1.74177 | -0.530935 | 1.11604   |
-
```julia
z = rand(q, 10_000);
```
@@ -933,24 +578,15 @@ bayes_loss2 = mean((test_cut.BayesPredictions - test_cut.MPG).^2)
ols_loss2 = mean((test_cut.OLSPrediction - test_cut.MPG).^2)
println("Training set:
- VI loss: $$vi_loss1
- Bayes loss: $$bayes_loss1
- OLS loss: $$ols_loss1
+ VI loss: $vi_loss1
+ Bayes loss: $bayes_loss1
+ OLS loss: $ols_loss1
Test set:
- VI loss: $$vi_loss2
- Bayes loss: $$bayes_loss2
- OLS loss: $$ols_loss2")
+ VI loss: $vi_loss2
+ Bayes loss: $bayes_loss2
+ OLS loss: $ols_loss2")
```
- Training set:
- VI loss: 3.0784608943296643
- Bayes loss: 3.0716118391411906
- OLS loss: 3.070926124893019
- Test set:
- VI loss: 27.159605003619333
- Bayes loss: 26.58835451660728
- OLS loss: 27.094813070760107
-
Interestingly, the squared error between the true values and the mean predictions on the test set is actually *better* for the mean-field variational posterior than for the "true" posterior obtained by MCMC sampling using `NUTS`. But, as Bayesians, we know that the mean doesn't tell the entire story. One quick check is to look at the mean predictions ± standard deviation for the two approaches:
@@ -959,38 +595,23 @@ Interestingly the squared difference between true- and mean-prediction on the te
z = rand(q, 1000);
preds = hcat([unstandardize(prediction(z[:, i], sym2range, test), data.MPG) for i = 1:size(z, 2)]...);
-scatter(1:size(test, 1), mean(preds; dims = 2), yerr=std(preds; dims = 2), label="prediction (mean ± std)", size = (900, 500))
+scatter(1:size(test, 1), mean(preds; dims = 2), yerr=std(preds; dims = 2), label="prediction (mean ± std)", size = (900, 500), markersize = 8)
scatter!(1:size(test, 1), unstandardize(test_label, data.MPG), label="true")
xaxis!(1:size(test, 1))
ylims!(95, 140)
title!("Mean-field ADVI (Normal)")
```
-
-
-
-![png](/tutorials/9_VariationalInference_files/9_VariationalInference_88_0.png)
-
-
-
-
```julia
preds = hcat([unstandardize(prediction_chain(chain[i], test), data.MPG) for i = 1:5:size(chain, 1)]...);
-scatter(1:size(test, 1), mean(preds; dims = 2), yerr=std(preds; dims = 2), label="prediction (mean ± std)", size = (900, 500))
+scatter(1:size(test, 1), mean(preds; dims = 2), yerr=std(preds; dims = 2), label="prediction (mean ± std)", size = (900, 500), markersize = 8)
scatter!(1:size(test, 1), unstandardize(test_label, data.MPG), label="true")
xaxis!(1:size(test, 1))
ylims!(95, 140)
title!("MCMC (NUTS)")
```
-
-
-
-![png](/tutorials/9_VariationalInference_files/9_VariationalInference_89_0.png)
-
-
-
Indeed we see that the MCMC approach generally provides better uncertainty estimates than the mean-field ADVI approach! Good. So all the work we've done to make MCMC fast isn't for nothing.
## Alternative: provide parameter-to-distribution instead of `q` with `update` implemented
@@ -1015,17 +636,6 @@ d = length(q)
base_dist = Turing.DistributionsAD.TuringDiagMvNormal(zeros(d), ones(d))
```
-
-
-
- DistributionsAD.TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}}(
- m: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
- σ: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
- )
-
-
-
-
`bijector(model::Turing.Model)` is defined by Turing, and will return a `bijector` which takes you from the space of the latent variables to the real space. In this particular case, this is a mapping `((0, ∞) × ℝ × ℝ¹⁰) → ℝ¹²`. Since we're interested in using a normal distribution as the base distribution and transforming its samples into the latent space, we need the inverse mapping, from the reals to the latent space:
@@ -1046,39 +656,16 @@ function getq(θ)
end
```
-
-
-
- getq (generic function with 1 method)
-
-
-
-
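To make the inverse-mapping step concrete, here is a minimal sketch of the idea; it is hedged rather than definitive (`to_constrained` and `q_sketch` are illustrative names, and we assume the Bijectors.jl convention that `inv` of a bijector is again a bijector usable with `transformed`):

```julia; eval = false
using Bijectors: transformed

# Invert the model's bijector: this maps ℝ¹² back to the latent space ((0, ∞) × ℝ × ℝ¹⁰).
to_constrained = inv(bijector(m))

# Pushing the real-valued base distribution through the inverse map gives a
# distribution whose samples live in the model's latent space.
q_sketch = transformed(base_dist, to_constrained)
rand(q_sketch)
```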
```julia
q_mf_normal = vi(m, advi, getq, randn(2 * d));
```
- ┌ Info: [ADVI] Should only be seen once: optimizer created for θ
- │ objectid(θ) = 8127634262038331167
- └ @ Turing.Variational /home/cameron/.julia/packages/Turing/cReBm/src/variational/VariationalInference.jl:204
- [ADVI] Optimizing...: 100%|█████████████████████████████████████████| Time: 0:00:06
-
-
-
```julia
p1 = plot_variational_marginals(rand(q_mf_normal, 10_000), sym2range) # MvDiagNormal + Affine transformation + to_constrained
p2 = plot_variational_marginals(rand(q, 10_000), sym2range) # Turing.meanfield(m)
plot(p1, p2, layout = (1, 2), size = (800, 2000))
```
-
-
-
-
-![png](/tutorials/9_VariationalInference_files/9_VariationalInference_100_0.png)
-
-
-
As expected, the fits look pretty much identical.
But using this interface it becomes trivial to go beyond the mean-field assumption we made for the variational posterior, as we'll see in the next section.
@@ -1092,16 +679,24 @@ Here we'll instead consider the variational family to be a full non-diagonal mul
using LinearAlgebra
```
+```julia
+# Using `ComponentArrays.jl` together with `UnPack.jl` makes our lives much easier.
+using ComponentArrays, UnPack
+```
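If you haven't used `ComponentArrays.jl` before: it lets us view a flat parameter vector as a collection of named blocks, which is exactly what we need for the full-rank parameterization below. Here is a tiny standalone illustration (not part of the tutorial itself):

```julia; eval = false
using ComponentArrays, UnPack

# View a flat vector of 6 numbers as a 2×2 block `L` and a length-2 block `b`.
proto = ComponentArray(L = zeros(2, 2), b = zeros(2))
axs = getaxes(proto)

θ = collect(1.0:6.0)
@unpack L, b = ComponentArray(θ, axs)
L, b  # `L` comes back as a 2×2 matrix and `b` as a length-2 vector
```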
```julia
-d = 12
+proto_arr = ComponentArray(
+ L = zeros(d, d),
+ b = zeros(d)
+)
+proto_axes = proto_arr |> getaxes
+num_params = length(proto_arr)
function getq(θ)
- offset = 0
- L = LowerTriangular(reshape(@inbounds(θ[offset + 1: offset + d^2]), (d, d)))
- offset += d^2
- b = @inbounds θ[offset + 1: offset + d]
-
+ L, b = begin
+ @unpack L, b = ComponentArray(θ, proto_axes)
+ LowerTriangular(L), b
+ end
# For this to represent a covariance matrix we need to ensure that the diagonal is positive.
# We can enforce this by zeroing out the diagonal and then adding back the diagonal exponentiated.
D = Diagonal(diag(L))
@@ -1113,33 +708,14 @@ function getq(θ)
end
```
-
-
-
- getq (generic function with 1 method)
-
-
-
-
```julia
advi = ADVI(10, 20_000)
```
-
-
-
- ADVI{Turing.Core.ForwardDiffAD{40}}(10, 20000)
-
-
-
-
```julia
-q_full_normal = vi(m, advi, getq, randn(d^2 + d); optimizer = Variational.DecayedADAGrad(1e-2));
+q_full_normal = vi(m, advi, getq, randn(num_params); optimizer = Variational.DecayedADAGrad(1e-2));
```
- [ADVI] Optimizing...: 100%|█████████████████████████████████████████| Time: 0:01:05
-
-
Let's have a look at the learned covariance matrix:
@@ -1147,38 +723,10 @@ Let's have a look at the learned covariance matrix:
A = q_full_normal.transform.ts[1].a
```
-
-
-
- 12×12 LowerTriangular{Float64,Array{Float64,2}}:
- 0.154572 ⋅ ⋅ … ⋅ ⋅ ⋅
- 0.00674249 0.169072 ⋅ ⋅ ⋅ ⋅
- -0.00288782 -0.0283984 0.413288 ⋅ ⋅ ⋅
- -0.030621 0.0450533 -0.0415525 ⋅ ⋅ ⋅
- -0.0115003 0.208366 -0.0420414 ⋅ ⋅ ⋅
- 0.00139553 -0.0619506 0.0853589 … ⋅ ⋅ ⋅
- 0.0129097 -0.0647154 0.00228644 ⋅ ⋅ ⋅
- -0.0128701 -0.0531755 0.0999936 ⋅ ⋅ ⋅
- 0.00169318 0.0274239 0.0903744 ⋅ ⋅ ⋅
- -0.0172387 -0.0304655 0.0661713 0.14843 ⋅ ⋅
- -0.000468924 0.300281 0.0789093 … -0.131391 0.128256 ⋅
- 0.00160201 -0.122274 -0.0776935 0.0468996 -0.00752499 0.120458
-
-
-
-
```julia
heatmap(cov(A * A'))
```
-
-
-
-![png](/tutorials/9_VariationalInference_files/9_VariationalInference_110_0.png)
-
-
-
-
```julia
zs = rand(q_full_normal, 10_000);
```
@@ -1192,12 +740,6 @@ plot(p1, p2, layout = (1, 2), size = (800, 2000))
```
-
-
-![png](/tutorials/9_VariationalInference_files/9_VariationalInference_112_0.png)
-
-
-
So it seems like the "full" ADVI approach, i.e. no mean-field assumption, obtains the same modes as the mean-field approach but with greater uncertainty for some of the `coefficients`. This
@@ -1227,97 +769,51 @@ bayes_loss2 = mean((test_cut.BayesPredictions - test_cut.MPG).^2)
ols_loss2 = mean((test_cut.OLSPrediction - test_cut.MPG).^2)
println("Training set:
- VI loss: $$vi_loss1
- VI (full) loss: $$vifull_loss1
- Bayes loss: $$bayes_loss1
- OLS loss: $$ols_loss1
+ VI loss: $vi_loss1
+ VI (full) loss: $vifull_loss1
+ Bayes loss: $bayes_loss1
+ OLS loss: $ols_loss1
Test set:
- VI loss: $$vi_loss2
- VI (full) loss: $$vifull_loss2
- Bayes loss: $$bayes_loss2
- OLS loss: $$ols_loss2")
+ VI loss: $vi_loss2
+ VI (full) loss: $vifull_loss2
+ Bayes loss: $bayes_loss2
+ OLS loss: $ols_loss2")
```
- Training set:
- VI loss: 3.0784608943296643
- VI (full) loss: 3.0926834377972288
- Bayes loss: 3.0716118391411906
- OLS loss: 3.070926124893019
- Test set:
- VI loss: 27.159605003619333
- VI (full) loss: 26.912162732716684
- Bayes loss: 26.58835451660728
- OLS loss: 27.094813070760107
-
-
-
```julia
z = rand(q_mf_normal, 1000);
preds = hcat([unstandardize(prediction(z[:, i], sym2range, test), data.MPG) for i = 1:size(z, 2)]...);
-p1 = scatter(1:size(test, 1), mean(preds; dims = 2), yerr=std(preds; dims = 2), label="prediction (mean ± std)", size = (900, 500))
+p1 = scatter(1:size(test, 1), mean(preds; dims = 2), yerr=std(preds; dims = 2), label="prediction (mean ± std)", size = (900, 500), markersize = 8)
scatter!(1:size(test, 1), unstandardize(test_label, data.MPG), label="true")
xaxis!(1:size(test, 1))
ylims!(95, 140)
title!("Mean-field ADVI (Normal)")
```
-
-
-
-![png](/tutorials/9_VariationalInference_files/9_VariationalInference_117_0.png)
-
-
-
-
```julia
z = rand(q_full_normal, 1000);
preds = hcat([unstandardize(prediction(z[:, i], sym2range, test), data.MPG) for i = 1:size(z, 2)]...);
-p2 = scatter(1:size(test, 1), mean(preds; dims = 2), yerr=std(preds; dims = 2), label="prediction (mean ± std)", size = (900, 500))
+p2 = scatter(1:size(test, 1), mean(preds; dims = 2), yerr=std(preds; dims = 2), label="prediction (mean ± std)", size = (900, 500), markersize = 8)
scatter!(1:size(test, 1), unstandardize(test_label, data.MPG), label="true")
xaxis!(1:size(test, 1))
ylims!(95, 140)
title!("Full ADVI (Normal)")
```
-
-
-
-![png](/tutorials/9_VariationalInference_files/9_VariationalInference_118_0.png)
-
-
-
-
```julia
preds = hcat([unstandardize(prediction_chain(chain[i], test), data.MPG) for i = 1:5:size(chain, 1)]...);
-p3 = scatter(1:size(test, 1), mean(preds; dims = 2), yerr=std(preds; dims = 2), label="prediction (mean ± std)", size = (900, 500))
+p3 = scatter(1:size(test, 1), mean(preds; dims = 2), yerr=std(preds; dims = 2), label="prediction (mean ± std)", size = (900, 500), markersize = 8)
scatter!(1:size(test, 1), unstandardize(test_label, data.MPG), label="true")
xaxis!(1:size(test, 1))
ylims!(95, 140)
title!("MCMC (NUTS)")
```
-
-
-
-![png](/tutorials/9_VariationalInference_files/9_VariationalInference_119_0.png)
-
-
-
-
```julia
plot(p1, p2, p3, layout = (1, 3), size = (900, 250), label="")
```
-
-
-
-![png](/tutorials/9_VariationalInference_files/9_VariationalInference_120_0.png)
-
-
-
Here we actually see that indeed both the full ADVI and the MCMC approaches do a much better job of quantifying the uncertainty of predictions for never-before-seen samples, with full ADVI seemingly *overestimating* the variance slightly compared to MCMC.
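If you'd rather quantify that impression than eyeball the plots, one simple check is to compare the average predictive standard deviation on the test set for the two posteriors. The snippet below is illustrative only (it reuses the `prediction`, `prediction_chain`, and `unstandardize` helpers defined above):

```julia; eval = false
# Illustrative check: average predictive standard deviation on the test set,
# full-rank ADVI versus the NUTS chain (larger values mean wider predictive bands).
preds_full = hcat([unstandardize(prediction(rand(q_full_normal), sym2range, test), data.MPG) for _ in 1:1000]...);
preds_nuts = hcat([unstandardize(prediction_chain(chain[i], test), data.MPG) for i in 1:5:size(chain, 1)]...);

mean(std(preds_full; dims = 2)), mean(std(preds_nuts; dims = 2))
```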
So now you know how to perform VI on your Turing.jl model! Great, isn't it?
diff --git a/tutorials/variational-inference/Project.toml b/tutorials/variational-inference/Project.toml
new file mode 100644
index 000000000..638a1fd97
--- /dev/null
+++ b/tutorials/variational-inference/Project.toml
@@ -0,0 +1,26 @@
+[deps]
+Bijectors = "76274a88-744f-5084-9051-94815aaf08c4"
+ComponentArrays = "b0b7db55-cfe3-40fc-9ded-d10e2dbeff66"
+ConjugatePriors = "1624bea9-42b1-5fc1-afd3-e96f729c8d6c"
+GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a"
+LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
+Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
+PyPlot = "d330b81b-6aea-500a-939a-2ce795aea3ee"
+RDatasets = "ce6b1742-4840-55fa-b093-852dadbb1d8b"
+Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
+StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
+Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
+UnPack = "3a884ed6-31ef-47d7-9d2a-63182c4928ed"
+
+[compat]
+Bijectors = "0.8.5"
+ComponentArrays = "0.8.2"
+ConjugatePriors = "0.4.0"
+GLM = "1.3.10"
+LaTeXStrings = "1.2.0"
+Plots = "1.6.6"
+PyPlot = "2.9.0"
+RDatasets = "0.6.10"
+StatsPlots = "0.14.13"
+Turing = "0.14.3"
+UnPack = "1.0.2"