diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
index a8e97b4..a017a80 100644
--- a/dev/.documenter-siteinfo.json
+++ b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-06-21T15:57:45","documenter_version":"1.4.1"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-07-08T19:54:17","documenter_version":"1.5.0"}}
\ No newline at end of file
diff --git a/dev/ae/index.html b/dev/ae/index.html
index b2a14d2..4f9fbc3 100644
--- a/dev/ae/index.html
+++ b/dev/ae/index.html
@@ -1,5 +1,5 @@
-
The deterministic autoencoders are a type of neural network that learns to embed high-dimensional data into a lower-dimensional space in a one-to-one fashion. The AEs module provides the necessary tools to train these networks. The main type is the AE struct, which is a simple feedforward neural network composed of two parts: an Encoder and a Decoder.
encoder::E: Neural network that encodes the input into the latent space. E is a subtype of AbstractDeterministicEncoder.
decoder::D: Neural network that decodes the latent representation back to the original input space. D is a subtype of AbstractDeterministicDecoder.
An AE consists of an encoder and decoder network with a bottleneck latent space in between. The encoder compresses the input into a low-dimensional representation. The decoder tries to reconstruct the original input from the point in the latent space.
Processes the input data x through the autoencoder (AE) that consists of an encoder and a decoder.
Arguments
x::AbstractVecOrMat{Float32}: The data to be encoded and then reconstructed. This can be a vector or a matrix where each column represents a separate sample.
Optional Keyword Arguments
latent::Bool: If set to true, returns a NamedTuple containing the latent representation alongside the reconstructed data. Defaults to false.
Returns
If latent=false: A NamedTuple with key :decoder that contains the reconstructed data after processing through the encoder and decoder.
If latent=true: A NamedTuple with keys :encoder and :decoder, containing the corresponding values.
Description
The function first encodes the input x using the encoder to get the encoded representation in the latent space. This latent representation is then decoded using the decoder to produce the reconstructed data. If latent is set to true, it also returns the latent representation.
Note
Ensure the input data x matches the expected input dimensionality for the encoder in the AE.
The deterministic autoencoders are a type of neural network that learns to embed high-dimensional data into a lower-dimensional space in a one-to-one fashion. The AEs module provides the necessary tools to train these networks. The main type is the AE struct, which is a simple feedforward neural network composed of two parts: an Encoder and a Decoder.
encoder::E: Neural network that encodes the input into the latent space. E is a subtype of AbstractDeterministicEncoder.
decoder::D: Neural network that decodes the latent representation back to the original input space. D is a subtype of AbstractDeterministicDecoder.
An AE consists of an encoder and decoder network with a bottleneck latent space in between. The encoder compresses the input into a low-dimensional representation. The decoder tries to reconstruct the original input from the point in the latent space.
Processes the input data x through the autoencoder (AE) that consists of an encoder and a decoder.
Arguments
x::AbstractVecOrMat{Float32}: The data to be encoded and then reconstructed. This can be a vector or a matrix where each column represents a separate sample.
Optional Keyword Arguments
latent::Bool: If set to true, returns a NamedTuple containing the latent representation alongside the reconstructed data. Defaults to false.
Returns
If latent=false: A NamedTuple with key :decoder that contains the reconstructed data after processing through the encoder and decoder.
If latent=true: A NamedTuple with keys :encoder and :decoder, containing the corresponding values.
Description
The function first encodes the input x using the encoder to get the encoded representation in the latent space. This latent representation is then decoded using the decoder to produce the reconstructed data. If latent is set to true, it also returns the latent representation.
Note
Ensure the input data x matches the expected input dimensionality for the encoder in the AE.
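The forward pass described above can be sketched in plain Julia. This is a minimal illustration, not the package's implementation: `ToyAE` and `forward` are hypothetical names, and plain weight matrices stand in for the Flux networks that the real AE struct wraps.

```julia
# Toy stand-in for an autoencoder (hypothetical; not part of AutoEncoderToolkit.jl).
struct ToyAE
    W_enc::Matrix{Float32}  # latent_dim × input_dim
    W_dec::Matrix{Float32}  # input_dim × latent_dim
end

# Encode, decode, and optionally return the latent representation as well.
function forward(ae::ToyAE, x::AbstractVecOrMat{Float32}; latent::Bool=false)
    z = tanh.(ae.W_enc * x)   # encoder: input space -> latent space
    x̂ = ae.W_dec * z          # decoder: latent space -> input space
    return latent ? (encoder=z, decoder=x̂) : (decoder=x̂,)
end

ae = ToyAE(randn(Float32, 2, 5), randn(Float32, 5, 2))
x = randn(Float32, 5, 3)      # three samples, one per column
out = forward(ae, x; latent=true)
```

With `latent=true` the result carries both `out.encoder` (2×3 here) and `out.decoder` (5×3); with the default only `:decoder` is present.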
Calculate the mean squared error (MSE) loss for an autoencoder (AE) using separate input and target output vectors.
The AE loss is computed as: loss = MSE(x_out, x̂) + reg_strength × reg_term
Where:
x_out is the target output vector.
x̂ is the reconstructed output from the AE given x_in as input.
reg_strength × reg_term is an optional regularization term.
Arguments
ae::AE: An AE model.
x_in::AbstractArray: Input vector to the AE encoder.
x_out::AbstractArray: Target output vector to compute the reconstruction error.
Optional Keyword Arguments
reg_function::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the ae outputs. Should return a Float32. This function must take as input the ae outputs and the keyword arguments provided in reg_kwargs.
reg_kwargs::Union{NamedTuple,Dict}=Dict(): Keyword arguments to pass to the regularization function.
reg_strength::Number=1.0f0: The strength of the regularization term.
Returns
The computed loss value between the target x_out and its reconstructed counterpart from x_in, including possible regularization terms.
Note
Ensure that the input data x_in matches the expected input dimensionality for the encoder in the AE.
Customized training function to update parameters of an autoencoder given a specified loss function.
Arguments
ae::AE: A struct containing the elements of an autoencoder.
x_in::AbstractArray: Input data on which the autoencoder will be trained.
x_out::AbstractArray: Target output data for the autoencoder.
opt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.
Optional Keyword Arguments
loss_function::Function: The loss function used for training. It should accept the autoencoder model and input data x, and return a loss value.
loss_kwargs::Union{NamedTuple,Dict} = Dict(): Additional arguments for the loss function.
verbose::Bool=false: If true, the loss value will be printed during training.
loss_return::Bool=false: If true, the loss value will be returned after training.
Description
Trains the autoencoder by:
Computing the gradient of the loss with respect to the autoencoder parameters.
Updating the autoencoder parameters using the optimizer.
Settings
This document was generated with Documenter.jl version 1.4.1 on Friday 21 June 2024. Using Julia version 1.10.4.
+ reg_strength::Float32=1.0f0)
Calculate the mean squared error (MSE) loss for an autoencoder (AE) using separate input and target output vectors.
The AE loss is computed as: loss = MSE(x_out, x̂) + reg_strength × reg_term
Where:
x_out is the target output vector.
x̂ is the reconstructed output from the AE given x_in as input.
reg_strength × reg_term is an optional regularization term.
Arguments
ae::AE: An AE model.
x_in::AbstractArray: Input vector to the AE encoder.
x_out::AbstractArray: Target output vector to compute the reconstruction error.
Optional Keyword Arguments
reg_function::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the ae outputs. Should return a Float32. This function must take as input the ae outputs and the keyword arguments provided in reg_kwargs.
reg_kwargs::Union{NamedTuple,Dict}=Dict(): Keyword arguments to pass to the regularization function.
reg_strength::Number=1.0f0: The strength of the regularization term.
Returns
The computed loss value between the target x_out and its reconstructed counterpart from x_in, including possible regularization terms.
Note
Ensure that the input data x_in matches the expected input dimensionality for the encoder in the AE.
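As a rough sketch of how the loss above combines its two terms, the snippet below computes the MSE and adds a scaled regularization term. `toy_mse_loss` and `l2_penalty` are hypothetical helpers; the sketch takes the reconstruction x̂ directly instead of an AE model, and it assumes the regularizer receives that reconstruction as "the ae outputs".

```julia
using Statistics: mean

# Hypothetical stand-in for the documented loss: MSE plus optional regularization.
function toy_mse_loss(x̂, x_out; reg_function=nothing, reg_kwargs=(;), reg_strength=1.0f0)
    mse = mean(abs2, x̂ .- x_out)  # reconstruction error
    # The regularizer is only evaluated when one is supplied.
    reg_term = isnothing(reg_function) ? 0.0f0 : reg_function(x̂; reg_kwargs...)
    return mse + reg_strength * reg_term
end

# Example regularizer (an assumption for illustration): scaled sum of squares.
l2_penalty(x̂; scale=1.0f0) = scale * sum(abs2, x̂)

x_out = Float32[1, 2, 3]
x̂ = Float32[1, 2, 5]
plain = toy_mse_loss(x̂, x_out)
regularized = toy_mse_loss(x̂, x_out; reg_function=l2_penalty,
                           reg_kwargs=(scale=0.5f0,), reg_strength=0.1f0)
```

Here `plain` is 4/3 (only the last entry differs, by 2), and `regularized` adds 0.1 × 0.5 × 30 = 1.5 on top of that.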
A lot of recent research in the field of generative models has focused on the geometry of the learned latent space (see the references at the end of this section for examples). The non-linear nature of neural networks makes it relevant to consider the non-Euclidean geometry of the latent space when trying to gain insights into the structure of the learned space. In other words, given that neural networks involve a series of non-linear transformations of the input data, we cannot expect the latent space to be Euclidean, and thus, we need to account for curvature and other non-Euclidean properties. For this, we can borrow concepts and tools from Riemannian geometry, now applied to the latent space of generative models.
AutoEncoderToolkit.jl aims to provide the set of necessary tools to study the geometry of the latent space in the context of variational autoencoder generative models.
Note
This is very much work in progress. As always, contributions are welcome!
In what follows we will give a very short primer on some relevant concepts in differential geometry. This includes some basic definitions and concepts along with what we consider intuitive explanations of the concepts. We trade rigor for accessibility, so if you are looking for a more formal treatment, this is not the place.
Note
These notes are partially based on the 2022 paper by Chadebec et al. [2].
A $d$-dimensional manifold $\mathcal{M}$ is a topological space that is locally homeomorphic to a $d$-dimensional Euclidean space. This means that the manifold–some surface or high-dimensional shape–when observed from really close, can be stretched or bent without tearing or gluing it to make it resemble regular Euclidean space.
If the manifold is differentiable, it possesses a tangent space $T_z$ at any point $z \in \mathcal{M}$ composed of the tangent vectors of the curves passing through $z$.
If the manifold $\mathcal{M}$ is equipped with a smooth inner product,
\[g: z \rightarrow \langle \cdot \mid \cdot \rangle_z,
\tag{1}\]
+
A lot of recent research in the field of generative models has focused on the geometry of the learned latent space (see the references at the end of this section for examples). The non-linear nature of neural networks makes it relevant to consider the non-Euclidean geometry of the latent space when trying to gain insights into the structure of the learned space. In other words, given that neural networks involve a series of non-linear transformations of the input data, we cannot expect the latent space to be Euclidean, and thus, we need to account for curvature and other non-Euclidean properties. For this, we can borrow concepts and tools from Riemannian geometry, now applied to the latent space of generative models.
AutoEncoderToolkit.jl aims to provide the set of necessary tools to study the geometry of the latent space in the context of variational autoencoder generative models.
Note
This is very much work in progress. As always, contributions are welcome!
In what follows we will give a very short primer on some relevant concepts in differential geometry. This includes some basic definitions and concepts along with what we consider intuitive explanations of the concepts. We trade rigor for accessibility, so if you are looking for a more formal treatment, this is not the place.
Note
These notes are partially based on the 2022 paper by Chadebec et al. [2].
A $d$-dimensional manifold $\mathcal{M}$ is a topological space that is locally homeomorphic to a $d$-dimensional Euclidean space. This means that the manifold–some surface or high-dimensional shape–when observed from really close, can be stretched or bent without tearing or gluing it to make it resemble regular Euclidean space.
If the manifold is differentiable, it possesses a tangent space $T_z$ at any point $z \in \mathcal{M}$ composed of the tangent vectors of the curves passing through $z$.
If the manifold $\mathcal{M}$ is equipped with a smooth inner product,
\[g: z \rightarrow \langle \cdot \mid \cdot \rangle_z,
\tag{1}\]
defined on the tangent space $T_z$ for any $z \in \mathcal{M}$, then $\mathcal{M}$ is a Riemannian manifold and $g$ is the associated Riemannian metric. With this, a local representation of $g$ at any point $z$ is given by the positive definite matrix $\mathbf{G}(z)$.
A chart (fancy name for a coordinate system) $(U, \phi)$ provides a homeomorphic mapping between an open set $U$ of the manifold and an open set $V$ of Euclidean space. This means that there is a way to bend and stretch any segment of the manifold to make it look like a segment of Euclidean space. Therefore, given a point $z \in U$, a chart–its coordinate–$\phi: (z_1, z_2, \ldots, z_d)$ induces a basis $\{\partial_{z_1}, \partial_{z_2}, \ldots, \partial_{z_d}\}$ on the tangent space $T_z \mathcal{M}$. In other words, the partial derivatives of the manifold with respect to the dimensions form a basis (think of $\hat{i}, \hat{j}, \hat{k}$ in 3D space) for the tangent space at that point. Hence, the metric–a "position-dependent scale-bar"–of a Riemannian manifold can be locally represented at $\phi$ as a positive definite matrix $\mathbf{G}(z)$ with components $g_{ij}(z)$ of the form
\[g_{ij}(z) = \langle \partial_{z_i} \mid \partial_{z_j} \rangle_z.
\tag{2}\]
This implies that for every pair of vectors $v, w \in T_z \mathcal{M}$ and a point $z \in \mathcal{M}$, the inner product $\langle v \mid w \rangle_z$ is given by
\[\langle v \mid w \rangle_z = v^T \mathbf{G}(z) w.
\tag{3}\]
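To make Eq. (3) concrete, here is a small numerical sketch in Julia. The matrix `G` is an arbitrary symmetric positive definite example chosen for illustration, not a metric produced by any model.

```julia
using LinearAlgebra

# Example metric at a single point z (assumed values, for illustration only).
G = [2.0 0.5;
     0.5 1.0]
v = [1.0, 0.0]
w = [0.0, 1.0]

inner_riemannian = dot(v, G, w)   # vᵀ G w under the metric G
inner_euclidean = dot(v, w)       # the same vectors under the identity metric
norm_v = sqrt(dot(v, G, v))       # length of v measured with G
```

The two vectors are orthogonal in the Euclidean sense, yet their inner product under `G` is 0.5, and `v` has length √2 rather than 1. This is the "position-dependent scale-bar" at work.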
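If $\mathcal{M}$ is connected–a continuous shape with no breaks–a Riemannian distance between two points $z_1, z_2 \in \mathcal{M}$ can be defined as
\[\text{dist}(z_1, z_2) = \inf_{\gamma} \int_0^1 \sqrt{\langle \dot{\gamma}(t) \mid \dot{\gamma}(t) \rangle_{\gamma(t)}} \, dt,\]
where the infimum is taken over smooth curves $\gamma: [0, 1] \rightarrow \mathcal{M}$ with $\gamma(0) = z_1$ and $\gamma(1) = z_2$.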
Function to compute the (discretized) integral defining the energy of a curve γ̲ on a Riemannian manifold. The energy is defined as
E(γ̲) = ∫ dt ⟨γ̲̇(t), G̲̲ γ̲̇(t)⟩,
where γ̲̇(t) defines the velocity of the parametric curve, and G̲̲ is the Riemannian metric tensor. For this function, we approximate the integral as
E(γ̲) ≈ ∑ᵢ Δt ⟨γ̲̇(tᵢ), G̲̲(γ̲(tᵢ₊₁)) γ̲̇(tᵢ)⟩,
where Δt is the time step between points. Note that this Δt is assumed to be constant, thus, the time points t must be equally spaced.
Arguments
riemannian_metric::AbstractArray: d×d×N tensor where d is the dimension of the manifold on which the curve lies and N is the number of sampled time points along the curve. Each slice of the array represents the Riemannian metric tensor for the curve at the corresponding time point.
curve_velocity::AbstractArray: d×N Matrix where d is the dimension of the manifold on which the curve lies and N is the number of sampled time points along the curve. Each column represents the velocity of the curve at the corresponding time point.
t::AbstractVector: Vector of time points at which the curve is sampled.
Returns
Energy::Number: Approximation of the Energy for the path on the manifold.
Chen, N. et al. Metrics for Deep Generative Models. in Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics 1540–1550 (PMLR, 2018).
Chadebec, C. & Allassonnière, S. A Geometric Perspective on Variational Autoencoders. Preprint at http://arxiv.org/abs/2209.07370 (2022).
Chadebec, C., Mantoux, C. & Allassonnière, S. Geometry-Aware Hamiltonian Variational Auto-Encoder. Preprint at http://arxiv.org/abs/2010.11518 (2020).
Arvanitidis, G., Hauberg, S., Hennig, P. & Schober, M. Fast and Robust Shortest Paths on Manifolds Learned from Data. in Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics 1506–1515 (PMLR, 2019).
Arvanitidis, G., Hauberg, S. & Schölkopf, B. Geometrically Enriched Latent Spaces. Preprint at https://doi.org/10.48550/arXiv.2008.00565 (2020).
Arvanitidis, G., González-Duque, M., Pouplin, A., Kalatzis, D. & Hauberg, S. Pulling back information geometry. Preprint at http://arxiv.org/abs/2106.05367 (2022).
Fröhlich, C., Gessner, A., Hennig, P., Schölkopf, B. & Arvanitidis, G. Bayesian Quadrature on Riemannian Data Manifolds.
Kalatzis, D., Eklund, D., Arvanitidis, G. & Hauberg, S. Variational Autoencoders with Riemannian Brownian Motion Priors. Preprint at http://arxiv.org/abs/2002.05227 (2020).
Arvanitidis, G., Hansen, L. K. & Hauberg, S. Latent Space Oddity: on the Curvature of Deep Generative Models. Preprint at http://arxiv.org/abs/1710.11379 (2021).
+)
Function to compute the (discretized) integral defining the energy of a curve γ̲ on a Riemannian manifold. The energy is defined as
E(γ̲) = ∫ dt ⟨γ̲̇(t), G̲̲ γ̲̇(t)⟩,
where γ̲̇(t) defines the velocity of the parametric curve, and G̲̲ is the Riemannian metric tensor. For this function, we approximate the integral as
E(γ̲) ≈ ∑ᵢ Δt ⟨γ̲̇(tᵢ), G̲̲(γ̲(tᵢ₊₁)) γ̲̇(tᵢ)⟩,
where Δt is the time step between points. Note that this Δt is assumed to be constant, thus, the time points t must be equally spaced.
Arguments
riemannian_metric::AbstractArray: d×d×N tensor where d is the dimension of the manifold on which the curve lies and N is the number of sampled time points along the curve. Each slice of the array represents the Riemannian metric tensor for the curve at the corresponding time point.
curve_velocity::AbstractArray: d×N Matrix where d is the dimension of the manifold on which the curve lies and N is the number of sampled time points along the curve. Each column represents the velocity of the curve at the corresponding time point.
t::AbstractVector: Vector of time points at which the curve is sampled.
Returns
Energy::Number: Approximation of the Energy for the path on the manifold.
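The discretized energy can be sketched as a simple Riemann sum. The helper `toy_curve_energy` below is hypothetical and only mirrors the argument layout described above. It evaluates both the velocity and the metric at tᵢ and sums over the N−1 intervals, so the package's exact indexing may differ.

```julia
using LinearAlgebra

# Hypothetical helper: left Riemann sum of ⟨γ̇(tᵢ), G(tᵢ) γ̇(tᵢ)⟩ Δt.
function toy_curve_energy(riemannian_metric::AbstractArray{<:Real,3},
                          curve_velocity::AbstractMatrix{<:Real},
                          t::AbstractVector{<:Real})
    Δt = t[2] - t[1]                      # assumes equally spaced time points
    energy = zero(Δt)
    for i in 1:(length(t) - 1)
        v = @view curve_velocity[:, i]
        G = @view riemannian_metric[:, :, i]
        energy += Δt * dot(v, G, v)       # vᵀ G v at time tᵢ
    end
    return energy
end

# Straight unit-speed line in a flat (identity-metric) 2D space.
t = collect(0.0:0.25:1.0)
N = length(t)
metric = zeros(2, 2, N)
for i in 1:N
    metric[:, :, i] = [1.0 0.0; 0.0 1.0]
end
velocity = repeat([1.0, 0.0], 1, N)       # d×N matrix, one column per time point
energy = toy_curve_energy(metric, velocity, t)
```

For this flat example the energy is 1.0, matching the exact integral of a unit-speed curve over [0, 1].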
Chen, N. et al. Metrics for Deep Generative Models. in Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics 1540–1550 (PMLR, 2018).
Chadebec, C. & Allassonnière, S. A Geometric Perspective on Variational Autoencoders. Preprint at http://arxiv.org/abs/2209.07370 (2022).
Chadebec, C., Mantoux, C. & Allassonnière, S. Geometry-Aware Hamiltonian Variational Auto-Encoder. Preprint at http://arxiv.org/abs/2010.11518 (2020).
Arvanitidis, G., Hauberg, S., Hennig, P. & Schober, M. Fast and Robust Shortest Paths on Manifolds Learned from Data. in Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics 1506–1515 (PMLR, 2019).
Arvanitidis, G., Hauberg, S. & Schölkopf, B. Geometrically Enriched Latent Spaces. Preprint at https://doi.org/10.48550/arXiv.2008.00565 (2020).
Arvanitidis, G., González-Duque, M., Pouplin, A., Kalatzis, D. & Hauberg, S. Pulling back information geometry. Preprint at http://arxiv.org/abs/2106.05367 (2022).
Fröhlich, C., Gessner, A., Hennig, P., Schölkopf, B. & Arvanitidis, G. Bayesian Quadrature on Riemannian Data Manifolds.
Kalatzis, D., Eklund, D., Arvanitidis, G. & Hauberg, S. Variational Autoencoders with Riemannian Brownian Motion Priors. Preprint at http://arxiv.org/abs/2002.05227 (2020).
Arvanitidis, G., Hansen, L. K. & Hauberg, S. Latent Space Oddity: on the Curvature of Deep Generative Models. Preprint at http://arxiv.org/abs/1710.11379 (2021).
Settings
This document was generated with Documenter.jl version 1.5.0 on Monday 8 July 2024. Using Julia version 1.10.4.
Default encoder function for deterministic autoencoders. The encoder network is used to map the input data directly into the latent space representation.
Fields
encoder::Union{Flux.Chain,Flux.Dense}: The primary neural network used to process input data and map it into a latent space representation.
Forward propagate the input x through the Encoder to obtain the encoded representation in the latent space.
Arguments
x::Array: Input data to be encoded.
Returns
z: Encoded representation of the input data in the latent space.
Description
This method allows for a direct call on an instance of Encoder with the input data x. It runs the input through the encoder network and outputs the encoded representation in the latent space.
Default encoder function for deterministic autoencoders. The encoder network is used to map the input data directly into the latent space representation.
Fields
encoder::Union{Flux.Chain,Flux.Dense}: The primary neural network used to process input data and map it into a latent space representation.
Forward propagate the input x through the Encoder to obtain the encoded representation in the latent space.
Arguments
x::Array: Input data to be encoded.
Returns
z: Encoded representation of the input data in the latent space.
Description
This method allows for a direct call on an instance of Encoder with the input data x. It runs the input through the encoder network and outputs the encoded representation in the latent space.
Example
enc = Encoder(...)
z = enc(some_input)
Note
Ensure that the input x matches the expected dimensionality of the encoder's input layer.
Forward propagate the input x through the JointGaussianEncoder to obtain the mean (µ) and standard deviation (σ) of the latent space.
Arguments
x::AbstractArray: Input data to be encoded.
Returns
A NamedTuple (µ=µ, σ=σ,) where:
µ: Mean of the latent space after passing the input through the encoder and subsequently through the µ layer.
σ: Standard deviation of the latent space after passing the input through the encoder and subsequently through the σ layer.
Description
This method allows for a direct call on an instance of JointGaussianEncoder with the input data x. It first runs the input through the encoder network, then maps the output of the last encoder layer to both the mean and standard deviation of the latent space.
Example
je = JointGaussianEncoder(...)
µ, σ = je(some_input)
Note
Ensure that the input x matches the expected dimensionality of the encoder's input layer.
Default encoder function for variational autoencoders where the same encoder network is used to map to the latent space mean µ and log standard deviation logσ.
Fields
encoder::Flux.Chain: The primary neural network used to process input data and map it into a latent space representation.
µ::Union{Flux.Dense,Flux.Chain}: A dense layer or a chain of layers mapping from the output of the encoder to the mean of the latent space.
logσ::Union{Flux.Dense,Flux.Chain}: A dense layer or a chain of layers mapping from the output of the encoder to the log standard deviation of the latent space.
This method forward propagates the input x through the JointGaussianLogEncoder to compute the mean (µ) and log standard deviation (logσ) of the latent space.
Arguments
x::Array{Float32}: The input data to be encoded.
Returns
A NamedTuple (µ=µ, logσ=logσ,) where:
µ: The mean of the latent space. This is computed by passing the input through the encoder and subsequently through the µ layer.
logσ: The log standard deviation of the latent space. This is computed by passing the input through the encoder and subsequently through the logσ layer.
Description
This method allows for a direct call on an instance of JointGaussianLogEncoder with the input data x. It first processes the input through the encoder network, then maps the output of the last encoder layer to both the mean and log standard deviation of the latent space.
Example
je = JointGaussianLogEncoder(...)
-mu, logσ = je(some_input)
Note
Ensure that the input x matches the expected dimensionality of the encoder's input layer.
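The shared-trunk design described above, one encoder feeding two output heads, can be sketched without Flux. `ToyJointLogEncoder` is a hypothetical stand-in that uses plain weight matrices where the real struct holds Flux layers.

```julia
# Hypothetical stand-in: one shared trunk, two linear heads.
struct ToyJointLogEncoder
    W::Matrix{Float32}       # shared encoder trunk
    W_µ::Matrix{Float32}     # head mapping to the latent mean
    W_logσ::Matrix{Float32}  # head mapping to the latent log standard deviation
end

# Make instances callable, mirroring the direct-call style of the docs.
function (enc::ToyJointLogEncoder)(x::AbstractVecOrMat{Float32})
    h = tanh.(enc.W * x)     # output of the shared trunk
    return (µ=enc.W_µ * h, logσ=enc.W_logσ * h)
end

enc = ToyJointLogEncoder(randn(Float32, 8, 5), randn(Float32, 2, 8), randn(Float32, 2, 8))
x = randn(Float32, 5, 4)     # four samples, one per column
µ, logσ = enc(x)             # NamedTuples destructure by position
σ = exp.(logσ)               # positive for any real-valued logσ
```

Predicting logσ rather than σ leaves the head unconstrained, since exponentiating always yields a positive standard deviation.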
Default decoder function for deterministic autoencoders. The decoder network is used to map the latent space representation directly back to the original data space.
Fields
decoder::Flux.Chain: The primary neural network used to process the latent space representation and map it back to the data space.
Example
dec = Decoder(Flux.Chain(Dense(20, 400, relu), Dense(400, 784)))
Forward propagate the encoded representation z through the Decoder to obtain the reconstructed input data.
Arguments
z::AbstractArray: Encoded representation in the latent space.
Returns
x_reconstructed: Reconstructed version of the original input data after decoding from the latent space.
Description
This method allows for a direct call on an instance of Decoder with the encoded data z. It runs the encoded representation through the decoder network and outputs the reconstructed version of the original input data.
Example
dec = Decoder(...)
x_reconstructed = dec(encoded_representation)
Note
Ensure that the input z matches the expected dimensionality of the decoder's input layer.
A decoder structure for variational autoencoders (VAEs) that models the output data as a Bernoulli distribution. This is typically used when the outputs of the decoder are probabilities.
Fields
decoder::Flux.Chain: The primary neural network used to process the latent space and map it to the output (or reconstructed) space.
Description
BernoulliDecoder represents a VAE decoder that models the output data as a Bernoulli distribution. It's commonly used when the outputs of the decoder are probabilities, such as in a binary classification task or when modeling binary data. Unlike a Gaussian decoder, there's no need for separate paths or operations on the mean or log standard deviation.
Note
Ensure the last layer of the decoder outputs a value between 0 and 1, as this is required for a Bernoulli distribution.
Maps the given latent representation z through the BernoulliDecoder network to reconstruct the original input.
Arguments
z::AbstractArray: The latent space representation to be decoded. This can be a vector or a matrix, where each column represents a separate sample from the latent space of a VAE.
Returns
A NamedTuple (p=p,) where p is an array representing the output of the decoder, which should resemble the original input to the VAE (post encoding and sampling from the latent space).
Description
This function processes the latent space representation z using the neural network defined in the BernoulliDecoder struct. The aim is to decode or reconstruct the original input from this representation.
Note
Ensure that the latent space representation z matches the expected input dimensionality for the BernoulliDecoder.
A decoder structure for variational autoencoders (VAEs) that models the output data as a categorical distribution. This is typically used when the outputs of the decoder are categorical variables encoded as one-hot vectors.
Fields
decoder::Flux.Chain: The primary neural network used to process the latent space and map it to the output (or reconstructed) space.
Description
CategoricalDecoder represents a VAE decoder that models the output data as a categorical distribution. It's commonly used when the outputs of the decoder are categorical variables, such as multi-class one-hot encoded vectors. Unlike a Gaussian decoder, there's no need for separate paths or operations on the mean or log standard deviation.
Note
Ensure the last layer of the decoder outputs a probability distribution over the categories, as this is required for a categorical distribution. This can be done using a softmax activation function, for example.
Maps the given latent representation z through the CategoricalDecoder network to reconstruct the original input.
Arguments
z::AbstractArray: The latent space representation to be decoded. This can be a vector or a matrix, where each column represents a separate sample from the latent space of a VAE.
Returns
A NamedTuple (p=p,) where p is an array representing the output of the decoder, which should resemble the original input to the VAE (post encoding and sampling from the latent space).
Description
This function processes the latent space representation z using the neural network defined in the CategoricalDecoder struct. The aim is to decode or reconstruct the original input from this representation.
Note
Ensure that the latent space representation z matches the expected input dimensionality for the CategoricalDecoder.
A straightforward decoder structure for variational autoencoders (VAEs) that contains only a single decoder network.
Fields
decoder::Flux.Chain: The primary neural network used to process the latent space and map it to the output (or reconstructed) space.
Description
SimpleGaussianDecoder represents a basic VAE decoder without explicit components for the latent space's mean (µ) or log standard deviation (logσ). It's commonly used when the VAE's latent space distribution is implicitly defined, and there's no need for separate paths or operations on the mean or log standard deviation.
Maps the given latent representation z through the SimpleGaussianDecoder network to reconstruct the original input.
Arguments
z::AbstractArray: The latent space representation to be decoded. This can be a vector or a matrix, where each column represents a separate sample from the latent space of a VAE.
Returns
A NamedTuple (µ=µ,) where µ is an array representing the output of the decoder, which should resemble the original input to the VAE (post encoding and sampling from the latent space).
Description
This function processes the latent space representation z using the neural network defined in the SimpleGaussianDecoder struct. The aim is to decode or reconstruct the original input from this representation.
Default decoder function for deterministic autoencoders. The decoder network is used to map the latent space representation directly back to the original data space.
Fields
decoder::Flux.Chain: The primary neural network used to process the latent space representation and map it back to the data space.
Example
dec = Decoder(Flux.Chain(Dense(20, 400, relu), Dense(400, 784)))
Forward propagate the encoded representation z through the Decoder to obtain the reconstructed input data.
Arguments
z::AbstractArray: Encoded representation in the latent space.
Returns
x_reconstructed: Reconstructed version of the original input data after decoding from the latent space.
Description
This method allows for a direct call on an instance of Decoder with the encoded data z. It runs the encoded representation through the decoder network and outputs the reconstructed version of the original input data.
Example
dec = Decoder(...)
x_reconstructed = dec(encoded_representation)
Note
Ensure that the input z matches the expected dimensionality of the decoder's input layer.
A decoder structure for variational autoencoders (VAEs) that models the output data as a Bernoulli distribution. This is typically used when the outputs of the decoder are probabilities.
Fields
decoder::Flux.Chain: The primary neural network used to process the latent space and map it to the output (or reconstructed) space.
Description
BernoulliDecoder represents a VAE decoder that models the output data as a Bernoulli distribution. It's commonly used when the outputs of the decoder are probabilities, such as in a binary classification task or when modeling binary data. Unlike a Gaussian decoder, there's no need for separate paths or operations on the mean or log standard deviation.
Note
Ensure the last layer of the decoder outputs a value between 0 and 1, as this is required for a Bernoulli distribution.
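One common way to satisfy this requirement is a sigmoid on the final layer. The sketch below is plain Julia with a hand-written `toy_sigmoid` (a hypothetical name). It also evaluates a Bernoulli log-likelihood to show why the outputs must lie strictly between 0 and 1.

```julia
# Logistic sigmoid: maps any real number into the open interval (0, 1).
toy_sigmoid(a) = 1 / (1 + exp(-a))

logits = Float32[-3.0, 0.0, 2.5]   # raw outputs of a final layer (assumed values)
p = toy_sigmoid.(logits)           # valid Bernoulli parameters

# Bernoulli log-likelihood of binary data x under p; log(p) and log(1 - p)
# are only finite when every entry of p is strictly inside (0, 1).
x = Float32[0, 1, 1]
loglik = sum(x .* log.(p) .+ (1 .- x) .* log.(1 .- p))
```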
Maps the given latent representation z through the BernoulliDecoder network to reconstruct the original input.
Arguments
z::AbstractArray: The latent space representation to be decoded. This can be a vector or a matrix, where each column represents a separate sample from the latent space of a VAE.
Returns
A NamedTuple (p=p,) where p is an array representing the output of the decoder, which should resemble the original input to the VAE (post encoding and sampling from the latent space).
Description
This function processes the latent space representation z using the neural network defined in the BernoulliDecoder struct. The aim is to decode or reconstruct the original input from this representation.
Note
Ensure that the latent space representation z matches the expected input dimensionality for the BernoulliDecoder.
A decoder structure for variational autoencoders (VAEs) that models the output data as a categorical distribution. This is typically used when the outputs of the decoder are categorical variables encoded as one-hot vectors.
Fields
decoder::Flux.Chain: The primary neural network used to process the latent space and map it to the output (or reconstructed) space.
Description
CategoricalDecoder represents a VAE decoder that models the output data as a categorical distribution. It's commonly used when the outputs of the decoder are categorical variables, such as multi-class labels encoded as one-hot vectors. Unlike a Gaussian decoder, there's no need for separate paths or operations on the mean or log standard deviation.
Note
Ensure the last layer of the decoder outputs a probability distribution over the categories, as this is required for a categorical distribution. This can be done using a softmax activation function, for example.
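To illustrate the softmax suggestion in the note above, here is a small sketch of a column-wise softmax written in plain Julia. The helper name `softmax_columns` and the example values are assumptions for illustration; in practice a softmax provided by the deep learning framework would normally be used.

```julia
# Numerically stable softmax applied to each column of a matrix.
function softmax_columns(logits::AbstractMatrix)
    shifted = logits .- maximum(logits; dims=1)  # subtract column max for stability
    expd = exp.(shifted)
    return expd ./ sum(expd; dims=1)
end

logits = [1.0 0.0; 2.0 0.0; 3.0 0.0]   # 3 categories × 2 samples (made-up values)
p = softmax_columns(logits)
println(sum(p; dims=1))                 # each column sums to 1 (up to rounding)
```

Each column of `p` is then a valid probability vector over the categories, which is what a categorical likelihood expects.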
Maps the given latent representation z through the CategoricalDecoder network to reconstruct the original input.
Arguments
z::AbstractArray: The latent space representation to be decoded. This can be a vector or a matrix, where each column represents a separate sample from the latent space of a VAE.
Returns
A NamedTuple (p=p,) where p is an array representing the output of the decoder, which should resemble the original input to the VAE (post encoding and sampling from the latent space).
Description
This function processes the latent space representation z using the neural network defined in the CategoricalDecoder struct. The aim is to decode or reconstruct the original input from this representation.
Note
Ensure that the latent space representation z matches the expected input dimensionality for the CategoricalDecoder.
A straightforward decoder structure for variational autoencoders (VAEs) that contains only a single decoder network.
Fields
decoder::Flux.Chain: The primary neural network used to process the latent space and map it to the output (or reconstructed) space.
Description
SimpleGaussianDecoder represents a basic VAE decoder without explicit components for the latent space's mean (µ) or log standard deviation (logσ). It's commonly used when the VAE's latent space distribution is implicitly defined, and there's no need for separate paths or operations on the mean or log standard deviation.
Maps the given latent representation z through the SimpleGaussianDecoder network to reconstruct the original input.
Arguments
z::AbstractArray: The latent space representation to be decoded. This can be a vector or a matrix, where each column represents a separate sample from the latent space of a VAE.
Returns
A NamedTuple (µ=µ,) where µ is an array representing the output of the decoder, which should resemble the original input to the VAE (post encoding and sampling from the latent space).
Description
This function processes the latent space representation z using the neural network defined in the SimpleGaussianDecoder struct. The aim is to decode or reconstruct the original input from this representation.
Example
decoder = SimpleGaussianDecoder(...)
z = ... # some latent space representation
-output = decoder(z)
Note
Ensure that the latent space representation z matches the expected input dimensionality for the SimpleGaussianDecoder.
An extended decoder structure for VAEs that incorporates separate layers for mapping from the latent space to both its mean (µ) and standard deviation (σ).
Fields
decoder::Flux.Chain: The primary neural network used to process the latent space before determining its mean and standard deviation.
µ::Flux.Dense: A dense layer that maps from the output of the decoder to the mean of the latent space.
σ::Flux.Dense: A dense layer that maps from the output of the decoder to the standard deviation of the latent space.
Description
JointGaussianDecoder is tailored for VAE architectures where the same decoder network is used initially, and then splits into two separate paths for determining both the mean and standard deviation of the latent space.
Maps the given latent representation z through the JointGaussianDecoder network to produce both the mean (µ) and standard deviation (σ).
Arguments
z::AbstractArray: The latent space representation to be decoded. If array, the last dimension contains each of the latent space representations to be decoded.
Returns
A NamedTuple (µ=µ, σ=σ,) where:
µ::AbstractArray: The mean representation obtained from the decoder.
σ::AbstractArray: The standard deviation representation obtained from the decoder.
Description
This function processes the latent space representation z using the primary neural network of the JointGaussianDecoder struct. It then separately maps the output of this network to the mean and standard deviation using the µ and σ dense layers, respectively.
Example
decoder = JointGaussianDecoder(...)
z = ... # some latent space representation
-output = decoder(z)
Note
Ensure that the latent space representation z matches the expected input dimensionality for the JointGaussianDecoder.
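The shared-network-then-two-heads layout described above can be sketched without any dependencies. Everything below (`DenseSketch`, `JointDecoderSketch`, the layer sizes, and the use of `softplus` to keep σ positive) is a hypothetical stand-in for illustration, not the package's own implementation.

```julia
using Random

# Minimal dense layer standing in for Flux.Dense (hypothetical, for illustration).
struct DenseSketch{F}
    W::Matrix{Float64}
    b::Vector{Float64}
    act::F
end
DenseSketch(n_in::Int, n_out::Int, act=identity; rng=Random.default_rng()) =
    DenseSketch(0.1 .* randn(rng, n_out, n_in), zeros(n_out), act)
(d::DenseSketch)(x) = d.act.(d.W * x .+ d.b)

softplus(x) = log1p(exp(x))   # keeps σ strictly positive (assumed choice)

struct JointDecoderSketch
    trunk::DenseSketch
    µ::DenseSketch
    σ::DenseSketch
end

# Shared trunk first, then two separate heads for µ and σ.
function (dec::JointDecoderSketch)(z)
    h = dec.trunk(z)
    return (µ=dec.µ(h), σ=dec.σ(h))
end

rng = Random.MersenneTwister(1)
dec = JointDecoderSketch(DenseSketch(2, 8, tanh; rng=rng),
                         DenseSketch(8, 5; rng=rng),
                         DenseSketch(8, 5, softplus; rng=rng))
out = dec(randn(rng, 2, 4))   # 4 latent samples stored as columns
println(size(out.µ), " ", size(out.σ))
```

The call returns a NamedTuple with fields `µ` and `σ`, mirroring the return shape documented above.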
An extended decoder structure for VAEs that incorporates separate layers for mapping from the latent space to both its mean (µ) and log standard deviation (logσ).
Fields
decoder::Flux.Chain: The primary neural network used to process the latent space before determining its mean and log standard deviation.
µ::Flux.Dense: A dense layer that maps from the output of the decoder to the mean of the latent space.
logσ::Flux.Dense: A dense layer that maps from the output of the decoder to the log standard deviation of the latent space.
Description
JointGaussianLogDecoder is tailored for VAE architectures where the same decoder network is used initially, and then splits into two separate paths for determining both the mean and log standard deviation of the latent space.
Maps the given latent representation z through the JointGaussianLogDecoder network to produce both the mean (µ) and log standard deviation (logσ).
Arguments
z::AbstractArray: The latent space representation to be decoded. If array, the last dimension contains each of the latent space representations.
Returns
A NamedTuple (µ=µ, logσ=logσ,) where:
µ::Array: The mean representation obtained from the decoder.
logσ::Array: The log standard deviation representation obtained from the decoder.
Description
This function processes the latent space representation z using the primary neural network of the JointGaussianLogDecoder struct. It then separately maps the output of this network to the mean and log standard deviation using the µ and logσ dense layers, respectively.
Example
decoder = JointGaussianLogDecoder(...)
z = ... # some latent space representation
-output = decoder(z)
Note
Ensure that the latent space representation z matches the expected input dimensionality for the JointGaussianLogDecoder.
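One reason to have a decoder return logσ rather than σ is that logσ may take any real value, so no positivity constraint is needed on the final layer. The sketch below, with hypothetical function names, shows that a diagonal Gaussian log-likelihood can be written in either parameterization and gives the same value.

```julia
# Log-density of x under a diagonal Gaussian parameterized by µ and σ.
function gaussian_loglikelihood(x, µ, σ)
    return sum(@. -0.5 * log(2π) - log(σ) - 0.5 * ((x - µ) / σ)^2)
end

# Same quantity parameterized by logσ, so no log of a network output is taken.
function gaussian_loglikelihood_log(x, µ, logσ)
    return sum(@. -0.5 * log(2π) - logσ - 0.5 * ((x - µ) * exp(-logσ))^2)
end

x = [0.5, -1.0]
µ = [0.0, 0.0]
logσ = [0.0, log(2.0)]
println(gaussian_loglikelihood(x, µ, exp.(logσ)))
println(gaussian_loglikelihood_log(x, µ, logσ))
```

Both calls print the same number up to floating-point rounding.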
A specialized decoder structure for VAEs that uses distinct neural networks for determining the mean (µ) and standard deviation (σ) of the latent space.
Fields
decoder_µ::Flux.Chain: A neural network dedicated to processing the latent space and mapping it to its mean.
decoder_σ::Flux.Chain: A neural network dedicated to processing the latent space and mapping it to its standard deviation.
Description
SplitGaussianDecoder is designed for VAE architectures where separate decoder networks are preferred for computing the mean and standard deviation, ensuring that each has its own distinct set of parameters and transformation logic.
Maps the given latent representation z through the separate networks of the SplitGaussianDecoder to produce both the mean (µ) and standard deviation (σ).
Arguments
z::AbstractArray: The latent space representation to be decoded. If array, the last dimension contains each of the latent space representations to be decoded.
Returns
A NamedTuple (µ=µ, σ=σ,) where:
µ::AbstractArray: The mean representation obtained using the dedicated decoder_µ network.
σ::AbstractArray: The standard deviation representation obtained using the dedicated decoder_σ network.
Description
This function processes the latent space representation z through two distinct neural networks within the SplitGaussianDecoder struct. The decoder_µ network is used to produce the mean representation, while the decoder_σ network is utilized for the standard deviation.
Example
decoder = SplitGaussianDecoder(...)
z = ... # some latent space representation
-output = decoder(z)
Note
Ensure that the latent space representation z matches the expected input dimensionality for both networks in the SplitGaussianDecoder.
A specialized decoder structure for VAEs that uses distinct neural networks for determining the mean (µ) and log standard deviation (logσ) of the latent space.
Fields
decoder_µ::Flux.Chain: A neural network dedicated to processing the latent space and mapping it to its mean.
decoder_logσ::Flux.Chain: A neural network dedicated to processing the latent space and mapping it to its log standard deviation.
Description
SplitGaussianLogDecoder is designed for VAE architectures where separate decoder networks are preferred for computing the mean and log standard deviation, ensuring that each has its own distinct set of parameters and transformation logic.
Maps the given latent representation z through the separate networks of the SplitGaussianLogDecoder to produce both the mean (µ) and log standard deviation (logσ).
Arguments
z::AbstractArray: The latent space representation to be decoded. If array, the last dimension contains each of the latent space representations to be decoded.
Returns
A NamedTuple (µ=µ, logσ=logσ,) where:
µ::AbstractArray: The mean representation obtained using the dedicated decoder_µ network.
logσ::AbstractArray: The log standard deviation representation obtained using the dedicated decoder_logσ network.
Description
This function processes the latent space representation z through two distinct neural networks within the SplitGaussianLogDecoder struct. The decoder_µ network is used to produce the mean representation, while the decoder_logσ network is utilized for the log standard deviation.
Example
decoder = SplitGaussianLogDecoder(...)
z = ... # some latent space representation
output = decoder(z)
Note
Ensure that the latent space representation z matches the expected input dimensionality for both networks in the SplitGaussianLogDecoder.
The package provides a set of functions to initialize encoder and decoder architectures. Although it gives the user less flexibility, it can be useful for quick prototyping.
Construct and initialize an Encoder struct that defines an encoder network for a deterministic autoencoder.
Arguments
n_input::Int: The dimensionality of the input data.
n_latent::Int: The dimensionality of the latent space.
encoder_neurons::Vector{<:Int}: A vector specifying the number of neurons in each layer of the encoder network.
encoder_activation::Vector{<:Function}: Activation functions corresponding to each layer in the encoder_neurons.
latent_activation::Function: Activation function for the latent space layer.
Optional Keyword Arguments
init::Function=Flux.glorot_uniform: The initialization function used for the neural network weights.
Returns
An Encoder struct initialized based on the provided arguments.
Examples
encoder = Encoder(784, 20, [400], [relu], tanh)
Notes
The length of encoder_neurons should match the length of encoder_activation, ensuring that each layer in the encoder has a corresponding activation function.
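The length requirement in the note above can be checked before any layers are built. The helper below is a hypothetical sketch (it is not part of the package) that validates the two vectors and lists the input and output size of each dense layer an encoder with those settings would need.

```julia
# Hypothetical helper: validate the arguments and pair up layer input/output sizes.
function encoder_layer_dims(n_input::Int, n_latent::Int,
                            neurons::Vector{<:Int}, activations::Vector{<:Function})
    length(neurons) == length(activations) ||
        error("Each hidden layer needs exactly one activation function")
    sizes = [n_input; neurons; n_latent]
    return [sizes[i] => sizes[i + 1] for i in 1:length(sizes) - 1]
end

dims = encoder_layer_dims(784, 20, [400, 100], [tanh, tanh])
println(dims)   # three pairs: 784 => 400, 400 => 100, 100 => 20
```

The last pair maps into the latent space, which is why the latent activation is passed separately from the per-layer activations.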
The length of decoder_neurons should match the length of decoder_activation, ensuring that each layer in the decoder has a corresponding activation function.
Constructs and initializes a SimpleGaussianDecoder object designed for variational autoencoders (VAEs). This function sets up a straightforward decoder network that maps from a latent space to an output space.
Arguments
n_input::Int: Dimensionality of the output data (or the data to be reconstructed).
n_latent::Int: Dimensionality of the latent space.
decoder_neurons::Vector{<:Int}: Vector of layer sizes for the decoder network, not including the input latent layer and the final output layer.
decoder_activation::Vector{<:Function}: Activation functions for each decoder layer, not including the final output layer.
output_activation::Function: Activation function for the final output layer.
Optional Keyword Arguments
init::Function=Flux.glorot_uniform: Initialization function for the network parameters.
Returns
A SimpleGaussianDecoder object with the specified architecture and initialized weights.
Description
This function constructs a SimpleGaussianDecoder object, setting up its decoder network based on the provided specifications. The architecture begins with a dense layer mapping from the latent space, goes through a sequence of middle layers if specified, and finally maps to the output space.
The function ensures that there are appropriate activation functions provided for each layer in the decoder_neurons and checks for potential mismatches in length.
The length of decoder_neurons should match the length of decoder_activation, ensuring that each layer in the decoder has a corresponding activation function.
where we use the loggamma function from SpecialFunctions.jl to compute the log of the factorial of x_i.
Warning
We only defined the decoder_loglikelihood method for z::AbstractVector. One should also include a method for z::AbstractMatrix used when performing batch training.
With these two functions defined, our PoissonDecoder is ready to be used with any of the different VAE flavors included in AutoEncoderToolkit.jl!
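The warning above about batch training can be illustrated with a dependency-free sketch. The function `poisson_loglikelihood` and its arguments are hypothetical and operate directly on rates λ; the package's decoder_loglikelihood has its own signature, which is not reproduced here. The helper `logfactorial` stands in for loggamma from SpecialFunctions.jl so that the snippet runs with Base alone.

```julia
# log(k!) for a non-negative integer count, using only Base.
# (Stand-in for loggamma from SpecialFunctions.jl to avoid the dependency.)
logfactorial(k::Integer) = sum(log, 1:k; init=0.0)

# Poisson log-likelihood of one sample of counts x given rates λ.
function poisson_loglikelihood(x::AbstractVector{<:Integer}, λ::AbstractVector)
    return sum(@. x * log(λ) - λ - logfactorial(x))
end

# Batch method: each column is a sample; the total sums over all columns.
function poisson_loglikelihood(x::AbstractMatrix{<:Integer}, λ::AbstractMatrix)
    return sum(poisson_loglikelihood(x[:, j], λ[:, j]) for j in axes(x, 2))
end

x = [0, 1, 3]
λ = [0.5, 1.0, 2.0]
println(poisson_loglikelihood(x, λ))
println(poisson_loglikelihood(hcat(x, x), hcat(λ, λ)))
```

Defining the matrix method next to the vector method lets multiple dispatch pick the right one for single samples and for batches.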
Settings
This document was generated with Documenter.jl version 1.4.1 on Friday 21 June 2024. Using Julia version 1.10.4.
+end # function
where we use the loggamma function from SpecialFunctions.jl to compute the log of the factorial of x_i.
Warning
We only defined the decoder_loglikelihood method for z::AbstractVector. One should also include a method for z::AbstractMatrix used when performing batch training.
With these two functions defined, our PoissonDecoder is ready to be used with any of the different VAE flavors included in AutoEncoderToolkit.jl!
Settings
This document was generated with Documenter.jl version 1.5.0 on Monday 8 July 2024. Using Julia version 1.10.4.
For support and further inquiries, consider checking the documentation and existing issues on the GitHub repository. If you still do not find the answer, you can open a new issue on the GitHub repository's issues page.
The Hamiltonian Variational Autoencoder (HVAE) is a variant of the Variational Autoencoder (VAE) that uses Hamiltonian dynamics to improve the sampling of the latent space representation. HVAE combines ideas from Hamiltonian Monte Carlo, annealed importance sampling, and variational inference to improve the latent space representation of the VAE.
For the implementation of the HVAE in AutoEncoderToolkit.jl, the HVAE struct inherits directly from the VAE struct and adds the necessary functions to compute the Hamiltonian dynamics steps as part of the training protocol. An HVAE object is created by simply passing a VAE object to the constructor. This way, we can use Julia's multiple dispatch to extend the functionality of the VAE object without having to redefine the entire structure.
Warning
HVAEs require the computation of nested gradients. This means that the AutoDiff framework must differentiate a function of an already AutoDiff differentiated function. This is known to be problematic for Julia's AutoDiff backends. See details below to understand how we circumvent this problem.
Hamiltonian Variational Autoencoder (HVAE) model defined for Flux.jl.
Fields
vae::V: A Variational Autoencoder (VAE) model that forms the basis of the HVAE. V is a subtype of VAE with a specific AbstractVariationalEncoder and AbstractVariationalDecoder.
An HVAE is a type of Variational Autoencoder (VAE) that uses Hamiltonian Monte Carlo (HMC) to sample from the posterior distribution in the latent space. The VAE's encoder compresses the input into a low-dimensional probabilistic representation q(z|x). The VAE's decoder tries to reconstruct the original input from a sampled point in the latent space p(x|z).
The HMC sampling in the latent space allows the HVAE to better capture complex posterior distributions compared to a standard VAE, which assumes a simple Gaussian posterior. This can lead to more accurate reconstructions and better disentanglement of latent variables.
Compute the Hamiltonian Monte Carlo (HMC) estimate of the evidence lower bound (ELBO) for a Hamiltonian Variational Autoencoder (HVAE).
This function takes as input an HVAE and a vector of input data x. It performs K HMC steps with a leapfrog integrator and a tempering schedule to estimate the ELBO. The ELBO is computed as the difference between the log p̄ and log q̄ as
elbo = mean(log p̄ - log q̄),
Arguments
hvae::HVAE: The HVAE used to encode the input data and decode the latent space.
x_in::AbstractArray: The input data. If Array, the last dimension must contain each of the data points.
x_out::AbstractArray: The data against which the reconstruction is compared. If Array, the last dimension must contain each of the data points.
Optional Keyword Arguments
ϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is 0.01).
K::Int: The number of HMC steps (default is 3).
βₒ::Number: The initial inverse temperature (default is 0.3).
∇U_kwargs::Union{Dict,NamedTuple}: Additional keyword arguments to be passed to the ∇potential_energy function. Defaults to a NamedTuple with :reconstruction_loglikelihood set to decoder_loglikelihood and :latent_logprior set to spherical_logprior.
tempering_schedule::Function: The tempering schedule function used in the HMC (default is quadratic_tempering).
return_outputs::Bool: Whether to return the outputs of the HVAE. Defaults to false. NOTE: This is necessary to avoid computing the forward pass twice when computing the loss function with regularization.
logp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.
logq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. Default is an array of ones.
Returns
elbo::Number: The HMC estimate of the ELBO. If return_outputs is true, also returns the outputs of the HVAE.
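The leapfrog integrator and tempering schedule mentioned above can be sketched on a toy problem. This is not the package's implementation: the function names are hypothetical, the potential is a simple quadratic, and the schedule formula is an assumption based on the quadratic tempering scheme described in the HVAE literature, in which the inverse temperature rises from βₒ at step 0 to 1 at step K.

```julia
# Quadratic tempering schedule (assumed form): β rises from βₒ at k = 0 to 1 at k = K.
function quadratic_tempering_sketch(βₒ::Real, k::Int, K::Int)
    a = 1 / sqrt(βₒ)
    return ((1 - a) * (k / K)^2 + a)^(-2)
end

# One leapfrog step for potential-energy gradient ∇U, position z, momentum ρ.
function leapfrog_step(z, ρ, ϵ, ∇U)
    ρ_half = ρ .- (ϵ / 2) .* ∇U(z)           # half step in momentum
    z_new = z .+ ϵ .* ρ_half                 # full step in position
    ρ_new = ρ_half .- (ϵ / 2) .* ∇U(z_new)   # second half step in momentum
    return z_new, ρ_new
end

# Toy potential U(z) = ½‖z‖², so ∇U(z) = z (illustrative only).
∇U(z) = z

function tempered_trajectory(z, ρ; ϵ=0.01, K=3, βₒ=0.3)
    for k in 1:K
        z, ρ = leapfrog_step(z, ρ, ϵ, ∇U)
        # Rescale the momentum as the inverse temperature increases
        ρ = ρ .* sqrt(quadratic_tempering_sketch(βₒ, k - 1, K) /
                      quadratic_tempering_sketch(βₒ, k, K))
    end
    return z, ρ
end

z, ρ = tempered_trajectory([1.0, -0.5], [0.2, 0.1])
println(z, " ", ρ)
```

The keyword defaults mirror the documented defaults (ϵ = 0.01, K = 3, βₒ = 0.3). In the HVAE itself the potential energy depends on the decoder and the prior rather than on a fixed quadratic.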
+)
Welcome to the AutoEncoderToolkit.jl documentation. This package provides a simple interface for training and using Flux.jl-based autoencoders and variational autoencoders in Julia.
The idea behind AutoEncoderToolkit.jl is to take advantage of Julia's multiple dispatch to provide a simple and flexible interface for training and using different types of autoencoders. The package is designed to be modular and allow the user to easily define and test custom encoder and decoder architectures. Moreover, when it comes to variational autoencoders, AutoEncoderToolkit.jl takes a probabilistic perspective, where the type of encoders and decoders defines (via multiple dispatch) the corresponding distribution used within the corresponding loss function.
For example, assume you want to train a variational autoencoder with convolutional layers in the encoder and deconvolutional layers in the decoder on the MNIST dataset. You can easily do this as follows:
Let's begin by defining the encoder. For this, we will use the JointGaussianLogEncoder type, which is a simple encoder that takes a Flux.Chain for the shared layers between the mean and log-standard-deviation layers and two Flux.Dense (or Flux.Chain) layers for the last layers of the encoder.
# Define dimensionality of latent space
n_latent = 2
# Define number of initial channels
@@ -49,4 +49,4 @@
decoder = AutoEncoderToolkit.SimpleGaussianDecoder(deconv_layers)
# Re-defining the variational autoencoder
-vae = encoder * decoder
Everything else in our training pipeline would remain the same thanks to multiple dispatch.
Furthermore, let's say that we would like to use a different flavor for our variational autoencoder. In particular the InfoVAE (also known as MMD-VAE) includes extra terms in the loss function to maximize mutual information between the latent space and the input data. We can easily take our vae model and convert it into a MMDVAE-type object from the MMDVAEs submodule as follows:
mmdvae = AutoEncoderToolkit.MMDVAEs.MMDVAE(vae)
This is the power of AutoEncoderToolkit.jl and Julia's multiple dispatch!
If you are interested in contributing to the package to add a new model, please check the GitHub repository. We are always looking to expand the list of available models. And AutoEncoderToolkit.jl's structure should make it relatively easy.
AutoEncoderToolkit.jl supports GPU training out of the box for CUDA.jl-compatible GPUs. The CUDA functionality is provided as an extension. Therefore, to train a model on the GPU, simply import CUDA into the current environment, then move the model and data to the GPU. The rest of the training pipeline remains the same.
Settings
This document was generated with Documenter.jl version 1.4.1 on Friday 21 June 2024. Using Julia version 1.10.4.
+vae = encoder * decoder
Everything else in our training pipeline would remain the same thanks to multiple dispatch.
Furthermore, let's say that we would like to use a different flavor for our variational autoencoder. In particular the InfoVAE (also known as MMD-VAE) includes extra terms in the loss function to maximize mutual information between the latent space and the input data. We can easily take our vae model and convert it into a MMDVAE-type object from the MMDVAEs submodule as follows:
mmdvae = AutoEncoderToolkit.MMDVAEs.MMDVAE(vae)
This is the power of AutoEncoderToolkit.jl and Julia's multiple dispatch!
If you are interested in contributing to the package to add a new model, please check the GitHub repository. We are always looking to expand the list of available models. And AutoEncoderToolkit.jl's structure should make it relatively easy.
AutoEncoderToolkit.jl supports GPU training out of the box for CUDA.jl-compatible GPUs. The CUDA functionality is provided as an extension. Therefore, to train a model on the GPU, simply import CUDA into the current environment, then move the model and data to the GPU. The rest of the training pipeline remains the same.
Settings
This document was generated with Documenter.jl version 1.5.0 on Monday 8 July 2024. Using Julia version 1.10.4.
The InfoMax VAE is a variant of the Variational Autoencoder (VAE) that aims to explicitly account for the maximization of mutual information between the latent space representation and the input data. The main difference between the InfoMax VAE and the MMD-VAE (InfoVAE) is that rather than using the Maximum-Mean Discrepancy (MMD) as a measure of the "distance" between the latent space, the InfoMax VAE explicitly models the mutual information between latent representations and data inputs via a separate neural network. The loss function for this separate network then takes the form of a variational lower bound on the mutual information between the latent space and the input data.
Because this separate network is required, the InfoMaxVAE struct in AutoEncoderToolkit.jl takes two arguments to construct: the original VAE struct and a network to compute the mutual information. To properly deploy all relevant functions associated with this second network, we also provide a MutualInfoChain struct.
Furthermore, because of the two networks and the way the training algorithm is set up, the loss function for the InfoMax VAE includes two separate loss functions: one for the MutualInfoChain and one for the InfoMaxVAE.
Rezaabad, A. L. & Vishwanath, S. Learning Representations by Maximizing Mutual Information in Variational Autoencoders. Preprint at http://arxiv.org/abs/1912.13361 (2020).
A MutualInfoChain is used to compute the variational mutual information when training an InfoMaxVAE. The chain is composed of a series of layers that must end with a single output: the mutual information between the latent variables and the input data.
Arguments
data::Union{Flux.Dense,Flux.Chain}: The data layer of the MutualInfoChain. This layer is used to input the data.
latent::Union{Flux.Dense,Flux.Chain}: The latent layer of the MutualInfoChain. This layer is used to input the latent variables.
mlp::Flux.Chain: A multi-layer perceptron (MLP) that is used to compute the mutual information between the inputs and the latent representations. The MLP takes as input the latent variables and outputs a scalar representing the estimated variational mutual information.
Citation
Rezaabad, A. L. & Vishwanath, S. Learning Representations by Maximizing Mutual Information in Variational Autoencoders. in 2020 IEEE International Symposium on Information Theory (ISIT) 2729–2734 (IEEE, 2020). doi:10.1109/ISIT44484.2020.9174424.
Note
If the input data is not a flat array, make sure to include a flattening layer within data.
struct encapsulating an InfoMax variational autoencoder (InfoMaxVAE), an architecture designed to enhance the VAE framework by maximizing mutual information between the inputs and the latent representations, as per the methods described by Rezaabad and Vishwanath (2020).
The model aims to learn representations that preserve mutual information with the input data, arguably capturing more meaningful factors of variation.
Fields
vae::VAE: The core variational autoencoder, consisting of an encoder that maps input data into a latent space representation, and a decoder that attempts to reconstruct the input from the latent representation.
mi::MutualInfoChain: A multi-layer perceptron (MLP) that estimates the mutual information between the input data and the latent representations.
Usage
The InfoMaxVAE struct is utilized in a similar manner to a standard VAE, with the added capability of mutual information maximization as part of the training process. This involves an additional loss term that considers the output of the mi network to encourage latent representations that are informative about the input data.
Example
# Assuming definitions for `encoder`, `decoder`, and `mi` are provided:
+InfoMax-VAE · AutoEncoderToolkit
The InfoMax VAE is a variant of the Variational Autoencoder (VAE) that aims to explicitly account for the maximization of mutual information between the latent space representation and the input data. The main difference between the InfoMax VAE and the MMD-VAE (InfoVAE) is that rather than using the Maximum-Mean Discrepancy (MMD) as a measure of the "distance" between the latent space, the InfoMax VAE explicitly models the mutual information between latent representations and data inputs via a separate neural network. The loss function for this separate network then takes the form of a variational lower bound on the mutual information between the latent space and the input data.
Because this separate network is required, the InfoMaxVAE struct in AutoEncoderToolkit.jl takes two arguments to construct: the original VAE struct and a network to compute the mutual information. To properly deploy all relevant functions associated with this second network, we also provide a MutualInfoChain struct.
Furthermore, because of the two networks and the way the training algorithm is set up, the loss function for the InfoMax VAE includes two separate loss functions: one for the MutualInfoChain and one for the InfoMaxVAE.
Rezaabad, A. L. & Vishwanath, S. Learning Representations by Maximizing Mutual Information in Variational Autoencoders. Preprint at http://arxiv.org/abs/1912.13361 (2020).
A MutualInfoChain is used to compute the variational mutual information when training an InfoMaxVAE. The chain is composed of a series of layers that must end with a single output: the mutual information between the latent variables and the input data.
Arguments
data::Union{Flux.Dense,Flux.Chain}: The data layer of the MutualInfoChain. This layer is used to input the data.
latent::Union{Flux.Dense,Flux.Chain}: The latent layer of the MutualInfoChain. This layer is used to input the latent variables.
mlp::Flux.Chain: A multi-layer perceptron (MLP) that is used to compute the mutual information between the inputs and the latent representations. The MLP takes as input the latent variables and outputs a scalar representing the estimated variational mutual information.
Citation
Rezaabad, A. L. & Vishwanath, S. Learning Representations by Maximizing Mutual Information in Variational Autoencoders. in 2020 IEEE International Symposium on Information Theory (ISIT) 2729–2734 (IEEE, 2020). doi:10.1109/ISIT44484.2020.9174424.
Note
If the input data is not a flat array, make sure to include a flattening layer within data.
struct encapsulating an InfoMax variational autoencoder (InfoMaxVAE), an architecture designed to enhance the VAE framework by maximizing mutual information between the inputs and the latent representations, as per the methods described by Rezaabad and Vishwanath (2020).
The model aims to learn representations that preserve mutual information with the input data, arguably capturing more meaningful factors of variation.
Fields
vae::VAE: The core variational autoencoder, consisting of an encoder that maps input data into a latent space representation, and a decoder that attempts to reconstruct the input from the latent representation.
mi::MutualInfoChain: A multi-layer perceptron (MLP) that estimates the mutual information between the input data and the latent representations.
Usage
The InfoMaxVAE struct is utilized in a similar manner to a standard VAE, with the added capability of mutual information maximization as part of the training process. This involves an additional loss term that considers the output of the mi network to encourage latent representations that are informative about the input data.
Example
# Assuming definitions for `encoder`, `decoder`, and `mi` are provided:
info_max_vae = InfoMaxVAE(VAE(encoder, decoder), mi)
# During training, one would maximize both the variational lower bound and the
@@ -64,4 +64,4 @@
mlp_activations::Vector{<:Function},
output_activation::Function;
init::Function = Flux.glorot_uniform
-)
Constructs a default MutualInfoChain.
Arguments
n_input::Int: Number of input features to the MutualInfoChain.
n_latent::Int: The dimensionality of the latent space.
mlp_neurons::Vector{<:Int}: A vector of integers where each element represents the number of neurons in the corresponding hidden layer of the MLP.
mlp_activations::Vector{<:Function}: A vector of activation functions to be used in the hidden layers. Length must match that of mlp_neurons.
output_activation::Function: Activation function for the output neuron of the MLP.
Optional Keyword Arguments
init::Function: Initialization function for the weights of all layers in the MutualInfoChain. Defaults to Flux.glorot_uniform.
Returns
MutualInfoChain: A MutualInfoChain instance with the specified MLP architecture.
Notes
The function will throw an error if the number of provided activation functions does not match the number of layers specified in mlp_neurons.
Settings
This document was generated with Documenter.jl version 1.4.1 on Friday 21 June 2024. Using Julia version 1.10.4.
+)
Constructs a default MutualInfoChain.
Arguments
n_input::Int: Number of input features to the MutualInfoChain.
n_latent::Int: The dimensionality of the latent space.
mlp_neurons::Vector{<:Int}: A vector of integers where each element represents the number of neurons in the corresponding hidden layer of the MLP.
mlp_activations::Vector{<:Function}: A vector of activation functions to be used in the hidden layers. Length must match that of mlp_neurons.
output_activation::Function: Activation function for the output neuron of the MLP.
Optional Keyword Arguments
init::Function: Initialization function for the weights of all layers in the MutualInfoChain. Defaults to Flux.glorot_uniform.
Returns
MutualInfoChain: A MutualInfoChain instance with the specified MLP architecture.
Notes
The function will throw an error if the number of provided activation functions does not match the number of layers specified in mlp_neurons.
Settings
This document was generated with Documenter.jl version 1.5.0 on Monday 8 July 2024. Using Julia version 1.10.4.
AutoEncoderToolkit.jl provides a set of commonly-used custom layers for building autoencoders. These layers need to be explicitly defined if you want to save a trained model and load it later. For example, if the input to the encoder is an image in format HWC (height, width, channel), somewhere in the encoder there must be a function that flattens its input to a vector for the mapping to the latent space to be possible. If you were to define this with a simple anonymous function, the libraries used to save the model, such as JLD2 or BSON, would not work with it. This is why we provide this set of custom layers that play along with these libraries.
A custom layer for Flux that reshapes its input to a specified shape.
This layer is useful when you need to change the dimensions of your data within a Flux model. Unlike the built-in reshape operation in Julia, this custom layer can be saved and loaded using packages such as BSON or JLD2.
Arguments
shape: The target shape. This can be any tuple of integers and colons. Colons are used to indicate dimensions whose size should be inferred such that the total number of elements remains the same.
Examples
julia> r = Reshape(10, :)
+Custom Layers · AutoEncoderToolkit
AutoEncoderToolkit.jl provides a set of commonly-used custom layers for building autoencoders. These layers need to be explicitly defined if you want to save a trained model and load it later. For example, if the input to the encoder is an image in format HWC (height, width, channel), somewhere in the encoder there must be a function that flattens its input to a vector for the mapping to the latent space to be possible. If you were to define this with a simple anonymous function, the libraries used to save the model, such as JLD2 or BSON, would not work with it. This is why we provide this set of custom layers that play along with these libraries.
A custom layer for Flux that reshapes its input to a specified shape.
This layer is useful when you need to change the dimensions of your data within a Flux model. Unlike the built-in reshape operation in Julia, this custom layer can be saved and loaded using packages such as BSON or JLD2.
Arguments
shape: The target shape. This can be any tuple of integers and colons. Colons are used to indicate dimensions whose size should be inferred such that the total number of elements remains the same.
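The colon behavior described above can be sketched with plain Julia. `ReshapeSketch` below is a hypothetical stand-in written for illustration only; it is not the package's `Reshape` type and makes no claim about how that layer is implemented. It only shows how a named callable struct (as opposed to an anonymous function) can forward to `Base.reshape`, which infers the size of any `:` dimension.

```julia
# Hypothetical stand-in for a reshape layer; NOT the package's `Reshape` type.
struct ReshapeSketch{S<:Tuple}
    shape::S
end

# Convenience constructor accepting integers and colons as separate arguments.
ReshapeSketch(dims::Union{Int,Colon}...) = ReshapeSketch(dims)

# Calling the struct forwards to Base.reshape, which infers any `:` dimension
# so that the total number of elements is preserved.
(r::ReshapeSketch)(x::AbstractArray) = reshape(x, r.shape...)

r = ReshapeSketch(10, :)
y = r(rand(5, 4))   # 20 elements in total, so the colon is inferred as 2
@show size(y)
```

Because the layer is a named struct with ordinary fields, a serialization library has a concrete type to record, which is the motivation the text gives for providing custom layers.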
A custom layer for Flux that flattens its input into a 1D vector.
This layer is useful when you need to change the dimensions of your data within a Flux model. Unlike the built-in flatten operation in Julia, this custom layer can be saved and loaded by packages such as BSON and JLD2.
Examples
julia> f = Flatten()
julia> f(rand(5, 2))
-10-element Vector{Float64}:
Note
When saving and loading the model, make sure to include Flatten in the list of layers to be processed by BSON or JLD2.
A custom layer for Flux that applies an activation function over specified dimensions.
This layer is useful when you need to apply an activation function over specific dimensions of your data within a Flux model. Unlike the built-in activation functions in Julia, this custom layer can be saved and loaded using the BSON or JLD2 package.
Arguments
σ::Function: The activation function to be applied.
dims: The dimensions over which the activation function should be applied.
Note
When saving and loading the model, make sure to include ActivationOverDims in the list of layers to be processed by BSON or JLD2.
This function is called during the forward pass of the model. It applies the activation function σ.σ over the dimensions σ.dims of the input x.
Arguments
σ::ActivationOverDims: An instance of the ActivationOverDims struct.
x: The input to which the activation function should be applied.
Returns
The input x with the activation function applied over the specified dimensions.
Note
This custom layer can be saved and loaded using the BSON package. When saving and loading the model, make sure to include ActivationOverDims in the list of layers to be processed by BSON or JLD2.
Settings
This document was generated with Documenter.jl version 1.4.1 on Friday 21 June 2024. Using Julia version 1.10.4.
+10-element Vector{Float64}:
Note
When saving and loading the model, make sure to include Flatten in the list of layers to be processed by BSON or JLD2.
A custom layer for Flux that applies an activation function over specified dimensions.
This layer is useful when you need to apply an activation function over specific dimensions of your data within a Flux model. Unlike the built-in activation functions in Julia, this custom layer can be saved and loaded using the BSON or JLD2 package.
Arguments
σ::Function: The activation function to be applied.
dims: The dimensions over which the activation function should be applied.
Note
When saving and loading the model, make sure to include ActivationOverDims in the list of layers to be processed by BSON or JLD2.
This function is called during the forward pass of the model. It applies the activation function σ.σ over the dimensions σ.dims of the input x.
Arguments
σ::ActivationOverDims: An instance of the ActivationOverDims struct.
x: The input to which the activation function should be applied.
Returns
The input x with the activation function applied over the specified dimensions.
Note
This custom layer can be saved and loaded using the BSON package. When saving and loading the model, make sure to include ActivationOverDims in the list of layers to be processed by BSON or JLD2.
Settings
This document was generated with Documenter.jl version 1.5.0 on Monday 8 July 2024. Using Julia version 1.10.4.
The Maximum-Mean Discrepancy Variational Autoencoder (MMD-VAE) is a variant of the Variational Autoencoder (VAE) that adds an extra term to the evidence lower bound (ELBO) that aims to maximize the mutual information between the latent space representation and the input data. In particular, the MMD-VAE uses the Maximum-Mean Discrepancy (MMD) as a measure of the "distance" between the latent space distribution and the input data distribution.
For the implementation of the MMD-VAE in AutoEncoderToolkit.jl, the MMDVAE struct inherits directly from the VAE struct and adds the necessary functions to compute the extra terms in the loss function. An MMDVAE object is created by simply passing a VAE object to the constructor. This way, we can use Julia's multiple dispatch to extend the functionality of the VAE object without having to redefine the entire structure.
Maximum-Mean Discrepancy Variational Autoencoders Zhao, S., Song, J. & Ermon, S. InfoVAE: Information Maximizing Variational Autoencoders. Preprint at http://arxiv.org/abs/1706.02262 (2018).
The Maximum-Mean Discrepancy Variational Autoencoder (MMD-VAE) is a variant of the Variational Autoencoder (VAE) that adds an extra term to the evidence lower bound (ELBO) that aims to maximize the mutual information between the latent space representation and the input data. In particular, the MMD-VAE uses the Maximum-Mean Discrepancy (MMD) as a measure of the "distance" between the latent space distribution and the input data distribution.
For the implementation of the MMD-VAE in AutoEncoderToolkit.jl, the MMDVAE struct inherits directly from the VAE struct and adds the necessary functions to compute the extra terms in the loss function. An MMDVAE object is created by simply passing a VAE object to the constructor. This way, we can use Julia's multiple dispatch to extend the functionality of the VAE object without having to redefine the entire structure.
Maximum-Mean Discrepancy Variational Autoencoders Zhao, S., Song, J. & Ermon, S. InfoVAE: Information Maximizing Variational Autoencoders. Preprint at http://arxiv.org/abs/1706.02262 (2018).
A struct representing a Maximum-Mean Discrepancy Variational Autoencoder (MMD-VAE).
Fields
vae::V: A Variational Autoencoder (VAE) that forms the basis of the MMD-VAE. The VAE should be composed of an AbstractVariationalEncoder and an AbstractVariationalDecoder.
Description
The MMDVAE struct is a subtype of AbstractVariationalAutoEncoder and represents a specific type of VAE known as an MMD-VAE. The MMD-VAE modifies the standard VAE by replacing the KL-divergence term in the loss function with a Maximum-Mean Discrepancy (MMD) term, which measures the distance between the aggregated posterior of the latent codes and the prior. This can help to alleviate the issue of posterior collapse, where the aggregated posterior fails to cover significant parts of the prior, commonly seen in VAEs.
Citation
Maximum-Mean Discrepancy Variational Autoencoders. Zhao, S., Song, J. & Ermon, S. InfoVAE: Information Maximizing Variational Autoencoders. Preprint at http://arxiv.org/abs/1706.02262 (2018).
Defines the forward pass for the Maximum-Mean Discrepancy Variational Autoencoder (MMD-VAE).
Arguments
x::AbstractArray: Input data.
Optional Keyword Arguments
latent::Bool: Whether to return the latent variables along with the decoder output. If true, the function returns a NamedTuple containing the encoder outputs, the latent sample, and the decoder outputs. If false, the function only returns the decoder outputs. Defaults to false.
Returns
If latent is true, returns a NamedTuple containing:
encoder: The outputs of the encoder.
z: The latent sample.
decoder: The outputs of the decoder.
If latent is false, returns the outputs of the decoder.
mmdvae::MMDVAE: Struct containing the elements of the MMD-VAE.
x::AbstractArray: Input data.
Optional Arguments
λ::Number=1.0f0: Hyperparameter that emphasizes the importance of the KL divergence between qᵩ(z) and π(z) during training.
α::Number=0.0f0: Hyperparameter that emphasizes the importance of the Mutual Information term during optimization.
n_latent_samples::Int=50: Number of samples to take from the latent space prior π(z) when computing the MMD divergence.
kernel::Function=gaussian_kernel: Kernel used to compute the divergence. Default is the Gaussian Kernel.
kernel_kwargs::Union{NamedTuple,Dict}=Dict(): Additional keyword arguments to be passed to the kernel function.
reconstruction_loglikelihood::Function=decoder_loglikelihood: Function that computes the log likelihood of the reconstructed input.
kl_divergence::Function=encoder_kl: Function that computes the Kullback-Leibler divergence between the encoder distribution and the prior.
Returns
Single value defining the loss function for entry x when compared with reconstructed output x̂.
Description
This function calculates the loss for the MMD-VAE. It computes the log likelihood of the reconstructed input, the MMD divergence between the encoder distribution and the prior, and the Kullback-Leibler divergence between the encoder distribution (the approximate posterior) and the prior. These quantities are combined according to the formula above to compute the loss.
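To make the MMD term more concrete, here is a minimal sketch of a Gaussian-kernel MMD estimate between two sets of samples, written in plain Julia. All names (`gaussian_kernel_sketch`, `mean_kernel`, `mmd_sketch`, the bandwidth `ρ`) are assumptions made for this illustration; they are not the package's functions, and the package's own estimator, kernel signature, and weighting by λ and α may differ.

```julia
# Gaussian (RBF) kernel between two vectors; `ρ` is an assumed bandwidth name.
gaussian_kernel_sketch(x, y; ρ=1.0) = exp(-sum(abs2, x .- y) / (2 * ρ^2))

# Mean kernel value over all pairs of columns of X and Y.
function mean_kernel(X::AbstractMatrix, Y::AbstractMatrix; ρ=1.0)
    total = 0.0
    for x in eachcol(X), y in eachcol(Y)
        total += gaussian_kernel_sketch(x, y; ρ=ρ)
    end
    return total / (size(X, 2) * size(Y, 2))
end

# Biased (V-statistic) estimate of the squared MMD between two sample sets.
mmd_sketch(X, Y; ρ=1.0) =
    mean_kernel(X, X; ρ=ρ) + mean_kernel(Y, Y; ρ=ρ) - 2 * mean_kernel(X, Y; ρ=ρ)

# Columns are samples: stand-ins for encoded latent codes and for draws
# from a standard normal prior π(z).
z_encoded = randn(2, 50) .+ 3.0   # deliberately shifted away from the prior
z_prior = randn(2, 50)

@show mmd_sketch(z_prior, z_prior)     # identical sample sets
@show mmd_sketch(z_encoded, z_prior)   # shifted samples versus the prior
```

The estimate is zero when both sample sets are identical and grows as the encoded samples drift away from the prior samples, which is why it can serve as a penalty on the aggregated posterior.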
In this guide we will use external packages with functions not directly related to AutoEncoderToolkit.jl, such as Flux.jl and MLDatasets.jl. Make sure to install them before running the code if you want to follow along.
For this quick start guide, we will prepare different autoencoders to be trained on a fraction of the MNIST dataset. Let us begin by importing the necessary packages.
Note
We prefer to load functions using the import keyword instead of using. This is a personal preference and you can use using if you prefer.
In this guide we will use external packages with functions not directly related to AutoEncoderToolkit.jl, such as Flux.jl and MLDatasets.jl. Make sure to install them before running the code if you want to follow along.
For this quick start guide, we will prepare different autoencoders to be trained on a fraction of the MNIST dataset. Let us begin by importing the necessary packages.
Note
We prefer to load functions using the import keyword instead of using. This is a personal preference and you can use using if you prefer.
# Import project package
import AutoEncoderToolkit as AET
# Import ML libraries
@@ -388,4 +388,4 @@
# Save model output
nng_ex[:, :, (epoch÷n_save)+1] = nng(t_array)
end # if
-end # for
Now that we have trained the network, we can visualize the path between the initial and final points in the latent space. The color code in the following plot matches the epoch at which the path was computed.
Settings
This document was generated with Documenter.jl version 1.4.1 on Friday 21 June 2024. Using Julia version 1.10.4.
+end # for
Now that we have trained the network, we can visualize the path between the initial and final points in the latent space. The color code in the following plot matches the epoch at which the path was computed.
Settings
This document was generated with Documenter.jl version 1.5.0 on Monday 8 July 2024. Using Julia version 1.10.4.
The Riemannian Hamiltonian Variational Autoencoder (RHVAE) is a variant of the Hamiltonian Variational Autoencoder (HVAE) that uses concepts from Riemannian geometry to improve the sampling of the latent space representation. Like the HVAE, the RHVAE uses Hamiltonian dynamics to improve the sampling of the latent space. However, the RHVAE accounts for the geometry of the latent space by learning a Riemannian metric tensor that is used to compute the kinetic energy of the dynamical system. This allows the RHVAE to sample the latent space more evenly while learning the curvature of the latent space.
For the implementation of the RHVAE in AutoEncoderToolkit.jl, the RHVAE requires two arguments to construct: the original VAE as well as a separate neural network used to compute the metric tensor. To facilitate the dispatch of the necessary functions associated with this second network, we also provide a MetricChain struct.
Warning
RHVAEs require the computation of nested gradients. This means that the AutoDiff framework must differentiate a function of an already AutoDiff differentiated function. This is known to be problematic for Julia's AutoDiff backends. See details below to understand how we circumvent this problem.
A MetricChain is used to compute the Riemannian metric tensor in the latent space of a Riemannian Hamiltonian Variational AutoEncoder (RHVAE).
Fields
mlp::Flux.Chain: A multi-layer perceptron (MLP) consisting of the hidden layers. The inputs are first run through this MLP.
diag::Flux.Dense: A dense layer that computes the diagonal elements of a lower-triangular matrix. The output of the mlp is fed into this layer.
lower::Flux.Dense: A dense layer that computes the off-diagonal elements of the lower-triangular matrix. The output of the mlp is also fed into this layer.
The outputs of diag and lower are used to construct a lower-triangular matrix used to compute the Riemannian metric tensor in latent space.
Note
If the dimension of the latent space is n, the number of neurons in the output layer of diag must be n, and the number of neurons in the output layer of lower must be n * (n - 1) ÷ 2.
The Riemannian Hamiltonian Variational Autoencoder (RHVAE) is a variant of the Hamiltonian Variational Autoencoder (HVAE) that uses concepts from Riemannian geometry to improve the sampling of the latent space representation. Like the HVAE, the RHVAE uses Hamiltonian dynamics to improve the sampling of the latent space. However, the RHVAE accounts for the geometry of the latent space by learning a Riemannian metric tensor that is used to compute the kinetic energy of the dynamical system. This allows the RHVAE to sample the latent space more evenly while learning the curvature of the latent space.
For the implementation of the RHVAE in AutoEncoderToolkit.jl, the RHVAE requires two arguments to construct: the original VAE as well as a separate neural network used to compute the metric tensor. To facilitate the dispatch of the necessary functions associated with this second network, we also provide a MetricChain struct.
Warning
RHVAEs require the computation of nested gradients. This means that the AutoDiff framework must differentiate a function of an already AutoDiff differentiated function. This is known to be problematic for Julia's AutoDiff backends. See details below to understand how we circumvent this problem.
A MetricChain is used to compute the Riemannian metric tensor in the latent space of a Riemannian Hamiltonian Variational AutoEncoder (RHVAE).
Fields
mlp::Flux.Chain: A multi-layer perceptron (MLP) consisting of the hidden layers. The inputs are first run through this MLP.
diag::Flux.Dense: A dense layer that computes the diagonal elements of a lower-triangular matrix. The output of the mlp is fed into this layer.
lower::Flux.Dense: A dense layer that computes the off-diagonal elements of the lower-triangular matrix. The output of the mlp is also fed into this layer.
The outputs of diag and lower are used to construct a lower-triangular matrix used to compute the Riemannian metric tensor in latent space.
Note
If the dimension of the latent space is n, the number of neurons in the output layer of diag must be n, and the number of neurons in the output layer of lower must be n * (n - 1) ÷ 2.
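The size requirement in the note follows from counting entries: an n×n lower-triangular matrix has n diagonal entries and n(n-1)/2 entries strictly below the diagonal. The following sketch assembles such a matrix from two vectors of those lengths. The function name `lower_triangular_sketch` and the column-by-column fill order are assumptions of this illustration, not the package's implementation.

```julia
using LinearAlgebra

# Assemble an n×n lower-triangular matrix from n diagonal entries and
# n(n-1)/2 strictly-lower entries. The column-by-column fill order is an
# assumption of this sketch.
function lower_triangular_sketch(diag_vals::AbstractVector, lower_vals::AbstractVector)
    n = length(diag_vals)
    length(lower_vals) == n * (n - 1) ÷ 2 ||
        throw(DimensionMismatch("expected $(n * (n - 1) ÷ 2) off-diagonal entries"))
    L = zeros(promote_type(eltype(diag_vals), eltype(lower_vals)), n, n)
    k = 1
    for j in 1:n, i in j:n
        if i == j
            L[i, j] = diag_vals[j]      # diagonal entry
        else
            L[i, j] = lower_vals[k]     # next strictly-lower entry
            k += 1
        end
    end
    return L
end

# For n = 3 we need 3 diagonal values and 3 * 2 ÷ 2 = 3 off-diagonal values.
L = lower_triangular_sketch([1.0, 2.0, 3.0], [0.1, 0.2, 0.3])
@show L
@show istril(L)
```

As a general linear-algebra fact, the product `L * L'` of such a matrix with its transpose is symmetric and positive semi-definite, which makes a lower-triangular factor a convenient building block for a metric tensor.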
Construct a Riemannian Hamiltonian Variational Autoencoder (RHVAE) from a standard VAE and a metric chain.
Arguments
vae::VAE: A standard Variational Autoencoder (VAE) model.
metric_chain::MetricChain: A chain of metrics to be used for the Riemannian Hamiltonian Monte Carlo (RHMC) sampler.
centroids_data::AbstractArray: An array of data centroids. Each column represents a centroid. N is a subtype of Number.
T::N: The temperature parameter for the inverse metric tensor. N is a subtype of Number.
λ::N: The regularization parameter for the inverse metric tensor. N is a subtype of Number.
Returns
A new RHVAE object.
Description
The constructor initializes the latent centroids and the metric tensor M to their default values. The latent centroids are initialized to a zero matrix of the same size as centroids_data, and M is initialized to a 3D array of identity matrices, one for each centroid.
Settings
This document was generated with Documenter.jl version 1.4.1 on Friday 21 June 2024. Using Julia version 1.10.4.
+)
Construct a Riemannian Hamiltonian Variational Autoencoder (RHVAE) from a standard VAE and a metric chain.
Arguments
vae::VAE: A standard Variational Autoencoder (VAE) model.
metric_chain::MetricChain: A chain of metrics to be used for the Riemannian Hamiltonian Monte Carlo (RHMC) sampler.
centroids_data::AbstractArray: An array of data centroids. Each column represents a centroid. N is a subtype of Number.
T::N: The temperature parameter for the inverse metric tensor. N is a subtype of Number.
λ::N: The regularization parameter for the inverse metric tensor. N is a subtype of Number.
Returns
A new RHVAE object.
Description
The constructor initializes the latent centroids and the metric tensor M to their default values. The latent centroids are initialized to a zero matrix of the same size as centroids_data, and M is initialized to a 3D array of identity matrices, one for each centroid.
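The default for M described above, a 3D array holding one identity matrix per centroid, can be sketched as follows. The values of `n_latent` and `n_centroids`, the `Float32` element type, and the assumption that each identity matrix is `n_latent × n_latent` are illustrative choices, not taken from the package source.

```julia
using LinearAlgebra

n_latent = 2      # assumed latent dimensionality
n_centroids = 4   # assumed number of centroids

# Stack one n_latent×n_latent identity matrix per centroid along the third dimension.
M = zeros(Float32, n_latent, n_latent, n_centroids)
for k in 1:n_centroids
    M[:, :, k] = Matrix{Float32}(I, n_latent, n_latent)
end
@show size(M)
```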
Settings
This document was generated with Documenter.jl version 1.5.0 on Monday 8 July 2024. Using Julia version 1.10.4.
diff --git a/dev/search_index.js b/dev/search_index.js
index 9a8a925..3782942 100644
--- a/dev/search_index.js
+++ b/dev/search_index.js
@@ -1,3 +1,3 @@
var documenterSearchIndex = {"docs":
-[{"location":"mmdvae/#MMDVAEsmodule","page":"MMD-VAE (InfoVAE)","title":"MMD-VAE (InfoVAE)","text":"","category":"section"},{"location":"mmdvae/","page":"MMD-VAE (InfoVAE)","title":"MMD-VAE (InfoVAE)","text":"The Maximum-Mean Discrepancy Variational Autoencoder (MMD-VAE) is a variant of the Variational Autoencoder (VAE) that adds an extra term to the evidence lower bound (ELBO) that aims to maximize the mutual information between the latent space representation and the input data. In particular, the MMD-VAE uses the Maximum-Mean Discrepancy (MMD) as a measure of the \"distance\" between the latent space distribution and the input data distribution.","category":"page"},{"location":"mmdvae/","page":"MMD-VAE (InfoVAE)","title":"MMD-VAE (InfoVAE)","text":"For the implementation of the MMD-VAE in AutoEncoderToolkit.jl, the MMDVAE struct inherits directly from the VAE struct and adds the necessary functions to compute the extra terms in the loss function. An MMDVAE object is created by simply passing a VAE object to the constructor. This way, we can use Julias multiple dispatch to extend the functionality of the VAE object without having to redefine the entire structure.","category":"page"},{"location":"mmdvae/#Reference","page":"MMD-VAE (InfoVAE)","title":"Reference","text":"","category":"section"},{"location":"mmdvae/","page":"MMD-VAE (InfoVAE)","title":"MMD-VAE (InfoVAE)","text":"Maximum-Mean Discrepancy Variational Autoencoders Zhao, S., Song, J. & Ermon, S. InfoVAE: Information Maximizing Variational Autoencoders. 
Preprint at http://arxiv.org/abs/1706.02262 (2018).","category":"page"},{"location":"mmdvae/#MMDVAEstruct","page":"MMD-VAE (InfoVAE)","title":"MMDVAE struct","text":"","category":"section"},{"location":"mmdvae/","page":"MMD-VAE (InfoVAE)","title":"MMD-VAE (InfoVAE)","text":"AutoEncoderToolkit.MMDVAEs.MMDVAE{AutoEncoderToolkit.VAEs.VAE}","category":"page"},{"location":"mmdvae/#AutoEncoderToolkit.MMDVAEs.MMDVAE","page":"MMD-VAE (InfoVAE)","title":"AutoEncoderToolkit.MMDVAEs.MMDVAE","text":"`MMDVAE{\n V<:VAE{<:AbstractVariationalEncoder,<:AbstractVariationalDecoder}\n } <: AbstractVariationalAutoEncoder`\n\nA struct representing a Maximum-Mean Discrepancy Variational Autoencoder (MMD-VAE).\n\nFields\n\nvae::V: A Variational Autoencoder (VAE) that forms the basis of the MMD-VAE. The VAE should be composed of an AbstractVariationalEncoder and an AbstractVariationalDecoder.\n\nDescription\n\nThe MMDVAE struct is a subtype of AbstractVariationalAutoEncoder and represents a specific type of VAE known as an MMD-VAE. The MMD-VAE modifies the standard VAE by replacing the KL-divergence term in the loss function with a Maximum-Mean Discrepancy (MMD) term, which measures the distance between the aggregated posterior of the latent codes and the prior. This can help to alleviate the issue of posterior collapse, where the aggregated posterior fails to cover significant parts of the prior, commonly seen in VAEs.\n\nCitation\n\nMaximum-Mean Discrepancy Variational Autoencoders. Zhao, S., Song, J. & Ermon, S. InfoVAE: Information Maximizing Variational Autoencoders. 
Preprint at http://arxiv.org/abs/1706.02262 (2018).\n\n\n\n\n\n","category":"type"},{"location":"mmdvae/#Forward-pass","page":"MMD-VAE (InfoVAE)","title":"Forward pass","text":"","category":"section"},{"location":"mmdvae/","page":"MMD-VAE (InfoVAE)","title":"MMD-VAE (InfoVAE)","text":"AutoEncoderToolkit.MMDVAEs.MMDVAE(::AbstractArray)","category":"page"},{"location":"mmdvae/#AutoEncoderToolkit.MMDVAEs.MMDVAE-Tuple{AbstractArray}","page":"MMD-VAE (InfoVAE)","title":"AutoEncoderToolkit.MMDVAEs.MMDVAE","text":"(mmdvae::MMDVAE)(x::AbstractArray; latent::Bool=false)\n\nDefines the forward pass for the Maximum-Mean Discrepancy Variational Autoencoder (MMD-VAE).\n\nArguments\n\nx::AbstractArray: Input data.\n\nOptional Keyword Arguments\n\nlatent::Bool: Whether to return the latent variables along with the decoder output. If true, the function returns a tuple containing the encoder outputs, the latent sample, and the decoder outputs. If false, the function only returns the decoder outputs. Defaults to false. 
\n\nReturns\n\nIf latent is true, returns a NamedTuple containing:\nencoder: The outputs of the encoder.\nz: The latent sample.\ndecoder: The outputs of the decoder.\nIf latent is false, returns the outputs of the decoder.\n\n\n\n\n\n","category":"method"},{"location":"mmdvae/#Loss-function","page":"MMD-VAE (InfoVAE)","title":"Loss function","text":"","category":"section"},{"location":"mmdvae/","page":"MMD-VAE (InfoVAE)","title":"MMD-VAE (InfoVAE)","text":"AutoEncoderToolkit.MMDVAEs.loss","category":"page"},{"location":"mmdvae/#AutoEncoderToolkit.MMDVAEs.loss","page":"MMD-VAE (InfoVAE)","title":"AutoEncoderToolkit.MMDVAEs.loss","text":"loss(mmdvae::MMDVAE, x::AbstractArray; σ::Number=1.0f0, λ::Number=1.0f0, α::Number=0.0f0, n_latent_samples::Int=50, kernel::Function=gaussian_kernel, kernel_kwargs::Union{NamedTuple,Dict}=Dict(), reconstruction_loglikelihood::Function=decoder_loglikelihood, kl_divergence::Function=encoder_kl)\n\nLoss function for the Maximum-Mean Discrepancy variational autoencoder (MMD-VAE). The loss function is defined as:\n\nloss = -⟨log p(x|z)⟩ + (1 - α) * Dₖₗ(qᵩ(z | x) || p(z)) + (λ + α - 1) * MMD-D(qᵩ(z) || p(z)),\n\nArguments\n\nmmdvae::MMDVAE: Struct containing the elements of the MMD-VAE.\nx::AbstractArray: Input data.\n\nOptional Arguments\n\nλ::Number=1.0f0: Hyperparameter that emphasizes the importance of the KL divergence between qᵩ(z) and π(z) during training.\nα::Number=0.0f0: Hyperparameter that emphasizes the importance of the Mutual Information term during optimization.\nn_latent_samples::Int=50: Number of samples to take from the latent space prior π(z) when computing the MMD divergence.\nkernel::Function=gaussian_kernel: Kernel used to compute the divergence. 
Default is the Gaussian Kernel.\nkernel_kwargs::Union{NamedTuple,Dict}=Dict(): Additional keyword arguments to be passed to the kernel function.\nreconstruction_loglikelihood::Function=decoder_loglikelihood: Function that computes the log likelihood of the reconstructed input.\nkl_divergence::Function=encoder_kl: Function that computes the Kullback-Leibler divergence between the encoder distribution and the prior.\n\nReturns\n\nSingle value defining the loss function for entry x when compared with reconstructed output x̂.\n\nDescription\n\nThis function calculates the loss for the MMD-VAE. It computes the log likelihood of the reconstructed input, the MMD divergence between the encoder distribution and the prior, and the Kullback-Leibler divergence between the approximate decoder and the prior. These quantities are combined according to the formula above to compute the loss.\n\n\n\n\n\nloss(\n mmdvae::MMDVAE, x_in::AbstractArray, x_out::AbstractArray; \n λ::Number=1.0f0, α::Number=0.0f0, \n n_latent_samples::Int=50, \n kernel::Function=gaussian_kernel, \n kernel_kwargs::Union{NamedTuple,Dict}=Dict(), \n reconstruction_loglikelihood::Function=decoder_loglikelihood, \n kl_divergence::Function=encoder_kl\n)\n\nLoss function for the Maximum-Mean Discrepancy variational autoencoder (MMD-VAE). 
The loss function is defined as:\n\nloss = -⟨log p(x|z)⟩ + (1 - α) * Dₖₗ(qᵩ(z | x) || p(z)) + (λ + α - 1) * MMD-D(qᵩ(z) || p(z)),\n\nArguments\n\nmmdvae::MMDVAE: Struct containing the elements of the MMD-VAE.\nx_in::AbstractArray: Input data.\nx_out::AbstractArray: Data against which to compare the reconstructed output.\n\nOptional Arguments\n\nλ::Number=1.0f0: Hyperparameter that emphasizes the importance of the KL divergence between qᵩ(z) and π(z) during training.\nα::Number=0.0f0: Hyperparameter that emphasizes the importance of the Mutual Information term during optimization.\nn_latent_samples::Int=50: Number of samples to take from the latent space prior π(z) when computing the MMD divergence.\nkernel::Function=gaussian_kernel: Kernel used to compute the divergence. Default is the Gaussian Kernel.\nkernel_kwargs::Union{NamedTuple,Dict}=Dict(): Additional keyword arguments to be passed to the kernel function.\nreconstruction_loglikelihood::Function=decoder_loglikelihood: Function that computes the log likelihood of the reconstructed input.\nkl_divergence::Function=encoder_kl: Function that computes the Kullback-Leibler divergence between the encoder distribution and the prior.\n\nReturns\n\nSingle value defining the loss function for entry x when compared with reconstructed output x̂.\n\nDescription\n\nThis function calculates the loss for the MMD-VAE. It computes the log likelihood of the reconstructed input, the MMD divergence between the encoder distribution and the prior, and the Kullback-Leibler divergence between the approximate decoder and the prior. 
These quantities are combined according to the formula above to compute the loss.\n\n\n\n\n\n","category":"function"},{"location":"mmdvae/#Training","page":"MMD-VAE (InfoVAE)","title":"Training","text":"","category":"section"},{"location":"mmdvae/","page":"MMD-VAE (InfoVAE)","title":"MMD-VAE (InfoVAE)","text":"AutoEncoderToolkit.MMDVAEs.train!","category":"page"},{"location":"mmdvae/#AutoEncoderToolkit.MMDVAEs.train!","page":"MMD-VAE (InfoVAE)","title":"AutoEncoderToolkit.MMDVAEs.train!","text":"train!(mmdvae, x, opt; loss_function, loss_kwargs, verbose, loss_return)\n\nCustomized training function to update parameters of a variational autoencoder given a specified loss function.\n\nArguments\n\nmmdvae::MMDVAE: A struct containing the elements of a Maximum-Mean Discrepancy Variational Autoencoder (MMD-VAE).\nx::AbstractArray: Data on which to evaluate the loss function. The last dimension is taken as having each of the samples in a batch.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function=loss: The loss function used for training. It should accept the MMDVAE model, data x, and keyword arguments in that order.\nloss_kwargs::Union{NamedTuple,Dict} = Dict(): Arguments for the loss function. 
These might include parameters like α, or β, depending on the specific loss function in use.\nverbose::Bool=false: If true, the loss value will be printed during training.\nloss_return::Bool=false: If true, the loss value will be returned after training.\n\nDescription\n\nTrains the MMDVAE by:\n\nComputing the gradient of the loss w.r.t the MMDVAE parameters.\nUpdating the MMDVAE parameters using the optimizer.\n\n\n\n\n\ntrain!(mmdvae, x_in, x_out, opt; loss_function, loss_kwargs, verbose, loss_return)\n\nCustomized training function to update parameters of a variational autoencoder given a specified loss function.\n\nArguments\n\nmmdvae::MMDVAE: A struct containing the elements of a Maximum-Mean Discrepancy Variational Autoencoder (MMD-VAE).\nx_in::AbstractArray: Data on which to evaluate the loss function. The last dimension is taken as having each of the samples in a batch.\nx_out::AbstractArray: Data against which to compare the reconstructed output.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function=loss: The loss function used for training. It should accept the MMDVAE model, data x, and keyword arguments in that order.\nloss_kwargs::Union{NamedTuple,Dict} = Dict(): Arguments for the loss function. 
These might include parameters like α, or β, depending on the specific loss function in use.\nverbose::Bool=false: If true, the loss value will be printed during training.\nloss_return::Bool=false: If true, the loss value will be returned after training.\n\nDescription\n\nTrains the MMDVAE by:\n\nComputing the gradient of the loss w.r.t the MMDVAE parameters.\nUpdating the MMDVAE parameters using the optimizer.\n\n\n\n\n\n","category":"function"},{"location":"mmdvae/#Other-Functions","page":"MMD-VAE (InfoVAE)","title":"Other Functions","text":"","category":"section"},{"location":"mmdvae/","page":"MMD-VAE (InfoVAE)","title":"MMD-VAE (InfoVAE)","text":"AutoEncoderToolkit.MMDVAEs.gaussian_kernel\nAutoEncoderToolkit.MMDVAEs.mmd_div\nAutoEncoderToolkit.MMDVAEs.logP_mmd_ratio","category":"page"},{"location":"mmdvae/#AutoEncoderToolkit.MMDVAEs.gaussian_kernel","page":"MMD-VAE (InfoVAE)","title":"AutoEncoderToolkit.MMDVAEs.gaussian_kernel","text":"gaussian_kernel(\n x::AbstractArray, y::AbstractArray; ρ::Float32=1.0f0, dims::Int=2\n)\n\nFunction to compute the Gaussian Kernel between two arrays x and y, defined as \n\n k(x, y) = exp(-||x - y ||² / ρ²)\n\nArguments\n\nx::AbstractArray: First input array for the kernel.\ny::AbstractArray: Second input array for the kernel. \n\nOptional Keyword Arguments\n\nρ=1.0f0: Kernel amplitude hyperparameter. Larger ρ gives a smoother kernel.\ndims::Int=2: Number of dimensions to compute pairwise distances over.\n\nReturns\n\nk::AbstractArray: Kernel matrix where each element is computed as \n\nTheory\n\nThe Gaussian kernel measures the similarity between two points x and y. It is widely used in many machine learning algorithms. 
This implementation computes the squared Euclidean distance between all pairs of rows in x and y, scales the distance by ρ² and takes the exponential.\n\n\n\n\n\n","category":"function"},{"location":"mmdvae/#AutoEncoderToolkit.MMDVAEs.mmd_div","page":"MMD-VAE (InfoVAE)","title":"AutoEncoderToolkit.MMDVAEs.mmd_div","text":"mmd_div(\n x::AbstractArray, y::AbstractArray; \n kernel::Function=gaussian_kernel, \n kernel_kwargs::Union{NamedTuple,Dict}=Dict()\n)\n\nCompute the Maximum Mean Discrepancy (MMD) divergence between two arrays x and y.\n\nArguments\n\nx::AbstractArray: First input array.\ny::AbstractArray: Second input array.\n\nKeyword Arguments\n\nkernel::Function=gaussian_kernel: Kernel function to use. Default is the Gaussian kernel.\nkernel_kwargs::Union{NamedTuple,Dict}=Dict(): Additional keyword arguments to be passed to the kernel function.\n\nReturns\n\nmmd::Number: MMD divergence value. \n\nTheory\n\nMMD measures the difference between two distributions based on embeddings in a Reproducing Kernel Hilbert Space (RKHS). 
It is widely used for two-sample tests.\n\nThis function implements MMD as:\n\nMMD(x, y) = mean(k(x, x)) - 2 * mean(k(x, y)) + mean(k(y, y))\n\nwhere k is a positive definite kernel (e.g., Gaussian).\n\n\n\n\n\n","category":"function"},{"location":"mmdvae/#AutoEncoderToolkit.MMDVAEs.logP_mmd_ratio","page":"MMD-VAE (InfoVAE)","title":"AutoEncoderToolkit.MMDVAEs.logP_mmd_ratio","text":"logP_mmd_ratio(\n mmdvae::MMDVAE, x::AbstractArray; \n n_latent_samples::Int=100, kernel=gaussian_kernel, \n kernel_kwargs::Union{NamedTuple,Dict}=NamedTuple(), \n reconstruction_loglikelihood::Function=decoder_loglikelihood\n)\n\nFunction to compute the absolute ratio between the log likelihood ⟨log p(x|z)⟩ and the MMD divergence MMD-D(qᵩ(z|x)||p(z)).\n\nArguments\n\nmmdvae::MMDVAE: Struct containing the elements of the MMD-VAE.\nx::AbstractArray: Data to train the MMD-VAE.\n\nOptional Keyword Arguments\n\nn_latent_samples::Int=100: Number of samples to take from the latent space prior p(z) when computing the MMD divergence.\nkernel=gaussian_kernel: Kernel used to compute the divergence. Default is the Gaussian Kernel.\nkernel_kwargs::Union{NamedTuple,Dict}=NamedTuple(): Tuple containing arguments for the Kernel function.\nreconstruction_loglikelihood::Function=decoder_loglikelihood: Function that computes the log likelihood of the reconstructed input.\n\nReturns\n\nabs(⟨log p(x|z)⟩ / MMD-D(qᵩ(z|x)||p(z)))\n\nDescription\n\nThis function calculates:\n\nThe log likelihood ⟨log p(x|z)⟩ of x under the MMD-VAE decoder, averaged over\n\nall samples. 2. The MMD divergence between the encoder distribution q(z|x) and prior p(z). 
\n\nThe absolute ratio of these two quantities is returned.\n\nNote\n\nThis ratio is useful for setting the Lagrangian multiplier λ in training MMD-VAEs.\n\n\n\n\n\n","category":"function"},{"location":"utils/#Utils","page":"Utilities","title":"Utils","text":"","category":"section"},{"location":"utils/","page":"Utilities","title":"Utilities","text":"AutoEncoderToolkit.jl offers a series of utility functions for different tasks. ","category":"page"},{"location":"utils/#Training-Utilities","page":"Utilities","title":"Training Utilities","text":"","category":"section"},{"location":"utils/","page":"Utilities","title":"Utilities","text":"AutoEncoderToolkit.utils.step_scheduler\nAutoEncoderToolkit.utils.cycle_anneal\nAutoEncoderToolkit.utils.locality_sampler","category":"page"},{"location":"utils/#AutoEncoderToolkit.utils.step_scheduler","page":"Utilities","title":"AutoEncoderToolkit.utils.step_scheduler","text":"`step_scheduler(epoch, epoch_change, learning_rates)`\n\nSimple function to define different learning rates at specified epochs.\n\nArguments\n\nepoch::Int: Epoch at which to define learning rate.\nepoch_change::Vector{<:Int}: Number of epochs at which to change learning rate. It must include the initial learning rate!\nlearning_rates::Vector{<:AbstractFloat}: Learning rate value for the epoch range. 
Must be the same length as epoch_change\n\nReturns\n\nη::AbstractFloat: Learning rate for the current epoch.\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.cycle_anneal","page":"Utilities","title":"AutoEncoderToolkit.utils.cycle_anneal","text":"cycle_anneal(\n epoch::Int, \n n_epoch::Int, \n n_cycles::Int; \n frac::AbstractFloat=0.5f0, \n βmax::Number=1.0f0, \n βmin::Number=0.0f0, \n T::Type=Float32\n)\n\nFunction that computes the value of the annealing parameter β for a variational autoencoder as a function of the epoch number according to the cyclical annealing strategy.\n\nArguments\n\nepoch::Int: Epoch on which to evaluate the value of the annealing parameter.\nn_epoch::Int: Number of epochs that will be run to train the VAE.\nn_cycles::Int: Number of annealing cycles to be fit within the number of epochs.\n\nOptional Arguments\n\nfrac::AbstractFloat= 0.5f0: Fraction of the cycle in which the annealing parameter β will increase from the minimum to the maximum value.\nβmax::Number=1.0f0: Maximum value that the annealing parameter can reach.\nβmin::Number=0.0f0: Minimum value that the annealing parameter can reach.\nT::Type=Float32: The type of the output. The function will convert the output to this type.\n\nReturns\n\nβ::T: Value of the annealing parameter.\n\nCitation\n\nFu, H. et al. Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing. Preprint at http://arxiv.org/abs/1903.10145 (2019).\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.locality_sampler","page":"Utilities","title":"AutoEncoderToolkit.utils.locality_sampler","text":"locality_sampler(data, dist_tree, n_primary, n_secondary, k_neighbors; index=false)\n\nAlgorithm to generate mini-batches based on spatial locality as determined by a pre-constructed nearest neighbors tree.\n\nArguments\n\ndata::AbstractArray: An array containing the data points. 
The data points can be of any dimension.\ndist_tree::NearestNeighbors.NNTree: NearestNeighbors.jl tree used to determine the distance between data points.\nn_primary::Int: Number of primary points to sample.\nn_secondary::Int: Number of secondary points to sample from the neighbors of each primary point.\nk_neighbors::Int: Number of nearest neighbors from which to potentially sample the secondary points.\n\nOptional Keyword Arguments\n\nindex::Bool: If true, returns the indices of the selected samples. If false, returns the data corresponding to the indexes. Defaults to false.\n\nReturns\n\nIf index is true, returns sample_idx::Vector{Int64}: Indices of data points to include in the mini-batch.\nIf index is false, returns sample_data::AbstractArray: The data points to include in the mini-batch.\n\nDescription\n\nThis sampling algorithm consists of three steps:\n\nFor each datapoint, determine the k_neighbors nearest neighbors using the dist_tree.\nUniformly sample n_primary points without replacement from all data points.\nFor each primary point, sample n_secondary points without replacement from its k_neighbors nearest neighbors.\n\nExamples\n\n# Pre-constructed NearestNeighbors.jl tree\ndist_tree = NearestNeighbors.KDTree(data, metric)\nsample_indices = locality_sampler(data, dist_tree, 10, 5, 50)\n\nCitation\n\nSkafte, N., Jø rgensen, M. & Hauberg, S. ren. Reliable training and estimation of variance networks. in Advances in Neural Information Processing Systems vol. 32 (Curran Associates, Inc., 2019).\n\n\n\n\n\n","category":"function"},{"location":"utils/#centroidutils","page":"Utilities","title":"Centroid Finding Utilities","text":"","category":"section"},{"location":"utils/","page":"Utilities","title":"Utilities","text":"Some VAE models, such as the RHVAE, require clustering of the data. Specifically RHVAE can take a fixed subset of the training data as a reference for the computation of the metric tensor. 
The following functions can be used to define this reference subset to be used as centroids for the metric tensor computation.","category":"page"},{"location":"utils/","page":"Utilities","title":"Utilities","text":"AutoEncoderToolkit.utils.centroids_kmeans\nAutoEncoderToolkit.utils.centroids_kmedoids","category":"page"},{"location":"utils/#AutoEncoderToolkit.utils.centroids_kmeans","page":"Utilities","title":"AutoEncoderToolkit.utils.centroids_kmeans","text":"centroids_kmeans(\n x::AbstractMatrix, \n n_centroids::Int; \n assign::Bool=false\n)\n\nPerform k-means clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nArguments\n\nx::AbstractMatrix: The input data. Rows represent individual samples.\nn_centroids::Int: The number of centroids to compute.\n\nOptional Keyword Arguments\n\nassign::Bool=false: If true, also return the assignments of each point to a centroid.\n\nReturns\n\nIf assign is false, returns a matrix where each column is a centroid.\nIf assign is true, returns a tuple where the first element is the matrix of centroids and the second element is a vector of assignments.\n\nExamples\n\ndata = rand(100, 10)\ncentroids = centroids_kmeans(data, 5)\n\n\n\n\n\ncentroids_kmeans(\n x::AbstractArray, \n n_centroids::Int; \n reshape_centroids::Bool=true, \n assign::Bool=false\n)\n\nPerform k-means clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nThe input data is flattened into a matrix before performing k-means clustering. This is done because k-means operates on a set of data points in a vector space and cannot handle multi-dimensional arrays. 
Flattening the input ensures that the k-means algorithm can process the data correctly.\n\nBy default, the output centroids are reshaped back to the original input shape. This is controlled by the reshape_centroids argument.\n\nArguments\n\nx::AbstractArray: The input data. It can be a multi-dimensional array where the last dimension represents individual samples.\nn_centroids::Int: The number of centroids to compute.\n\nOptional Keyword Arguments\n\nreshape_centroids::Bool=true: If true, reshape the output centroids back to the original input shape.\nassign::Bool=false: If true, also return the assignments of each point to a centroid.\n\nReturns\n\nIf assign is false, returns a matrix where each column is a centroid.\nIf assign is true, returns a tuple where the first element is the matrix of centroids and the second element is a vector of assignments.\n\nExamples\n\ndata = rand(100, 10)\ncentroids = centroids_kmeans(data, 5)\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.centroids_kmedoids","page":"Utilities","title":"AutoEncoderToolkit.utils.centroids_kmedoids","text":" centroids_kmedoids(\n x::AbstractMatrix, n_centroids::Int; assign::Bool=false\n )\n\nPerform k-medoids clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nArguments\n\nx::AbstractMatrix: The input data. 
Rows represent individual samples.\nn_centroids::Int: The number of centroids to compute.\ndist::Distances.PreMetric=Distances.Euclidean(): The distance metric to use when computing the pairwise distance matrix.\n\nOptional Keyword Arguments\n\nassign::Bool=false: If true, also return the assignments of each point to a centroid.\n\nReturns\n\nIf assign is false, returns a matrix where each column is a centroid.\nIf assign is true, returns a tuple where the first element is the matrix of centroids and the second element is a vector of assignments.\n\nExamples\n\ndata = rand(100, 10)\ncentroids = centroids_kmedoids(data, 5)\n\n\n\n\n\ncentroids_kmedoids(\n x::AbstractArray,\n n_centroids::Int,\n dist::Distances.PreMetric=Distances.Euclidean();\n assign::Bool=false\n)\n\nPerform k-medoids clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nArguments\n\nx::AbstractArray: The input data. The last dimension of x should contain each of the samples that should be clustered.\nn_centroids::Int: The number of centroids to compute.\ndist::Distances.PreMetric=Distances.Euclidean(): The distance metric to use for the clustering. 
Defaults to Euclidean distance.\n\nOptional Keyword Arguments\n\nassign::Bool=false: If true, also return the assignments of each point to a centroid.\n\nReturns\n\nIf assign is false, returns an array where each column is a centroid.\nIf assign is true, returns a tuple where the first element is the array of centroids and the second element is a vector of assignments.\n\nExamples\n\ndata = rand(10, 100)\ncentroids = centroids_kmedoids(data, 5)\n\n\n\n\n\n","category":"function"},{"location":"utils/#Other-Utilities","page":"Utilities","title":"Other Utilities","text":"","category":"section"},{"location":"utils/","page":"Utilities","title":"Utilities","text":"AutoEncoderToolkit.utils.storage_type\nAutoEncoderToolkit.utils.vec_to_ltri\nAutoEncoderToolkit.utils.vec_mat_vec_batched\nAutoEncoderToolkit.utils.slogdet\nAutoEncoderToolkit.utils.sample_MvNormalCanon\nAutoEncoderToolkit.utils.unit_vector\nAutoEncoderToolkit.utils.finite_difference_gradient\nAutoEncoderToolkit.utils.taylordiff_gradient","category":"page"},{"location":"utils/#AutoEncoderToolkit.utils.storage_type","page":"Utilities","title":"AutoEncoderToolkit.utils.storage_type","text":"storage_type(A::AbstractArray)\n\nDetermine the storage type of an array.\n\nThis function recursively checks the parent of the array until it finds the base storage type. 
This is useful for determining whether an array or its subarrays are stored on the CPU or GPU.\n\nArguments\n\nA::AbstractArray: The array whose storage type is to be determined.\n\nReturns\n\nThe type of the array that is the base storage of A.\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.vec_to_ltri","page":"Utilities","title":"AutoEncoderToolkit.utils.vec_to_ltri","text":" vec_to_ltri(diag::AbstractVecOrMat, lower::AbstractVecOrMat)\n\nConvert two one-dimensional vectors or matrices into a lower triangular matrix or a 3D tensor.\n\nArguments\n\ndiag::AbstractVecOrMat: The input vector or matrix to be converted into the diagonal of the matrix. If it's a matrix, each column is considered as a separate vector.\nlower::AbstractVecOrMat: The input vector or matrix to be converted into the lower triangular part of the matrix. The length of this vector or the number of rows in this matrix should be a triangular number (i.e., the sum of the first n natural numbers for some n). If it's a matrix, each column is considered the lower part of a separate lower triangular matrix.\n\nReturns\n\nA lower triangular matrix or a 3D tensor where each slice is a lower triangular matrix constructed from diag and lower.\n\nDescription\n\nThis function constructs a lower triangular matrix or a 3D tensor from two input vectors or matrices, diag and lower. The diag vector or matrix provides the diagonal elements of the matrix, while the lower vector or matrix provides the elements below the diagonal. The function uses a comprehension to construct the matrix or tensor, with the lower_index function calculating the appropriate index in the lower vector or matrix for each element below the diagonal.\n\nGPU Support\n\nThe function supports both CPU and GPU arrays. 
For GPU arrays, the data is first transferred to the CPU, the lower triangular matrix or tensor is constructed, and then it is transferred back to the GPU.\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.vec_mat_vec_batched","page":"Utilities","title":"AutoEncoderToolkit.utils.vec_mat_vec_batched","text":"vec_mat_vec_batched(\n v::AbstractVector, \n M::AbstractMatrix, \n w::AbstractVector\n)\n\nCompute the product of a vector, a matrix, and another vector in the form v̲ᵀ M̲̲ w̲.\n\nThis function takes two vectors v and w, and a matrix M, and computes the product v̲ M̲̲ w̲. This function is added for consistency when calling multiple dispatch.\n\nArguments\n\nv::AbstractVector: A d dimensional vector.\nM::AbstractMatrix: A d×d matrix.\nw::AbstractVector: A d dimensional vector.\n\nReturns\n\nA scalar which is the result of the product v̲ M̲̲ w̲ for the corresponding vectors and matrix.\n\nNotes\n\nThis function uses the LinearAlgebra.dot function to perform the multiplication of the matrix M with the vector w. The resulting vector is then element-wise multiplied with the vector v and summed over the dimensions to obtain the final result. This function is added for consistency when calling multiple dispatch.\n\n\n\n\n\nvec_mat_vec_batched(\n v::AbstractMatrix, \n M::AbstractArray, \n w::AbstractMatrix\n)\n\nCompute the batched product of vectors and matrices in the form v̲ᵀ M̲̲ w̲.\n\nThis function takes two matrices v and w, and a 3D array M, and computes the batched product v̲ M̲̲ w̲. 
The computation is performed in a broadcasted manner using the Flux.batched_vec function.\n\nArguments\n\nv::AbstractMatrix: A d×n matrix, where d is the dimension of the vectors and n is the number of vectors.\nM::AbstractArray: A d×d×n array, where d is the dimension of the matrices and n is the number of matrices.\nw::AbstractMatrix: A d×n matrix, where d is the dimension of the vectors and n is the number of vectors.\n\nReturns\n\nAn n dimensional array where each element is the result of the product v̲ M̲̲ w̲ for the corresponding vectors and matrix.\n\nNotes\n\nThis function uses the Flux.batched_vec function to perform the batched multiplication of the matrices in M with the vectors in w. The resulting vectors are then element-wise multiplied with the vectors in v and summed over the dimensions to obtain the final result.\n\n\n\n\n\nvec_mat_vec_batched(\n v::AbstractVector{T}, \n M::AbstractMatrix{S}, \n w::AbstractVector{T}\n) where {T<:TaylorDiff.TaylorScalar{Float32,2},S<:Number}\n\nCompute the product of a vector and a matrix in the form v̲ᵀ M̲ w̲ for a specific type of matrix and vectors.\n\nThis function takes two vectors v and w of type TaylorDiff.TaylorScalar{Float32,2}, and a matrix M of type Number, and computes the product v̲ M̲ w̲. The computation is performed by first performing the matrix-vector multiplication M̲ w̲, and then computing the dot product of the resulting vector with v.\n\nArguments\n\nv::AbstractVector{T}: A d dimensional vector. T is a subtype of TaylorDiff.TaylorScalar{Float32,2}.\nM::AbstractMatrix{S}: A d×d matrix. S is a subtype of Number.\nw::AbstractVector{T}: A d dimensional vector. 
T is a subtype of TaylorDiff.TaylorScalar{Float32,2}.\n\nReturns\n\nA scalar which is the result of the product v̲ M̲ w̲.\n\nNotes\n\nThis function uses the dot function to compute the final dot product.\n\n\n\n\n\nvec_mat_vec_batched(\n v::AbstractMatrix{T}, \n M::AbstractArray{S,3}, \n w::AbstractMatrix{T}\n) where {T<:TaylorDiff.TaylorScalar{Float32,2},S<:Number}\n\nCompute the batched product of vectors and matrices in the form v̲ᵀ M̲̲ w̲ for a specific type of matrices and vectors.\n\nThis function takes two matrices v and w of type TaylorDiff.TaylorScalar{Float32,2}, and a 3D array M of type Number, and computes the batched product v̲ M̲̲ w̲. The computation is performed by first extracting each slice of M and each column of w, then performing the vector-matrix multiplication for each pair of slices, and finally computing the element-wise multiplication of the resulting matrix with v and summing over the dimensions.\n\nArguments\n\nv::AbstractMatrix{T}: A d×n matrix, where d is the dimension of the vectors and n is the number of vectors. T is a subtype of TaylorDiff.TaylorScalar{Float32,2}.\nM::AbstractArray{S,3}: A d×d×n array, where d is the dimension of the matrices and n is the number of matrices. S is a subtype of Number.\nw::AbstractMatrix{T}: A d×n matrix, where d is the dimension of the vectors and n is the number of vectors. T is a subtype of TaylorDiff.TaylorScalar{Float32,2}.\n\nReturns\n\nAn n dimensional array where each element is the result of the product v̲ M̲̲ w̲ for the corresponding vectors and matrix.\n\nNotes\n\nThis function uses the eachslice and eachcol functions to extract the slices of M and the columns of w, respectively. 
It then uses a list comprehension to perform the vector-matrix multiplication for each pair of slices, and finally computes the element-wise multiplication of the resulting matrix with v and sums over the dimensions to obtain the final result.\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.slogdet","page":"Utilities","title":"AutoEncoderToolkit.utils.slogdet","text":"slogdet(A::AbstractArray{T}; check::Bool=false) where {T<:Number}\n\nCompute the log determinant of a positive-definite matrix A or a 3D array of such matrices.\n\nArguments\n\nA::AbstractArray{T}: A positive-definite matrix or a 3D array of positive-definite matrices whose log determinant is to be computed. \ncheck::Bool=false: A flag that determines whether to check if the input matrix A is positive-definite. Defaults to false due to numerical instability.\n\nReturns\n\nThe log determinant of A. If A is a 3D array, returns a 1D array of log determinants, one for each slice along the third dimension of A.\n\nDescription\n\nThis function computes the log determinant of a positive-definite matrix A or a 3D array of such matrices. It first computes the Cholesky decomposition of A, and then calculates the log determinant as twice the sum of the log of the diagonal elements of the lower triangular matrix from the Cholesky decomposition.\n\nConditions\n\nThe input matrix A must be a positive-definite matrix, i.e., it must be symmetric and all its eigenvalues must be positive. If check is set to true, the function will throw an error if A is not positive-definite.\n\nGPU Support\n\nThe function supports both CPU and GPU arrays. 
\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.sample_MvNormalCanon","page":"Utilities","title":"AutoEncoderToolkit.utils.sample_MvNormalCanon","text":"sample_MvNormalCanon(Σ⁻¹::AbstractArray{T}) where {T<:Number}\n\nDraw a random sample from a multivariate normal distribution in canonical form.\n\nArguments\n\nΣ⁻¹::AbstractArray{T}: The precision matrix (inverse of the covariance matrix) of the multivariate normal distribution. This can be a 2D array (matrix) or a 3D array.\n\nReturns\n\nA random sample drawn from the multivariate normal distribution specified by the input precision matrix. If Σ⁻¹ is a 3D array, returns a 2D array of samples, one for each slice along the third dimension of Σ⁻¹.\n\nDescription\n\nThis function draws a random sample from a multivariate normal distribution specified by a precision matrix Σ⁻¹. The precision matrix can be a 2D array (matrix) or a 3D array. If Σ⁻¹ is a 3D array, the function draws a sample for each slice along the third dimension of Σ⁻¹.\n\nThe function first inverts the precision matrix to obtain the covariance matrix, then performs a Cholesky decomposition of the covariance matrix. 
It then draws a sample from a standard normal distribution and multiplies it by the lower triangular matrix from the Cholesky decomposition to obtain the final sample.\n\nGPU Support\n\nThe function supports both CPU and GPU arrays.\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.unit_vector","page":"Utilities","title":"AutoEncoderToolkit.utils.unit_vector","text":"unit_vector(x::AbstractVector, i::Int)\n\nCreate a unit vector of the same length as x with the i-th element set to 1.\n\nArguments\n\nx::AbstractVector: The vector whose length is used to determine the dimension of the unit vector.\ni::Int: The index of the element to be set to 1.\n\nReturns\n\nA unit vector of type eltype(x) and length equal to x with the i-th element set to 1.\n\nDescription\n\nThis function creates a unit vector of the same length as x with the i-th element set to 1. All other elements are set to 0.\n\nNote\n\nThis function is marked with the @ignore_derivatives macro from the ChainRulesCore package, which means that all AutoDiff backends will ignore any call to this function when computing gradients.\n\n\n\n\n\nunit_vector(x::AbstractMatrix, i::Int)\n\nCreate a unit vector of the same length as the number of rows in x with the i-th element set to 1.\n\nArguments\n\nx::AbstractMatrix: The matrix whose number of rows is used to determine the dimension of the unit vector.\ni::Int: The index of the element to be set to 1.\n\nReturns\n\nA unit vector of type eltype(x) and length equal to the number of rows in x with the i-th element set to 1.\n\nDescription\n\nThis function creates a unit vector of the same length as the number of rows in x with the i-th element set to 1. All other elements are set to 0. 
\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.finite_difference_gradient","page":"Utilities","title":"AutoEncoderToolkit.utils.finite_difference_gradient","text":"finite_difference_gradient(\n f::Function,\n x::AbstractVecOrMat;\n fdtype::Symbol=:central\n)\n\nCompute the finite difference gradient of a function f at a point x.\n\nArguments\n\nf::Function: The function for which the gradient is to be computed. This function must return a scalar value.\nx::AbstractVecOrMat: The point at which the gradient is to be computed. Can be a vector or a matrix. If a matrix, each column represents a point where the function f is to be evaluated and the derivative computed.\n\nOptional Keyword Arguments\n\nfdtype::Symbol=:central: The finite difference type. It can be either :forward or :central. Defaults to :central.\n\nReturns\n\nA vector or a matrix representing the gradient of f at x, depending on the input type of x.\n\nDescription\n\nThis function computes the finite difference gradient of a function f at a point x. 
The gradient is a vector or a matrix where the i-th element is the partial derivative of f with respect to the i-th element of x.\n\nThe partial derivatives are computed using the forward or central difference formula, depending on the fdtype argument:\n\nForward difference formula: ∂f/∂xᵢ ≈ [f(x + ε * eᵢ) - f(x)] / ε\nCentral difference formula: ∂f/∂xᵢ ≈ [f(x + ε * eᵢ) - f(x - ε * eᵢ)] / 2ε\n\nwhere ε is the step size and eᵢ is the i-th unit vector.\n\nGPU Support\n\nThis function supports both CPU and GPU arrays.\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.taylordiff_gradient","page":"Utilities","title":"AutoEncoderToolkit.utils.taylordiff_gradient","text":" taylordiff_gradient(\n f::Function,\n x::AbstractVecOrMat\n )\n\nCompute the gradient of a function f at a point x using Taylor series differentiation.\n\nArguments\n\nf::Function: The function for which the gradient is to be computed. This must be a scalar function.\nx::AbstractVecOrMat: The point at which the gradient is to be computed. Can be a vector or a matrix. If a matrix, each column represents a point where the function f is to be evaluated and the derivative computed.\n\nReturns\n\nA vector or a matrix representing the gradient of f at x, depending on the input type of x.\n\nDescription\n\nThis function computes the gradient of a function f at a point x using Taylor series differentiation. 
The gradient is a vector or a matrix where the i-th element or column is the partial derivative of f with respect to the i-th element of x.\n\nThe partial derivatives are computed using the TaylorDiff.derivative function.\n\nGPU Support\n\nThis function currently only supports CPU arrays.\n\n\n\n\n\n","category":"function"},{"location":"encoders/#encodersdecoders","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.jl provides a set of predefined encoders and decoders that can be used to define custom (variational) autoencoder architectures.","category":"page"},{"location":"encoders/#Encoders","page":"Encoders & Decoders","title":"Encoders","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"The tree structure of the encoder types looks like this (🧱 represents concrete types):","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AbstractEncoder\nAbstractDeterministicEncoder\nEncoder 🧱\nAbstractVariationalEncoder\nAbstractGaussianEncoder\nAbstractGaussianLinearEncoder\nJointGaussianEncoder 🧱\nAbstractGaussianLogEncoder\nJointGaussianLogEncoder 🧱","category":"page"},{"location":"encoders/#Encoder","page":"Encoders & Decoders","title":"Encoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.Encoder\nAutoEncoderToolkit.Encoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.Encoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.Encoder","text":"struct Encoder\n\nDefault encoder function for deterministic autoencoders. 
The encoder network is used to map the input data directly into the latent space representation.\n\nFields\n\nencoder::Union{Flux.Chain,Flux.Dense}: The primary neural network used to process input data and map it into a latent space representation.\n\nExample\n\nenc = Encoder(Flux.Chain(Dense(784, 400, relu), Dense(400, 20)))\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.Encoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.Encoder","text":"(encoder::Encoder)(x)\n\nForward propagate the input x through the Encoder to obtain the encoded representation in the latent space.\n\nArguments\n\nx::Array: Input data to be encoded.\n\nReturns\n\nz: Encoded representation of the input data in the latent space.\n\nDescription\n\nThis method allows for a direct call on an instance of Encoder with the input data x. It runs the input through the encoder network and outputs the encoded representation in the latent space.\n\nExample\n\nenc = Encoder(...)\nz = enc(some_input)\n\nNote\n\nEnsure that the input x matches the expected dimensionality of the encoder's input layer.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#JointGaussianEncoder","page":"Encoders & Decoders","title":"JointGaussianEncoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.JointGaussianEncoder\nAutoEncoderToolkit.JointGaussianEncoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianEncoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianEncoder","text":"struct JointGaussianEncoder <: AbstractGaussianLinearEncoder\n\nEncoder function for variational autoencoders where the same encoder network is used to map to the latent space mean µ and standard deviation σ.\n\nFields\n\nencoder::Flux.Chain: The primary neural network used to process input data and map it into a latent space 
representation.\nµ::Flux.Dense: A dense layer mapping from the output of the encoder to the mean of the latent space.\nσ::Flux.Dense: A dense layer mapping from the output of the encoder to the standard deviation of the latent space.\n\nExample\n\nenc = JointGaussianEncoder(\n Flux.Chain(Dense(784, 400, relu)), Flux.Dense(400, 20), Flux.Dense(400, 20)\n)\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianEncoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianEncoder","text":" (encoder::JointGaussianEncoder)(x::AbstractArray)\n\nForward propagate the input x through the JointGaussianEncoder to obtain the mean (µ) and standard deviation (σ) of the latent space.\n\nArguments\n\nx::AbstractArray: Input data to be encoded.\n\nReturns\n\nA NamedTuple (µ=µ, σ=σ,) where:\nµ: Mean of the latent space after passing the input through the encoder and subsequently through the µ layer.\nσ: Standard deviation of the latent space after passing the input through the encoder and subsequently through the σ layer.\n\nDescription\n\nThis method allows for a direct call on an instance of JointGaussianEncoder with the input data x. 
It first runs the input through the encoder network, then maps the output of the last encoder layer to both the mean and standard deviation of the latent space.\n\nExample\n\nje = JointGaussianEncoder(...)\nµ, σ = je(some_input)\n\nNote\n\nEnsure that the input x matches the expected dimensionality of the encoder's input layer.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#JointGaussianLogEncoder","page":"Encoders & Decoders","title":"JointGaussianLogEncoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.JointGaussianLogEncoder\nAutoEncoderToolkit.JointGaussianLogEncoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianLogEncoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianLogEncoder","text":"struct JointGaussianLogEncoder <: AbstractGaussianLogEncoder\n\nDefault encoder function for variational autoencoders where the same encoder network is used to map to the latent space mean µ and log standard deviation logσ.\n\nFields\n\nencoder::Flux.Chain: The primary neural network used to process input data and map it into a latent space representation.\nµ::Union{Flux.Dense,Flux.Chain}: A dense layer or a chain of layers mapping from the output of the encoder to the mean of the latent space.\nlogσ::Union{Flux.Dense,Flux.Chain}: A dense layer or a chain of layers mapping from the output of the encoder to the log standard deviation of the latent space.\n\nExample\n\nenc = JointGaussianLogEncoder(\n Flux.Chain(Dense(784, 400, relu)), Flux.Dense(400, 20), Flux.Dense(400, 20)\n)\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianLogEncoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianLogEncoder","text":" (encoder::JointGaussianLogEncoder)(x)\n\nThis method forward propagates the input x through the JointGaussianLogEncoder 
to compute the mean (mu) and log standard deviation (logσ) of the latent space.\n\nArguments\n\nx::Array{Float32}: The input data to be encoded.\n\nReturns\n\nA NamedTuple (µ=µ, logσ=logσ,) where:\nµ: The mean of the latent space. This is computed by passing the input through the encoder and subsequently through the µ layer. \nlogσ: The log standard deviation of the latent space. This is computed by passing the input through the encoder and subsequently through the logσ layer.\n\nDescription\n\nThis method allows for a direct call on an instance of JointGaussianLogEncoder with the input data x. It first processes the input through the encoder network, then maps the output of the last encoder layer to both the mean and log standard deviation of the latent space.\n\nExample\n\nje = JointGaussianLogEncoder(...)\nmu, logσ = je(some_input)\n\nNote\n\nEnsure that the input x matches the expected dimensionality of the encoder's input layer.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#Decoders","page":"Encoders & Decoders","title":"Decoders","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"The tree structure of the decoder types looks like this (🧱 represents concrete types):","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AbstractDecoder\nAbstractDeterministicDecoder\nDecoder 🧱\nAbstractVariationalDecoder\nBernoulliDecoder 🧱\nCategoricalDecoder 🧱\nAbstractGaussianDecoder\nSimpleGaussianDecoder 🧱\nAbstractGaussianLinearDecoder\nJointGaussianDecoder 🧱\nSplitGaussianDecoder 🧱\nAbstractGaussianLogDecoder\nJointGaussianLogDecoder 🧱\nSplitGaussianLogDecoder 🧱","category":"page"},{"location":"encoders/#Decoder","page":"Encoders & Decoders","title":"Decoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & 
Decoders","text":"AutoEncoderToolkit.Decoder\nAutoEncoderToolkit.Decoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.Decoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.Decoder","text":"struct Decoder\n\nDefault decoder function for deterministic autoencoders. The decoder network is used to map the latent space representation directly back to the original data space.\n\nFields\n\ndecoder::Flux.Chain: The primary neural network used to process the latent space representation and map it back to the data space.\n\nExample\n\ndec = Decoder(Flux.Chain(Dense(20, 400, relu), Dense(400, 784)))\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.Decoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.Decoder","text":"(decoder::Decoder)(z::AbstractArray)\n\nForward propagate the encoded representation z through the Decoder to obtain the reconstructed input data.\n\nArguments\n\nz::AbstractArray: Encoded representation in the latent space.\n\nReturns\n\nx_reconstructed: Reconstructed version of the original input data after decoding from the latent space.\n\nDescription\n\nThis method allows for a direct call on an instance of Decoder with the encoded data z. It runs the encoded representation through the decoder network and outputs the reconstructed version of the original input data.\n\nExample\n\ndec = Decoder(...)\n
x_reconstructed = dec(encoded_representation)\n\nNote\n\nEnsure that the input z matches the expected dimensionality of the decoder's input layer.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#BernoulliDecoder","page":"Encoders & Decoders","title":"BernoulliDecoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.BernoulliDecoder\nAutoEncoderToolkit.BernoulliDecoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.BernoulliDecoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.BernoulliDecoder","text":" BernoulliDecoder <: AbstractVariationalDecoder\n\nA decoder structure for variational autoencoders (VAEs) that models the output data as a Bernoulli distribution. This is typically used when the outputs of the decoder are probabilities.\n\nFields\n\ndecoder::Flux.Chain: The primary neural network used to process the latent space and map it to the output (or reconstructed) space.\n\nDescription\n\nBernoulliDecoder represents a VAE decoder that models the output data as a Bernoulli distribution. It's commonly used when the outputs of the decoder are probabilities, such as in a binary classification task or when modeling binary data. Unlike a Gaussian decoder, there's no need for separate paths or operations on the mean or log standard deviation.\n\nNote\n\nEnsure the last layer of the decoder outputs a value between 0 and 1, as this is required for a Bernoulli distribution.\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.BernoulliDecoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.BernoulliDecoder","text":" (decoder::BernoulliDecoder)(z::AbstractArray)\n\nMaps the given latent representation z through the BernoulliDecoder network to reconstruct the original input.\n\nArguments\n\nz::AbstractArray: The latent space representation to be decoded. 
This can be a vector or a matrix, where each column represents a separate sample from the latent space of a VAE.\n\nReturns\n\nA NamedTuple (p=p,) where p is an array representing the output of the decoder, which should resemble the original input to the VAE (post encoding and sampling from the latent space).\n\nDescription\n\nThis function processes the latent space representation z using the neural network defined in the BernoulliDecoder struct. The aim is to decode or reconstruct the original input from this representation.\n\nNote\n\nEnsure that the latent space representation z matches the expected input dimensionality for the BernoulliDecoder.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#CategoricalDecoder","page":"Encoders & Decoders","title":"CategoricalDecoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.CategoricalDecoder\nAutoEncoderToolkit.CategoricalDecoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.CategoricalDecoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.CategoricalDecoder","text":"CategoricalDecoder <: AbstractVariationalDecoder\n\nA decoder structure for variational autoencoders (VAEs) that models the output data as a categorical distribution. This is typically used when the outputs of the decoder are categorical variables encoded as one-hot vectors.\n\nFields\n\ndecoder::Flux.Chain: The primary neural network used to process the latent space and map it to the output (or reconstructed) space.\n\nDescription\n\nCategoricalDecoder represents a VAE decoder that models the output data as a categorical distribution. It's commonly used when the outputs of the decoder are categorical variables, such as multi-class one-hot encoded vectors. 
Unlike a Gaussian decoder, there's no need for separate paths or operations on the mean or log standard deviation.\n\nNote\n\nEnsure the last layer of the decoder outputs a probability distribution over the categories, as this is required for a categorical distribution. This can be done using a softmax activation function, for example.\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.CategoricalDecoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.CategoricalDecoder","text":"(decoder::CategoricalDecoder)(z::AbstractArray)\n\nMaps the given latent representation z through the CategoricalDecoder network to reconstruct the original input.\n\nArguments\n\nz::AbstractArray: The latent space representation to be decoded. This can be a vector or a matrix, where each column represents a separate sample from the latent space of a VAE.\n\nReturns\n\nA NamedTuple (p=p,) where p is an array representing the output of the decoder, which should resemble the original input to the VAE (post encoding and sampling from the latent space).\n\nDescription\n\nThis function processes the latent space representation z using the neural network defined in the CategoricalDecoder struct. 
The aim is to decode or reconstruct the original input from this representation.\n\nNote\n\nEnsure that the latent space representation z matches the expected input dimensionality for the CategoricalDecoder.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#SimpleGaussianDecoder","page":"Encoders & Decoders","title":"SimpleGaussianDecoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.SimpleGaussianDecoder\nAutoEncoderToolkit.SimpleGaussianDecoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.SimpleGaussianDecoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.SimpleGaussianDecoder","text":"SimpleGaussianDecoder <: AbstractGaussianDecoder\n\nA straightforward decoder structure for variational autoencoders (VAEs) that contains only a single decoder network.\n\nFields\n\ndecoder::Flux.Chain: The primary neural network used to process the latent space and map it to the output (or reconstructed) space.\n\nDescription\n\nSimpleGaussianDecoder represents a basic VAE decoder without explicit components for the latent space's mean (µ) or log standard deviation (logσ). It's commonly used when the VAE's latent space distribution is implicitly defined, and there's no need for separate paths or operations on the mean or log standard deviation.\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.SimpleGaussianDecoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.SimpleGaussianDecoder","text":"(decoder::SimpleGaussianDecoder)(z::AbstractVecOrMat)\n\nMaps the given latent representation z through the SimpleGaussianDecoder network to reconstruct the original input.\n\nArguments\n\nz::AbstractArray: The latent space representation to be decoded. 
This can be a vector or a matrix, where each column represents a separate sample from the latent space of a VAE.\n\nReturns\n\nA NamedTuple (µ=µ,) where µ is an array representing the output of the decoder, which should resemble the original input to the VAE (post encoding and sampling from the latent space).\n\nDescription\n\nThis function processes the latent space representation z using the neural network defined in the SimpleGaussianDecoder struct. The aim is to decode or reconstruct the original input from this representation.\n\nExample\n\ndecoder = SimpleGaussianDecoder(...)\nz = ... # some latent space representation\noutput = decoder(z)\n\nNote\n\nEnsure that the latent space representation z matches the expected input dimensionality for the SimpleGaussianDecoder.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#JointGaussianDecoder","page":"Encoders & Decoders","title":"JointGaussianDecoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.JointGaussianDecoder\nAutoEncoderToolkit.JointGaussianDecoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianDecoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianDecoder","text":"JointGaussianDecoder <: AbstractGaussianLinearDecoder\n\nAn extended decoder structure for VAEs that incorporates separate layers for mapping from the latent space to both its mean (µ) and standard deviation (σ).\n\nFields\n\ndecoder::Flux.Chain: The primary neural network used to process the latent space before determining its mean and standard deviation.\nµ::Flux.Dense: A dense layer that maps from the output of the decoder to the mean of the latent space.\nσ::Flux.Dense: A dense layer that maps from the output of the decoder to the standard deviation of the latent space.\n\nDescription\n\nJointGaussianDecoder is tailored for VAE architectures where the same decoder 
network is used initially, and then splits into two separate paths for determining both the mean and standard deviation of the latent space.\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianDecoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianDecoder","text":" (decoder::JointGaussianDecoder)(z::AbstractArray)\n\nMaps the given latent representation z through the JointGaussianDecoder network to produce both the mean (µ) and standard deviation (σ).\n\nArguments\n\nz::AbstractArray: The latent space representation to be decoded. If array, the last dimension contains each of the latent space representations to be decoded.\n\nReturns\n\nA NamedTuple (µ=µ, σ=σ,) where:\nµ::AbstractArray: The mean representation obtained from the decoder.\nσ::AbstractArray: The standard deviation representation obtained from the decoder.\n\nDescription\n\nThis function processes the latent space representation z using the primary neural network of the JointGaussianDecoder struct. It then separately maps the output of this network to the mean and standard deviation using the µ and σ dense layers, respectively.\n\nExample\n\ndecoder = JointGaussianDecoder(...)\nz = ... 
# some latent space representation\noutput = decoder(z)\n\nNote\n\nEnsure that the latent space representation z matches the expected input dimensionality for the JointGaussianDecoder.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#JointGaussianLogDecoder","page":"Encoders & Decoders","title":"JointGaussianLogDecoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.JointGaussianLogDecoder\nAutoEncoderToolkit.JointGaussianLogDecoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianLogDecoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianLogDecoder","text":"JointGaussianLogDecoder <: AbstractGaussianLogDecoder\n\nAn extended decoder structure for VAEs that incorporates separate layers for mapping from the latent space to both its mean (µ) and log standard deviation (logσ).\n\nFields\n\ndecoder::Flux.Chain: The primary neural network used to process the latent space before determining its mean and log standard deviation.\nµ::Flux.Dense: A dense layer that maps from the output of the decoder to the mean of the latent space.\nlogσ::Flux.Dense: A dense layer that maps from the output of the decoder to the log standard deviation of the latent space.\n\nDescription\n\nJointGaussianLogDecoder is tailored for VAE architectures where the same decoder network is used initially, and then splits into two separate paths for determining both the mean and log standard deviation of the latent space.\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianLogDecoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianLogDecoder","text":" (decoder::JointGaussianLogDecoder)(z::AbstractArray)\n\nMaps the given latent representation z through the JointGaussianLogDecoder network to produce both the mean (µ) and log standard deviation 
(logσ).\n\nArguments\n\nz::AbstractArray: The latent space representation to be decoded. If array, the last dimension contains each of the latent space representations.\n\nReturns\n\nA NamedTuple (µ=µ, logσ=logσ,) where:\nµ::Array: The mean representation obtained from the decoder.\nlogσ::Array: The log standard deviation representation obtained from the decoder.\n\nDescription\n\nThis function processes the latent space representation z using the primary neural network of the JointGaussianLogDecoder struct. It then separately maps the output of this network to the mean and log standard deviation using the µ and logσ dense layers, respectively.\n\nExample\n\ndecoder = JointGaussianLogDecoder(...)\nz = ... # some latent space representation\noutput = decoder(z)\n\nNote\n\nEnsure that the latent space representation z matches the expected input dimensionality for the JointGaussianLogDecoder.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#SplitGaussianDecoder","page":"Encoders & Decoders","title":"SplitGaussianDecoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.SplitGaussianDecoder\nAutoEncoderToolkit.SplitGaussianDecoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.SplitGaussianDecoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.SplitGaussianDecoder","text":"SplitGaussianDecoder <: AbstractGaussianLinearDecoder\n\nA specialized decoder structure for VAEs that uses distinct neural networks for determining the mean (µ) and standard deviation (σ) of the latent space.\n\nFields\n\ndecoder_µ::Flux.Chain: A neural network dedicated to processing the latent space and mapping it to its mean.\ndecoder_σ::Flux.Chain: A neural network dedicated to processing the latent space and mapping it to its standard deviation.\n\nDescription\n\nSplitGaussianDecoder is designed for VAE architectures where separate decoder networks 
are preferred for computing the mean and standard deviation, ensuring that each has its own distinct set of parameters and transformation logic.\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.SplitGaussianDecoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.SplitGaussianDecoder","text":" (decoder::SplitGaussianDecoder)(z::AbstractArray)\n\nMaps the given latent representation z through the separate networks of the SplitGaussianDecoder to produce both the mean (µ) and standard deviation (σ).\n\nArguments\n\nz::AbstractArray: The latent space representation to be decoded. If array, the last dimension contains each of the latent space representations to be decoded.\n\nReturns\n\nA NamedTuple (µ=µ, σ=σ,) where:\nµ::AbstractArray: The mean representation obtained using the dedicated decoder_µ network.\nσ::AbstractArray: The standard deviation representation obtained using the dedicated decoder_σ network.\n\nDescription\n\nThis function processes the latent space representation z through two distinct neural networks within the SplitGaussianDecoder struct. The decoder_µ network is used to produce the mean representation, while the decoder_σ network is utilized for the standard deviation.\n\nExample\n\ndecoder = SplitGaussianDecoder(...)\nz = ... 
# some latent space representation\noutput = decoder(z)\n\nNote\n\nEnsure that the latent space representation z matches the expected input dimensionality for both networks in the SplitGaussianDecoder.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#SplitGaussianLogDecoder","page":"Encoders & Decoders","title":"SplitGaussianLogDecoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.SplitGaussianLogDecoder\nAutoEncoderToolkit.SplitGaussianLogDecoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.SplitGaussianLogDecoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.SplitGaussianLogDecoder","text":"SplitGaussianLogDecoder <: AbstractGaussianLogDecoder\n\nA specialized decoder structure for VAEs that uses distinct neural networks for determining the mean (µ) and log standard deviation (logσ) of the latent space.\n\nFields\n\ndecoder_µ::Flux.Chain: A neural network dedicated to processing the latent space and mapping it to its mean.\ndecoder_logσ::Flux.Chain: A neural network dedicated to processing the latent space and mapping it to its log standard deviation.\n\nDescription\n\nSplitGaussianLogDecoder is designed for VAE architectures where separate decoder networks are preferred for computing the mean and log standard deviation, ensuring that each has its own distinct set of parameters and transformation logic.\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.SplitGaussianLogDecoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.SplitGaussianLogDecoder","text":" (decoder::SplitGaussianLogDecoder)(z::AbstractArray)\n\nMaps the given latent representation z through the separate networks of the SplitGaussianLogDecoder to produce both the mean (µ) and log standard deviation (logσ).\n\nArguments\n\nz::AbstractArray: The latent space representation to be decoded. 
If array, the last dimension contains each of the latent space representations to be decoded.\n\nReturns\n\nA NamedTuple (µ=µ, logσ=logσ,) where:\nµ::AbstractArray: The mean representation obtained using the dedicated decoder_µ network.\nlogσ::AbstractArray: The log standard deviation representation obtained using the dedicated decoder_logσ network.\n\nDescription\n\nThis function processes the latent space representation z through two distinct neural networks within the SplitGaussianLogDecoder struct. The decoder_µ network is used to produce the mean representation, while the decoder_logσ network is utilized for the log standard deviation.\n\nExample\n\ndecoder = SplitGaussianLogDecoder(...)\nz = ... # some latent space representation\noutput = decoder(z)\n\nNote\n\nEnsure that the latent space representation z matches the expected input dimensionality for both networks in the SplitGaussianLogDecoder.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#Default-initializations","page":"Encoders & Decoders","title":"Default initializations","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"The package provides a set of functions to initialize encoder and decoder architectures. 
Although they give the user less flexibility, they are useful for quick prototyping.","category":"page"},{"location":"encoders/#Encoder-initializations","page":"Encoders & Decoders","title":"Encoder initializations","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.Encoder(\n ::Int, ::Int, ::Vector{<:Int}, ::Vector{<:Function}, ::Function\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.Encoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.Encoder","text":"Encoder(n_input, n_latent, encoder_neurons, encoder_activation, \n latent_activation; init=Flux.glorot_uniform)\n\nConstruct and initialize an Encoder struct that defines an encoder network for a deterministic autoencoder.\n\nArguments\n\nn_input::Int: The dimensionality of the input data.\nn_latent::Int: The dimensionality of the latent space.\nencoder_neurons::Vector{<:Int}: A vector specifying the number of neurons in each layer of the encoder network.\nencoder_activation::Vector{<:Function}: Activation functions corresponding to each layer in the encoder_neurons.\nlatent_activation::Function: Activation function for the latent space layer.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: The initialization function used for the neural network weights.\n\nReturns\n\nAn Encoder struct initialized based on the provided arguments.\n\nExamples\n\nencoder = Encoder(784, 20, [400], [relu], tanh)\n\nNotes\n\nThe length of encoder_neurons should match the length of encoder_activation, ensuring that each layer in the encoder has a corresponding activation function.\n\n\n\n\n\n","category":"method"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.JointGaussianLogEncoder( \n ::Int, \n ::Int, \n ::Vector{<:Int}, \n ::Vector{<:Function}, \n 
::Function;\n)\nAutoEncoderToolkit.JointGaussianLogEncoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Vector{<:Function};\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianLogEncoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianLogEncoder","text":"JointGaussianLogEncoder(n_input, n_latent, encoder_neurons, encoder_activation, \n latent_activation; init=Flux.glorot_uniform)\n\nConstruct and initialize a JointGaussianLogEncoder struct that defines an encoder network for a variational autoencoder.\n\nArguments\n\nn_input::Int: The dimensionality of the input data.\nn_latent::Int: The dimensionality of the latent space.\nencoder_neurons::Vector{<:Int}: A vector specifying the number of neurons in each layer of the encoder network.\nencoder_activation::Vector{<:Function}: Activation functions corresponding to each layer in the encoder_neurons.\nlatent_activation::Function: Activation function for the latent space layers (both µ and logσ).\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: The initialization function used for the neural network weights.\n\nReturns\n\nA JointGaussianLogEncoder struct initialized based on the provided arguments.\n\nExamples\n\nencoder = JointGaussianLogEncoder(784, 20, [400], [relu], tanh)\n\nNotes\n\nThe length of encoder_neurons should match the length of encoder_activation, ensuring that each layer in the encoder has a corresponding activation function.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianLogEncoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Vector{<:Function}}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianLogEncoder","text":"JointGaussianLogEncoder(n_input, n_latent, encoder_neurons, encoder_activation, \n latent_activation; init=Flux.glorot_uniform)\n\nConstruct and initialize a 
JointGaussianLogEncoder struct that defines an encoder network for a variational autoencoder.\n\nArguments\n\nn_input::Int: The dimensionality of the input data.\nn_latent::Int: The dimensionality of the latent space.\nencoder_neurons::Vector{<:Int}: A vector specifying the number of neurons in each layer of the encoder network.\nencoder_activation::Vector{<:Function}: Activation functions corresponding to each layer in the encoder_neurons.\nlatent_activation::Vector{<:Function}: Activation functions for the latent space layers (both µ and logσ).\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: The initialization function used for the neural network weights.\n\nReturns\n\nA JointGaussianLogEncoder struct initialized based on the provided arguments.\n\nExamples\n\nencoder = JointGaussianLogEncoder(784, 20, [400], [relu], [tanh, identity])\n\nNotes\n\nThe length of encoder_neurons should match the length of encoder_activation, ensuring that each layer in the encoder has a corresponding activation function.\n\n\n\n\n\n","category":"method"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.JointGaussianEncoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Vector{<:Function};\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianEncoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Vector{<:Function}}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianEncoder","text":"JointGaussianEncoder(n_input, n_latent, encoder_neurons, encoder_activation, \n latent_activation; init=Flux.glorot_uniform)\n\nConstruct and initialize a JointGaussianEncoder struct that defines an encoder network for a variational autoencoder.\n\nArguments\n\nn_input::Int: The dimensionality of the input data.\nn_latent::Int: The dimensionality of the latent space.\nencoder_neurons::Vector{<:Int}: A vector specifying the number of neurons in each layer of the 
encoder network.\nencoder_activation::Vector{<:Function}: Activation functions corresponding to each layer in the encoder_neurons.\nlatent_activation::Vector{<:Function}: Activation functions for the latent space layers. This vector must contain the activations for both µ and σ.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: The initialization function used for the neural network weights.\n\nReturns\n\nA JointGaussianEncoder struct initialized based on the provided arguments.\n\nExamples\n\nencoder = JointGaussianEncoder(784, 20, [400], [relu], [tanh, softplus])\n\nNotes\n\nThe length of encoder_neurons should match the length of encoder_activation, ensuring that each layer in the encoder has a corresponding activation function.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#Decoder-initializations","page":"Encoders & Decoders","title":"Decoder initializations","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.Decoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Function;\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.Decoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.Decoder","text":"Decoder(n_input, n_latent, decoder_neurons, decoder_activation, \n output_activation; init=Flux.glorot_uniform)\n\nConstruct and initialize a Decoder struct that defines a decoder network for a deterministic autoencoder.\n\nArguments\n\nn_input::Int: The dimensionality of the output data (which typically matches the input data dimensionality of the autoencoder).\nn_latent::Int: The dimensionality of the latent space.\ndecoder_neurons::Vector{<:Int}: A vector specifying the number of neurons in each layer of the decoder network.\ndecoder_activation::Vector{<:Function}: Activation functions corresponding to each layer in the 
decoder_neurons.\noutput_activation::Function: Activation function for the final output layer.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: The initialization function used for the neural network weights.\n\nReturns\n\nA Decoder struct initialized based on the provided arguments.\n\nExamples\n\ndecoder = Decoder(784, 20, [400], [relu], sigmoid)\n\nNotes\n\nThe length of decoder_neurons should match the length of decoder_activation, ensuring that each layer in the decoder has a corresponding activation function.\n\n\n\n\n\n","category":"method"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.SimpleGaussianDecoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Function;\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.SimpleGaussianDecoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.SimpleGaussianDecoder","text":"SimpleGaussianDecoder(n_input, n_latent, decoder_neurons, decoder_activation, \n output_activation; init=Flux.glorot_uniform)\n\nConstructs and initializes a SimpleGaussianDecoder object designed for variational autoencoders (VAEs). 
This function sets up a straightforward decoder network that maps from a latent space to an output space.\n\nArguments\n\nn_input::Int: Dimensionality of the output data (or the data to be reconstructed).\nn_latent::Int: Dimensionality of the latent space.\ndecoder_neurons::Vector{<:Int}: Vector of layer sizes for the decoder network, not including the input latent layer and the final output layer.\ndecoder_activation::Vector{<:Function}: Activation functions for each decoder layer, not including the final output layer.\noutput_activation::Function: Activation function for the final output layer.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA SimpleGaussianDecoder object with the specified architecture and initialized weights.\n\nDescription\n\nThis function constructs a SimpleGaussianDecoder object, setting up its decoder network based on the provided specifications. The architecture begins with a dense layer mapping from the latent space, goes through a sequence of middle layers if specified, and finally maps to the output space.\n\nThe function ensures that there are appropriate activation functions provided for each layer in the decoder_neurons and checks for potential mismatches in length.\n\nExample\n\nn_input = 28*28\nn_latent = 64\ndecoder_neurons = [128, 256]\ndecoder_activation = [relu, relu]\noutput_activation = sigmoid\ndecoder = SimpleGaussianDecoder(\n n_input, n_latent, decoder_neurons, decoder_activation, output_activation\n)\n\nNote\n\nEnsure that the lengths of decoder_neurons and decoder_activation match, excluding the output layer.\n\n\n\n\n\n","category":"method"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.JointGaussianLogDecoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Function;\n)\nAutoEncoderToolkit.JointGaussianLogDecoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n 
::Vector{<:Function},\n ::Vector{<:Function};\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianLogDecoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianLogDecoder","text":"JointGaussianLogDecoder(n_input, n_latent, decoder_neurons, decoder_activation, \n latent_activation; init=Flux.glorot_uniform)\n\nConstructs and initializes a JointGaussianLogDecoder object for variational autoencoders (VAEs). This function sets up a decoder network that first processes the latent space and then maps it separately to both its mean (µ) and log standard deviation (logσ).\n\nArguments\n\nn_input::Int: Dimensionality of the output data (or the data to be reconstructed).\nn_latent::Int: Dimensionality of the latent space.\ndecoder_neurons::Vector{<:Int}: Vector of layer sizes for the primary decoder network, not including the input latent layer.\ndecoder_activation::Vector{<:Function}: Activation functions for each primary decoder layer.\noutput_activation::Function: Activation function for the mean (µ) and log standard deviation (logσ) layers.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA JointGaussianLogDecoder object with the specified architecture and initialized weights.\n\nDescription\n\nThis function constructs a JointGaussianLogDecoder object, setting up its primary decoder network based on the provided specifications. The architecture begins with a dense layer mapping from the latent space and goes through a sequence of middle layers if specified. 
After processing the latent space through the primary decoder, it then maps separately to both its mean (µ) and log standard deviation (logσ).\n\nExample\n\nn_input = 28*28\nn_latent = 64\ndecoder_neurons = [128, 256]\ndecoder_activation = [relu, relu]\noutput_activation = tanh\ndecoder = JointGaussianLogDecoder(\n n_input, n_latent, decoder_neurons, decoder_activation, output_activation\n)\n\nNote\n\nEnsure that the lengths of decoder_neurons and decoder_activation match.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianLogDecoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Vector{<:Function}}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianLogDecoder","text":"JointGaussianLogDecoder(n_input, n_latent, decoder_neurons, decoder_activation, \n latent_activation; init=Flux.glorot_uniform)\n\nConstructs and initializes a JointGaussianLogDecoder object for variational autoencoders (VAEs). This function sets up a decoder network that first processes the latent space and then maps it separately to both its mean (µ) and log standard deviation (logσ).\n\nArguments\n\nn_input::Int: Dimensionality of the output data (or the data to be reconstructed).\nn_latent::Int: Dimensionality of the latent space.\ndecoder_neurons::Vector{<:Int}: Vector of layer sizes for the primary decoder network, not including the input latent layer.\ndecoder_activation::Vector{<:Function}: Activation functions for each primary decoder layer.\noutput_activation::Vector{<:Function}: Activation functions for the mean (µ) and log standard deviation (logσ) layers.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA JointGaussianLogDecoder object with the specified architecture and initialized weights.\n\nDescription\n\nThis function constructs a JointGaussianLogDecoder object, setting up its primary decoder network based on the provided 
specifications. The architecture begins with a dense layer mapping from the latent space and goes through a sequence of middle layers if specified. After processing the latent space through the primary decoder, it then maps separately to both its mean (µ) and log standard deviation (logσ).\n\nExample\n\nn_input = 28*28\nn_latent = 64\ndecoder_neurons = [128, 256]\ndecoder_activation = [relu, relu]\noutput_activation = [tanh, identity]\ndecoder = JointGaussianLogDecoder(\n n_input, n_latent, decoder_neurons, decoder_activation, output_activation\n)\n\nNote\n\nEnsure that the lengths of decoder_neurons and decoder_activation match.\n\n\n\n\n\n","category":"method"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.JointGaussianDecoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Function;\n)\nAutoEncoderToolkit.JointGaussianDecoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Vector{<:Function};\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianDecoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianDecoder","text":"JointGaussianDecoder(n_input, n_latent, decoder_neurons, decoder_activation, \n latent_activation; init=Flux.glorot_uniform)\n\nConstructs and initializes a JointGaussianDecoder object for variational autoencoders (VAEs). 
This function sets up a decoder network that first processes the latent space and then maps it separately to both its mean (µ) and standard deviation (σ).\n\nArguments\n\nn_input::Int: Dimensionality of the output data (or the data to be reconstructed).\nn_latent::Int: Dimensionality of the latent space.\ndecoder_neurons::Vector{<:Int}: Vector of layer sizes for the primary decoder network, not including the input latent layer.\ndecoder_activation::Vector{<:Function}: Activation functions for each primary decoder layer.\noutput_activation::Function: Activation function for the mean (µ) and standard deviation (σ) layers.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA JointGaussianDecoder object with the specified architecture and initialized weights.\n\nDescription\n\nThis function constructs a JointGaussianDecoder object, setting up its primary decoder network based on the provided specifications. The architecture begins with a dense layer mapping from the latent space and goes through a sequence of middle layers if specified. 
After processing the latent space through the primary decoder, it then maps separately to both its mean (µ) and standard deviation (σ).\n\nExample\n\nn_input = 28*28\nn_latent = 64\ndecoder_neurons = [128, 256]\ndecoder_activation = [relu, relu]\noutput_activation = tanh\ndecoder = JointGaussianDecoder(\n n_input, n_latent, decoder_neurons, decoder_activation, output_activation\n)\n\nNote\n\nEnsure that the lengths of decoder_neurons and decoder_activation match.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianDecoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Vector{<:Function}}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianDecoder","text":"JointGaussianDecoder(n_input, n_latent, decoder_neurons, decoder_activation, \n latent_activation; init=Flux.glorot_uniform)\n\nConstructs and initializes a JointGaussianDecoder object for variational autoencoders (VAEs). This function sets up a decoder network that first processes the latent space and then maps it separately to both its mean (µ) and standard deviation (σ).\n\nArguments\n\nn_input::Int: Dimensionality of the output data (or the data to be reconstructed).\nn_latent::Int: Dimensionality of the latent space.\ndecoder_neurons::Vector{<:Int}: Vector of layer sizes for the primary decoder network, not including the input latent layer.\ndecoder_activation::Vector{<:Function}: Activation functions for each primary decoder layer.\noutput_activation::Vector{<:Function}: Activation functions for the mean (µ) and standard deviation (σ) layers.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA JointGaussianDecoder object with the specified architecture and initialized weights.\n\nDescription\n\nThis function constructs a JointGaussianDecoder object, setting up its primary decoder network based on the provided specifications. 
The architecture begins with a dense layer mapping from the latent space and goes through a sequence of middle layers if specified. After processing the latent space through the primary decoder, it then maps separately to both its mean (µ) and standard deviation (σ).\n\nExample\n\nn_input = 28*28\nn_latent = 64\ndecoder_neurons = [128, 256]\ndecoder_activation = [relu, relu]\nlatent_activation = [tanh, softplus]\ndecoder = JointGaussianDecoder(\n n_input, n_latent, decoder_neurons, decoder_activation, latent_activation\n)\n\nNote\n\nEnsure that the lengths of decoder_neurons and decoder_activation match.\n\n\n\n\n\n","category":"method"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.SplitGaussianLogDecoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Vector{<:Int},\n ::Vector{<:Function};\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.SplitGaussianLogDecoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Vector{<:Int64}, Vector{<:Function}}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.SplitGaussianLogDecoder","text":"SplitGaussianLogDecoder(n_input, n_latent, µ_neurons, µ_activation, logσ_neurons, \n logσ_activation; init=Flux.glorot_uniform)\n\nConstructs and initializes a SplitGaussianLogDecoder object for variational autoencoders (VAEs). 
This function sets up two distinct decoder networks, one dedicated for determining the mean (µ) and the other for the log standard deviation (logσ) of the latent space.\n\nArguments\n\nn_input::Int: Dimensionality of the output data (or the data to be reconstructed).\nn_latent::Int: Dimensionality of the latent space.\nµ_neurons::Vector{<:Int}: Vector of layer sizes for the µ decoder network, not including the input latent layer.\nµ_activation::Vector{<:Function}: Activation functions for each µ decoder layer.\nlogσ_neurons::Vector{<:Int}: Vector of layer sizes for the logσ decoder network, not including the input latent layer.\nlogσ_activation::Vector{<:Function}: Activation functions for each logσ decoder layer.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA SplitGaussianLogDecoder object with two distinct networks initialized with the specified architectures and weights.\n\nDescription\n\nThis function constructs a SplitGaussianLogDecoder object, setting up two separate decoder networks based on the provided specifications. 
The first network, dedicated to determining the mean (µ), and the second for the log standard deviation (logσ), both begin with a dense layer mapping from the latent space and go through a sequence of middle layers if specified.\n\nExample\n\nn_input = 28*28\nn_latent = 64\nµ_neurons = [128, 256]\nµ_activation = [relu, relu]\nlogσ_neurons = [128, 256]\nlogσ_activation = [relu, relu]\ndecoder = SplitGaussianLogDecoder(\n n_input, n_latent, µ_neurons, µ_activation, logσ_neurons, logσ_activation\n)\n\nNotes\n\nEnsure that the lengths of µ_neurons and µ_activation match, and likewise those of logσ_neurons and logσ_activation.\nIf µ_neurons[end] or logσ_neurons[end] does not match n_input, the function automatically changes it to match the right dimensionality.\n\n\n\n\n\n","category":"method"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.SplitGaussianDecoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Vector{<:Int},\n ::Vector{<:Function};\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.SplitGaussianDecoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Vector{<:Int64}, Vector{<:Function}}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.SplitGaussianDecoder","text":"SplitGaussianDecoder(n_input, n_latent, µ_neurons, µ_activation, σ_neurons, \n σ_activation; init=Flux.glorot_uniform)\n\nConstructs and initializes a SplitGaussianDecoder object for variational autoencoders (VAEs). 
This function sets up two distinct decoder networks, one dedicated for determining the mean (µ) and the other for the standard deviation (σ) of the latent space.\n\nArguments\n\nn_input::Int: Dimensionality of the output data (or the data to be reconstructed).\nn_latent::Int: Dimensionality of the latent space.\nµ_neurons::Vector{<:Int}: Vector of layer sizes for the µ decoder network, not including the input latent layer.\nµ_activation::Vector{<:Function}: Activation functions for each µ decoder layer.\nσ_neurons::Vector{<:Int}: Vector of layer sizes for the σ decoder network, not including the input latent layer.\nσ_activation::Vector{<:Function}: Activation functions for each σ decoder layer.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA SplitGaussianDecoder object with two distinct networks initialized with the specified architectures and weights.\n\nDescription\n\nThis function constructs a SplitGaussianDecoder object, setting up two separate decoder networks based on the provided specifications. The first network, dedicated to determining the mean (µ), and the second for the standard deviation (σ), both begin with a dense layer mapping from the latent space and go through a sequence of middle layers if specified.\n\nExample\n\nn_input = 28*28\nn_latent = 64\nµ_neurons = [128, 256]\nµ_activation = [relu, relu]\nσ_neurons = [128, 256]\nσ_activation = [relu, softplus]\ndecoder = SplitGaussianDecoder(\n n_input, n_latent, µ_neurons, µ_activation, σ_neurons, σ_activation\n)\n\nNotes\n\nEnsure that the lengths of µ_neurons and µ_activation match, and likewise those of σ_neurons and σ_activation.\nIf µ_neurons[end] or σ_neurons[end] does not match n_input, the function automatically changes it to match the right dimensionality.\nEnsure that σ_neurons[end] maps to a positive value. 
Activation functions such as softplus are needed to guarantee the positivity of the standard deviation.\n\n\n\n\n\n","category":"method"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.BernoulliDecoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Function;\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.BernoulliDecoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.BernoulliDecoder","text":" BernoulliDecoder(n_input, n_latent, decoder_neurons, decoder_activation, \n output_activation; init=Flux.glorot_uniform)\n\nConstructs and initializes a BernoulliDecoder object designed for variational autoencoders (VAEs). This function sets up a decoder network that maps from a latent space to an output space.\n\nArguments\n\nn_input::Int: Dimensionality of the output data (or the data to be reconstructed).\nn_latent::Int: Dimensionality of the latent space.\ndecoder_neurons::Vector{<:Int}: Vector of layer sizes for the decoder network, not including the input latent layer and the final output layer.\ndecoder_activation::Vector{<:Function}: Activation functions for each decoder layer, not including the final output layer.\noutput_activation::Function: Activation function for the final output layer.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA BernoulliDecoder object with the specified architecture and initialized weights.\n\nDescription\n\nThis function constructs a BernoulliDecoder object, setting up its decoder network based on the provided specifications. 
The architecture begins with a dense layer mapping from the latent space, goes through a sequence of middle layers if specified, and finally maps to the output space.\n\nThe function ensures that there are appropriate activation functions provided for each layer in the decoder_neurons and checks for potential mismatches in length.\n\nExample\n\nn_input = 28*28\nn_latent = 64\ndecoder_neurons = [128, 256]\ndecoder_activation = [relu, relu]\noutput_activation = sigmoid\ndecoder = BernoulliDecoder(\n n_input, \n n_latent, \n decoder_neurons, \n decoder_activation, \n output_activation\n)\n\nNote\n\nEnsure that the lengths of decoder_neurons and decoder_activation match, excluding the output layer. Also, the output activation function should return values between 0 and 1, as the decoder models the output data as a Bernoulli distribution. \n\n\n\n\n\n","category":"method"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.CategoricalDecoder(\n ::AbstractVector{<:Int},\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Function;\n)\nAutoEncoderToolkit.CategoricalDecoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Function;\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.CategoricalDecoder-Tuple{AbstractVector{<:Int64}, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.CategoricalDecoder","text":" CategoricalDecoder(\n size_input, n_latent, decoder_neurons, decoder_activation, \n output_activation; init=Flux.glorot_uniform\n )\n\nConstructs and initializes a CategoricalDecoder object designed for variational autoencoders (VAEs). 
This function sets up a decoder network that maps from a latent space to an output space.\n\nArguments\n\nsize_input::AbstractVector{<:Int}: Dimensionality of the output data (or the data to be reconstructed) in the form of a vector where each element represents the size of a dimension.\nn_latent::Int: Dimensionality of the latent space.\ndecoder_neurons::Vector{<:Int}: Vector of layer sizes for the decoder network, not including the input latent layer and the final output layer.\ndecoder_activation::Vector{<:Function}: Activation functions for each decoder layer, not including the final output layer.\noutput_activation::Function: Activation function for the final output layer.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA CategoricalDecoder object with the specified architecture and initialized weights.\n\nDescription\n\nThis function constructs a CategoricalDecoder object, setting up its decoder network based on the provided specifications. The architecture begins with a dense layer mapping from the latent space, goes through a sequence of middle layers if specified, and finally maps to the output space.\n\nThe function ensures that there are appropriate activation functions provided for each layer in decoder_neurons and checks for potential mismatches in length.\n\nThe output layer uses the identity function as its activation function, and the output is reshaped to match the dimensions specified in size_input. The output_activation function is then applied over the first dimension of the reshaped output.\n\nNote\n\nEnsure that the lengths of decoder_neurons and decoder_activation match, excluding the output layer. Also, the output activation function should return values that can be interpreted as probabilities, as the decoder models the output data as a categorical distribution. 
\n\n\n\n\n\n","category":"method"},{"location":"encoders/#AutoEncoderToolkit.CategoricalDecoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.CategoricalDecoder","text":"CategoricalDecoder(\n n_input, n_latent, decoder_neurons, decoder_activation,\n output_activation; init=Flux.glorot_uniform\n)\n\nConstructs and initializes a CategoricalDecoder object designed for variational autoencoders (VAEs). This function sets up a decoder network that maps from a latent space to an output space.\n\nArguments\n\nn_input::Int: Dimensionality of the output data (or the data to be reconstructed).\nn_latent::Int: Dimensionality of the latent space.\ndecoder_neurons::Vector{<:Int}: Vector of layer sizes for the decoder network, not including the input latent layer and the final output layer.\ndecoder_activation::Vector{<:Function}: Activation functions for each decoder layer, not including the final output layer.\noutput_activation::Function: Activation function for the final output layer.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA CategoricalDecoder object with the specified architecture and initialized weights.\n\nDescription\n\nThis function constructs a CategoricalDecoder object, setting up its decoder network based on the provided specifications. The architecture begins with a dense layer mapping from the latent space, goes through a sequence of middle layers if specified, and finally maps to the output space.\n\nThe function ensures that there are appropriate activation functions provided for each layer in decoder_neurons and checks for potential mismatches in length.\n\nNote\n\nEnsure that the lengths of decoder_neurons and decoder_activation match, excluding the output layer. 
Also, the output activation function should return values that can be interpreted as probabilities, as the decoder models the output data as a categorical distribution. \n\n\n\n\n\n","category":"method"},{"location":"encoders/#Probabilistic-functions","page":"Encoders & Decoders","title":"Probabilistic functions","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"Given the probability-centered design of AutoEncoderToolkit.jl, each variational encoder and decoder has an associated probabilistic function used when computing the evidence lower bound (ELBO). The following functions are available:","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.encoder_logposterior","category":"page"},{"location":"encoders/#AutoEncoderToolkit.encoder_logposterior","page":"Encoders & Decoders","title":"AutoEncoderToolkit.encoder_logposterior","text":"encoder_logposterior(\n z::AbstractVector,\n encoder::AbstractGaussianLogEncoder,\n encoder_output::NamedTuple\n)\n\nComputes the log-posterior of the latent variable z given the encoder output under a Gaussian distribution with mean and standard deviation given by the encoder.\n\nArguments\n\nz::AbstractVector: The latent variable for which the log-posterior is to be computed.\nencoder::AbstractGaussianLogEncoder: The encoder of the VAE, which is not used in the computation of the log-posterior. This argument is only used to know which method to call.\nencoder_output::NamedTuple: The output of the encoder, which includes the mean and log standard deviation of the Gaussian distribution.\n\nReturns\n\nlogposterior::T: The computed log-posterior of the latent variable z given the encoder output.\n\nDescription\n\nThe function computes the log-posterior of the latent variable z given the encoder output under a Gaussian distribution. 
The mean and log standard deviation of the Gaussian distribution are extracted from the encoder_output. The standard deviation is then computed by exponentiating the log standard deviation. The log-posterior is computed using the formula for the log-posterior of a Gaussian distribution.\n\nNote\n\nEnsure the dimensions of z match the expected input dimensionality of the encoder.\n\n\n\n\n\nencoder_logposterior(\n z::AbstractMatrix,\n encoder::AbstractGaussianLogEncoder,\n encoder_output::NamedTuple\n)\n\nComputes the log-posterior of the latent variable z given the encoder output under a Gaussian distribution with mean and standard deviation given by the encoder.\n\nArguments\n\nz::AbstractMatrix: The latent variable for which the log-posterior is to be computed. Each column of z represents a different data point.\nencoder::AbstractGaussianLogEncoder: The encoder of the VAE, which is not used in the computation of the log-posterior. This argument is only used to know which method to call.\nencoder_output::NamedTuple: The output of the encoder, which includes the mean and log standard deviation of the Gaussian distribution.\n\nReturns\n\nlogposterior::Vector: The computed log-posterior of the latent variable z given the encoder output. Each element of the vector corresponds to a different data point.\n\nDescription\n\nThe function computes the log-posterior of the latent variable z given the encoder output under a Gaussian distribution. The mean and log standard deviation of the Gaussian distribution are extracted from the encoder_output. The standard deviation is then computed by exponentiating the log standard deviation. 
The log-posterior is computed using the formula for the log-posterior of a Gaussian distribution.\n\nNote\n\nEnsure the dimensions of z match the expected input dimensionality of the encoder.\n\n\n\n\n\nencoder_logposterior(\n z::AbstractVector,\n encoder::AbstractGaussianLogEncoder,\n encoder_output::NamedTuple,\n index::Int\n)\n\nComputes the log-posterior of the latent variable z for a single data point specified by index given the encoder output under a Gaussian distribution with mean and standard deviation given by the encoder.\n\nArguments\n\nz::AbstractVector: The latent variable for which the log-posterior is to be computed. \nencoder::AbstractGaussianLogEncoder: The encoder of the VAE, which is not used in the computation of the log-posterior. This argument is only used to know which method to call.\nencoder_output::NamedTuple: The output of the encoder, which includes the mean and log standard deviation of the Gaussian distribution for multiple data points.\nindex::Int: The index of the data point for which the log-posterior is to be computed.\n\nReturns\n\nlogposterior::Float32: The computed log-posterior of the latent variable z for the specified data point given the encoder output.\n\nDescription\n\nThe function computes the log-posterior of the latent variable z for a single data point specified by index given the encoder output under a Gaussian distribution. The mean and log standard deviation of the Gaussian distribution are extracted from the encoder_output for the specified data point. The standard deviation is then computed by exponentiating the log standard deviation. The log-posterior is computed using the formula for the log-posterior of a Gaussian distribution.\n\nNote\n\nEnsure the dimensions of z match the expected input dimensionality of the encoder. 
Also, ensure that index is a valid index for the data points in encoder_output.\n\n\n\n\n\n","category":"function"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.encoder_kl","category":"page"},{"location":"encoders/#AutoEncoderToolkit.encoder_kl","page":"Encoders & Decoders","title":"AutoEncoderToolkit.encoder_kl","text":"encoder_kl(\n encoder::AbstractGaussianLogEncoder,\n encoder_output::NamedTuple\n)\n\nCalculate the Kullback-Leibler (KL) divergence between the approximate posterior distribution and the prior distribution in a variational autoencoder with a Gaussian encoder.\n\nThe KL divergence for a Gaussian encoder with mean encoder_µ and log standard deviation encoder_logσ is computed against a standard Gaussian prior.\n\nArguments\n\nencoder::AbstractGaussianLogEncoder: Encoder network. This argument is not used in the computation of the KL divergence, but is included to allow for multiple encoder types to be used with the same function.\nencoder_output::NamedTuple: NamedTuple containing all the encoder outputs. It should have fields μ and logσ representing the mean and log standard deviation of the encoder's output.\n\nReturns\n\nkl_div::Union{Number, Vector}: The KL divergence for the entire batch of data points. If encoder_µ is a vector, kl_div is a scalar. If encoder_µ is a matrix, kl_div is a vector where each element corresponds to the KL divergence for a batch of data points.\n\nNote\n\nIt is assumed that the mapping from data space to latent parameters (encoder_µ and encoder_logσ) has been performed prior to calling this function. 
The encoder argument is provided to indicate the type of encoder network used, but it is not used within the function itself.\n\n\n\n\n\n","category":"function"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.spherical_logprior","category":"page"},{"location":"encoders/#AutoEncoderToolkit.spherical_logprior","page":"Encoders & Decoders","title":"AutoEncoderToolkit.spherical_logprior","text":"spherical_logprior(z::AbstractVector, σ::Real=1.0f0)\n\nComputes the log-prior of the latent variable z under a spherical Gaussian distribution with zero mean and standard deviation σ.\n\nArguments\n\nz::AbstractVector: The latent variable for which the log-prior is to be computed.\nσ::Real=1.0f0: The standard deviation of the spherical Gaussian distribution. Defaults to 1.0f0.\n\nReturns\n\nlogprior::T: The computed log-prior of the latent variable z.\n\nDescription\n\nThe function computes the log-prior of the latent variable z under a spherical Gaussian distribution with zero mean and standard deviation σ. The log-prior is computed using the formula for the log-density of a Gaussian distribution.\n\nNote\n\nEnsure the dimension of z matches the expected dimensionality of the latent space.\n\n\n\n\n\nspherical_logprior(z::AbstractMatrix, σ::Real=1.0f0)\n\nComputes the log-prior of the latent variable z under a spherical Gaussian distribution with zero mean and standard deviation σ.\n\nArguments\n\nz::AbstractMatrix: The latent variable for which the log-prior is to be computed. Each column of z represents a different latent variable.\nσ::Real=1.0f0: The standard deviation of the spherical Gaussian distribution. Defaults to 1.0f0.\n\nReturns\n\nlogprior::T: The computed log-prior(s) of the latent variable z.\n\nDescription\n\nThe function computes the log-prior of the latent variable z under a spherical Gaussian distribution with zero mean and standard deviation σ. 
The log-prior is computed using the formula for the log-prior of a Gaussian distribution.\n\nNote\n\nEnsure the dimension of z matches the expected dimensionality of the latent space.\n\n\n\n\n\n","category":"function"},{"location":"encoders/#Defining-custom-encoder-and-decoder-types","page":"Encoders & Decoders","title":"Defining custom encoder and decoder types","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"note: Note\nWe will omit all docstrings in the following examples for brevity. However, every struct and function in AutoEncoderToolkit.jl is well-documented.","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"Suppose your particular task requires a custom encoder or decoder type. For example, imagine that for a particular application, you need a decoder whose output distribution is Poisson. In other words, the assumption is that each dimension x_i of the input is a sample from a Poisson distribution with mean \\lambda_i. Thus, what the decoder returns is a vector of these \\lambda parameters. We thus need to define a custom decoder type.","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"struct PoissonDecoder <: AbstractVariationalDecoder\n decoder::Flux.Chain\nend # struct","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"With this struct defined, we need to define the forward-pass function for our custom PoissonDecoder. All decoders in AutoEncoderToolkit.jl return a NamedTuple with the corresponding parameters of the distribution that defines them. In this case, the Poisson distribution is defined by a single parameter \\lambda. 
Thus, we have a forward-pass of the form","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"function (decoder::PoissonDecoder)(z::AbstractArray)\n # Run input to decoder network\n return (λ=decoder.decoder(z),)\nend # function","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"Next, we need to define the probabilistic function associated with this decoder. We know that the probability of observing x_i given \\lambda_i is given by","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"P(x_i \\mid \\lambda_i) = \\frac{\\lambda_i^{x_i} e^{-\\lambda_i}}{x_i!}\n\\tag{1}","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"If each x_i is independent, then the probability of observing the entire input x given the entire output \\lambda is given by the product of the individual probabilities, i.e.","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"P(x \\mid \\lambda) = \\prod_i P(x_i \\mid \\lambda_i)\n\\tag{2}","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"The log-likelihood of the data given the output of the decoder is then given by","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"\\mathcal{L}(x \\mid \\lambda) = \\log P(x \\mid \\lambda) = \\sum_i \\log P(x_i \\mid \\lambda_i)\n\\tag{3}","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"which, by using the properties of the logarithm, can be written as","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"\\mathcal{L}(x \\mid \\lambda) = \\sum_i \\left[ x_i \\log \\lambda_i - \\lambda_i - \\log(x_i!) \\right]\n\\tag{4}","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & 
Decoders","text":"We can then define the probabilistic function associated with the PoissonDecoder as","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"function decoder_loglikelihood(\n x::AbstractArray,\n z::AbstractVector,\n decoder::PoissonDecoder,\n decoder_output::NamedTuple;\n)\n # Extract the lambda parameter of the Poisson distribution\n λ = decoder_output.λ\n\n # Compute log-likelihood\n loglikelihood = sum(x .* log.(λ) - λ - loggamma.(x .+ 1))\n\n return loglikelihood\nend # function","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"where we use the loggamma function from SpecialFunctions.jl to compute the log of the factorial of x_i.","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"warning: Warning\nWe only defined the decoder_loglikelihood method for z::AbstractVector. One should also include a method for z::AbstractMatrix used when performing batch training.","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"With these two functions defined, our PoissonDecoder is ready to be used with any of the different VAE flavors included in AutoEncoderToolkit.jl!","category":"page"},{"location":"diffgeo/#Differential-Geometry-of-Generative-Models","page":"Differential Geometry","title":"Differential Geometry of Generative Models","text":"","category":"section"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"A lot of recent research in the field of generative models has focused on the geometry of the learned latent space (see the references at the end of this section for examples). The non-linear nature of neural networks makes it relevant to consider the non-Euclidean geometry of the latent space when trying to gain insights into the structure of the learned space. 
In other words, given that neural networks involve a series of non-linear transformations of the input data, we cannot expect the latent space to be Euclidean, and thus, we need to account for curvature and other non-Euclidean properties. For this, we can borrow concepts and tools from Riemannian geometry, now applied to the latent space of generative models.","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"AutoEncoderToolkit.jl aims to provide the set of necessary tools to study the geometry of the latent space in the context of variational autoencoder generative models.","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"note: Note\nThis is very much a work in progress. As always, contributions are welcome!","category":"page"},{"location":"diffgeo/#A-word-on-Riemannian-geometry","page":"Differential Geometry","title":"A word on Riemannian geometry","text":"","category":"section"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"In what follows we will give a very short primer on some relevant concepts in differential geometry. This includes some basic definitions and concepts along with what we consider intuitive explanations of the concepts. We trade rigor for accessibility, so if you are looking for a more formal treatment, this is not the place.","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"note: Note\nThese notes are partially based on the 2022 paper by Chadebec et al. [2].","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"A d-dimensional manifold \\mathcal{M} is a topological space that is locally homeomorphic to a d-dimensional Euclidean space. 
This means that the manifold–some surface or high-dimensional shape–when observed up close, can be stretched or bent without tearing or gluing it to make it resemble regular Euclidean space. ","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"If the manifold is differentiable, it possesses a tangent space T_z at any point z \\in \\mathcal{M} composed of the tangent vectors of the curves passing through z. ","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"(Image: )","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"If the manifold \\mathcal{M} is equipped with a smooth inner product, ","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"g: z \\rightarrow \\langle \\cdot \\mid \\cdot \\rangle_z\n\\tag{1}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"defined on the tangent space T_z for any z \\in \\mathcal{M}, then \\mathcal{M} is a Riemannian manifold and g is the associated Riemannian metric. With this, a local representation of g at any point z is given by the positive definite matrix \\mathbf{G}(z).","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"A chart (fancy name for a coordinate system) (U, \\phi) provides a homeomorphic mapping between an open set U of the manifold and an open set V of Euclidean space. This means that there is a way to bend and stretch any segment of the manifold to make it look like a segment of Euclidean space. Therefore, given a point z \\in U, a chart–its coordinates–\\phi: (z_1, z_2, \\ldots, z_d) induces a basis (\\partial_{z_1}, \\partial_{z_2}, \\ldots, \\partial_{z_d}) on the tangent space T_z \\mathcal{M}. 
In other words, the partial derivatives of the manifold with respect to the dimensions form a basis (think of \\hat{i}, \\hat{j}, \\hat{k} in 3D space) for the tangent space at that point. Hence, the metric–a \"position-dependent scale-bar\"–of a Riemannian manifold can be locally represented at \\phi as a positive definite matrix \\mathbf{G}(z) with components g_{ij}(z) of the form","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"g_{ij}(z) = \\langle \\partial_{z_i} \\mid \\partial_{z_j} \\rangle_z\n\\tag{2}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"This implies that for every pair of vectors v, w \\in T_z \\mathcal{M} and a point z \\in \\mathcal{M}, the inner product \\langle v \\mid w \\rangle_z is given by","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"\\langle v \\mid w \\rangle_z = v^T \\mathbf{G}(z) w\n\\tag{3}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"If \\mathcal{M} is connected–a continuous shape with no breaks–a Riemannian distance between two points z_1, z_2 \\in \\mathcal{M} can be defined as","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"\\text{dist}(z_1, z_2) = \\min_\\gamma \\int_0^1 dt\n\\sqrt{\\langle \\dot{\\gamma}(t) \\mid \\dot{\\gamma}(t) \\rangle_{\\gamma(t)}}\n\\tag{4}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"where \\gamma is a 1D curve traveling from z_1 to z_2, i.e., \\gamma(0) = z_1 and \\gamma(1) = z_2. 
Another way to state this is that the length of a curve \\gamma on the manifold is given by","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"L(\\gamma) = \\int_0^1 dt \n\\sqrt{\\langle \\dot{\\gamma}(t) \\mid \\dot{\\gamma}(t) \\rangle_{\\gamma(t)}}\n\\tag{5}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"If \\gamma minimizes L among all curves connecting the initial and final points, then \\gamma is a geodesic curve.","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"The concept of a geodesic is so important to the study of the Riemannian manifold learned by generative models that we give another intuitive explanation. Let us consider a curve \\gamma such that","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"\\gamma: [0, 1] \\rightarrow \\mathbb{R}^d\n\\tag{6}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"In words, \\gamma is a function that, without loss of generality, maps a number between zero and one to a point in the d-dimensional latent space (d being the dimensionality of our manifold). Let us define f to be a continuous function that embeds any point along the curve \\gamma into the data space, i.e.,","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"f: \\gamma(t) \\rightarrow x \\in \\mathbb{R}^n\n\\tag{7}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"where n is the dimensionality of the data space. 
","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"(Image: )","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"The length of this curve in the data space is given by","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"L(\\gamma) = \\int_0^1 dt\n\\left\\| \\frac{d f}{dt} \\right\\|_2\n\\tag{8}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"Applying the chain rule, d f / dt = \\mathbf{J}_f(\\gamma(t)) \\dot{\\gamma}(t), where \\mathbf{J}_f is the Jacobian of f. Defining \\mathbf{G}(z) = \\mathbf{J}_f(z)^T \\mathbf{J}_f(z), the length of the curve in the data space can be written as","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"L(\\gamma) = \\int_0^1 dt\n\\sqrt{\n \\dot{\\gamma}(t)^T \\mathbf{G}(\\gamma(t)) \\dot{\\gamma}(t)\n}\n\\tag{9}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"where \\dot{\\gamma}(t) is the derivative of \\gamma with respect to t, and T denotes the transpose of a vector. For a Euclidean space, the length of the curve would take the same functional form, except that the metric tensor would be given by the identity matrix. This is why the metric tensor can be thought of as a position-dependent scale-bar.","category":"page"},{"location":"diffgeo/#neuralgeodesic","page":"Differential Geometry","title":"Neural Geodesic Networks","text":"","category":"section"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"Computing a geodesic on a Riemannian manifold is a non-trivial task, especially when the manifold is parametrized by a neural network. Thus, knowing the function \\gamma that minimizes the distance between two points z_1 and z_2 is not straightforward. However, as first suggested by Chen et al. [1], we can exploit the ability of neural networks to approximate almost any function in order to approximate the geodesic curve. 
This is the idea behind the Neural Geodesic module in AutoEncoderToolkit.jl.","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"Briefly, to approximate the geodesic curve between two points z_1 and z_2 in latent space, we define a neural network g_\\omega such that","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"g_\\omega: \\mathbb{R} \\rightarrow \\mathbb{R}^d\n\\tag{10}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"i.e., the neural network takes a number between zero and one and maps it to a point in the d-dimensional latent space. The intention is to have g_\\omega \\approx \\gamma, where \\omega are the parameters of the neural network we are free to optimize.","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"We approximate the integral defining the length of the curve in the latent space with n equidistantly sampled points t_i between zero and one. The length of the curve is then approximated by","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"L(g_\\omega) \\approx \\frac{1}{n} \\sum_{i=1}^n \n\\sqrt{\n \\dot{g}_\\omega(t_i)^T \\mathbf{G}(g_\\omega(t_i)) \\dot{g}_\\omega(t_i)\n}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"By setting the loss function to be this approximation of the length of the curve, we can train the neural network to approximate the geodesic curve.","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"AutoEncoderToolkit.jl provides the NeuralGeodesic struct to implement this idea. 
The struct takes three inputs:","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"The multi-layer perceptron (MLP) that approximates the geodesic curve.\nThe initial point in latent space.\nThe final point in latent space.","category":"page"},{"location":"diffgeo/#NeuralGeodesic-struct","page":"Differential Geometry","title":"NeuralGeodesic struct","text":"","category":"section"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.NeuralGeodesic","category":"page"},{"location":"diffgeo/#AutoEncoderToolkit.diffgeo.NeuralGeodesics.NeuralGeodesic","page":"Differential Geometry","title":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.NeuralGeodesic","text":"NeuralGeodesic\n\nType to define a neural network that approximates a geodesic curve on a Riemannian manifold. If a curve γ̲(t) represents a geodesic curve on a manifold, i.e.,\n\nL(γ̲) = min_γ ∫ dt √(⟨γ̲̇(t), M̲̲ γ̲̇(t)⟩),\n\nwhere M̲̲ is the Riemannian metric, then this type defines a neural network g_ω(t) such that\n\nγ̲(t) ≈ g_ω(t).\n\nThis neural network must have a single input (1D). The dimensionality of the output must match the dimensionality of the manifold.\n\nFields\n\nmlp::Flux.Chain: Neural network that approximates the geodesic curve. The dimensionality of the input must be one.\nz_init::AbstractVector: Initial position of the geodesic curve on the latent space.\nz_end::AbstractVector: Final position of the geodesic curve on the latent space.\n\nCitation\n\nChen, N. et al. Metrics for Deep Generative Models. 
in Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics 1540–1550 (PMLR, 2018).\n\n\n\n\n\n","category":"type"},{"location":"diffgeo/#NeuralGeodesic-forward-pass","page":"Differential Geometry","title":"NeuralGeodesic forward pass","text":"","category":"section"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.NeuralGeodesic(::AbstractVector)","category":"page"},{"location":"diffgeo/#AutoEncoderToolkit.diffgeo.NeuralGeodesics.NeuralGeodesic-Tuple{AbstractVector}","page":"Differential Geometry","title":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.NeuralGeodesic","text":" (g::NeuralGeodesic)(t::AbstractArray)\n\nComputes the output of the NeuralGeodesic at each given time in t by scaling and shifting the output of the neural network.\n\nArguments\n\nt::AbstractArray: An array of times at which the output of the NeuralGeodesic is to be computed. This must be within the interval [0, 1].\n\nReturns\n\noutput::Array: The computed output of the NeuralGeodesic at each time in t.\n\nDescription\n\nThe function computes the output of the NeuralGeodesic at each given time in t. The steps are:\n\nCompute the output of the neural network at each time in t.\nCompute the output of the neural network at time 0 and 1.\nCompute scale and shift parameters based on the initial and end points of the geodesic and the neural network outputs at times 0 and 1.\nScale and shift the output of the neural network at each time in t according to these parameters. 
The result is the output of the NeuralGeodesic at each time in t.\n\nScale and shift parameters are defined as:\n\nscale = (z_init - z_end) / (ẑ_init - ẑ_end)\nshift = (z_init * ẑ_end - z_end * ẑ_init) / (ẑ_init - ẑ_end)\n\nwhere z_init and z_end are the initial and end points of the geodesic, and ẑ_init and ẑ_end are the outputs of the neural network at times 0 and 1, respectively.\n\nNote\n\nEnsure that each t in the array is within the interval [0, 1].\n\n\n\n\n\n","category":"method"},{"location":"diffgeo/#NeuralGeodesic-loss-function","page":"Differential Geometry","title":"NeuralGeodesic loss function","text":"","category":"section"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.loss","category":"page"},{"location":"diffgeo/#AutoEncoderToolkit.diffgeo.NeuralGeodesics.loss","page":"Differential Geometry","title":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.loss","text":"loss(\n curve::NeuralGeodesic,\n rhvae::RHVAE,\n t::AbstractVector;\n curve_velocity::Function=curve_velocity_TaylorDiff,\n curve_integral::Function=curve_length,\n)\n\nFunction to compute the loss for a given curve on a Riemannian manifold. The loss is defined as the integral over the curve, computed using the provided curve_integral function (either length or energy).\n\nArguments\n\ncurve::NeuralGeodesic: The curve on the Riemannian manifold.\nrhvae::RHVAE: The Riemannian Hamiltonian Variational AutoEncoder used to compute the Riemannian metric tensor.\nt::AbstractVector: Vector of time points at which the curve is sampled.\n\nOptional Keyword Arguments\n\ncurve_velocity::Function=curve_velocity_TaylorDiff: Function to compute the velocity of the curve. Default is curve_velocity_TaylorDiff. Also accepts curve_velocity_finitediff.\ncurve_integral::Function=curve_length: Function to compute the integral over the curve. Default is curve_length. 
Also accepts curve_energy.\n\nReturns\n\nLoss::Number: The computed loss for the given curve.\n\nNotes\n\nThis function first computes the geodesic curve using the provided curve function. It then computes the Riemannian metric tensor using the metric_tensor function from the RHVAE module with the computed curve and the provided rhvae. The velocity of the curve is then computed using the provided curve_velocity function. Finally, the integral over the curve is computed using the provided curve_integral function and returned as the loss.\n\n\n\n\n\n","category":"function"},{"location":"diffgeo/#NeuralGeodesic-training","page":"Differential Geometry","title":"NeuralGeodesic training","text":"","category":"section"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.train!","category":"page"},{"location":"diffgeo/#AutoEncoderToolkit.diffgeo.NeuralGeodesics.train!","page":"Differential Geometry","title":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.train!","text":"train!(\n curve::NeuralGeodesic,\n rhvae::RHVAE,\n t::AbstractVector,\n opt::NamedTuple;\n loss::Function=loss,\n loss_kwargs::Dict=Dict(),\n verbose::Bool=false,\n loss_return::Bool=false,\n)\n\nFunction to train a NeuralGeodesic model using a Riemannian Hamiltonian Variational AutoEncoder (RHVAE). The training process involves computing the gradient of the loss function and updating the model parameters accordingly.\n\nArguments\n\ncurve::NeuralGeodesic: The curve on the Riemannian manifold.\nrhvae::RHVAE: The Riemannian Hamiltonian Variational AutoEncoder used to compute the Riemannian metric tensor.\nt::AbstractVector: Vector of time points at which the curve is sampled. These must be equally spaced.\nopt::NamedTuple: The optimization parameters.\n\nOptional Keyword Arguments\n\nloss_function::Function=loss: The loss function to be minimized during training. 
Default is loss.\nloss_kwargs::Dict=Dict(): Additional keyword arguments to be passed to the loss function.\nverbose::Bool=false: If true, the loss value is printed at each iteration.\nloss_return::Bool=false: If true, the function returns the loss value.\n\nReturns\n\nLoss::Number: The computed loss for the given curve. This is only returned if loss_return is true.\n\nNotes\n\nThis function first computes the gradient of the loss function with respect to the model parameters. It then updates the model parameters using the computed gradient and the provided optimization parameters. If verbose is true, the loss value is printed at each iteration. If loss_return is true, the function returns the loss value.\n\n\n\n\n\n","category":"function"},{"location":"diffgeo/#Other-functions-for-NeuralGeodesic","page":"Differential Geometry","title":"Other functions for NeuralGeodesic","text":"","category":"section"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_velocity_TaylorDiff\nAutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_velocity_finitediff\nAutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_length\nAutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_energy","category":"page"},{"location":"diffgeo/#AutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_velocity_TaylorDiff","page":"Differential Geometry","title":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_velocity_TaylorDiff","text":"curve_velocity_TaylorDiff(\n curve::NeuralGeodesic,\n t\n)\n\nCompute the velocity of a neural geodesic curve at a given time using Taylor differentiation.\n\nThis function takes a NeuralGeodesic instance and a time t, and computes the velocity of the curve at that time using Taylor differentiation. 
The computation is performed for each dimension of the latent space.\n\nArguments\n\ncurve::NeuralGeodesic: The neural geodesic curve.\nt: The time at which to compute the velocity.\n\nReturns\n\nA vector representing the velocity of the curve at time t.\n\nNotes\n\nThis function uses the TaylorDiff package to compute derivatives. Please note that TaylorDiff has limited support for certain activation functions. If you encounter an error while using this function, it may be due to the activation function used in your NeuralGeodesic instance.\n\n\n\n\n\ncurve_velocity_TaylorDiff(\n curve::NeuralGeodesic,\n t::AbstractVector\n)\n\nCompute the velocity of a neural geodesic curve at each time in a vector of times using Taylor differentiation.\n\nThis function takes a NeuralGeodesic instance and a vector of times t, and computes the velocity of the curve at each time using Taylor differentiation. The computation is performed for each dimension of the latent space and each time in t.\n\nArguments\n\ncurve::NeuralGeodesic: The neural geodesic curve.\nt::AbstractVector: The vector of times at which to compute the velocity.\n\nReturns\n\nA matrix where each column represents the velocity of the curve at a time in t.\n\nNotes\n\nThis function uses the TaylorDiff package to compute derivatives. Please note that TaylorDiff has limited support for certain activation functions. 
If you encounter an error while using this function, it may be due to the activation function used in your NeuralGeodesic instance.\n\n\n\n\n\n","category":"function"},{"location":"diffgeo/#AutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_velocity_finitediff","page":"Differential Geometry","title":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_velocity_finitediff","text":"curve_velocity_finitediff(\n curve::NeuralGeodesic,\n t::AbstractVector;\n fdtype::Symbol=:central,\n)\n\nCompute the velocity of a neural geodesic curve at each time in a vector of times using finite difference methods.\n\nThis function takes a NeuralGeodesic instance, a vector of times t, and an optional finite difference type fdtype (which can be either :forward or :central), and computes the velocity of the curve at each time using the specified finite difference method. The computation is performed for each dimension of the latent space and each time in t.\n\nArguments\n\ncurve::NeuralGeodesic: The neural geodesic curve.\nt::AbstractVector: The vector of times at which to compute the velocity.\nfdtype::Symbol=:central: The type of finite difference method to use. Can be either :forward or :central. Default is :central.\n\nReturns\n\nA matrix where each column represents the velocity of the curve at a time in t.\n\nNotes\n\nThis function uses finite difference methods to compute derivatives. 
Please note that the accuracy of the computed velocities depends on the chosen finite difference method and the step size used, which is determined by the machine epsilon of the type of t.\n\n\n\n\n\n","category":"function"},{"location":"diffgeo/#AutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_length","page":"Differential Geometry","title":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_length","text":"curve_length(\n riemannian_metric::AbstractArray,\n curve_velocity::AbstractArray,\n t::AbstractVector;\n)\n\nFunction to compute the (discretized) integral defining the length of a curve γ̲ on a Riemannian manifold. The length is defined as\n\nL(γ̲) = ∫ dt √(⟨γ̲̇(t), G̲̲ γ̲̇(t)⟩),\n\nwhere γ̲̇(t) defines the velocity of the parametric curve, and G̲̲ is the Riemannian metric tensor. For this function, we approximate the integral as\n\nL(γ̲) ≈ ∑ᵢ Δt √(⟨γ̲̇(tᵢ)ᵀ G̲̲ (γ̲(tᵢ+1)) γ̲̇(tᵢ)⟩),\n\nwhere Δt is the time step between points. Note that this Δt is assumed to be constant, so the time points t must be equally spaced.\n\nArguments\n\nriemannian_metric::AbstractArray: d×d×N tensor where d is the dimension of the manifold on which the curve lies and N is the number of sampled time points along the curve. Each slice of the array represents the Riemannian metric tensor for the curve at the corresponding time point.\ncurve_velocity::AbstractArray: d×N Matrix where d is the dimension of the manifold on which the curve lies and N is the number of sampled time points along the curve. 
Each column represents the velocity of the curve at the corresponding time point.\nt::AbstractVector: Vector of time points at which the curve is sampled.\n\nReturns\n\nLength::Number: Approximation of the Length for the path on the manifold.\n\n\n\n\n\n","category":"function"},{"location":"diffgeo/#AutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_energy","page":"Differential Geometry","title":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_energy","text":"curve_energy(\n riemannian_metric::AbstractArray,\n curve_velocity::AbstractArray,\n t::AbstractVector;\n)\n\nFunction to compute the (discretized) integral defining the energy of a curve γ̲ on a Riemannian manifold. The energy is defined as\n\n E(γ̲) = ∫ dt ⟨γ̲̇(t), G̲̲ γ̲̇(t)⟩,\n\nwhere γ̲̇(t) defines the velocity of the parametric curve, and G̲̲ is the Riemannian metric tensor. For this function, we approximate the integral as\n\n E(γ̲) ≈ ∑ᵢ Δt ⟨γ̲̇(tᵢ)ᵀ G̲̲ (γ̲(tᵢ+1)) γ̲̇(tᵢ)⟩,\n\nwhere Δt is the time step between points. Note that this Δt is assumed to be constant, so the time points t must be equally spaced.\n\nArguments\n\nriemannian_metric::AbstractArray: d×d×N tensor where d is the dimension of the manifold on which the curve lies and N is the number of sampled time points along the curve. Each slice of the array represents the Riemannian metric tensor for the curve at the corresponding time point.\ncurve_velocity::AbstractArray: d×N Matrix where d is the dimension of the manifold on which the curve lies and N is the number of sampled time points along the curve. 
Each column represents the velocity of the curve at the corresponding time point.\nt::AbstractVector: Vector of time points at which the curve is sampled.\n\nReturns\n\nEnergy::Number: Approximation of the Energy for the path on the manifold.\n\n\n\n\n\n","category":"function"},{"location":"diffgeo/#diffgeoref","page":"Differential Geometry","title":"References","text":"","category":"section"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"Chen, N. et al. Metrics for Deep Generative Models. in Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics 1540–1550 (PMLR, 2018).\nChadebec, C. & Allassonnière, S. A Geometric Perspective on Variational Autoencoders. Preprint at http://arxiv.org/abs/2209.07370 (2022).\nChadebec, C., Mantoux, C. & Allassonnière, S. Geometry-Aware Hamiltonian Variational Auto-Encoder. Preprint at http://arxiv.org/abs/2010.11518 (2020).\nArvanitidis, G., Hauberg, S., Hennig, P. & Schober, M. Fast and Robust Shortest Paths on Manifolds Learned from Data. in Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics 1506–1515 (PMLR, 2019).\nArvanitidis, G., Hauberg, S. & Schölkopf, B. Geometrically Enriched Latent Spaces. Preprint at https://doi.org/10.48550/arXiv.2008.00565 (2020).\nArvanitidis, G., González-Duque, M., Pouplin, A., Kalatzis, D. & Hauberg, S. Pulling back information geometry. Preprint at http://arxiv.org/abs/2106.05367 (2022).\nFröhlich, C., Gessner, A., Hennig, P., Schölkopf, B. & Arvanitidis, G. Bayesian Quadrature on Riemannian Data Manifolds.\nKalatzis, D., Eklund, D., Arvanitidis, G. & Hauberg, S. Variational Autoencoders with Riemannian Brownian Motion Priors. Preprint at http://arxiv.org/abs/2002.05227 (2020).\nArvanitidis, G., Hansen, L. K. & Hauberg, S. Latent Space Oddity: on the Curvature of Deep Generative Models. 
Preprint at http://arxiv.org/abs/1710.11379 (2021).","category":"page"},{"location":"ae/#AEsmodule","page":"Deterministic Autoencoders","title":"Deterministic Autoencoder","text":"","category":"section"},{"location":"ae/","page":"Deterministic Autoencoders","title":"Deterministic Autoencoders","text":"The deterministic autoencoders are a type of neural network that learns to embed high-dimensional data into a lower-dimensional space in a one-to-one fashion. The AEs module provides the necessary tools to train these networks. The main type is the AE struct, which is a simple feedforward neural network composed of two parts: an Encoder and a Decoder.","category":"page"},{"location":"ae/#Autoencoder-struct-AE","page":"Deterministic Autoencoders","title":"Autoencoder struct AE","text":"","category":"section"},{"location":"ae/","page":"Deterministic Autoencoders","title":"Deterministic Autoencoders","text":"AutoEncoderToolkit.AEs.AE","category":"page"},{"location":"ae/#AutoEncoderToolkit.AEs.AE","page":"Deterministic Autoencoders","title":"AutoEncoderToolkit.AEs.AE","text":"struct AE{E<:AbstractDeterministicEncoder, D<:AbstractDeterministicDecoder}\n\nAutoencoder (AE) model defined for Flux.jl\n\nFields\n\nencoder::E: Neural network that encodes the input into the latent space. E is a subtype of AbstractDeterministicEncoder.\ndecoder::D: Neural network that decodes the latent representation back to the original input space. D is a subtype of AbstractDeterministicDecoder.\n\nAn AE consists of an encoder and decoder network with a bottleneck latent space in between. The encoder compresses the input into a low-dimensional representation. The decoder tries to reconstruct the original input from the point in the latent space. 
\n\n\n\n\n\n","category":"type"},{"location":"ae/#Forward-pass","page":"Deterministic Autoencoders","title":"Forward pass","text":"","category":"section"},{"location":"ae/","page":"Deterministic Autoencoders","title":"Deterministic Autoencoders","text":"AutoEncoderToolkit.AEs.AE(::AbstractArray)","category":"page"},{"location":"ae/#AutoEncoderToolkit.AEs.AE-Tuple{AbstractArray}","page":"Deterministic Autoencoders","title":"AutoEncoderToolkit.AEs.AE","text":"(ae::AE{Encoder, Decoder})(x::AbstractArray; latent::Bool=false)\n\nProcesses the input data x through the autoencoder (AE) that consists of an encoder and a decoder.\n\nArguments\n\nx::AbstractVecOrMat{Float32}: The input data to be encoded. This can be a vector or a matrix where each column represents a separate sample.\n\nOptional Keyword Arguments\n\nlatent::Bool: If set to true, returns a NamedTuple containing the latent representation alongside the reconstructed data. Defaults to false.\n\nReturns\n\nIf latent=false: A NamedTuple with key :decoder that contains the reconstructed data after processing through the encoder and decoder.\nIf latent=true: A NamedTuple with keys :encoder and :decoder, containing the corresponding values.\n\nDescription\n\nThe function first encodes the input x using the encoder to get the encoded representation in the latent space. This latent representation is then decoded using the decoder to produce the reconstructed data. 
If latent is set to true, it also returns the latent representation.\n\nNote\n\nEnsure the input data x matches the expected input dimensionality for the encoder in the AE.\n\n\n\n\n\n","category":"method"},{"location":"ae/#Loss-function","page":"Deterministic Autoencoders","title":"Loss function","text":"","category":"section"},{"location":"ae/#MSE-loss","page":"Deterministic Autoencoders","title":"MSE loss","text":"","category":"section"},{"location":"ae/","page":"Deterministic Autoencoders","title":"Deterministic Autoencoders","text":"AutoEncoderToolkit.AEs.mse_loss","category":"page"},{"location":"ae/#AutoEncoderToolkit.AEs.mse_loss","page":"Deterministic Autoencoders","title":"AutoEncoderToolkit.AEs.mse_loss","text":"mse_loss(ae::AE, \n x::AbstractArray; \n regularization::Union{Function, Nothing}=nothing, \n reg_strength::Float32=1.0f0\n)\n\nCalculate the loss for an autoencoder (AE) by computing the mean squared error (MSE) reconstruction loss and a possible regularization term.\n\nThe AE loss is given by: loss = MSE(x, x̂) + reg_strength × reg_term\n\nWhere:\n\nx is the input Array.\nx̂ is the reconstructed output from the AE.\nreg_strength × reg_term is an optional regularization term.\n\nArguments\n\nae::AE: An AE model.\nx::AbstractArray: Input data.\n\nOptional Keyword Arguments\n\nreg_function::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the ae outputs. Should return a Float32. 
This function must take as input the ae outputs and the keyword arguments provided in reg_kwargs.\nreg_kwargs::Union{NamedTuple,Dict}=Dict(): Keyword arguments to pass to the regularization function.\nreg_strength::Number=1.0f0: The strength of the regularization term.\n\nReturns\n\nThe computed average AE loss value for the given input x, including possible regularization terms.\n\nNotes\n\nEnsure that the dimensionality of the input data x aligns with the encoder's expected input in the AE.\n\n\n\n\n\nmse_loss(ae::AE, \n x_in::AbstractArray, \n x_out::AbstractArray;\n regularization::Union{Function, Nothing}=nothing, \n reg_strength::Float32=1.0f0)\n\nCalculate the mean squared error (MSE) loss for an autoencoder (AE) using separate input and target output vectors.\n\nThe AE loss is computed as: loss = MSE(x_out, x̂) + reg_strength × reg_term\n\nWhere:\n\nx_out is the target output vector.\nx̂ is the reconstructed output from the AE given x_in as input.\nreg_strength × reg_term is an optional regularization term.\n\nArguments\n\nae::AE: An AE model.\nx_in::AbstractArray: Input vector to the AE encoder.\nx_out::AbstractArray: Target output vector to compute the reconstruction error.\n\nOptional Keyword Arguments\n\nreg_function::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the ae outputs. Should return a Float32. 
This function must take as input the ae outputs and the keyword arguments provided in reg_kwargs.\nreg_kwargs::Union{NamedTuple,Dict}=Dict(): Keyword arguments to pass to the regularization function.\nreg_strength::Number=1.0f0: The strength of the regularization term.\n\nReturns\n\nThe computed loss value between the target x_out and its reconstructed counterpart from x_in, including possible regularization terms.\n\nNote\n\nEnsure that the input data x_in matches the expected input dimensionality for the encoder in the AE.\n\n\n\n\n\n","category":"function"},{"location":"ae/#Training","page":"Deterministic Autoencoders","title":"Training","text":"","category":"section"},{"location":"ae/","page":"Deterministic Autoencoders","title":"Deterministic Autoencoders","text":"AutoEncoderToolkit.AEs.train!","category":"page"},{"location":"ae/#AutoEncoderToolkit.AEs.train!","page":"Deterministic Autoencoders","title":"AutoEncoderToolkit.AEs.train!","text":"`train!(ae, x, opt; loss_function, loss_kwargs...)`\n\nCustomized training function to update parameters of an autoencoder given a specified loss function.\n\nArguments\n\nae::AE: A struct containing the elements of an autoencoder.\nx::AbstractArray: Input data on which the autoencoder will be trained.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function: The loss function used for training. 
It should accept the autoencoder model and input data x, and return a loss value.\nloss_kwargs::Union{NamedTuple,Dict} = Dict(): Additional arguments for the loss function.\nverbose::Bool=false: If true, the loss value will be printed during training.\nloss_return::Bool=false: If true, the loss value will be returned after training.\n\nDescription\n\nTrains the autoencoder by:\n\nComputing the gradient of the loss with respect to the autoencoder parameters.\nUpdating the autoencoder parameters using the optimizer.\n\n\n\n\n\ntrain!(ae, x_in, x_out, opt; loss_function, loss_kwargs...)\n\nCustomized training function to update parameters of an autoencoder given a specified loss function.\n\nArguments\n\nae::AE: A struct containing the elements of an autoencoder.\nx_in::AbstractArray: Input data on which the autoencoder will be trained.\nx_out::AbstractArray: Target output data for the autoencoder.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function: The loss function used for training. 
It should accept the autoencoder model and input data x, and return a loss value.\nloss_kwargs::Union{NamedTuple,Dict} = Dict(): Additional arguments for the loss function.\nverbose::Bool=false: If true, the loss value will be printed during training.\nloss_return::Bool=false: If true, the loss value will be returned after training.\n\nDescription\n\nTrains the autoencoder by:\n\nComputing the gradient of the loss with respect to the autoencoder parameters.\nUpdating the autoencoder parameters using the optimizer.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#RHVAEsmodule","page":"RHVAE","title":"Riemannian Hamiltonian Variational Autoencoder","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"The Riemannian Hamiltonian Variational Autoencoder (RHVAE) is a variant of the Hamiltonian Variational Autoencoder (HVAE) that uses concepts from Riemannian geometry to improve the sampling of the latent space representation. As the HVAE, the RHVAE uses Hamiltonian dynamics to improve the sampling of the latent. However, the RHVAE accounts for the geometry of the latent space by learning a Riemannian metric tensor that is used to compute the kinetic energy of the dynamical system. This allows the RHVAE to sample the latent space more evenly while learning the curvature of the latent space.","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"For the implementation of the RHVAE in AutoEncoderToolkit.jl, the RHVAE requires two arguments to construct: the original VAE as well as a separate neural network used to compute the metric tensor. To facilitate the dispatch of the necessary functions associated with this second network, we also provide a MetricChain struct.","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"warning: Warning\nRHVAEs require the computation of nested gradients. 
This means that the AutoDiff framework must differentiate a function that itself contains an AutoDiff-computed derivative. This is known to be problematic for Julia's AutoDiff backends. See details below to understand how we circumvent this problem.","category":"page"},{"location":"rhvae/#Reference","page":"RHVAE","title":"Reference","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"Chadebec, C., Mantoux, C. & Allassonnière, S. Geometry-Aware Hamiltonian Variational Auto-Encoder. Preprint at http://arxiv.org/abs/2010.11518 (2020).","category":"page"},{"location":"rhvae/#MetricChainstruct","page":"RHVAE","title":"MetricChain struct","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.RHVAEs.MetricChain","category":"page"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.MetricChain","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.MetricChain","text":"MetricChain <: AbstractMetricChain\n\nA MetricChain is used to compute the Riemannian metric tensor in the latent space of a Riemannian Hamiltonian Variational AutoEncoder (RHVAE).\n\nFields\n\nmlp::Flux.Chain: A multi-layer perceptron (MLP) consisting of the hidden layers. The inputs are first run through this MLP.\ndiag::Flux.Dense: A dense layer that computes the diagonal elements of a lower-triangular matrix. The output of the mlp is fed into this layer.\nlower::Flux.Dense: A dense layer that computes the off-diagonal elements of the lower-triangular matrix. 
The output of the mlp is also fed into this layer.\n\nThe outputs of diag and lower are used to construct a lower-triangular matrix used to compute the Riemannian metric tensor in latent space.\n\nNote\n\nIf the dimension of the latent space is n, the number of neurons in the output layer of diag must be n, and the number of neurons in the output layer of lower must be n * (n - 1) ÷ 2.\n\nExample\n\nmlp = Flux.Chain(Dense(10, 10, relu), Dense(10, 10, relu))\ndiag = Flux.Dense(10, 5)\nlower = Flux.Dense(10, 10)\nmetric_chain = MetricChain(mlp, diag, lower)\n\n\n\n\n\n","category":"type"},{"location":"rhvae/#RHVAEstruct","page":"RHVAE","title":"RHVAE struct","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.RHVAEs.RHVAE","category":"page"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.RHVAE","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.RHVAE","text":"RHVAE{\n V<:VAE{<:AbstractVariationalEncoder,<:AbstractVariationalDecoder}\n} <: AbstractVariationalAutoEncoder\n\nA Riemannian Hamiltonian Variational AutoEncoder (RHVAE) as described in Chadebec, C., Mantoux, C. & Allassonnière, S. Geometry-Aware Hamiltonian Variational Auto-Encoder. Preprint at http://arxiv.org/abs/2010.11518 (2020).\n\nThe RHVAE is a type of Variational AutoEncoder (VAE) that incorporates a Riemannian metric in the latent space. 
This metric is computed by a MetricChain, which is a struct that contains a multi-layer perceptron (MLP) and two dense layers for computing the elements of a lower-triangular matrix.\n\nThe inverse metric is computed as follows:\n\nG⁻¹(z) = ∑ᵢ₌₁ⁿ Lψᵢ Lψᵢᵀ exp(-‖z - cᵢ‖₂² / T²) + λIₗ\n\nwhere L_ψᵢ is computed by the MetricChain, T is the temperature, λ is a regularization factor, and each column of centroids are the cᵢ.\n\nFields\n\nvae::V: The underlying VAE, where V is a subtype of VAE with an AbstractVariationalEncoder and an AbstractVariationalDecoder.\nmetric_chain::MetricChain: The MetricChain that computes the Riemannian metric in the latent space.\ncentroids_data::AbstractArray: An array where the last dimension represents a data point xᵢ from which the centroids cᵢ are computed by passing them through the encoder.\ncentroids_latent::AbstractMatrix: A matrix where each column represents a centroid cᵢ in the inverse metric computation.\nL::AbstractArray{<:Number, 3}: A 3D array where each slice represents a Lψᵢ matrix. Lψᵢ can intuitively be seen as the triangular matrix in the Cholesky decomposition of G⁻¹(centroids_latentᵢ) up to a regularization factor.\nM::AbstractArray{<:Number, 3}: A 3D array where each slice represents a Lψᵢ Lψᵢᵀ.\nT::Number: The temperature parameter in the inverse metric computation. 
\nλ::Number: The regularization factor in the inverse metric computation.\n\n\n\n\n\n","category":"type"},{"location":"rhvae/#Forward-pass","page":"RHVAE","title":"Forward pass","text":"","category":"section"},{"location":"rhvae/#MetricChain","page":"RHVAE","title":"Metric Network","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.RHVAEs.MetricChain(::AbstractArray)","category":"page"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.MetricChain-Tuple{AbstractArray}","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.MetricChain","text":"(m::MetricChain)(x::AbstractArray; matrix::Bool=false)\n\nPerform a forward pass through the MetricChain.\n\nArguments\n\nx::AbstractArray: The input data to be processed. \nmatrix::Bool=false: A boolean flag indicating whether to return the result as a lower triangular matrix (if true) or as a tuple of diagonal and lower off-diagonal elements (if false). Defaults to false.\n\nReturns\n\nIf matrix is true, returns a lower triangular matrix constructed from the outputs of the diag and lower components of the MetricChain.\nIf matrix is false, returns a NamedTuple with two elements: diag, the output of the diag component of the MetricChain, and lower, the output of the lower component of the MetricChain.\n\nExample\n\nm = MetricChain(...)\nx = rand(Float32, 100, 10)\nm(x, matrix=true) # Returns a lower triangular matrix\n\n\n\n\n\n","category":"method"},{"location":"rhvae/#RHVAE","page":"RHVAE","title":"RHVAE","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.RHVAEs.RHVAE(::AbstractArray)","category":"page"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.RHVAE-Tuple{AbstractArray}","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.RHVAE","text":"(rhvae::RHVAE{VAE{E,D}})(\n x::AbstractArray;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n ∇H::Function=∇hamiltonian_TaylorDiff,\n 
∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n G_inv=G_inv,\n ),\n tempering_schedule::Function=quadratic_tempering,\n latent::Bool=false,\n) where {E<:AbstractGaussianLogEncoder,D<:AbstractVariationalDecoder}\n\nRun the Riemannian Hamiltonian Variational Autoencoder (RHVAE) on the given input.\n\nArguments\n\nx::AbstractArray: The input to the RHVAE. If it is a vector, it represents a single data point. If it is an array, the last dimension indexes the data points.\n\nOptional Keyword Arguments\n\nK::Int=3: The number of leapfrog steps to perform in the Hamiltonian Monte Carlo (HMC) part of the RHVAE.\nϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4): The step size for the leapfrog steps in the HMC part of the RHVAE. If it is a scalar, the same step size is used for all dimensions. If it is an array, each element corresponds to the step size for a specific dimension.\nβₒ::Number=0.3f0: The initial inverse temperature for the tempering schedule.\nsteps::Int: The number of fixed-point iterations to perform. Default is 3.\n∇H::Function=∇hamiltonian_TaylorDiff: The function to compute the gradient of the Hamiltonian in the HMC part of the RHVAE.\n∇H_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the ∇hamiltonian function. Default is a NamedTuple with reconstruction_loglikelihood, position_logprior, momentum_logprior, and G_inv. \nG_inv::Function=G_inv: The function to compute the inverse of the Riemannian metric tensor.\ntempering_schedule::Function=quadratic_tempering: The function to compute the tempering schedule in the RHVAE.\nlatent::Bool=false: If true, the function returns a NamedTuple containing the outputs of the encoder and decoder, and the final state of the phase space after the leapfrog and tempering steps. 
If false, the function only returns the output of the decoder.\n\nReturns\n\nIf latent=true, the function returns a NamedTuple with the following fields:\n\nencoder: The outputs of the encoder.\ndecoder: The output of the decoder.\nphase_space: The final state of the phase space after the leapfrog and tempering steps.\n\nIf latent=false, the function only returns the output of the decoder.\n\nDescription\n\nThis function runs the RHVAE on the given input. It first passes the input through the encoder to obtain the mean and log standard deviation of the latent space. It then uses the reparameterization trick to sample from the latent space. After that, it performs the leapfrog and tempering steps to refine the sample from the latent space. Finally, it passes the refined sample through the decoder to obtain the output.\n\nNotes\n\nEnsure that the dimensions of x match the input dimensions of the RHVAE, and that the dimensions of ϵ match the dimensions of the latent space.\n\n\n\n\n\n","category":"method"},{"location":"rhvae/#Loss-function","page":"RHVAE","title":"Loss function","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.RHVAEs.loss","category":"page"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.loss","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.loss","text":"loss(\n rhvae::RHVAE,\n x::AbstractArray;\n K::Int=3,\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n βₒ::Number=0.3f0,\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n G_inv::Function=G_inv,\n tempering_schedule::Function=quadratic_tempering,\n reg_function::Union{Function,Nothing}=nothing,\n reg_kwargs::Union{NamedTuple,Dict}=Dict(),\n reg_strength::Number=1.0f0,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the loss 
for a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nArguments\n\nrhvae::RHVAE: The RHVAE used to encode the input data and decode the latent space.\nx::AbstractArray: Input data to the RHVAE encoder. The last dimension is taken as having each of the samples in a batch.\n\nOptional Keyword Arguments\n\nK::Int: The number of HMC steps (default is 3).\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is 0.001).\nβₒ::Number: The initial inverse temperature (default is 0.3).\nsteps::Int: The number of steps in the leapfrog integrator (default is 3).\n∇H_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the ∇hamiltonian function.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor (default is G_inv).\ntempering_schedule::Function: The tempering schedule function used in the HMC (default is quadratic_tempering).\nreg_function::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the VAE outputs. This function must take as input the VAE outputs and the keyword arguments provided in reg_kwargs.\nreg_kwargs::Union{NamedTuple,Dict}=Dict(): Keyword arguments to pass to the regularization function.\nreg_strength::Number=1.0f0: The strength of the regularization term.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. 
Default is an array of ones.\n\nReturns\n\nThe computed loss.\n\n\n\n\n\nloss(\n rhvae::RHVAE,\n x_in::AbstractArray,\n x_out::AbstractArray;\n K::Int=3,\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n βₒ::Number=0.3f0,\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n G_inv::Function=G_inv,\n tempering_schedule::Function=quadratic_tempering,\n reg_function::Union{Function,Nothing}=nothing,\n reg_kwargs::Union{NamedTuple,Dict}=Dict(),\n reg_strength::Number=1.0f0,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the loss for a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nArguments\n\nrhvae::RHVAE: The RHVAE used to encode the input data and decode the latent space.\nx_in::AbstractArray: Input data to the RHVAE encoder. The last dimension is taken as having each of the samples in a batch.\nx_out::AbstractArray: Target data to compute the reconstruction error. The last dimension is taken as having each of the samples in a batch.\n\nOptional Keyword Arguments\n\nK::Int: The number of HMC steps (default is 3).\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is 0.001).\nβₒ::Number: The initial inverse temperature (default is 0.3).\nsteps::Int: The number of steps in the leapfrog integrator (default is 3).\n∇H_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the ∇hamiltonian function.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor (default is G_inv).\ntempering_schedule::Function: The tempering schedule function used in the HMC (default is quadratic_tempering).\nreg_function::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the VAE outputs. 
This function must take as input the VAE outputs and the keyword arguments provided in reg_kwargs.\nreg_kwargs::Union{NamedTuple,Dict}=Dict(): Keyword arguments to pass to the regularization function.\nreg_strength::Number=1.0f0: The strength of the regularization term.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. Default is an array of ones.\n\nReturns\n\nThe computed loss.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#Training","page":"RHVAE","title":"Training","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.RHVAEs.train!","category":"page"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.train!","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.train!","text":"train!(\n rhvae::RHVAE, \n x::AbstractArray, \n opt::NamedTuple; \n loss_function::Function=loss, \n loss_kwargs::Union{NamedTuple,Dict}=Dict(),\n verbose::Bool=false,\n loss_return::Bool=false,\n)\n\nCustomized training function to update parameters of a Riemannian Hamiltonian Variational Autoencoder given a specified loss function.\n\nArguments\n\nrhvae::RHVAE: A struct containing the elements of a Riemannian Hamiltonian Variational Autoencoder.\nx::AbstractArray: Input data to the RHVAE encoder. The last dimension is taken as having each of the samples in a batch.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Optimisers.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function=loss: The loss function used for training. 
It should accept the RHVAE model, data x, and keyword arguments in that order.\nloss_kwargs::Dict=Dict(): Arguments for the loss function. These might include parameters like K, ϵ, βₒ, steps, ∇H, ∇H_kwargs, tempering_schedule, reg_function, reg_kwargs, reg_strength, depending on the specific loss function in use.\nverbose::Bool=false: Whether to print the loss at each iteration.\nloss_return::Bool=false: Whether to return the loss at each iteration.\n\nDescription\n\nTrains the RHVAE by:\n\nComputing the gradient of the loss w.r.t the RHVAE parameters.\nUpdating the RHVAE parameters using the optimizer.\nUpdating the metric parameters.\n\n\n\n\n\ntrain!(\n rhvae::RHVAE, \n x_in::AbstractArray,\n x_out::AbstractArray,\n opt::NamedTuple; \n loss_function::Function=loss, \n loss_kwargs::Union{NamedTuple,Dict}=Dict(),\n verbose::Bool=false,\n loss_return::Bool=false,\n)\n\nCustomized training function to update parameters of a Riemannian Hamiltonian Variational Autoencoder given a specified loss function.\n\nArguments\n\nrhvae::RHVAE: A struct containing the elements of a Riemannian Hamiltonian Variational Autoencoder.\nx_in::AbstractArray: Input data to the RHVAE encoder. The last dimension is taken as having each of the samples in a batch.\nx_out::AbstractArray: Target data to compute the reconstruction error. The last dimension is taken as having each of the samples in a batch.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Optimisers.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function=loss: The loss function used for training. It should accept the RHVAE model, data x, and keyword arguments in that order.\nloss_kwargs::Dict=Dict(): Arguments for the loss function. 
These might include parameters like K, ϵ, βₒ, steps, ∇H, ∇H_kwargs, tempering_schedule, reg_function, reg_kwargs, reg_strength, depending on the specific loss function in use.\nverbose::Bool=false: Whether to print the loss at each iteration.\nloss_return::Bool=false: Whether to return the loss at each iteration.\n\nDescription\n\nTrains the RHVAE by:\n\nComputing the gradient of the loss w.r.t the RHVAE parameters.\nUpdating the RHVAE parameters using the optimizer.\nUpdating the metric parameters.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#gradhamiltonian","page":"RHVAE","title":"Computing the gradient of the potential energy","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"One of the crucial components in the training of the RHVAE is the computation of the gradient of the Hamiltonian ∇H with respect to the latent space representation. This gradient is used in the leapfrog steps of the generalized Hamiltonian dynamics. When training the RHVAE, we need to backpropagate through the leapfrog steps to update the parameters of the neural network. This requires computing a gradient of a function of the gradient of the Hamiltonian, i.e., nested gradients. Zygote.jl, the main AutoDiff backend in Flux.jl, famously struggles with these types of computations. 
Specifically, Zygote.jl does not support Zygote over Zygote differentiation (meaning differentiating a function of something previously differentiated with Zygote using Zygote), or Zygote over ForwardDiff (meaning differentiating a function of something differentiated with ForwardDiff using Zygote).","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"With this, we are left with a couple of options to compute the gradient of the potential energy:","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"Use finite differences to approximate the gradient of the potential energy.\nUse the relatively new TaylorDiff.jl AutoDiff backend to compute the gradient of the potential energy. This backend is composable with Zygote.jl, so we can, in principle, do Zygote over TaylorDiff differentiation.","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"The second option would be preferred, as the gradients computed with TaylorDiff are much more accurate than the ones computed with finite differences. However, there are two problems with this approach:","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"The TaylorDiff nested gradient capability stopped working with Julia ≥ 1.10, as discussed in #70.\nEven for Julia < 1.10, we could not get TaylorDiff to work on CUDA devices. (PRs are welcome!)","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"With these limitations in mind, we have implemented the gradient of the potential using both finite differences and TaylorDiff. The user can choose which method to use by setting the adtype keyword argument in the ∇H_kwargs in the loss function to either :finite or :TaylorDiff. This means that for the train! 
function, the user can pass loss_kwargs that looks like this:","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"# Define the autodiff backend to use\nloss_kwargs = Dict(\n :∇H_kwargs => Dict(\n :adtype => :finite\n )\n)","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"note: Note\nAlthough verbose, the nested dictionaries help to keep everything organized. (PRs with better design ideas are welcome!)","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"The default both for cpu and gpu devices is :finite.","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.RHVAEs.∇hamiltonian_finite\nAutoEncoderToolkit.RHVAEs.∇hamiltonian_TaylorDiff\nAutoEncoderToolkit.RHVAEs.∇hamiltonian_ForwardDiff","category":"page"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.∇hamiltonian_finite","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.∇hamiltonian_finite","text":"∇hamiltonian_finite(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n G⁻¹::AbstractArray,\n logdetG::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple,\n var::Symbol;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n fdtype::Symbol=:central,\n)\n\nCompute the gradient of the Hamiltonian with respect to a given variable using a naive finite difference method.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, the inverse of the Riemannian metric tensor G⁻¹, a decoder of type AbstractVariationalDecoder, a decoder_output NamedTuple, and a variable var (:z or :ρ), and computes the gradient of the Hamiltonian with respect to var using a simple finite differences method. 
The computation is based on the log-likelihood of the decoder, the log-prior of the latent space, and G⁻¹.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. The kinetic energy is defined as follows:\n\nκ(ρ) = 0.5 * log((2π)ᴰ det G(z)) + 0.5 * ρᵀ G⁻¹ ρ\n\nwhere D is the dimension of the latent space, and G(z) is the metric tensor at the point z.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If matrix, each column represents a momentum vector.\nG⁻¹::AbstractArray: The inverse of the Riemannian metric tensor. If 3D array, each slice along the third dimension represents the inverse of the metric tensor at the corresponding column of z.\nlogdetG::Union{<:Number,AbstractVector}: The log determinant of the Riemannian metric tensor. If vector, each element represents the log determinant of the metric tensor at the corresponding column of z.\ndecoder::AbstractVariationalDecoder: The decoder instance.\ndecoder_output::NamedTuple: The output of the decoder.\nvar::Symbol: The variable with respect to which the gradient is computed. Must be :z or :ρ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. Default is decoder_loglikelihood. 
This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and G⁻¹.\nfdtype::Symbol=:central: The type of finite difference method to use. Must be :central or :forward. Default is :central.\n\nReturns\n\nA vector representing the gradient of the Hamiltonian at the point (z, ρ) with respect to variable var.\n\n\n\n\n\n∇hamiltonian_finite(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n rhvae::RHVAE,\n var::Symbol;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n G_inv::Function=G_inv,\n fdtype::Symbol=:central,\n)\n\nCompute the gradient of the Hamiltonian with respect to a given variable using a naive finite difference method.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, an instance of RHVAE, and a variable var (:z or :ρ), and computes the gradient of the Hamiltonian with respect to var using a simple finite differences method. The computation is based on the log-likelihood of the decoder, the log-prior of the latent space, and the inverse of the metric tensor G at the point z.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. 
The kinetic energy is defined as follows:\n\nκ(ρ) = 0.5 * log((2π)ᴰ det G(z)) + 0.5 * ρᵀ G⁻¹ ρ\n\nwhere D is the dimension of the latent space, and G(z) is the metric tensor at the point z.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If matrix, each column represents a momentum vector.\nrhvae::RHVAE: An instance of the RHVAE model.\nvar::Symbol: The variable with respect to which the gradient is computed. Must be :z or :ρ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. Default is decoder_loglikelihood. This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and the inverse of the Riemannian metric tensor G⁻¹.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Default is G_inv. This function must take as input the point z in the latent space and the rhvae instance.\nfdtype::Symbol=:central: The type of finite difference method to use. Must be :central or :forward. 
Default is :central.\n\nReturns\n\nA vector representing the gradient of the Hamiltonian at the point (z, ρ) with respect to variable var.\n\nNote\n\nThe inverse of the Riemannian metric tensor G⁻¹, the log determinant of the metric tensor, and the output of the decoder are computed internally in this function. The user does not need to provide these as inputs.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.∇hamiltonian_TaylorDiff","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.∇hamiltonian_TaylorDiff","text":"∇hamiltonian_TaylorDiff(\n x::AbstractArray,\n z::AbstractVector,\n ρ::AbstractVector,\n G⁻¹::AbstractMatrix,\n logdetG::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple,\n var::Symbol;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n)\n\nCompute the gradient of the Hamiltonian with respect to a given variable using the TaylorDiff.jl automatic differentiation library.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, an instance of AbstractVariationalDecoder, and a variable var (:z or :ρ), and computes the gradient of the Hamiltonian with respect to var using TaylorDiff.jl.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. The kinetic energy is defined as follows:\n\nκ(ρ) = 0.5 * log((2π)ᴰ det G(z)) + 0.5 * ρᵀ G⁻¹ ρ\n\nwhere D is the dimension of the latent space, and G(z) is the metric tensor at the point z.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. 
The last dimension is assumed to have each of the data points.\nz::AbstractVector: The point in the latent space.\nρ::AbstractVector: The momentum.\nG⁻¹::AbstractMatrix: The inverse of the Riemannian metric tensor.\nlogdetG::Number: The logarithm of the determinant of the Riemannian metric tensor.\ndecoder::AbstractVariationalDecoder: An instance of the decoder model.\ndecoder_output::NamedTuple: The output of the decoder model.\nvar::Symbol: The variable with respect to which the gradient is computed. Must be :z or :ρ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. Default is decoder_loglikelihood. This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and the inverse of the Riemannian metric tensor G⁻¹.\n\nReturns\n\nA vector representing the gradient of the Hamiltonian at the point (z, ρ) with respect to variable var.\n\nNote\n\nTaylorDiff.jl is composable with Zygote.jl. 
Thus, for backpropagation using this function one should use Zygote.jl.\n\n\n\n\n\n∇hamiltonian_TaylorDiff(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n rhvae::RHVAE,\n var::Symbol;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n G_inv::Function=G_inv,\n)\n\nCompute the gradient of the Hamiltonian with respect to a given variable using the TaylorDiff.jl automatic differentiation library.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, an instance of RHVAE, and a variable var (:z or :ρ), and computes the gradient of the Hamiltonian with respect to var using TaylorDiff.jl.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. The kinetic energy is defined as follows:\n\nκ(ρ) = 0.5 * log((2π)ᴰ det G(z)) + 0.5 * ρᵀ G⁻¹ ρ\n\nwhere D is the dimension of the latent space, and G(z) is the metric tensor at the point z.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If matrix, each column represents a momentum vector.\nrhvae::RHVAE: An instance of the RHVAE model.\nvar::Symbol: The variable with respect to which the gradient is computed. Must be :z or :ρ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. 
Default is decoder_loglikelihood. This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and the inverse of the Riemannian metric tensor G⁻¹.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Default is G_inv. This function must take as input the point z in the latent space and the rhvae instance.\n\nReturns\n\nA matrix representing the gradient of the Hamiltonian at the point (z, ρ) with respect to variable var.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.∇hamiltonian_ForwardDiff","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.∇hamiltonian_ForwardDiff","text":"∇hamiltonian_ForwardDiff(\n x::AbstractArray,\n z::AbstractVector,\n ρ::AbstractVector,\n G⁻¹::AbstractMatrix,\n logdetG::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple,\n var::Symbol;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n)\n\nCompute the gradient of the Hamiltonian with respect to a given variable using the ForwardDiff.jl automatic differentiation library.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, the inverse of the Riemannian metric tensor G⁻¹, a decoder of type AbstractVariationalDecoder, a decoder_output NamedTuple, and a variable var (:z or :ρ), and computes the gradient of the Hamiltonian with respect to var using ForwardDiff.jl.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + 
κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. The kinetic energy is defined as follows:\n\nκ(ρ) = 0.5 * log((2π)ᴰ det G(z)) + 0.5 * ρᵀ G⁻¹ ρ\n\nwhere D is the dimension of the latent space, and G(z) is the metric tensor at the point z.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVector: The point in the latent space.\nρ::AbstractVector: The momentum.\nG⁻¹::AbstractMatrix: The inverse of the Riemannian metric tensor.\nlogdetG::Union{<:Number,AbstractVector}: The log determinant of the Riemannian metric tensor.\ndecoder::AbstractVariationalDecoder: The decoder instance.\ndecoder_output::NamedTuple: The output of the decoder.\nvar::Symbol: The variable with respect to which the gradient is computed. Must be :z or :ρ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. Default is decoder_loglikelihood. This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and G⁻¹.\n\nReturns\n\nA vector representing the gradient of the Hamiltonian at the point (z, ρ) with respect to variable var.\n\nNote\n\nForwardDiff.jl is not composable with Zygote.jl. 
Thus, for backpropagation using this function one should use ReverseDiff.jl.\n\n\n\n\n\n∇hamiltonian_ForwardDiff(\n x::AbstractArray,\n z::AbstractMatrix,\n ρ::AbstractMatrix,\n G⁻¹::AbstractArray,\n logdetG::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple,\n var::Symbol;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n)\n\nCompute the gradient of the Hamiltonian with respect to a given variable using the ForwardDiff.jl automatic differentiation library.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, the inverse of the Riemannian metric tensor G⁻¹, a decoder of type AbstractVariationalDecoder, a decoder_output NamedTuple, and a variable var (:z or :ρ), and computes the gradient of the Hamiltonian with respect to var using ForwardDiff.jl.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. The kinetic energy is defined as follows:\n\nκ(ρ) = 0.5 * log((2π)ᴰ det G(z)) + 0.5 * ρᵀ G⁻¹ ρ\n\nwhere D is the dimension of the latent space, and G(z) is the metric tensor at the point z.\n\nThe Jacobian is computed with respect to var to compute derivatives for all columns at once. The relevant terms for each column's gradient are then extracted from the Jacobian.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. 
The last dimension is assumed to have each of the data points.\nz::AbstractMatrix: The point in the latent space.\nρ::AbstractMatrix: The momentum.\nG⁻¹::AbstractArray: The inverse of the Riemannian metric tensor.\nlogdetG::Union{<:Number,AbstractVector}: The log determinant of the Riemannian metric tensor.\ndecoder::AbstractVariationalDecoder: The decoder instance.\ndecoder_output::NamedTuple: The output of the decoder.\nvar::Symbol: The variable with respect to which the gradient is computed. Must be :z or :ρ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. Default is decoder_loglikelihood. This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and G⁻¹.\n\nReturns\n\nA matrix representing the gradient of the Hamiltonian at the point (z, ρ) with respect to variable var.\n\nNote\n\nForwardDiff.jl is not composable with Zygote.jl. 
Thus, for backpropagation using this function one should use ReverseDiff.jl.\n\n\n\n\n\n∇hamiltonian_ForwardDiff(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n rhvae::RHVAE,\n var::Symbol;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n G_inv::Function=G_inv,\n)\n\nCompute the gradient of the Hamiltonian with respect to a given variable using the ForwardDiff.jl automatic differentiation library.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, an instance of RHVAE, and a variable var (:z or :ρ), and computes the gradient of the Hamiltonian with respect to var using ForwardDiff.jl.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. The kinetic energy is defined as follows:\n\nκ(ρ) = 0.5 * log((2π)ᴰ det G(z)) + 0.5 * ρᵀ G⁻¹ ρ\n\nwhere D is the dimension of the latent space, and G(z) is the metric tensor at the point z.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If matrix, each column represents a momentum vector.\nrhvae::RHVAE: An instance of the RHVAE model.\nvar::Symbol: The variable with respect to which the gradient is computed. Must be :z or :ρ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. 
Default is decoder_loglikelihood. This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and the inverse of the Riemannian metric tensor G⁻¹.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Default is G_inv. This function must take as input the point z in the latent space and the rhvae instance.\n\nReturns\n\nA matrix representing the gradient of the Hamiltonian at the point (z, ρ) with respect to variable var.\n\nNote\n\nForwardDiff.jl is not composable with Zygote.jl. Thus, for backpropagation using this function one should use ReverseDiff.jl.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#Other-Functions","page":"RHVAE","title":"Other 
Functions","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.RHVAEs.update_metric\nAutoEncoderToolkit.RHVAEs.update_metric!\nAutoEncoderToolkit.RHVAEs.G_inv\nAutoEncoderToolkit.RHVAEs.metric_tensor\nAutoEncoderToolkit.RHVAEs.riemannian_logprior\nAutoEncoderToolkit.RHVAEs.hamiltonian\nAutoEncoderToolkit.RHVAEs.∇hamiltonian\nAutoEncoderToolkit.RHVAEs._leapfrog_first_step\nAutoEncoderToolkit.RHVAEs._leapfrog_second_step\nAutoEncoderToolkit.RHVAEs._leapfrog_third_step\nAutoEncoderToolkit.RHVAEs.general_leapfrog_step\nAutoEncoderToolkit.RHVAEs.general_leapfrog_tempering_step\nAutoEncoderToolkit.RHVAEs._log_p̄\nAutoEncoderToolkit.RHVAEs._log_q̄\nAutoEncoderToolkit.RHVAEs.riemannian_hamiltonian_elbo","category":"page"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.update_metric","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.update_metric","text":"update_metric(\n rhvae::RHVAE{<:VAE{<:AbstractGaussianEncoder,<:AbstractVariationalDecoder}}\n)\n\nCompute the centroids_latent and M field of a RHVAE instance without modifying the instance. 
This method is used when needing to backpropagate through the RHVAE during training.\n\nArguments\n\nrhvae::RHVAE{<:VAE{<:AbstractGaussianEncoder,<:AbstractVariationalDecoder}}: The RHVAE instance to be updated.\n\nReturns\n\nNamedTuple with the following fields:\ncentroids_latent::Matrix: A matrix where each column represents a centroid cᵢ in the inverse metric computation.\nL::Array{<:Number, 3}: A 3D array where each slice represents a L_ψᵢ matrix.\nM::Array{<:Number, 3}: A 3D array where each slice represents a Lψᵢ Lψᵢᵀ.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.update_metric!","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.update_metric!","text":"update_metric!(\n rhvae::RHVAE{<:VAE{<:AbstractGaussianEncoder,<:AbstractVariationalDecoder}},\n params::NamedTuple\n)\n\nUpdate the centroids_latent and M fields of a RHVAE instance in place.\n\nThis function takes a RHVAE instance and a named tuple params containing the new values for centroids_latent and M. It updates the centroids_latent, L, and M fields of the RHVAE instance with the provided values.\n\nArguments\n\nrhvae::RHVAE{<:VAE{<:AbstractGaussianEncoder,<:AbstractVariationalDecoder}}: The RHVAE instance to update.\nparams::NamedTuple: A named tuple containing the new values for centroids_latent and M. Must have the keys :centroids_latent, :L, and :M.\n\nReturns\n\nNothing. The RHVAE instance is updated in place.\n\n\n\n\n\nupdate_metric!(\n rhvae::RHVAE{\n <:VAE{<:AbstractGaussianEncoder,<:AbstractVariationalDecoder}\n }\n)\n\nUpdate the centroids_latent, and M fields of a RHVAE instance in place.\n\nThis function takes a RHVAE instance as input and modifies its centroids_latent and M fields. The centroids_latent field is updated by running the centroids_data through the encoder of the underlying VAE and extracting the mean (µ) of the resulting Gaussian distribution. 
The M field is updated by running each column of the centroids_data through the metric_chain and concatenating the results along the third dimension to form L, then multiplying each slice of L by its transpose and concatenating the products along the third dimension.\n\nArguments\n\nrhvae::RHVAE{<:VAE{<:AbstractGaussianEncoder,<:AbstractVariationalDecoder}}: The RHVAE instance to be updated.\n\nNotes\n\nThis function modifies the RHVAE instance in place, so it does not return anything. The changes are made directly to the centroids_latent, L, and M fields of the input RHVAE instance.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.G_inv","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.G_inv","text":"G_inv(\n z::AbstractVecOrMat,\n centroids_latent::AbstractMatrix,\n M::AbstractArray{<:Number,3},\n T::Number,\n λ::Number,\n)\n\nCompute the inverse of the metric tensor G for a given point in the latent space.\n\nThis function takes a point z in the latent space, the centroids_latent of the RHVAE instance, a 3D array M representing the metric tensor, a temperature T, and a regularization factor λ, and computes the inverse of the metric tensor G at that point. The computation is based on the centroids and the temperature, as well as a regularization term. The inverse metric is computed as follows:\n\nG⁻¹(z) = ∑ᵢ₌₁ⁿ Lψᵢ Lψᵢᵀ exp(-‖z - cᵢ‖₂² / T²) + λIₗ,\n\nwhere Lψᵢ is computed by the MetricChain, T is the temperature, λ is a regularization factor, and each column of centroids_latent is a centroid cᵢ.\n\nArguments\n\nz::AbstractVecOrMat: The point in the latent space. 
If a matrix, each column represents a point in the latent space.\ncentroids_latent::AbstractMatrix: The centroids in the latent space.\nM::AbstractArray{<:Number,3}: The 3D array containing the symmetric matrices used to compute the inverse metric tensor.\nT::Number: The temperature.\nλ::Number: The regularization factor.\n\nReturns\n\nA matrix or 3D array representing the inverse of the metric tensor G at the point z. If a 3D array, each slice represents the inverse metric tensor at a different point in the latent space.\n\nNotes\n\nThe computation involves the squared Euclidean distance between z and each centroid, the exponential of the negative of these distances divided by the square of the temperature, and a regularization term proportional to the identity matrix. The result is a matrix of the same size as the latent space.\n\nGPU support\n\nThis function supports CPU and GPU arrays.\n\n\n\n\n\nG_inv(\n z::AbstractVecOrMat,\n metric_param::Union{RHVAE,NamedTuple},\n)\n\nCompute the inverse of the metric tensor G for a given point in the latent space.\n\nThis function takes a RHVAE instance and a point z in the latent space, and computes the inverse of the metric tensor G at that point. The computation is based on the centroids and the temperature of the RHVAE instance, as well as a regularization term. The inverse metric is computed as follows:\n\nG⁻¹(z) = ∑ᵢ₌₁ⁿ Lψᵢ Lψᵢᵀ exp(-‖z - cᵢ‖₂² / T²) + λIₗ,\n\nwhere Lψᵢ is computed by the MetricChain, T is the temperature, λ is a regularization factor, and each column of centroids_latent is a centroid cᵢ.\n\nArguments\n\nz::AbstractVecOrMat: The point in the latent space. 
If a matrix, each column represents a point in the latent space.\nmetric_param::Union{RHVAE,NamedTuple}: Either an RHVAE instance or a named tuple containing the fields centroids_latent, M, T, and λ.\n\nReturns\n\nA matrix representing the inverse of the metric tensor G at the point z.\n\nNotes\n\nThe computation involves the squared Euclidean distance between z and each centroid of the RHVAE instance, the exponential of the negative of these distances divided by the square of the temperature, and a regularization term proportional to the identity matrix. The result is a matrix of the same size as the latent space.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.metric_tensor","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.metric_tensor","text":"metric_tensor(\n z::AbstractVecOrMat,\n metric_param::Union{RHVAE,NamedTuple},\n)\n\nCompute the metric tensor G for a given point in the latent space. This function is a wrapper that determines the type of the input z and calls the appropriate specialized function _metric_tensor to perform the actual computation.\n\nThis function takes a RHVAE instance or a named tuple containing the fields centroids_latent, M, T, and λ, and a point z in the latent space, and computes the metric tensor G at that point. The computation is based on the inverse of the metric tensor G, which is computed by the G_inv function.\n\nArguments\n\nz::AbstractVecOrMat: The point in the latent space. If a matrix, each column represents a point in the latent space.\nmetric_param::Union{RHVAE,NamedTuple}: Either an RHVAE instance or a named tuple containing the fields centroids_latent, M, T, and λ.\n\nReturns\n\nA matrix representing the metric tensor G at the point z.\n\nNotes\n\nThe computation involves the inverse of the metric tensor G at the point z. 
The result is a matrix of the same size as the latent space.\n\nGPU Support\n\nThis function supports CPU and GPU arrays.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.riemannian_logprior","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.riemannian_logprior","text":"riemannian_logprior(\n ρ::AbstractVector,\n G⁻¹::AbstractMatrix,\n logdetG::Number;\n)\n\nCPU AbstractVector version of the riemannian_logprior function.\n\n\n\n\n\nriemannian_logprior(\n ρ::AbstractVector,\n G⁻¹::AbstractMatrix,\n logdetG::Number,\n)\n\nCPU AbstractMatrix version of the riemannian_logprior function.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.hamiltonian","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.hamiltonian","text":"hamiltonian(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n G⁻¹::AbstractArray,\n logdetG::Union{<:Number,<:AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple;\n decoder_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n)\n\nCompute the Hamiltonian for a given point in the latent space and a given momentum.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, the inverse of the Riemannian metric tensor G⁻¹, a decoder of type AbstractVariationalDecoder, and a decoder_output NamedTuple, and computes the Hamiltonian. The computation is based on the log-likelihood of the decoder, the log-prior of the latent space, and the inverse of the metric tensor G at the point z.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. 
The kinetic energy is defined as follows:\n\nκ(ρ) = -log p(ρ),\n\nwhere p(ρ) is the log-prior of the momentum.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported, but the last dimension of the array should be of size 1.\nz::AbstractVecOrMat: The point in the latent space.\nρ::AbstractVecOrMat: The momentum.\nG⁻¹::AbstractArray: The inverse of the Riemannian metric tensor. This should be computed elsewhere and should correspond to the given z value.\nlogdetG::Union{<:Number,AbstractVector}: The log determinant of the Riemannian metric tensor. This should be computed elsewhere and should correspond to the given z value.\ndecoder::AbstractVariationalDecoder: The decoder instance. This is not used in the computation of the Hamiltonian, but is passed to the decoder_loglikelihood function to know which method to use.\ndecoder_output::NamedTuple: The output of the decoder.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. Default is decoder_loglikelihood. This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and the inverse of the Riemannian metric tensor G⁻¹.\n\nReturns\n\nA scalar representing the Hamiltonian at the point z with the momentum ρ.\n\nNote\n\nThe inverse of the Riemannian metric tensor G⁻¹ is assumed to be computed elsewhere. 
The user must ensure that the provided G⁻¹ corresponds to the given z value.\n\n\n\n\n\nhamiltonian(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n rhvae::RHVAE;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n G_inv::Function=G_inv,\n)\n\nCompute the Hamiltonian for a given point in the latent space and a given momentum.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, and an instance of RHVAE. It computes the inverse of the Riemannian metric tensor G⁻¹ and the output of the decoder internally, and then computes the Hamiltonian. The computation is based on the log-likelihood of the decoder, the log-prior of the latent space, and the inverse of the metric tensor G at the point z.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. The kinetic energy is defined as follows:\n\nκ(ρ) = -log p(ρ),\n\nwhere p(ρ) is the log-prior of the momentum.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported, but the last dimension of the array should be of size 1.\nz::AbstractVector: The point in the latent space.\nρ::AbstractVector: The momentum.\nrhvae::RHVAE: An instance of the RHVAE model.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. Default is decoder_loglikelihood. 
This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and the inverse of the Riemannian metric tensor G⁻¹.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Default is G_inv. This function must take as input the point z in the latent space and the rhvae instance.\n\nReturns\n\nA scalar representing the Hamiltonian at the point z with the momentum ρ.\n\nNote\n\nThe inverse of the Riemannian metric tensor G⁻¹, the log determinant of the metric tensor, and the output of the decoder are computed internally in this function. The user does not need to provide these as inputs.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.∇hamiltonian","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.∇hamiltonian","text":"∇hamiltonian(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n G⁻¹::AbstractArray,\n logdetG::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple,\n var::Symbol;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n adtype::Symbol=:TaylorDiff,\n adkwargs::Union{NamedTuple,Dict}=Dict(),\n)\n\nCompute the gradient of the Hamiltonian with respect to a given variable using a specified automatic differentiation method.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, the inverse of the Riemannian metric tensor G⁻¹, a decoder of type AbstractVariationalDecoder, a decoder_output 
NamedTuple, and a variable var (:z or :ρ), and computes the gradient of the Hamiltonian with respect to var using the specified automatic differentiation method. The computation is based on the log-likelihood of the decoder, the log-prior of the latent space, and G⁻¹.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. The kinetic energy is defined as follows:\n\nκ(ρ) = 0.5 * log((2π)ᴰ det G(z)) + 0.5 * ρᵀ G⁻¹ ρ\n\nwhere D is the dimension of the latent space, and G(z) is the metric tensor at the point z.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If matrix, each column represents a momentum vector.\nG⁻¹::AbstractArray: The inverse of the Riemannian metric tensor. If 3D array, each slice along the third dimension represents the inverse of the metric tensor at the corresponding column of z.\nlogdetG::Union{<:Number,AbstractVector}: The log determinant of the Riemannian metric tensor. If vector, each element represents the log determinant of the metric tensor at the corresponding column of z.\ndecoder::AbstractVariationalDecoder: The decoder instance.\ndecoder_output::NamedTuple: The output of the decoder.\nvar::Symbol: The variable with respect to which the gradient is computed. Must be :z or :ρ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. Default is decoder_loglikelihood. 
This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and G⁻¹.\nadtype::Symbol=:finite: The type of automatic differentiation method to use. Must be :finite, :ForwardDiff, or :TaylorDiff. Default is :finite.\nadkwargs::Union{NamedTuple,Dict}=Dict(): Additional keyword arguments to pass to the automatic differentiation method.\n\nReturns\n\nA vector representing the gradient of the Hamiltonian at the point (z, ρ) with respect to variable var.\n\n\n\n\n\n∇hamiltonian(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n rhvae::RHVAE,\n var::Symbol;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n G_inv::Function=G_inv,\n adtype::Symbol=:TaylorDiff,\n adkwargs::Union{NamedTuple,Dict}=Dict(),\n)\n\nCompute the gradient of the Hamiltonian with respect to a given variable using a specified automatic differentiation method.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, an instance of RHVAE, and a variable var (:z or :ρ), and computes the gradient of the Hamiltonian with respect to var using the specified automatic differentiation method. The computation is based on the log-likelihood of the decoder, the log-prior of the latent space, and G_inv.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. 
If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If matrix, each column represents a momentum vector.\nrhvae::RHVAE: An instance of the RHVAE model.\nvar::Symbol: The variable with respect to which the gradient is computed. Must be :z or :ρ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. Default is decoder_loglikelihood. This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and G_inv.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Default is G_inv.\nadtype::Symbol=:finite: The type of automatic differentiation method to use. Must be :finite, :ForwardDiff, or :TaylorDiff. 
Default is :finite.\nadkwargs::Union{NamedTuple,Dict}=Dict(): Additional keyword arguments to pass to the automatic differentiation method.\n\nReturns\n\nA vector representing the gradient of the Hamiltonian at the point (z, ρ) with respect to variable var.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs._leapfrog_first_step","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs._leapfrog_first_step","text":"_leapfrog_first_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n G⁻¹::AbstractArray,\n logdetG::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n)\n\nPerform the first step of the generalized leapfrog integrator for Hamiltonian dynamics, defined as\n\nρ(t + ϵ/2) = ρ(t) - 0.5 * ϵ * ∇z_H(z(t), ρ(t + ϵ/2)).\n\nThis function is part of the generalized leapfrog integrator used in Hamiltonian dynamics. 
Unlike the standard leapfrog integrator, the generalized leapfrog integrator is implicit, which means it must be solved with fixed-point iterations.\n\nThe function takes a point x in the data space, a point z in the latent space, a momentum ρ, the inverse of the Riemannian metric tensor G⁻¹, a decoder of type AbstractVariationalDecoder, the output of the decoder decoder_output, a step size ϵ, and optionally the number of fixed-point iterations to perform (steps), a function to compute the gradient of the Hamiltonian (∇H), and a set of keyword arguments for ∇H (∇H_kwargs).\n\nThe function performs the following update steps times:\n\nρ̃ = ρ̃ - 0.5 * ϵ * ∇hamiltonian(x, z, ρ̃, G⁻¹, decoder, decoder_output, :z; ∇H_kwargs...)\n\nwhere ∇H is the gradient of the Hamiltonian with respect to the position variables z. The result is returned as ρ̃.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If matrix, each column represents a momentum vector.\nG⁻¹::AbstractArray: The inverse of the Riemannian metric tensor. If 3D array, each slice along the third dimension represents the inverse of the metric tensor at the corresponding column of z.\nlogdetG::Union{<:Number,AbstractVector}: The log determinant of the Riemannian metric tensor. If vector, each element represents the log determinant of the metric tensor at the corresponding column of z.\ndecoder::AbstractVariationalDecoder: The decoder instance.\ndecoder_output::NamedTuple: The output of the decoder.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=0.01f0: The leapfrog step size. Default is 0.01f0.\nsteps::Int=3: The number of fixed-point iterations to perform. 
Default is 3.\n∇H_kwargs::Union{NamedTuple,Dict}: The keyword arguments for ∇hamiltonian. Default is a tuple with reconstruction_loglikelihood, position_logprior, momentum_logprior, and G_inv.\n\nReturns\n\nA vector representing the updated momentum after performing the first step of the generalized leapfrog integrator.\n\n\n\n\n\n_leapfrog_first_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n rhvae::RHVAE;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n G_inv::Function=G_inv,\n)\n\nPerform the first step of the generalized leapfrog integrator for Hamiltonian dynamics, defined as\n\nρ(t + ϵ/2) = ρ(t) - 0.5 * ϵ * ∇z_H(z(t), ρ(t + ϵ/2)).\n\nThis function is part of the generalized leapfrog integrator used in Hamiltonian dynamics. Unlike the standard leapfrog integrator, the generalized leapfrog integrator is implicit, which means it requires the use of fixed-point iterations to be solved.\n\nThe function takes a RHVAE instance, a point x in the data space, a point z in the latent space, a momentum ρ, a step size ϵ, and optionally the number of fixed-point iterations to perform (steps), a function to compute the gradient of the Hamiltonian (∇H), and a set of keyword arguments for ∇H (∇H_kwargs).\n\nThe function performs the following update for steps times:\n\nρ̃ = ρ̃ - 0.5 * ϵ * ∇hamiltonian(rhvae, x, z, ρ̃, :z; ∇H_kwargs...)\n\nwhere ∇H is the gradient of the Hamiltonian with respect to the position variables z. The result is returned as ρ̃.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. 
If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If matrix, each column represents a momentum vector.\nrhvae::RHVAE: The RHVAE instance.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=0.01f0: The leapfrog step size. Default is 0.01f0.\nsteps::Int=3: The number of fixed-point iterations to perform. Default is 3.\n∇H_kwargs::Union{NamedTuple,Dict}: The keyword arguments for ∇hamiltonian. Default is a tuple with reconstruction_loglikelihood, position_logprior, and momentum_logprior.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Default is G_inv.\n\nReturns\n\nA vector representing the updated momentum after performing the first step of the generalized leapfrog integrator.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs._leapfrog_second_step","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs._leapfrog_second_step","text":"_leapfrog_second_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n G⁻¹::AbstractArray,\n logdetG::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n)\n\nPerform the second step of the generalized leapfrog integrator for Hamiltonian dynamics, defined as\n\nz(t + ϵ) = z(t) + 0.5 * ϵ * [∇ρH(z(t), ρ(t+ϵ/2)) + ∇ρH(z(t + ϵ), ρ(t+ϵ/2))].\n\nThis function is part of the generalized leapfrog integrator used in Hamiltonian dynamics. 
Unlike the standard leapfrog integrator, the generalized leapfrog integrator is implicit, which means it must be solved with fixed-point iterations.\n\nThe function takes a point x in the data space, a point z in the latent space, a momentum ρ, the inverse of the Riemannian metric tensor G⁻¹, a decoder of type AbstractVariationalDecoder, the output of the decoder decoder_output, a step size ϵ, and optionally the number of fixed-point iterations to perform (steps), a function to compute the gradient of the Hamiltonian (∇H), and a set of keyword arguments for ∇H (∇H_kwargs).\n\nThe function performs the following update steps times:\n\nz̄ = z̄ + 0.5 * ϵ * ( ∇hamiltonian(x, z̄, ρ, G⁻¹, decoder, decoder_output, :ρ; ∇H_kwargs...) + ∇hamiltonian(x, z, ρ, G⁻¹, decoder, decoder_output, :ρ; ∇H_kwargs...) )\n\nwhere ∇H is the gradient of the Hamiltonian with respect to the momentum variables ρ. The result is returned as z̄.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If matrix, each column represents a momentum vector.\nG⁻¹::AbstractArray: The inverse of the Riemannian metric tensor. If 3D array, each slice along the third dimension represents the inverse of the metric tensor at the corresponding column of z.\nlogdetG::Union{<:Number,AbstractVector}: The log determinant of the Riemannian metric tensor. If vector, each element represents the log determinant of the metric tensor at the corresponding column of z.\ndecoder::AbstractVariationalDecoder: The decoder instance.\ndecoder_output::NamedTuple: The output of the decoder.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=0.01f0: The step size. 
Default is 0.01.\nsteps::Int=3: The number of fixed-point iterations to perform. Default is 3.\n∇H_kwargs::Union{NamedTuple,Dict}: The keyword arguments for ∇hamiltonian. Default is a tuple with reconstruction_loglikelihood, position_logprior, momentum_logprior.\n\nReturns\n\nA vector representing the updated position after performing the second step of the generalized leapfrog integrator.\n\n\n\n\n\n_leapfrog_second_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n rhvae::RHVAE;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n G_inv::Function=G_inv,\n)\n\nPerform the second step of the generalized leapfrog integrator for Hamiltonian dynamics, defined as\n\nz(t + ϵ) = z(t) + 0.5 * ϵ * [∇ρ_H(z(t), ρ(t+ϵ/2)) + ∇ρ_H(z(t + ϵ), ρ(t+ϵ/2))].\n\nThis function is part of the generalized leapfrog integrator used in Hamiltonian dynamics. Unlike the standard leapfrog integrator, the generalized leapfrog integrator is implicit, which means it requires the use of fixed-point iterations to be solved.\n\nThe function takes an RHVAE instance, a point x in the data space, a point z in the latent space, a momentum ρ, a step size ϵ, and optionally the number of fixed-point iterations to perform (steps), a function to compute the gradient of the Hamiltonian (∇H), and a set of keyword arguments for ∇H (∇H_kwargs).\n\nThe function performs the following update for steps times:\n\nz̄ = z̄ + 0.5 * ϵ * ( ∇hamiltonian(rhvae, x, z̄, ρ, :ρ; ∇H_kwargs...) + ∇hamiltonian(rhvae, x, z, ρ, :ρ; ∇H_kwargs...) )\n\nwhere ∇H is the gradient of the Hamiltonian with respect to the momentum variables ρ. The result is returned as z̄.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. 
The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If matrix, each column represents a momentum vector.\nrhvae::RHVAE: The RHVAE instance.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4): The leapfrog step size. Default is Float32(1E-4).\nsteps::Int=3: The number of fixed-point iterations to perform. Default is 3. Typically, 3 iterations are sufficient.\n∇H_kwargs::Union{NamedTuple,Dict}: The keyword arguments for ∇hamiltonian. Default is a tuple with reconstruction_loglikelihood, position_logprior, and momentum_logprior.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Default is G_inv.\n\nReturns\n\nA vector representing the updated position after performing the second step of the generalized leapfrog integrator.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs._leapfrog_third_step","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs._leapfrog_third_step","text":"_leapfrog_third_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n G⁻¹::AbstractArray,\n logdetG::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n)\n\nPerform the third step of the generalized leapfrog integrator for Hamiltonian dynamics, defined as\n\nρ(t + ϵ) = ρ(t + ϵ/2) - 0.5 * ϵ * ∇z_H(z(t + ϵ), ρ(t + ϵ/2)).\n\nThis function is part of the generalized leapfrog integrator used in Hamiltonian dynamics. 
Unlike the standard leapfrog integrator, the generalized leapfrog integrator is implicit, which means it requires the use of fixed-point iterations to be solved.\n\nThe function takes a point x in the data space, a point z in the latent space, a momentum ρ, the inverse of the Riemannian metric tensor G⁻¹, a decoder of type AbstractVariationalDecoder, the output of the decoder decoder_output, a step size ϵ, a function to compute the gradient of the Hamiltonian (∇H), and a set of keyword arguments for ∇H (∇H_kwargs).\n\nThe function performs the following update:\n\nρ̃ = ρ - 0.5 * ϵ * ∇hamiltonian( x, z, ρ, G⁻¹, decoder, decoder_output, :z; ∇H_kwargs... )\n\nwhere ∇H is the gradient of the Hamiltonian with respect to the position variables z. The result is returned as ρ̃.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If matrix, each column represents a momentum vector.\nG⁻¹::AbstractArray: The inverse of the Riemannian metric tensor. If 3D array, each slice along the third dimension represents the inverse of the metric tensor at the corresponding column of z.\nlogdetG::Union{<:Number,AbstractVector}: The log determinant of the Riemannian metric tensor. If vector, each element represents the log determinant of the metric tensor at the corresponding column of z.\ndecoder::AbstractVariationalDecoder: The decoder instance.\ndecoder_output::NamedTuple: The output of the decoder.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4): The step size. Default is Float32(1E-4).\n∇H_kwargs::Union{NamedTuple,Dict}: The keyword arguments for ∇hamiltonian. 
Default is a tuple with reconstruction_loglikelihood, position_logprior, momentum_logprior.\n\nReturns\n\nA vector representing the updated momentum after performing the third step of the generalized leapfrog integrator.\n\n\n\n\n\n_leapfrog_third_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n rhvae::RHVAE;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n G_inv::Function=G_inv,\n)\n\nPerform the third step of the generalized leapfrog integrator for Hamiltonian dynamics, defined as\n\nρ(t + ϵ) = ρ(t + ϵ/2) - 0.5 * ϵ * ∇z_H(z(t + ϵ), ρ(t + ϵ/2)).\n\nThis function is part of the generalized leapfrog integrator used in Hamiltonian dynamics. Unlike the standard leapfrog integrator, the generalized leapfrog integrator is implicit, which means it requires the use of fixed-point iterations to be solved.\n\nThe function takes a RHVAE instance, a point x in the data space, a point z in the latent space, a momentum ρ, a step size ϵ, the number of fixed-point iterations to perform (steps), a function to compute the gradient of the Hamiltonian (∇H), and a set of keyword arguments for ∇H (∇H_kwargs).\n\nThe function performs the following update:\n\nρ̃ = ρ - 0.5 * ϵ * ∇hamiltonian(rhvae, x, z, ρ, :z; ∇H_kwargs...)\n\nwhere ∇H is the gradient of the Hamiltonian with respect to the position variables z. The result is returned as ρ̃.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. 
If matrix, each column represents a momentum vector.\nrhvae::RHVAE: The RHVAE instance.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}: The leapfrog step size. Default is Float32(1E-4).\nsteps::Int=3: The number of fixed-point iterations to perform. Default is 3.\n∇H_kwargs::Union{NamedTuple,Dict}: The keyword arguments for ∇hamiltonian. Default is a tuple with reconstruction_loglikelihood, position_logprior, and momentum_logprior.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Default is G_inv.\n\nReturns\n\nA vector representing the updated momentum after performing the third step of the generalized leapfrog integrator.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.general_leapfrog_step","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.general_leapfrog_step","text":"general_leapfrog_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n G⁻¹::AbstractArray,\n logdetG::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple,\n metric_param::NamedTuple;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n G_inv::Function=G_inv,\n)\n\nPerform a full step of the generalized leapfrog integrator for Hamiltonian dynamics.\n\nThe leapfrog integrator is a numerical integration scheme used to simulate Hamiltonian dynamics. 
It consists of three steps:\n\nHalf update of the momentum variable: \nρ(t + ϵ/2) = ρ(t) - 0.5 * ϵ * ∇z_H(z(t), ρ(t + ϵ/2)).\nFull update of the position variable: \n\nz(t + ϵ) = z(t) + 0.5 * ϵ * [∇ρ_H(z(t), ρ(t+ϵ/2)) + ∇ρ_H(z(t + ϵ), ρ(t+ϵ/2))].\n\nHalf update of the momentum variable: \nρ(t + ϵ) = ρ(t + ϵ/2) - 0.5 * ϵ * ∇z_H(z(t + ϵ), ρ(t + ϵ/2)).\n\nThis function performs these three steps in sequence, using the _leapfrog_first_step, _leapfrog_second_step and _leapfrog_third_step helper functions.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If matrix, each column represents a momentum vector.\nG⁻¹::AbstractArray: The inverse of the Riemannian metric tensor. If 3D array, each slice along the third dimension represents the inverse of the metric tensor at the corresponding column of z.\nlogdetG::Union{<:Number,AbstractVector}: The log determinant of the Riemannian metric tensor. If vector, each element represents the log determinant of the metric tensor at the corresponding column of z.\ndecoder::AbstractVariationalDecoder: The decoder instance.\ndecoder_output::NamedTuple: The output of the decoder.\nmetric_param::NamedTuple: The parameters for the metric tensor.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4): The step size. Default is Float32(1E-4).\nsteps::Int=3: The number of fixed-point iterations to perform. Default is 3. Typically, 3 iterations are sufficient.\n∇H_kwargs::Union{NamedTuple,Dict}: The keyword arguments for ∇hamiltonian. 
Default is a tuple with reconstruction_loglikelihood, position_logprior, and momentum_logprior.\nG_inv::Function=G_inv: The function to compute the inverse of the Riemannian metric tensor.\n\nReturns\n\nA tuple (z̄, ρ̄, Ḡ⁻¹, logdetḠ, decoder_update) representing the updated position, momentum, the inverse of the updated Riemannian metric tensor, the log of the determinant of the metric tensor and the updated decoder outputs after performing the full leapfrog step.\n\n\n\n\n\ngeneral_leapfrog_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n rhvae::RHVAE;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n G_inv=G_inv,\n ),\n)\n\nPerform a full step of the generalized leapfrog integrator for Hamiltonian dynamics.\n\nThe leapfrog integrator is a numerical integration scheme used to simulate Hamiltonian dynamics. It consists of three steps:\n\nHalf update of the momentum variable: ρ(t + ϵ/2) = ρ(t) - 0.5 * ϵ * ∇z_H(z(t), ρ(t + ϵ/2)).\nFull update of the position variable: z(t + ϵ) = z(t) + 0.5 * ϵ * [∇ρ_H(z(t), ρ(t+ϵ/2)) + ∇ρ_H(z(t + ϵ), ρ(t+ϵ/2))].\nHalf update of the momentum variable: ρ(t + ϵ) = ρ(t + ϵ/2) - 0.5 * ϵ * ∇z_H(z(t + ϵ), ρ(t + ϵ/2)).\n\nThis function performs these three steps in sequence, using the _leapfrog_first_step, _leapfrog_second_step and _leapfrog_third_step helper functions.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. 
If matrix, each column represents a momentum vector.\nrhvae::RHVAE: The RHVAE instance.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4): The leapfrog step size. Default is Float32(1E-4).\nsteps::Int=3: The number of fixed-point iterations to perform. Default is 3. Typically, 3 iterations are sufficient.\n∇H_kwargs::Union{NamedTuple,Dict}: The keyword arguments for ∇hamiltonian. Default is a tuple with reconstruction_loglikelihood, position_logprior, momentum_logprior, and G_inv.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Default is G_inv.\n\nReturns\n\nA tuple (z̄, ρ̄, Ḡ⁻¹, logdetḠ, decoder_update) representing the updated position, momentum, the inverse of the updated Riemannian metric tensor, the log of the determinant of the metric tensor, and the updated decoder outputs after performing the full leapfrog step.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.general_leapfrog_tempering_step","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.general_leapfrog_tempering_step","text":"general_leapfrog_tempering_step(\n x::AbstractArray,\n zₒ::AbstractVecOrMat,\n Gₒ⁻¹::AbstractArray,\n logdetGₒ::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple,\n metric_param::NamedTuple;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n G_inv=G_inv,\n ),\n tempering_schedule::Function=quadratic_tempering,\n)\n\nCombines the leapfrog and tempering steps into a single function for the Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nArguments\n\nx::AbstractArray: The data to be processed. If Array, the last dimension must be of size 1.\nzₒ::AbstractVecOrMat: The initial latent variable. 
\nGₒ⁻¹::AbstractArray: The initial inverse of the Riemannian metric tensor.\nlogdetGₒ::Union{<:Number,AbstractVector}: The log determinant of the initial Riemannian metric tensor. If vector, each element represents the log determinant of the metric tensor at the corresponding column of zₒ.\ndecoder::AbstractVariationalDecoder: The decoder of the RHVAE model.\ndecoder_output::NamedTuple: The output of the decoder.\nmetric_param::NamedTuple: The parameters of the metric tensor.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog steps in the HMC algorithm. This can be a scalar or an array. Default is 0.01f0. \nK::Int: The number of leapfrog steps to perform in the Hamiltonian Monte Carlo (HMC) algorithm. Default is 3.\nβₒ::Number: The initial inverse temperature for the tempering schedule. Default is 0.3f0.\nsteps::Int: The number of fixed-point iterations to perform. Default is 3.\n∇H_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the ∇hamiltonian function. Default is a NamedTuple with reconstruction_loglikelihood, position_logprior, and momentum_logprior.\ntempering_schedule::Function: The function to compute the inverse temperature at each step in the HMC algorithm. Defaults to quadratic_tempering. This function must take three arguments: First, βₒ, an initial inverse temperature, second, k, the current step in the tempering schedule, and third, K, the total number of steps in the tempering schedule.\n\nReturns\n\nA NamedTuple with the following keys: \nz_init: The initial latent variable. \nρ_init: The initial momentum variable. \nGinv_init: The initial inverse of the Riemannian metric tensor. \nlogdetG_init: The initial log determinant of the Riemannian metric tensor.\nz_final: The final latent variable after K leapfrog steps. \nρ_final: The final momentum variable after K leapfrog steps. 
\nGinv_final: The final inverse of the Riemannian metric tensor after K leapfrog steps.\nlogdetG_final: The final log determinant of the Riemannian metric tensor after K leapfrog steps.\nThe decoder output at the final latent variable is also returned. Note: This is not in the same named tuple as the other outputs, but as a separate output.\n\nDescription\n\nThe function first samples a random momentum variable γₒ from a standard normal distribution and scales it by the inverse square root of the initial inverse temperature βₒ to obtain the initial momentum variable ρₒ. Then, it performs K leapfrog steps, each followed by a tempering step, to generate a new sample from the latent space.\n\nNote\n\nEnsure the input data x and the initial latent variable zₒ match the expected input dimensionality for the RHVAE model.\n\n\n\n\n\ngeneral_leapfrog_tempering_step(\n x::AbstractArray,\n zₒ::AbstractVecOrMat,\n rhvae::RHVAE;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n G_inv::Function=G_inv,\n tempering_schedule::Function=quadratic_tempering,\n)\n\nCombines the leapfrog and tempering steps into a single function for the Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nArguments\n\nx::AbstractArray: The data to be processed. If Array, the last dimension must be of size 1.\nzₒ::AbstractVecOrMat: The initial latent variable. \n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog steps in the HMC algorithm. This can be a scalar or an array. Default is 0.01f0. \nK::Int: The number of leapfrog steps to perform in the Hamiltonian Monte Carlo (HMC) algorithm. Default is 3.\nβₒ::Number: The initial inverse temperature for the tempering schedule. 
Default is 0.3f0.\nsteps::Int: The number of fixed-point iterations to perform. Default is 3.\n∇H_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the ∇hamiltonian function. Default is a NamedTuple with reconstruction_loglikelihood, position_logprior, and momentum_logprior.\ntempering_schedule::Function: The function to compute the inverse temperature at each step in the HMC algorithm. Defaults to quadratic_tempering. This function must take three arguments: First, βₒ, an initial inverse temperature, second, k, the current step in the tempering schedule, and third, K, the total number of steps in the tempering schedule.\n\nReturns\n\nA NamedTuple with the following keys: \nz_init: The initial latent variable. \nρ_init: The initial momentum variable. \nGinv_init: The initial inverse of the Riemannian metric tensor. \nz_final: The final latent variable after K leapfrog steps. \nρ_final: The final momentum variable after K leapfrog steps. \nGinv_final: The final inverse of the Riemannian metric tensor after K leapfrog steps.\nThe decoder output at the final latent variable is also returned. Note: This is not in the same named tuple as the other outputs, but as a separate output.\n\nDescription\n\nThe function first samples a random momentum variable γₒ from a standard normal distribution and scales it by the inverse square root of the initial inverse temperature βₒ to obtain the initial momentum variable ρₒ. 
Then, it performs K leapfrog steps, each followed by a tempering step, to generate a new sample from the latent space.\n\nNote\n\nEnsure the input data x and the initial latent variable zₒ match the expected input dimensionality for the RHVAE model.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs._log_p̄","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs._log_p̄","text":"_log_p̄(\n x::AbstractArray,\n rhvae::RHVAE{VAE{E,D}},\n rhvae_outputs::NamedTuple;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n prefactor::AbstractArray=ones(Float32, 3),\n)\n\nThis is an internal function used in riemannian_hamiltonian_elbo to compute the numerator of the unbiased estimator of the marginal likelihood. The function computes the sum of the log likelihood of the data given the latent variables, the log prior of the latent variables, and the log prior of the momentum variables.\n\nlog p̄ = log p(x | zₖ) + log p(zₖ) + log p(ρₖ(zₖ))\n\nArguments\n\nx::AbstractArray: The input data. If Array, the last dimension must contain each of the data points.\nrhvae::RHVAE{<:VAE{<:AbstractGaussianEncoder,<:AbstractGaussianLogDecoder}}: The Riemannian Hamiltonian Variational Autoencoder (RHVAE) model.\nrhvae_outputs::NamedTuple: The outputs of the RHVAE, including the final latent variables zₖ and the final momentum variables ρₖ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log likelihood of the data given the latent variables. Default is decoder_loglikelihood.\nposition_logprior::Function: The function to compute the log prior of the latent variables. Default is spherical_logprior.\nmomentum_logprior::Function: The function to compute the log prior of the momentum variables. 
Default is riemannian_logprior.\nprefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\n\nReturns\n\nlog_p̄::AbstractVector: The first term of the log of the unbiased estimator of the marginal likelihood for each data point.\n\nNote\n\nThis is an internal function and should not be called directly. It is used as part of the riemannian_hamiltonian_elbo function.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs._log_q̄","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs._log_q̄","text":"_log_q̄(\n rhvae::RHVAE,\n rhvae_outputs::NamedTuple,\n βₒ::Number;\n momentum_logprior::Function=riemannian_logprior,\n prefactor::AbstractArray=ones(Float32, 3),\n)\n\nThis is an internal function used in riemannian_hamiltonian_elbo to compute the second term of the unbiased estimator of the marginal likelihood. The function computes the sum of the log posterior of the initial latent variables and the log prior of the initial momentum variables, minus a term that depends on the dimensionality of the latent space and the initial inverse temperature.\n\n log q̄ = log q(zₒ) + log p(ρₒ) - d/2 log(βₒ)\n\nArguments\n\nrhvae::RHVAE: The Riemannian Hamiltonian Variational Autoencoder (RHVAE) model.\nrhvae_outputs::NamedTuple: The outputs of the RHVAE, including the initial latent variables zₒ and the initial momentum variables ρₒ.\nβₒ::Number: The initial inverse temperature for the tempering steps.\n\nOptional Keyword Arguments\n\nmomentum_logprior::Function: The function to compute the log prior of the momentum variables. Default is riemannian_logprior.\nprefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. 
Default is an array of ones.\n\nReturns\n\nlog_q̄::Vector: The second term of the log of the unbiased estimator of the marginal likelihood for each data point.\n\nNote\n\nThis is an internal function and should not be called directly. It is used as part of the riemannian_hamiltonian_elbo function.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.riemannian_hamiltonian_elbo","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.riemannian_hamiltonian_elbo","text":"riemannian_hamiltonian_elbo(\n rhvae::RHVAE,\n metric_param::NamedTuple,\n x::AbstractArray;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n G_inv=G_inv,\n ),\n tempering_schedule::Function=quadratic_tempering,\n return_outputs::Bool=false,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the Riemannian Hamiltonian Monte Carlo (RHMC) estimate of the evidence lower bound (ELBO) for a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nThis function takes as input an RHVAE, a NamedTuple of metric parameters, and a vector of input data x. It performs K RHMC steps with a leapfrog integrator and a tempering schedule to estimate the ELBO. The ELBO is computed as the difference between the log p̄ and log q̄ as\n\nelbo = mean(log p̄ - log q̄),\n\nArguments\n\nrhvae::RHVAE: The RHVAE used to encode the input data and decode the latent space.\nmetric_param::NamedTuple: The parameters used to compute the metric tensor.\nx::AbstractArray: The input data. 
If Array, the last dimension must contain each of the data points.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is Float32(1E-4)).\nK::Int: The number of RHMC steps (default is 3).\nβₒ::Number: The initial inverse temperature (default is 0.3).\nsteps::Int: The number of fixed-point iterations per leapfrog step (default is 3).\n∇H_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the ∇hamiltonian function. Defaults to a NamedTuple with :reconstruction_loglikelihood set to decoder_loglikelihood, :position_logprior set to spherical_logprior, :momentum_logprior set to riemannian_logprior, and :G_inv set to G_inv.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Defaults to G_inv.\ntempering_schedule::Function: The tempering schedule function used in the RHMC (default is quadratic_tempering).\nreturn_outputs::Bool: Whether to return the outputs of the RHVAE. Defaults to false. NOTE: This is necessary to avoid computing the forward pass twice when computing the loss function with regularization.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. Default is an array of ones.\n\nReturns\n\nelbo::Number: The RHMC estimate of the ELBO. 
If return_outputs is true, also returns the outputs of the RHVAE.\n\n\n\n\n\nriemannian_hamiltonian_elbo(\n rhvae::RHVAE,\n x::AbstractVector;\n K::Int=3,\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n βₒ::Number=0.3f0,\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n G_inv=G_inv,\n ),\n tempering_schedule::Function=quadratic_tempering,\n return_outputs::Bool=false,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the Riemannian Hamiltonian Monte Carlo (RHMC) estimate of the evidence lower bound (ELBO) for a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nThis function takes as input an RHVAE and a vector of input data x. It performs K RHMC steps with a leapfrog integrator and a tempering schedule to estimate the ELBO. The ELBO is computed as the difference between the log p̄ and log q̄ as\n\nelbo = mean(log p̄ - log q̄)\n\nArguments\n\nrhvae::RHVAE: The RHVAE used to encode the input data and decode the latent space.\nx::AbstractVector: The input data.\n\nOptional Keyword Arguments\n\n∇H_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the ∇hamiltonian function. 
Defaults to a NamedTuple with :reconstruction_loglikelihood set to decoder_loglikelihood, :position_logprior set to spherical_logprior, :momentum_logprior set to riemannian_logprior, and :G_inv set to G_inv.\nK::Int: The number of RHMC steps (default is 3).\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is Float32(1E-4)).\nβₒ::Number: The initial inverse temperature (default is 0.3).\nsteps::Int: The number of fixed-point iterations per leapfrog step (default is 3).\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor (default is G_inv).\ntempering_schedule::Function: The tempering schedule function used in the RHMC (default is quadratic_tempering).\nreturn_outputs::Bool: Whether to return the outputs of the RHVAE. Defaults to false. NOTE: This is necessary to avoid computing the forward pass twice when computing the loss function with regularization.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. Default is an array of ones.\n\nReturns\n\nelbo::Number: The RHMC estimate of the ELBO. 
If return_outputs is true, also returns the outputs of the RHVAE.\n\n\n\n\n\nriemannian_hamiltonian_elbo(\n rhvae::RHVAE,\n metric_param::NamedTuple,\n x_in::AbstractArray,\n x_out::AbstractArray;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n G_inv=G_inv,\n ),\n tempering_schedule::Function=quadratic_tempering,\n return_outputs::Bool=false,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the Riemannian Hamiltonian Monte Carlo (RHMC) estimate of the evidence lower bound (ELBO) for a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nThis function takes as input an RHVAE, a NamedTuple of metric parameters, the encoder input x_in, and the reconstruction target x_out. It performs K RHMC steps with a leapfrog integrator and a tempering schedule to estimate the ELBO. The ELBO is computed as the difference between the log p̄ and log q̄ as\n\nelbo = mean(log p̄ - log q̄)\n\nArguments\n\nrhvae::RHVAE: The RHVAE used to encode the input data and decode the latent space.\nmetric_param::NamedTuple: The parameters used to compute the metric tensor.\nx_in::AbstractArray: Input data to the RHVAE encoder. The last dimension is taken as having each of the samples in a batch.\nx_out::AbstractArray: Target data to compute the reconstruction error. 
The last dimension is taken as having each of the samples in a batch.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is Float32(1E-4)).\nK::Int: The number of RHMC steps (default is 3).\nβₒ::Number: The initial inverse temperature (default is 0.3).\nsteps::Int: The number of fixed-point iterations per leapfrog step (default is 3).\n∇H_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the ∇hamiltonian function. Defaults to a NamedTuple with :reconstruction_loglikelihood set to decoder_loglikelihood, :position_logprior set to spherical_logprior, :momentum_logprior set to riemannian_logprior, and :G_inv set to G_inv.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Defaults to G_inv.\ntempering_schedule::Function: The tempering schedule function used in the RHMC (default is quadratic_tempering).\nreturn_outputs::Bool: Whether to return the outputs of the RHVAE. Defaults to false. NOTE: This is necessary to avoid computing the forward pass twice when computing the loss function with regularization.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. Default is an array of ones.\n\nReturns\n\nelbo::Number: The RHMC estimate of the ELBO. 
If return_outputs is true, also returns the outputs of the RHVAE.\n\n\n\n\n\nriemannian_hamiltonian_elbo(\n rhvae::RHVAE,\n x_in::AbstractArray,\n x_out::AbstractArray;\n K::Int=3,\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n βₒ::Number=0.3f0,\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n G_inv=G_inv,\n ),\n tempering_schedule::Function=quadratic_tempering,\n return_outputs::Bool=false,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the Riemannian Hamiltonian Monte Carlo (RHMC) estimate of the evidence lower bound (ELBO) for a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nThis function takes as input an RHVAE, a NamedTuple of metric parameters, and a vector of input data x. It performs K RHMC steps with a leapfrog integrator and a tempering schedule to estimate the ELBO. The ELBO is computed as the difference between the log p̄ and log q̄ as\n\nelbo = mean(log p̄ - log q̄).\n\nArguments\n\nrhvae::RHVAE: The RHVAE used to encode the input data and decode the latent space.\nx_in::AbstractArray: Input data to the RHVAE encoder. The last dimension is taken as having each of the samples in a batch.\nx_out::AbstractArray: Target data to compute the reconstruction error. The last dimension is taken as having each of the samples in a batch.\n\nOptional Keyword Arguments\n\n∇H_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the ∇hamiltonian function. 
Defaults to a NamedTuple with :reconstruction_loglikelihood set to decoder_loglikelihood, :position_logprior set to spherical_logprior, :momentum_logprior set to riemannian_logprior, and :G_inv set to G_inv.\nK::Int: The number of RHMC steps (default is 3).\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is 0.0001).\nβₒ::Number: The initial inverse temperature (default is 0.3).\nsteps::Int: The number of leapfrog steps (default is 3).\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor (default is G_inv).\ntempering_schedule::Function: The tempering schedule function used in the RHMC (default is quadratic_tempering).\nreturn_outputs::Bool: Whether to return the outputs of the RHVAE. Defaults to false. NOTE: This is necessary to avoid computing the forward pass twice when computing the loss function with regularization.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. Default is an array of ones.\n\nReturns\n\nelbo::Number: The RHMC estimate of the ELBO. If return_outputs is true, also returns the outputs of the RHVAE.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#Default-initializations","page":"RHVAE","title":"Default initializations","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.jl provides default initializations for both the metric tensor network and the RHVAE. 
Although less flexible than defining your own initial networks, these can serve as a good starting point for your experiments.","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.RHVAEs.MetricChain(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Function;\n)\nAutoEncoderToolkit.RHVAEs.RHVAE(\n ::AutoEncoderToolkit.VAEs.VAE,\n ::AutoEncoderToolkit.RHVAEs.MetricChain,\n ::AbstractArray{AbstractFloat},\n T::AbstractFloat,\n λ::AbstractFloat\n)","category":"page"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.MetricChain-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.MetricChain","text":"MetricChain(\n n_input::Int,\n n_latent::Int,\n metric_neurons::Vector{<:Int},\n metric_activation::Vector{<:Function},\n output_activation::Function;\n init::Function=Flux.glorot_uniform\n) -> MetricChain\n\nConstruct a MetricChain for computing the Riemannian metric tensor in the latent space.\n\nArguments\n\nn_input::Int: The number of input features.\nn_latent::Int: The dimension of the latent space.\nmetric_neurons::Vector{<:Int}: The number of neurons in each hidden layer of the MLP.\nmetric_activation::Vector{<:Function}: The activation function for each hidden layer of the MLP.\noutput_activation::Function: The activation function for the output layer.\ninit::Function: The initialization function for the weights in the layers (default is Flux.glorot_uniform).\n\nReturns\n\nMetricChain: A MetricChain object that includes the MLP, and two dense layers for computing the elements of a lower-triangular matrix used to compute the Riemannian metric tensor in latent space.\n\n\n\n\n\n","category":"method"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.RHVAE-Tuple{AutoEncoderToolkit.VAEs.VAE, AutoEncoderToolkit.RHVAEs.MetricChain, AbstractArray{AbstractFloat}, AbstractFloat, 
AbstractFloat}","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.RHVAE","text":"RHVAE(\n vae::VAE, \n metric_chain::MetricChain, \n centroids_data::AbstractArray, \n T::Number, \n λ::Number\n)\n\nConstruct a Riemannian Hamiltonian Variational Autoencoder (RHVAE) from a standard VAE and a metric chain.\n\nArguments\n\nvae::VAE: A standard Variational Autoencoder (VAE) model.\nmetric_chain::MetricChain: A chain of metrics to be used for the Riemannian Hamiltonian Monte Carlo (RHMC) sampler.\ncentroids_data::AbstractArray: An array of data centroids. Each column represents a centroid. N is a subtype of Number.\nT::N: The temperature parameter for the inverse metric tensor. N is a subtype of Number.\nλ::N: The regularization parameter for the inverse metric tensor. N is a subtype of Number.\n\nReturns\n\nA new RHVAE object.\n\nDescription\n\nThe constructor initializes the latent centroids and the metric tensor M to their default values. The latent centroids are initialized to a zero matrix of the same size as centroids_data, and M is initialized to a 3D array of identity matrices, one for each centroid.\n\n\n\n\n\n","category":"method"},{"location":"hvae/#HVAEsmodule","page":"HVAE","title":"Hamiltonian Variational Autoencoder","text":"","category":"section"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"The Hamiltonian Variational Autoencoder (HVAE) is a variant of the Variational autoencoder (VAE) that uses Hamiltonian dynamics to improve the sampling of the latent space representation. HVAE combines ideas from Hamiltonian Monte Carlo, annealed importance sampling, and variational inference to improve the latent space representation of the VAE.","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"For the implementation of the HVAE in AutoEncoderToolkit.jl, the HVAE struct inherits directly from the VAE struct and adds the necessary functions to compute the Hamiltonian dynamics steps as part of the training protocol. 
An HVAE object is created by simply passing a VAE object to the constructor. This way, we can use Julia's multiple dispatch to extend the functionality of the VAE object without having to redefine the entire structure.","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"warning: Warning\nHVAEs require the computation of nested gradients. This means that the AutoDiff framework must differentiate a function of an already AutoDiff differentiated function. This is known to be problematic for Julia's AutoDiff backends. See details below to understand how we circumvent this problem.","category":"page"},{"location":"hvae/#Reference","page":"HVAE","title":"Reference","text":"","category":"section"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"Caterini, A. L., Doucet, A. & Sejdinovic, D. Hamiltonian Variational Auto-Encoder. 11 (2018).","category":"page"},{"location":"hvae/#HVAEstruct","page":"HVAE","title":"HVAE struct","text":"","category":"section"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"AutoEncoderToolkit.HVAEs.HVAE","category":"page"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.HVAE","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.HVAE","text":"struct HVAE{\n V<:VAE{<:AbstractVariationalEncoder,<:AbstractVariationalDecoder}\n} <: AbstractVariationalAutoEncoder\n\nHamiltonian Variational Autoencoder (HVAE) model defined for Flux.jl.\n\nFields\n\nvae::V: A Variational Autoencoder (VAE) model that forms the basis of the HVAE. V is a subtype of VAE with a specific AbstractVariationalEncoder and AbstractVariationalDecoder.\n\nAn HVAE is a type of Variational Autoencoder (VAE) that uses Hamiltonian Monte Carlo (HMC) to sample from the posterior distribution in the latent space. The VAE's encoder compresses the input into a low-dimensional probabilistic representation q(z|x). The VAE's decoder tries to reconstruct the original input from a sampled point in the latent space p(x|z). 
\n\nThe HMC sampling in the latent space allows the HVAE to better capture complex posterior distributions compared to a standard VAE, which assumes a simple Gaussian posterior. This can lead to more accurate reconstructions and better disentanglement of latent variables.\n\n\n\n\n\n","category":"type"},{"location":"hvae/#Forward-pass","page":"HVAE","title":"Forward pass","text":"","category":"section"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"AutoEncoderToolkit.HVAEs.HVAE(::AbstractArray)","category":"page"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.HVAE-Tuple{AbstractArray}","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.HVAE","text":"(hvae::HVAE{VAE{E,D}})(\n x::AbstractArray;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n ∇U_kwargs::Union{Dict,NamedTuple}=(\n reconstruction_loglikelihood=reconstruction_loglikelihood,\n latent_logprior=spherical_logprior,\n ),\n tempering_schedule::Function=quadratic_tempering,\n latent::Bool=false,\n) where {E<:AbstractGaussianLogEncoder,D<:AbstractVariationalDecoder}\n\nRun the Hamiltonian Variational Autoencoder (HVAE) on the given input.\n\nArguments\n\nx::AbstractArray: The input to the HVAE. If Vector, it represents a single data point. If Array, the last dimension must contain each of the data points.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=0.0001: The step size for the leapfrog steps in the HMC part of the HVAE. If it is a scalar, the same step size is used for all dimensions. If it is an array, each element corresponds to the step size for a specific dimension.\nK::Int=3: The number of leapfrog steps to perform in the Hamiltonian Monte Carlo (HMC) part of the HVAE.\nβₒ::Number=0.3f0: The initial inverse temperature for the tempering schedule.\n∇U_kwargs::Union{Dict,NamedTuple}: Additional keyword arguments to be passed to the ∇potential_energy function. 
Default is a NamedTuple with reconstruction_loglikelihood and latent_logprior.\ntempering_schedule::Function=quadratic_tempering: The function to compute the tempering schedule in the HVAE.\nlatent::Bool=false: If true, the function returns a NamedTuple containing the outputs of the encoder and decoder, and the final state of the phase space after the leapfrog and tempering steps. If false, the function only returns the output of the decoder.\n\nReturns\n\nIf latent=true, the function returns a NamedTuple with the following fields:\n\nencoder: The outputs of the encoder.\ndecoder: The output of the decoder.\nphase_space: The final state of the phase space after the leapfrog and tempering steps.\n\nIf latent=false, the function only returns the output of the decoder.\n\nDescription\n\nThis function runs the HVAE on the given input. It first passes the input through the encoder to obtain the mean and log standard deviation of the latent space. It then uses the reparameterization trick to sample from the latent space. After that, it performs the leapfrog and tempering steps to refine the sample from the latent space. 
Finally, it passes the refined sample through the decoder to obtain the output.\n\nNotes\n\nEnsure that the dimensions of x match the input dimensions of the HVAE, and that the dimensions of ϵ match the dimensions of the latent space.\n\n\n\n\n\n","category":"method"},{"location":"hvae/#Loss-function","page":"HVAE","title":"Loss function","text":"","category":"section"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"AutoEncoderToolkit.HVAEs.loss","category":"page"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.loss","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.loss","text":"loss(\n hvae::HVAE,\n x::AbstractArray;\n K::Int=3,\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n βₒ::Number=0.3f0,\n ∇U_kwargs::Union{Dict,NamedTuple}=(\n reconstruction_loglikelihood=reconstruction_loglikelihood,\n latent_logprior=spherical_logprior,\n ),\n tempering_schedule::Function=quadratic_tempering,\n reg_function::Union{Function,Nothing}=nothing,\n reg_kwargs::Union{NamedTuple,Dict}=Dict(),\n reg_strength::Float32=1.0f0,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the loss for a Hamiltonian Variational Autoencoder (HVAE).\n\nArguments\n\nhvae::HVAE: The HVAE used to encode the input data and decode the latent space.\nx::AbstractArray: Input data to the HVAE encoder. 
The last dimension is taken as having each of the samples in a batch.\n\nOptional Keyword Arguments\n\nK::Int: The number of HMC steps (default is 3).\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is 0.0001).\nβₒ::Number: The initial inverse temperature (default is 0.3).\n∇U_kwargs::Union{Dict,NamedTuple}: Additional keyword arguments to be passed to the ∇potential_energy function.\ntempering_schedule::Function: The tempering schedule function used in the HMC (default is quadratic_tempering).\nreg_function::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the VAE outputs. This function must take as input the VAE outputs and the keyword arguments provided in reg_kwargs.\nreg_kwargs::Union{NamedTuple,Dict}=Dict(): Keyword arguments to pass to the regularization function.\nreg_strength::Float32=1.0f0: The strength of the regularization term.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. 
Default is an array of ones.\n\nReturns\n\nThe computed loss.\n\n\n\n\n\nloss(\n hvae::HVAE,\n x_in::AbstractArray,\n x_out::AbstractArray;\n K::Int=3,\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n βₒ::Number=0.3f0,\n ∇U_kwargs::Union{Dict,NamedTuple}=(\n reconstruction_loglikelihood=reconstruction_loglikelihood,\n latent_logprior=spherical_logprior,\n ),\n tempering_schedule::Function=quadratic_tempering,\n reg_function::Union{Function,Nothing}=nothing,\n reg_kwargs::Union{NamedTuple,Dict}=Dict(),\n reg_strength::Float32=1.0f0,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the loss for a Hamiltonian Variational Autoencoder (HVAE).\n\nArguments\n\nhvae::HVAE: The HVAE used to encode the input data and decode the latent space.\nx_in::AbstractArray: Input data to the HVAE encoder. The last dimension is taken as having each of the samples in a batch.\nx_out::AbstractArray: The data against which the reconstruction is compared. The last dimension is taken as having each of the samples in a batch.\n\nOptional Keyword Arguments\n\nK::Int: The number of HMC steps (default is 3).\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is 0.0001).\nβₒ::Number: The initial inverse temperature (default is 0.3).\n∇U_kwargs::Union{Dict,NamedTuple}: Additional keyword arguments to be passed to the ∇potential_energy function.\ntempering_schedule::Function: The tempering schedule function used in the HMC (default is quadratic_tempering).\nreg_function::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the VAE outputs. 
This function must take as input the VAE outputs and the keyword arguments provided in reg_kwargs.\nreg_kwargs::Union{NamedTuple,Dict}=Dict(): Keyword arguments to pass to the regularization function.\nreg_strength::Float32=1.0f0: The strength of the regularization term.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. Default is an array of ones.\n\nReturns\n\nThe computed loss.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#Training","page":"HVAE","title":"Training","text":"","category":"section"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"AutoEncoderToolkit.HVAEs.train!","category":"page"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.train!","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.train!","text":"train!(\n hvae::HVAE, \n x::AbstractArray, \n opt::NamedTuple; \n loss_function::Function=loss, \n loss_kwargs::Union{NamedTuple,Dict}=Dict(),\n verbose::Bool=false,\n loss_return::Bool=false,\n)\n\nCustomized training function to update parameters of a Hamiltonian Variational Autoencoder given a specified loss function.\n\nArguments\n\nhvae::HVAE: A struct containing the elements of a Hamiltonian Variational Autoencoder.\nx::AbstractArray: Input data to the HVAE encoder. The last dimension is taken as having each of the samples in a batch.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function=loss: The loss function used for training. It should accept the HVAE model, data x, and keyword arguments in that order.\nloss_kwargs::Union{NamedTuple,Dict}=Dict(): Arguments for the loss function. 
These might include parameters like K, ϵ, βₒ, ∇U_kwargs, tempering_schedule, reg_function, reg_kwargs, reg_strength, depending on the specific loss function in use.\nverbose::Bool=false: Whether to print the loss at each iteration.\nloss_return::Bool=false: Whether to return the loss at each iteration.\n\nDescription\n\nTrains the HVAE by:\n\nComputing the gradient of the loss w.r.t the HVAE parameters.\nUpdating the HVAE parameters using the optimizer.\n\n\n\n\n\ntrain!(\n hvae::HVAE, \n x_in::AbstractArray,\n x_out::AbstractArray,\n opt::NamedTuple; \n loss_function::Function=loss, \n loss_kwargs::Union{NamedTuple,Dict}=Dict(),\n verbose::Bool=false,\n loss_return::Bool=false,\n)\n\nCustomized training function to update parameters of a Hamiltonian Variational Autoencoder given a specified loss function.\n\nArguments\n\nhvae::HVAE: A struct containing the elements of a Hamiltonian Variational Autoencoder.\nx_in::AbstractArray: Input data to the HVAE encoder. The last dimension is taken as having each of the samples in a batch.\nx_out::AbstractArray: Target data to compute the reconstruction error. The last dimension is taken as having each of the samples in a batch.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function=loss: The loss function used for training. It should accept the HVAE model, data x_in and x_out, and keyword arguments in that order.\nloss_kwargs::Union{NamedTuple,Dict}=Dict(): Arguments for the loss function. 
These might include parameters like K, ϵ, βₒ, ∇U_kwargs, tempering_schedule, reg_function, reg_kwargs, reg_strength, depending on the specific loss function in use.\nverbose::Bool=false: Whether to print the loss at each iteration.\nloss_return::Bool=false: Whether to return the loss at each iteration.\n\nDescription\n\nTrains the HVAE by:\n\nComputing the gradient of the loss w.r.t the HVAE parameters.\nUpdating the HVAE parameters using the optimizer.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#gradpotenergy","page":"HVAE","title":"Computing the gradient of the potential energy","text":"","category":"section"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"One of the crucial components in the training of the HVAE is the computation of the gradient of the potential energy nabla U with respect to the latent space representation. This gradient is used in the leapfrog steps of the Hamiltonian dynamics. When training the HVAE, we need to backpropagate through the leapfrog steps to update the parameters of the neural network. This requires computing a gradient of a function of the gradient of the potential energy, i.e., nested gradients. Zygote.jl, the main AutoDiff backend in Flux.jl, famously struggles with these types of computations. 
Specifically, Zygote.jl does not support Zygote over Zygote differentiation (meaning differentiating a function of something previously differentiated with Zygote using Zygote), or Zygote over ForwardDiff (meaning differentiating a function of something differentiated with ForwardDiff using Zygote).","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"With this, we are left with a couple of options to compute the gradient of the potential energy:","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"Use finite differences to approximate the gradient of the potential energy.\nUse the relatively new TaylorDiff.jl AutoDiff backend to compute the gradient of the potential energy. This backend is composable with Zygote.jl, so we can, in principle, do Zygote over TaylorDiff differentiation.","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"The second option would be preferred, as the gradients computed with TaylorDiff are much more accurate than the ones computed with finite differences. However, there are two problems with this approach:","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"The TaylorDiff nested gradient capability stopped working with Julia ≥ 1.10, as discussed in #70.\nEven for Julia < 1.10, we could not get TaylorDiff to work on CUDA devices. (PRs are welcome!)","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"With these limitations in mind, we have implemented the gradient of the potential using both finite differences and TaylorDiff. The user can choose which method to use by setting the adtype keyword argument in the ∇U_kwargs in the loss function to either :finite or :TaylorDiff. This means that for the train! 
function, the user can pass loss_kwargs that looks like this:","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"# Define the autodiff backend to use\nloss_kwargs = Dict(\n :∇U_kwargs => Dict(\n :adtype => :finite\n )\n)","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"note: Note\nAlthough verbose, the nested dictionaries help to keep everything organized. (PRs with better design ideas are welcome!)","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"The default both for cpu and gpu devices is :finite.","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"AutoEncoderToolkit.HVAEs.∇potential_energy_finite\nAutoEncoderToolkit.HVAEs.∇potential_energy_TaylorDiff","category":"page"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.∇potential_energy_finite","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.∇potential_energy_finite","text":"∇potential_energy_finite(\n x::AbstractArray,\n z::AbstractVecOrMat,\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n latent_logprior::Function=spherical_logprior,\n fdtype::Symbol=:central\n)\n\nCompute the gradient of the potential energy of a Hamiltonian Variational Autoencoder (HVAE) with respect to the latent variables z using finite difference method. This function returns the gradient of the potential energy computed for given data x and latent variable z.\n\nArguments\n\nx::AbstractArray: An array representing the input data. The last dimension corresponds to different data points.\nz::AbstractVecOrMat: A latent variable encoding of the input data. 
If a matrix, each column corresponds to a different data point.\ndecoder::AbstractVariationalDecoder: A decoder that maps the latent variables to the data space.\ndecoder_output::NamedTuple: The output of the decoder.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function representing the log-likelihood function used by the decoder. The function must take as first input an AbstractVariationalDecoder struct, as second input an array x representing the data, and as third input a vector or matrix z representing the latent variable. Default is decoder_loglikelihood.\nlatent_logprior::Function=spherical_logprior: A function representing the log-prior distribution used in the autoencoder. The function must take as single input a vector or matrix z representing the latent variable. Default is spherical_logprior. \nfdtype::Symbol=:central: A symbol representing the type of finite difference method to use. Default is :central, but it can also be :forward.\n\nReturns\n\ngradient: The computed gradient of the potential energy for the given input x and latent variable z.\n\n\n\n\n\n∇potential_energy_finite(\n x::AbstractArray,\n z::AbstractVecOrMat,\n hvae::HVAE;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n latent_logprior::Function=spherical_logprior,\n fdtype::Symbol=:central\n)\n\nCompute the gradient of the potential energy of a Hamiltonian Variational Autoencoder (HVAE) with respect to the latent variables z using finite difference method. This function returns the gradient of the potential energy computed for given data x and latent variable z.\n\nArguments\n\nx::AbstractArray: An array representing the input data. The last dimension corresponds to different data points.\nz::AbstractVecOrMat: A latent variable encoding of the input data. 
If a matrix, each column corresponds to a different data point.\nhvae::HVAE: An HVAE model that contains a decoder which maps the latent variables to the data space.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function representing the log-likelihood function used by the decoder. The function must take as first input an array x representing the data, as second input a vector or matrix z representing the latent variable, and as third input a decoder. Default is decoder_loglikelihood.\nlatent_logprior::Function=spherical_logprior: A function representing the log-prior distribution used in the autoencoder. The function must take as single input a vector or matrix z representing the latent variable. Default is spherical_logprior. \nfdtype::Symbol=:central: A symbol representing the type of finite difference method to use. Default is :central, but it can also be :forward.\n\nReturns\n\ngradient: The computed gradient of the potential energy for the given input x and latent variable z.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.∇potential_energy_TaylorDiff","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.∇potential_energy_TaylorDiff","text":"∇potential_energy_TaylorDiff(\n x::AbstractArray,\n z::AbstractVecOrMat,\n hvae::HVAE;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n latent_logprior::Function=spherical_logprior,\n)\n\nCompute the gradient of the potential energy of a Hamiltonian Variational Autoencoder (HVAE) with respect to the latent variables z using Taylor series differentiation. This function returns the gradient of the potential energy computed for given data x and latent variable z.\n\nArguments\n\nx::AbstractArray: An array representing the input data. The last dimension corresponds to different data points.\nz::AbstractVecOrMat: A latent variable encoding of the input data. 
If a matrix, each column corresponds to a different data point.\nhvae::HVAE: An HVAE model that contains a decoder which maps the latent variables to the data space.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function representing the log-likelihood function used by the decoder. The function must take as first input an array x representing the data, as second input a vector or matrix z representing the latent variable, and as third input a decoder. Default is decoder_loglikelihood.\nlatent_logprior::Function=spherical_logprior: A function representing the log-prior distribution used in the autoencoder. The function must take as single input a vector or matrix z representing the latent variable. Default is spherical_logprior. \n\nReturns\n\ngradient: The computed gradient of the potential energy for the given input x and latent variable z.\n\n\n\n\n\n∇potential_energy_TaylorDiff(\n x::AbstractArray,\n z::AbstractVecOrMat,\n hvae::HVAE;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n latent_logprior::Function=spherical_logprior,\n)\n\nCompute the gradient of the potential energy of a Hamiltonian Variational Autoencoder (HVAE) with respect to the latent variables z using Taylor series differentiation. This function returns the gradient of the potential energy computed for given data x and latent variable z.\n\nArguments\n\nx::AbstractArray: An array representing the input data. The last dimension corresponds to different data points.\nz::AbstractVecOrMat: A latent variable encoding of the input data. If a matrix, each column corresponds to a different data point.\nhvae::HVAE: An HVAE model that contains a decoder which maps the latent variables to the data space.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function representing the log-likelihood function used by the decoder. 
The function must take as first input an array x representing the data, as second input a vector or matrix z representing the latent variable, and as third input a decoder. Default is decoder_loglikelihood.\nlatent_logprior::Function=spherical_logprior: A function representing the log-prior distribution used in the autoencoder. The function must take as single input a vector or matrix z representing the latent variable. Default is spherical_logprior. \n\nReturns\n\ngradient: The computed gradient of the potential energy for the given input x and latent variable z.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#Other-Functions","page":"HVAE","title":"Other Functions","text":"","category":"section"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"AutoEncoderToolkit.HVAEs.potential_energy\nAutoEncoderToolkit.HVAEs.∇potential_energy\nAutoEncoderToolkit.HVAEs.leapfrog_step\nAutoEncoderToolkit.HVAEs.quadratic_tempering\nAutoEncoderToolkit.HVAEs.null_tempering\nAutoEncoderToolkit.HVAEs.leapfrog_tempering_step\nAutoEncoderToolkit.HVAEs._log_p̄\nAutoEncoderToolkit.HVAEs._log_q̄\nAutoEncoderToolkit.HVAEs.hamiltonian_elbo","category":"page"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.potential_energy","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.potential_energy","text":"potential_energy(\n x::AbstractVector,\n z::AbstractVector,\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n latent_logprior::Function=spherical_logprior\n)\n\nCompute the potential energy of a Hamiltonian Variational Autoencoder (HVAE). In the context of Hamiltonian Monte Carlo (HMC), the potential energy is defined as the negative log-posterior. This function computes the potential energy for given data x and latent variable z. 
It does this by computing the log-likelihood of x under the distribution defined by reconstruction_loglikelihood(x, z, decoder, decoder_output), and the log-prior of z under the latent_logprior distribution. The potential energy is then computed as:\n\n U(x, z) = -log p(x | z) - log p(z)\n\nArguments\n\nx::AbstractArray: An array representing the input data. The last dimension corresponds to different data points.\nz::AbstractVecOrMat: A latent variable encoding of the input data. If a matrix, each column corresponds to a different data point.\ndecoder::AbstractVariationalDecoder: A decoder that maps the latent variables to the data space.\ndecoder_output::NamedTuple: The output of the decoder.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function representing the log-likelihood function used by the decoder. The function must take as first input a vector x representing the data, as second input a vector z representing the latent variable, as third input a decoder, and as fourth input a NamedTuple representing the decoder output. Default is decoder_loglikelihood.\nlatent_logprior::Function=spherical_logprior: A function representing the log-prior distribution used in the autoencoder. The function must take as single input a vector z representing the latent variable. Default is spherical_logprior. \n\nReturns\n\nenergy: The computed potential energy for the given input x and latent variable z.\n\n\n\n\n\npotential_energy(\n x::AbstractArray,\n z::AbstractVecOrMat,\n hvae::HVAE;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n latent_logprior::Function=spherical_logprior\n)\n\nCompute the potential energy of a Hamiltonian Variational Autoencoder (HVAE). In the context of Hamiltonian Monte Carlo (HMC), the potential energy is defined as the negative log-posterior. This function computes the potential energy for given data x and latent variable z. 
It does this by computing the log-likelihood of x under the distribution defined by reconstruction_loglikelihood(x, z, hvae.vae.decoder, decoder_output), and the log-prior of z under the latent_logprior distribution. The potential energy is then computed as:\n\n U(x, z) = -log p(x | z) - log p(z)\n\nArguments\n\nx::AbstractArray: An array representing the input data. The last dimension corresponds to different data points.\nz::AbstractVecOrMat: A latent variable encoding of the input data. If a matrix, each column corresponds to a different data point.\nhvae::HVAE: A Hamiltonian Variational Autoencoder that contains the decoder.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function representing the log-likelihood function used by the decoder. The function must take as first input an array x representing the data, as second input a vector or matrix z representing the latent variable, as third input a decoder, and as fourth input a NamedTuple representing the decoder output. Default is decoder_loglikelihood.\nlatent_logprior::Function=spherical_logprior: A function representing the log-prior distribution used in the autoencoder. The function must take as single input a vector or matrix z representing the latent variable. Default is spherical_logprior. 
\n\nReturns\n\nenergy: The computed potential energy for the given input x and latent variable z.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.∇potential_energy","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.∇potential_energy","text":"∇potential_energy(\n x::AbstractArray,\n z::AbstractVecOrMat,\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n latent_logprior::Function=spherical_logprior,\n adtype::Union{Symbol,Nothing}=nothing,\n adkwargs::Union{NamedTuple,Dict}=Dict(),\n)\n\nCompute the gradient of the potential energy of a Hamiltonian Variational Autoencoder (HVAE) with respect to the latent variables z using the specified automatic differentiation method. This function returns the gradient of the potential energy computed for given data x and latent variable z.\n\nArguments\n\nx::AbstractArray: An array representing the input data. The last dimension corresponds to different data points.\nz::AbstractVecOrMat: A latent variable encoding of the input data. If a matrix, each column corresponds to a different data point.\ndecoder::AbstractVariationalDecoder: A decoder that maps the latent variables to the data space.\ndecoder_output::NamedTuple: The output of the decoder.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function representing the log-likelihood function used by the decoder. The function must take as first input an AbstractVariationalDecoder struct, as second input an array x representing the data, and as third input a vector or matrix z representing the latent variable. Default is decoder_loglikelihood.\nlatent_logprior::Function=spherical_logprior: A function representing the log-prior distribution used in the autoencoder. The function must take as single input a vector or matrix z representing the latent variable. Default is spherical_logprior. 
\nadtype::Symbol=:finite: The type of automatic differentiation method to use. Must be :finite or :TaylorDiff. Default is :finite.\nadkwargs::Union{NamedTuple,Dict}=Dict(): Additional keyword arguments to pass to the automatic differentiation method.\n\nReturns\n\ngradient: The computed gradient of the potential energy for the given input x and latent variable z.\n\n\n\n\n\n∇potential_energy(\n x::AbstractArray,\n z::AbstractVecOrMat,\n hvae::HVAE;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n latent_logprior::Function=spherical_logprior,\n adtype::Union{Symbol,Nothing}=nothing,\n adkwargs::Union{NamedTuple,Dict}=Dict(),\n)\n\nCompute the gradient of the potential energy of a Hamiltonian Variational Autoencoder (HVAE) with respect to the latent variables z using the specified automatic differentiation method. This function returns the gradient of the potential energy computed for given data x and latent variable z.\n\nArguments\n\nx::AbstractArray: An array representing the input data. The last dimension corresponds to different data points.\nz::AbstractVecOrMat: A latent variable encoding of the input data. If a matrix, each column corresponds to a different data point.\nhvae::HVAE: An HVAE model that contains a decoder which maps the latent variables to the data space.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function representing the log-likelihood function used by the decoder. The function must take as first input an array x representing the data, as second input a vector or matrix z representing the latent variable, and as third input a decoder. Default is decoder_loglikelihood.\nlatent_logprior::Function=spherical_logprior: A function representing the log-prior distribution used in the autoencoder. The function must take as single input a vector or matrix z representing the latent variable. Default is spherical_logprior. 
\nadtype::Symbol=:finite: The type of automatic differentiation method to use. Must be :finite or :TaylorDiff. Default is :finite.\nadkwargs::Union{NamedTuple,Dict}=Dict(): Additional keyword arguments to pass to the automatic differentiation method.\n\nReturns\n\ngradient: The computed gradient of the potential energy for the given input x and latent variable z.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.leapfrog_step","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.leapfrog_step","text":"leapfrog_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n ∇U_kwargs::Union{Dict,NamedTuple}=(\n reconstruction_loglikelihood=reconstruction_loglikelihood,\n latent_logprior=spherical_logprior,\n )\n)\n\nPerform a full step of the leapfrog integrator for Hamiltonian dynamics.\n\nThe leapfrog integrator is a numerical integration scheme used to simulate Hamiltonian dynamics. It consists of three steps:\n\nHalf update of the momentum variable: \n ρ(t + ϵ/2) = ρ(t) - 0.5 * ϵ * ∇z_U(z(t), ρ(t + ϵ/2)).\nFull update of the position variable: \n z(t + ϵ) = z(t) + ϵ * ρ(t + ϵ/2).\nHalf update of the momentum variable: \n ρ(t + ϵ) = ρ(t + ϵ/2) - 0.5 * ϵ * ∇z_U(z(t + ϵ), ρ(t + ϵ/2)).\n\nThis function performs these three steps in sequence.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. 
If matrix, each column represents a momentum vector.\ndecoder::AbstractVariationalDecoder: The decoder instance.\ndecoder_output::NamedTuple: The output of the decoder.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4): The step size. Default is 0.0001.\n∇U_kwargs::Union{Dict,NamedTuple}: The keyword arguments for ∇potential_energy. Default is a tuple with reconstruction_loglikelihood and latent_logprior.\n\nReturns\n\nA tuple (z̄, ρ̄, decoder_output_z̄) representing the updated position and momentum after performing the full leapfrog step as well as the decoder output of the updated position.\n\n\n\n\n\nleapfrog_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n hvae::HVAE;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n ∇U_kwargs::Union{Dict,NamedTuple}=(\n reconstruction_loglikelihood=reconstruction_loglikelihood,\n latent_logprior=spherical_logprior,\n )\n)\n\nPerform a full step of the leapfrog integrator for Hamiltonian dynamics.\n\nThe leapfrog integrator is a numerical integration scheme used to simulate Hamiltonian dynamics. It consists of three steps:\n\nHalf update of the momentum variable: \n ρ(t + ϵ/2) = ρ(t) - 0.5 * ϵ * ∇z_U(z(t), ρ(t + ϵ/2)).\nFull update of the position variable: \n z(t + ϵ) = z(t) + ϵ * ρ(t + ϵ/2).\nHalf update of the momentum variable: \n ρ(t + ϵ) = ρ(t + ϵ/2) - 0.5 * ϵ * ∇z_U(z(t + ϵ), ρ(t + ϵ/2)).\n\nThis function performs these three steps in sequence.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. 
If matrix, each column represents a momentum vector.\nhvae::HVAE: An HVAE model that contains the decoder.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4): The step size. Default is 0.0001.\n∇U_kwargs::Union{Dict,NamedTuple}: The keyword arguments for ∇potential_energy. Default is a tuple with reconstruction_loglikelihood and latent_logprior.\n\nReturns\n\nA tuple (z̄, ρ̄, decoder_output_z̄) representing the updated position and momentum after performing the full leapfrog step as well as the decoder output of the updated position.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.quadratic_tempering","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.quadratic_tempering","text":"quadratic_tempering(βₒ::AbstractFloat, k::Int, K::Int)\n\nCompute the inverse temperature βₖ at a given stage k of a tempering schedule with K total stages, using a quadratic tempering scheme. \n\nTempering is a technique used in sampling algorithms to improve mixing and convergence. It involves running parallel chains of the algorithm at different \"temperatures\", and swapping states between the chains. The \"temperature\" of a chain is controlled by an inverse temperature parameter β, which is varied according to a tempering schedule. \n\nIn a quadratic tempering schedule, the inverse temperature βₖ at stage k is computed as the inverse square of the quantity ((1 - 1 / √(βₒ)) * (k / K)^2 + 1 / √(βₒ)), where βₒ is the initial inverse temperature. 
This schedule starts at βₒ when k = 0, and increases quadratically as k increases, reaching 1 when k = K.\n\nArguments\n\nβₒ::AbstractFloat: The initial inverse temperature.\nk::Int: The current stage of the tempering schedule.\nK::Int: The total number of stages in the tempering schedule.\n\nReturns\n\nβₖ::AbstractFloat: The inverse temperature at stage k.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.null_tempering","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.null_tempering","text":" null_tempering(βₒ::T, k::Int, K::Int) where {T<:AbstractFloat}\n\nReturn the initial inverse temperature βₒ. This function is used in the context of tempered Hamiltonian Monte Carlo (HMC) methods, where tempering involves running HMC at different \"temperatures\" to improve mixing and convergence. \n\nIn this case, null_tempering is a simple tempering schedule that does not actually change the temperature—it always returns the initial inverse temperature βₒ. This can be useful as a default or placeholder tempering schedule.\n\nArguments\n\nβₒ::AbstractFloat: The initial inverse temperature. \nk::Int: The current step in the tempering schedule. Not used in this function, but included for compatibility with other tempering schedules.\nK::Int: The total number of steps in the tempering schedule. 
Not used in this function, but included for compatibility with other tempering schedules.\n\nReturns\n\nβ::T: The inverse temperature for the current step, which is always βₒ in this case.\n\nExample\n\nβₒ = 0.5\nk = 1\nK = 10\nβ = null_tempering(βₒ, k, K) # β will be 0.5\n\n\n\n\n\n","category":"function"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.leapfrog_tempering_step","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.leapfrog_tempering_step","text":"leapfrog_tempering_step(\n x::AbstractArray,\n zₒ::AbstractVecOrMat,\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n ∇U_kwargs::Union{Dict,NamedTuple}=(\n reconstruction_loglikelihood=reconstruction_loglikelihood,\n latent_logprior=spherical_logprior,\n ),\n tempering_schedule::Function=quadratic_tempering,\n)\n\nCombines the leapfrog and tempering steps into a single function for the Hamiltonian Variational Autoencoder (HVAE).\n\nArguments\n\nx::AbstractArray: The data to be processed. If Array, the last dimension must be of size 1.\nzₒ::AbstractVecOrMat: The initial latent variable. \ndecoder::AbstractVariationalDecoder: The decoder of the HVAE model.\ndecoder_output::NamedTuple: The output of the decoder.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog steps in the HMC algorithm. This can be a scalar or an array. Default is 0.0001. \nK::Int: The number of leapfrog steps to perform in the Hamiltonian Monte Carlo (HMC) algorithm. Default is 3.\nβₒ::Number: The initial inverse temperature for the tempering schedule. Default is 0.3f0.\n∇U_kwargs::Union{Dict,NamedTuple}: Additional keyword arguments to be passed to the ∇potential_energy function. Default is a NamedTuple with reconstruction_loglikelihood and latent_logprior.\ntempering_schedule::Function: The function to compute the inverse temperature at each step in the HMC algorithm. 
Defaults to quadratic_tempering. This function must take three arguments: First, βₒ, an initial inverse temperature, second, k, the current step in the tempering schedule, and third, K, the total number of steps in the tempering schedule.\n\nReturns\n\nA NamedTuple with the following keys: \nz_init: The initial latent variable. \nρ_init: The initial momentum variable. \nz_final: The final latent variable after K leapfrog steps. \nρ_final: The final momentum variable after K leapfrog steps. \nThe decoder output at the final latent variable is also returned. Note: This is not in the same named tuple as the other outputs, but as a separate output.\n\nDescription\n\nThe function first samples a random momentum variable γₒ from a standard normal distribution and scales it by the inverse square root of the initial inverse temperature βₒ to obtain the initial momentum variable ρₒ. Then, it performs K leapfrog steps, each followed by a tempering step, to generate a new sample from the latent space.\n\nNote\n\nEnsure the input data x and the initial latent variable zₒ match the expected input dimensionality for the HVAE model.\n\n\n\n\n\nleapfrog_tempering_step(\n x::AbstractArray,\n zₒ::AbstractVecOrMat,\n hvae::HVAE;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n ∇U_kwargs::Union{Dict,NamedTuple}=(\n reconstruction_loglikelihood=reconstruction_loglikelihood,\n latent_logprior=spherical_logprior,\n ),\n tempering_schedule::Function=quadratic_tempering,\n)\n\nCombines the leapfrog and tempering steps into a single function for the Hamiltonian Variational Autoencoder (HVAE).\n\nArguments\n\nx::AbstractArray: The data to be processed. If Array, the last dimension must be of size 1.\nzₒ::AbstractVecOrMat: The initial latent variable. \nhvae::HVAE: An HVAE model that contains the decoder.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog steps in the HMC algorithm. 
This can be a scalar or an array. Default is 0.0001. \nK::Int: The number of leapfrog steps to perform in the Hamiltonian Monte Carlo (HMC) algorithm. Default is 3.\nβₒ::Number: The initial inverse temperature for the tempering schedule. Default is 0.3f0.\n∇U_kwargs::Union{Dict,NamedTuple}: Additional keyword arguments to be passed to the ∇potential_energy function. Default is a NamedTuple with reconstruction_loglikelihood and latent_logprior.\ntempering_schedule::Function: The function to compute the inverse temperature at each step in the HMC algorithm. Defaults to quadratic_tempering. This function must take three arguments: First, βₒ, an initial inverse temperature, second, k, the current step in the tempering schedule, and third, K, the total number of steps in the tempering schedule.\n\nReturns\n\nA NamedTuple with the following keys: \nz_init: The initial latent variable. \nρ_init: The initial momentum variable. \nz_final: The final latent variable after K leapfrog steps. \nρ_final: The final momentum variable after K leapfrog steps. \nThe decoder output at the final latent variable is also returned. Note: This is not in the same named tuple as the other outputs, but as a separate output.\n\nDescription\n\nThe function first samples a random momentum variable γₒ from a standard normal distribution and scales it by the inverse square root of the initial inverse temperature βₒ to obtain the initial momentum variable ρₒ. 
Then, it performs K leapfrog steps, each followed by a tempering step, to generate a new sample from the latent space.\n\nNote\n\nEnsure the input data x and the initial latent variable zₒ match the expected input dimensionality for the HVAE model.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#AutoEncoderToolkit.HVAEs._log_p̄","page":"HVAE","title":"AutoEncoderToolkit.HVAEs._log_p̄","text":"_log_p̄(\n x::AbstractArray,\n hvae::HVAE{VAE{E,D}},\n hvae_outputs::NamedTuple;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n logprior::Function=spherical_logprior,\n prefactor::AbstractArray=ones(Float32, 3),\n)\n\nThis is an internal function used in hamiltonian_elbo to compute the numerator of the unbiased estimator of the marginal likelihood. The function computes the sum of the log likelihood of the data given the latent variables, the log prior of the latent variables, and the log prior of the momentum variables.\n\n log p̄ = log p(x | zₖ) + log p(zₖ) + log p(ρₖ)\n\nArguments\n\nx::AbstractArray: The input data. If Array, the last dimension must contain each of the data points.\nhvae::HVAE{<:VAE{<:AbstractGaussianEncoder,<:AbstractGaussianLogDecoder}}: The Hamiltonian Variational Autoencoder (HVAE) model.\nhvae_outputs::NamedTuple: The outputs of the HVAE, including the final latent variables zₖ and the final momentum variables ρₖ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log likelihood of the data given the latent variables. Default is decoder_loglikelihood.\nlogprior::Function: The function to compute the log prior of the latent variables. Default is spherical_logprior.\nprefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. 
Default is an array of ones.\n\nReturns\n\nlog_p̄::AbstractVector: The first term of the log of the unbiased estimator of the marginal likelihood for each data point.\n\nNote\n\nThis is an internal function and should not be called directly. It is used as part of the hamiltonian_elbo function.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#AutoEncoderToolkit.HVAEs._log_q̄","page":"HVAE","title":"AutoEncoderToolkit.HVAEs._log_q̄","text":"_log_q̄(\n hvae::HVAE,\n hvae_outputs::NamedTuple,\n βₒ::Number;\n logprior::Function=spherical_logprior,\n prefactor::AbstractArray=ones(Float32, 3),\n)\n\nThis is an internal function used in hamiltonian_elbo to compute the second term of the unbiased estimator of the marginal likelihood. The function computes the sum of the log posterior of the initial latent variables and the log prior of the initial momentum variables, minus a term that depends on the dimensionality of the latent space and the initial temperature.\n\nlog q̄ = log q(zₒ | x) + log p(ρₒ | zₒ) - d/2 log(βₒ)\n\nArguments\n\nhvae::HVAE: The Hamiltonian Variational Autoencoder (HVAE) model.\nhvae_outputs::NamedTuple: The outputs of the HVAE, including the initial latent variables zₒ and the initial momentum variables ρₒ.\nβₒ::Number: The initial temperature for the tempering steps.\n\nOptional Keyword Arguments\n\nlogprior::Function: The function to compute the log prior of the momentum variables. Default is spherical_logprior.\nprefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. Default is an array of ones.\n\nReturns\n\nlog_q̄::Vector: The second term of the log of the unbiased estimator of the marginal likelihood for each data point.\n\nNote\n\nThis is an internal function and should not be called directly. 
It is used as part of the hamiltonian_elbo function.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.hamiltonian_elbo","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.hamiltonian_elbo","text":"hamiltonian_elbo(\n hvae::HVAE,\n x::AbstractArray;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n ∇U_kwargs::Union{Dict,NamedTuple}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n latent_logprior=spherical_logprior,\n ),\n tempering_schedule::Function=quadratic_tempering,\n return_outputs::Bool=false,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the Hamiltonian Monte Carlo (HMC) estimate of the evidence lower bound (ELBO) for a Hamiltonian Variational Autoencoder (HVAE).\n\nThis function takes as input an HVAE and a vector of input data x. It performs K HMC steps with a leapfrog integrator and a tempering schedule to estimate the ELBO. The ELBO is computed as the difference between the log p̄ and log q̄ as\n\nelbo = mean(log p̄ - log q̄),\n\nArguments\n\nhvae::HVAE: The HVAE used to encode the input data and decode the latent space.\nx::AbstractArray: The input data. If Array, the last dimension must contain each of the data points.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is 0.0001).\nK::Int: The number of HMC steps (default is 3).\nβₒ::Number: The initial inverse temperature (default is 0.3).\n∇U_kwargs::Union{Dict,NamedTuple}: Additional keyword arguments to be passed to the ∇potential_energy function. Defaults to a NamedTuple with :reconstruction_loglikelihood set to decoder_loglikelihood and :latent_logprior set to spherical_logprior.\ntempering_schedule::Function: The tempering schedule function used in the HMC (default is quadratic_tempering).\nreturn_outputs::Bool: Whether to return the outputs of the HVAE. Defaults to false. 
NOTE: This is necessary to avoid computing the forward pass twice when computing the loss function with regularization.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. Default is an array of ones.\n\nReturns\n\nelbo::Number: The HMC estimate of the ELBO. If return_outputs is true, also returns the outputs of the HVAE.\n\n\n\n\n\nhamiltonian_elbo(\n hvae::HVAE,\n x_in::AbstractArray,\n x_out::AbstractArray;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n ∇U_kwargs::Union{Dict,NamedTuple}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n latent_logprior=spherical_logprior,\n ),\n tempering_schedule::Function=quadratic_tempering,\n return_outputs::Bool=false,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the Hamiltonian Monte Carlo (HMC) estimate of the evidence lower bound (ELBO) for a Hamiltonian Variational Autoencoder (HVAE).\n\nThis function takes as input an HVAE and a vector of input data x. It performs K HMC steps with a leapfrog integrator and a tempering schedule to estimate the ELBO. The ELBO is computed as the difference between the log p̄ and log q̄ as\n\nelbo = mean(log p̄ - log q̄),\n\nArguments\n\nhvae::HVAE: The HVAE used to encode the input data and decode the latent space.\nx_in::AbstractArray: The input data. If Array, the last dimension must contain each of the data points.\nx_out::AbstractArray: The data against which the reconstruction is compared. 
If Array, the last dimension must contain each of the data points.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is 0.0001).\nK::Int: The number of HMC steps (default is 3).\nβₒ::Number: The initial inverse temperature (default is 0.3).\n∇U_kwargs::Union{Dict,NamedTuple}: Additional keyword arguments to be passed to the ∇potential_energy function. Defaults to a NamedTuple with :reconstruction_loglikelihood set to decoder_loglikelihood and :latent_logprior set to spherical_logprior.\ntempering_schedule::Function: The tempering schedule function used in the HMC (default is quadratic_tempering).\nreturn_outputs::Bool: Whether to return the outputs of the HVAE. Defaults to false. NOTE: This is necessary to avoid computing the forward pass twice when computing the loss function with regularization.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. Default is an array of ones.\n\nReturns\n\nelbo::Number: The HMC estimate of the ELBO. If return_outputs is true, also returns the outputs of the HVAE.\n\n\n\n\n\n","category":"function"},{"location":"infomaxvae/#InfoMaxVAEsmodule","page":"InfoMax-VAE","title":"InfoMax VAE","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"The InfoMax VAE is a variant of the Variational Autoencoder (VAE) that aims to explicitly account for the maximization of mutual information between the latent space representation and the input data. 
The main difference between the InfoMax VAE and the MMD-VAE (InfoVAE) is that rather than using the Maximum-Mean Discrepancy (MMD) as a measure of the \"distance\" between the latent space, the InfoMax VAE explicitly models the mutual information between latent representations and data inputs via a separate neural network. The loss function for this separate network then takes the form of a variational lower bound on the mutual information between the latent space and the input data.","category":"page"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"Because of the need of this separate network, the InfoMaxVAE struct in AutoEncoderToolkit.jl takes two arguments to construct: the original VAE struct and a network to compute the mutual information. To properly deploy all relevant functions associated with this second network, we also provide a MutualInfoChain struct.","category":"page"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"Furthermore, because of the two networks and the way the training algorithm is set up, the loss function for the InfoMax VAE includes two separate loss functions: one for the MutualInfoChain and one for the InfoMaxVAE.","category":"page"},{"location":"infomaxvae/#References","page":"InfoMax-VAE","title":"References","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"Rezaabad, A. L. & Vishwanath, S. Learning Representations by Maximizing Mutual Information in Variational Autoencoders. 
Preprint at http://arxiv.org/abs/1912.13361 (2020).","category":"page"},{"location":"infomaxvae/#MutualInfoChain","page":"InfoMax-VAE","title":"MutualInfoChain struct","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.InfoMaxVAEs.MutualInfoChain","category":"page"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.MutualInfoChain","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.MutualInfoChain","text":"MutualInfoChain\n\nA MutualInfoChain is used to compute the variational mutual information when training an InfoMaxVAE. The chain is composed of a series of layers that must end with a single output: the mutual information between the latent variables and the input data.\n\nArguments\n\ndata::Union{Flux.Dense,Flux.Chain}: The data layer of the MutualInfoChain. This layer is used to input the data.\nlatent::Union{Flux.Dense,Flux.Chain}: The latent layer of the MutualInfoChain. This layer is used to input the latent variables.\nmlp::Flux.Chain: A multi-layer perceptron (MLP) that is used to compute the mutual information between the inputs and the latent representations. The MLP takes as input the latent variables and outputs a scalar representing the estimated variational mutual information.\n\nCitation\n\nRezaabad, A. L. & Vishwanath, S. Learning Representations by Maximizing Mutual Information in Variational Autoencoders. in 2020 IEEE International Symposium on Information Theory (ISIT) 2729–2734 (IEEE, 2020). 
doi:10.1109/ISIT44484.2020.9174424.\n\nNote\n\nIf the input data is not a flat array, make sure to include a flattening layer within data.\n\n\n\n\n\n","category":"type"},{"location":"infomaxvae/#InfoMaxVAE","page":"InfoMax-VAE","title":"InfoMaxVAE struct","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.InfoMaxVAEs.InfoMaxVAE","category":"page"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.InfoMaxVAE","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.InfoMaxVAE","text":"`InfoMaxVAE <: AbstractVariationalAutoEncoder`\n\nstruct encapsulating an InfoMax variational autoencoder (InfoMaxVAE), an architecture designed to enhance the VAE framework by maximizing mutual information between the inputs and the latent representations, as per the methods described by Rezaabad and Vishwanath (2020).\n\nThe model aims to learn representations that preserve mutual information with the input data, arguably capturing more meaningful factors of variation.\n\nFields\n\nvae::VAE: The core variational autoencoder, consisting of an encoder that maps input data into a latent space representation, and a decoder that attempts to reconstruct the input from the latent representation.\nmi::MutualInfoChain: A multi-layer perceptron (MLP) that estimates the mutual information between the input data and the latent representations.\n\nUsage\n\nThe InfoMaxVAE struct is utilized in a similar manner to a standard VAE, with the added capability of mutual information maximization as part of the training process. 
This involves an additional loss term that considers the output of the mi network to encourage latent representations that are informative about the input data.\n\nExample\n\n# Assuming definitions for `encoder`, `decoder`, and `mi` are provided:\ninfo_max_vae = InfoMaxVAE(VAE(encoder, decoder), mi)\n\n# During training, one would maximize both the variational lower bound and the \n# mutual information estimate provided by `mi`.\n\nCitation\n\nRezaabad, A. L. & Vishwanath, S. Learning Representations by Maximizing Mutual Information in Variational Autoencoders. in 2020 IEEE International Symposium on Information Theory (ISIT) 2729–2734 (IEEE, 2020). doi:10.1109/ISIT44484.2020.9174424.\n\n\n\n\n\n","category":"type"},{"location":"infomaxvae/#Forward-pass","page":"InfoMax-VAE","title":"Forward pass","text":"","category":"section"},{"location":"infomaxvae/#Mutual-Information-Network","page":"InfoMax-VAE","title":"Mutual Information Network","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.InfoMaxVAEs.MutualInfoChain(::AbstractArray, ::AbstractVecOrMat)\n","category":"page"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.MutualInfoChain-Tuple{AbstractArray, AbstractVecOrMat}","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.MutualInfoChain","text":"(mi::MutualInfoChain)(x::AbstractArray, z::AbstractVecOrMat)\n\nForward pass function for the MutualInfoChain, which applies the MLP to an input x.\n\nArguments\n\nx::AbstractArray: The input array to be processed. The last dimension represents each data sample.\nz::AbstractVecOrMat: The latent representation of the input data. The last dimension represents each data sample.\n\nReturns\n\nThe result of applying the MutualInfoChain to the input data and the latent representation simultaneously.\n\nDescription\n\nThis function applies the MLP (Multilayer Perceptron) of a MutualInfoChain instance to an input array. 
The MLP is a type of neural network used in the MutualInfoChain for processing the input data.\n\n\n\n\n\n","category":"method"},{"location":"infomaxvae/#InfoMax-VAE","page":"InfoMax-VAE","title":"InfoMax VAE","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.InfoMaxVAEs.InfoMaxVAE(::AbstractArray)","category":"page"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.InfoMaxVAE-Tuple{AbstractArray}","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.InfoMaxVAE","text":"(vae::InfoMaxVAE)(x::AbstractArray; latent::Bool=false)\n\nProcesses the input data x through an InfoMaxVAE, which consists of an encoder, a decoder, and a multi-layer perceptron (MLP) to estimate variational mutual information.\n\nArguments\n\nx::AbstractArray: The data to be encoded. The last dimension contains each data sample. \n\nOptional Keyword Arguments\n\nlatent::Bool: If true, returns a NamedTuple with latent variables and mutual information estimations along with the reconstruction. Defaults to false.\nseed::Union{Nothing,Int}: Optional argument. The seed for the random number generator used for shuffling the latent codes. If not provided, a random seed will be used.\n\nReturns\n\nIf latent=false: The decoder output as a NamedTuple.\nIf latent=true: A NamedTuple with the :vae field that contains the outputs of the VAE, and the :mi field that contains the estimate of the variational mutual information. Note that this estimate requires shuffling the latent codes between data samples. Therefore, it is only meaningful for batch data cases.\n\nDescription\n\nThis function first encodes the input x into the parameters of the latent distribution. It then samples from this distribution using the reparametrization trick. 
The sampled latent vectors are then decoded, and the MutualInfoChain is used to estimate the mutual information.\n\nNote\n\nEnsure the input data x matches the expected input dimensionality for the encoder in the InfoMaxVAE.\n\n\n\n\n\n","category":"method"},{"location":"infomaxvae/#[Loss-functions]","page":"InfoMax-VAE","title":"[Loss functions]","text":"","category":"section"},{"location":"infomaxvae/#miloss","page":"InfoMax-VAE","title":"Mutual Information Network","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.InfoMaxVAEs.miloss","category":"page"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.miloss","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.miloss","text":"miloss(\n vae::VAE,\n mi::MutualInfoChain,\n x::AbstractArray;\n regularization::Union{Function,Nothing}=nothing,\n reg_strength::Float32=1.0f0,\n seed::Union{Nothing,Int}=nothing\n)\n\nCalculates the loss for training the MutualInfoChain in the InfoMaxVAE algorithm to estimate mutual information between the input x and the latent representation z. The loss function is based on a variational approximation of mutual information, using the MutualInfoChain's output g(x, z). 
The variational mutual information is then calculated as the difference between the MutualInfoChain's output for the true x and latent z, and the exponentiated average of the MLP's output for x and the shuffled latent z_shuffle, adjusted for the regularization term if provided.\n\nArguments\n\nvae::VAE: The variational autoencoder.\nmi::MutualInfoChain: The MutualInfoChain used for estimating mutual information.\nx::AbstractArray: The input vector for the VAE.\n\nOptional Keyword Arguments\n\nregularization::Union{Function, Nothing}=nothing: A regularization function applied to the MLP's output.\nreg_strength::Float32=1.0f0: The strength of the regularization term.\nseed::Union{Nothing,Int}=nothing: The seed for the random number generator used for shuffling the latent codes. If not provided, a random seed will be used.\n\nReturns\n\nFloat32: The computed loss, representing negative variational mutual information, adjusted by the regularization term.\n\nDescription\n\nThe function computes the loss as follows:\n\nloss = -sum(I(x; z)) + sum(exp(I(x; z̃) - 1)) + regstrength * regterm\n\nwhere I(x; z) is the MLP's output representing an estimation of mutual information for true x and latent z, and z̃ represents the shuffled latent variables, that is, the latent codes randomly swapped between data points.\n\nThe function is used to separately train the MLP to estimate mutual information, which is a component of the larger InfoMaxVAE model.\n\nNotes\n\nThis function takes the vae and mi instances of an InfoMaxVAE model as separate arguments to be able to compute a gradient only with respect to the mi parameters.\nEnsure that the dimensionality of the input data x aligns with the encoder's expected input in the VAE.\nInfoMaxVAEs fully depend on batch training as the estimation of mutual information depends on shuffling the latent codes. 
This method works for large enough batches (≥ 64 samples).\n\n\n\n\n\n","category":"function"},{"location":"infomaxvae/#infomaxloss","page":"InfoMax-VAE","title":"InfoMax VAE","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.InfoMaxVAEs.infomaxloss","category":"page"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.infomaxloss","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.infomaxloss","text":"infomaxloss(\n vae::VAE,\n mi::MutualInfoChain,\n x::AbstractArray;\n β=1.0f0,\n α=1.0f0,\n n_samples::Int=1,\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n kl_divergence::Function=encoder_kl,\n regularization::Union{Function,Nothing}=nothing,\n reg_strength::Float32=1.0f0,\n seed::Union{Nothing,Int}=nothing\n)\n\nComputes the loss for an InfoMax variational autoencoder (VAE) with mutual information constraints, by averaging over n_samples latent space samples.\n\nThe loss function combines the reconstruction loss with the Kullback-Leibler (KL) divergence, the variational mutual information between input and latent representations, and possibly a regularization term, defined as:\n\nloss = -⟨log p(x|z)⟩ + β × Dₖₗ[qᵩ(z|x) || p(z)] - α × I(x;z) + regstrength × regterm\n\nWhere:\n\n⟨log p(x|z)⟩ is the expected log likelihood of the probabilistic decoder. -\n\nDₖₗ[qᵩ(z|x) || p(z)] is the KL divergence between the approximated encoder and the prior over the latent space.\n\nI(x;z) is the variational mutual information between the inputs x and the latent variables z.\n\nArguments\n\nvae::VAE: A VAE model with encoder and decoder networks.\nmi::MutualInfoChain: A MutualInfoChain instance used to estimate mutual information term.\nx::AbstractArray: Input data. 
The last dimension represents each data sample.\n\nOptional Keyword Arguments\n\nβ::Float32=1.0f0: Weighting factor for the KL-divergence term, used for annealing.\nα::Float32=1.0f0: Weighting factor for the mutual information term.\nn_samples::Int=1: The number of samples to draw from the latent space when computing the loss.\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function that computes the log likelihood of the decoder's output.\nkl_divergence::Function=encoder_kl: A function that computes the KL divergence between the encoder's output and the prior.\nregularization::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the VAE outputs. Should return a Float32.\nreg_strength::Float32=1.0f0: The strength of the regularization term.\nseed::Union{Nothing,Int}: The seed for the random number generator used for shuffling the latent codes. If not provided, a random seed will be used.\n\nReturns\n\nFloat32: The computed average loss value for the input x and its reconstructed counterparts over n_samples samples, including possible regularization terms and the mutual information constraint.\n\nNote\n\nThis function takes the vae and mi instances of an InfoMaxVAE model as separate arguments to be able to compute a gradient only with respect to the vae parameters.\nEnsure that the input data x match the expected input dimensionality for the encoder in the VAE.\nInfoMaxVAEs fully depend on batch training as the estimation of mutual information depends on shuffling the latent codes. 
This method works for large enough batches (≥ 64 samples).\n\n\n\n\n\ninfomaxloss(\n vae::VAE,\n mi::MutualInfoChain,\n x_in::AbstractArray,\n x_out::AbstractArray;\n β=1.0f0,\n α=1.0f0,\n n_samples::Int=1,\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n kl_divergence::Function=encoder_kl,\n regularization::Union{Function,Nothing}=nothing,\n reg_strength::Float32=1.0f0,\n seed::Union{Nothing,Int}=nothing\n)\n\nComputes the loss for an InfoMax variational autoencoder (VAE) with mutual information constraints, by averaging over n_samples latent space samples.\n\nThe loss function combines the reconstruction loss with the Kullback-Leibler (KL) divergence, the variational mutual information between input and latent representations, and possibly a regularization term, defined as:\n\nloss = -⟨log p(x|z)⟩ + β × Dₖₗ[qᵩ(z|x) || p(z)] - α × I(x;z) + regstrength × regterm\n\nWhere:\n\n⟨log p(x|z)⟩ is the expected log likelihood of the probabilistic decoder. -\n\nDₖₗ[qᵩ(z|x) || p(z)] is the KL divergence between the approximated encoder and the prior over the latent space.\n\nI(x;z) is the variational mutual information between the inputs x and the latent variables z.\n\nArguments\n\nvae::VAE: A VAE model with encoder and decoder networks.\nmi::MutualInfoChain: A MutualInfoChain instance used to estimate mutual information term.\nx_in::AbstractArray: Input matrix. The last dimension represents each data sample.\nx_out::AbstractArray: Output matrix against which reconstructions are compared. 
The last dimension represents each data sample.\n\nOptional Keyword Arguments\n\nβ::Float32=1.0f0: Weighting factor for the KL-divergence term, used for annealing.\nα::Float32=1.0f0: Weighting factor for the mutual information term.\nn_samples::Int=1: The number of samples to draw from the latent space when computing the loss.\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function that computes the log likelihood of the decoder's output.\nkl_divergence::Function=encoder_kl: A function that computes the KL divergence between the encoder's output and the prior.\nregularization::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the VAE outputs. Should return a Float32.\nreg_strength::Float32=1.0f0: The strength of the regularization term.\nseed::Union{Nothing,Int}: The seed for the random number generator used for shuffling the latent codes. If not provided, a random seed will be used.\n\nReturns\n\nFloat32: The computed average loss value for the input x and its reconstructed counterparts over n_samples samples, including possible regularization terms and the mutual information constraint.\n\nNote\n\nThis function takes the vae and mi instances of an InfoMaxVAE model as separate arguments to be able to compute a gradient only with respect to the vae parameters.\nEnsure that the input data x match the expected input dimensionality for the encoder in the VAE.\nInfoMaxVAEs fully depend on batch training as the estimation of mutual information depends on shuffling the latent codes. 
This method works for large enough batches (≥ 64 samples).\n\n\n\n\n\n","category":"function"},{"location":"infomaxvae/#Training","page":"InfoMax-VAE","title":"Training","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.InfoMaxVAEs.train!","category":"page"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.train!","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.train!","text":" train!(\n infomaxvae, x, opt; \n infomaxloss_function=infomaxloss,\n infomaxloss_kwargs, \n miloss_function=miloss, \n miloss_kwargs,\n loss_return::Bool=false,\n verbose::Bool=false\n )\n\nCustomized training function to update parameters of an InfoMax variational autoencoder (VAE) given a loss function of the specified form.\n\nThe InfoMax VAE loss function can be defined as:\n\nloss_infoMax = argmin -⟨log p(x|z)⟩ + β Dₖₗ(qᵩ(z) || p(z)) -\n α [⟨g(x, z)⟩ - ⟨exp(g(x, z) - 1)⟩],\n\nwhere ⟨log p(x|z)⟩ is the expected log likelihood of the probabilistic decoder, Dₖₗ[qᵩ(z) || p(z)] is the KL divergence between the approximated encoder distribution and the prior over the latent space, and g(x, z) is the output of the MutualInfoChain estimating the mutual information between the input data and the latent representation.\n\nThis function simultaneously optimizes two neural networks: the VAE itself and a multi-layer perceptron MutualInfoChain used to compute the mutual information between input and latent variables.\n\nArguments\n\ninfomaxvae::InfoMaxVAE: Struct containing the elements of an InfoMax VAE.\nx::AbstractArray: Matrix containing the data on which to evaluate the loss function. Each column represents a single data point.\nopt::NamedTuple: State of the optimizer for updating parameters. 
Typically initialized using Flux.Train.setup.\n\nOptional Keyword arguments\n\ninfomaxloss_function::Function: The loss function to be used during training for the VAE, defaulting to infomaxloss.\ninfomaxloss_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the VAE loss function.\nmiloss_function::Function: The loss function to be used during training for the MutualInfoChain computing the variational mutual information, defaulting to miloss.\nmiloss_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the MutualInfoChain loss function.\nloss_return::Bool: If true, the function returns the loss values for the VAE and MutualInfoChain. Defaults to false.\nverbose::Bool: If true, the function prints the loss values for the VAE and MutualInfoChain. Defaults to false.\n\nDescription\n\nPerforms one step of gradient descent on the InfoMaxVAE loss function to jointly train the VAE and MutualInfoChain. The VAE parameters are updated to minimize the InfoMaxVAE loss, while the MutualInfoChain parameters are updated to maximize the estimated mutual information. The function allows for customization of loss hyperparameters during training.\n\nNotes\n\nEnsure that the dimensionality of the input data x aligns with the encoder's expected input in the VAE.\nInfoMaxVAEs fully depend on batch training as the estimation of mutual information depends on shuffling the latent codes. 
This method works best for large enough batches (≥ 64 samples).\n\n\n\n\n\n train!(\n infomaxvae, x, opt; \n infomaxloss_function=infomaxloss,\n infomaxloss_kwargs, \n miloss_function=miloss, \n miloss_kwargs,\n loss_return::Bool=false,\n verbose::Bool=false\n )\n\nCustomized training function to update parameters of an InfoMax variational autoencoder (VAE) given a loss function of the specified form.\n\nThe InfoMax VAE loss function can be defined as:\n\nloss_infoMax = argmin -⟨log p(x|z)⟩ + β Dₖₗ(qᵩ(z) || p(z)) -\n α [⟨g(x, z)⟩ - ⟨exp(g(x, z) - 1)⟩],\n\nwhere ⟨log p(x|z)⟩ is the expected log likelihood of the probabilistic decoder, Dₖₗ[qᵩ(z) || p(z)] is the KL divergence between the approximated encoder distribution and the prior over the latent space, and g(x, z) is the output of the MutualInfoChain estimating the mutual information between the input data and the latent representation.\n\nThis function simultaneously optimizes two neural networks: the VAE itself and a multi-layer perceptron MutualInfoChain used to compute the mutual information between input and latent variables.\n\nArguments\n\ninfomaxvae::InfoMaxVAE: Struct containing the elements of an InfoMax VAE.\nx::AbstractArray: Matrix containing the data on which to evaluate the loss function. Each column represents a single data point.\nopt::NamedTuple: State of the optimizer for updating parameters. 
Typically initialized using Flux.Train.setup.\n\nOptional Keyword arguments\n\ninfomaxloss_function::Function: The loss function to be used during training for the VAE, defaulting to infomaxloss.\ninfomaxloss_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the VAE loss function.\nmiloss_function::Function: The loss function to be used during training for the MutualInfoChain computing the variational mutual information, defaulting to miloss.\nmiloss_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the MutualInfoChain loss function.\nloss_return::Bool: If true, the function returns the loss values for the VAE and MutualInfoChain. Defaults to false.\n\nDescription\n\nPerforms one step of gradient descent on the InfoMaxVAE loss function to jointly train the VAE and MutualInfoChain. The VAE parameters are updated to minimize the InfoMaxVAE loss, while the MutualInfoChain parameters are updated to maximize the estimated mutual information. The function allows for customization of loss hyperparameters during training.\n\nNotes\n\nEnsure that the dimensionality of the input data x aligns with the encoder's expected input in the VAE.\nInfoMaxVAEs fully depend on batch training as the estimation of mutual information depends on shuffling the latent codes. 
This method works best for large enough batches (≥ 64 samples).\n\n\n\n\n\n","category":"function"},{"location":"infomaxvae/#Other-Functions","page":"InfoMax-VAE","title":"Other Functions","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.InfoMaxVAEs.shuffle_latent\nAutoEncoderToolkit.InfoMaxVAEs.variational_mutual_info","category":"page"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.shuffle_latent","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.shuffle_latent","text":"shuffle_latent(z::AbstractMatrix, seed::Int=Random.seed!())\n\nShuffle the elements of the second dimension of a matrix representing latent space points.\n\nArguments\n\nz::AbstractMatrix: A matrix representing latent codes. Each column corresponds to a single latent code.\n\nOptional Keyword Arguments\n\nseed::Union{Nothing, Int}: Optional argument. The seed for the random number generator. If not provided, a random seed will be used.\n\nReturns\n\nAbstractMatrix: A new matrix with the second dimension shuffled.\n\n\n\n\n\n","category":"function"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.variational_mutual_info","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.variational_mutual_info","text":"variational_mutual_info(mi, x, z, z_shuffle)\n\nCompute a variational approximation of the mutual information between the input x and the latent code z using a MutualInfoChain. Note that this estimate requires shuffling the latent codes between data samples. Therefore, it only applies to batch data cases. A single sample will not provide a meaningful estimate.\n\nArguments\n\nmi::MutualInfoChain: A MutualInfoChain instance used to estimate mutual information.\nx::AbstractArray: Array of input data. 
The last dimension represents each data sample.\nz::AbstractMatrix: Matrix of corresponding latent representations of the input data.\nz_shuffle::AbstractMatrix: Matrix of latent representations where the second dimension has been shuffled.\n\nReturns\n\nFloat32: An approximation of the mutual information between the input data and its corresponding latent representation.\n\nReferences\n\nRezaabad, A. L. & Vishwanath, S. Learning Representations by Maximizing Mutual Information in Variational Autoencoders. Preprint at http://arxiv.org/abs/1912.13361 (2020).\n\n\n\n\n\nvariational_mutual_info(infomaxvae, x, z, z_shuffle)\n\nCompute a variational approximation of the mutual information between the input x and the latent code z using an InfoMaxVAE instance. Note that this estimate requires shuffling the latent codes between data samples. Therefore, it only applies to batch data cases. A single sample will not provide a meaningful estimate.\n\nArguments\n\ninfomaxvae::InfoMaxVAE: An InfoMaxVAE instance used to estimate mutual information.\nx::AbstractArray: Array of input data. The last dimension represents each data sample.\nz::AbstractMatrix: Matrix of corresponding latent representations of the input data.\nz_shuffle::AbstractMatrix: Matrix of latent representations where the second dimension has been shuffled.\n\nReturns\n\nFloat32: An approximation of the mutual information between the input data and its corresponding latent representation.\n\nReferences\n\nRezaabad, A. L. & Vishwanath, S. Learning Representations by Maximizing Mutual Information in Variational Autoencoders. Preprint at http://arxiv.org/abs/1912.13361 (2020).\n\n\n\n\n\nvariational_mutual_info(\n infomaxvae::InfoMaxVAE,\n x::AbstractArray;\n seed::Union{Nothing,Int}=nothing\n)\n\nCompute a variational approximation of the mutual information between the input x and the latent code z using an InfoMaxVAE instance. 
This function also shuffles the latent codes between data samples to provide a meaningful estimate even for a single data sample.\n\nArguments\n\ninfomaxvae::InfoMaxVAE: An InfoMaxVAE instance used to estimate mutual information.\nx::AbstractArray: Array of input data. The last dimension represents each data sample.\n\nOptional Keyword Arguments\n\nseed::Union{Nothing,Int}: Optional argument. The seed for the random number generator used for shuffling the latent codes. If not provided, a random seed will be used.\n\nReturns\n\nFloat32: An approximation of the mutual information between the input data and its corresponding latent representation.\n\nReferences\n\nRezaabad, A. L. & Vishwanath, S. Learning Representations by Maximizing Mutual Information in Variational Autoencoders. Preprint at http://arxiv.org/abs/1912.13361 (2020).\n\n\n\n\n\n","category":"function"},{"location":"infomaxvae/#Default-initializations","page":"InfoMax-VAE","title":"Default initializations","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.jl provides default initializations for the MutualInfoChain. 
Although it gives the user less flexibility, it can be useful for quick prototyping.","category":"page"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.InfoMaxVAEs.MutualInfoChain(\n ::Union{Int,Vector{<:Int}},\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Function;\n)","category":"page"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.MutualInfoChain-Tuple{Union{Int64, Vector{<:Int64}}, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.MutualInfoChain","text":"MutualInfoChain(\n size_input::Union{Int,Vector{<:Int}},\n n_latent::Int,\n mlp_neurons::Vector{<:Int},\n mlp_activations::Vector{<:Function},\n output_activation::Function;\n init::Function = Flux.glorot_uniform\n)\n\nConstructs a default MutualInfoChain. \n\nArguments\n\nsize_input::Union{Int,Vector{<:Int}}: Size of the input to the MutualInfoChain (number of input features, or the input dimensions when given as a vector).\nn_latent::Int: The dimensionality of the latent space.\nmlp_neurons::Vector{<:Int}: A vector of integers where each element represents the number of neurons in the corresponding hidden layer of the MLP.\nmlp_activations::Vector{<:Function}: A vector of activation functions to be used in the hidden layers. Length must match that of mlp_neurons.\noutput_activation::Function: Activation function for the output neuron of the MLP.\n\nOptional Keyword Arguments\n\ninit::Function: Initialization function for the weights of all layers in the MutualInfoChain. 
Defaults to Flux.glorot_uniform.\n\nReturns\n\nMutualInfoChain: A MutualInfoChain instance with the specified MLP architecture.\n\nNotes\n\nThe function will throw an error if the number of provided activation functions does not match the number of layers specified in mlp_neurons.\n\n\n\n\n\n","category":"method"},{"location":"quickstart/#Quick-Start","page":"Quick Start","title":"Quick Start","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"warning: Warning\nIn this guide we will use external packages with functions not directly related to AutoEncoderToolkit.jl, such as Flux.jl and MLDatasets.jl. Make sure to install them before running the code if you want to follow along.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"For this quick start guide, we will prepare different autoencoders to be trained on a fraction of the MNIST dataset. Let us begin by importing the necessary packages.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"note: Note\nWe prefer to load functions using the import keyword instead of using. This is a personal preference and you can use using if you prefer.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Import project package\nimport AutoEncoderToolkit as AET\n\n# Import ML libraries\nimport Flux\n\n# Import library to load MNIST dataset\nusing MLDatasets: MNIST\n\n# Import library to save models\nimport JLD2","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Now that we have imported the necessary packages, we can load the MNIST dataset. For this specific example, we will only use digits 0, 1, and 2, taking 10 batches of 64 samples each. 
We will also use 2 batches with the same number of samples for validation.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define number of samples in batch\nn_batch = 64\n# Define total number of data points\nn_data = n_batch * 10\n# Define number of validation data points\nn_val = n_batch * 2\n\n# Define labels to keep\ndigit_label = [0, 1, 2]\n\n# Load training split as a dataset object with features and targets\ndataset = MNIST(\n ; split=:train, dir=\"your_own_custom_path/data/mnist\"\n)\n\n# Keep only data with labels in digit_label\ndata_filt = dataset.features[:, :, dataset.targets .∈ Ref(digit_label)]\nlabels_filt = dataset.targets[dataset.targets .∈ Ref(digit_label)]\n\n# Reduce size of training data and reshape to WHCN format\ntrain_data = Float32.(reshape(data_filt[:, :, 1:n_data], (28, 28, 1, n_data)))\ntrain_labels = labels_filt[1:n_data]\n\n# Reduce size of validation data and reshape to WHCN format\nval_data = Float32.(\n reshape(data_filt[:, :, n_data+1:n_data+n_val], (28, 28, 1, n_val))\n)\nval_labels = labels_filt[n_data+1:n_data+n_val]","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Furthermore, for this particular example, we will use a binarized version of the MNIST dataset. 
This means that we will convert the pixel values to either 0 or 1.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define threshold for binarization\nthresh = 0.5\n\n# Binarize training data\ntrain_data = Float32.(train_data .> thresh)\n\n# Binarize validation data\nval_data = Float32.(val_data .> thresh)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Let's look at some of the binarized data.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"(Image: )","category":"page"},{"location":"quickstart/#Define-Encoder-and-Decoder","page":"Quick Start","title":"Define Encoder and Decoder","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"tip: Tip\nFor this walkthrough, we will define the layers of the encoder and decoder by hand. But, for other cases, make sure to check the default initializers in the Encoders and Decoders section.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"With the data in hand, let us define the encoder and decoder for the variational autoencoder. The encoder will be a simple convolutional network with two convolutional layers and a latent dimensionality of 2. Since we will use the JointGaussianLogEncoder type that defines the encoder as a Gaussian distribution with diagonal covariance, returning the mean and log standard deviation, we also need to define two dense layers that map the output of the convolutional layers to the latent space.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"In this definition we will use functions from the Flux package to define the convolutional layers and the dense layers. 
We will also use the custom Flatten layer from AutoEncoderToolkit.jl to flatten the output of the last convolutional layer before passing it to the dense layers.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define dimensionality of latent space\nn_latent = 2\n\n# Define number of initial channels\nn_channels_init = 32\n\nprintln(\"Defining encoder...\")\n# Define convolutional layers\nconv_layers = Flux.Chain(\n # First convolutional layer\n Flux.Conv((4, 4), 1 => n_channels_init, Flux.relu; stride=2, pad=1),\n # Second convolutional layer\n Flux.Conv(\n (4, 4), n_channels_init => n_channels_init * 2, Flux.relu;\n stride=2, pad=1\n ),\n # Flatten the output\n AET.Flatten(),\n # Add extra dense layer 1\n Flux.Dense(n_channels_init * 2 * 7 * 7 => 256, Flux.relu),\n # Add extra dense layer 2\n Flux.Dense(256 => 256, Flux.relu),\n)\n\n# Define layers for µ and log(σ)\nµ_layer = Flux.Dense(256, n_latent, Flux.identity)\nlogσ_layer = Flux.Dense(256, n_latent, Flux.identity)\n\n# build encoder\nencoder = AET.JointGaussianLogEncoder(conv_layers, µ_layer, logσ_layer)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"note: Note\nThe Flatten layer is a custom layer defined in AutoEncoderToolkit.jl that flattens the output into a 1D vector. This flattening operation is necessary because the output of the convolutional layers is a 4D tensor, while the input to the µ and log(σ) layers is a 1D vector. The custom layer is needed to be able to save the model and load it later as BSON and JLD2 do not play well with anonymous functions.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"In the same way, the decoder will be a simple deconvolutional network with two deconvolutional layers. 
Given the binary nature of the MNIST dataset we are using, the probability distribution that makes sense to use in the decoder is a Bernoulli distribution. We will therefore define the decoder as a BernoulliDecoder type. This means that the output of the decoder must be a value between 0 and 1. ","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define deconvolutional layers\ndeconv_layers = Flux.Chain(\n # Define linear layer out of latent space\n Flux.Dense(n_latent => 256, Flux.identity),\n # Add extra dense layer\n Flux.Dense(256 => 256, Flux.relu),\n # Add extra dense layer to map to initial number of channels\n Flux.Dense(256 => n_channels_init * 2 * 7 * 7, Flux.relu),\n # Unflatten input using custom Reshape layer\n AET.Reshape(7, 7, n_channels_init * 2, :),\n # First transposed convolutional layer\n Flux.ConvTranspose(\n (4, 4), n_channels_init * 2 => n_channels_init, Flux.relu; \n stride=2, pad=1\n ),\n # Second transposed convolutional layer\n Flux.ConvTranspose(\n (4, 4), n_channels_init => 1, Flux.sigmoid_fast; stride=2, pad=1\n ),\n)\n\n# Define decoder\ndecoder = AET.BernoulliDecoder(deconv_layers)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"note: Note\nSimilar to the Flatten custom layer, the Reshape layer is used to reshape the output of the dense layers to the dimensions expected by the transposed convolutional layers. This custom layer plays well with the BSON and JLD2 libraries.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Alternatively, if we hadn't binarized the data, a Gaussian distribution would be a more appropriate choice for the decoder. In that case, we could define the decoder as a SimpleGaussianDecoder using the same deconv_layers as above. This would change the probabilistic function associated with the decoder from the Bernoulli to a Gaussian distribution with constant diagonal covariance. 
But, everything else that follows would remain the same. That's the power of Julia's multiple dispatch and AutoEncoderToolkit.jl's design!","category":"page"},{"location":"quickstart/#VAE-Model","page":"Quick Start","title":"VAE Model","text":"","category":"section"},{"location":"quickstart/#Defining-VAE-Model","page":"Quick Start","title":"Defining VAE Model","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"With the encoder and decoder in hand, defining a variational autoencoder model is as simple as writing:","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define VAE model\nvae = encoder * decoder","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"If we wish so, at this point we can save the model architecture and the initial state to disk using the JLD2 package.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Save model object\nJLD2.save(\n \"./output/model.jld2\",\n Dict(\"model\" => vae, \"model_state\" => Flux.state(vae))\n)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"tip: Tip\nTo run the training on a CUDA-compatible device, all we need to do is to move the model and the data to the device. This can be done as follows:\nusing CUDA\n# Move model to GPU\nvae = vae |> Flux.gpu\n# Move data to GPU\ntrain_data = train_data |> Flux.gpu\nval_data = val_data |> Flux.gpu\nEverything else will remain the same, except for the partition of data into batches. This should be preferentially done by hand rather than using the Flux.DataLoader functionality. NOTE: Flux.jl offers support for other devices as well. But AutoEncoderToolkit.jl has not been tested with them. So, if you want to use other devices, make sure to test it first. 
PRs to add support for other devices are welcome!","category":"page"},{"location":"quickstart/#Training-VAE-Model","page":"Quick Start","title":"Training VAE Model","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"We are now ready to train the model. First, we partition the training data into batches","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Partition data into batches\ntrain_loader = Flux.DataLoader(train_data, batchsize=n_batch, shuffle=true)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Next, we define the optimizer. For this example, we will use the ADAM optimizer with a learning rate of 1e-3.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define learning rate\nη = 1e-3\n# Explicit setup of optimizer\nopt_vae = Flux.Train.setup(\n Flux.Optimisers.Adam(η),\n vae\n)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Finally, we can train the model.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"note: Note\nMost of the code below is used to compute and store diagnostics of the training process. 
The core of the training loop is very simple thanks to the custom training function provided by AutoEncoderToolkit.jl.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Initialize arrays to save loss, entropy, and MSE\ntrain_loss = Array{Float32}(undef, n_epoch)\nval_loss = Array{Float32}(undef, n_epoch)\ntrain_entropy = Array{Float32}(undef, n_epoch)\nval_entropy = Array{Float32}(undef, n_epoch)\ntrain_mse = Array{Float32}(undef, n_epoch)\nval_mse = Array{Float32}(undef, n_epoch)\n\n# Loop through epochs\nfor epoch in 1:n_epoch\n println(\"Epoch: $(epoch)\\n\")\n # Loop through batches\n for (i, x) in enumerate(train_loader)\n println(\"Epoch: $(epoch) | Batch: $(i) / $(length(train_loader))\")\n # Train VAE\n AET.VAEs.train!(vae, x, opt_vae)\n end # for train_loader\n\n # Compute loss in training data\n train_loss[epoch] = AET.VAEs.loss(vae, train_data)\n # Compute loss in validation data\n val_loss[epoch] = AET.VAEs.loss(vae, val_data)\n\n # Forward pass training data\n train_outputs = vae(train_data)\n # Compute cross-entropy (decoder output p is already a probability)\n train_entropy[epoch] = Flux.Losses.binarycrossentropy(\n train_outputs.p, train_data\n )\n # Compute MSE for training data\n train_mse[epoch] = Flux.mse(train_outputs.p, train_data)\n\n # Forward pass validation data\n val_outputs = vae(val_data)\n # Compute cross-entropy (decoder output p is already a probability)\n val_entropy[epoch] = Flux.Losses.binarycrossentropy(\n val_outputs.p, val_data\n )\n # Compute MSE for validation data\n val_mse[epoch] = Flux.mse(val_outputs.p, val_data)\n\n println(\n \"Epoch: $(epoch) / $(n_epoch)\\n \" *\n \"- train_mse: $(train_mse[epoch])\\n \" *\n \"- val_mse: $(val_mse[epoch])\\n \" *\n \"- train_loss: $(train_loss[epoch])\\n \" *\n \"- val_loss: $(val_loss[epoch])\\n \" *\n \"- train_entropy: $(train_entropy[epoch])\\n \" *\n \"- val_entropy: $(val_entropy[epoch])\\n\"\n )\nend # for n_epoch","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick 
Start","text":"tip: Tip\nTo convert this vanilla VAE into a β-VAE, all we need to do is add an optional keyword argument β to the loss function. This would then be fed to the train! function as follows:\n# Define loss keyword argument as dictionary\nloss_kwargs = Dict(:β => 0.1f0)\n# Train model using β-VAE\nAET.VAEs.train!(vae, x, opt_vae; loss_kwargs=loss_kwargs)\nThis argument defines the relative weight of the KL divergence term in the loss function.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"That's it! We have trained a variational autoencoder on the MNIST dataset. We can store the model and the training diagnostics to disk using JLD2.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Store model and diagnostics\nJLD2.jldsave(\n \"./output/vae_epoch$(lpad(n_epoch, 4, \"0\")).jld2\",\n model_state=Flux.state(vae),\n train_entropy=train_entropy,\n train_loss=train_loss,\n train_mse=train_mse,\n val_entropy=val_entropy,\n val_mse=val_mse,\n val_loss=val_loss,\n)","category":"page"},{"location":"quickstart/#Exploring-the-results","page":"Quick Start","title":"Exploring the results","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"warning: Warning\nFor the plots below, we do not provide the code to generate them. We assume the user is familiar with plotting in Julia. 
If you are not, we recommend checking the Makie.jl documentation.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Let's look at the training diagnostics to see how the training went.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"(Image: )","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"We can see that the training loss, the cross-entropy, and the mean squared error decreased as the training progressed on both the training and validation data.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Next, let's look at the resulting latent space. In particular, let's encode the training data and plot the coordinates in the latent space. To encode the data we have two options:","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Directly encode the data using the encoder. This returns a NamedTuple, where for our JointGaussianLogEncoder the fields are μ and logσ.\n# Map training data to latent space\ntrain_latent = vae.encoder(train_data)\nWe could take as the latent space coordinates the mean of the distribution.\nPerform the forward pass of the VAE model with the optional keyword argument latent=true. This returns a NamedTuple with the fields encoder, decoder, and z. 
The z field contains the sampled latent space coordinates obtained when performing the reparameterization trick.\ntrain_outputs = vae(train_data; latent=true)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Let's now look at the resulting coordinates in latent space.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"(Image: )","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Finally, one of the most attractive features of variational autoencoders is their generative capabilities. To assess this, we can sample from the latent space prior and decode the samples to generate new data. Let's generate some samples and plot them.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define number of samples\nn_samples = 6\n\n# Sample from prior\nRandom.seed!(42)\nprior_samples = Random.randn(n_latent, n_samples)\n\n# Decode samples\ndecoder_output = vae.decoder(prior_samples).p","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"(Image: )","category":"page"},{"location":"quickstart/#InfoMaxVAE-Model","page":"Quick Start","title":"InfoMaxVAE Model","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Let's now proceed to train an InfoMaxVAE model. This model is a variational autoencoder that includes a term in the loss function to maximize a variational approximation of the mutual information between the latent space and the input data. This variational approximation of the mutual information is parametrized by a neural network that is trained jointly with the encoder and decoder. Thus, the InfoMaxVAE object takes as input a VAE model as well as a MutualInfoChain object that defines the multi-layer perceptron used to compute the mutual information. 
Since we can use the exact same VAE model we defined earlier, all we need to do is define the MutualInfoChain object to build the InfoMaxVAE model.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"note: Note\nMake sure to check the documentation for the MutualInfoChain to know the requirements for this object. The main thing for us in this example is that since the data input is a 4D tensor, we need a custom layer to flatten the output of the encoder before passing it to the multi-layer perceptron. Furthermore, the output of the multi-layer perceptron must be a scalar.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define MutualInfochain elements\n\ndata_layer = Flux.Chain(\n AET.Flatten(),\n Flux.Dense(28 * 28 => 28 * 28, Flux.identity),\n)\n\nlatent_layer = Flux.Dense(n_latent => n_latent, Flux.identity)\n\nmlp = Flux.Chain(\n Flux.Dense(28 * 28 + n_latent => 256, Flux.relu),\n Flux.Dense(256 => 256, Flux.relu),\n Flux.Dense(256 => 256, Flux.relu),\n Flux.Dense(256 => 1, Flux.identity),\n)\n\n# Define MutualInfochain\nmi = AET.InfoMaxVAEs.MutualInfoChain(data_layer, latent_layer, mlp)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Next, we put together the VAE model and the MutualInfoChain to define the InfoMaxVAE model.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define InfoMaxVAE model\ninfomaxvae = AET.InfoMaxVAEs.InfoMaxVAE(encoder * decoder, mi)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"The InfoMaxVAE model has two loss functions: one for the mutual information and one for the VAE. But this is internally handled by the InfoMaxVAEs.train! function. 
So, training the model is as simple as training the VAE model.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"tip: Tip\nNotice that we can pass additional keyword arguments to the train! function as keyword arguments for either the miloss or the infomaxloss. In this case, we will pass the hyperparameters α and β to weigh the mutual information term significantly more than the KL divergence term.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Explicit setup of optimizer\nopt_infomaxvae = Flux.Train.setup(\n Flux.Optimisers.Adam(η),\n infomaxvae\n)\n\n# Define infomaxloss function kwargs\nloss_kwargs = Dict(:α => 10.0f0, :β => 1.0f0,)\n\n# Loop through epochs\nfor epoch in 1:n_epoch\n println(\"Epoch: $(epoch)\\n\")\n # Loop through batches\n for (i, x) in enumerate(train_loader)\n println(\"Epoch: $(epoch) | Batch: $(i) / $(length(train_loader))\")\n # Train InfoMaxVAE\n AET.InfoMaxVAEs.train!(\n infomaxvae, x, opt_infomaxvae; infomaxloss_kwargs=loss_kwargs\n )\n end # for train_loader\nend # for n_epoch","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Notice that we only needed to define the MutualInfoChain object and we were ready to train the InfoMaxVAE model. 
This is the power of the design of AutoEncoderToolkit.jl!","category":"page"},{"location":"quickstart/#Exploring-the-results-2","page":"Quick Start","title":"Exploring the results","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Let's now look at the resulting coordinates in latent space after 100 epochs of training.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"(Image: )","category":"page"},{"location":"quickstart/#RHVAE-Model","page":"Quick Start","title":"RHVAE Model","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Let's now train an RHVAE model. The process is very similar to the VAE model with the main difference that the RHVAE type has some extra requirements. Let's quickly look at the docstring for this type. In particular, let's look at the docstring for the default constructor.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"RHVAE(\n vae::VAE, \n metric_chain::MetricChain, \n centroids_data::AbstractArray, \n T::Number, \n λ::Number\n )\n\n Construct a Riemannian Hamiltonian Variational Autoencoder (RHVAE) from a standard VAE and a metric chain.\n\n Arguments\n ≡≡≡≡≡≡≡≡≡\n\n • vae::VAE: A standard Variational Autoencoder (VAE) model.\n\n • metric_chain::MetricChain: A chain of metrics to be used for the Riemannian Hamiltonian Monte Carlo (RHMC) sampler.\n\n • centroids_data::AbstractArray: An array of data centroids. Each column represents a centroid. N is a subtype of Number.\n\n • T::N: The temperature parameter for the inverse metric tensor. N is a subtype of Number.\n\n • λ::N: The regularization parameter for the inverse metric tensor. 
N is a subtype of Number.\n\n Returns\n ≡≡≡≡≡≡≡\n\n • A new RHVAE object.\n\n Description\n ≡≡≡≡≡≡≡≡≡≡≡\n\n The constructor initializes the latent centroids and the metric tensor M to their default values. The latent centroids are initialized to a zero matrix of\n the same size as centroids_data, and M is initialized to a 3D array of identity matrices, one for each centroid.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"From this we can see that we need to provide a VAE model–we can use the same model we defined earlier–a MetricChain type, an array of centroids, and two hyperparameters T and λ. The MetricChain type is another multi-layer perceptron specifically used to compute a lower-triangular matrix used for the metric tensor for the Riemannian manifold fit to the latent space. More specifically, when training an RHVAE model, the inverse of the metric tensor is also learned. This inverse metric tensor \\mathbf{G}^{-1}(z) is of the form","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"\\mathbf{G}^{-1}(z) = \\sum_{i=1}^N L_{\\psi_i} L_{\\psi_i}^\\top \\exp\\left(-\\frac{\\left\\|z - c_i\\right\\|_2^2}{T^2}\\right) + \\lambda I_d\n\\tag{1}","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"where L_{\\psi_i} \\equiv L_{\\psi_i}(x) is the lower-triangular matrix computed by the MetricChain type given the corresponding data input x associated with the latent coordinate z. c_i is one of the N centroids in latent space used as anchoring points for the metric tensor. 
The hyperparameters T and λ are used to control the temperature of the inverse metric tensor and an additional regularization term, respectively.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Looking at the requirements for MetricChain we see three components:","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"An mlp field that is a multi-layer perceptron.\nA diag field that is a dense layer used to compute the diagonal of the lower triangular matrix returned by MetricChain.\nA lower field that is a dense layer used to compute the elements below the diagonal of the lower triangular matrix.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Let's define these elements and build the MetricChain.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"warning: Warning\nFor MetricChain to build a proper lower triangular matrix, the diag layer must return the same dimensionality as the latent space. The lower layer must return the number of elements in the lower triangular matrix below the diagonal. 
This is given by n_latent * (n_latent - 1) ÷ 2.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define multi-layer perceptron layers\nmlp_conv_layers = Flux.Chain(\n # Flatten the input using custom Flatten layer\n AET.Flatten(),\n # First layer\n Flux.Dense(28 * 28 => 256, Flux.relu),\n # Second layer\n Flux.Dense(256 => 256, Flux.relu),\n # Third layer\n Flux.Dense(256 => 256, Flux.relu),\n)\n\n# Define layers for the diagonal and lower triangular part of the covariance\n# matrix\ndiag = Flux.Dense(256 => n_latent, Flux.identity)\nlower = Flux.Dense(256 => n_latent * (n_latent - 1) ÷ 2, Flux.identity)\n\n# Build metric chain\nmetric_chain = AET.RHVAEs.MetricChain(mlp_conv_layers, diag, lower)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Next, we need to define the centroids. These are the c_i in equation (1) used as anchoring points for the metric tensor. Their latent space coordinates will be updated as the model trains, but the corresponding data points must be fixed. In a way, these centroids are a subset of the data used to define the RHVAE structure itself. One possibility is to use the entire training data as centroids. But this can get computationally very expensive. Instead, we can use either k-means or k-medoids to define a smaller set of centroids. For this, AutoEncoderToolkit.jl provides functions to select these centroids. 
For this example, we will use k-medoids to define the centroids.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define number of centroids\nn_centroids = 64 \n\n# Select centroids via k-medoids\ncentroids_data = AET.utils.centroids_kmedoids(train_data, n_centroids)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Finally, we are just missing the hyperparameters T and λ, and we can then define the RHVAE model.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"note: Note\nHere we are using the same vae model we defined earlier assuming it hasn't been previously trained. If it has been trained, we could load it from disk.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define RHVAE hyper-parameters\nT = 0.4f0 # Temperature\nλ = 1.0f-2 # Regularization parameter\n\n# Define RHVAE model\nrhvae = AET.RHVAEs.RHVAE(vae, metric_chain, centroids_data, T, λ)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"The RHVAE struct stores three elements for which no gradients are computed. Specifically, the elements","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"• centroids_latent::Matrix: A matrix where each column represents a centroid cᵢ in the inverse metric computation.\n• L::Array{<:Number, 3}: A 3D array where each slice represents a L_ψᵢ matrix.\n• M::Array{<:Number, 3}: A 3D array where each slice represents a Lψᵢ Lψᵢᵀ.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"used to compute the inverse metric tensor are not updated with gradients. Instead, they are updated using the update_metric! function. 
So, before training the model, we can update these elements.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Update metric tensor elements\nAET.RHVAEs.update_metric!(rhvae)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"warning: Warning\nEvery time you load an RHVAE model from disk, you need to update the metric as shown above such that all parameters in the model are properly initialized.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Now, we are ready to train the RHVAE model. Setting up the training process is very similar to the VAE model. Make sure to look at the documentation for the RHVAE type to understand the additional keyword arguments that can be passed to the loss function.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define loss function hyper-parameters\nϵ = Float32(1E-4) # Leapfrog step size\nK = 5 # Number of leapfrog steps\nβₒ = 0.3f0 # Initial temperature for tempering\n\n# Collect loss function keyword arguments\nloss_kwargs = Dict(\n :K => K,\n :ϵ => ϵ,\n :βₒ => βₒ,\n)\n\n# Explicit setup of optimizer\nopt_rhvae = Flux.Train.setup(\n Flux.Optimisers.Adam(η),\n rhvae\n)\n\n# Define number of epochs\nn_epoch = 20\n\n# Loop through epochs\nfor epoch in 1:n_epoch\n println(\"Epoch: $(epoch)\\n\")\n # Loop through batches\n for (i, x) in enumerate(train_loader)\n println(\"Epoch: $(epoch) | Batch: $(i) / $(length(train_loader))\")\n # Train RHVAE\n AET.RHVAEs.train!(rhvae, x, opt_rhvae; loss_kwargs=loss_kwargs)\n end # for train_loader\nend # for n_epoch","category":"page"},{"location":"quickstart/#Exploring-the-results-3","page":"Quick Start","title":"Exploring the results","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"note: Note\nFor the example above, we only trained the RHVAE model for 20 
epochs.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Let's look at the resulting latent space encoding the training data.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"(Image: )","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Even for 20 epochs the latent space is already showing a clear separation of the different classes. This is a clear indication that the RHVAE model is learning a good representation of the data.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"One of the most attractive features of the RHVAE model is the ability to learn a Riemannian metric on the latent space. This means that we have a position-dependent measurement of how deformed the latent space is. We can visualize a proxy for this metric by computing the so-called volume measure \\sqrt{\\det(\\mathbf{G}(z))} for each point in the latent space. 
Let's compute this for a grid of points in the latent space and plot it as a background for the latent space.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define number of points per axis\nn_points = 250\n\n# Define range of latent space\nlatent_range_z1 = Float32.(range(-5, 4.5, length=n_points))\nlatent_range_z2 = Float32.(range(-3.5, 6.5, length=n_points))\n\n# Define latent points to evaluate\nz_mat = reduce(hcat, [[x, y] for x in latent_range_z1, y in latent_range_z2])\n\n# Compute inverse metric tensor\nGinv = AET.RHVAEs.G_inv(z_mat, rhvae)\n\n# Compute log determinant of metric tensor\nlogdetG = reshape(-1 / 2 * AET.utils.slogdet(Ginv), n_points, n_points)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"(Image: )","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"In the next section we will explore how to use this geometric information to compute the geodesic distance between points in the latent space.","category":"page"},{"location":"quickstart/#Differential-Geometry-of-RHVAE-model","page":"Quick Start","title":"Differential Geometry of RHVAE model","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"The RHVAE model is a powerful tool to learn a Riemannian metric on the latent space. Having this metric allows us to compute distances between points, and even to perform geodesic interpolation between points. What this means is that as the model trains, the notion of distance between points in the latent space might not be the same as the Euclidean distance. Instead, the model learns a function that tells us how to measure distances in the latent space. We can use this function to compute the shortest path between two points. 
This is what is called a geodesic.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"AutoEncoderToolkit.jl provides a set of functions to compute the geodesic between points in latent space. In particular, a geodesic is a function that connects two points in the latent space such that the distance between them is minimized. Since we do not know the exact form of the geodesic, we can again make use of the power of neural networks to approximate it. The NeuralGeodesics submodule from the diffgeo module provides this functionality. The first step consists of defining a neural network that will approximate the path between two points. The NeuralGeodesic type takes three arguments:","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"A multi-layer perceptron that will approximate the path. This should have a single input–the time being a number between zero and 1–and the dimensionality of the output should be the same as the dimensionality of the latent space.\nThe initial point in the latent space for the path.\nThe final point in the latent space for the path.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Let's define this NeuralGeodesic network.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Import NeuralGeodesics submodule\nimport AutoEncoderToolkit.diffgeo.NeuralGeodesics as NG\n\n# Define initial and final point for geometric path\nz_init = [-3.0f0, 5.0f0]\nz_end = [2.0f0, -2.0f0]\n\n# Extract dimensionality of latent space\nldim = size(rhvae.centroids_latent, 1)\n# Define number of neurons in hidden layers\nn_neuron = 16\n\n# Define mlp chain\nmlp_chain = Flux.Chain(\n # First layer\n Flux.Dense(1 => n_neuron, Flux.identity),\n # Second layer\n Flux.Dense(n_neuron => n_neuron, Flux.tanh_fast),\n # Third layer\n Flux.Dense(n_neuron => n_neuron, Flux.tanh_fast),\n # Fourth 
layer\n Flux.Dense(n_neuron => n_neuron, Flux.tanh_fast),\n # Output layer\n Flux.Dense(n_neuron => ldim, Flux.identity)\n)\n\n# Define NeuralGeodesic\nnng = NG.NeuralGeodesic(mlp_chain, z_init, z_end)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"tip: Tip\nEmpirically, we have found that the activation functions in the hidden layers should not be unbounded. Thus, we recommend using tanh or sigmoid.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Next, we define the hyperparameters for the optimization of the neural network. In particular, we will sample 50 time points uniformly distributed between 0 and 1 to sample the path. We will train the network for 50,000 epochs using the Adam optimizer with a learning rate of 1e-5.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define learning rate\nη = 10^-5\n# Define number of time points to sample\nn_time = 50\n# Define number of epochs\nn_epoch = 50_000\n# Define frequency with which to save model output\nn_save = 10_000\n\n# Define time points\nt_array = Float32.(collect(range(0, 1, length=n_time)))\n\n# Explicit setup of optimizer\nopt_nng = Flux.Train.setup(\n Flux.Optimisers.Adam(η),\n nng\n)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"With this in hand, we are ready to train the network. 
We will save several outputs of the network to visualize the path as it is being trained.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Initialize empty array to save loss\nnng_loss = Vector{Float32}(undef, n_epoch)\n\n# Initialize array to save examples\nnng_ex = Array{Float32}(undef, ldim, length(t_array), n_epoch ÷ n_save + 1)\n\n# Save initial curve\nnng_ex[:, :, 1] = nng(t_array)\n# Loop through epochs\nfor epoch in 1:n_epoch\n # Train model and save loss\n nng_loss[epoch] = NG.train!(nng, rhvae, t_array, opt_nng; loss_return=true)\n # Check if model should be saved\n if epoch % n_save == 0\n # Save model output\n nng_ex[:, :, (epoch÷n_save)+1] = nng(t_array)\n end # if\nend # for","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Now that we have trained the network, we can visualize the path between the initial and final points in the latent space. The color code in the following plot matches the epoch at which the path was computed.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"(Image: )","category":"page"},{"location":"vae/#VAEsmodule","page":"VAE / β-VAE","title":"β-Variational Autoencoder","text":"","category":"section"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"Variational Autoencoders, first introduced by Kingma and Welling in 2014, are a type of generative model that learns to encode high-dimensional data into a low-dimensional latent space. The main idea behind VAEs is to learn a probabilistic mapping (via variational inference) from the input data to the latent space, which allows for the generation of new data points by sampling from the latent space.","category":"page"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"Their counterpart, the β-VAE, introduced by Higgins et al. 
in 2017, is a variant of the original VAE that includes a hyperparameter β that controls the relative importance of the reconstruction loss and the KL divergence term in the loss function. By adjusting β, the user can control the trade-off between the reconstruction quality and the disentanglement of the latent space.","category":"page"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"In terms of implementation, the VAE struct in AutoEncoderToolkit.jl is a simple feedforward network composed of variational encoder and decoder parts. This means that the encoder has a log-posterior function and a KL divergence function associated with it, while the decoder has a log-likelihood function associated with it.","category":"page"},{"location":"vae/#References","page":"VAE / β-VAE","title":"References","text":"","category":"section"},{"location":"vae/#VAE","page":"VAE / β-VAE","title":"VAE","text":"","category":"section"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"Kingma, D. P. & Welling, M. Auto-Encoding Variational Bayes. Preprint at http://arxiv.org/abs/1312.6114 (2014).","category":"page"},{"location":"vae/#β-VAE","page":"VAE / β-VAE","title":"β-VAE","text":"","category":"section"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"Higgins, I. et al. β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. (2017).","category":"page"},{"location":"vae/#VAEstruct","page":"VAE / β-VAE","title":"VAE struct","text":"","category":"section"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"AutoEncoderToolkit.VAEs.VAE","category":"page"},{"location":"vae/#AutoEncoderToolkit.VAEs.VAE","page":"VAE / β-VAE","title":"AutoEncoderToolkit.VAEs.VAE","text":"struct VAE{E<:AbstractVariationalEncoder, D<:AbstractVariationalDecoder}\n\nVariational autoencoder (VAE) model defined for Flux.jl\n\nFields\n\nencoder::E: Neural network that encodes the input into the latent space. 
E is a subtype of AbstractVariationalEncoder.\ndecoder::D: Neural network that decodes the latent representation back to the original input space. D is a subtype of AbstractVariationalDecoder.\n\nA VAE consists of an encoder and decoder network with a bottleneck latent space in between. The encoder compresses the input into a low-dimensional probabilistic representation q(z|x). The decoder tries to reconstruct the original input from a sampled point in the latent space p(x|z). \n\n\n\n\n\n","category":"type"},{"location":"vae/#Forward-pass","page":"VAE / β-VAE","title":"Forward pass","text":"","category":"section"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"AutoEncoderToolkit.VAEs.VAE(::AbstractArray)","category":"page"},{"location":"vae/#AutoEncoderToolkit.VAEs.VAE-Tuple{AbstractArray}","page":"VAE / β-VAE","title":"AutoEncoderToolkit.VAEs.VAE","text":" (vae::VAE)(x::AbstractArray; latent::Bool=false)\n\nPerform the forward pass of a Variational Autoencoder (VAE).\n\nThis function takes as input a VAE and a vector or matrix of input data x. It first runs the input through the encoder to obtain the mean and log standard deviation of the latent variables. It then uses the reparameterization trick to sample from the latent distribution. Finally, it runs the latent sample through the decoder to obtain the output.\n\nArguments\n\nvae::VAE: The VAE used to encode the input data and decode the latent space.\nx::AbstractArray: The input data. If array, the last dimension contains each of the samples in a batch.\n\nOptional Keyword Arguments\n\nlatent::Bool: Whether to return the latent variables along with the decoder output. If true, the function returns a tuple containing the encoder outputs, the latent sample, and the decoder outputs. If false, the function only returns the decoder outputs. Defaults to false. 
\n\nReturns\n\nIf latent is true, returns a tuple containing:\nencoder: The outputs of the encoder.\nz: The latent sample.\ndecoder: The outputs of the decoder.\nIf latent is false, returns the outputs of the decoder.\n\nExample\n\n# Define a VAE\nvae = VAE(\n encoder=Flux.Chain(Flux.Dense(784, 400, relu), Flux.Dense(400, 20)),\n decoder=Flux.Chain(Flux.Dense(20, 400, relu), Flux.Dense(400, 784))\n)\n\n# Define input data\nx = rand(Float32, 784)\n\n# Perform the forward pass\noutputs = vae(x, latent=true)\n\n\n\n\n\n","category":"method"},{"location":"vae/#Loss-function","page":"VAE / β-VAE","title":"Loss function","text":"","category":"section"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"AutoEncoderToolkit.VAEs.loss","category":"page"},{"location":"vae/#AutoEncoderToolkit.VAEs.loss","page":"VAE / β-VAE","title":"AutoEncoderToolkit.VAEs.loss","text":"loss(\n vae::VAE,\n x::AbstractArray;\n β::Number=1.0f0,\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n kl_divergence::Function=encoder_kl,\n reg_function::Union{Function,Nothing}=nothing,\n reg_kwargs::Union{NamedTuple,Dict}=Dict(),\n reg_strength::Number=1.0f0\n)\n\nComputes the loss for the variational autoencoder (VAE).\n\nThe loss function combines the reconstruction loss with the Kullback-Leibler (KL) divergence, and possibly a regularization term, defined as:\n\nloss = -⟨logπ(x|z)⟩ + β × Dₖₗ[qᵩ(z|x) || π(z)] + reg_strength × reg_term\n\nWhere:\n\nπ(x|z) is a probabilistic decoder: π(x|z) = N(f(z), σ² I̲̲)\nf(z) is the function defining the mean of the decoder π(x|z)\nqᵩ(z|x) is the approximated encoder: qᵩ(z|x) = N(g(x), h(x))\ng(x) and h(x) define the mean and covariance of the encoder respectively.\n\nArguments\n\nvae::VAE: A VAE model with encoder and decoder networks.\nx::AbstractArray: Input data. 
The last dimension is taken as having each of the samples in a batch.\n\nOptional Keyword Arguments\n\nβ::Number=1.0f0: Weighting factor for the KL-divergence term, used for annealing.\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function that computes the reconstruction log likelihood.\nkl_divergence::Function=encoder_kl: A function that computes the Kullback-Leibler divergence between the encoder output and a standard normal.\nreg_function::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the VAE outputs. Should return a Float32. This function must take as input the VAE outputs and the keyword arguments provided in reg_kwargs.\nreg_kwargs::Union{NamedTuple,Dict}=Dict(): Keyword arguments to pass to the regularization function.\nreg_strength::Number=1.0f0: The strength of the regularization term.\n\nReturns\n\nT: The computed average loss value for the input x and its reconstructed counterparts, including possible regularization terms.\n\nNote\n\nEnsure that the input data x matches the expected input dimensionality for the encoder in the VAE.\n\n\n\n\n\nloss(\n vae::VAE,\n x_in::AbstractArray,\n x_out::AbstractArray;\n β::Number=1.0f0,\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n kl_divergence::Function=encoder_kl,\n reg_function::Union{Function,Nothing}=nothing,\n reg_kwargs::Union{NamedTuple,Dict}=Dict(),\n reg_strength::Number=1.0f0\n)\n\nComputes the loss for the variational autoencoder (VAE).\n\nThe loss function combines the reconstruction loss with the Kullback-Leibler (KL) divergence and possibly a regularization term, defined as:\n\nloss = -⟨logπ(x_out|z)⟩ + β × Dₖₗ[qᵩ(z|x_in) || π(z)] + reg_strength × reg_term\n\nWhere:\n\nπ(x_out|z) is a probabilistic decoder: π(x_out|z) = N(f(z), σ² I̲̲)\nf(z) is the function defining the mean of the decoder π(x_out|z)\nqᵩ(z|x_in) is the approximated encoder: qᵩ(z|x_in) = N(g(x_in), h(x_in))\ng(x_in) and h(x_in) define the mean and 
covariance of the encoder respectively.\n\nArguments\n\nvae::VAE: A VAE model with encoder and decoder networks.\nx_in::AbstractArray: Input data to the VAE encoder. The last dimension is taken as having each of the samples in a batch.\nx_out::AbstractArray: Target data to compute the reconstruction error. The last dimension is taken as having each of the samples in a batch.\n\nOptional Keyword Arguments\n\nβ::Number=1.0f0: Weighting factor for the KL-divergence term, used for annealing.\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function that computes the reconstruction log likelihood.\nkl_divergence::Function=encoder_kl: A function that computes the Kullback-Leibler divergence.\nreg_function::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the VAE outputs. Should return a Float32. This function must take as input the VAE outputs and the keyword arguments provided in reg_kwargs.\nreg_kwargs::Union{NamedTuple,Dict}=Dict(): Keyword arguments to pass to the regularization function.\nreg_strength::Number=1.0f0: The strength of the regularization term.\n\nReturns\n\nT: The computed average loss value for the input x_in and its reconstructed counterparts x_out, including possible regularization terms.\n\nNote\n\nEnsure that the input data x_in and x_out match the expected input dimensionality for the encoder in the VAE.\n\n\n\n\n\n","category":"function"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"note: Note\nThe loss function includes the β optional argument that can turn a vanilla VAE into a β-VAE by changing the default value of β from 1.0 to any other value.","category":"page"},{"location":"vae/#Training","page":"VAE / β-VAE","title":"Training","text":"","category":"section"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"AutoEncoderToolkit.VAEs.train!","category":"page"},{"location":"vae/#AutoEncoderToolkit.VAEs.train!","page":"VAE / 
β-VAE","title":"AutoEncoderToolkit.VAEs.train!","text":"train!(vae, x, opt; loss_function, loss_kwargs, verbose, loss_return)\n\nCustomized training function to update parameters of a variational autoencoder given a specified loss function.\n\nArguments\n\nvae::VAE: A struct containing the elements of a variational autoencoder.\nx::AbstractArray: Data on which to evaluate the loss function. The last dimension is taken as having each of the samples in a batch.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function=loss: The loss function used for training. It should accept the VAE model, data x, and keyword arguments in that order.\nloss_kwargs::Union{NamedTuple,Dict} = Dict(): Arguments for the loss function. These might include parameters such as σ or β, depending on the specific loss function in use.\nverbose::Bool=false: If true, the loss value will be printed during training.\nloss_return::Bool=false: If true, the loss value will be returned after training.\n\nDescription\n\nTrains the VAE by:\n\nComputing the gradient of the loss w.r.t the VAE parameters.\nUpdating the VAE parameters using the optimizer.\n\nExamples\n\nopt = Flux.Train.setup(Flux.Optimisers.Adam(1e-3), vae)\nfor x in dataloader\n train!(vae, x, opt; loss_function=loss, loss_kwargs=Dict(:β => 1.0f0,), verbose=true)\nend\n\n\n\n\n\n `train!(\n vae, x_in, x_out, opt; \n loss_function, loss_kwargs, verbose, loss_return\n )`\n\nCustomized training function to update parameters of a variational autoencoder given a loss function.\n\nArguments\n\nvae::VAE: A struct containing the elements of a variational autoencoder.\nx_in::AbstractArray: Input data for the loss function. Represents an individual sample. The last dimension is taken as having each of the samples in a batch.\nx_out::AbstractArray: Target output data for the loss function. Represents the corresponding output for the x_in sample. 
The last dimension is taken as having each of the samples in a batch.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function=loss: The loss function used for training. It should accept the VAE model, data x_in, x_out, and keyword arguments in that order. \nloss_kwargs::Union{NamedTuple,Dict} = Dict(): Arguments for the loss function. These might include parameters such as σ or β, depending on the specific loss function in use.\nverbose::Bool=false: Whether to print the loss value after each training step.\nloss_return::Bool=false: Whether to return the loss value after each training step.\n\nDescription\n\nTrains the VAE by:\n\nComputing the gradient of the loss w.r.t the VAE parameters.\nUpdating the VAE parameters using the optimizer.\n\nExamples\n\nopt = Flux.Train.setup(Flux.Optimisers.Adam(1e-3), vae)\nfor (x_in, x_out) in dataloader\n train!(vae, x_in, x_out, opt) \nend\n\n\n\n\n\n","category":"function"},{"location":"layers/#Custom-Layers","page":"Custom Layers","title":"Custom Layers","text":"","category":"section"},{"location":"layers/","page":"Custom Layers","title":"Custom Layers","text":"AutoEncoderToolkit.jl provides a set of commonly used custom layers for building autoencoders. These layers need to be explicitly defined if you want to save a trained model and load it later. For example, if the input to the encoder is an image in format HWC (height, width, channel), somewhere in the encoder there must be a function that flattens its input to a vector for the mapping to the latent space to be possible. If you were to define this with a simple anonymous function, the libraries used to save the model, such as JLD2 or BSON, would not work with it. 
This is why we provide this set of custom layers that work well with these libraries.","category":"page"},{"location":"layers/#reshape","page":"Custom Layers","title":"Reshape","text":"","category":"section"},{"location":"layers/","page":"Custom Layers","title":"Custom Layers","text":"AutoEncoderToolkit.Reshape\nAutoEncoderToolkit.Reshape(::AbstractArray)","category":"page"},{"location":"layers/#AutoEncoderToolkit.Reshape","page":"Custom Layers","title":"AutoEncoderToolkit.Reshape","text":"Reshape(shape)\n\nA custom layer for Flux that reshapes its input to a specified shape.\n\nThis layer is useful when you need to change the dimensions of your data within a Flux model. Unlike the built-in reshape operation in Julia, this custom layer can be saved and loaded using packages such as BSON or JLD2.\n\nArguments\n\nshape: The target shape. This can be any tuple of integers and colons. Colons are used to indicate dimensions whose size should be inferred such that the total number of elements remains the same.\n\nExamples\n\njulia> r = Reshape(10, :)\nReshape((10, :))\n\njulia> r(rand(5, 2))\n10×1 Matrix{Float64}:\n\nNote\n\nWhen saving and loading the model, make sure to include Reshape in the list of layers to be processed by BSON or JLD2.\n\n\n\n\n\n","category":"type"},{"location":"layers/#AutoEncoderToolkit.Reshape-Tuple{AbstractArray}","page":"Custom Layers","title":"AutoEncoderToolkit.Reshape","text":"Reshape(args...)\n\nConstructor for the Reshape struct that takes variable arguments.\n\nThis function allows us to create a Reshape instance with any shape.\n\nArguments\n\nargs...: Variable arguments representing the dimensions of the target shape.\n\nReturns\n\nA Reshape instance with the target shape set to the provided dimensions.\n\nExamples\n\njulia> r = Reshape(10, :)\nReshape((10, :))\n\n\n\n\n\n(r::Reshape)(x)\n\nThis function is called during the forward pass of the model. 
It reshapes the input x to the target shape stored in the Reshape instance r.\n\nArguments\n\nr::Reshape: An instance of the Reshape struct.\nx: The input to be reshaped.\n\nReturns\n\nThe reshaped input.\n\nExamples\n\njulia> r = Reshape(10, :)\nReshape((10, :))\n\njulia> r(rand(5, 2))\n10×1 Matrix{Float64}:\n ...\n\n\n\n\n\n","category":"method"},{"location":"layers/#flatten","page":"Custom Layers","title":"Flatten","text":"","category":"section"},{"location":"layers/","page":"Custom Layers","title":"Custom Layers","text":"AutoEncoderToolkit.Flatten\nAutoEncoderToolkit.Flatten(::AbstractArray)","category":"page"},{"location":"layers/#AutoEncoderToolkit.Flatten","page":"Custom Layers","title":"AutoEncoderToolkit.Flatten","text":"Flatten()\n\nA custom layer for Flux that flattens its input into a 1D vector.\n\nThis layer is useful when you need to change the dimensions of your data within a Flux model. Unlike the built-in flatten operation in Julia, this custom layer can be saved and loaded by packages such as BSON and JLD2.\n\nExamples\n\njulia> f = Flatten()\n\njulia> f(rand(5, 2))\n10-element Vector{Float64}:\n\nNote\n\nWhen saving and loading the model, make sure to include Flatten in the list of layers to be processed by BSON or JLD2.\n\n\n\n\n\n","category":"type"},{"location":"layers/#AutoEncoderToolkit.Flatten-Tuple{AbstractArray}","page":"Custom Layers","title":"AutoEncoderToolkit.Flatten","text":"(f::Flatten)(x)\n\nThis function is called during the forward pass of the model. 
It flattens the input x into a 1D vector.\n\nArguments\n\nf::Flatten: An instance of the Flatten struct.\nx: The input to be flattened.\n\nReturns\n\nThe flattened input.\n\n\n\n\n\n","category":"method"},{"location":"layers/#ActivationOverDims","page":"Custom Layers","title":"ActivationOverDims","text":"","category":"section"},{"location":"layers/","page":"Custom Layers","title":"Custom Layers","text":"AutoEncoderToolkit.ActivationOverDims\nAutoEncoderToolkit.ActivationOverDims(::AbstractArray)","category":"page"},{"location":"layers/#AutoEncoderToolkit.ActivationOverDims","page":"Custom Layers","title":"AutoEncoderToolkit.ActivationOverDims","text":"ActivationOverDims(σ::Function, dims::Int)\n\nA custom layer for Flux that applies an activation function over specified dimensions.\n\nThis layer is useful when you need to apply an activation function over specific dimensions of your data within a Flux model. Unlike the built-in activation functions in Julia, this custom layer can be saved and loaded using the BSON or JLD2 package.\n\nArguments\n\nσ::Function: The activation function to be applied.\ndims: The dimensions over which the activation function should be applied.\n\nNote\n\nWhen saving and loading the model, make sure to include ActivationOverDims in the list of layers to be processed by BSON or JLD2.\n\n\n\n\n\n","category":"type"},{"location":"layers/#AutoEncoderToolkit.ActivationOverDims-Tuple{AbstractArray}","page":"Custom Layers","title":"AutoEncoderToolkit.ActivationOverDims","text":"(σ::ActivationOverDims)(x)\n\nThis function is called during the forward pass of the model. 
It applies the activation function σ.σ over the dimensions σ.dims of the input x.\n\nArguments\n\nσ::ActivationOverDims: An instance of the ActivationOverDims struct.\nx: The input to which the activation function should be applied.\n\nReturns\n\nThe input x with the activation function applied over the specified dimensions.\n\nNote\n\nThis custom layer can be saved and loaded using the BSON package. When saving and loading the model, make sure to include ActivationOverDims in the list of layers to be processed by BSON or JLD2.\n\n\n\n\n\n","category":"method"},{"location":"#AutoEncoderToolkit.jl","page":"Home","title":"AutoEncoderToolkit.jl","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Welcome to the AutoEncoderToolkit.jl documentation. This package provides a simple interface for training and using Flux.jl-based autoencoders and variational autoencoders in Julia.","category":"page"},{"location":"#Installation","page":"Home","title":"Installation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"You can install AutoEncoderToolkit.jl using the Julia package manager. From the Julia REPL, type ] to enter the Pkg REPL mode and run:","category":"page"},{"location":"","page":"Home","title":"Home","text":"add AutoEncoderToolkit","category":"page"},{"location":"#Design","page":"Home","title":"Design","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The idea behind AutoEncoderToolkit.jl is to take advantage of Julia's multiple dispatch to provide a simple and flexible interface for training and using different types of autoencoders. The package is designed to be modular and allow the user to easily define and test custom encoder and decoder architectures. 
Moreover, when it comes to variational autoencoders, AutoEncoderToolkit.jl takes a probabilistic perspective, where the type of encoders and decoders defines (via multiple dispatch) the corresponding distribution used within the corresponding loss function.","category":"page"},{"location":"","page":"Home","title":"Home","text":"For example, assume you want to train a variational autoencoder with convolutional layers in the encoder and deconvolutional layers in the decoder on the MNIST dataset. You can easily do this as follows:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Let's begin by defining the encoder. For this, we will use the JointGaussianLogEncoder type, which is a simple encoder that takes a Flux.Chain for the shared layers between the mean and log-variance layers and two Flux.Dense (or Flux.Chain) layers for the last layers of the encoder.","category":"page"},{"location":"","page":"Home","title":"Home","text":"# Define dimensionality of latent space\nn_latent = 2\n\n# Define number of initial channels\nn_channels_init = 128\n\n# Define convolutional layers\nconv_layers = Flux.Chain(\n # First convolutional layer\n Flux.Conv((3, 3), 1 => n_channels_init, Flux.relu; stride=2, pad=1),\n # Second convolutional layer\n Flux.Conv(\n (3, 3), n_channels_init => n_channels_init * 2, Flux.relu;\n stride=2, pad=1\n ),\n # Flatten the output\n AutoEncoderToolkit.Flatten()\n)\n\n# Define layers for µ and log(σ)\nµ_layer = Flux.Dense(n_channels_init * 2 * 7 * 7, n_latent, Flux.identity)\nlogσ_layer = Flux.Dense(n_channels_init * 2 * 7 * 7, n_latent, Flux.identity)\n\n# build encoder\nencoder = AutoEncoderToolkit.JointGaussianLogEncoder(conv_layers, µ_layer, logσ_layer)","category":"page"},{"location":"","page":"Home","title":"Home","text":"note: Note\nThe Flatten layer is a custom layer defined in AutoEncoderToolkit.jl that flattens the output into a 1D vector. 
This flattening operation is necessary because the output of the convolutional layers is a 4D tensor, while the input to the µ and log(σ) layers is a 1D vector. The custom layer is needed to be able to save the model and load it later, as BSON and JLD2 do not play well with anonymous functions.","category":"page"},{"location":"","page":"Home","title":"Home","text":"For the decoder, given the binary nature of the MNIST dataset, we expect the output to be a Bernoulli distribution. We can define the decoder as follows:","category":"page"},{"location":"","page":"Home","title":"Home","text":"# Define deconvolutional layers\ndeconv_layers = Flux.Chain(\n # Define linear layer out of latent space\n Flux.Dense(n_latent => n_channels_init * 2 * 7 * 7, Flux.identity),\n # Unflatten input using custom Reshape layer\n AutoEncoderToolkit.Reshape(7, 7, n_channels_init * 2, :),\n # First transposed convolutional layer\n Flux.ConvTranspose(\n (4, 4), n_channels_init * 2 => n_channels_init, Flux.relu;\n stride=2, pad=1\n ),\n # Second transposed convolutional layer\n Flux.ConvTranspose(\n (4, 4), n_channels_init => 1, Flux.relu;\n stride=2, pad=1\n ),\n # Add normalization layer\n Flux.BatchNorm(1, Flux.sigmoid),\n)\n\n# Define decoder\ndecoder = AutoEncoderToolkit.BernoulliDecoder(deconv_layers)","category":"page"},{"location":"","page":"Home","title":"Home","text":"note: Note\nAgain, the custom Reshape layer is used to reshape the output of the linear layer to the shape expected by the transposed convolutional layers. This custom layer is needed to be able to save the model and load it later.","category":"page"},{"location":"","page":"Home","title":"Home","text":"By defining the decoder as a BernoulliDecoder, AutoEncoderToolkit.jl already knows the log-likelihood function to use when training the model. 
We can then simply define our variational autoencoder by combining the encoder and decoder as","category":"page"},{"location":"","page":"Home","title":"Home","text":"# Define variational autoencoder\nvae = encoder * decoder","category":"page"},{"location":"","page":"Home","title":"Home","text":"If for any reason we were curious to explore a different distribution for the decoder, for example, a Normal distribution with constant variance, it would be as simple as defining the decoder as a SimpleGaussianDecoder.","category":"page"},{"location":"","page":"Home","title":"Home","text":"# Define decoder with Normal likelihood function\ndecoder = AutoEncoderToolkit.SimpleGaussianDecoder(deconv_layers)\n\n# Re-defining the variational autoencoder\nvae = encoder * decoder","category":"page"},{"location":"","page":"Home","title":"Home","text":"Everything else in our training pipeline would remain the same thanks to multiple dispatch.","category":"page"},{"location":"","page":"Home","title":"Home","text":"Furthermore, let's say that we would like to use a different flavor for our variational autoencoder. In particular the InfoVAE (also known as MMD-VAE) includes extra terms in the loss function to maximize mutual information between the latent space and the input data. 
We can easily take our vae model and convert it into a MMDVAE-type object from the MMDVAEs submodule as follows:","category":"page"},{"location":"","page":"Home","title":"Home","text":"mmdvae = AutoEncoderToolkit.MMDVAEs.MMDVAE(vae)","category":"page"},{"location":"","page":"Home","title":"Home","text":"This is the power of AutoEncoderToolkit.jl and Julia's multiple dispatch!","category":"page"},{"location":"#Implemented-Autoencoders","page":"Home","title":"Implemented Autoencoders","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"model module description\nAutoencoder AEs Vanilla deterministic autoencoder\nVariational Autoencoder VAEs Vanilla variational autoencoder\nβ-VAE VAEs beta-VAE to weigh the reconstruction vs. KL divergence in ELBO\nMMD-VAEs MMDs Maximum-Mean Discrepancy Variational Autoencoders\nInfoMax-VAEs InfoMaxVAEs Information Maximization Variational Autoencoders\nHamiltonian VAE HVAEs Hamiltonian Variational Autoencoders\nRiemannian Hamiltonian-VAE RHVAEs Riemannian-Hamiltonian Variational Autoencoder","category":"page"},{"location":"","page":"Home","title":"Home","text":"tip: Looking for contributors!\nIf you are interested in contributing to the package to add a new model, please check the GitHub repository. We are always looking to expand the list of available models. And AutoEncoderToolkit.jl's structure should make it relatively easy.","category":"page"},{"location":"#GPU-support","page":"Home","title":"GPU support","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"AutoEncoderToolkit.jl supports GPU training out of the box for CUDA.jl-compatible GPUs. The CUDA functionality is provided as an extension. Therefore, to train a model on the GPU, simply import CUDA into the current environment, then move the model and data to the GPU. The rest of the training pipeline remains the same.","category":"page"}]
+[{"location":"guidelines/#Community-Guidelines","page":"Community Guidelines","title":"Community Guidelines","text":"","category":"section"},{"location":"guidelines/#Contributing-to-the-Software","page":"Community Guidelines","title":"Contributing to the Software","text":"","category":"section"},{"location":"guidelines/","page":"Community Guidelines","title":"Community Guidelines","text":"For those interested in contributing to AutoEncoderToolkit.jl, please refer to the GitHub repository. The project welcomes contributions to ","category":"page"},{"location":"guidelines/","page":"Community Guidelines","title":"Community Guidelines","text":"Expand the list of available models.\nImprove the performance of existing models.\nAdd new features to the toolkit.\nImprove the documentation.","category":"page"},{"location":"guidelines/#Reporting-Issues-or-Problems","page":"Community Guidelines","title":"Reporting Issues or Problems","text":"","category":"section"},{"location":"guidelines/","page":"Community Guidelines","title":"Community Guidelines","text":"If you encounter any issues or problems with the software, you can report them directly on the GitHub repository's issues page.","category":"page"},{"location":"guidelines/#Seeking-Support","page":"Community Guidelines","title":"Seeking Support","text":"","category":"section"},{"location":"guidelines/","page":"Community Guidelines","title":"Community Guidelines","text":"For support and further inquiries, consider checking the documentation and existing issues on the GitHub repository. 
If you still cannot find an answer, you can open a new issue on the GitHub repository's issues page.","category":"page"},{"location":"mmdvae/#MMDVAEsmodule","page":"MMD-VAE (InfoVAE)","title":"MMD-VAE (InfoVAE)","text":"","category":"section"},{"location":"mmdvae/","page":"MMD-VAE (InfoVAE)","title":"MMD-VAE (InfoVAE)","text":"The Maximum-Mean Discrepancy Variational Autoencoder (MMD-VAE) is a variant of the Variational Autoencoder (VAE) that adds an extra term to the evidence lower bound (ELBO) that aims to maximize the mutual information between the latent space representation and the input data. In particular, the MMD-VAE uses the Maximum-Mean Discrepancy (MMD) as a measure of the \"distance\" between the aggregated posterior of the latent codes and the prior.","category":"page"},{"location":"mmdvae/","page":"MMD-VAE (InfoVAE)","title":"MMD-VAE (InfoVAE)","text":"For the implementation of the MMD-VAE in AutoEncoderToolkit.jl, the MMDVAE struct wraps a VAE struct and adds the necessary functions to compute the extra terms in the loss function. An MMDVAE object is created by simply passing a VAE object to the constructor. This way, we can use Julia's multiple dispatch to extend the functionality of the VAE object without having to redefine the entire structure.","category":"page"},{"location":"mmdvae/#Reference","page":"MMD-VAE (InfoVAE)","title":"Reference","text":"","category":"section"},{"location":"mmdvae/","page":"MMD-VAE (InfoVAE)","title":"MMD-VAE (InfoVAE)","text":"Maximum-Mean Discrepancy Variational Autoencoders Zhao, S., Song, J. & Ermon, S. InfoVAE: Information Maximizing Variational Autoencoders. 
Preprint at http://arxiv.org/abs/1706.02262 (2018).","category":"page"},{"location":"mmdvae/#MMDVAEstruct","page":"MMD-VAE (InfoVAE)","title":"MMDVAE struct","text":"","category":"section"},{"location":"mmdvae/","page":"MMD-VAE (InfoVAE)","title":"MMD-VAE (InfoVAE)","text":"AutoEncoderToolkit.MMDVAEs.MMDVAE{AutoEncoderToolkit.VAEs.VAE}","category":"page"},{"location":"mmdvae/#AutoEncoderToolkit.MMDVAEs.MMDVAE","page":"MMD-VAE (InfoVAE)","title":"AutoEncoderToolkit.MMDVAEs.MMDVAE","text":"`MMDVAE{\n V<:VAE{<:AbstractVariationalEncoder,<:AbstractVariationalDecoder}\n } <: AbstractVariationalAutoEncoder`\n\nA struct representing a Maximum-Mean Discrepancy Variational Autoencoder (MMD-VAE).\n\nFields\n\nvae::V: A Variational Autoencoder (VAE) that forms the basis of the MMD-VAE. The VAE should be composed of an AbstractVariationalEncoder and an AbstractVariationalDecoder.\n\nDescription\n\nThe MMDVAE struct is a subtype of AbstractVariationalAutoEncoder and represents a specific type of VAE known as an MMD-VAE. The MMD-VAE modifies the standard VAE by replacing the KL-divergence term in the loss function with a Maximum-Mean Discrepancy (MMD) term, which measures the distance between the aggregated posterior of the latent codes and the prior. This can help to alleviate the issue of posterior collapse, where the aggregated posterior fails to cover significant parts of the prior, commonly seen in VAEs.\n\nCitation\n\nMaximum-Mean Discrepancy Variational Autoencoders. Zhao, S., Song, J. & Ermon, S. InfoVAE: Information Maximizing Variational Autoencoders. 
Preprint at http://arxiv.org/abs/1706.02262 (2018).\n\n\n\n\n\n","category":"type"},{"location":"mmdvae/#Forward-pass","page":"MMD-VAE (InfoVAE)","title":"Forward pass","text":"","category":"section"},{"location":"mmdvae/","page":"MMD-VAE (InfoVAE)","title":"MMD-VAE (InfoVAE)","text":"AutoEncoderToolkit.MMDVAEs.MMDVAE(::AbstractArray)","category":"page"},{"location":"mmdvae/#AutoEncoderToolkit.MMDVAEs.MMDVAE-Tuple{AbstractArray}","page":"MMD-VAE (InfoVAE)","title":"AutoEncoderToolkit.MMDVAEs.MMDVAE","text":"(mmdvae::MMDVAE)(x::AbstractArray; latent::Bool=false)\n\nDefines the forward pass for the Maximum-Mean Discrepancy Variational Autoencoder (MMD-VAE).\n\nArguments\n\nx::AbstractArray: Input data.\n\nOptional Keyword Arguments\n\nlatent::Bool: Whether to return the latent variables along with the decoder output. If true, the function returns a tuple containing the encoder outputs, the latent sample, and the decoder outputs. If false, the function only returns the decoder outputs. Defaults to false. 
\n\nReturns\n\nIf latent is true, returns a NamedTuple containing:\nencoder: The outputs of the encoder.\nz: The latent sample.\ndecoder: The outputs of the decoder.\nIf latent is false, returns the outputs of the decoder.\n\n\n\n\n\n","category":"method"},{"location":"mmdvae/#Loss-function","page":"MMD-VAE (InfoVAE)","title":"Loss function","text":"","category":"section"},{"location":"mmdvae/","page":"MMD-VAE (InfoVAE)","title":"MMD-VAE (InfoVAE)","text":"AutoEncoderToolkit.MMDVAEs.loss","category":"page"},{"location":"mmdvae/#AutoEncoderToolkit.MMDVAEs.loss","page":"MMD-VAE (InfoVAE)","title":"AutoEncoderToolkit.MMDVAEs.loss","text":"loss(mmdvae::MMDVAE, x::AbstractArray; σ::Number=1.0f0, λ::Number=1.0f0, α::Number=0.0f0, n_latent_samples::Int=50, kernel::Function=gaussian_kernel, kernel_kwargs::Union{NamedTuple,Dict}=Dict(), reconstruction_loglikelihood::Function=decoder_loglikelihood, kl_divergence::Function=encoder_kl)\n\nLoss function for the Maximum-Mean Discrepancy variational autoencoder (MMD-VAE). The loss function is defined as:\n\nloss = -⟨log p(x|z)⟩ + (1 - α) * Dₖₗ(qᵩ(z | x) || p(z)) + (λ + α - 1) * MMD-D(qᵩ(z) || p(z)),\n\nArguments\n\nmmdvae::MMDVAE: Struct containing the elements of the MMD-VAE.\nx::AbstractArray: Input data.\n\nOptional Arguments\n\nλ::Number=1.0f0: Hyperparameter that emphasizes the importance of the KL divergence between qᵩ(z) and π(z) during training.\nα::Number=0.0f0: Hyperparameter that emphasizes the importance of the Mutual Information term during optimization.\nn_latent_samples::Int=50: Number of samples to take from the latent space prior π(z) when computing the MMD divergence.\nkernel::Function=gaussian_kernel: Kernel used to compute the divergence. 
Default is the Gaussian Kernel.\nkernel_kwargs::Union{NamedTuple,Dict}=Dict(): Additional keyword arguments to be passed to the kernel function.\nreconstruction_loglikelihood::Function=decoder_loglikelihood: Function that computes the log likelihood of the reconstructed input.\nkl_divergence::Function=encoder_kl: Function that computes the Kullback-Leibler divergence between the encoder distribution and the prior.\n\nReturns\n\nSingle value defining the loss function for entry x when compared with reconstructed output x̂.\n\nDescription\n\nThis function calculates the loss for the MMD-VAE. It computes the log likelihood of the reconstructed input, the MMD divergence between the encoder distribution and the prior, and the Kullback-Leibler divergence between the approximate posterior and the prior. These quantities are combined according to the formula above to compute the loss.\n\n\n\n\n\nloss(\n mmdvae::MMDVAE, x_in::AbstractArray, x_out::AbstractArray; \n λ::Number=1.0f0, α::Number=0.0f0, \n n_latent_samples::Int=50, \n kernel::Function=gaussian_kernel, \n kernel_kwargs::Union{NamedTuple,Dict}=Dict(), \n reconstruction_loglikelihood::Function=decoder_loglikelihood, \n kl_divergence::Function=encoder_kl\n)\n\nLoss function for the Maximum-Mean Discrepancy variational autoencoder (MMD-VAE). 
The loss function is defined as:\n\nloss = -⟨log p(x|z)⟩ + (1 - α) * Dₖₗ(qᵩ(z | x) || p(z)) + (λ + α - 1) * MMD-D(qᵩ(z) || p(z)),\n\nArguments\n\nmmdvae::MMDVAE: Struct containing the elements of the MMD-VAE.\nx_in::AbstractArray: Input data.\nx_out::AbstractArray: Data against which to compare the reconstructed output.\n\nOptional Arguments\n\nλ::Number=1.0f0: Hyperparameter that emphasizes the importance of the KL divergence between qᵩ(z) and π(z) during training.\nα::Number=0.0f0: Hyperparameter that emphasizes the importance of the Mutual Information term during optimization.\nn_latent_samples::Int=50: Number of samples to take from the latent space prior π(z) when computing the MMD divergence.\nkernel::Function=gaussian_kernel: Kernel used to compute the divergence. Default is the Gaussian Kernel.\nkernel_kwargs::Union{NamedTuple,Dict}=Dict(): Additional keyword arguments to be passed to the kernel function.\nreconstruction_loglikelihood::Function=decoder_loglikelihood: Function that computes the log likelihood of the reconstructed input.\nkl_divergence::Function=encoder_kl: Function that computes the Kullback-Leibler divergence between the encoder distribution and the prior.\n\nReturns\n\nSingle value defining the loss function for entry x when compared with reconstructed output x̂.\n\nDescription\n\nThis function calculates the loss for the MMD-VAE. It computes the log likelihood of the reconstructed input, the MMD divergence between the encoder distribution and the prior, and the Kullback-Leibler divergence between the approximate posterior and the prior. 
These quantities are combined according to the formula above to compute the loss.\n\n\n\n\n\n","category":"function"},{"location":"mmdvae/#Training","page":"MMD-VAE (InfoVAE)","title":"Training","text":"","category":"section"},{"location":"mmdvae/","page":"MMD-VAE (InfoVAE)","title":"MMD-VAE (InfoVAE)","text":"AutoEncoderToolkit.MMDVAEs.train!","category":"page"},{"location":"mmdvae/#AutoEncoderToolkit.MMDVAEs.train!","page":"MMD-VAE (InfoVAE)","title":"AutoEncoderToolkit.MMDVAEs.train!","text":"train!(mmdvae, x, opt; loss_function, loss_kwargs, verbose, loss_return)\n\nCustomized training function to update parameters of a variational autoencoder given a specified loss function.\n\nArguments\n\nmmdvae::MMDVAE: A struct containing the elements of a Maximum-Mean Discrepancy Variational Autoencoder (MMD-VAE).\nx::AbstractArray: Data on which to evaluate the loss function. The last dimension is taken as having each of the samples in a batch.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function=loss: The loss function used for training. It should accept the MMDVAE model, data x, and keyword arguments in that order.\nloss_kwargs::Union{NamedTuple,Dict} = Dict(): Arguments for the loss function. 
These might include parameters like λ or α, depending on the specific loss function in use.\nverbose::Bool=false: If true, the loss value will be printed during training.\nloss_return::Bool=false: If true, the loss value will be returned after training.\n\nDescription\n\nTrains the MMDVAE by:\n\nComputing the gradient of the loss with respect to the MMDVAE parameters.\nUpdating the MMDVAE parameters using the optimizer.\n\n\n\n\n\ntrain!(mmdvae, x_in, x_out, opt; loss_function, loss_kwargs, verbose, loss_return)\n\nCustomized training function to update parameters of a variational autoencoder given a specified loss function.\n\nArguments\n\nmmdvae::MMDVAE: A struct containing the elements of a Maximum-Mean Discrepancy Variational Autoencoder (MMD-VAE).\nx_in::AbstractArray: Data on which to evaluate the loss function. The last dimension indexes the samples in a batch.\nx_out::AbstractArray: Data against which to compare the reconstructed output.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function=loss: The loss function used for training. It should accept the MMDVAE model, data x, and keyword arguments in that order.\nloss_kwargs::Union{NamedTuple,Dict} = Dict(): Arguments for the loss function. 
These might include parameters like λ or α, depending on the specific loss function in use.\nverbose::Bool=false: If true, the loss value will be printed during training.\nloss_return::Bool=false: If true, the loss value will be returned after training.\n\nDescription\n\nTrains the MMDVAE by:\n\nComputing the gradient of the loss with respect to the MMDVAE parameters.\nUpdating the MMDVAE parameters using the optimizer.\n\n\n\n\n\n","category":"function"},{"location":"mmdvae/#Other-Functions","page":"MMD-VAE (InfoVAE)","title":"Other Functions","text":"","category":"section"},{"location":"mmdvae/","page":"MMD-VAE (InfoVAE)","title":"MMD-VAE (InfoVAE)","text":"AutoEncoderToolkit.MMDVAEs.gaussian_kernel\nAutoEncoderToolkit.MMDVAEs.mmd_div\nAutoEncoderToolkit.MMDVAEs.logP_mmd_ratio","category":"page"},{"location":"mmdvae/#AutoEncoderToolkit.MMDVAEs.gaussian_kernel","page":"MMD-VAE (InfoVAE)","title":"AutoEncoderToolkit.MMDVAEs.gaussian_kernel","text":"gaussian_kernel(\n x::AbstractArray, y::AbstractArray; ρ::Float32=1.0f0, dims::Int=2\n)\n\nFunction to compute the Gaussian Kernel between two arrays x and y, defined as \n\n k(x, y) = exp(-||x - y ||² / ρ²)\n\nArguments\n\nx::AbstractArray: First input array for the kernel.\ny::AbstractArray: Second input array for the kernel. \n\nOptional Keyword Arguments\n\nρ=1.0f0: Kernel amplitude hyperparameter. Larger ρ gives a smoother kernel.\ndims::Int=2: Number of dimensions to compute pairwise distances over.\n\nReturns\n\nk::AbstractArray: Kernel matrix where each element is computed with the formula for k(x, y) given above.\n\nTheory\n\nThe Gaussian kernel measures the similarity between two points x and y. It is widely used in many machine learning algorithms. 
This implementation computes the squared Euclidean distance between all pairs of rows in x and y, scales the distance by ρ² and takes the exponential.\n\n\n\n\n\n","category":"function"},{"location":"mmdvae/#AutoEncoderToolkit.MMDVAEs.mmd_div","page":"MMD-VAE (InfoVAE)","title":"AutoEncoderToolkit.MMDVAEs.mmd_div","text":"mmd_div(\n x::AbstractArray, y::AbstractArray; \n kernel::Function=gaussian_kernel, \n kernel_kwargs::Union{NamedTuple,Dict}=Dict()\n)\n\nCompute the Maximum Mean Discrepancy (MMD) divergence between two arrays x and y.\n\nArguments\n\nx::AbstractArray: First input array.\ny::AbstractArray: Second input array.\n\nKeyword Arguments\n\nkernel::Function=gaussian_kernel: Kernel function to use. Default is the Gaussian kernel.\nkernel_kwargs::Union{NamedTuple,Dict}=Dict(): Additional keyword arguments to be passed to the kernel function.\n\nReturns\n\nmmd::Number: MMD divergence value. \n\nTheory\n\nMMD measures the difference between two distributions based on embeddings in a Reproducing Kernel Hilbert Space (RKHS). 
It is widely used for two-sample tests.\n\nThis function implements MMD as:\n\nMMD(x, y) = mean(k(x, x)) - 2 * mean(k(x, y)) + mean(k(y, y))\n\nwhere k is a positive definite kernel (e.g., Gaussian).\n\n\n\n\n\n","category":"function"},{"location":"mmdvae/#AutoEncoderToolkit.MMDVAEs.logP_mmd_ratio","page":"MMD-VAE (InfoVAE)","title":"AutoEncoderToolkit.MMDVAEs.logP_mmd_ratio","text":"logP_mmd_ratio(\n mmdvae::MMDVAE, x::AbstractArray; \n n_latent_samples::Int=100, kernel=gaussian_kernel, \n kernel_kwargs::Union{NamedTuple,Dict}=NamedTuple(), \n reconstruction_loglikelihood::Function=decoder_loglikelihood\n)\n\nFunction to compute the absolute ratio between the log likelihood ⟨log p(x|z)⟩ and the MMD divergence MMD-D(qᵩ(z|x)||p(z)).\n\nArguments\n\nmmdvae::MMDVAE: Struct containing the elements of the MMD-VAE.\nx::AbstractArray: Data to train the MMD-VAE.\n\nOptional Keyword Arguments\n\nn_latent_samples::Int=100: Number of samples to take from the latent space prior p(z) when computing the MMD divergence.\nkernel=gaussian_kernel: Kernel used to compute the divergence. Default is the Gaussian Kernel.\nkernel_kwargs::Union{NamedTuple,Dict}=NamedTuple(): Tuple containing arguments for the Kernel function.\nreconstruction_loglikelihood::Function=decoder_loglikelihood: Function that computes the log likelihood of the reconstructed input.\n\nReturns\n\nabs(⟨log p(x|z)⟩ / MMD-D(qᵩ(z|x)||p(z)))\n\nDescription\n\nThis function calculates:\n\nThe log likelihood ⟨log p(x|z)⟩ of x under the MMD-VAE decoder, averaged over all samples.\nThe MMD divergence between the encoder distribution q(z|x) and prior p(z). 
\n\nThe absolute ratio of these two quantities is returned.\n\nNote\n\nThis ratio is useful for setting the Lagrangian multiplier λ in training MMD-VAEs.\n\n\n\n\n\n","category":"function"},{"location":"utils/#Utils","page":"Utilities","title":"Utils","text":"","category":"section"},{"location":"utils/","page":"Utilities","title":"Utilities","text":"AutoEncoderToolkit.jl offers a series of utility functions for different tasks. ","category":"page"},{"location":"utils/#Training-Utilities","page":"Utilities","title":"Training Utilities","text":"","category":"section"},{"location":"utils/","page":"Utilities","title":"Utilities","text":"AutoEncoderToolkit.utils.step_scheduler\nAutoEncoderToolkit.utils.cycle_anneal\nAutoEncoderToolkit.utils.locality_sampler","category":"page"},{"location":"utils/#AutoEncoderToolkit.utils.step_scheduler","page":"Utilities","title":"AutoEncoderToolkit.utils.step_scheduler","text":"`step_scheduler(epoch, epoch_change, learning_rates)`\n\nSimple function to define different learning rates at specified epochs.\n\nArguments\n\nepoch::Int: Epoch at which to define learning rate.\nepoch_change::Vector{<:Int}: Number of epochs at which to change learning rate. It must include the initial learning rate!\nlearning_rates::Vector{<:AbstractFloat}: Learning rate value for the epoch range. 
Must be the same length as epoch_change\n\nReturns\n\nη::AbstractFloat: Learning rate for the current epoch.\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.cycle_anneal","page":"Utilities","title":"AutoEncoderToolkit.utils.cycle_anneal","text":"cycle_anneal(\n epoch::Int, \n n_epoch::Int, \n n_cycles::Int; \n frac::AbstractFloat=0.5f0, \n βmax::Number=1.0f0, \n βmin::Number=0.0f0, \n T::Type=Float32\n)\n\nFunction that computes the value of the annealing parameter β for a variational autoencoder as a function of the epoch number according to the cyclical annealing strategy.\n\nArguments\n\nepoch::Int: Epoch on which to evaluate the value of the annealing parameter.\nn_epoch::Int: Number of epochs that will be run to train the VAE.\nn_cycles::Int: Number of annealing cycles to be fit within the number of epochs.\n\nOptional Arguments\n\nfrac::AbstractFloat= 0.5f0: Fraction of the cycle in which the annealing parameter β will increase from the minimum to the maximum value.\nβmax::Number=1.0f0: Maximum value that the annealing parameter can reach.\nβmin::Number=0.0f0: Minimum value that the annealing parameter can reach.\nT::Type=Float32: The type of the output. The function will convert the output to this type.\n\nReturns\n\nβ::T: Value of the annealing parameter.\n\nCitation\n\nFu, H. et al. Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing. Preprint at http://arxiv.org/abs/1903.10145 (2019).\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.locality_sampler","page":"Utilities","title":"AutoEncoderToolkit.utils.locality_sampler","text":"locality_sampler(data, dist_tree, n_primary, n_secondary, k_neighbors; index=false)\n\nAlgorithm to generate mini-batches based on spatial locality as determined by a pre-constructed nearest neighbors tree.\n\nArguments\n\ndata::AbstractArray: An array containing the data points. 
The data points can be of any dimension.\ndist_tree::NearestNeighbors.NNTree: NearestNeighbors.jl tree used to determine the distance between data points.\nn_primary::Int: Number of primary points to sample.\nn_secondary::Int: Number of secondary points to sample from the neighbors of each primary point.\nk_neighbors::Int: Number of nearest neighbors from which to potentially sample the secondary points.\n\nOptional Keyword Arguments\n\nindex::Bool: If true, returns the indices of the selected samples. If false, returns the data corresponding to the indices. Defaults to false.\n\nReturns\n\nIf index is true, returns sample_idx::Vector{Int64}: Indices of data points to include in the mini-batch.\nIf index is false, returns sample_data::AbstractArray: The data points to include in the mini-batch.\n\nDescription\n\nThis sampling algorithm consists of three steps:\n\nFor each datapoint, determine the k_neighbors nearest neighbors using the dist_tree.\nUniformly sample n_primary points without replacement from all data points.\nFor each primary point, sample n_secondary points without replacement from its k_neighbors nearest neighbors.\n\nExamples\n\n# Pre-constructed NearestNeighbors.jl tree\ndist_tree = NearestNeighbors.KDTree(data, metric)\nsample_indices = locality_sampler(data, dist_tree, 10, 5, 50; index=true)\n\nCitation\n\nSkafte, N., Jørgensen, M. & Hauberg, S. Reliable training and estimation of variance networks. In Advances in Neural Information Processing Systems vol. 32 (Curran Associates, Inc., 2019).\n\n\n\n\n\n","category":"function"},{"location":"utils/#centroidutils","page":"Utilities","title":"Centroid Finding Utilities","text":"","category":"section"},{"location":"utils/","page":"Utilities","title":"Utilities","text":"Some VAE models, such as the RHVAE, require clustering of the data. Specifically, the RHVAE can take a fixed subset of the training data as a reference for the computation of the metric tensor. 
The following functions can be used to define this reference subset to be used as centroids for the metric tensor computation.","category":"page"},{"location":"utils/","page":"Utilities","title":"Utilities","text":"AutoEncoderToolkit.utils.centroids_kmeans\nAutoEncoderToolkit.utils.centroids_kmedoids","category":"page"},{"location":"utils/#AutoEncoderToolkit.utils.centroids_kmeans","page":"Utilities","title":"AutoEncoderToolkit.utils.centroids_kmeans","text":"centroids_kmeans(\n x::AbstractMatrix, \n n_centroids::Int; \n assign::Bool=false\n)\n\nPerform k-means clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nArguments\n\nx::AbstractMatrix: The input data. Rows represent individual samples.\nn_centroids::Int: The number of centroids to compute.\n\nOptional Keyword Arguments\n\nassign::Bool=false: If true, also return the assignments of each point to a centroid.\n\nReturns\n\nIf assign is false, returns a matrix where each column is a centroid.\nIf assign is true, returns a tuple where the first element is the matrix of centroids and the second element is a vector of assignments.\n\nExamples\n\ndata = rand(100, 10)\ncentroids = centroids_kmeans(data, 5)\n\n\n\n\n\ncentroids_kmeans(\n x::AbstractArray, \n n_centroids::Int; \n reshape_centroids::Bool=true, \n assign::Bool=false\n)\n\nPerform k-means clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nThe input data is flattened into a matrix before performing k-means clustering. This is done because k-means operates on a set of data points in a vector space and cannot handle multi-dimensional arrays. 
Flattening the input ensures that the k-means algorithm can process the data correctly.\n\nBy default, the output centroids are reshaped back to the original input shape. This is controlled by the reshape_centroids argument.\n\nArguments\n\nx::AbstractArray: The input data. It can be a multi-dimensional array where the last dimension represents individual samples.\nn_centroids::Int: The number of centroids to compute.\n\nOptional Keyword Arguments\n\nreshape_centroids::Bool=true: If true, reshape the output centroids back to the original input shape.\nassign::Bool=false: If true, also return the assignments of each point to a centroid.\n\nReturns\n\nIf assign is false, returns a matrix where each column is a centroid.\nIf assign is true, returns a tuple where the first element is the matrix of centroids and the second element is a vector of assignments.\n\nExamples\n\ndata = rand(100, 10)\ncentroids = centroids_kmeans(data, 5)\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.centroids_kmedoids","page":"Utilities","title":"AutoEncoderToolkit.utils.centroids_kmedoids","text":" centroids_kmedoids(\n x::AbstractMatrix, n_centroids::Int; assign::Bool=false\n )\n\nPerform k-medoids clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nArguments\n\nx::AbstractMatrix: The input data. 
Rows represent individual samples.\nn_centroids::Int: The number of centroids to compute.\ndist::Distances.PreMetric=Distances.Euclidean(): The distance metric to use when computing the pairwise distance matrix.\n\nOptional Keyword Arguments\n\nassign::Bool=false: If true, also return the assignments of each point to a centroid.\n\nReturns\n\nIf assign is false, returns a matrix where each column is a centroid.\nIf assign is true, returns a tuple where the first element is the matrix of centroids and the second element is a vector of assignments.\n\nExamples\n\ndata = rand(100, 10)\ncentroids = centroids_kmedoids(data, 5)\n\n\n\n\n\ncentroids_kmedoids(\n x::AbstractArray,\n n_centroids::Int,\n dist::Distances.PreMetric=Distances.Euclidean();\n assign::Bool=false\n)\n\nPerform k-medoids clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nArguments\n\nx::AbstractArray: The input data. The last dimension of x should contain each of the samples that should be clustered.\nn_centroids::Int: The number of centroids to compute.\ndist::Distances.PreMetric=Distances.Euclidean(): The distance metric to use for the clustering. 
Defaults to Euclidean distance.\n\nOptional Keyword Arguments\n\nassign::Bool=false: If true, also return the assignments of each point to a centroid.\n\nReturns\n\nIf assign is false, returns an array where each column is a centroid.\nIf assign is true, returns a tuple where the first element is the array of centroids and the second element is a vector of assignments.\n\nExamples\n\ndata = rand(10, 100)\ncentroids = centroids_kmedoids(data, 5)\n\n\n\n\n\n","category":"function"},{"location":"utils/#Other-Utilities","page":"Utilities","title":"Other Utilities","text":"","category":"section"},{"location":"utils/","page":"Utilities","title":"Utilities","text":"AutoEncoderToolkit.utils.storage_type\nAutoEncoderToolkit.utils.vec_to_ltri\nAutoEncoderToolkit.utils.vec_mat_vec_batched\nAutoEncoderToolkit.utils.slogdet\nAutoEncoderToolkit.utils.sample_MvNormalCanon\nAutoEncoderToolkit.utils.unit_vector\nAutoEncoderToolkit.utils.finite_difference_gradient\nAutoEncoderToolkit.utils.taylordiff_gradient","category":"page"},{"location":"utils/#AutoEncoderToolkit.utils.storage_type","page":"Utilities","title":"AutoEncoderToolkit.utils.storage_type","text":"storage_type(A::AbstractArray)\n\nDetermine the storage type of an array.\n\nThis function recursively checks the parent of the array until it finds the base storage type. 
This is useful for determining whether an array or its subarrays are stored on the CPU or GPU.\n\nArguments\n\nA::AbstractArray: The array whose storage type is to be determined.\n\nReturns\n\nThe type of the array that is the base storage of A.\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.vec_to_ltri","page":"Utilities","title":"AutoEncoderToolkit.utils.vec_to_ltri","text":" vec_to_ltri(diag::AbstractVecOrMat, lower::AbstractVecOrMat)\n\nConvert two one-dimensional vectors or matrices into a lower triangular matrix or a 3D tensor.\n\nArguments\n\ndiag::AbstractVecOrMat: The input vector or matrix to be converted into the diagonal of the matrix. If it's a matrix, each column is considered as a separate vector.\nlower::AbstractVecOrMat: The input vector or matrix to be converted into the lower triangular part of the matrix. The length of this vector or the number of rows in this matrix should be a triangular number (i.e., the sum of the first n natural numbers for some n). If it's a matrix, each column is considered the lower part of a separate lower triangular matrix.\n\nReturns\n\nA lower triangular matrix or a 3D tensor where each slice is a lower triangular matrix constructed from diag and lower.\n\nDescription\n\nThis function constructs a lower triangular matrix or a 3D tensor from two input vectors or matrices, diag and lower. The diag vector or matrix provides the diagonal elements of the matrix, while the lower vector or matrix provides the elements below the diagonal. The function uses a comprehension to construct the matrix or tensor, with the lower_index function calculating the appropriate index in the lower vector or matrix for each element below the diagonal.\n\nGPU Support\n\nThe function supports both CPU and GPU arrays. 
For GPU arrays, the data is first transferred to the CPU, the lower triangular matrix or tensor is constructed, and then it is transferred back to the GPU.\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.vec_mat_vec_batched","page":"Utilities","title":"AutoEncoderToolkit.utils.vec_mat_vec_batched","text":"vec_mat_vec_batched(\n v::AbstractVector, \n M::AbstractMatrix, \n w::AbstractVector\n)\n\nCompute the product of a vector, a matrix, and another vector in the form v̲ᵀ M̲̲ w̲.\n\nThis function takes two vectors v and w, and a matrix M, and computes the product v̲ M̲̲ w̲. This function is added for consistency when calling multiple dispatch.\n\nArguments\n\nv::AbstractVector: A d dimensional vector.\nM::AbstractMatrix: A d×d matrix.\nw::AbstractVector: A d dimensional vector.\n\nReturns\n\nA scalar which is the result of the product v̲ M̲̲ w̲ for the corresponding vectors and matrix.\n\nNotes\n\nThis function uses the LinearAlgebra.dot function to perform the multiplication of the matrix M with the vector w. The resulting vector is then element-wise multiplied with the vector v and summed over the dimensions to obtain the final result. This function is added for consistency when calling multiple dispatch.\n\n\n\n\n\nvec_mat_vec_batched(\n v::AbstractMatrix, \n M::AbstractArray, \n w::AbstractMatrix\n)\n\nCompute the batched product of vectors and matrices in the form v̲ᵀ M̲̲ w̲.\n\nThis function takes two matrices v and w, and a 3D array M, and computes the batched product v̲ M̲̲ w̲. 
The computation is performed in a broadcasted manner using the Flux.batched_vec function.\n\nArguments\n\nv::AbstractMatrix: A d×n matrix, where d is the dimension of the vectors and n is the number of vectors.\nM::AbstractArray: A d×d×n array, where d is the dimension of the matrices and n is the number of matrices.\nw::AbstractMatrix: A d×n matrix, where d is the dimension of the vectors and n is the number of vectors.\n\nReturns\n\nAn n dimensional array where each element is the result of the product v̲ M̲̲ w̲ for the corresponding vectors and matrix.\n\nNotes\n\nThis function uses the Flux.batched_vec function to perform the batched multiplication of the matrices in M with the vectors in w. The resulting vectors are then element-wise multiplied with the vectors in v and summed over the dimensions to obtain the final result.\n\n\n\n\n\nvec_mat_vec_batched(\n v::AbstractVector{T}, \n M::AbstractMatrix{S}, \n w::AbstractVector{T}\n) where {T<:TaylorDiff.TaylorScalar{Float32,2},S<:Number}\n\nCompute the product of a vector and a matrix in the form v̲ᵀ M̲ w̲ for a specific type of matrix and vectors.\n\nThis function takes two vectors v and w of type TaylorDiff.TaylorScalar{Float32,2}, and a matrix M of type Number, and computes the product v̲ M̲ w̲. The computation is performed by first performing the matrix-vector multiplication M̲ w̲, and then computing the dot product of the resulting vector with v.\n\nArguments\n\nv::AbstractVector{T}: A d dimensional vector. T is a subtype of TaylorDiff.TaylorScalar{Float32,2}.\nM::AbstractMatrix{S}: A d×d matrix. S is a subtype of Number.\nw::AbstractVector{T}: A d dimensional vector. 
T is a subtype of TaylorDiff.TaylorScalar{Float32,2}.\n\nReturns\n\nA scalar which is the result of the product v̲ M̲ w̲.\n\nNotes\n\nThis function uses the dot function to compute the final dot product.\n\n\n\n\n\nvec_mat_vec_batched(\n v::AbstractMatrix{T}, \n M::AbstractArray{S,3}, \n w::AbstractMatrix{T}\n) where {T<:TaylorDiff.TaylorScalar{Float32,2},S<:Number}\n\nCompute the batched product of vectors and matrices in the form v̲ᵀ M̲̲ w̲ for a specific type of matrices and vectors.\n\nThis function takes two matrices v and w of type TaylorDiff.TaylorScalar{Float32,2}, and a 3D array M of type Number, and computes the batched product v̲ M̲̲ w̲. The computation is performed by first extracting each slice of M and each column of w, then performing the vector-matrix multiplication for each pair of slices, and finally computing the element-wise multiplication of the resulting matrix with v and summing over the dimensions.\n\nArguments\n\nv::AbstractMatrix{T}: A d×n matrix, where d is the dimension of the vectors and n is the number of vectors. T is a subtype of TaylorDiff.TaylorScalar{Float32,2}.\nM::AbstractArray{S,3}: A d×d×n array, where d is the dimension of the matrices and n is the number of matrices. S is a subtype of Number.\nw::AbstractMatrix{T}: A d×n matrix, where d is the dimension of the vectors and n is the number of vectors. T is a subtype of TaylorDiff.TaylorScalar{Float32,2}.\n\nReturns\n\nAn n dimensional array where each element is the result of the product v̲ M̲̲ w̲ for the corresponding vectors and matrix.\n\nNotes\n\nThis function uses the eachslice and eachcol functions to extract the slices of M and the columns of w, respectively. 
It then uses a list comprehension to perform the vector-matrix multiplication for each pair of slices, and finally computes the element-wise multiplication of the resulting matrix with v and sums over the dimensions to obtain the final result.\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.slogdet","page":"Utilities","title":"AutoEncoderToolkit.utils.slogdet","text":"slogdet(A::AbstractArray{T}; check::Bool=false) where {T<:Number}\n\nCompute the log determinant of a positive-definite matrix A or a 3D array of such matrices.\n\nArguments\n\nA::AbstractArray{T}: A positive-definite matrix or a 3D array of positive-definite matrices whose log determinant is to be computed. \ncheck::Bool=false: A flag that determines whether to check if the input matrix A is positive-definite. Defaults to false due to numerical instability.\n\nReturns\n\nThe log determinant of A. If A is a 3D array, returns a 1D array of log determinants, one for each slice along the third dimension of A.\n\nDescription\n\nThis function computes the log determinant of a positive-definite matrix A or a 3D array of such matrices. It first computes the Cholesky decomposition of A, and then calculates the log determinant as twice the sum of the log of the diagonal elements of the lower triangular matrix from the Cholesky decomposition.\n\nConditions\n\nThe input matrix A must be a positive-definite matrix, i.e., it must be symmetric and all its eigenvalues must be positive. If check is set to true, the function will throw an error if A is not positive-definite.\n\nGPU Support\n\nThe function supports both CPU and GPU arrays. 
\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.sample_MvNormalCanon","page":"Utilities","title":"AutoEncoderToolkit.utils.sample_MvNormalCanon","text":"sample_MvNormalCanon(Σ⁻¹::AbstractArray{T}) where {T<:Number}\n\nDraw a random sample from a multivariate normal distribution in canonical form.\n\nArguments\n\nΣ⁻¹::AbstractArray{T}: The precision matrix (inverse of the covariance matrix) of the multivariate normal distribution. This can be a 2D array (matrix) or a 3D array.\n\nReturns\n\nA random sample drawn from the multivariate normal distribution specified by the input precision matrix. If Σ⁻¹ is a 3D array, returns a 2D array of samples, one for each slice along the third dimension of Σ⁻¹.\n\nDescription\n\nThis function draws a random sample from a multivariate normal distribution specified by a precision matrix Σ⁻¹. The precision matrix can be a 2D array (matrix) or a 3D array. If Σ⁻¹ is a 3D array, the function draws a sample for each slice along the third dimension of Σ⁻¹.\n\nThe function first inverts the precision matrix to obtain the covariance matrix, then performs a Cholesky decomposition of the covariance matrix. 
It then draws a sample from a standard normal distribution and multiplies it by the lower triangular matrix from the Cholesky decomposition to obtain the final sample.\n\nGPU Support\n\nThe function supports both CPU and GPU arrays.\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.unit_vector","page":"Utilities","title":"AutoEncoderToolkit.utils.unit_vector","text":"unit_vector(x::AbstractVector, i::Int)\n\nCreate a unit vector of the same length as x with the i-th element set to 1.\n\nArguments\n\nx::AbstractVector: The vector whose length is used to determine the dimension of the unit vector.\ni::Int: The index of the element to be set to 1.\n\nReturns\n\nA unit vector of type eltype(x) and length equal to x with the i-th element set to 1.\n\nDescription\n\nThis function creates a unit vector of the same length as x with the i-th element set to 1. All other elements are set to 0.\n\nNote\n\nThis function is marked with the @ignore_derivatives macro from the ChainRulesCore package, which means that all AutoDiff backends will ignore any call to this function when computing gradients.\n\n\n\n\n\nunit_vector(x::AbstractMatrix, i::Int)\n\nCreate a unit vector of the same length as the number of rows in x with the i-th element set to 1.\n\nArguments\n\nx::AbstractMatrix: The matrix whose number of rows is used to determine the dimension of the unit vector.\ni::Int: The index of the element to be set to 1.\n\nReturns\n\nA unit vector of type eltype(x) and length equal to the number of rows in x with the i-th element set to 1.\n\nDescription\n\nThis function creates a unit vector of the same length as the number of rows in x with the i-th element set to 1. All other elements are set to 0. 
\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.finite_difference_gradient","page":"Utilities","title":"AutoEncoderToolkit.utils.finite_difference_gradient","text":"finite_difference_gradient(\n f::Function,\n x::AbstractVecOrMat;\n fdtype::Symbol=:central\n)\n\nCompute the finite difference gradient of a function f at a point x.\n\nArguments\n\nf::Function: The function for which the gradient is to be computed. This function must return a scalar value.\nx::AbstractVecOrMat: The point at which the gradient is to be computed. Can be a vector or a matrix. If a matrix, each column represents a point where the function f is to be evaluated and the derivative computed.\n\nOptional Keyword Arguments\n\nfdtype::Symbol=:central: The finite difference type. It can be either :forward or :central. Defaults to :central.\n\nReturns\n\nA vector or a matrix representing the gradient of f at x, depending on the input type of x.\n\nDescription\n\nThis function computes the finite difference gradient of a function f at a point x. 
The gradient is a vector or a matrix where the i-th element is the partial derivative of f with respect to the i-th element of x.\n\nThe partial derivatives are computed using the forward or central difference formula, depending on the fdtype argument:\n\nForward difference formula: ∂f/∂xᵢ ≈ [f(x + ε * eᵢ) - f(x)] / ε\nCentral difference formula: ∂f/∂xᵢ ≈ [f(x + ε * eᵢ) - f(x - ε * eᵢ)] / (2ε)\n\nwhere ε is the step size and eᵢ is the i-th unit vector.\n\nGPU Support\n\nThis function supports both CPU and GPU arrays.\n\n\n\n\n\n","category":"function"},{"location":"utils/#AutoEncoderToolkit.utils.taylordiff_gradient","page":"Utilities","title":"AutoEncoderToolkit.utils.taylordiff_gradient","text":" taylordiff_gradient(\n f::Function,\n x::AbstractVecOrMat\n )\n\nCompute the gradient of a function f at a point x using Taylor series differentiation.\n\nArguments\n\nf::Function: The function for which the gradient is to be computed. This must be a scalar function.\nx::AbstractVecOrMat: The point at which the gradient is to be computed. Can be a vector or a matrix. If a matrix, each column represents a point where the function f is to be evaluated and the derivative computed.\n\nReturns\n\nA vector or a matrix representing the gradient of f at x, depending on the input type of x.\n\nDescription\n\nThis function computes the gradient of a function f at a point x using Taylor series differentiation. 
The gradient is a vector or a matrix where the i-th element or column is the partial derivative of f with respect to the i-th element of x.\n\nThe partial derivatives are computed using the TaylorDiff.derivative function.\n\nGPU Support\n\nThis function currently only supports CPU arrays.\n\n\n\n\n\n","category":"function"},{"location":"encoders/#encodersdecoders","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.jl provides a set of predefined encoders and decoders that can be used to define custom (variational) autoencoder architectures.","category":"page"},{"location":"encoders/#Encoders","page":"Encoders & Decoders","title":"Encoders","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"The tree structure of the encoder types looks like this (🧱 represents concrete types):","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AbstractEncoder\nAbstractDeterministicEncoder\nEncoder 🧱\nAbstractVariationalEncoder\nAbstractGaussianEncoder\nAbstractGaussianLinearEncoder\nJointGaussianEncoder 🧱\nAbstractGaussianLogEncoder\nJointGaussianLogEncoder 🧱","category":"page"},{"location":"encoders/#Encoder","page":"Encoders & Decoders","title":"Encoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.Encoder\nAutoEncoderToolkit.Encoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.Encoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.Encoder","text":"struct Encoder{E<:Union{Flux.Chain,Flux.Dense}} <: AbstractDeterministicEncoder\n\nDefault encoder function for deterministic autoencoders. 
The encoder network is used to map the input data directly into the latent space representation.\n\nFields\n\nencoder::Union{Flux.Chain,Flux.Dense}: The primary neural network used to process input data and map it into a latent space representation.\n\nExample\n\nenc = Encoder(Flux.Chain(Dense(784, 400, relu), Dense(400, 20)))\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.Encoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.Encoder","text":"(encoder::Encoder)(x)\n\nForward propagate the input x through the Encoder to obtain the encoded representation in the latent space.\n\nArguments\n\nx::Array: Input data to be encoded.\n\nReturns\n\nz: Encoded representation of the input data in the latent space.\n\nDescription\n\nThis method allows for a direct call on an instance of Encoder with the input data x. It runs the input through the encoder network and outputs the encoded representation in the latent space.\n\nExample\n\nenc = Encoder(...)\nz = enc(some_input)\n\nNote\n\nEnsure that the input x matches the expected dimensionality of the encoder's input layer.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#JointGaussianEncoder","page":"Encoders & Decoders","title":"JointGaussianEncoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.JointGaussianEncoder\nAutoEncoderToolkit.JointGaussianEncoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianEncoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianEncoder","text":"struct JointGaussianEncoder <: AbstractGaussianLinearEncoder\n\nEncoder function for variational autoencoders where the same encoder network is used to map to the latent space mean µ and standard deviation σ.\n\nFields\n\nencoder::Flux.Chain: The primary neural network used to process input data and map it into a latent space 
representation.\nµ::Flux.Dense: A dense layer mapping from the output of the encoder to the mean of the latent space.\nσ::Flux.Dense: A dense layer mapping from the output of the encoder to the standard deviation of the latent space.\n\nExample\n\nenc = JointGaussianEncoder(\n Flux.Chain(Dense(784, 400, relu)), Flux.Dense(400, 20), Flux.Dense(400, 20)\n)\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianEncoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianEncoder","text":" (encoder::JointGaussianEncoder)(x::AbstractArray)\n\nForward propagate the input x through the JointGaussianEncoder to obtain the mean (µ) and standard deviation (σ) of the latent space.\n\nArguments\n\nx::AbstractArray: Input data to be encoded.\n\nReturns\n\nA NamedTuple (µ=µ, σ=σ,) where:\nµ: Mean of the latent space after passing the input through the encoder and subsequently through the µ layer.\nσ: Standard deviation of the latent space after passing the input through the encoder and subsequently through the σ layer.\n\nDescription\n\nThis method allows for a direct call on an instance of JointGaussianEncoder with the input data x. 
It first runs the input through the encoder network, then maps the output of the last encoder layer to both the mean and standard deviation of the latent space.\n\nExample\n\nje = JointGaussianEncoder(...)\nµ, σ = je(some_input)\n\nNote\n\nEnsure that the input x matches the expected dimensionality of the encoder's input layer.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#JointGaussianLogEncoder","page":"Encoders & Decoders","title":"JointGaussianLogEncoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.JointGaussianLogEncoder\nAutoEncoderToolkit.JointGaussianLogEncoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianLogEncoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianLogEncoder","text":"struct JointGaussianLogEncoder <: AbstractGaussianLogEncoder\n\nDefault encoder function for variational autoencoders where the same encoder network is used to map to the latent space mean µ and log standard deviation logσ.\n\nFields\n\nencoder::Flux.Chain: The primary neural network used to process input data and map it into a latent space representation.\nµ::Union{Flux.Dense,Flux.Chain}: A dense layer or a chain of layers mapping from the output of the encoder to the mean of the latent space.\nlogσ::Union{Flux.Dense,Flux.Chain}: A dense layer or a chain of layers mapping from the output of the encoder to the log standard deviation of the latent space.\n\nExample\n\nenc = JointGaussianLogEncoder(\n Flux.Chain(Dense(784, 400, relu)), Flux.Dense(400, 20), Flux.Dense(400, 20)\n)\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianLogEncoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianLogEncoder","text":" (encoder::JointGaussianLogEncoder)(x)\n\nThis method forward propagates the input x through the JointGaussianLogEncoder 
to compute the mean (µ) and log standard deviation (logσ) of the latent space.\n\nArguments\n\nx::Array{Float32}: The input data to be encoded.\n\nReturns\n\nA NamedTuple (µ=µ, logσ=logσ,) where:\nµ: The mean of the latent space. This is computed by passing the input through the encoder and subsequently through the µ layer. \nlogσ: The log standard deviation of the latent space. This is computed by passing the input through the encoder and subsequently through the logσ layer.\n\nDescription\n\nThis method allows for a direct call on an instance of JointGaussianLogEncoder with the input data x. It first processes the input through the encoder network, then maps the output of the last encoder layer to both the mean and log standard deviation of the latent space.\n\nExample\n\nje = JointGaussianLogEncoder(...)\nµ, logσ = je(some_input)\n\nNote\n\nEnsure that the input x matches the expected dimensionality of the encoder's input layer.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#Decoders","page":"Encoders & Decoders","title":"Decoders","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"The tree structure of the decoder types looks like this (🧱 represents concrete types):","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AbstractDecoder\nAbstractDeterministicDecoder\nDecoder 🧱\nAbstractVariationalDecoder\nBernoulliDecoder 🧱\nCategoricalDecoder 🧱\nAbstractGaussianDecoder\nSimpleGaussianDecoder 🧱\nAbstractGaussianLinearDecoder\nJointGaussianDecoder 🧱\nSplitGaussianDecoder 🧱\nAbstractGaussianLogDecoder\nJointGaussianLogDecoder 🧱\nSplitGaussianLogDecoder 🧱","category":"page"},{"location":"encoders/#Decoder","page":"Encoders & Decoders","title":"Decoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & 
Decoders","text":"AutoEncoderToolkit.Decoder\nAutoEncoderToolkit.Decoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.Decoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.Decoder","text":"struct Decoder{D<:Flux.Chain} <: AbstractDeterministicDecoder\n\nDefault decoder function for deterministic autoencoders. The decoder network is used to map the latent space representation directly back to the original data space.\n\nFields\n\ndecoder::Flux.Chain: The primary neural network used to process the latent space representation and map it back to the data space.\n\nExample\n\ndec = Decoder(Flux.Chain(Dense(20, 400, relu), Dense(400, 784)))\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.Decoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.Decoder","text":"(decoder::Decoder)(z::AbstractArray)\n\nForward propagate the encoded representation z through the Decoder to obtain the reconstructed input data.\n\nArguments\n\nz::AbstractArray: Encoded representation in the latent space.\n\nReturns\n\nx_reconstructed: Reconstructed version of the original input data after decoding from the latent space.\n\nDescription\n\nThis method allows for a direct call on an instance of Decoder with the encoded data z. It runs the encoded representation through the decoder network and outputs the reconstructed version of the original input data.\n\nExample\n\ndec = Decoder(...)\n
x_reconstructed = dec(encoded_representation)\n\nNote\n\nEnsure that the input z matches the expected dimensionality of the decoder's input layer.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#BernoulliDecoder","page":"Encoders & Decoders","title":"BernoulliDecoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.BernoulliDecoder\nAutoEncoderToolkit.BernoulliDecoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.BernoulliDecoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.BernoulliDecoder","text":" BernoulliDecoder{D<:Flux.Chain} <: AbstractVariationalDecoder\n\nA decoder structure for variational autoencoders (VAEs) that models the output data as a Bernoulli distribution. This is typically used when the outputs of the decoder are probabilities.\n\nFields\n\ndecoder::Flux.Chain: The primary neural network used to process the latent space and map it to the output (or reconstructed) space.\n\nDescription\n\nBernoulliDecoder represents a VAE decoder that models the output data as a Bernoulli distribution. It's commonly used when the outputs of the decoder are probabilities, such as in a binary classification task or when modeling binary data. Unlike a Gaussian decoder, there's no need for separate paths or operations on the mean or log standard deviation.\n\nNote\n\nEnsure the last layer of the decoder outputs a value between 0 and 1, as this is required for a Bernoulli distribution.\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.BernoulliDecoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.BernoulliDecoder","text":" (decoder::BernoulliDecoder)(z::AbstractArray)\n\nMaps the given latent representation z through the BernoulliDecoder network to reconstruct the original input.\n\nArguments\n\nz::AbstractArray: The latent space representation to be decoded. 
This can be a vector or a matrix, where each column represents a separate sample from the latent space of a VAE.\n\nReturns\n\nA NamedTuple (p=p,) where p is an array representing the output of the decoder, which should resemble the original input to the VAE (post encoding and sampling from the latent space).\n\nDescription\n\nThis function processes the latent space representation z using the neural network defined in the BernoulliDecoder struct. The aim is to decode or reconstruct the original input from this representation.\n\nNote\n\nEnsure that the latent space representation z matches the expected input dimensionality for the BernoulliDecoder.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#CategoricalDecoder","page":"Encoders & Decoders","title":"CategoricalDecoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.CategoricalDecoder\nAutoEncoderToolkit.CategoricalDecoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.CategoricalDecoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.CategoricalDecoder","text":"CategoricalDecoder{D<:Flux.Chain} <: AbstractVariationalDecoder\n\nA decoder structure for variational autoencoders (VAEs) that models the output data as a categorical distribution. This is typically used when the outputs of the decoder are categorical variables encoded as one-hot vectors.\n\nFields\n\ndecoder::Flux.Chain: The primary neural network used to process the latent space and map it to the output (or reconstructed) space.\n\nDescription\n\nCategoricalDecoder represents a VAE decoder that models the output data as a categorical distribution. It's commonly used when the outputs of the decoder are categorical variables, such as multi-class one-hot encoded vectors. 
Unlike a Gaussian decoder, there's no need for separate paths or operations on the mean or log standard deviation.\n\nNote\n\nEnsure the last layer of the decoder outputs a probability distribution over the categories, as this is required for a categorical distribution. This can be done using a softmax activation function, for example.\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.CategoricalDecoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.CategoricalDecoder","text":"(decoder::CategoricalDecoder)(z::AbstractArray)\n\nMaps the given latent representation z through the CategoricalDecoder network to reconstruct the original input.\n\nArguments\n\nz::AbstractArray: The latent space representation to be decoded. This can be a vector or a matrix, where each column represents a separate sample from the latent space of a VAE.\n\nReturns\n\nA NamedTuple (p=p,) where p is an array representing the output of the decoder, which should resemble the original input to the VAE (post encoding and sampling from the latent space).\n\nDescription\n\nThis function processes the latent space representation z using the neural network defined in the CategoricalDecoder struct. 
The aim is to decode or reconstruct the original input from this representation.\n\nNote\n\nEnsure that the latent space representation z matches the expected input dimensionality for the CategoricalDecoder.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#SimpleGaussianDecoder","page":"Encoders & Decoders","title":"SimpleGaussianDecoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.SimpleGaussianDecoder\nAutoEncoderToolkit.SimpleGaussianDecoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.SimpleGaussianDecoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.SimpleGaussianDecoder","text":"SimpleGaussianDecoder{D} <: AbstractGaussianDecoder\n\nA straightforward decoder structure for variational autoencoders (VAEs) that contains only a single decoder network.\n\nFields\n\ndecoder::Flux.Chain: The primary neural network used to process the latent space and map it to the output (or reconstructed) space.\n\nDescription\n\nSimpleGaussianDecoder represents a basic VAE decoder without explicit components for the latent space's mean (µ) or log standard deviation (logσ). It's commonly used when the VAE's latent space distribution is implicitly defined, and there's no need for separate paths or operations on the mean or log standard deviation.\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.SimpleGaussianDecoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.SimpleGaussianDecoder","text":"(decoder::SimpleGaussianDecoder)(z::AbstractVecOrMat)\n\nMaps the given latent representation z through the SimpleGaussianDecoder network to reconstruct the original input.\n\nArguments\n\nz::AbstractArray: The latent space representation to be decoded. 
This can be a vector or a matrix, where each column represents a separate sample from the latent space of a VAE.\n\nReturns\n\nA NamedTuple (µ=µ,) where µ is an array representing the output of the decoder, which should resemble the original input to the VAE (post encoding and sampling from the latent space).\n\nDescription\n\nThis function processes the latent space representation z using the neural network defined in the SimpleGaussianDecoder struct. The aim is to decode or reconstruct the original input from this representation.\n\nExample\n\ndecoder = SimpleGaussianDecoder(...)\nz = ... # some latent space representation\noutput = decoder(z)\n\nNote\n\nEnsure that the latent space representation z matches the expected input dimensionality for the SimpleGaussianDecoder.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#JointGaussianDecoder","page":"Encoders & Decoders","title":"JointGaussianDecoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.JointGaussianDecoder\nAutoEncoderToolkit.JointGaussianDecoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianDecoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianDecoder","text":"JointGaussianDecoder{D<:Flux.Chain,L<:Flux.Dense} <: AbstractGaussianLinearDecoder\n\nAn extended decoder structure for VAEs that incorporates separate layers for mapping from the latent space to both its mean (µ) and standard deviation (σ).\n\nFields\n\ndecoder::Flux.Chain: The primary neural network used to process the latent space before determining its mean and standard deviation.\nµ::Flux.Dense: A dense layer that maps from the output of the decoder to the mean of the latent space.\nσ::Flux.Dense: A dense layer that maps from the output of the decoder to the standard deviation of the latent space.\n\nDescription\n\nJointGaussianDecoder is tailored for VAE 
architectures where the same decoder network is used initially, and then splits into two separate paths for determining both the mean and standard deviation of the latent space.\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianDecoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianDecoder","text":" (decoder::JointGaussianDecoder)(z::AbstractArray)\n\nMaps the given latent representation z through the JointGaussianDecoder network to produce both the mean (µ) and standard deviation (σ).\n\nArguments\n\nz::AbstractArray: The latent space representation to be decoded. If array, the last dimension contains each of the latent space representations to be decoded.\n\nReturns\n\nA NamedTuple (µ=µ, σ=σ,) where:\nµ::AbstractArray: The mean representation obtained from the decoder.\nσ::AbstractArray: The standard deviation representation obtained from the decoder.\n\nDescription\n\nThis function processes the latent space representation z using the primary neural network of the JointGaussianDecoder struct. It then separately maps the output of this network to the mean and standard deviation using the µ and σ dense layers, respectively.\n\nExample\n\ndecoder = JointGaussianDecoder(...)\nz = ... 
# some latent space representation\noutput = decoder(z)\n\nNote\n\nEnsure that the latent space representation z matches the expected input dimensionality for the JointGaussianDecoder.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#JointGaussianLogDecoder","page":"Encoders & Decoders","title":"JointGaussianLogDecoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.JointGaussianLogDecoder\nAutoEncoderToolkit.JointGaussianLogDecoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianLogDecoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianLogDecoder","text":"JointGaussianLogDecoder{D<:Flux.Chain,L<:Flux.Dense} <: AbstractGaussianLogDecoder\n\nAn extended decoder structure for VAEs that incorporates separate layers for mapping from the latent space to both its mean (µ) and log standard deviation (logσ).\n\nFields\n\ndecoder::Flux.Chain: The primary neural network used to process the latent space before determining its mean and log standard deviation.\nµ::Flux.Dense: A dense layer that maps from the output of the decoder to the mean of the latent space.\nlogσ::Flux.Dense: A dense layer that maps from the output of the decoder to the log standard deviation of the latent space.\n\nDescription\n\nJointGaussianLogDecoder is tailored for VAE architectures where the same decoder network is used initially, and then splits into two separate paths for determining both the mean and log standard deviation of the latent space.\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianLogDecoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianLogDecoder","text":" (decoder::JointGaussianLogDecoder)(z::AbstractArray)\n\nMaps the given latent representation z through the JointGaussianLogDecoder network to produce both the mean (µ) and log 
standard deviation (logσ).\n\nArguments\n\nz::AbstractArray: The latent space representation to be decoded. If array, the last dimension contains each of the latent space representations.\n\nReturns\n\nA NamedTuple (µ=µ, logσ=logσ,) where:\nµ::Array: The mean representation obtained from the decoder.\nlogσ::Array: The log standard deviation representation obtained from the decoder.\n\nDescription\n\nThis function processes the latent space representation z using the primary neural network of the JointGaussianLogDecoder struct. It then separately maps the output of this network to the mean and log standard deviation using the µ and logσ dense layers, respectively.\n\nExample\n\ndecoder = JointGaussianLogDecoder(...)\nz = ... # some latent space representation\noutput = decoder(z)\n\nNote\n\nEnsure that the latent space representation z matches the expected input dimensionality for the JointGaussianLogDecoder.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#SplitGaussianDecoder","page":"Encoders & Decoders","title":"SplitGaussianDecoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.SplitGaussianDecoder\nAutoEncoderToolkit.SplitGaussianDecoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.SplitGaussianDecoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.SplitGaussianDecoder","text":"SplitGaussianDecoder{D<:Flux.Chain} <: AbstractGaussianLinearDecoder\n\nA specialized decoder structure for VAEs that uses distinct neural networks for determining the mean (µ) and standard deviation (σ) of the latent space.\n\nFields\n\ndecoder_µ::Flux.Chain: A neural network dedicated to processing the latent space and mapping it to its mean.\ndecoder_σ::Flux.Chain: A neural network dedicated to processing the latent space and mapping it to its standard deviation.\n\nDescription\n\nSplitGaussianDecoder is designed for VAE architectures 
where separate decoder networks are preferred for computing the mean and standard deviation, ensuring that each has its own distinct set of parameters and transformation logic.\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.SplitGaussianDecoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.SplitGaussianDecoder","text":" (decoder::SplitGaussianDecoder)(z::AbstractArray)\n\nMaps the given latent representation z through the separate networks of the SplitGaussianDecoder to produce both the mean (µ) and standard deviation (σ).\n\nArguments\n\nz::AbstractArray: The latent space representation to be decoded. If array, the last dimension contains each of the latent space representations to be decoded.\n\nReturns\n\nA NamedTuple (µ=µ, σ=σ,) where:\nµ::AbstractArray: The mean representation obtained using the dedicated decoder_µ network.\nσ::AbstractArray: The standard deviation representation obtained using the dedicated decoder_σ network.\n\nDescription\n\nThis function processes the latent space representation z through two distinct neural networks within the SplitGaussianDecoder struct. The decoder_µ network is used to produce the mean representation, while the decoder_σ network is utilized for the standard deviation.\n\nExample\n\ndecoder = SplitGaussianDecoder(...)\nz = ... 
# some latent space representation\noutput = decoder(z)\n\nNote\n\nEnsure that the latent space representation z matches the expected input dimensionality for both networks in the SplitGaussianDecoder.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#SplitGaussianLogDecoder","page":"Encoders & Decoders","title":"SplitGaussianLogDecoder","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.SplitGaussianLogDecoder\nAutoEncoderToolkit.SplitGaussianLogDecoder(::AbstractArray)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.SplitGaussianLogDecoder","page":"Encoders & Decoders","title":"AutoEncoderToolkit.SplitGaussianLogDecoder","text":"SplitGaussianLogDecoder{D<:Flux.Chain} <: AbstractGaussianLogDecoder\n\nA specialized decoder structure for VAEs that uses distinct neural networks for determining the mean (µ) and log standard deviation (logσ) of the latent space.\n\nFields\n\ndecoder_µ::Flux.Chain: A neural network dedicated to processing the latent space and mapping it to its mean.\ndecoder_logσ::Flux.Chain: A neural network dedicated to processing the latent space and mapping it to its log standard deviation.\n\nDescription\n\nSplitGaussianLogDecoder is designed for VAE architectures where separate decoder networks are preferred for computing the mean and log standard deviation, ensuring that each has its own distinct set of parameters and transformation logic.\n\n\n\n\n\n","category":"type"},{"location":"encoders/#AutoEncoderToolkit.SplitGaussianLogDecoder-Tuple{AbstractArray}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.SplitGaussianLogDecoder","text":" (decoder::SplitGaussianLogDecoder)(z::AbstractArray)\n\nMaps the given latent representation z through the separate networks of the SplitGaussianLogDecoder to produce both the mean (µ) and log standard deviation (logσ).\n\nArguments\n\nz::AbstractArray: The latent space representation to be 
decoded. If array, the last dimension contains each of the latent space representations to be decoded.\n\nReturns\n\nA NamedTuple (µ=µ, logσ=logσ,) where:\nµ::AbstractArray: The mean representation obtained using the dedicated decoder_µ network.\nlogσ::AbstractArray: The log standard deviation representation obtained using the dedicated decoder_logσ network.\n\nDescription\n\nThis function processes the latent space representation z through two distinct neural networks within the SplitGaussianLogDecoder struct. The decoder_µ network is used to produce the mean representation, while the decoder_logσ network is utilized for the log standard deviation.\n\nExample\n\ndecoder = SplitGaussianLogDecoder(...)\nz = ... # some latent space representation\noutput = decoder(z)\n\nNote\n\nEnsure that the latent space representation z matches the expected input dimensionality for both networks in the SplitGaussianLogDecoder.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#Default-initializations","page":"Encoders & Decoders","title":"Default initializations","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"The package provides a set of functions to initialize encoder and decoder architectures. 
Although they give the user less flexibility, they can be useful for quick prototyping.","category":"page"},{"location":"encoders/#Encoder-initializations","page":"Encoders & Decoders","title":"Encoder initializations","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.Encoder(\n ::Int, ::Int, ::Vector{<:Int}, ::Vector{<:Function}, ::Function\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.Encoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.Encoder","text":"Encoder(n_input, n_latent, encoder_neurons, encoder_activation, \n latent_activation; init=Flux.glorot_uniform)\n\nConstruct and initialize an Encoder struct that defines an encoder network for a deterministic autoencoder.\n\nArguments\n\nn_input::Int: The dimensionality of the input data.\nn_latent::Int: The dimensionality of the latent space.\nencoder_neurons::Vector{<:Int}: A vector specifying the number of neurons in each layer of the encoder network.\nencoder_activation::Vector{<:Function}: Activation functions corresponding to each layer in the encoder_neurons.\nlatent_activation::Function: Activation function for the latent space layer.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: The initialization function used for the neural network weights.\n\nReturns\n\nAn Encoder struct initialized based on the provided arguments.\n\nExamples\n\nencoder = Encoder(784, 20, [400], [relu], tanh)\n\nNotes\n\nThe length of encoder_neurons should match the length of encoder_activation, ensuring that each layer in the encoder has a corresponding activation function.\n\n\n\n\n\n","category":"method"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.JointGaussianLogEncoder( \n ::Int, \n ::Int, \n ::Vector{<:Int}, \n ::Vector{<:Function}, \n 
::Function;\n)\nAutoEncoderToolkit.JointGaussianLogEncoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Vector{<:Function};\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianLogEncoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianLogEncoder","text":"JointGaussianLogEncoder(n_input, n_latent, encoder_neurons, encoder_activation, \n latent_activation; init=Flux.glorot_uniform)\n\nConstruct and initialize a JointGaussianLogEncoder struct that defines an encoder network for a variational autoencoder.\n\nArguments\n\nn_input::Int: The dimensionality of the input data.\nn_latent::Int: The dimensionality of the latent space.\nencoder_neurons::Vector{<:Int}: A vector specifying the number of neurons in each layer of the encoder network.\nencoder_activation::Vector{<:Function}: Activation functions corresponding to each layer in the encoder_neurons.\nlatent_activation::Function: Activation function for the latent space layers (both µ and logσ).\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: The initialization function used for the neural network weights.\n\nReturns\n\nA JointGaussianLogEncoder struct initialized based on the provided arguments.\n\nExamples\n\nencoder = JointGaussianLogEncoder(784, 20, [400], [relu], tanh)\n\nNotes\n\nThe length of encoderneurons should match the length of encoderactivation, ensuring that each layer in the encoder has a corresponding activation function.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianLogEncoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Vector{<:Function}}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianLogEncoder","text":"JointGaussianLogEncoder(n_input, n_latent, encoder_neurons, encoder_activation, \n latent_activation; init=Flux.glorot_uniform)\n\nConstruct and initialize a 
JointGaussianLogEncoder struct that defines an encoder network for a variational autoencoder.\n\nArguments\n\nn_input::Int: The dimensionality of the input data.\nn_latent::Int: The dimensionality of the latent space.\nencoder_neurons::Vector{<:Int}: A vector specifying the number of neurons in each layer of the encoder network.\nencoder_activation::Vector{<:Function}: Activation functions corresponding to each layer in the encoder_neurons.\nlatent_activation::Vector{<:Function}: Activation functions for the latent space layers (both µ and logσ).\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: The initialization function used for the neural network weights.\n\nReturns\n\nA JointGaussianLogEncoder struct initialized based on the provided arguments.\n\nExamples\n\nencoder = JointGaussianLogEncoder(784, 20, [400], [relu], [tanh, identity])\n\nNotes\n\nThe length of encoder_neurons should match the length of encoder_activation, ensuring that each layer in the encoder has a corresponding activation function.\n\n\n\n\n\n","category":"method"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.JointGaussianEncoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Vector{<:Function};\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianEncoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Vector{<:Function}}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianEncoder","text":"JointGaussianEncoder(n_input, n_latent, encoder_neurons, encoder_activation, \n latent_activation; init=Flux.glorot_uniform)\n\nConstruct and initialize a JointGaussianEncoder struct that defines an encoder network for a variational autoencoder.\n\nArguments\n\nn_input::Int: The dimensionality of the input data.\nn_latent::Int: The dimensionality of the latent space.\nencoder_neurons::Vector{<:Int}: A vector specifying the number of neurons in each layer of the 
encoder network.\nencoder_activation::Vector{<:Function}: Activation functions corresponding to each layer in the encoder_neurons.\nlatent_activation::Vector{<:Function}: Activation functions for the latent space layers. This vector must contain the activations for both µ and σ.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: The initialization function used for the neural network weights.\n\nReturns\n\nA JointGaussianEncoder struct initialized based on the provided arguments.\n\nExamples\n\nencoder = JointGaussianEncoder(784, 20, [400], [relu], [tanh, softplus])\n\nNotes\n\nThe length of encoder_neurons should match the length of encoder_activation, ensuring that each layer in the encoder has a corresponding activation function.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#Decoder-initializations","page":"Encoders & Decoders","title":"Decoder initializations","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.Decoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Function;\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.Decoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.Decoder","text":"Decoder(n_input, n_latent, decoder_neurons, decoder_activation, \n output_activation; init=Flux.glorot_uniform)\n\nConstruct and initialize a Decoder struct that defines a decoder network for a deterministic autoencoder.\n\nArguments\n\nn_input::Int: The dimensionality of the output data (which typically matches the input data dimensionality of the autoencoder).\nn_latent::Int: The dimensionality of the latent space.\ndecoder_neurons::Vector{<:Int}: A vector specifying the number of neurons in each layer of the decoder network.\ndecoder_activation::Vector{<:Function}: Activation functions corresponding to each layer in the 
decoder_neurons.\noutput_activation::Function: Activation function for the final output layer.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: The initialization function used for the neural network weights.\n\nReturns\n\nA Decoder struct initialized based on the provided arguments.\n\nExamples\n\ndecoder = Decoder(784, 20, [400], [relu], sigmoid)\n\nNotes\n\nThe length of decoder_neurons should match the length of decoder_activation, ensuring that each layer in the decoder has a corresponding activation function.\n\n\n\n\n\n","category":"method"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.SimpleGaussianDecoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Function;\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.SimpleGaussianDecoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.SimpleGaussianDecoder","text":"SimpleGaussianDecoder(\n n_input, n_latent, decoder_neurons, \n decoder_activation, output_activation; \n init=Flux.glorot_uniform\n)\n\nConstructs and initializes a SimpleGaussianDecoder object designed for variational autoencoders (VAEs). 
This function sets up a straightforward decoder network that maps from a latent space to an output space.\n\nArguments\n\nn_input::Int: Dimensionality of the output data (or the data to be reconstructed).\nn_latent::Int: Dimensionality of the latent space.\ndecoder_neurons::Vector{<:Int}: Vector of layer sizes for the decoder network, not including the input latent layer and the final output layer.\ndecoder_activation::Vector{<:Function}: Activation functions for each decoder layer, not including the final output layer.\noutput_activation::Function: Activation function for the final output layer.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA SimpleGaussianDecoder object with the specified architecture and initialized weights.\n\nDescription\n\nThis function constructs a SimpleGaussianDecoder object, setting up its decoder network based on the provided specifications. The architecture begins with a dense layer mapping from the latent space, goes through a sequence of middle layers if specified, and finally maps to the output space.\n\nThe function ensures that there are appropriate activation functions provided for each layer in the decoder_neurons and checks for potential mismatches in length.\n\nExample\n\nn_input = 28*28\nn_latent = 64\ndecoder_neurons = [128, 256]\ndecoder_activation = [relu, relu]\noutput_activation = sigmoid\ndecoder = SimpleGaussianDecoder(\n n_input, n_latent, decoder_neurons, decoder_activation, output_activation\n)\n\nNote\n\nEnsure that the lengths of decoderneurons and decoderactivation match, excluding the output layer.\n\n\n\n\n\n","category":"method"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.JointGaussianLogDecoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Function;\n)\nAutoEncoderToolkit.JointGaussianLogDecoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n 
::Vector{<:Function},\n ::Vector{<:Function};\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianLogDecoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianLogDecoder","text":"JointGaussianLogDecoder(n_input, n_latent, decoder_neurons, decoder_activation, \n latent_activation; init=Flux.glorot_uniform)\n\nConstructs and initializes a JointGaussianLogDecoder object for variational autoencoders (VAEs). This function sets up a decoder network that first processes the latent space and then maps it separately to both its mean (µ) and log standard deviation (logσ).\n\nArguments\n\nn_input::Int: Dimensionality of the output data (or the data to be reconstructed).\nn_latent::Int: Dimensionality of the latent space.\ndecoder_neurons::Vector{<:Int}: Vector of layer sizes for the primary decoder network, not including the input latent layer.\ndecoder_activation::Vector{<:Function}: Activation functions for each primary decoder layer.\noutput_activation::Function: Activation function for the mean (µ) and log standard deviation (logσ) layers.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA JointGaussianLogDecoder object with the specified architecture and initialized weights.\n\nDescription\n\nThis function constructs a JointGaussianLogDecoder object, setting up its primary decoder network based on the provided specifications. The architecture begins with a dense layer mapping from the latent space and goes through a sequence of middle layers if specified. 
After processing the latent space through the primary decoder, it then maps separately to both its mean (µ) and log standard deviation (logσ).\n\nExample\n\nn_input = 28*28\nn_latent = 64\ndecoder_neurons = [128, 256]\ndecoder_activation = [relu, relu]\noutput_activation = tanh\ndecoder = JointGaussianLogDecoder(\n n_input, n_latent, decoder_neurons, decoder_activation, output_activation\n)\n\nNote\n\nEnsure that the lengths of decoderneurons and decoderactivation match.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianLogDecoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Vector{<:Function}}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianLogDecoder","text":"JointGaussianLogDecoder(n_input, n_latent, decoder_neurons, decoder_activation, \n latent_activation; init=Flux.glorot_uniform)\n\nConstructs and initializes a JointGaussianLogDecoder object for variational autoencoders (VAEs). This function sets up a decoder network that first processes the latent space and then maps it separately to both its mean (µ) and log standard deviation (logσ).\n\nArguments\n\nn_input::Int: Dimensionality of the output data (or the data to be reconstructed).\nn_latent::Int: Dimensionality of the latent space.\ndecoder_neurons::Vector{<:Int}: Vector of layer sizes for the primary decoder network, not including the input latent layer.\ndecoder_activation::Vector{<:Function}: Activation functions for each primary decoder layer.\noutput_activation::Vector{<:Function}: Activation functions for the mean (µ) and log standard deviation (logσ) layers.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA JointGaussianLogDecoder object with the specified architecture and initialized weights.\n\nDescription\n\nThis function constructs a JointGaussianLogDecoder object, setting up its primary decoder network based on the provided 
specifications. The architecture begins with a dense layer mapping from the latent space and goes through a sequence of middle layers if specified. After processing the latent space through the primary decoder, it then maps separately to both its mean (µ) and log standard deviation (logσ).\n\nExample\n\nn_input = 28*28\nn_latent = 64\ndecoder_neurons = [128, 256]\ndecoder_activation = [relu, relu]\noutput_activation = [tanh, identity]\ndecoder = JointGaussianLogDecoder(\n n_input, n_latent, decoder_neurons, decoder_activation, output_activation\n)\n\nNote\n\nEnsure that the lengths of decoder_neurons and decoder_activation match.\n\n\n\n\n\n","category":"method"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.JointGaussianDecoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Function;\n)\nAutoEncoderToolkit.JointGaussianDecoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Vector{<:Function};\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianDecoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianDecoder","text":"JointGaussianDecoder(n_input, n_latent, decoder_neurons, decoder_activation, \n latent_activation; init=Flux.glorot_uniform)\n\nConstructs and initializes a JointGaussianDecoder object for variational autoencoders (VAEs). 
This function sets up a decoder network that first processes the latent space and then maps it separately to both its mean (µ) and standard deviation (σ).\n\nArguments\n\nn_input::Int: Dimensionality of the output data (or the data to be reconstructed).\nn_latent::Int: Dimensionality of the latent space.\ndecoder_neurons::Vector{<:Int}: Vector of layer sizes for the primary decoder network, not including the input latent layer.\ndecoder_activation::Vector{<:Function}: Activation functions for each primary decoder layer.\noutput_activation::Function: Activation function for the mean (µ) and standard deviation (σ) layers.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA JointGaussianDecoder object with the specified architecture and initialized weights.\n\nDescription\n\nThis function constructs a JointGaussianDecoder object, setting up its primary decoder network based on the provided specifications. The architecture begins with a dense layer mapping from the latent space and goes through a sequence of middle layers if specified. 
After processing the latent space through the primary decoder, it then maps separately to both its mean (µ) and standard deviation (σ).\n\nExample\n\nn_input = 28*28\nn_latent = 64\ndecoder_neurons = [128, 256]\ndecoder_activation = [relu, relu]\noutput_activation = tanh\ndecoder = JointGaussianDecoder(\n n_input, n_latent, decoder_neurons, decoder_activation, output_activation\n)\n\nNote\n\nEnsure that the lengths of decoder_neurons and decoder_activation match.\n\n\n\n\n\n","category":"method"},{"location":"encoders/#AutoEncoderToolkit.JointGaussianDecoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Vector{<:Function}}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.JointGaussianDecoder","text":"JointGaussianDecoder(n_input, n_latent, decoder_neurons, decoder_activation, \n latent_activation; init=Flux.glorot_uniform)\n\nConstructs and initializes a JointGaussianDecoder object for variational autoencoders (VAEs). This function sets up a decoder network that first processes the latent space and then maps it separately to both its mean (µ) and standard deviation (σ).\n\nArguments\n\nn_input::Int: Dimensionality of the output data (or the data to be reconstructed).\nn_latent::Int: Dimensionality of the latent space.\ndecoder_neurons::Vector{<:Int}: Vector of layer sizes for the primary decoder network, not including the input latent layer.\ndecoder_activation::Vector{<:Function}: Activation functions for each primary decoder layer.\nlatent_activation::Vector{<:Function}: Activation functions for the mean (µ) and standard deviation (σ) layers.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA JointGaussianDecoder object with the specified architecture and initialized weights.\n\nDescription\n\nThis function constructs a JointGaussianDecoder object, setting up its primary decoder network based on the provided specifications. 
The architecture begins with a dense layer mapping from the latent space and goes through a sequence of middle layers if specified. After processing the latent space through the primary decoder, it then maps separately to both its mean (µ) and standard deviation (σ).\n\nExample\n\nn_input = 28*28\nn_latent = 64\ndecoder_neurons = [128, 256]\ndecoder_activation = [relu, relu]\nlatent_activation = [tanh, softplus]\ndecoder = JointGaussianDecoder(\n n_input, n_latent, decoder_neurons, decoder_activation, latent_activation\n)\n\nNote\n\nEnsure that the lengths of decoderneurons and decoderactivation match.\n\n\n\n\n\n","category":"method"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.SplitGaussianLogDecoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Vector{<:Int},\n ::Vector{<:Function};\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.SplitGaussianLogDecoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Vector{<:Int64}, Vector{<:Function}}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.SplitGaussianLogDecoder","text":"SplitGaussianLogDecoder(n_input, n_latent, µ_neurons, µ_activation, logσ_neurons, \n logσ_activation; init=Flux.glorot_uniform)\n\nConstructs and initializes a SplitGaussianLogDecoder object for variational autoencoders (VAEs). 
This function sets up two distinct decoder networks, one dedicated for determining the mean (µ) and the other for the log standard deviation (logσ) of the latent space.\n\nArguments\n\nn_input::Int: Dimensionality of the output data (or the data to be reconstructed).\nn_latent::Int: Dimensionality of the latent space.\nµ_neurons::Vector{<:Int}: Vector of layer sizes for the µ decoder network, not including the input latent layer.\nµ_activation::Vector{<:Function}: Activation functions for each µ decoder layer.\nlogσ_neurons::Vector{<:Int}: Vector of layer sizes for the logσ decoder network, not including the input latent layer.\nlogσ_activation::Vector{<:Function}: Activation functions for each logσ decoder layer.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA SplitGaussianLogDecoder object with two distinct networks initialized with the specified architectures and weights.\n\nDescription\n\nThis function constructs a SplitGaussianLogDecoder object, setting up two separate decoder networks based on the provided specifications. 
Both networks, the first for the mean (µ) and the second for the log standard deviation (logσ), begin with a dense layer mapping from the latent space and go through a sequence of middle layers if specified.\n\nExample\n\nn_input = 28*28\nn_latent = 64\nµ_neurons = [128, 256]\nµ_activation = [relu, relu]\nlogσ_neurons = [128, 256]\nlogσ_activation = [relu, relu]\ndecoder = SplitGaussianLogDecoder(\n n_input, n_latent, µ_neurons, µ_activation, logσ_neurons, logσ_activation\n)\n\nNotes\n\nEnsure that the lengths of µ_neurons with µ_activation and logσ_neurons with logσ_activation match respectively.\nIf µ_neurons[end] or logσ_neurons[end] do not match n_input, the function automatically changes this number to match the right dimensionality.\n\n\n\n\n\n","category":"method"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.SplitGaussianDecoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Vector{<:Int},\n ::Vector{<:Function};\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.SplitGaussianDecoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Vector{<:Int64}, Vector{<:Function}}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.SplitGaussianDecoder","text":"SplitGaussianDecoder(n_input, n_latent, µ_neurons, µ_activation, σ_neurons, \n σ_activation; init=Flux.glorot_uniform)\n\nConstructs and initializes a SplitGaussianDecoder object for variational autoencoders (VAEs). 
This function sets up two distinct decoder networks, one dedicated for determining the mean (µ) and the other for the standard deviation (σ) of the latent space.\n\nArguments\n\nn_input::Int: Dimensionality of the output data (or the data to be reconstructed).\nn_latent::Int: Dimensionality of the latent space.\nµ_neurons::Vector{<:Int}: Vector of layer sizes for the µ decoder network, not including the input latent layer.\nµ_activation::Vector{<:Function}: Activation functions for each µ decoder layer.\nσ_neurons::Vector{<:Int}: Vector of layer sizes for the σ decoder network, not including the input latent layer.\nσ_activation::Vector{<:Function}: Activation functions for each σ decoder layer.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA SplitGaussianDecoder object with two distinct networks initialized with the specified architectures and weights.\n\nDescription\n\nThis function constructs a SplitGaussianDecoder object, setting up two separate decoder networks based on the provided specifications. Both networks, the first for the mean (µ) and the second for the standard deviation (σ), begin with a dense layer mapping from the latent space and go through a sequence of middle layers if specified.\n\nExample\n\nn_input = 28*28\nn_latent = 64\nµ_neurons = [128, 256]\nµ_activation = [relu, relu]\nσ_neurons = [128, 256]\nσ_activation = [relu, softplus]\ndecoder = SplitGaussianDecoder(\n n_input, n_latent, µ_neurons, µ_activation, σ_neurons, σ_activation\n)\n\nNotes\n\nEnsure that the lengths of µ_neurons with µ_activation and σ_neurons with σ_activation match respectively.\nIf µ_neurons[end] or σ_neurons[end] do not match n_input, the function automatically changes this number to match the right dimensionality.\nEnsure that σ_neurons[end] maps to a positive value. 
Activation functions such as softplus are needed to guarantee the positivity of the standard deviation.\n\n\n\n\n\n","category":"method"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.BernoulliDecoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Function;\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.BernoulliDecoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.BernoulliDecoder","text":" BernoulliDecoder(n_input, n_latent, decoder_neurons, decoder_activation, \n output_activation; init=Flux.glorot_uniform)\n\nConstructs and initializes a BernoulliDecoder object designed for variational autoencoders (VAEs). This function sets up a decoder network that maps from a latent space to an output space.\n\nArguments\n\nn_input::Int: Dimensionality of the output data (or the data to be reconstructed).\nn_latent::Int: Dimensionality of the latent space.\ndecoder_neurons::Vector{<:Int}: Vector of layer sizes for the decoder network, not including the input latent layer and the final output layer.\ndecoder_activation::Vector{<:Function}: Activation functions for each decoder layer, not including the final output layer.\noutput_activation::Function: Activation function for the final output layer.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA BernoulliDecoder object with the specified architecture and initialized weights.\n\nDescription\n\nThis function constructs a BernoulliDecoder object, setting up its decoder network based on the provided specifications. 
The architecture begins with a dense layer mapping from the latent space, goes through a sequence of middle layers if specified, and finally maps to the output space.\n\nThe function ensures that there are appropriate activation functions provided for each layer in the decoder_neurons and checks for potential mismatches in length.\n\nExample\n\nn_input = 28*28\nn_latent = 64\ndecoder_neurons = [128, 256]\ndecoder_activation = [relu, relu]\noutput_activation = sigmoid\ndecoder = BernoulliDecoder(\n n_input, \n n_latent, \n decoder_neurons, \n decoder_activation, \n output_activation\n)\n\nNote\n\nEnsure that the lengths of decoderneurons and decoderactivation match, excluding the output layer. Also, the output activation function should return values between 0 and 1, as the decoder models the output data as a Bernoulli distribution. \n\n\n\n\n\n","category":"method"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.CategoricalDecoder(\n ::AbstractVector{<:Int},\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Function;\n)\nAutoEncoderToolkit.CategoricalDecoder(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Function;\n)","category":"page"},{"location":"encoders/#AutoEncoderToolkit.CategoricalDecoder-Tuple{AbstractVector{<:Int64}, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.CategoricalDecoder","text":" CategoricalDecoder(\n size_input, n_latent, decoder_neurons, decoder_activation, \n output_activation; init=Flux.glorot_uniform\n )\n\nConstructs and initializes a CategoricalDecoder object designed for variational autoencoders (VAEs). 
This function sets up a decoder network that maps from a latent space to an output space.\n\nArguments\n\nsize_input::AbstractVector{<:Int}: Dimensionality of the output data (or the data to be reconstructed) in the form of a vector where each element represents the size of a dimension.\nn_latent::Int: Dimensionality of the latent space.\ndecoder_neurons::Vector{<:Int}: Vector of layer sizes for the decoder network, not including the input latent layer and the final output layer.\ndecoder_activation::Vector{<:Function}: Activation functions for each decoder layer, not including the final output layer.\noutput_activation::Function: Activation function for the final output layer.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA CategoricalDecoder object with the specified architecture and initialized weights.\n\nDescription\n\nThis function constructs a CategoricalDecoder object, setting up its decoder network based on the provided specifications. The architecture begins with a dense layer mapping from the latent space, goes through a sequence of middle layers if specified, and finally maps to the output space.\n\nThe function ensures that there are appropriate activation functions provided for each layer in the decoder_neurons and checks for potential mismatches in length.\n\nThe output layer uses the identity function as its activation function, and the output is reshaped to match the dimensions specified in size_input. The output_activation function is then applied over the first dimension of the reshaped output.\n\nNote\n\nEnsure that the lengths of decoderneurons and decoderactivation match, excluding the output layer. Also, the output activation function should return values that can be interpreted as probabilities, as the decoder models the output data as a categorical distribution. 
\n\n\n\n\n\n","category":"method"},{"location":"encoders/#AutoEncoderToolkit.CategoricalDecoder-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"Encoders & Decoders","title":"AutoEncoderToolkit.CategoricalDecoder","text":"CategoricalDecoder(\n n_input, n_latent, decoder_neurons, decoder_activation,\n output_activation; init=Flux.glorot_uniform\n)\n\nConstructs and initializes a CategoricalDecoder object designed for variational autoencoders (VAEs). This function sets up a decoder network that maps from a latent space to an output space.\n\nArguments\n\nn_input::Int: Dimensionality of the output data (or the data to be reconstructed).\nn_latent::Int: Dimensionality of the latent space.\ndecoder_neurons::Vector{<:Int}: Vector of layer sizes for the decoder network, not including the input latent layer and the final output layer.\ndecoder_activation::Vector{<:Function}: Activation functions for each decoder layer, not including the final output layer.\noutput_activation::Function: Activation function for the final output layer.\n\nOptional Keyword Arguments\n\ninit::Function=Flux.glorot_uniform: Initialization function for the network parameters.\n\nReturns\n\nA CategoricalDecoder object with the specified architecture and initialized weights.\n\nDescription\n\nThis function constructs a CategoricalDecoder object, setting up its decoder network based on the provided specifications. The architecture begins with a dense layer mapping from the latent space, goes through a sequence of middle layers if specified, and finally maps to the output space.\n\nThe function ensures that there are appropriate activation functions provided for each layer in the decoder_neurons and checks for potential mismatches in length.\n\nNote\n\nEnsure that the lengths of decoder_neurons and decoder_activation match, excluding the output layer. 
Also, the output activation function should return values that can be interpreted as probabilities, as the decoder models the output data as a categorical distribution. \n\n\n\n\n\n","category":"method"},{"location":"encoders/#Probabilistic-functions","page":"Encoders & Decoders","title":"Probabilistic functions","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"Given the probability-centered design of AutoEncoderToolkit.jl, each variational encoder and decoder has an associated probabilistic function used when computing the evidence lower bound (ELBO). The following functions are available:","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.encoder_logposterior","category":"page"},{"location":"encoders/#AutoEncoderToolkit.encoder_logposterior","page":"Encoders & Decoders","title":"AutoEncoderToolkit.encoder_logposterior","text":"encoder_logposterior(\n z::AbstractVector,\n encoder::AbstractGaussianLogEncoder,\n encoder_output::NamedTuple\n)\n\nComputes the log-posterior of the latent variable z given the encoder output under a Gaussian distribution with mean and standard deviation given by the encoder.\n\nArguments\n\nz::AbstractVector: The latent variable for which the log-posterior is to be computed.\nencoder::AbstractGaussianLogEncoder: The encoder of the VAE, which is not used in the computation of the log-posterior. This argument is only used to know which method to call.\nencoder_output::NamedTuple: The output of the encoder, which includes the mean and log standard deviation of the Gaussian distribution.\n\nReturns\n\nlogposterior::T: The computed log-posterior of the latent variable z given the encoder output.\n\nDescription\n\nThe function computes the log-posterior of the latent variable z given the encoder output under a Gaussian distribution. 
The mean and log standard deviation of the Gaussian distribution are extracted from the encoder_output. The standard deviation is then computed by exponentiating the log standard deviation. The log-posterior is computed using the formula for the log-posterior of a Gaussian distribution.\n\nNote\n\nEnsure the dimensions of z match the expected input dimensionality of the encoder.\n\n\n\n\n\nencoder_logposterior(\n z::AbstractMatrix,\n encoder::AbstractGaussianLogEncoder,\n encoder_output::NamedTuple\n)\n\nComputes the log-posterior of the latent variable z given the encoder output under a Gaussian distribution with mean and standard deviation given by the encoder.\n\nArguments\n\nz::AbstractMatrix: The latent variable for which the log-posterior is to be computed. Each column of z represents a different data point.\nencoder::AbstractGaussianLogEncoder: The encoder of the VAE, which is not used in the computation of the log-posterior. This argument is only used to know which method to call.\nencoder_output::NamedTuple: The output of the encoder, which includes the mean and log standard deviation of the Gaussian distribution.\n\nReturns\n\nlogposterior::Vector: The computed log-posterior of the latent variable z given the encoder output. Each element of the vector corresponds to a different data point.\n\nDescription\n\nThe function computes the log-posterior of the latent variable z given the encoder output under a Gaussian distribution. The mean and log standard deviation of the Gaussian distribution are extracted from the encoder_output. The standard deviation is then computed by exponentiating the log standard deviation. 
The log-posterior is computed using the formula for the log-posterior of a Gaussian distribution.\n\nNote\n\nEnsure the dimensions of z match the expected input dimensionality of the encoder.\n\n\n\n\n\nencoder_logposterior(\n z::AbstractVector,\n encoder::AbstractGaussianLogEncoder,\n encoder_output::NamedTuple,\n index::Int\n)\n\nComputes the log-posterior of the latent variable z for a single data point specified by index given the encoder output under a Gaussian distribution with mean and standard deviation given by the encoder.\n\nArguments\n\nz::AbstractVector: The latent variable for which the log-posterior is to be computed. \nencoder::AbstractGaussianLogEncoder: The encoder of the VAE, which is not used in the computation of the log-posterior. This argument is only used to know which method to call.\nencoder_output::NamedTuple: The output of the encoder, which includes the mean and log standard deviation of the Gaussian distribution for multiple data points.\nindex::Int: The index of the data point for which the log-posterior is to be computed.\n\nReturns\n\nlogposterior::Float32: The computed log-posterior of the latent variable z for the specified data point given the encoder output.\n\nDescription\n\nThe function computes the log-posterior of the latent variable z for a single data point specified by index given the encoder output under a Gaussian distribution. The mean and log standard deviation of the Gaussian distribution are extracted from the encoder_output for the specified data point. The standard deviation is then computed by exponentiating the log standard deviation. The log-posterior is computed using the formula for the log-posterior of a Gaussian distribution.\n\nNote\n\nEnsure the dimensions of z match the expected input dimensionality of the encoder. 
Also, ensure that index is a valid index for the data points in encoder_output.\n\n\n\n\n\n","category":"function"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.encoder_kl","category":"page"},{"location":"encoders/#AutoEncoderToolkit.encoder_kl","page":"Encoders & Decoders","title":"AutoEncoderToolkit.encoder_kl","text":"encoder_kl(\n encoder::AbstractGaussianLogEncoder,\n encoder_output::NamedTuple\n)\n\nCalculate the Kullback-Leibler (KL) divergence between the approximate posterior distribution and the prior distribution in a variational autoencoder with a Gaussian encoder.\n\nThe KL divergence for a Gaussian encoder with mean encoder_µ and log standard deviation encoder_logσ is computed against a standard Gaussian prior.\n\nArguments\n\nencoder::AbstractGaussianLogEncoder: Encoder network. This argument is not used in the computation of the KL divergence, but is included to allow for multiple encoder types to be used with the same function.\nencoder_output::NamedTuple: NamedTuple containing all the encoder outputs. It should have fields μ and logσ representing the mean and log standard deviation of the encoder's output.\n\nReturns\n\nkl_div::Union{Number, Vector}: The KL divergence for the entire batch of data points. If encoder_µ is a vector, kl_div is a scalar. If encoder_µ is a matrix, kl_div is a vector where each element corresponds to the KL divergence for a batch of data points.\n\nNote\n\nIt is assumed that the mapping from data space to latent parameters (encoder_µ and encoder_logσ) has been performed prior to calling this function. 
The encoder argument is provided to indicate the type of encoder network used, but it is not used within the function itself.\n\n\n\n\n\n","category":"function"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"AutoEncoderToolkit.spherical_logprior","category":"page"},{"location":"encoders/#AutoEncoderToolkit.spherical_logprior","page":"Encoders & Decoders","title":"AutoEncoderToolkit.spherical_logprior","text":"spherical_logprior(z::AbstractVector, σ::Real=1.0f0)\n\nComputes the log-prior of the latent variable z under a spherical Gaussian distribution with zero mean and standard deviation σ.\n\nArguments\n\nz::AbstractVector: The latent variable for which the log-prior is to be computed.\nσ::Real=1.0f0: The standard deviation of the spherical Gaussian distribution. Defaults to 1.0f0.\n\nReturns\n\nlogprior::T: The computed log-prior of the latent variable z.\n\nDescription\n\nThe function computes the log-prior of the latent variable z under a spherical Gaussian distribution with zero mean and standard deviation σ. The log-prior is computed using the formula for the log-prior of a Gaussian distribution.\n\nNote\n\nEnsure the dimension of z matches the expected dimensionality of the latent space.\n\n\n\n\n\nspherical_logprior(z::AbstractMatrix, σ::Real=1.0f0)\n\nComputes the log-prior of the latent variable z under a spherical Gaussian distribution with zero mean and standard deviation σ.\n\nArguments\n\nz::AbstractMatrix: The latent variable for which the log-prior is to be computed. Each column of z represents a different latent variable.\nσ::Real=1.0f0: The standard deviation of the spherical Gaussian distribution. Defaults to 1.0f0.\n\nReturns\n\nlogprior::T: The computed log-prior(s) of the latent variable z.\n\nDescription\n\nThe function computes the log-prior of the latent variable z under a spherical Gaussian distribution with zero mean and standard deviation σ. 
The log-prior is computed using the formula for the log-prior of a Gaussian distribution.\n\nNote\n\nEnsure the dimension of z matches the expected dimensionality of the latent space.\n\n\n\n\n\n","category":"function"},{"location":"encoders/#Defining-custom-encoder-and-decoder-types","page":"Encoders & Decoders","title":"Defining custom encoder and decoder types","text":"","category":"section"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"note: Note\nWe will omit all docstrings in the following examples for brevity. However, every struct and function in AutoEncoderToolkit.jl is well-documented.","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"Suppose your particular task requires a custom encoder or decoder type. For example, imagine that a particular application needs a decoder whose output distribution is Poisson. In other words, the assumption is that each dimension x_i of the input is a sample from a Poisson distribution with mean \\lambda_i. Thus, what the decoder returns is a vector of these \\lambda parameters, and we need to define a custom decoder type.","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"struct PoissonDecoder <: AbstractVariationalDecoder\n decoder::Flux.Chain\nend # struct","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"With this struct defined, we need to define the forward-pass function for our custom PoissonDecoder. All decoders in AutoEncoderToolkit.jl return a NamedTuple with the corresponding parameters of the distribution that defines them. In this case, the Poisson distribution is defined by a single parameter \\lambda. 
Thus, we have a forward-pass of the form","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"function (decoder::PoissonDecoder)(z::AbstractArray)\n # Run input to decoder network\n return (λ=decoder.decoder(z),)\nend # function","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"Next, we need to define the probabilistic function associated with this decoder. The probability of observing x_i given \\lambda_i is","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"P(x_i \\mid \\lambda_i) = \\frac{\\lambda_i^{x_i} e^{-\\lambda_i}}{x_i!}\n\\tag{1}","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"If each x_i is independent, then the probability of observing the entire input x given the entire output \\lambda is given by the product of the individual probabilities, i.e.","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"P(x \\mid \\lambda) = \\prod_i P(x_i \\mid \\lambda_i)\n\\tag{2}","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"The log-likelihood of the data given the output of the decoder is then given by","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"\\mathcal{L}(x \\mid \\lambda) = \\log P(x \\mid \\lambda) = \\sum_i \\log P(x_i \\mid \\lambda_i)\n\\tag{3}","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"which, by using the properties of the logarithm, can be written as","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"\\mathcal{L}(x \\mid \\lambda) = \\sum_i \\left[ x_i \\log \\lambda_i - \\lambda_i - \\log(x_i!) \\right]\n\\tag{4}","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & 
Decoders","text":"We can then define the probabilistic function associated with the PoissonDecoder as","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"function decoder_loglikelihood(\n x::AbstractArray,\n z::AbstractVector,\n decoder::PoissonDecoder,\n decoder_output::NamedTuple;\n)\n # Extract the lambda parameter of the Poisson distribution\n λ = decoder_output.λ\n\n # Compute log-likelihood\n loglikelihood = sum(x .* log.(λ) - λ - loggamma.(x .+ 1))\n\n return loglikelihood\nend # function","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"where we use the loggamma function from SpecialFunctions.jl to compute the log of the factorial of x_i.","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"warning: Warning\nWe only defined the decoder_loglikelihood method for z::AbstractVector. One should also include a method for z::AbstractMatrix used when performing batch training.","category":"page"},{"location":"encoders/","page":"Encoders & Decoders","title":"Encoders & Decoders","text":"With these two functions defined, our PoissonDecoder is ready to be used with any of the different VAE flavors included in AutoEncoderToolkit.jl!","category":"page"},{"location":"diffgeo/#Differential-Geometry-of-Generative-Models","page":"Differential Geometry","title":"Differential Geometry of Generative Models","text":"","category":"section"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"A lot of recent research in the field of generative models has focused on the geometry of the learned latent space (see the references at the end of this section for examples). The non-linear nature of neural networks makes it relevant to consider the non-Euclidean geometry of the latent space when trying to gain insights into the structure of the learned space. 
In other words, given that neural networks involve a series of non-linear transformations of the input data, we cannot expect the latent space to be Euclidean, and thus, we need to account for curvature and other non-Euclidean properties. For this, we can borrow concepts and tools from Riemannian geometry, now applied to the latent space of generative models.","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"AutoEncoderToolkit.jl aims to provide the set of necessary tools to study the geometry of the latent space in the context of generative models based on variational autoencoders.","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"note: Note\nThis is very much a work in progress. As always, contributions are welcome!","category":"page"},{"location":"diffgeo/#A-word-on-Riemannian-geometry","page":"Differential Geometry","title":"A word on Riemannian geometry","text":"","category":"section"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"In what follows we will give a very short primer on some relevant concepts in differential geometry. This includes some basic definitions along with what we consider intuitive explanations of them. We trade rigor for accessibility, so if you are looking for a more formal treatment, this is not the place.","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"note: Note\nThese notes are partially based on the 2022 paper by Chadebec et al. [2].","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"A d-dimensional manifold \\mathcal{M} is a space that is locally homeomorphic to a d-dimensional Euclidean space. 
This means that the manifold (some surface or high-dimensional shape), when observed from really close, can be stretched or bent, without tearing or gluing it, to make it resemble regular Euclidean space. ","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"If the manifold is differentiable, it possesses a tangent space T_z at any point z \\in \\mathcal{M} composed of the tangent vectors of the curves passing through z. ","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"(Image: )","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"If the manifold \\mathcal{M} is equipped with a smooth inner product, ","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"g: z \\rightarrow \\langle \\cdot \\mid \\cdot \\rangle_z\n\\tag{1}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"defined on the tangent space T_z for any z \\in \\mathcal{M}, then \\mathcal{M} is a Riemannian manifold and g is the associated Riemannian metric. With this, a local representation of g at any point z is given by the positive definite matrix \\mathbf{G}(z).","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"A chart (fancy name for a coordinate system) (U, \\phi) provides a homeomorphic mapping between an open set U of the manifold and an open set V of Euclidean space. This means that there is a way to bend and stretch any segment of the manifold to make it look like a segment of Euclidean space. Therefore, given a point z \\in U, a chart (its coordinates) \\phi: (z_1, z_2, \\ldots, z_d) induces a basis \\{\\partial_{z_1}, \\partial_{z_2}, \\ldots, \\partial_{z_d}\\} on the tangent space T_z \\mathcal{M}. 
In other words, the partial derivatives of the manifold with respect to the dimensions form a basis (think of \\hat{i}, \\hat{j}, \\hat{k} in 3D space) for the tangent space at that point. Hence, the metric (a \"position-dependent scale-bar\") of a Riemannian manifold can be locally represented in the chart \\phi as a positive definite matrix \\mathbf{G}(z) with components g_{ij}(z) of the form","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"g_{ij}(z) = \\langle \\partial_{z_i} \\mid \\partial_{z_j} \\rangle_z\n\\tag{2}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"This implies that for every pair of vectors v, w \\in T_z \\mathcal{M} and a point z \\in \\mathcal{M}, the inner product \\langle v \\mid w \\rangle_z is given by","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"\\langle v \\mid w \\rangle_z = v^T \\mathbf{G}(z) w\n\\tag{3}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"If \\mathcal{M} is connected (a continuous shape with no breaks), a Riemannian distance between two points z_1, z_2 \\in \\mathcal{M} can be defined as","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"\\text{dist}(z_1, z_2) = \\min_{\\gamma} \\int_0^1 dt\n\\sqrt{\\langle \\dot{\\gamma}(t) \\mid \\dot{\\gamma}(t) \\rangle_{\\gamma(t)}}\n\\tag{4}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"where \\gamma is a 1D curve traveling from z_1 to z_2, i.e., \\gamma(0) = z_1 and \\gamma(1) = z_2. 
Another way to state this is that the length of a curve \\gamma on the manifold is given by","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"L(\\gamma) = \\int_0^1 dt \n\\sqrt{\\langle \\dot{\\gamma}(t) \\mid \\dot{\\gamma}(t) \\rangle_{\\gamma(t)}}\n\\tag{5}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"If \\gamma minimizes the length L between the initial and final points, then \\gamma is a geodesic curve.","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"The concept of a geodesic is so important to the study of the Riemannian manifold learned by generative models that it is worth giving another intuitive explanation. Let us consider a curve \\gamma such that","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"\\gamma: [0, 1] \\rightarrow \\mathbb{R}^d\n\\tag{6}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"In words, \\gamma is a function that, without loss of generality, maps a number between zero and one to a point in the d-dimensional latent space (d being the dimensionality of our manifold). Let us define f to be a continuous function that embeds any point along the curve \\gamma into the data space, i.e.,","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"f: \\gamma(t) \\rightarrow x \\in \\mathbb{R}^n\n\\tag{7}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"where n is the dimensionality of the data space. 
","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"(Image: )","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"The length of this curve in the data space is given by","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"L(\\gamma) = \\int_0^1 dt\n\\left\\| \\frac{d f}{dt} \\right\\|_2\n\\tag{8}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"After some manipulation, we can show that the length of the curve in the data space is given by","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"L(\\gamma) = \\int_0^1 dt\n\\sqrt{\n \\dot{\\gamma}(t)^T \\mathbf{G}(\\gamma(t)) \\dot{\\gamma}(t)\n}\n\\tag{9}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"where \\dot{\\gamma}(t) is the derivative of \\gamma with respect to t, and T denotes the transpose of a vector. For a Euclidean space, the length of the curve would take the same functional form, except that the metric tensor would be given by the identity matrix. This is why the metric tensor can be thought of as a position-dependent scale-bar.","category":"page"},{"location":"diffgeo/#neuralgeodesic","page":"Differential Geometry","title":"Neural Geodesic Networks","text":"","category":"section"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"Computing a geodesic on a Riemannian manifold is a non-trivial task, especially when the manifold is parametrized by a neural network. Thus, finding the function \\gamma that minimizes the distance between two points z_1 and z_2 is not straightforward. However, as first suggested by Chen et al. [1], we can exploit the capacity of neural networks to approximate almost any function in order to approximate the geodesic curve. 
This is the idea behind the Neural Geodesic module in AutoEncoderToolkit.jl.","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"Briefly, to approximate the geodesic curve between two points z_1 and z_2 in latent space, we define a neural network g_\\omega such that","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"g_\\omega: \\mathbb{R} \\rightarrow \\mathbb{R}^d\n\\tag{10}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"i.e., the neural network takes a number between zero and one and maps it to a point in the d-dimensional latent space. The intention is to have g_\\omega \\approx \\gamma, where \\omega are the parameters of the neural network we are free to optimize.","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"We approximate the integral defining the length of the curve in the latent space with n equidistantly sampled points t_i between zero and one. The length of the curve is then approximated by","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"L(g_\\omega(t)) \\approx \\frac{1}{n} \\sum_{i=1}^n \n\\sqrt{\n \\dot{g}_\\omega(t_i)^T \\mathbf{G}(g_\\omega(t_i)) \\dot{g}_\\omega(t_i)\n}","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"By setting the loss function to be this approximation of the length of the curve, we can train the neural network to approximate the geodesic curve.","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"AutoEncoderToolkit.jl provides the NeuralGeodesic struct to implement this idea. 
The struct takes three inputs:","category":"page"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"The multi-layer perceptron (MLP) that approximates the geodesic curve.\nThe initial point in latent space.\nThe final point in latent space.","category":"page"},{"location":"diffgeo/#NeuralGeodesic-struct","page":"Differential Geometry","title":"NeuralGeodesic struct","text":"","category":"section"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.NeuralGeodesic","category":"page"},{"location":"diffgeo/#AutoEncoderToolkit.diffgeo.NeuralGeodesics.NeuralGeodesic","page":"Differential Geometry","title":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.NeuralGeodesic","text":"NeuralGeodesic\n\nType to define a neural network that approximates a geodesic curve on a Riemannian manifold. If a curve γ̲(t) represents a geodesic curve on a manifold, i.e.,\n\nL(γ̲) = min_γ ∫ dt √(⟨γ̲̇(t), M̲̲ γ̲̇(t)⟩),\n\nwhere M̲̲ is the Riemannian metric, then this type defines a neural network g_ω(t) such that\n\nγ̲(t) ≈ g_ω(t).\n\nThis neural network must have a single input (1D). The dimensionality of the output must match the dimensionality of the manifold.\n\nFields\n\nmlp::Flux.Chain: Neural network that approximates the geodesic curve. The dimensionality of the input must be one.\nz_init::AbstractVector: Initial position of the geodesic curve on the latent space.\nz_end::AbstractVector: Final position of the geodesic curve on the latent space.\n\nCitation\n\nChen, N. et al. Metrics for Deep Generative Models. 
in Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics 1540–1550 (PMLR, 2018).\n\n\n\n\n\n","category":"type"},{"location":"diffgeo/#NeuralGeodesic-forward-pass","page":"Differential Geometry","title":"NeuralGeodesic forward pass","text":"","category":"section"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.NeuralGeodesic(::AbstractVector)","category":"page"},{"location":"diffgeo/#AutoEncoderToolkit.diffgeo.NeuralGeodesics.NeuralGeodesic-Tuple{AbstractVector}","page":"Differential Geometry","title":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.NeuralGeodesic","text":" (g::NeuralGeodesic)(t::AbstractArray)\n\nComputes the output of the NeuralGeodesic at each given time in t by scaling and shifting the output of the neural network.\n\nArguments\n\nt::AbstractArray: An array of times at which the output of the NeuralGeodesic is to be computed. This must be within the interval [0, 1].\n\nReturns\n\noutput::Array: The computed output of the NeuralGeodesic at each time in t.\n\nDescription\n\nThe function computes the output of the NeuralGeodesic at each given time in t. The steps are:\n\nCompute the output of the neural network at each time in t.\nCompute the output of the neural network at time 0 and 1.\nCompute scale and shift parameters based on the initial and end points of the geodesic and the neural network outputs at times 0 and 1.\nScale and shift the output of the neural network at each time in t according to these parameters. 
The result is the output of the NeuralGeodesic at each time in t.\n\nScale and shift parameters are defined as:\n\nscale = (z_init - z_end) / (ẑ_init - ẑ_end)\nshift = (z_init * ẑ_end - z_end * ẑ_init) / (ẑ_init - ẑ_end)\n\nwhere z_init and z_end are the initial and end points of the geodesic, and ẑ_init and ẑ_end are the outputs of the neural network at times 0 and 1, respectively.\n\nNote\n\nEnsure that each t in the array is within the interval [0, 1].\n\n\n\n\n\n","category":"method"},{"location":"diffgeo/#NeuralGeodesic-loss-function","page":"Differential Geometry","title":"NeuralGeodesic loss function","text":"","category":"section"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.loss","category":"page"},{"location":"diffgeo/#AutoEncoderToolkit.diffgeo.NeuralGeodesics.loss","page":"Differential Geometry","title":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.loss","text":"loss(\n curve::NeuralGeodesic,\n rhvae::RHVAE,\n t::AbstractVector;\n curve_velocity::Function=curve_velocity_TaylorDiff,\n curve_integral::Function=curve_length,\n)\n\nFunction to compute the loss for a given curve on a Riemannian manifold. The loss is defined as the integral over the curve, computed using the provided curve_integral function (either length or energy).\n\nArguments\n\ncurve::NeuralGeodesic: The curve on the Riemannian manifold.\nrhvae::RHVAE: The Riemannian Hamiltonian Variational AutoEncoder used to compute the Riemannian metric tensor.\nt::AbstractVector: Vector of time points at which the curve is sampled.\n\nOptional Keyword Arguments\n\ncurve_velocity::Function=curve_velocity_TaylorDiff: Function to compute the velocity of the curve. Default is curve_velocity_TaylorDiff. Also accepts curve_velocity_finitediff.\ncurve_integral::Function=curve_length: Function to compute the integral over the curve. Default is curve_length. 
Also accepts curve_energy.\n\nReturns\n\nLoss::Number: The computed loss for the given curve.\n\nNotes\n\nThis function first computes the geodesic curve using the provided curve function. It then computes the Riemannian metric tensor using the metric_tensor function from the RHVAE module with the computed curve and the provided rhvae. The velocity of the curve is then computed using the provided curve_velocity function. Finally, the integral over the curve is computed using the provided curve_integral function and returned as the loss.\n\n\n\n\n\n","category":"function"},{"location":"diffgeo/#NeuralGeodesic-training","page":"Differential Geometry","title":"NeuralGeodesic training","text":"","category":"section"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.train!","category":"page"},{"location":"diffgeo/#AutoEncoderToolkit.diffgeo.NeuralGeodesics.train!","page":"Differential Geometry","title":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.train!","text":"train!(\n curve::NeuralGeodesic,\n rhvae::RHVAE,\n t::AbstractVector,\n opt::NamedTuple;\n loss::Function=loss,\n loss_kwargs::Dict=Dict(),\n verbose::Bool=false,\n loss_return::Bool=false,\n)\n\nFunction to train a NeuralGeodesic model using a Riemannian Hamiltonian Variational AutoEncoder (RHVAE). The training process involves computing the gradient of the loss function and updating the model parameters accordingly.\n\nArguments\n\ncurve::NeuralGeodesic: The curve on the Riemannian manifold.\nrhvae::RHVAE: The Riemannian Hamiltonian Variational AutoEncoder used to compute the Riemannian metric tensor.\nt::AbstractVector: Vector of time points at which the curve is sampled. These must be equally spaced.\nopt::NamedTuple: The optimization parameters.\n\nOptional Keyword Arguments\n\nloss_function::Function=loss: The loss function to be minimized during training. 
Default is loss.\nloss_kwargs::Dict=Dict(): Additional keyword arguments to be passed to the loss function.\nverbose::Bool=false: If true, the loss value is printed at each iteration.\nloss_return::Bool=false: If true, the function returns the loss value.\n\nReturns\n\nLoss::Number: The computed loss for the given curve. This is only returned if loss_return is true.\n\nNotes\n\nThis function first computes the gradient of the loss function with respect to the model parameters. It then updates the model parameters using the computed gradient and the provided optimization parameters. If verbose is true, the loss value is printed at each iteration. If loss_return is true, the function returns the loss value.\n\n\n\n\n\n","category":"function"},{"location":"diffgeo/#Other-functions-for-NeuralGeodesic","page":"Differential Geometry","title":"Other functions for NeuralGeodesic","text":"","category":"section"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_velocity_TaylorDiff\nAutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_velocity_finitediff\nAutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_length\nAutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_energy","category":"page"},{"location":"diffgeo/#AutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_velocity_TaylorDiff","page":"Differential Geometry","title":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_velocity_TaylorDiff","text":"curve_velocity_TaylorDiff(\n curve::NeuralGeodesic,\n t\n)\n\nCompute the velocity of a neural geodesic curve at a given time using Taylor differentiation.\n\nThis function takes a NeuralGeodesic instance and a time t, and computes the velocity of the curve at that time using Taylor differentiation. 
The computation is performed for each dimension of the latent space.\n\nArguments\n\ncurve::NeuralGeodesic: The neural geodesic curve.\nt: The time at which to compute the velocity.\n\nReturns\n\nA vector representing the velocity of the curve at time t.\n\nNotes\n\nThis function uses the TaylorDiff package to compute derivatives. Please note that TaylorDiff has limited support for certain activation functions. If you encounter an error while using this function, it may be due to the activation function used in your NeuralGeodesic instance.\n\n\n\n\n\ncurve_velocity_TaylorDiff(\n curve::NeuralGeodesic,\n t::AbstractVector\n)\n\nCompute the velocity of a neural geodesic curve at each time in a vector of times using Taylor differentiation.\n\nThis function takes a NeuralGeodesic instance and a vector of times t, and computes the velocity of the curve at each time using Taylor differentiation. The computation is performed for each dimension of the latent space and each time in t.\n\nArguments\n\ncurve::NeuralGeodesic: The neural geodesic curve.\nt::AbstractVector: The vector of times at which to compute the velocity.\n\nReturns\n\nA matrix where each column represents the velocity of the curve at a time in t.\n\nNotes\n\nThis function uses the TaylorDiff package to compute derivatives. Please note that TaylorDiff has limited support for certain activation functions. 
If you encounter an error while using this function, it may be due to the activation function used in your NeuralGeodesic instance.\n\n\n\n\n\n","category":"function"},{"location":"diffgeo/#AutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_velocity_finitediff","page":"Differential Geometry","title":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_velocity_finitediff","text":"curve_velocity_finitediff(\n curve::NeuralGeodesic,\n t::AbstractVector;\n fdtype::Symbol=:central,\n)\n\nCompute the velocity of a neural geodesic curve at each time in a vector of times using finite difference methods.\n\nThis function takes a NeuralGeodesic instance, a vector of times t, and an optional finite difference type fdtype (which can be either :forward or :central), and computes the velocity of the curve at each time using the specified finite difference method. The computation is performed for each dimension of the latent space and each time in t.\n\nArguments\n\ncurve::NeuralGeodesic: The neural geodesic curve.\nt::AbstractVector: The vector of times at which to compute the velocity.\nfdtype::Symbol=:central: The type of finite difference method to use. Can be either :forward or :central. Default is :central.\n\nReturns\n\nA matrix where each column represents the velocity of the curve at a time in t.\n\nNotes\n\nThis function uses finite difference methods to compute derivatives. 
Please note that the accuracy of the computed velocities depends on the chosen finite difference method and the step size used, which is determined by the machine epsilon of the type of t.\n\n\n\n\n\n","category":"function"},{"location":"diffgeo/#AutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_length","page":"Differential Geometry","title":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_length","text":"curve_length(\n riemannian_metric::AbstractArray,\n curve_velocity::AbstractArray,\n t::AbstractVector;\n)\n\nFunction to compute the (discretized) integral defining the length of a curve γ̲ on a Riemannian manifold. The length is defined as\n\nL(γ̲) = ∫ dt √(⟨γ̲̇(t), G̲̲ γ̲̇(t)⟩),\n\nwhere γ̲̇(t) defines the velocity of the parametric curve, and G̲̲ is the Riemannian metric tensor. For this function, we approximate the integral as\n\nL(γ̲) ≈ ∑ᵢ Δt √(⟨γ̲̇(tᵢ)ᵀ G̲̲(γ̲(tᵢ₊₁)) γ̲̇(tᵢ)⟩),\n\nwhere Δt is the time step between points. Note that this Δt is assumed to be constant; thus, the time points t must be equally spaced.\n\nArguments\n\nriemannian_metric::AbstractArray: d×d×N tensor where d is the dimension of the manifold on which the curve lies and N is the number of sampled time points along the curve. Each slice of the array represents the Riemannian metric tensor for the curve at the corresponding time point.\ncurve_velocity::AbstractArray: d×N Matrix where d is the dimension of the manifold on which the curve lies and N is the number of sampled time points along the curve. 
Each column represents the velocity of the curve at the corresponding time point.\nt::AbstractVector: Vector of time points at which the curve is sampled.\n\nReturns\n\nLength::Number: Approximation of the Length for the path on the manifold.\n\n\n\n\n\n","category":"function"},{"location":"diffgeo/#AutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_energy","page":"Differential Geometry","title":"AutoEncoderToolkit.diffgeo.NeuralGeodesics.curve_energy","text":"curve_energy(\n riemannian_metric::AbstractArray,\n curve_velocity::AbstractArray,\n t::AbstractVector;\n)\n\nFunction to compute the (discretized) integral defining the energy of a curve γ̲ on a Riemannian manifold. The energy is defined as\n\n E(γ̲) = ∫ dt ⟨γ̲̇(t), G̲̲ γ̲̇(t)⟩,\n\nwhere γ̲̇(t) defines the velocity of the parametric curve, and G̲̲ is the Riemannian metric tensor. For this function, we approximate the integral as\n\n E(γ̲) ≈ ∑ᵢ Δt ⟨γ̲̇(tᵢ)ᵀ G̲̲(γ̲(tᵢ₊₁)) γ̲̇(tᵢ)⟩,\n\nwhere Δt is the time step between points. Note that this Δt is assumed to be constant; thus, the time points t must be equally spaced.\n\nArguments\n\nriemannian_metric::AbstractArray: d×d×N tensor where d is the dimension of the manifold on which the curve lies and N is the number of sampled time points along the curve. Each slice of the array represents the Riemannian metric tensor for the curve at the corresponding time point.\ncurve_velocity::AbstractArray: d×N Matrix where d is the dimension of the manifold on which the curve lies and N is the number of sampled time points along the curve. 
Each column represents the velocity of the curve at the corresponding time point.\nt::AbstractVector: Vector of time points at which the curve is sampled.\n\nReturns\n\nEnergy::Number: Approximation of the Energy for the path on the manifold.\n\n\n\n\n\n","category":"function"},{"location":"diffgeo/#diffgeoref","page":"Differential Geometry","title":"References","text":"","category":"section"},{"location":"diffgeo/","page":"Differential Geometry","title":"Differential Geometry","text":"Chen, N. et al. Metrics for Deep Generative Models. in Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics 1540–1550 (PMLR, 2018).\nChadebec, C. & Allassonnière, S. A Geometric Perspective on Variational Autoencoders. Preprint at http://arxiv.org/abs/2209.07370 (2022).\nChadebec, C., Mantoux, C. & Allassonnière, S. Geometry-Aware Hamiltonian Variational Auto-Encoder. Preprint at http://arxiv.org/abs/2010.11518 (2020).\nArvanitidis, G., Hauberg, S., Hennig, P. & Schober, M. Fast and Robust Shortest Paths on Manifolds Learned from Data. in Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics 1506–1515 (PMLR, 2019).\nArvanitidis, G., Hauberg, S. & Schölkopf, B. Geometrically Enriched Latent Spaces. Preprint at https://doi.org/10.48550/arXiv.2008.00565 (2020).\nArvanitidis, G., González-Duque, M., Pouplin, A., Kalatzis, D. & Hauberg, S. Pulling back information geometry. Preprint at http://arxiv.org/abs/2106.05367 (2022).\nFröhlich, C., Gessner, A., Hennig, P., Schölkopf, B. & Arvanitidis, G. Bayesian Quadrature on Riemannian Data Manifolds.\nKalatzis, D., Eklund, D., Arvanitidis, G. & Hauberg, S. Variational Autoencoders with Riemannian Brownian Motion Priors. Preprint at http://arxiv.org/abs/2002.05227 (2020).\nArvanitidis, G., Hansen, L. K. & Hauberg, S. Latent Space Oddity: on the Curvature of Deep Generative Models. 
Preprint at http://arxiv.org/abs/1710.11379 (2021).","category":"page"},{"location":"ae/#AEsmodule","page":"Deterministic Autoencoders","title":"Deterministic Autoencoder","text":"","category":"section"},{"location":"ae/","page":"Deterministic Autoencoders","title":"Deterministic Autoencoders","text":"The deterministic autoencoders are a type of neural network that learns to embed high-dimensional data into a lower-dimensional space in a one-to-one fashion. The AEs module provides the necessary tools to train these networks. The main type is the AE struct, which is a simple feedforward neural network composed of two parts: an Encoder and a Decoder.","category":"page"},{"location":"ae/#Autoencoder-struct-AE","page":"Deterministic Autoencoders","title":"Autoencoder struct AE","text":"","category":"section"},{"location":"ae/","page":"Deterministic Autoencoders","title":"Deterministic Autoencoders","text":"AutoEncoderToolkit.AEs.AE","category":"page"},{"location":"ae/#AutoEncoderToolkit.AEs.AE","page":"Deterministic Autoencoders","title":"AutoEncoderToolkit.AEs.AE","text":"struct AE{E<:AbstractDeterministicEncoder, D<:AbstractDeterministicDecoder}\n\nAutoencoder (AE) model defined for Flux.jl\n\nFields\n\nencoder::E: Neural network that encodes the input into the latent space. E is a subtype of AbstractDeterministicEncoder.\ndecoder::D: Neural network that decodes the latent representation back to the original input space. D is a subtype of AbstractDeterministicDecoder.\n\nAn AE consists of an encoder and decoder network with a bottleneck latent space in between. The encoder compresses the input into a low-dimensional representation. The decoder tries to reconstruct the original input from the point in the latent space. 
 \n\n\n\n\n\n","category":"type"},{"location":"ae/#Forward-pass","page":"Deterministic Autoencoders","title":"Forward pass","text":"","category":"section"},{"location":"ae/","page":"Deterministic Autoencoders","title":"Deterministic Autoencoders","text":"AutoEncoderToolkit.AEs.AE(::AbstractArray)","category":"page"},{"location":"ae/#AutoEncoderToolkit.AEs.AE-Tuple{AbstractArray}","page":"Deterministic Autoencoders","title":"AutoEncoderToolkit.AEs.AE","text":"(ae::AE{Encoder, Decoder})(x::AbstractArray; latent::Bool=false)\n\nProcesses the input data x through the autoencoder (AE) that consists of an encoder and a decoder.\n\nArguments\n\nx::AbstractVecOrMat{Float32}: The data to be encoded and reconstructed. This can be a vector or a matrix where each column represents a separate sample.\n\nOptional Keyword Arguments\n\nlatent::Bool: If set to true, returns a NamedTuple containing the latent representation alongside the reconstructed data. Defaults to false.\n\nReturns\n\nIf latent=false: A NamedTuple with key :decoder that contains the reconstructed data after processing through the encoder and decoder.\nIf latent=true: A NamedTuple with keys :encoder and :decoder, containing the corresponding values.\n\nDescription\n\nThe function first encodes the input x using the encoder to get the encoded representation in the latent space. This latent representation is then decoded using the decoder to produce the reconstructed data. 
If latent is set to true, it also returns the latent representation.\n\nNote\n\nEnsure the input data x matches the expected input dimensionality for the encoder in the AE.\n\n\n\n\n\n","category":"method"},{"location":"ae/#Loss-function","page":"Deterministic Autoencoders","title":"Loss function","text":"","category":"section"},{"location":"ae/#MSE-loss","page":"Deterministic Autoencoders","title":"MSE loss","text":"","category":"section"},{"location":"ae/","page":"Deterministic Autoencoders","title":"Deterministic Autoencoders","text":"AutoEncoderToolkit.AEs.mse_loss","category":"page"},{"location":"ae/#AutoEncoderToolkit.AEs.mse_loss","page":"Deterministic Autoencoders","title":"AutoEncoderToolkit.AEs.mse_loss","text":"mse_loss(ae::AE, \n x::AbstractArray; \n regularization::Union{Function, Nothing}=nothing, \n reg_strength::Float32=1.0f0\n)\n\nCalculate the loss for an autoencoder (AE) by computing the mean squared error (MSE) reconstruction loss and a possible regularization term.\n\nThe AE loss is given by: loss = MSE(x, x̂) + reg_strength × reg_term\n\nWhere:\n\nx is the input Array.\nx̂ is the reconstructed output from the AE.\nreg_strength × reg_term is an optional regularization term.\n\nArguments\n\nae::AE: An AE model.\nx::AbstractArray: Input data.\n\nOptional Keyword Arguments\n\nreg_function::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the ae outputs. Should return a Float32. 
This function must take as input the ae outputs and the keyword arguments provided in reg_kwargs.\nreg_kwargs::Union{NamedTuple,Dict}=Dict(): Keyword arguments to pass to the regularization function.\nreg_strength::Number=1.0f0: The strength of the regularization term.\n\nReturns\n\nThe computed average AE loss value for the given input x, including possible regularization terms.\n\nNotes\n\nEnsure that the dimensionality of the input data x aligns with the encoder's expected input in the AE.\n\n\n\n\n\nmse_loss(ae::AE, \n x_in::AbstractArray, \n x_out::AbstractArray;\n regularization::Union{Function, Nothing}=nothing, \n reg_strength::Float32=1.0f0)\n\nCalculate the mean squared error (MSE) loss for an autoencoder (AE) using separate input and target output vectors.\n\nThe AE loss is computed as: loss = MSE(x_out, x̂) + reg_strength × reg_term\n\nWhere:\n\nx_out is the target output vector.\nx̂ is the reconstructed output from the AE given x_in as input.\nreg_strength × reg_term is an optional regularization term.\n\nArguments\n\nae::AE: An AE model.\nx_in::AbstractArray: Input vector to the AE encoder.\nx_out::AbstractArray: Target output vector to compute the reconstruction error.\n\nOptional Keyword Arguments\n\nreg_function::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the ae outputs. Should return a Float32. 
This function must take as input the ae outputs and the keyword arguments provided in reg_kwargs.\nreg_kwargs::Union{NamedTuple,Dict}=Dict(): Keyword arguments to pass to the regularization function.\nreg_strength::Number=1.0f0: The strength of the regularization term.\n\nReturns\n\nThe computed loss value between the target x_out and its reconstructed counterpart from x_in, including possible regularization terms.\n\nNote\n\nEnsure that the input data x_in matches the expected input dimensionality for the encoder in the AE.\n\n\n\n\n\n","category":"function"},{"location":"ae/#Training","page":"Deterministic Autoencoders","title":"Training","text":"","category":"section"},{"location":"ae/","page":"Deterministic Autoencoders","title":"Deterministic Autoencoders","text":"AutoEncoderToolkit.AEs.train!","category":"page"},{"location":"ae/#AutoEncoderToolkit.AEs.train!","page":"Deterministic Autoencoders","title":"AutoEncoderToolkit.AEs.train!","text":"`train!(ae, x, opt; loss_function, loss_kwargs...)`\n\nCustomized training function to update parameters of an autoencoder given a specified loss function.\n\nArguments\n\nae::AE: A struct containing the elements of an autoencoder.\nx::AbstractArray: Input data on which the autoencoder will be trained.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function: The loss function used for training. 
It should accept the autoencoder model and input data x, and return a loss value.\nloss_kwargs::Union{NamedTuple,Dict} = Dict(): Additional arguments for the loss function.\nverbose::Bool=false: If true, the loss value will be printed during training.\nloss_return::Bool=false: If true, the loss value will be returned after training.\n\nDescription\n\nTrains the autoencoder by:\n\nComputing the gradient of the loss with respect to the autoencoder parameters.\nUpdating the autoencoder parameters using the optimizer.\n\n\n\n\n\ntrain!(ae, x_in, x_out, opt; loss_function, loss_kwargs...)\n\nCustomized training function to update parameters of an autoencoder given a specified loss function.\n\nArguments\n\nae::AE: A struct containing the elements of an autoencoder.\nx_in::AbstractArray: Input data on which the autoencoder will be trained.\nx_out::AbstractArray: Target output data for the autoencoder.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function: The loss function used for training. 
It should accept the autoencoder model and input data x, and return a loss value.\nloss_kwargs::Union{NamedTuple,Dict} = Dict(): Additional arguments for the loss function.\nverbose::Bool=false: If true, the loss value will be printed during training.\nloss_return::Bool=false: If true, the loss value will be returned after training.\n\nDescription\n\nTrains the autoencoder by:\n\nComputing the gradient of the loss with respect to the autoencoder parameters.\nUpdating the autoencoder parameters using the optimizer.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#RHVAEsmodule","page":"RHVAE","title":"Riemannian Hamiltonian Variational Autoencoder","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"The Riemannian Hamiltonian Variational Autoencoder (RHVAE) is a variant of the Hamiltonian Variational Autoencoder (HVAE) that uses concepts from Riemannian geometry to improve the sampling of the latent space representation. As the HVAE, the RHVAE uses Hamiltonian dynamics to improve the sampling of the latent. However, the RHVAE accounts for the geometry of the latent space by learning a Riemannian metric tensor that is used to compute the kinetic energy of the dynamical system. This allows the RHVAE to sample the latent space more evenly while learning the curvature of the latent space.","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"For the implementation of the RHVAE in AutoEncoderToolkit.jl, the RHVAE requires two arguments to construct: the original VAE as well as a separate neural network used to compute the metric tensor. To facilitate the dispatch of the necessary functions associated with this second network, we also provide a MetricChain struct.","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"warning: Warning\nRHVAEs require the computation of nested gradients. 
This means that the AutoDiff framework must differentiate a function of an already AutoDiff differentiated function. This is known to be problematic for Julia's AutoDiff backends. See details below to understand how we circumvent this problem.","category":"page"},{"location":"rhvae/#Reference","page":"RHVAE","title":"Reference","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"Chadebec, C., Mantoux, C. & Allassonnière, S. Geometry-Aware Hamiltonian Variational Auto-Encoder. Preprint at http://arxiv.org/abs/2010.11518 (2020).","category":"page"},{"location":"rhvae/#MetricChainstruct","page":"RHVAE","title":"MetricChain struct","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.RHVAEs.MetricChain","category":"page"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.MetricChain","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.MetricChain","text":"MetricChain <: AbstractMetricChain\n\nA MetricChain is used to compute the Riemannian metric tensor in the latent space of a Riemannian Hamiltonian Variational AutoEncoder (RHVAE).\n\nFields\n\nmlp::Flux.Chain: A multi-layer perceptron (MLP) consisting of the hidden layers. The inputs are first run through this MLP.\ndiag::Flux.Dense: A dense layer that computes the diagonal elements of a lower-triangular matrix. The output of the mlp is fed into this layer.\nlower::Flux.Dense: A dense layer that computes the off-diagonal elements of the lower-triangular matrix. 
The output of the mlp is also fed into this layer.\n\nThe outputs of diag and lower are used to construct a lower-triangular matrix used to compute the Riemannian metric tensor in latent space.\n\nNote\n\nIf the dimension of the latent space is n, the number of neurons in the output layer of diag must be n, and the number of neurons in the output layer of lower must be n * (n - 1) ÷ 2.\n\nExample\n\nmlp = Flux.Chain(Dense(10, 10, relu), Dense(10, 10, relu))\ndiag = Flux.Dense(10, 5)\nlower = Flux.Dense(10, 10)\nmetric_chain = MetricChain(mlp, diag, lower)\n\n\n\n\n\n","category":"type"},{"location":"rhvae/#RHVAEstruct","page":"RHVAE","title":"RHVAE struct","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.RHVAEs.RHVAE","category":"page"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.RHVAE","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.RHVAE","text":"RHVAE{\n V<:VAE{<:AbstractVariationalEncoder,<:AbstractVariationalDecoder}\n} <: AbstractVariationalAutoEncoder\n\nA Riemannian Hamiltonian Variational AutoEncoder (RHVAE) as described in Chadebec, C., Mantoux, C. & Allassonnière, S. Geometry-Aware Hamiltonian Variational Auto-Encoder. Preprint at http://arxiv.org/abs/2010.11518 (2020).\n\nThe RHVAE is a type of Variational AutoEncoder (VAE) that incorporates a Riemannian metric in the latent space. 
This metric is computed by a MetricChain, which is a struct that contains a multi-layer perceptron (MLP) and two dense layers for computing the elements of a lower-triangular matrix.\n\nThe inverse metric is computed as follows:\n\nG⁻¹(z) = ∑ᵢ₌₁ⁿ Lψᵢ Lψᵢᵀ exp(-‖z - cᵢ‖₂² / T²) + λIₗ\n\nwhere Lψᵢ is computed by the MetricChain, T is the temperature, λ is a regularization factor, and the centroids cᵢ are the columns of centroids_latent.\n\nFields\n\nvae::V: The underlying VAE, where V is a subtype of VAE with an AbstractVariationalEncoder and an AbstractVariationalDecoder.\nmetric_chain::MetricChain: The MetricChain that computes the Riemannian metric in the latent space.\ncentroids_data::AbstractArray: An array whose slices along the last dimension are the data points xᵢ from which the centroids cᵢ are computed by passing them through the encoder.\ncentroids_latent::AbstractMatrix: A matrix where each column represents a centroid cᵢ in the inverse metric computation.\nL::AbstractArray{<:Number, 3}: A 3D array where each slice represents an Lψᵢ matrix. Lψᵢ can intuitively be seen as the triangular matrix in the Cholesky decomposition of G⁻¹(centroids_latentᵢ) up to a regularization factor.\nM::AbstractArray{<:Number, 3}: A 3D array where each slice represents the product Lψᵢ Lψᵢᵀ.\nT::Number: The temperature parameter in the inverse metric computation. 
\nλ::Number: The regularization factor in the inverse metric computation.\n\n\n\n\n\n","category":"type"},{"location":"rhvae/#Forward-pass","page":"RHVAE","title":"Forward pass","text":"","category":"section"},{"location":"rhvae/#MetricChain","page":"RHVAE","title":"Metric Network","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.RHVAEs.MetricChain(::AbstractArray)","category":"page"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.MetricChain-Tuple{AbstractArray}","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.MetricChain","text":"(m::MetricChain)(x::AbstractArray; matrix::Bool=false)\n\nPerform a forward pass through the MetricChain.\n\nArguments\n\nx::AbstractArray: The input data to be processed. \nmatrix::Bool=false: A boolean flag indicating whether to return the result as a lower triangular matrix (if true) or as a tuple of diagonal and lower off-diagonal elements (if false). Defaults to false.\n\nReturns\n\nIf matrix is true, returns a lower triangular matrix constructed from the outputs of the diag and lower components of the MetricChain.\nIf matrix is false, returns a NamedTuple with two elements: diag, the output of the diag component of the MetricChain, and lower, the output of the lower component of the MetricChain.\n\nExample\n\nm = MetricChain(...)\nx = rand(Float32, 100, 10)\nm(x, matrix=true) # Returns a lower triangular matrix\n\n\n\n\n\n","category":"method"},{"location":"rhvae/#RHVAE","page":"RHVAE","title":"RHVAE","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.RHVAEs.RHVAE(::AbstractArray)","category":"page"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.RHVAE-Tuple{AbstractArray}","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.RHVAE","text":"(rhvae::RHVAE{VAE{E,D}})(\n x::AbstractArray;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n ∇H::Function=∇hamiltonian_TaylorDiff,\n 
 ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n G_inv=G_inv,\n ),\n tempering_schedule::Function=quadratic_tempering,\n latent::Bool=false,\n) where {E<:AbstractGaussianLogEncoder,D<:AbstractVariationalDecoder}\n\nRun the Riemannian Hamiltonian Variational Autoencoder (RHVAE) on the given input.\n\nArguments\n\nx::AbstractArray: The input to the RHVAE. If it is a vector, it represents a single data point. If it is an array, the last dimension indexes the data points.\n\nOptional Keyword Arguments\n\nK::Int=3: The number of leapfrog steps to perform in the Hamiltonian Monte Carlo (HMC) part of the RHVAE.\nϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4): The step size for the leapfrog steps in the HMC part of the RHVAE. If it is a scalar, the same step size is used for all dimensions. If it is an array, each element corresponds to the step size for a specific dimension.\nβₒ::Number=0.3f0: The initial inverse temperature for the tempering schedule.\nsteps::Int: The number of fixed-point iterations to perform. Default is 3.\n∇H::Function=∇hamiltonian_TaylorDiff: The function to compute the gradient of the Hamiltonian in the HMC part of the RHVAE.\n∇H_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the ∇hamiltonian function. Default is a NamedTuple with reconstruction_loglikelihood, position_logprior, momentum_logprior, and G_inv. \nG_inv::Function=G_inv: The function to compute the inverse of the Riemannian metric tensor.\ntempering_schedule::Function=quadratic_tempering: The function to compute the tempering schedule in the RHVAE.\nlatent::Bool=false: If true, the function returns a NamedTuple containing the outputs of the encoder and decoder, and the final state of the phase space after the leapfrog and tempering steps. 
If false, the function only returns the output of the decoder.\n\nReturns\n\nIf latent=true, the function returns a NamedTuple with the following fields:\n\nencoder: The outputs of the encoder.\ndecoder: The output of the decoder.\nphase_space: The final state of the phase space after the leapfrog and tempering steps.\n\nIf latent=false, the function only returns the output of the decoder.\n\nDescription\n\nThis function runs the RHVAE on the given input. It first passes the input through the encoder to obtain the mean and log standard deviation of the latent space. It then uses the reparameterization trick to sample from the latent space. After that, it performs the leapfrog and tempering steps to refine the sample from the latent space. Finally, it passes the refined sample through the decoder to obtain the output.\n\nNotes\n\nEnsure that the dimensions of x match the input dimensions of the RHVAE, and that the dimensions of ϵ match the dimensions of the latent space.\n\n\n\n\n\n","category":"method"},{"location":"rhvae/#Loss-function","page":"RHVAE","title":"Loss function","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.RHVAEs.loss","category":"page"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.loss","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.loss","text":"loss(\n rhvae::RHVAE,\n x::AbstractArray;\n K::Int=3,\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n βₒ::Number=0.3f0,\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n G_inv::Function=G_inv,\n tempering_schedule::Function=quadratic_tempering,\n reg_function::Union{Function,Nothing}=nothing,\n reg_kwargs::Union{NamedTuple,Dict}=Dict(),\n reg_strength::Number=1.0f0,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the loss 
for a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nArguments\n\nrhvae::RHVAE: The RHVAE used to encode the input data and decode the latent space.\nx::AbstractArray: Input data to the RHVAE encoder. The last dimension is taken as having each of the samples in a batch.\n\nOptional Keyword Arguments\n\nK::Int: The number of HMC steps (default is 3).\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is 1E-4).\nβₒ::Number: The initial inverse temperature (default is 0.3).\nsteps::Int: The number of steps in the leapfrog integrator (default is 3).\n∇H_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the ∇hamiltonian function.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor (default is G_inv).\ntempering_schedule::Function: The tempering schedule function used in the HMC (default is quadratic_tempering).\nreg_function::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the VAE outputs. This function must take as input the VAE outputs and the keyword arguments provided in reg_kwargs.\nreg_kwargs::Union{NamedTuple,Dict}=Dict(): Keyword arguments to pass to the regularization function.\nreg_strength::Number=1.0f0: The strength of the regularization term.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. 
Default is an array of ones.\n\nReturns\n\nThe computed loss.\n\n\n\n\n\nloss(\n rhvae::RHVAE,\n x_in::AbstractArray,\n x_out::AbstractArray;\n K::Int=3,\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n βₒ::Number=0.3f0,\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n G_inv::Function=G_inv,\n tempering_schedule::Function=quadratic_tempering,\n reg_function::Union{Function,Nothing}=nothing,\n reg_kwargs::Union{NamedTuple,Dict}=Dict(),\n reg_strength::Number=1.0f0,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the loss for a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nArguments\n\nrhvae::RHVAE: The RHVAE used to encode the input data and decode the latent space.\nx_in::AbstractArray: Input data to the RHVAE encoder. The last dimension is taken as having each of the samples in a batch.\nx_out::AbstractArray: Target data to compute the reconstruction error. The last dimension is taken as having each of the samples in a batch.\n\nOptional Keyword Arguments\n\nK::Int: The number of HMC steps (default is 3).\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is 1E-4).\nβₒ::Number: The initial inverse temperature (default is 0.3).\nsteps::Int: The number of steps in the leapfrog integrator (default is 3).\n∇H_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the ∇hamiltonian function.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor (default is G_inv).\ntempering_schedule::Function: The tempering schedule function used in the HMC (default is quadratic_tempering).\nreg_function::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the VAE outputs. 
This function must take as input the VAE outputs and the keyword arguments provided in reg_kwargs.\nreg_kwargs::Union{NamedTuple,Dict}=Dict(): Keyword arguments to pass to the regularization function.\nreg_strength::Number=1.0f0: The strength of the regularization term.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. Default is an array of ones.\n\nReturns\n\nThe computed loss.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#Training","page":"RHVAE","title":"Training","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.RHVAEs.train!","category":"page"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.train!","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.train!","text":"train!(\n rhvae::RHVAE, \n x::AbstractArray, \n opt::NamedTuple; \n loss_function::Function=loss, \n loss_kwargs::Union{NamedTuple,Dict}=Dict(),\n verbose::Bool=false,\n loss_return::Bool=false,\n)\n\nCustomized training function to update parameters of a Riemannian Hamiltonian Variational Autoencoder given a specified loss function.\n\nArguments\n\nrhvae::RHVAE: A struct containing the elements of a Riemannian Hamiltonian Variational Autoencoder.\nx::AbstractArray: Input data to the RHVAE encoder. The last dimension is taken as having each of the samples in a batch.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function=loss: The loss function used for training. 
It should accept the RHVAE model, data x, and keyword arguments in that order.\nloss_kwargs::Union{NamedTuple,Dict}=Dict(): Arguments for the loss function. These might include parameters like K, ϵ, βₒ, steps, ∇H, ∇H_kwargs, tempering_schedule, reg_function, reg_kwargs, reg_strength, depending on the specific loss function in use.\nverbose::Bool=false: Whether to print the loss at each iteration.\nloss_return::Bool=false: Whether to return the loss at each iteration.\n\nDescription\n\nTrains the RHVAE by:\n\nComputing the gradient of the loss w.r.t. the RHVAE parameters.\nUpdating the RHVAE parameters using the optimizer.\nUpdating the metric parameters.\n\n\n\n\n\ntrain!(\n rhvae::RHVAE, \n x_in::AbstractArray,\n x_out::AbstractArray,\n opt::NamedTuple; \n loss_function::Function=loss, \n loss_kwargs::Union{NamedTuple,Dict}=Dict(),\n verbose::Bool=false,\n loss_return::Bool=false,\n)\n\nCustomized training function to update parameters of a Riemannian Hamiltonian Variational Autoencoder given a specified loss function.\n\nArguments\n\nrhvae::RHVAE: A struct containing the elements of a Riemannian Hamiltonian Variational Autoencoder.\nx_in::AbstractArray: Input data to the RHVAE encoder. The last dimension is taken as having each of the samples in a batch.\nx_out::AbstractArray: Target data to compute the reconstruction error. The last dimension is taken as having each of the samples in a batch.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function=loss: The loss function used for training. It should accept the RHVAE model, data x, and keyword arguments in that order.\nloss_kwargs::Union{NamedTuple,Dict}=Dict(): Arguments for the loss function. 
These might include parameters like K, ϵ, βₒ, steps, ∇H, ∇H_kwargs, tempering_schedule, reg_function, reg_kwargs, reg_strength, depending on the specific loss function in use.\nverbose::Bool=false: Whether to print the loss at each iteration.\nloss_return::Bool=false: Whether to return the loss at each iteration.\n\nDescription\n\nTrains the RHVAE by:\n\nComputing the gradient of the loss w.r.t. the RHVAE parameters.\nUpdating the RHVAE parameters using the optimizer.\nUpdating the metric parameters.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#gradhamiltonian","page":"RHVAE","title":"Computing the gradient of the potential energy","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"One of the crucial components in the training of the RHVAE is the computation of the gradient of the Hamiltonian ∇H with respect to the latent space representation. This gradient is used in the leapfrog steps of the generalized Hamiltonian dynamics. When training the RHVAE, we need to backpropagate through the leapfrog steps to update the parameters of the neural network. This requires computing a gradient of a function of the gradient of the Hamiltonian, i.e., nested gradients. Zygote.jl, the main AutoDiff backend in Flux.jl, famously struggles with these types of computations. 
Specifically, Zygote.jl does not support Zygote over Zygote differentiation (meaning differentiating a function of something previously differentiated with Zygote using Zygote), or Zygote over ForwardDiff (meaning differentiating a function of something differentiated with ForwardDiff using Zygote).","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"With this, we are left with a couple of options to compute the gradient of the potential energy:","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"Use finite differences to approximate the gradient of the potential energy.\nUse the relatively new TaylorDiff.jl AutoDiff backend to compute the gradient of the potential energy. This backend is composable with Zygote.jl, so we can, in principle, do Zygote over TaylorDiff differentiation.","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"The second option would be preferred, as the gradients computed with TaylorDiff are much more accurate than the ones computed with finite differences. However, there are two problems with this approach:","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"The TaylorDiff nested gradient capability stopped working with Julia ≥ 1.10, as discussed in #70.\nEven for Julia < 1.10, we could not get TaylorDiff to work on CUDA devices. (PRs are welcome!)","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"With these limitations in mind, we have implemented the gradient of the potential using both finite differences and TaylorDiff. The user can choose which method to use by setting the adtype keyword argument in the ∇H_kwargs in the loss function to either :finite or :TaylorDiff. This means that for the train! 
function, the user can pass loss_kwargs that looks like this:","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"# Define the autodiff backend to use\nloss_kwargs = Dict(\n :∇H_kwargs => Dict(\n :adtype => :finite\n )\n)","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"note: Note\nAlthough verbose, the nested dictionaries help to keep everything organized. (PRs with better design ideas are welcome!)","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"The default both for cpu and gpu devices is :finite.","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.RHVAEs.∇hamiltonian_finite\nAutoEncoderToolkit.RHVAEs.∇hamiltonian_TaylorDiff\nAutoEncoderToolkit.RHVAEs.∇hamiltonian_ForwardDiff","category":"page"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.∇hamiltonian_finite","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.∇hamiltonian_finite","text":"∇hamiltonian_finite(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n G⁻¹::AbstractArray,\n logdetG::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple,\n var::Symbol;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n fdtype::Symbol=:central,\n)\n\nCompute the gradient of the Hamiltonian with respect to a given variable using a naive finite difference method.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, the inverse of the Riemannian metric tensor G⁻¹, a decoder of type AbstractVariationalDecoder, a decoder_output NamedTuple, and a variable var (:z or :ρ), and computes the gradient of the Hamiltonian with respect to var using a simple finite differences method. 
The computation is based on the log-likelihood of the decoder, the log-prior of the latent space, and G⁻¹.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. The kinetic energy is defined as follows:\n\nκ(ρ) = 0.5 * log((2π)ᴰ det G(z)) + 0.5 * ρᵀ G⁻¹ ρ\n\nwhere D is the dimension of the latent space, and G(z) is the metric tensor at the point z.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If matrix, each column represents a momentum vector.\nG⁻¹::AbstractArray: The inverse of the Riemannian metric tensor. If 3D array, each slice along the third dimension represents the inverse of the metric tensor at the corresponding column of z.\nlogdetG::Union{<:Number,AbstractVector}: The log determinant of the Riemannian metric tensor. If vector, each element represents the log determinant of the metric tensor at the corresponding column of z.\ndecoder::AbstractVariationalDecoder: The decoder instance.\ndecoder_output::NamedTuple: The output of the decoder.\nvar::Symbol: The variable with respect to which the gradient is computed. Must be :z or :ρ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. Default is decoder_loglikelihood. 
This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and G⁻¹.\nfdtype::Symbol=:central: The type of finite difference method to use. Must be :central or :forward. Default is :central.\n\nReturns\n\nA vector representing the gradient of the Hamiltonian at the point (z, ρ) with respect to variable var.\n\n\n\n\n\n∇hamiltonian_finite(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n rhvae::RHVAE,\n var::Symbol;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n G_inv::Function=G_inv,\n fdtype::Symbol=:central,\n)\n\nCompute the gradient of the Hamiltonian with respect to a given variable using a naive finite difference method.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, an instance of RHVAE, and a variable var (:z or :ρ), and computes the gradient of the Hamiltonian with respect to var using a simple finite differences method. The computation is based on the log-likelihood of the decoder, the log-prior of the latent space, and the inverse of the metric tensor G at the point z.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. 
The kinetic energy is defined as follows:\n\nκ(ρ) = 0.5 * log((2π)ᴰ det G(z)) + 0.5 * ρᵀ G⁻¹ ρ\n\nwhere D is the dimension of the latent space, and G(z) is the metric tensor at the point z.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If matrix, each column represents a momentum vector.\nrhvae::RHVAE: An instance of the RHVAE model.\nvar::Symbol: The variable with respect to which the gradient is computed. Must be :z or :ρ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. Default is decoder_loglikelihood. This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and the inverse of the Riemannian metric tensor G⁻¹.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Default is G_inv. This function must take as input the point z in the latent space and the rhvae instance.\nfdtype::Symbol=:central: The type of finite difference method to use. Must be :central or :forward. 
Default is :central.\n\nReturns\n\nA vector representing the gradient of the Hamiltonian at the point (z, ρ) with respect to variable var.\n\nNote\n\nThe inverse of the Riemannian metric tensor G⁻¹, the log determinant of the metric tensor, and the output of the decoder are computed internally in this function. The user does not need to provide these as inputs.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.∇hamiltonian_TaylorDiff","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.∇hamiltonian_TaylorDiff","text":"∇hamiltonian_TaylorDiff(\n x::AbstractArray,\n z::AbstractVector,\n ρ::AbstractVector,\n G⁻¹::AbstractMatrix,\n logdetG::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple,\n var::Symbol;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n)\n\nCompute the gradient of the Hamiltonian with respect to a given variable using the TaylorDiff.jl automatic differentiation library.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, an instance of AbstractVariationalDecoder, and a variable var (:z or :ρ), and computes the gradient of the Hamiltonian with respect to var using TaylorDiff.jl.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. The kinetic energy is defined as follows:\n\nκ(ρ) = 0.5 * log((2π)ᴰ det G(z)) + 0.5 * ρᵀ G⁻¹ ρ\n\nwhere D is the dimension of the latent space, and G(z) is the metric tensor at the point z.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. 
The last dimension is assumed to have each of the data points.\nz::AbstractVector: The point in the latent space.\nρ::AbstractVector: The momentum.\nG⁻¹::AbstractMatrix: The inverse of the Riemannian metric tensor.\nlogdetG::Number: The logarithm of the determinant of the Riemannian metric tensor.\ndecoder::AbstractVariationalDecoder: An instance of the decoder model.\ndecoder_output::NamedTuple: The output of the decoder model.\nvar::Symbol: The variable with respect to which the gradient is computed. Must be :z or :ρ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. Default is decoder_loglikelihood. This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and the inverse of the Riemannian metric tensor G⁻¹.\n\nReturns\n\nA vector representing the gradient of the Hamiltonian at the point (z, ρ) with respect to variable var.\n\nNote\n\nTaylorDiff.jl is composable with Zygote.jl. 
Thus, for backpropagation using this function one should use Zygote.jl.\n\n\n\n\n\n∇hamiltonian_TaylorDiff(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n rhvae::RHVAE,\n var::Symbol;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n G_inv::Function=G_inv,\n)\n\nCompute the gradient of the Hamiltonian with respect to a given variable using the TaylorDiff.jl automatic differentiation library.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, an instance of RHVAE, and a variable var (:z or :ρ), and computes the gradient of the Hamiltonian with respect to var using TaylorDiff.jl.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. The kinetic energy is defined as follows:\n\nκ(ρ) = 0.5 * log((2π)ᴰ det G(z)) + 0.5 * ρᵀ G⁻¹ ρ\n\nwhere D is the dimension of the latent space, and G(z) is the metric tensor at the point z.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If matrix, each column represents a momentum vector.\nrhvae::RHVAE: An instance of the RHVAE model.\nvar::Symbol: The variable with respect to which the gradient is computed. Must be :z or :ρ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. 
Default is decoder_loglikelihood. This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and the inverse of the Riemannian metric tensor G⁻¹.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Default is G_inv. This function must take as input the point z in the latent space and the rhvae instance.\n\nReturns\n\nA matrix representing the gradient of the Hamiltonian at the point (z, ρ) with respect to variable var.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.∇hamiltonian_ForwardDiff","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.∇hamiltonian_ForwardDiff","text":"∇hamiltonian_ForwardDiff(\n x::AbstractArray,\n z::AbstractVector,\n ρ::AbstractVector,\n G⁻¹::AbstractMatrix,\n logdetG::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple,\n var::Symbol;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n)\n\nCompute the gradient of the Hamiltonian with respect to a given variable using the ForwardDiff.jl automatic differentiation library.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, the inverse of the Riemannian metric tensor G⁻¹, a decoder of type AbstractVariationalDecoder, a decoder_output NamedTuple, and a variable var (:z or :ρ), and computes the gradient of the Hamiltonian with respect to var using ForwardDiff.jl.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + 
κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. The kinetic energy is defined as follows:\n\nκ(ρ) = 0.5 * log((2π)ᴰ det G(z)) + 0.5 * ρᵀ G⁻¹ ρ\n\nwhere D is the dimension of the latent space, and G(z) is the metric tensor at the point z.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVector: The point in the latent space.\nρ::AbstractVector: The momentum.\nG⁻¹::AbstractMatrix: The inverse of the Riemannian metric tensor.\nlogdetG::Union{<:Number,AbstractVector}: The log determinant of the Riemannian metric tensor.\ndecoder::AbstractVariationalDecoder: The decoder instance.\ndecoder_output::NamedTuple: The output of the decoder.\nvar::Symbol: The variable with respect to which the gradient is computed. Must be :z or :ρ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. Default is decoder_loglikelihood. This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and G⁻¹.\n\nReturns\n\nA vector representing the gradient of the Hamiltonian at the point (z, ρ) with respect to variable var.\n\nNote\n\nForwardDiff.jl is not composable with Zygote.jl. 
Thus, for backpropagation using this function one should use ReverseDiff.jl.\n\n\n\n\n\n∇hamiltonian_ForwardDiff(\n x::AbstractArray,\n z::AbstractMatrix,\n ρ::AbstractMatrix,\n G⁻¹::AbstractArray,\n logdetG::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple,\n var::Symbol;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n)\n\nCompute the gradient of the Hamiltonian with respect to a given variable using the ForwardDiff.jl automatic differentiation library.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, the inverse of the Riemannian metric tensor G⁻¹, a decoder of type AbstractVariationalDecoder, a decoder_output NamedTuple, and a variable var (:z or :ρ), and computes the gradient of the Hamiltonian with respect to var using ForwardDiff.jl.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. The kinetic energy is defined as follows:\n\nκ(ρ) = 0.5 * log((2π)ᴰ det G(z)) + 0.5 * ρᵀ G⁻¹ ρ\n\nwhere D is the dimension of the latent space, and G(z) is the metric tensor at the point z.\n\nThe Jacobian is computed with respect to var to compute derivatives for all columns at once. The relevant terms for each column's gradient are then extracted from the Jacobian.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. 
The last dimension is assumed to have each of the data points.\nz::AbstractMatrix: The point in the latent space.\nρ::AbstractMatrix: The momentum.\nG⁻¹::AbstractArray: The inverse of the Riemannian metric tensor.\nlogdetG::Union{<:Number,AbstractVector}: The log determinant of the Riemannian metric tensor.\ndecoder::AbstractVariationalDecoder: The decoder instance.\ndecoder_output::NamedTuple: The output of the decoder.\nvar::Symbol: The variable with respect to which the gradient is computed. Must be :z or :ρ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. Default is decoder_loglikelihood. This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and G⁻¹.\n\nReturns\n\nA matrix representing the gradient of the Hamiltonian at the point (z, ρ) with respect to variable var.\n\nNote\n\nForwardDiff.jl is not composable with Zygote.jl. 
Thus, for backpropagation using this function one should use ReverseDiff.jl.\n\n\n\n\n\n∇hamiltonian_ForwardDiff(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n rhvae::RHVAE,\n var::Symbol;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n G_inv::Function=G_inv,\n)\n\nCompute the gradient of the Hamiltonian with respect to a given variable using the ForwardDiff.jl automatic differentiation library.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, an instance of RHVAE, and a variable var (:z or :ρ), and computes the gradient of the Hamiltonian with respect to var using ForwardDiff.jl.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. The kinetic energy is defined as follows:\n\nκ(ρ) = 0.5 * log((2π)ᴰ det G(z)) + 0.5 * ρᵀ G⁻¹ ρ\n\nwhere D is the dimension of the latent space, and G(z) is the metric tensor at the point z.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If matrix, each column represents a momentum vector.\nrhvae::RHVAE: An instance of the RHVAE model.\nvar::Symbol: The variable with respect to which the gradient is computed. Must be :z or :ρ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. 
Default is decoder_loglikelihood. This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and the inverse of the Riemannian metric tensor G⁻¹.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Default is G_inv. This function must take as input the point z in the latent space and the rhvae instance.\n\nReturns\n\nA matrix representing the gradient of the Hamiltonian at the point (z, ρ) with respect to variable var.\n\nNote\n\nForwardDiff.jl is not composable with Zygote.jl. Thus, for backpropagation using this function one should use ReverseDiff.jl.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#Other-Functions","page":"RHVAE","title":"Other 
Functions","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.RHVAEs.update_metric\nAutoEncoderToolkit.RHVAEs.update_metric!\nAutoEncoderToolkit.RHVAEs.G_inv\nAutoEncoderToolkit.RHVAEs.metric_tensor\nAutoEncoderToolkit.RHVAEs.riemannian_logprior\nAutoEncoderToolkit.RHVAEs.hamiltonian\nAutoEncoderToolkit.RHVAEs.∇hamiltonian\nAutoEncoderToolkit.RHVAEs._leapfrog_first_step\nAutoEncoderToolkit.RHVAEs._leapfrog_second_step\nAutoEncoderToolkit.RHVAEs._leapfrog_third_step\nAutoEncoderToolkit.RHVAEs.general_leapfrog_step\nAutoEncoderToolkit.RHVAEs.general_leapfrog_tempering_step\nAutoEncoderToolkit.RHVAEs._log_p̄\nAutoEncoderToolkit.RHVAEs._log_q̄\nAutoEncoderToolkit.RHVAEs.riemannian_hamiltonian_elbo","category":"page"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.update_metric","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.update_metric","text":"update_metric(\n rhvae::RHVAE{<:VAE{<:AbstractGaussianEncoder,<:AbstractVariationalDecoder}}\n)\n\nCompute the centroids_latent and M field of a RHVAE instance without modifying the instance. 
This method is used when needing to backpropagate through the RHVAE during training.\n\nArguments\n\nrhvae::RHVAE{<:VAE{<:AbstractGaussianEncoder,<:AbstractVariationalDecoder}}: The RHVAE instance to be updated.\n\nReturns\n\nNamedTuple with the following fields:\ncentroids_latent::Matrix: A matrix where each column represents a centroid cᵢ in the inverse metric computation.\nL::Array{<:Number, 3}: A 3D array where each slice represents a L_ψᵢ matrix.\nM::Array{<:Number, 3}: A 3D array where each slice represents a Lψᵢ Lψᵢᵀ.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.update_metric!","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.update_metric!","text":"update_metric!(\n rhvae::RHVAE{<:VAE{<:AbstractGaussianEncoder,<:AbstractVariationalDecoder}},\n params::NamedTuple\n)\n\nUpdate the centroids_latent and M fields of a RHVAE instance in place.\n\nThis function takes a RHVAE instance and a named tuple params containing the new values for centroids_latent and M. It updates the centroids_latent, L, and M fields of the RHVAE instance with the provided values.\n\nArguments\n\nrhvae::RHVAE{<:VAE{<:AbstractGaussianEncoder,<:AbstractVariationalDecoder}}: The RHVAE instance to update.\nparams::NamedTuple: A named tuple containing the new values for centroids_latent and M. Must have the keys :centroids_latent, :L, and :M.\n\nReturns\n\nNothing. The RHVAE instance is updated in place.\n\n\n\n\n\nupdate_metric!(\n rhvae::RHVAE{\n <:VAE{<:AbstractGaussianEncoder,<:AbstractVariationalDecoder}\n }\n)\n\nUpdate the centroids_latent, and M fields of a RHVAE instance in place.\n\nThis function takes a RHVAE instance as input and modifies its centroids_latent and M fields. The centroids_latent field is updated by running the centroids_data through the encoder of the underlying VAE and extracting the mean (µ) of the resulting Gaussian distribution. 
The M field is updated by running each column of the centroids_data through the metric_chain and concatenating the results along the third dimension to obtain L, then multiplying each slice of L by its transpose and concatenating the products along the third dimension.\n\nArguments\n\nrhvae::RHVAE{<:VAE{<:AbstractGaussianEncoder,<:AbstractVariationalDecoder}}: The RHVAE instance to be updated.\n\nNotes\n\nThis function modifies the RHVAE instance in place, so it does not return anything. The changes are made directly to the centroids_latent, L, and M fields of the input RHVAE instance.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.G_inv","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.G_inv","text":"G_inv(\n z::AbstractVecOrMat,\n centroids_latent::AbstractMatrix,\n M::AbstractArray{<:Number,3},\n T::Number,\n λ::Number,\n)\n\nCompute the inverse of the metric tensor G for a given point in the latent space.\n\nThis function takes a point z in the latent space, the centroids_latent of the RHVAE instance, a 3D array M representing the metric tensor, a temperature T, and a regularization factor λ, and computes the inverse of the metric tensor G at that point. The computation is based on the centroids and the temperature, as well as a regularization term. The inverse metric is computed as follows:\n\nG⁻¹(z) = ∑ᵢ₌₁ⁿ Lψᵢ Lψᵢᵀ exp(-‖z - cᵢ‖₂² / T²) + λIₗ,\n\nwhere Lψᵢ is computed by the MetricChain, T is the temperature, λ is a regularization factor, and the columns of centroids_latent are the cᵢ.\n\nArguments\n\nz::AbstractVecOrMat: The point in the latent space. 
If a matrix, each column represents a point in the latent space.\ncentroids_latent::AbstractMatrix: The centroids in the latent space.\nM::AbstractArray{<:Number,3}: The 3D array containing the symmetric matrices used to compute the inverse metric tensor.\nT::Number: The temperature.\nλ::Number: The regularization factor.\n\nReturns\n\nA matrix or 3D array representing the inverse of the metric tensor G at the point z. If a 3D array, each slice represents the inverse metric tensor at a different point in the latent space.\n\nNotes\n\nThe computation involves the squared Euclidean distance between z and each centroid, the exponential of the negative of these distances divided by the square of the temperature, and a regularization term proportional to the identity matrix. The result is a matrix of the same size as the latent space.\n\nGPU support\n\nThis function supports CPU and GPU arrays.\n\n\n\n\n\nG_inv( \n z::AbstractVecOrMat,\n metric_param::Union{RHVAE,NamedTuple},\n)\n\nCompute the inverse of the metric tensor G for a given point in the latent space.\n\nThis function takes a RHVAE instance and a point z in the latent space, and computes the inverse of the metric tensor G at that point. The computation is based on the centroids and the temperature of the RHVAE instance, as well as a regularization term. The inverse metric is computed as follows:\n\nG⁻¹(z) = ∑ᵢ₌₁ⁿ Lψᵢ Lψᵢᵀ exp(-‖z - cᵢ‖₂² / T²) + λIₗ,\n\nwhere Lψᵢ is computed by the MetricChain, T is the temperature, λ is a regularization factor, and the columns of centroids_latent are the cᵢ.\n\nArguments\n\nz::AbstractVecOrMat: The point in the latent space. 
If a matrix, each column represents a point in the latent space.\nmetric_param::Union{RHVAE,NamedTuple}: Either an RHVAE instance or a named tuple containing the fields centroids_latent, M, T, and λ.\n\nReturns\n\nA matrix representing the inverse of the metric tensor G at the point z.\n\nNotes\n\nThe computation involves the squared Euclidean distance between z and each centroid of the RHVAE instance, the exponential of the negative of these distances divided by the square of the temperature, and a regularization term proportional to the identity matrix. The result is a matrix of the same size as the latent space.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.metric_tensor","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.metric_tensor","text":"metric_tensor(\n z::AbstractVecOrMat,\n metric_param::Union{RHVAE,NamedTuple},\n)\n\nCompute the metric tensor G for a given point in the latent space. This function is a wrapper that determines the type of the input z and calls the appropriate specialized function _metric_tensor to perform the actual computation.\n\nThis function takes a RHVAE instance or a named tuple containing the fields centroids_latent, M, T, and λ, and a point z in the latent space, and computes the metric tensor G at that point. The computation is based on the inverse of the metric tensor G, which is computed by the G_inv function.\n\nArguments\n\nz::AbstractVecOrMat: The point in the latent space. If a matrix, each column represents a point in the latent space.\nmetric_param::Union{RHVAE,NamedTuple}: Either an RHVAE instance or a named tuple containing the fields centroids_latent, M, T, and λ.\n\nReturns\n\nA matrix representing the metric tensor G at the point z.\n\nNotes\n\nThe computation involves the inverse of the metric tensor G at the point z. 
The result is a matrix of the same size as the latent space.\n\nGPU Support\n\nThis function supports CPU and GPU arrays.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.riemannian_logprior","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.riemannian_logprior","text":"riemannian_logprior(\n ρ::AbstractVector,\n G⁻¹::AbstractMatrix,\n logdetG::Number;\n)\n\nCPU AbstractVector version of the riemannian_logprior function.\n\n\n\n\n\nriemannian_logprior(\n ρ::AbstractVector,\n G⁻¹::AbstractMatrix,\n logdetG::Number,\n)\n\nCPU AbstractMatrix version of the riemannian_logprior function.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.hamiltonian","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.hamiltonian","text":"hamiltonian(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n G⁻¹::AbstractArray,\n logdetG::Union{<:Number,<:AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple;\n decoder_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n)\n\nCompute the Hamiltonian for a given point in the latent space and a given momentum.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, the inverse of the Riemannian metric tensor G⁻¹, a decoder of type AbstractVariationalDecoder, and a decoder_output NamedTuple, and computes the Hamiltonian. The computation is based on the log-likelihood of the decoder, the log-prior of the latent space, and the inverse of the metric tensor G at the point z.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. 
The kinetic energy is defined as follows:\n\nκ(ρ) = -log p(ρ),\n\nwhere p(ρ) is the log-prior of the momentum.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported, but the last dimension of the array should be of size 1.\nz::AbstractVecOrMat: The point in the latent space.\nρ::AbstractVecOrMat: The momentum.\nG⁻¹::AbstractArray: The inverse of the Riemannian metric tensor. This should be computed elsewhere and should correspond to the given z value.\nlogdetG::Union{<:Number,AbstractVector}: The log determinant of the Riemannian metric tensor. This should be computed elsewhere and should correspond to the given z value.\ndecoder::AbstractVariationalDecoder: The decoder instance. This is not used in the computation of the Hamiltonian, but is passed to the decoder_loglikelihood function to know which method to use.\ndecoder_output::NamedTuple: The output of the decoder.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. Default is decoder_loglikelihood. This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and the inverse of the Riemannian metric tensor G⁻¹.\n\nReturns\n\nA scalar representing the Hamiltonian at the point z with the momentum ρ.\n\nNote\n\nThe inverse of the Riemannian metric tensor G⁻¹ is assumed to be computed elsewhere. 
The user must ensure that the provided G⁻¹ corresponds to the given z value.\n\n\n\n\n\nhamiltonian(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n rhvae::RHVAE;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n G_inv::Function=G_inv,\n)\n\nCompute the Hamiltonian for a given point in the latent space and a given momentum.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, and an instance of RHVAE. It computes the inverse of the Riemannian metric tensor G⁻¹ and the output of the decoder internally, and then computes the Hamiltonian. The computation is based on the log-likelihood of the decoder, the log-prior of the latent space, and the inverse of the metric tensor G at the point z.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. The kinetic energy is defined as follows:\n\nκ(ρ) = -log p(ρ),\n\nwhere p(ρ) is the log-prior of the momentum.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported, but the last dimension of the array should be of size 1.\nz::AbstractVector: The point in the latent space.\nρ::AbstractVector: The momentum.\nrhvae::RHVAE: An instance of the RHVAE model.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. Default is decoder_loglikelihood. 
This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and the inverse of the Riemannian metric tensor G⁻¹.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Default is G_inv. This function must take as input the point z in the latent space and the rhvae instance.\n\nReturns\n\nA scalar representing the Hamiltonian at the point z with the momentum ρ.\n\nNote\n\nThe inverse of the Riemannian metric tensor G⁻¹, the log determinant of the metric tensor, and the output of the decoder are computed internally in this function. The user does not need to provide these as inputs.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.∇hamiltonian","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.∇hamiltonian","text":"∇hamiltonian(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n G⁻¹::AbstractArray,\n logdetG::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple,\n var::Symbol;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n adtype::Symbol=:TaylorDiff,\n adkwargs::Union{NamedTuple,Dict}=Dict(),\n)\n\nCompute the gradient of the Hamiltonian with respect to a given variable using a specified automatic differentiation method.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, the inverse of the Riemannian metric tensor G⁻¹, a decoder of type AbstractVariationalDecoder, a decoder_output 
NamedTuple, and a variable var (:z or :ρ), and computes the gradient of the Hamiltonian with respect to var using the specified automatic differentiation method. The computation is based on the log-likelihood of the decoder, the log-prior of the latent space, and G⁻¹.\n\nThe Hamiltonian is computed as follows:\n\nHₓ(z, ρ) = Uₓ(z) + κ(ρ),\n\nwhere Uₓ(z) is the potential energy, and κ(ρ) is the kinetic energy. The potential energy is defined as follows:\n\nUₓ(z) = -log p(x|z) - log p(z),\n\nwhere p(x|z) is the log-likelihood of the decoder and p(z) is the log-prior in latent space. The kinetic energy is defined as follows:\n\nκ(ρ) = 0.5 * log((2π)ᴰ det G(z)) + 0.5 * ρᵀ G⁻¹ ρ\n\nwhere D is the dimension of the latent space, and G(z) is the metric tensor at the point z.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If matrix, each column represents a momentum vector.\nG⁻¹::AbstractArray: The inverse of the Riemannian metric tensor. If 3D array, each slice along the third dimension represents the inverse of the metric tensor at the corresponding column of z.\nlogdetG::Union{<:Number,AbstractVector}: The log determinant of the Riemannian metric tensor. If vector, each element represents the log determinant of the metric tensor at the corresponding column of z.\ndecoder::AbstractVariationalDecoder: The decoder instance.\ndecoder_output::NamedTuple: The output of the decoder.\nvar::Symbol: The variable with respect to which the gradient is computed. Must be :z or :ρ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. Default is decoder_loglikelihood. 
This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and G⁻¹.\nadtype::Symbol=:finite: The type of automatic differentiation method to use. Must be:finite,:ForwardDiff, or:TaylorDiff. Default is:finite`.\nadkwargs::Union{NamedTuple,Dict}=Dict(): Additional keyword arguments to pass to the automatic differentiation method.\n\nReturns\n\nA vector representing the gradient of the Hamiltonian at the point (z, ρ) with respect to variable var.\n\n\n\n\n\n∇hamiltonian(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n rhvae::RHVAE,\n var::Symbol;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n G_inv::Function=G_inv,\n adtype::Symbol=:TaylorDiff,\n adkwargs::Union{NamedTuple,Dict}=Dict(),\n)\n\nCompute the gradient of the Hamiltonian with respect to a given variable using a specified automatic differentiation method.\n\nThis function takes a point x in the data space, a point z in the latent space, a momentum ρ, an instance of RHVAE, and a variable var (:z or :ρ), and computes the gradient of the Hamiltonian with respect to var using the specified automatic differentiation method. The computation is based on the log-likelihood of the decoder, the log-prior of the latent space, and G_inv.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. 
If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If matrix, each column represents a momentum vector.\nrhvae::RHVAE: An instance of the RHVAE model.\nvar::Symbol: The variable with respect to which the gradient is computed. Must be :z or :ρ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log-likelihood of the decoder reconstruction. Default is decoder_loglikelihood. This function must take as input the decoder, the point x in the data space, and the decoder_output.\nposition_logprior::Function: The function to compute the log-prior of the latent space position. Default is spherical_logprior. This function must take as input the point z in the latent space.\nmomentum_logprior::Function: The function to compute the log-prior of the momentum. Default is riemannian_logprior. This function must take as input the momentum ρ and G_inv.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Default is G_inv.\nadtype::Symbol=:finite: The type of automatic differentiation method to use. Must be:finite,:ForwardDiff, or:TaylorDiff. 
Default is :finite.\nadkwargs::Union{NamedTuple,Dict}=Dict(): Additional keyword arguments to pass to the automatic differentiation method.\n\nReturns\n\nA vector representing the gradient of the Hamiltonian at the point (z, ρ) with respect to variable var.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs._leapfrog_first_step","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs._leapfrog_first_step","text":"_leapfrog_first_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n G⁻¹::AbstractArray,\n logdetG::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n)\n\nPerform the first step of the generalized leapfrog integrator for Hamiltonian dynamics, defined as\n\nρ(t + ϵ/2) = ρ(t) - 0.5 * ϵ * ∇z_H(z(t), ρ(t + ϵ/2)).\n\nThis function is part of the generalized leapfrog integrator used in Hamiltonian dynamics. 
Unlike the standard leapfrog integrator, the generalized leapfrog integrator is implicit, so each update must be solved with fixed-point iterations.\n\nThe function takes a point x in the data space, a point z in the latent space, a momentum ρ, the inverse of the Riemannian metric tensor G⁻¹, a decoder of type AbstractVariationalDecoder, the output of the decoder decoder_output, a step size ϵ, and optionally the number of fixed-point iterations to perform (steps), a function to compute the gradient of the Hamiltonian (∇H), and a set of keyword arguments for ∇H (∇H_kwargs).\n\nThe function performs the following update steps times:\n\nρ̃ = ρ̃ - 0.5 * ϵ * ∇hamiltonian(x, z, ρ̃, G⁻¹, decoder, decoder_output, :z; ∇H_kwargs...)\n\nwhere ∇H is the gradient of the Hamiltonian with respect to the position variables z. The result is returned as ρ̃.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If a matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If a matrix, each column represents a momentum vector.\nG⁻¹::AbstractArray: The inverse of the Riemannian metric tensor. If a 3D array, each slice along the third dimension represents the inverse of the metric tensor at the corresponding column of z.\nlogdetG::Union{<:Number,AbstractVector}: The log determinant of the Riemannian metric tensor. If a vector, each element represents the log determinant of the metric tensor at the corresponding column of z.\ndecoder::AbstractVariationalDecoder: The decoder instance.\ndecoder_output::NamedTuple: The output of the decoder.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=0.01f0: The leapfrog step size. Default is 0.01f0.\nsteps::Int=3: The number of fixed-point iterations to perform. 
Default is 3.\n∇H_kwargs::Union{NamedTuple,Dict}: The keyword arguments for ∇hamiltonian. Default is a tuple with reconstruction_loglikelihood, position_logprior, momentum_logprior, and G_inv.\n\nReturns\n\nA vector representing the updated momentum after performing the first step of the generalized leapfrog integrator.\n\n\n\n\n\n_leapfrog_first_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n rhvae::RHVAE;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n G_inv::Function=G_inv,\n)\n\nPerform the first step of the generalized leapfrog integrator for Hamiltonian dynamics, defined as\n\nρ(t + ϵ/2) = ρ(t) - 0.5 * ϵ * ∇z_H(z(t), ρ(t + ϵ/2)).\n\nThis function is part of the generalized leapfrog integrator used in Hamiltonian dynamics. Unlike the standard leapfrog integrator, the generalized leapfrog integrator is implicit, which means it requires the use of fixed-point iterations to be solved.\n\nThe function takes a RHVAE instance, a point x in the data space, a point z in the latent space, a momentum ρ, a step size ϵ, and optionally the number of fixed-point iterations to perform (steps), a function to compute the gradient of the Hamiltonian (∇H), and a set of keyword arguments for ∇H (∇H_kwargs).\n\nThe function performs the following update for steps times:\n\nρ̃ = ρ̃ - 0.5 * ϵ * ∇hamiltonian(rhvae, x, z, ρ̃, :z; ∇H_kwargs...)\n\nwhere ∇H is the gradient of the Hamiltonian with respect to the position variables z. The result is returned as ρ̃.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. 
If a matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If a matrix, each column represents a momentum vector.\nrhvae::RHVAE: The RHVAE instance.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=0.01f0: The leapfrog step size. Default is 0.01f0.\nsteps::Int=3: The number of fixed-point iterations to perform. Default is 3.\n∇H_kwargs::Union{NamedTuple,Dict}: The keyword arguments for ∇hamiltonian. Default is a tuple with reconstruction_loglikelihood, position_logprior, and momentum_logprior.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Default is G_inv.\n\nReturns\n\nA vector representing the updated momentum after performing the first step of the generalized leapfrog integrator.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs._leapfrog_second_step","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs._leapfrog_second_step","text":"_leapfrog_second_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n G⁻¹::AbstractArray,\n logdetG::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n)\n\nPerform the second step of the generalized leapfrog integrator for Hamiltonian dynamics, defined as\n\nz(t + ϵ) = z(t) + 0.5 * ϵ * [∇ρH(z(t), ρ(t+ϵ/2)) + ∇ρH(z(t + ϵ), ρ(t+ϵ/2))].\n\nThis function is part of the generalized leapfrog integrator used in Hamiltonian dynamics. 
Unlike the standard leapfrog integrator, the generalized leapfrog integrator is implicit, so each update must be solved with fixed-point iterations.\n\nThe function takes a point x in the data space, a point z in the latent space, a momentum ρ, the inverse of the Riemannian metric tensor G⁻¹, a decoder of type AbstractVariationalDecoder, the output of the decoder decoder_output, a step size ϵ, and optionally the number of fixed-point iterations to perform (steps), a function to compute the gradient of the Hamiltonian (∇H), and a set of keyword arguments for ∇H (∇H_kwargs).\n\nThe function performs the following update steps times:\n\nz̄ = z̄ + 0.5 * ϵ * ( ∇hamiltonian(x, z̄, ρ, G⁻¹, decoder, decoder_output, :ρ; ∇H_kwargs...) + ∇hamiltonian(x, z, ρ, G⁻¹, decoder, decoder_output, :ρ; ∇H_kwargs...) )\n\nwhere ∇H is the gradient of the Hamiltonian with respect to the momentum variables ρ. The result is returned as z̄.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If a matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If a matrix, each column represents a momentum vector.\nG⁻¹::AbstractArray: The inverse of the Riemannian metric tensor. If a 3D array, each slice along the third dimension represents the inverse of the metric tensor at the corresponding column of z.\nlogdetG::Union{<:Number,AbstractVector}: The log determinant of the Riemannian metric tensor. If a vector, each element represents the log determinant of the metric tensor at the corresponding column of z.\ndecoder::AbstractVariationalDecoder: The decoder instance.\ndecoder_output::NamedTuple: The output of the decoder.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=0.01f0: The step size. 
Default is 0.01f0.\nsteps::Int=3: The number of fixed-point iterations to perform. Default is 3.\n∇H_kwargs::Union{NamedTuple,Dict}: The keyword arguments for ∇hamiltonian. Default is a tuple with reconstruction_loglikelihood, position_logprior, and momentum_logprior.\n\nReturns\n\nA vector representing the updated position after performing the second step of the generalized leapfrog integrator.\n\n\n\n\n\n_leapfrog_second_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n rhvae::RHVAE;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n G_inv::Function=G_inv,\n)\n\nPerform the second step of the generalized leapfrog integrator for Hamiltonian dynamics, defined as\n\nz(t + ϵ) = z(t) + 0.5 * ϵ * [∇ρH(z(t), ρ(t+ϵ/2)) + ∇ρH(z(t + ϵ), ρ(t+ϵ/2))].\n\nThis function is part of the generalized leapfrog integrator used in Hamiltonian dynamics. Unlike the standard leapfrog integrator, the generalized leapfrog integrator is implicit, so each update must be solved with fixed-point iterations.\n\nThe function takes a RHVAE instance, a point x in the data space, a point z in the latent space, a momentum ρ, a step size ϵ, and optionally the number of fixed-point iterations to perform (steps), a function to compute the gradient of the Hamiltonian (∇H), and a set of keyword arguments for ∇H (∇H_kwargs).\n\nThe function performs the following update steps times:\n\nz̄ = z̄ + 0.5 * ϵ * ( ∇hamiltonian(rhvae, x, z̄, ρ, :ρ; ∇H_kwargs...) + ∇hamiltonian(rhvae, x, z, ρ, :ρ; ∇H_kwargs...) )\n\nwhere ∇H is the gradient of the Hamiltonian with respect to the momentum variables ρ. The result is returned as z̄.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. 
The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If a matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If a matrix, each column represents a momentum vector.\nrhvae::RHVAE: The RHVAE instance.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=0.01f0: The leapfrog step size. Default is 0.01f0.\nsteps::Int=3: The number of fixed-point iterations to perform. Default is 3. Typically, 3 iterations are sufficient.\n∇H_kwargs::Union{NamedTuple,Dict}: The keyword arguments for ∇hamiltonian. Default is a tuple with reconstruction_loglikelihood, position_logprior, and momentum_logprior.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Default is G_inv.\n\nReturns\n\nA vector representing the updated position after performing the second step of the generalized leapfrog integrator.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs._leapfrog_third_step","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs._leapfrog_third_step","text":"_leapfrog_third_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n G⁻¹::AbstractArray,\n logdetG::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n)\n\nPerform the third step of the generalized leapfrog integrator for Hamiltonian dynamics, defined as\n\nρ(t + ϵ) = ρ(t + ϵ/2) - 0.5 * ϵ * ∇z_H(z(t + ϵ), ρ(t + ϵ/2)).\n\nThis function is part of the generalized leapfrog integrator used in Hamiltonian dynamics. 
Unlike the standard leapfrog integrator, the generalized leapfrog integrator is implicit, so its earlier steps must be solved with fixed-point iterations; this third step is a single explicit update.\n\nThe function takes a point x in the data space, a point z in the latent space, a momentum ρ, the inverse of the Riemannian metric tensor G⁻¹, a decoder of type AbstractVariationalDecoder, the output of the decoder decoder_output, a step size ϵ, a function to compute the gradient of the Hamiltonian (∇H), and a set of keyword arguments for ∇H (∇H_kwargs).\n\nThe function performs the following update:\n\nρ̃ = ρ - 0.5 * ϵ * ∇hamiltonian( x, z, ρ, G⁻¹, decoder, decoder_output, :z; ∇H_kwargs... )\n\nwhere ∇H is the gradient of the Hamiltonian with respect to the position variables z. The result is returned as ρ̃.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If a matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If a matrix, each column represents a momentum vector.\nG⁻¹::AbstractArray: The inverse of the Riemannian metric tensor. If a 3D array, each slice along the third dimension represents the inverse of the metric tensor at the corresponding column of z.\nlogdetG::Union{<:Number,AbstractVector}: The log determinant of the Riemannian metric tensor. If a vector, each element represents the log determinant of the metric tensor at the corresponding column of z.\ndecoder::AbstractVariationalDecoder: The decoder instance.\ndecoder_output::NamedTuple: The output of the decoder.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=0.01f0: The step size. Default is 0.01f0.\n∇H_kwargs::Union{NamedTuple,Dict}: The keyword arguments for ∇hamiltonian. 
Default is a tuple with reconstruction_loglikelihood, position_logprior, momentum_logprior.\n\nReturns\n\nA vector representing the updated momentum after performing the third step of the generalized leapfrog integrator.\n\n\n\n\n\n_leapfrog_third_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n rhvae::RHVAE;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n G_inv::Function=G_inv,\n)\n\nPerform the third step of the generalized leapfrog integrator for Hamiltonian dynamics, defined as\n\nρ(t + ϵ) = ρ(t + ϵ/2) - 0.5 * ϵ * ∇z_H(z(t + ϵ), ρ(t + ϵ/2)).\n\nThis function is part of the generalized leapfrog integrator used in Hamiltonian dynamics. Unlike the standard leapfrog integrator, the generalized leapfrog integrator is implicit, which means it requires the use of fixed-point iterations to be solved.\n\nThe function takes a RHVAE instance, a point x in the data space, a point z in the latent space, a momentum ρ, a step size ϵ, the number of fixed-point iterations to perform (steps), a function to compute the gradient of the Hamiltonian (∇H), and a set of keyword arguments for ∇H (∇H_kwargs).\n\nThe function performs the following update:\n\nρ̃ = ρ - 0.5 * ϵ * ∇hamiltonian(rhvae, x, z, ρ, :z; ∇H_kwargs...)\n\nwhere ∇H is the gradient of the Hamiltonian with respect to the position variables z. The result is returned as ρ̃.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. 
If matrix, each column represents a momentum vector.\nrhvae::RHVAE: The RHVAE instance.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}: The leapfrog step size. Default is 0.01f0.\nsteps::Int=3: The number of fixed-point iterations to perform. Default is 3.\n∇H_kwargs::Union{NamedTuple,Dict}: The keyword arguments for ∇hamiltonian. Default is a tuple with reconstruction_loglikelihood, position_logprior, and momentum_logprior.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Default is G_inv.\n\nReturns\n\nA vector representing the updated momentum after performing the third step of the generalized leapfrog integrator.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.general_leapfrog_step","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.general_leapfrog_step","text":"general_leapfrog_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n G⁻¹::AbstractArray,\n logdetG::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple,\n metric_param::NamedTuple;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n G_inv::Function=G_inv,\n)\n\nPerform a full step of the generalized leapfrog integrator for Hamiltonian dynamics.\n\nThe leapfrog integrator is a numerical integration scheme used to simulate Hamiltonian dynamics. 
It consists of three steps:\n\nHalf update of the momentum variable: \nρ(t + ϵ/2) = ρ(t) - 0.5 * ϵ * ∇z_H(z(t), ρ(t + ϵ/2)).\nFull update of the position variable: \n\nz(t + ϵ) = z(t) + 0.5 * ϵ * [∇ρ_H(z(t), ρ(t+ϵ/2)) + ∇ρ_H(z(t + ϵ), ρ(t+ϵ/2))].\n\nHalf update of the momentum variable: \nρ(t + ϵ) = ρ(t + ϵ/2) - 0.5 * ϵ * ∇z_H(z(t + ϵ), ρ(t + ϵ/2)).\n\nThis function performs these three steps in sequence, using the _leapfrog_first_step, _leapfrog_second_step and _leapfrog_third_step helper functions.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. If matrix, each column represents a momentum vector.\nG⁻¹::AbstractArray: The inverse of the Riemannian metric tensor. If 3D array, each slice along the third dimension represents the inverse of the metric tensor at the corresponding column of z.\nlogdetG::Union{<:Number,AbstractVector}: The log determinant of the Riemannian metric tensor. If vector, each element represents the log determinant of the metric tensor at the corresponding column of z.\ndecoder::AbstractVariationalDecoder: The decoder instance.\ndecoder_output::NamedTuple: The output of the decoder.\nmetric_param::NamedTuple: The parameters for the metric tensor.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=0.01f0: The step size. Default is 0.01.\nsteps::Int=3: The number of fixed-point iterations to perform. Default is 3. Typically, 3 iterations are sufficient.\n∇H_kwargs::Union{NamedTuple,Dict}: The keyword arguments for ∇hamiltonian. 
Default is a tuple with decoder_loglikelihood, position_logprior, momentum_logprior, and G_inv.\nG_inv::Function=G_inv: The function to compute the inverse of the Riemannian metric tensor.\n\nReturns\n\nA tuple (z̄, ρ̄, Ḡ⁻¹, logdetḠ, decoder_update) representing the updated position, momentum, the inverse of the updated Riemannian metric tensor, the log of the determinant of the metric tensor and the updated decoder outputs after performing the full leapfrog step.\n\n\n\n\n\ngeneral_leapfrog_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n rhvae::RHVAE;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n G_inv=G_inv,\n ),\n)\n\nPerform a full step of the generalized leapfrog integrator for Hamiltonian dynamics.\n\nThe leapfrog integrator is a numerical integration scheme used to simulate Hamiltonian dynamics. It consists of three steps:\n\nHalf update of the momentum variable: ρ(t + ϵ/2) = ρ(t) - 0.5 * ϵ * ∇z_H(z(t), ρ(t + ϵ/2)).\nFull update of the position variable: z(t + ϵ) = z(t) + 0.5 * ϵ * [∇ρ_H(z(t), ρ(t+ϵ/2)) + ∇ρ_H(z(t + ϵ), ρ(t+ϵ/2))].\n\nHalf update of the momentum variable: ρ(t + ϵ) = ρ(t + ϵ/2) - 0.5 * ϵ * ∇z_H(z(t + ϵ), ρ(t + ϵ/2)).\n\nThis function performs these three steps in sequence, using the _leapfrog_first_step, _leapfrog_second_step and _leapfrog_third_step helper functions.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. 
If matrix, each column represents a momentum vector.\nrhvae::RHVAE: The RHVAE instance.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=0.01f0: The leapfrog step size. Default is 0.01f0.\nsteps::Int=3: The number of fixed-point iterations to perform. Default is 3. Typically, 3 iterations are sufficient.\n∇H_kwargs::Union{NamedTuple,Dict}: The keyword arguments for ∇hamiltonian. Default is a tuple with decoder_loglikelihood, position_logprior, and momentum_logprior.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Default is G_inv.\n\nReturns\n\nA tuple (z̄, ρ̄, Ḡ⁻¹, logdetḠ, decoder_update) representing the updated position, momentum, the inverse of the updated Riemannian metric tensor, the log of the determinant of the metric tensor, and the updated decoder outputs after performing the full leapfrog step.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.general_leapfrog_tempering_step","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.general_leapfrog_tempering_step","text":"general_leapfrog_tempering_step(\n x::AbstractArray,\n zₒ::AbstractVecOrMat,\n Gₒ⁻¹::AbstractArray,\n logdetGₒ::Union{<:Number,AbstractVector},\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple,\n metric_param::NamedTuple;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n G_inv=G_inv,\n ),\n tempering_schedule::Function=quadratic_tempering,\n)\n\nCombines the leapfrog and tempering steps into a single function for the Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nArguments\n\nx::AbstractArray: The data to be processed. If Array, the last dimension must be of size 1.\nzₒ::AbstractVecOrMat: The initial latent variable. 
\nGₒ⁻¹::AbstractArray: The initial inverse of the Riemannian metric tensor.\nlogdetGₒ::Union{<:Number,AbstractVector}: The log determinant of the initial Riemannian metric tensor. If vector, each element represents the log determinant of the metric tensor at the corresponding column of zₒ.\ndecoder::AbstractVariationalDecoder: The decoder of the RHVAE model.\ndecoder_output::NamedTuple: The output of the decoder.\nmetric_param::NamedTuple: The parameters of the metric tensor.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog steps in the HMC algorithm. This can be a scalar or an array. Default is 0.01f0. \nK::Int: The number of leapfrog steps to perform in the Hamiltonian Monte Carlo (HMC) algorithm. Default is 3.\nβₒ::Number: The initial inverse temperature for the tempering schedule. Default is 0.3f0.\nsteps::Int: The number of fixed-point iterations to perform. Default is 3.\n∇H_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the ∇hamiltonian function. Default is a NamedTuple with reconstruction_loglikelihood, position_logprior, and momentum_logprior.\ntempering_schedule::Function: The function to compute the inverse temperature at each step in the HMC algorithm. Defaults to quadratic_tempering. This function must take three arguments: First, βₒ, an initial inverse temperature, second, k, the current step in the tempering schedule, and third, K, the total number of steps in the tempering schedule.\n\nReturns\n\nA NamedTuple with the following keys: \nz_init: The initial latent variable. \nρ_init: The initial momentum variable. \nGinv_init: The initial inverse of the Riemannian metric tensor. \nlogdetG_init: The initial log determinant of the Riemannian metric tensor.\nz_final: The final latent variable after K leapfrog steps. \nρ_final: The final momentum variable after K leapfrog steps. 
\nGinv_final: The final inverse of the Riemannian metric tensor after K leapfrog steps.\nlogdetG_final: The final log determinant of the Riemannian metric tensor after K leapfrog steps.\nThe decoder output at the final latent variable is also returned. Note: This is not in the same named tuple as the other outputs, but as a separate output.\n\nDescription\n\nThe function first samples a random momentum variable γₒ from a standard normal distribution and scales it by the inverse square root of the initial inverse temperature βₒ to obtain the initial momentum variable ρₒ. Then, it performs K leapfrog steps, each followed by a tempering step, to generate a new sample from the latent space.\n\nNote\n\nEnsure the input data x and the initial latent variable zₒ match the expected input dimensionality for the RHVAE model.\n\n\n\n\n\ngeneral_leapfrog_tempering_step(\n x::AbstractArray,\n zₒ::AbstractVecOrMat,\n rhvae::RHVAE;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n ),\n G_inv::Function=G_inv,\n tempering_schedule::Function=quadratic_tempering,\n)\n\nCombines the leapfrog and tempering steps into a single function for the Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nArguments\n\nx::AbstractArray: The data to be processed. If Array, the last dimension must be of size 1.\nzₒ::AbstractVecOrMat: The initial latent variable. \n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog steps in the HMC algorithm. This can be a scalar or an array. Default is 0.01f0. \nK::Int: The number of leapfrog steps to perform in the Hamiltonian Monte Carlo (HMC) algorithm. Default is 3.\nβₒ::Number: The initial inverse temperature for the tempering schedule. 
Default is 0.3f0.\nsteps::Int: The number of fixed-point iterations to perform. Default is 3.\n∇H_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the ∇hamiltonian function. Default is a NamedTuple with reconstruction_loglikelihood, position_logprior, and momentum_logprior.\ntempering_schedule::Function: The function to compute the inverse temperature at each step in the HMC algorithm. Defaults to quadratic_tempering. This function must take three arguments: First, βₒ, an initial inverse temperature, second, k, the current step in the tempering schedule, and third, K, the total number of steps in the tempering schedule.\n\nReturns\n\nA NamedTuple with the following keys: \nz_init: The initial latent variable. \nρ_init: The initial momentum variable. \nGinv_init: The initial inverse of the Riemannian metric tensor. \nz_final: The final latent variable after K leapfrog steps. \nρ_final: The final momentum variable after K leapfrog steps. \nGinv_final: The final inverse of the Riemannian metric tensor after K leapfrog steps.\nThe decoder output at the final latent variable is also returned. Note: This is not in the same named tuple as the other outputs, but as a separate output.\n\nDescription\n\nThe function first samples a random momentum variable γₒ from a standard normal distribution and scales it by the inverse square root of the initial inverse temperature βₒ to obtain the initial momentum variable ρₒ. 
Then, it performs K leapfrog steps, each followed by a tempering step, to generate a new sample from the latent space.\n\nNote\n\nEnsure the input data x and the initial latent variable zₒ match the expected input dimensionality for the RHVAE model.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs._log_p̄","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs._log_p̄","text":"_log_p̄(\n x::AbstractArray,\n rhvae::RHVAE{VAE{E,D}},\n rhvae_outputs::NamedTuple;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n position_logprior::Function=spherical_logprior,\n momentum_logprior::Function=riemannian_logprior,\n prefactor::AbstractArray=ones(Float32, 3),\n)\n\nThis is an internal function used in riemannian_hamiltonian_elbo to compute the numerator of the unbiased estimator of the marginal likelihood. The function computes the sum of the log likelihood of the data given the latent variables, the log prior of the latent variables, and the log prior of the momentum variables.\n\nlog p̄ = log p(x | zₖ) + log p(zₖ) + log p(ρₖ(zₖ))\n\nArguments\n\nx::AbstractArray: The input data. If Array, the last dimension must contain each of the data points.\nrhvae::RHVAE{<:VAE{<:AbstractGaussianEncoder,<:AbstractGaussianLogDecoder}}: The Riemannian Hamiltonian Variational Autoencoder (RHVAE) model.\nrhvae_outputs::NamedTuple: The outputs of the RHVAE, including the final latent variables zₖ and the final momentum variables ρₖ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log likelihood of the data given the latent variables. Default is decoder_loglikelihood.\nposition_logprior::Function: The function to compute the log prior of the latent variables. Default is spherical_logprior.\nmomentum_logprior::Function: The function to compute the log prior of the momentum variables. 
Default is riemannian_logprior.\nprefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\n\nReturns\n\nlog_p̄::AbstractVector: The first term of the log of the unbiased estimator of the marginal likelihood for each data point.\n\nNote\n\nThis is an internal function and should not be called directly. It is used as part of the riemannian_hamiltonian_elbo function.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs._log_q̄","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs._log_q̄","text":"_log_q̄(\n rhvae::RHVAE,\n rhvae_outputs::NamedTuple,\n βₒ::Number;\n momentum_logprior::Function=riemannian_logprior,\n prefactor::AbstractArray=ones(Float32, 3),\n)\n\nThis is an internal function used in riemannian_hamiltonian_elbo to compute the second term of the unbiased estimator of the marginal likelihood. The function computes the sum of the log posterior of the initial latent variables and the log prior of the initial momentum variables, minus a term that depends on the dimensionality of the latent space and the initial temperature.\n\n log q̄ = log q(zₒ) + log p(ρₒ) - d/2 log(βₒ)\n\nArguments\n\nrhvae::RHVAE: The Riemannian Hamiltonian Variational Autoencoder (RHVAE) model.\nrhvae_outputs::NamedTuple: The outputs of the RHVAE, including the initial latent variables zₒ and the initial momentum variables ρₒ.\nβₒ::Number: The initial temperature for the tempering steps.\n\nOptional Keyword Arguments\n\nmomentum_logprior::Function: The function to compute the log prior of the momentum variables. Default is riemannian_logprior.\nprefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. 
Default is an array of ones.\n\nReturns\n\nlog_q̄::Vector: The second term of the log of the unbiased estimator of the marginal likelihood for each data point.\n\nNote\n\nThis is an internal function and should not be called directly. It is used as part of the riemannian_hamiltonian_elbo function.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.riemannian_hamiltonian_elbo","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.riemannian_hamiltonian_elbo","text":"riemannian_hamiltonian_elbo(\n rhvae::RHVAE,\n metric_param::NamedTuple,\n x::AbstractArray;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n G_inv=G_inv,\n ),\n tempering_schedule::Function=quadratic_tempering,\n return_outputs::Bool=false,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the Riemannian Hamiltonian Monte Carlo (RHMC) estimate of the evidence lower bound (ELBO) for a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nThis function takes as input an RHVAE, a NamedTuple of metric parameters, and a vector of input data x. It performs K RHMC steps with a leapfrog integrator and a tempering schedule to estimate the ELBO. The ELBO is computed as the difference between the log p̄ and log q̄ as\n\nelbo = mean(log p̄ - log q̄),\n\nArguments\n\nrhvae::RHVAE: The RHVAE used to encode the input data and decode the latent space.\nmetric_param::NamedTuple: The parameters used to compute the metric tensor.\nx::AbstractArray: The input data. 
If Array, the last dimension must contain each of the data points.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is 0.01).\nK::Int: The number of RHMC steps (default is 3).\nβₒ::Number: The initial inverse temperature (default is 0.3).\nsteps::Int: The number of leapfrog steps (default is 3).\n∇H_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the ∇hamiltonian function. Defaults to a NamedTuple with :decoder_loglikelihood set to decoder_loglikelihood, :position_logprior set to spherical_logprior, and :momentum_logprior set to riemannian_logprior.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Defaults to G_inv.\ntempering_schedule::Function: The tempering schedule function used in the RHMC (default is quadratic_tempering).\nreturn_outputs::Bool: Whether to return the outputs of the RHVAE. Defaults to false. NOTE: This is necessary to avoid computing the forward pass twice when computing the loss function with regularization.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. Default is an array of ones.\n\nReturns\n\nelbo::Number: The RHMC estimate of the ELBO. 
If return_outputs is true, also returns the outputs of the RHVAE.\n\n\n\n\n\nriemannian_hamiltonian_elbo(\n rhvae::RHVAE,\n x::AbstractVector;\n K::Int=3,\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n βₒ::Number=0.3f0,\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n G_inv=G_inv,\n ),\n tempering_schedule::Function=quadratic_tempering,\n return_outputs::Bool=false,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the Riemannian Hamiltonian Monte Carlo (RHMC) estimate of the evidence lower bound (ELBO) for a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nThis function takes as input an RHVAE, a NamedTuple of metric parameters, and a vector of input data x. It performs K RHMC steps with a leapfrog integrator and a tempering schedule to estimate the ELBO. The ELBO is computed as the difference between the log p̄ and log q̄ as\n\nelbo = mean(log p̄ - log q̄)\n\nArguments\n\nrhvae::RHVAE: The RHVAE used to encode the input data and decode the latent space.\nx::AbstractVector: The input data.\n\nOptional Keyword Arguments\n\n∇H_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the ∇hamiltonian function. 
Defaults to a NamedTuple with :decoder_loglikelihood set to decoder_loglikelihood, :position_logprior set to spherical_logprior, :momentum_logprior set to riemannian_logprior, and :G_inv set to G_inv.\nK::Int: The number of RHMC steps (default is 3).\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is 0.001).\nβₒ::Number: The initial inverse temperature (default is 0.3).\nsteps::Int: The number of leapfrog steps (default is 3).\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor (default is G_inv).\ntempering_schedule::Function: The tempering schedule function used in the RHMC (default is quadratic_tempering).\nreturn_outputs::Bool: Whether to return the outputs of the RHVAE. Defaults to false. NOTE: This is necessary to avoid computing the forward pass twice when computing the loss function with regularization.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. Default is an array of ones.\n\nReturns\n\nelbo::Number: The RHMC estimate of the ELBO. 
If return_outputs is true, also returns the outputs of the RHVAE.\n\n\n\n\n\nriemannian_hamiltonian_elbo(\n rhvae::RHVAE,\n metric_param::NamedTuple,\n x_in::AbstractArray,\n x_out::AbstractArray;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n G_inv=G_inv,\n ),\n tempering_schedule::Function=quadratic_tempering,\n return_outputs::Bool=false,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the Riemannian Hamiltonian Monte Carlo (RHMC) estimate of the evidence lower bound (ELBO) for a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nThis function takes as input an RHVAE, a NamedTuple of metric parameters, and a vector of input data x. It performs K RHMC steps with a leapfrog integrator and a tempering schedule to estimate the ELBO. The ELBO is computed as the difference between the log p̄ and log q̄ as\n\nelbo = mean(log p̄ - log q̄),\n\nArguments\n\nrhvae::RHVAE: The RHVAE used to encode the input data and decode the latent space.\nmetric_param::NamedTuple: The parameters used to compute the metric tensor.\nx_in::AbstractArray: Input data to the RHVAE encoder. The last dimension is taken as having each of the samples in a batch.\nx_out::AbstractArray: Target data to compute the reconstruction error. 
The last dimension is taken as having each of the samples in a batch.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is 0.01).\nK::Int: The number of RHMC steps (default is 3).\nβₒ::Number: The initial inverse temperature (default is 0.3).\nsteps::Int: The number of leapfrog steps (default is 3).\n∇H_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the ∇hamiltonian function. Defaults to a NamedTuple with :decoder_loglikelihood set to decoder_loglikelihood, :position_logprior set to spherical_logprior, and :momentum_logprior set to riemannian_logprior.\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor. Defaults to G_inv.\ntempering_schedule::Function: The tempering schedule function used in the RHMC (default is quadratic_tempering).\nreturn_outputs::Bool: Whether to return the outputs of the RHVAE. Defaults to false. NOTE: This is necessary to avoid computing the forward pass twice when computing the loss function with regularization.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. Default is an array of ones.\n\nReturns\n\nelbo::Number: The RHMC estimate of the ELBO. 
If return_outputs is true, also returns the outputs of the RHVAE.\n\n\n\n\n\nriemannian_hamiltonian_elbo(\n rhvae::RHVAE,\n x_in::AbstractArray,\n x_out::AbstractArray;\n K::Int=3,\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n βₒ::Number=0.3f0,\n steps::Int=3,\n ∇H_kwargs::Union{NamedTuple,Dict}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n position_logprior=spherical_logprior,\n momentum_logprior=riemannian_logprior,\n G_inv=G_inv,\n ),\n tempering_schedule::Function=quadratic_tempering,\n return_outputs::Bool=false,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the Riemannian Hamiltonian Monte Carlo (RHMC) estimate of the evidence lower bound (ELBO) for a Riemannian Hamiltonian Variational Autoencoder (RHVAE).\n\nThis function takes as input an RHVAE, a NamedTuple of metric parameters, and a vector of input data x. It performs K RHMC steps with a leapfrog integrator and a tempering schedule to estimate the ELBO. The ELBO is computed as the difference between the log p̄ and log q̄ as\n\nelbo = mean(log p̄ - log q̄).\n\nArguments\n\nrhvae::RHVAE: The RHVAE used to encode the input data and decode the latent space.\nx_in::AbstractArray: Input data to the RHVAE encoder. The last dimension is taken as having each of the samples in a batch.\nx_out::AbstractArray: Target data to compute the reconstruction error. The last dimension is taken as having each of the samples in a batch.\n\nOptional Keyword Arguments\n\n∇H_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the ∇hamiltonian function. 
Defaults to a NamedTuple with :decoder_loglikelihood set to decoder_loglikelihood, :position_logprior set to spherical_logprior, :momentum_logprior set to riemannian_logprior, and :G_inv set to G_inv.\nK::Int: The number of RHMC steps (default is 3).\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is 0.001).\nβₒ::Number: The initial inverse temperature (default is 0.3).\nsteps::Int: The number of leapfrog steps (default is 3).\nG_inv::Function: The function to compute the inverse of the Riemannian metric tensor (default is G_inv).\ntempering_schedule::Function: The tempering schedule function used in the RHMC (default is quadratic_tempering).\nreturn_outputs::Bool: Whether to return the outputs of the RHVAE. Defaults to false. NOTE: This is necessary to avoid computing the forward pass twice when computing the loss function with regularization.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. Default is an array of ones.\n\nReturns\n\nelbo::Number: The RHMC estimate of the ELBO. If return_outputs is true, also returns the outputs of the RHVAE.\n\n\n\n\n\n","category":"function"},{"location":"rhvae/#Default-initializations","page":"RHVAE","title":"Default initializations","text":"","category":"section"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.jl provides default initializations for both the metric tensor network and the RHVAE. 
Although less flexible than defining your own initial networks, these can serve as a good starting point for your experiments.","category":"page"},{"location":"rhvae/","page":"RHVAE","title":"RHVAE","text":"AutoEncoderToolkit.RHVAEs.MetricChain(\n ::Int,\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Function;\n)\nAutoEncoderToolkit.RHVAEs.RHVAE(\n ::AutoEncoderToolkit.VAEs.VAE,\n ::AutoEncoderToolkit.RHVAEs.MetricChain,\n ::AbstractArray{AbstractFloat},\n T::AbstractFloat,\n λ::AbstractFloat\n)","category":"page"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.MetricChain-Tuple{Int64, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.MetricChain","text":"MetricChain(\n n_input::Int,\n n_latent::Int,\n metric_neurons::Vector{<:Int},\n metric_activation::Vector{<:Function},\n output_activation::Function;\n init::Function=Flux.glorot_uniform\n) -> MetricChain\n\nConstruct a MetricChain for computing the Riemannian metric tensor in the latent space.\n\nArguments\n\nn_input::Int: The number of input features.\nn_latent::Int: The dimension of the latent space.\nmetric_neurons::Vector{<:Int}: The number of neurons in each hidden layer of the MLP.\nmetric_activation::Vector{<:Function}: The activation function for each hidden layer of the MLP.\noutput_activation::Function: The activation function for the output layer.\ninit::Function: The initialization function for the weights in the layers (default is Flux.glorot_uniform).\n\nReturns\n\nMetricChain: A MetricChain object that includes the MLP, and two dense layers for computing the elements of a lower-triangular matrix used to compute the Riemannian metric tensor in latent space.\n\n\n\n\n\n","category":"method"},{"location":"rhvae/#AutoEncoderToolkit.RHVAEs.RHVAE-Tuple{AutoEncoderToolkit.VAEs.VAE, AutoEncoderToolkit.RHVAEs.MetricChain, AbstractArray{AbstractFloat}, AbstractFloat, 
AbstractFloat}","page":"RHVAE","title":"AutoEncoderToolkit.RHVAEs.RHVAE","text":"RHVAE(\n vae::VAE, \n metric_chain::MetricChain, \n centroids_data::AbstractArray, \n T::Number, \n λ::Number\n)\n\nConstruct a Riemannian Hamiltonian Variational Autoencoder (RHVAE) from a standard VAE and a metric chain.\n\nArguments\n\nvae::VAE: A standard Variational Autoencoder (VAE) model.\nmetric_chain::MetricChain: A chain of metrics to be used for the Riemannian Hamiltonian Monte Carlo (RHMC) sampler.\ncentroids_data::AbstractArray: An array of data centroids. Each column represents a centroid. N is a subtype of Number.\nT::N: The temperature parameter for the inverse metric tensor. N is a subtype of Number.\nλ::N: The regularization parameter for the inverse metric tensor. N is a subtype of Number.\n\nReturns\n\nA new RHVAE object.\n\nDescription\n\nThe constructor initializes the latent centroids and the metric tensor M to their default values. The latent centroids are initialized to a zero matrix of the same size as centroids_data, and M is initialized to a 3D array of identity matrices, one for each centroid.\n\n\n\n\n\n","category":"method"},{"location":"hvae/#HVAEsmodule","page":"HVAE","title":"Hamiltonian Variational Autoencoder","text":"","category":"section"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"The Hamiltonian Variational Autoencoder (HVAE) is a variant of the Variational autoencoder (VAE) that uses Hamiltonian dynamics to improve the sampling of the latent space representation. HVAE combines ideas from Hamiltonian Monte Carlo, annealed importance sampling, and variational inference to improve the latent space representation of the VAE.","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"For the implementation of the HVAE in AutoEncoderToolkit.jl, the HVAE struct inherits directly from the VAE struct and adds the necessary functions to compute the Hamiltonian dynamics steps as part of the training protocol. 
An HVAE object is created by simply passing a VAE object to the constructor. This way, we can use Julia's multiple dispatch to extend the functionality of the VAE object without having to redefine the entire structure.","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"warning: Warning\nHVAEs require the computation of nested gradients. This means that the AutoDiff framework must differentiate a function of an already AutoDiff-differentiated function. This is known to be problematic for Julia's AutoDiff backends. See details below to understand how we circumvent this problem.","category":"page"},{"location":"hvae/#Reference","page":"HVAE","title":"Reference","text":"","category":"section"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"Caterini, A. L., Doucet, A. & Sejdinovic, D. Hamiltonian Variational Auto-Encoder. 11 (2018).","category":"page"},{"location":"hvae/#HVAEstruct","page":"HVAE","title":"HVAE struct","text":"","category":"section"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"AutoEncoderToolkit.HVAEs.HVAE","category":"page"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.HVAE","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.HVAE","text":"struct HVAE{\n V<:VAE{<:AbstractVariationalEncoder,<:AbstractVariationalDecoder}\n} <: AbstractVariationalAutoEncoder\n\nHamiltonian Variational Autoencoder (HVAE) model defined for Flux.jl.\n\nFields\n\nvae::V: A Variational Autoencoder (VAE) model that forms the basis of the HVAE. V is a subtype of VAE with a specific AbstractVariationalEncoder and AbstractVariationalDecoder.\n\nAn HVAE is a type of Variational Autoencoder (VAE) that uses Hamiltonian Monte Carlo (HMC) to sample from the posterior distribution in the latent space. The VAE's encoder compresses the input into a low-dimensional probabilistic representation q(z|x). The VAE's decoder tries to reconstruct the original input from a sampled point in the latent space p(x|z). 
\n\nThe HMC sampling in the latent space allows the HVAE to better capture complex posterior distributions compared to a standard VAE, which assumes a simple Gaussian posterior. This can lead to more accurate reconstructions and better disentanglement of latent variables.\n\n\n\n\n\n","category":"type"},{"location":"hvae/#Forward-pass","page":"HVAE","title":"Forward pass","text":"","category":"section"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"AutoEncoderToolkit.HVAEs.HVAE(::AbstractArray)","category":"page"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.HVAE-Tuple{AbstractArray}","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.HVAE","text":"(hvae::HVAE{VAE{E,D}})(\n x::AbstractArray;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n ∇U_kwargs::Union{Dict,NamedTuple}=(\n reconstruction_loglikelihood=reconstruction_loglikelihood,\n latent_logprior=spherical_logprior,\n ),\n tempering_schedule::Function=quadratic_tempering,\n latent::Bool=false,\n) where {E<:AbstractGaussianLogEncoder,D<:AbstractVariationalDecoder}\n\nRun the Hamiltonian Variational Autoencoder (HVAE) on the given input.\n\nArguments\n\nx::AbstractArray: The input to the HVAE. If Vector, it represents a single data point. If Array, the last dimension must contain each of the data points.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=0.0001: The step size for the leapfrog steps in the HMC part of the HVAE. If it is a scalar, the same step size is used for all dimensions. If it is an array, each element corresponds to the step size for a specific dimension.\nK::Int=3: The number of leapfrog steps to perform in the Hamiltonian Monte Carlo (HMC) part of the HVAE.\nβₒ::Number=0.3f0: The initial inverse temperature for the tempering schedule.\n∇U_kwargs::Union{Dict,NamedTuple}: Additional keyword arguments to be passed to the ∇potential_energy function. 
Default is a NamedTuple with reconstruction_loglikelihood and latent_logprior.\ntempering_schedule::Function=quadratic_tempering: The function to compute the tempering schedule in the HVAE.\nlatent::Bool=false: If true, the function returns a NamedTuple containing the outputs of the encoder and decoder, and the final state of the phase space after the leapfrog and tempering steps. If false, the function only returns the output of the decoder.\n\nReturns\n\nIf latent=true, the function returns a NamedTuple with the following fields:\n\nencoder: The outputs of the encoder.\ndecoder: The output of the decoder.\nphase_space: The final state of the phase space after the leapfrog and tempering steps.\n\nIf latent=false, the function only returns the output of the decoder.\n\nDescription\n\nThis function runs the HVAE on the given input. It first passes the input through the encoder to obtain the mean and log standard deviation of the latent space. It then uses the reparameterization trick to sample from the latent space. After that, it performs the leapfrog and tempering steps to refine the sample from the latent space. 
Finally, it passes the refined sample through the decoder to obtain the output.\n\nNotes\n\nEnsure that the dimensions of x match the input dimensions of the HVAE, and that the dimensions of ϵ match the dimensions of the latent space.\n\n\n\n\n\n","category":"method"},{"location":"hvae/#Loss-function","page":"HVAE","title":"Loss function","text":"","category":"section"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"AutoEncoderToolkit.HVAEs.loss","category":"page"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.loss","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.loss","text":"loss(\n hvae::HVAE,\n x::AbstractArray;\n K::Int=3,\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n βₒ::Number=0.3f0,\n ∇U_kwargs::Union{Dict,NamedTuple}=(\n reconstruction_loglikelihood=reconstruction_loglikelihood,\n latent_logprior=spherical_logprior,\n ),\n tempering_schedule::Function=quadratic_tempering,\n reg_function::Union{Function,Nothing}=nothing,\n reg_kwargs::Union{NamedTuple,Dict}=Dict(),\n reg_strength::Float32=1.0f0,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the loss for a Hamiltonian Variational Autoencoder (HVAE).\n\nArguments\n\nhvae::HVAE: The HVAE used to encode the input data and decode the latent space.\nx::AbstractArray: Input data to the HVAE encoder. 
The last dimension is taken as having each of the samples in a batch.\n\nOptional Keyword Arguments\n\nK::Int: The number of HMC steps (default is 3).\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is 0.0001).\nβₒ::Number: The initial inverse temperature (default is 0.3).\n∇U_kwargs::Union{Dict,NamedTuple}: Additional keyword arguments to be passed to the ∇potential_energy function.\ntempering_schedule::Function: The tempering schedule function used in the HMC (default is quadratic_tempering).\nreg_function::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the VAE outputs. This function must take as input the VAE outputs and the keyword arguments provided in reg_kwargs.\nreg_kwargs::Union{NamedTuple,Dict}=Dict(): Keyword arguments to pass to the regularization function.\nreg_strength::Float32=1.0f0: The strength of the regularization term.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. 
Default is an array of ones.\n\nReturns\n\nThe computed loss.\n\n\n\n\n\nloss(\n hvae::HVAE,\n x_in::AbstractArray,\n x_out::AbstractArray;\n K::Int=3,\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n βₒ::Number=0.3f0,\n ∇U_kwargs::Union{Dict,NamedTuple}=(\n reconstruction_loglikelihood=reconstruction_loglikelihood,\n latent_logprior=spherical_logprior,\n ),\n tempering_schedule::Function=quadratic_tempering,\n reg_function::Union{Function,Nothing}=nothing,\n reg_kwargs::Union{NamedTuple,Dict}=Dict(),\n reg_strength::Float32=1.0f0,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the loss for a Hamiltonian Variational Autoencoder (HVAE).\n\nArguments\n\nhvae::HVAE: The HVAE used to encode the input data and decode the latent space.\nx_in::AbstractArray: Input data to the HVAE encoder. The last dimension is taken as having each of the samples in a batch.\nx_out::AbstractArray: The data against which the reconstruction is compared. If Array, the last dimension must contain each of the data points.\n\nOptional Keyword Arguments\n\nK::Int: The number of HMC steps (default is 3).\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is 0.0001).\nβₒ::Number: The initial inverse temperature (default is 0.3).\n∇U_kwargs::Union{Dict,NamedTuple}: Additional keyword arguments to be passed to the ∇potential_energy function.\ntempering_schedule::Function: The tempering schedule function used in the HMC (default is quadratic_tempering).\nreg_function::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the VAE outputs. 
This function must take as input the VAE outputs and the keyword arguments provided in reg_kwargs.\nreg_kwargs::Union{NamedTuple,Dict}=Dict(): Keyword arguments to pass to the regularization function.\nreg_strength::Float32=1.0f0: The strength of the regularization term.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. Default is an array of ones.\n\nReturns\n\nThe computed loss.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#Training","page":"HVAE","title":"Training","text":"","category":"section"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"AutoEncoderToolkit.HVAEs.train!","category":"page"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.train!","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.train!","text":"train!(\n hvae::HVAE, \n x::AbstractArray, \n opt::NamedTuple; \n loss_function::Function=loss, \n loss_kwargs::Union{NamedTuple,Dict}=Dict(),\n verbose::Bool=false,\n loss_return::Bool=false,\n)\n\nCustomized training function to update parameters of a Hamiltonian Variational Autoencoder given a specified loss function.\n\nArguments\n\nhvae::HVAE: A struct containing the elements of a Hamiltonian Variational Autoencoder.\nx::AbstractArray: Input data to the HVAE encoder. The last dimension is taken as having each of the samples in a batch.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function=loss: The loss function used for training. It should accept the HVAE model, data x, and keyword arguments in that order.\nloss_kwargs::Dict=Dict(): Arguments for the loss function. 
These might include parameters like K, ϵ, βₒ, ∇U_kwargs, tempering_schedule, reg_function, reg_kwargs, reg_strength, depending on the specific loss function in use.\nverbose::Bool=false: Whether to print the loss at each iteration.\nloss_return::Bool=false: Whether to return the loss at each iteration.\n\nDescription\n\nTrains the HVAE by:\n\nComputing the gradient of the loss w.r.t the HVAE parameters.\nUpdating the HVAE parameters using the optimizer.\nUpdating the metric parameters.\n\n\n\n\n\ntrain!(\n hvae::HVAE, \n x_in::AbstractArray,\n x_out::AbstractArray,\n opt::NamedTuple; \n loss_function::Function=loss, \n loss_kwargs::Union{NamedTuple,Dict}=Dict(),\n verbose::Bool=false,\n loss_return::Bool=false,\n)\n\nCustomized training function to update parameters of a Hamiltonian Variational Autoencoder given a specified loss function.\n\nArguments\n\nhvae::HVAE: A struct containing the elements of a Hamiltonian Variational Autoencoder.\nx_in::AbstractArray: Input data to the HVAE encoder. The last dimension is taken as having each of the samples in a batch.\nx_out::AbstractArray: Target data to compute the reconstruction error. The last dimension is taken as having each of the samples in a batch.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function=loss: The loss function used for training. It should accept the HVAE model, data x, and keyword arguments in that order.\nloss_kwargs::Dict=Dict(): Arguments for the loss function. 
These might include parameters like K, ϵ, βₒ, ∇U_kwargs, tempering_schedule, reg_function, reg_kwargs, reg_strength, depending on the specific loss function in use.\nverbose::Bool=false: Whether to print the loss at each iteration.\nloss_return::Bool=false: Whether to return the loss at each iteration.\n\nDescription\n\nTrains the HVAE by:\n\nComputing the gradient of the loss w.r.t the HVAE parameters.\nUpdating the HVAE parameters using the optimizer.\nUpdating the metric parameters.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#gradpotenergy","page":"HVAE","title":"Computing the gradient of the potential energy","text":"","category":"section"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"One of the crucial components in the training of the HVAE is the computation of the gradient of the potential energy nabla U with respect to the latent space representation. This gradient is used in the leapfrog steps of the Hamiltonian dynamics. When training the HVAE, we need to backpropagate through the leapfrog steps to update the parameters of the neural network. This requires computing a gradient of a function of the gradient of the potential energy, i.e., nested gradients. Zygote.jl, the main AutoDiff backend in Flux.jl, famously struggles with these types of computations. 
Specifically, Zygote.jl does not support Zygote over Zygote differentiation (meaning differentiating a function of something previously differentiated with Zygote using Zygote), or Zygote over ForwardDiff (meaning differentiating a function of something differentiated with ForwardDiff using Zygote).","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"With this, we are left with a couple of options to compute the gradient of the potential energy:","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"Use finite differences to approximate the gradient of the potential energy.\nUse the relatively new TaylorDiff.jl AutoDiff backend to compute the gradient of the potential energy. This backend is composable with Zygote.jl, so we can, in principle, do Zygote over TaylorDiff differentiation.","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"The second option would be preferred, as the gradients computed with TaylorDiff are much more accurate than the ones computed with finite differences. However, there are two problems with this approach:","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"The TaylorDiff nested gradient capability stopped working with Julia ≥ 1.10, as discussed in #70.\nEven for Julia < 1.10, we could not get TaylorDiff to work on CUDA devices. (PRs are welcome!)","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"With these limitations in mind, we have implemented the gradient of the potential using both finite differences and TaylorDiff. The user can choose which method to use by setting the adtype keyword argument in the ∇U_kwargs in the loss function to either :finite or :TaylorDiff. This means that for the train! 
function, the user can pass loss_kwargs that looks like this:","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"# Define the autodiff backend to use\nloss_kwargs = Dict(\n :∇U_kwargs => Dict(\n :adtype => :finite\n )\n)","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"note: Note\nAlthough verbose, the nested dictionaries help to keep everything organized. (PRs with better design ideas are welcome!)","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"The default for both CPU and GPU devices is :finite.","category":"page"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"AutoEncoderToolkit.HVAEs.∇potential_energy_finite\nAutoEncoderToolkit.HVAEs.∇potential_energy_TaylorDiff","category":"page"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.∇potential_energy_finite","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.∇potential_energy_finite","text":"∇potential_energy_finite(\n x::AbstractArray,\n z::AbstractVecOrMat,\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n latent_logprior::Function=spherical_logprior,\n fdtype::Symbol=:central\n)\n\nCompute the gradient of the potential energy of a Hamiltonian Variational Autoencoder (HVAE) with respect to the latent variables z using the finite difference method. This function returns the gradient of the potential energy computed for given data x and latent variable z.\n\nArguments\n\nx::AbstractArray: An array representing the input data. The last dimension corresponds to different data points.\nz::AbstractVecOrMat: A latent variable encoding of the input data. 
If a matrix, each column corresponds to a different data point.\ndecoder::AbstractVariationalDecoder: A decoder that maps the latent variables to the data space.\ndecoder_output::NamedTuple: The output of the decoder.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function representing the log-likelihood function used by the decoder. The function must take as first input an AbstractVariationalDecoder struct, as second input an array x representing the data, and as third input a vector or matrix z representing the latent variable. Default is decoder_loglikelihood.\nlatent_logprior::Function=spherical_logprior: A function representing the log-prior distribution used in the autoencoder. The function must take as single input a vector or matrix z representing the latent variable. Default is spherical_logprior. \nfdtype::Symbol=:central: A symbol representing the type of finite difference method to use. Default is :central, but it can also be :forward.\n\nReturns\n\ngradient: The computed gradient of the potential energy for the given input x and latent variable z.\n\n\n\n\n\n∇potential_energy_finite(\n x::AbstractArray,\n z::AbstractVecOrMat,\n hvae::HVAE;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n latent_logprior::Function=spherical_logprior,\n fdtype::Symbol=:central\n)\n\nCompute the gradient of the potential energy of a Hamiltonian Variational Autoencoder (HVAE) with respect to the latent variables z using finite difference method. This function returns the gradient of the potential energy computed for given data x and latent variable z.\n\nArguments\n\nx::AbstractArray: An array representing the input data. The last dimension corresponds to different data points.\nz::AbstractVecOrMat: A latent variable encoding of the input data. 
If a matrix, each column corresponds to a different data point.\nhvae::HVAE: An HVAE model that contains a decoder which maps the latent variables to the data space.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function representing the log-likelihood function used by the decoder. The function must take as first input an array x representing the data, as second input a vector or matrix z representing the latent variable, and as third input a decoder. Default is decoder_loglikelihood.\nlatent_logprior::Function=spherical_logprior: A function representing the log-prior distribution used in the autoencoder. The function must take as single input a vector or matrix z representing the latent variable. Default is spherical_logprior. \nfdtype::Symbol=:central: A symbol representing the type of finite difference method to use. Default is :central, but it can also be :forward.\n\nReturns\n\ngradient: The computed gradient of the potential energy for the given input x and latent variable z.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.∇potential_energy_TaylorDiff","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.∇potential_energy_TaylorDiff","text":"∇potential_energy_TaylorDiff(\n x::AbstractArray,\n z::AbstractVecOrMat,\n hvae::HVAE;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n latent_logprior::Function=spherical_logprior,\n)\n\nCompute the gradient of the potential energy of a Hamiltonian Variational Autoencoder (HVAE) with respect to the latent variables z using Taylor series differentiation. This function returns the gradient of the potential energy computed for given data x and latent variable z.\n\nArguments\n\nx::AbstractArray: An array representing the input data. The last dimension corresponds to different data points.\nz::AbstractVecOrMat: A latent variable encoding of the input data. 
If a matrix, each column corresponds to a different data point.\nhvae::HVAE: An HVAE model that contains a decoder which maps the latent variables to the data space.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function representing the log-likelihood function used by the decoder. The function must take as first input an array x representing the data, as second input a vector or matrix z representing the latent variable, and as third input a decoder. Default is decoder_loglikelihood.\nlatent_logprior::Function=spherical_logprior: A function representing the log-prior distribution used in the autoencoder. The function must take as single input a vector or matrix z representing the latent variable. Default is spherical_logprior. \n\nReturns\n\ngradient: The computed gradient of the potential energy for the given input x and latent variable z.\n\n\n\n\n\n∇potential_energy_TaylorDiff(\n x::AbstractArray,\n z::AbstractVecOrMat,\n hvae::HVAE;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n latent_logprior::Function=spherical_logprior,\n)\n\nCompute the gradient of the potential energy of a Hamiltonian Variational Autoencoder (HVAE) with respect to the latent variables z using Taylor series differentiation. This function returns the gradient of the potential energy computed for given data x and latent variable z.\n\nArguments\n\nx::AbstractArray: An array representing the input data. The last dimension corresponds to different data points.\nz::AbstractVecOrMat: A latent variable encoding of the input data. If a matrix, each column corresponds to a different data point.\nhvae::HVAE: An HVAE model that contains a decoder which maps the latent variables to the data space.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function representing the log-likelihood function used by the decoder. 
The function must take as first input an array x representing the data, as second input a vector or matrix z representing the latent variable, and as third input a decoder. Default is decoder_loglikelihood.\nlatent_logprior::Function=spherical_logprior: A function representing the log-prior distribution used in the autoencoder. The function must take as single input a vector or matrix z representing the latent variable. Default is spherical_logprior. \n\nReturns\n\ngradient: The computed gradient of the potential energy for the given input x and latent variable z.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#Other-Functions","page":"HVAE","title":"Other Functions","text":"","category":"section"},{"location":"hvae/","page":"HVAE","title":"HVAE","text":"AutoEncoderToolkit.HVAEs.potential_energy\nAutoEncoderToolkit.HVAEs.∇potential_energy\nAutoEncoderToolkit.HVAEs.leapfrog_step\nAutoEncoderToolkit.HVAEs.quadratic_tempering\nAutoEncoderToolkit.HVAEs.null_tempering\nAutoEncoderToolkit.HVAEs.leapfrog_tempering_step\nAutoEncoderToolkit.HVAEs._log_p̄\nAutoEncoderToolkit.HVAEs._log_q̄\nAutoEncoderToolkit.HVAEs.hamiltonian_elbo","category":"page"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.potential_energy","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.potential_energy","text":"potential_energy(\n x::AbstractVector,\n z::AbstractVector,\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n latent_logprior::Function=spherical_logprior\n)\n\nCompute the potential energy of a Hamiltonian Variational Autoencoder (HVAE). In the context of Hamiltonian Monte Carlo (HMC), the potential energy is defined as the negative log-posterior. This function computes the potential energy for given data x and latent variable z. 
It does this by computing the log-likelihood of x under the distribution defined by reconstruction_loglikelihood(x, z, decoder, decoder_output), and the log-prior of z under the latent_logprior distribution. The potential energy is then computed as:\n\n U(x, z) = -log p(x | z) - log p(z)\n\nArguments\n\nx::AbstractArray: An array representing the input data. The last dimension corresponds to different data points.\nz::AbstractVecOrMat: A latent variable encoding of the input data. If a matrix, each column corresponds to a different data point.\ndecoder::AbstractVariationalDecoder: A decoder that maps the latent variables to the data space.\ndecoder_output::NamedTuple: The output of the decoder.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function representing the log-likelihood function used by the decoder. The function must take as first input a vector x representing the data, as second input a vector z representing the latent variable, as third input a decoder, and as fourth input a NamedTuple representing the decoder output. Default is decoder_loglikelihood.\nlatent_logprior::Function=spherical_logprior: A function representing the log-prior distribution used in the autoencoder. The function must take as single input a vector z representing the latent variable. Default is spherical_logprior. \n\nReturns\n\nenergy: The computed potential energy for the given input x and latent variable z.\n\n\n\n\n\npotential_energy(\n x::AbstractArray,\n z::AbstractVecOrMat,\n hvae::HVAE;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n latent_logprior::Function=spherical_logprior\n)\n\nCompute the potential energy of a Hamiltonian Variational Autoencoder (HVAE). In the context of Hamiltonian Monte Carlo (HMC), the potential energy is defined as the negative log-posterior. This function computes the potential energy for given data x and latent variable z. 
It does this by computing the log-likelihood of x under the distribution defined by reconstruction_loglikelihood(x, z, hvae.vae.decoder, decoder_output), and the log-prior of z under the latent_logprior distribution. The potential energy is then computed as:\n\n U(x, z) = -log p(x | z) - log p(z)\n\nArguments\n\nx::AbstractArray: An array representing the input data. The last dimension corresponds to different data points.\nz::AbstractVecOrMat: A latent variable encoding of the input data. If a matrix, each column corresponds to a different data point.\nhvae::HVAE: A Hamiltonian Variational Autoencoder that contains the decoder.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function representing the log-likelihood function used by the decoder. The function must take as first input an array x representing the data, as second input a vector or matrix z representing the latent variable, as third input a decoder, and as fourth input a NamedTuple representing the decoder output. Default is decoder_loglikelihood.\nlatent_logprior::Function=spherical_logprior: A function representing the log-prior distribution used in the autoencoder. The function must take as single input a vector or matrix z representing the latent variable. Default is spherical_logprior. 
\n\nReturns\n\nenergy: The computed potential energy for the given input x and latent variable z.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.∇potential_energy","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.∇potential_energy","text":"∇potential_energy(\n x::AbstractArray,\n z::AbstractVecOrMat,\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n latent_logprior::Function=spherical_logprior,\n adtype::Union{Symbol,Nothing}=nothing,\n adkwargs::Union{NamedTuple,Dict}=Dict(),\n)\n\nCompute the gradient of the potential energy of a Hamiltonian Variational Autoencoder (HVAE) with respect to the latent variables z using the specified automatic differentiation method. This function returns the gradient of the potential energy computed for given data x and latent variable z.\n\nArguments\n\nx::AbstractArray: An array representing the input data. The last dimension corresponds to different data points.\nz::AbstractVecOrMat: A latent variable encoding of the input data. If a matrix, each column corresponds to a different data point.\ndecoder::AbstractVariationalDecoder: A decoder that maps the latent variables to the data space.\ndecoder_output::NamedTuple: The output of the decoder.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function representing the log-likelihood function used by the decoder. The function must take as first input an AbstractVariationalDecoder struct, as second input an array x representing the data, and as third input a vector or matrix z representing the latent variable. Default is decoder_loglikelihood.\nlatent_logprior::Function=spherical_logprior: A function representing the log-prior distribution used in the autoencoder. The function must take as single input a vector or matrix z representing the latent variable. Default is spherical_logprior. 
\nadtype::Symbol=:finite: The type of automatic differentiation method to use. Must be :finite or :TaylorDiff. Default is :finite.\nadkwargs::Union{NamedTuple,Dict}=Dict(): Additional keyword arguments to pass to the automatic differentiation method.\n\nReturns\n\ngradient: The computed gradient of the potential energy for the given input x and latent variable z.\n\n\n\n\n\n∇potential_energy(\n x::AbstractArray,\n z::AbstractVecOrMat,\n hvae::HVAE;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n latent_logprior::Function=spherical_logprior,\n adtype::Union{Symbol,Nothing}=nothing,\n adkwargs::Union{NamedTuple,Dict}=Dict(),\n)\n\nCompute the gradient of the potential energy of a Hamiltonian Variational Autoencoder (HVAE) with respect to the latent variables z using the specified automatic differentiation method. This function returns the gradient of the potential energy computed for given data x and latent variable z.\n\nArguments\n\nx::AbstractArray: An array representing the input data. The last dimension corresponds to different data points.\nz::AbstractVecOrMat: A latent variable encoding of the input data. If a matrix, each column corresponds to a different data point.\nhvae::HVAE: An HVAE model that contains a decoder which maps the latent variables to the data space.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function representing the log-likelihood function used by the decoder. The function must take as first input an array x representing the data, as second input a vector or matrix z representing the latent variable, and as third input a decoder. Default is decoder_loglikelihood.\nlatent_logprior::Function=spherical_logprior: A function representing the log-prior distribution used in the autoencoder. The function must take as single input a vector or matrix z representing the latent variable. Default is spherical_logprior. 
\nadtype::Symbol=:finite: The type of automatic differentiation method to use. Must be :finite or :TaylorDiff. Default is :finite.\nadkwargs::Union{NamedTuple,Dict}=Dict(): Additional keyword arguments to pass to the automatic differentiation method.\n\nReturns\n\ngradient: The computed gradient of the potential energy for the given input x and latent variable z.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.leapfrog_step","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.leapfrog_step","text":"leapfrog_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n ∇U_kwargs::Union{Dict,NamedTuple}=(\n reconstruction_loglikelihood=reconstruction_loglikelihood,\n latent_logprior=spherical_logprior,\n )\n)\n\nPerform a full step of the leapfrog integrator for Hamiltonian dynamics.\n\nThe leapfrog integrator is a numerical integration scheme used to simulate Hamiltonian dynamics. It consists of three steps:\n\nHalf update of the momentum variable: \n ρ(t + ϵ/2) = ρ(t) - 0.5 * ϵ * ∇z_U(z(t), ρ(t + ϵ/2)).\nFull update of the position variable: \n z(t + ϵ) = z(t) + ϵ * ρ(t + ϵ/2).\nHalf update of the momentum variable: \n ρ(t + ϵ) = ρ(t + ϵ/2) - 0.5 * ϵ * ∇z_U(z(t + ϵ), ρ(t + ϵ/2)).\n\nThis function performs these three steps in sequence.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. 
If matrix, each column represents a momentum vector.\ndecoder::AbstractVariationalDecoder: The decoder instance.\ndecoder_output::NamedTuple: The output of the decoder.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4): The step size. Default is 0.0001.\n∇U_kwargs::Union{Dict,NamedTuple}: The keyword arguments for ∇potential_energy. Default is a tuple with reconstruction_loglikelihood and latent_logprior.\n\nReturns\n\nA tuple (z̄, ρ̄, decoder_output_z̄) representing the updated position and momentum after performing the full leapfrog step as well as the decoder output of the updated position.\n\n\n\n\n\nleapfrog_step(\n x::AbstractArray,\n z::AbstractVecOrMat,\n ρ::AbstractVecOrMat,\n hvae::HVAE;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n ∇U_kwargs::Union{Dict,NamedTuple}=(\n reconstruction_loglikelihood=reconstruction_loglikelihood,\n latent_logprior=spherical_logprior,\n )\n)\n\nPerform a full step of the leapfrog integrator for Hamiltonian dynamics.\n\nThe leapfrog integrator is a numerical integration scheme used to simulate Hamiltonian dynamics. It consists of three steps:\n\nHalf update of the momentum variable: \n ρ(t + ϵ/2) = ρ(t) - 0.5 * ϵ * ∇z_U(z(t), ρ(t + ϵ/2)).\nFull update of the position variable: \n z(t + ϵ) = z(t) + ϵ * ρ(t + ϵ/2).\nHalf update of the momentum variable: \n ρ(t + ϵ) = ρ(t + ϵ/2) - 0.5 * ϵ * ∇z_U(z(t + ϵ), ρ(t + ϵ/2)).\n\nThis function performs these three steps in sequence.\n\nArguments\n\nx::AbstractArray: The point in the data space. This does not necessarily need to be a vector. Array inputs are supported. The last dimension is assumed to have each of the data points.\nz::AbstractVecOrMat: The point in the latent space. If matrix, each column represents a point in the latent space.\nρ::AbstractVecOrMat: The momentum. 
If matrix, each column represents a momentum vector.\nhvae::HVAE: An HVAE model that contains the decoder.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4): The step size. Default is 0.0001.\n∇U_kwargs::Union{Dict,NamedTuple}: The keyword arguments for ∇potential_energy. Default is a tuple with reconstruction_loglikelihood and latent_logprior.\n\nReturns\n\nA tuple (z̄, ρ̄, decoder_output_z̄) representing the updated position and momentum after performing the full leapfrog step as well as the decoder output of the updated position.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.quadratic_tempering","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.quadratic_tempering","text":"quadratic_tempering(βₒ::AbstractFloat, k::Int, K::Int)\n\nCompute the inverse temperature βₖ at a given stage k of a tempering schedule with K total stages, using a quadratic tempering scheme. \n\nTempering is a technique used in sampling algorithms to improve mixing and convergence. It involves running parallel chains of the algorithm at different \"temperatures\", and swapping states between the chains. The \"temperature\" of a chain is controlled by an inverse temperature parameter β, which is varied according to a tempering schedule. \n\nIn a quadratic tempering schedule, the inverse temperature βₖ at stage k is computed as the inverse square (power -2) of the quantity ((1 - 1 / √(βₒ)) * (k / K)^2 + 1 / √(βₒ)), where βₒ is the initial inverse temperature. 
This schedule starts at βₒ when k = 0, and increases quadratically as k increases, reaching 1 when k = K.\n\nArguments\n\nβₒ::AbstractFloat: The initial inverse temperature.\nk::Int: The current stage of the tempering schedule.\nK::Int: The total number of stages in the tempering schedule.\n\nReturns\n\nβₖ::AbstractFloat: The inverse temperature at stage k.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.null_tempering","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.null_tempering","text":" null_tempering(βₒ::T, k::Int, K::Int) where {T<:AbstractFloat}\n\nReturn the initial inverse temperature βₒ. This function is used in the context of tempered Hamiltonian Monte Carlo (HMC) methods, where tempering involves running HMC at different \"temperatures\" to improve mixing and convergence. \n\nIn this case, null_tempering is a simple tempering schedule that does not actually change the temperature—it always returns the initial inverse temperature βₒ. This can be useful as a default or placeholder tempering schedule.\n\nArguments\n\nβₒ::AbstractFloat: The initial inverse temperature. \nk::Int: The current step in the tempering schedule. Not used in this function, but included for compatibility with other tempering schedules.\nK::Int: The total number of steps in the tempering schedule. 
Not used in this function, but included for compatibility with other tempering schedules.\n\nReturns\n\nβ::T: The inverse temperature for the current step, which is always βₒ in this case.\n\nExample\n\nβₒ = 0.5\nk = 1\nK = 10\nβ = null_tempering(βₒ, k, K) # β will be 0.5\n\n\n\n\n\n","category":"function"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.leapfrog_tempering_step","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.leapfrog_tempering_step","text":"leapfrog_tempering_step(\n x::AbstractArray,\n zₒ::AbstractVecOrMat,\n decoder::AbstractVariationalDecoder,\n decoder_output::NamedTuple;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n ∇U_kwargs::Union{Dict,NamedTuple}=(\n reconstruction_loglikelihood=reconstruction_loglikelihood,\n latent_logprior=spherical_logprior,\n ),\n tempering_schedule::Function=quadratic_tempering,\n)\n\nCombines the leapfrog and tempering steps into a single function for the Hamiltonian Variational Autoencoder (HVAE).\n\nArguments\n\nx::AbstractArray: The data to be processed. If Array, the last dimension must be of size 1.\nzₒ::AbstractVecOrMat: The initial latent variable. \ndecoder::AbstractVariationalDecoder: The decoder of the HVAE model.\ndecoder_output::NamedTuple: The output of the decoder.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog steps in the HMC algorithm. This can be a scalar or an array. Default is 0.0001. \nK::Int: The number of leapfrog steps to perform in the Hamiltonian Monte Carlo (HMC) algorithm. Default is 3.\nβₒ::Number: The initial inverse temperature for the tempering schedule. Default is 0.3f0.\n∇U_kwargs::Union{Dict,NamedTuple}: Additional keyword arguments to be passed to the ∇potential_energy function. Default is a NamedTuple with reconstruction_loglikelihood and latent_logprior.\ntempering_schedule::Function: The function to compute the inverse temperature at each step in the HMC algorithm. 
Defaults to quadratic_tempering. This function must take three arguments: First, βₒ, an initial inverse temperature, second, k, the current step in the tempering schedule, and third, K, the total number of steps in the tempering schedule.\n\nReturns\n\nA NamedTuple with the following keys: \nz_init: The initial latent variable. \nρ_init: The initial momentum variable. \nz_final: The final latent variable after K leapfrog steps. \nρ_final: The final momentum variable after K leapfrog steps. \nThe decoder output at the final latent variable is also returned. Note: This is not in the same named tuple as the other outputs, but as a separate output.\n\nDescription\n\nThe function first samples a random momentum variable γₒ from a standard normal distribution and scales it by the inverse square root of the initial inverse temperature βₒ to obtain the initial momentum variable ρₒ. Then, it performs K leapfrog steps, each followed by a tempering step, to generate a new sample from the latent space.\n\nNote\n\nEnsure the input data x and the initial latent variable zₒ match the expected input dimensionality for the HVAE model.\n\n\n\n\n\nleapfrog_tempering_step(\n x::AbstractArray,\n zₒ::AbstractVecOrMat,\n hvae::HVAE;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n ∇U_kwargs::Union{Dict,NamedTuple}=(\n reconstruction_loglikelihood=reconstruction_loglikelihood,\n latent_logprior=spherical_logprior,\n ),\n tempering_schedule::Function=quadratic_tempering,\n)\n\nCombines the leapfrog and tempering steps into a single function for the Hamiltonian Variational Autoencoder (HVAE).\n\nArguments\n\nx::AbstractArray: The data to be processed. If Array, the last dimension must be of size 1.\nzₒ::AbstractVecOrMat: The initial latent variable. \nhvae::HVAE: An HVAE model that contains the decoder.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog steps in the HMC algorithm. 
This can be a scalar or an array. Default is 0.0001. \nK::Int: The number of leapfrog steps to perform in the Hamiltonian Monte Carlo (HMC) algorithm. Default is 3.\nβₒ::Number: The initial inverse temperature for the tempering schedule. Default is 0.3f0.\n∇U_kwargs::Union{Dict,NamedTuple}: Additional keyword arguments to be passed to the ∇potential_energy function. Default is a NamedTuple with reconstruction_loglikelihood and latent_logprior.\ntempering_schedule::Function: The function to compute the inverse temperature at each step in the HMC algorithm. Defaults to quadratic_tempering. This function must take three arguments: First, βₒ, an initial inverse temperature, second, k, the current step in the tempering schedule, and third, K, the total number of steps in the tempering schedule.\n\nReturns\n\nA NamedTuple with the following keys: \nz_init: The initial latent variable. \nρ_init: The initial momentum variable. \nz_final: The final latent variable after K leapfrog steps. \nρ_final: The final momentum variable after K leapfrog steps. \nThe decoder output at the final latent variable is also returned. Note: This is not in the same named tuple as the other outputs, but as a separate output.\n\nDescription\n\nThe function first samples a random momentum variable γₒ from a standard normal distribution and scales it by the inverse square root of the initial inverse temperature βₒ to obtain the initial momentum variable ρₒ. 
Then, it performs K leapfrog steps, each followed by a tempering step, to generate a new sample from the latent space.\n\nNote\n\nEnsure the input data x and the initial latent variable zₒ match the expected input dimensionality for the HVAE model.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#AutoEncoderToolkit.HVAEs._log_p̄","page":"HVAE","title":"AutoEncoderToolkit.HVAEs._log_p̄","text":"_log_p̄(\n x::AbstractArray,\n hvae::HVAE{VAE{E,D}},\n hvae_outputs::NamedTuple;\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n logprior::Function=spherical_logprior,\n prefactor::AbstractArray=ones(Float32, 3),\n)\n\nThis is an internal function used in hamiltonian_elbo to compute the numerator of the unbiased estimator of the marginal likelihood. The function computes the sum of the log likelihood of the data given the latent variables, the log prior of the latent variables, and the log prior of the momentum variables.\n\n log p̄ = log p(x | zₖ) + log p(zₖ) + log p(ρₖ)\n\nArguments\n\nx::AbstractArray: The input data. If Array, the last dimension must contain each of the data points.\nhvae::HVAE{<:VAE{<:AbstractGaussianEncoder,<:AbstractGaussianLogDecoder}}: The Hamiltonian Variational Autoencoder (HVAE) model.\nhvae_outputs::NamedTuple: The outputs of the HVAE, including the final latent variables zₖ and the final momentum variables ρₖ.\n\nOptional Keyword Arguments\n\nreconstruction_loglikelihood::Function: The function to compute the log likelihood of the data given the latent variables. Default is decoder_loglikelihood.\nlogprior::Function: The function to compute the log prior of the latent variables. Default is spherical_logprior.\nprefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. 
Default is an array of ones.\n\nReturns\n\nlog_p̄::AbstractVector: The first term of the log of the unbiased estimator of the marginal likelihood for each data point.\n\nNote\n\nThis is an internal function and should not be called directly. It is used as part of the hamiltonian_elbo function.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#AutoEncoderToolkit.HVAEs._log_q̄","page":"HVAE","title":"AutoEncoderToolkit.HVAEs._log_q̄","text":"_log_q̄(\n hvae::HVAE,\n hvae_outputs::NamedTuple,\n βₒ::Number;\n logprior::Function=spherical_logprior,\n prefactor::AbstractArray=ones(Float32, 3),\n)\n\nThis is an internal function used in hamiltonian_elbo to compute the second term of the unbiased estimator of the marginal likelihood. The function computes the sum of the log posterior of the initial latent variables and the log prior of the initial momentum variables, minus a term that depends on the dimensionality of the latent space and the initial inverse temperature.\n\nlog q̄ = log q(zₒ | x) + log p(ρₒ | zₒ) - d/2 log(βₒ)\n\nArguments\n\nhvae::HVAE: The Hamiltonian Variational Autoencoder (HVAE) model.\nhvae_outputs::NamedTuple: The outputs of the HVAE, including the initial latent variables zₒ and the initial momentum variables ρₒ.\nβₒ::Number: The initial inverse temperature for the tempering steps.\n\nOptional Keyword Arguments\n\nlogprior::Function: The function to compute the log prior of the momentum variables. Default is spherical_logprior.\nprefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. Default is an array of ones.\n\nReturns\n\nlog_q̄::Vector: The second term of the log of the unbiased estimator of the marginal likelihood for each data point.\n\nNote\n\nThis is an internal function and should not be called directly. 
It is used as part of the hamiltonian_elbo function.\n\n\n\n\n\n","category":"function"},{"location":"hvae/#AutoEncoderToolkit.HVAEs.hamiltonian_elbo","page":"HVAE","title":"AutoEncoderToolkit.HVAEs.hamiltonian_elbo","text":"hamiltonian_elbo(\n hvae::HVAE,\n x::AbstractArray;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n ∇U_kwargs::Union{Dict,NamedTuple}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n latent_logprior=spherical_logprior,\n ),\n tempering_schedule::Function=quadratic_tempering,\n return_outputs::Bool=false,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the Hamiltonian Monte Carlo (HMC) estimate of the evidence lower bound (ELBO) for a Hamiltonian Variational Autoencoder (HVAE).\n\nThis function takes as input an HVAE and a vector of input data x. It performs K HMC steps with a leapfrog integrator and a tempering schedule to estimate the ELBO. The ELBO is computed as the mean difference between log p̄ and log q̄:\n\nelbo = mean(log p̄ - log q̄)\n\nArguments\n\nhvae::HVAE: The HVAE used to encode the input data and decode the latent space.\nx::AbstractArray: The input data. If Array, the last dimension must contain each of the data points.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is 0.0001).\nK::Int: The number of HMC steps (default is 3).\nβₒ::Number: The initial inverse temperature (default is 0.3).\n∇U_kwargs::Union{Dict,NamedTuple}: Additional keyword arguments to be passed to the ∇potential_energy function. Defaults to a NamedTuple with :reconstruction_loglikelihood set to decoder_loglikelihood and :latent_logprior set to spherical_logprior.\ntempering_schedule::Function: The tempering schedule function used in the HMC (default is quadratic_tempering).\nreturn_outputs::Bool: Whether to return the outputs of the HVAE. Defaults to false. 
NOTE: This is necessary to avoid computing the forward pass twice when computing the loss function with regularization.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. Default is an array of ones.\n\nReturns\n\nelbo::Number: The HMC estimate of the ELBO. If return_outputs is true, also returns the outputs of the HVAE.\n\n\n\n\n\nhamiltonian_elbo(\n hvae::HVAE,\n x_in::AbstractArray,\n x_out::AbstractArray;\n ϵ::Union{<:Number,<:AbstractVector}=Float32(1E-4),\n K::Int=3,\n βₒ::Number=0.3f0,\n ∇U_kwargs::Union{Dict,NamedTuple}=(\n reconstruction_loglikelihood=decoder_loglikelihood,\n latent_logprior=spherical_logprior,\n ),\n tempering_schedule::Function=quadratic_tempering,\n return_outputs::Bool=false,\n logp_prefactor::AbstractArray=ones(Float32, 3),\n logq_prefactor::AbstractArray=ones(Float32, 3),\n)\n\nCompute the Hamiltonian Monte Carlo (HMC) estimate of the evidence lower bound (ELBO) for a Hamiltonian Variational Autoencoder (HVAE).\n\nThis function takes as input an HVAE and a vector of input data x. It performs K HMC steps with a leapfrog integrator and a tempering schedule to estimate the ELBO. The ELBO is computed as the difference between the log p̄ and log q̄ as\n\nelbo = mean(log p̄ - log q̄),\n\nArguments\n\nhvae::HVAE: The HVAE used to encode the input data and decode the latent space.\nx_in::AbstractArray: The input data. If Array, the last dimension must contain each of the data points.\nx_out::AbstractArray: The data against which the reconstruction is compared. 
If Array, the last dimension must contain each of the data points.\n\nOptional Keyword Arguments\n\nϵ::Union{<:Number,<:AbstractVector}: The step size for the leapfrog integrator (default is 0.0001).\nK::Int: The number of HMC steps (default is 3).\nβₒ::Number: The initial inverse temperature (default is 0.3).\n∇U_kwargs::Union{Dict,NamedTuple}: Additional keyword arguments to be passed to the ∇potential_energy function. Defaults to a NamedTuple with :reconstruction_loglikelihood set to decoder_loglikelihood and :latent_logprior set to spherical_logprior.\ntempering_schedule::Function: The tempering schedule function used in the HMC (default is quadratic_tempering).\nreturn_outputs::Bool: Whether to return the outputs of the HVAE. Defaults to false. NOTE: This is necessary to avoid computing the forward pass twice when computing the loss function with regularization.\nlogp_prefactor::AbstractArray: A 3-element array to scale the log likelihood, log prior of the latent variables, and log prior of the momentum variables. Default is an array of ones.\nlogq_prefactor::AbstractArray: A 3-element array to scale the log posterior of the initial latent variables, log prior of the initial momentum variables, and the tempering Jacobian term. Default is an array of ones.\n\nReturns\n\nelbo::Number: The HMC estimate of the ELBO. If return_outputs is true, also returns the outputs of the HVAE.\n\n\n\n\n\n","category":"function"},{"location":"infomaxvae/#InfoMaxVAEsmodule","page":"InfoMax-VAE","title":"InfoMax VAE","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"The InfoMax VAE is a variant of the Variational Autoencoder (VAE) that aims to explicitly account for the maximization of mutual information between the latent space representation and the input data. 
The main difference between the InfoMax VAE and the MMD-VAE (InfoVAE) is that rather than using the Maximum-Mean Discrepancy (MMD) as a measure of the \"distance\" between distributions in the latent space, the InfoMax VAE explicitly models the mutual information between latent representations and data inputs via a separate neural network. The loss function for this separate network then takes the form of a variational lower bound on the mutual information between the latent space and the input data.","category":"page"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"Because this separate network is required, the InfoMaxVAE struct in AutoEncoderToolkit.jl takes two arguments to construct: the original VAE struct and a network to compute the mutual information. To properly deploy all relevant functions associated with this second network, we also provide a MutualInfoChain struct.","category":"page"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"Furthermore, because of the two networks and the way the training algorithm is set up, training the InfoMax VAE involves two separate loss functions: one for the MutualInfoChain and one for the InfoMaxVAE.","category":"page"},{"location":"infomaxvae/#References","page":"InfoMax-VAE","title":"References","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"Rezaabad, A. L. & Vishwanath, S. Learning Representations by Maximizing Mutual Information in Variational Autoencoders. 
Preprint at http://arxiv.org/abs/1912.13361 (2020).","category":"page"},{"location":"infomaxvae/#MutualInfoChain","page":"InfoMax-VAE","title":"MutualInfoChain struct","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.InfoMaxVAEs.MutualInfoChain","category":"page"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.MutualInfoChain","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.MutualInfoChain","text":"MutualInfoChain\n\nA MutualInfoChain is used to compute the variational mutual information when training an InfoMaxVAE. The chain is composed of a series of layers that must end with a single output: the mutual information between the latent variables and the input data.\n\nArguments\n\ndata::Union{Flux.Dense,Flux.Chain}: The data layer of the MutualInfoChain. This layer is used to input the data.\nlatent::Union{Flux.Dense,Flux.Chain}: The latent layer of the MutualInfoChain. This layer is used to input the latent variables.\nmlp::Flux.Chain: A multi-layer perceptron (MLP) that is used to compute the mutual information between the inputs and the latent representations. The MLP takes as input the latent variables and outputs a scalar representing the estimated variational mutual information.\n\nCitation\n\nRezaabad, A. L. & Vishwanath, S. Learning Representations by Maximizing Mutual Information in Variational Autoencoders. in 2020 IEEE International Symposium on Information Theory (ISIT) 2729–2734 (IEEE, 2020). 
doi:10.1109/ISIT44484.2020.9174424.\n\nNote\n\nIf the input data is not a flat array, make sure to include a flattening layer within data.\n\n\n\n\n\n","category":"type"},{"location":"infomaxvae/#InfoMaxVAE","page":"InfoMax-VAE","title":"InfoMaxVAE struct","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.InfoMaxVAEs.InfoMaxVAE","category":"page"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.InfoMaxVAE","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.InfoMaxVAE","text":"`InfoMaxVAE <: AbstractVariationalAutoEncoder`\n\nstruct encapsulating an InfoMax variational autoencoder (InfoMaxVAE), an architecture designed to enhance the VAE framework by maximizing mutual information between the inputs and the latent representations, as per the methods described by Rezaabad and Vishwanath (2020).\n\nThe model aims to learn representations that preserve mutual information with the input data, arguably capturing more meaningful factors of variation.\n\nFields\n\nvae::VAE: The core variational autoencoder, consisting of an encoder that maps input data into a latent space representation, and a decoder that attempts to reconstruct the input from the latent representation.\nmi::MutualInfoChain: A multi-layer perceptron (MLP) that estimates the mutual information between the input data and the latent representations.\n\nUsage\n\nThe InfoMaxVAE struct is utilized in a similar manner to a standard VAE, with the added capability of mutual information maximization as part of the training process. 
This involves an additional loss term that considers the output of the mi network to encourage latent representations that are informative about the input data.\n\nExample\n\n# Assuming definitions for `encoder`, `decoder`, and `mi` are provided:\ninfo_max_vae = InfoMaxVAE(VAE(encoder, decoder), mi)\n\n# During training, one would maximize both the variational lower bound and the \n# mutual information estimate provided by `mi`.\n\nCitation\n\nRezaabad, A. L. & Vishwanath, S. Learning Representations by Maximizing Mutual Information in Variational Autoencoders. in 2020 IEEE International Symposium on Information Theory (ISIT) 2729–2734 (IEEE, 2020). doi:10.1109/ISIT44484.2020.9174424.\n\n\n\n\n\n","category":"type"},{"location":"infomaxvae/#Forward-pass","page":"InfoMax-VAE","title":"Forward pass","text":"","category":"section"},{"location":"infomaxvae/#Mutual-Information-Network","page":"InfoMax-VAE","title":"Mutual Information Network","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.InfoMaxVAEs.MutualInfoChain(::AbstractArray, ::AbstractVecOrMat)\n","category":"page"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.MutualInfoChain-Tuple{AbstractArray, AbstractVecOrMat}","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.MutualInfoChain","text":"(mi::MutualInfoChain)(x::AbstractArray, z::AbstractVecOrMat)\n\nForward pass function for the MutualInfoChain, which applies the MLP to an input x.\n\nArguments\n\nx::AbstractArray: The input array to be processed. The last dimension represents each data sample.\nz::AbstractVecOrMat: The latent representation of the input data. The last dimension represents each data sample.\n\nReturns\n\nThe result of applying the MutualInfoChain to the input data and the latent representation simultaneously.\n\nDescription\n\nThis function applies the MLP (Multilayer Perceptron) of a MutualInfoChain instance to an input array. 
The MLP is a type of neural network used in the MutualInfoChain for processing the input data.\n\n\n\n\n\n","category":"method"},{"location":"infomaxvae/#InfoMax-VAE","page":"InfoMax-VAE","title":"InfoMax VAE","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.InfoMaxVAEs.InfoMaxVAE(::AbstractArray)","category":"page"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.InfoMaxVAE-Tuple{AbstractArray}","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.InfoMaxVAE","text":"(vae::InfoMaxVAE)(x::AbstractArray; latent::Bool=false)\n\nProcesses the input data x through an InfoMaxVAE, which consists of an encoder, a decoder, and a multi-layer perceptron (MLP) to estimate variational mutual information.\n\nArguments\n\nx::AbstractArray: The data to be processed. If array, the last dimension contains each data sample. \n\nOptional Keyword Arguments\n\nlatent::Bool: If true, returns a NamedTuple with latent variables and mutual information estimations along with the reconstruction. Defaults to false.\nseed::Union{Nothing,Int}: Optional argument. The seed for the random number generator used for shuffling the latent codes. If not provided, a random seed will be used.\n\nReturns\n\nIf latent=false: The decoder output as a NamedTuple.\nIf latent=true: A NamedTuple with the :vae field that contains the outputs of the VAE, and the :mi field that contains the estimate of the variational mutual information. Note that this estimate requires shuffling the latent codes between data samples. Therefore, it is only meaningful for batch data cases.\n\nDescription\n\nThis function first encodes the input x into the parameters of a latent distribution. It then samples from this distribution using the reparametrization trick. 
The sampled latent vectors are then decoded, and the MutualInfoChain is used to estimate the mutual information.\n\nNote\n\nEnsure the input data x matches the expected input dimensionality for the encoder in the InfoMaxVAE.\n\n\n\n\n\n","category":"method"},{"location":"infomaxvae/#[Loss-functions]","page":"InfoMax-VAE","title":"[Loss functions]","text":"","category":"section"},{"location":"infomaxvae/#miloss","page":"InfoMax-VAE","title":"Mutual Information Network","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.InfoMaxVAEs.miloss","category":"page"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.miloss","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.miloss","text":"miloss(\n vae::VAE,\n mi::MutualInfoChain,\n x::AbstractArray;\n regularization::Union{Function,Nothing}=nothing,\n reg_strength::Float32=1.0f0,\n seed::Union{Nothing,Int}=nothing\n)\n\nCalculates the loss for training the MutualInfoChain in the InfoMaxVAE algorithm to estimate mutual information between the input x and the latent representation z. The loss function is based on a variational approximation of mutual information, using the MutualInfoChain's output g(x, z). 
The variational mutual information is then calculated as the difference between the MutualInfoChain's output for the true x and latent z, and the exponentiated average of the MLP's output for x and the shuffled latent z_shuffle, adjusted for the regularization term if provided.\n\nArguments\n\nvae::VAE: The variational autoencoder.\nmi::MutualInfoChain: The MutualInfoChain used for estimating mutual information.\nx::AbstractArray: The input vector for the VAE.\n\nOptional Keyword Arguments\n\nregularization::Union{Function, Nothing}=nothing: A regularization function applied to the MLP's output.\nreg_strength::Float32=1.0f0: The strength of the regularization term.\nseed::Union{Nothing,Int}=nothing: The seed for the random number generator used for shuffling the latent codes. If not provided, a random seed will be used.\n\nReturns\n\nFloat32: The computed loss, representing negative variational mutual information, adjusted by the regularization term.\n\nDescription\n\nThe function computes the loss as follows:\n\nloss = -sum(I(x; z)) + sum(exp(I(x; z̃) - 1)) + regstrength * regterm\n\nwhere I(x; z) is the MLP's output representing an estimation of mutual information for true x and latent z, and z̃ represents shuffled latent variables, that is, the latent codes are randomly swapped between data points.\n\nThe function is used to separately train the MLP to estimate mutual information, which is a component of the larger InfoMaxVAE model.\n\nNotes\n\nThis function takes the vae and mi instances of an InfoMaxVAE model as separate arguments to be able to compute a gradient only with respect to the mi parameters.\nEnsure that the dimensionality of the input data x aligns with the encoder's expected input in the VAE.\nInfoMaxVAEs fully depend on batch training as the estimation of mutual information depends on shuffling the latent codes. 
This method works for large enough batches (≥ 64 samples).\n\n\n\n\n\n","category":"function"},{"location":"infomaxvae/#infomaxloss","page":"InfoMax-VAE","title":"InfoMax VAE","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.InfoMaxVAEs.infomaxloss","category":"page"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.infomaxloss","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.infomaxloss","text":"infomaxloss(\n vae::VAE,\n mi::MutualInfoChain,\n x::AbstractArray;\n β=1.0f0,\n α=1.0f0,\n n_samples::Int=1,\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n kl_divergence::Function=encoder_kl,\n regularization::Union{Function,Nothing}=nothing,\n reg_strength::Float32=1.0f0,\n seed::Union{Nothing,Int}=nothing\n)\n\nComputes the loss for an InfoMax variational autoencoder (VAE) with mutual information constraints, by averaging over n_samples latent space samples.\n\nThe loss function combines the reconstruction loss with the Kullback-Leibler (KL) divergence, the variational mutual information between input and latent representations, and possibly a regularization term, defined as:\n\nloss = -⟨log p(x|z)⟩ + β × Dₖₗ[qᵩ(z|x) || p(z)] - α × I(x;z) + regstrength × regterm\n\nWhere:\n\n⟨log p(x|z)⟩ is the expected log likelihood of the probabilistic decoder.\n\nDₖₗ[qᵩ(z|x) || p(z)] is the KL divergence between the approximated encoder and the prior over the latent space.\n\nI(x;z) is the variational mutual information between the inputs x and the latent variables z.\n\nArguments\n\nvae::VAE: A VAE model with encoder and decoder networks.\nmi::MutualInfoChain: A MutualInfoChain instance used to estimate the mutual information term.\nx::AbstractArray: Input data. 
The last dimension represents each data sample.\n\nOptional Keyword Arguments\n\nβ::Float32=1.0f0: Weighting factor for the KL-divergence term, used for annealing.\nα::Float32=1.0f0: Weighting factor for the mutual information term.\nn_samples::Int=1: The number of samples to draw from the latent space when computing the loss.\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function that computes the log likelihood of the decoder's output.\nkl_divergence::Function=encoder_kl: A function that computes the KL divergence between the encoder's output and the prior.\nregularization::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the VAE outputs. Should return a Float32.\nreg_strength::Float32=1.0f0: The strength of the regularization term.\nseed::Union{Nothing,Int}: The seed for the random number generator used for shuffling the latent codes. If not provided, a random seed will be used.\n\nReturns\n\nFloat32: The computed average loss value for the input x and its reconstructed counterparts over n_samples samples, including possible regularization terms and the mutual information constraint.\n\nNote\n\nThis function takes the vae and mi instances of an InfoMaxVAE model as separate arguments to be able to compute a gradient only with respect to the vae parameters.\nEnsure that the input data x match the expected input dimensionality for the encoder in the VAE.\nInfoMaxVAEs fully depend on batch training as the estimation of mutual information depends on shuffling the latent codes. 
This method works for large enough batches (≥ 64 samples).\n\n\n\n\n\ninfomaxloss(\n vae::VAE,\n mi::MutualInfoChain,\n x_in::AbstractArray,\n x_out::AbstractArray;\n β=1.0f0,\n α=1.0f0,\n n_samples::Int=1,\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n kl_divergence::Function=encoder_kl,\n regularization::Union{Function,Nothing}=nothing,\n reg_strength::Float32=1.0f0,\n seed::Union{Nothing,Int}=nothing\n)\n\nComputes the loss for an InfoMax variational autoencoder (VAE) with mutual information constraints, by averaging over n_samples latent space samples.\n\nThe loss function combines the reconstruction loss with the Kullback-Leibler (KL) divergence, the variational mutual information between input and latent representations, and possibly a regularization term, defined as:\n\nloss = -⟨log p(x|z)⟩ + β × Dₖₗ[qᵩ(z|x) || p(z)] - α × I(x;z) + regstrength × regterm\n\nWhere:\n\n⟨log p(x|z)⟩ is the expected log likelihood of the probabilistic decoder.\n\nDₖₗ[qᵩ(z|x) || p(z)] is the KL divergence between the approximated encoder and the prior over the latent space.\n\nI(x;z) is the variational mutual information between the inputs x and the latent variables z.\n\nArguments\n\nvae::VAE: A VAE model with encoder and decoder networks.\nmi::MutualInfoChain: A MutualInfoChain instance used to estimate the mutual information term.\nx_in::AbstractArray: Input matrix. The last dimension represents each data sample.\nx_out::AbstractArray: Output matrix against which reconstructions are compared. 
The last dimension represents each data sample.\n\nOptional Keyword Arguments\n\nβ::Float32=1.0f0: Weighting factor for the KL-divergence term, used for annealing.\nα::Float32=1.0f0: Weighting factor for the mutual information term.\nn_samples::Int=1: The number of samples to draw from the latent space when computing the loss.\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function that computes the log likelihood of the decoder's output.\nkl_divergence::Function=encoder_kl: A function that computes the KL divergence between the encoder's output and the prior.\nregularization::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the VAE outputs. Should return a Float32.\nreg_strength::Float32=1.0f0: The strength of the regularization term.\nseed::Union{Nothing,Int}: The seed for the random number generator used for shuffling the latent codes. If not provided, a random seed will be used.\n\nReturns\n\nFloat32: The computed average loss value for the input x and its reconstructed counterparts over n_samples samples, including possible regularization terms and the mutual information constraint.\n\nNote\n\nThis function takes the vae and mi instances of an InfoMaxVAE model as separate arguments to be able to compute a gradient only with respect to the vae parameters.\nEnsure that the input data x match the expected input dimensionality for the encoder in the VAE.\nInfoMaxVAEs fully depend on batch training as the estimation of mutual information depends on shuffling the latent codes. 
This method works for large enough batches (≥ 64 samples).\n\n\n\n\n\n","category":"function"},{"location":"infomaxvae/#Training","page":"InfoMax-VAE","title":"Training","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.InfoMaxVAEs.train!","category":"page"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.train!","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.train!","text":" train!(\n infomaxvae, x, opt; \n infomaxloss_function=infomaxloss,\n infomaxloss_kwargs, \n miloss_function=miloss, \n miloss_kwargs,\n loss_return::Bool=false,\n verbose::Bool=false\n )\n\nCustomized training function to update parameters of an InfoMax variational autoencoder (VAE) given a loss function of the specified form.\n\nThe InfoMax VAE loss function can be defined as:\n\nloss_infoMax = argmin -⟨log p(x|z)⟩ + β Dₖₗ(qᵩ(z) || p(z)) -\n α [⟨g(x, z)⟩ - ⟨exp(g(x, z) - 1)⟩],\n\nwhere ⟨log p(x|z)⟩ is the expected log likelihood of the probabilistic decoder, Dₖₗ[qᵩ(z) || p(z)] is the KL divergence between the approximated encoder distribution and the prior over the latent space, and g(x, z) is the output of the MutualInfoChain estimating the mutual information between the input data and the latent representation.\n\nThis function simultaneously optimizes two neural networks: the VAE itself and a multi-layer perceptron MutualInfoChain used to compute the mutual information between input and latent variables.\n\nArguments\n\ninfomaxvae::InfoMaxVAE: Struct containing the elements of an InfoMax VAE.\nx::AbstractArray: Matrix containing the data on which to evaluate the loss function. Each column represents a single data point.\nopt::NamedTuple: State of the optimizer for updating parameters. 
Typically initialized using Flux.Train.setup.\n\nOptional Keyword arguments\n\ninfomaxloss_function::Function: The loss function to be used during training for the VAE, defaulting to infomaxloss.\ninfomaxloss_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the VAE loss function.\nmiloss_function::Function: The loss function to be used during training for the MutualInfoChain computing the variational free energy, defaulting to miloss.\nmiloss_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the MutualInfoChain loss function.\nloss_return::Bool: If true, the function returns the loss values for the VAE and MutualInfoChain. Defaults to false.\nverbose::Bool: If true, the function prints the loss values for the VAE and MutualInfoChain. Defaults to false.\n\nDescription\n\nPerforms one step of gradient descent on the InfoMaxVAE loss function to jointly train the VAE and MutualInfoChain. The VAE parameters are updated to minimize the InfoMaxVAE loss, while the MutualInfoChain parameters are updated to maximize the estimated mutual information. The function allows for customization of loss hyperparameters during training.\n\nNotes\n\nEnsure that the dimensionality of the input data x aligns with the encoder's expected input in the VAE.\nInfoMaxVAEs fully depend on batch training as the estimation of mutual information depends on shuffling the latent codes. 
This method works best for large enough batches (≥ 64 samples).\n\n\n\n\n\n train!(\n infomaxvae, x, opt; \n infomaxloss_function=infomaxloss,\n infomaxloss_kwargs, \n miloss_function=miloss, \n miloss_kwargs,\n loss_return::Bool=false,\n verbose::Bool=false\n )\n\nCustomized training function to update parameters of an InfoMax variational autoencoder (VAE) given a loss function of the specified form.\n\nThe InfoMax VAE loss function can be defined as:\n\nloss_infoMax = argmin -⟨log p(x|z)⟩ + β Dₖₗ(qᵩ(z) || p(z)) -\n α [⟨g(x, z)⟩ - ⟨exp(g(x, z) - 1)⟩],\n\nwhere ⟨log p(x|z)⟩ is the expected log likelihood of the probabilistic decoder, Dₖₗ[qᵩ(z) || p(z)] is the KL divergence between the approximated encoder distribution and the prior over the latent space, and g(x, z) is the output of the MutualInfoChain estimating the mutual information between the input data and the latent representation.\n\nThis function simultaneously optimizes two neural networks: the VAE itself and a multi-layer perceptron MutualInfoChain used to compute the mutual information between input and latent variables.\n\nArguments\n\ninfomaxvae::InfoMaxVAE: Struct containing the elements of an InfoMax VAE.\nx::AbstractArray: Matrix containing the data on which to evaluate the loss function. Each column represents a single data point.\nopt::NamedTuple: State of the optimizer for updating parameters. 
Typically initialized using Flux.Train.setup.\n\nOptional Keyword arguments\n\ninfomaxloss_function::Function: The loss function to be used during training for the VAE, defaulting to infomaxloss.\ninfomaxloss_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the VAE loss function.\nmiloss_function::Function: The loss function to be used during training for the MutualInfoChain computing the variational free energy, defaulting to miloss.\nmiloss_kwargs::Union{NamedTuple,Dict}: Additional keyword arguments to be passed to the MutualInfoChain loss function.\nloss_return::Bool: If true, the function returns the loss values for the VAE and MutualInfoChain. Defaults to false.\n\nDescription\n\nPerforms one step of gradient descent on the InfoMaxVAE loss function to jointly train the VAE and MutualInfoChain. The VAE parameters are updated to minimize the InfoMaxVAE loss, while the MutualInfoChain parameters are updated to maximize the estimated mutual information. The function allows for customization of loss hyperparameters during training.\n\nNotes\n\nEnsure that the dimensionality of the input data x aligns with the encoder's expected input in the VAE.\nInfoMaxVAEs fully depend on batch training as the estimation of mutual information depends on shuffling the latent codes. 
This method works best for large enough batches (≥ 64 samples).\n\n\n\n\n\n","category":"function"},{"location":"infomaxvae/#Other-Functions","page":"InfoMax-VAE","title":"Other Functions","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.InfoMaxVAEs.shuffle_latent\nAutoEncoderToolkit.InfoMaxVAEs.variational_mutual_info","category":"page"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.shuffle_latent","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.shuffle_latent","text":"shuffle_latent(z::AbstractMatrix, seed::Int=Random.seed!())\n\nShuffle the elements of the second dimension of a matrix representing latent space points.\n\nArguments\n\nz::AbstractMatrix: A matrix representing latent codes. Each column corresponds to a single latent code.\n\nOptional Keyword Arguments\n\nseed::Union{Nothing, Int}: Optional argument. The seed for the random number generator. If not provided, a random seed will be used.\n\nReturns\n\nAbstractMatrix: A new matrix with the second dimension shuffled.\n\n\n\n\n\n","category":"function"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.variational_mutual_info","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.variational_mutual_info","text":"variational_mutual_info(mi, x, z, z_shuffle)\n\nCompute a variational approximation of the mutual information between the input x and the latent code z using a MutualInfoChain. Note that this estimate requires shuffling the latent codes between data samples. Therefore, it only applies to batch data cases. A single sample will not provide a meaningful estimate.\n\nArguments\n\nmi::MutualInfoChain: A MutualInfoChain instance used to estimate mutual information.\nx::AbstractArray: Array of input data. 
The last dimension represents each data sample.\nz::AbstractMatrix: Matrix of corresponding latent representations of the input data.\nz_shuffle::AbstractMatrix: Matrix of latent representations where the second dimension has been shuffled.\n\nReturns\n\nFloat32: An approximation of the mutual information between the input data and its corresponding latent representation.\n\nReferences\n\nRezaabad, A. L. & Vishwanath, S. Learning Representations by Maximizing Mutual Information in Variational Autoencoders. Preprint at http://arxiv.org/abs/1912.13361 (2020).\n\n\n\n\n\nvariational_mutual_info(infomaxvae, x, z, z_shuffle)\n\nCompute a variational approximation of the mutual information between the input x and the latent code z using an InfoMaxVAE instance. Note that this estimate requires shuffling the latent codes between data samples. Therefore, it only applies to batch data cases. A single sample will not provide a meaningful estimate.\n\nArguments\n\ninfomaxvae::InfoMaxVAE: An InfoMaxVAE instance used to estimate mutual information.\nx::AbstractArray: Array of input data. The last dimension represents each data sample.\nz::AbstractMatrix: Matrix of corresponding latent representations of the input data.\nz_shuffle::AbstractMatrix: Matrix of latent representations where the second dimension has been shuffled.\n\nReturns\n\nFloat32: An approximation of the mutual information between the input data and its corresponding latent representation.\n\nReferences\n\nRezaabad, A. L. & Vishwanath, S. Learning Representations by Maximizing Mutual Information in Variational Autoencoders. Preprint at http://arxiv.org/abs/1912.13361 (2020).\n\n\n\n\n\nvariational_mutual_info(\n infomaxvae::InfoMaxVAE,\n x::AbstractArray;\n seed::Union{Nothing,Int}=nothing\n)\n\nCompute a variational approximation of the mutual information between the input x and the latent code z using an InfoMaxVAE instance. 
This function also shuffles the latent codes between data samples to provide a meaningful estimate even for a single data sample.\n\nArguments\n\ninfomaxvae::InfoMaxVAE: An InfoMaxVAE instance used to estimate mutual information.\nx::AbstractArray: Array of input data. The last dimension represents each data sample.\n\nOptional Keyword Arguments\n\nseed::Union{Nothing,Int}: Optional argument. The seed for the random number generator used for shuffling the latent codes. If not provided, a random seed will be used.\n\nReturns\n\nFloat32: An approximation of the mutual information between the input data and its corresponding latent representation.\n\nReferences\n\nRezaabad, A. L. & Vishwanath, S. Learning Representations by Maximizing Mutual Information in Variational Autoencoders. Preprint at http://arxiv.org/abs/1912.13361 (2020).\n\n\n\n\n\n","category":"function"},{"location":"infomaxvae/#Default-initializations","page":"InfoMax-VAE","title":"Default initializations","text":"","category":"section"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.jl provides default initializations for the MutualInfoChain. 
Although it gives the user less flexibility, it can be useful for quick prototyping.","category":"page"},{"location":"infomaxvae/","page":"InfoMax-VAE","title":"InfoMax-VAE","text":"AutoEncoderToolkit.InfoMaxVAEs.MutualInfoChain(\n ::Union{Int,Vector{<:Int}},\n ::Int,\n ::Vector{<:Int},\n ::Vector{<:Function},\n ::Function;\n)","category":"page"},{"location":"infomaxvae/#AutoEncoderToolkit.InfoMaxVAEs.MutualInfoChain-Tuple{Union{Int64, Vector{<:Int64}}, Int64, Vector{<:Int64}, Vector{<:Function}, Function}","page":"InfoMax-VAE","title":"AutoEncoderToolkit.InfoMaxVAEs.MutualInfoChain","text":"MutualInfoChain(\n size_input::Union{Int,Vector{<:Int}},\n n_latent::Int,\n mlp_neurons::Vector{<:Int},\n mlp_activations::Vector{<:Function},\n output_activation::Function;\n init::Function = Flux.glorot_uniform\n)\n\nConstructs a default MutualInfoChain. \n\nArguments\n\nsize_input::Union{Int,Vector{<:Int}}: Size of the input to the MutualInfoChain.\nn_latent::Int: The dimensionality of the latent space.\nmlp_neurons::Vector{<:Int}: A vector of integers where each element represents the number of neurons in the corresponding hidden layer of the MLP.\nmlp_activations::Vector{<:Function}: A vector of activation functions to be used in the hidden layers. Length must match that of mlp_neurons.\noutput_activation::Function: Activation function for the output neuron of the MLP.\n\nOptional Keyword Arguments\n\ninit::Function: Initialization function for the weights of all layers in the MutualInfoChain. 
Defaults to Flux.glorot_uniform.\n\nReturns\n\nMutualInfoChain: A MutualInfoChain instance with the specified MLP architecture.\n\nNotes\n\nThe function will throw an error if the number of provided activation functions does not match the number of layers specified in mlp_neurons.\n\n\n\n\n\n","category":"method"},{"location":"quickstart/#Quick-Start","page":"Quick Start","title":"Quick Start","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"warning: Warning\nIn this guide we will use external packages with functions not directly related to AutoEncoderToolkit.jl, such as Flux.jl and MLDatasets.jl. Make sure to install them before running the code if you want to follow along.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"For this quick start guide, we will prepare different autoencoders to be trained on a fraction of the MNIST dataset. Let us begin by importing the necessary packages.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"note: Note\nWe prefer to load functions using the import keyword instead of using. This is a personal preference and you can use using if you prefer.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Import project package\nimport AutoEncoderToolkit as AET\n\n# Import ML libraries\nimport Flux\n\n# Import library to load MNIST dataset\nusing MLDatasets: MNIST\n\n# Import library to save models\nimport JLD2","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Now that we have imported the necessary packages, we can load the MNIST dataset. For this specific example, we will only use digits 0, 1, and 2, taking 10 batches of 64 samples each. 
We will also use 2 batches with the same number of samples for validation.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define number of samples in batch\nn_batch = 64\n# Define total number of data points\nn_data = n_batch * 10\n# Define number of validation data points\nn_val = n_batch * 2\n\n# Define labels to keep\ndigit_label = [0, 1, 2]\n\n# Load training dataset (features and targets)\ndataset = MNIST(\n ; split=:train, dir=\"your_own_custom_path/data/mnist\"\n)\n\n# Keep only data with labels in digit_label\ndata_filt = dataset.features[:, :, dataset.targets .∈ Ref(digit_label)]\nlabels_filt = dataset.targets[dataset.targets .∈ Ref(digit_label)]\n\n# Reduce size of training data and reshape to WHCN format\ntrain_data = Float32.(reshape(data_filt[:, :, 1:n_data], (28, 28, 1, n_data)))\ntrain_labels = labels_filt[1:n_data]\n\n# Reduce size of validation data and reshape to WHCN format\nval_data = Float32.(\n reshape(data_filt[:, :, n_data+1:n_data+n_val], (28, 28, 1, n_val))\n)\nval_labels = labels_filt[n_data+1:n_data+n_val]","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Furthermore, for this particular example, we will use a binarized version of the MNIST dataset. 
This means that we will convert the pixel values to either 0 or 1.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define threshold for binarization\nthresh = 0.5\n\n# Binarize training data\ntrain_data = Float32.(train_data .> thresh)\n\n# Binarize validation data\nval_data = Float32.(val_data .> thresh)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Let's look at some of the binarized data.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"(Image: )","category":"page"},{"location":"quickstart/#Define-Encoder-and-Decoder","page":"Quick Start","title":"Define Encoder and Decoder","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"tip: Tip\nFor this walkthrough, we will define the layers of the encoder and decoder by hand. But, for other cases, make sure to check the default initializers in the Encoders and Decoders section.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"With the data in hand, let us define the encoder and decoder for the variational autoencoder. The encoder will be a simple convolutional network with two convolutional layers and a latent dimensionality of 2. Since we will use the JointGaussianLogEncoder type that defines the encoder as a Gaussian distribution with diagonal covariance, returning the mean and log standard deviation, we also need to define two dense layers that map the output of the convolutional layers to the latent space.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"In this definition we will use functions from the Flux package to define the convolutional layers and the dense layers. 
We will also use the custom Flatten layer from AutoEncoderToolkit.jl to flatten the output of the last convolutional layer before passing it to the dense layers.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define dimensionality of latent space\nn_latent = 2\n\n# Define number of initial channels\nn_channels_init = 32\n\nprintln(\"Defining encoder...\")\n# Define convolutional layers\nconv_layers = Flux.Chain(\n # First convolutional layer\n Flux.Conv((4, 4), 1 => n_channels_init, Flux.relu; stride=2, pad=1),\n # Second convolutional layer\n Flux.Conv(\n (4, 4), n_channels_init => n_channels_init * 2, Flux.relu;\n stride=2, pad=1\n ),\n # Flatten the output\n AET.Flatten(),\n # Add extra dense layer 1\n Flux.Dense(n_channels_init * 2 * 7 * 7 => 256, Flux.relu),\n # Add extra dense layer 2\n Flux.Dense(256 => 256, Flux.relu),\n)\n\n# Define layers for µ and log(σ)\nµ_layer = Flux.Dense(256, n_latent, Flux.identity)\nlogσ_layer = Flux.Dense(256, n_latent, Flux.identity)\n\n# build encoder\nencoder = AET.JointGaussianLogEncoder(conv_layers, µ_layer, logσ_layer)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"note: Note\nThe Flatten layer is a custom layer defined in AutoEncoderToolkit.jl that flattens the output into a 1D vector. This flattening operation is necessary because the output of the convolutional layers is a 4D tensor, while the input to the µ and log(σ) layers is a 1D vector. The custom layer is needed to be able to save the model and load it later as BSON and JLD2 do not play well with anonymous functions.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"In the same way, the decoder will be a simple deconvolutional network with two deconvolutional layers. 
Given the binary nature of the MNIST dataset we are using, the probability distribution that makes sense to use in the decoder is a Bernoulli distribution. We will therefore define the decoder as a BernoulliDecoder type. This means that the output of the decoder must be a value between 0 and 1. ","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define deconvolutional layers\ndeconv_layers = Flux.Chain(\n # Define linear layer out of latent space\n Flux.Dense(n_latent => 256, Flux.identity),\n # Add extra dense layer\n Flux.Dense(256 => 256, Flux.relu),\n # Add extra dense layer to map to initial number of channels\n Flux.Dense(256 => n_channels_init * 2 * 7 * 7, Flux.relu),\n # Unflatten input using custom Reshape layer\n AET.Reshape(7, 7, n_channels_init * 2, :),\n # First transposed convolutional layer\n Flux.ConvTranspose(\n (4, 4), n_channels_init * 2 => n_channels_init, Flux.relu; \n stride=2, pad=1\n ),\n # Second transposed convolutional layer\n Flux.ConvTranspose(\n (4, 4), n_channels_init => 1, Flux.sigmoid_fast; stride=2, pad=1\n ),\n)\n\n# Define decoder\ndecoder = AET.BernoulliDecoder(deconv_layers)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"note: Note\nSimilar to the Flatten custom layer, the Reshape layer is used to reshape the output of the dense layers into the dimensions expected by the transposed convolutional layers. This custom layer plays well with the BSON and JLD2 libraries.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Alternatively, if we hadn't binarized the data, a Gaussian distribution would be a more appropriate choice for the decoder. In that case, we could define the decoder as a SimpleGaussianDecoder using the same deconv_layers as above. This would change the probabilistic function associated with the decoder from the Bernoulli to a Gaussian distribution with constant diagonal covariance. 
But, everything else that follows would remain the same. That's the power of Julia's multiple dispatch and AutoEncoderToolkit.jl's design!","category":"page"},{"location":"quickstart/#VAE-Model","page":"Quick Start","title":"VAE Model","text":"","category":"section"},{"location":"quickstart/#Defining-VAE-Model","page":"Quick Start","title":"Defining VAE Model","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"With the encoder and decoder in hand, defining a variational autoencoder model is as simple as writing:","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define VAE model\nvae = encoder * decoder","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"If we wish, at this point we can save the model architecture and the initial state to disk using the JLD2 package.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Save model object\nJLD2.save(\n \"./output/model.jld2\",\n Dict(\"model\" => vae, \"model_state\" => Flux.state(vae))\n)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"tip: Tip\nTo run the training on a CUDA-compatible device, all we need to do is move the model and the data to the device. This can be done as:\nusing CUDA\n# Move model to GPU\nvae = vae |> Flux.gpu\n# Move data to GPU\ntrain_data = train_data |> Flux.gpu\nval_data = val_data |> Flux.gpu\nEverything else will remain the same, except for the partition of data into batches. This should preferably be done by hand rather than using the Flux.DataLoader functionality. NOTE: Flux.jl offers support for other devices as well. But AutoEncoderToolkit.jl has not been tested with them. So, if you want to use other devices, make sure to test it first. 
PRs to add support for other devices are welcome!","category":"page"},{"location":"quickstart/#Training-VAE-Model","page":"Quick Start","title":"Training VAE Model","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"We are now ready to train the model. First, we partition the training data into batches","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Partition data into batches\ntrain_loader = Flux.DataLoader(train_data, batchsize=n_batch, shuffle=true)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Next, we define the optimizer. For this example, we will use the ADAM optimizer with a learning rate of 1e-3.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define learning rate\nη = 1e-3\n# Explicit setup of optimizer\nopt_vae = Flux.Train.setup(\n Flux.Optimisers.Adam(η),\n vae\n)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Finally, we can train the model.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"note: Note\nMost of the code below is used to compute and store diagnostics of the training process. 
The core of the training loop is very simple thanks to the custom training function provided by AutoEncoderToolkit.jl.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Initialize arrays to save loss, entropy, and MSE\ntrain_loss = Array{Float32}(undef, n_epoch)\nval_loss = Array{Float32}(undef, n_epoch)\ntrain_entropy = Array{Float32}(undef, n_epoch)\nval_entropy = Array{Float32}(undef, n_epoch)\ntrain_mse = Array{Float32}(undef, n_epoch)\nval_mse = Array{Float32}(undef, n_epoch)\n\n# Loop through epochs\nfor epoch in 1:n_epoch\n println(\"Epoch: $(epoch)\\n\")\n # Loop through batches\n for (i, x) in enumerate(train_loader)\n println(\"Epoch: $(epoch) | Batch: $(i) / $(length(train_loader))\")\n # Train VAE\n AET.VAEs.train!(vae, x, opt_vae)\n end # for train_loader\n\n # Compute loss in training data\n train_loss[epoch] = AET.VAEs.loss(vae, train_data)\n # Compute loss in validation data\n val_loss[epoch] = AET.VAEs.loss(vae, val_data)\n\n # Forward pass training data\n train_outputs = vae(train_data)\n # Compute cross-entropy\n train_entropy[epoch] = Flux.Losses.logitbinarycrossentropy(\n train_outputs.p, train_data\n )\n # Compute MSE for training data\n train_mse[epoch] = Flux.mse(train_outputs.p, train_data)\n\n # Forward pass validation data\n val_outputs = vae(val_data)\n # Compute cross-entropy\n val_entropy[epoch] = Flux.Losses.logitbinarycrossentropy(\n val_outputs.p, val_data\n )\n # Compute MSE for validation data\n val_mse[epoch] = Flux.mse(val_outputs.p, val_data)\n\n println(\n \"Epoch: $(epoch) / $(n_epoch)\\n \" *\n \"- train_mse: $(train_mse[epoch])\\n \" *\n \"- val_mse: $(val_mse[epoch])\\n \" *\n \"- train_loss: $(train_loss[epoch])\\n \" *\n \"- val_loss: $(val_loss[epoch])\\n \" *\n \"- train_entropy: $(train_entropy[epoch])\\n \" *\n \"- val_entropy: $(val_entropy[epoch])\\n\"\n )\nend # for n_epoch","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick 
Start","text":"tip: Tip\nTo convert this vanilla VAE into a β-VAE, all we need to do is add an optional keyword argument β to the loss function. This is then passed to the train! function as follows:\n# Define loss keyword argument as dictionary\nloss_kwargs = Dict(:β => 0.1f0)\n# Train model using β-VAE\nAET.VAEs.train!(vae, x, opt_vae; loss_kwargs=loss_kwargs)\nThis argument defines the relative weight of the KL divergence term in the loss function.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"That's it! We have trained a variational autoencoder on the MNIST dataset. We can store the model and the training diagnostics to disk using JLD2.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Store model and diagnostics\nJLD2.jldsave(\n \"./output/vae_epoch$(lpad(n_epoch, 4, \"0\")).jld2\",\n model_state=Flux.state(vae),\n train_entropy=train_entropy,\n train_loss=train_loss,\n train_mse=train_mse,\n val_entropy=val_entropy,\n val_mse=val_mse,\n val_loss=val_loss,\n)","category":"page"},{"location":"quickstart/#Exploring-the-results","page":"Quick Start","title":"Exploring the results","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"warning: Warning\nFor the plots below, we do not provide the code to generate them. We assume the user is familiar with plotting in Julia. 
If you are not, we recommend checking the Makie.jl documentation.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Let's look at the training diagnostics to see how the training went.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"(Image: )","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"We can see that the training loss, the cross-entropy, and the mean squared error decreased as the training progressed on both the training and validation data.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Next, let's look at the resulting latent space. In particular, let's encode the training data and plot the coordinates in the latent space. To encode the data we have two options:","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Directly encode the data using the encoder. This returns a NamedTuple, where for our JointGaussianLogEncoder the fields are μ and logσ.\n# Map training data to latent space\ntrain_latent = vae.encoder(train_data)\nWe could take as the latent space coordinates the mean of the distribution.\nPerform the forward pass of the VAE model with the optional keyword argument latent=true. This returns a NamedTuple with the fields encoder, decoder, and z. 
The z field contains the sampled latent space coordinates obtained when performing the reparameterization trick.\ntrain_outputs = vae(train_data; latent=true)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Let's now look at the resulting coordinates in latent space.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"(Image: )","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Finally, one of the most attractive features of variational autoencoders is their generative capability. To assess this, we can sample from the latent space prior and decode the samples to generate new data. Let's generate some samples and plot them.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define number of samples\nn_samples = 6\n\n# Sample from prior\nRandom.seed!(42)\nprior_samples = Random.randn(n_latent, n_samples)\n\n# Decode samples\ndecoder_output = vae.decoder(prior_samples).p","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"(Image: )","category":"page"},{"location":"quickstart/#InfoMaxVAE-Model","page":"Quick Start","title":"InfoMaxVAE Model","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Let's now proceed to train an InfoMaxVAE model. This model is a variational autoencoder that includes a term in the loss function to maximize a variational approximation of the mutual information between the latent space and the input data. This variational approximation of the mutual information is parametrized by a neural network that is trained jointly with the encoder and decoder. Thus, the InfoMaxVAE object takes as input a VAE model as well as a MutualInfoChain object that defines the multi-layer perceptron used to compute the mutual information. 
Since we can use the exact same VAE model we defined earlier, all we need to do is define the MutualInfoChain object to build the InfoMaxVAE model.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"note: Note\nMake sure to check the documentation for the MutualInfoChain to know the requirements for this object. The main thing for us in this example is that since the data input is a 4D tensor, we need a custom layer to flatten the output of the encoder before passing it to the multi-layer perceptron. Furthermore, the output of the multi-layer perceptron must be a scalar.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define MutualInfochain elements\n\ndata_layer = Flux.Chain(\n AET.Flatten(),\n Flux.Dense(28 * 28 => 28 * 28, Flux.identity),\n)\n\nlatent_layer = Flux.Dense(n_latent => n_latent, Flux.identity)\n\nmlp = Flux.Chain(\n Flux.Dense(28 * 28 + n_latent => 256, Flux.relu),\n Flux.Dense(256 => 256, Flux.relu),\n Flux.Dense(256 => 256, Flux.relu),\n Flux.Dense(256 => 1, Flux.identity),\n)\n\n# Define MutualInfochain\nmi = AET.InfoMaxVAEs.MutualInfoChain(data_layer, latent_layer, mlp)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Next, we put together the VAE model and the MutualInfoChain to define the InfoMaxVAE model.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define InfoMaxVAE model\ninfomaxvae = AET.InfoMaxVAEs.InfoMaxVAE(encoder * decoder, mi)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"The InfoMaxVAE model has two loss functions: one for the mutual information and one for the VAE. But this is internally handled by the InfoMaxVAEs.train! function. 
So, training the model is as simple as training the VAE model.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"tip: Tip\nNotice that we can pass additional keyword arguments to the train! function as keyword arguments for either the miloss or the infomaxloss. In this case, we will pass the hyperparameters α and β to weigh the mutual information term significantly more than the KL divergence term.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Explicit setup of optimizer\nopt_infomaxvae = Flux.Train.setup(\n Flux.Optimisers.Adam(η),\n infomaxvae\n)\n\n# Define infomaxloss function kwargs\nloss_kwargs = Dict(:α => 10.0f0, :β => 1.0f0,)\n\n# Loop through epochs\nfor epoch in 1:n_epoch\n println(\"Epoch: $(epoch)\\n\")\n # Loop through batches\n for (i, x) in enumerate(train_loader)\n println(\"Epoch: $(epoch) | Batch: $(i) / $(length(train_loader))\")\n # Train InfoMaxVAE\n AET.InfoMaxVAEs.train!(\n infomaxvae, x, opt_infomaxvae; infomaxloss_kwargs=loss_kwargs\n )\n end # for train_loader\nend # for n_epoch","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Notice that we only needed to define the MutualInfoChain object and we were ready to train the InfoMaxVAE model. 
This is the power of the design of AutoEncoderToolkit.jl!","category":"page"},{"location":"quickstart/#Exploring-the-results-2","page":"Quick Start","title":"Exploring the results","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Let's now look at the resulting coordinates in latent space after 100 epochs of training.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"(Image: )","category":"page"},{"location":"quickstart/#RHVAE-Model","page":"Quick Start","title":"RHVAE Model","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Let's now train an RHVAE model. The process is very similar to the VAE model with the main difference that the RHVAE type has some extra requirements. Let's quickly look at the docstring for this type. In particular, let's look at the docstring for the default constructor.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"RHVAE(\n vae::VAE, \n metric_chain::MetricChain, \n centroids_data::AbstractArray, \n T::Number, \n λ::Number\n )\n\n Construct a Riemannian Hamiltonian Variational Autoencoder (RHVAE) from a standard VAE and a metric chain.\n\n Arguments\n ≡≡≡≡≡≡≡≡≡\n\n • vae::VAE: A standard Variational Autoencoder (VAE) model.\n\n • metric_chain::MetricChain: A chain of metrics to be used for the Riemannian Hamiltonian Monte Carlo (RHMC) sampler.\n\n • centroids_data::AbstractArray: An array of data centroids. Each column represents a centroid. N is a subtype of Number.\n\n • T::N: The temperature parameter for the inverse metric tensor. N is a subtype of Number.\n\n • λ::N: The regularization parameter for the inverse metric tensor. 
N is a subtype of Number.\n\n Returns\n ≡≡≡≡≡≡≡\n\n • A new RHVAE object.\n\n Description\n ≡≡≡≡≡≡≡≡≡≡≡\n\n The constructor initializes the latent centroids and the metric tensor M to their default values. The latent centroids are initialized to a zero matrix of\n the same size as centroids_data, and M is initialized to a 3D array of identity matrices, one for each centroid.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"From this we can see that we need to provide a VAE model–we can use the same model we defined earlier–a MetricChain type, an array of centroids, and two hyperparameters T and λ. The MetricChain type is another multi-layer perceptron specifically used to compute a lower-triangular matrix used for the metric tensor for the Riemannian manifold fit to the latent space. More specifically, when training an RHVAE model, the inverse of the metric tensor is also learned. This inverse metric tensor \\mathbf{G}^{-1}(z) is of the form","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"\\mathbf{G}^{-1}(z)=\\sum_{i=1}^N L_{\\psi_i} L_{\\psi_i}^\\top \\exp \\left(-\\frac{\\left\\|z-c_i\\right\\|_2^2}{T^2}\\right)+\\lambda I_d\n\\tag{1}","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"where L_{\\psi_i} \\equiv L_{\\psi_i}(x) is the lower-triangular matrix computed by the MetricChain type given the corresponding data input x associated with the latent coordinate z. c_i is one of the N centroids in latent space used as anchoring points for the metric tensor. 
The hyperparameters T and λ are used to control the temperature of the inverse metric tensor and an additional regularization term, respectively.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Looking at the requirements for MetricChain we see three components:","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"An mlp field that is a multi-layer perceptron.\nA diag field that is a dense layer used to compute the diagonal of the lower triangular matrix returned by MetricChain.\nA lower field that is a dense layer used to compute the elements below the diagonal of the lower triangular matrix.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Let's define these elements and build the MetricChain.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"warning: Warning\nFor MetricChain to build a proper lower triangular matrix, the diag layer must return the same dimensionality as the latent space. The lower layer must return the number of elements in the lower triangular matrix below the diagonal. 
This is given by n_latent * (n_latent - 1) ÷ 2.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define dense layers of the multi-layer perceptron\nmlp_conv_layers = Flux.Chain(\n # Flatten the input using custom Flatten layer\n AET.Flatten(),\n # First layer\n Flux.Dense(28 * 28 => 256, Flux.relu),\n # Second layer\n Flux.Dense(256 => 256, Flux.relu),\n # Third layer\n Flux.Dense(256 => 256, Flux.relu),\n)\n\n# Define layers for the diagonal and the below-diagonal entries of the\n# lower-triangular matrix\ndiag = Flux.Dense(256 => n_latent, Flux.identity)\nlower = Flux.Dense(256 => n_latent * (n_latent - 1) ÷ 2, Flux.identity)\n\n# Build metric chain\nmetric_chain = AET.RHVAEs.MetricChain(mlp_conv_layers, diag, lower)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Next, we need to define the centroids. These are the c_i in equation (1) used as anchoring points for the metric tensor. Their latent space coordinates will be updated as the model trains, but the corresponding data points must be fixed. In a way, these centroids are a subset of the data used to define the RHVAE structure itself. One possibility is to use the entire training data as centroids. But this can get computationally very expensive. Instead, we can use either k-means or k-medoids to define a smaller set of centroids. For this, AutoEncoderToolkit.jl provides functions to select these centroids. 
For this example, we will use k-medoids to define the centroids.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define number of centroids\nn_centroids = 64 \n\n# Select centroids via k-medoids\ncentroids_data = AET.utils.centroids_kmedoids(train_data, n_centroids)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Finally, we are just missing the hyperparameters T and λ, and we can then define the RHVAE model.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"note: Note\nHere we are using the same vae model we defined earlier assuming it hasn't been previously trained. If it has been trained, we could load it from disk.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define RHVAE hyper-parameters\nT = 0.4f0 # Temperature\nλ = 1.0f-2 # Regularization parameter\n\n# Define RHVAE model\nrhvae = AET.RHVAEs.RHVAE(vae, metric_chain, centroids_data, T, λ)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"The RHVAE struct stores three elements for which no gradients are computed. Specifically, the elements","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"• centroids_latent::Matrix: A matrix where each column represents a centroid cᵢ in the inverse metric computation.\n• L::Array{<:Number, 3}: A 3D array where each slice represents a L_ψᵢ matrix.\n• M::Array{<:Number, 3}: A 3D array where each slice represents a Lψᵢ Lψᵢᵀ.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"used to compute the inverse metric tensor are not updated with gradients. Instead, they are updated using the update_metric! function. 
So, before training the model, we can update these elements.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Update metric tensor elements\nAET.RHVAEs.update_metric!(rhvae)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"warning: Warning\nEvery time you load an RHVAE model from disk, you need to update the metric as shown above such that all parameters in the model are properly initialized.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Now, we are ready to train the RHVAE model. Setting up the training process is very similar to the VAE model. Make sure to look at the documentation for the RHVAE type to understand the additional keyword arguments that can be passed to the loss function.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define loss function hyper-parameters\nϵ = Float32(1E-4) # Leapfrog step size\nK = 5 # Number of leapfrog steps\nβₒ = 0.3f0 # Initial temperature for tempering\n\n# Collect loss function keyword arguments\nloss_kwargs = Dict(\n :K => K,\n :ϵ => ϵ,\n :βₒ => βₒ,\n)\n\n# Explicit setup of optimizer\nopt_rhvae = Flux.Train.setup(\n Flux.Optimisers.Adam(η),\n rhvae\n)\n\n# Define number of epochs\nn_epoch = 20\n\n# Loop through epochs\nfor epoch in 1:n_epoch\n println(\"Epoch: $(epoch)\\n\")\n # Loop through batches\n for (i, x) in enumerate(train_loader)\n println(\"Epoch: $(epoch) | Batch: $(i) / $(length(train_loader))\")\n # Train RHVAE\n AET.RHVAEs.train!(rhvae, x, opt_rhvae; loss_kwargs=loss_kwargs)\n end # for train_loader\nend # for n_epoch","category":"page"},{"location":"quickstart/#Exploring-the-results-3","page":"Quick Start","title":"Exploring the results","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"note: Note\nFor the example above, we only trained the RHVAE model for 20 
epochs.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Let's look at the resulting latent space encoding the training data.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"(Image: )","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Even for 20 epochs the latent space is already showing a clear separation of the different classes. This is a clear indication that the RHVAE model is learning a good representation of the data.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"One of the most attractive features of the RHVAE model is the ability to learn a Riemannian metric on the latent space. This means that we have a position-dependent measurement of how deformed the latent space is. We can visualize a proxy for this metric by computing the so-called volume measure \\sqrt{\\det(\\mathbf{G}(z))} for each point in the latent space. 
Let's compute this for a grid of points in the latent space and plot it as a background for the latent space.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define number of points per axis\nn_points = 250\n\n# Define range of latent space\nlatent_range_z1 = Float32.(range(-5, 4.5, length=n_points))\nlatent_range_z2 = Float32.(range(-3.5, 6.5, length=n_points))\n\n# Define latent points to evaluate\nz_mat = reduce(hcat, [[x, y] for x in latent_range_z1, y in latent_range_z2])\n\n# Compute inverse metric tensor\nGinv = AET.RHVAEs.G_inv(z_mat, rhvae)\n\n# Compute log determinant of metric tensor\nlogdetG = reshape(-1 / 2 * AET.utils.slogdet(Ginv), n_points, n_points)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"(Image: )","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"In the next section we will explore how to use this geometric information to compute the geodesic distance between points in the latent space.","category":"page"},{"location":"quickstart/#Differential-Geometry-of-RHVAE-model","page":"Quick Start","title":"Differential Geometry of RHVAE model","text":"","category":"section"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"The RHVAE model is a powerful tool to learn a Riemannian metric on the latent space. Having this metric allows us to compute distances between points, and even to perform geodesic interpolation between points. What this means is that as the model trains, the notion of distance between points in the latent space might not be the same as the Euclidean distance. Instead, the model learns a function that tells us how to measure distances in the latent space. We can use this function to compute the shortest path between two points. 
This is what is called a geodesic.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"AutoEncoderToolkit.jl provides a set of functions to compute the geodesic between points in latent space. In particular, a geodesic is a function that connects two points in the latent space such that the distance between them is minimized. Since we do not know the exact form of the geodesic, we can again make use of the power of neural networks to approximate it. The NeuralGeodesics submodule from the diffgeo module provides this functionality. The first step consists of defining a neural network that will approximate the path between two points. The NeuralGeodesic type takes three arguments:","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"A multi-layer perceptron that will approximate the path. This should have a single input–the time being a number between 0 and 1–and the dimensionality of the output should be the same as the dimensionality of the latent space.\nThe initial point in the latent space for the path.\nThe final point in the latent space for the path.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Let's define this NeuralGeodesic network.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Import NeuralGeodesics submodule\nimport AutoEncoderToolkit.diffgeo.NeuralGeodesics as NG\n\n# Define initial and final point for geometric path\nz_init = [-3.0f0, 5.0f0]\nz_end = [2.0f0, -2.0f0]\n\n# Extract dimensionality of latent space\nldim = size(rhvae.centroids_latent, 1)\n# Define number of neurons in hidden layers\nn_neuron = 16\n\n# Define mlp chain\nmlp_chain = Flux.Chain(\n # First layer\n Flux.Dense(1 => n_neuron, Flux.identity),\n # Second layer\n Flux.Dense(n_neuron => n_neuron, Flux.tanh_fast),\n # Third layer\n Flux.Dense(n_neuron => n_neuron, Flux.tanh_fast),\n # Fourth 
layer\n Flux.Dense(n_neuron => n_neuron, Flux.tanh_fast),\n # Output layer\n Flux.Dense(n_neuron => ldim, Flux.identity)\n)\n\n# Define NeuralGeodesic\nnng = NG.NeuralGeodesic(mlp_chain, z_init, z_end)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"tip: Tip\nEmpirically, we have found that the activation functions in the hidden layers should not be unbounded. Thus, we recommend using tanh or sigmoid.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Next, we define the hyperparameters for the optimization of the neural network. In particular, we will sample 50 time points uniformly distributed between 0 and 1 to sample the path. We will train the network for 50,000 epochs using the Adam optimizer with a learning rate of 1e-5.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Define learning rate\nη = 10^-5\n# Define number of time points to sample\nn_time = 50\n# Define number of epochs\nn_epoch = 50_000\n# Define frequency with which to save model output\nn_save = 10_000\n\n# Define time points\nt_array = Float32.(collect(range(0, 1, length=n_time)))\n\n# Explicit setup of optimizer\nopt_nng = Flux.Train.setup(\n Flux.Optimisers.Adam(η),\n nng\n)","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"With this in hand, we are ready to train the network. 
We will save several outputs of the network to visualize the path as it is being trained.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"# Initialize empty array to save loss\nnng_loss = Vector{Float32}(undef, n_epoch)\n\n# Initialize array to save examples\nnng_ex = Array{Float32}(undef, ldim, length(t_array), n_epoch ÷ n_save + 1)\n\n# Save initial curve\nnng_ex[:, :, 1] = nng(t_array)\n# Loop through epochs\nfor epoch in 1:n_epoch\n # Train model and save loss\n nng_loss[epoch] = NG.train!(nng, rhvae, t_array, opt_nng; loss_return=true)\n # Check if model should be saved\n if epoch % n_save == 0\n # Save model output\n nng_ex[:, :, (epoch÷n_save)+1] = nng(t_array)\n end # if\nend # for","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"Now that we have trained the network, we can visualize the path between the initial and final points in the latent space. The color code in the following plot matches the epoch at which the path was computed.","category":"page"},{"location":"quickstart/","page":"Quick Start","title":"Quick Start","text":"(Image: )","category":"page"},{"location":"vae/#VAEsmodule","page":"VAE / β-VAE","title":"β-Variational Autoencoder","text":"","category":"section"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"Variational Autoencoders, first introduced by Kingma and Welling in 2014, are a type of generative model that learns to encode high-dimensional data into a low-dimensional latent space. The main idea behind VAEs is to learn a probabilistic mapping (via variational inference) from the input data to the latent space, which allows for the generation of new data points by sampling from the latent space.","category":"page"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"Their counterpart, the β-VAE, introduced by Higgins et al. 
in 2017, is a variant of the original VAE that includes a hyperparameter β that controls the relative importance of the reconstruction loss and the KL divergence term in the loss function. By adjusting β, the user can control the trade-off between the reconstruction quality and the disentanglement of the latent space.","category":"page"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"In terms of implementation, the VAE struct in AutoEncoderToolkit.jl is a simple feedforward network composed of variational encoder and decoder parts. This means that the encoder has a log-posterior function and a KL divergence function associated with it, while the decoder has a log-likelihood function associated with it.","category":"page"},{"location":"vae/#References","page":"VAE / β-VAE","title":"References","text":"","category":"section"},{"location":"vae/#VAE","page":"VAE / β-VAE","title":"VAE","text":"","category":"section"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"Kingma, D. P. & Welling, M. Auto-Encoding Variational Bayes. Preprint at http://arxiv.org/abs/1312.6114 (2014).","category":"page"},{"location":"vae/#β-VAE","page":"VAE / β-VAE","title":"β-VAE","text":"","category":"section"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"Higgins, I. et al. β-VAE: LEARNING BASIC VISUAL CONCEPTS WITH A CONSTRAINED VARIATIONAL FRAMEWORK. (2017).","category":"page"},{"location":"vae/#VAEstruct","page":"VAE / β-VAE","title":"VAE struct","text":"","category":"section"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"AutoEncoderToolkit.VAEs.VAE","category":"page"},{"location":"vae/#AutoEncoderToolkit.VAEs.VAE","page":"VAE / β-VAE","title":"AutoEncoderToolkit.VAEs.VAE","text":"struct VAE{E<:AbstractVariationalEncoder, D<:AbstractVariationalDecoder}\n\nVariational autoencoder (VAE) model defined for Flux.jl\n\nFields\n\nencoder::E: Neural network that encodes the input into the latent space. 
E is a subtype of AbstractVariationalEncoder.\ndecoder::D: Neural network that decodes the latent representation back to the original input space. D is a subtype of AbstractVariationalDecoder.\n\nA VAE consists of an encoder and decoder network with a bottleneck latent space in between. The encoder compresses the input into a low-dimensional probabilistic representation q(z|x). The decoder tries to reconstruct the original input from a sampled point in the latent space p(x|z). \n\n\n\n\n\n","category":"type"},{"location":"vae/#Forward-pass","page":"VAE / β-VAE","title":"Forward pass","text":"","category":"section"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"AutoEncoderToolkit.VAEs.VAE(::AbstractArray)","category":"page"},{"location":"vae/#AutoEncoderToolkit.VAEs.VAE-Tuple{AbstractArray}","page":"VAE / β-VAE","title":"AutoEncoderToolkit.VAEs.VAE","text":" (vae::VAE)(x::AbstractArray; latent::Bool=false)\n\nPerform the forward pass of a Variational Autoencoder (VAE).\n\nThis function takes as input a VAE and a vector or matrix of input data x. It first runs the input through the encoder to obtain the mean and log standard deviation of the latent variables. It then uses the reparameterization trick to sample from the latent distribution. Finally, it runs the latent sample through the decoder to obtain the output.\n\nArguments\n\nvae::VAE: The VAE used to encode the input data and decode the latent space.\nx::AbstractArray: The input data. If array, the last dimension contains each of the samples in a batch.\n\nOptional Keyword Arguments\n\nlatent::Bool: Whether to return the latent variables along with the decoder output. If true, the function returns a tuple containing the encoder outputs, the latent sample, and the decoder outputs. If false, the function only returns the decoder outputs. Defaults to false. 
\n\nReturns\n\nIf latent is true, returns a tuple containing:\nencoder: The outputs of the encoder.\nz: The latent sample.\ndecoder: The outputs of the decoder.\nIf latent is false, returns the outputs of the decoder.\n\nExample\n\n# Define a VAE\nvae = VAE(\n encoder=Flux.Chain(Flux.Dense(784, 400, relu), Flux.Dense(400, 20)),\n decoder=Flux.Chain(Flux.Dense(20, 400, relu), Flux.Dense(400, 784))\n)\n\n# Define input data\nx = rand(Float32, 784)\n\n# Perform the forward pass\noutputs = vae(x, latent=true)\n\n\n\n\n\n","category":"method"},{"location":"vae/#Loss-function","page":"VAE / β-VAE","title":"Loss function","text":"","category":"section"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"AutoEncoderToolkit.VAEs.loss","category":"page"},{"location":"vae/#AutoEncoderToolkit.VAEs.loss","page":"VAE / β-VAE","title":"AutoEncoderToolkit.VAEs.loss","text":"loss(\n vae::VAE,\n x::AbstractArray;\n β::Number=1.0f0,\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n kl_divergence::Function=encoder_kl,\n reg_function::Union{Function,Nothing}=nothing,\n reg_kwargs::Union{NamedTuple,Dict}=Dict(),\n reg_strength::Number=1.0f0\n)\n\nComputes the loss for the variational autoencoder (VAE).\n\nThe loss function combines the reconstruction loss with the Kullback-Leibler (KL) divergence, and possibly a regularization term, defined as:\n\nloss = -⟨logπ(x|z)⟩ + β × Dₖₗ[qᵩ(z|x) || π(z)] + regstrength × regterm\n\nWhere:\n\nπ(x|z) is a probabilistic decoder: π(x|z) = N(f(z), σ² I̲̲)) - f(z) is the function defining the mean of the decoder π(x|z) - qᵩ(z|x) is the approximated encoder: qᵩ(z|x) = N(g(x), h(x))\ng(x) and h(x) define the mean and covariance of the encoder respectively.\n\nArguments\n\nvae::VAE: A VAE model with encoder and decoder networks.\nx::AbstractArray: Input data. 
The last dimension is taken as having each of the samples in a batch.\n\nOptional Keyword Arguments\n\nβ::Number=1.0f0: Weighting factor for the KL-divergence term, used for annealing.\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function that computes the reconstruction log likelihood.\nkl_divergence::Function=encoder_kl: A function that computes the Kullback-Leibler divergence between the encoder output and a standard normal.\nreg_function::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the VAE outputs. Should return a Float32. This function must take as input the VAE outputs and the keyword arguments provided in reg_kwargs.\nreg_kwargs::Union{NamedTuple,Dict}=Dict(): Keyword arguments to pass to the regularization function.\nreg_strength::Number=1.0f0: The strength of the regularization term.\n\nReturns\n\nT: The computed average loss value for the input x and its reconstructed counterparts, including possible regularization terms.\n\nNote\n\nEnsure that the input data x matches the expected input dimensionality for the encoder in the VAE.\n\n\n\n\n\nloss(\n vae::VAE,\n x_in::AbstractArray,\n x_out::AbstractArray;\n β::Number=1.0f0,\n reconstruction_loglikelihood::Function=decoder_loglikelihood,\n kl_divergence::Function=encoder_kl,\n reg_function::Union{Function,Nothing}=nothing,\n reg_kwargs::Union{NamedTuple,Dict}=Dict(),\n reg_strength::Number=1.0f0\n)\n\nComputes the loss for the variational autoencoder (VAE).\n\nThe loss function combines the reconstruction loss with the Kullback-Leibler (KL) divergence and possibly a regularization term, defined as:\n\nloss = -⟨logπ(xout|z)⟩ + β × Dₖₗ[qᵩ(z|xin) || π(z)] + regstrength × regterm\n\nWhere:\n\nπ(xout|z) is a probabilistic decoder: π(xout|z) = N(f(z), σ² I̲̲)) - f(z) is\n\nthe function defining the mean of the decoder π(xout|z) - qᵩ(z|xin) is the approximated encoder: qᵩ(z|xin) = N(g(xin), h(x_in))\n\ng(xin) and h(xin) define the mean and 
covariance of the encoder respectively.\n\nArguments\n\nvae::VAE: A VAE model with encoder and decoder networks.\nx_in::AbstractArray: Input data to the VAE encoder. The last dimension is taken as having each of the samples in a batch.\nx_out::AbstractArray: Target data to compute the reconstruction error. The last dimension is taken as having each of the samples in a batch.\n\nOptional Keyword Arguments\n\nβ::Number=1.0f0: Weighting factor for the KL-divergence term, used for annealing.\nreconstruction_loglikelihood::Function=decoder_loglikelihood: A function that computes the reconstruction log likelihood.\nkl_divergence::Function=encoder_kl: A function that computes the Kullback-Leibler divergence.\nreg_function::Union{Function, Nothing}=nothing: A function that computes the regularization term based on the VAE outputs. Should return a Float32. This function must take as input the VAE outputs and the keyword arguments provided in reg_kwargs.\nreg_kwargs::Union{NamedTuple,Dict}=Dict(): Keyword arguments to pass to the regularization function.\nreg_strength::Number=1.0f0: The strength of the regularization term.\n\nReturns\n\nT: The computed average loss value for the input x_in and its reconstructed counterparts x_out, including possible regularization terms.\n\nNote\n\nEnsure that the input data x_in and x_out match the expected input dimensionality for the encoder in the VAE.\n\n\n\n\n\n","category":"function"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"note: Note\nThe loss function includes the β optional argument that can turn a vanilla VAE into a β-VAE by changing the default value of β from 1.0 to any other value.","category":"page"},{"location":"vae/#Training","page":"VAE / β-VAE","title":"Training","text":"","category":"section"},{"location":"vae/","page":"VAE / β-VAE","title":"VAE / β-VAE","text":"AutoEncoderToolkit.VAEs.train!","category":"page"},{"location":"vae/#AutoEncoderToolkit.VAEs.train!","page":"VAE / 
β-VAE","title":"AutoEncoderToolkit.VAEs.train!","text":"train!(vae, x, opt; loss_function, loss_kwargs, verbose, loss_return)\n\nCustomized training function to update parameters of a variational autoencoder given a specified loss function.\n\nArguments\n\nvae::VAE: A struct containing the elements of a variational autoencoder.\nx::AbstractArray: Data on which to evaluate the loss function. The last dimension is taken as having each of the samples in a batch.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function=loss: The loss function used for training. It should accept the VAE model, data x, and keyword arguments in that order.\nloss_kwargs::Union{NamedTuple,Dict} = Dict(): Arguments for the loss function. These might include parameters like σ or β, depending on the specific loss function in use.\nverbose::Bool=false: If true, the loss value will be printed during training.\nloss_return::Bool=false: If true, the loss value will be returned after training.\n\nDescription\n\nTrains the VAE by:\n\nComputing the gradient of the loss w.r.t the VAE parameters.\nUpdating the VAE parameters using the optimizer.\n\nExamples\n\nopt = Flux.setup(Flux.Adam(1e-3), vae)\nfor x in dataloader\n train!(vae, x, opt; loss_function=loss, loss_kwargs=Dict(:β => 1.0f0,), verbose=true)\nend\n\n\n\n\n\n `train!(\n vae, x_in, x_out, opt; \n loss_function, loss_kwargs, verbose, loss_return\n )`\n\nCustomized training function to update parameters of a variational autoencoder given a loss function.\n\nArguments\n\nvae::VAE: A struct containing the elements of a variational autoencoder.\nx_in::AbstractArray: Input data for the loss function. Represents an individual sample. The last dimension is taken as having each of the samples in a batch.\nx_out::AbstractArray: Target output data for the loss function. Represents the corresponding output for the x_in sample. 
The last dimension is taken as having each of the samples in a batch.\nopt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.\n\nOptional Keyword Arguments\n\nloss_function::Function=loss: The loss function used for training. It should accept the VAE model, data x_in, x_out, and keyword arguments in that order. \nloss_kwargs::Union{NamedTuple,Dict} = Dict(): Arguments for the loss function. These might include parameters like σ or β, depending on the specific loss function in use.\nverbose::Bool=false: Whether to print the loss value after each training step.\nloss_return::Bool=false: Whether to return the loss value after each training step.\n\nDescription\n\nTrains the VAE by:\n\nComputing the gradient of the loss w.r.t the VAE parameters.\nUpdating the VAE parameters using the optimizer.\n\nExamples\n\nopt = Flux.setup(Flux.Adam(1e-3), vae)\nfor (x_in, x_out) in dataloader\n train!(vae, x_in, x_out, opt) \nend\n\n\n\n\n\n","category":"function"},{"location":"layers/#Custom-Layers","page":"Custom Layers","title":"Custom Layers","text":"","category":"section"},{"location":"layers/","page":"Custom Layers","title":"Custom Layers","text":"AutoEncoderToolkit.jl provides a set of commonly-used custom layers for building autoencoders. These layers need to be explicitly defined if you want to save a trained model and load it later. For example, if the input to the encoder is an image in format HWC (height, width, channel), somewhere in the encoder there must be a function that flattens its input to a vector for the mapping to the latent space to be possible. If you were to define this with a simple anonymous function, libraries for saving the model, such as JLD2 or BSON, would not work with it. 
This is why we provide this set of custom layers that play along these libraries.","category":"page"},{"location":"layers/#reshape","page":"Custom Layers","title":"Reshape","text":"","category":"section"},{"location":"layers/","page":"Custom Layers","title":"Custom Layers","text":"AutoEncoderToolkit.Reshape\nAutoEncoderToolkit.Reshape(::AbstractArray)","category":"page"},{"location":"layers/#AutoEncoderToolkit.Reshape","page":"Custom Layers","title":"AutoEncoderToolkit.Reshape","text":"Reshape(shape)\n\nA custom layer for Flux that reshapes its input to a specified shape.\n\nThis layer is useful when you need to change the dimensions of your data within a Flux model. Unlike the built-in reshape operation in Julia, this custom layer can be saved and loaded using packages such as BSON or JLD2.\n\nArguments\n\nshape: The target shape. This can be any tuple of integers and colons. Colons are used to indicate dimensions whose size should be inferred such that the total number of elements remains the same.\n\nExamples\n\njulia> r = Reshape(10, :)\nReshape((10, :))\n\njulia> r(rand(5, 2))\n10×1 Matrix{Float64}:\n\nNote\n\nWhen saving and loading the model, make sure to include Reshape in the list of layers to be processed by BSON or JLD2.\n\n\n\n\n\n","category":"type"},{"location":"layers/#AutoEncoderToolkit.Reshape-Tuple{AbstractArray}","page":"Custom Layers","title":"AutoEncoderToolkit.Reshape","text":"Reshape(args...)\n\nConstructor for the Reshape struct that takes variable arguments.\n\nThis function allows us to create a Reshape instance with any shape.\n\nArguments\n\nargs...: Variable arguments representing the dimensions of the target shape.\n\nReturns\n\nA Reshape instance with the target shape set to the provided dimensions.\n\nExamples\n\njulia> r = Reshape(10, :)\nReshape((10, :))\n\n\n\n\n\n(r::Reshape)(x)\n\nThis function is called during the forward pass of the model. 
It reshapes the input x to the target shape stored in the Reshape instance r.\n\nArguments\n\nr::Reshape: An instance of the Reshape struct.\nx: The input to be reshaped.\n\nReturns\n\nThe reshaped input.\n\nExamples\n\njulia> r = Reshape(10, :)\nReshape((10, :))\n\njulia> r(rand(5, 2))\n10×1 Matrix{Float64}:\n ...\n\n\n\n\n\n","category":"method"},{"location":"layers/#flatten","page":"Custom Layers","title":"Flatten","text":"","category":"section"},{"location":"layers/","page":"Custom Layers","title":"Custom Layers","text":"AutoEncoderToolkit.Flatten\nAutoEncoderToolkit.Flatten(::AbstractArray)","category":"page"},{"location":"layers/#AutoEncoderToolkit.Flatten","page":"Custom Layers","title":"AutoEncoderToolkit.Flatten","text":"Flatten()\n\nA custom layer for Flux that flattens its input into a 1D vector.\n\nThis layer is useful when you need to change the dimensions of your data within a Flux model. Unlike the built-in flatten operation in Julia, this custom layer can be saved and loaded by packages such as BSON and JLD2.\n\nExamples\n\njulia> f = Flatten()\n\njulia> f(rand(5, 2))\n10-element Vector{Float64}:\n\nNote\n\nWhen saving and loading the model, make sure to include Flatten in the list of layers to be processed by BSON or JLD2.\n\n\n\n\n\n","category":"type"},{"location":"layers/#AutoEncoderToolkit.Flatten-Tuple{AbstractArray}","page":"Custom Layers","title":"AutoEncoderToolkit.Flatten","text":"(f::Flatten)(x)\n\nThis function is called during the forward pass of the model. 
It flattens the input x into a 1D vector.\n\nArguments\n\nf::Flatten: An instance of the Flatten struct.\nx: The input to be flattened.\n\nReturns\n\nThe flattened input.\n\n\n\n\n\n","category":"method"},{"location":"layers/#ActivationOverDims","page":"Custom Layers","title":"ActivationOverDims","text":"","category":"section"},{"location":"layers/","page":"Custom Layers","title":"Custom Layers","text":"AutoEncoderToolkit.ActivationOverDims\nAutoEncoderToolkit.ActivationOverDims(::AbstractArray)","category":"page"},{"location":"layers/#AutoEncoderToolkit.ActivationOverDims","page":"Custom Layers","title":"AutoEncoderToolkit.ActivationOverDims","text":"ActivationOverDims(σ::Function, dims::Int)\n\nA custom layer for Flux that applies an activation function over specified dimensions.\n\nThis layer is useful when you need to apply an activation function over specific dimensions of your data within a Flux model. Unlike the built-in activation functions in Julia, this custom layer can be saved and loaded using the BSON or JLD2 package.\n\nArguments\n\nσ::Function: The activation function to be applied.\ndims: The dimensions over which the activation function should be applied.\n\nNote\n\nWhen saving and loading the model, make sure to include ActivationOverDims in the list of layers to be processed by BSON or JLD2.\n\n\n\n\n\n","category":"type"},{"location":"layers/#AutoEncoderToolkit.ActivationOverDims-Tuple{AbstractArray}","page":"Custom Layers","title":"AutoEncoderToolkit.ActivationOverDims","text":"(σ::ActivationOverDims)(x)\n\nThis function is called during the forward pass of the model. 
It applies the activation function σ.σ over the dimensions σ.dims of the input x.\n\nArguments\n\nσ::ActivationOverDims: An instance of the ActivationOverDims struct.\nx: The input to which the activation function should be applied.\n\nReturns\n\nThe input x with the activation function applied over the specified dimensions.\n\nNote\n\nThis custom layer can be saved and loaded using the BSON package. When saving and loading the model, make sure to include ActivationOverDims in the list of layers to be processed by BSON or JLD2.\n\n\n\n\n\n","category":"method"},{"location":"#AutoEncoderToolkit.jl","page":"Home","title":"AutoEncoderToolkit.jl","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Welcome to the AutoEncoderToolkit.jl documentation. This package provides a simple interface for training and using Flux.jl-based autoencoders and variational autoencoders in Julia.","category":"page"},{"location":"#Installation","page":"Home","title":"Installation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"You can install AutoEncoderToolkit.jl using the Julia package manager. From the Julia REPL, type ] to enter the Pkg REPL mode and run:","category":"page"},{"location":"","page":"Home","title":"Home","text":"add AutoEncoderToolkit","category":"page"},{"location":"#Design","page":"Home","title":"Design","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The idea behind AutoEncoderToolkit.jl is to take advantage of Julia's multiple dispatch to provide a simple and flexible interface for training and using different types of autoencoders. The package is designed to be modular and allow the user to easily define and test custom encoder and decoder architectures. 
Moreover, when it comes to variational autoencoders, AutoEncoderToolkit.jl takes a probabilistic perspective, where the type of encoders and decoders defines (via multiple dispatch) the corresponding distribution used within the corresponding loss function.","category":"page"},{"location":"","page":"Home","title":"Home","text":"For example, assume you want to train a variational autoencoder with convolutional layers in the encoder and deconvolutional layers in the decoder on the MNIST dataset. You can easily do this as follows:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Let's begin by defining the encoder. For this, we will use the JointGaussianLogEncoder type, which is a simple encoder that takes a Flux.Chain for the shared layers between the mean and log-variance layers and two Flux.Dense (or Flux.Chain) layers for the last layers of the encoder.","category":"page"},{"location":"","page":"Home","title":"Home","text":"# Define dimensionality of latent space\nn_latent = 2\n\n# Define number of initial channels\nn_channels_init = 128\n\n# Define convolutional layers\nconv_layers = Flux.Chain(\n # First convolutional layer\n Flux.Conv((3, 3), 1 => n_channels_init, Flux.relu; stride=2, pad=1),\n # Second convolutional layer\n Flux.Conv(\n (3, 3), n_channels_init => n_channels_init * 2, Flux.relu;\n stride=2, pad=1\n ),\n # Flatten the output\n AutoEncoderToolkit.Flatten()\n)\n\n# Define layers for µ and log(σ)\nµ_layer = Flux.Dense(n_channels_init * 2 * 7 * 7, n_latent, Flux.identity)\nlogσ_layer = Flux.Dense(n_channels_init * 2 * 7 * 7, n_latent, Flux.identity)\n\n# build encoder\nencoder = AutoEncoderToolkit.JointGaussianLogEncoder(conv_layers, µ_layer, logσ_layer)","category":"page"},{"location":"","page":"Home","title":"Home","text":"note: Note\nThe Flatten layer is a custom layer defined in AutoEncoderToolkit.jl that flattens the output into a 1D vector. 
This flattening operation is necessary because the output of the convolutional layers is a 4D tensor, while the input to the µ and log(σ) layers is a 1D vector. The custom layer is needed to be able to save the model and load it later as BSON and JLD2 do not play well with anonymous functions.","category":"page"},{"location":"","page":"Home","title":"Home","text":"For the decoder, given the binary nature of the MNIST dataset, we expect the output to be a Bernoulli distribution. We can define the decoder as follows:","category":"page"},{"location":"","page":"Home","title":"Home","text":"# Define deconvolutional layers\ndeconv_layers = Flux.Chain(\n # Define linear layer out of latent space\n Flux.Dense(n_latent => n_channels_init * 2 * 7 * 7, Flux.identity),\n # Unflatten input using custom Reshape layer\n AutoEncoderToolkit.Reshape(7, 7, n_channels_init * 2, :),\n # First transposed convolutional layer\n Flux.ConvTranspose(\n (4, 4), n_channels_init * 2 => n_channels_init, Flux.relu;\n stride=2, pad=1\n ),\n # Second transposed convolutional layer\n Flux.ConvTranspose(\n (4, 4), n_channels_init => 1, Flux.relu;\n stride=2, pad=1\n ),\n # Add normalization layer\n Flux.BatchNorm(1, Flux.sigmoid),\n)\n\n# Define decoder\ndecoder = AutoEncoderToolkit.BernoulliDecoder(deconv_layers)","category":"page"},{"location":"","page":"Home","title":"Home","text":"note: Note\nAgain, the custom Reshape layer is used to reshape the output of the linear layer to the shape expected by the transposed convolutional layers. This custom layer is needed to be able to save the model and load it later.","category":"page"},{"location":"","page":"Home","title":"Home","text":"By defining the decoder as a BernoulliDecoder, AutoEncoderToolkit.jl already knows the log-likelihood function to use when training the model. 
We can then simply define our variational autoencoder by combining the encoder and decoder as","category":"page"},{"location":"","page":"Home","title":"Home","text":"# Define variational autoencoder\nvae = encoder * decoder","category":"page"},{"location":"","page":"Home","title":"Home","text":"If for any reason we were curious to explore a different distribution for the decoder, for example, a Normal distribution with constant variance, it would be as simple as defining the decoder as a SimpleGaussianDecoder.","category":"page"},{"location":"","page":"Home","title":"Home","text":"# Define decoder with Normal likelihood function\ndecoder = AutoEncoderToolkit.SimpleGaussianDecoder(deconv_layers)\n\n# Re-defining the variational autoencoder\nvae = encoder * decoder","category":"page"},{"location":"","page":"Home","title":"Home","text":"Everything else in our training pipeline would remain the same thanks to multiple dispatch.","category":"page"},{"location":"","page":"Home","title":"Home","text":"Furthermore, let's say that we would like to use a different flavor for our variational autoencoder. In particular the InfoVAE (also known as MMD-VAE) includes extra terms in the loss function to maximize mutual information between the latent space and the input data. 
We can easily take our vae model and convert it into a MMDVAE-type object from the MMDVAEs submodule as follows:","category":"page"},{"location":"","page":"Home","title":"Home","text":"mmdvae = AutoEncoderToolkit.MMDVAEs.MMDVAE(vae)","category":"page"},{"location":"","page":"Home","title":"Home","text":"This is the power of AutoEncoderToolkit.jl and Julia's multiple dispatch!","category":"page"},{"location":"#Implemented-Autoencoders","page":"Home","title":"Implemented Autoencoders","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"model module description\nAutoencoder AEs Vanilla deterministic autoencoder\nVariational Autoencoder VAEs Vanilla variational autoencoder\nβ-VAE VAEs beta-VAE to weigh the reconstruction vs. KL divergence in ELBO\nMMD-VAEs MMDs Maximum-Mean Discrepancy Variational Autoencoders\nInfoMax-VAEs InfoMaxVAEs Information Maximization Variational Autoencoders\nHamiltonian VAE HVAEs Hamiltonian Variational Autoencoders\nRiemannian Hamiltonian-VAE RHVAEs Riemannian-Hamiltonian Variational Autoencoder","category":"page"},{"location":"","page":"Home","title":"Home","text":"tip: Looking for contributors!\nIf you are interested in contributing to the package to add a new model, please check the GitHub repository. We are always looking to expand the list of available models. And AutoEncoderToolkit.jl's structure should make it relatively easy.","category":"page"},{"location":"#GPU-support","page":"Home","title":"GPU support","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"AutoEncoderToolkit.jl supports GPU training out of the box for CUDA.jl-compatible GPUs. The CUDA functionality is provided as an extension. Therefore, to train a model on the GPU, simply import CUDA into the current environment, then move the model and data to the GPU. The rest of the training pipeline remains the same.","category":"page"}]
}
diff --git a/dev/utils/index.html b/dev/utils/index.html
index eb3562d..20c7486 100644
--- a/dev/utils/index.html
+++ b/dev/utils/index.html
@@ -1,5 +1,5 @@
-Utilities · AutoEncoderToolkit
Compute the finite difference gradient of a function f at a point x.
Arguments
f::Function: The function for which the gradient is to be computed. This function must return a scalar value.
x::AbstractVecOrMat: The point at which the gradient is to be computed. Can be a vector or a matrix. If a matrix, each column represents a point where the function f is to be evaluated and the derivative computed.
Optional Keyword Arguments
fdtype::Symbol=:central: The finite difference type. It can be either :forward or :central. Defaults to :central.
Returns
A vector or a matrix representing the gradient of f at x, depending on the input type of x.
Description
This function computes the finite difference gradient of a function f at a point x. The gradient is a vector or a matrix where the i-th element is the partial derivative of f with respect to the i-th element of x.
The partial derivatives are computed using the forward or central difference formula, depending on the fdtype argument:
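The two difference schemes named above can be sketched in plain Julia. This is only an illustration, not the package's implementation: the helper name `fd_gradient_sketch` and the step size `ε` are assumptions, and only the vector case is handled.

```julia
# Hypothetical helper (not part of AutoEncoderToolkit.jl): finite-difference
# gradient of a scalar function `f` at a vector `x`.
# The step size `ε` is an assumed default, not taken from the package.
function fd_gradient_sketch(f, x::AbstractVector; fdtype::Symbol=:central, ε=1e-6)
    grad = similar(x, Float64)
    for i in eachindex(x)
        # Perturbation of size ε along the i-th coordinate only
        e = zeros(Float64, length(x))
        e[i] = ε
        if fdtype == :forward
            # Forward difference: (f(x + εeᵢ) - f(x)) / ε
            grad[i] = (f(x .+ e) - f(x)) / ε
        elseif fdtype == :central
            # Central difference: (f(x + εeᵢ) - f(x - εeᵢ)) / 2ε
            grad[i] = (f(x .+ e) - f(x .- e)) / (2ε)
        else
            throw(ArgumentError("fdtype must be :forward or :central"))
        end
    end
    return grad
end

# Toy scalar function whose exact gradient is 2x
f(x) = sum(abs2, x)
g_central = fd_gradient_sketch(f, [1.0, 2.0, 3.0])
g_forward = fd_gradient_sketch(f, [1.0, 2.0, 3.0]; fdtype=:forward)
```

The central scheme costs two function evaluations per coordinate but has a smaller truncation error than the forward scheme, which is a common reason for making it the default.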
Compute the gradient of a function f at a point x using Taylor series differentiation.
Arguments
f::Function: The function for which the gradient is to be computed. This must be a scalar function.
x::AbstractVecOrMat: The point at which the gradient is to be computed. Can be a vector or a matrix. If a matrix, each column represents a point where the function f is to be evaluated and the derivative computed.
Returns
A vector or a matrix representing the gradient of f at x, depending on the input type of x.
Description
This function computes the gradient of a function f at a point x using Taylor series differentiation. The gradient is a vector or a matrix where the i-th element or column is the partial derivative of f with respect to the i-th element of x.
The partial derivatives are computed using the TaylorDiff.derivative function.
GPU Support
This function currently only supports CPU arrays.
Settings
This document was generated with Documenter.jl version 1.4.1 on Friday 21 June 2024. Using Julia version 1.10.4.
+ )
Compute the gradient of a function f at a point x using Taylor series differentiation.
Arguments
f::Function: The function for which the gradient is to be computed. This must be a scalar function.
x::AbstractVecOrMat: The point at which the gradient is to be computed. Can be a vector or a matrix. If a matrix, each column represents a point where the function f is to be evaluated and the derivative computed.
Returns
A vector or a matrix representing the gradient of f at x, depending on the input type of x.
Description
This function computes the gradient of a function f at a point x using Taylor series differentiation. The gradient is a vector or a matrix where the i-th element or column is the partial derivative of f with respect to the i-th element of x.
The partial derivatives are computed using the TaylorDiff.derivative function.
GPU Support
This function currently only supports CPU arrays.
Settings
This document was generated with Documenter.jl version 1.5.0 on Monday 8 July 2024. Using Julia version 1.10.4.
Variational Autoencoders, first introduced by Kingma and Welling in 2014, are a type of generative model that learns to encode high-dimensional data into a low-dimensional latent space. The main idea behind VAEs is to learn a probabilistic mapping (via variational inference) from the input data to the latent space, which allows for the generation of new data points by sampling from the latent space.
Their counterpart, the β-VAE, introduced by Higgins et al. in 2017, is a variant of the original VAE that includes a hyperparameter β that controls the relative importance of the reconstruction loss and the KL divergence term in the loss function. By adjusting β, the user can control the trade-off between the reconstruction quality and the disentanglement of the latent space.
In terms of implementation, the VAE struct in AutoEncoderToolkit.jl is a simple feedforward network composed of variational encoder and decoder parts. This means that the encoder has a log-posterior function and a KL divergence function associated with it, while the decoder has a log-likelihood function associated with it.
Variational autoencoder (VAE) model defined for Flux.jl
Fields
encoder::E: Neural network that encodes the input into the latent space. E is a subtype of AbstractVariationalEncoder.
decoder::D: Neural network that decodes the latent representation back to the original input space. D is a subtype of AbstractVariationalDecoder.
A VAE consists of an encoder and decoder network with a bottleneck latent space in between. The encoder compresses the input into a low-dimensional probabilistic representation q(z|x). The decoder tries to reconstruct the original input from a sampled point in the latent space p(x|z).
Perform the forward pass of a Variational Autoencoder (VAE).
This function takes as input a VAE and a vector or matrix of input data x. It first runs the input through the encoder to obtain the mean and log standard deviation of the latent variables. It then uses the reparameterization trick to sample from the latent distribution. Finally, it runs the latent sample through the decoder to obtain the output.
Arguments
vae::VAE: The VAE used to encode the input data and decode the latent space.
x::AbstractArray: The input data. If array, the last dimension contains each of the samples in a batch.
Optional Keyword Arguments
latent::Bool: Whether to return the latent variables along with the decoder output. If true, the function returns a tuple containing the encoder outputs, the latent sample, and the decoder outputs. If false, the function only returns the decoder outputs. Defaults to false.
Returns
If latent is true, returns a tuple containing:
encoder: The outputs of the encoder.
z: The latent sample.
decoder: The outputs of the decoder.
If latent is false, returns the outputs of the decoder.
Variational Autoencoders, first introduced by Kingma and Welling in 2014, are a type of generative model that learns to encode high-dimensional data into a low-dimensional latent space. The main idea behind VAEs is to learn a probabilistic mapping (via variational inference) from the input data to the latent space, which allows for the generation of new data points by sampling from the latent space.
Their counterpart, the β-VAE, introduced by Higgins et al. in 2017, is a variant of the original VAE that includes a hyperparameter β that controls the relative importance of the reconstruction loss and the KL divergence term in the loss function. By adjusting β, the user can control the trade-off between the reconstruction quality and the disentanglement of the latent space.
In terms of implementation, the VAE struct in AutoEncoderToolkit.jl is a simple feedforward network composed of variational encoder and decoder parts. This means that the encoder has a log-posterior function and a KL divergence function associated with it, while the decoder has a log-likelihood function associated with it.
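As a rough illustration of how β weighs the two terms, the sketch below combines a reconstruction log-likelihood with the closed-form KL divergence between a diagonal Gaussian and a standard normal. The function names are hypothetical, and the package's own loss may differ, for example in how it averages over a batch.

```julia
# KL divergence between N(μ, diag(σ²)) and N(0, I), summed over latent
# dimensions. `logσ` is the log standard deviation.
kl_std_normal(μ, logσ) = 0.5 * sum(exp.(2 .* logσ) .+ μ .^ 2 .- 1 .- 2 .* logσ)

# Hypothetical β-weighted objective: negative log-likelihood plus β times KL.
# `loglik` stands in for the decoder's reconstruction log-likelihood.
beta_vae_loss_sketch(loglik, μ, logσ; β=1.0) = -loglik + β * kl_std_normal(μ, logσ)

# Posterior equal to the prior: the KL term vanishes, so β has no effect
loss_prior = beta_vae_loss_sketch(-3.0, [0.0, 0.0], [0.0, 0.0]; β=4.0)

# Posterior mean shifted by one unit: KL = 0.5, scaled by β = 4
loss_shifted = beta_vae_loss_sketch(-3.0, [1.0, 0.0], [0.0, 0.0]; β=4.0)
```

With β = 1 this reduces to the usual negative ELBO; larger values penalize departures of the posterior from the prior more heavily.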
Variational autoencoder (VAE) model defined for Flux.jl
Fields
encoder::E: Neural network that encodes the input into the latent space. E is a subtype of AbstractVariationalEncoder.
decoder::D: Neural network that decodes the latent representation back to the original input space. D is a subtype of AbstractVariationalDecoder.
A VAE consists of an encoder and decoder network with a bottleneck latent space in between. The encoder compresses the input into a low-dimensional probabilistic representation q(z|x). The decoder tries to reconstruct the original input from a sampled point in the latent space p(x|z).
Perform the forward pass of a Variational Autoencoder (VAE).
This function takes as input a VAE and a vector or matrix of input data x. It first runs the input through the encoder to obtain the mean and log standard deviation of the latent variables. It then uses the reparameterization trick to sample from the latent distribution. Finally, it runs the latent sample through the decoder to obtain the output.
Arguments
vae::VAE: The VAE used to encode the input data and decode the latent space.
x::AbstractArray: The input data. If array, the last dimension contains each of the samples in a batch.
Optional Keyword Arguments
latent::Bool: Whether to return the latent variables along with the decoder output. If true, the function returns a tuple containing the encoder outputs, the latent sample, and the decoder outputs. If false, the function only returns the decoder outputs. Defaults to false.
Returns
If latent is true, returns a tuple containing:
encoder: The outputs of the encoder.
z: The latent sample.
decoder: The outputs of the decoder.
If latent is false, returns the outputs of the decoder.
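The sampling step of the forward pass can be sketched as follows, assuming the encoder returns a mean `μ` and a log standard deviation `logσ` of the same shape. The helper name `reparameterize_sketch` is hypothetical and is not the package's function.

```julia
using Random

# Hypothetical stand-in for the sampling step: draw z = μ + σ ⊙ ϵ with
# ϵ ~ N(0, I), so the randomness is isolated in ϵ and gradients can flow
# through μ and logσ.
function reparameterize_sketch(
    μ::AbstractVecOrMat, logσ::AbstractVecOrMat; rng=Random.default_rng()
)
    # Standard normal noise with the same element type and shape as μ
    ϵ = randn(rng, eltype(μ), size(μ)...)
    return μ .+ exp.(logσ) .* ϵ
end

# Two latent dimensions, batch of five samples (last dimension is the batch)
μ = zeros(Float32, 2, 5)
logσ = fill(-1.0f0, 2, 5)
z = reparameterize_sketch(μ, logσ)
```

The latent sample `z` is what would then be passed to the decoder.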
Customized training function to update parameters of a variational autoencoder given a loss function.
Arguments
vae::VAE: A struct containing the elements of a variational autoencoder.
x_in::AbstractArray: Input data for the loss function. Represents an individual sample. The last dimension is taken as having each of the samples in a batch.
x_out::AbstractArray: Target output data for the loss function. Represents the corresponding output for the x_in sample. The last dimension is taken as having each of the samples in a batch.
opt::NamedTuple: State of the optimizer for updating parameters. Typically initialized using Flux.Train.setup.
Optional Keyword Arguments
loss_function::Function=loss: The loss function used for training. It should accept the VAE model, data x_in, x_out, and keyword arguments in that order.
loss_kwargs::Union{NamedTuple,Dict} = Dict(): Arguments for the loss function. These might include parameters like σ, or β, depending on the specific loss function in use.
verbose::Bool=false: Whether to print the loss value after each training step.
loss_return::Bool=false: Whether to return the loss value after each training step.
Description
Trains the VAE by:
Computing the gradient of the loss w.r.t the VAE parameters.
Updating the VAE parameters using the optimizer.
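The two steps above correspond to a standard explicit-gradient update in Flux. The sketch below uses a toy `Dense` layer and a hypothetical `toy_loss` in place of a VAE and its loss, and assumes a recent Flux version that provides `Flux.setup`, `Flux.withgradient`, and `Flux.update!`; it is not the body of `train!`.

```julia
using Flux

# Toy model standing in for a VAE; `toy_loss` is a hypothetical loss function
model = Flux.Dense(4 => 2)
toy_loss(m, x) = sum(abs2, m(x))

# Batch of eight samples along the last dimension
x = rand(Float32, 4, 8)

# Optimizer state tied to the model parameters
opt_state = Flux.setup(Flux.Adam(1f-3), model)

# 1. Compute the loss and its gradient with respect to the model parameters
loss_before, grads = Flux.withgradient(m -> toy_loss(m, x), model)

# 2. Update the parameters in place using the optimizer state
Flux.update!(opt_state, model, grads[1])

loss_after = toy_loss(model, x)
```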
Examples
opt = Flux.setup(Flux.Adam(1e-3), vae)
for (x_in, x_out) in dataloader
train!(vae, x_in, x_out, opt)
-end
Settings
This document was generated with Documenter.jl version 1.4.1 on Friday 21 June 2024. Using Julia version 1.10.4.
+end
Settings
This document was generated with Documenter.jl version 1.5.0 on Monday 8 July 2024. Using Julia version 1.10.4.